1994

Finding structure in reinforcement learning

1995

UTree algorithm (McCallum 1995)

  • Finds state abstractions from sample interactions with the environment, focusing directly on modeling the value function

TD models: modeling the world at a mixture of time scales


1998

Hierarchical solution of Markov Decision Processes using macro-actions

  • Milos Hauskrecht et al. @ UAI 1998
  • Focuses on how to construct macro-actions automatically
  • A macro-action is a local policy defined for a particular region
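A macro-action in this sense can be sketched as a local policy attached to a region of the state space, executed until the state leaves that region. A minimal sketch; the chain environment, `region`, and `step` function below are illustrative stand-ins, not from the paper:

```python
# Minimal sketch: a macro-action as a local policy over a region.
def run_macro(state, region, local_policy, step):
    """Follow the region's local policy until the state exits the region."""
    trajectory = [state]
    while state in region:
        state = step(state, local_policy[state])
        trajectory.append(state)
    return trajectory

# Toy deterministic chain: moving "right" from state s lands in s + 1.
step = lambda s, a: s + 1 if a == "right" else s - 1
region = {0, 1, 2}
policy = {0: "right", 1: "right", 2: "right"}
print(run_macro(0, region, policy, step))  # [0, 1, 2, 3]
```

The macro terminates exactly when control leaves the region, which is what lets a higher-level planner treat it as a single abstract action.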

1999

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning


2004

MDP homomorphism (Ravindran 2004)

  • A mapping between MDPs that preserves transition and reward structure, used for state–action abstraction
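For a deterministic MDP the homomorphism condition is easy to check numerically: a state map f and a state-dependent action map g must commute with the transition function and preserve rewards. A toy sketch on a mirror-symmetric 4-state chain (the chain, the maps, and all names here are illustrative, not from the thesis):

```python
# Toy check of the MDP homomorphism condition on a deterministic,
# mirror-symmetric chain 0-1-2-3 (illustrative example).
S = [0, 1, 2, 3]
A = ["left", "right"]

def T(s, a):  # ground transitions: bounded moves on the chain
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)

def R(s, a):  # reward depends only on whether s is an inner state
    return 1.0 if s in (1, 2) else 0.0

f = {0: 0, 1: 1, 2: 1, 3: 0}  # state map: fold the chain at its midpoint

def g(s, a):  # action map: mirror left/right on the folded half
    if s >= 2:
        return {"left": "in", "right": "out"}[a]
    return {"right": "in", "left": "out"}[a]

def T_abs(z, b):  # abstract transitions on states {0, 1}
    return 1 if b == "in" else 0

def R_abs(z, b):
    return 1.0 if z == 1 else 0.0

# Homomorphism condition (deterministic case):
# f(T(s, a)) == T'(f(s), g(s, a)) and R(s, a) == R'(f(s), g(s, a)).
ok = all(f[T(s, a)] == T_abs(f[s], g(s, a)) and R(s, a) == R_abs(f[s], g(s, a))
         for s in S for a in A)
print(ok)  # True
```

In the general stochastic case the condition instead equates the abstract transition probability with the ground probability summed over each preimage block of f.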

2005

Theoretical results on reinforcement learning with temporally abstract behaviors


2006

Controlled Markov Process (CMP) homomorphisms

  • A CMP is an MDP without a reward function.

2009

Binary action search for learning continuous-action control policies

  • Discretizes continuous-action control via binary search over the action range, so each continuous action choice becomes a short sequence of binary decisions.
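The idea can be sketched as repeated halving of the action range, where each step asks a binary policy whether to move the candidate action up or down. The `decide` comparator here is a hypothetical stand-in for that learned policy:

```python
def binary_action_search(decide, low, high, depth=20):
    """Select a continuous action via a sequence of binary higher/lower decisions."""
    for _ in range(depth):
        mid = (low + high) / 2.0
        if decide(mid):  # binary policy says: the best action lies above mid
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# With an oracle comparator, the search homes in on the target action 0.7.
a = binary_action_search(lambda m: m < 0.7, 0.0, 1.0)
print(round(a, 3))  # 0.7
```

After `depth` decisions the interval has width (high − low) / 2^depth, so a handful of binary choices suffices for fine-grained continuous control.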

2011

Automatic construction of temporally extended actions for MDPs using bisimulation metrics

  • Tags: #mdp
  • Pablo Samuel Castro et al. @ EWRL 2011
  • Option framework
      • ℐ ⊆ S is the set of states where the option is available
      • π is the option’s policy
      • β : S → [0, 1] is the probability of the option terminating at each state
      • An option is started in a state s ∈ ℐ; the policy π is followed until the option is terminated, as dictated by β
  • Bisimulation metrics
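The option components listed above (initiation set, internal policy, termination probabilities) can be sketched directly. A minimal sketch; the chain environment, `step` function, and termination rule are illustrative, not from the paper:

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

# Minimal sketch of the option framework (illustrative example).
@dataclass
class Option:
    init_set: Set[int]             # states where the option is available
    policy: Callable[[int], str]   # the option's internal policy
    beta: Callable[[int], float]   # termination probability at each state

def execute(option, state, step, rng=random.Random(0)):
    """Start the option in `state`; follow its policy until beta says stop."""
    assert state in option.init_set
    while rng.random() >= option.beta(state):
        state = step(state, option.policy(state))
    return state

# A "go to the right end" option on a 6-state chain.
go_right = Option(init_set={0, 1, 2, 3, 4},
                  policy=lambda s: "right",
                  beta=lambda s: 1.0 if s == 5 else 0.0)
print(execute(go_right, 0, lambda s, a: min(s + 1, 5)))  # 5
```

Because execution runs until β terminates it, an option behaves as a single temporally extended action from the point of view of the higher-level MDP.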

2024