1994 §
Finding structure in reinforcement learning §
1995 §
UTree algorithm (McCallum 1995) §
- Finds state abstractions from sample interactions with the environment, focusing directly on modeling the value function
TD models: modeling the world at a mixture of time scales §
1998 §
Hierarchical solution of Markov Decision Processes using macro-actions §
- Milos Hauskrecht et al. @ UAI 1998
- Focuses on how to construct macro-actions automatically
- A macro-action is a local policy defined for a particular region
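The idea of a macro-action as a local policy over a region can be sketched in a few lines; this is a minimal illustration, not code from the paper, and the names (`region`, `local_policy`, `step`) are hypothetical:

```python
def run_macro_action(state, region, local_policy, step):
    """Follow a region's local policy until the state leaves the region.

    region: set of states on which the macro-action is defined
    local_policy: dict mapping each state in the region to an action
    step: environment transition function (state, action) -> next state
    """
    while state in region:
        state = step(state, local_policy[state])
    return state  # first state outside the region (an "exit" state)
```

For example, on a chain MDP where the region is {0, 1, 2} and the local policy always moves right, the macro-action runs until it exits the region at state 3.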
1999 §
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning §
2004 §
MDP homomorphism (Ravindran 2004) §
2005 §
Theoretical results on reinforcement learning with temporally abstract behaviors §
2006 §
Controlled Markov Process (CMP) homomorphisms §
- A CMP is an MDP without a reward function.
2009 §
Binary action search for learning continuous-action control policies §
- Handles a continuous action space by casting action selection as a sequence of binary decisions that repeatedly halve the action interval
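The interval-halving idea can be sketched as follows; this is a toy illustration under my own assumptions (a known action range and a queryable Q-function), not the paper's algorithm, which learns a binary policy instead of querying Q directly:

```python
def binary_action_search(q_value, state, lo=-1.0, hi=1.0, depth=12):
    """Narrow a continuous action interval [lo, hi] by repeated binary decisions.

    q_value: hypothetical estimator q_value(state, action) -> float
    depth: number of halvings; final precision is (hi - lo) / 2**depth
    """
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        # Probe one point in each half and keep the more promising half.
        if q_value(state, (mid + hi) / 2.0) > q_value(state, (lo + mid) / 2.0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0  # approximate greedy action
```

With a concave Q (e.g. a quadratic peaked at some action), each comparison discards the half farther from the peak, so the returned action converges to the maximizer.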
2011 §
Automatic construction of temporally extended actions for MDPs using bisimulation metrics §
- Tags: #mdp
- Pablo Samuel Castro et al. @ EWRL 2011
- Option framework: an option is a triple ⟨I, π, β⟩
- I ⊆ S is the set of states where the option is available
- π is the option’s policy
- β : S → [0, 1] is the probability of the option terminating at each state
- An option is started in a state s ∈ I; the policy π is followed until the option is terminated, as dictated by β
- Bisimulation metrics
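The option triple and its execution semantics can be made concrete with a small sketch; this is my own minimal rendering of the standard framework, with hypothetical names, not code from the paper:

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option <I, pi, beta> in the options framework."""
    initiation_set: Set[int]             # I: states where the option is available
    policy: Callable[[int], int]         # pi: maps state -> primitive action
    termination: Callable[[int], float]  # beta: P(terminate | state)

def run_option(option, state, step, rng=random.random):
    """Start the option in a state from I; follow pi until beta terminates it.

    step: environment transition function (state, action) -> next state
    """
    assert state in option.initiation_set, "option not available in this state"
    while True:
        state = step(state, option.policy(state))
        if rng() < option.termination(state):
            return state
```

On a chain MDP where β is 1 at state 3 and 0 elsewhere, an option started at 0 with a move-right policy runs until it terminates at state 3.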
2024 §