Relational Exploration, Learning and Inference - Foundations of Autonomous Learning in Natural Environments
Summary of Project Results
In this project we used robot table tennis as a test bed to study several learning paradigms for sequential decision making under the constraints of a physical system. These constraints encouraged the development of learning algorithms focused on modularity, sample efficiency and safety. In imitation learning, we developed robust learning methods for probabilistic movement primitives. We leveraged the probabilistic nature of the primitives in a new set of operators that temporally scale and couple the primitives in a safe way. In reinforcement learning, we developed sample-efficient optimizers to locally improve pre-trained primitives. Sample efficiency was obtained by modeling the agent’s behavior. One of the main takeaways of our work was that modeling the reward was more efficient than modeling the forward dynamics. We then applied our model-based principle to hierarchical reinforcement learning to allow the composition of multiple primitives. In the future, we want to extend our work to the two-robot table tennis setup that we have built at the MPI in Tübingen, which allows training through self-play. We hope that pursuing this goal will advance our understanding of the mechanisms by which robots can autonomously learn skills within the constraints of the physical world.
Project-Related Publications (Selection)
- Exploration in model-based reinforcement learning by empirically estimating learning progress. In Neural Information Processing Systems (NIPS 2012), 2012
Manuel Lopes, Tobias Lang, and Marc Toussaint
- Exploration in relational domains for model-based reinforcement learning. Journal of Machine Learning Research, 13:3691–3734, 2012
Tobias Lang, Marc Toussaint, and Kristian Kersting
- Active learning for teaching a robot grounded relational symbols. In Proc. of the Int. Joint Conf. on Artificial Intelligence (IJCAI 2013), 2013
Johannes Kulick, Marc Toussaint, Tobias Lang, and Manuel Lopes
- Exploiting symmetries for scaling loopy belief propagation and relational training. Machine Learning, 92(1): 91–132, 2013
Babak Ahmadi, Kristian Kersting, Martin Mladenov, and Sriraam Natarajan
- Reduce and re-lift: Bootstrapped lifted likelihood maximization for MAP. In Marie desJardins and Michael L. Littman, editors, Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI-2013), 2013
Fabian Hadiji and Kristian Kersting
- Efficient lifting of MAP LP relaxations using k-locality. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS-2014), pages 623–632, 2014
Martin Mladenov, Kristian Kersting, and Amir Globerson
- Lifted message passing as reparametrization of graphical models. In Nevin L. Zhang and Jin Tian, editors, Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI-2014), pages 603–612, 2014
Martin Mladenov, Amir Globerson, and Kristian Kersting
- Model-based relational RL when object existence is partially observable. In Proc. of the Int. Conf. on Machine Learning (ICML 2014), 2014
Ngo Anh Vien and Marc Toussaint
- Poisson dependency networks: Gradient boosted models for multivariate count data. Machine Learning, 100(2-3):477–507, 2015
Fabian Hadiji, Alejandro Molina, Sriraam Natarajan, and Kristian Kersting
(See online at https://doi.org/10.1007/s10994-015-5506-z)
- “Model-Free Trajectory Optimization for Reinforcement Learning”. In: International Conference on Machine Learning (ICML). 2016
Riad Akrour, Abbas Abdolmaleki, Hany Abdulsamad, and Gerhard Neumann
- “Probabilistic Inference for Determining Options in Reinforcement Learning”. In: Machine Learning (2016)
Christian Daniel, Herke van Hoof, Jan Peters, and Gerhard Neumann
(See online at https://doi.org/10.1007/s10994-016-5580-x)
- “Using Probabilistic Movement Primitives for Striking Movements”. In: International Conference on Humanoid Robots (Humanoids). 2016
Sebastian Gomez-Gonzalez, Gerhard Neumann, Bernhard Schölkopf, and Jan Peters
(See online at https://doi.org/10.1109/HUMANOIDS.2016.7803322)
- “Empowered skills”. In: International Conference on Robotics and Automation (ICRA). 2017
Alexander Gabriel, Riad Akrour, Jan Peters and Gerhard Neumann
(See online at https://doi.org/10.1109/ICRA.2017.7989760)
- “Layered Direct Policy Search for Learning Hierarchical Skills”. In: International Conference on Robotics and Automation (ICRA). 2017
Felix End, Riad Akrour, Jan Peters and Gerhard Neumann
(See online at https://doi.org/10.1109/ICRA.2017.7989761)
- “Local Bayesian Optimization of Motor Skills”. In: International Conference on Machine Learning (ICML). 2017
Riad Akrour, Dmitry Sorokin, Jan Peters, and Gerhard Neumann
- “Adaptation and Robust Learning of Probabilistic Movement Primitives”. In: IEEE Transactions on Robotics (T-RO) (2018)
Sebastian Gomez-Gonzalez, Gerhard Neumann, Bernhard Schölkopf, and Jan Peters
(See online at https://doi.org/10.1109/TRO.2019.2937010)
- “Model-Free Trajectory-based Policy Optimization with Monotonic Improvement”. In: Journal of Machine Learning Research (JMLR) (2018)
Riad Akrour, Abbas Abdolmaleki, Hany Abdulsamad, Jan Peters, and Gerhard Neumann
(See online at https://doi.org/10.5445/IR/1000118268)
- “Regularizing Reinforcement Learning with State Abstraction”. In: International Conference on Intelligent Robots and Systems (IROS). 2018
Riad Akrour, Filipe Veiga, Jan Peters, and Gerhard Neumann
(See online at https://doi.org/10.1109/IROS.2018.8594201)