The Intersection of Planning and Learning through Cost-to-go Approximations, Imitation and Symbolic Regression


This thesis explores the intersection between planning and learning methods for autonomous sequential decision-making. Planning is a model-based approach to autonomous sequential decision-making in which action policies are derived automatically from a model of an environment. Learning methods, by contrast, learn action policies through interaction with an environment. The planning and learning approaches can be likened to current theories of human cognition, which propose that a fast, associative system works in conjunction with a slow, deliberative one. Based on this observation, previous work has conjectured that a combination of planning and learning methods may be required to create intelligent systems that are more general and robust than existing ones. Two common high-level approaches for combining planning and learning are using learning to help guide the search effort of planners, and using planners to teach learning algorithms. This thesis examines these two high-level approaches through the topics of cost-to-go approximations, symbolic regression and imitation. We propose and study several new algorithms that provide new insights into methods that combine planning and learning. Namely, we introduce methods for learning value and policy functions from lookaheads, learning from single demonstrations produced by planners, and learning heuristics for planning algorithms.

The University of Melbourne