Tractable Novelty Exploration over Continuous and Discrete Sequential Decision Problems

Abstract

Sequential decision problems, in which an agent seeks a sequence of actions that maximises a utility function or satisfies a goal condition, have been studied by several research communities: control, reinforcement learning, and AI planning. This talk sits within the AI planning community, focusing on the latest advances in width-based planning algorithms, which have been shown to yield state-of-the-art AI planners that rely mostly on structural exploration features rather than goal-oriented heuristics or gradients.

Width-based planners search for a solution using a measure of the novelty of states, where states are defined over a set of features. State novelty evaluation is known to be exponential in the cardinality of the feature set. In this talk, I will address two limitations of current width-based planning: 1) how to define state features over continuous dynamics, where the space of features is unbounded, and 2) how to obtain polynomial approximations of novelty through sampling and Bloom filters.
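To make the Bloom-filter idea concrete, here is a minimal sketch (not the talk's actual implementation; all names and parameters are illustrative) of a novelty-1 test: a state is novel if at least one of its (feature, value) pairs has not been seen before, and a fixed-size Bloom filter stands in for the exponentially large table of seen feature tuples. Because Bloom filters admit false positives but never false negatives, the approximation can only under-report novelty, never over-report it.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed by k hash functions (illustrative)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _indexes(self, item):
        # Derive k independent bit positions by salting the hash input.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(repr((i, item)).encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.num_bits

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        # May return True for an unseen item (false positive),
        # but never False for a seen one.
        return all(self.bits[idx // 8] >> (idx % 8) & 1
                   for idx in self._indexes(item))


def novelty_1(state, seen):
    """Return True if some (feature, value) atom of `state` is unseen,
    then record all of its atoms in the Bloom filter."""
    atoms = list(enumerate(state))  # (feature index, value) pairs
    novel = any(a not in seen for a in atoms)
    for a in atoms:
        seen.add(a)
    return novel
```

With a well-sized filter, memory is fixed and each check costs time linear in the number of features, rather than requiring storage exponential in the feature set: revisiting a state yields `False`, while a state differing in any single feature value remains novel (up to the filter's false-positive rate).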

I will discuss the performance of the resulting polynomial planners on discrete sequential decision problems and compare them with state-of-the-art deep reinforcement learning algorithms on the “classical control” benchmarks from OpenAI Gym, showing that width-based planners can find policies of the same quality using significantly fewer computational resources.

Date
01 Sep 2021 00:00
Event
School of Computing and Information Systems Seminar
Location
The University of Melbourne

