I am an Associate Professor in Artificial Intelligence at the School of Computing and Information Systems, The University of Melbourne. I’m a member of the Agent Lab group and the Digital Agriculture, Food and Wine lab.
My research focuses on different approaches to the problem of inference in sequential decision problems, as well as on applications to autonomous systems in agriculture.
I completed my PhD at the Artificial Intelligence and Machine Learning Group, Universitat Pompeu Fabra, under the supervision of Prof. Hector Geffner. I was then a research fellow for three years under the supervision of Prof. Peter Stuckey and Prof. Adrian Pearce, working on solving mine scheduling problems through automated planning, constraint programming and operations research techniques.
Graduate Certificate in University Teaching, 2020
The University of Melbourne
PhD in Artificial Intelligence, 2012
Universitat Pompeu Fabra
MEng in Artificial Intelligence, 2007
Universitat Pompeu Fabra
BSc in Computer Science, 2004
Universitat Pompeu Fabra
[12/25] The Melbourne summer of AI & reasoning – We organised ICAPS 2025 in Melbourne co-located with KR and CPAIOR
[12/25] New Information Systems paper on Applying Organizational Mining to Discover Agent Systems from Event Data
[11/25] Summer school lecture on Classical Planning and PDDL at OPS-S 2025
[11/25] New ICAPS paper on Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation
[11/25] New ICAPS workshop paper: On the Complexity of Computing the Planning Width
[11/25] New papers on RL Scheduling for HPC Systems 1, 2, 3
[09/25] New ECAI paper: On the Computational Complexity of Partial Satisfaction Planning
[07/25] New ACL paper on Planning-Driven Programming: A Large Language Model Programming Workflow
[07/25] New Information Systems paper on Process Mining over Sensor Data: Goal Recognition for Powered Transhumeral Prostheses
[04/25] New AAAI paper on Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts
Planning as a Service (PaaS) is an extensible API to deploy planners online in local or cloud servers
Farm.bot is an open-source robotic platform to explore problems in AI and Automation (Planning, Vision, Learning) for small-scale …
Width-Based Planning searches for solutions through a general measure of state novelty. It performs well over black-box simulators and …
Planimation is a framework to visualise sequential solutions of planning problems specified in PDDL
Awarded top-performing classical planners in several International Planning Competitions, 2008 - 2019
Invariants, Traps, Un-reachability Certificates, and Dead-end Detection
Software to support AI courses at the University of Melbourne and RMIT University (Melbourne, AUS)
Classical planners playing Atari 2600 games as well as Deep Reinforcement Learning agents
Classical planners computing infinite loopy plans, and FOND planners synthesizing controllers expressed as policies.
Lightweight Automated Planning ToolKiT (LAPKT) to build, use or extend basic to advanced Automated Planners
Width-based algorithms search for solutions through a general definition of state novelty. These algorithms have been shown to result in state-of-the-art performance in classical planning, and have been successfully applied to model-based and model-free settings where the dynamics of the problem are given through simulation engines. The performance of width-based algorithms is understood theoretically through the notion of planning width, which provides polynomial guarantees on their runtime and memory consumption. To facilitate synergies across research communities, this paper summarizes the area of width-based planning, and surveys current and future research directions.
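For intuition, the novelty test at the heart of width-based search can be sketched in a few lines: under IW(1), a newly generated state is considered novel, and hence worth expanding, only if it makes true at least one atom that no previously generated state has made true; IW(k) extends the test to tuples of up to k atoms. The sketch below is illustrative only (the state and atom encodings are assumptions, not LAPKT's data structures).

```python
from itertools import combinations

def is_novel(state, seen_tuples, k=1):
    """IW(k) novelty test (illustrative sketch): return True iff `state`
    contains a tuple of at most k atoms never seen in earlier states."""
    novel = False
    for size in range(1, k + 1):
        for tup in combinations(sorted(state), size):
            if tup not in seen_tuples:
                seen_tuples.add(tup)  # remember the tuple for future tests
                novel = True
    return novel

# IW(k) prunes any newly generated state for which is_novel(...) is False.
seen = set()
s0 = frozenset({"at(robot,c1)", "holding(nothing)"})
s1 = frozenset({"at(robot,c2)", "holding(nothing)"})
print(is_novel(s0, seen), is_novel(s1, seen), is_novel(s1, seen))  # True True False
```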
Agent system mining is a recently introduced type of process mining that takes a bottom-up approach to the data-driven analysis of socio-technical systems that execute business processes in organizations. Instead of the top-down approach used in conventional process mining that studies a system in terms of its global state evolution, agent system mining analyzes the system as if it is composed of autonomous agents, each with its local state and behavior, interacting with other agents and the environment to contribute to the emerging global behavior of the business process. Recently, Agent Miner, the first algorithm for discovering agent systems from event data generated by process-aware information systems, has been proposed. The quality of the agent systems discovered by this algorithm depends on the quality of the agent types (or agents), which are identified from the available information about agent behavior in event data.
High Performance Computing (HPC) systems are designed to execute large-scale computational workloads by leveraging parallel processing across multiple compute nodes. These systems support a wide range of applications, including scientific simulations, engineering design, and artificial intelligence, where tasks usually require substantial computational power and execution time. Efficient job scheduling in HPC environments is essential to ensure timely job completion and effective resource utilization. In HPC scheduling, two important components are the job queue and the computing cluster. The job queue holds jobs submitted by users, each describing the resources it needs — such as CPU cores, GPUs, memory, and expected runtime — before it can run. The computing cluster is the pool of available hardware resources that can be assigned to these jobs. Since resources are limited and jobs arrive over time with varying demands, the scheduler must continuously decide which jobs to run and how to allocate resources to them. This process, known as job scheduling, involves both selecting jobs from the queue and assigning suitable resources across the cluster. These decisions must respect system constraints, such as resource availability and job placement requirements, and aim to utilize resources well while keeping job waiting times as low as possible.
Traditional HPC schedulers often rely on simple heuristics like First-Come-First-Served (FCFS) or Shortest Job First (SJF), which are easy to implement but can lead to poor resource utilization and job starvation under dynamic workloads. Meta-heuristic algorithms have been used to improve decision quality by exploring a broader solution space, but they are computationally costly and require careful tuning. Supervised machine learning methods have also been applied to predict job priorities, offering better scheduling decisions than fixed heuristics. However, they depend on labeled datasets and struggle to adapt to real-time system changes. In contrast, Deep Reinforcement Learning (DRL) enables a scheduler to learn scheduling policies through interaction with the system, using performance feedback to improve decisions over time. It does not rely on labeled data or predefined heuristics, and instead learns to optimize long-term scheduling outcomes by exploring and evaluating different actions in varied system states. This makes DRL well-suited for handling dynamic workloads, complex resource constraints, and changing system conditions in HPC environments. However, applying DRL to HPC scheduling introduces several key challenges. First, designing an effective job selector is difficult due to the unbounded and dynamic nature of the job queue. As jobs continuously arrive and depart, the number of scheduling actions and the information required to represent each job can vary significantly over time. This makes it challenging to define a consistent and scalable state representation and to construct a manageable action space for the agent. Second, job selection and resource allocation are often handled separately, which can result in poor coordination and inefficient scheduling. Addressing this requires a unified DRL framework that jointly considers both decisions. Third, scheduling objectives — such as minimizing job waiting time and maximizing resource utilization — can conflict and shift depending on workload intensity and system state, requiring the scheduler to dynamically adjust its priorities. Fourth, as HPC systems are upgraded, modified, or newly developed, changes in hardware architecture can alter the structure and semantics of the system state used by DRL-based schedulers. These changes affect how job and resource features are represented and interpreted, making it difficult to directly reuse DRL models trained in previous environments without adaptation.
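To make the baseline heuristics above concrete, here is a minimal sketch of FCFS and SJF job selection under strong simplifying assumptions: a job is just an arrival time, a core request and a runtime estimate, and the cluster is a single pool of free cores. The `Job` fields and function names are illustrative, not taken from any specific scheduler.

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    arrival: float      # submission time
    cores: int          # requested CPU cores
    est_runtime: float  # user-supplied runtime estimate (seconds)

def fcfs_pick(queue, free_cores):
    """First-Come-First-Served: earliest submitted job that fits the free cores
    (head-of-line blocking and backfilling rules are ignored for simplicity)."""
    for job in sorted(queue, key=lambda j: j.arrival):
        if job.cores <= free_cores:
            return job
    return None

def sjf_pick(queue, free_cores):
    """Shortest Job First: among jobs that fit, pick the shortest estimated runtime."""
    fitting = [j for j in queue if j.cores <= free_cores]
    return min(fitting, key=lambda j: j.est_runtime, default=None)

queue = [Job(1, 0.0, 64, 3600.0), Job(2, 9.0, 8, 120.0), Job(3, 5.0, 16, 900.0)]
print(fcfs_pick(queue, free_cores=32).job_id)  # 3 (job 1 does not fit; job 3 arrived first)
print(sjf_pick(queue, free_cores=32).job_id)   # 2 (shortest estimated runtime that fits)
```

A DRL-based scheduler replaces hand-coded rules of this kind with a learned policy that maps the current queue and cluster state to a job selection and resource allocation decision.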
In particular, this thesis makes the following contributions to DRL-based HPC scheduling: (i) a DRL-based job selector designed to handle unbounded job queues and support efficient backfilling; (ii) a hierarchical DRL scheduler that jointly manages job selection and resource allocation in HPC environments; (iii) a dynamic controller that adjusts scheduling objectives based on real-time system conditions; and (iv) a transfer learning framework that enables DRL schedulers to adapt efficiently to evolving HPC architectures.
Efficient job scheduling in high-performance computing (HPC) systems necessitates the simultaneous consideration of system-centric objectives, such as maximizing resource utilization, and user-centric objectives, such as minimizing job waiting times. In practice, the relative importance of these objectives is not static, but shifts dynamically in response to fluctuations in workload characteristics and system state. However, existing scheduling frameworks — including both traditional workload managers and reinforcement learning (RL)-based methods — typically rely on fixed policies or fixed reward functions that encode a predetermined combination of objectives. As a result, they lack the flexibility to adjust their scheduling priorities as workload intensity or system conditions change. In this work, we propose MetaPilot, a deep reinforcement learning-based controller designed to enable dynamic adaptation to shifting scheduling objectives in HPC systems.
The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs’ reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies—including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL)—across multiple dimensions of plan quality: validity, executability, goal satisfiability, and more. Our findings reveal fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates—a necessary prerequisite for validity—but provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems.
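As a rough illustration of the kind of sequence-overlap signal behind such a reward, the sketch below computes the length of the longest contiguous common subsequence (read here as the longest common substring over action sequences) and normalises it by the reference plan length. The function names, plan encoding and normalisation are assumptions for illustration and may differ from the paper's exact reward.

```python
def lccs_length(pred, ref):
    """Longest contiguous common subsequence (longest common substring) length
    between two action sequences, computed by dynamic programming."""
    best = 0
    dp = [0] * (len(ref) + 1)  # dp[j]: length of the common run ending at ref[j-1]
    for a in pred:
        prev = dp[:]
        for j, b in enumerate(ref, start=1):
            dp[j] = prev[j - 1] + 1 if a == b else 0
            best = max(best, dp[j])
    return best

def lccs_reward(pred_plan, ref_plan):
    """Illustrative reward: LCCS length normalised by the reference plan length."""
    return lccs_length(pred_plan, ref_plan) / max(len(ref_plan), 1)

ref  = ["pick(a)", "move(l1,l2)", "drop(a)", "move(l2,l3)"]
pred = ["move(l1,l2)", "drop(a)", "pick(b)", "move(l2,l3)"]
print(lccs_length(pred, ref))            # 2  (the contiguous run "move(l1,l2)", "drop(a)")
print(round(lccs_reward(pred, ref), 2))  # 0.5
```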
The width of a classical planning instance, among other metrics, indicates the computational difficulty of the instance. However, no result exists on the complexity of computing the width itself. In this paper, we address this by utilising an optimisation complexity framework. We focus on planning instances with polynomially bounded solutions, and prove that computing their width is OptP[O(log log L)]-hard, where L is the size of the instance. In turn, for the upper bound, we show that computing width is in OptP[O(log L)]^OptP[O(log L)]. These results contribute to the understanding of width as a proxy measure for the computational difficulty of planning, and suggest that exploiting other structural restrictions beyond bounding solution length can provide further insights on the complexity of width computation.
Giacomo Rosa [2024 - current] co-supervised with Prof. Sebastian Sardina and Dr. Jean Honorio, Topic: Exploration methods for Planning
Jiajia Song [2024 - current] co-supervised with Prof. Sebastian Sardina and Dr. William Umboh, Topic: What Makes AI Planning Hard? From Complexity Analysis to Algorithm Design
David Adams [2024 - current] co-supervised with Dr. Renata Borovica-Gajic, Topic: Exploration Methods for Databases
Qingtan Shen [2023 - current] co-supervised with A/Prof. Artem Polyvyanyy and Dr. Timotheus Kampik, Topic: Multi-agent system discovery
Muhammad Bilal [2023 - current] co-supervised with Dr. Wafa Johal and Prof. Denny Oetomo, Topic: Towards Interactive Robot Learning for Complex Sequential Tasks
Chao Lei [2022 - current], co-supervised with Dr. Kris Ehinger and A/Prof Sigfredo Fuentes, Topic: Generalized vision planning problems and their applications in Agriculture
Zhihao Pei [2022 - current], co-supervised with Dr. Angela Rojas, Dr. Fjalar De Haan and Dr. Enayat A. Moallemi, Topic: Robust decision making for complex systems
Sukai Huang [2022 - 2025], co-supervised with Prof. Trevor Cohn, Thesis:
Integrating Natural Language in Sequential Decision Problems. First Employment: Post-Doc @ Monash University [2025 - current]
Lingfei Wang [2021 - 2025], co-supervised with Dr. Maria Rodriguez. Thesis:
Job Scheduling in High Performance Computing Clusters with Deep Reinforcement Learning. First Employment: TBA
Guang Hu [2021 - 2025], co-supervised with Dr. Tim Miller. Thesis:
“Seen Is Believing”: Modeling and Solving Epistemic Planning Problems using Justified Perspectives. First Employment: TBA
Zihang Su [2020 - 2024], co-supervised with Dr. Artem Polyvyanyy and Prof. Sebastian Sardina. Thesis:
Evidence-Based Goal Recognition Using Process Mining Techniques. First Employment: Post-Doc @ Tsinghua University [2024 - current]
Chenyuan Zhang [2020 - 2024], co-supervised with A/Prof. Charles Kemp (Psychology). Thesis:
Planning and Goal Recognition in Humans and Machines. First Employment: Post-Doc @ Monash University [2024 - current]. Best Student Paper Award, AAMAS (2024)
Anubhav Singh [2019 - 2024], co-supervised with Dr. Miquel Ramirez and Prof. Peter Stuckey. Thesis:
Lazy Constraint Generation and Tractable Approximations for Large-scale Planning Problems. First Employment: Post-Doc @ University of Toronto [2024 - current]
Stefan O'Toole [2018 - 2022], co-supervised with Dr. Miquel Ramirez and Prof. Adrian Pearce. Thesis:
The Intersection of Planning and Learning through Cost-to-go Approximations, Imitation and Symbolic Regression. First Employment: Meta [2022 - current]
Toby Davies [2013-2017], co-supervised with Prof. Adrian Pearce, Prof. Peter Stuckey and Prof. Harald Sondergaard. Thesis:
Learning from Conflict in Multi-Agent, Classical, and Temporal Planning. First Employment: Google [2017 - current]. Best Paper Award, ICAPS (2015); Best PhD Thesis, Melbourne School of Engineering (2018)
Giacomo Rosa [2023-2024]. Thesis:
Count-Based Novelty Exploration and ECAI24 paper
Zhihao Pei [2021], co-supervised with Dr. Angela Rojas, Dr. Fjalar De Haan and Dr. Enayat A. Moallemi, Thesis:
Robust decision making for complex systems
Marco Marasco [2021], co-supervised with Dr. Angela Rojas, Dr. Fjalar De Haan and Dr. Enayat A. Moallemi, Thesis:
Adaptive Policy making for systems of electricity provision
Jiayuan Chang [2021], co-supervised with A/Prof Sigfredo Fuentes, Thesis:
FarmBot.io Automated Planning: simulation and integration
Yajing Ma [2021], co-supervised with A/Prof Sigfredo Fuentes, Thesis:
Electronic Nose for pest detection
Dmitry Grebenyuk [2018-2020], co-supervised with Dr. Miquel Ramirez and Dr. Kris Ehinger. Thesis:
Agnostic Features for generalized policies computed with Deep Reinforcement Learning (DRL). First Employment: Start-up working on Image Processing using DRL
Guang Hu [2018-2020], co-supervised with Dr. Tim Miller. Thesis:
What you get is what you see: Decomposing Epistemic Planning using Functional STRIPS. PhD Candidate [2021 - current]
Chao Lei [2019-2020]. Thesis:
Regression and Width in Classical Planning and ICAPS21 paper
ICAPS (2025), ICAPS (2019)
Optimisation and Planning ICAPS 2025 Summer School – Organizer, 2025
AgentsVic Autumn Symposium on Reasoning and Learning for Autonomous Agents – Organizer, 2024
International Conference on Automated Planning and Scheduling – Publicity co-chair, ICAPS (2010)
First Unsolvability International Planning Competition – Co-Organizer, UIPC-1 (2016)
Heuristics and Search for Domain-independent Planning – Co-Organizer, ICAPS workshop HSDIP (2015,2016,2017,2018)
Demonstration track – Co-Chair, AAAI (2023)
Student Abstract track – Co-Chair, AAAI (2018,2019)
Journal Presentation track – Co-Chair, ICAPS (2018)
AAAI (2020,2021,2022,2023), IJCAI (2021,2023), Jan 2026
International Joint Conferences on Artificial Intelligence, IJCAI (2011,2013,2015,2017,2018,2020,2022)
Association for the Advancement of Artificial Intelligence, AAAI (2013,2015,2016,2017,2018,2019)
European Conference on Artificial Intelligence, ECAI (2014,2016)
International Conference on Automated Planning and Scheduling, ICAPS (2015,2016,2017,2018,2020)
Symposium on Combinatorial Search, SOCS (2020,2021,2022,2023)
Journal of Artificial Intelligence Research, JAIR
Reviewer, Artificial Intelligence, Elsevier AIJ
Pacman Capture the Flag Inter-University Contest, run for the Unimelb AI course and Hall of Fame contest, 2016 - current
AI Planning for Autonomy (Lecturer), at M.Sc. AI specialization, The University of Melbourne, 2016 - current
Data Structures and Algorithms (Lecturer), at The University of Melbourne, 2016 - current
Software Agents (Lecturer), at M.Sc. Software, The University of Melbourne, 2013, 2014, 2015
Autonomous Systems, at M.Sc. Intelligent Interactive Systems, University Pompeu Fabra, 2012
Advanced course on AI: workshop on RoboSoccer simulator, at Polytechnic School, University Pompeu Fabra, 2009, 2010, 2011
Artificial Intelligence course, at Polytechnic School, University Pompeu Fabra, 2010, 2011
Introduction to Data Structures and Algorithms course, at Polytechnic School, University Pompeu Fabra, 2008
Programming course, at Polytechnic School, University Pompeu Fabra, 2008, 2009, 2010, 2011