
Associate Professor in Artificial Intelligence

The University of Melbourne

Biography

I am an Associate Professor in Artificial Intelligence at the School of Computing and Information Systems, The University of Melbourne. I’m a member of the Agent Lab group and the Digital Agriculture, Food and Wine lab.

My research focuses on how to introduce different approaches to the problem of inference in sequential decision problems, as well as applications to autonomous systems in agriculture.

I completed my PhD at the Artificial Intelligence and Machine Learning Group, Universitat Pompeu Fabra, under the supervision of Prof. Hector Geffner. I was a research fellow for 3 years under the supervision of Prof. Peter Stuckey and Prof. Adrian Pearce, working on solving Mining Scheduling problems through automated planning, constraint programming and operations research techniques.

Interests

  • AI planning
  • Search
  • Learning
  • Verification
  • Constraint Programming
  • Operations Research
  • Intention Recognition
  • Sequential Decision Problems
  • Autonomous Systems

Education

  • Graduate Certificate in University Teaching, 2020

    The University of Melbourne

  • PhD in Artificial Intelligence, 2012

    Universitat Pompeu Fabra

  • MEng in Artificial Intelligence, 2007

    Universitat Pompeu Fabra

  • BSc in Computer Science, 2004

    Universitat Pompeu Fabra

Recent News


[12/25] The Melbourne summer of AI & reasoning – We organised ICAPS 2025 in Melbourne co-located with KR and CPAIOR

[12/25] New Information Systems paper on Applying Organizational Mining to Discover Agent Systems from Event Data

[11/25] Summer school lecture on Classical Planning and PDDL at OPS-S 2025

[11/25] New ICAPS paper on Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation

[11/25] New ICAPS workshop paper on On the Complexity of Computing the Planning Width

[11/25] New papers on RL Scheduling for HPC Systems 1, 2, 3

[09/25] New ECAI paper on On the Computational Complexity of Partial Satisfaction Planning

[07/25] New ACL paper on Planning-Driven Programming: A Large Language Model Programming Workflow

[07/25] New Information Systems paper on Process Mining over Sensor Data: Goal Recognition for Powered Transhumeral Prostheses

[04/25] New AAAI paper on Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts

Projects

AI Planning Solvers Online

Planning as a Service (PaaS) is an extendable API to deploy planners online in local or cloud servers

Farm.bot at The University of Melbourne

Farm.bot is an open-source robotic platform to explore problems on AI and Automation (Planning, Vision, Learning) for small scale …

Width Based Planning

Width Based Planning searches for solutions through a general measure of state novelty. Performs well over black-box simulators and …

Planimation

Planimation is a framework to visualise sequential solutions of planning problems specified in PDDL

Classical Planners

Top-performing classical planners, awarded in several International Planning Competitions, 2008 - 2019

Trapper

Invariants, Traps, Un-reachability Certificates, and Dead-end Detection

AI 4 Education

Software to support AI courses in Mel & RMIT Unis (Melbourne, AUS)

Arcade Learning Environment

Classical Planners playing Atari 2600 games as well as Deep Reinforcement Learning

Linear Temporal Logic, Planning and Synthesis

Classical planners computing infinite loopy plans, and FOND planners synthesizing controllers expressed as policies.

LAPKT

Lightweight Automated Planning ToolKiT (LAPKT) to build, use or extend basic to advanced Automated Planners

Recent Publications


Applying Organizational Mining to Discover Agent Systems from Event Data

Agent system mining is a recently introduced type of process mining that takes a bottom-up approach to the data-driven analysis of socio-technical systems that execute business processes in organizations. Instead of the top-down approach used in conventional process mining that studies a system in terms of its global state evolution, agent system mining analyzes the system as if it is composed of autonomous agents, each with its local state and behavior, interacting with other agents and the environment to contribute to the emerging global behavior of the business process. Recently, Agent Miner, the first algorithm for discovering agent systems from event data generated by process-aware information systems, has been proposed. The quality of the agent systems discovered by this algorithm depends on the quality of the agent types (or agents), which are identified from the available information about agent behavior in event data.

Job Scheduling in High Performance Computing Clusters with Deep Reinforcement Learning

High Performance Computing (HPC) systems are designed to execute large-scale computational workloads by leveraging parallel processing across multiple compute nodes. These systems support a wide range of applications, including scientific simulations, engineering design, and artificial intelligence, where tasks usually require substantial computational power and execution time. Efficient job scheduling in HPC environments is essential to ensure timely job completion and effective resource utilization. In HPC scheduling, two important components are the job queue and the computing cluster. The job queue holds jobs submitted by users, each describing the resources it needs — such as CPU cores, GPUs, memory, and expected runtime — before it can run. The computing cluster is the pool of available hardware resources that can be assigned to these jobs. Since resources are limited and jobs arrive over time with varying demands, the scheduler must continuously decide which jobs to run and how to allocate resources to them. This process, known as job scheduling, involves both selecting jobs from the queue and assigning suitable resources across the cluster. These decisions must respect system constraints, such as resource availability and job placement requirements, and aim to utilize resources well while maintaining job waiting time as low as possible.

Traditional HPC schedulers often rely on simple heuristics like First-Come-First-Served (FCFS) or Shortest Job First (SJF), which are easy to implement but can lead to poor resource utilization and job starvation under dynamic workloads. Meta-heuristic algorithms have been used to improve decision quality by exploring a broader solution space, but they are computationally costly and require careful tuning. Supervised machine learning methods have also been applied to predict job priorities, offering better scheduling decisions than fixed heuristics. However, they depend on labeled datasets and struggle to adapt to real-time system changes. In contrast, Deep Reinforcement Learning (DRL) enables a scheduler to learn scheduling policies through interaction with the system, using performance feedback to improve decisions over time. It does not rely on labeled data or predefined heuristics, and instead learns to optimize long-term scheduling outcomes by exploring and evaluating different actions in varied system states. This makes DRL well-suited for handling dynamic workloads, complex resource constraints, and changing system conditions in HPC environments. However, applying DRL to HPC scheduling introduces several key challenges. First, designing an effective job selector is difficult due to the unbounded and dynamic nature of the job queue. As jobs continuously arrive and depart, the number of scheduling actions and the information required to represent each job can vary significantly over time. This makes it challenging to define a consistent and scalable state representation and to construct a manageable action space for the agent. Second, job selection and resource allocation are often handled separately, which can result in poor coordination and inefficient scheduling. Addressing this requires a unified DRL framework that jointly considers both decisions. 
Third, scheduling objectives — such as minimizing job waiting time and maximizing resource utilization — can conflict and shift depending on workload intensity and system state, requiring the scheduler to dynamically adjust its priorities. Fourth, as HPC systems are upgraded, modified, or newly developed, changes in hardware architecture can alter the structure and semantics of the system state used by DRL-based schedulers. These changes affect how job and resource features are represented and interpreted, making it difficult to directly reuse DRL models trained in previous environments without adaptation.

In particular, this thesis makes the following contributions to DRL-based HPC scheduling. It develops a DRL-based job selector designed to handle unbounded job queues and support efficient backfilling. It presents a hierarchical DRL scheduler that jointly manages job selection and resource allocation in HPC environments. It introduces a dynamic controller that adjusts scheduling objectives based on real-time system conditions. Finally, it proposes a transfer learning framework that enables DRL schedulers to adapt efficiently to evolving HPC architectures.
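As a point of reference for the heuristic baselines named in the abstract above, the following is a minimal sketch of how First-Come-First-Served and Shortest Job First can differ in average waiting time. It is not taken from the thesis: it assumes a single compute node, jobs that are all queued at time zero, and known runtimes, and the job names and helper functions are hypothetical.

```python
from typing import List, Tuple

# (job name, expected runtime); both fields are illustrative assumptions.
Job = Tuple[str, int]


def average_wait(ordered_jobs: List[Job]) -> float:
    """Average waiting time when jobs run back-to-back on one node."""
    clock = 0
    total_wait = 0
    for _name, runtime in ordered_jobs:
        total_wait += clock  # this job waited until the node became free
        clock += runtime
    return total_wait / len(ordered_jobs)


def fcfs(queue: List[Job]) -> List[Job]:
    """First-Come-First-Served: keep the submission order."""
    return list(queue)


def sjf(queue: List[Job]) -> List[Job]:
    """Shortest Job First: run the shortest expected runtime first."""
    return sorted(queue, key=lambda job: job[1])


queue = [("A", 8), ("B", 1), ("C", 3)]
fcfs_wait = average_wait(fcfs(queue))  # waits 0, 8, 9
sjf_wait = average_wait(sjf(queue))    # waits 0, 1, 4
print(f"FCFS: {fcfs_wait:.2f}, SJF: {sjf_wait:.2f}")
```

With jobs arriving over time, the same SJF rule can postpone a long job indefinitely, which is the starvation problem the abstract mentions.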

MetaPilot: A DRL-Based Controller for Dynamic Adaptation to Shifting Scheduling Objectives in HPC Systems

Efficient job scheduling in high-performance computing (HPC) systems necessitates the simultaneous consideration of system-centric objectives, such as maximizing resource utilization, and user-centric objectives, such as minimizing job waiting times. In practice, the relative importance of these objectives is not static, but shifts dynamically in response to fluctuations in workload characteristics and system state. However, existing scheduling frameworks — including both traditional workload managers and reinforcement learning (RL)-based methods — typically rely on fixed policies or fixed reward functions that encode a predetermined combination of objectives. As a result, they lack the flexibility to adjust their scheduling priorities as workload intensity or system conditions change. In this work, we propose MetaPilot, a deep reinforcement learning-based controller designed to enable dynamic adaptation to shifting scheduling objectives in HPC systems.

Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation

The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs’ reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies—including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL)—across multiple dimensions of plan quality: validity, executability, goal satisfiability, and more. Our findings reveal fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates—a necessary prerequisite for validity—but provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems.
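To give a rough intuition for a reward based on contiguous overlap between plans, here is a minimal sketch. It is an assumption, not the paper's definition: it scores a generated plan by the length of the longest contiguous run of actions it shares with a reference plan, normalised by the reference length. The function names and the example actions are hypothetical.

```python
from typing import Sequence


def longest_contiguous_common(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest run of consecutive items shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous item of both plans.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def overlap_reward(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Assumed reward in [0, 1]: shared contiguous run over reference length."""
    if not reference:
        return 0.0
    return longest_contiguous_common(generated, reference) / len(reference)


reference = ["pick a", "move a b", "drop a", "pick c"]
generated = ["pick a", "move a b", "drop c", "pick c"]
reward = overlap_reward(generated, reference)
print(reward)
```

In this example the first two actions match contiguously and the third diverges, so the score reflects partial progress rather than an all-or-nothing validity check.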

On the Complexity of Computing the Planning Width

The width of a classical planning instance, among other metrics, indicates the computational difficulty of the instance. However, no result exists on the complexity of computing the width itself. In this paper, we address this by utilising an optimisation complexity framework. We focus on planning instances with polynomially bounded solutions, and prove that computing their width is OptP[O(log log L)]-hard, where L is the size of the instance. In turn, for the upper bound, we show that computing width is in OptP[O(log L)]^OptP[O(log L)]. These results contribute to the understanding of width as a proxy measure for the computational difficulty of planning, and suggest that exploiting other structural restrictions beyond bounding solution length can provide further insights on the complexity of width computation.

Students

Current Students

Ph.D.

  • Giacomo Rosa [2024 - current] co-supervised with Prof. Sebastian Sardina and Dr. Jean Honorio, Topic: Exploration methods for Planning

  • Jiajia Song [2024 - current] co-supervised with Prof. Sebastian Sardina and Dr. William Umboh, Topic: What Makes AI Planning Hard? From Complexity Analysis to Algorithm Design

  • David Adams [2024 - current] co-supervised with Dr. Renata Borovica-Gajic, Topic: Exploration Methods for Databases

  • Qingtan Shen [2023 - current] co-supervised with A/Prof. Artem Polyvyanyy and Dr. Timotheus Kampik, Topic: Multi-agent system discovery

  • Muhammad Bilal [2023 - current] co-supervised with Dr. Wafa Johal and Prof. Denny Oetomo, Topic: Towards Interactive Robot Learning for Complex Sequential Tasks

  • Ciao Lei [2022 - current] co-supervised with Dr. Kris Ehinger and A/Prof. Sigfredo Fuentes, Topic: Generalized vision planning problems and their applications in Agriculture

  • Zhiaho Pei [2022 - current] co-supervised with Dr. Angela Rojas, Dr. Fjalar De Haan and Dr. Enayat A. Moallemi, Topic: Robust decision making for complex systems

Alumni

Ph.D.

Masters

Honours and Awards

Distinguished Program Committee - IJCAI-ECAI 2022

The quality of my reviews was ranked in the top 3% out of 3000+ reviewers.

Winner (PROBE planner) and Runner-up (BFWS planner)

Winner - Agile Track | Runner-up - Satisficing Track (BFWS planners)

Winner - Time Track | Runner-Up - Quality and Coverage tracks (LAPKT planners)

Best Dissertation Award (ICAPS)

Text of Award: Nir Lipovetzky takes a new, and very original, look at automated planning: how to reason your way to a plan, instead of searching (blindly or heuristically) for it. First, he has developed a range of novel inference techniques that, combined, produce classical planners that can work with very little backtracking – in many cases none at all – and perform well enough to be awarded at two IPCs. Second, he has invented a novel measure of the hardness of a planning problem, called “width”, and has shown that by properly exploiting it, a simple blind search can do as well as the best-performing heuristic search planners.

Service

Conference Chair

  • International Conference on Automated Planning and Scheduling, ICAPS (2025)

Program Chair

  • International Conference on Automated Planning and Scheduling, ICAPS (2019)

Organizing Committee

  • Optimisation and Planning ICAPS 2025 Summer School – Organizer, 2025

  • AgentsVic Autumn Symposium on Reasoning and Learning for Autonomous Agents – Organizer, 2024

  • International Conference on Automated Planning and Scheduling – Publicity co-chair, ICAPS (2010)

  • First Unsolvability International Planning Competition – Co-Organizer, UIPC-1 (2016)

  • Heuristics and Search for Domain-independent Planning – Co-Organizer, ICAPS workshop HSDIP (2015,2016,2017,2018)

  • Demonstration track – Co-Chair, AAAI (2023)

  • Student Abstract track – Co-Chair, AAAI (2018,2019)

  • Journal Presentation track – Co-Chair, ICAPS (2018)

Senior Program Committee

  • Association for the Advancement of Artificial Intelligence, AAAI (2020,2021,2022,2023)
  • International Joint Conferences on Artificial Intelligence, IJCAI (2021,2023)
  • Association for Computational Linguistics (ACL) conference Rolling Review, Area Chair, Jan 2026

Program Committee

  • International Joint Conferences on Artificial Intelligence, IJCAI (2011,2013,2015,2017,2018,2020,2022)

  • Association for the Advancement of Artificial Intelligence, AAAI (2013,2015,2016,2017,2018,2019)

  • European Conference on Artificial Intelligence, ECAI (2014,2016)

  • International Conference on Automated Planning and Scheduling, ICAPS (2015,2016,2017,2018,2020)

  • Symposium on Combinatorial Search, SOCS (2020,2021,2022,2023)

Reviewer

  • Journal of Artificial Intelligence Research, JAIR

  • Artificial Intelligence, Elsevier, AIJ

Other

  • ICAPS Awards Committee 2024

Teaching

  • Pacman Capture the Flag Inter-University Contest, run for the Unimelb AI course and Hall of Fame contest, 2016 - current

  • AI Planning for Autonomy (Lecturer), at M.Sc. AI specialization, The University of Melbourne, 2016 - current

  • Data Structures and Algorithms (Lecturer), at The University of Melbourne, 2016 - current

  • Software Agents (Lecturer), at M.Sc. Software, The University of Melbourne, 2013, 2014, 2015

  • Autonomous Systems, at M.Sc. Intelligent Interactive Systems, University Pompeu Fabra, 2012

  • Advanced course on AI: workshop on RoboSoccer simulator, at Polytechnic School, University Pompeu Fabra, 2009, 2010, 2011

  • Artificial Intelligence course, at Polytechnic School, University Pompeu Fabra, 2010, 2011

  • Introduction to Data Structures and Algorithms course, at Polytechnic School, University Pompeu Fabra, 2008

  • Programming course, at Polytechnic School, University Pompeu Fabra, 2008, 2009, 2010, 2011

Contact