Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation

Abstract

The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others report strong outcomes from merely training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies, including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL), across multiple dimensions of plan quality: validity, executability, goal satisfiability, and more. Our findings reveal that fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates (a prerequisite for validity), but provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems.
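
The abstract does not spell out how the Longest Contiguous Common Subsequence (LCCS) reward is computed; the sketch below is one plausible formulation, assuming the generated and reference plans are sequences of action strings and the reward is the LCCS length normalized by the reference plan length. The function names (lccs_length, lccs_reward) and the normalization choice are illustrative assumptions, not the paper's exact definition.

def lccs_length(pred, ref):
    """Length of the Longest Contiguous Common Subsequence (i.e. the longest
    common substring over action tokens) between two action sequences."""
    best = 0
    prev = [0] * (len(ref) + 1)  # prev[j]: contiguous match ending at pred[i-1], ref[j-1]
    for i in range(1, len(pred) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if pred[i - 1] == ref[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def lccs_reward(pred_plan, ref_plan):
    """Hypothetical reward: LCCS length normalized by the reference plan length."""
    if not ref_plan:
        return 0.0
    return lccs_length(pred_plan, ref_plan) / len(ref_plan)

# Example with Blocksworld-style action strings: the longest contiguous
# match covers 3 of the 4 reference steps, giving a reward of 0.75.
ref = ["unstack b a", "put-down b", "pick-up a", "stack a b"]
pred = ["pick-up b", "put-down b", "pick-up a", "stack a b"]
print(lccs_reward(pred, ref))  # 0.75

Unlike an exact-match reward, such a contiguous-subsequence signal gives partial credit for locally correct stretches of a plan, which is consistent with the abstract's claim that it helps on longer-horizon problems.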

Publication
Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS)
