Goal-Guided Reinforcement Learning: Leveraging Large Language Models for Long-horizon Task Decomposition

1National University of Singapore, Singapore
2University of Delaware, USA

Abstract

Reinforcement learning (RL) has long struggled with exploration in vast state-action spaces, particularly for intricate tasks that require a long series of well-coordinated actions. Meanwhile, large language models (LLMs) equipped with broad prior knowledge have been used for task planning across various domains. However, planning toward long-horizon objectives remains demanding: because LLMs operate independently of the task environment, their knowledge may not be well aligned with it, and they often overlook physical constraints. To this end, we propose a goal-guided RL framework that leverages the prior knowledge of LLMs to benefit the training process. We introduce a hierarchical module featuring a goal generator, which segments a long-horizon task into reachable subgoals, and a policy planner, which generates action sequences based on the current goal. The policies derived from the LLM then guide the RL agent to achieve each subgoal sequentially. We evaluate the proposed framework in two distinct simulation environments, each presenting tasks that require a long sequence of actions to succeed. The results demonstrate its efficiency and robustness in handling novel tasks with complex state and action spaces.

Video Presentation

Background

When facing a complex long-horizon task, deciding on the next step is challenging given the large number of available options. In such cases, if an LLM is instructed to pursue the task objective directly, potential environmental constraints can easily be overlooked. Instead, it is intuitive to divide the task objective into several manageable subgoals and achieve them sequentially.


Framework

Our proposed LLM-assisted RL framework consists of two hierarchically connected modules:
(1) A subgoal generator (left side) that takes in the distant task objective and the initial state of the environment (a minimal sketch of this step follows the list).
(2) A policy generator (right side) that outputs action sequences based on the current state and the corresponding subgoal.
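
To make the subgoal-generation step concrete, below is a minimal sketch of how the task objective and initial state could be turned into an ordered list of subgoals. The prompt wording and the `query_llm` interface are illustrative assumptions, not the exact prompts used in the paper.

```python
# Hypothetical sketch of the subgoal generator. `query_llm` is a stand-in for
# any text-completion function; the prompt format is an assumption.

def generate_subgoals(query_llm, task_objective: str, initial_state: str) -> list[str]:
    """Ask the LLM to decompose a long-horizon objective into ordered subgoals."""
    prompt = (
        "You are planning for an embodied agent.\n"
        f"Task objective: {task_objective}\n"
        f"Initial state: {initial_state}\n"
        "Decompose the objective into a short, ordered list of reachable subgoals, "
        "one per line."
    )
    response = query_llm(prompt)
    # Treat each non-empty line of the response as one subgoal.
    return [line.strip("-• ").strip() for line in response.splitlines() if line.strip()]
```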


The goal generator breaks down the complex task into several subgoals in text. Subsequently, the policy generator produces actions as the LLM policy, conditioned on the caption of the current state and the current subgoal. The disparity between the agent's policy and the LLM policy serves as an additional policy loss for the RL algorithm. Additionally, a goal inspector measures the cosine similarity between the encoded embeddings of the subgoal and the state caption to check whether the subgoal has been reached.
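
The two mechanisms above can be sketched in a few lines. In the sketch below, `encode` stands in for any sentence encoder, the similarity threshold and weighting factor `lam` are illustrative, and the disparity term is written as a KL divergence between the LLM policy and the agent policy; the exact form used in the framework may differ.

```python
import numpy as np


def goal_reached(encode, goal: str, state_caption: str, threshold: float = 0.9) -> bool:
    """Goal inspector: cosine similarity between goal and state-caption embeddings."""
    g, s = encode(goal), encode(state_caption)
    cos = float(np.dot(g, s) / (np.linalg.norm(g) * np.linalg.norm(s) + 1e-8))
    return cos >= threshold


def guided_policy_loss(rl_loss: float,
                       agent_probs: np.ndarray,
                       llm_probs: np.ndarray,
                       lam: float = 0.1) -> float:
    """Base RL policy loss plus a disparity term (here KL(llm || agent)) as guidance."""
    kl = float(np.sum(llm_probs * (np.log(llm_probs + 1e-8) - np.log(agent_probs + 1e-8))))
    return rl_loss + lam * kl
```

When the inspector reports that the current subgoal has been reached, the agent advances to the next subgoal in the sequence produced by the goal generator.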

Task Environments

FoodPreparation and Entertainment [1] in VirtualHome [2]

ROMAN Robot Environment [3]

Results

References

[1] Tan, Weihao, et al. "True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning." arXiv preprint arXiv:2401.14151 (2024).
[2] Puig, Xavier, et al. "VirtualHome: Simulating Household Activities via Programs." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[3] Triantafyllidis, Eleftherios, et al. "Hybrid Hierarchical Learning for Solving Complex Sequential Tasks Using the Robotic Manipulation Network ROMAN." Nature Machine Intelligence 5.9 (2023): 991-1005.