A general-purpose robotic agent framework based on LLMs. The LLM can independently reason, plan, and execute actions to operate diverse robot types across various scenarios to complete unpredictable, complex tasks.
LEO-RobotAgent is a general-purpose robotic agent framework based on Large Language Models (LLMs). Within this framework, an LLM can operate different types of robots in various scenarios to complete unpredictable, complex tasks, demonstrating strong generalizability and robustness.
The figure above shows the LLM-based general-purpose robotic agent framework, LEO-RobotAgent. Within this clear framework, the large model can autonomously think, plan, and act. We provide a modular tool collection with simple registration, so the LLM can flexibly invoke different tools as needed. The framework also provides a human-computer interaction mechanism, allowing the agent to collaborate with humans like a partner.
Driven by preset prompts and the user's task, the LLM outputs its reasoning, an action, and the action's parameters. The tool collection can cover various domains as needed; each tool only requires basic information such as its enable status, name, corresponding function, and description. Each tool returns its own kind of Observation as feedback. During the loop, the History is continuously accumulated for the LLM's subsequent steps.
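The think–act–observe loop described above can be sketched as follows. This is a minimal illustration, not the actual LEO-RobotAgent implementation; the function names (`run_agent`, `call_llm`) and the parsed-reply format are assumptions.

```python
# Minimal sketch of the agent loop: the LLM picks an action, the tool runs,
# the Observation is appended to History, and the loop repeats until the
# LLM emits a final answer. All names here are illustrative.
def run_agent(task, call_llm, tools, max_steps=10):
    history = []                         # accumulated Action/Observation records
    for _ in range(max_steps):
        reply = call_llm(task, history)  # assumed to return a parsed dict
        if reply.get("final_answer"):    # task finished: leave the loop
            return reply["final_answer"]
        name, params = reply["action"], reply.get("params", {})
        observation = tools[name](params)      # invoke the registered tool
        history.append({"action": name, "params": params,
                        "observation": observation})
    return "Max steps reached"
```

A stub LLM that first requests a tool call and then returns a final answer is enough to exercise the loop end to end.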
The figure above shows an application system designed around LEO-RobotAgent. We built this complete system on ROS and Web technologies. Users can operate the visual interface directly to configure the available tools, converse with the Agent, monitor topics, and more. Tool registration and node management make the system easy to extend and quick to get started with.
Demonstration
The demonstration video above presents four sets of experiments: basic feature verification, a real UAV experiment, a UAV urban search experiment, and a long-horizon task experiment with a wheeled robot equipped with a robotic arm.
The simulation and corresponding real-world experiments are shown above. An example of the Agent's operation process and output during a task can be found in this file.
Project Content
Our framework has been verified on UAVs, custom-built wheeled mobile robots (with robotic arms), and mechanical dogs; the project contains ready-made control nodes for each of them.
Development Environment: Ubuntu 20.04 + ROS Noetic. The core framework also works in other environments, but the robots may require adaptation on your side. The installation steps below may omit some minor libraries and are for reference only; supplements are welcome.
General Configuration
First, download this repository to your workspace.
Install Python dependencies (Python 3.8 confirmed to work):
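The exact install command is not shown here; a typical invocation, assuming the repository ships a `requirements.txt` at its root (an assumption, check the repo), would be:

```shell
# Hypothetical install command -- the actual dependency file name
# may differ; requirements.txt is an assumption.
python3 -m pip install -r requirements.txt
```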
Note: This project uses the Qwen3 series models, including Qwen-VL, so adaptation is only ensured for these models. If there are conflicts in the LLM output format, you can modify it in src/agent/src/api_agent.py.
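As an illustration of the kind of output parsing you might adjust in src/agent/src/api_agent.py when switching models, here is a small hedged sketch. The "Action:/Action Input:" format is an assumption for illustration, not necessarily the project's actual prompt format.

```python
# Hypothetical parser for a ReAct-style LLM reply; adjust the patterns to
# whatever output format your model actually produces.
import json
import re

def parse_action(text):
    """Extract a tool name and JSON parameters from an LLM reply."""
    action = re.search(r"Action:\s*(\w+)", text)
    params = re.search(r"Action Input:\s*(\{.*\})", text, re.S)
    if not action:
        return None                         # no tool call found in the reply
    return action.group(1), (json.loads(params.group(1)) if params else {})
```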
Next, install the dependencies for the corresponding robots. The robots already adapted to LEO-RobotAgent are listed below; configure them as needed. Remember to run catkin_make and source the workspace at the end.
Our application system is Web-based; the interface is shown above. The System panel in the top left starts various preset terminal commands (including but not limited to roslaunch and rosrun), and you can add your own. LEO-RobotAgent is our core node. Every button actually opens a terminal (closing that terminal shuts down the node), which makes debugging output easy to follow. The Camera panel lets you switch between and view Image-type topics.
The Tools panel sets which tools are available to the Agent. Check a tool to enable it, or double-click a cell to edit its configuration, as in a spreadsheet. You can also add new tools via the button at the bottom (not visible in the image). Any changes must be saved by clicking Save, and the LEO-RobotAgent node must be restarted for them to take effect.
On the right is the Chat Interface. Enter a command to issue a task. You can also type during task execution to interrupt it, temporarily modify the task, or point out errors. When the current stage of the task is complete, the Agent outputs its final answer in a green bubble; you can then continue issuing tasks (memory is retained). Blue bubbles are tool-calling Actions, and yellow bubbles are Observation results.
Preset questions, tool configurations, and preset terminal commands are saved under src/agent/config and are loaded automatically each time the web interface is opened. You can add, delete, and modify entries there in more detail, or check which program file a button runs and develop it yourself.
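The on-disk format of these config files is not shown here; purely as an illustration, a tool entry carrying the fields described earlier (enable status, name, function, description) might look like the following. The actual schema under src/agent/config may differ.

```json
{
  "enable": true,
  "name": "add",
  "function": "add",
  "description": "Input a dictionary with a, b. Return: the result of a + b."
}
```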
Running the Program
First, start the server: python3 src/agent/webui/server.py.
Next, open src/agent/webui/web_ui.html in a browser. Then start RosBridge and VideoServer (if you want the camera feed) from the System panel.
Then, depending on the robot:
UAV:
Configure and save needed tools in the Tools panel (uav_fly is necessary).
Sequentially start via buttons: QGC, UAV sim, UAV fly (wait for gazebo to load fully), Vision, LEO-RobotAgent.
Wheeled Robot with Arm:
Configure and save needed tools in the Tools panel (car_run, arm_grasp are necessary).
Sequentially start via buttons: Car sim, Car ctrl (wait for gazebo to load fully), Arm ctrl, Vision, LEO-RobotAgent.
Mechanical Dog:
Configure and save needed tools in the Tools panel (dog_run is necessary).
Sequentially start via buttons: Dog sim, Dog joint (wait for gazebo to load fully), Dog ctrl, Vision, LEO-RobotAgent.
Finally, input commands in the chat interface to issue tasks for automatic execution.
About the Vision Node
The Vision node provides a VLM and object detection as visual tools. The VLM implementation in vision.py can be rewritten to match different model interfaces; object detection uses yolov8l-worldv2. You can choose and download models from Ultralytics and place them in src/agent/weights.
Fill in uav, car, or dog in src/agent/config/vision_device.txt to select the matching camera and related topics.
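A detection call through Ultralytics YOLO-World might look like the sketch below. This is an assumption about how the Vision node is wired, not its actual code; the weight path and the `format_detections`/`detect` helper names are illustrative.

```python
# Hypothetical sketch of an open-vocabulary detection tool built on
# Ultralytics YOLO-World (pip install ultralytics). Not the project's code.
def format_detections(boxes):
    """Turn (label, confidence, xyxy) tuples into an Observation string."""
    lines = ["{0} ({1:.2f}) at {2}".format(label, conf, list(xyxy))
             for label, conf, xyxy in boxes]
    return "; ".join(lines) if lines else "No objects detected"

def detect(image_path, class_names):
    # Requires yolov8l-worldv2.pt downloaded into src/agent/weights
    from ultralytics import YOLOWorld
    model = YOLOWorld("src/agent/weights/yolov8l-worldv2.pt")
    model.set_classes(class_names)          # open-vocabulary prompt classes
    result = model.predict(image_path)[0]
    boxes = [(result.names[int(b.cls)], float(b.conf),
              [round(v) for v in b.xyxy[0].tolist()])
             for b in result.boxes]
    return format_detections(boxes)
```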
Developing New Tools
If you want to develop new tools based on this project, here is an example of creating the simplest tool.
First, define a new function add under AgentTools in src/agent/src/tools.py:
def add(self, nums):
    return nums['a'] + nums['b']
Then add a tool in the Web Tools panel, fill in the corresponding fields, check it, and save, for example:
Name: add, Function: add, Description: Input a dictionary with a, b. Return: the result of a + b.
It is now ready for use. Building on this, you can implement complex algorithms in your own project and register them in tools.py, using ROS topics as the interface.
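A ROS-topic-backed tool of that kind might look like the following sketch. The topic name, message contents, and class name are assumptions for illustration; injecting the publish function keeps the tool logic testable without a running ROS master.

```python
# Hypothetical tool that forwards an Agent action to another node over a
# ROS topic. Names (/agent/nav_goal, NavigateTool) are illustrative.
try:
    import rospy
    from std_msgs.msg import String
    ROS_AVAILABLE = True
except ImportError:          # lets the logic be read and tested without ROS
    ROS_AVAILABLE = False

class NavigateTool:
    """Publishes a navigation goal received from the Agent to a ROS topic."""

    def __init__(self, publish_fn=None):
        if publish_fn is None and ROS_AVAILABLE:
            pub = rospy.Publisher('/agent/nav_goal', String, queue_size=1)
            publish_fn = lambda text: pub.publish(String(data=text))
        self._publish = publish_fn

    def navigate(self, params):
        # params is the dict parsed from the LLM's action output,
        # e.g. {"x": 1.0, "y": 2.0}
        goal = "{x},{y}".format(x=params["x"], y=params["y"])
        self._publish(goal)
        return "Goal published: " + goal
```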
Modifying the Prompt
src/agent/src/api_agent.py contains the core code of this framework. The prompts within can be modified to suit your own tasks. The Tools and Vision modules also use the LLM and VLM, and their prompts can be modified independently.
🔥 Manual Execution (Legacy)
You can still manually open multiple terminals and run the commands below.
UAV Launch
roslaunch px4 mavros_posix_sitl.launch
# Choose your own world
roslaunch px4 mavros_posix_sitl.launch world:=/path/to/your.world
# Takeoff/Land commands
commander takeoff
commander land
UAV Control Node
source ./devel/setup.bash && rosrun agent fly.py
🦾 Wheeled Robot with Arm
Car Launch
source ./devel/setup.bash
# Without Arm
roslaunch agent gazebo_car.launch
# With Arm
roslaunch armcar_moveit_config demo_gazebo.launch
# No GUI
roslaunch armcar_moveit_config demo_gazebo.launch gazebo_gui:=false
Car Nodes
# Car Control Node
source ./devel/setup.bash && rosrun agent car_ctrl.py
# Arm Control Node
source ./devel/setup.bash && rosrun agent arm_ctrl.py