**Note:** `sudo` is required on macOS for USB access to the RealSense camera.
The app will:

- Connect to your RealSense camera
- Start detecting people using MediaPipe Pose
- Print detection status and Twist commands to the console
- Expose an HTTP API on `http://localhost:5050` for OpenClaw
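Once the app is up, you can confirm the API is reachable with the `/health` endpoint (listed in the endpoint table below):

```bash
# Quick liveness check for the HTTP API
curl http://localhost:5050/health
```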
## Console Output

```
[DETECTION] Person #1: distance=1.35m, x=0.12m (conf=92%)
[TARGET] Following Person #1 (target: 1.0m)
[TWIST] linear_x=0.15 m/s, angular_z=-0.08 rad/s
```
## HTTP API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/start` | POST | Start following (optional `description` param) |
| `/stop` | POST | Stop following |
| `/set_target` | POST | Set target by description (uses VLM) |
| `/set_distance` | POST | Set target follow distance in meters |
| `/status` | GET | Get current status |
| `/snapshot` | GET | Get annotated camera frame |
| `/mission` | POST | Start autonomous mission with goal |
| `/mission` | GET | Get current mission status |
| `/mission/cancel` | POST | Cancel current mission |
| `/analyze` | POST | Analyze scene with custom VLM prompt |
| `/events` | GET/POST | Configure event webhooks |
| `/events/test` | POST | Test webhook connectivity |
| `/teleop` | POST | Natural language movement command |
| `/move` | POST | Move forward/backward by distance or time |
| `/turn` | POST | Turn left/right by angle |
| `/velocity` | POST | Set raw velocity command |
| `/sequence` | POST | Execute command sequence |
| `/manual/status` | GET | Get manual control status |
| `/find_and_follow` | POST | Search for, find, approach, and track an object |
| `/find_object` | POST | Find object in scene via VLM |
| `/approach_object` | POST | Find and approach an object |
| `/look_for` | POST | Scan environment for object |
| `/objects` | GET | List all visible objects |
| `/health` | GET | Health check endpoint |
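A minimal follow workflow using these endpoints. The `/set_target` payload key is assumed to be `description` (mirroring the optional `/start` parameter), and the snapshot is assumed to be a JPEG; adjust both if your build differs:

```bash
# Start following the nearest detected person
curl -s -X POST http://localhost:5050/start

# Check what the robot is currently doing
curl -s http://localhost:5050/status

# Retarget by description via the VLM
# (key name "description" is an assumption, mirroring /start's param)
curl -s -X POST http://localhost:5050/set_target \
  -H "Content-Type: application/json" \
  -d '{"description": "person in the red shirt"}'

# Save the annotated camera frame (assumed JPEG) to disk
curl -s http://localhost:5050/snapshot -o snapshot.jpg

# Stop following
curl -s -X POST http://localhost:5050/stop
```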
## Autonomous Missions
Start multi-step missions that execute independently:
```bash
# Follow until condition
curl -X POST http://localhost:5050/mission \
  -H "Content-Type: application/json" \
  -d '{"goal": "follow the person in red until they sit down"}'

# Find a specific person
curl -X POST http://localhost:5050/mission \
  -H "Content-Type: application/json" \
  -d '{"goal": "find a person wearing a hat"}'

# Patrol/scan the area
curl -X POST http://localhost:5050/mission \
  -H "Content-Type: application/json" \
  -d '{"goal": "patrol the area and report who you see"}'

# Check mission status
curl http://localhost:5050/mission

# Cancel mission
curl -X POST http://localhost:5050/mission/cancel
```
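For scripting, you can poll the mission endpoint until the mission finishes. This sketch simply prints the raw JSON every two seconds rather than assuming a particular response schema:

```bash
# Poll mission status every 2 seconds (Ctrl-C to stop)
while true; do
  curl -s http://localhost:5050/mission
  echo
  sleep 2
done
```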
## Scene Analysis
Use the VLM to analyze the current scene with custom questions:
```bash
curl -X POST http://localhost:5050/analyze \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Is the person heading toward the exit?"}'
```
Example prompts:

- "Is the person sitting or standing?"
- "What is the person doing?"
- "Are there obstacles between me and the target?"
- "How many people are facing the camera?"
## Event Webhooks
The robot can post events to OpenClaw or other services:
```bash
# Get current webhook config
curl http://localhost:5050/events

# Configure webhook
curl -X POST http://localhost:5050/events \
  -H "Content-Type: application/json" \
  -d '{"webhook_url": "http://localhost:18789/webhook", "enabled": true}'

# Test webhook
curl -X POST http://localhost:5050/events/test
```
Events posted automatically:

- `person_lost` - Target lost for >2 seconds
- `person_found` - Person detected after being lost
- `mission_completed` - Mission finished successfully
- `mission_failed` - Mission encountered an error
- `target_reached` - Robot at target follow distance
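To eyeball these payloads during development, a throwaway listener on a free port works. This sketch uses BSD `nc` syntax (as on macOS; GNU netcat needs `nc -l -p 9000`) and prints raw requests without sending a proper HTTP response; port 9000 is an arbitrary choice:

```bash
# Point the robot's webhook at the listener (port 9000 is arbitrary)
curl -X POST http://localhost:5050/events \
  -H "Content-Type: application/json" \
  -d '{"webhook_url": "http://localhost:9000/webhook", "enabled": true}'

# Dump incoming webhook requests, one connection at a time (BSD nc)
while true; do nc -l 9000; done
```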
## Manual Teleoperation

Control the robot with natural language commands via the `/teleop` endpoint:
```bash
# Natural language command (parsed and executed)
curl -X POST http://localhost:5050/teleop \
  -H "Content-Type: application/json" \
  -d '{"command": "move forward 1 meter, turn left, go forward for 5 seconds, stop"}'

# Move by distance
curl -X POST http://localhost:5050/move \
  -H "Content-Type: application/json" \
  -d '{"distance": 1.0}'

# Move for duration
curl -X POST http://localhost:5050/move \
  -H "Content-Type: application/json" \
  -d '{"duration": 3.0, "velocity": 0.3}'

# Turn (positive=left, negative=right)
curl -X POST http://localhost:5050/turn \
  -H "Content-Type: application/json" \
  -d '{"angle": 90}'

# Command sequence
curl -X POST http://localhost:5050/sequence \
  -H "Content-Type: application/json" \
  -d '{"commands": [
    {"type": "move", "distance": 1.0},
    {"type": "turn", "angle": -90},
    {"type": "move", "duration": 5, "velocity": 0.3}
  ]}'
```
Supported teleop commands:

- "move forward/backward X meters" - Move by distance
- "go forward for X seconds" - Move for duration
- "turn left/right" - Turn 90 degrees
- "turn left/right X degrees" - Turn a specific angle
- "turn around" - Turn 180 degrees
- "stop" / "halt" - Stop all movement
- Chain commands with "then", "and", or commas
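The `/velocity` and `/manual/status` endpoints from the table have no examples above. In the sketch below, the `linear_x`/`angular_z` key names are an assumption borrowed from the Twist fields in the console output, not confirmed API parameters:

```bash
# Raw velocity command (key names assumed from the Twist console output)
curl -X POST http://localhost:5050/velocity \
  -H "Content-Type: application/json" \
  -d '{"linear_x": 0.2, "angular_z": 0.0}'

# Check whether a manual command is currently executing
curl http://localhost:5050/manual/status
```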
## Object Detection
Find and approach objects (not just people) using VLM:
```bash
# Find an object
curl -X POST http://localhost:5050/find_object \
  -H "Content-Type: application/json" \
  -d '{"object": "red chair"}'

# Approach an object
curl -X POST http://localhost:5050/approach_object \
  -H "Content-Type: application/json" \
  -d '{"object": "water bottle", "distance": 0.3}'

# Scan for an object (rotate to search)
curl -X POST http://localhost:5050/look_for \
  -H "Content-Type: application/json" \
  -d '{"object": "trash can"}'

# List all visible objects
curl http://localhost:5050/objects
```
Can find any describable object:

- Furniture: chairs, tables, couches, desks
- Electronics: laptops, phones, monitors, TVs
- Household: bottles, cups, bags, boxes
- Other: doors, plants, toys, books, balls, etc.
The VLM provides position estimates (left/center/right) and distance estimates (close/medium/far), which are converted into movement commands.
## Find and Follow (Smart Object Tracking)

The `/find_and_follow` endpoint is the most powerful object-tracking command:
```bash
# Basic - follow until target is lost
curl -X POST http://localhost:5050/find_and_follow \
  -H "Content-Type: application/json" \
  -d '{"object": "person", "distance": 1.0}'

# Continuous mode - search for new targets when lost
curl -X POST http://localhost:5050/find_and_follow \
  -H "Content-Type: application/json" \
  -d '{"object": "person", "distance": 1.0, "continuous": true}'
```
Parameters:

- `object`: Description of what to find (use "person" for people)
- `distance`: Target follow distance in meters (default: 0.5)
- `track`: Keep tracking after reaching the target (default: true)
- `continuous`: Search for a new target when the current one is lost (default: false)
- `max_search_rotations`: Maximum rotations while searching (default: 1.5)
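A request exercising every parameter (the values are illustrative):

```bash
# Follow a person at 1.2 m, keep tracking, re-acquire when lost,
# and allow up to 2 full rotations while searching
curl -X POST http://localhost:5050/find_and_follow \
  -H "Content-Type: application/json" \
  -d '{"object": "person", "distance": 1.2, "track": true, "continuous": true, "max_search_rotations": 2}'
```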
Behavior:

1. Checks whether the object is visible in the current camera view
2. If not found, rotates to search (30° increments, up to 540° by default)
3. Once found, turns to face the object and approaches it
4. When at the target distance, continues tracking (adjusting position as the object moves)
5. If the target is lost:
   - Default: the mission ends and the robot stops
   - Continuous mode: the robot searches for a new target and continues
Continuous mode is useful for scenarios like:

- Security patrol: "Follow anyone who enters the area"
- Reception: "Greet and follow visitors as they arrive"
- Demo: "Keep following people until I say stop"
This means you can say "find and follow the red ball" even if the ball is behind the robot; it will search for, locate, and pursue it.
## OpenClaw Integration

### 1. Initial OpenClaw Setup
If you haven't set up OpenClaw yet:
```bash
# Install OpenClaw
npm install -g openclaw

# Run onboarding (sets up workspace, gateway, etc.)
openclaw onboard
```
### 2. Configure OpenAI API Key
OpenClaw needs an OpenAI API key for the chat model:
```bash
# Run configuration wizard
openclaw configure --section model
```
Follow the prompts to enter your OpenAI API key.
### 3. Set the Model to `gpt-4o-mini`

Edit `~/.openclaw/openclaw.json` and ensure the model is set to a chat model (not a code completion model):
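A sketch of the relevant entry; the surrounding key layout varies between OpenClaw versions, so treat this as illustrative rather than a complete config:

```json
{
  "model": "openai/gpt-4o-mini"
}
```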
**Note:** The default `openai/codex-mini-latest` is a code completion model and won't work for chat.
### 4. Install the Follow-Robot Skill
```bash
# Copy skill to OpenClaw workspace
cp -r skill/follow-robot ~/.openclaw/workspace/skills/
```
### 5. Add Robot to `TOOLS.md`

Add the following to `~/.openclaw/workspace/TOOLS.md` so the agent knows about the robot:
```markdown
## Active Devices

### 🤖 Follow Robot (RealSense Camera)

A robot follower system running at `http://localhost:5050` with a RealSense depth camera.

**When to use:** Any question about robots, following, tracking, people, distance, or commands like "start following", "stop", "who do you see", "how far away".

**Quick commands (use exec with curl):**

- Status: `curl -s http://localhost:5050/status`
- Start: `curl -s -X POST http://localhost:5050/start`
- Stop: `curl -s -X POST http://localhost:5050/stop`
- Set distance: `curl -s -X POST http://localhost:5050/set_distance -H "Content-Type: application/json" -d '{"distance": 1.5}'`

See `skills/follow-robot/SKILL.md` for full documentation.
```
### 6. Restart the Gateway
```bash
# Clear any cached sessions and restart
rm -f ~/.openclaw/agents/main/sessions/sessions.json
openclaw gateway restart
```
### 7. Open WebChat

```bash
openclaw webchat
```

Or navigate to `http://127.0.0.1:18789/` in your browser.
## Chat Commands
Once everything is configured, chat naturally:
"How far away is the person?"
"Start following"
"Stop following"
"Set the follow distance to 1.5 meters"
"What's the robot status?"
"Follow the person in the red shirt"
## Troubleshooting OpenClaw

**"NO_REPLY" in chat:**

- Clear the session cache: `rm -f ~/.openclaw/agents/main/sessions/sessions.json`
- Restart the gateway: `openclaw gateway restart`

**Agent doesn't respond to robot commands:**

- Ensure `TOOLS.md` has the robot section (step 5 above)
- Verify the skill is installed: `openclaw skills list | grep follow`

**API key issues:**

- Re-run `openclaw configure --section model`
- Check that `~/.openclaw/.env` contains your `OPENAI_API_KEY`