Full-stack AI browser automation system with multi-model LLM strategies, real-time dashboard, and live VNC streaming. Built on browser-use 0.11.9, FastAPI, and Playwright.
Give the agent a task in plain English -- it controls a real Chromium browser to complete it. Watch every step live through VNC streaming or periodic screenshots, with full agent reasoning visible in real time.
Features at a Glance
6 LLM models from 4 providers (Anthropic, OpenAI, Google, Moonshot) -- selectable from the dashboard
5 multi-model strategies for reliability, quality, and resilience
Council Planning -- multiple models generate plans independently, then vote on the best one
Live VNC streaming -- watch the browser in real time via embedded noVNC
Plan tab with step-by-step progress indicators (done/current/pending)
Compact activity log with collapsible reasoning details
Capybara splash screen shown when idle, completed, failed, or stopped
Real-time WebSocket updates for all agent events
Single-file dashboard -- no build step, no dependencies, just HTML
Multi-Model Strategies
Click the gear icon in the top-right corner to open the model configuration bar.
Five strategies are available, ranging from a single model to several LLMs combined for different goals:
| Strategy | How It Works | When to Use |
|----------|-------------|-------------|
| Single | One model handles all steps | Simple tasks, cost-sensitive |
| Fallback Chain | Primary runs; auto-switches to secondary on error/rate-limit | Reliability |
| Planner + Executor | Strong model plans first; fast model executes browser steps | Complex multi-step tasks |
| Consensus (Judge) | Primary acts; judge model validates every step + final verdict | Quality-critical tasks |
| LLM Council | Primary runs; on failure/loop/stall, all council models convene to diagnose, advise, and replan | Hard tasks, anti-stall |
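As an illustration of the Fallback Chain strategy from the table above, here is a minimal sketch, not the actual code in agent_server.py. The names ModelError, run_with_fallback, and call_model are hypothetical, and the sketch assumes every provider error (including rate limits) surfaces as ModelError.

```python
class ModelError(Exception):
    """Hypothetical error type standing in for provider errors and rate limits."""


def run_with_fallback(models, call_model, prompt):
    """Try each model in order and return (model_id, response) from the first success.

    `call_model(model_id, prompt)` is an assumed callable that raises ModelError on failure.
    """
    errors = []
    for model_id in models:
        try:
            return model_id, call_model(model_id, prompt)
        except ModelError as exc:
            # Record the failure and switch to the next model in the chain.
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models in the chain failed: " + "; ".join(errors))
```

Passing the chain as an ordered list keeps the primary/secondary distinction implicit: the first entry is the primary, and later entries are only called after an error.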
Planner + Executor
Select the Planner + Executor strategy to split planning from execution. A reasoning model generates a high-level plan before the run starts. A fast model then follows the plan step-by-step in the browser.
The plan appears in both the Activity tab (as a summary card) and the Plan tab (as interactive step indicators).
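To drive step indicators, the planner's text output has to be split into individual steps. The sketch below assumes the plan arrives as numbered lines such as "1. Open the site"; the function name parse_plan and the dictionary fields are hypothetical, not taken from the project.

```python
import re

# Matches lines like "1. Open the site" or "2) Search for the item".
STEP_RE = re.compile(r"^\s*(\d+)[.)]\s+(.+?)\s*$")


def parse_plan(text):
    """Turn a numbered plan into a list of step records, all initially pending."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(
                {"index": int(match.group(1)), "text": match.group(2), "status": "pending"}
            )
    return steps
```

Lines that do not start with a number (headers, blank lines, trailing notes) are ignored, so extra commentary from the model does not become a plan step.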
Council Planning
Enable the Council Planning checkbox to have multiple models each generate a plan independently, then vote on the best one. This produces higher-quality plans by combining different perspectives.
How it works:
Phase 1 -- Plan Generation: All selected council members independently produce a numbered plan (in parallel)
Phase 2 -- Voting: Each member votes for the best plan (not their own), judging on specificity, completeness, conciseness, and correct ordering
Winner Selection: The plan with the most votes is injected into the executor agent. Ties are broken by plan thoroughness.
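The voting and winner-selection phases can be sketched roughly as follows. This is an illustration only: pick_winner is a hypothetical name, and since "thoroughness" is not defined further here, the sketch assumes the number of plan steps as a stand-in for it.

```python
from collections import Counter


def pick_winner(plans, votes):
    """Return the member whose plan wins the vote.

    plans: dict mapping member -> list of plan steps
    votes: dict mapping voter -> member they voted for
    """
    # Members may not vote for their own plan, so self-votes are dropped,
    # as are votes for members that did not submit a plan.
    valid = [choice for voter, choice in votes.items() if choice != voter and choice in plans]
    tally = Counter(valid)
    # Most votes wins; ties go to the plan with more steps (assumed proxy for thoroughness).
    return max(plans, key=lambda member: (tally[member], len(plans[member])))
```

For example, with three members where two plans receive one valid vote each, the longer of the two plans is selected.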
Live Dashboard
Task Running
When a task is running, the left panel shows the live browser via VNC streaming. The right panel shows real-time status, current task, strategy badges, and the tabbed log panel.
Plan Tab
The Plan tab shows each step from the generated plan with radio-style progress indicators:
Pulsing coral dot -- current step being executed
Green checkmark -- completed step (struck through)
Empty circle -- pending step
Dashed circle -- skipped step
The tab badge shows progress like "Plan (2/6)".
Activity Log
The Activity tab shows a compact log of each agent step. Each entry displays:
Step badge with number
Model tag (which LLM produced this step)
URL the browser is on
Timestamp
Next goal -- the agent's stated intent (the main visible content)
Action pills -- what the agent actually did (click, type, scroll, etc.)
"show details" toggle -- expands to reveal the full evaluation, internal reasoning, and memory
Click "show details" on any entry to see the agent's full reasoning.
Task Completed
When a task finishes, a result card appears with the agent's output. The VNC area shows a capybara splash screen indicating the task status.
LLM Council Strategy
The LLM Council is the most advanced strategy. Select it to see the council member picker, where you choose which models participate.
A single primary model runs the task normally. When it gets stuck, all selected council members convene to diagnose the problem, provide advice, and optionally propose a revised plan.
Council Triggers
| Trigger | Threshold | What Happens |
|---------|-----------|--------------|
| Consecutive failures | 2+ errors | Council convenes with error context |
| Action loop | 3 identical actions | Council diagnoses why the agent is repeating itself |
| Step stall | 60+ seconds | Council investigates why the step is hanging |
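The trigger table above maps naturally onto a small check that runs after each step. The following is a sketch under assumptions: RunState, its field names, and council_trigger are hypothetical, and only the thresholds come from the table.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    """Hypothetical counters the server would maintain while the agent runs."""
    consecutive_failures: int = 0
    identical_actions: int = 0
    step_elapsed_seconds: float = 0.0


def council_trigger(state):
    """Return the name of the first trigger that fires, or None if the run is healthy."""
    if state.consecutive_failures >= 2:
        return "consecutive_failures"
    if state.identical_actions >= 3:
        return "action_loop"
    if state.step_elapsed_seconds >= 60:
        return "step_stall"
    return None
```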
3-Tier Loop Detection
Strict fingerprint -- Exact match on normalized URL + action type + target element + input text (3 repeats)
Loose fingerprint -- Same action type + same domain only (4 repeats)
URL stall -- Same URL with no new extracted content (5 repeats)
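One way to picture the three tiers is as progressively coarser keys computed over the recent step history. The sketch below is an assumption-laden illustration: the step dictionary shape, the function names, the URL normalization rule (lowercase host, no query string, no trailing slash), and the choice to count only consecutive repeats are all hypothetical.

```python
from urllib.parse import urlsplit


def strict_fingerprint(step):
    """Normalized URL + action type + target element + input text."""
    parts = urlsplit(step["url"])
    normalized = parts.netloc.lower() + parts.path.rstrip("/")
    return (normalized, step["action"], step.get("target"), step.get("text"))


def loose_fingerprint(step):
    """Action type + domain only."""
    return (step["action"], urlsplit(step["url"]).netloc.lower())


def trailing_repeats(keys):
    """Count how many times the final key repeats consecutively at the end of the list."""
    count = 0
    for key in reversed(keys):
        if key != keys[-1]:
            break
        count += 1
    return count


def detect_loop(history):
    """Return "strict", "loose", "url_stall", or None for a list of step dicts."""
    if not history:
        return None
    if trailing_repeats([strict_fingerprint(s) for s in history]) >= 3:
        return "strict"
    if trailing_repeats([loose_fingerprint(s) for s in history]) >= 4:
        return "loose"
    recent = history[-5:]
    same_url = len({s["url"] for s in recent}) == 1
    # "new_content" is an assumed flag marking steps that extracted something new.
    if len(recent) == 5 and same_url and not any(s.get("new_content") for s in recent):
        return "url_stall"
    return None
```

Checking the strictest tier first means the most specific diagnosis is reported when several tiers would fire at once.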
Council Recovery Flow
All council members are queried in parallel with the current state, action history, and error context
Each member provides: DIAGNOSE (root cause), ADVICE (next action), and optionally REPLAN (revised steps)
Council feedback is injected into the agent's memory via ActionResult.long_term_memory
If a council member proposes a replan, it replaces agent.state.plan
3-step cooldown between loop-triggered councils to prevent meta-loops
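The recovery flow above might look roughly like this in code. It is a sketch, not the project's implementation: convene_council, cooldown_elapsed, and the ask coroutine are hypothetical names, the reply dictionary shape is assumed, and taking the first proposed replan is an assumption because the flow does not say how competing replans are resolved.

```python
import asyncio

COOLDOWN_STEPS = 3  # steps that must pass between loop-triggered councils


async def convene_council(members, ask, context):
    """Query all members concurrently and merge their replies.

    `ask(member, context)` is an assumed coroutine returning a dict with
    "diagnose", "advice", and an optional "replan" list.
    """
    replies = await asyncio.gather(*(ask(member, context) for member in members))
    # This combined text is what would be handed to the agent as long-term memory.
    feedback = "\n".join(
        f"[{member}] DIAGNOSE: {reply['diagnose']} | ADVICE: {reply['advice']}"
        for member, reply in zip(members, replies)
    )
    # Assumption: the first member that proposes a replan wins.
    replan = next((reply["replan"] for reply in replies if reply.get("replan")), None)
    return feedback, replan


def cooldown_elapsed(current_step, last_council_step):
    """True if a loop-triggered council is allowed again at current_step."""
    if last_council_step is None:
        return True
    return current_step - last_council_step >= COOLDOWN_STEPS
```

Because asyncio.gather preserves argument order, each reply can be paired with the member that produced it even though the calls run concurrently.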
Available Models
Six models from four providers, all with vision support:
| Model | Provider | Tier | ID |
|-------|----------|------|----|
| Claude Sonnet 4.5 | Anthropic | fast | anthropic/claude-sonnet-4.5 |
| Claude Opus 4.5 | Anthropic | reasoning | anthropic/claude-opus-4.5 |
| GPT-4o | OpenAI | fast | openai/gpt-4o |
| Kimi K2.5 | Moonshot AI | fast | moonshotai/kimi-k2.5 |
| Gemini 2.5 Flash | Google | fast | google/gemini-2.5-flash |
| Gemini 2.5 Pro | Google | reasoning | google/gemini-2.5-pro |
To add more models, add entries to the AVAILABLE_MODELS list in agent_server.py. They automatically appear in the dashboard dropdowns and are available as council members.
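The exact shape of an AVAILABLE_MODELS entry is not shown here, so the following is only a guess at what such a list could look like. The field names (id, name, provider, tier, vision) mirror the table columns but are assumptions, and the last entry is a made-up placeholder rather than a real model.

```python
# Hypothetical structure; check agent_server.py for the real field names.
AVAILABLE_MODELS = [
    {"id": "anthropic/claude-opus-4.5", "name": "Claude Opus 4.5",
     "provider": "Anthropic", "tier": "reasoning", "vision": True},
    {"id": "google/gemini-2.5-flash", "name": "Gemini 2.5 Flash",
     "provider": "Google", "tier": "fast", "vision": True},
    # Placeholder showing where a new entry would be appended.
    {"id": "example/new-model", "name": "Example Model",
     "provider": "Example", "tier": "fast", "vision": True},
]


def model_ids(tier):
    """Return the IDs of all models in the given tier, in list order."""
    return [model["id"] for model in AVAILABLE_MODELS if model["tier"] == tier]
```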
Architecture
                                   +------------------+
    [Browser Dashboard] <---WS---> | FastAPI Server   | <--controls--> [browser-use Agent]
     (single HTML SPA)             | agent_server.py  |                    |
                                   +------------------+                    v
    [Browser Dashboard] <-noVNC-> [x11vnc :5999] <--- [Xvfb :99] <--- [Chromium]
This installs libatk, libasound, libxkbcommon, fonts, and ~40 other system libraries that Chromium needs at runtime. It is separate from playwright install chromium, which only downloads the browser binary.