agora_agent (AgoraIO-Community/RPM-agora-agent) | MoltPulse
agora_agent Advanced 3D Avatar with Agora ConvoAI Integration - Real-time conversation with avatar lip sync
Real-Time AI Avatars with Lip Sync Using Agora ConvoAI
Live Demo : https://agoraio-community.github.io/RPM-agora-agent/
Build conversational AI agents with synchronized lip movements, natural expressions, and genuine real-time responses powered by Agora ConvoAI Engine, WebAudio API, and ReadyPlayer.me avatars.
Key Features
WebAudio-Driven Lip Sync
Real-Time FFT Analysis - Analyzes AI voice at 60 FPS using WebAudio API (85-255 Hz speech range)
ARKit Viseme Mapping - Frequency patterns map to phonemes (aa, E, I, O, U, PP, FF, TH, etc.)
50+ Morph Targets - ARKit blend shapes for realistic facial deformation
Exponential Smoothing - Delta-time easing for fluid transitions without jitter
<50ms Latency - Audio-to-visual synchronization with minimal delay
Breathing Simulation - Subtle sine wave variations for natural idle behavior
Agora ConvoAI Engine
Ultra-Low Latency WebRTC - Real-time voice streaming via Agora RTC SDK
Speech-to-Text (ASR) - Automatic speech recognition for user input
LLM Integration - OpenAI GPT-4 or compatible models for intelligent responses
Text-to-Speech (TTS) - Azure Speech Services for natural voice synthesis
Cloud-Based Agent - ConvoAI Agent joins Agora channel as a remote user
Multi-Language Support - Configurable ASR/TTS language settings
ReadyPlayer.me Avatar System
GLB 3D Models - Optimized web-ready avatars with facial rigs
Facial Expressions - 7 emotional states (smile, surprised, sad, angry, etc.)
Body Animations - Idle, talking, laughing, crying, and more
Real-Time Morphing - Facial blend shapes respond to live audio analysis
Manual Override - UI panels for expression/animation control
Three.js Rendering - 60 FPS WebGL performance
How It Works
Real-Time Data Flow

User Speech → Agora RTC → ConvoAI Engine → LLM (GPT-4) → TTS (Azure) → Audio Stream
↓
WebAudio Analyzer
↓
FFT Analysis (256)
↓
Frequency → Viseme Mapping
↓
ARKit Blend Shapes
↓
Three.js Rendering (60 FPS)
↓
Synchronized Lip Sync
User speaks → Agora RTC captures and streams audio to ConvoAI Engine
ConvoAI processes → speech-to-text (ASR), LLM reasoning, text-to-speech (TTS)
AI responds → TTS audio streams back through Agora RTC as a remote user
WebAudio analyzes → AnalyserNode performs FFT on the audio stream (85-255 Hz speech range)
Viseme mapping → frequency patterns map to phoneme shapes (A, E, I, O, U, PP, FF, etc.)
Morph targets update → ARKit blend shapes deform the facial mesh at 60 FPS
Avatar speaks → realistic lip sync with <50ms audio-to-visual latency
Technical Deep Dive
Frequency-to-Viseme Mapping
Human speech frequencies cluster in predictable ranges:
Low (85-150 Hz) : Open vowels → "O", "U" visemes
Mid (150-200 Hz) : Central vowels → "A" visemes
High (200-255 Hz) : Closed vowels → "E", "I" visemes
Consonants : Distinct spikes → PP, FF, TH, kk visemes
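As a sketch, the band boundaries above can be turned into a tiny classifier. This is illustrative only: the function name and return labels are hypothetical, and the real engine blends several bands rather than picking a single dominant one.

```javascript
// Map a dominant speech frequency (Hz) to a coarse viseme group,
// following the band boundaries listed above.
function classifyViseme(freqHz) {
  if (freqHz >= 85 && freqHz < 150) return "O/U";   // open vowels
  if (freqHz >= 150 && freqHz < 200) return "A";    // central vowels
  if (freqHz >= 200 && freqHz <= 255) return "E/I"; // closed vowels
  return "sil"; // outside the primary speech band: treat as silence
}
```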
ARKit Blend Shape Targets
// Vowel phonemes with complex mouth shapes
A: { jawOpen: 0.7, mouthOpen: 0.8, mouthWide: 0.5 }
E: { jawOpen: 0.4, mouthOpen: 0.6, mouthWide: 0.7, mouthSmile: 0.3 }
I: { jawOpen: 0.2, mouthOpen: 0.3, mouthWide: 0.8, mouthSmile: 0.5 }
O: { jawOpen: 0.5, mouthOpen: 0.7, mouthFunnel: 0.6, mouthPucker: 0.4 }
U: { jawOpen: 0.3, mouthOpen: 0.4, mouthFunnel: 0.8, mouthPucker: 0.7 }
// Consonant phonemes with precise articulation
PP: { mouthPressLeft: 0.8, mouthPressRight: 0.8, mouthClose: 0.9 }
FF: { jawOpen: 0.1, mouthOpen: 0.2, mouthFunnel: 0.3 }
TH: { jawOpen: 0.3, mouthOpen: 0.4, tongueOut: 0.2 }
Smooth Animation Pipeline
Exponential Smoothing : lerp(current, target, 1 - exp(-15 * deltaTime)) eliminates jitter
Frame-Rate Independent : Delta-time integration for consistent animation speed
Viseme Transitions : 12x speed multiplier for natural phoneme blending
Breathing Variation : sin(time * 2) * 0.1 adds subtle idle movement
Intensity Scaling : 2x-4x audio level multipliers for visible mouth movement
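Put together, the smoothing pipeline above reduces to a few lines. This is a sketch assuming the constants quoted in the list (speed 15, sin(time * 2) * 0.1); the function names are illustrative.

```javascript
// Frame-rate-independent exponential smoothing toward a target value.
// 1 - exp(-speed * dt) grows with dt, so slower frames take
// proportionally larger steps and perceived speed stays constant.
function lerp(a, b, t) {
  return a + (b - a) * t;
}

function smoothStep(current, target, dt, speed = 15) {
  return lerp(current, target, 1 - Math.exp(-speed * dt));
}

// Subtle idle "breathing" variation, as described above.
function breathingOffset(timeSeconds) {
  return Math.sin(timeSeconds * 2) * 0.1;
}
```

With dt = 0 the value does not move at all, and for large dt it lands almost exactly on the target, which is what makes the easing frame-rate independent.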
Quick Start
Prerequisites
You will need an Agora App ID and token, ConvoAI Customer ID/Secret, an OpenAI API key, and an Azure Speech key - all entered through the in-app Settings panel (no environment variables required).
1. Access the Live Demo
Visit: https://agoraio-community.github.io/RPM-agora-agent/
2. Configure Your API Credentials
Click the Settings (⚙️) button in the top-right and enter your credentials:
Agora Tab App ID: [From Agora Console]
Token: [Generate from Agora Console]
Channel: [Your channel name, e.g., "test-channel"]
ConvoAI Tab API Base URL: https://api.agora.io/v1
Customer ID: [Your ConvoAI Customer ID]
Customer Secret: [Your ConvoAI Customer Secret]
Agent Name: Virtual Assistant
Agent UID: 8888
LLM Tab API URL: https://api.openai.com/v1
API Key: [Your OpenAI API Key]
Model: gpt-4o-mini
System Message: You are a friendly virtual agent assistant.
Greeting: Hello! How can I help you today?
TTS Tab API Key: [Your Azure Speech Key]
Region: eastus (or your region)
Voice Name: en-US-AriaNeural
ASR Tab Configure the speech-recognition language for your locale.
All settings are stored in sessionStorage for the duration of your browser session.
3. Start Conversing with Your AI Avatar
Click Connect to join the Agora channel
The ConvoAI agent will automatically join as a remote user
Start speaking - the avatar will listen and analyze your speech
The AI responds with synthesized voice and synchronized lip movements
Watch real-time lip sync powered by WebAudio FFT analysis!
React 18 - UI component framework
React Three Fiber - React renderer for Three.js
Three.js - WebGL graphics engine for 3D rendering
@react-three/drei - Useful helpers for R3F (useGLTF, etc.)
Agora RTC SDK - WebRTC communication and streaming
WebAudio API - Browser-native audio analysis (AnalyserNode, FFT)
Vite - Fast build tool and dev server
Tailwind CSS - Utility-first CSS framework
Real-Time Audio Processing
Sample Rate : 48kHz audio streams from Agora RTC
FFT Size : 256 (provides 128 frequency bins)
Frequency Range : 85-255 Hz (primary speech frequencies)
Analysis Rate : ~60 FPS via requestAnimationFrame
Latency : <50ms from audio output to visual update
Smoothing : Exponential interpolation (lerp with exp(-speed * deltaTime))
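As a sanity check on these numbers: the standard WebAudio relation binWidth = sampleRate / fftSize gives 48000 / 256 = 187.5 Hz per bin (with frequencyBinCount = 128 bins), so the 85-255 Hz speech band falls entirely within the first two bins. The helper name below is hypothetical.

```javascript
// Which AnalyserNode frequency bin a given frequency falls into,
// using the standard WebAudio relation binWidth = sampleRate / fftSize.
function hzToBin(freqHz, sampleRate, fftSize) {
  return Math.floor(freqHz / (sampleRate / fftSize));
}

// 48000 / 256 = 187.5 Hz per bin; bins 0 and 1 together cover 0-375 Hz.
```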
3D Avatar Architecture
Model Format : GLB (Binary glTF) - optimized for web streaming
Facial Rig : 50+ ARKit-compatible morph targets
Rendering : Three.js SkinnedMesh with morph target influences
Animation : Frame-by-frame morph target updates at 60 FPS
Expressions : Layered blend shapes (expressions + lip sync)
Source : ReadyPlayer.me avatar creator with full facial rig
ConvoAI Integration
REST API : Join/leave agent endpoints
Authentication : Basic Auth with Customer ID/Secret
Agent Lifecycle : Programmatic agent creation and management
Voice Pipeline : ASR → LLM → TTS fully managed by ConvoAI
Agent UID : ConvoAI agent joins as remote user in Agora channel
Manual controls override AI behavior for creative control
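The Basic Auth step above just base64-encodes `customerId:customerSecret`. A minimal Node-style sketch follows (the helper name is hypothetical, and the actual join/leave request paths come from the API Base URL configured in Settings, so they are not reproduced here):

```javascript
// Build the HTTP Basic Auth header used for ConvoAI REST calls
// from the Customer ID/Secret entered in the Settings panel.
function basicAuthHeader(customerId, customerSecret) {
  const token = Buffer.from(`${customerId}:${customerSecret}`).toString("base64");
  return `Basic ${token}`;
}

// Example with placeholder credentials:
// basicAuthHeader("id", "secret") → "Basic aWQ6c2VjcmV0"
```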
Technical Architecture
Frontend Stack
React Three Fiber - 3D rendering and animation
Three.js - WebGL graphics engine
WebAudio API - Real-time audio analysis
Agora SDK - WebRTC communication
Tailwind CSS - UI styling
Real-Time Processing
Audio Sampling : 48kHz audio analysis (matching the Agora RTC stream)
Frequency Analysis : FFT processing for audio features
Viseme Detection : Speech sound classification
Morph Target Interpolation : Smooth facial animation
Frame Rate : 60fps animation updates
3D Model Features
File Format : GLB (optimized for web)
Facial Rig : 50+ morph targets
Animation System : Mixamo-compatible FBX animations
Texture Resolution : Optimized for real-time rendering
LOD System : Performance-optimized for web browsers
Cost Structure
User-Controlled Costs - You provide all API credentials and control spending: usage is billed directly to your own Agora, OpenAI, and Azure accounts.
### **Project Structure**
src/
├── components/
│   ├── Avatar.jsx        # 3D avatar with lip sync engine
│   ├── Experience.jsx    # Three.js scene setup
│   ├── UI.jsx            # Main interface
│   ├── Settings.jsx      # API credentials panel
│   └── CombinedChat.jsx  # Chat interface
├── hooks/
│   ├── useAgora.jsx      # Agora RTC + ConvoAI integration
│   ├── useChat.jsx       # Chat state management
│   └── useLipSync.jsx    # Lip sync audio analysis
├── App.jsx               # Root component
└── main.jsx              # Entry point
### **Customization Options**
- **Avatar Models**: Replace GLB files in `public/models/Avatars/` with custom ReadyPlayer.me avatars
- **Viseme Tuning**: Adjust frequency ranges and intensity multipliers in `useAgora.jsx`
- **LLM Models**: Switch between GPT-4, GPT-3.5, or other OpenAI-compatible APIs
- **TTS Voices**: Choose from 400+ Azure neural voices in different languages
- **UI Styling**: Modify Tailwind classes for custom appearance
- **Facial Expressions**: Add new expression presets in `Avatar.jsx`
- ✅ **Session-Only Storage** - Credentials are cleared when the browser closes
- ✅ **Open Source** - Full code transparency
- ✅ **No Tracking** - No analytics or user tracking
## Advanced Development
### **Local Development**
```bash
# Clone repository
git clone https://github.com/AgoraIO-Community/RPM-agora-agent.git
cd RPM-agora-agent
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
# Deploy to GitHub Pages
npm run deploy
```
Key Development Features
No Environment Variables - All config via UI
Hot Module Replacement - Instant code updates
Debug Panels - Real-time lip sync monitoring
Animation Controls - Manual override capabilities
Audio Level Indicators - WebRTC connection status
Customization Options
Avatar Models - Replace GLB files with custom 3D models
Animation Sets - Add custom FBX animations
Voice Personalities - Configure different AI personalities
UI Themes - Customize interface appearance
Lip Sync Tuning - Adjust viseme sensitivity parameters
Troubleshooting
Common Issues
No Audio Output : Check microphone permissions and Agora token validity
ConvoAI Connection Failed : Verify Customer ID/Secret and App ID match
No Lip Sync : Ensure AudioContext is not suspended (some browsers require user interaction)
Avatar Not Loading : Check browser console for GLB loading errors
Performance Issues : Close other browser tabs, check FPS in Three.js stats
Debug Mode
Open browser DevTools Console for detailed logs
Check Network tab for ConvoAI API call responses
Monitor WebAudio analyzer data in useAgora.jsx
Use Three.js DevTools extension for scene inspection
Experience Real-Time AI Avatars
WebAudio-driven lip sync meets AI conversation in stunning 3D - all running in your browser with <50ms latency!
Live Demo : https://agoraio-community.github.io/RPM-agora-agent/
Built with ❤️ using Agora ConvoAI, ReadyPlayer.me, and WebAudio API
Questions? Open an issue on GitHub