How it works
Drag an AI Agent node into the IVR designer and connect it wherever you want to delegate to AI: initial triage, structured data capture, smart FAQ or lead qualification. The caller speaks, the AI agent replies in a natural voice and, when needed, hands the call off to a human at the end.
Streaming pipeline
- Caller audio → VAD detects speech/silence on the client.
- Audio → streaming STT (Deepgram, ElevenLabs, Whisper).
- Transcript → streaming LLM (OpenAI, Groq, Cerebras, Together).
- LLM JSON output ({response, action, variables}) → streaming TTS, sentence by sentence.
- TTS audio → playback to the caller via uuid_audio_fork in bidirectional mode.
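Sending TTS "sentence by sentence" means the pipeline does not wait for the whole LLM reply: it cuts the streamed text at sentence boundaries and synthesizes each sentence as soon as it is complete. Below is a minimal sketch of that chunking step. It assumes the text of the `response` field is already being extracted from the streamed JSON (that step is not shown), and the function name `sentences_from_stream` is hypothetical, not the product's API.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ".", "!" or "?" followed by whitespace. Requiring the
# whitespace avoids splitting "3.5" when a chunk happens to end right after ".".
_BOUNDARY = re.compile(r"([.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield complete sentences from streamed text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently in the buffer.
        while (match := _BOUNDARY.search(buffer)) is not None:
            sentence = buffer[: match.end(1)].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence  # hand this sentence to TTS immediately
    # Flush the remainder when the LLM stream ends.
    tail = buffer.strip()
    if tail:
        yield tail
```

For example, the chunks `["Got it. Can you", " confirm your ID", " number? Thanks"]` produce three TTS requests, the first one before the LLM has finished generating. A regex splitter like this will also break on abbreviations such as "Dr."; a production splitter would need to handle those.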
Measured end-to-end latency: ~600 ms to first bot audio (with Groq + Deepgram). That is faster than a human agent picking up with "hi, hold on".
Barge-in with anti-echo cooldown
If the caller speaks while the bot is replying, the VAD fires speech_start, the in-flight TTS is aborted and the LLM generation is cancelled. The new utterance is then processed normally; a configurable post-playback cooldown keeps the bot's own audio from re-triggering the VAD and causing an endless self-interruption loop.
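The barge-in rule can be sketched as a small state machine. This is one interpretation, not the product's implementation: a `speech_start` during playback counts as a barge-in, while a `speech_start` shortly after playback ends (or after a barge-in) is treated as echo of the bot's own audio and ignored. The class name, method names and the 0.3 s default cooldown are all hypothetical; the clock is injectable so the logic can be tested without real time.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BargeInController:
    """Hypothetical barge-in handler with an anti-echo post-playback cooldown."""
    cooldown_sec: float = 0.3  # assumed default; configurable
    clock: Callable[[], float] = time.monotonic
    bot_speaking: bool = False
    playback_ended_at: Optional[float] = None
    aborted: int = 0  # number of bot replies interrupted by the caller

    def on_playback_start(self) -> None:
        self.bot_speaking = True

    def on_playback_end(self) -> None:
        self.bot_speaking = False
        self.playback_ended_at = self.clock()

    def on_speech_start(self) -> str:
        """Called on a VAD speech_start event; returns the action to take."""
        now = self.clock()
        if self.bot_speaking:
            # Caller interrupted: the caller of this method should abort the
            # in-flight TTS and cancel the LLM generation.
            self.bot_speaking = False
            self.playback_ended_at = now  # start the cooldown window
            self.aborted += 1
            return "barge_in"
        if (self.playback_ended_at is not None
                and now - self.playback_ended_at < self.cooldown_sec):
            # Too soon after playback stopped: likely the bot's own echo.
            return "ignored_echo"
        return "listen"  # normal caller turn
```

The cooldown is what breaks the loop: without it, residual bot audio reaching the VAD right after an abort would count as a fresh barge-in, and the agent could keep interrupting itself.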
LLM JSON output
The LLM always replies with structured JSON:
{
  "response": "Got it, can you confirm your ID number?",
  "action": "continue",
  "variables": {
    "intent": "invoice-question",
    "verified_email": "caller@example.com"
  }
}
response becomes the spoken reply; action (continue | transfer | hangup) decides the next IVR node; variables are merged into the session for downstream nodes (webhook, function, condition).
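Handling this reply on the platform side comes down to parsing, validating `action` and merging `variables`. The sketch below assumes a plain dict as the session and, as a safety choice of this example only, falls back to `transfer` when the LLM returns malformed JSON or an unknown action. The function name `apply_llm_output` is hypothetical.

```python
import json
from typing import Any

VALID_ACTIONS = {"continue", "transfer", "hangup"}


def apply_llm_output(raw: str, session: dict[str, Any]) -> tuple[str, str]:
    """Hypothetical handler: parse the LLM's JSON reply, merge its variables
    into the session and return (text_to_speak, action)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ("", "transfer")  # assumed fallback: hand off to a human
    if not isinstance(data, dict):
        return ("", "transfer")
    response = str(data.get("response") or "")
    action = data.get("action", "continue")
    if action not in VALID_ACTIONS:
        action = "transfer"  # unknown action: same assumed fallback
    variables = data.get("variables") or {}
    if isinstance(variables, dict):
        # Later turns overwrite earlier values for the same key, so
        # downstream webhook/function/condition nodes see the latest data.
        session.update(variables)
    return (response, action)
```

The returned text goes to TTS and the action selects the next IVR node.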
BYO model
- STT: Deepgram Nova-2/Nova-3, ElevenLabs Scribe v2 Realtime, Whisper.
- LLM: any OpenAI-compatible API. Tested: OpenAI, Groq (~120 ms time to first token), Cerebras, Together.
- TTS: ElevenLabs v2/v3 (audio tags such as [laughs] and [sighs] are supported), OpenAI TTS.
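"Any OpenAI-compatible" means switching LLM providers only changes the base URL, the API key and the model name; the request shape stays the same. The sketch below only builds the request and sends nothing. The base URLs are the providers' publicly documented OpenAI-compatible endpoints as far as we know, but treat them as assumptions and verify them against each provider's docs. The model name is a placeholder, and `response_format` JSON mode is not supported by every provider/model combination.

```python
from typing import Any

# Assumed OpenAI-compatible base URLs; verify against each provider's docs.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
    "together": "https://api.together.xyz/v1",
}


def build_chat_request(provider: str, api_key: str, model: str,
                       messages: list[dict[str, str]]) -> dict[str, Any]:
    """Hypothetical helper: URL, headers and body for a streaming chat completion."""
    return {
        "url": f"{BASE_URLS[provider]}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "messages": messages,
            "stream": True,  # tokens arrive incrementally as server-sent events
            "response_format": {"type": "json_object"},  # request JSON output
        },
    }
```

Any HTTP client can then POST `json` to `url` with `headers`.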
Your API keys are stored AES-256-GCM-encrypted in the database. We add no markup to your token costs: you pay your provider directly.
Safeguards
- max_turns: ends the conversation after N exchanges (prevents loops).
- max_duration_sec: ends the conversation after N seconds.
- Automatic routing to a human: the LLM can request a transfer when it detects frustration or an out-of-scope topic.
- Persisted conversation: turns, variables and metrics (tokens, ms per stage) are saved for audit.
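The safeguards above can be checked after every turn. Here is a minimal sketch. The `Limits` defaults, the function name `check_exit` and the exit-reason strings are hypothetical examples, not the product's actual values. It also assumes an action requested by the LLM takes priority over the limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    max_turns: int = 20             # hypothetical example value
    max_duration_sec: float = 300.0  # hypothetical example value


def check_exit(limits: Limits, turns: int, elapsed_sec: float,
               action: str) -> Optional[str]:
    """Return a (hypothetical) exit reason, or None to continue the conversation."""
    # An explicit LLM decision wins over the safety limits.
    if action in ("transfer", "hangup"):
        return action
    if turns >= limits.max_turns:
        return "max_turns"  # stops runaway question/answer loops
    if elapsed_sec >= limits.max_duration_sec:
        return "max_duration"  # hard timeout
    return None
```

Whatever reason ends the conversation is what would be persisted as its exit reason for later review.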
Post-call visualization
Each AI conversation lands in AIAgentsPage → Conversations with the full transcript, captured variables, exit_reason and per-stage metrics. This makes it useful for prompt coaching and for A/B testing variants.
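As an illustration of A/B testing with these saved records, the sketch below compares prompt variants by how often calls ended in a transfer to a human and by mean LLM time per turn. The record shape (`variant`, `exit_reason`, `turns`, `llm_ms`) is a hypothetical export format assumed for this example, not the page's actual schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def compare_variants(conversations: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Per variant: share of calls transferred to a human and mean LLM ms per turn.
    Field names are hypothetical."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for conv in conversations:
        grouped[conv["variant"]].append(conv)
    report = {}
    for variant, convs in grouped.items():
        transfers = sum(1 for c in convs if c["exit_reason"] == "transfer")
        # Flatten every turn of every conversation in this variant.
        llm_ms = [turn["llm_ms"] for c in convs for turn in c["turns"]]
        report[variant] = {
            "transfer_rate": transfers / len(convs),
            "mean_llm_ms": mean(llm_ms) if llm_ms else 0.0,
        }
    return report
```

A variant with a lower transfer rate at similar latency is a candidate winner, although a real comparison would also need enough calls per variant to be meaningful.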