How it works

Drag an AI Agent node into the IVR designer and connect it wherever you want to delegate to AI: initial triage, structured data capture, smart FAQ, or lead qualification. The caller speaks, the AI agent replies in a natural voice, and the call escalates to a human at the end if needed.

Streaming pipeline

Caller audio → VAD detects speech/silence on the client.
Audio → streaming STT (Deepgram, ElevenLabs, Whisper).
Transcript → streaming LLM (OpenAI, Groq, Cerebras, Together).
LLM JSON output ({response, action, variables}) → streaming TTS sentence-by-sentence.
TTS audio → playback to the caller via bidirectional uuid_audio_fork.
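
The sentence-by-sentence hand-off from the LLM to TTS can be sketched as below. This is a minimal illustration, not the product's implementation: it assumes the response text has already been extracted from the JSON stream, and the function name sentences_from_stream and the punctuation-based boundary rule are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumed heuristic: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


# Hypothetical LLM chunks arriving over time.
llm_chunks = ["Got it. Can you ", "confirm your ID", " number? Thanks"]
spoken = list(sentences_from_stream(llm_chunks))
```

Each yielded sentence could be sent to TTS right away, so playback of the first sentence can start while the LLM is still generating the rest.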

Measured end-to-end latency: ~600 ms to first bot audio (with Groq + Deepgram), faster than a human agent saying "hi, hold on".

Barge-in with anti-echo cooldown

If the caller speaks during a bot reply, VAD detects speech_start, aborts the in-flight TTS and cancels the LLM generation. The agent then processes the new utterance. A configurable post-playback cooldown keeps the bot's own audio from re-triggering VAD, so it does not fall into an infinite self-interruption loop.
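
One way to picture the barge-in decision is the small state holder below. It is a sketch under assumptions: the class BargeInGate, its method names, the returned labels and the 0.3 s default are all hypothetical, and the real cooldown logic may differ.

```python
from dataclasses import dataclass


@dataclass
class BargeInGate:
    cooldown_sec: float = 0.3  # hypothetical default for the post-playback cooldown
    bot_speaking: bool = False
    playback_ended_at: float = float("-inf")

    def playback_started(self) -> None:
        self.bot_speaking = True

    def playback_ended(self, now: float) -> None:
        self.bot_speaking = False
        self.playback_ended_at = now

    def on_speech_start(self, now: float) -> str:
        """Classify a VAD speech_start event (timestamps in seconds)."""
        if self.bot_speaking:
            # Caller talks over the bot: abort TTS and cancel the LLM generation.
            return "barge_in"
        if now - self.playback_ended_at < self.cooldown_sec:
            # Too close to the end of playback: treat it as echo and skip it.
            return "ignore"
        return "listen"


gate = BargeInGate(cooldown_sec=0.5)
gate.playback_started()
during = gate.on_speech_start(now=1.0)   # while the bot is speaking
gate.playback_ended(now=2.0)
echo = gate.on_speech_start(now=2.2)     # inside the cooldown window
later = gate.on_speech_start(now=2.6)    # after the cooldown window
```

Passing the clock in as an argument keeps the sketch deterministic and easy to test; a live system would likely read a monotonic clock instead.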

LLM JSON output

The LLM always replies with structured JSON:

{
  "response": "Got it, can you confirm your ID number?",
  "action": "continue",         // continue | transfer | hangup
  "variables": {
    "intent": "invoice-question",
    "verified_email": "[email protected]"
  }
}

response is spoken to the caller; action selects the next IVR node; variables is merged into the session for downstream nodes (webhook, function, condition).
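
As an illustration of how a consumer might handle this reply, the sketch below parses the JSON, checks the action and merges the variables into a session dictionary. The function name apply_llm_reply, the plain-dict session and the choice to treat a missing action as continue are assumptions, not documented behavior.

```python
import json
from typing import Any

ALLOWED_ACTIONS = {"continue", "transfer", "hangup"}


def apply_llm_reply(raw: str, session: dict[str, Any]) -> tuple[str, str]:
    """Parse one LLM reply, merge its variables into session, return (response, action)."""
    reply = json.loads(raw)
    response = reply["response"]
    # Assumption: a missing action is treated as "continue".
    action = reply.get("action", "continue")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")
    # Later values overwrite earlier ones with the same key.
    session.update(reply.get("variables") or {})
    return response, action


session: dict[str, Any] = {"caller": "+15550100"}
raw_reply = (
    '{"response": "Got it, can you confirm your ID number?", '
    '"action": "continue", "variables": {"intent": "invoice-question"}}'
)
response, action = apply_llm_reply(raw_reply, session)
```

Rejecting unknown actions early means a malformed model reply surfaces as an error instead of silently routing the call to the wrong node.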

BYO model

STT: Deepgram Nova-2/Nova-3, ElevenLabs Scribe v2 Realtime, Whisper.
LLM: any OpenAI-compatible API. Tested: OpenAI, Groq (~120 ms TTFT), Cerebras, Together.
TTS: ElevenLabs v2/v3 (audio tags supported — [laughs], [sighs]), OpenAI TTS.

Your API keys are stored in the database encrypted with AES-256-GCM. We take no cut of your token costs; you pay your provider directly.

Safeguards

max_turns — cut off after N exchanges (prevents loops).
max_duration_sec — cut off by timeout.
Automatic routing to human — the LLM can request a transfer when it detects frustration or out-of-scope topics.
Persisted conversation — turns, variables, metrics (tokens, ms per stage) saved for audit.
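
The two cut-off limits above could be checked once per turn, roughly as follows. This is a sketch only: the function check_exit and the returned labels are hypothetical, and the actual exit_reason values stored by the product may be different.

```python
from typing import Optional


def check_exit(
    turns: int,
    elapsed_sec: float,
    action: str,
    max_turns: int,
    max_duration_sec: float,
) -> Optional[str]:
    """Return a label explaining why the conversation should end, or None to keep going."""
    # An explicit request from the LLM takes priority over the limits.
    if action in ("transfer", "hangup"):
        return action
    if turns >= max_turns:
        return "max_turns"
    if elapsed_sec >= max_duration_sec:
        return "max_duration_sec"
    return None


keep_going = check_exit(3, 42.0, "continue", max_turns=10, max_duration_sec=300)
too_many = check_exit(10, 42.0, "continue", max_turns=10, max_duration_sec=300)
too_long = check_exit(3, 301.0, "continue", max_turns=10, max_duration_sec=300)
handoff = check_exit(3, 42.0, "transfer", max_turns=10, max_duration_sec=300)
```

Returning a label rather than a boolean lets the same value be persisted with the conversation for later audit.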

Post-call visualization

Each AI conversation lands in AIAgentsPage → Conversations with the full transcript, captured variables, exit_reason and per-stage metrics. Ideal for prompt coaching and variant A/B testing.

Conversational AI agents inside your IVR