Conversational Video Interface (CVI) is a framework for creating real-time multimodal video interactions with AI. It enables an AI agent to see, hear, and respond naturally, mirroring human conversation. CVI is the world’s fastest interface of its kind. It allows you to map a human face and conversational ability onto your AI agent. With CVI, you can achieve low-latency utterance-to-utterance response: the full round-trip time from when a participant speaks to when the replica replies. CVI provides a comprehensive solution, with the option to plug in your existing components as required.

At a glance

  • CVI: Real-time multimodal video. The agent sees, hears, and responds; media runs over WebRTC (powered by Daily).
  • Latency: Utterance-to-utterance round-trip is optimized for real-time use (participant speaks → replica replies).
  • Three pillars: Persona (behavior, knowledge, and CVI layer pipeline); Replica (visual digital human, Phoenix); Conversation (live session linking persona and replica).
  • Pipeline (in order): Perception (Raven) → Conversational Flow (Sparrow) → Speech recognition (STT) → Large language model (LLM) → Text-to-speech (TTS) → Realtime replica (Phoenix). Raven is visual perception; Sparrow handles turn-taking and interruptibility; Phoenix is the real-time visual replica engine.
  • Where to configure: Most layers are set on the Persona.

Key Concepts

CVI is built around three core concepts that work together to create real-time, humanlike interactions with an AI agent:

Persona

The Persona defines the agent’s behavior, tone, and knowledge. It also configures the CVI layer pipeline.

Replica

The Replica brings the persona to life visually. It renders a photorealistic human-like avatar using Phoenix.

Conversation

A Conversation is a real-time video session that connects the persona and replica through a WebRTC connection.
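
The relationship between the three concepts can be sketched as a small data model. This is only an illustration of how a Conversation ties a Persona to a Replica; every class and field name here (`persona_id`, `system_prompt`, `layers`, `replica_id`, `conversation_id`) is a hypothetical placeholder, not the actual Tavus schema.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Behavior, tone, knowledge, and layer settings (hypothetical fields)."""
    persona_id: str
    system_prompt: str
    layers: dict = field(default_factory=dict)  # per-layer pipeline settings


@dataclass
class Replica:
    """The visual digital human that renders the persona (hypothetical fields)."""
    replica_id: str


@dataclass
class Conversation:
    """A live session that links exactly one persona to one replica."""
    conversation_id: str
    persona: Persona
    replica: Replica


# Illustrative IDs only; real IDs come from your own account.
persona = Persona("p_demo", "You are a friendly product guide.")
replica = Replica("r_demo")
conversation = Conversation("c_demo", persona, replica)
```

Keeping the persona and replica as separate objects mirrors the split described above: the same persona could in principle be paired with a different replica in another conversation.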

Key Features

Natural Interaction

CVI uses facial cues, body language, and real-time turn-taking to enable natural, human-like conversations.

Modular pipeline

Customize the Perception, STT, LLM, and TTS layers to control identity, behavior, and responses.

Lifelike AI replicas

Choose from 100+ hyper-realistic stock replicas or customize your own with human-like voice and expression.

Multilingual support

Hold natural conversations in 42+ languages using the supported TTS engines.

World's lowest latency

Experience real-time interactions with low utterance-to-utterance latency and smooth turn-taking.
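
If you want to check utterance-to-utterance latency in your own integration, a minimal sketch follows. It assumes you record two timestamps per turn yourself: when the participant stops speaking and when the replica starts replying. The `Turn` class and its field names are hypothetical, and measuring from the end of the participant's utterance is an assumption about how the round trip is bounded.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Turn:
    """Timestamps for one exchange, in seconds (hypothetical structure)."""
    participant_stopped: float  # participant finishes speaking
    replica_started: float      # replica begins its reply


def utterance_latency_ms(turn: Turn) -> float:
    """Return the gap between end of user speech and start of the reply."""
    gap = turn.replica_started - turn.participant_stopped
    if gap < 0:
        raise ValueError("replica_started precedes participant_stopped")
    return gap * 1000.0


def median_latency_ms(turns: list[Turn]) -> float:
    """Median is less sensitive than the mean to one slow turn."""
    return median(utterance_latency_ms(t) for t in turns)


turns = [Turn(12.0, 12.5), Turn(3.0, 3.75), Turn(10.0, 11.0)]
print(median_latency_ms(turns))
```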

Layers

The Conversational Video Interface (CVI) is built on a modular layer system, where each layer handles a specific part of the interaction. Together, they capture input, process it, and generate a real-time, human-like response. Here’s how the layers work together:
  • Perception: Uses Raven to analyze user expressions, gaze, background, and screen content. This visual context helps the replica understand and respond more naturally. See: Configure the Perception layer.
  • Conversational Flow: Controls the natural dynamics of conversation, including turn-taking and interruptibility. Uses Sparrow for intelligent turn detection, enabling the replica to decide when to speak and when to listen. See: Configure the Conversational Flow layer.
  • Speech Recognition (STT): Transcribes user speech in real time with lexical and semantic awareness. See: Configure the Speech Recognition (STT) layer.
  • Large Language Model (LLM): Processes the user’s transcribed speech and visual input using a low-latency LLM. Tavus provides ultra-low-latency optimized LLMs or lets you integrate your own. See: Configure the Large Language Model (LLM) layer.
  • Text-to-Speech (TTS): Converts the LLM response into speech using the supported TTS engines: Cartesia (default) and ElevenLabs. See: Configure the Text-to-Speech (TTS) layer.
  • Realtime Replica: Delivers a high-quality, synchronized digital human using Tavus’s real-time avatar engine (Phoenix). See: Replica overview.
Most layers are configurable via the Persona.
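
As a rough sketch of what per-layer configuration on a persona might look like, the helper below merges overrides onto defaults and rejects unknown layers. The key names (`perception`, `conversational_flow`, `stt`, `llm`, `tts`, `engine`) are hypothetical and chosen for illustration; only the two TTS engine names and the Cartesia default come from the text above. The replica layer is left out on the assumption that it is chosen separately from the persona.

```python
# Layers assumed to be configurable on the persona (names are illustrative).
CONFIGURABLE_LAYERS = ("perception", "conversational_flow", "stt", "llm", "tts")
SUPPORTED_TTS_ENGINES = ("cartesia", "elevenlabs")


def build_layers(overrides=None):
    """Merge per-layer overrides onto defaults and validate the result."""
    layers = {"tts": {"engine": "cartesia"}}  # Cartesia is the stated default
    for name, settings in (overrides or {}).items():
        if name not in CONFIGURABLE_LAYERS:
            raise ValueError(f"unknown layer: {name!r}")
        # Shallow-merge so an override only replaces the keys it names.
        layers[name] = {**layers.get(name, {}), **settings}
    engine = layers["tts"]["engine"]
    if engine not in SUPPORTED_TTS_ENGINES:
        raise ValueError(f"unsupported TTS engine: {engine!r}")
    return layers


# Example: switch the TTS engine and leave everything else at its default.
layers = build_layers({"tts": {"engine": "elevenlabs"}})
```

Validating layer names up front surfaces typos before anything is sent to a server; check the layer configuration pages for the real field names.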

Getting Started

You can quickly create a conversation by using the Developer Portal or following the steps in the CVI Quickstart guide. If you use Cursor, Copilot, or another AI coding agent, use the copy-paste checklist on the AI Prompt: CVI Quickstart page.
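
For orientation, here is a minimal sketch of creating a conversation over HTTP using only the Python standard library. The endpoint URL, the `x-api-key` header, and the `persona_id` / `replica_id` body fields are assumptions for illustration; confirm them against the CVI Quickstart guide before relying on them. The request is built separately from sending so it can be inspected without network access.

```python
import json
import urllib.request

# Assumed endpoint; verify against the official API reference.
API_URL = "https://tavusapi.com/v2/conversations"


def build_create_conversation_request(api_key, persona_id, replica_id):
    """Build (but do not send) a POST request linking a persona and a replica."""
    body = json.dumps({"persona_id": persona_id, "replica_id": replica_id})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def create_conversation(api_key, persona_id, replica_id):
    """Send the request and return the decoded JSON response."""
    request = build_create_conversation_request(api_key, persona_id, replica_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder values; nothing is sent until create_conversation() is called.
request = build_create_conversation_request("YOUR_API_KEY", "p_demo", "r_demo")
```

In a real application, read the API key from an environment variable rather than hard-coding it.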