Conversational Video Interface (CVI) is a framework for creating real-time multimodal video interactions with AI. It enables an AI agent to see, hear, and respond naturally, mirroring human conversation. CVI is the world’s fastest interface of its kind. It maps a face (visual) and a PAL (behavior) onto your AI agent. With CVI, you can achieve low-latency utterance-to-utterance response: the full round-trip from when a participant speaks to when the PAL responds. CVI provides a comprehensive solution, with the option to plug in your existing components as required.

At a glance

Building with an AI coding agent or automation? Use https://docs.tavus.io/llms.txt for the canonical page index, https://docs.tavus.io/llms-full.txt for the full bundled docs export, and https://docs.tavus.io/openapi.yaml for the HTTP API contract.
  • CVI - Real-time multimodal video: the agent sees, hears, and responds; media runs over WebRTC (powered by Daily).
  • Latency - Utterance-to-utterance round-trip is optimized for real-time use (participant speaks → PAL replies).
  • Three pillars - PAL (behavior, knowledge, and CVI layer pipeline); Face (visual likeness, Phoenix); Conversation (live session linking a PAL and its face).
  • Pipeline (in order) - Perception (Raven) → Conversational Flow (Sparrow) → Speech recognition (STT) → Large language model (LLM) → Text-to-speech (TTS) → Realtime replica (Phoenix). Raven is visual perception; Sparrow handles turn-taking and interruptibility; Phoenix is the real-time visual face engine.
  • Where to configure - Most layers are set on the PAL.

Key Concepts

CVI is built around three core concepts that work together to create real-time, humanlike interactions with an AI agent:

PAL

The PAL defines the agent’s behavior, tone, and knowledge. It also configures the CVI layer and pipeline.

Face

The Face brings the PAL to life visually. It renders a photorealistic human-like avatar using Phoenix.

Conversation

A Conversation is a real-time video session that connects a PAL and its face through a WebRTC connection.
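
As a rough mental model, the three concepts above can be sketched as plain data types. This is an illustrative sketch only, not the Tavus API: the class names, field names, and the `describe` helper are hypothetical and chosen to mirror the descriptions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pal:
    """Behavior, tone, and knowledge of the agent (hypothetical fields)."""
    name: str
    system_prompt: str


@dataclass(frozen=True)
class Face:
    """Visual likeness of the agent, rendered by Phoenix per this page."""
    name: str
    engine: str = "Phoenix"


@dataclass(frozen=True)
class Conversation:
    """A live session linking a PAL and its face over WebRTC."""
    pal: Pal
    face: Face
    transport: str = "WebRTC"


def describe(conversation: Conversation) -> str:
    """Summarize which PAL and face a conversation connects, and how."""
    return (
        f"{conversation.pal.name} + {conversation.face.name} "
        f"over {conversation.transport}"
    )


# Hypothetical example values, used only to show how the pieces relate.
support_pal = Pal(name="Support PAL", system_prompt="You are a helpful guide.")
stock_face = Face(name="Stock face A")
session = Conversation(pal=support_pal, face=stock_face)
print(describe(session))
```

The point of the sketch is the relationship: a Conversation does not define behavior or appearance itself, it only links one PAL to one face for the duration of a session.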

Key Features

Natural Interaction

CVI uses facial cues, body language, and real-time turn-taking to enable natural, human-like conversations.

Modular pipeline

Customize the Perception, STT, LLM and TTS layers to control identity, behavior, and responses.

Lifelike AI faces

Choose from more than 100 hyper-realistic stock faces or customize your own with human-like voice and expression.

Multilingual support

Hold natural conversations in 42+ languages using the supported TTS engines.

World's lowest latency

Experience real-time interactions with low utterance-to-utterance latency and smooth turn-taking.
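
If you want to track utterance-to-utterance latency in your own app, one possible approach is to log two timestamps per turn and take the difference. The sketch below assumes the interval runs from the moment the participant finishes speaking to the moment the PAL starts responding. That measurement point, the function name, and the sample timestamps are assumptions for illustration, not values or definitions published by Tavus.

```python
from statistics import median


def utterance_to_utterance_ms(user_speech_end_s: float,
                              pal_speech_start_s: float) -> float:
    """Return the gap between the two timestamps in milliseconds.

    Both arguments are seconds on the same clock. The choice of
    "end of user speech" as the start point is an assumption.
    """
    if pal_speech_start_s < user_speech_end_s:
        raise ValueError("PAL response cannot start before the user finishes")
    return (pal_speech_start_s - user_speech_end_s) * 1000.0


# Made-up (user_speech_end, pal_speech_start) pairs for three turns.
turns = [(10.00, 10.50), (25.25, 26.00), (40.00, 40.75)]

latencies_ms = [utterance_to_utterance_ms(end, start) for end, start in turns]
median_ms = median(latencies_ms)
print(latencies_ms, median_ms)
```

Reporting the median rather than the mean keeps a single slow turn from dominating the summary.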

Layers

The Conversational Video Interface (CVI) is built on a modular layer system, where each layer handles a specific part of the interaction. Together, they capture input, process it, and generate a real-time, human-like response. Here’s how the layers work together:
  • Perception - Uses Raven to analyze user expressions, gaze, background, and screen content. This visual context helps the PAL understand and respond more naturally. See: Configure the Perception layer.
  • Conversational Flow - Controls the natural dynamics of conversation, including turn-taking and interruptibility. Uses Sparrow for intelligent turn detection, enabling the PAL to decide when to speak and when to listen. See: Configure the Conversational Flow layer.
  • Speech recognition (STT) - Transcribes user speech in real time with lexical and semantic awareness. See: Configure the Speech Recognition (STT) layer.
  • Large language model (LLM) - Processes the user’s transcribed speech and visual input using a low-latency LLM. Tavus provides ultra-low-latency optimized LLMs or lets you integrate your own. See: Configure the Large Language Model (LLM) layer.
  • Text-to-speech (TTS) - Converts the LLM response into speech using the supported TTS engines: Cartesia (default), ElevenLabs, and Azure. See: Configure the Text-to-Speech (TTS) layer.
  • Realtime replica - Delivers a high-quality, synchronized face using Tavus’s real-time avatar engine (Phoenix). See: Face overview.
Most layers are configurable via the PAL.
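
The layer order can also be pictured as a simple chain in which each layer receives the context produced by the previous one. The following is a conceptual sketch under stated assumptions, not how CVI is implemented internally: `PIPELINE`, `run_turn`, the handler signature, and the context keys are all hypothetical, and only the layer names, their order, and the Raven, Sparrow, and Phoenix labels come from this page.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]
Handler = Callable[[Context], Context]

# (layer name, named component) in the order listed on this page.
# None means this page does not name a single component for that layer.
PIPELINE: List[Tuple[str, Optional[str]]] = [
    ("Perception", "Raven"),
    ("Conversational Flow", "Sparrow"),
    ("Speech recognition (STT)", None),
    ("Large language model (LLM)", None),
    ("Text-to-speech (TTS)", None),
    ("Realtime replica", "Phoenix"),
]


def run_turn(handlers: Dict[str, Handler], context: Context) -> Context:
    """Pass a copy of the context through each layer in pipeline order.

    Layers without a handler are treated as pass-through. A "trace"
    entry records the order in which layers were visited.
    """
    current: Context = dict(context)
    trace: List[str] = []
    for layer, _component in PIPELINE:
        handler = handlers.get(layer)
        if handler is not None:
            current = handler(current)
        trace.append(layer)
    current["trace"] = trace
    return current


def fake_stt(ctx: Context) -> Context:
    """Stand-in transcription step: copies the audio label into 'text'."""
    return {**ctx, "text": f"transcript of {ctx['audio']}"}


result = run_turn({"Speech recognition (STT)": fake_stt}, {"audio": "clip-1"})
print(result["trace"])
print(result["text"])
```

Swapping a handler in the dictionary is loosely analogous to plugging your own component into a layer while leaving the rest of the chain unchanged.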

Getting Started

You can quickly create a conversation by using the PAL Maker or following the steps in the API Conversation Quickstart guide. If you use Cursor, Copilot, or another AI coding agent, use the copy-paste checklist on CVI App: AI Prompt. For web apps, start with CVI App Quickstart, then choose an embed path in Embed CVI. React apps that want Tavus-provided UI should use the @tavus/cvi-ui component library, including blocks, components, hooks, and server helpers.