CVI enables real-time, human-like video interactions through configurable lifelike replicas.
Conversational Video Interface (CVI) is a framework for creating real-time multimodal video interactions with AI. It enables an AI agent to see, hear, and respond naturally, mirroring human conversation.
CVI is the world’s fastest interface of its kind. It allows you to map a human face and conversational ability onto your AI agent. With CVI, you can achieve utterance-to-utterance latency with SLAs under 1 second. This is the full round-trip time for a participant to say something and the replica to reply.
CVI provides a comprehensive solution, with the option to plug in your existing components as required.
The Conversational Video Interface (CVI) is built on a modular layer system, where each layer handles a specific part of the interaction. Together, they capture input, process it, and generate a real-time, human-like response.
Here’s how the layers work together:
1. Transport
Handles real-time audio and video streaming using WebRTC (powered by Daily). This layer captures the user’s microphone and camera input and delivers output back to the user.
This layer is always enabled. You can configure input/output for audio (mic) and video (camera).
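To make the mic/camera configuration concrete, here is a minimal sketch of how such toggles could map onto the constraints object that browsers accept for media capture. The TransportInputConfig shape and the toMediaConstraints helper are hypothetical names chosen for illustration; they are not part of the Tavus or Daily APIs, which handle capture for you.

```typescript
// Hypothetical shape for the transport layer's input toggles. These field
// names are illustrative only and are not taken from the Tavus or Daily APIs.
interface TransportInputConfig {
  microphone: boolean;
  camera: boolean;
}

// Minimal subset of the browser's MediaStreamConstraints dictionary,
// redeclared here so the sketch also type-checks outside a browser.
interface MediaConstraints {
  audio: boolean;
  video: boolean;
}

// Translate the toggles into capture constraints.
// Browsers reject a capture request that asks for neither audio nor video,
// so that case is caught early with a clearer error.
function toMediaConstraints(config: TransportInputConfig): MediaConstraints {
  if (!config.microphone && !config.camera) {
    throw new Error("Enable at least one of microphone or camera.");
  }
  return { audio: config.microphone, video: config.camera };
}

// In a browser, the result could be passed to the standard capture call:
//   const stream = await navigator.mediaDevices.getUserMedia(
//     toMediaConstraints({ microphone: true, camera: true })
//   );
```

Keeping the mapping in a pure function makes it easy to check without a camera or microphone attached.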
Create a .env file in the my-tavus-app folder (if one does not already exist):
VITE_TAVUS_API_KEY=your_api_key_here
# Default replica_id; change this value to use a different replica
VITE_REPLICA_ID=rfe12d8b9597
# Default persona_id; change this value to use a different persona
VITE_PERSONA_ID=pdced222244b
Important: DO NOT create .env outside of the my-tavus-app folder. It must be inside your project directory where src exists.
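Once the .env file is in place, the app can read these values at startup. Vite exposes variables prefixed with VITE_ to client code through import.meta.env. The sketch below shows one way to validate them before use; the TavusEnv type and the readTavusEnv helper are hypothetical names for illustration and are not part of the Tavus SDK or the starter project.

```typescript
// Hypothetical container for the three values defined in .env.
interface TavusEnv {
  apiKey: string;
  replicaId: string;
  personaId: string;
}

// The variable names match the .env entries for this project.
const REQUIRED_KEYS = [
  "VITE_TAVUS_API_KEY",
  "VITE_REPLICA_ID",
  "VITE_PERSONA_ID",
] as const;

// Read and validate the variables from any env-like object.
// Throws one error listing every missing or blank variable.
function readTavusEnv(env: Record<string, string | undefined>): TavusEnv {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    apiKey: env.VITE_TAVUS_API_KEY!.trim(),
    replicaId: env.VITE_REPLICA_ID!.trim(),
    personaId: env.VITE_PERSONA_ID!.trim(),
  };
}

// In a Vite app, the helper would typically be called once at startup:
//   const tavusEnv = readTavusEnv(import.meta.env);
```

If the .env file sits outside the project directory, Vite will not load it, and a check like this fails immediately with the names of the missing variables rather than later with an opaque API error.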