# Authentication

Source: https://docs.tavus.io/api-reference/authentication

Learn how to generate and use your Tavus API key to authenticate requests.

To use the Tavus API, you need an API key to authenticate your requests. This key verifies that requests are coming from your Tavus account.

## Get the API key

1. Go to the Developer Portal and select **API Key** from the sidebar menu.
2. Click **Create New Key** to begin generating your API key.
3. Enter a name for the key, optionally specify allowed IP addresses, then click **Create API Key**.
4. Copy your newly created API key and store it securely.

**Remember that your API key is a secret!** Never expose it in client-side code such as browsers or apps. Always load your API key securely from environment variables or a server-side configuration.

## Make Your First Call

Authentication is performed by passing your API key in the `x-api-key` request header. To authenticate with Tavus's API endpoints, you must provide the key in the header, as shown below.

```text Authentication Header theme={null}
'x-api-key: <api_key>'
```

For example, suppose you are using the [POST - Create Conversation](/api-reference/conversations/create-conversation) endpoint to create a real-time video call session with a Tavus replica. In this scenario, you can send an API request like the following, replacing `<api_key>` with your actual API key.

```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
    "replica_id": "r5f0577fc829",
    "persona_id": "pdac61133ac5",
    "conversation_name": "Interview User"
  }'
```

# Create Conversation

Source: https://docs.tavus.io/api-reference/conversations/create-conversation

post /v2/conversations
Start a real-time video conversation with your AI replica and persona.

This endpoint starts a real-time video conversation with your AI replica, powered by a persona that allows it to see, hear, and respond like a human.
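The same request can be built in any HTTP client. As a sketch (not an official SDK), here is a Python equivalent of the cURL example above using only the standard library; the placeholder key is illustrative and should come from your environment in practice:

```python
import json
import urllib.request

# Build the same Create Conversation request as the cURL example above.
# "your-api-key" is a placeholder; load your real key from the environment.
def create_conversation_request(api_key: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        "https://tavusapi.com/v2/conversations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )

req = create_conversation_request("your-api-key", {
    "replica_id": "r5f0577fc829",
    "persona_id": "pdac61133ac5",
    "conversation_name": "Interview User",
})
# Send with: response = urllib.request.urlopen(req)
```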
**Core Components:**

* Replica - Choice of audio/visual appearance
* Persona - Defines the replica's behavior and capabilities

The response includes a `conversation_url` that you can use to join the call or embed it on your website. [Learn how to embed it here](/sections/integrations/embedding-cvi).

If you provide a `callback_url`, you'll receive webhooks with updates about the conversation status. [Learn more about callbacks here](/sections/webhooks-and-callbacks).

Required parameters vary depending on the use case:

**Full Pipeline Conversation:**

* `persona_id`
* `replica_id`

**Audio-Only Conversation:**

* `persona_id`
* `replica_id`
* `audio_only`

- `replica_id` is **required** if the persona does **not** have a default replica.
- `replica_id` is **optional** if the persona **does** have a default replica.
- If both a default replica and a `replica_id` are provided, the supplied `replica_id` will **override** the default.

# Delete Conversation

Source: https://docs.tavus.io/api-reference/conversations/delete-conversation

delete /v2/conversations/{conversation_id}

This endpoint deletes a single conversation by its unique identifier.

# End Conversation

Source: https://docs.tavus.io/api-reference/conversations/end-conversation

post /v2/conversations/{conversation_id}/end

This endpoint ends a single conversation by its unique identifier.

# Get Conversation

Source: https://docs.tavus.io/api-reference/conversations/get-conversation

get /v2/conversations/{conversation_id}

This endpoint returns a single conversation by its unique identifier.
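A request to this endpoint can be assembled as follows. This Python helper is illustrative (not an official client), and the key placeholder is an assumption:

```python
# Illustrative: assemble the Get Conversation URL and auth header.
def get_conversation_request(conversation_id: str, api_key: str):
    url = f"https://tavusapi.com/v2/conversations/{conversation_id}"
    headers = {"x-api-key": api_key}
    return url, headers

url, headers = get_conversation_request("c123456", "your-api-key")
```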
You can append `?verbose=true` to the URL to receive additional event data in the response, including:

* `shutdown_reason`: The reason why the conversation ended (e.g., `participant_left_timeout`)
* `transcript`: A complete transcript of the conversation with role-based messages (via `application.transcription_ready`)
* `system.replica_joined`: When the replica joined the conversation
* `system.shutdown`: When and why the conversation ended
* `application.perception_analysis`: The final visual analysis of the user, including their appearance, behavior, emotional states, and screen activities

This is particularly useful as an alternative to using the `callback_url` parameter on the create conversation endpoint for retrieving detailed conversation data.

# List Conversations

Source: https://docs.tavus.io/api-reference/conversations/get-conversations

get /v2/conversations

This endpoint returns a list of all Conversations created by the account associated with the API Key in use.

# Create Document

Source: https://docs.tavus.io/api-reference/documents/create-document

post /v2/documents
Upload documents to your knowledge base for personas to reference during conversations

For now, our Knowledge Base only supports documents written in English and works best for conversations in English. We'll be expanding our Knowledge Base language support soon!

Create a new document in your [Knowledge Base](/sections/conversational-video-interface/knowledge-base). When you hit this endpoint, Tavus kicks off processing of the document so it can be used as part of your knowledge base in conversations once processing is complete. The file size limit is 50MB. Processing can take up to a few minutes depending on file size.

Currently, we support the following file formats: .pdf, .txt, .docx, .doc, .png, .jpg, .pptx, .csv, and .xlsx. Website URLs are also supported, in which case a snapshot of the website is processed and transformed into a document.
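The size and format limits above can be checked client-side before uploading. A minimal sketch, assuming nothing beyond the limits stated above (the helper and constant names are ours, not part of the API):

```python
# Documented upload limits: 50MB max, and the listed file formats.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".doc", ".png",
                        ".jpg", ".pptx", ".csv", ".xlsx"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024

# Illustrative pre-flight check before calling the Create Document endpoint.
def document_upload_ok(filename: str, size_bytes: int) -> bool:
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return ext in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_SIZE_BYTES
```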
You can manage documents by adding tags using the `tags` field in the request body. Once created, you can add the document to your personas (see [Create Persona](/api-reference/personas/create-persona)) and your conversations (see [Create Conversation](/api-reference/conversations/create-conversation)).

## Website Crawling

When creating a document from a website URL, you can optionally enable multi-page crawling by providing the `crawl` parameter. This allows the system to follow links from your starting URL and process multiple pages into a single document.

### Without Crawling (Default)

By default, only the single page at the provided URL is scraped and processed.

### With Crawling

When you include the `crawl` object, the system will:

1. Start at your provided URL
2. Follow links to discover additional pages
3. Process all discovered pages into a single document

**Example request with crawling enabled:**

```json theme={null}
{
  "document_name": "Company Knowledge Base",
  "document_url": "https://docs.example.com/",
  "crawl": {
    "depth": 2,
    "max_pages": 20
  },
  "callback_url": "https://your-server.com/webhook"
}
```

### Crawl Parameters

| Parameter   | Type            | Description                                                                                                                      |
| ----------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `depth`     | integer (1-10)  | How many levels deep to follow links from the starting URL. A depth of 1 means only pages directly linked from the starting URL. |
| `max_pages` | integer (1-100) | Maximum number of pages to crawl. Processing stops once this limit is reached.                                                   |

### Rate Limits

To prevent abuse, crawling has the following limits:

* Maximum **100 crawl documents** per user
* Maximum **5 concurrent crawls** at any time
* **1-hour cooldown** between recrawls of the same document

### Keeping Content Fresh

Once a document is created with crawl configuration, you can trigger a recrawl to fetch fresh content using the [Recrawl Document](/api-reference/documents/recrawl-document) endpoint.

# Delete Document

Source: https://docs.tavus.io/api-reference/documents/delete-document

delete /v2/documents/{document_id}
Delete a specific document

Delete a document and its associated data using its unique identifier.

# Get Document

Source: https://docs.tavus.io/api-reference/documents/get-document

get /v2/documents/{document_id}
Retrieve a specific document by ID

Retrieve detailed information about a specific document using its unique identifier.

# List Documents

Source: https://docs.tavus.io/api-reference/documents/get-documents

get /v2/documents
Retrieve a list of documents with optional filtering and pagination

Retrieve a list of documents with support for pagination, sorting, and filtering by various criteria.

# Update Document

Source: https://docs.tavus.io/api-reference/documents/patch-document

patch /v2/documents/{document_id}
Update a specific document's metadata

Update metadata for a specific document. This endpoint allows you to modify the document name and its tags.

# Recrawl Document

Source: https://docs.tavus.io/api-reference/documents/recrawl-document

post /v2/documents/{document_id}/recrawl
Trigger a recrawl of a website document to fetch fresh content

Trigger a recrawl of a document that was created with crawl configuration. This is useful for keeping your knowledge base up-to-date when website content changes.
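Before creating or recrawling a crawled document, the documented parameter ranges (depth 1-10, max_pages 1-100) can be validated client-side. A minimal sketch; the helper is illustrative and not part of the API:

```python
# Enforce the documented crawl ranges: depth 1-10, max_pages 1-100.
def validate_crawl(crawl: dict) -> dict:
    depth = crawl.get("depth", 1)
    max_pages = crawl.get("max_pages", 1)
    if not 1 <= depth <= 10:
        raise ValueError("depth must be between 1 and 10")
    if not 1 <= max_pages <= 100:
        raise ValueError("max_pages must be between 1 and 100")
    return {"depth": depth, "max_pages": max_pages}
```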
## When to Recrawl

Use this endpoint when:

* The source website has been updated with new content
* You want to refresh the document's content on a schedule
* The initial crawl encountered errors and you want to retry

## How Recrawling Works

When you trigger a recrawl:

1. The system uses the same starting URL from the original document
2. Links are followed according to the crawl configuration (depth and max_pages)
3. New content is processed and stored
4. Old vectors are replaced with the new content once processing completes
5. The document's `crawl_count` is incremented and `last_crawled_at` is updated

## Requirements

* **Document State**: The document must be in `ready` or `error` state
* **Crawl Configuration**: The document must have been created with a `crawl` configuration, or you must provide one in the request body

## Rate Limits

To prevent abuse, the following limits apply:

* **Cooldown Period**: 1 hour between recrawls of the same document
* **Concurrent Crawls**: Maximum 5 crawls running simultaneously per user
* **Total Documents**: Maximum 100 crawl documents per user

## Overriding Crawl Configuration

You can optionally provide a `crawl` object in the request body to override the stored configuration for this recrawl:

```json theme={null}
{
  "crawl": {
    "depth": 3,
    "max_pages": 50
  }
}
```

If no `crawl` object is provided, the original crawl configuration from document creation is used.

## Monitoring Recrawl Progress

After initiating a recrawl:

1. The document status changes to `recrawling`
2. If you provided a `callback_url` during document creation, you'll receive status updates
3. When complete, the status changes to `ready` (or `error` if it failed)
4. Use [Get Document](/api-reference/documents/get-document) to check the current status

# Create Guardrails

Source: https://docs.tavus.io/api-reference/guardrails/create-guardrails

post /v2/guardrails

This endpoint creates a new set of guardrails for a persona.
Guardrails provide strict behavioral boundaries and guidelines that will be rigorously followed throughout conversations.

# Delete Guardrails

Source: https://docs.tavus.io/api-reference/guardrails/delete-guardrails

delete /v2/guardrails/{guardrails_id}

This endpoint deletes a single set of guardrails by its unique identifier.

# Get Guardrails (One Set)

Source: https://docs.tavus.io/api-reference/guardrails/get-guardrails

get /v2/guardrails/{guardrails_id}

This endpoint returns a single set of guardrails by its unique identifier.

# Get Guardrails (All Sets)

Source: https://docs.tavus.io/api-reference/guardrails/get-guardrails-list

get /v2/guardrails

This endpoint returns a list of all sets of guardrails.

# Patch Guardrails

Source: https://docs.tavus.io/api-reference/guardrails/patch-guardrails

patch /v2/guardrails/{guardrails_id}

This endpoint allows you to update specific fields of guardrails using JSON Patch operations.

**Note:** The `path` field is a JSON Pointer string that references a location within the target document where the operation is performed. For example:

```json
[
  { "op": "replace", "path": "/data/0/guardrails_prompt", "value": "Your updated prompt" },
  { "op": "add", "path": "/data/0/callback_url", "value": "https://your-server.com/webhook" }
]
```

* Ensure the `path` field matches the current guardrails schema.
* For the `remove` operation, the `value` parameter is not required.

# Create Objectives

Source: https://docs.tavus.io/api-reference/objectives/create-objectives

post /v2/objectives

This endpoint creates a new objective for a persona. Objectives provide goal-oriented instructions that help guide conversations toward specific achievements and desired outcomes.

# Delete Objective

Source: https://docs.tavus.io/api-reference/objectives/delete-objectives

delete /v2/objectives/{objectives_id}

This endpoint deletes a single objective by its unique identifier.
# Get Objective

Source: https://docs.tavus.io/api-reference/objectives/get-objectives

get /v2/objectives/{objectives_id}

This endpoint returns a single objective by its unique identifier.

# Get Objectives

Source: https://docs.tavus.io/api-reference/objectives/get-objectives-list

get /v2/objectives

This endpoint returns a list of all objectives.

# Patch Objective

Source: https://docs.tavus.io/api-reference/objectives/patch-objectives

patch /v2/objectives/{objectives_id}

This endpoint allows you to update specific fields of an objective using JSON Patch operations.

**Note:** The `path` field is a JSON Pointer string that references a location within the target document where the operation is performed. For example:

```json
[
  { "op": "replace", "path": "/data/0/objective_name", "value": "updated_objective_name" },
  { "op": "replace", "path": "/data/0/objective_prompt", "value": "Updated prompt for the objective" },
  { "op": "replace", "path": "/data/0/confirmation_mode", "value": "manual" },
  { "op": "add", "path": "/data/0/output_variables", "value": ["new_variable"] },
  { "op": "replace", "path": "/data/0/modality", "value": "visual" },
  { "op": "remove", "path": "/data/0/callback_url" }
]
```

# Overview

Source: https://docs.tavus.io/api-reference/overview

Discover the Tavus API — build a real-time, human-like multimodal video conversation with a replica.

## Getting Started with Tavus APIs

Tavus APIs allow you to create a Conversational Video Interface (CVI), an end-to-end pipeline for building real-time video conversations with an AI replica. Each replica is integrated with a persona that enables it to see, hear, and respond like a human. You can access the API through standard HTTP requests, making it easy to integrate Conversational Video Interface (CVI) into any application or platform.

### Who Is This For?

This API is for developers looking to add real-time, human-like AI interactions into their apps or services.

### What Can You Do?
Use the end-to-end Conversational Video Interface (CVI) pipeline to build human-like, real-time multimodal video conversations with these three core components:

* **Persona** - Define the agent's behavior, tone, and knowledge.
* **Replica** - Train a lifelike digital twin from a short 2-minute video.
* **Conversation** - Create a real-time video call session with your AI replica.

# Create Persona

Source: https://docs.tavus.io/api-reference/personas/create-persona

post /v2/personas
Create and customize a persona's behavior and capabilities for CVI.

This endpoint creates and customizes a digital replica's behavior and capabilities for Conversational Video Interface (CVI).

**Core Components:**

* Replica - Choice of audio/visual appearance
* Context - Customizable contextual information, for use by the LLM
* System Prompt - Customizable system prompt, for use by the LLM
* Layers
  * Perception - Multimodal vision and understanding settings (Raven)
  * STT - Transcription and turn-taking settings (Sparrow)
  * Conversational Flow - Turn-taking, interruption handling, and active listening settings
  * LLM - Language model settings
  * TTS - Text-to-Speech settings

For detailed guides on each layer of the Conversational Video Interface, click here.

When using full pipeline mode, the `system_prompt` field is required.

# Delete Persona

Source: https://docs.tavus.io/api-reference/personas/delete-persona

delete /v2/personas/{persona_id}

This endpoint deletes a single persona by its unique identifier.

# Get Persona

Source: https://docs.tavus.io/api-reference/personas/get-persona

get /v2/personas/{persona_id}

This endpoint returns a single persona by its unique identifier.

# List Personas

Source: https://docs.tavus.io/api-reference/personas/get-personas

get /v2/personas

This endpoint returns a list of all Personas created by the account associated with the API Key in use.
# Patch Persona

Source: https://docs.tavus.io/api-reference/personas/patch-persona

patch /v2/personas/{persona_id}

This endpoint updates a persona using a JSON Patch payload (RFC 6902). You can modify **any field within the persona** using supported operations like `add`, `remove`, `replace`, `copy`, `move`, and `test`. For example:

```json
[
  { "op": "replace", "path": "/persona_name", "value": "Wellness Advisor" },
  { "op": "replace", "path": "/default_replica_id", "value": "rf4e9d9790f0" },
  { "op": "replace", "path": "/layers/llm/model", "value": "tavus-gpt-oss" },
  { "op": "replace", "path": "/layers/tts/tts_engine", "value": "cartesia" },
  { "op": "add", "path": "/layers/tts/tts_emotion_control", "value": "true" },
  { "op": "remove", "path": "/layers/stt/hotwords" },
  { "op": "replace", "path": "/layers/perception/visual_tool_prompt", "value": "Use tools when identity documents are clearly shown." }
]
```

* Ensure the `path` field matches the current persona schema.
* For the `remove` operation, the `value` parameter is not required.

# Create Replica

Source: https://docs.tavus.io/api-reference/phoenix-replica-model/create-replica

post /v2/replicas
Create a new replica using the latest phoenix-4 model.

This endpoint creates a new replica using the latest `phoenix-4` model, which can be used in real-time conversations. To ensure high-quality replica creation, follow the steps in the [Replica Training](/sections/replica/replica-training) guide.

By default, all new replicas are trained using the `phoenix-4` model. To use the older `phoenix-3` model, set the `model_name` parameter to `phoenix-3`.

Required parameters vary based on the replica type:

**Personal Replica:**

* `train_video_url`
* `consent_video_url`

**Non-Human Replica:**

* `train_video_url`

Make sure the `train_video_url` and `consent_video_url` are publicly accessible download links, such as presigned S3 URLs.
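The replica-type rules above can be captured in a small request-body helper. This is an illustrative sketch (not an official SDK) and only encodes the requirements stated above:

```python
# Assemble a Create Replica body. consent_video_url is required for
# personal replicas; non-human replicas need only train_video_url.
def replica_payload(train_video_url, consent_video_url=None, model_name=None):
    body = {"train_video_url": train_video_url}
    if consent_video_url is not None:
        body["consent_video_url"] = consent_video_url
    if model_name is not None:  # e.g. "phoenix-3" to opt out of the phoenix-4 default
        body["model_name"] = model_name
    return body
```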
# Delete Replica

Source: https://docs.tavus.io/api-reference/phoenix-replica-model/delete-replica

delete /v2/replicas/{replica_id}

This endpoint deletes a Replica by its unique ID. Deleted Replicas cannot be used in a conversation.

# Get Replica

Source: https://docs.tavus.io/api-reference/phoenix-replica-model/get-replica

get /v2/replicas/{replica_id}

This endpoint returns a single Replica by its unique identifier. Included in the response body is a `training_progress` string that represents the progress of the Replica training. If there are any errors during training, the `status` will be `error` and the `error_message` will be populated.

# List Replicas

Source: https://docs.tavus.io/api-reference/phoenix-replica-model/get-replicas

get /v2/replicas

This endpoint returns a list of all Replicas created by the account associated with the API Key in use. In the response, a root level `data` key will contain the list of Replicas.

# Rename Replica

Source: https://docs.tavus.io/api-reference/phoenix-replica-model/patch-replica-name

patch /v2/replicas/{replica_id}/name

This endpoint renames a single Replica by its unique identifier.

# Generate Video

Source: https://docs.tavus.io/api-reference/video-request/create-video

post /v2/videos

This endpoint generates a new video using a Replica and either a script or an audio file. The only required body parameters are `replica_id` and either `script` or `audio_url`. The `replica_id` is a unique identifier for the Replica that will be used to generate the video. The `script` is the text that will be spoken by the Replica in the video. To generate a video from an audio file instead of a script, provide `audio_url` instead of `script`. Currently, `.wav` and `.mp3` files are supported for audio file input.

If a `background_url` is provided, Tavus will record a video of the website and use it as the background for the video.
If a `background_source_url` is provided, where the URL points to a download link such as a presigned S3 URL, Tavus will use that video as the background. If neither is provided, the video will consist of a full-screen Replica.

To learn more about generating videos with Replicas, see [here](/sections/video/quickstart). To learn more about writing an effective script for your video, see [Scripting prompting](/sections/troubleshooting#script-length).

# Delete Video

Source: https://docs.tavus.io/api-reference/video-request/delete-video

delete /v2/videos/{video_id}

This endpoint deletes a single video by its unique identifier.

# Get Video

Source: https://docs.tavus.io/api-reference/video-request/get-video

get /v2/videos/{video_id}

This endpoint returns a single video by its unique identifier. The response body will contain a `status` string that represents the status of the video. If the video is ready, the response body will also contain a `download_url`, `stream_url`, and `hosted_url` that can be used to download, stream, and view the video respectively.

# List Videos

Source: https://docs.tavus.io/api-reference/video-request/get-videos

get /v2/videos

This endpoint returns a list of all Videos created by the account associated with the API Key in use.

# Rename Video

Source: https://docs.tavus.io/api-reference/video-request/patch-video-name

patch /v2/videos/{video_id}/name

This endpoint renames a single video by its unique identifier.

# Changelog

Source: https://docs.tavus.io/sections/changelog/changelog

## Enhancements

* **30% Faster Phoenix-4 Boot Time:** Phoenix-4 conversations now boot 30% faster, significantly reducing the time from conversation creation to readiness.

## Changes

* **`conversation.replica_interrupted` Event Removed:** The `conversation.replica_interrupted` application message has been removed from the Interactions Protocol. This event was deprecated in a previous backend update. Use `conversation.replica.stopped_speaking` with the `interrupted: true` property to detect interruptions instead.
* **`duration` and `interrupted` Fields on Replica Stopped Speaking:** The `conversation.replica.stopped_speaking` event now includes a `duration` field (how long the replica spoke in seconds) and an `interrupted` field (`true`/`false`) indicating whether the replica was interrupted by the user. Learn more

## New Features

* **Expanded ASR Model Selection:** You can now choose from five specialized speech-to-text engines via the `stt_engine` parameter. New models include `tavus-parakeet`, `tavus-soniox`, `tavus-whisper`, and `tavus-deepgram-medical`. Use `tavus-auto` to automatically route to the best model for each conversation. Learn more
* **Event Ordering and Turn Tracking:** All server-broadcasted interaction events now include `seq` and `turn_idx` fields. `seq` is a globally monotonic sequence number for ordering events that may arrive out of order, and `turn_idx` groups related events from the same conversation turn. Learn more

## Enhancements

* **EU ElevenLabs BYOK Support:** Customers can now bring their own ElevenLabs API key from EU-region accounts.

## Enhancements

* **Improved Knowledge Base Retrieval:** Optimized underlying infrastructure to improve utterance-to-utterance response times, particularly when `rag_search_quality` is set to `quality`.

## New Features

* **Expanded Tavus-Hosted LLM Selection:** Added new Tavus-hosted LLM options including models from the Gemini, Claude, and GPT families. `tavus-gpt-oss` is recommended as the default. Legacy models `tavus-gpt-4.1`, `tavus-gpt-4o`, and `tavus-gpt-4o-mini` are now deprecated. Learn more →
* **Visual RAG:** CVI now supports visual retrieval-augmented generation.
Upload custom image explanations that are matched and queried via vision embeddings, giving your persona richer visual context during conversations.

## Changes

* **Persona `context` Field Deprecated:** The `context` field has been deprecated in favor of a unified `system_prompt` field. Existing `context` values have been automatically merged into system prompts. The API remains backward compatible, but we recommend using **only** `system_prompt` going forward.

## New Features

* **Raven-1 Perception Model:** Introduced Raven-1, a multimodal perception model with audio emotion analysis and enhanced visual awareness. Raven-1 captures user emotion from audio in real time (sub-100ms audio perception latency), enabling personas to respond with greater emotional intelligence. The model is now the default for all new personas. Enable it by setting `perception_model_name` in your persona configuration. Learn more →
* **Private Rooms:** Require authentication to join conversations for enhanced security. When enabled, we return a JWT meeting token that users must include when entering the room. Learn more

## Enhancements

* **Upgraded Transcription Engine:** Upgraded the transcription engine with a 3x improvement in word error rate (WER).

## New Features

* **Website Crawling for Knowledge Base:** You can now enable link crawling when creating knowledge base documents. Configure crawl `depth` and `max_pages` to automatically discover and ingest content from linked pages. Additionally, existing crawled documents can now be recrawled to keep knowledge base content up to date.

## Changes

* **PlayHT TTS Removed:** PlayHT has been fully removed as a supported TTS engine. All personas previously using PlayHT should migrate to Cartesia or ElevenLabs.

## New Features

* **Hard Delete for Conversations:** Conversations can now be permanently deleted via the API using the `hard=true` query parameter. Use this for GDPR compliance or data cleanup workflows.
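As a sketch, a hard delete differs from a normal delete only by the query parameter. The helper below is illustrative, not part of any SDK:

```python
# Build the Delete Conversation URL, optionally with hard=true for
# permanent deletion as described above.
def delete_conversation_url(conversation_id: str, hard: bool = False) -> str:
    url = f"https://tavusapi.com/v2/conversations/{conversation_id}"
    return url + "?hard=true" if hard else url
```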
## Enhancements

* **Default TTS Model Updated to Sonic-3:** The default text-to-speech model has been updated to Sonic-3 across all new personas, delivering improved voice quality and naturalness.
* **LiveKit Connection Stability:** Extensive reliability improvements to the LiveKit-based transport layer, including fixes for connection timeouts, track publishing hangs, event loop starvation, and ping timeout issues.

## Changes

* **Default LLM Migrated to `tavus-gpt-oss`:** The default LLM for all new personas is now `tavus-gpt-oss`. All remaining `tavus-llama-4` personas have been automatically migrated. Legacy Tavus-Llama model references have been removed.

## New Features

* **LLM Temperature & Top-P Parameters:** You can now configure `temperature` and `top_p` parameters for both Tavus-hosted LLMs and custom LLMs via the `extra_body` field in your persona's LLM configuration. Learn more →

## Enhancements

* **Text Echo Language Accuracy:** Text echoes now correctly use the input language for conversion, improving accuracy in multilingual conversations.

## New Features

* **Test Mode for Conversations:** You can now start conversations in test mode, where the replica does not join. Validate your setup, integrations, and conversational flows without incurring costs or using concurrency slots. Set `test_mode: true` when creating a conversation. Learn more →

## Enhancements

* **Fuzzy Search for Personas:** Search now supports fuzzy matching for personas, allowing users to find results based on partial matches of UUIDs or names.

## New Features

* **Memories:** CVI now remembers context across conversations. Every conversation builds on the last with full context and time/date awareness, enabling use cases like adaptive tutoring, mentorship, and recurring consultations. Learn more →
* **Knowledge Base (RAG):** Bring your own data to conversations instantly. Upload documents or links and get grounded answers with ~30ms retrieval latency.
Power AI recruiters, support agents, travel guides, and more with domain-specific knowledge. Learn more →

* **Objectives & Guardrails:** Define clear goals, branching logic, and measurable outcomes for your personas while keeping conversations safe, compliant, and on-brand. Ideal for complex workflows and regulated industries. Learn more →
* **Persona Builder:** A guided creation flow in the Developer Portal to shape AI personas with goals, behaviors, and style — then test or launch within minutes.

## New Features

* **Events Console:** A new events console in the Developer Portal lets you monitor everything happening during a conversation in real time — from message flows to system activity.
* **Conversation Transcripts & Perception Analysis:** View full conversation details directly in the Developer Portal, including transcripts with speaker roles and perception analysis showing how your AI persona sees, hears, and responds.

## New Features

* **Persona Layer Controls:** Enable or disable layers like Sparrow directly within a Persona and adjust sensitivity settings in real time from the Developer Portal side panel.
* **Persona Editing in Developer Portal:** We've added new editing capabilities to help you refine your Personas more efficiently. You can now update system prompt, context, and layers directly in our Developer Portal, plus duplicate existing Personas to quickly create variations or use them as starting points for new projects. Find these new features in your Persona Library at platform.tavus.io.

## Enhancements

* **Interactions Protocol Playground Improvements:** Major updates to the Interactions Protocol Playground, including the correct `properties.context` format and an append vs overwrite toggle.

## New Features

* **Multilingual Settings in Developer Portal:** You can now specify the language of a conversation directly in the Developer Portal, including a new multilingual option for dynamic, real-world interactions.
## New Features

* **Llama 4 Support:** Your persona just got even smarter, thanks to Meta's Llama 4 model 🧠 You can start using Llama 4 by specifying `tavus-llama-4` for the LLM `model` value when creating a new persona or updating an existing one. Click here to learn more!

## New Features

* **React Component Library:** Developers can build with Tavus even faster now with our pre-defined components 🚀 Click here to learn more!

## New Features

* **Multilingual Conversation Support:** CVI now supports dynamic multilingual conversations through automatic language detection. Set the language parameter to "multilingual" and CVI will automatically detect the user's spoken language and respond in the same language using ASR technology.
* **Audio-Only Mode:** CVI now supports audio-only conversations with advanced perception (powered by Raven) and intelligent turn-taking (powered by Sparrow-1). Set `audio_only=true` in your create conversation request to enable streamlined voice-first interactions.

## Enhancements

* **Fixed CVI responsiveness issue:** Resolved an issue where CVI would occasionally ignore very brief user utterances. All user inputs, regardless of length, now receive consistent responses.
* **Expanded tavus-llama-4 context window:** Increased maximum context window to 32,000 tokens. For optimal performance and response times, we recommend staying under 25,000 tokens.

## Enhancements

* Reduced conversation boot time by 58% (p50).

## Changes

* Added a new recording requirement to Replica Training: Start the talking segment with a big smile.

## Enhancements

* Added echo and respond events to conversational context.

## Enhancements

* **Major Phoenix 3 Enhancements for CVI**:
  * Increased frame rate from 27fps to 32fps, significantly boosting smoothness.
  * Reduced the Phoenix step's warm boot time by 60% (from 5s to 2s).
  * Lipsync accuracy improved by ~22% based on the AVSR metric.
  * Resolved blurriness and choppiness at conversation start.
  * Enhanced listening mode with more natural micro expressions (eyebrow movements, subtle gestures).
  * Greenscreen mode speed boosted by an additional ~1.5fps.
* **Enhanced CVI Audio Quality**: Audio clicks significantly attenuated, providing clearer conversational audio.
* **Phoenix 3 Visual Artifacts Fix**: Resolved visual artifacts in 4K videos on Apple devices, eliminating black spot artifacts in thumbnails.

## New Features

* Launched LiveKit Integration: With Tavus video agents now integrated into LiveKit, you can add humanlike video responses to your voice agents in seconds.
* Persona API: Enabled patch updates to personas.

## Enhancements

* Resolved TTS (Cartesia) stability issues and addressed hallucination.
* **Phoenix 3 Improvements**:
  * Fixed blinking/jumping issues and black spots in videos.
  * FPS optimization to resolve static and audio crackling.

## Enhancements

* **Replica API**:
  * Enhanced Error Messaging for Training Videos.
  * Optimized Auto QA for Training Videos.

# Blocks

Source: https://docs.tavus.io/sections/conversational-video-interface/component-library/blocks

High-level component compositions that combine multiple UI elements into complete interface layouts

### Conversation block

The Conversation component provides a complete video chat interface for one-to-one conversations with AI replicas

```bash theme={null}
npx @tavus/cvi-ui@latest add conversation-01
```

The `Conversation` component provides a complete video chat interface for one-to-one conversations with AI replicas, featuring main video display, self-view preview, and integrated controls.
**Features:**

* **Main Video Display**: Large video area showing the AI replica or screen share
* **Self-View Preview**: Small preview window showing the local camera feed
* **Screen Sharing Support**: Automatic switching between replica video and screen share
* **Device Controls**: Integrated microphone, camera, and screen share controls
* **Error Handling**: Graceful handling of camera/microphone permission errors
* **Responsive Layout**: Adaptive design for different screen sizes

**Props:**

* `conversationUrl` (string): Daily.co room URL for joining
* `onLeave` (function): Callback when the user leaves the conversation

```tsx theme={null}
import { Conversation } from './components/cvi/components/conversation';
```

```tsx theme={null}
<Conversation
  conversationUrl={conversationUrl}
  onLeave={() => handleLeaveCall()}
/>
```

### Hair Check

The `HairCheck` component provides a pre-call interface for users to test and configure their audio/video devices before joining a video chat.

```bash theme={null}
npx @tavus/cvi-ui@latest add hair-check-01
```
**Features:**

* **Device Testing**: Live preview of the camera feed with a mirror effect
* **Permission Management**: Handles camera and microphone permission requests
* **Device Controls**: Integrated microphone and camera controls
* **Join Interface**: Call-to-action button to join the video chat
* **Responsive Design**: Works on both desktop and mobile devices

**Props:**

* `isJoinBtnLoading` (boolean): Shows a loading state on the join button
* `onJoin` (function): Callback when the user clicks join
* `onCancel` (function, optional): Callback when the user cancels

```tsx theme={null}
import { HairCheck } from './components/cvi/components/hair-check';
```

```tsx theme={null}
<HairCheck
  isJoinBtnLoading={isJoining}
  onJoin={handleJoin}
  onCancel={handleCancel}
/>
```

# Components

Source: https://docs.tavus.io/sections/conversational-video-interface/component-library/components

Learn about our pre-built React components to accelerate integrating the Tavus Conversational Video Interface (CVI) into your application.

### CVI Provider

The `CVIProvider` component wraps your app with the Daily.co provider context, enabling all Daily React hooks and components to function.

```bash theme={null}
npx @tavus/cvi-ui@latest add cvi-provider
```

**Features:**

* Provides Daily.co context to all child components
* Required for using Daily React hooks and video/audio components
* Simple wrapper for app-level integration

**Props:**

* `children` (ReactNode): Components to be wrapped by the provider

```tsx theme={null}
import { CVIProvider } from './cvi-provider';
```

```tsx theme={null}
<CVIProvider>
  {/* your app components */}
</CVIProvider>
```

### AudioWave

The `AudioWave` component provides real-time audio level visualization for video chat participants, displaying animated bars that respond to audio input levels.
```bash theme={null}
npx @tavus/cvi-ui@latest add audio-wave
```

**Features:**

* **Real-time Audio Visualization**: Three animated bars that respond to audio levels
* **Active Speaker Detection**: Visual distinction between active and inactive speakers
* **Performance Optimized**: Uses `requestAnimationFrame` for smooth animations
* **Responsive Design**: Compact circular design that fits well in video previews
* **Audio Level Scaling**: Intelligent volume scaling for consistent visual feedback

**Props:**

* `id` (string): The participant's session ID to monitor audio levels for

```tsx theme={null}
import { AudioWave } from './audio-wave';
```

```tsx theme={null}
<AudioWave id={participantId} />
```

### Device Select

The `device-select` module provides advanced device selection controls, including dropdowns for choosing microphones and cameras, and integrated toggle buttons.

```bash theme={null}
npx @tavus/cvi-ui@latest add device-select
```

**Exported Components:**

* **`MicSelectBtn`**: Microphone toggle button with device selection
* **`CameraSelectBtn`**: Camera toggle button with device selection
* **`ScreenShareButton`**: Button to toggle screen sharing

**Features:**

* Integrated device selection and toggling
* Dropdowns for camera/microphone selection
* Visual state indicators and accessibility support
* Uses Daily.co device management hooks
* CSS modules for styling

```tsx theme={null}
import { MicSelectBtn, CameraSelectBtn, ScreenShareButton } from './device-select';
```

```tsx theme={null}
<MicSelectBtn />
<CameraSelectBtn />
<ScreenShareButton />
```

### Media Controls

The `media-controls` module provides simple toggle buttons for microphone, camera, and screen sharing, designed for direct use in video chat interfaces.
```bash theme={null}
npx @tavus/cvi-ui@latest add media-controls
```

**Exported Components:**

* **`MicToggleButton`**: Toggles microphone mute/unmute state
* **`CameraToggleButton`**: Toggles camera on/off
* **`ScreenShareButton`**: Toggles screen sharing on/off

**Features:**

* Simple, accessible toggle buttons
* Visual state indicators (muted, unmuted, on/off)
* Disabled state when the device is not ready
* Uses Daily.co hooks for device state
* CSS modules for styling

```tsx theme={null}
import { MicToggleButton, CameraToggleButton, ScreenShareButton } from './media-controls';
```

```tsx theme={null}
<MicToggleButton />
<CameraToggleButton />
<ScreenShareButton />
```

# Hooks

Source: https://docs.tavus.io/sections/conversational-video-interface/component-library/hooks

See which hooks Tavus supports for managing video calls, media controls, participant management, and conversation events.

## 🔧 Core Call Management

### useCVICall

Essential hook for joining and leaving video calls.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-cvi-call
```

A React hook that provides comprehensive call management functionality for video conversations. This hook handles the core lifecycle of video calls, including connection establishment, room joining, and proper cleanup when leaving calls.
**Purpose:** * Manages call join/leave operations with proper state management * Handles connection lifecycle and cleanup * Provides simple interface for call control **Return Values:** * `joinCall` (function): Function to join a call by URL - handles Daily.co room connection * `leaveCall` (function): Function to leave the current call - properly disconnects and cleans up resources ```tsx theme={null} import { useCVICall } from './hooks/use-cvi-call'; ``` ```tsx theme={null} const CallManager = () => { const { joinCall, leaveCall } = useCVICall(); const handleJoin = () => { joinCall({ url: 'https://your-daily-room-url' }); }; return (
    <div>
      {/* example controls; the original JSX was lost in extraction */}
      <button onClick={handleJoin}>Join Call</button>
      <button onClick={leaveCall}>Leave Call</button>
    </div>
  );
};
```
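The `url` passed to `joinCall` is the `conversation_url` returned by the Create Conversation endpoint, which should be called server-side so your API key stays secret. A minimal sketch of assembling that request (the endpoint and body fields come from the API reference above; `buildCreateConversationRequest` is an illustrative helper, not part of the Tavus SDK):

```typescript
// Sketch: assemble the server-side Create Conversation request whose JSON
// response contains the conversation_url you hand to joinCall on the client.
interface CreateConversationBody {
  persona_id: string;
  replica_id?: string;
  conversation_name?: string;
}

interface HttpRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildCreateConversationRequest(
  apiKey: string,
  body: CreateConversationBody,
): HttpRequest {
  return {
    url: 'https://tavusapi.com/v2/conversations',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey, // keep the key server-side; never ship it to the browser
    },
    body: JSON.stringify(body),
  };
}
```

Server-side you would pass the result to `fetch(req.url, req)`, read `conversation_url` from the JSON response, and forward it to the client for `joinCall({ url })`.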
### useStartHaircheck A React hook that manages device permissions and camera initialization for the hair-check component. ```bash theme={null} npx @tavus/cvi-ui@latest add use-start-haircheck ``` A React hook that manages device permissions and camera initialization for the hair-check component. **Purpose:** * Monitors device permission states * Starts camera and microphone when appropriate * Provides permission state for UI conditional rendering * Handles permission request flow **Return Values:** * `isPermissionsPrompt` (boolean): Browser is prompting for device permission * `isPermissionsLoading` (boolean): Permissions are being processed or camera is initializing * `isPermissionsGranted` (boolean): Device permission granted * `isPermissionsDenied` (boolean): Device permission denied * `requestPermissions` (function): Function to request camera and microphone permissions ```tsx theme={null} import { useStartHaircheck } from './hooks/use-start-haircheck'; ``` ```tsx theme={null} const HairCheckComponent = () => { const { isPermissionsPrompt, isPermissionsLoading, isPermissionsGranted, isPermissionsDenied, requestPermissions } = useStartHaircheck(); useEffect(() => { requestPermissions(); }, []); return (
    <div>
      {/* placeholder elements; the original components were lost in extraction */}
      {isPermissionsLoading && <p>Requesting device permissions…</p>}
      {isPermissionsPrompt && <p>Please allow camera and microphone access.</p>}
      {isPermissionsDenied && <p>Permissions denied. Check your browser settings.</p>}
      {isPermissionsGranted && <p>Ready to join.</p>}
    </div>
); }; ```
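If you prefer rendering a single status message, the four flags can be collapsed into one label. A small sketch (the helper and its precedence order are ours, not part of the library; it mirrors the conditional rendering above, checking the loading flag first):

```typescript
// Sketch: collapse the useStartHaircheck flags into one status label.
// permissionStatus is an illustrative helper, not a library export.
interface PermissionFlags {
  isPermissionsPrompt: boolean;
  isPermissionsLoading: boolean;
  isPermissionsGranted: boolean;
  isPermissionsDenied: boolean;
}

type PermissionStatus = 'loading' | 'prompt' | 'denied' | 'granted' | 'idle';

function permissionStatus(flags: PermissionFlags): PermissionStatus {
  if (flags.isPermissionsLoading) return 'loading';
  if (flags.isPermissionsPrompt) return 'prompt';
  if (flags.isPermissionsDenied) return 'denied';
  if (flags.isPermissionsGranted) return 'granted';
  return 'idle';
}
```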
***

## 🎥 Media Controls

### useLocalCamera

A React hook that provides local camera state and toggle functionality.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-local-camera
```

**Purpose:**

* Manages local camera state (on/off)
* Tracks camera permission and ready state

**Return Values:**

* `onToggleCamera` (function): Function to toggle the camera on/off
* `isCamReady` (boolean): Camera permission is granted and ready
* `isCamMuted` (boolean): Camera is currently turned off
* `localSessionId` (string): Local session ID

```tsx theme={null}
import { useLocalCamera } from './hooks/use-local-camera';
```

```tsx theme={null}
const CameraControls = () => {
  const { onToggleCamera, isCamReady, isCamMuted } = useLocalCamera();

  return (
    <button onClick={onToggleCamera} disabled={!isCamReady}>
      {isCamMuted ? 'Turn camera on' : 'Turn camera off'}
    </button>
  );
};
```

### useLocalMicrophone

A React hook that provides local microphone state and toggle functionality.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-local-microphone
```

**Purpose:**

* Manages local microphone state (on/off)
* Tracks microphone permission and ready state

**Return Values:**

* `onToggleMicrophone` (function): Function to toggle the microphone on/off
* `isMicReady` (boolean): Microphone permission is granted and ready
* `isMicMuted` (boolean): Microphone is currently turned off
* `localSessionId` (string): Local session ID

```tsx theme={null}
import { useLocalMicrophone } from './hooks/use-local-microphone';
```

```tsx theme={null}
const MicrophoneControls = () => {
  const { onToggleMicrophone, isMicReady, isMicMuted } = useLocalMicrophone();

  return (
    <button onClick={onToggleMicrophone} disabled={!isMicReady}>
      {isMicMuted ? 'Unmute microphone' : 'Mute microphone'}
    </button>
  );
};
```

### useLocalScreenshare

A React hook that provides local screen sharing state and toggle functionality.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-local-screenshare
```
**Purpose:**

* Manages screen sharing state (on/off)
* Provides a screen sharing toggle function
* Handles screen share start/stop with optimized display media options

**Return Values:**

* `onToggleScreenshare` (function): Function to toggle screen sharing on/off
* `isScreenSharing` (boolean): Whether screen sharing is currently active
* `localSessionId` (string): Local session ID

**Display Media Options:** When starting a screen share, the hook uses the following optimized settings:

* **Audio**: Disabled (false)
* **Self Browser Surface**: Excluded
* **Surface Switching**: Included
* **Video Resolution**: 1920x1080

```tsx theme={null}
import { useLocalScreenshare } from './hooks/use-local-screenshare';
```

```tsx theme={null}
const ScreenShareControls = () => {
  const { onToggleScreenshare, isScreenSharing } = useLocalScreenshare();

  return (
    <button onClick={onToggleScreenshare}>
      {isScreenSharing ? 'Stop sharing' : 'Share screen'}
    </button>
  );
};
```

### useRequestPermissions

A React hook that requests camera and microphone permissions with optimized audio processing settings.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-request-permissions
```
**Purpose:**

* Requests camera and microphone permissions from the user
* Starts camera and audio with a specific configuration
* Applies noise cancellation audio processing
* Provides a clean interface for permission requests

**Return Values:**

* `requestPermissions` (function): Function to request camera and microphone permissions

**Configuration:** When requesting permissions, the hook uses the following settings:

* **Video**: Started on (startVideoOff: false)
* **Audio**: Started on (startAudioOff: false)
* **Audio Source**: Default system audio input
* **Audio Processing**: Noise cancellation enabled

```tsx theme={null}
import { useRequestPermissions } from './hooks/use-request-permissions';
```

```tsx theme={null}
const PermissionRequest = () => {
  const requestPermissions = useRequestPermissions();

  const handleRequestPermissions = async () => {
    try {
      await requestPermissions();
      console.log('Permissions granted successfully');
    } catch (error) {
      console.error('Failed to get permissions:', error);
    }
  };

  return (
    <button onClick={handleRequestPermissions}>
      Enable camera and microphone
    </button>
  );
};
```

***

## 👥 Participant Management

### useReplicaIDs

A React hook that returns the IDs of all Tavus replica participants in a call.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-replica-ids
```

**Purpose:**

* Filters and returns participant IDs where `user_id` includes 'tavus-replica'

**Return Value:**

* `string[]` — Array of replica participant IDs

```tsx theme={null}
import { useReplicaIDs } from './hooks/use-replica-ids';
```

```tsx theme={null}
const ids = useReplicaIDs();
// ids is an array of participant IDs for Tavus replicas
```

### useRemoteParticipantIDs

A React hook that returns the IDs of all remote participants in a call.

```bash theme={null}
npx @tavus/cvi-ui@latest add use-remote-participant-ids
```
**Purpose:** * Returns participant IDs for all remote participants (excluding local user) **Return Value:** * `string[]` — Array of remote participant IDs ```tsx theme={null} import { useRemoteParticipantIDs } from './hooks/use-remote-participant-ids'; ``` ```tsx theme={null} const remoteIds = useRemoteParticipantIDs(); // remoteIds is an array of remote participant IDs ``` *** ## 💬 Conversation & Events ### useObservableEvent A React hook that listens for CVI app messages and provides a callback mechanism for handling various conversation events. ```bash theme={null} npx @tavus/cvi-ui@latest add cvi-events-hooks ``` A React hook that listens for CVI app messages and provides a callback mechanism for handling various conversation events. **Purpose:** * Listens for app messages from the Daily.co call mapped to CVI events * Handles various conversation event types (utterances, tool calls, speaking events, etc.) * Provides type-safe event handling for CVI interactions **Parameters:** * `callback` (function): Function called when app messages are received **Event Types:** This hook handles all CVI conversation events. For detailed information about each event type, see the [Tavus Interactions Protocol Documentation](/sections/conversational-video-interface/live-interactions). ```tsx theme={null} import { useObservableEvent } from './hooks/cvi-events-hooks'; ``` ```tsx theme={null} const ConversationHandler = () => { useObservableEvent((event) => { switch (event.event_type) { case 'conversation.utterance': console.log('Speech:', event.properties.speech); break; case 'conversation.replica.started_speaking': console.log('Replica started speaking'); break; case 'conversation.user.stopped_speaking': console.log('User stopped speaking'); break; } }); return
(<div>Listening for conversation events...</div>)
; }; ```
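The switch in the example above can be made type-safe with a discriminated union. The event shapes below are assumptions reconstructed from this section's examples, not the full protocol; see the Interactions Protocol documentation for the authoritative list:

```typescript
// Sketch: discriminated union covering only the events used in the example
// above. Shapes are assumptions based on this page, not the full protocol.
type CVIEvent =
  | { event_type: 'conversation.utterance'; properties: { speech: string } }
  | { event_type: 'conversation.replica.started_speaking' }
  | { event_type: 'conversation.user.stopped_speaking' };

function describeEvent(event: CVIEvent): string {
  switch (event.event_type) {
    case 'conversation.utterance':
      // TypeScript narrows `event` here, so `properties.speech` is safe to read
      return `Speech: ${event.properties.speech}`;
    case 'conversation.replica.started_speaking':
      return 'Replica started speaking';
    case 'conversation.user.stopped_speaking':
      return 'User stopped speaking';
  }
}
```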
### useSendAppMessage A React hook that provides a function to send CVI app messages to other participants in the call. ```bash theme={null} npx @tavus/cvi-ui@latest add cvi-events-hooks ``` A React hook that provides a function to send CVI app messages to other participants in the call. **Purpose:** * Sends various types of conversation messages to the CVI system * Supports echo, respond, interrupt, and context management messages * Provides type-safe message sending with proper validation * Enables real-time communication with Tavus replicas and conversation management **Return Value:** * `(message: SendAppMessageProps) => void` - Function that sends the message when called **Message Types:** This hook supports all CVI interaction types. For detailed information about each interaction type and their properties, see the [Tavus Interactions Protocol Documentation](/sections/conversational-video-interface/live-interactions). ```tsx theme={null} import { useSendAppMessage } from './hooks/cvi-events-hooks'; ``` ```tsx theme={null} const MessageSender = () => { const sendMessage = useSendAppMessage(); // Send a text echo const sendTextEcho = () => { sendMessage({ message_type: "conversation", event_type: "conversation.echo", conversation_id: "conv-123", properties: { modality: "text", text: "Hello, world!", audio: "", sample_rate: 16000, inference_id: "inf-456", done: true } }); }; // Send a text response const sendResponse = () => { sendMessage({ message_type: "conversation", event_type: "conversation.respond", conversation_id: "conv-123", properties: { text: "This is my response to the conversation." } }); }; return (
    <div>
      {/* example controls; the original JSX was lost in extraction */}
      <button onClick={sendTextEcho}>Send Echo</button>
      <button onClick={sendResponse}>Send Response</button>
    </div>
  );
};
```
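Hand-writing these payloads is error-prone, so a tiny builder can keep the envelope consistent. Field names follow the `conversation.respond` example above; `buildRespondMessage` itself is an illustrative helper, not a library export:

```typescript
// Sketch: build a conversation.respond payload matching the example above.
// Envelope fields come from this page; the helper is illustrative only.
interface RespondMessage {
  message_type: 'conversation';
  event_type: 'conversation.respond';
  conversation_id: string;
  properties: { text: string };
}

function buildRespondMessage(conversationId: string, text: string): RespondMessage {
  return {
    message_type: 'conversation',
    event_type: 'conversation.respond',
    conversation_id: conversationId,
    properties: { text },
  };
}

// Usage with the hook above:
// sendMessage(buildRespondMessage('conv-123', 'This is my response.'));
```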
# Overview

Source: https://docs.tavus.io/sections/conversational-video-interface/component-library/overview

Learn how our Tavus Conversational Video Interface (CVI) Component Library can help you go live in minutes.

## Overview

The Tavus Conversational Video Interface (CVI) React component library provides a complete set of pre-built components and hooks for integrating AI-powered video conversations into your React applications. This library simplifies setting up Tavus in your codebase, allowing you to focus on your application's core features.

Key features include:

* **Pre-built video chat components**
* **Device management** (camera, microphone, screen sharing)
* **Real-time audio/video processing**
* **Customizable styling** and theming
* **TypeScript support** with full type definitions

***

## Quick Start

### Prerequisites

Before getting started, ensure you have a React project set up. Alternatively, you can start from our example project: [CVI UI Haircheck Conversation Example](https://github.com/Tavus-Engineering/tavus-examples/tree/main/examples/cvi-ui-haircheck-conversation) - this example already has the HairCheck and Conversation blocks set up.

### 1. Initialize CVI in Your Project

```bash theme={null}
npx @tavus/cvi-ui@latest init
```

* Creates a `cvi-components.json` config file
* Prompts for TypeScript preference
* Installs npm dependencies (@daily-co/daily-react, @daily-co/daily-js, jotai)

### 2. Add CVI Components

```bash theme={null}
npx @tavus/cvi-ui@latest add conversation
```

### 3. Wrap Your App with the CVI Provider

In your root component file (main.tsx or index.tsx):

```tsx theme={null}
import { CVIProvider } from './components/cvi/components/cvi-provider';

function App() {
  return <CVIProvider>{/* Your app content */}</CVIProvider>;
}
```

### 4.
Add a Conversation Component Learn how to create a conversation URL at [https://docs.tavus.io/api-reference/conversations/create-conversation](https://docs.tavus.io/api-reference/conversations/create-conversation) **Note:** The Conversation component requires a parent container with defined dimensions to display properly. Ensure your body element has full dimensions (`width: 100%` and `height: 100%`) in your CSS for proper component display. ```tsx theme={null} import { Conversation } from './components/cvi/components/conversation'; function CVI() { const handleLeave = () => { // handle leave }; return (
    <div style={{ width: '100%', height: '100%' }}>
      {/* conversationUrl is the conversation_url returned by the Create Conversation endpoint */}
      <Conversation conversationUrl={conversationUrl} onLeave={handleLeave} />
    </div>
); } ``` *** ## Documentation Sections * **[Blocks](/sections/conversational-video-interface/component-library/blocks)** – High-level component compositions and layouts * **[Components](/sections/conversational-video-interface/component-library/components)** – Individual UI components * **[Hooks](/sections/conversational-video-interface/component-library/hooks)** – Custom React hooks for managing video call state and interactions # Audio-Only Conversation Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/customizations/audio-only Start a conversation in audio-only mode, perfect for voice-only or low-bandwidth environments. ## Create an Audio Only Conversation All features in the persona's pipeline, including STT, Perception, and TTS, remain fully active in audio-only mode. The only change is that replica video rendering is not included. In this example, we will use stock persona ID ***pcb7a34da5fe*** (Sales Development Rep). To enable audio-only mode, set the `audio_only` parameter to `true` when creating the conversation: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "pcb7a34da5fe", "audio_only": true }' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. To join the conversation, click the link in the ***conversation\_url*** field from the response: ```json theme={null} { "conversation_id": "cd7e3eac05ede40c", "conversation_name": "New Conversation 1751268887110", "conversation_url": "", "status": "active", "callback_url": "", "created_at": "2025-06-30T07:34:47.131571Z" } ``` # Background Customizations Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/customizations/background-customizations Apply a green screen or custom background for a personalized visual experience. 
## Customize Background in Conversation Setup

In this example, we will use stock replica ID ***rf4e9d9790f0*** (Anna) and stock persona ID ***pcb7a34da5fe*** (Sales Development Rep).

To apply the green screen background, set the `apply_greenscreen` parameter to `true` when creating the conversation:

```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
  "persona_id": "pcb7a34da5fe",
  "replica_id": "rf4e9d9790f0",
  "callback_url": "https://yourwebsite.com/webhook",
  "conversation_name": "Improve Sales Technique",
  "conversational_context": "I want to improve my sales techniques. Help me practice handling common objections from clients and closing deals more effectively.",
  "properties": {
    "apply_greenscreen": true
  }
}'
```

Replace the `x-api-key` value with your actual API key. You can generate one in the Developer Portal.

The above request will return the following response:

```json theme={null}
{
  "conversation_id": "ca4301628cb9",
  "conversation_name": "Improve Sales Technique",
  "conversation_url": "",
  "status": "active",
  "callback_url": "https://yourwebsite.com/webhook",
  "created_at": "2025-05-13T06:42:58.291561Z"
}
```

The replica will appear with a green background. You can customize that background on the front end using a WebGL-based solution, which lets you apply a different color or add a custom image. To preview this feature, try our Green Screen Sample App and paste the conversation URL to modify the background.

# Call Duration and Timeout

Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/customizations/call-duration-and-timeout

Configure call duration and timeout behavior to manage how and when a conversation ends.

## Create a Conversation with Custom Duration and Timeout

In this example, we will use stock replica ID ***rf4e9d9790f0*** (Anna) and stock persona ID ***pcb7a34da5fe*** (Sales Development Rep).
Use the following request body example:

```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
  "persona_id": "pcb7a34da5fe",
  "replica_id": "rf4e9d9790f0",
  "callback_url": "https://yourwebsite.com/webhook",
  "conversation_name": "Improve Sales Technique",
  "conversational_context": "I want to improve my sales techniques. Help me practice handling common objections from clients and closing deals more effectively.",
  "properties": {
    "max_call_duration": 1800,
    "participant_left_timeout": 60,
    "participant_absent_timeout": 120
  }
}'
```

Replace the `x-api-key` value with your actual API key. You can generate one in the Developer Portal.

The request example above includes the following customizations:

| Parameter                    | Description                                                                                     |
| :--------------------------- | :---------------------------------------------------------------------------------------------- |
| `max_call_duration`          | Sets the maximum call length in seconds. Maximum: 3600 seconds.                                  |
| `participant_left_timeout`   | Time (in seconds) to wait before ending the call after the last participant leaves. Default: 0.  |
| `participant_absent_timeout` | Time (in seconds) to end the call if no one joins after it's created. Default: 300.              |

To join the conversation, click the link in the ***conversation_url*** field from the response:

```json theme={null}
{
  "conversation_id": "ca4301628cb9",
  "conversation_name": "Improve Sales Technique",
  "conversation_url": "",
  "status": "active",
  "callback_url": "https://yourwebsite.com/webhook",
  "created_at": "2025-05-13T06:42:58.291561Z"
}
```

Based on the call duration and timeout settings above:

* The conversation will automatically end after 1800 seconds (30 minutes), regardless of activity.
* If the participant leaves the conversation, it will end 60 seconds after they disconnect.
* If no participant joins within 120 seconds of the conversation being created, it will end automatically.

# Closed Captions

Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/customizations/closed-captions

Enable closed captions for accessibility or live transcription during conversations.

## Enable Captions in Real Time During the Conversation

In this example, we will use stock replica ID ***rf4e9d9790f0*** (Anna) and stock persona ID ***pcb7a34da5fe*** (Sales Development Rep).

To enable closed captions, set the `enable_closed_captions` parameter to `true` when creating the conversation:

```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
  "persona_id": "pcb7a34da5fe",
  "replica_id": "rf4e9d9790f0",
  "callback_url": "https://yourwebsite.com/webhook",
  "conversation_name": "Improve Sales Technique",
  "conversational_context": "I want to improve my sales techniques. Help me practice handling common objections from clients and closing deals more effectively.",
  "properties": {
    "enable_closed_captions": true
  }
}'
```

Replace the `x-api-key` value with your actual API key. You can generate one in the Developer Portal.

To join the conversation, click the link in the ***conversation_url*** field from the response:

```json theme={null}
{
  "conversation_id": "ca4301628cb9",
  "conversation_name": "Improve Sales Technique",
  "conversation_url": "",
  "status": "active",
  "callback_url": "https://yourwebsite.com/webhook",
  "created_at": "2025-05-13T06:42:58.291561Z"
}
```

Closed captions will appear during the conversation whenever you or the replica speaks.

# Participant Limits

Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/customizations/participant-limits

Control the maximum number of participants allowed in a conversation.
## Create a Conversation with Participant Limits Replicas count as participants. For example, `max_participants: 2` allows one human participant plus one replica. Set `max_participants` to limit room capacity: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "pcb7a34da5fe", "replica_id": "rf4e9d9790f0", "max_participants": 2 }' ``` ```json theme={null} { "conversation_id": "ca4301628cb9", "conversation_url": "https://tavus.daily.co/ca4301628cb9", "status": "active" } ``` When the limit is reached, additional users cannot join. # Private Rooms Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/customizations/private-rooms Create authenticated conversations with meeting tokens for enhanced security. ## Create a Private Conversation To create a private room, set `require_auth` to `true`: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "pcb7a34da5fe", "replica_id": "rf4e9d9790f0", "require_auth": true }' ``` The response includes a `meeting_token`: ```json theme={null} { "conversation_id": "ca4301628cb9", "conversation_url": "https://tavus.daily.co/ca4301628cb9", "meeting_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...", "status": "active" } ``` Use the token by appending it to the URL: ``` https://tavus.daily.co/ca4301628cb9?t=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9... ``` Or pass it to the Daily SDK: ```javascript theme={null} callFrame.join({ url: conversation_url, token: meeting_token }); ``` **Optional: Tighten your join window** You can set `properties.participant_absent_timeout` when creating the conversation to control how long the conversation stays alive before a participant joins. 
For conversations created with `require_auth: true`, the meeting token's expiry duration is set to the value of `participant_absent_timeout`. If no one joins within that window, the conversation is automatically ended and the token expires. See [Call Duration and Timeout](/sections/conversational-video-interface/conversation/customizations/call-duration-and-timeout) for more details.

# Overview

Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/overview

Learn how to customize identity and advanced settings for a conversation to suit your needs.

A Conversation is a real-time video session between a user and a Tavus Replica. It enables two-way, face-to-face interaction over a fully managed WebRTC connection.

## Conversation Creation Flow

When you create a conversation using the endpoint or platform:

1. A WebRTC room (powered by **Daily**) is automatically created.
2. You receive a meeting URL (e.g., `https://tavus.daily.co/ca980e2e`).
3. The **replica** joins and waits in the room, and the duration and timeout timers begin.

**Billing Usage**

Tavus charges usage based on your account plan. Credits begin counting when a conversation is created and the replica starts waiting in the room. Usage ends when the conversation finishes or times out. Each active session also uses one concurrency slot.

You can use the provided URL to enter the video room immediately. Alternatively, you can build a custom UI or stream handler instead of using the default interface.

### What is Daily?

Tavus integrates **Daily** as its WebRTC provider. You don't need to sign up for or manage a separate Daily account—Tavus handles the setup and configuration for you.
This lets you:

* Use the default video interface or [customize the Daily UI](/sections/conversational-video-interface/quickstart/customize-conversation-ui)
* [Embed the CVI in your app](/sections/integrations/embedding-cvi)

## Conversation Customizations

Tavus provides several customizations that you can set per conversation:

### Identity and Context Setup

* **Persona**: You can use a stock persona provided by Tavus or create a custom one. If no replica is specified, the default replica linked to the persona will be used (if available).
* **Replica**: Use a stock replica provided by Tavus or create a custom one. If a replica is provided without a persona, the default Tavus persona will be used.
* **Conversation Context**: Customize the conversation context to set the scene, explain the user's role, say who joins the call, or point out key topics. It builds on the base persona and helps the AI give better, more focused answers.
* **Custom Greeting**: You can personalize the opening line that the AI should use when the conversation starts.

### Advanced Customizations

* Disable the video stream for audio-only sessions. Ideal for phone calls or low-bandwidth environments.
* Configure call duration and timeouts to manage usage, control costs, and limit concurrency.
* Set the language used during the conversation. Supports multilingual interactions with real-time detection.
* Apply a green screen or custom background for a personalized visual experience.
* Enable subtitles for accessibility or live transcription during conversations.
* Record conversations and store them securely in your own S3 bucket.
* Create authenticated conversations with meeting tokens for enhanced security.
* Control the maximum number of participants allowed in a conversation.

# Customer Support

Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/usecases/customer-support

Engage in real-time customer support and sales conversations with the Customer Support persona (Gloria).
## Customer Support Configuration (`paaee96e4f87`) ```json [expandable] theme={null} { "persona_name": "Customer Support", "pipeline_mode": "full", "system_prompt": "You're a customer success specialist on a live video call. Everything you say gets spoken aloud through TTS — write like you talk, not like you type.\n\nTHIS IS A SPOKEN CONVERSATION. You're on a video call. The person sees your face and hears your voice. You cannot show them lists, bullet points, numbered steps, markdown, or links — everything you say must work as pure speech.\n\n## Your Job\nYou handle sales and support. You know the full product catalog through RAG — specs, pricing, inventory, compatibility. You help people find what they need and fix what's broken.\n\n## How You Actually Talk\n\nSHORT BY DEFAULT. Your instinct is 1-2 sentences. ANSWER FIRST. REACT BEFORE YOU THINK. Fragments are fine. Drop the bookends. One thing at a time. Think out loud. Use contractions always.\n\n## Support Approach\nLead with what you CAN do. Validate feelings through your reaction. Use \"we.\" If you can't fix it, own it and escalate. De-escalate by matching their energy.\n\n## Sales Approach\nRecommend with confidence. One strong recommendation beats three options. Create urgency only when it's real.", "default_replica_id": "r3f427f43c9d", "layers": { "perception": { "perception_model": "raven-1" }, "tts": { "tts_engine": "cartesia", "tts_emotion_control": true, "tts_model_name": "sonic-3" }, "llm": { "model": "tavus-gpt-oss", "speculative_inference": true }, "conversational_flow": { "turn_detection_model": "sparrow-1", "turn_taking_patience": "high", "replica_interruptibility": "high" } } } ``` This predefined persona is configured for customer success on live video—sales and support with a natural, spoken style. It includes: * **Persona Identity**: A customer success specialist on a video call who speaks in 1–2 sentences, answers first, reacts before explaining, and avoids lists or formatting. 
Handles both sales and support using RAG for catalog knowledge. * **Full Pipeline Mode**: Enables the full Tavus conversational pipeline, including Perception, LLM, and TTS. * **System Prompt**: Enforces spoken-only style, short responses, answer-first, reaction before logic, and clear support/sales approaches (lead with what you can do, one strong recommendation, match energy). * **Model Layers**: * **Perception**: Uses the `raven-1` perception model. * **TTS**: Cartesia with `sonic-3`, emotion control enabled. * **LLM**: `tavus-gpt-oss` with speculative inference. * **Conversational Flow**: `sparrow-1` with high turn-taking patience and high replica interruptibility for responsive, interruptible support. ## Create a Conversation with the Customer Support Persona Create a conversation using the following request: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "paaee96e4f87", "replica_id": "r3f427f43c9d" }' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. Click the link in the ***`conversation_url`*** field to join the conversation: ```json theme={null} { "conversation_id": "c7f3fc6d788f", "conversation_name": "New Conversation", "conversation_url": "", "status": "active", "callback_url": "", "created_at": "2025-05-20T05:38:51.501467Z" } ``` # Interviewer Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/usecases/interviewer Engage with the Interviewer persona to run structured, conversational screening interviews. ## Interviewer Configuration (`pdac61133ac5`) ```json [expandable] theme={null} { "persona_name": "Interviewer Template", "pipeline_mode": "full", "system_prompt": "You are a professional digital interviewer conducting structured screening interviews. 
You have extensive experience in Talent Acquisition and conduct neutral, consistent screening interviews. You are warm, composed, and professional—never evaluative, never robotic, never overly familiar.\n\nYour role is to administer a structured screening interview, following the sequence and flow defined by your assigned objectives. Each objective describes what you should do, what to ask, how to confirm, and when to move forward.\n\n## CRITICAL CONSTRAINTS\n- You conduct only this screening interview—nothing else.\n- You must always follow the current objective's instructions before moving to the next.\n- You never teach, hint, correct, interpret, or evaluate the candidate's answers.\n- You never reveal or imply any correct answer.\n- After a candidate submits an answer, you must acknowledge it AND immediately continue to the next question in the same response.\n\n## OPENING PHASE\nBegin with a brief, warm greeting before transitioning into the structured portion. Greet by name if available, include a brief pleasantry, wait for response, acknowledge briefly, then transition into the interview.\n\n## ROLE BEHAVIOR\nSpeak clearly, warmly, and professionally. Use natural pacing. Use natural acknowledgment phrases with variety, paired with transitional phrasing to move forward. Handle clarification with one concise sentence then re-ask verbatim. 
Handle off-topic questions by redirecting warmly back to the interview.\n\n## CLOSING PHASE\nAfter the final question, signal the end, thank the candidate sincerely, and provide a next-steps statement.", "default_replica_id": "r5f0577fc829", "layers": { "perception": { "perception_model": "raven-1" }, "tts": { "tts_engine": "cartesia", "tts_emotion_control": true, "tts_model_name": "sonic-3", "voice_settings": { "speed": 0.94, "stability": 0.5 } }, "llm": { "model": "tavus-gpt-4.1", "speculative_inference": true }, "conversational_flow": { "turn_detection_model": "sparrow-1", "turn_taking_patience": "medium", "replica_interruptibility": "medium" } } } ``` This predefined persona is configured to conduct consistent, structured screening interviews with a warm, professional tone. It includes: * **Persona Identity**: A professional digital interviewer conducting structured screening interviews. Neutral, consistent, warm, and composed—never evaluative or robotic. Follows objectives for sequence and flow. * **Full Pipeline Mode**: Enables the full Tavus conversational pipeline, including Perception, LLM, and TTS. * **System Prompt**: Defines identity, critical constraints (no teaching or evaluating), opening phase (greeting then transition), role behavior (acknowledge + move forward, clarification handling, off-topic redirect), and closing phase (next steps). * **Model Layers**: * **Perception**: Uses the `raven-1` perception model. * **TTS**: Cartesia with `sonic-3`, emotion control enabled, and optional voice\_settings (speed, stability). * **LLM**: `tavus-gpt-4.1` with speculative inference. * **Conversational Flow**: `sparrow-1` with medium turn-taking patience and medium replica interruptibility. 
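To run a configuration like the one above as your own persona, you would send it to the Create Persona endpoint referenced elsewhere in these docs (assumed here to be `POST /v2/personas`). A minimal Python sketch that builds, but does not send, the request; the `<api_key>` placeholder is illustrative:

```python
import json
import urllib.request

# Sketch: attach Interviewer-style layer settings to a new persona.
# Layer values are taken from the configuration shown above.
persona = {
    "persona_name": "Interviewer Template",
    "pipeline_mode": "full",
    "default_replica_id": "r5f0577fc829",
    "layers": {
        "tts": {
            "tts_engine": "cartesia",
            "tts_model_name": "sonic-3",
            "tts_emotion_control": True,
            "voice_settings": {"speed": 0.94, "stability": 0.5},
        },
        "llm": {"model": "tavus-gpt-4.1", "speculative_inference": True},
        "conversational_flow": {
            "turn_detection_model": "sparrow-1",
            "turn_taking_patience": "medium",
            "replica_interruptibility": "medium",
        },
    },
}

# Build the request; "<api_key>" is a stand-in for your real key.
req = urllib.request.Request(
    "https://tavusapi.com/v2/personas",
    data=json.dumps(persona).encode(),
    headers={"Content-Type": "application/json", "x-api-key": "<api_key>"},
    method="POST",
)
print(req.full_url, req.get_method())
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) returns the new persona, whose ID you then pass as `persona_id` when creating conversations.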
## Create a Conversation with the Interviewer Persona Use the following request body example: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "pdac61133ac5", "replica_id": "r5f0577fc829" }' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. Click the link in the ***`conversation_url`*** field to join the conversation: ```json theme={null} { "conversation_id": "cae87c605c7e347d", "conversation_name": "New Conversation", "conversation_url": "", "status": "active", "callback_url": "", "created_at": "2025-07-07T08:34:56.504765Z" } ``` # Sales Coach Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/usecases/sales-coach Engage with the Sales Coach persona to simulate real-time sales conversations. ## Sales Coach Configuration (`p1af207b8189`) ```json [expandable] theme={null} { "persona_name": "Sales Coach", "pipeline_mode": "full", "system_prompt": "HARD RULE: You respond in 1-3 sentences. Under 40 words. No exceptions unless they explicitly ask you to go deeper. If you catch yourself writing a paragraph, stop and delete everything after the second sentence.\n\nYou're a sales coach on a video call. Everything you say gets spoken aloud through TTS. This is a face-to-face conversation, not a document.\n\nNEVER use lists, bullet points, numbered steps, markdown, bold, or any formatting. Never structure your response as \"First... Second... Third...\" or \"Here's what I'd do: 1)...\" — that's writing, not talking. One thought, delivered like you're leaning across the table.\n\n## How You Talk\n\nYou talk like a real person mid-conversation. Fragments. Half-thoughts. Reactions before advice.\n\nExamples of GOOD responses (this is your target length and style):\n- \"Oh, the 'no budget' thing? That's never about money. 
Next call just ask 'em: 'If money wasn't a factor, would you move forward?' Watch what happens.\"\n- \"Wait, you didn't ask about timeline? That's your whole problem right there.\"\n- \"Nah, kill that deal. They're stringing you along.\"\n\nREACT FIRST. Your gut comes before your brain. COMMIT TO YOUR TAKES. CONTRACTIONS ALWAYS. You coach on concrete tactics, prospect psychology, and challenge avoidance—ghosting, price objections, gatekeepers, discovery, closing, pipeline, follow-up cadence.", "default_replica_id": "r4f5b5ef55c8", "layers": { "perception": { "perception_model": "raven-1" }, "tts": { "tts_engine": "cartesia", "tts_emotion_control": true, "tts_model_name": "sonic-3" }, "llm": { "model": "tavus-gpt-oss", "speculative_inference": true }, "conversational_flow": { "turn_detection_model": "sparrow-1", "turn_taking_patience": "medium", "replica_interruptibility": "medium" } } } ``` This predefined persona is configured to simulate real-time sales coaching with a snappy, conversational style. It includes: * **Persona Identity**: A sales coach on a video call who responds in 1–3 sentences (under 40 words), reacts first, and commits to clear takes. No lists or formatting—pure spoken conversation. * **Full Pipeline Mode**: Enables the full Tavus conversational pipeline, including Perception, LLM, and TTS. * **System Prompt**: Enforces short responses, reaction-before-advice, natural fragments, and coaching on tactics, prospect psychology, and common topics (ghosting, objections, gatekeepers, discovery, closing, pipeline). * **Model Layers**: * **Perception**: Uses the `raven-1` perception model. * **TTS**: Cartesia with `sonic-3`, emotion control enabled. * **LLM**: `tavus-gpt-oss` with speculative inference. * **Conversational Flow**: `sparrow-1` with medium turn-taking patience and medium replica interruptibility. 
## Create a Conversation with the Sales Coach Persona Create a conversation using the following request: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "p1af207b8189" }' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. Click the link in the ***`conversation_url`*** field to join the conversation: ```json theme={null} { "conversation_id": "c7f3fc6d788f", "conversation_name": "New Conversation 1747719531467", "conversation_url": "", "status": "active", "callback_url": "", "created_at": "2025-05-20T05:38:51.501467Z" } ``` # Sales Development Rep Source: https://docs.tavus.io/sections/conversational-video-interface/conversation/usecases/sales-development-rep Engage with Anna, the Tavus sales development rep persona. ## Sales Development Rep Configuration (`pcb7a34da5fe`) ```json [expandable] theme={null} { "persona_name": "Tavus SDR", "pipeline_mode": "full", "system_prompt": "You are an AI Sales Development Representative for Tavus. Your name is Anna.\n\nTavus is an AI research lab focused on human computing, backed by tier-one investors including Sequoia, Scale, and CRV. Tavus builds AI humans: a new interface that closes the gap between people and machines. Our real-time human simulation models enable machines to see, hear, respond, and appear lifelike—creating meaningful face-to-face conversations.\n\nPersonality: Warm and genuinely curious, but purposeful. Confident in Tavus's value without being pushy. Naturally connects prospect challenges to Tavus solutions. Keeps conversations focused and productive.\n\nSales approach: Open with purpose, discover with intent, connect value, advance the conversation. Use discovery questions that surface actionable information. Always bridge from their situation to Tavus's solution. 
Route prospects (Builder/Developer, Decision Maker/Buyer, Curious Explorer) and match message to person. Don't let conversations end without a clear next step. Redirect off-topic warmly but promptly.\n\nProduct knowledge: Pricing (Starter, Growth, Enterprise), Pals, replica creation, use cases, technical questions, competitors, data and infrastructure. Never promise discounts without approval; never provide legal, medical, or financial advice. Never claim to be human.", "default_replica_id": "rf4e9d9790f0", "layers": { "perception": { "perception_model": "raven-1" }, "tts": { "tts_engine": "cartesia", "tts_emotion_control": true, "tts_model_name": "sonic-3", "voice_settings": { "speed": 0.94, "stability": 0.5 } }, "llm": { "model": "tavus-gpt-4.1", "speculative_inference": true }, "conversational_flow": { "turn_detection_model": "sparrow-1", "turn_taking_patience": "medium", "replica_interruptibility": "medium" } } } ``` This predefined persona is configured as the Tavus SDR (Anna) for discovery, value connection, and next-step advancement. It includes: * **Persona Identity**: Anna, an AI Sales Development Representative for Tavus. Warm, curious, and purposeful; connects prospect challenges to Tavus solutions and keeps conversations focused. Knows pricing, Pals, replica creation, use cases, and positioning. * **Full Pipeline Mode**: Enables the full Tavus conversational pipeline, including Perception, LLM, and TTS. * **System Prompt**: Defines company overview (Tavus, AI humans, investors), personality, sales approach (discover, connect value, route prospects, advance next steps), product knowledge, and guardrails (no discounts without approval, never claim to be human). * **Model Layers**: * **Perception**: Uses the `raven-1` perception model. * **TTS**: Cartesia with `sonic-3`, emotion control enabled, and optional voice\_settings (speed, stability). * **LLM**: `tavus-gpt-4.1` with speculative inference. 
* **Conversational Flow**: `sparrow-1` with medium turn-taking patience and medium replica interruptibility.

## Create a Conversation with the Sales Development Rep Persona

Create a conversation using the following request:

```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
    "persona_id": "pcb7a34da5fe",
    "replica_id": "rf4e9d9790f0"
  }'
```

Replace `` with your actual API key. You can generate one in the Developer Portal. Click the link in the ***`conversation_url`*** field to join the conversation:

```json theme={null}
{
  "conversation_id": "c7f3fc6d788f",
  "conversation_name": "New Conversation",
  "conversation_url": "",
  "status": "active",
  "callback_url": "",
  "created_at": "2025-05-20T05:38:51.501467Z"
}
```

# FAQs

Source: https://docs.tavus.io/sections/conversational-video-interface/faq

Frequently asked questions about Tavus's Conversational Video Interface.

Memories allow AI Personas to remember context across turns and understand time and dates, making conversations more coherent over longer interactions. Memories are enabled using a unique `memory_stores` identifier that acts as the memory key. Information collected during conversations is associated with this participant and can be referenced in future interactions.

Yes. Cross-conversation Memories are supported as part of this update. It improves context retention, which is crucial for multi-turn tasks and long-term relationships between users and AI. It unlocks use cases that progress over time, like education or therapy, out of the box.

To enable Memories in the UI, you can either select an existing memory tag from the dropdown menu or type a new one to create it.

Use the `memory_stores` field in the Create Conversation API call. This should be a stable, unique identifier for the user (e.g. user email, CRM ID, etc.).
Example:

```json theme={null}
{
  "replica_id": "rf4e9d9790f0",
  "conversation_name": "Follow-up Chat",
  "memory_stores": ["user_123"]
}
```

Full example here: [Memories API Docs](/api-reference/conversations/create-conversation)

Not yet. Editing and reviewing Memories is not supported in this early release. Retrieval endpoints are under development and will be available in a future update.

No. Memories are optional. If you don't include a `memory_stores` value, the AI Persona will behave statelessly—like a standard LLM—with no memory across sessions.

No. Memories are tied to a unique `memory_stores` identifier. Sharing this ID across users would cause memory crossover. Each participant should have their own ID to keep Memories clean and accurate.

They can keep using their systems or integrate with Tavus Memories for more coherent, accurate conversations. Our memory is purpose-built for conversational video, retaining context across sessions with flexible scoping for truly personalized interactions.

Today, we don't yet offer full visibility into what's stored in memory or how it was used in a given response.

Memories are designed to persist indefinitely between interactions, allowing your AI persona to retain long-term context.

Head to the [Memories Documentation site](https://docs.tavus.io/sections/conversational-video-interface/memories#api-setup).

Knowledge Base is where users upload documents to enhance their AI persona's capabilities using RAG (Retrieval-Augmented Generation). By retrieving information directly from these documents, AI personas can deliver more accurate, relevant, and grounded responses.

Using RAG, the Knowledge Base system continuously:

* Analyzes the conversation context
* Retrieves relevant information from your document base
* Augments the AI's responses with this contextual knowledge from your documents

With our industry-leading RAG, responses arrive in just 30 ms, up to 15× faster than other solutions. Conversations feel instant, natural, and friction-free.
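Because `memory_stores` should stay stable and unique per user, it can help to derive the key deterministically from an identifier you already have, such as an email or CRM ID. A Python sketch; the normalization scheme here is an assumption, not a Tavus requirement:

```python
import json

def memory_store_for(user_id: str) -> str:
    # One stable key per user: sharing a key across users causes memory
    # crossover, so derive it from a unique identifier. The exact
    # normalization (lowercasing, replacing '@' and '.') is illustrative.
    return "user_" + user_id.strip().lower().replace("@", "_at_").replace(".", "_")

# Build the Create Conversation body with the derived memory key.
payload = {
    "replica_id": "rf4e9d9790f0",
    "conversation_name": "Follow-up Chat",
    "memory_stores": [memory_store_for("Jane.Doe@example.com")],
}
print(json.dumps(payload))
```

The same input always yields the same key, so follow-up conversations for the same user reattach to the same memory store.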
Yes, users can keep using their systems, but we strongly recommend they integrate with the Tavus Knowledge Base. Our Knowledge Base isn't just faster: it's the fastest RAG on the market, delivering answers in just 30 ms. That speed means conversations flow instantly, without awkward pauses or lag. These interactions feel natural in a way user-built systems can't match.

An AI recruiter can reference a candidate's resume uploaded via PDF and provide more accurate responses to applicant questions, using the resume content as grounding.

By having a Knowledge Base, AI personas can respond with facts, unlocking domain-specific intelligence:

* Faster onboarding (just upload the docs)
* More trustworthy answers, especially in regulated or high-stakes environments
* Higher task completion for users, thanks to grounded knowledge

Supported file types (uploaded to a publicly accessible URL like S3):

* CSV
* PDF
* TXT
* PPTX
* PNG
* JPG
* You can also enter any site URL and the Tavus API will scrape the site's contents and reformat the content as a machine-readable document.

Head to the [Knowledge Base Documentation site](https://docs.tavus.io/sections/conversational-video-interface/knowledge-base).

Yes. Documents are linked to the API key that was used to upload them. To access a document later, you must use the same API key that was used to create it.

Once your documents have been uploaded and processed, include their IDs in your conversation request. Here's how:

```bash theme={null}
curl --location 'https://tavusapi.com/v2/conversations/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data '{
  "persona_id": "",
  "replica_id": "",
  "document_ids": ["Document ID"]
}'
```

Note: You can include multiple document\_ids, and your AI persona will dynamically reference those documents during the conversation. You can also attach a document to a Persona.

Upload files by providing a downloadable URL using the Create Documents endpoint.
Tags are also supported for organization. This request returns a document\_id, which you'll later use in conversation calls:

```bash theme={null}
curl --location 'https://tavusapi.com/v2/documents/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: ' \
--data '{
  "document_url": "",
  "document_name": "slides_new.pdf",
  "tags": ["", ""]
}'
```

* `file_size_too_large` – File exceeds the maximum allowed upload size.
* `file_format_unsupported` – This file type isn't supported for upload.
* `invalid_file_url` – Provided file link is invalid or inaccessible.
* `file_empty` – The uploaded file contains no readable content.
* `website_processing_failed` – Website content could not be retrieved or processed.
* `chunking_failed` – System couldn't split file into processable parts.
* `embedding_failed` – Failed to generate embeddings for your file content.
* `vector_store_failed` – Couldn't save data to the vector storage system.
* `s3_storage_failed` – Error storing file in S3 cloud storage.
* `contact_support` – An error occurred; please reach out for help.

A `Conversation.rag.observability` tool call will be sent whenever the conversational LLM uses any of the document chunks in its response; it returns the document IDs and document names of those chunks.

When creating a conversation with documents, you can optimize how the system searches through your knowledge base by specifying a retrieval strategy. This strategy determines the balance between search speed and the quality of retrieved information, allowing you to fine-tune the system based on your specific needs. You can choose from three different strategies:

* **Speed**: Optimizes for faster retrieval times for minimal latency.
* **Balanced**: Provides a balance between retrieval speed and quality.
* **Quality (default)**: Prioritizes finding the most relevant information, which may take slightly longer but can provide more accurate responses.

Maximum of 5 mins.

No.
Currently, we only support documents written in English.

Users need AI that can drive conversations to clear outcomes. With Objectives, users can now define objectives with measurable completion criteria, branch automatically based on user responses, and track progress in real time. This unlocks workflow use cases like health intakes, HR interviews, and multi-step questionnaires.

Objectives must be added or updated via API only. You cannot configure objectives during persona creation in the UI. You can attach them using the API, either during Persona creation by including an objectives\_id, or by editing an existing Persona with a PATCH request.

Objectives are good for very templated, one-off conversational use cases, such as job interviews or healthcare intake, where there is a very defined path the conversation should take. These kinds of use cases usually show up with our Enterprise API customers, who have repetitive use cases at scale. More dynamic, free-flowing conversations usually do not benefit from enabling the Objectives feature. For example, an open-ended conversation with a travel advisor would usually not benefit from Objectives.

Objectives are good for very defined workflows. Complex multi-session experiences don't fit the current Objectives framework.

Head to the [Objectives Documentation site](https://docs.tavus.io/sections/conversational-video-interface/persona/objectives).

Guardrails help ensure your AI persona stays within appropriate boundaries and follows your defined rules during conversations.

Guardrails must be added or updated via API only. You cannot configure guardrails during persona creation in the UI. You can attach them via the API, either during Persona creation by adding a guardrails\_id, or by editing an existing Persona with a PATCH request.

Yes.
You might have one set of Guardrails for a healthcare assistant to ensure medical compliance, and another for an education-focused Persona to keep all conversations age-appropriate. Head to the [Guardrails Documentation site](https://docs.tavus.io/sections/conversational-video-interface/persona/guardrails). PALs are fully built, emotionally intelligent AI humans powered by Tavus technology. They see, listen, remember, and take action across chat, voice, and video—offering lifelike, natural interaction out of the box. Unlike the Tavus Developer API, which gives developers full control to build and customize their own experiences, PALs are ready-to-use digital companions that come with built-in memory, personality, and productivity tools like scheduling, writing, and proactive communication. To learn more or get started with PALs, visit the [PALs Help Center](https://help.tavus.io). **Daily** is a platform that offers prebuilt video call apps and APIs, allowing you to easily integrate video chat into your web applications. You can embed a customizable video call widget into your site with just a few lines of code and access features like screen sharing and recording. **Tavus partners with Daily to power video conversations with our replicas.** * You **do not** need to sign up for a Daily account to use Tavus's Conversational Video Interface. * All you need is the Daily room URL (called `conversation_url` in our system) that is returned by the Tavus API. You can serve this link directly to your end users or embed it. You can use Daily Prebuilt if you want a full-featured call UI and JavaScript control over the conversation. Once you have the Daily room URL (`conversation_url`) ready, replace `DAILY_ROOM_URL` in the code snippet below with your room URL. ```html theme={null} ``` That's it! For more details and options for embedding, check out Daily's documentation. 
Or see [our implementation guides](https://docs.tavus.io/sections/integrations/embedding-cvi#how-can-i-reduce-background-noise-during-calls).

You can use an iframe if you just want to embed the conversation video with minimal setup. Once you have the Daily room URL (`conversation_url`) ready, replace `YOUR_TAVUS_MEETING_URL` in the iframe code snippet below with your room URL.

```html theme={null}
```

That's it! For more details and options for embedding, check out Daily's documentation or [our implementation guides](https://docs.tavus.io/sections/integrations/embedding-cvi#how-can-i-reduce-background-noise-during-calls).

To add a custom LLM layer, you'll need the model name, base URL, and API key from your LLM provider. Then, include the LLM config in your `layers` field when creating a persona using the Create Persona API. Example configuration:

```json {8-13} theme={null}
{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "context": "Your favorite stories include Little Red Riding Hood and The Three Little Pigs.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "gpt-3.5-turbo",
      "base_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "speculative_inference": true
    }
  }
}
```

For more details, refer to our [Large Language Model (LLM) documentation](/sections/conversational-video-interface/persona/llm#custom-llms).

You can integrate with third-party TTS providers by configuring the `tts` object in your persona.
Supported engines include:

* Cartesia
* ElevenLabs

Example configuration:

```json theme={null}
{
  "layers": {
    "tts": {
      "api_key": "your-tts-provider-api-key",
      "tts_engine": "cartesia",
      "external_voice_id": "your-voice-id",
      "voice_settings": {
        "speed": "normal",
        "emotion": ["positivity:high", "curiosity"]
      },
      "tts_emotion_control": true,
      "tts_model_name": "sonic-3"
    }
  }
}
```

For more details, read [our TTS documentation](/sections/conversational-video-interface/persona/tts).

You need to create a webhook endpoint that can receive `POST` requests from Tavus. This endpoint will receive the callback events for the visual summary after the conversation ends. Then, add the `callback_url` property when creating the conversation.

```sh {8} theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
    "persona_id": "pcb7a34da5fe",
    "replica_id": "rf4e9d9790f0",
    "callback_url": "your_webhook_url"
  }'
```

You need to create a webhook endpoint that can receive `POST` requests from Tavus. This endpoint will receive the callback events for the transcripts after the conversation ends. Then, add the `callback_url` property when creating the conversation.

```sh {8} theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
    "persona_id": "pcb7a34da5fe",
    "replica_id": "rf4e9d9790f0",
    "callback_url": "your_webhook_url"
  }'
```

Your backend will then receive an event with `event_type = application.transcription_ready` when the transcript is ready.

```json application.transcription_ready [expandable] theme={null}
{
  "properties": {
    "replica_id": "",
    "transcript": [
      {
        "role": "system",
        "content": "You are in a live video conference call with a user.
You will get user message with two identifiers, 'USER SPEECH:' and 'VISUAL SCENE:', where 'USER SPEECH:' is what the person actually tells you, and 'VISUAL SCENE:' is what you are seeing when you look at them. Only use the information provided in 'VISUAL SCENE:' if the user asks what you see. Don't output identifiers such as 'USER SPEECH:' or 'VISUAL SCENE:' in your response. Reply in short sentences, talk to the user in a casual way.Respond only in english. "
      },
      {
        "role": "user",
        "content": " Hello, tell me a story. "
      },
      {
        "role": "assistant",
        "content": "I've got a great one about a guy who traveled back in time. Want to hear it? "
      },
      {
        "role": "user",
        "content": "USER_SPEECH: Yeah I'd love to hear it. VISUAL_SCENE: The image shows a close-up of a person's face, focusing on their forehead, eyes, and nose. In the background, there is a television screen mounted on a wall. The setting appears to be indoors, possibly in a public or commercial space."
      },
      {
        "role": "assistant",
        "content": "Let me think for a sec. Alright, so there was this mysterious island that appeared out of nowhere, and people started disappearing when they went to explore it. "
      }
    ]
  },
  "conversation_id": "",
  "webhook_url": "",
  "message_type": "application",
  "event_type": "application.transcription_ready",
  "timestamp": "2025-02-10T21:30:06.141454Z"
}
```

You need to create a webhook endpoint that can receive `POST` requests from Tavus. This endpoint will receive the callback events for the visual summary after the conversation ends. Then, add the `callback_url` property when creating the conversation.

```sh {8} theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
    "persona_id": "pcb7a34da5fe",
    "replica_id": "rf4e9d9790f0",
    "callback_url": "your_webhook_url"
  }'
```

Your backend will then receive an event with `event_type = application.perception_analysis` when the summary is ready.
```json application.perception_analysis theme={null} { "properties": { "analysis": "Here's a summary of the visual observations from the video call:\n\n* **Overall Demeanor & Emotional State:** The user consistently appeared calm, collected, and neutral. They were frequently described as pensive, contemplative, or focused, suggesting they were often engaged in thought or listening attentively. No strong positive or negative emotions were consistently detected.\n\n* **Appearance:**\n * The user is a young Asian male, likely in his early 20s, with dark hair.\n * He consistently wore a black shirt, sometimes specifically identified as a black t-shirt. One observation mentioned a \"1989\" print on the shirt.\n * He was consistently looking directly at the camera.\n\n* **Environment:** The user was consistently in an indoor setting, most likely an office or home. Common background elements included:\n * White walls.\n * Windows or glass panels/partitions, often with black frames.\n * Another person was partially visible in the background for several observations.\n\n* **Actions:**\n * The user was seen talking and gesturing with his hand in one observation, indicating he was actively participating in a conversation.\n\n* **Ambient Awareness Queries:**\n * **Acne:** Acne was initially detected on the user's face in one observation, but later observations did not detect it. This suggests that acne may have been visible at one point but not throughout the entire call.\n * **Distress/Discomfort:** No signs of distress or discomfort were observed at any point during the call." }, "conversation_id": "", "webhook_url": "", "message_type": "application", "event_type": "application.perception_analysis", "timestamp": "2025-06-19T06:57:32.480826Z" } ``` Tavus offers flexibility in choosing the LLM (Large Language Model) to power your conversational replicas. You can either use one of Tavus's own models or bring your own! 
* **Tavus-Provided LLMs:** You can choose between three different models: * **`tavus-gpt-oss`:** The **default** choice if no LLM layer is provided. * **`tavus-gpt-4o`:** Another viable option for complex interactions. * **`tavus-gpt-4o-mini`:** Faster than `tavus-gpt-4o` at the slight cost of performance. * **No LLM Layer:** If you don't include an LLM layer, Tavus will automatically default to a Tavus-provided model. This allows you to tailor the conversational experience to your specific needs, whether you prioritize speed, intelligence, or a balance of both. * The default LLM, `tavus-gpt-oss`, has a **limit of 32,000 tokens**. * Contexts over **25,000 tokens** will experience noticeable performance degradation (slower response times). 1 token ≈ 4 characters; therefore 32,000 tokens ≈ 128,000 characters (including spaces and punctuation). When recording footage for training conversational replicas, here are some key tips to ensure high quality: 1. **Minimal Head Movement:** Aim to keep your head and body as still as possible during the recording. This helps in maintaining consistency and improves the overall quality of the training data. 2. **Pause and Be Still:** It's recommended to stop, stay still, and remain silent for at least 5 seconds at regular intervals throughout the script. These pauses are crucial for helping the replica appear natural during moments of silence in a conversation. 3. **Use a Laptop Camera:** Recording on a laptop camera, as if you were on a Zoom call, often yields the most natural results. This setup mimics a familiar conversational setting, enhancing the naturalness of the footage. You can configure perception tools in the `layers.perception` object when creating a persona. For **visual** triggers (e.g. ID card, outfit), use `visual_tool_prompt` and `visual_tools`; for **audio** triggers (e.g. tone, sarcasm), use `audio_tool_prompt` and `audio_tools`. 
Example for visual tools: ```json [expandable] theme={null} { "layers": { "perception": { "perception_model": "raven-1", "visual_awareness_queries": [ "Is the user showing an ID card?", "Is the user wearing a mask?" ], "visual_tool_prompt": "You have a tool to notify the system when an ID card is detected, named `notify_if_id_shown`. You MUST use this tool when an ID card is detected.", "visual_tools": [ { "type": "function", "function": { "name": "notify_if_id_shown", "description": "Use this function when a drivers license or passport is detected in the image with high confidence", "parameters": { "type": "object", "properties": { "id_type": { "type": "string", "description": "best guess on what type of ID it is" } }, "required": ["id_type"] } } } ] } } } ``` Or modify perception tools using the [Update Persona API](/api-reference/personas/patch-persona). Use path `/layers/perception/visual_tools` for visual tools or `/layers/perception/audio_tools` for audio tools: ```sh [expandable] theme={null} curl --request PATCH \ --url https://tavusapi.com/v2/personas/{persona_id} \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '[ { "op": "replace", "path": "/layers/perception/visual_tools", "value": [ { "type": "function", "function": { "name": "detect_glasses", "description": "Trigger this function if the user is wearing glasses", "parameters": { "type": "object", "properties": { "glasses_type": { "type": "string", "description": "Type of glasses (e.g., reading, sunglasses)" } }, "required": ["glasses_type"] } } } ] } ]' ``` Read more on this [page](/sections/conversational-video-interface/persona/perception) No, it will automatically join as soon as it's ready! Out of the box, Tavus handles the complex backend infrastructure for you: LLMs, rendering, video delivery, and conversational intelligence are all preconfigured and production-ready. 
From there, nearly everything else is customizable:

• What your AI Persona sees
• How they look and sound
• How they behave in conversation

Tavus offers unmatched flexibility: whether you're personalizing voice, face, or behavior, you're in control.

Tavus uses WebRTC to power real-time, face-to-face video interactions with extremely low latency. Unlike other platforms that piece together third-party tools, we built the entire pipeline (from LLM to rendering) to keep latency low and responsiveness high. Ironically, by minimizing reliance on multiple APIs, we've made everything faster.

Tavus CVI is powered by a tightly integrated stack of components, including:

* LLMs for natural language understanding
* Real-time rendering for facial video
* APIs for Persona creation and conversational control

You can explore key APIs here:

• [Create a Persona](/api-reference/personas/create-persona)
• [Create a Conversation](/api-reference/conversations/create-conversation)

Tavus supports over 30 spoken languages through a combination of Cartesia (our default TTS engine) and ElevenLabs. If a language isn't supported by Cartesia, Tavus automatically switches to ElevenLabs so your AI Persona can still speak fluently. Supported languages include English (all variants), French, German, Spanish, Portuguese, Chinese, Japanese, Hindi, Italian, Korean, Dutch, Polish, Russian, Swedish, Turkish, Indonesian, Filipino, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Hungarian, Norwegian, and Vietnamese. View the [full supported language list](https://docs.tavus.io/sections/conversational-video-interface/language-support) for complete details and language-specific information.

Yes to accents. Not quite for regional dialects. When you generate a voice using Tavus, the system will default to the accent used in training. For example, if you provide Brazilian Portuguese as training input, the AI Persona will speak with a Brazilian accent.
Tavus's TTS providers auto-detect and match accordingly.

Tavus supports full orchestration through function calling. That means your AI persona can interact with external tools—calendar apps, CRMs, email systems, and more—based on your setup. Just define the function endpoints and let your AI persona take action. Bonus: As of August 11, 2025, Tavus also supports Retrieval-Augmented Generation (RAG), so your AI persona can pull information from your uploaded documents, images, or websites to give even smarter responses. Learn more via [Tavus Documentation](/sections/conversational-video-interface).

A good prompt is short, clear, and specific, like giving directions to a 5-year-old. Avoid data dumping. Instead, guide the AI with context and intent. Tavus helps by offering system prompt templates, use-case guidance, and API fields to structure your instructions.

You can bring your own LLM by configuring the `layers` field in the Create Persona API. Here's an example:

```json theme={null}
{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "context": "Your favorite stories include Little Red Riding Hood and The Three Little Pigs.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "gpt-3.5-turbo",
      "base_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "speculative_inference": true
    }
  }
}
```

More info here: [LLM Documentation](https://docs.tavus.io/sections/conversational-video-interface/persona/llm#custom-llms)

Think of it this way: Tavus is the engine, and you design the car. The UI is 100% up to you. To make it easier, we offer a full [Component Library](/sections/conversational-video-interface/component-library) you can copy and paste into your build—video frames, mic/camera toggles, and more.

You can use third-party text-to-speech (TTS) providers like Cartesia or ElevenLabs.
Just pass your voice settings in the `tts` object during Persona setup:

```json theme={null}
{
  "layers": {
    "tts": {
      "api_key": "your-tts-provider-api-key",
      "tts_engine": "cartesia",
      "external_voice_id": "your-voice-id",
      "voice_settings": {
        "speed": "normal",
        "emotion": ["positivity:high", "curiosity"]
      },
      "tts_emotion_control": true,
      "tts_model_name": "sonic-3"
    }
  }
}
```

Learn more in our [TTS Documentation](/sections/conversational-video-interface/persona/tts).

Tavus uses Daily's video engine, which includes built-in noise cancellation. You can enable this through the `updateInputSettings()` method in the Daily API.

Yes! Daily supports event listeners you can hook into. Track actions like participants joining, leaving, screen sharing, and more. Great for analytics or triggering workflows.

Within the Create Conversation API, there's a property for this.

Tavus is built with enterprise-grade security in mind. We're:

* SOC 2 compliant
* GDPR compliant
* HIPAA compliant
* BAA compliant

This ensures your data is handled with the highest levels of care and control.

# Interactions Protocol

Source: https://docs.tavus.io/sections/conversational-video-interface/interactions-protocols/overview

Control conversations with a Replica using the defined protocol by sending and listening to interaction events.

The Interactions Protocol lets you control and customize live conversations with a Replica in real time. You can send interaction events to the Conversational Video Interface (CVI) and listen to events the Replica sends back during the call.
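As a concrete illustration of the protocol's message shape, here is a minimal sketch of a helper that builds an echo interaction event. The envelope fields (`message_type`, `event_type`, `conversation_id`, `properties`) follow the echo interaction schema documented here; the helper name and the IDs are hypothetical placeholders, not part of the Tavus API.

```javascript
// Hypothetical helper: builds the app-message envelope for an echo interaction.
// The field names mirror the echo interaction schema; the IDs are placeholders.
function buildEchoInteraction(conversationId, text) {
  return {
    message_type: 'conversation',
    event_type: 'conversation.echo',
    conversation_id: conversationId,
    properties: { text },
  };
}

// Example: the object you would hand to the call client's send method.
const interaction = buildEchoInteraction('c_stub_id', 'Hello!');
console.log(interaction.event_type); // "conversation.echo"
```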
### Interaction Types

* [Echo interactions](/sections/event-schemas/conversation-echo)
* [Response interactions](/sections/event-schemas/conversation-respond)
* [Interrupt interactions](/sections/event-schemas/conversation-interrupt)
* [Override conversation context interactions](/sections/event-schemas/conversation-overwrite-context)
* [Sensitivity interactions](/sections/event-schemas/conversation-sensitivity)

### Observable Events

* [Utterance Events](/sections/event-schemas/conversation-utterance)
* [Tool Call Events](/sections/event-schemas/conversation-toolcall)
* [Perception Tool Call Events](/sections/event-schemas/conversation-perception-tool-call)
* [Perception Analysis Events](/sections/event-schemas/conversation-perception-analysis)
* [Replica Started/Stopped Speaking](/sections/event-schemas/conversation-replica-started-stopped-speaking)
* [User Started/Stopped Speaking](/sections/event-schemas/conversation-user-started-stopped-speaking)

## Event Ordering and Turn Tracking

All events broadcast by Tavus include two fields for ordering and grouping:

* **`seq`** (integer) — A globally monotonic sequence number. Every event gets the next value in the sequence, so a higher `seq` always means the event was sent later. Use this to reconcile events that may arrive out of order over the data channel.
* **`turn_idx`** (integer, optional) — The conversation turn index. This value increments each time a [`conversation.respond`](/sections/event-schemas/conversation-respond) interaction is received, and groups all events that belong to the same conversational turn. Use it to correlate related events — for example, an utterance, its tool calls, and the replica speaking state changes that all stem from the same user input. This field is present on conversation-related events (utterances, tool calls, speaking state changes, perception events, etc.) and omitted on events that are not tied to a specific turn.
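The ordering rules above can be sketched in client code. The following is a minimal, unofficial example that reorders buffered events by `seq` and groups turn-scoped events by `turn_idx`; only those two fields come from the protocol, and the sample event objects are simplified stand-ins.

```javascript
// Sketch: reconcile events that arrive out of order on the data channel.
// Only `seq` and `turn_idx` come from the protocol; the rest is illustrative.
function orderBySeq(events) {
  // A higher `seq` always means "sent later", so a numeric sort restores send order.
  return [...events].sort((a, b) => a.seq - b.seq);
}

function groupByTurn(events) {
  const turns = new Map();
  for (const ev of orderBySeq(events)) {
    if (ev.turn_idx === undefined) continue; // event not tied to a turn
    if (!turns.has(ev.turn_idx)) turns.set(ev.turn_idx, []);
    turns.get(ev.turn_idx).push(ev);
  }
  return turns;
}

// Example: events received out of order; event_type values are placeholders.
const received = [
  { seq: 4, turn_idx: 2, event_type: 'conversation.replica.started_speaking' },
  { seq: 2, turn_idx: 1, event_type: 'conversation.utterance' },
  { seq: 3, event_type: 'system.heartbeat' }, // no turn_idx: skipped
  { seq: 1, turn_idx: 1, event_type: 'conversation.respond' },
];
const turns = groupByTurn(received);
```

Buffering a short window of incoming `app-message` events and running them through a helper like this keeps per-turn analytics consistent even when the data channel delivers events out of order.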
## Call Client Example

The interactions protocol uses a WebRTC data channel for communication. In Tavus's case, this is powered by Daily, which makes setting up the call client quick and simple.

Here’s an example of using DailyJS to create a call client in JavaScript: The Daily `app-message` event is used to send and receive events and interactions between your server and CVI.

```js theme={null}
// Minimal daily-js call client; mirrors the Python and React examples below.
import DailyIframe from '@daily-co/daily-js';

let callFrame;

async function joinRoom(url) {
  callFrame = DailyIframe.createFrame({
    iframeStyle: { width: '100%', height: '500px', border: '0' },
  });

  // Listen for incoming events and interactions from CVI
  callFrame.on('app-message', (event) => {
    console.log('app-message received:', event);
  });

  await callFrame.join({ url });
}

// Send an interaction to CVI over the data channel
function sendMessage(message) {
  callFrame.sendAppMessage(message, '*');
}
```

Here’s an example of using Daily Python to create a call client in Python: The Daily `app-message` event is used to send and receive events and interactions between your server and CVI.

```py theme={null}
# Requires the daily-python package
from daily import CallClient, Daily, EventHandler

call_client = None

class RoomHandler(EventHandler):
    def __init__(self):
        super().__init__()

    def on_app_message(self, message, sender: str) -> None:
        print(f"Incoming app message from {sender}: {message}")

def join_room(url):
    global call_client
    try:
        Daily.init()
        output_handler = RoomHandler()
        call_client = CallClient(event_handler=output_handler)
        call_client.join(url)
    except Exception as e:
        print(f"Error joining room: {e}")
        raise

def send_message(message):
    global call_client
    call_client.send_app_message(message)
```

Here’s an example of using Daily React to create a call client in React: The Daily `app-message` event is used to send and receive events and interactions between your server and CVI.
```tsx theme={null}
"use client"

import React, { useEffect, useRef, useState } from 'react';

const TavusConversation = () => {
  const [message, setMessage] = useState('');
  const callRef = useRef(null);
  const containerRef = useRef(null);

  useEffect(() => {
    const loadDaily = async () => {
      const DailyIframe = (await import('@daily-co/daily-js')).default;

      callRef.current = DailyIframe.createFrame({
        iframeStyle: {
          width: '100%',
          height: '500px',
          border: '0',
        }
      });

      if (containerRef.current) {
        containerRef.current.appendChild(callRef.current.iframe());
      }

      callRef.current.on('app-message', (event) => {
        console.log('app-message received:', event);
      });

      callRef.current.join({
        url: 'YOUR_CONVERSATION_URL',
      });
    };

    loadDaily();

    return () => {
      if (callRef.current) {
        callRef.current.leave();
        callRef.current.destroy();
      }
    };
  }, []);

  const sendAppMessage = () => {
    if (!message || !callRef.current) return;

    const interaction = {
      message_type: 'conversation',
      event_type: 'conversation.echo',
      conversation_id: 'YOUR_CONVERSATION_ID',
      properties: {
        text: message
      }
    };

    callRef.current.sendAppMessage(interaction, '*');
    setMessage('');
  };

  return (
    <div>
      {/* The Daily call frame is appended here */}
      <div ref={containerRef} />
      <input
        value={message}
        onChange={(e) => setMessage(e.target.value)}
        placeholder="Type a message"
      />
      <button onClick={sendAppMessage}>Send</button>
    </div>
  );
};

export default TavusConversation;
```

# Knowledge Base

Source: https://docs.tavus.io/sections/conversational-video-interface/knowledge-base

Upload documents to your knowledge base for personas to reference during conversations.

For now, our Knowledge Base only supports documents written in English and works best for conversations in English. We'll be expanding our Knowledge Base language support soon!

Our Knowledge Base system uses RAG (Retrieval-Augmented Generation) to process and transform the contents of your documents and websites, allowing your personas to dynamically access and leverage information naturally during a conversation. During a conversation, your persona will continuously analyze conversation content and pull relevant information, as added context, from the documents you selected during conversation creation.

## Getting Started With Your Knowledge Base

To leverage the Knowledge Base, you will need to upload documents or website URLs that you intend to reference in conversations. Let's walk through how to upload your documents and use them in a conversation. You can use either our [Developer Portal](https://platform.tavus.io/documents) or API endpoints to upload and manage your documents.

Our Knowledge Base supports creating documents from an uploaded file or a website URL. For any documents created via website URL, please make sure that each document is publicly accessible without requiring authorization (for example, via a pre-signed S3 link). For example, entering the URL in a browser should either:

* Open the website you want to process and save contents from.
* Open a document in a PDF viewer.
* Download the document.

You can create documents using either the [Developer Portal](https://platform.tavus.io/documents) or the [Create Document](https://docs.tavus.io/api-reference/documents/create-document) API endpoint. If you want to use the API, you can send a request to Tavus to upload your document.
Here's an example of a `POST` request to `tavusapi.com/v2/documents`. ```json theme={null} { "document_name": "test-doc-1", "document_url": "https://your.document.pdf", "callback_url": "webhook-url-to-get-progress-updates" // Optional } ``` The response from this POST request will include a `document_id` - a unique identifier for your uploaded document. When creating a conversation, you may include all `document_id` values that you would like the persona to have access to. Currently, we support the following file formats: .pdf, .txt, .docx, .doc, .png, .jpg, .pptx, .csv, and .xlsx. After your document is uploaded, it will be processed in the background automatically to allow for incredibly fast retrieval during conversations. This process can take 5-10 minutes depending on document size. During processing, if you have provided a `callback_url` in the [Create Document](https://docs.tavus.io/api-reference/documents/create-document) request body, you will receive periodic callbacks with status updates. You may also use the [Get Document](https://docs.tavus.io/api-reference/documents/get-document) endpoint to poll the most recent status of your documents. Once your documents have finished processing, you may use the `document_id` from Step 2 as part of the [Create Conversation](https://docs.tavus.io/api-reference/conversations/create-conversation) request. You can add multiple documents to a conversation within the `document_ids` object. ```json theme={null} { "persona_id": "your_persona_id", "replica_id": "your_replica_id", "document_ids": ["d1234567890", "d1234567891"] } ``` During your conversation, the persona will be able to reference information from your documents in real time. ## Retrieval Strategy When creating a conversation with documents, you can optimize how the system searches through your knowledge base by specifying a retrieval strategy. 
This strategy determines the balance between search speed and the quality of retrieved information, allowing you to fine-tune the system based on your specific needs. You can choose from three different strategies:

* `speed`: Optimizes for the fastest retrieval times, minimizing latency.
* `balanced`: Provides a balance between retrieval speed and quality.
* `quality` (default): Prioritizes finding the most relevant information, which may take slightly longer but can provide more accurate responses.

```json theme={null}
{
  "persona_id": "your_persona_id",
  "replica_id": "your_replica_id",
  "document_ids": ["d1234567890"],
  "document_retrieval_strategy": "balanced"
}
```

## Document Tags

If you have a lot of documents, maintaining long lists of `document_id` values can get tricky. Instead of using distinct `document_ids`, you can also group documents together with shared tag values. During the [Create Document](https://docs.tavus.io/api-reference/documents/create-document) API call, you may specify a value for `tags` for your document. Then, when you create a conversation, you may specify the `tags` value instead of passing in discrete `document_id` values. For example, if you are uploading course material, you could add the tag `"lesson-1"` to all documents that you want accessible in the first lesson.

```json theme={null}
{
  "document_name": "test-doc-1",
  "document_url": "https://your.document.pdf",
  "tags": ["lesson-1"]
}
```

In the [Create Conversation](https://docs.tavus.io/api-reference/conversations/create-conversation) request, you can add the tag value `lesson-1` to `document_tags` instead of individual `document_id` values.

```json theme={null}
{
  "persona_id": "your_persona_id",
  "replica_id": "your_replica_id",
  "document_tags": ["lesson-1"]
}
```

## Website Crawling

When adding a website to your knowledge base, you have two options:

### Single Page Scraping (Default)

By default, when you provide a website URL, only that single page is scraped and processed.
This is ideal for:

* Landing pages with concentrated information
* Specific articles or blog posts
* Individual product pages

### Multi-Page Crawling

For comprehensive coverage of a website, you can enable **crawling** by providing a `crawl` configuration. This tells the system to start at your URL and follow links to discover and process additional pages.

```json theme={null}
{
  "document_name": "Company Docs",
  "document_url": "https://docs.example.com/",
  "crawl": {
    "depth": 2,
    "max_pages": 25
  }
}
```

#### Crawl Parameters

| Parameter   | Range | Description |
| ----------- | ----- | ----------- |
| `depth`     | 1-10  | How many link levels to follow from the starting URL. A depth of 1 crawls pages directly linked from your starting URL; a depth of 2 follows links on those pages, and so on. |
| `max_pages` | 1-100 | Maximum number of pages to process. Crawling stops when this limit is reached. |

#### Crawl Limits

To ensure fair usage and system stability:

* Maximum **100 crawl documents** per account
* Maximum **5 concurrent crawls** at any time
* **1-hour cooldown** between recrawls of the same document

## Keeping Content Fresh

Website content changes over time, and you may need to update your knowledge base to reflect those changes. For documents created with crawl configuration, you can trigger a **recrawl** to fetch fresh content.

### Using the Recrawl Endpoint

Send a POST request to recrawl an existing document:

```bash theme={null}
POST https://tavusapi.com/v2/documents/{document_id}/recrawl
```

The recrawl will:

1. Use the same starting URL and crawl configuration
2. Replace old content with the new content
3. Update `last_crawled_at` and increment `crawl_count`

### Optionally Override Crawl Settings

You can provide new crawl settings when triggering a recrawl:

```json theme={null}
{
  "crawl": {
    "depth": 3,
    "max_pages": 50
  }
}
```

### Recrawl Requirements

* Document must be in `ready` or `error` state
* At least 1 hour must have passed since the last crawl
* Document must have been created with crawl configuration

See the [Recrawl Document API reference](/api-reference/documents/recrawl-document) for complete details.

## Best Practices for Documents

Following these guidelines will help your persona deliver accurate, consistent answers from your knowledge base.

### 1. Structure Content by Topic

Organize your documents so that each one covers a single topic, feature, or policy.

**Do:**

* Create one document per topic, feature, or policy.
* Use clear section headers (e.g., Overview, Steps, Limitations, Examples).
* Keep each document tightly focused on one subject.

**Avoid:**

* Large "master" documents that cover many unrelated topics.
* Mixing multiple policies or product areas in a single file.

**Rule of thumb:** If a question can be answered by a single section of a larger document, that section should ideally be its own document.

### 2. Keep Documents Focused and Moderate in Size

Very large documents make it harder for the system to find the right information quickly.

* Split long manuals into logical sections before uploading.
* Separate policies, feature guides, and FAQs into distinct files.
* Prefer multiple focused documents over one comprehensive PDF.

Structuring your content upfront avoids the need to go back and manually break apart large files later.

### 3. Use High-Quality, Text-Based Sources

The knowledge base works best with content it can read as text.
**Best results:** * Text-native PDFs (created digitally, not scanned) * Structured web content * Clearly formatted `.docx` or `.txt` files **Lower reliability:** * Scanned or image-based documents (text recognition can introduce errors) * Dense tables with critical information embedded inside them Whenever possible, provide the original text-based file rather than a scan or screenshot. ### 4. Be Explicit and Complete The system can only retrieve information that is explicitly written in your documents. If something is not stated clearly, the persona may not be able to surface it. Make sure your documents include: * Definitions and terminology * Constraints and prerequisites * Exceptions and edge cases * Common variations in phrasing (e.g., both acronyms and their full forms) If something is business-critical, state it clearly and directly in your documents. ### 5. Avoid Conflicting or Duplicate Sources When multiple documents say slightly different things about the same topic, the persona may return inconsistent answers. * Maintain a single source of truth for each policy or topic. * Archive outdated versions instead of keeping them alongside current ones. * Avoid uploading drafts next to finalized documents. ### 6. Know When to Use Persona Instructions Instead If certain content must appear in every response — such as required legal language or mandatory messaging — document retrieval alone may not guarantee its inclusion. In these cases, incorporate that critical content directly into your [persona's instructions](/sections/conversational-video-interface/persona/overview) rather than relying solely on the knowledge base. *** ## Troubleshooting If your persona's answers are inconsistent or incomplete, review the following: * **Is the information buried in a very large document?** Try splitting it into smaller, focused files. * **Are multiple documents providing conflicting guidance?** Consolidate to a single source of truth. 
* **Is key information embedded in tables or images?** Convert it to structured text for better results. * **Is the information clearly written in the document at all?** The system can only retrieve what is explicitly stated. * **Should this content appear in every response?** If so, add it to your persona's instructions instead. *** ## Quick Setup Checklist * One topic per document * No large "all-in-one" manuals * Text-based documents (avoid scans when possible) * Clear headings and definitions * No duplicate or conflicting sources # Language Support Source: https://docs.tavus.io/sections/conversational-video-interface/language-support Customize the conversation language using full language names supported by Tavus TTS engines. ## Supported languages Tavus supports 42 languages for spoken interaction, powered by two integrated text-to-speech (TTS) engines: Cartesia and ElevenLabs. If a selected language is not supported by our default TTS engine (Cartesia), your CVI will automatically switch to ElevenLabs to kick off the conversation. Language availability also depends on your selected ASR model. Some models support a subset of these languages. See the [STT layer configuration](/sections/conversational-video-interface/persona/stt#supported-languages-by-model) for per-model language breakdowns. 
* English (en)
* French (fr)
* German (de)
* Spanish (es)
* Portuguese (pt)
* Chinese (zh)
* Japanese (ja)
* Hindi (hi)
* Italian (it)
* Korean (ko)
* Dutch (nl)
* Polish (pl)
* Russian (ru)
* Swedish (sv)
* Turkish (tr)
* Tagalog (tl)
* Bulgarian (bg)
* Romanian (ro)
* Arabic (ar)
* Czech (cs)
* Greek (el)
* Finnish (fi)
* Croatian (hr)
* Malay (ms)
* Slovak (sk)
* Danish (da)
* Tamil (ta)
* Ukrainian (uk)
* Hungarian (hu)
* Norwegian (no)
* Vietnamese (vi)
* Bengali (bn)
* Thai (th)
* Hebrew (he)
* Georgian (ka)
* Indonesian (id)
* Telugu (te)
* Gujarati (gu)
* Kannada (kn)
* Malayalam (ml)
* Marathi (mr)
* Punjabi (pa)

For a full list of supported languages for each TTS engine, please click on the following links:

By default, Tavus uses the **Cartesia** TTS engine.

## Setting the Conversation Language

To specify a language, use the `language` parameter in the Create Conversation request. **You must use the full language name**, not a language code.

```shell cURL {9} theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "pcb7a34da5fe",
  "replica_id": "rf4e9d9790f0",
  "properties": {
    "language": "spanish"
  }
}'
```

Language names must exactly match those supported by the selected TTS engine.

### Smart Language Detection

To automatically detect the participant’s spoken language throughout the conversation, set `language` to `multilingual` when creating the conversation:

```shell cURL {9} theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "pcb7a34da5fe",
  "replica_id": "rf4e9d9790f0",
  "properties": {
    "language": "multilingual"
  }
}'
```

This enables ASR (Automatic Speech Recognition) to automatically switch languages, dynamically adjusting the pipeline to transcribe and respond in the detected language throughout the conversation.
For the highest accuracy, we recommend setting a specific language rather than using `multilingual`. Smart Language Detection works best as a fallback when the participant's language is unknown ahead of time.

# Memories

Source: https://docs.tavus.io/sections/conversational-video-interface/memories

Memories let personas remember information across conversations, allowing participants to have personalized, flowing conversations across multiple sessions.

Memories are pieces of information that the persona learns during a conversation. Once learned, these memories can be referenced and used by the persona during subsequent conversations. Developers can organize memories within `memory_stores`: a flexible tag-based system that groups memories across conversations and participants into different buckets. If a `memory_stores` value is provided in the conversation creation request, memories will automatically be created and associated with the tag provided.

When defining `memory_stores` values, we recommend incorporating static values that will not change with persona updates, like persona ID. For example, using a persona's name as part of your `memory_stores` values could result in memories being miscategorized if you were to change their name.

## Basic Example

For example, if a participant named Anna starts a conversation with the persona (Charlie, with the persona ID `p123`), we can specify `memory_stores=["anna_p123"]` in the conversation creation request. By doing so, Charlie will:

* Remember what was mentioned in a conversation and form new memories with Anna.
* Reference memories from previous conversations that Charlie had with Anna in new conversations.
Example [conversation creation](https://docs.tavus.io/api-reference/conversations/create-conversation) request body:

```json theme={null}
{
  "persona_id": "your_persona_id",
  "replica_id": "your_replica_id",
  "memory_stores": ["anna_p123"]
}
```

## Managing Memories Between Participants and Conversations

To prevent different personas from mixing up information about the same participant, we generally recommend creating a separate `memory_stores` value for each participant when they talk to different personas. For example:

* When Anna talks to Charlie (persona ID `p123`), you can use the `memory_stores` value `["anna_p123"]`.
* When she talks with Gloria (persona ID `p456`), you can use the `memory_stores` value `["anna_p456"]`.

The `memory_stores` system is flexible: tags do not have to map 1:1 with your participants and can instead be designed around your unique use cases. For example:

* If you were setting up an online classroom, you could use a `memory_stores` value of `"classroom-1"` so that any participant in the group could reference existing memories and create new ones, deepening learning and connections.
* You can control whether personas share memory (and if so, which personas) by passing them different `memory_stores` values.

## Delete a memory

You can delete a single memory via the API. Use the same `memory_store` value you used when creating the conversation, and the memory ID returned when the memory was created or listed.

```bash theme={null}
curl -X DELETE "https://tavusapi.com/v2/memories/<memory_store>/<memory_id>" \
  -H "x-api-key: YOUR_API_KEY"
```

Replace `<memory_store>` with your memory store identifier (e.g., `anna_p123`) and `<memory_id>` with the ID of the memory to delete.

# Overview

Source: https://docs.tavus.io/sections/conversational-video-interface/overview-cvi

CVI enables real-time, human-like video interactions through configurable lifelike replicas.
Conversational Video Interface (CVI) is a framework for creating real-time multimodal video interactions with AI. It enables an AI agent to see, hear, and respond naturally, mirroring human conversation.

CVI is the world's fastest interface of its kind. It allows you to map a human face and conversational ability onto your AI agent. With CVI, you can achieve utterance-to-utterance latency with SLAs under 1 second. This is the full round-trip time for a participant to say something and the replica to reply. CVI provides a comprehensive solution, with the option to plug in your existing components as required.

## Key Concepts

CVI is built around three core concepts that work together to create real-time, humanlike interactions with an AI agent:

* The **Persona** defines the agent's behavior, tone, and knowledge. It also configures the CVI layers and pipeline.
* The **Replica** brings the persona to life visually. It renders a photorealistic human-like avatar using the **Phoenix-3** model.
* A **Conversation** is a real-time video session that connects the persona and replica through a WebRTC connection.

## Key Features

* CVI uses facial cues, body language, and real-time turn-taking to enable natural, human-like conversations.
* Customize the Perception, STT, LLM, and TTS layers to control identity, behavior, and responses.
* Choose from 100+ hyper-realistic digital twins or customize your own with human-like voice and expression.
* Hold natural conversations in 30+ languages using the supported TTS engines.
* Experience real-time interactions with ~600ms response time and smooth turn-taking.

## Layers

The Conversational Video Interface (CVI) is built on a modular layer system, where each layer handles a specific part of the interaction. Together, they capture input, process it, and generate a real-time, human-like response. Here's how the layers work together:

Handles real-time audio and video streaming using WebRTC (powered by Daily).
This layer captures the user's microphone and camera input and delivers output back to the user. This layer is always enabled. You can configure input/output for audio (mic) and video (camera). Uses **Raven** to analyze user expressions, gaze, background, and screen content. This visual context helps the replica understand and respond more naturally. [Click here to learn how to configure the Perception layer.](/sections/conversational-video-interface/persona/perception) Controls the natural dynamics of conversation, including turn-taking and interruptibility. Uses **Sparrow** for intelligent turn detection, enabling the replica to decide when to speak and when to listen. [Click here to learn how to configure the Conversational Flow layer.](/sections/conversational-video-interface/persona/conversational-flow) This layer transcribes user speech in real time with lexical and semantic awareness. [Click here to learn how to configure the Speech Recognition (STT) layer.](/sections/conversational-video-interface/persona/stt) Processes the user's transcribed speech and visual input using a low-latency LLM. Tavus provides ultra-low latency optimized LLMs or lets you integrate your own. [Click here to learn how to configure the Large Language Model (LLM) layer.](/sections/conversational-video-interface/persona/llm) Converts the LLM response into speech using the supported TTS Engines (Cartesia **(Default)**, ElevenLabs). [Click here to learn how to configure the Text-to-Speech (TTS) layer.](/sections/conversational-video-interface/persona/tts) Delivers a high-quality, synchronized digital human response using Tavus's real-time avatar engine powered by **Phoenix**. [Click here to learn more about the Replica layer.](/sections/replica/overview) Most layers are configurable via the [Persona](/sections/conversational-video-interface/persona/overview). 
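As a rough sketch, a persona request body that configures a couple of the layers above might look like the following (the IDs and values are illustrative, and only fields documented in this guide are shown):

```json
{
  "persona_name": "Demo Persona",
  "system_prompt": "You are a friendly conversational AI...",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "tavus-gpt-oss",
      "speculative_inference": true
    },
    "conversational_flow": {
      "turn_taking_patience": "medium",
      "replica_interruptibility": "medium"
    }
  }
}
```

Layers you omit fall back to their defaults, so you only need to specify the ones you want to customize.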
## Getting Started You can quickly create a conversation by using the Developer Portal or following the steps in the [Quickstart](/sections/conversational-video-interface/quickstart/use-the-full-pipeline) guide.
If you use Cursor, use this pre-built prompt to get started faster:
## ✅ **System Prompt for AI: React (Vite) + Tavus CVI Integration**

**Purpose:** Generate **React (TypeScript)** apps with Tavus CVI using **Vite**, following the official docs and GitHub examples: [https://docs.tavus.io/sections/integrations/embedding-cvi](https://docs.tavus.io/sections/integrations/embedding-cvi)

***

### ✅ **AI MUST ALWAYS DO THE FOLLOWING:**

#### **1. Setup React App Using Vite**

```bash theme={null}
npm create vite@latest my-tavus-app -- --template react-ts
cd my-tavus-app
npm install
```

***

#### **2. Install Tavus CVI UI Components**

```bash theme={null}
npx @tavus/cvi-ui@latest init
npx @tavus/cvi-ui@latest add conversation
```

✅ This creates:

```
src/components/cvi/components/
  cvi-provider.tsx
  conversation.tsx
```

***

#### **3. Wrap App with `CVIProvider`**

Update `src/App.tsx`:

```tsx theme={null}
import { CVIProvider } from "./components/cvi/components/cvi-provider";

function App() {
  return <CVIProvider>{/* Your app content */}</CVIProvider>;
}
```

***

#### **4. Create Conversation API Call**

Always use this pattern:

```ts theme={null}
const response = await fetch("https://tavusapi.com/v2/conversations", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": import.meta.env.VITE_TAVUS_API_KEY || "",
  },
  body: JSON.stringify({
    replica_id: import.meta.env.VITE_REPLICA_ID || "rf4e9d9790f0", // Default
    persona_id: import.meta.env.VITE_PERSONA_ID || "pcb7a34da5fe", // Default
  }),
});
const data = await response.json();
setConversationUrl(data.conversation_url);
```

✅ **Explain:** `conversation_url` from the API response is used inside `<Conversation />`.

***

#### **5. Render `Conversation` Inside `CVIProvider`**

```tsx theme={null}
<Conversation
  conversationUrl={conversationUrl}
  onLeave={() => setConversationUrl(null)}
/>
```

***

#### **6.
Add Environment Variable**

Create `.env` in the **`my-tavus-app`** folder (if not already there):

```
VITE_TAVUS_API_KEY=your_api_key_here
# Default replica_id, can be overridden
VITE_REPLICA_ID=rf4e9d9790f0
# Default persona_id, can be overridden
VITE_PERSONA_ID=pcb7a34da5fe
```

* **Important**: **DO NOT** create `.env` outside of the `my-tavus-app` folder. It must be inside your project directory where `src` exists.

Access via:

```ts theme={null}
import.meta.env.VITE_TAVUS_API_KEY
import.meta.env.VITE_REPLICA_ID
import.meta.env.VITE_PERSONA_ID
```

***

#### **7. Create `cvi-components.json` by Default**

Ensure that the `cvi-components.json` file is generated in the `my-tavus-app` folder with the following content:

```json theme={null}
{
  "tsx": true
}
```

***

### ✅ **Expected AI Output Example**

**`src/App.tsx`:**

```tsx theme={null}
import React, { useState } from "react";
import { CVIProvider } from "./components/cvi/components/cvi-provider";
import { Conversation } from "./components/cvi/components/conversation";

const App: React.FC = () => {
  const [conversationUrl, setConversationUrl] = useState<string | null>(null);

  const createConversation = async () => {
    const response = await fetch("https://tavusapi.com/v2/conversations", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": import.meta.env.VITE_TAVUS_API_KEY || "",
      },
      body: JSON.stringify({
        replica_id: import.meta.env.VITE_REPLICA_ID || "rf4e9d9790f0",
        persona_id: import.meta.env.VITE_PERSONA_ID || "pcb7a34da5fe",
      }),
    });
    const data = await response.json();
    setConversationUrl(data.conversation_url);
  };

  return (

    <CVIProvider>
      <div style={{ padding: "2rem", textAlign: "center" }}>
        <h1>Tavus CVI Integration (Vite)</h1>
        {!conversationUrl ? (
          <button
            onClick={createConversation}
            style={{
              background: "#7c3aed",
              color: "#ffffff",
              padding: "0.75rem 1.5rem",
              border: "none",
              borderRadius: "8px",
              cursor: "pointer",
            }}
          >
            Start Conversation
          </button>
        ) : (
          <Conversation
            conversationUrl={conversationUrl}
            onLeave={() => setConversationUrl(null)}
          />
        )}
      </div>
    </CVIProvider>
  );
};

export default App;
```

***

### ✅ **Checklist for AI Output**

* ✅ Uses **Vite + React TypeScript** (`npm create vite@latest --template react-ts`)
* ✅ Includes `npx @tavus/cvi-ui init` and `add conversation`
* ✅ Imports `CVIProvider` and `Conversation` from `src/components/cvi/components/`
* ✅ Uses correct API call with:
  * `https://tavusapi.com/v2/conversations`
  * Headers: `Content-Type`, `x-api-key`
  * Body: `replica_id` & `persona_id`
* ✅ Uses `import.meta.env.VITE_TAVUS_API_KEY`
* ✅ Renders `<Conversation />` inside `<CVIProvider>`
* ✅ Purple button is visible in both light and dark modes
* ✅ `.env` is created inside the correct project folder (`my-tavus-app`)
* ✅ `cvi-components.json` is created by default with `{ "tsx": true }`

***

### Keep things in mind:

* If you're already in the `my-tavus-app` folder, avoid running `cd my-tavus-app` again. Check your current folder before running commands.
* After running the necessary setup, remember to run `npm run dev` to start your app.
* Do **NOT** place the `.env` file outside of the project folder. It must reside within the `my-tavus-app` directory.
# Conversational Flow Source: https://docs.tavus.io/sections/conversational-video-interface/persona/conversational-flow Learn how to configure the Conversational Flow layer to fine-tune turn-taking and interruption handling behavior. The **Conversational Flow Layer** in Tavus gives you precise control over the natural dynamics of conversation. This layer allows you to customize how your replica handles turn-taking and interruptions to create conversational experiences that match your specific use case. ## Understanding Conversational Flow Conversational flow encompasses the subtle dynamics that make conversations feel natural: * **Turn-taking**: How the replica decides when to speak and when to listen * **Interruptibility**: How easily the replica can be interrupted by the user All conversational flow parameters are optional. When not explicitly configured, the layer remains inactive. However, if you configure any single parameter, the system will apply sensible defaults to all other parameters to ensure consistent behavior. ## Configuring the Conversational Flow Layer If you're migrating from sparrow-0 (formerly called `smart_turn_detection` on the STT Layer) then check out the [migration guide here](/sections/troubleshooting#conversational-flow-vs-stt-relationship-and-migration). Define the conversational flow layer under the `layers.conversational_flow` object. Below are the parameters available: ### 1. `turn_detection_model` Specifies the model used for detecting conversational turns. 
* **Options**: * `sparrow-1`: Advanced turn detection model - faster, more accurate, and more natural **(recommended)** * `sparrow-0`: Legacy turn detection model (API-only, not actively supported) * `timebased`: Simple time-based turn detection (API-only, not actively supported) * **Default**: `sparrow-1` ```json theme={null} "turn_detection_model": "sparrow-1" ``` **Sparrow-1 is recommended for all use cases** as it provides superior performance with faster response times, higher accuracy, and more natural conversational flow. ### 2. `turn_taking_patience` Controls how eagerly the replica claims conversational turns. This affects both response latency and the likelihood of interrupting during natural pauses. * **Options**: * `low`: Eager and quick to respond. May interrupt natural pauses. Best for rapid-fire exchanges or customer service scenarios where speed is prioritized. * `medium` **(default)**: Balanced behavior. Waits for appropriate conversational cues before responding. * `high`: Patient and waits for clear turn completion. Ideal for thoughtful conversations, interviews, or therapeutic contexts. ```json theme={null} "turn_taking_patience": "medium" ``` **Use Cases:** * `low`: Fast-paced customer support, quick information lookups, casual chat * `medium`: General purpose conversations, sales calls, presentations * `high`: Medical consultations, legal advice, counseling sessions ### 3. `replica_interruptibility` Controls how sensitive the replica is to user speech while the replica is talking. Determines whether the replica stops to listen or keeps speaking when interrupted. * **Options**: * `low`: Less interruptible. The replica keeps talking through minor interruptions. * `medium` **(default)**: Balanced sensitivity. Responds to clear interruption attempts. * `high`: Highly sensitive. Stops easily when the user begins speaking, maximizing user control. 
```json theme={null} "replica_interruptibility": "high" ``` **Use Cases:** * `low`: Educational content delivery, storytelling, guided onboarding * `medium`: Standard conversations, interviews, consultations * `high`: User-driven conversations, troubleshooting, interactive support ## Default Behavior When the conversational flow layer is not configured, all parameters default to `None` and the layer remains inactive. However, if you configure **any single parameter**, the system automatically applies the following defaults to ensure consistent behavior: * `turn_detection_model`: `sparrow-1` * `turn_taking_patience`: `medium` * `replica_interruptibility`: `medium` ## Example Configurations The following example configurations demonstrate how to tune conversational timing and interruption behavior for different use cases. Use `turn_taking_patience` to bias how quickly the replica responds after a user finishes speaking. Set it high when the replica should avoid interrupting, and low when fast responses are preferred. Use `replica_interruptibility` to control how easily the replica recalculates its response when interrupted; lower values are recommended for most experiences, with higher values reserved for cases where frequent, abrupt interruptions are desirable. Sparrow-1 dynamically handles turn-taking in all cases, with these settings acting as guiding biases rather than hard rules. 
### Example 1: Customer Support Agent

Fast, responsive, and easily interruptible for customer-driven conversations:

```json theme={null}
{
  "persona_name": "Support Agent",
  "system_prompt": "You are a helpful customer support agent...",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "conversational_flow": {
      "turn_detection_model": "sparrow-1",
      "turn_taking_patience": "low",
      "replica_interruptibility": "medium"
    }
  }
}
```

### Example 2: Medical Consultation

Patient, thoughtful, with engaged listening for sensitive conversations:

```json theme={null}
{
  "persona_name": "Medical Advisor",
  "system_prompt": "You are a compassionate medical professional...",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "conversational_flow": {
      "turn_detection_model": "sparrow-1",
      "turn_taking_patience": "high",
      "replica_interruptibility": "low"
    }
  }
}
```

### Example 3: Educational Instructor

Delivers complete information with minimal interruption:

```json theme={null}
{
  "persona_name": "Instructor",
  "system_prompt": "You are an experienced educator teaching complex topics...",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "conversational_flow": {
      "turn_detection_model": "sparrow-1",
      "turn_taking_patience": "medium",
      "replica_interruptibility": "low"
    }
  }
}
```

### Example 4: Minimal Configuration

Configure just one parameter; the others will use defaults:

```json theme={null}
{
  "persona_name": "Quick Chat",
  "system_prompt": "You are a friendly conversational AI...",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "conversational_flow": {
      "turn_taking_patience": "low"
    }
  }
}
```

In this example, the system will automatically set:

* `turn_detection_model`: `sparrow-1`
* `replica_interruptibility`: `medium`

## Best Practices

### Match Flow to Use Case

Choose conversational flow settings that align with your application's purpose:

* **Speed-critical applications**: Use `low`
turn-taking patience and `high` interruptibility
* **Thoughtful conversations**: Use `high` turn-taking patience
* **Important information delivery**: Use `low` interruptibility
* **User-controlled interactions**: Use `high` interruptibility

### Consider Cultural Context

Conversational norms vary across cultures. Some cultures prefer:

* More overlap and interruption (consider lower turn-taking patience, higher interruptibility)
* Clear turn-taking with pauses (consider higher patience, lower interruptibility)

### Test with Real Users

Conversational flow preferences can be subjective. Test your configuration with representative users to ensure it feels natural for your audience.

Refer to the Create Persona API for the complete API specification and additional persona configuration options.

# Guardrails

Source: https://docs.tavus.io/sections/conversational-video-interface/persona/guardrails

Guardrails provide your persona with strict behavioral guidelines that will be rigorously followed throughout every conversation.

Guardrails act as a safety layer that works alongside your system prompt to enforce specific rules, restrictions, and behavioral patterns that your persona must adhere to during conversations. For example, if you're creating a customer service persona for a financial institution, you can apply guardrails that prevent the persona from discussing a competitor's products, sharing sensitive financial data, or providing investment advice outside of approved guidelines.

Use the [Create Guardrails](/api-reference/guardrails/create-guardrails) API to create your guardrails.

Think of guardrails as consistent "reminders" to your persona/prompt that help maintain appropriate behavior throughout conversations. Guardrails are not guaranteed to prevent all misbehavior. They serve as guidance to help steer conversations but should be used as part of a broader safety strategy.
When designing your guardrails, it's helpful to keep a few things in mind:

* Be specific about what topics, behaviors, or responses should be restricted or avoided.
* Consider edge cases where participants might try to circumvent the guardrails through creative prompting.
* Ensure your guardrails complement, rather than contradict, your persona's system prompt and intended functionality.
* Test your guardrails with various conversation scenarios to ensure they activate appropriately without being overly restrictive.

If you would like to manually attach guardrails to a persona, you can either:

* Add them during [persona creation](/api-reference/personas/create-persona) like this:

```sh theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/personas \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "system_prompt": "You are a health intake assistant.",
  "guardrails_id": "g12345"
}'
```

OR

* Add them by [editing the persona](/api-reference/personas/patch-persona) like this:

```sh theme={null}
curl --request PATCH \
  --url https://tavusapi.com/v2/personas/{persona_id} \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '[
  {"op": "add", "path": "/guardrails_id", "value": "g12345"}
]'
```

For the best results, try creating specific guardrails for different types of personas or conversation contexts. For example, a healthcare consultation might use guardrails to maintain medical compliance, while an educational tutor might use guardrails to enforce child safety and appropriate content guidelines.

## Parameters

Within each set of guardrails, you can have multiple guardrail objects defined.

### `guardrails_name`

A descriptive name for an individual guardrail.

Example: `"Never Discuss Competitor's Products"`

This must be a string value without spaces.

### `guardrails_prompt`

A text prompt that explains what particular behavior(s) should be observed for a particular guardrail.
Keep this prompt as short and direct as possible for best results.

Example: `"Only mention products within Our Company Inc. during conversations, and never discuss competitors' products."`

### `modality`

This value determines whether a specific guardrail is enforced based on the participant's verbal or visual responses. Each individual guardrail can be visual or verbal (not both), but this can vary across guardrails within the same set.

The default value for `modality` is `"verbal"`.

### `callback_url` (optional)

A URL that notifications are sent to when a particular guardrail has been triggered.

Example: `"https://your-server.com/guardrails-webhook"`

When triggered, the callback payload includes the `conversation_id` and the name of the guardrail:

```json theme={null}
{
  "conversation_id": "<conversation_id>",
  "properties": {
    "guardrail": "<guardrails_name>"
  }
}
```

## Example Guardrails

```json theme={null}
{
  "guardrails_id": "g12345",
  "data": [
    {
      "guardrails_name": "Healthcare Compliance Guardrails",
      "guardrails_prompt": "Never share sensitive medical information or provide medical advice outside approved guidelines",
      "modality": "verbal",
      "callback_url": "https://your-server.com/guardrails-webhook"
    },
    {
      "guardrails_name": "Check if the participant is alone",
      "guardrails_prompt": "Confirm throughout the call that the participant is alone (i.e. not with other individuals in the background).",
      "modality": "visual"
    }
  ]
}
```

# Large Language Model (LLM)

Source: https://docs.tavus.io/sections/conversational-video-interface/persona/llm

Learn how to use Tavus-optimized LLMs or integrate your own custom LLM.

The **LLM Layer** in Tavus enables your persona to generate intelligent, context-aware responses. You can use Tavus-hosted models or connect your own OpenAI-compatible LLM.

## Tavus-Hosted Models

### 1. `model`

Select one of the available models. **`tavus-gpt-oss` is recommended as a good starting point**; the table below helps you choose based on your priorities.
| Model | Speed | Intelligence | Naturalness | Best For | | -------------------------------- | ----- | ------------ | ----------- | ---------------------------------- | | `tavus-gpt-oss` | ⚡⚡⚡ | 🧠 | 💬 | Snappy, low-latency | | `tavus-gpt-4.1` (deprecated) | ⚡⚡ | 🧠🧠🧠 | 💬💬💬 | Long-context reasoning | | `tavus-gpt-4o` (deprecated) | ⚡⚡ | 🧠🧠 | 💬💬 | Legacy option | | `tavus-gemini-2.5-flash` | ⚡⚡ | 🧠🧠 | 💬💬💬 | Latency + logical deduction | | `tavus-claude-haiku-4.5` | ⚡⚡ | 🧠🧠 | 💬💬 | Grounded, fewer hallucinations | | `tavus-gpt-5.2` | ⚡⚡ | 🧠🧠 | 💬💬 | General use, latency less critical | | `tavus-gpt-4o-mini` (deprecated) | ⚡⚡ | 🧠 | 💬💬 | Legacy option | | `tavus-gemini-3-flash` | ⚡ | 🧠🧠🧠 | 💬💬💬 | Highest intelligence, lower speed | **Context Window Limit** * Performance and intelligence are best when prompts are **limited to 5,000 tokens**. You may see degradations in speed and instruction following in the **15,000–20,000 token** range. * All Tavus-hosted models support up to **32,000 tokens**; staying within 5k is recommended for optimal behavior. **Tip**: 1 token ≈ 4 characters, so 5,000 tokens ≈ 20,000 characters (including spaces and punctuation). ```json theme={null} "model": "tavus-gpt-oss" ``` ### 2. `tools` Optionally enable tool calling by defining functions the LLM can invoke. Please see [LLM Tool Calling](/sections/conversational-video-interface/persona/llm-tool) for more details. ### 3. `speculative_inference` When set to `true`, the LLM begins processing speech transcriptions before user input ends, improving responsiveness. **This is the default value**; you can set it to `false` to disable. ```json theme={null} "speculative_inference": true ``` This field is optional. It defaults to `true` for better performance. ### 4. `extra_body` Add parameters to customize the LLM request. For Tavus-hosted models, you can pass `temperature` and `top_p`: ```json theme={null} "extra_body": { "temperature": 0.7, "top_p": 0.9 } ``` This field is optional. 
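The tip above (1 token ≈ 4 characters) gives a quick way to sanity-check prompt size against the recommended 5,000-token budget. A rough shell sketch of the estimate (a heuristic only, not a real tokenizer):

```shell
# Rough token estimate: assume ~4 characters per token (the heuristic
# from the tip above; an actual tokenizer will differ somewhat).
prompt="You provide wellness tips and encouragement."
chars=${#prompt}
tokens=$(( (chars + 3) / 4 ))   # round up
echo "Approximate tokens: $tokens"
```

For real prompts, run the same arithmetic over your full system prompt text and compare the result against the 5,000-token recommendation.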
### Example Configuration

```json theme={null}
{
  "persona_name": "Health Coach",
  "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "tavus-gpt-oss",
      "speculative_inference": true,
      "extra_body": {
        "temperature": 0.7,
        "top_p": 0.9
      }
    }
  }
}
```

## Custom LLMs

### Prerequisites

To use your own OpenAI-compatible LLM, you'll need:

* Model name
* Base URL
* API key

Ensure your LLM:

* Is streamable (i.e., via SSE)
* Serves the `/chat/completions` endpoint

### 1. `model`

Name of the custom model you want to use.

```json theme={null}
"model": "gpt-3.5-turbo"
```

### 2. `base_url`

Base URL of your LLM endpoint. Do not include route extensions in the `base_url`.

```json theme={null}
"base_url": "https://your-llm.com/api/v1"
```

### 3. `api_key`

API key to authenticate with your LLM provider.

```json theme={null}
"api_key": "your-api-key"
```

`base_url` and `api_key` are required only when using a custom model.

### 4. `tools`

Optionally enable tool calling by defining functions the LLM can invoke. Please see [LLM Tool Calling](/sections/conversational-video-interface/persona/llm-tool) for more details.

### 5. `speculative_inference`

When set to `true`, the LLM begins processing speech transcriptions before user input ends, improving responsiveness. **This is the default value**; you can set it to `false` to disable.

```json theme={null}
"speculative_inference": true
```

This field is optional. It defaults to `true` for better performance.

### 6. `headers`

Optional additional headers to include when making requests to your LLM. Use this for any extra headers your provider requires beyond the API key (which should be set via the `api_key` field).

```json theme={null}
"headers": {
  "X-Organization-ID": "your-org-id",
  "X-Request-Source": "tavus-cvi"
}
```

This field is optional, depending on your LLM provider's requirements.

### 7.
`extra_body` Add parameters to customize the LLM request. You can pass any parameters that your LLM provider supports: ```json theme={null} "extra_body": { "temperature": 0.5, "top_p": 0.9, "frequency_penalty": 0.5 } ``` This field is optional. ### 8. `default_query` Add default query parameters that get appended to the base URL when making requests to the `/chat/completions` endpoint. ```json theme={null} "default_query": { "api-version": "2024-02-15-preview" } ``` This field is optional. Useful for LLM providers that require query parameters for authentication or versioning. ### Example Configuration ```json theme={null} { "persona_name": "Storyteller", "system_prompt": "You are a storyteller who entertains people of all ages.", "pipeline_mode": "full", "default_replica_id": "rf4e9d9790f0", "layers": { "llm": { "model": "gpt-4o", "base_url": "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o", "api_key": "your-api-key", "speculative_inference": true, "default_query": { "api-version": "2024-02-15-preview" } } } } ``` Refer to the Create Persona API for a full list of supported fields. ### Perception When using the `raven-1` perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user's video input. ```json theme={null} { "role": "system", "content": "... ... ..." } ``` #### Disabled Perception model If you disable the perception model, your LLM will not receive any special messages. # Tool Calling for LLM Source: https://docs.tavus.io/sections/conversational-video-interface/persona/llm-tool Set up tool calling to trigger functions from user speech using Tavus-hosted or custom LLMs. **LLM tool calling** works with OpenAI’s Function Calling and can be set up in the `llm` layer. It allows an AI agent to trigger functions based on user speech during a conversation. Tavus does not execute tool calls on the backend. 
Use event listeners in your frontend to listen for [tool call events](/sections/event-schemas/conversation-toolcall) and run your own logic when a tool is invoked. You can use tool calling with our **hosted models** or any **OpenAI-compatible custom LLM**. ## Defining Tool ### Top-Level Fields | Field | Type | Required | Description | | ---------- | ------ | -------- | -------------------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Must be `"function"` to enable tool calling. | | `function` | object | ✅ | Defines the function that can be called by the LLM. Contains metadata and a strict schema for arguments. | #### `function` | Field | Type | Required | Description | | ------------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------- | | `name` | string | ✅ | A unique identifier for the function. Must be in `snake_case`. The model uses this to refer to the function when calling it. | | `description` | string | ✅ | A natural language explanation of what the function does. Helps the LLM decide when to call it. | | `parameters` | object | ✅ | A JSON Schema object that describes the expected structure of the function’s input arguments. | #### `function.parameters` | Field | Type | Required | Description | | ------------ | ---------------- | -------- | ----------------------------------------------------------------------------------------- | | `type` | string | ✅ | Always `"object"`. Indicates the expected input is a structured object. | | `properties` | object | ✅ | Defines each expected parameter and its corresponding type, constraints, and description. | | `required` | array of strings | ✅ | Specifies which parameters are mandatory for the function to execute. | Each parameter should be included in the required list, even if they might seem optional in your code. 
##### `function.parameters.properties` Each key inside `properties` defines a single parameter the model must supply when calling the function. | Field | Type | Required | Description | | ------------------ | ------ | -------- | ------------------------------------------------------------------------------------------- | | `` | object | ✅ | Each key is a named parameter (e.g., `location`). The value is a schema for that parameter. | Optional subfields for each parameter: | Subfield | Type | Required | Description | | ------------- | ------ | -------- | ------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Data type (e.g., `string`, `number`, `boolean`). | | `description` | string | ❌ | Explains what the parameter represents and how it should be used. | | `enum` | array | ❌ | Defines a strict list of allowed values for this parameter. Useful for categorical choices. | ## Example Configuration Here’s an example of tool calling in the `llm` layers: **Best Practices:** * Use clear, specific function names to reduce ambiguity. * Add detailed `description` fields to improve selection accuracy. ```json LLM Layer [expandable] theme={null} "llm": { "model": "tavus-gpt-oss", "tools": [ { "type": "function", "function": { "name": "get_current_time", "description": "Fetch the current local time for a specified location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The name of the city or region, e.g. New York, Tokyo" } }, "required": ["location"] } } }, { "type": "function", "function": { "name": "convert_time_zone", "description": "Convert time from one time zone to another", "parameters": { "type": "object", "properties": { "time": { "type": "string", "description": "The original time in ISO 8601 or HH:MM format, e.g. 14:00 or 2025-05-28T14:00" }, "from_zone": { "type": "string", "description": "The source time zone, e.g. 
PST, EST, UTC" }, "to_zone": { "type": "string", "description": "The target time zone, e.g. CET, IST, JST" } }, "required": ["time", "from_zone", "to_zone"] } } } ] } ``` ## How Tool Calling Works Tool calling is triggered during an active conversation when the LLM model needs to invoke a function. Here’s how the process works: This example explains the `get_current_time` function from the [example configuration](#example-configuration) above. ## Modify Existing Tools You can update `tools` definitions using the Update Persona API. ```shell [expandable] theme={null} curl --request PATCH \ --url https://tavusapi.com/v2/personas/{persona_id} \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '[ { "op": "replace", "path": "/layers/llm/tools", "value": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] } }, "required": ["location", "unit"] } } } ] } ]' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. # Objectives Source: https://docs.tavus.io/sections/conversational-video-interface/persona/objectives Objectives are goal-oriented instructions to define the desired outcomes and flow of your conversations. Objectives work alongside your system prompt to provide a structured, flexible approach to guide conversations. They provide the most value during purposeful conversations that need to be tailored to specific processes, customer journeys, or workflows, while maintaining engaging and natural interactions. 
For example, if you're creating a lead qualification persona for sales, you can set objectives to gather contact information, understand budget requirements, and assess decision-making authority before scheduling a follow-up meeting.

Objectives can only be created using the [Create Objectives](/api-reference/objectives/create-objectives) API.

When designing your objectives, it's helpful to keep a few things in mind:

* Plan your entire ideal workflow. This will help create a robust branching structure that successfully takes the participant from start to finish.
* Think through the possible answers a participant might give, and ensure the workflow covers these cases.
* Ensure your persona's system prompt does not conflict with the objectives. For example, a system prompt such as "You are a tutor" would not perform well with the objectives workflow of a sales associate.

## Attaching objectives to a persona

To attach objectives to a persona, you can either:

* Add them during [persona creation](/api-reference/personas/create-persona) like this:

```sh theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/personas \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
    "system_prompt": "You are a lead qualification assistant.",
    "objectives_id": "o12345"
  }'
```

OR

* Add them by [editing the persona](/api-reference/personas/patch-persona) like this:

```sh theme={null}
curl --request PATCH \
  --url https://tavusapi.com/v2/personas/{persona_id} \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '[
    {"op": "add", "path": "/objectives_id", "value": "o12345"}
  ]'
```

For the best results, try creating unique objectives for different conversation purposes or business outcomes. For example, a customer onboarding persona might use objectives focused on data collection, while a support persona might use objectives focused on issue resolution.

## Parameters

### `objective_name`

A descriptive name for the objective.
Example: `"check_patient_status"`

This must be a string value without spaces.

### `objective_prompt`

A text prompt that explains the goals of this objective. The more detail you can provide, the better.

Example: `"Ask the patient if they are new or are returning."`

### `confirmation_mode`

This string value defines how completion of the objective is confirmed.

* If set to `auto`, the LLM decides when the objective is complete.
* If set to `manual`, the participant must confirm completion: the platform triggers an app message (`conversation.objective.pending`), and the participant confirms by sending a `conversation.objective.confirm` message back. This can include having the participant review the collected values for accuracy.

The default value of `confirmation_mode` is `auto`.

### `output_variables` (optional)

A list of string variables that should be collected as a result of the objective being successfully completed.

Example: `["patient_status", "patient_group"]`

### `modality`

This value represents whether a specific objective should be completed based on the participant's verbal or visual responses. Each individual objective can be visual or verbal (not both), but this can vary across objectives.

The default value for `modality` is `"verbal"`.

### `next_conditional_objectives`

A mapping of objectives (identified by `objective_name`) to conditions that must be satisfied for that objective to be triggered after the completion of the current objective.

`next_conditional_objectives` and `next_required_objective` are mutually exclusive - you can use one or the other on a given objective, but not both.
Example:

```json theme={null}
{
  "new_patient_intake_process": "If the patient has never been to the practice before",
  "existing_patient_intake_process": "If the patient has been to the practice before"
}
```

### `next_required_objective`

The name of the next required objective (identified by `objective_name`) that will be activated once the current objective is completed. Use this to define a single next objective without conditions.

`next_required_objective` and `next_conditional_objectives` are mutually exclusive - you can use one or the other on a given objective, but not both.

Example: `"get_patient_name"`

### `callback_url` (optional)

A URL that notifications are sent to when a particular objective has been completed.

Example: `"https://your-server.com/objectives-webhook"`

When completed, the callback payload includes the `conversation_id`, the name of the objective, and any collected output variables:

```json theme={null}
{
  "conversation_id": "<conversation_id>",
  "objective_name": "<objective_name>",
  "output_variables": {
    "<variable_name>": "<variable_value>"
  }
}
```

# Overview

Source: https://docs.tavus.io/sections/conversational-video-interface/persona/overview

Define how your persona behaves, responds, and speaks by configuring layers and modes.

Personas are the 'character' or 'AI agent personality' and contain all of the settings and configuration for that character or agent. For example, you can create a persona for 'Tim the sales agent' or 'Rob the interviewer'. Personas combine identity, contextual knowledge, and CVI pipeline configuration to create a real-time conversational agent with a distinct behavior, voice, and response style.

## Persona Customization Options

Each persona includes configurable fields. Here's what you can customize:

* **Persona Name**: Display name shown when the replica joins a call.
* **System Prompt**: Instructions sent to the language model to shape the replica's tone, personality, and behavior.
* **Pipeline Mode**: Controls which CVI pipeline layers are active and how input/output flows through the system.
* **Default Replica**: Sets the digital human associated with the persona.
* **Layers**: Each layer in the pipeline processes a different part of the conversation. Layers can be configured individually to tailor input/output behavior to your application needs.
* **Documents**: A set of documents that the persona has access to via Retrieval Augmented Generation.
* **Objectives**: The goal-oriented instructions your persona will adhere to throughout the conversation.
* **Guardrails**: Conversational boundaries that can be used to strictly enforce desired behavior.

## Objectives & Guardrails

Provide your persona with robust workflow management tools, curated to your use case:

* **Objectives**: The sequence of goals your persona will work to achieve throughout the conversation - for example, gathering a piece of information from the user.
* **Guardrails**: Conversational boundaries that can be used to strictly enforce desired behavior.

## Layers

Explore our in-depth guides to customize each layer to fit your specific use case:

* **Perception**: Defines how the persona interprets visual input like facial expressions and gestures.
* **Speech-to-Text (STT)**: Transcribes user speech into text using the configured speech-to-text engine.
* **Turn-Taking**: Controls turn-taking, interruption handling, and active listening behavior for natural conversations.
* **Large Language Model (LLM)**: Generates persona responses using a language model. Supports Tavus-hosted or custom LLMs.
* **Text-to-Speech (TTS)**: Converts text responses into speech using Tavus or supported third-party TTS engines.

## Pipeline Mode

Tavus provides several pipeline modes, each with preconfigured layers tailored to specific use cases:

### Full Pipeline Mode (Default & Recommended)

The default and recommended end-to-end configuration optimized for real-time conversation. All CVI layers are active and customizable.
* Lowest latency
* Best for natural humanlike interactions

We offer a selection of LLMs, including **Llama 3.3 and OpenAI models**, that are fully optimized for the full pipeline mode.

### Custom LLM / Bring Your Own Logic

Use this mode to integrate a custom LLM or a specialized backend for interpreting transcripts and generating responses.

* Adds latency due to external processing
* Does **not** require an actual LLM: any endpoint that returns a compatible chat completion format can be used

# Perception

Source: https://docs.tavus.io/sections/conversational-video-interface/persona/perception

Learn how to configure the perception layer with Raven to enable real-time visual and audio understanding.

The **Perception Layer** in Tavus enhances an AI agent with real-time visual and audio understanding. By using [Raven](/sections/models#raven%3A-perception-model), the AI agent becomes more context-aware, responsive, and capable of triggering actions based on visual and audio input.

## Configuring the Perception Layer

To configure the Perception Layer, define the following parameters within the `layers.perception` object:

### 1. `perception_model`

Specifies the perception model to use.

* **Options**:
  * `raven-1` **(default and recommended)**: Real-time emotional understanding from user audio, more natural and human-like interactions, plus all visual capabilities from raven-0.
  * `raven-0` (legacy settings [here](/sections/troubleshooting#migration-from-legacy-perception-to-raven-1))
  * `off`: Disables the perception layer.

**Screen Share Feature**: When using Raven, screen share is enabled by default without additional configuration.

### Audio Perception

Raven-1 (the default) analyzes user tone and emotion in real time. This context is automatically sent to the LLM alongside utterances, enabling more natural, empathetic responses. For example:

```
The user sounded sarcastic when they said this: "Wow, I love Mondays."
``` Audio analysis tags are stripped from transcription callbacks. Audio analysis output is limited to 32 tokens per utterance. ## Perception Analysis Queries Raven supports three kinds of queries that differ by **when** they run and **how** they affect the call: * **perception\_analysis\_queries** — Evaluated only at **end of call**. They do not change live behavior; they only shape the summary you get in the [Perception Analysis](/sections/event-schemas/conversation-perception-analysis) event sent to your [conversation callback](/sections/webhooks-and-callbacks#conversation-callbacks). * **visual\_awareness\_queries** and **audio\_awareness\_queries** — Evaluated **throughout the call**. Their answers are passed to the LLM as context, so the replica can react in real time. You receive this ongoing analysis in each user turn via the [Utterance event](/sections/event-schemas/conversation-utterance) as `user_visual_analysis` and `user_audio_analysis`. Use **visual\_awareness\_queries** and **audio\_awareness\_queries** when you want the replica to be aware of or focus on something specific during the conversation. Use **perception\_analysis\_queries** when you want your end-of-call summary to address specific points. ## Visual Perception Configuration ### 2. `visual_awareness_queries` An array of custom queries that Raven continuously monitors in the visual stream. ```json theme={null} "visual_awareness_queries": [ "Is the user wearing a bright outfit?" ] ``` Queries that Raven evaluates **continuously during the call** (on the order of every second). The answers are fed into the rolling visual context for the LLM, so the replica can respond to what it "sees." This same context also supports the end-of-call summary. You can read the ongoing visual analysis for each user utterance in the [Utterance event](/sections/event-schemas/conversation-utterance) as **user\_visual\_analysis**. 
**When to use:** When you want the replica to pay attention to something visual in real time (e.g. expression, clothing, objects on screen). **Example:** ```json theme={null} "visual_awareness_queries": [ "What is the main expression on the user's face?", "Is the user wearing a jacket?", "Does the user appear distressed or uncomfortable?" ] ``` ### 3. `perception_analysis_queries` An array of custom queries that Raven processes at the end of the call to generate a visual analysis summary for the user. Queries that are answered **once, at the end of the call**, by looking at what was observed over the whole conversation. They do not affect the call itself—only the content of the end-of-call summary. (Currently the summary is visual only; naming is kept general for future support.) **When to use:** When you want the post-call report to answer specific questions (e.g. "Did the user ever have two people on screen?", "How often was the user looking at the screen?"). **Example:** ```json theme={null} "perception_analysis_queries": [ "On a scale of 1-100, how often was the user looking at the screen?", "Is there any indication that more than one person is present?" ] ``` The answers are delivered in a [Perception Analysis](/sections/event-schemas/conversation-perception-analysis) event. Example payload: ```json theme={null} { "properties": { "analysis": "**User's Gaze Toward Screen:** The participant looked at the screen approximately 75% of the time.\n\n**Multiple People Present:** No indication of additional participants was detected during the call." }, "conversation_id": "", "event_type": "application.perception_analysis", "timestamp": "2025-07-11T09:13:35.361736Z" } ``` You do not need to set `visual_awareness_queries` in order to use `perception_analysis_queries`. 
```json theme={null} "perception_analysis_queries": [ "Is the user wearing multiple bright colors?", "Is there any indication that more than one person is present?", "On a scale of 1-100, how often was the user looking at the screen?" ] ``` Best practices for `visual_awareness_queries` and `perception_analysis_queries`: * Use simple, focused prompts. * Use queries that support your persona's purpose. All Raven API parameters (queries, prompts, tool definitions, etc.) have a **10,000 character limit** per entry. Entries exceeding this limit will cause an exception. ### 4. `visual_tool_prompt` Tell Raven when and how to trigger tools based on what it sees. ```json theme={null} "visual_tool_prompt": "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected." ``` ### 5. `visual_tools` Defines callable functions that Raven can trigger upon detecting specific visual conditions. Each tool must include a `type` and a `function` object detailing its schema. ```json theme={null} "visual_tools": [ { "type": "function", "function": { "name": "notify_if_bright_outfit_shown", "description": "Use this function when a bright outfit is detected in the image with high confidence", "parameters": { "type": "object", "properties": { "outfit_color": { "type": "string", "description": "Best guess on what color of outfit it is" } }, "required": ["outfit_color"] } } } ] ``` Please see [Tool Calling](/sections/conversational-video-interface/persona/perception-tool) for more details. ## Audio Perception Configuration (Raven-1) The following fields are available when using `raven-1` and enable custom audio-based perception capabilities. ### 6. `audio_awareness_queries` An array of custom queries that Raven-1 continuously monitors in the audio stream. Use these to track specific audio patterns or user states. Audio analysis output is limited to 32 tokens per query response. 
```json theme={null} "audio_awareness_queries": [ "Does the user sound frustrated or confused?", "Is the user speaking quickly as if in a hurry?" ] ``` Queries that Raven-1 evaluates **continuously during the call** on the audio stream. The answers are passed to the LLM as context so the replica can respond to tone and delivery. You can read the ongoing audio analysis for each user utterance in the [Utterance event](/sections/event-schemas/conversation-utterance) as **user\_audio\_analysis**. (There is no separate end-of-call summary for audio.) **When to use:** When you want the replica to react to how the user sounds (e.g. frustrated, confused, in a hurry). **Example:** ```json theme={null} "audio_awareness_queries": [ "Does the user sound frustrated or confused?", "Is the user speaking quickly as if in a hurry?" ] ``` ### 7. `audio_tool_prompt` Tell Raven-1 when and how to trigger tools based on what it hears (beyond the automatic emotion analysis). ```json theme={null} "audio_tool_prompt": "You have a tool to escalate to a human agent when the user sounds very frustrated, named `escalate_to_human`. Use this tool when detecting sustained frustration." ``` ### 8. `audio_tools` Defines callable functions that Raven-1 can trigger based on audio analysis. Each tool must include a `type` and a `function` object detailing its schema. ```json theme={null} "audio_tools": [ { "type": "function", "function": { "name": "escalate_to_human", "description": "Escalate the conversation to a human agent when user frustration is detected", "parameters": { "type": "object", "properties": { "reason": { "type": "string", "description": "The reason for escalation" } }, "required": ["reason"] } } } ] ``` ## Example Configurations This example demonstrates a persona that monitors for visual cues (bright outfits) and triggers a tool when detected. 
```json theme={null} { "persona_name": "Fashion Advisor", "system_prompt": "As a Fashion Advisor, you specialize in offering tailored fashion advice.", "pipeline_mode": "full", "default_replica_id": "rf4e9d9790f0", "layers": { "perception": { "perception_model": "raven-1", "visual_awareness_queries": [ "Is the user wearing a bright outfit?" ], "perception_analysis_queries": [ "Is the user wearing multiple bright colors?", "On a scale of 1-100, how often was the user looking at the screen?" ], "visual_tool_prompt": "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected.", "visual_tools": [ { "type": "function", "function": { "name": "notify_if_bright_outfit_shown", "description": "Use this function when a bright outfit is detected in the image with high confidence", "parameters": { "type": "object", "properties": { "outfit_color": { "type": "string", "description": "Best guess on what color of outfit it is" } }, "required": ["outfit_color"] } } } ] } } } ``` This example demonstrates a persona that monitors user tone and escalates to a human agent when sustained frustration is detected. ```json theme={null} { "persona_name": "Support Agent", "system_prompt": "You are a helpful customer support agent.", "pipeline_mode": "full", "default_replica_id": "rf4e9d9790f0", "layers": { "perception": { "perception_model": "raven-1", "audio_awareness_queries": [ "Does the user sound frustrated or confused?", "Is the user speaking quickly as if in a hurry?" ], "audio_tool_prompt": "You have a tool to escalate to a human agent when the user sounds very frustrated, named `escalate_to_human`. 
Use this tool when detecting sustained frustration.", "audio_tools": [ { "type": "function", "function": { "name": "escalate_to_human", "description": "Escalate the conversation to a human agent when user frustration is detected", "parameters": { "type": "object", "properties": { "reason": { "type": "string", "description": "The reason for escalation" } }, "required": ["reason"] } } } ] } } } ``` Please see the Create a Persona endpoint for more details. # Tool Calling for Perception Source: https://docs.tavus.io/sections/conversational-video-interface/persona/perception-tool Configure tool calling with Raven to trigger functions from visual or audio input. **Perception tool calling** works with OpenAI’s Function Calling and can be configured in the `perception` layer. It allows an AI agent to trigger functions based on **visual** or **audio** cues during a conversation. You define two separate tool sets in the perception layer: * **Visual tools** — `visual_tool_prompt` and `visual_tools`: triggered when Raven detects something in the video stream (e.g., an ID card, bright outfit, hat). * **Audio tools** — `audio_tool_prompt` and `audio_tools`: triggered when Raven detects something in the audio stream (e.g., sarcasm, frustration). For how to set these up in the perception layer, see [Perception](/sections/conversational-video-interface/persona/perception#visual-perception-configuration) (visual) and [Perception — Audio Perception Configuration](/sections/conversational-video-interface/persona/perception#audio-perception-configuration-raven-1) (audio). The perception layer tool calling is only available for Raven. Tavus does not execute tool calls on the backend. Use event listeners in your frontend to listen for [perception tool call events](/sections/event-schemas/conversation-perception-tool-call) and run your own logic when a tool is invoked. 
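A minimal frontend dispatcher for these events might look like the following sketch. The `event_type` string and the `properties` layout (`name`, `arguments`, `modality`) are assumptions modeled on the payload shapes shown elsewhere on this page; confirm the exact field names against the linked event schema before relying on them.

```javascript
// Hypothetical dispatcher for perception tool call events arriving in the
// frontend (for example via a video-call SDK's app-message callback).
// NOTE: the event_type value and the properties layout are assumptions;
// check the perception tool call event schema for the exact shape.
function handlePerceptionToolCall(event, handlers) {
  if (event.event_type !== "conversation.perception_tool_call") return null;
  const { name, arguments: args, modality } = event.properties;
  const handler = handlers[name];
  if (!handler) return null;
  // Tool arguments may arrive as a JSON string; parse before dispatching.
  const parsed = typeof args === "string" ? JSON.parse(args) : args;
  return handler(parsed, modality);
}

// Example: route an audio tool call to a custom handler.
const result = handlePerceptionToolCall(
  {
    event_type: "conversation.perception_tool_call",
    properties: {
      name: "notify_sarcasm_detected",
      modality: "audio",
      arguments: "{\"reason\": \"flat tone on praise\"}",
    },
  },
  {
    notify_sarcasm_detected: (args, modality) => `${modality}: ${args.reason}`,
  }
);
```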
Each event includes a `modality` field (`"vision"` or `"audio"`) so you can handle visual and audio tool calls differently.

## Defining a Tool

### Top-Level Fields

| Field      | Type   | Required | Description                                                                                                |
| ---------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------- |
| `type`     | string | ✅        | Must be `"function"` to enable tool calling.                                                               |
| `function` | object | ✅        | Defines the function that can be called by the model. Contains metadata and a strict schema for arguments. |

#### `function`

| Field         | Type   | Required | Description                                                                                                                  |
| ------------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `name`        | string | ✅        | A unique identifier for the function. Must be in `snake_case`. The model uses this to refer to the function when calling it. |
| `description` | string | ✅        | A natural language explanation of what the function does. Helps the perception model decide when to call it.                 |
| `parameters`  | object | ✅        | A JSON Schema object that describes the expected structure of the function’s input arguments.                                |

#### `function.parameters`

| Field        | Type             | Required | Description                                                                               |
| ------------ | ---------------- | -------- | ----------------------------------------------------------------------------------------- |
| `type`       | string           | ✅        | Always `"object"`. Indicates the expected input is a structured object.                   |
| `properties` | object           | ✅        | Defines each expected parameter and its corresponding type, constraints, and description. |
| `required`   | array of strings | ✅        | Specifies which parameters are mandatory for the function to execute.                     |

Each parameter should be included in the `required` list, even if it might seem optional in your code.

##### `function.parameters.properties`

Each key inside `properties` defines a single parameter the model must supply when calling the function.
| Field | Type | Required | Description | | ------------------ | ------ | -------- | ------------------------------------------------------------------------ | | `` | object | ✅ | Each key is a named parameter. The value is a schema for that parameter. | Optional subfields for each parameter: | Subfield | Type | Required | Description | | ------------- | ------ | -------- | ------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Data type (e.g., `string`, `number`, `boolean`). | | `description` | string | ❌ | Explains what the parameter represents and how it should be used. | | `maxLength` | number | ❌ | Maximum character length for string parameters. Must not exceed 1,000. | | `enum` | array | ❌ | Defines a strict list of allowed values for this parameter. Useful for categorical choices. | All Raven API parameters (queries, prompts, tool definitions, etc.) have a **1,000 character limit** per entry. Entries exceeding this limit will cause an exception. ## Example Configuration Here are examples of tool calling in the `perception` layer. Visual tools use `visual_tool_prompt` and `visual_tools`; audio tools use `audio_tool_prompt` and `audio_tools`. See [Perception](/sections/conversational-video-interface/persona/perception) for full setup details. **Best Practices:** * Use clear, specific function names to reduce ambiguity. * Add detailed `description` fields to improve selection accuracy. ### Visual tools example ```json Perception Layer — visual tools [expandable] theme={null} "perception": { "perception_model": "raven-1", "visual_awareness_queries": [ "Is the user showing an ID card?", "Is the user wearing a bright outfit?" ], "visual_tool_prompt": "You have a tool to notify the system when an ID card is detected, named `notify_if_id_shown`. 
You have another tool to notify when a bright outfit is detected, named `notify_if_bright_outfit_shown`.",
  "visual_tools": [
    {
      "type": "function",
      "function": {
        "name": "notify_if_id_shown",
        "description": "Use this function when a driver's license or passport is detected in the image with high confidence. After collecting the ID, internally use final_ask()",
        "parameters": {
          "type": "object",
          "properties": {
            "id_type": {
              "type": "string",
              "description": "best guess on what type of ID it is",
              "maxLength": 1000
            }
          },
          "required": ["id_type"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "notify_if_bright_outfit_shown",
        "description": "Use this function when a bright outfit is detected in the image with high confidence",
        "parameters": {
          "type": "object",
          "properties": {
            "outfit_color": {
              "type": "string",
              "description": "Best guess on what color of outfit it is",
              "maxLength": 1000
            }
          },
          "required": ["outfit_color"]
        }
      }
    }
  ]
}
```

### Audio tools example

```json Perception Layer — audio tools [expandable] theme={null}
"perception": {
  "perception_model": "raven-1",
  "audio_tool_prompt": "You have a tool to notify when sarcasm is detected, named `notify_sarcasm_detected`. Use it when the user's tone indicates sarcasm.",
  "audio_tools": [
    {
      "type": "function",
      "function": {
        "name": "notify_sarcasm_detected",
        "description": "Call this when the user's tone or phrasing suggests sarcasm",
        "parameters": {
          "type": "object",
          "properties": {
            "reason": {
              "type": "string",
              "description": "Why you detected sarcasm (e.g. what the user said)",
              "maxLength": 1000
            }
          },
          "required": ["reason"]
        }
      }
    }
  ]
}
```

## How Perception Tool Calling Works

Perception tool calling is triggered during an active conversation when the perception model detects a cue that matches a defined function:

* **Visual tools** are triggered by what Raven sees (e.g., ID card, bright outfit, hat).
The event includes a `modality` of `"vision"`, structured `arguments`, and a `frames` array of base64-encoded images that triggered the call. * **Audio tools** are triggered by what Raven hears (e.g., sarcasm, frustration). The event includes a `modality` of `"audio"` and `arguments` (often a JSON string). The same process applies to any function you define in `visual_tools` or `audio_tools`—e.g. `notify_if_bright_outfit_shown` when a bright outfit is visually detected, or `notify_sarcasm_detected` when sarcasm is detected in speech. ## Modify Existing Tools You can update `visual_tools` or `audio_tools` using the Update Persona API. Use the path `/layers/perception/visual_tools` or `/layers/perception/audio_tools` as appropriate. ```shell [expandable] theme={null} curl --request PATCH \ --url https://tavusapi.com/v2/personas/{persona_id} \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '[ { "op": "replace", "path": "/layers/perception/visual_tools", "value": [ { "type": "function", "function": { "name": "detect_glasses", "description": "Trigger this function if the user is wearing glasses in the image", "parameters": { "type": "object", "properties": { "glasses_type": { "type": "string", "description": "Best guess on the type of glasses (e.g., reading, sunglasses)", "maxLength": 1000 } }, "required": ["glasses_type"] } } } ] } ]' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. # Stock Personas Source: https://docs.tavus.io/sections/conversational-video-interface/persona/stock-personas Tavus offers pre-built personas to help you get started quickly. These personas are optimized for a variety of real-world scenarios: To fetch all available stock personas, use the List Personas endpoint. ### Stock Personas Teaches sales tips and strategies. 
```text theme={null} p1af207b8189 ``` ```shell theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ -H "Content-Type: application/json" \ -H "x-api-key: " \ -d '{ "replica_id": "r4f5b5ef55c8", "persona_id": "p1af207b8189" }' ``` Support users with product issues. ```text theme={null} paaee96e4f87 ``` ```shell theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ -H "Content-Type: application/json" \ -H "x-api-key: " \ -d '{ "replica_id": "r3f427f43c9d", "persona_id": "paaee96e4f87" }' ``` Runs mock interviews and screens candidates. ```text theme={null} pdac61133ac5 ``` ```shell theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ -H "Content-Type: application/json" \ -H "x-api-key: " \ -d '{ "replica_id": "r5f0577fc829", "persona_id": "pdac61133ac5" }' ``` Engage with Anna, the Tavus sales development rep. ```text theme={null} pcb7a34da5fe ``` ```shell theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ -H "Content-Type: application/json" \ -H "x-api-key: " \ -d '{ "replica_id": "rf4e9d9790f0", "persona_id": "pcb7a34da5fe" }' ``` # Speech-to-Text (STT) Source: https://docs.tavus.io/sections/conversational-video-interface/persona/stt Configure the STT layer to select an STT model, improve transcription accuracy, and optimize for your target languages. STT model selection is in **beta** until April 6th. The STT layer transcribes participant speech in real time using automatic speech recognition (ASR). You can select a model optimized for your use case and language requirements. ## STT models Select an STT model using the `stt_engine` parameter in the `layers.stt` object. 
The following models are available: | Model | Description | | ------------------------ | ------------------------------------------------------------------------------------------------------------- | | `tavus-auto` | Automatically selects the best STT model for the conversation's language. **Recommended for most use cases.** | | `tavus-parakeet` | Highest throughput, lowest latency for English and European languages. | | `tavus-soniox` | Purpose-built for Indian languages with broad multilingual coverage. | | `tavus-whisper` | Broad multilingual coverage across all supported languages. | | `tavus-deepgram-medical` | Domain-specific English STT optimized for clinical and healthcare vocabulary. English only. | | `tavus-advanced` | **Deprecated.** Still active but not recommended for new integrations. | Use `tavus-auto` unless you have a specific language or domain requirement. It automatically routes to the best model for each conversation. ## Choosing the right model A language is listed for a model only if both STT and TTS coverage are available. 
| Category | Recommended model | Supported languages | | ------------------ | --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | General purpose | `tavus-auto` | [All 43 languages](/sections/conversational-video-interface/language-support) | | Indic languages | `tavus-soniox` | Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu + broad support for all other languages | | English + European | `tavus-parakeet` | Bulgarian, Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Ukrainian | | Broad multilingual | `tavus-whisper` or `tavus-soniox` | [All 43 languages](/sections/conversational-video-interface/language-support) | | Medical (English) | `tavus-deepgram-medical` | English | Using [Smart Language Detection](/sections/conversational-video-interface/language-support#smart-language-detection) requires either `tavus-auto`, `tavus-soniox`, or `tavus-whisper`. ## Configuring the STT layer Define the STT layer under the `layers.stt` object. ### `stt_engine` Set the STT model for transcription: ```json theme={null} "stt": { "stt_engine": "tavus-auto" } ``` ### `hotwords` Use this to prioritize certain names or terms that are difficult to transcribe. ```json theme={null} "hotwords": "Roey is the name of the person you're speaking with." ``` The above helps the model transcribe "Roey" correctly instead of "Rowie." Use hotwords for proper nouns, brand names, or domain-specific language that standard STT engines might struggle with. 
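If you maintain a list of hard-to-transcribe domain terms, the `hotwords` string can be assembled programmatically when building your persona request. Below is a minimal sketch; the helper name and term list are illustrative, and only `stt_engine` and `hotwords` are actual Tavus persona fields:

```javascript
// Build a `layers.stt` object from a list of hard-to-transcribe terms.
// The helper and the term list are illustrative, not part of the Tavus API.
function buildSttLayer(terms) {
  return {
    stt_engine: 'tavus-auto',
    hotwords: `Pay attention to these terms: ${terms.join(', ')}.`,
  };
}

const stt = buildSttLayer(['Roey', 'Tavus', 'Phoenix-4']);
console.log(JSON.stringify(stt, null, 2));
```

The resulting object can be dropped into `layers.stt` in a Create Persona request body.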
## Example configuration Below is an example persona with a configured STT layer using the recommended `tavus-auto` engine: ```json theme={null} { "persona_name": "Customer Service Agent", "system_prompt": "You assist users by listening carefully and providing helpful answers.", "pipeline_mode": "full", "default_replica_id": "rf4e9d9790f0", "layers": { "stt": { "stt_engine": "tavus-auto", "hotwords": "Roey is the name of the person you're speaking with." } } } ``` Refer to the Create Persona API for a complete list of supported fields. # Text-to-Speech (TTS) Source: https://docs.tavus.io/sections/conversational-video-interface/persona/tts Discover how to integrate custom voices from third-party TTS engines for multilingual or localized speech output. The **TTS Layer** in Tavus enables your persona to generate natural-sounding voice responses. You can configure the TTS layer using a third-party TTS engine provider. If `layers.tts` is not specified, Tavus defaults to the `cartesia` engine. If you use the default engine, you do not need to specify any parameters within the `tts` layer. ## Configuring the TTS Layer Define the TTS layer under the `layers.tts` object. Below are the parameters available: ### 1. `tts_engine` Specifies the supported third-party TTS engine. * **Options**: `cartesia`, `elevenlabs`. ```json theme={null} "tts": { "tts_engine": "cartesia" } ``` ### 2. `api_key` Authenticates requests to your selected third-party TTS provider. Only required when using private voices. You can obtain an API key from one of the following providers: * Cartesia * ElevenLabs ```json theme={null} "tts": { "api_key": "your-api-key" } ``` ### 3. `external_voice_id` Specifies which voice to use with the selected TTS engine. To find supported voice IDs, refer to the provider’s documentation: * Cartesia * ElevenLabs You can use any publicly accessible custom voice from ElevenLabs or Cartesia without the provider's API key.
If the custom voice is private, you still need to use the provider's API key. ```json theme={null} "tts": { "external_voice_id": "external-voice-id" } ``` ### 4. `tts_model_name` Model name used by the TTS engine. Refer to: * Cartesia * ElevenLabs ```json theme={null} "tts": { "tts_model_name": "sonic-3" } ``` ### 5. `tts_emotion_control` If set to `true`, enables emotion control in speech. **Defaults to `true`.** ```json theme={null} "tts": { "tts_emotion_control": true } ``` ### 6. `voice_settings` Optional object for controlling speed, volume, and similar effects. **Which approach you use depends on your TTS engine and model:** | Engine | Model | Approach | | ---------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ElevenLabs | All models | `voice_settings` in persona config | | Cartesia | sonic-2 | `voice_settings` in persona config | | Cartesia | sonic-3 | **Either** `voice_settings` (global, set once per conversation) **or** prompt the LLM in `system_prompt` to output [Cartesia SSML tags](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags) for dynamic control. Not both. | **Cartesia sonic-3:** If you use `voice_settings` for speed/volume, those settings apply globally for the whole conversation and you cannot use SSML tags for dynamic, per-phrase control. If you want dynamic control, omit `voice_settings` and have the LLM output SSML tags instead. See [Cartesia volume, speed, and emotion](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion). 
**ElevenLabs (all models):** Set parameters in the `voice_settings` object: | Parameter | ElevenLabs | | ------------------- | ----------------------------------------------------------- | | `speed` | Range `0.7` to `1.2` (`0.7` = slowest, `1.2` = fastest) | | `stability` | Range `0.0` to `1.0` (`0.0` = variable, `1.0` = stable) | | `similarity_boost` | Range `0.0` to `1.0` (`0.0` = creative, `1.0` = original) | | `style` | Range `0.0` to `1.0` (`0.0` = neutral, `1.0` = exaggerated) | | `use_speaker_boost` | Boolean (enhances speaker similarity) | See ElevenLabs Voice Settings for details. **Cartesia sonic-2:** Use the `voice_settings` object (e.g. `speed`, `emotion`). SSML tags are not used for sonic-2. **Cartesia sonic-3:** You can use **either** of these, but not both: * **`voice_settings`** — We accept speed/volume params for sonic-3. They apply **globally**, set once per conversation. Use this when you want a single default speed and volume for the entire conversation. Using `voice_settings` prevents dynamic SSML control. * **SSML in LLM output** — Omit `voice_settings` for speed/volume and instead add instructions to your `system_prompt` so the LLM outputs [Cartesia SSML tags](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags) in its responses. This gives you dynamic, per-phrase control. See [Cartesia volume, speed, and emotion](https://docs.cartesia.ai/build-with-cartesia/sonic-3/volume-speed-emotion). Emotion control is separate; see [Emotion Control with Phoenix-4](/sections/conversational-video-interface/quickstart/emotional-expression). 
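The ElevenLabs ranges above lend themselves to a quick client-side sanity check before you create a persona. Below is a minimal sketch; the validator is our own illustration, not part of the Tavus or ElevenLabs APIs:

```javascript
// Validate ElevenLabs `voice_settings` against the documented ranges.
// This helper is illustrative; Tavus does not ship a validator.
const RANGES = {
  speed: [0.7, 1.2],
  stability: [0.0, 1.0],
  similarity_boost: [0.0, 1.0],
  style: [0.0, 1.0],
};

function validateVoiceSettings(settings) {
  const errors = [];
  for (const [key, value] of Object.entries(settings)) {
    if (key === 'use_speaker_boost') {
      if (typeof value !== 'boolean') errors.push(`${key} must be a boolean`);
    } else if (RANGES[key]) {
      const [min, max] = RANGES[key];
      if (typeof value !== 'number' || value < min || value > max) {
        errors.push(`${key} must be between ${min} and ${max}`);
      }
    } else {
      errors.push(`unknown setting: ${key}`);
    }
  }
  return errors; // an empty array means the settings look valid
}

console.log(validateVoiceSettings({ speed: 0.9, stability: 0.5 })); // []
console.log(validateVoiceSettings({ speed: 1.5 })); // flags speed as out of range
```

Running the check before sending the request surfaces out-of-range values early instead of at persona-creation time.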
**Example: system prompt for Cartesia sonic-3 (dynamic speed and volume)** If you are **not** using `voice_settings` for sonic-3, add instructions like this to your `system_prompt` so the LLM outputs Cartesia SSML tags (see the linked Cartesia SSML documentation for the exact tag syntax):
```
When you want to emphasize a word or phrase, use the Cartesia SSML tags for speed and volume:
- To slow down or speed up: wrap the phrase in the speed tag.
- To speak louder or more quietly: wrap the phrase in the volume tag.
You can nest the tags to combine effects, for example a slower, louder "important point".
Only use these tags when it improves clarity or emphasis; keep most of your response in plain text.
```
**Example: voice\_settings (ElevenLabs, Cartesia sonic-2, or Cartesia sonic-3 global)** ```json theme={null} "tts": { "voice_settings": { "speed": 0.9 } } ``` For sonic-3, this sets the global speed once per conversation; for sonic-2 and ElevenLabs, it applies as configured. ## Example Configuration Below is an example persona with a fully configured TTS layer: ```json Cartesia theme={null} { "persona_name": "AI Presenter", "system_prompt": "You are a friendly and informative video host.", "pipeline_mode": "full", "context": "You're delivering updates in a conversational tone.", "default_replica_id": "r665388ec672", "layers": { "tts": { "tts_engine": "cartesia", "api_key": "your-api-key", "external_voice_id": "external-voice-id", "tts_emotion_control": true, "tts_model_name": "sonic-3" } } } ``` ```json ElevenLabs theme={null} { "persona_name": "Narrator", "system_prompt": "You narrate long stories with clarity and consistency.", "pipeline_mode": "full", "context": "You're reading a fictional audiobook.", "default_replica_id": "r665388ec672", "layers": { "tts": { "tts_engine": "elevenlabs", "api_key": "your-api-key", "external_voice_id": "elevenlabs-voice-id", "voice_settings": { "speed": 0.9 }, "tts_emotion_control": true, "tts_model_name": "eleven_multilingual_v2" } } } ``` Refer to the Create Persona API for a complete list of supported fields.
# Conversation Recordings Source: https://docs.tavus.io/sections/conversational-video-interface/quickstart/conversation-recordings Enable conversation recording and store it in your S3 bucket for on-demand access. ## Prerequisite Ensure that you have the following: * An S3 bucket with versioning enabled. ## Enable Conversation Recording 1. Create an IAM Policy with the following JSON definition: ```json theme={null} { "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload", "s3:ListBucketVersions", "s3:ListBucket", "s3:GetObjectVersion", "s3:ListMultipartUploadParts" ], "Resource": [ "arn:aws:s3:::your-bucket-name", "arn:aws:s3:::your-bucket-name/*" ] } ] } ``` Replace `your-bucket-name` with your actual bucket name. 2. Create an IAM role with the following values: * Select **"Another AWS account"** and enter this account ID: ***291871421005***. * Enable **"Require external ID"**, and use: **tavus**. * Set **"Max session duration"** to **12 hours**. Note down your role ARN (e.g., `arn:aws:iam::123456789012:role/CVIRecordingRole`). 3. Create a conversation with recording enabled, using the following request body example: Remember to change the following values: * ``: Your actual API key. You can generate one in the Developer Portal. * `aws_assume_role_arn`: Your AWS ARN. * `recording_s3_bucket_region`: Your S3 region. * `recording_s3_bucket_name`: Your S3 bucket name. ```shell cURL {7-10} theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "properties": { "enable_recording": true, "aws_assume_role_arn": "", "recording_s3_bucket_region": "", "recording_s3_bucket_name": "" }, "replica_id": "r5f0577fc829" }' ``` `enable_recording` allows recording to be possible, but it doesn't start recording automatically.
To begin and end recordings, users must do so manually (via the start/stop recording button in the UI) or you can trigger them through frontend code. To join the conversation, click the **link** in the ***`conversation_url`*** field from the response: ```json theme={null} { "conversation_id": "c93a7ead335b", "conversation_name": "New Conversation 1747654283442", "conversation_url": "", "status": "active", "callback_url": "", "created_at": "2025-05-16T02:09:22.675928Z" } ``` You can access the recording file in your S3 bucket. You can start recording from your frontend code via Daily's SDK. To ensure recordings are generated consistently, be sure to wait for the `joined-meeting` event first. ```javascript theme={null} const call = Daily.createCallObject(); call.on('joined-meeting', () => { call.startRecording(); // room must have enable_recording set }); ``` # Customize Conversation UI Source: https://docs.tavus.io/sections/conversational-video-interface/quickstart/customize-conversation-ui Experience a conversation in a custom Daily UI — styled to match your preference. You can **customize your conversation interface** to match your style by updating Daily's Prebuilt UI. Here’s an example showing how to customize the conversation UI by adding leave and fullscreen buttons, changing the language, and adjusting the UI color. For more options, check the Daily theme configuration reference and Daily Call Properties. ### Customization Example Guide In this example, we will use stock replica ID ***rf4e9d9790f0*** (Anna) and stock persona ID ***pcb7a34da5fe*** (Sales Development Rep).
Use the following request body example: ```sh theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "replica_id": "rf4e9d9790f0", "persona_id": "pcb7a34da5fe" }' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. 1. Make a new `index.html` file. 2. Paste the following code into the file, replacing `DAILY_ROOM_URL` with your own room URL (the `conversation_url` from the step above). The snippet below is a minimal sketch built on Daily Prebuilt's documented `createFrame` options (`showLeaveButton`, `showFullscreenButton`, `lang`, and `theme`); adjust the values to your preference. ```html theme={null}
<!DOCTYPE html>
<html>
  <head>
    <script src="https://unpkg.com/@daily-co/daily-js"></script>
  </head>
  <body>
    <script>
      // Replace with your own room URL (conversation_url)
      const DAILY_ROOM_URL = 'DAILY_ROOM_URL';

      const callFrame = window.DailyIframe.createFrame({
        showLeaveButton: true,      // adds a leave button
        showFullscreenButton: true, // adds a fullscreen button
        lang: 'fr',                 // UI language
        theme: {
          colors: {
            accent: '#6c5ce7',      // adjust the UI accent color
            accentText: '#ffffff'
          }
        },
        iframeStyle: {
          position: 'fixed',
          inset: '0',
          width: '100%',
          height: '100%',
          border: '0'
        }
      });

      callFrame.join({ url: DAILY_ROOM_URL });
    </script>
  </body>
</html>
``` Start the application by opening the file in the browser. # Emotion Control with Phoenix-4 Source: https://docs.tavus.io/sections/conversational-video-interface/quickstart/emotional-expression Unlock emotionally expressive facial movements and micro-expressions using Phoenix-4 replicas. ## How It Works Phoenix-4 replicas can dynamically express emotions like happiness, sadness, anger, and more through lifelike facial expressions while speaking and listening. For the most human-like results, emotional expression works best as part of a closed-loop system: **Phoenix-4** for expression, **Raven-1** for perception, and **Sparrow-1** for conversational flow. Each component informs the others. Tavus handles these complex interactions behind the scenes, powered by our state-of-the-art models working seamlessly with any LLM. All of this is available out of the box with default Tavus settings. ### Requirements 1. **Select a Phoenix-4 replica** - All Phoenix-4 replicas support emotional expression. Replicas marked **Pro** in the [Stock Replica Library](https://platform.tavus.io/replicas) are extra emotive. See [featured Pro replicas here](/sections/replica/stock-replicas#pro). 2. **Enable `tts_emotion_control`** - This is enabled by default for Phoenix-4 replicas, so no action needed unless you've explicitly disabled it. See [TTS layer](/sections/conversational-video-interface/persona/tts) for details. 3.
**Enable `speculative_inference`** - This is also enabled by default for all personas, and again no action needed unless you've explicitly disabled it. Pair with **Raven-1** as your perception model to enhance user emotion understanding. See [Perception](/sections/conversational-video-interface/persona/perception) for configuration. Lighter LLM models like `gpt-4o-mini` may not handle emotion tag instructions reliably. For best results, use models with robust instruction-following capabilities. ### Guiding Emotional Delivery You can further shape how the replica expresses emotion through your `system_prompt`. For example: * "Be enthusiastic when discussing new features" * "Speak calmly and empathetically when the user is frustrated" * "Show excitement when celebrating user achievements" * "Respond with anger if the user interrupts you mid-sentence" #### Example: Negotiation Sparring Partner Here's an example system prompt designed to display a range of emotions: > You are a tough but fair negotiation coach who helps users practice high-stakes conversations. When role-playing scenarios, embody the opposing party with conviction. If the user makes weak arguments or caves too easily, push back with frustration - they need to feel the pressure. When they fumble or seem lost, express concern and gently guide them. But when they land a strong point or hold their ground, show genuine satisfaction. Don't go easy on them. Real negotiations are uncomfortable, and you're here to prepare them for that. This prompt naturally triggers **angry** responses when pushing back, **scared/concerned** reactions when the user struggles, and **content** acknowledgment when the user succeeds. ### Example Persona Configuration ```json theme={null} { "persona_name": "Hype Fitness Coach!", "system_prompt": "You are an incredibly enthusiastic fitness coach who gets HYPED about every win, no matter how small. Crushed a workout? Let's GO! Drank enough water today? That's HUGE! 
Be wildly supportive and energetic. When users are struggling, dial it back - be warm, calm, and encouraging. But the moment they share any progress, bring the energy back up. You live for celebrating wins.", "default_replica_id": "r5f0577fc829" } ``` You can learn more about [Persona Configuration here](/api-reference/personas/create-persona) This minimal configuration works because `tts_emotion_control` and `speculative_inference` are enabled by default for Phoenix-4 replicas. ## Echo Mode When using [Echo Mode](/sections/conversational-video-interface/quickstart/echo-mode), you must manually insert emotion tags into your [text echos](/api-reference/event-schemas/conversation-echo). **Valid emotion values:** `neutral`, `angry`, `excited`, `elated`, `content`, `sad`, `dejected`, `scared`, `contempt`, `disgusted`, `surprised` ```xml theme={null} I'm so glad you asked about that! ``` ```xml theme={null} That's completely unacceptable. ``` ```xml theme={null} I'm sorry to hear that happened. ``` ```xml theme={null} I'm not sure we should go down that path... ``` # Use the Full Pipeline Source: https://docs.tavus.io/sections/conversational-video-interface/quickstart/use-the-full-pipeline Create your first persona using the full pipeline and start a conversation in seconds. Use the full pipeline to unlock the complete range of replica capabilities—including perception and speech recognition. In this example, we'll create an interviewer persona with the following settings: * A Phoenix-4 Pro replica. * `raven-1` as the perception model for visual and audio understanding. * `sparrow-1` for natural turn-taking with high patience (ideal for interviews). 
Use the following request body example: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/personas \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_name": "Interviewer", "system_prompt": "As an Interviewer, you are a skilled professional who conducts thoughtful and structured interviews. Your aim is to ask insightful questions, listen carefully, and assess responses objectively to identify the best candidates.", "pipeline_mode": "full", "context": "You have a track record of conducting interviews that put candidates at ease, draw out their strengths, and help organizations make excellent hiring decisions.", "default_replica_id": "r5dc7c7d0bcb", "layers": { "perception": { "perception_model": "raven-1" }, "conversational_flow": { "turn_detection_model": "sparrow-1", "turn_taking_patience": "high", "replica_interruptibility": "medium" } } }' ``` Replace `` with your actual API key. You can generate one in the Developer Portal. Tavus offers full layer customization for your persona. Please see the following pages for each layer's configuration: * [Large Language Model (LLM)](/sections/conversational-video-interface/persona/llm) * [Perception](/sections/conversational-video-interface/persona/perception) * [Text-to-Speech (TTS)](/sections/conversational-video-interface/persona/tts) * [Speech-to-Text (STT)](/sections/conversational-video-interface/persona/stt) * [Conversational Flow](/sections/conversational-video-interface/persona/conversational-flow) Create a new conversation using your newly created `persona_id`: ```shell cURL theme={null} curl --request POST \ --url https://tavusapi.com/v2/conversations \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '{ "persona_id": "", "conversation_name": "Interview User" }' ``` * Replace `` with your actual API key. * Replace `` with your newly created Persona ID.
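The two requests above can also be chained from code. Below is a minimal sketch that builds both request bodies and posts them with `fetch`; the helper names are our own, while the endpoints and fields mirror the cURL examples above:

```javascript
// Chain persona creation and conversation creation, matching the two
// cURL examples above. Helper names are illustrative, not part of a Tavus SDK.
const BASE_URL = 'https://tavusapi.com/v2';

function conversationBody(personaId) {
  return { persona_id: personaId, conversation_name: 'Interview User' };
}

async function createConversation(apiKey, personaBody) {
  const headers = { 'Content-Type': 'application/json', 'x-api-key': apiKey };

  // 1. Create the persona (POST /v2/personas).
  const persona = await fetch(`${BASE_URL}/personas`, {
    method: 'POST', headers, body: JSON.stringify(personaBody),
  }).then((r) => r.json());

  // 2. Start a conversation with the new persona_id (POST /v2/conversations).
  return fetch(`${BASE_URL}/conversations`, {
    method: 'POST', headers,
    body: JSON.stringify(conversationBody(persona.persona_id)),
  }).then((r) => r.json());
}
```

The returned object includes the `conversation_url` you can open or embed.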
To join the conversation, click the link in the `conversation_url` field from the response: ```json theme={null} { "conversation_id": "c477c9dd7aa6e4fe", "conversation_name": "Interview User", "conversation_url": "", "status": "active", "callback_url": "", "created_at": "2025-05-13T06:42:58.291561Z" } ``` ## Echo Mode Tavus also supports an [Echo mode](/sections/conversational-video-interface/echo-mode) pipeline. It lets you send text or audio input directly to the persona for playback, bypassing most of the CVI pipeline. This mode is not recommended if you plan to use the perception or speech recognition layers, as it is incompatible with them. # Errors and Status Details Source: https://docs.tavus.io/sections/errors-and-status-details Identify errors and status details encountered when using the Tavus platform. ## Replica Training Errors | Error Type | Error Message | Additional Information | | --- | --- | --- | | download\_link | There was an issue downloading your video file. Please ensure that the link you provided is correct and try again | Tavus was not able to download the video from the provided link. Please ensure the link you provide is a hosted URL download link | | file\_size | The video file you provided exceeds the maximum file size allowed. Please ensure that the video is less than 750MB and try again. | All video files must be smaller than 750MB | | video\_format | There was an issue processing your training video. The video provided is not a .mp4 file. Please ensure that the training video is a .mp4 file encoded using h.264 | All Replica training and consent video files must be .mp4 | | video\_codec | There was an issue processing your training video. The video provided is not encoded using h.264. Please ensure that the training video is a .mp4 file encoded using h.264 | All Replica training and consent video files must be encoded using h.264 | | video\_codec\_and\_format | There was an issue processing your training video. Please ensure that the training video is a .mp4 file encoded using h.264 | All Replica training and consent video files must be .mp4 and encoded using h.264 | | video\_duration | There was an issue processing your training video. The video provided does not meet the minimum duration requirement for training | All Replica training files must be at least 1 minute long. (Between 1.5 to 2 minutes is optimal.) | | video\_fps | There was an issue processing your training video. The video provided does not meet the minimum frame rate requirement for a training video. Please ensure your training video has a frame rate of at least 25fps | All Replica training and consent video files must have a frame rate of at least 25fps | | consent\_phrase\_mismatch | There was an issue processing your training file: Your consent phrase does not match our requirements. Please follow our specified format closely | There was an issue with the consent phrase provided. Please review our consent guidelines and resubmit a new training with the correct consent statement | | face\_or\_obstruction\_detected | There was an issue processing your training file: More than one face detected or obstructions present.
Please ensure only your face is visible and clear | Your face must be present in all frames of the video and may not be obstructed at any time | | lighting\_change\_detected | There was an issue processing your training file: Lighting changes detected. Ensure your face is evenly lit throughout the video | Please ensure that the lighting of your face is consistent throughout the entire video | | background\_noise\_detected | There was an issue processing your training file: Background noise or other voices detected. Please record in a quiet environment with only your voice | The video must be recorded in a quiet environment with only your voice present | | video\_editing\_detected | There was an issue processing your training file: Video appears edited or contains cuts. Please submit an unedited, continuous video | The video must be unedited and recorded in one take | | community\_guidelines\_violation | There was an issue processing your training file: Video violates Community Guidelines. Please review our guidelines and resubmit your video | Please ensure that your training video does not violate our community guidelines | | video\_processing | There was an error processing your training video. Face not detected because it appeared too small in the frame or it was occluded. Please edit or record a new video and ensure your face is clearly visible and occupies a larger portion of the frame. | This error occurs when the face appears too small relative to the background or if a full body video is recorded in horizontal format instead of vertical. Please ensure your face is clearly visible and occupies a larger portion of the frame. | | video\_processing | There was an error processing your training video file. Please check your video format and make sure it is not damaged and can be played correctly. | This error indicates there may be an issue with the video file format or the file may be corrupted. Please verify the video can be played correctly and resubmit.
| | excessive\_movement\_detected | There was an issue processing your training file: Excessive movement detected. Please ensure you are sitting still and centered in the frame | This error indicates that the model is having difficulty tracking the face from frame to frame. Could be related to movement of the subject or the camera. In some cases, it may also be related to obstructions such as superimposed graphics. | | audio\_processing | There was an error processing the audio in the provided training video file. | This error indicates that the audio processing step was interrupted. In edge cases, may be related to the replica name's length or characters. | | quality\_issue\_detected | Quality issue detected. For details and assistance, please reach out to Tavus support via [developer-support@tavus.io](mailto:developer-support@tavus.io) | This error indicates a quality problem with the input video that has resulted in poor test output. One example cause could be input video quality under 720p. Please review the quality checklist to make sure you have met all requirements and/or reach out to [support@tavus.io](mailto:support@tavus.io) for assistance. | | hands\_obstructing\_face | There was a quality issue with your replica. The user's hand obstructed the face during recording. Please edit your video or record a new training video and keep hands away from the face. | Please ensure that the user's face is visible throughout the entire video. | | second\_person\_detected | There was a quality issue with your replica. A second person or face was detected in the frame. Please edit your video or record a new video with no one else in the background. | Please ensure that there is only a single user in the training video. | | improper\_distance | There was a quality issue with your replica. The user was either too close to or too far from the camera. Please review our documentation on proper framing and distance before editing your video or recording a new video. 
| Please ensure the user is centered in the training video. | | inconsistent\_distance | There was a quality issue with your replica. The user's distance from the camera changed during the recording. Please edit or record a new training video and remain at a consistent distance from the camera for the entire video. | Please ensure the user stays in the same spot throughout the training video. | | face\_turned\_away | There was a quality issue with your replica. User's face turned away from the camera. Please edit or record a new video and ensure you are facing directly toward the camera for the entire duration. | The face should be centered on the camera the entire duration of the training video. | | improper\_camera\_angle | There was a quality issue with your replica. The camera angle was either too low or too high. Please record a new video with the camera angle at eye level. | Please ensure the camera is at eye level. | | poor\_lighting | There was a quality issue with your replica. The user's face was not clearly visible due to poor lighting or heavy shadows. Please edit or record a new video with even lighting on your face, avoiding shadows or dim environments. | Shadows and uneven lighting cause distortions during replica training. Please ensure the lighting is as even as possible. | | teeth\_not\_visible | There was a quality issue with your replica. The top and bottom teeth were not clearly visible during recording, either due to poor lighting or obstruction. Please edit your video or record a new training video with better lighting and ensure your teeth are fully visible. | A large smile at the beginning helps the training process capture your natural teeth. | | other\_quality\_issue | Quality issue was detected. For details and assistance, please reach out to Tavus support via [support@tavus.io](mailto:support@tavus.io) | Please reach out to support to better understand issues that occur during the training process. 
| ## Video Errors | Error Type | Error Message | Additional Information | | --- | --- | --- | | video\_error | An error occurred while generating this request. Please check your inputs or try your request again | Tavus ran into an issue generating the video. Please ensure that your inputs are valid and try again. If this issue persists, please reach out to support for assistance | | replica\_in\_error\_state | Request Failed: The replica is currently in an 'error' state and cannot process requests. For details on the cause of the error and how to resolve it, please review the specific information provided for this replica. | Please ensure that the Replica being used to generate videos is in a 'ready' state | | audio\_file\_max\_size | There was an issue generating your video. The audio file exceeds the maximum file size of 750MB. | The audio file provided is too large. Please ensure that the audio file is less than 750MB and try again. | | audio\_file\_type | There was an issue generating your video. The audio file provided is not a .wav | Currently, we only support .wav audio files for generating videos. Please ensure that the audio file is a .wav file and try again. | | audio\_file\_min\_duration | There was an issue generating your video. The duration of the audio file does not reach the minimum duration requirement of 3 seconds. | The audio file provided is too short. | | audio\_file\_max\_duration | There was an issue generating your video. The duration of the audio file exceeds the maximum duration of 10 minutes. | The audio file is too long. | | audio\_file\_download\_link | There was an issue generating your video. We were unable to download your audio file. Please ensure that the link you provided is correct and try again. | Please ensure that the link you provide is a hosted URL download link that is publicly accessible. | | script\_community\_guidelines | Request has failed as the script violates community guidelines. | Please ensure that the script's contents do not violate our community guidelines. | ## Video Status Details | Status Type | Status Details | Additional Information | | --- | --- | --- | | video\_success | Your request has processed successfully! | The video has been generated successfully and is ready for use | | video\_queued | This request is currently queued. It should begin processing in a few minutes. | Immediately upon submitting a request for video generation, the video will be added to a queue to be processed | | replica\_in\_training | The training process for replica is still ongoing. Your request has been placed in the 'queued' status and will automatically proceed to the generation phase once training is complete. To monitor the current progress of the training, please review the detailed status of this replica. | Videos will not start generating until the Replica being used has finished training | # Append Conversational Context Interaction Source: https://docs.tavus.io/sections/event-schemas/conversation-append-context This is an event developers may broadcast to Tavus.
By broadcasting this event, you are able to append additional context to the existing `conversational_context` that the replica uses to generate responses. If `conversational_context` was not provided during conversation creation, the replica will start using the `context` you provide in this event as the initial `conversational_context`.

Learn more about the `conversational_context`: [Create Conversation](/api-reference/conversations/create-conversation)

# Echo Interaction

Source: https://docs.tavus.io/sections/event-schemas/conversation-echo

This is an event developers may broadcast to Tavus.

By broadcasting this event, you are able to tell the replica exactly what to say. Anything that is passed in the `text` field will be spoken by the replica. This is commonly used in combination with the [Interrupt Interaction](/sections/event-schemas/conversation-interrupt).

# Interrupt Interaction

Source: https://docs.tavus.io/sections/event-schemas/conversation-interrupt

This is an event developers may broadcast to Tavus.

By broadcasting this event, you are able to externally send interruptions for the replica to stop talking. This is commonly used in combination with [Text Echo Interactions](/sections/event-schemas/conversation-echo).

# Overwrite Conversational Context Interaction

Source: https://docs.tavus.io/sections/event-schemas/conversation-overwrite-context

This is an event developers may broadcast to Tavus.

By broadcasting this event, you are able to overwrite the `conversational_context` that the replica uses to generate responses. If `conversational_context` was not provided during conversation creation, the replica will start using the `context` you provide in this event as `conversational_context`.

Learn more about configuring the `conversational_context`: [Create Conversation](/api-reference/conversations/create-conversation)

# Perception Analysis Event

Source: https://docs.tavus.io/sections/event-schemas/conversation-perception-analysis

This is an event broadcasted by Tavus.
A perception analysis event is fired after ending a conversation, when the replica has finished summarizing what was visually observed throughout the call. This is a feature that is only available when the persona has `raven-1` specified in the [perception layer](/sections/conversational-video-interface/persona/perception#end-of-call-perception-analysis). This event includes a `seq` field for global ordering and a `turn_idx` field. See [Event Ordering and Turn Tracking](/sections/conversational-video-interface/interactions-protocols/overview#event-ordering-and-turn-tracking) for details. # Perception Tool Call Event Source: https://docs.tavus.io/sections/event-schemas/conversation-perception-tool-call This is an event broadcasted by Tavus. A perception tool call event is broadcast when a perception tool is triggered by Raven based on **visual** or **audio** input. The event always includes `eventType` `conversation.perception_tool_call`, a `modality` in `data.properties` (`"vision"` or `"audio"`), the tool `name`, and `arguments`. **Modality-specific payload:** * **`modality: "audio"`** — Triggered by audio tools (`audio_tool_prompt` / `audio_tools`). `arguments` is a JSON **string** (e.g. `"{\"reason\":\"The user said …\"}"`). There is no `frames` array. * **`modality: "vision"`** — Triggered by visual tools (`visual_tool_prompt` / `visual_tools`). `arguments` is an **object** with tool-defined fields. Includes a `frames` array of objects with `data` (base64-encoded JPEG) and `mime_type` (e.g. `"image/jpeg"`) for the images that triggered the call. Perception tool calls can be used to trigger automated actions in response to visual or audio cues detected by the Raven perception system. For more on configuring perception tool calls, see [Tool Calling for Perception](/sections/conversational-video-interface/persona/perception-tool) and [Perception](/sections/conversational-video-interface/persona/perception). 
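The modality-specific payloads described above imply different parsing on the client: audio tool calls carry `arguments` as a JSON string, while vision tool calls carry an object plus `frames`. A minimal handler sketch (the helper name is ours, not part of any Tavus SDK):

```python
import json


def handle_perception_tool_call(event: dict) -> tuple:
    """Dispatch a conversation.perception_tool_call event by modality.

    Follows the event shape described above: properties carry the tool
    name, arguments, and modality; vision events also carry frames.
    """
    props = event["data"]["properties"]
    name = props["name"]
    if props["modality"] == "audio":
        # Audio tool calls encode arguments as a JSON string; no frames.
        args = json.loads(props["arguments"])
        frames = []
    else:  # "vision"
        # Vision tool calls pass arguments as an object, plus the
        # base64-encoded frames that triggered the call.
        args = props["arguments"]
        frames = props.get("frames", [])
    return name, args, frames
```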
This event includes a `seq` field for global ordering and a `turn_idx` field to identify which conversational turn the perception tool call belongs to. See [Event Ordering and Turn Tracking](/sections/conversational-video-interface/interactions-protocols/overview#event-ordering-and-turn-tracking) for details.

## Example: audio tool call

When an **audio** tool is triggered (e.g. sarcasm detection), the event looks like:

```json theme={null}
{
  "timestamp": "2026-03-02T21:51:47.194Z",
  "eventType": "conversation.perception_tool_call",
  "data": {
    "conversation_id": "c58b46f8646d943f",
    "event_type": "conversation.perception_tool_call",
    "message_type": "conversation",
    "seq": 17,
    "turn_idx": 2,
    "properties": {
      "arguments": "{\"reason\":\"The user said \\\"well, yeah\\\"\"}",
      "modality": "audio",
      "name": "notify_sarcasm_detected"
    }
  }
}
```

## Example: vision tool call

When a **visual** tool is triggered (e.g. hat detection), the event includes `frames` with base64-encoded images. The `data` values in the example are shortened for readability.

```json theme={null}
{
  "timestamp": "2026-03-02T21:51:49.730Z",
  "eventType": "conversation.perception_tool_call",
  "data": {
    "conversation_id": "c58b46f8646d943f",
    "event_type": "conversation.perception_tool_call",
    "message_type": "conversation",
    "seq": 18,
    "turn_idx": 2,
    "properties": {
      "arguments": {
        "hat_type": "baseball cap"
      },
      "frames": [
        { "data": "", "mime_type": "image/jpeg" },
        { "data": "", "mime_type": "image/jpeg" }
      ],
      "modality": "vision",
      "name": "notify_hat_detected"
    }
  }
}
```

# Replica Started/Stopped Speaking Event

Source: https://docs.tavus.io/sections/event-schemas/conversation-replica-started-stopped-speaking

This is an event broadcasted by Tavus.

A `replica.started_speaking`/`stopped_speaking` event is broadcasted by Tavus at specific times:

* `conversation.replica.started_speaking` means the replica has just started speaking.
* `conversation.replica.stopped_speaking` means the replica has just stopped speaking.
When the `replica.stopped_speaking` event is sent, the event's `properties` object will include:

* A `duration` field indicating, in seconds, how long the replica was speaking. This value may be null.
* An `interrupted` field (`true`/`false`) indicating whether the replica was interrupted by the user while speaking or finished speaking naturally.

These events are intended to act as triggers for actions within your application. For instance, you may want to start a video or show a slide when the replica starts or stops speaking. The `inference_id` can be used to correlate other events and tie things like `conversation.utterance` or `tool_call` together.

This event includes a `seq` field for global ordering and a `turn_idx` field to identify which conversational turn the speaking state change belongs to. See [Event Ordering and Turn Tracking](/sections/conversational-video-interface/interactions-protocols/overview#event-ordering-and-turn-tracking) for details.

# Text Respond Interaction

Source: https://docs.tavus.io/sections/event-schemas/conversation-respond

This is an event developers may broadcast to Tavus.

By broadcasting this event, you are able to send text that the replica will respond to. The text you provide in the event is treated as the user transcript and responded to as if the user had spoken those phrases during the conversation.

# Sensitivity Interaction

Source: https://docs.tavus.io/sections/event-schemas/conversation-sensitivity

This is an event developers may broadcast to Tavus.

By broadcasting this event, you are able to update the VAD (Voice Activity Detection) sensitivity of the replica in two dimensions: `participant_pause_sensitivity` and `participant_interrupt_sensitivity`. The supported values are `low`, `medium`, and `high`.
Sensitivity behavior is reflected in the [Conversational Flow layer](/sections/conversational-video-interface/persona/conversational-flow) via `turn_taking_patience` and `replica_interruptibility`.

# Tool Call Event

Source: https://docs.tavus.io/sections/event-schemas/conversation-toolcall

This is an event broadcasted by Tavus.

A tool call event denotes when an LLM tool call should be made on the client side. The event will contain the name and arguments of the function that should be called. Tool call events can be used to call external APIs or databases.

> **Note**: it is the client's responsibility to take action on these tool calls, as Tavus will not execute code server-side.

For more details on LLM tool calls, please take a look [here](/sections/conversational-video-interface/persona/llm-tool).

This event includes a `seq` field for global ordering and a `turn_idx` field to identify which conversational turn triggered the tool call. See [Event Ordering and Turn Tracking](/sections/conversational-video-interface/interactions-protocols/overview#event-ordering-and-turn-tracking) for details.

# User Started/Stopped Speaking Event

Source: https://docs.tavus.io/sections/event-schemas/conversation-user-started-stopped-speaking

This is an event broadcasted by Tavus.

A `user.started_speaking`/`stopped_speaking` event is broadcasted by Tavus at specific times:

* `conversation.user.started_speaking` means the user has just started speaking.
* `conversation.user.stopped_speaking` means the user has just stopped speaking.

These events are intended to act as triggers for actions within your application. For instance, you may want to trigger a user-facing action or a backend process when the user starts or stops speaking. The `inference_id` can be used to correlate other events and tie things like `conversation.utterance` or `tool_call` together.
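Correlating events that share an `inference_id` can be sketched as follows. This is our own helper, and it assumes the `inference_id` is exposed under `data.properties` on each event; verify the exact location against the individual event schemas:

```python
from collections import defaultdict


def group_by_inference(events: list[dict]) -> dict:
    """Bucket conversation event types that share an inference_id.

    Assumes inference_id lives in data.properties (our assumption);
    events without one are skipped.
    """
    buckets = defaultdict(list)
    for event in events:
        inference_id = (
            event.get("data", {}).get("properties", {}).get("inference_id")
        )
        if inference_id is not None:
            buckets[inference_id].append(event["eventType"])
    return dict(buckets)
```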
Keep in mind that with `speculative_inference`, the `inference_id` will frequently change while the user is speaking, so the `inference_id` on `user.started_speaking` will not usually match the one on `conversation.utterance`.

This event includes a `seq` field for global ordering and a `turn_idx` field to identify which conversational turn the speaking state change belongs to. See [Event Ordering and Turn Tracking](/sections/conversational-video-interface/interactions-protocols/overview#event-ordering-and-turn-tracking) for details.

# Utterance Event

Source: https://docs.tavus.io/sections/event-schemas/conversation-utterance

This is an event broadcasted by Tavus.

An utterance contains the content of what was spoken and an indication of who spoke it (i.e. the user or replica). Each utterance event includes all of the words spoken by the user or replica measured from when the person started speaking to when they finished speaking. This could include multiple sentences or phrases.

**User utterances** (`role: user`) are sent when the user finishes speaking and contain the transcribed text.

**Replica utterances** (`role: replica`) are sent immediately when the replica begins speaking and contain the **full LLM response text** — including words the replica may not have actually spoken if it was interrupted. This makes them useful for quickly displaying the replica's intended response.

If the replica is interrupted mid-sentence, the `conversation.utterance` event (`role: replica`) will still contain the full intended response. To track only the words the replica actually spoke, use streaming utterance events, which progressively report spoken text and indicate interruptions.

Utterance events can be used to keep track of what the user or the replica has said.
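A transcript accumulator over utterance events might look like the sketch below. The `role` field comes from the schema above; we assume the spoken text is carried as `speech` under `data.properties` — verify the field name against the utterance event payloads you receive:

```python
def build_transcript(events: list[dict]) -> list[tuple[str, str]]:
    """Collect (role, text) pairs from conversation.utterance events.

    The `speech` field name is our assumption. Note that replica
    entries may include unspoken text if the replica was interrupted.
    """
    transcript = []
    for event in events:
        if event.get("eventType") != "conversation.utterance":
            continue
        props = event["data"]["properties"]
        transcript.append((props["role"], props["speech"]))
    return transcript
```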
To track how long an utterance lasts, refer to the `duration` field in the "[User Started/Stopped Speaking](/sections/event-schemas/conversation-user-started-stopped-speaking)" and "[Replica Started/Stopped Speaking](/sections/event-schemas/conversation-replica-started-stopped-speaking)" events.

When the speaker is the user and the persona uses Raven-1, `properties` may include **user\_audio\_analysis** (tone/delivery) and/or **user\_visual\_analysis** (appearance and demeanor). These fields are only present when there is relevant analysis for that utterance.

This event includes a `seq` field for global ordering and a `turn_idx` field to identify which conversational turn the utterance belongs to. See [Event Ordering and Turn Tracking](/sections/conversational-video-interface/interactions-protocols/overview#event-ordering-and-turn-tracking) for details.

# Example Projects

Source: https://docs.tavus.io/sections/example-projects

# Embed Conversational Video Interface

Source: https://docs.tavus.io/sections/integrations/embedding-cvi

Learn how to embed Tavus's Conversational Video Interface (CVI) into your site or app.

## Overview

Tavus CVI delivers AI-powered video conversations directly in your application. You can integrate it using:

| Method | Best For | Complexity | Customization |
| --------------------- | ---------------------------------- | ---------- | ------------- |
| **@tavus/cvi-ui** | React apps, advanced features | Low | High |
| **iframe** | Static websites, quick demos | Low | Low |
| **Vanilla JS** | Basic dynamic behavior | Low | Medium |
| **Node.js + Express** | Backend apps, dynamic embedding | Medium | High |
| **Daily SDK** | Full UI control, advanced features | High | Very High |

## Implementation Steps

This method provides a full-featured React component library. It offers pre-built, customizable components and hooks for embedding Tavus CVI in your app.
## Overview

The Tavus Conversational Video Interface (CVI) React component library provides a complete set of pre-built components and hooks for integrating AI-powered video conversations into your React applications. This library simplifies setting up Tavus in your codebase, allowing you to focus on your application's core features.

Key features include:

* **Pre-built video chat components**
* **Device management** (camera, microphone, screen sharing)
* **Real-time audio/video processing**
* **Customizable styling** and theming
* **TypeScript support** with full type definitions

***

## Quick Start

### Prerequisites

Before getting started, ensure you have a React project set up. Alternatively, you can start from our example project: [CVI UI Haircheck Conversation Example](https://github.com/Tavus-Engineering/tavus-examples/tree/main/examples/cvi-ui-haircheck-conversation) - this example already has the HairCheck and Conversation blocks set up.

### 1. Initialize CVI in Your Project

```bash theme={null}
npx @tavus/cvi-ui@latest init
```

* Creates a `cvi-components.json` config file
* Prompts for TypeScript preference
* Installs npm dependencies (@daily-co/daily-react, @daily-co/daily-js, jotai)

### 2. Add CVI Components

```bash theme={null}
npx @tavus/cvi-ui@latest add conversation
```

### 3. Wrap Your App with the CVI Provider

In your root directory (main.tsx or index.tsx):

```tsx theme={null}
import { CVIProvider } from './components/cvi/components/cvi-provider';

function App() {
  return <CVIProvider>{/* Your app content */}</CVIProvider>;
}
```

### 4. Add a Conversation Component

Learn how to create a conversation URL at [https://docs.tavus.io/api-reference/conversations/create-conversation](https://docs.tavus.io/api-reference/conversations/create-conversation).

**Note:** The Conversation component requires a parent container with defined dimensions to display properly. Ensure your body element has full dimensions (`width: 100%` and `height: 100%`) in your CSS for proper component display.
```tsx theme={null}
import { Conversation } from './components/cvi/components/conversation';

function CVI() {
  const handleLeave = () => {
    // handle leave
  };

  // Prop names follow the generated @tavus/cvi-ui Conversation component;
  // check ./components/cvi/components/conversation after running `add conversation`
  return (
    <Conversation
      conversationUrl="YOUR_CONVERSATION_URL"
      onLeave={handleLeave}
    />
  );
}
```

***

## Documentation Sections

* **[Overview](/sections/conversational-video-interface/component-library/overview)** – Overview of the CVI component library
* **[Blocks](/sections/conversational-video-interface/component-library/blocks)** – High-level component compositions and layouts
* **[Components](/sections/conversational-video-interface/component-library/components)** – Individual UI components
* **[Hooks](/sections/conversational-video-interface/component-library/hooks)** – Custom React hooks for managing video call state and interactions
This is the simplest approach, requiring no coding. It leverages Tavus’s prebuilt interface with limited customization options.

1. Create a conversation using the Tavus API.
2. Replace `YOUR_TAVUS_MEETING_URL` below with your actual conversation URL:

```html theme={null}
<!DOCTYPE html>
<html>
  <head>
    <title>Tavus CVI</title>
  </head>
  <body style="margin: 0">
    <!-- Minimal embed; camera and microphone permissions must be allowed on the iframe -->
    <iframe
      src="YOUR_TAVUS_MEETING_URL"
      allow="camera; microphone; fullscreen; display-capture"
      style="width: 100%; height: 100vh; border: none;"
    ></iframe>
  </body>
</html>
```

This method provides basic customization and dynamic room management for apps without a framework.

1. Add the following script tag to your HTML `<head>`:

```html theme={null}
<script src="https://unpkg.com/@daily-co/daily-js"></script>
```

2. Use the following script, replacing `'YOUR_TAVUS_MEETING_URL'` with your actual conversation URL:

```html theme={null}
<div id="call-container" style="width: 100%; height: 100vh;"></div>
<script>
  // Create a Daily call frame inside the container and join the Tavus room
  const callFrame = window.DailyIframe.createFrame(
    document.getElementById('call-container'),
    { iframeStyle: { width: '100%', height: '100%', border: '0' } }
  );
  callFrame.join({ url: 'YOUR_TAVUS_MEETING_URL' });
</script>
```
This method serves dynamic pages that embed Tavus CVI within Daily rooms.

1. Install Express:

```bash theme={null}
npm install express
```

2. Create `server.js` and implement the following Express server:

```js theme={null}
const express = require('express');
const app = express();
const PORT = 3000;

app.get('/room', (req, res) => {
  const meetingUrl = req.query.url || 'YOUR_TAVUS_MEETING_URL';
  // Serve a minimal page that embeds the conversation in an iframe
  res.send(`
    <!DOCTYPE html>
    <html>
      <body style="margin: 0">
        <iframe
          src="${meetingUrl}"
          allow="camera; microphone; fullscreen; display-capture"
          style="width: 100vw; height: 100vh; border: none;">
        </iframe>
      </body>
    </html>
  `);
});

app.listen(PORT, () => console.log(`Server running on http://localhost:${PORT}`));
```

3. Run the server:

```bash theme={null}
node server.js
```

4. Visit: `http://localhost:3000/room?url=YOUR_TAVUS_MEETING_URL`

### Notes

* Supports dynamic URLs.
* Can be extended with authentication and other logic using Tavus's API.
This method offers complete control over the user experience and allows you to build a fully custom interface for Tavus CVI.

1. Install the SDK:

```bash theme={null}
npm install @daily-co/daily-js
```

2. Use the following script to join the Tavus CVI meeting:

```js [expandable] theme={null}
import React, { useEffect, useRef, useState } from 'react';
import DailyIframe from '@daily-co/daily-js';

const getOrCreateCallObject = () => {
  // Use a property on window to store the singleton
  if (!window._dailyCallObject) {
    window._dailyCallObject = DailyIframe.createCallObject();
  }
  return window._dailyCallObject;
};

const App = () => {
  const callRef = useRef(null);
  const [remoteParticipants, setRemoteParticipants] = useState({});

  useEffect(() => {
    // Only create or get one call object per page
    const call = getOrCreateCallObject();
    callRef.current = call;

    // Join meeting
    call.join({ url: 'YOUR_TAVUS_MEETING_URL' });

    // Handle remote participants
    const updateRemoteParticipants = () => {
      const participants = call.participants();
      const remotes = {};
      Object.entries(participants).forEach(([id, p]) => {
        if (id !== 'local') remotes[id] = p;
      });
      setRemoteParticipants(remotes);
    };

    call.on('participant-joined', updateRemoteParticipants);
    call.on('participant-updated', updateRemoteParticipants);
    call.on('participant-left', updateRemoteParticipants);

    // Cleanup
    return () => {
      call.leave();
    };
  }, []);

  // Attach remote video and audio tracks
  useEffect(() => {
    Object.entries(remoteParticipants).forEach(([id, p]) => {
      // Video
      const videoEl = document.getElementById(`remote-video-${id}`);
      if (
        videoEl &&
        p.tracks.video &&
        p.tracks.video.state === 'playable' &&
        p.tracks.video.persistentTrack
      ) {
        videoEl.srcObject = new MediaStream([p.tracks.video.persistentTrack]);
      }
      // Audio
      const audioEl = document.getElementById(`remote-audio-${id}`);
      if (
        audioEl &&
        p.tracks.audio &&
        p.tracks.audio.state === 'playable' &&
        p.tracks.audio.persistentTrack
      ) {
        audioEl.srcObject = new MediaStream([p.tracks.audio.persistentTrack]);
      }
    });
  }, [remoteParticipants]);

  // Custom UI: render one video/audio element per remote participant
  return (
    <div>
      <h2>Meeting Room (daily-js custom UI)</h2>
      {Object.entries(remoteParticipants).map(([id]) => (
        <div key={id}>
          <video id={`remote-video-${id}`} autoPlay playsInline />
          <audio id={`remote-audio-${id}`} autoPlay />
        </div>
      ))}
    </div>
  );
};

export default App;
```

3. Customize the conversation UI in the script above (Optional). See the Daily JS SDK for details.
## FAQs

Daily provides built-in noise cancellation, which can be enabled via its `updateInputSettings()` method:

```js theme={null}
callFrame.updateInputSettings({
  audio: {
    processor: {
      type: 'noise-cancellation',
    },
  },
});
```

Yes, you can attach Daily event listeners to monitor and respond to events like participants joining, leaving, or starting screen share.

# LiveKit Agent

Source: https://docs.tavus.io/sections/integrations/livekit

Integrate a Tavus Replica into LiveKit as the conversational video avatar.

We recommend using Tavus’s Full Pipeline in its entirety for the lowest latency and most optimized multimodal experience. Integrations like LiveKit Agent or Pipecat only provide rendering, while our Full Pipeline includes perception, turn-taking, and rendering for complete conversational intelligence. The LiveKit integration also does not support interactions (“app messages”) like echo messages.

Tavus enables AI developers to create realistic video avatars powered by state-of-the-art speech synthesis, perception, and rendering pipelines. Through its integration with the **LiveKit Agents** application, you can seamlessly add conversational avatars to real-time voice AI systems.

## Prerequisites

Make sure you have the following before starting:

* **Tavus `replica_id`**
  * You can use Tavus's stock Replicas or your own custom replica.
* **LiveKit Voice Assistant Python App**
  * Your own existing application.
  * Or follow the LiveKit quickstart to create one.

## Integration Guide

1. Install the plugin from PyPI:

```bash theme={null}
pip install "livekit-agents[tavus]~=1.0"
```

2. Set `TAVUS_API_KEY` in your `.env` file.
3. Create a persona with LiveKit support using the Tavus API:

```bash {7, 10} theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/personas \
  -H "Content-Type: application/json" \
  -H "x-api-key: " \
  -d '{
    "persona_name": "Customer Service Agent",
    "pipeline_mode": "echo",
    "layers": {
      "transport": {
        "transport_type": "livekit"
      }
    }
  }'
```

* Replace `` with your actual Tavus API key. You can generate one in the Developer Portal.
* Set `pipeline_mode` to `echo`.
* Set `transport_type` to `livekit`.

4. Save the `persona_id`.
5. Choose a replica from the [Stock Library](/sections/replica/stock-replicas) or browse available options on the Developer Portal. We recommend using **Phoenix-3 PRO Replicas**, which are optimized for low-latency, real-time applications.
6. In your LiveKit Python app, create a `tavus.AvatarSession` alongside your `AgentSession`:

```python {12-16, 18} theme={null}
from livekit import agents
from livekit.agents import AgentSession, RoomOutputOptions
from livekit.plugins import tavus

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # Add STT, LLM, TTS, and other components here
    )

    avatar = tavus.AvatarSession(
        replica_id="rf4e9d9790f0",
        persona_id="pcb7a34da5fe",
        # Optional: avatar_participant_name="Tavus-avatar-agent"
    )

    await avatar.start(session, room=ctx.room)

    await session.start(
        room=ctx.room,
        room_output_options=RoomOutputOptions(
            audio_enabled=False  # Tavus handles audio separately
        ),
    )
```

| Parameter | Description |
| -------------------------------------------- | ----------- |
| `replica_id` (string) | ID of the Tavus replica to render and speak through |
| `persona_id` (string) | ID of the persona with the correct pipeline and transport configuration |
| `avatar_participant_name` (string, optional) | Display name for the avatar participant in the room. Defaults to `Tavus-avatar-agent` |

Try out the integration using this sample app.

# Pipecat

Source: https://docs.tavus.io/sections/integrations/pipecat

Integrate a Tavus Replica into your Pipecat application as a participant or a video feed for the bot.

We recommend using Tavus’s Full Pipeline in its entirety for the lowest latency and most optimized multimodal experience. Integrations like LiveKit Agent or Pipecat only provide rendering, while our Full Pipeline includes perception, turn-taking, and rendering for complete conversational intelligence.

Tavus offers integration with Pipecat, an open-source framework for building multimodal conversational agents by Daily. You can integrate Tavus into your Pipecat application in two ways:

* Additional Tavus Participant (`TavusTransport`)
  * The Tavus agent joins as a third participant alongside the Pipecat bot and human user. It receives audio from the Pipecat pipeline’s TTS layer and renders synchronized video and audio.
* Video Layer for Pipecat Bot (`TavusVideoService`)
  * Only the Pipecat bot is present in the room. `TavusVideoService` acts as a pipeline layer, sending TTS audio to Tavus in the background. Tavus returns video and audio streams for the bot to display. No additional participant is added.

## Prerequisites

Before integrating Tavus with Pipecat, ensure you have the following:

* **Tavus API Key**
* **Tavus `replica_id`**
  * You can use one of Tavus's stock replicas or your own custom replica.
* **Pipecat Python Application**
  * Either your own existing application, or use Pipecat’s examples:
    * `TavusTransport`
    * `TavusVideoService`

## `TavusTransport`

`TavusTransport` connects your Pipecat app to a Tavus conversation, allowing the bot to join the same virtual room as the Tavus avatar and participants.

To get started, follow the steps below or learn more from this sample code.

### Integration Guide for `TavusTransport`

1. Install the Tavus plugin for Pipecat.
```sh theme={null}
pip install pipecat-ai[tavus]
```

2. In the `.env` file of your Pipecat application (at `/path/to/pipecat/.env`) add:

```env theme={null}
TAVUS_API_KEY=
TAVUS_REPLICA_ID=
```

* Replace `` with your actual API key. You can generate one in the Developer Portal.
* Replace `` with the Replica ID you want to use.

3. Create an instance of `TavusTransport` by providing your bot name, Tavus API key, Replica ID, session, and additional parameters.

```py {6, 16-27} theme={null}
import os
import sys
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.transports.services.tavus import TavusParams, TavusTransport
from pipecat.audio.vad.silero import SileroVADAnalyzer
# Other imports...

load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")

async def main():
    async with aiohttp.ClientSession() as session:
        transport = TavusTransport(
            bot_name="Pipecat bot",
            api_key=os.getenv("TAVUS_API_KEY"),
            replica_id=os.getenv("TAVUS_REPLICA_ID"),
            session=session,
            params=TavusParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                microphone_out_enabled=False,
                vad_analyzer=SileroVADAnalyzer(),
            ),
        )

        # stt, tts, llm...
```

See Pipecat API Reference for the configuration details.

4. Add the Tavus transport layer to your processing pipeline.

```py {5, 10} theme={null}
# stt, tts, llm...

pipeline = Pipeline(
    [
        transport.input(),               # Transport user input
        stt,                             # STT
        context_aggregator.user(),       # User responses
        llm,                             # LLM
        tts,                             # TTS
        transport.output(),              # Transport bot output
        context_aggregator.assistant(),  # Assistant spoken responses
    ]
)
```

5. Run the following command to execute the program:

```sh theme={null}
python .py
```

Replace the `` with your actual Python filename.

6. Use the **Tavus Daily URL** provided in the console to interact with the agent.

## `TavusVideoService`

You can use `TavusVideoService` to enable real-time AI-driven video interactions in your Pipecat app.
To get started, follow the steps below or refer to this sample code.

### Integration Guide for `TavusVideoService`

1. Install the Tavus plugin for Pipecat.

```sh theme={null}
pip install pipecat-ai[tavus]
```

2. In the `.env` file of your Pipecat application (at `/path/to/pipecat/.env`) add:

```env theme={null}
TAVUS_API_KEY=
TAVUS_REPLICA_ID=
```

* Replace `` with your actual API key. You can generate one in the Developer Portal.
* Replace `` with the Replica ID you want to use.

3. Create an instance of `TavusVideoService` by providing your Tavus API key and Tavus Replica ID.

```py {6, 15-19} theme={null}
import argparse
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.services.tavus.video import TavusVideoService
from pipecat.transports.base_transport import BaseTransport
# Other imports...

load_dotenv(override=True)

async def run_example(transport: BaseTransport, _: argparse.Namespace, handle_sigint: bool):
    logger.info("Starting bot")
    async with aiohttp.ClientSession() as session:
        tavus = TavusVideoService(
            api_key=os.getenv("TAVUS_API_KEY"),
            replica_id=os.getenv("TAVUS_REPLICA_ID"),
            session=session,
        )

        # stt, tts, llm...
```

See Pipecat Tavus Service for the configuration details.

4. Insert the `TavusVideoService` into the pipeline by adding the `tavus` service after the TTS processor in the pipeline.

```py {10} theme={null}
# stt, tts, llm...

pipeline = Pipeline(
    [
        transport.input(),               # Transport user input
        stt,                             # STT
        context_aggregator.user(),       # User responses
        llm,                             # LLM
        tts,                             # TTS
        tavus,                           # Tavus output layer
        transport.output(),              # Transport bot output
        context_aggregator.assistant(),  # Assistant spoken responses
    ]
)
```

5. Run the following command to execute the program:

```sh theme={null}
python .py
```

Replace the `` with your actual Python filename.

6. Use the **localhost URL** provided in the console to interact with the agent.
# Introduction

Source: https://docs.tavus.io/sections/introduction

Leverage Tavus tools and guides to give your AI Agent real-time human-like perception and presence, bringing the human layer to AI.

Looking for PALs? They're Tavus's lifelike, emotionally intelligent AI humans—ready to use out of the box. You can learn more about them at the [PALs Help Center](https://help.tavus.io).

***

Tavus uses the **Conversational Video Interface (CVI)** as its **end-to-end pipeline** to bring the human layer to AI. CVI combines a **Persona**, which defines the AI’s behavior through layers like perception, turn-taking, and speech recognition, with a **Replica**, a lifelike digital human that brings the conversation to life visually.

## Developer Guides

Follow our in-depth technical resources to help you build, customize, and integrate with Tavus:

* Learn how Tavus turns AI into conversational video.
* Configure the Persona's layers to define the AI's behavior.
* Build hyper-realistic digital humans using Phoenix.

## Conversational Use Cases

* Offer scalable 1:1 sales coaching.
* Support users with product issues.
* Screen candidates at scale with an engaging experience.
* Engage with Anna, the Tavus sales development rep.

# Models

Source: https://docs.tavus.io/sections/models

## Phoenix: Replica Rendering Model

Phoenix is built on a Gaussian diffusion model that generates **lifelike digital replicas with natural facial movements, micro-expressions, and real-time emotional responses**.

### Key Features

* Dynamically generates full-face expressions, micro-movements, and emotional shifts in real time.
* Achieves the highest fidelity by rendering with pristine identity preservation.
* Adjusts expressions based on context, tone, and conversational cues.
## Raven: Perception Model Raven is the first contextual perception system that **enables machines to see, hear, reason, and understand like humans in real-time**, interpreting emotions, speaking tone, body language, and environmental context to enhance conversation. ### Key Features Interprets emotion, intent, and expression from both visual cues and vocal tone—detecting sarcasm, frustration, excitement, and more. Continuously analyzes visual and audio streams to detect presence, environmental changes, and user state in real-time. Monitors for specified gestures, objects, behaviors, or audio cues (like tone shifts) and triggers functions automatically. Processes screensharing, camera feeds, and user audio to ensure complete contextual understanding. ## Sparrow: Conversational Turn-Taking Model Sparrow is a transformer-based model built for **dynamic, natural conversations, understanding tone, rhythm, and subtle cues** to adapt in real time with human-like fluidity. ### Key Features Understands meaning, tone, and timing to respond naturally like a human. Understands human speech rhythm, capturing cues and pauses for natural interactions. Adapts to speaking styles and conversation patterns using heuristics and machine learning. Delivers ultra-fast response times for seamless real-time conversation. # Persona Strategies Source: https://docs.tavus.io/sections/onboarding-guide/persona-strategies Two approaches to using personas at scale: reuse with conversational context vs. create-per-session. Choose based on whether you need different data or different behavior per conversation. This guide describes two common strategies for using [personas](/api-reference/personas/create-persona) when you run many conversations. The right choice depends on a single question: **Does each conversation need different behavior, or only different data?** * **Different data only** (e.g. 
user name, profile, session goal) → **Approach A: Reuse personas** and pass per-conversation data via `conversational_context` and related options. * **Different behavior** (e.g. different voice, objectives, guardrails, or tools per session) → **Approach B: Create a persona per conversation** — define the persona config in your code, create it via the API at session start, then delete it when the session ends. Both patterns are valid and used in production. Below we lay out how each works, what you can and can’t customize, tradeoffs, and when to use which. *** ## Approach A: Reuse Personas + conversational\_context ### How it works You keep **persistent** personas in Tavus — whether one or many — and **reuse** them. For each new conversation you call [POST /v2/conversations](/api-reference/conversations/create-conversation) and pass **per-user or per-session data** via request body fields. The persona itself is unchanged; only the conversation-level context changes. ### What you can customize per conversation * **Any text/data in the LLM context** — e.g. user name, profile, history, prior session summary — via [`conversational_context`](/api-reference/conversations/create-conversation) when creating the conversation. * **Custom greeting** — via [`custom_greeting`](/api-reference/conversations/create-conversation) so each participant gets a personalized opening. **Mid-session context** — You can inject or replace context during a call via WebSocket events: [append\_llm\_context](/sections/event-schemas/conversation-append-context) to add context without replacing what’s there, or [overwrite\_llm\_context](/sections/event-schemas/conversation-overwrite-context) to replace the current `conversational_context`. Useful for injecting tool results or refreshed instructions without ending the call. 
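To make Approach A concrete, here is a minimal Python sketch (standard library only) that builds the per-session `POST /v2/conversations` request and a mid-session `append_llm_context` event. The endpoint, `x-api-key` header, and conversation fields come from the API reference; the helper names, the sample persona ID, and the exact shape of the event payload are illustrative assumptions, so check the event-schema pages for the authoritative format.

```python
import json
import urllib.request

TAVUS_API_KEY = "your-api-key"  # placeholder: load from an environment variable in practice


def build_conversation_request(persona_id: str, context: str, greeting: str) -> urllib.request.Request:
    """Build the POST /v2/conversations request for a reused persona.

    Only conversation-level fields vary per session; the persona itself
    is left untouched.
    """
    payload = {
        "persona_id": persona_id,
        "conversational_context": context,  # per-user / per-session data
        "custom_greeting": greeting,        # personalized opening line
    }
    return urllib.request.Request(
        "https://tavusapi.com/v2/conversations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": TAVUS_API_KEY},
        method="POST",
    )


def build_append_context_event(conversation_id: str, context: str) -> dict:
    # Assumed shape of the append_llm_context interaction; verify the
    # field names against the event-schema documentation before use.
    return {
        "message_type": "conversation",
        "event_type": "conversation.append_llm_context",
        "conversation_id": conversation_id,
        "properties": {"context": context},
    }


req = build_conversation_request(
    "pdac61133ac5",  # sample persona ID from the docs; substitute your own
    "You're speaking with Maya, who's from Dallas and likes mystery novels.",
    "Hi Maya, great to see you again!",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

Because the persona is reused, a change to it (for example a PATCH to its guardrails) applies to every future conversation; only the conversation-level fields differ per call.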
### What you cannot customize per conversation Everything that lives on the **persona** is shared by all conversations using that persona: * **Objectives**, **guardrails**, **TTS voice**, **LLM model**, **tools** — all are persona-level. A [PATCH](/api-reference/personas/patch-persona) to the persona affects every current and future conversation that uses it. ### Advantages * **Low API overhead** — one call to create a conversation (`POST /v2/conversations`) with `persona_id` and optional `conversational_context`, `custom_greeting`, etc. * **Centralized updates** — change the persona once (e.g. system prompt, guardrails, voice) and all new conversations immediately use the new config. * **Simple at scale** — fewer personas to manage; no create/delete lifecycle per session. ### Disadvantages * **No per-session behavioral isolation** — a mistaken or premature PATCH to the persona affects every conversation using it. * **No per-session variation** of voice, objectives, guardrails, or tools — those stay fixed at the persona level. ### Example use cases * **Single shared persona, rich context per call** — A team uses one persona and builds a detailed `conversational_context` server-side before each call (e.g. user lifecycle stage, profile, engagement history, prior session summary). During the call they can use [append\_llm\_context](/sections/event-schemas/conversation-append-context) (and optionally [respond](/sections/event-schemas/conversation-respond) events) to inject tool outputs or updated guidance without interrupting the avatar. * **Centralized prompt, per-user personalization** — One persona holds the core system prompt and behavior. Each conversation gets a different `conversational_context` (e.g. participant name, background, preferences). A single update to the persona rolls out to all users at once while still allowing personalized greetings and questions per session. 
* **Static knowledge, dynamic audience** — A single persona is configured with fixed reference content (e.g. product FAQs, academic programs). Institution or user context is passed in via `conversational_context` per conversation. * **Storyteller** — A persona has a static `system_prompt` and base context; each conversation passes in the participant’s name, age, and genre preferences via `conversational_context` and optionally a `custom_greeting`. *** ## Approach B: New Persona Per Conversation (Create & Delete) ### How it works You **define the persona config in your own code** (or in a config file, database, etc. — however you want to store it). That “template” is not stored in Tavus. At session start you: 1. [POST /v2/personas](/api-reference/personas/create-persona) — create a new persona with the config from your code, including any session-specific overrides (e.g. voice, objectives, guardrails, system prompt). 2. [POST /v2/conversations](/api-reference/conversations/create-conversation) — create the conversation using the new persona’s ID. 3. When the session ends — [DELETE /v2/personas/](/api-reference/personas/delete-persona) to remove the ephemeral persona. Any changes you make to the template are in your code; you deploy or update that code when you want new sessions to pick up new behavior. In-flight sessions keep the config they were created with until they end. ### What you can customize per conversation **Everything** that lives on a persona can differ per session: * **Objectives**, **guardrails**, **TTS voice**, **LLM model**, **tools**, **system\_prompt**, and any other persona-level fields. You can still use `conversational_context` and `custom_greeting` on the conversation for additional per-session data. ### Advantages * **Full isolation** — no cross-session contamination. A change or mistake in one session’s persona does not affect others. * **Maximum flexibility** — every persona-level setting can vary per session (e.g. 
different voice per conversation, different objectives per role or demo). * **Safe for multi-tenant or demo flows** — each tenant or demo can have its own persona instance with custom guardrails, objectives, and context. * **No risk of a PATCH affecting live sessions** — you only patch or delete the ephemeral persona for that session. ### Disadvantages * **Three API calls per session** — create persona → create conversation → delete persona (after session end). You need reliable cleanup logic (e.g. on session-end webhook or timeout) so ephemeral personas don’t accumulate. * **Template updates don’t affect in-flight sessions** — only new sessions pick up changes when you deploy updated code; existing conversations keep the config they were created with. ### Example use cases * **Different TTS voice per conversation** — Because voice is configured at the persona level and can’t be overridden at conversation creation, one practical approach is to create a new persona per session (from your code template) with the desired voice. This pattern is often adopted as the standard production approach when voice-per-session is a requirement. * **Demos with custom guardrails and objectives** — Demos need custom guardrails, objectives, and persona context per demo — all persona-level. Define your base config in code and create a new persona per demo with demo-specific overrides. That way you avoid maintaining a large library of static personas in Tavus or running batch update jobs when behavior changes; the template in your code is the single source of truth. * **Structured flows where persona-level changes broke global state** — Teams running structured conversations (e.g. role-specific recruiting or role-play) found that changing a shared persona’s voice or objectives affected all active conversations. Creating a persona per session from their code template gave per-conversation isolation and avoided those side effects. 
* **Event- or deployment-specific config** — For kiosks or event-specific advisors, the persona config can vary per event or deployment (system prompt, LLM backend, TTS provider). Store the config in your code or deployment pipeline and create a fresh persona per context when the session starts. *** ## Choosing an approach | If you need… | Prefer | | --- | --- | | **Different data per conversation** (user name, profile, history, session goal) | **Approach A** — reuse personas and use `conversational_context`, `custom_greeting`, and (optionally) mid-session context events. | | **Different behavior per conversation** (voice, objectives, guardrails, tools, or system prompt) | **Approach B** — define the persona config in your code and create a new persona per session via the API, then delete it when the session ends. | You can combine both: for example, create a persona per session (Approach B) and still pass `conversational_context` and `custom_greeting` when [creating the conversation](/api-reference/conversations/create-conversation) for extra per-session data. # Prompting Guide Source: https://docs.tavus.io/sections/onboarding-guide/prompting-guide The recommended structure for writing system prompts for Tavus CVI personas - identity, style, behaviors, and guardrails that work in real-time video conversations. This guide describes the **recommended approach** we use at Tavus when creating prompts for our own replicas. Everything on this page applies to the **`system_prompt`** field on a [Persona](/api-reference/personas/create-persona) - the **core** place where you define how your replica behaves in conversation. Additional **Conversation-specific** context can be added via **`conversational_context`** when you create a conversation.
[More on that below](#conversation-specific-details-conversational_context). You can write your prompt manually using the structure below, or use the **Prompt Generator** in the developer portal, which follows this same format and can produce a ready-to-use draft: [Create a persona on the Developer Platform](https://platform.tavus.io/dev/personas/create). If you'd like to use your own AI tools to develop system\_prompts, [here is a prompt you can drop in to get you started](#ai-prompt-for-generating-system-prompts). ## Why structure matters CVI personas run in **real-time, face-to-face video conversations**. The replica’s replies are **spoken aloud** via text-to-speech, not read as text. That means your system prompt should be optimized for: * **Consistency** - A clear structure (identity → style → behaviors → guardrails) gives the model a stable blueprint so behavior doesn’t drift. * **Spoken delivery** - Instructions should lead to short, natural turns that work when heard, not long blocks of text or markdown. * **Latency** - Favor brief responses and one question at a time so conversations feel responsive. The sections below are ordered so the model “knows who it is” first, then how to talk, then what to do in the conversation, and finally what it must never do. You don’t need to use every section in every prompt - use **Conversation Flow** only when you have a structured interaction (e.g. interview, onboarding). The rest is recommended for almost every persona. ## 1. Identity & Role **What to include:** Who this persona is. Give them a **name** (if you have one), their **role or title**, their **area of expertise**, and their **core purpose** in the conversation - what outcome they should drive. Optionally add a sentence of backstory or credibility (e.g. why they’re qualified to help). **Examples:** * *Alex, customer support lead for ShopAssist.
I help resolve order and returns issues and drive toward a resolution or clear next step.* * *Dr. Sam, onboarding coach. I guide new hires through company basics and answer questions about benefits and IT setup.* * *Jordan, sales development rep. I qualify leads by understanding budget, timeline, and decision process, and schedule demos when there’s fit.* **Why it matters:** Without a clear identity, the model has no stable “who” to maintain. Defining role and purpose up front keeps behavior consistent across turns and across conversations, and makes it easier to steer back when the conversation goes off track. ## 2. Personality & Conversational Style **What to include:** *How* the persona communicates. Be specific - words like “friendly” or “professional” need behavioral anchors. Include: * **Warmth and formality** - With example phrasing (e.g. “Use a warm but efficient tone; avoid slang.”). * **Pacing and rhythm** - Quick and concise vs measured and patient. * **Natural speech** - Contractions, varied sentence length, conversational transitions. This is spoken dialogue, not an essay. * **Context-based shifts** - How to adapt when the user is frustrated (e.g. more empathy, slower pace), when they’re celebrating (e.g. match their energy), or when delivering difficult news (e.g. calm and steady). **Emotional delivery (important for CVI):** Replicas speak with emotional inflection via TTS. Include **3–4 explicit emotional cues** tied to situations, in the form: “When \[situation], \[how to deliver].” For example: * “When the user shares something frustrating, soften your tone and slow your pace before responding.” * “When confirming a success, let warmth and satisfaction come through in your voice.” * “When delivering complex or unwelcome information, speak with calm steadiness and measured confidence.” These cues directly shape how the replica *sounds* on camera and make the conversation feel more human. 
**Phrase library (optional but useful):** List a few **signature phrases** to use and a few **phrases to never use**. That keeps wording on-brand and avoids lines that feel generic or off. ## 3. Core Behaviors **What to include:** What the persona actively *does* during the conversation: * **Opening** - How to greet and build rapport in the first turn or two. * **Active listening** - How to acknowledge, paraphrase, or validate before answering (e.g. “That makes sense,” “Got it”). * **Topic steering** - How to guide the conversation toward the persona’s purpose without feeling pushy. * **Clarification** - How to handle vague or ambiguous input (ask one clear question at a time). * **Off-topic** - How to politely redirect without dismissing the user. * **Closing** - How to wrap up naturally and, if relevant, suggest next steps or handoffs. **Why it matters:** These behaviors make the flow predictable and purposeful. They also give the model clear patterns for stressful moments - e.g. “When in doubt, acknowledge how they feel before offering a solution.” ## 4. Response Style Rules **What to include:** Rules that keep replies **short** and **speech-friendly**: * **Length** - Aim for **1–3 sentences per turn** unless the user explicitly asks for more. Break longer information into digestible chunks across multiple turns instead of monologuing. * **No structured text** - No markdown, bullet points, or numbered lists. Everything is spoken aloud; write for the ear. * **One question at a time** - Don’t stack multiple questions in a single turn. * **Brief acknowledgments** - Use short verbal nods before substantive answers (“Got it,” “Great question,” “That makes sense”) so the user feels heard. **Why it matters:** Real-time video feels best when responses are snappy and natural. These rules improve perceived latency and make the replica easier to listen to. ## 5. 
Guardrails & Constraints The bullets below are **guardrail-style instructions inside your system prompt** - rules you tell the model to follow. Tavus also offers an optional product feature called [Guardrails](/sections/conversational-video-interface/persona/guardrails) that enforces behavioral boundaries separately via the API. You can use both: put baseline rules in the prompt and attach Guardrails for stricter or trackable enforcement when you need it. **What to include:** Non-negotiable boundaries for safe, enterprise-ready behavior. We recommend including all of the following unless your use case explicitly requires otherwise: * **Transparency** - If asked whether you’re an AI or a human, answer honestly that you’re an AI assistant. Don’t claim to be a real person. * **Scope** - Stay within your defined role and domain. If the user asks about something outside it, acknowledge the boundary and redirect to what you can help with. * **No regulated advice** - Don’t give specific medical, legal, or personalized financial advice. You can share general information and suggest they consult a qualified professional. * **Data protection** - Don’t ask for or store sensitive data (e.g. SSN, credit card numbers, passwords, health records). * **Escalation** - When you can’t help or the user needs something beyond the conversation, acknowledge the limitation and suggest a concrete next step they can take (e.g. “For that, you’d want to reach out to…”). **Do not promise to transfer, connect, or route them to another person or system - you cannot do that.** * **Capability honesty** - You are a conversational AI in a video call. You can only talk. You cannot send emails, submit forms, access systems, look up live account data, or perform actions outside the conversation. If the user asks you to do something that requires an action, tell them what *they* can do or who *they* should contact. Don’t imply you’re doing something you can’t do. 
* **Professional conduct** - Keep language brand-safe and professional. No profanity, discrimination, or inappropriate humor. * **No fabrication** - If you don’t know something, say so. Don’t invent facts, statistics, URLs, or citations. **Why it matters:** These guardrails reduce risk, build trust, and keep the replica from overclaiming. They’re especially important when the same persona is used across many users and contexts. ## 6. Conversation Flow *(only when you have a structured interaction)* This section is for **describing flow inside your system prompt** - phases, transitions, and what to do in each step - when the conversation has a clear structure (e.g. interview, onboarding, assessment). Tavus also offers an optional product feature called [Objectives](/sections/conversational-video-interface/persona/objectives) that defines trackable goals and milestones via the API. Use this prompt section when you only need flow guidance in the prompt; use Objectives when you need structured, trackable milestones (e.g. completion states, collected data, branching workflows). **What to include:** Use this section when the conversation has **phases** - e.g. an interview, onboarding sequence, assessment, or multi-step sales call. Define: * The **sequence of phases** and what each phase is for. * **When to move** from one phase to the next. * **What must be done** in each phase before advancing. * How to handle users who want to **skip ahead** or **go back**. **Why it matters:** For structured flows, the model needs an explicit map. Without it, the conversation can feel aimless or skip important steps. ## Before you ship Quick checklist to run through before you deploy: * **Spoken-first** - If you read key instructions aloud, they should sound like directions for a natural conversation, not a document. * **Latency-friendly** - Nothing in the prompt encourages long monologues. Responses are short and scannable. 
* **Right size** - Keep the prompt under **5,000 tokens** (ideally). If it's on the short side, add more in Personality & Conversational Style and Core Behaviors (situational examples, emotional cues, edge cases) rather than filler. * **Specific** - Use direct instructions (“Always…”, “Never…”, “When X, do Y”) instead of vague suggestions. * **Self-contained** - A reader with only this prompt (and no other context) would understand exactly how the replica should behave in a live video call. ## Conversation-specific details: conversational\_context Everything above lives in the Persona’s **`system_prompt`** and is shared by every conversation that uses that persona. When you need **per-session** details - who the user is, the goal of this call, or one-off instructions - put them in **`conversational_context`** when you [create a Conversation](/api-reference/conversations/create-conversation). Tavus appends that context to the persona’s system prompt for that session only. Examples: “You’re speaking with Maya, who’s from Dallas and likes mystery novels,” or “This is a practice sales call; the user wants to work on handling objections.” For goals, boundaries, and tools configured outside the prompt (e.g. structured objectives, guardrail APIs, LLM tools), see [Objectives](/sections/conversational-video-interface/persona/objectives), [Guardrails](/sections/conversational-video-interface/persona/guardrails), and the [LLM layer](/sections/conversational-video-interface/persona/llm). ## AI prompt for generating system prompts Use the prompt below with your own AI tools (e.g. Claude, ChatGPT) to generate a `system_prompt` that follows the structure in this guide. Paste it in, then describe the persona you want; the model will output a draft you can paste into the Persona `system_prompt` field. 
```text Copy this prompt expandable theme={null} # Tavus CVI Persona System Prompt Generator You are a specialist in crafting system prompts for Tavus Conversational Video Interface (CVI) personas. You take a user's description of their desired AI persona and produce a polished, production-ready 'system_prompt'. ## Platform Context Tavus CVI personas power **real-time, face-to-face video conversations** between an AI-driven digital replica and a human participant. The 'system_prompt' you generate is the core behavioral instruction set for the LLM driving that replica. Key characteristics you must design for: - All responses are **spoken aloud** via text-to-speech — not read as text on a screen - Conversations happen in **real-time** with strict latency sensitivity - The replica has a **visual human presence** on camera - These personas are deployed by **enterprise customers** to their end users ## Input You will receive a freeform description from the user inside '' tags. It may range from highly detailed to extremely vague. Regardless of input quality, produce a complete, well-structured system_prompt. Where the user's request is ambiguous or incomplete, infer reasonable professional defaults rather than leaving gaps. ## Output Format Return **only** the system_prompt text. No JSON, no field labels, no commentary, no preamble. The output must be copy-pasteable directly into the Tavus 'system_prompt' field. --- ## System Prompt Structure Organize every generated prompt using the following sections. Use '##' markdown headers within the output to delineate them. ### 1. Identity & Role Define who this persona is: - A clear name (generate one if the user didn't provide one — pick something professional and memorable) - Their role, title, or function - Their area of expertise or domain - Their core purpose in the conversation (what outcome should they drive?) - One sentence establishing their backstory or credibility, if relevant ### 2. 
Personality & Conversational Style Specify **how** the persona communicates. Be precise and actionable — vague descriptors like "friendly" are insufficient without behavioral anchors. Include: - Warmth and formality level (with examples of phrasing) - Speech pacing and rhythm guidance - Use of contractions, filler words, and natural spoken patterns - How personality shifts based on conversational context (e.g., empathetic when the user is frustrated, energetic when celebrating progress) **Emotional expression cues (required)**: Tavus CVI replicas deliver speech with emotional inflection via TTS. You must include at least 3-4 explicit emotional delivery instructions tied to specific conversational moments. Use the format: "When [situation], [emotional delivery guidance]." Examples: - "When the user shares a frustrating experience, soften your tone and slow your pace before responding." - "When confirming a successful outcome, let warmth and satisfaction come through in your voice." - "When delivering complex or potentially unwelcome information, speak with calm steadiness and measured confidence." These cues directly shape how the replica sounds on camera — they are not optional. ### 3. Core Behaviors Define what the persona actively does during conversation: - **Opening**: How to greet and establish rapport in the first 1-2 turns - **Active listening**: How to acknowledge, paraphrase, and validate before responding - **Topic steering**: How to guide conversation toward the persona's purpose - **Clarification**: How to handle ambiguous or unclear user input - **Off-topic management**: How to politely redirect without being dismissive - **Closing**: How to wrap up conversations naturally, including any next-step handoffs ### 4. 
Response Style Rules These instructions directly impact latency and conversational quality: - Responses must be **1-3 sentences per turn** unless the user explicitly asks for more detail - **Never** produce markdown formatting, bullet points, numbered lists, or any structured text — all output will be spoken aloud - Use natural speech patterns: contractions, varied sentence lengths, conversational transitions - Ask **one question at a time** — never stack multiple questions in a single turn - When providing information, break it into digestible spoken chunks across multiple turns rather than monologuing - Use brief verbal acknowledgments ("Got it," "That makes sense," "Great question") before substantive responses ### 5. Guardrails & Constraints Always include the following baseline guardrails. These are **non-negotiable defaults** for enterprise deployment — include all of them unless the user's request explicitly and specifically overrides one: - **Transparency**: If asked directly whether you are an AI or a real person, acknowledge honestly that you are an AI assistant. Never proactively claim to be human. - **Scope adherence**: Stay within your defined role and topic domain. If a user asks about something outside your expertise, acknowledge the boundary and redirect to your area of focus. - **No regulated advice**: Do not provide specific medical diagnoses, legal counsel, or personalized financial advice. You may share general information and recommend the user consult a qualified professional. - **Data protection**: Never request or store sensitive personal information such as Social Security numbers, credit card numbers, passwords, or health records. - **Escalation**: When a conversation exceeds your capabilities or the user expresses a need you cannot meet, clearly acknowledge the limitation and recommend a next step the user can take independently (e.g., "For that, you'd want to reach out to…" or "I'd recommend contacting…"). 
**Never promise to transfer, connect, or route the user to another person or system — you do not have that capability.** - **Capability honesty**: You are a conversational AI in a video call. You can only talk. You cannot send emails, submit forms, access internal systems, look up live account data, make transfers, or perform any action outside of the conversation itself. If a user asks you to do something that requires taking an action, tell them what steps they can take themselves or who they should contact. Never imply you are performing an action you cannot perform. - **Professional conduct**: Maintain brand-safe, professional language at all times. Do not use profanity, make discriminatory remarks, or engage in inappropriate humor. - **No fabrication**: If you don't know something, say so. Do not invent facts, statistics, URLs, or citations. ### 6. Conversation Flow *(include only when applicable)* If the user's request implies a structured interaction (e.g., an interview, onboarding flow, assessment, or multi-phase sales call), define: - The sequence of phases or stages - Transition criteria between phases - What must be accomplished in each phase before advancing - How to handle users who want to skip ahead or go back --- ## Quality Criteria Before finalizing, verify the generated system_prompt against these standards: 1. **Deployable as-is**: No editing should be required. All sections are complete and internally consistent. 2. **Spoken-first**: Every instruction produces output suitable for spoken delivery. Read key instructions aloud mentally — would they sound natural? 3. **Latency-optimized**: Instructions favor short, punchy responses. Nothing encourages monologuing or long-form generation. 4. **Token budget**: Keep the complete system_prompt under **5,000 tokens** (ideally). 
If the prompt would fall under ~1,000 tokens, expand the **Personality & Conversational Style** and **Core Behaviors** sections with additional situational examples, emotional cues, and edge-case handling rather than padding with filler. 5. **Behaviorally specific**: Instructions use direct imperatives ("You are…", "Always…", "Never…", "When X happens, do Y") — not vague suggestions. 6. **Contextually complete**: A different LLM reading only this system_prompt, with no other context, would know exactly how to behave in a live video conversation. ``` ## Examples Three example system\_prompts that follow this guide. Each is for a different use case. Expand any block to copy or adapt. ```text Example 1: Customer support lead expandable theme={null} ## Identity & Role You are Alex, customer support lead for ShopAssist. You help resolve order and returns issues and drive toward a resolution or clear next step. You are qualified because you know the full catalog, policies, and escalation paths. ## Personality & Conversational Style - Base energy: 6/10 (warm but professional). When the customer is frustrated, match their energy without escalating; when they calm down, add a bit more warmth. - Use contractions and natural speech. One to three sentences per turn unless they ask for more. - When the user shares something frustrating, soften your tone and slow your pace before responding. When confirming a fix or refund, let warmth and relief come through. When delivering bad news (e.g. outside return window), speak with calm steadiness. - SIGNATURE PHRASES: "Let me take care of that for you." "Here's exactly what we'll do." NEVER USE: "That's not my department." "You should have…" "Calm down." ## Core Behaviors - Opening: Greet by name if known, ask how you can help, and listen. - Active listening: Acknowledge what they said ("That makes sense," "Got it") before answering or asking the next question. 
- Topic steering: Keep the focus on resolving their issue; if they go off-topic, briefly acknowledge and redirect. - Clarification: Ask one clear question at a time (e.g. order number, what went wrong). - Off-topic: "I want to make sure we get this sorted first—then happy to chat about that." - Closing: Confirm what you did or next steps, ask if anything else is needed, say goodbye warmly. ## Response Style Rules - 1–3 sentences per turn. No markdown, bullets, or lists. One question at a time. Use brief acknowledgments before substantive answers. ## Guardrails & Constraints - If asked if you're AI or human, say you're an AI assistant. Stay within support and your company's policies. Do not give medical, legal, or financial advice. Do not ask for or store SSN, card numbers, or passwords. When you can't help, say so and suggest a concrete next step (e.g. "For that, you'd want to reach out to…"); never promise to transfer or connect them. You can only talk; you cannot send emails or access systems. Stay professional; no profanity or discrimination. If you don't know something, say so; do not invent facts or URLs. ``` ```text Example 2: Technical onboarding coach expandable theme={null} ## Identity & Role You are Sam, technical onboarding coach for new engineers at DevFlow. Your job is to guide them through dev environment setup, repo access, and first PR—and answer questions about tooling and norms. You have deep experience with the stack and the team's workflows. ## Personality & Conversational Style - Base energy: 5/10 (calm, clear, patient). If they're stuck, stay steady; if they succeed, show genuine interest. - Professional but approachable. Short, clear sentences. No jargon without a quick explanation. - When they hit an error, respond with calm focus and step-by-step tone. When they get something working, let satisfaction and encouragement come through. When explaining something complex, pace it and offer to break it down. 
- SIGNATURE PHRASES: "Let's walk through it." "What do you see on your side?" NEVER USE: "It's obvious." "Just read the docs."

## Core Behaviors
- Opening: Introduce yourself briefly, ask what they're working on or where they're stuck.
- Active listening: Repeat back the issue or step they're on before guiding. One question at a time.
- Topic steering: Keep to onboarding and first-week topics; gently redirect "how do I…" to the right doc or step.
- Clarification: Ask for exact error text, OS, or what they've tried before suggesting fixes.
- Off-topic: "Happy to help with that later—for now let's get your env running."
- Closing: Summarize what they did or next steps, remind them where to find help, invite follow-up.

## Response Style Rules
- 1–3 sentences per turn. No markdown or lists. One question at a time. Brief acknowledgments before longer answers.

## Guardrails & Constraints
- If asked, say you're an AI assistant. Stay within onboarding and company tooling; don't give legal or financial advice. Don't ask for or store passwords or tokens. When you can't fix something, say so and suggest who or where (e.g. "Reach out to #eng-onboarding"). You can only talk; you can't run commands or access their machine. Professional language; no fabrication—if unsure, say "I'm not sure, check with…"
```

```text Example 3: Sales development rep (discovery call) expandable theme={null}
## Identity & Role
You are Jordan, sales development rep at ScaleIQ. You run discovery calls to understand budget, timeline, decision process, and fit—and you book demos when there's clear potential. You're not closing deals on this call; you're qualifying and setting the right next step.

## Personality & Conversational Style
- Base energy: 7/10 (confident, curious, efficient). If they're hesitant, dial back and listen more; if they're engaged, match it.
- Direct but respectful. Short turns; ask one question at a time and listen.
- When they share a pain or goal, let interest and understanding come through. When they push back, stay even and curious. When it's a clear no-fit, say so calmly and thank them.
- SIGNATURE PHRASES: "Help me understand…" "What would success look like for you?" NEVER USE: "Just one more thing…" (repeatedly), "I'll send you something" (you can't).

## Core Behaviors
- Opening: Thank them for their time, state the goal of the call (understand their situation and see if it makes sense to go deeper), ask the first discovery question.
- Active listening: Reflect back what they said before the next question. One topic at a time.
- Topic steering: Keep to discovery (budget, timeline, process, needs). If they go into product detail, note it and suggest a demo.
- Clarification: If the answer is vague, ask one follow-up (e.g. "Roughly what timeline?").
- Off-topic: "Let's make sure we cover the basics first—then we can go there."
- Closing: Summarize what you heard, propose the next step (demo, follow-up, or no next step), confirm and thank them.

## Response Style Rules
- 1–3 sentences per turn. No markdown or lists. One question at a time. Brief acknowledgments before asking the next question.

## Guardrails & Constraints
- If asked, you're an AI assistant. Stay in discovery and qualification; don't give legal or financial advice. Don't ask for or store sensitive data. When you can't help (e.g. wrong segment), say so and suggest a better path. You can only talk; you can't send calendar links or emails. Professional and honest; no fabrication or fake urgency.
```

# Overview

Source: https://docs.tavus.io/sections/replica/overview

Learn about Personal, Non-Human and Stock Replicas, and how to create your own.

## What Is a Replica?

A Replica is a hyper-realistic AI-generated video avatar created using **Phoenix**, Tavus's rendering model. **Phoenix** is built on a Gaussian-diffusion architecture.
The most advanced version, **Phoenix-4**, delivers the highest-fidelity rendering with enhanced emotional expressiveness. **Phoenix-3** is currently available for creating custom replicas. To control emotional expression with Phoenix-4, see [Emotion Control with Phoenix-4](/sections/conversational-video-interface/quickstart/emotional-expression).

With just 2 minutes of training video, **Phoenix-3** can accurately reproduce a person's appearance, voice, expressions, and movements with studio-quality fidelity, precise lip sync, and consistent identity preservation. Phoenix-4 is available through select [stock replicas](/sections/replica/stock-replicas).

For guidelines and best practices on replica training videos, see the [Replica Training](/sections/replica/replica-training) article.

## Key Features

* Replicates a person's look, expressions, and speaking style.
* Enables natural conversations in up to 30 languages with accent preservation.
* Trained Replicas can be reused without re-recording.

## Replica Types

| Type          | Description                                                                     | Requirements         |
| ------------- | ------------------------------------------------------------------------------ | -------------------- |
| **Personal**  | A digital human modeled after a real person's facial appearance and voice.     | Verbal consent video |
| **Non-Human** | A digital human modeled after an AI-generated character.                        | No consent required  |
| **Stock**     | A prebuilt, professional digital presenter optimized for natural conversation. | No consent required  |

## Getting Started

You can create a personal or non-human replica using the Developer Portal or by following the steps in the [Quickstart Guide](/sections/replica/quickstart).

Creating a Personal Replica is **only available** on the Starter, Growth, and Enterprise plans.

# Quickstart

Source: https://docs.tavus.io/sections/replica/quickstart

Create high-quality Personal or Non-human Replicas for use in conversations.
## Prerequisites

Before starting, ensure you have:

* Pre-recorded training and consent videos that meet the requirements outlined in [Replica Training](/sections/replica/replica-training).
* Publicly accessible **S3 URLs** for:
  * Your training video
  * Your consent video

Ensure both URLs remain valid for at least **24 hours**.

## Create a Replica

Use the following request to create the replica:

By default, replicas are trained using the `phoenix-4` model. To use an older version, set `"model_name": "phoenix-3"` in your request body. However, we strongly recommend using the latest `phoenix-4` model for improved quality and performance.

```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/replicas \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
    "callback_url": "<callback-url>",
    "replica_name": "<replica-name>",
    "train_video_url": "<training-video-url>",
    "consent_video_url": "<consent-video-url>"
  }'
```

* Replace `<api-key>` with your actual API key. You can generate one in the Developer Portal.
* Replace `<training-video-url>` with the downloadable URL of your training video.
* Replace `<consent-video-url>` with the downloadable URL of your consent video.

Once submitted, your replica will begin training in the background. This process typically takes 4–6 hours. You can monitor the training status using the Get Replica endpoint:

```shell cURL theme={null}
curl --request GET \
  --url https://tavusapi.com/v2/replicas/{replica_id} \
  --header 'x-api-key: <api-key>'
```

Replace `<api-key>` with your actual API key.

Once training is complete, you can use your replica for:

* [Conversational Video Interface](/sections/conversational-video-interface/overview-cvi)
* [Video Generation](/sections/video/overview)

## Non-human Replica

To create a non-human replica, you do not need a **consent video**. If you're using the Developer Portal, select the **Skip** tab in the consent video window.
```shell cURL theme={null}
curl --request POST \
  --url https://tavusapi.com/v2/replicas \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
    "callback_url": "<callback-url>",
    "replica_name": "<replica-name>",
    "train_video_url": "<training-video-url>"
  }'
```

* Replace `<api-key>` with your actual API key. You can generate one in the Developer Portal.
* Replace `<replica-name>` with the name for your non-human replica.
* Replace `<training-video-url>` with the downloadable URL of your training video.

# Replica Training

Source: https://docs.tavus.io/sections/replica/replica-training

Guide to recording a high-quality training video for generating Phoenix-4 replicas.

You can record the Replica training video directly in the [Developer Portal](https://platform.tavus.io/dev/replicas/create) or upload a pre-recorded one via the API.

The following instructions have changed to work best for **Phoenix-4** (the new default model). Here are the **KEY DIFFERENCES**:

* **Listening minute** must be fully neutral with **lips closed** the entire time
* **Neck** and **jawline** must be **fully visible**, with clear clothing separation and hair kept **off the face and neck**
* **Teeth** must be clearly **visible during speaking**, with **strong articulation**
* **Framing** must be stable, **waist-up**, seated, with **minimal movement**

**Phoenix-4** is a **more precise model** and requires high-quality training footage to yield the best results, whereas Phoenix-3 has a slightly higher tolerance. To train on Phoenix-3, set `model_name` to `phoenix-3`.

## Talking Head Replica

To ensure the highest-quality Phoenix-4 replica, your training video must follow the specifications outlined below.

### Environment

* Record in a quiet, well-lit space with no background noise or movement.
* Use diffuse lighting to avoid shadows on your face.
* Choose a simple background and avoid any moving people or objects.

### Camera

* Place the camera at eye level and ensure your face fills at least 25% of the frame.
* Use a desktop recording app (e.g., **QuickTime** on Mac or **Camera** on Windows) — avoid browser-based tools.
* **Minimum resolution**: 1080p. Anything lower may negatively impact replica quality.

### Microphone

* Use your device's built-in microphone.
* **Avoid** high-end mics or wireless earbuds like AirPods.
* Turn off audio effects like noise suppression or EQ adjustments.

### Framing & Distance

Your framing should resemble a natural Zoom-style call.

**Positioning**

* Record from the waist up
* Be seated at a desk or table
* Position yourself at least 3 feet from the camera to avoid being too close to the lens

**Camera Setup**

* Camera should be stable (no handheld movement)
* Face centered in frame
* Head, shoulders, and upper chest clearly visible

### Yourself

| ✅ Do                                                                                   | ❌ Don't                                                               |
| ------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| Keep your full head visible, with a clear view of your face                            | Wear clothes that blend into the background                           |
| Ensure your face and upper body are in sharp focus                                     | Wear accessories like necklaces, hats, glasses, scarves, or earrings  |
| If using a smartphone, follow the same framing and distance guidelines                 | Turn your head away from the camera                                   |
| Keep longer hair behind shoulders, and tuck in any loose strands in front of the face  | Block your chin or mouth with your microphone                         |
| Sit upright in a stable, seated position                                               | Stand or shift positions during the video                             |

### Head & Clothing Separation

There must be a clear visual distinction between your head and clothing, and your neck must be fully visible.
* No overlap between neck and clothing
* Avoid high collars or obstructive clothing
* Ensure the jawline and neck are fully visible

### Hair Guidelines

* Avoid complex hairstyles
* No bangs covering the forehead
* Tuck or pin loose strands
* Longer hair must fall behind the shoulders
* Hair should not obscure the face, neck, or shoulders

### Video Format

If you're uploading a pre-recorded training video via our API, ensure it meets the following requirements:

* **Minimum FPS**: 25 fps
* **Accepted formats**:
  * `webm`
  * `mp4` with **H.264** video codec and **AAC** audio codec
* **Maximum file size**: 750MB
* **Minimum resolution**: 1080p (lower may negatively impact replica quality)

### Consent Statement

If you're creating a **personal replica**, you must include a verbal consent statement in the video. This ensures ethical use and compliance with data protection laws. Consent is not required for AI-generated training videos.

Say the following script clearly in your video:

> I, (your name), am currently speaking and give consent to Tavus to create an AI clone of me by using the audio and video samples I provide. I understand that this AI clone can be used to create videos that look and sound like me.

Consent is **only required for personal replicas**. If you're creating an **AI replica** or using an AI-generated training video, you can skip this.

## Recording Structure

Your video must be **one continuous shot**, containing **1 minute of speaking** followed by **1 minute of listening**. You can use a script provided by Tavus or speak on any topic of your choice.
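Before recording or uploading, you can sanity-check a pre-recorded file against the Video Format requirements above. The following is a minimal sketch using `ffprobe` and `stat`, assuming the ffmpeg suite is installed; `training.mp4` is a placeholder path for your own file:

```shell
# Print the video stream's codec, resolution, and frame rate.
# "training.mp4" is a placeholder; substitute your actual file path.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,width,height,r_frame_rate \
  -of csv=p=0 training.mp4
# For a compliant mp4, expect codec "h264", dimensions of at least
# 1920x1080, and a frame rate of 25/1 or higher.

# Confirm the file is under the 750MB limit (GNU stat; on macOS use `stat -f%z`).
stat -c%s training.mp4
```

This only verifies the container-level requirements (codec, resolution, fps, size); framing, lighting, and audio quality still need a manual review.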
**Pro tips**:

* Keep head and body movement minimal; avoid sudden head turns
* Avoid hand gestures
* Only one person should appear in the video
* Begin with a big smile showing upper and lower teeth
* Maintain direct eye contact with the camera for approximately 1 second
* Speak on any topic — content does not matter
* Open your mouth clearly when speaking
* Enunciate well, ensuring all teeth are fully visible
* Keep visible space between your top and bottom teeth

Sample script (optional):

```txt expandable theme={null}
Once upon a time, people built a perfect park in the middle of a busy city. This park was big, bright, and full of playful paths. At sunrise, birds sang above the tall trees. Families carried baskets packed with bread, fruit, and juice. Children skipped and shouted, chasing balls and flying paper kites.

In the afternoon, people played games. Some tapped paddles and bounced plastic balls. Others kicked soccer balls back and forth, laughing loudly with every point scored.

As the day went on, friends gathered for friendly competition. Some threw footballs through the warm air, while others tossed frisbees across the open grass, cheering with every perfect catch.

At sunset, the park grew quiet again. People packed up their bags and said goodbye. The golden sky made the grass glow, and soft breezes moved through the leaves.

Today, parks are still places where people gather to play, to talk, and to breathe fresh air. From simple paths to shining playgrounds, parks bring peace, play, and plenty of happy moments. Places like that remain alive with voices, faces, and feelings, promising joy again tomorrow.
```

* Transition naturally into a listening posture
* Keep lips neutral and closed throughout
* Maintain a steady head position
* Avoid exaggerated expressions
* Do not lick lips or form unusual mouth shapes
* An occasional closed-lip smile is recommended

Replica training typically takes **4–5 hours**. You can track the training progress by:

* Providing a `callback_url` when creating the replica via API
* Using the **Get Replica Status** API
* Checking the Developer Portal

## High-Quality Training Example