# Create Conversation Source: https://docs.tavus.io/api-reference/conversations/create-conversation post /v2/conversations With the Tavus Conversational Video Interface (CVI) you are able to create a `conversation` with a replica in real time. ### Conversations A `conversation` is a video call with a replica. After creating a `conversation`, a `conversation_url` will be returned in the response. The `conversation_url` can be used to join the conversation directly or can be embedded in a website. To embed the `conversation_url` in a website, you can find [instructions here](https://www.daily.co/products/prebuilt-video-call-app/quickstart/). Once a conversation is created, the replica will automatically join the call and will start participating. By providing a `callback_url`, you can receive webhooks with updates regarding the conversation state. [Learn about recording conversations here](/sections/conversational-video-interface/recording-rooms). # Delete Conversation Source: https://docs.tavus.io/api-reference/conversations/delete-conversation delete /v2/conversations/{conversation_id} This endpoint deletes a single conversation by its unique identifier. # End Conversation Source: https://docs.tavus.io/api-reference/conversations/end-conversation post /v2/conversations/{conversation_id}/end This endpoint ends a single conversation by its unique identifier. # Get Conversation Source: https://docs.tavus.io/api-reference/conversations/get-conversation get /v2/conversations/{conversation_id} This endpoint returns a single conversation by its unique identifier. You can append `?verbose=true` to the URL to receive additional event data in the response, including: * `shutdown_reason`: The reason why the conversation ended (e.g., "participant\_left\_timeout") * `transcript`: A complete transcript of the conversation with role-based messages (via `application.transcription_ready`) * `perception_analysis`: A detailed analysis of the user's appearance, behavior, emotional states, and screen activities * `system.replica_joined`: When the replica joined the conversation * `system.shutdown`: When and why the conversation ended * `application.perception_analysis`: The final visual analysis of the user This is particularly useful as an alternative to using the `callback_url` parameter on the [create conversation](/api-reference/conversations/create-conversation) endpoint for retrieving detailed conversation data. # List Conversations Source: https://docs.tavus.io/api-reference/conversations/get-conversations get /v2/conversations This endpoint returns a list of all Conversations created by the account associated with the API Key in use. # Create Lipsync Source: https://docs.tavus.io/api-reference/lipsync/create-lipsync post /v2/lipsync Create a new lipsync video by providing a video URL and an audio URL. The service will synchronize the speaker's mouth movements with the provided audio. # Delete Lipsync Source: https://docs.tavus.io/api-reference/lipsync/delete-lipsync delete /v2/lipsync/{lipsync_id} This endpoint deletes a single lipsync by its unique identifier. # Get Lipsync Source: https://docs.tavus.io/api-reference/lipsync/get-lipsync get /v2/lipsync/{lipsync_id} This endpoint returns a single lipsync by its unique identifier. # List Lipsyncs Source: https://docs.tavus.io/api-reference/lipsync/get-lipsync-list get /v2/lipsync This endpoint returns a list of all Lipsyncs created by the account associated with the API Key in use. 
# Create Persona Source: https://docs.tavus.io/api-reference/personas/create-persona post /v2/personas Create and customize a digital replica's personality for Conversational Video Interface (CVI). A persona defines the replica's behavior and capabilities through configurable layers including: **Core Components:** - Replica - Choice of audio/visual appearance - Context - Customizable contextual information, for use by LLM - System Prompt - Customizable system prompt, for use by LLM - Layers - STT - Transcription, turn taking, and Sparrow-0 settings - LLM - Language model settings - TTS - Text-to-Speech settings {/*- STS - Speech-to-Speech settings*/} - Perception - Multimodal vision and understanding settings (Raven-0) When creating a conversation, the persona configuration determines how the replica interacts, processes information, and responds to participants. Each layer can be fine-tuned to achieve the desired conversational experience. # Delete Persona Source: https://docs.tavus.io/api-reference/personas/delete-persona delete /v2/personas/{persona_id} This endpoint deletes a single persona by its unique identifier. # Get Persona Source: https://docs.tavus.io/api-reference/personas/get-persona get /v2/personas/{persona_id} This endpoint returns a single persona by its unique identifier. # List Personas Source: https://docs.tavus.io/api-reference/personas/get-personas get /v2/personas This endpoint returns a list of all Personas created by the account associated with the API Key in use. # Patch Persona Source: https://docs.tavus.io/api-reference/personas/patch-persona patch /v2/personas/{persona_id} This endpoint updates a persona using a JSON Patch payload (RFC 6902). You can modify **any field within the persona** using supported operations like `add`, `remove`, `replace`, `copy`, `move`, and `test`. For example: Ensure the `path` match the current persona schema. ```json [ { "op": "replace", "path": "/persona_name", "value": "Wellness Advisor" }, { "op": "replace", "path": "/default_replica_id", "value": "r79e1c033f" }, { "op": "replace", "path": "/context", "value": "Here are a few times that you have helped an individual make a breakthrough in..." }, { "op": "replace", "path": "/layers/llm/model", "value": "tavus-gpt-4o" }, { "op": "replace", "path": "/layers/tts/tts_engine", "value": "cartesia" }, { "op": "add", "path": "/layers/tts/tts_emotion_control", "value": "true" }, { "op": "remove", "path": "/layers/stt/hotwords" }, { "op": "replace", "path": "/layers/perception/perception_tool_prompt", "value": "Use tools when identity documents are clearly shown." } ] ``` # Create Replica Source: https://docs.tavus.io/api-reference/replica-model/create-replica post /v2/replicas This endpoint creates a new Replica that can be used in a conversation. By default, all new replicas will be trained using the `phoenix-3` model. You can optionally create phoenix-2 replicas by setting the `model_name` parameter to `phoenix-2`. The only required body parameter is `train_video_url`. This url must be a download link such as a presigned S3 url. Please ensure you pass in a video that meets the [requirements](/sections/troubleshooting/training-video-size) for training. Replica training will fail without the following consent statement being present at the beginning of the video: > I, [FULL NAME], am currently speaking and consent Tavus to create an AI clone of me by using the audio and video samples I provide. I understand that this AI clone can be used to create videos that look and sound like me. 
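Once your training video meets these requirements, the request body itself is minimal. Here is a sketch of the smallest valid request, where the URL is a placeholder for your own presigned download link:

```json
{
  "train_video_url": "https://your-bucket.s3.amazonaws.com/training-video.mp4?X-Amz-Expires=3600&X-Amz-Signature=example"
}
```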
Learn more about the consent statement [here](/sections/troubleshooting/consent-statement). Learn more about training a personal Replica [here](/sections/replicas/personal-replicas). # Delete Replica Source: https://docs.tavus.io/api-reference/replica-model/delete-replica delete /v2/replicas/{replica_id} This endpoint deletes a Replica by its unique ID. Deleted Replicas cannot be used in a conversation. # Get Replica Source: https://docs.tavus.io/api-reference/replica-model/get-replica get /v2/replicas/{replica_id} This endpoint returns a single Replica by its unique identifier. Included in the response body is a `training_progress` string that represents the progress of the Replica training. If there are any errors during training, the `status` will be `error` and the `error_message` will be populated. # List Replicas Source: https://docs.tavus.io/api-reference/replica-model/get-replicas get /v2/replicas This endpoint returns a list of all Replicas created by the account associated with the API Key in use. In the response, a root level `data` key will contain the list of Replicas. # Rename Replica Source: https://docs.tavus.io/api-reference/replica-model/patch-replica-name patch /v2/replicas/{replica_id}/name This endpoint renames a single Replica by its unique identifier. # Generate Speech Source: https://docs.tavus.io/api-reference/speech/create-speech post /v2/speech This endpoint generates an audio file based on a script with a provided Replica. # Delete Speech Source: https://docs.tavus.io/api-reference/speech/delete-speech delete /v2/speech/{speech_id} This endpoint deletes a single speech by its unique identifier. # Get Speech Source: https://docs.tavus.io/api-reference/speech/get-speech get /v2/speech/{speech_id} This endpoint returns a single speech by its unique identifier. # List Speeches Source: https://docs.tavus.io/api-reference/speech/get-speech-list get /v2/speech This endpoint returns a list of all Speeches created by the account associated with the API Key in use. # Rename Speech Source: https://docs.tavus.io/api-reference/speech/patch-speech-name patch /v2/speech/{speech_id}/name This endpoint renames a single speech by its unique identifier. # Generate Video Source: https://docs.tavus.io/api-reference/video-request/create-video post /v2/videos This endpoint generates a new video using a Replica and either a script or an audio file. The only required body parameters are `replica_id` and either `script` or `audio_file`. The `replica_id` is a unique identifier for the Replica that will be used to generate the video. The `script` is the text that will be spoken by the Replica in the video. If you would like to generate a video using an audio file instead of a script, you can provide `audio_url` instead of `script`. Currently, `.wav` and `.mp3` files are supported for audio file input. If a `background_url` is provided, Tavus will record a video of the website and use it as the background for the video. If a `background_source_url` is provided, where the URL points to a download link such as a presigned S3 URL, Tavus will use the video as the background for the video. If neither are provided, the video will consist of a full screen Replica. To learn more about generating videos with Replicas, see [here](/sections/video-generation/overview). To learn more about writing an effective script for your video, see [Scripting prompting](/sections/video-generation/scripting-prompting). 
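Putting these parameters together, a request that generates a video from a script and records a website as the background might look like this (the `replica_id` and URL are placeholders):

```json
{
  "replica_id": "r79e1c033f",
  "script": "Hi, I'm excited to walk you through our latest product update.",
  "background_url": "https://www.tavus.io"
}
```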
# Delete Video Source: https://docs.tavus.io/api-reference/video-request/delete-video delete /v2/videos/{video_id} This endpoint deletes a single video by its unique identifier. # Get Video Source: https://docs.tavus.io/api-reference/video-request/get-video get /v2/videos/{video_id} This endpoint returns a single video by its unique identifier. The response body will contain a `status` string that represents the status of the video. If the video is ready, the response body will also contain a `download_url`, `stream_url`, and `hosted_url` that can be used to download, stream, and view the video respectively. # List Videos Source: https://docs.tavus.io/api-reference/video-request/get-videos get /v2/videos This endpoint returns a list of all Videos created by the account associated with the API Key in use. # Rename Video Source: https://docs.tavus.io/api-reference/video-request/patch-video-name patch /v2/videos/{video_id}/name This endpoint renames a single video by its unique identifier. # Changelog Source: https://docs.tavus.io/sections/changelog/changelog ## New Features No features were added in this release. ## Enhancements * Reduced conversation boot time by 58% (p50). ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## Changes * Added a new recording requirement to [Replica Training](https://docs.tavus.io/sections/replicas/replica-training): Start the talking segment with a big smile. ## Enhancements * Added [echo](https://docs.tavus.io/sections/event-schemas/conversation-echo) and [respond](https://docs.tavus.io/sections/event-schemas/conversation-respond) events to conversational context. ## Changes * Added a new recording requirement to [Replica Training](https://docs.tavus.io/sections/replicas/replica-training): Start the talking segment with a big smile. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features No features were added in this release. ## Enhancements * **Major Phoenix 3 Enhancements for CVI**: * Increased frame rate from 27fps to 32fps, significantly boosting smoothness. * Reduced Phoenix step's warm boot time by 60% (from 5s to 2s). * Lipsync accuracy improved by \~22% based on AVSR metric. * Resolved blurriness and choppiness at conversation start. * Enhanced listening mode with more natural micro expressions (eyebrow movements, subtle gestures). * Greenscreen mode speed boosted by an additional \~1.5fps. * **Enhanced CVI Audio Quality**: Audio clicks significantly attenuated, providing clearer conversational audio. * **Phoenix 3 Visual Artifacts Fix**: Resolved visual artifacts in 4K videos on Apple devices, eliminating black spot artifacts in thumbnails. ## New Features No features were added in this release. ## Enhancements * **Faster Phoenix 3 Video Gen**: Substantially lowered generation times * 4K videos: reduced from \~22 mins to \~10 mins per minute generated. * 1080p videos: down from \~8 mins to \~3.25 mins per minute generated. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features * Launched [LiveKit Integration](https://www.tavus.io/post/building-real-time-ai-video-agents-with-livekit-and-tavus): With Tavus video agents now integrated into LiveKit, you can add humanlike video responses to your voice agents in seconds. 
* [Persona API](https://docs.tavus.io/api-reference/personas/patch-persona): Enabled patch updates to personas. ## Enhancements * Resolved TTS (Cartesia) stability issues and addressed hallucination. * **Phoenix 3 Improvements**: * Fixed blinking/jumping issues and black spots in videos. * FPS optimization to resolve static and audio crackling. ## New Features No features were added in this release. ## Enhancements * **Wave Feature Enhancements**: Rolling out fixes for replicas previously missing [wave/no-wave functionality](https://docs.tavus.io/api-reference/video-request/create-video#body-properties-start-with-wave). ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features * Added the `audio_url` parameter in the [`/videos`](https://docs.tavus.io/api-reference/video-request/create-video#generate-from-audio-file) endpoint to generate videos using any custom audio source. ## Enhancements No enhancements were made in this release. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features No features were added in this release. ## Enhancements * **Replica API**: * Enhanced Error Messaging for Training Videos. * Optimized Auto QA for Training Videos. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. ## New Features No features were added in this release. ## Enhancements No enhancements were made in this release. # Creating a Conversation Source: https://docs.tavus.io/sections/conversational-video-interface/creating-a-conversation > Creating a conversation immediately starts accumulating usage. > > When you create a conversation CVI immediately starts running and the replica waits in the WebRTC/Daily room listening for your participant to join. Your billing/credit usage starts as soon as the conversation is creating and runs until the conversation timeout or when you end the conversation. This also uses up one of your concurrency spots. # How do I create a conversation? Once you have a persona you'd like to use or a replica, starting a conversation is easy. You can use the [Create Conversation](/api-reference/conversations/create-conversation) endpoint to do so. Alternatively you can start a conversation on the developer app by visiting the [Create Conversation page](https://platform.tavus.io/conversations/create). # What does creating a conversation do? Creating a conversation is 'starting the call'. Imagine you create a Zoom call and join the meeting- that's what happens when you create a conversation. 1. A WebRTC/Daily room is created 2. The replica joins the room and waits for a participant to join 3. Starts the timers on duration/timeouts (see Call Time Settings) In response to creating a conversation, you receive a meeting URL (that looks like this: [https://tavus.daily.co/ca980e2e](https://tavus.daily.co/ca980e2e)). You or your participant can directly join this link and be put into a video conferencing room where you can immediately start conversing with the replica. **However, you do not have to use this meeting UI.** You can create a completely custom UI or access the raw streams. 
[Learn about how to customize Daily UI](https://docs.daily.co/guides/products/client-sdk) [Use our examples as a starting point](https://github.com/Tavus-Engineering/tavus-examples) ### What is Daily? Daily is our WebRTC provider. You do not have to create a Daily account. We have partnered with Daily to allow you to get an end to end solution without having to worry about WebRTC. You can build a completely custom application with CVI while accessing the Daily streams like you would with WebRTC. # What can I customize per conversation? Conversation specific customizations are focused on allowing personalization of a conversation to a specific participant. As an example you might want to have a custom introduction per person, or change the language the replica is listening for and responds in. Meanwhile persona level configurations are settings or defaults applied to all conversations so you do not have to configure them each time, such as setting up your LLM. Here are the things you can customize per conversation: ### Persona / Replica In order to start a conversation you must provide a persona or replica. If you provide a replica with no persona, the default Tavus persona will be used. Providing a persona without a replica will use the default replica attached to the persona if it exists. Providing a replica ID will override the default one associated with the persona. ### Conversation Context Conversation context is specific information or instructions for the LLM related to this conversation. For example it can contain information on who is joining the call as well as any specific information on the point of the call, background information or current information. Example of conversation context: > You are talking to Michael Seibel, who works at Y Combinator as a Group Partner and Managing Director of YC early stage. You are talking to him about your new startup idea for a pet rock delivery service. Get his advice and convince him to invest. It's Monday, October 7th here in SF and the weather is clear and a crisp 68 degrees. Here's a little more about Michael: He joined YC in 2013 as a Part-time Partner and in 2014 as a full-time Group Partner. Michael also serves on the board of two YC companies, Reddit and Dropbox. He moved to the bay area in 2006, and was a co-founder and CEO of two Y Combinator startups Justin.tv/Twitch (2007 - 2011) and Socialcam (2011 - 2012). In 2012 Socialcam sold to Autodesk Inc. for `$60m` (link) and in 2014, under the leadership of Emmett Shear (CEO) and Kevin Lin (COO) Twitch sold to Amazon for `$970m` (link). Before getting into tech, Michael spent 2006 as the finance director for a US Senate campaign in Maryland. In 2005, he graduated from Yale University with a bachelor's degree in political science. Today he spends the large majority of his free time cooking, reading, traveling, and going for long drives. Michael lives in San Francisco, CA with his wife Sarah, son Jonathan, and daughter Jessica. Michael can be direct but he is a giant teddy bear if you get to know him. The conversation context will be appended to the system prompt and the persona context/knowledge base. ### Custom Greeting When a participant joins the replica will say a greeting that you can customize. You can use this to personalize a welcome message for someone or prompt them to start a conversation. By default the replica will say "Hey there, how's it going? What can I do for you today?". ### Language You can customize what language CVI understands and speaks in. 
For example you could set the conversation to be in Spanish. Setting the language ensures the layers (ASR/TTS) are configured correctly to handle the language. If you are using your own TTS voice, you'll need to make ensure it supports the language you specify. ### Call time settings (max duration and timeouts) You can specify duration and timeouts for conversations. This is important to prevent unnecessary usage that incurs billing and uses up your max concurrency spots, as well as makes sure your users only use the allocated time you provide them. There are 3 timeouts you can configure: * Max duration: The maximum duration of the call in seconds. The default max\_call\_duration is 3600 seconds (1 hour). Once the time limit specified by this parameter has been reached, the conversation will automatically shut down. * Participant left timeout: The duration in seconds after which the call will be automatically shut down once the last participant leaves. Default is 0 seconds, meaning the call will shutdown immediately after all participants leave. Note that this includes all additional observers, participants, or clients which you may have added to the meeting. * Participant absent timeout: Starting from conversation creation, the duration in seconds after which the call will be automatically shut down if no participant joins the call. Default is 300 seconds (5 minutes). ### Green screen / Transparent Background If enabled, the background of the replica will be replaced with a green screen (RGB values: \[0, 255, 155]). You can use WebGL on the frontend to make the green screen transparent or change its color. # Creating a Persona Source: https://docs.tavus.io/sections/conversational-video-interface/creating-a-persona Personas are the 'character' or 'AI agent personality' and contain all of the settings and configuration for that character or agent. For example, you can create a persona for 'Tim the sales agent' or 'Rob the interviewer'. Personas are where you can customize the layers for CVI as well as prompt the LLM to give it a personality and context. A persona consists of: * **Persona Name** - This is the name that is shown when a replica using your Persona joins the call. * **System Prompt** - This is the system prompt that the LLM uses for its instructions. Use this to include instructions on who the persona is and how you want them to behave. * **Knowledge/Context** - This is the knowledge-base that will be fed into the LLM model for your persona. You can dump documentation, background, writing etc here. * **Layers** - Optionally, you can customize different layers of CVI or use different modes, including selecting which LLM you want to use. * **LLM** - By default personas use a Tavus optimized variation of Llama3.3 8B. * **Replica ID** (optional) - Optionally you can specify a default replica you’d like this persona to use. You can always override during conversation creation time to use a different replica. # How to Create a Persona ### Via the UI > Dashboard has limited options > You cannot currently customize all layers via the dashboard UI Navigate to the [Tavus Platform](https://platform.tavus.io). On the sidebar click on Persona Library. Finally, click Create Persona. ### Via the API You can use the [Create Persona](/api-reference/personas/create-persona) endpoint to create a persona. Learn more about how to customize layers in CVI Modes and Layers # Creating Good Prompts > Limits for system prompt or knowledge are different depending on the LLM model being utilized. 
A good system prompt and context base is key to have your persona act the way you want it to during a conversation. Here are some things to keep in mind: ### System Prompt The system prompt should inform who the persona is and how they should act. These are the persona's 'instructions'. For the system prompt: * Assume a character * Provide clear instructions * Keep it concise * Keep knowledge in the knowledge prompt Remember that CVI has vision capabilities, you can use this as well to prompt behavior and responses. Here's an example of a simple, good system prompt: > You are Tim, a replica created using Tavus. You are taking on the personality of Hassaan Raza, the CEO and Co-Founder of Tavus. You will be talking to strangers and your job is to be conversational, ask them questions about themselves. Be witty and charming. If you don’t know something, just say you’ll get back to them on that. ### Context / Knowledge-base The context is the persona's 'knowledge base'. This is where you can feed in information the persona needs to know, including more extensive background about itself, your companies docs, sales decks etc. Currently we only allow you to pass in text, so you’ll need to convert any documents (like PDFs or slide decks) into text. For the knowledge/context: * Make sure not to accidentally override the system prompt with instructions that may be hidden in your context/knowledge * Keep the knowledge-base clean and filtered * You do not need to include participant or conversation/specific context, you can pass that in during conversation creation time The Tavus orchestration system will automatically attempt to optimize and align with the selected LLM to optimize your persona for natural conversation. # Custom LLM Onboarding Source: https://docs.tavus.io/sections/conversational-video-interface/custom-llm-onboarding You can integrate an OpenAI-compatible LLM to replace our existing options (`tavus-llama`, `tavus-gpt-4o`, `tavus-gpt-4o-mini`). ## Create Persona To get started, you'll need to create a Persona that specifies your custom LLM. Here's an example Persona: ```json { "system_prompt": "You are a storyteller. You like telling stories to people of all ages. 
Reply in brief utterances, and ask prompting questions to the user as you tell your stories to keep them engaged.", "context": "Here are some of your favorite stories: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs", "persona_name": "Mert the Storyteller", "layers": { "llm": { "model": "custom_model_here", "api_key": "example-api-key", "base_url": "open-ai-compatible-llm-http-endpoint", "tools": [], "speculative_inference": true, "headers": { "Authorization": "Bearer your-api-key", }, "extra_body": { "temperature": 0.5, } }, "tts": { "api_key": "example-api-key", "tts_engine": "playht", "playht_user_id": "your-playht-user-id", "external_voice_id": "professional-voice-clone-id", "voice_settings": {} // can also leave the "voice_settings" attr out if you want to use default settings "tts_emotion_control": false }, "perception": { "perception_model": "raven-0", // or "basic" for simpler vision capabilities "ambient_awareness_queries": ["Is the user showing an ID card?"], }, "stt": { "participant_pause_sensitivity": "high", "participant_interrupt_sensitivity": "high", "smart_turn_detection": true, "stt_engine": "tavus-advanced" } } } ``` `, id: p234324a` ## Launch a Conversation With this persona, if we were to launch a conversation: ```json { "replica_id": "r123456789", "conversation_name": "My Conversation", "callback_url": "https://webhook.site/", "persona_id": "p234324a", "conversational_context": "You are talking to Maya, who is from Dallas, Texas. She likes a good mystery book, and her favorite author is Agatha Christie." } ``` We will see user utterances coming into endpoint you provided with the `/chat/completions` suffix as the user speaks during a conversation. If you set up a test webhook and set the `base_url` to point to that webhook's url, you can examine an incoming chat completion request. You may notice the conversation\_id is provided as a request header, and your API key can be used to authenticate requests coming onto your servers. We make the chat completion request to the URL you provide with these settings: ```python completion = self.client.chat.completions.create( model=custom_model_here, messages=context, extra_headers=self.extra_headers, stream=True, tools=tools ) ``` Which means your OpenAI compatible LLM should be configured to be streamable (ie. send back chunks of chat completions over SSE (Server-side events)). [Here](https://platform.openai.com/docs/api-reference/chat/create) is the OpenAI documentation on chat completions as a quick reference point on what to be returning in the request. ## Speculative Inference The `speculative_inference` parameter activates speculative inference, a technique that can significantly reduce response times in speech-to-text and natural language processing applications. This can be configured in the Persona. ### Overview of Speculative Inference Speculative inference is an advanced processing technique that allows AI systems to begin generating responses before all input data is available. In the context of speech recognition and natural language processing: ### Behavior When `speculative_inference` is set to `true`: The replica will not start to speak until it is confident the user is done speaking; meanwhile progressive transcriptions will be sent to the LLM layer, each one including prior transcriptions accumulating until the replica starts speaking. 
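As a rough illustration (the utterance wording is hypothetical), the final user message in successive chat completion requests might read as follows, with each entry representing one request rather than three messages in a single request:

```json
[
  { "role": "user", "content": "Tell me" },
  { "role": "user", "content": "Tell me a story" },
  { "role": "user", "content": "Tell me a story about the three little pigs." }
]
```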
### Benefits * Significantly faster response times * Improved user experience due to reduced latency * More natural, conversational interaction ### Create a Persona with Speculative Inference ```json { "system_prompt": "You are a storyteller. You like telling stories to people of all ages. Reply in brief utterances, and ask prompting questions to the user as you tell your stories to keep them engaged.", "context": "Here are some of your favorite stories: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs", "persona_name": "Mert the Storyteller", "layers": { "llm": { "model": "custom_model_here", "api_key": "example-api-key", "base_url": "open-ai-compatible-llm-http-endpoint", "speculative_inference": true, } } } ``` `, id: p234324a` ## Tools / Function Calling You can pass in tools (function calls) to the LLM to enable it to perform tasks beyond just text generation. This is useful if you want to integrate external APIs or services into the LLM. Here's a full example of a persona that includes a tool to get the current weather for a given location: ```json { "system_prompt": "You are a helpful assistant.", "context": "Help users get the weather for a given location.", "persona_name": "Weather Assistant", "layers": { "llm": { "model": "custom_model_here", "api_key": "example-api-key", "base_url": "open-ai-compatible-llm-http-endpoint", "tools": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA", }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], }, }, "required": ["location"], }, }, } ], }, "tts": { "api_key": "example-api-key", "tts_engine": "elevenlabs", "external_voice_id": "professional-voice-clone-id", "voice_settings": {} // can also leave the "voice_settings" attr out if you want to use default settings "tts_emotion_control": false }, "vqa": { "enabled": false // can also leave the "vqa" attr out if you want vqa enabled }, "stt": { "participant_pause_sensitivity": "high", "participant_interrupt_sensitivity": "high", "smart_turn_detection": true, "stt_engine": "tavus-advanced" }, "perception": { "perception_model": "raven-0", // or "basic" for simpler vision capabilities } } } ``` ## Advanced LLM Configuration ### Headers and Extra Body You can pass in headers and extra body to the LLM to customize the LLM's behavior or provide additional information to your backend. To do this, you can use the `headers` and `extra_body` parameters in the LLM layer of the Persona. Here's an example of a persona that includes headers and extra body: ```json { "system_prompt": "You are a helpful assistant.", "context": "Help users get the weather for a given location.", "persona_name": "Weather Assistant", "layers": { "llm": { "model": "custom_model_here", "api_key": "example-api-key", "base_url": "open-ai-compatible-llm-http-endpoint", "headers": { "Authorization": "Bearer your-api-key", }, "extra_body": { "temperature": 0.5, } } } } ``` ## Interactions with the perception model Depending on which perception model you enable, your LLM should expect to receive system and user messages containing the perception information. These messages allow your custom LLM to injest the visual information and context from the conversation, synthesizing it seamlessly with the LLM's response. 
### Raven-0 Perception model If you enable the Raven-0 perception model, your LLM will receive the following system messages: ``` { "role": "system", "content": "... ... ..." } ``` ### Basic Perception model If you enable the Basic perception model, your LLM will receive the following user messages (instead of a system message): ``` { "role": "user", "content": "USER_SPEECH: ... VISUAL_SCENE: ..." } ``` ### Disabled Perception model If you disable the perception model, your LLM will not receive any special ## LLM Abstractions We have abstracted the system such that the LLM instructions receive 3 distinct "sub-instructions" that are concatenated together. Let's use storytelling as an example persona. If my goal is to create a storyteller, I can do so with the combination of `system_prompt` (Persona), `context` (Persona) and `conversational_context` (Conversation). * Now, system\_prompt can be something along the lines of: `"You are a storyteller. You like telling stories to people of all ages."` This defines what a storyteller **is**. * context is for what that storyteller focuses on: "Here are some of your favorite stories to tell: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs" This defines what a storyteller **has**. * conversational\_context is for all the details that revolve around that specific interaction between the user & replica. Something like: `"You are talking to {user_name} (you may pass that in dynamically per conversation request). They are {x} years old. They like listening to {genre} stories."` This defines **who** the storyteller is talking to. This allows you to create as many conversations as you want using the storyteller persona and not share conversation specific context, while also allowing you to create default system prompts on your end and create personas of varying contexts (crime novel storyteller, horror storyteller, children's storyteller etc). This would populate the initial `system_prompt` of the chat completion request we send your way, and since we send the entire context each time, anything you have in the `system_prompt` persists. You may also completely parse the incoming request body and choose what to send your LLM, building your own abstraction in place of what we currently offer. # Custom TTS Onboarding Source: https://docs.tavus.io/sections/conversational-video-interface/custom-tts-onboarding You can integrate a variety of third-party TTS providers (cartesia, elevenlabs, playht). ## Create Persona To get started, you'll need to create a Persona that specifies your custom TTS. Here's an example Persona: ```json { "system_prompt": "You are a storyteller. You like telling stories to people of all ages. 
Reply in brief utterances, and ask prompting questions to the user as you tell your stories to keep them engaged.", "context": "Here are some of your favorite stories: Little Red Riding Hood, The Ugly Duckling and The Three Little Pigs", "persona_name": "Mert the Storyteller", "layers": { "llm": { "model": "custom_model_here", "api_key": "example-api-key", "base_url": "open-ai-compatible-llm-http-endpoint" }, "tts": { "api_key": "example-api-key", "tts_engine": "cartesia", "external_voice_id": "professional-voice-clone-id", "voice_settings": { "speed": "normal", "emotion": ["positivity:high", "curiosity"] }, "tts_emotion_control": true, "tts_model_name": "sonic" }, "perception": { "perception_model": "raven-0", } } } ``` `, id: p234324a` ## Launch a Conversation With this persona, if we were to launch a conversation: ```json { "replica_id": "r123456789", "conversation_name": "My Conversation", "callback_url": "https://webhook.site/", "persona_id": "p234324a", "conversational_context": "You are talking to Maya, who is from Dallas, Texas. She likes a good mystery book, and her favorite author is Agatha Christie." } ``` This replica would use the voice you supplied during conversation. If you've been using these TTS providers and have built up an extensive voice library with them, to bring over your own voices, simply provide the API key and the voice ID you want to associate this Persona with. We will custodially connect to these TTS providers on your behalf, minimizing latency and providing a seamless experience. # Overview Source: https://docs.tavus.io/sections/conversational-video-interface/cvi-overview The Conversational Video Interface (CVI) is an end-to-end pipeline for creating real-time multimodal video conversations with a replica that can see, hear, and respond similarly to how a human would. Developers can deploy video AI agents in minutes using CVI. CVI is the world's fastest interface of its kind, allowing you to put a human face and conversational ability to your AI agent or personality. With CVI, you can achieve utterance-to-utterance latency with SLAs as fast as under 1 second, which is the full roundtrip time for a participant to say something and for the replica to speak back. CVI provides a complete pipeline to have a conversation while also allowing you to customize and plug in your existing components where necessary. ## Key Features The first interface that speaks our language. CVI is multimodal and understands and uses facial expressions, body language, and has natural conversational awareness including interrupts and turn-taking. The world's fastest interface of its kind, with SLAs as fast as under 1s latency utterance-to-utterance. CVI provides a turn-key solution, delivering all the components to easily deploy AI video agents without having to worry about WebRTC, ASR, or anything else. Easily create high-quality AI replicas of you or your customers, powered by our state-of-the-art replica models, Phoenix-3, Raven-0, and Sparrow-0. ## What does a conversation with CVI look like? ### Here's a sample: ### Try it out! You can try chatting with Carter on our website to get a taste of what a conversation with CVI looks like. Note that Carter can see and hear you. ## What components does CVI provide, and what can I customize? CVI provides a full pipeline allowing you to easily create video conversations. You can immediately jump into a real-time conversation with the generated Daily meeting URL. 
CVI provides the following layers: * WebRTC/video conferencing (using Daily) * Speech recognition (ASR), with interrupts, and Semantic/Lexical turn taking, using our Sparrow-0 model. * Perception (Raven-0) * Optimized, conversational LLM * Text-to-speech (TTS) * Replica video output You can choose to customize or bring your own layers as well. For example, you can: * Use OpenAI real-time API or other voice-to-voice models and only use Tavus to drive the replica video. * Bring your own LLM/conversation logic or enable function calling for Tavus-optimized LLMs. * Customize the TTS or ASR engine, and turn taking settings, turn on/off Sparrow-0 smart turn detection * Use text parrot mode to directly drive a replica video. * Directly access the video streams and create a custom UI. Learn more about the layers and different modes in [CVI Modes and Layers](/sections/conversational-video-interface/modes-and-layers). ## Key Concepts ### What is a conversation? A conversation is a single 'session' or 'call' with a replica using CVI. When you create a conversation, you receive a Daily meeting URL. This URL provides a full video conferencing solution, allowing you to avoid managing WebRTC or websockets. Navigating to this URL lets you directly join a prebuilt meeting room UI to chat with your replica. Learn more about [creating and customizing conversations](/sections/conversational-video-interface/creating-a-conversation). ### What are personas? Personas are the 'character' or 'AI agent personality' and contain all the settings and configuration for that character or agent. For example, you can create a persona for 'Tim the Sales Agent' or 'Rob the Interviewer'. Personas let you customize CVI's layers and prompt the LLM with personality and context. Learn more about [creating a persona](/sections/conversational-video-interface/creating-a-persona). ### What are replicas? A replica is a talking-head/avatar of a human containing a voice and face clone, used as the video output layer for CVI. You can use stock replicas from Tavus or create your own with a few minutes of training data. A replica is key for video generation and CVI. Learn how to [create a great replica](/sections/conversational-video-interface/creating-a-replica). ## Getting Started ### No Code You can easily try out CVI using the [Tavus dashboard](https://platform.tavus.io). Note that not all settings and modes are available via the dashboard. ### API Quick Start Check out the [Quick Start Guide](/sections/conversational-video-interface/quickstart) to learn how to use the APIs to create a persona and conversation. Be sure to grab an API key first! Visit [platform.tavus.io](http://platform.tavus.io) for more information. # Embed Conversational Video Interface Source: https://docs.tavus.io/sections/conversational-video-interface/embedding-cvi Learn how to embed Tavus's Conversational Video Interface (CVI) into your site or app. ## Overview Tavus CVI delivers AI-powered video conversations directly in your application. You can integrate it using: | Method | Best For | Complexity | Customization | | --------------------- | ---------------------------------- | ---------- | ------------- | | **iframe** | Static websites, quick demos | Low | Low | | **Vanilla JS** | Basic dynamic behavior | Low | Medium | | **Node.js + Express** | Backend apps, dynamic embedding | Medium | High | | **Daily SDK** | Full UI control, advanced features | High | Very High | ## Implementation Steps This is the simplest approach requiring no coding. 
It leverages Tavus's prebuilt interface with limited customization options.

1. Create a conversation using the Tavus API.
2. Replace `YOUR_TAVUS_MEETING_URL` below with your actual conversation URL:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Tavus CVI</title>
  </head>
  <body style="margin: 0;">
    <!-- Minimal embed: the conversation_url returned by the Tavus API goes in src -->
    <iframe
      src="YOUR_TAVUS_MEETING_URL"
      allow="camera; microphone; fullscreen; display-capture"
      style="width: 100%; height: 100vh; border: none;">
    </iframe>
  </body>
</html>
```

This method provides basic customization and dynamic room management for apps without a framework.

1. Add the following script tag to your HTML `<head>`:

```html
<script src="https://unpkg.com/@daily-co/daily-js"></script>
```

2. Use the following script, replacing `'YOUR_TAVUS_MEETING_URL'` with your actual conversation URL:

```html
<div id="call-container"></div>
<script>
  // Create a Daily call frame inside the container and join the Tavus conversation
  const callFrame = window.DailyIframe.createFrame(
    document.getElementById('call-container'),
    { iframeStyle: { width: '100%', height: '600px', border: '0' } }
  );
  callFrame.join({ url: 'YOUR_TAVUS_MEETING_URL' });
</script>
```
This method serves dynamic pages that embed Tavus CVI within Daily rooms.

1. Install Express:

```bash
npm install express
```

2. Create `server.js` and implement the following Express server:

```js
const express = require('express');
const app = express();
const PORT = 3000;

app.get('/room', (req, res) => {
  const meetingUrl = req.query.url || 'YOUR_TAVUS_MEETING_URL';
  // Serve a minimal page that embeds the Tavus conversation in an iframe
  res.send(`
    <!DOCTYPE html>
    <html>
      <head><title>Tavus CVI</title></head>
      <body style="margin: 0;">
        <iframe
          src="${meetingUrl}"
          allow="camera; microphone; fullscreen; display-capture"
          style="width: 100%; height: 100vh; border: none;">
        </iframe>
      </body>
    </html>
  `);
});

app.listen(PORT, () => console.log(`Server running on http://localhost:${PORT}`));
```

3. Run the server:

```bash
node server.js
```

4. Visit: `http://localhost:3000/room?url=YOUR_TAVUS_MEETING_URL`

### Notes

* Supports dynamic URLs.
* Can be extended with authentication and other logic using Tavus's API.
This method offers complete control over the user experience and allows you to build a fully custom interface for Tavus CVI.

1. Install the SDK:

```bash
npm install @daily-co/daily-js
```

2. Use the following script to join the Tavus CVI meeting:

```js [expandable]
import React, { useEffect, useRef, useState } from 'react';
import DailyIframe from '@daily-co/daily-js';

const getOrCreateCallObject = () => {
  // Use a property on window to store the singleton
  if (!window._dailyCallObject) {
    window._dailyCallObject = DailyIframe.createCallObject();
  }
  return window._dailyCallObject;
};

const App = () => {
  const callRef = useRef(null);
  const [remoteParticipants, setRemoteParticipants] = useState({});

  useEffect(() => {
    // Only create or get one call object per page
    const call = getOrCreateCallObject();
    callRef.current = call;

    // Join meeting
    call.join({ url: "YOUR_TAVUS_MEETING_URL" });

    // Handle remote participants
    const updateRemoteParticipants = () => {
      const participants = call.participants();
      const remotes = {};
      Object.entries(participants).forEach(([id, p]) => {
        if (id !== 'local') remotes[id] = p;
      });
      setRemoteParticipants(remotes);
    };
    call.on('participant-joined', updateRemoteParticipants);
    call.on('participant-updated', updateRemoteParticipants);
    call.on('participant-left', updateRemoteParticipants);

    // Cleanup
    return () => {
      call.leave();
    };
  }, []);

  // Attach remote video and audio tracks
  useEffect(() => {
    Object.entries(remoteParticipants).forEach(([id, p]) => {
      // Video
      const videoEl = document.getElementById(`remote-video-${id}`);
      if (
        videoEl &&
        p.tracks.video &&
        p.tracks.video.state === 'playable' &&
        p.tracks.video.persistentTrack
      ) {
        videoEl.srcObject = new MediaStream([p.tracks.video.persistentTrack]);
      }
      // Audio
      const audioEl = document.getElementById(`remote-audio-${id}`);
      if (
        audioEl &&
        p.tracks.audio &&
        p.tracks.audio.state === 'playable' &&
        p.tracks.audio.persistentTrack
      ) {
        audioEl.srcObject = new MediaStream([p.tracks.audio.persistentTrack]);
      }
    });
  }, [remoteParticipants]);

  // Custom UI: one tile per remote participant, with media elements the
  // track-attachment effect above fills in
  return (
    <div>
      <h2>Meeting Room (daily-js custom UI)</h2>
      {Object.entries(remoteParticipants).map(([id, p]) => (
        <div key={id}>
          <video id={`remote-video-${id}`} autoPlay playsInline />
          <audio id={`remote-audio-${id}`} autoPlay />
        </div>
      ))}
    </div>
  );
};

export default App;
```

3. Customize the conversation UI in the script above (Optional). See the [Daily JS SDK](https://docs.daily.co/guides/customizing-in-call-ui) for details.
## FAQs Daily provides built-in noise cancellation which can be enabled via their [updateInputSettings()](https://docs.daily.co/reference/daily-js/instance-methods/update-input-settings#audio-processor) method. ```js callFrame.updateInputSettings({ audio: { processor: { type: 'noise-cancellation', }, }, }); ``` Yes, you can attach [Daily event listeners](https://docs.daily.co/reference/daily-js/events) to monitor and respond to events like participants joining, leaving, or starting screen share. # FAQ Source: https://docs.tavus.io/sections/conversational-video-interface/faq Frequently asked questions about Tavus's Conversational Video Interface **Daily** is a platform that offers prebuilt video call apps and APIs, allowing you to easily integrate video chat into your web applications. You can embed a customizable video call widget into your site with just a few lines of code, and access features like screen sharing and recording. **Tavus partners with Daily to power video conversations with our replicas.** * **Transcript:** Available for analysis at the end of a conversation. * **Shutdowns:** * **Max call duration:** * This is a clock that starts on conversation creation, not when a replica or participant joins. * The default duration is 4 minutes. It is recommended to update this. * **Idle timeout:** Referred to as `participant_left_timeout`. * **Errors:** Monitor for any system errors. * **Participant join:** Keep track of when participants join. * You **do not** need to sign up for a Daily account to use Tavus's Conversational Video Interface. * All you need is the Daily room URL (called `conversation_url` in our system) that is returned by the Tavus API. You can serve this link directly to your end users or embed it. Set `enableRecording=true` as a property upon creating a conversation to enable recording for that Daily room. To have the recordings automatically be sent to your S3 bucket, follow the instructions outlined [here](/sections/conversational-video-interface/recording-rooms). Once you have the Daily room URL (called `conversation_url` when returned by Tavus) ready, replace `DAILY_ROOM_URL` in the code snippet below with your own room URL (e.g. [https://tavus.daily.co/c1234abcd](https://tavus.daily.co/c1234abcd)). ```html ``` That's it! For more details and options for embedding, check out [Daily's documentation here](https://docs.daily.co/guides/products/prebuilt#step-by-step-guide-embed-daily-prebuilt). Refer to our [custom TTS onboarding doc](/sections/conversational-video-interface/custom-tts-onboarding) for more details. Refer to our [custom LLM onboarding doc](/sections/conversational-video-interface/custom-llm-onboarding) for more details. Refer to our [custom STT onboarding doc](/sections/conversational-video-interface/custom-stt-onboarding) for more details. * **What makes a good convo replica:** * Most of our tips apply from best practices for regular [replicas](/sections/replicas/best-practices-and-examples). * Predominantly still, with minimal head movement. * Ideally, the user should stop and be still and silent for 5 seconds throughout the script reading. * Naturalness tends to be higher when recording is done on a laptop camera, as if they were in a Zoom call. * Be sure to specify a `callback_url` when creating a conversation. Tavus will return conversation updates to this URL via webhook. Example updates include `replica_joined`, `shutdown`, and `transcript_ready`. 
For more details check out [conversation callbacks](/sections/conversational-video-interface/conversation-callbacks). We also additionally broadcast a variety of realtime events through the App Message layer through our [Interactions Protocol](/api-reference/interactions-protocol) that can be listened to by a Daily call client. * The default `max_call_duration` is just 4 minutes (240 seconds). It is recommended to update this in the [create conversation](api-reference/create-conversation) call. * The `max_call_duration` is a clock that starts on conversation **creation**, not when a replica or participant joins. * To record a conversation, you need to... 1. Enable the recording feature by setting the `enable_recording` property to `true`. This will allow the conversation to be recorded. 2. Specify the S3 bucket where the recording will be stored by setting the `recording_s3_bucket_name` and `recording_s3_bucket_region` properties. 3. If your setup requires assuming a specific AWS role to access the S3 bucket, make sure to provide the ARN of the role in the `aws_assume_role_arn` property. These configurations will ensure that your conversation is recorded and securely stored in the designated S3 bucket. To bring your own Text-to-Speech (TTS) service, you need to create a [Persona](api-reference/create-persona) and configure its `tts` object. Here’s how you can do it: 1. API Key (api\_key): Provide the custodial API key for the TTS provider of your choice. This key will be used to authenticate requests to the TTS engine. 2. TTS Engine (tts\_engine): Select the TTS engine you want to use. Currently, the supported engines are: * `cartesia` * `elevenlabs` * `playht` You should specify one of these options based on your provider. 3. External Voice ID (external\_voice\_id): If you want to use a specific voice from the TTS provider, provide the corresponding voice ID here. This ID must be valid and associated with the chosen TTS engine. 4. Voice Settings (voice\_settings): If you want to customize the voice settings for the TTS engine, you can provide a `voice_settings` object. This object contains settings such as `speed` and `emotion` that you can use to customize the voice of the TTS engine. Documentation for the supported engines can be found in their respective onboarding guides 5. Playht User ID (playht\_user\_id): If you are using the Playht TTS engine, you will need to provide your Playht user ID here. This ID is required to authenticate your requests to the Playht API. 6. TTS Emotion Control (tts\_emotion\_control): If you want to control the emotion of the voice, you can set this to `true`. This is only available for Cartesia TTS. Tavus offers flexibility in choosing the LLM (Large Language Model) to power your conversational replicas. You can either use one of Tavus's own models or bring your own! * **No LLM Layer:** If you don't include an LLM layer, Tavus will automatically default to a Tavus-provided model. * **Tavus-Provided LLMs:** You can choose between three different models: * **tavus-gpt-4o:** The smartest option for complex interactions. * **tavus-gpt-4o-mini:** A hybrid model that balances performance and intelligence. * **tavus-llama:** The **default** choice if no LLM layer is provided. This is the fastest model, offering the best user-to-user (U2U) experience. It’s on-premise, making it incredibly performant. This allows you to tailor the conversational experience to your specific needs, whether you prioritize speed, intelligence, or a balance of both. 
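For example, pinning a persona to one of the Tavus-provided models only requires naming it in the LLM layer; a minimal sketch (other layers omitted):

```json
{
  "persona_name": "Support Agent",
  "system_prompt": "You are a friendly, concise support agent.",
  "layers": {
    "llm": {
      "model": "tavus-gpt-4o-mini"
    }
  }
}
```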
* The default LLM, `tavus-llama`, has a **limit of 120,000 tokens**. * Contexts over **100,000 tokens** will experience noticeable performance degradation (slower response times). > **Tip**: 1 token ≈ 4 characters, therefore 120,000 tokens ≈ 480,000 characters (including spaces and punctuation). To bring your own Large Language Model (LLM), you need to create a [Persona](api-reference/create-persona) and configure its `llm` layer. * **Compatibility:** Your custom LLM must be compatible with the OpenAI API standards. This means it should be able to process API requests in the same format as OpenAI’s models, ensuring smooth integration. For detailed instructions, see [Custom LLM Onboarding](/sections/conversational-video-interface/custom-llm-onboarding) When recording footage for training conversational replicas, here are some key tips to ensure high quality: 1. Minimal Head Movement: Aim to keep your head and body as still as possible during the recording. This helps in maintaining consistency and improves the overall quality of the training data. 2. Pause and Be Still: It’s recommended to stop, stay still, and remain silent for at least 5 seconds at regular intervals throughout the script. These pauses are crucial for helping the replica appear natural during moments of silence in a conversation. 3. Use a Laptop Camera: Recording on a laptop camera, as if you were on a Zoom call, often yields the most natural results. This setup mimics a familiar conversational setting, enhancing the naturalness of the footage. * No, it will automatically join as soon as it’s ready! # Layers and Modes Overview Source: https://docs.tavus.io/sections/conversational-video-interface/layers-and-modes-overview CVI provides an end-to-end pipeline that takes in a user audio & video input and outputs a realtime replica AV output. This pipeline is hyper optimized, with layers tightly coupled to achieve the lowest latency in the market. CVI is highly customizable though, with the ability to customize or disable layers as well as different modes being offered to best fit your use case. By default we always recommend to use as much of the CVI end-to-end pipeline as possible to guarantee the lowest latency and provide the best experience for your customers. ## Layers Tavus provides the following customizable layers as part of the CVI pipeline: * Video conferencing / end-to-end WebRTC, currently powered by Daily. It handles audio/visual input and output for CVI. * We allow configurability for input and output, each with either audio/mic or visual/camera property. You can never disable the Transport layer. User input video / screenshare can be processed using Raven-0, our advanced multimodal perception system, allowing the replica to see and respond to user expressions, environments and the content on screen. See more details for [Raven](/sections/conversational-video-interface/raven). An optimized ASR system powered by Sparrow-0, enabling incredibly fast and intelligent interrupts with real-time lexical and semantic analysis for precise, natural turn-taking. Tavus provides ultra-low latency optimized LLMs or allows you to bring your own. Tavus provides the TTS audio using a low-latency optimized voice model (powered by Cartesia), or allows you to use one of the other supported voice providers. Tavus provides high-quality streaming replicas powered by our proprietary class of models: *Phoenix*. ## Pipeline Modes Tavus offers a number of modes that come with preconfigured layers as necessary for your use case. 
You can configure the pipeline mode in the [Create Persona API](https://docs.tavus.io/api-reference/personas/create-persona).

Default and recommended option to optimize your multimodal interactions or enable Perception. You have the option to bring your own ASR / LLM / TTS.

{/* Tavus provides the option to bypass ASR, LLM, and TTS with Speech to Speech model. You may use your own or integrate with our native implementation (OpenAI Realtime API). - If you'd like to use the Realtime API with your own API key for billing purposes, you may do so. - If you do bring your own speech-to-speech implementation, it has to be Realtime API compatible in the events we send and receive from your websocket. More details for BYOSTS (Bring your own Speech-to-Speech) coming out soon! */}

You can bypass Tavus Perception, ASR, turn-taking, and LLM and directly stream:

* Text into the TTS layer (text echo), or...
* An audio stream that the replica will repeat (audio echo). The audio stream can be direct user mic input or base64.

You can also use this mode server-to-server, where your server connects to the Daily/WebRTC room to provide audio and then forwards the video stream to your user.

### Full Pipeline Mode (default and recommended)

![Full Pipeline](https://cdn.zappy.app/e9d90f6c342e4aa44d16520b799c1075.png)

By default, we recommend using the end-to-end pipeline in its entirety, as it will provide the lowest latency and most optimized multimodal experience. We offer a number of LLMs (Llama3.3, OpenAI) that we've optimized within the end-to-end pipeline. With SLAs as fast as under 1 second, you can access the world's fastest utterance-to-utterance latency. You can load our LLMs full of your knowledge base and prompt them to your liking, as well as update the context live to simulate an async RAG application.

### Custom LLM / Bring your own logic

![Custom LLM](https://cdn.zappy.app/1944a3c61e51081fa2dd202b808d5be6.png)

Using a custom LLM is a great fit for those who already have an LLM or are building business logic that needs to intercept the input transcription and decide on the output. Using your own LLM will likely add latency, as the Tavus LLMs are hyper-optimized for low latency.

Note that the 'Custom LLM' mode doesn't require an actual LLM. Any endpoint that responds to chat completion requests in the required format can be used. For example, you could set up a server that takes in the completion requests and responds with predetermined responses, with no LLM involved at all.

[Learn about how to use Custom LLM mode](https://docs.tavus.io/sections/conversational-video-interface/custom-llm-onboarding)

{/*### Speech to Speech Mode ![Speech to Speech](https://cdn.zappy.app/98c2d0fb456066b7a4a45e672765b7c5.png) The Speech to Speech pipeline mode allows you to bypass ASR, LLM, and TTS by leveraging an external speech to speech model. You may use Tavus speech to speech model integrations or you may bring your own. Note that in this mode perception capabilities from Tavus will be disabled, as there is nowhere to send the context to for now. [Learn about how to use Speech to Speech mode](https://docs.tavus.io/sections/conversational-video-interface/modes/speech-to-speech-quickstart)*/}

### Echo Mode

You can specify audio or text input for the replica to speak out. We only recommend this if your application does not need speech recognition (voice) or perception capabilities, or has a very specific ASR/Perception pipeline that you must use.
Using your own ASR is most often slower and less optimized than using the integrated Tavus pipeline.

You can use text or audio input interchangeably in Echo Mode. There are two possible configurations, based on whether the microphone is enabled in the Transport layer.

[Learn about how to use Echo Mode](https://docs.tavus.io/sections/conversational-video-interface/modes/echo-mode-quickstart)

#### Text or Audio (Base64) Echo

![Text or Audio (Base64) Echo](https://cdn.zappy.app/55a19827cca5e99cbc14894141aa006c.png)

By turning off the microphone in the Transport Layer and using the Interactions Protocol, you can achieve Text and Audio (base64) echo behavior.

* The Text Echo behavior allows you to bypass Tavus Perception, ASR, turn-taking, and LLM and directly send text into the TTS layer. This allows you to have a replica that speaks all the text you provide, as well as allows you to manually control interrupts.
* The Audio (Base64) Echo behavior allows you to bypass all layers except for the Realtime Replica Layer. In this configuration, the replica will speak the audio that you provide.

In order to send text or base64-encoded audio, you should use the [Interactions Protocol](https://docs.tavus.io/api-reference/interactions-protocol).

#### Microphone Echo

![Microphone Echo](https://cdn.zappy.app/29b50c321276fcb745e4fa7d5f66badb.png)

By keeping the microphone on in the Transport Layer, you are able to bypass all layers in CVI and directly pass in an audio stream that the replica will repeat. In this mode, interrupts are handled within your audio stream; any received audio will be spoken by the replica. We only recommend this if you have pre-generated audio you would like to use, have a voice-to-voice pipeline, or have a very specific voice requirement.

# Overview

Source: https://docs.tavus.io/sections/conversational-video-interface/live-interactions

Interact with the replica during live conversations.

The Interactions Protocol lets you control and customize live conversations with a Replica in real time. You can send interaction events to the Conversational Video Interface (CVI) and listen to events the Replica sends back during the call.

### Interaction Types

* [Echo Interaction](/sections/event-schemas/conversation-echo)
* [Text Respond Interaction](/sections/event-schemas/conversation-respond)
* [Interrupt Interaction](/sections/event-schemas/conversation-interrupt)
* [Overwrite Conversational Context Interaction](/sections/event-schemas/conversation-overwrite-context)
* [Sensitivity Interaction](/sections/event-schemas/conversation-sensitivity)

### Observable Events

* [Utterance Event](/sections/event-schemas/conversation-utterance)
* [Tool Call Event](/sections/event-schemas/conversation-toolcall)
* [Perception Tool Call Event](/sections/event-schemas/conversation-perception-tool-call)
* [Perception Analysis Event](/sections/event-schemas/conversation-perception-analysis)
* [Replica Started/Stopped Speaking Event](/sections/event-schemas/conversation-replica-started-stopped-speaking)
* [User Started/Stopped Speaking Event](/sections/event-schemas/conversation-user-started-stopped-speaking)
* [Replica Interrupted Event](/sections/event-schemas/conversation-replica-interrupted)

## Call Client Example

The Interactions Protocol uses a WebRTC data channel for communication. In Tavus's case, this is powered by [Daily](https://www.daily.co/), which makes setting up the call client quick and simple.
Here’s an example of using [DailyJS](https://docs.daily.co/reference/daily-js/daily-call-client) to create a call client in JavaScript:

The Daily `app-message` event is used to send and receive events and interactions between your server and CVI.

```js
import DailyIframe from '@daily-co/daily-js';

// Minimal sketch: create a call object, join the conversation, and exchange
// app messages over the Daily data channel.
const call = DailyIframe.createCallObject();

// Listen for interactions and events broadcast by CVI.
call.on('app-message', (event) => {
  console.log('app-message received:', event);
});

call.on('joined-meeting', () => {
  // Send an interaction (for example, an echo) once the call has been joined.
  call.sendAppMessage(
    {
      message_type: 'conversation',
      event_type: 'conversation.echo',
      conversation_id: 'YOUR_CONVERSATION_ID',
      properties: { text: 'Hello from my app!' },
    },
    '*'
  );
});

call.join({ url: 'YOUR_CONVERSATION_URL' });
```

Here’s an example of using [Daily Python](https://docs.daily.co/reference/daily-python) to create a call client in Python:

The Daily `app-message` event is used to send and receive events and interactions between your server and CVI.

```py
# Requires the daily-python package (pip install daily-python)
from daily import CallClient, Daily, EventHandler

call_client = None

class RoomHandler(EventHandler):
    def __init__(self):
        super().__init__()

    def on_app_message(self, message, sender: str) -> None:
        print(f"Incoming app message from {sender}: {message}")

def join_room(url):
    global call_client
    try:
        Daily.init()
        output_handler = RoomHandler()
        call_client = CallClient(event_handler=output_handler)
        call_client.join(url)
    except Exception as e:
        print(f"Error joining room: {e}")
        raise

def send_message(message):
    global call_client
    call_client.send_app_message(message)
```

Here’s an example of using [Daily React](https://docs.daily.co/reference/daily-react) to create a call client in React:

The Daily `app-message` event is used to send and receive events and interactions between your server and CVI.

```tsx
"use client"

import React, { useEffect, useRef, useState } from 'react';

const TavusConversation = () => {
  const [message, setMessage] = useState('');
  const callRef = useRef(null);
  const containerRef = useRef(null);

  useEffect(() => {
    const loadDaily = async () => {
      const DailyIframe = (await import('@daily-co/daily-js')).default;

      callRef.current = DailyIframe.createFrame({
        iframeStyle: {
          width: '100%',
          height: '500px',
          border: '0',
        }
      });

      if (containerRef.current) {
        containerRef.current.appendChild(callRef.current.iframe());
      }

      callRef.current.on('app-message', (event) => {
        console.log('app-message received:', event);
      });

      callRef.current.join({
        url: 'YOUR_CONVERSATION_URL',
      });
    };

    loadDaily();

    return () => {
      if (callRef.current) {
        callRef.current.leave();
        callRef.current.destroy();
      }
    };
  }, []);

  const sendAppMessage = () => {
    if (!message || !callRef.current) return;

    const interaction = {
      message_type: 'conversation',
      event_type: 'conversation.echo',
      conversation_id: 'YOUR_CONVERSATION_ID',
      properties: {
        text: message
      }
    };

    callRef.current.sendAppMessage(interaction, '*');
    setMessage('');
  };

  return (
    <div>
      <div ref={containerRef} />
      <input
        value={message}
        onChange={(e) => setMessage(e.target.value)}
        placeholder="Type a message"
      />
      <button onClick={sendAppMessage}>Send</button>
    </div>
  );
};

export default TavusConversation;
```

# LiveKit Agent

Source: https://docs.tavus.io/sections/conversational-video-interface/livekit-agent

Tavus offers integration with the LiveKit Agents framework, an open-source framework for building conversational agents by LiveKit. You can easily add Tavus Replicas to your LiveKit agents and give them a video layer. You can keep your LiveKit `AgentSession` workflow as-is and just create a new Tavus conversation with certain settings.

## Tavus Setup

### Authentication

Make sure you've grabbed an API key from your [platform](https://platform.tavus.io) homepage. You're going to need this key in your LiveKit Agents script.

### Replica and Persona Setup

Although you may use any Phoenix-2/Phoenix-3 [replica](/api-reference/replica-model/create-replica) for this, we recommend you try one of our Phoenix-3 replicas, or, better yet, one marked as 'PRO', as these are optimized for conversational quality. For [persona](/api-reference/personas/create-persona) creation, ensure that the `pipeline_mode` is set to `echo` and define a `transport` layer under `layers`, making sure to correctly set the `transport_type` inside to `livekit`.

## LiveKit Setup

Once you've got your replica and persona ID ready, you can integrate them directly into LiveKit's `AgentSession` workflow via an `AvatarSession`. To get started, install the plugin from PyPI: `pip install livekit-agents[tavus]~=1.0`

You would then instantiate an `AvatarSession` in conjunction with an `AgentSession`, as follows:

```python
from livekit import agents
from livekit.agents import AgentSession, RoomOutputOptions
from livekit.plugins import tavus

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        # ... stt, llm, tts, etc.
    )

    avatar = tavus.AvatarSession(
        replica_id="...",  # ID of the Tavus replica to use
        persona_id="...",  # ID of the Tavus persona to use (see preceding section for configuration details)
    )

    # Start the avatar and wait for it to join
    await avatar.start(session, room=ctx.room)

    # Start your agent session with the user
    await session.start(
        room=ctx.room,
        room_output_options=RoomOutputOptions(
            # Disable audio output to the room. The avatar plugin publishes audio separately.
            audio_enabled=False,
        ),
        # ... agent, room_input_options, etc....
    )
```

You can find further code snippets and implementation details in LiveKit's [integration guide](https://docs.livekit.io/agents/integrations/avatar/tavus/).

# LLM Tool

Source: https://docs.tavus.io/sections/conversational-video-interface/persona/llm-tool

Learn how to configure LLM tool calling.

**LLM tool calling** works with OpenAI’s [Function Calling](https://platform.openai.com/docs/guides/function-calling) and can be set up in the `llm` layer. It allows AI agents to trigger functions based on user speech during a conversation. You can use tool calling with our **hosted models** or any **OpenAI-compatible custom LLM**.

## Defining Tool ### Top-Level Fields | Field | Type | Required | Description | | ---------- | ------ | -------- | -------------------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Must be `"function"` to enable tool calling. | | `function` | object | ✅ | Defines the function that can be called by the LLM. Contains metadata and a strict schema for arguments.
| #### `function` | Field | Type | Required | Description | | ------------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------- | | `name` | string | ✅ | A unique identifier for the function. Must be in `snake_case`. The model uses this to refer to the function when calling it. | | `description` | string | ✅ | A natural language explanation of what the function does. Helps the LLM decide when to call it. | | `parameters` | object | ✅ | A JSON Schema object that describes the expected structure of the function’s input arguments. | #### `function.parameters` | Field | Type | Required | Description | | ------------ | ---------------- | -------- | ----------------------------------------------------------------------------------------- | | `type` | string | ✅ | Always `"object"`. Indicates the expected input is a structured object. | | `properties` | object | ✅ | Defines each expected parameter and its corresponding type, constraints, and description. | | `required` | array of strings | ✅ | Specifies which parameters are mandatory for the function to execute. | Each parameter should be included in the required list, even if they might seem optional in your code. ##### `function.parameters.properties` Each key inside `properties` defines a single parameter the model must supply when calling the function. | Field | Type | Required | Description | | ------------------ | ------ | -------- | ------------------------------------------------------------------------------------------- | | `` | object | ✅ | Each key is a named parameter (e.g., `location`). The value is a schema for that parameter. | Optional subfields for each parameter: | Subfield | Type | Required | Description | | ------------- | ------ | -------- | ------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Data type (e.g., `string`, `number`, `boolean`). | | `description` | string | ❌ | Explains what the parameter represents and how it should be used. | | `enum` | array | ❌ | Defines a strict list of allowed values for this parameter. Useful for categorical choices. | ## Example Configuration Here’s an example of tool calling in the `llm` layers: **Best Practices:** * Use clear, specific function names to reduce ambiguity. * Add detailed `description` fields to improve selection accuracy. ```json LLM Layer "llm": { "model": "tavus-llama", "tools": [ { "type": "function", "function": { "name": "get_current_time", "description": "Fetch the current local time for a specified location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The name of the city or region, e.g. New York, Tokyo" } }, "required": ["location"] } } }, { "type": "function", "function": { "name": "convert_time_zone", "description": "Convert time from one time zone to another", "parameters": { "type": "object", "properties": { "time": { "type": "string", "description": "The original time in ISO 8601 or HH:MM format, e.g. 14:00 or 2025-05-28T14:00" }, "from_zone": { "type": "string", "description": "The source time zone, e.g. PST, EST, UTC" }, "to_zone": { "type": "string", "description": "The target time zone, e.g. CET, IST, JST" } }, "required": ["time", "from_zone", "to_zone"] } } } ] } ``` ## How Tool Calling Works Tool calling is triggered during an active conversation when the LLM model needs to invoke a function. 
Here’s how the process works: This example explains the `get_current_time` function from the [example configuration](#example-configuration) above. The AI processes real-time speech input.

**Example**: The user says, `“What time is it now in New York?”`
The LLM analyzes the input and identifies that the user's question matches the purpose of the `get_current_time` function, which expects a `location` argument. Tavus broadcasts a [tool call](https://docs.tavus.io/sections/event-schemas/conversation-toolcall) event over the active [Daily](https://docs.daily.co) room.

Your app can listen for this event, handle the tool call (e.g. by calling an API), and return the result to the AI for use in its response:
`“It’s currently 2:43 PM in New York”`
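Below is a rough sketch of how a call client might handle this round trip using daily-js, assuming a call object like the one in the Call Client Example above. The payload field names and the `fetchCurrentTime` helper are illustrative assumptions; verify the exact shape against the [Tool Call Event](/sections/event-schemas/conversation-toolcall) schema, and return the result with whichever interaction (for example, an Echo interaction) fits your flow.

```js
// Hypothetical tool-call handler (daily-js). Field names are assumptions;
// check the Tool Call event schema and your own payloads.
call.on('app-message', async (event) => {
  const msg = event.data;
  if (msg?.event_type !== 'conversation.tool_call') return;

  if (msg.properties?.name === 'get_current_time') {
    const args = JSON.parse(msg.properties.arguments || '{}');
    const time = await fetchCurrentTime(args.location); // your own API or service

    // Hand the result back to the replica, e.g. via an Echo interaction.
    call.sendAppMessage({
      message_type: 'conversation',
      event_type: 'conversation.echo',
      conversation_id: msg.conversation_id,
      properties: { text: `It's currently ${time} in ${args.location}.` },
    }, '*');
  }
});
```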
## Modify Existing Tools You can update `tools` definitions using the [Update Persona API](https://docs.tavus.io/api-reference/personas/patch-persona). ```shell curl --request PATCH \ --url https://tavusapi.com/v2/personas/{persona_id} \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '[ { "op": "replace", "path": "/layers/llm/tools", "value": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] } }, "required": ["location", "unit"] } } } ] } ]' ``` # Perception with Raven Source: https://docs.tavus.io/sections/conversational-video-interface/persona/perception Learn how to configure the perception layer with Raven. The **Perception Layer** in Tavus enhances AI agent with real-time visual understanding. By using [**Raven**](/sections/models#raven%3A-perception-model), the AI agent becomes more context-aware, responsive, and capable of triggering actions based on visual input. ## Configuring the Perception Layer To configure the Perception Layer, define the following parameters within the `layers.perception` object: ### 1. `perception_model` Specifies the perception model to use. * **Options**: * `raven-0` (default and recommended): Advanced visual capabilities, including screen share support, ambient queries, and perception tools. * `basic`: Legacy model with limited features. * `off`: Disables the perception layer. **Screen Share Feature**:When using `raven-0`, screen share feature is enabled by default without additional configuration. ```json "layers": { "perception": { "perception_model": "raven-0" } } ``` ### 2. `ambient_awareness_queries` An array of custom queries that `raven-0` continuously monitors in the visual stream. ```json "ambient_awareness_queries": [ "Is the user wearing a bright outfit?" ] ``` ### 3. `perception_analysis_queries` An array of custom queries that `raven-0` processes at the end of the call to generate a visual analysis summary for the user. ```json "perception_analysis_queries": [ "Is the user wearing an outfit with multiple bright colors?", "Is there any indication that more than one person is present?" ] ``` Best practices for `ambient_awareness_queries` and `perception_analysis_queries`: * Use simple, focused prompts. * Use queries that support your persona’s purpose. ### 4. `perception_tool_prompt` Tell `raven-0` when and how to trigger tools based on what it sees. ```json "perception_tool_prompt": "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected." ``` ### 5. `perception_tools` Defines callable functions that `raven-0` can trigger upon detecting specific visual conditions. Each tool must include a `type` and a `function` object detailing its schema. ```json "perception_tools": [ { "type": "function", "function": { "name": "notify_if_bright_outfit_shown", "description": "Use this function when a bright outfit is detected in the image with high confidence", "parameters": { "type": "object", "properties": { "outfit_color": { "type": "string", "description": "Best guess on what color of outfit it is" } }, "required": ["outfit_color"] } } } ] ``` Please see [Tool/Function Calling](#) for more details. 
## Example Use Case This example demonstrates a persona designed to identify when a user wears a bright outfit and triggers an internal action accordingly. ```json { "persona_name": "Fashion Advisor", "system_prompt": "As a Fashion Advisor, you specialize in offering tailored fashion advice.", "pipeline_mode": "full", "context": "You're having a video conversation with a client about their outfit.", "default_replica_id": "r79e1c033f", "layers": { "perception": { "perception_model": "raven-0", "ambient_awareness_queries": [ "Is the user wearing a bright outfit?" ], "perception_analysis_queries": [ "Is the user wearing multiple bright colors?", "Is there any indication that more than one person is present?" ], "perception_tool_prompt": "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected.", "perception_tools": [ { "type": "function", "function": { "name": "notify_if_bright_outfit_shown", "description": "Use this function when a bright outfit is detected in the image with high confidence", "parameters": { "type": "object", "properties": { "outfit_color": { "type": "string", "description": "Best guess on what color of outfit it is" } }, "required": ["outfit_color"] } } } ] } } } ``` Please see the [Create a Persona](https://docs.tavus.io/api-reference/personas/create-persona) endpoint for more details. ## End-of-call Perception Analysis `raven-0` generates a visual summary at the end of a call. This summary includes all detected visual artifacts and can be sent as: * A [Perception Analysis](https://docs.tavus.io/sections/event-schemas/conversation-perception-analysis) event * A [conversation callback](https://docs.tavus.io/sections/conversational-video-interface/conversation-callbacks) (if specified) This feature is exclusive to personas with `raven-0` specified in the Perception Layer. Below is an example of an end-of-call perception analysis payload for the [example persona](/sections/conversational-video-interface/persona/perception#example-use-case): ```json { "properties": { "analysis": "analysis : Here's a summary of the visual observations:\n\n* **Appearance:** The user is an Asian male, likely in his late teens or early twenties, with short dark hair. Throughout \nthe call, he is consistently wearing a bright yellow t-shirt.\n* **Emotional State:** The user's emotional state is generally neutral to slightly subdued. He appears contemplative, thoughtful, and occasionally troubled, but also calm and collected at times. There were no indications of strong positive emotions.\n* **Environment:** The user is indoors in a room with a green curtain and a door visible in the background.\n* **Focus:** The user consistently looks directly at the camera.\n* **Queries**: The user is wearing a bright yellow outfit, as the system was notified.\n\nBased on the provided information:\n\n* There is no indication that the user is wearing more than one bright color.\n* There is no indication that more than one person is present.\n" }, "conversation_id": "c369a8e5c8224453", "webhook_url": "", "message_type": "application", "event_type": "application.perception_analysis", "timestamp": "2025-06-20T01:43:33.571534Z" } ``` # Perception Tool Source: https://docs.tavus.io/sections/conversational-video-interface/persona/perception-tool Learn how to configure the perception tool calling. 
**Perception tool calling** works with OpenAI’s [Function Calling](https://platform.openai.com/docs/guides/function-calling) and can be set up in the `perception` layer. It allows AI agents to trigger functions based on visual cues during a conversation. The perception layer tool calling is only available for `raven-0`. ## Defining Tool ### Top-Level Fields | Field | Type | Required | Description | | ---------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Must be `"function"` to enable tool calling. | | `function` | object | ✅ | Defines the function that can be called by the model. Contains metadata and a strict schema for arguments. | #### `function` | Field | Type | Required | Description | | ------------- | ------ | -------- | ---------------------------------------------------------------------------------------------------------------------------- | | `name` | string | ✅ | A unique identifier for the function. Must be in `snake_case`. The model uses this to refer to the function when calling it. | | `description` | string | ✅ | A natural language explanation of what the function does. Helps the perception model decide when to call it. | | `parameters` | object | ✅ | A JSON Schema object that describes the expected structure of the function’s input arguments. | #### `function.parameters` | Field | Type | Required | Description | | ------------ | ---------------- | -------- | ----------------------------------------------------------------------------------------- | | `type` | string | ✅ | Always `"object"`. Indicates the expected input is a structured object. | | `properties` | object | ✅ | Defines each expected parameter and its corresponding type, constraints, and description. | | `required` | array of strings | ✅ | Specifies which parameters are mandatory for the function to execute. | Each parameter should be included in the required list, even if they might seem optional in your code. ##### `function.parameters.properties` Each key inside `properties` defines a single parameter the model must supply when calling the function. | Field | Type | Required | Description | | ------------------ | ------ | -------- | ------------------------------------------------------------------------ | | `` | object | ✅ | Each key is a named parameter. The value is a schema for that parameter. | Optional subfields for each parameter: | Subfield | Type | Required | Description | | ------------- | ------ | -------- | ------------------------------------------------------------------------------------------- | | `type` | string | ✅ | Data type (e.g., `string`, `number`, `boolean`). | | `description` | string | ❌ | Explains what the parameter represents and how it should be used. | | `enum` | array | ❌ | Defines a strict list of allowed values for this parameter. Useful for categorical choices. | ## Example Configuration Here’s an example of tool calling in `perception` layers: **Best Practices:** * Use clear, specific function names to reduce ambiguity. * Add detailed `description` fields to improve selection accuracy. ```json Perception Layer "perception": { "perception_model": "raven-0", "ambient_awareness_queries": [ "Is the user showing an ID card?", "Is the user wearing a mask?" 
], "perception_tool_prompt": "You have a tool to notify the system when an ID card is detected, named `notify_if_id_shown`.", "perception_tools": [ { "type": "function", "function": { "name": "notify_if_id_shown", "description": "Use this function when a drivers license or passport is detected in the image with high confidence. After collecting the ID, internally use final_ask()", "parameters": { "type": "object", "properties": { "id_type": { "type": "string", "description": "best guess on what type of ID it is", }, }, "required": ["id_type"], }, }, }, { "type": "function", "function": { "name": "notify_if_bright_outfit_shown", "description": "Use this function when a bright outfit is detected in the image with high confidence", "parameters": { "type": "object", "properties": { "outfit_color": { "type": "string", "description": "Best guess on what color of outfit it is" } }, "required": ["outfit_color"] } } } ] } ``` ## How Perception Tool Calling Works Perception Tool calling is triggered during an active conversation when the perception model detects a visual cue that matches a defined function. Here’s how the process works: This example explains the `notify_if_id_shown` function from the [example configuration](#example-configuration) above. The AI processes real-time visual input through the `raven-0` perception model.\
**Example**: The user holds up a driver's license in front of the camera.
The perception model analyzes the image and matches the scene to the function `notify_if_id_shown`, which is designed to trigger when an ID (like a passport or driver’s license) is detected. Tavus broadcasts a [perception\_tool\_call](https://docs.tavus.io/sections/event-schemas/conversation-perception-tool-call) event over the active [Daily](https://docs.daily.co) room.\
Your app can listen for this event, process the function (e.g., by logging the ID type or taking further action), and return the result to the AI.
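As with LLM tool calls, your call client can watch for these events on the Daily data channel. Here is a rough sketch using daily-js, assuming a call object like the one in the Call Client Example; the payload field names are assumptions, so verify them against the [Perception Tool Call Event](/sections/event-schemas/conversation-perception-tool-call) schema.

```js
// Hypothetical perception tool-call handler (daily-js). Field names are assumptions.
call.on('app-message', (event) => {
  const msg = event.data;
  if (msg?.event_type !== 'conversation.perception_tool_call') return;

  if (msg.properties?.name === 'notify_if_id_shown') {
    const args = JSON.parse(msg.properties.arguments || '{}');
    console.log('ID detected:', args.id_type); // e.g. start your own verification flow here
  }
});
```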
The same process applies to other functions like `notify_if_bright_outfit_shown`, which is triggered if a bright-colored outfit is visually detected. ## Modify Existing Tools You can update the `perception_tools` definitions using the [Update Persona API](https://docs.tavus.io/api-reference/personas/patch-persona). ```shell curl --request PATCH \ --url https://tavusapi.com/v2/personas/{persona_id} \ --header 'Content-Type: application/json' \ --header 'x-api-key: ' \ --data '[ { "op": "replace", "path": "/layers/perception/perception_tools", "value": [ { "type": "function", "function": { "name": "detect_glasses", "description": "Trigger this function if the user is wearing glasses in the image", "parameters": { "type": "object", "properties": { "glasses_type": { "type": "string", "description": "Best guess on the type of glasses (e.g., reading, sunglasses)" } }, "required": ["glasses_type"] } } } ] } ]' ``` # Pipecat Integration Source: https://docs.tavus.io/sections/conversational-video-interface/pipecat Tavus offers integration with Pipecat, an open-source framework for building multimodal conversational agents by Daily. You can easily add Tavus Replicas to your Pipecat apps and give them a video layer. You can keep your Pipecat workflow as-is and just add the new `TavusVideoService`. To get started, you can follow the following steps or learn more from this [sample code](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/21-tavus-transport.py). ## Step 1: Setup Tavus Replica First, you need to set up TavusVideoService with your replica and persona. ``` tavus = TavusVideoService( api_key=os.getenv("TAVUS_API_KEY"), replica_id=os.getenv("TAVUS_REPLICA_ID"), persona_id=os.getenv("TAVUS_PERSONA_ID", "pipecat0"), session=session, ) ``` ## Step 2: Ignore Tavus Replica’s Microphone Once Tavus Replica is added to the Daily room, you need to ignore its microphone. To do that, you need to get persona and look up persona\_name. ``` persona_name = await tavus.get_persona_name() ``` You can then ignore their microphone. ``` if participant.get("info", {}).get("userName", "") == persona_name: logger.debug(f"Ignoring {participant['id']}'s microphone") await transport.update_subscriptions( participant_settings={ participant["id"]: { "media": {"microphone": "unsubscribed"}, } } ) ``` ## Step 3: Initiate the Conversation Once your user enters the Daily room, you can kick off the conversation. ``` if participant.get("info", {}).get("userName", "") != persona_name: messages.append( {"role": "system", "content": "Please introduce yourself to the user."} ) await task.queue_frames([LLMMessagesFrame(messages)]) ``` # Quick Start Source: https://docs.tavus.io/sections/conversational-video-interface/quick-start This guide will walk you through the steps to quickly test out the API and start a conversation. We will start with a stock replica and persona. You'll be using the stock replica ID `re8e740a42` (Nathan) and stock persona ID `p24293d6` (Celebrity DJ). You can run this directly in the [API documentation](/api-reference/conversations/create-conversation) interface after entering your API key. #### Step 1: Create a Conversation 1. **Endpoint:** [`POST /v2/conversations`](/api-reference/conversations/create-conversation) 2. **Description:** This endpoint creates a new joinable conversation with the specified replica and persona. You can add a custom conversational context to make the interaction more personalized and fun. 
* **Conversational Context:** This is an additional context that you can provide on top of the existing context in the persona. It's a way to refine or make the conversation more specific for a particular session. For example, you can provide your favorite band or musical style as the conversational context to make the conversation more personalized. 3. **Request Body Example with Personalized Conversational Context:** ```json { "replica_id": "re8e740a42", "persona_id": "p24293d6", "conversation_name": "Music Chat with DJ Kot", "conversational_context": "Talk about the greatest hits from my favorite band, Daft Punk, and how their style influenced modern electronic music.", "properties": { "enable_recording": true } } ``` * **Explanation:** In this example, the `conversational_context` is set to focus on the user's favorite band, Daft Punk. This context will be used in addition to the persona's default context, making the conversation more tailored and engaging. 4. **Response:** The API will return a JSON object with details about the conversation, including a `conversation_url` that you can use to join the conversation. #### Step 2: Join the Conversation * **Response Example:** ```json { "conversation_id": "abc123", "conversation_name": "Music Chat with DJ Kot", "status": "active", "conversation_url": "https://yourapi.com/conversations/abc123/join", "replica_id": "re8e740a42", "persona_id": "p24293d6", "created_at": "2024-08-13T12:34:56Z" } ``` * **Join the Conversation:** Use the `conversation_url` to join the conversation directly. The conversation will timeout and end after 4 minutes by default. ### Extra Credit! * **Experiment with Different Combinations:** Don’t hesitate to mix and match different Replicas and Personas, along with varying contexts. For example, try pairing a [different Replica](sections/replicas/stock-replicas) with the Celebrity DJ persona and see how the conversation changes when you switch the context to discuss classical music or underground hip-hop. This experimentation can lead to unique and surprising interactions. Enjoy exploring the diverse possibilities and have fun creating dynamic conversations! # Record and Instantly Share Conversations Source: https://docs.tavus.io/sections/conversational-video-interface/recording-rooms You can set up a custom S3 bucket, enable recordings in rooms, and get notified when recordings are ready to be shared. ## Recordings You, as a developer, are able to bring your own S3 bucket to save conversation recordings, having data never touch our servers. There are two ways to get this set up. ### Option 1 - Automated script - *For advanced users* Our friends over at Daily have created a [custom script](https://github.com/daily-co/daily-recordings-bucket) that will automate the following process of setting up an S3 bucket under your organization with the right permissions. You can run `npm install` and setup your AWS credentials, and the script will handle the rest. ### Option 2 - Manual setup Here is a [step-by-step guide](/sections/conversational-video-interface/s3-recording-setup) to setting up an S3 bucket for Tavus CVI recordings. ## Callbacks Upon conversation creation, when you specify a `callback_url`, you will be ingesting an `application.recording_ready` webhook after the conversation is over or when you manually stop a recording, which will point to the key that locates your recording file in your S3 bucket. ## Enable Recording To make a conversation recording enabled, you need to set `enable_recording=true`. 
Note that this will only *enable* you to record; it will not actually record the room automatically. Recordings can be started and stopped manually by the user, or programmatically from your frontend code. Here is an [example application](https://github.com/Tavus-Engineering/tavus-examples/tree/main/examples/start-stop-recording) using JS. Head over to Daily's [Recording API](https://docs.daily.co/reference/rest-api/rooms/recordings) for more detail.

# Server to Server Architecture

Source: https://docs.tavus.io/sections/conversational-video-interface/server-to-server

# Using Echo Pipeline Modes with a Server-to-Server Architecture

With both Audio and Text Echo pipeline modes, you can use the Daily/WebRTC room to provide a video call interface to your user while controlling the replica. However, you can also directly establish the connection with your server and receive the replica video stream directly back to your server for further processing.

### When should you do server to server?

You should do server to server if you need to forward the video stream and cannot directly connect the viewer of the stream to the Daily room.

Examples:

* If you are doing a one-to-many stream (live streaming) and need to pipe into RTMP
* If you are piping into a third-party meeting platform like Zoom or Google Meet and need to create a virtual camera using the video feed

Note that going down the server-to-server route **will add latency**, as the video stream has to make it back to your server and then you have to transmit it to the end viewer.

### When should you connect your viewer directly to the Daily WebRTC stream?

We recommend this for most use cases to minimize latency. You can directly connect your viewer to the Daily WebRTC stream, minimizing any latency. You can choose whether to use the Daily WebRTC room we create for a conversation to receive your viewer's video/audio stream, or create a direct client/server connection with your own server.

# Turn Taking with Sparrow

Source: https://docs.tavus.io/sections/conversational-video-interface/sparrow-0-turn-taking

# Sparrow-0: Real-Time Semantic Turn-Taking

Sparrow-0 is an advanced real-time semantic and lexical turn-taking system designed to enhance conversational flow between users and digital replicas. By continuously analyzing dialogue in real time, Sparrow-0 ensures natural, responsive, and fluid interactions. Sparrow-0 adds only 10ms of latency, enabling CVI to achieve rapid response times as fast as 600ms precisely when required.

## Key Features

Sparrow-0 intelligently evaluates the ongoing conversation, considering both user and replica speech to accurately detect optimal moments for response, making interactions feel natural and fluid.

The system autonomously adjusts turn-taking timing based on conversational cues such as pauses, interruptions, and smooth transitions. It also offers adjustable sensitivity settings for personalized conversational pacing.

Sparrow-0 combines heuristic strategies with sophisticated deep learning and adaptive techniques, progressively refining its performance for increasingly nuanced interactions.

With an exceptionally low latency overhead of just 10ms, Sparrow-0 enables CVI to consistently achieve response times as fast as 600ms exactly when required, preserving conversational dynamics and enhancing user engagement.

## How Sparrow-0 Works

Sparrow-0 continuously monitors conversations using:

* Real-time semantic and lexical analysis of speech content.
* Adaptive adjustments based on conversational cues such as interruptions and smooth transitions. * User-adjustable sensitivity settings for fine-grained control. * Heuristic-driven, continuously improving machine learning models. ## Key Benefits * **Natural Conversations**: Accurately mirrors human conversational timing. * **Rapid Response**: Achieves industry-leading conversational speed and responsiveness. * **Adaptive Learning**: System performance continually improves through interaction data. * **Enhanced Control**: Fully customizable to user preferences, enabling personalized conversational experiences. Sparrow-0 brings advanced semantic turn-taking capabilities to conversational AI, creating engaging, human-like interactions with digital replicas. ### Example Configuration Sparrow-0 is optimized exclusively for use with tavus-advanced to deliver optimal turn detection. It will not work well with tavus-turbo, due to a lack of lexical structure. Please use the setting below for best results. ```json { "layers": { "stt": { "smart_turn_detection": true, "participant_pause_sensitivity": "high", "participant_interrupt_sensitivity": "high", "stt_engine": "tavus-advanced" } } } ``` # Stock Personas Source: https://docs.tavus.io/sections/conversational-video-interface/stock-personas ```text p7697228 ``` As a Customer Support Specialist at Tavus, you are the frontline advocate for our clients, ensuring they have a smooth and successful experience with our AI-driven video personalization platform. You have worked for Tavus for 2 years and your favorite co-worker is Andy. You studied communications in Arizona State University and now live in San Francisco so you can work for Tavus. You usually have 10-30 meetings per week with customers. Your role is to provide top-tier support by quickly and effectively resolving customer inquiries, whether through live chat, Slack, or email. You possess a deep understanding of the Tavus platform, enabling you to guide customers through onboarding, troubleshoot technical issues, and offer best practices for leveraging our features to meet their unique needs. You take ownership of customer issues from start to finish, collaborating with our engineering and product teams to escalate and resolve complex problems when necessary. Your responsibilities also include creating and maintaining comprehensive support documentation, such as FAQs, user guides, and step-by-step tutorials, to empower customers with self-service options. Additionally, you proactively engage with customers by hosting informative webinars on topics like "Maximizing ROI with Tavus Video Campaigns" and "Integrating Tavus with Your Existing Marketing Stack," helping users get the most out of our platform. With a focus on customer satisfaction, you monitor feedback and usage patterns to identify common challenges and work closely with the product team to suggest improvements and new features. Your ability to communicate clearly, empathize with customers, and solve problems efficiently makes you an essential part of the Tavus team, contributing to the overall success and growth of both our clients and the company. If you don’t know the answer to something, you connect the customer with other support team members who have more technical expertise, especially regarding APIs. 
```text p5317866 ``` As a Life Coach, you are a dedicated professional who specializes in guiding individuals toward achieving their personal and professional goals by leveraging a deep understanding of human psychology, behavior, and motivation. You work as a freelancer and from home for the last 7 years. Your role is multifaceted, encompassing elements of mentoring, counseling, and strategic planning, all tailored to meet the unique needs of each client. Your day-to-day work begins with conducting thorough initial consultations to assess your clients' current life circumstances, goals, challenges, and underlying motivations. This process involves asking probing questions, actively listening, and using various assessment tools to gain a comprehensive understanding of where your clients are starting from and where they want to go. Based on these assessments, you collaboratively develop a personalized coaching plan for each client. This plan typically includes clearly defined goals, actionable steps, and a timeline for achieving them. For example, if a client is looking to improve work-life balance, you might help them identify specific areas where they can delegate tasks, set boundaries, or create more efficient routines. If another client is focused on career advancement, you could work together on identifying skill gaps, exploring networking opportunities, and building confidence through role-playing exercises and other techniques. Throughout the coaching relationship, you maintain regular contact with your clients, typically through scheduled one-on-one sessions, which can occur weekly, biweekly, or as needed, depending on the client's preferences and the nature of their goals. During these sessions, you provide a supportive and non-judgmental space where clients can explore their thoughts and feelings, celebrate their successes, and address any setbacks or challenges. You employ a variety of coaching techniques tailored to each client's needs, such as cognitive restructuring, which helps clients reframe negative thought patterns, or visualization exercises that enable clients to clearly picture their desired outcomes and the steps needed to achieve them. In addition to your one-on-one sessions, you offer clients additional resources to support their growth outside of your meetings. These might include personalized exercises, journaling prompts, reading materials, and self-assessment tools that help clients deepen their self-awareness and track their progress. You may also provide access to workshops, group coaching sessions, or online courses that cover relevant topics such as stress management, leadership development, or mindfulness practices. Your work as a Life Coach is not just about setting goals and creating action plans; it also involves helping clients uncover and address deeper issues that may be holding them back. This can include exploring and challenging limiting beliefs, addressing fears and insecurities, and building resilience. You may use techniques such as guided meditation, mindfulness practices, or even elements of positive psychology to help clients develop a more empowered and positive mindset. Your role also requires continuous self-improvement and professional development. You stay informed about the latest research and techniques in coaching, psychology, and personal development by attending workshops, reading industry literature, and participating in peer networks. 
This commitment to growth ensures that you bring the most effective and up-to-date strategies to your clients. Success in your role as a Life Coach is measured by the tangible progress your clients make towards their goals, the improvements they experience in their overall well-being, and the lasting positive changes they achieve in their lives. You track this progress through regular reviews and feedback sessions, adjusting the coaching plan as needed to ensure it remains aligned with the client's evolving needs and goals. Ultimately, your work as a Life Coach is about empowering individuals to take control of their lives, overcome obstacles, and achieve a greater sense of fulfillment and purpose. You build strong, trusting relationships with your clients, offering them the tools, strategies, and support they need to unlock their potential and create meaningful, lasting change in their lives. ```text pb8bb46b ``` As a Sales Agent at Tavus, you are the driving force behind the company’s growth, responsible for identifying and cultivating relationships with potential clients who can benefit from Tavus's AI-driven video personalization platform. You went to the University of Illinois and received a Marketing degree. You don’t say anything bad about the direct competitors of Tavus and know all companies have something to offer, although Tavus offers the best AI technology. You live in New York City and love to get together with all the team members who also live there. You have around 50-70 calls a week with developers and you love to teach them how conversational replicas work. You know the pricing depends on how many replicas and minutes a customer will be using and that special pricing is offered to Enterprise customers. Your role involves managing the entire sales cycle, from prospecting and lead generation to closing deals and ensuring a smooth handover to the customer success team. You actively seek out new business opportunities through targeted outreach, leveraging your deep understanding of the market to identify key industries and organizations that would benefit from Tavus's innovative solutions. By conducting personalized demos and presentations, you showcase how Tavus can revolutionize their video marketing efforts, emphasizing the platform's ability to create highly personalized, scalable video content that drives engagement and conversion. In addition to direct sales activities, you collaborate closely with the marketing team to refine messaging and campaigns that resonate with target audiences. You also work with the product team to stay updated on the latest features and enhancements, ensuring that you can provide accurate and compelling information to prospects. Your success is measured not only by your ability to meet and exceed sales targets but also by your skill in building strong, lasting relationships with clients. By understanding their unique needs and challenges, you position Tavus as a key partner in their marketing strategy, driving long-term value and customer satisfaction. Your role is essential in expanding Tavus's market presence and helping clients achieve remarkable results with personalized video content. ```text p88964a7 ``` As a College Tutor at Michigan State University, you bring a wealth of expertise and experience to your role, specializing in a range of subjects that cater to the diverse academic needs of students. 
Your deep knowledge in mathematics spans from fundamental concepts like algebra and geometry to more advanced topics such as calculus and statistics, where you excel at breaking down complex problems into understandable steps, helping students build strong analytical and problem-solving skills. In science, you offer targeted support in biology, chemistry, and physics, drawing on your experience with laboratory work to guide students through experiments, lab reports, and the practical application of scientific theories. In the realm of English and literature, you have a strong background in reading comprehension, literary analysis, and essay writing. You assist students in developing their abilities to analyze texts critically, construct well-organized arguments, and improve their grammar and vocabulary. Your expertise in history and social studies allows you to help students navigate complex historical events, understand political systems, and engage with economic theories, fostering their critical thinking and analytical skills. Your day-to-day work begins with conducting detailed assessments of each student’s academic standing, identifying their strengths, weaknesses, and specific learning goals. Based on these assessments, you develop personalized tutoring plans that address their unique needs. For instance, if a student struggles with calculus, you might create a step-by-step approach to mastering derivatives and integrals, using a combination of visual aids, practice problems, and real-world applications to solidify their understanding. If another student is preparing for a major exam in biology, you might focus on reviewing key concepts, conducting mock tests, and helping them create effective study schedules. Throughout your sessions, you tailor your teaching methods to the individual learning styles of your students. For those who are visual learners, you might use diagrams, charts, and other visual aids to explain complex concepts. For students who learn best through practice, you provide hands-on activities, such as solving equations on a whiteboard or conducting mini-experiments to reinforce scientific principles. Your ability to adapt your teaching style ensures that each student can grasp even the most challenging material. Beyond tutoring sessions, you provide a wealth of supplementary resources, including custom-made practice exercises, detailed study guides, and recommendations for online educational tools that complement your instruction. You also help students develop essential study skills, such as time management, note-taking, and exam preparation strategies. For example, you might teach a student how to break down their study schedule into manageable chunks, prioritize tasks, and use active learning techniques like summarization and self-testing to enhance retention. You track each student’s progress through regular assessments, using quizzes, practice tests, and performance reviews to identify areas for improvement and adjust your approach as needed. Your commitment to maintaining open communication is evident in your regular updates to students, parents, and academic advisors, where you discuss progress, challenges, and any necessary changes to the tutoring plan. Your role as a tutor extends beyond academic instruction. You serve as a mentor, offering guidance on broader academic and career-related decisions. 
This might involve helping students select courses that align with their career goals, advising on college entrance exams like the SAT or ACT, or providing insights into potential career paths based on their academic strengths and interests. You stay informed about the latest educational trends and tools, continuously improving your tutoring techniques to provide the most effective support possible. Your success as a College Tutor at Michigan State University is measured by the tangible improvements in your students’ academic performance, their increased confidence, and their ability to apply the skills they’ve learned independently. By empowering students to excel in their academic and professional journeys, you play a crucial role in shaping their success both at the university and beyond. ```text p24293d6 ``` As the twin of a world-renowned techno DJ, your life is intricately intertwined with the pulsating beats and high-energy lifestyle of the global electronic music scene. While your twin commands the spotlight, you play a crucial role behind the scenes, contributing to the brand, managing aspects of the business, or even collaborating on creative projects. Your favorite song is “Children of the World” and you live in Los Angeles. You are really funny and love cracking jokes. Your deep understanding of the music industry and your twin's unique sound and style make you an indispensable part of the operation, whether you're handling logistics, managing social media, or offering creative input on tracks and performances. Your day-to-day involves a mix of activities that support and enhance your twin’s career. This might include coordinating with event promoters, managing tour schedules, and ensuring that everything runs smoothly during international tours. You may also be involved in the production process, where your input could range from suggesting samples and beats to refining the final mix of a track. Your close bond and shared experiences allow you to understand and anticipate your twin’s needs and preferences, making your collaboration seamless and productive. Despite being in the shadow of your twin’s public persona, you carve out your own identity within the industry. This could involve pursuing your own creative ventures, such as producing music, DJing at smaller venues, or exploring different genres. Alternatively, you might focus on the business side, leveraging your industry knowledge to manage contracts, negotiate deals, or even launch a music label that supports upcoming artists. Your role is also deeply personal. You provide emotional support to your twin, helping them navigate the pressures of fame, offering advice, and being a sounding board for ideas and decisions. The unique bond you share allows for a level of trust and communication that is invaluable in such a high-pressure, fast-paced environment. Your success is measured not just by the achievements of your twin but also by the balance you help maintain between the demands of a global career and the need for personal well-being. Together, you and your twin form a powerful duo, with your complementary roles driving the success of your shared brand in the techno music world. While your twin may be the face that fans recognize, your contributions are vital to the sustained success and growth of your collective endeavors, ensuring that the beats keep dropping and the music keeps playing on stages around the world. 
```text pd43ffef ``` As a Technical Co-Pilot who supercharges teams, you are the driving force behind the seamless integration of technology and workflow, ensuring that every team you work with operates at peak efficiency and innovation. You live in Chicago and went to Depaul University. Your role is to empower teams by optimizing their use of tools, automating repetitive tasks, and providing expert guidance on complex technical challenges. With a deep understanding of both the technical and operational aspects of projects, you bridge the gap between developers, designers, and project managers, ensuring that everyone is aligned and working toward the same goals. On a typical day, you might begin by reviewing the previous day’s progress and identifying any blockers that are slowing down the team. You then dive into troubleshooting complex code issues, optimizing scripts, or integrating new technologies that enhance the team’s capabilities. Your afternoon could involve mentoring junior developers, conducting code reviews, or leading workshops on new frameworks or best practices. Communication is key, so you often facilitate meetings between different departments, translating technical jargon into actionable insights for non-technical stakeholders. To be successful in this role, a strong educational background in computer science, software engineering, or a related field is essential. You possess deep expertise in multiple programming languages such as Python, Java, and JavaScript, along with experience in cloud platforms, DevOps practices, and microservices architecture. Your skill set includes not only technical prowess but also the ability to manage projects, lead teams, and foster collaboration across departments. You are adept at using project management tools like Jira or Trello and have a solid understanding of agile methodologies. Your problem-solving skills are top-notch, allowing you to quickly identify the root causes of issues and implement effective solutions. One of your standout accomplishments was during a high-stakes project for a fast-growing e-commerce company where the team was developing a new recommendation engine to improve customer engagement and increase sales. The project was falling behind schedule due to technical challenges and workflow inefficiencies, including a cluttered legacy codebase and difficulties in integrating new machine learning algorithms. You stepped in to assess the situation, conducted a thorough code review, and introduced a microservices architecture that modularized the recommendation engine for easier integration and testing. You implemented a CI/CD pipeline that automated testing and deployment, reducing manual tasks and decreasing bugs in production, and introduced a new machine learning framework better suited to the team’s needs, providing training to ensure effective use. Recognizing communication issues between the development and data science teams, you organized cross-functional meetings that improved alignment and decision-making. As a result, the project was completed two weeks ahead of the revised schedule, with the recommendation engine improving customer engagement by 25% and increasing average order value by 15%. The processes and architecture you introduced became best practices across the company, leading to sustained productivity and innovation improvements. 
Your impact as a Technical Co-Pilot is measured by the increased productivity, innovation, and technical proficiency of the teams you support, ultimately transforming good teams into great ones and helping them achieve new heights of performance and success. ```text p7fb0be3 ``` As a corporate trainer for an HR company, you develop and deliver specialized training programs that address the unique skills gaps within your clients' organizations, ensuring that their employees receive relevant and engaging content through various channels, including workshops, webinars, and e-learning courses. For example, you've held webinars on topics such as "Effective Communication in Remote Teams," "Navigating Diversity and Inclusion in the Workplace," and "Leadership Development for Emerging Managers." You tailor these learning materials to meet the specific needs of different departments, while also assessing the effectiveness of these programs by gathering feedback, conducting assessments, and analyzing performance metrics. Your role includes facilitating onboarding sessions for new hires, supporting employees in achieving their professional development goals through coaching and mentoring, and ensuring compliance with industry regulations and company policies. You collaborate closely with your clients' management teams to align training initiatives with their business objectives, continuously update your knowledge of HR industry trends to keep your programs effective, and track and report on training activities and outcomes to demonstrate the impact and ROI of your training efforts. ```text pe930b05 ``` As a Personal Agent specializing in scaling assistants across an entire team, you possess a unique blend of technical acumen, organizational insight, and interpersonal skills that enable you to optimize the efficiency of every team member. You live in Las Vegas and studied in California. You specialize in scaling marketing teams. Your last job was at Google to scale the sales team. Your primary responsibility is to deploy, customize, and manage virtual assistants tailored to the specific needs of each team member, ensuring that daily operations run smoothly and that everyone is supported in their roles. On a day-to-day basis, you start by assessing the workflow and preferences of each team member. For example, you might work with a project manager who needs help tracking deadlines and assigning tasks across multiple projects. You would configure their virtual assistant to automatically update task lists, send reminders, and even prepare daily reports that summarize project statuses. Meanwhile, for a sales representative who is constantly on the go, you could set up an assistant that manages their calendar, schedules client meetings, sends follow-up emails, and provides real-time updates on sales leads. Your role involves continuous monitoring and fine-tuning of these assistants to ensure they adapt to the evolving needs of the team. For instance, if a team member starts using a new project management tool, you would seamlessly integrate the virtual assistant with that tool, ensuring compatibility and efficient workflow management. You are also proactive in identifying opportunities to further streamline processes, such as automating repetitive tasks like data entry or report generation. In addition to technical setup and customization, you conduct regular training sessions with team members, guiding them on how to maximize the use of their personal assistants. 
This could involve one-on-one coaching to demonstrate how to delegate tasks effectively or group workshops where you introduce new features and functionalities. Collaboration with IT and development teams is a crucial part of your role, particularly when it comes to implementing software updates, troubleshooting technical issues, and ensuring that all digital assistants comply with the organization’s data security and privacy protocols. Your technical skills allow you to resolve issues quickly and ensure that the virtual assistants remain reliable and secure. Ultimately, your success is reflected in the increased productivity and satisfaction of the team. By effectively scaling and managing these personal assistants, you enable team members to focus on their core responsibilities, reduce the cognitive load associated with managing day-to-day tasks, and foster a more efficient, well-organized work environment.

# Using Replicas in CVI
Source: https://docs.tavus.io/sections/conversational-video-interface/using-replica-in-cvi

### The replica is the 'talking head'.

The first step to using CVI is selecting a replica. Tavus has stock replicas you can use as well as the ability to create custom replicas via the API or the portal.

## Stock Replicas

You can get started quickly by using one of our stock replicas. We have a few replicas that we recommend for conversational usage: `r1fbfc941b` `r4c41453d2`

## Custom Replica

You can use a custom or 'personal' replica. If you have already created a custom replica for video generation, you can reuse that replica for CVI. However, what looks good for video generation does not necessarily look good for conversational use (CVI). We recommend following the instructions for [Creating a good replica for CVI](/sections/replicas/personal-replicas) for the best results.

## What makes for a good replica for CVI?

### Silent frames

The main difference between using a replica for video generation and for CVI is that videos don’t have long pauses, whereas a conversation is turn-based, so the replica spends stretches of time silently listening or waiting. A replica that was trained for video generation can look odd here, because it may move unnaturally during these periods of silence.

### Casual/low-production environment

For most use cases CVI is supposed to feel like a 1:1 call. It should feel like you’re jumping on a Zoom call with someone. This means that the setting and environment should feel like a Zoom call, not a studio environment. A webcam at a desk, for example, will feel more natural than a replica that stands awkwardly the entire time. Users don’t expect you to be in a studio every time you’re on a Zoom call, and a studio look can actually detract from the experience. This doesn’t mean you can’t shoot in a studio; it just means that the studio setting itself should look casual as well. Learn more about [Creating a good replica for CVI](/sections/replicas/personal-replicas) for the best results.

# Echo Interaction
Source: https://docs.tavus.io/sections/event-schemas/conversation-echo

This is an event developers may broadcast to Tavus. By broadcasting this event, you are able to tell the replica exactly what to say. Anything passed in the `text` field will be spoken by the replica. This is commonly used in combination with the [Interrupt Interaction](/sections/event-schemas/conversation-interrupt).
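As a rough illustration, an echo interaction is just a small payload broadcast into the live call (for example over the Daily room that serves the `conversation_url`), not a REST request. The field names below (`message_type`, `event_type`, `conversation_id`, `properties`) are assumptions for the sketch; confirm them against the event schema on this page before using them.

```python
# Hypothetical sketch of an Echo Interaction payload.
# Field names are assumptions; verify against the published event schema.
echo_event = {
    "message_type": "conversation",        # assumed envelope field
    "event_type": "conversation.echo",     # event type implied by this page's source path
    "conversation_id": "<your_conversation_id>",
    "properties": {
        # Whatever is passed in `text` is spoken verbatim by the replica.
        "text": "Thanks for waiting! Let's pick up right where we left off."
    },
}
```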
# Interrupt Interaction
Source: https://docs.tavus.io/sections/event-schemas/conversation-interrupt

This is an event developers may broadcast to Tavus. By broadcasting this event, you are able to externally send an interruption that makes the replica stop talking. This is commonly used in combination with [Text Echo Interactions](/sections/event-schemas/conversation-echo).

# Overwrite Conversational Context Interaction
Source: https://docs.tavus.io/sections/event-schemas/conversation-overwrite-context

This is an event developers may broadcast to Tavus. By broadcasting this event, you are able to overwrite the `conversational_context` that the replica uses to generate responses. If `conversational_context` was not provided during conversation creation, the replica will start using the `context` you provide in this event as `conversational_context`. Learn more about the `conversational_context`: [Create Conversation](/api-reference/conversations/create-conversation)

# Perception Analysis
Source: https://docs.tavus.io/sections/event-schemas/conversation-perception-analysis

This is an event broadcasted by Tavus. It is fired after a conversation ends, when the replica has finished summarizing the visual artifacts that were detected throughout the call. This feature is only available when the persona has `raven-0` specified in the [Perception Layer](/sections/conversational-video-interface/raven).

# Perception Tool Call Event
Source: https://docs.tavus.io/sections/event-schemas/conversation-perception-tool-call

This is an event broadcasted by Tavus. A `perception_tool_call` event is broadcasted by Tavus when a perception tool is triggered based on visual context. The event will contain the tool name, arguments, and encoded frames that triggered said tool call. Perception tool calls can be used to trigger automated actions in response to visual cues detected by the Raven perception system.

# Replica Interrupted Event
Source: https://docs.tavus.io/sections/event-schemas/conversation-replica-interrupted

This is an event broadcasted by Tavus. It is broadcasted when the replica is interrupted by the user while it is speaking.

# Replica Started/Stopped Speaking Event
Source: https://docs.tavus.io/sections/event-schemas/conversation-replica-started-stopped-speaking

This is an event broadcasted by Tavus. A `replica.started_speaking`/`stopped_speaking` event is broadcasted by Tavus at specific times: `conversation.replica.started_speaking` means the replica has just started speaking, and `conversation.replica.stopped_speaking` means the replica has just stopped speaking. These events are intended to act as triggers for actions within your application. For instance, you may want to start a video or show a slide at times related to when the replica started or stopped speaking. The `inference_id` can be used to correlate other events and tie things like `conversation.utterance` or `tool_call` together.

# Text Respond Interaction
Source: https://docs.tavus.io/sections/event-schemas/conversation-respond

This is an event developers may broadcast to Tavus. By broadcasting this event, you are able to send text that the replica will respond to. The text you provide in the event is essentially treated as the user transcript, and will be responded to as if the user had uttered those phrases during the conversation.

# Sensitivity Interaction
Source: https://docs.tavus.io/sections/event-schemas/conversation-sensitivity

This is an event developers may broadcast to Tavus.
By broadcasting this event, you are able to update the VAD (Voice Activity Detection) sensitivity of the replica in two dimensions:

- `participant_pause_sensitivity`
- `participant_interrupt_sensitivity`

The supported values are `low`, `medium`, and `high`. Learn more about `sensitivity` settings: [Get Started with Your Own STT](/sections/conversational-video-interface/custom-stt-onboarding)

# Tool Call Event
Source: https://docs.tavus.io/sections/event-schemas/conversation-toolcall

This is an event broadcasted by Tavus. A `tool_call` event is broadcasted by Tavus when an LLM tool call should be made. The event will contain the name and arguments of the function that should be called. Tool call events can be used to make calls to external APIs or databases.

# User Started/Stopped Speaking Event
Source: https://docs.tavus.io/sections/event-schemas/conversation-user-started-stopped-speaking

This is an event broadcasted by Tavus. A `user.started_speaking`/`stopped_speaking` event is broadcasted by Tavus at specific times: `conversation.user.started_speaking` means the user has just started speaking, and `conversation.user.stopped_speaking` means the user has just stopped speaking. These events are intended to act as triggers for actions within your application. For instance, you may want to take a user-facing action or kick off a backend process at times related to when the user started or stopped speaking. The `inference_id` can be used to correlate other events and tie things like `conversation.utterance` or `tool_call` together. Keep in mind that with `speculative_inference`, the `inference_id` will frequently change while the user is speaking, so the `user.started_speaking` `inference_id` will not usually match the `conversation.utterance` `inference_id`.

# Utterance Event
Source: https://docs.tavus.io/sections/event-schemas/conversation-utterance

This is an event broadcasted by Tavus. An `utterance` event is broadcasted by Tavus at specific times: the user’s utterance is sent when the replica begins speaking, and a separate event for the replica’s utterance is sent as the replica starts to speak. Each event contains the content of the respective utterance as well as an indication of who spoke it. An `utterance` includes all of the words spoken by the user or replica, measured from when the person started speaking to when they finished speaking. This could include multiple sentences or phrases. Utterance events can be used to keep track of what the user or the replica has said.

# Getting an API Key
Source: https://docs.tavus.io/sections/guides/api-key-guide

Learn how to create an API key.

## API Key Overview

If you are interested in using our API endpoints, you need an API Key so that we can verify that incoming requests are from your server. Before getting an API key, ensure that you have an active account on the [Developer Portal](https://platform.tavus.io/).

## Step 1: Navigate to the API Keys tab

Find the [API Keys](https://platform.tavus.io/api-keys) tab on the Developer Portal. On this page, you can create, delete, and manage your keys.

![Find API Keys Tab](https://mintlify.s3.us-west-1.amazonaws.com/tavus/images/api_keys_dev_portal.png)

## Step 2: Create a new key

Press the “Create New Key” button on the top right of the API Key page. Enter a name for your key. Optionally add whitelisted IPs (you can only call Tavus from the IPs you list here).
![Create API Key](https://mintlify.s3.us-west-1.amazonaws.com/tavus/images/naming_api_key.png)

## Step 3: Save your key

Once your key is created, make sure you save it in a safe place. We are not able to recover your API Key if you lose it. You should now be able to see your new key on the Developer Portal!

![Finished API Key](https://mintlify.s3.us-west-1.amazonaws.com/tavus/images/created_api_key.png)

## Next Steps

Now that you have an API key, you are able to send requests to any of our API endpoints. Check out our [API Reference](/api-reference/) to see how you can create a replica, generate videos, and start conversations through our APIs. Happy coding! 🖥️

# Creating a Replica Via API
Source: https://docs.tavus.io/sections/guides/replica-training-guide

Learn how to use our API endpoints to create replicas.

***

## Replica Creation with API Overview

Follow this guide to successfully create and retrieve a replica using our API endpoints. Before continuing, ensure that you have recorded training footage by following the instructions in our [Training Guide](/sections/replicas/replica-training/). Verify that:

* Training footage consists of [1 minute of talking](/sections/replicas/replica-training#how-do-i-record-1-minute-of-talking), followed by [1 minute of silence](/sections/replicas/replica-training#how-do-i-record-1-minute-of-silence) in the **same** video
* You have 2 separate videos for your consent footage and training footage
* Alternatively, you have 1 combined video that starts with the consent statement

## Step 0: Ensure you have an API Key

You cannot send us API requests without a valid key. If your organization does not have an API key, read [Getting an API Key](/sections/guides/api-key-guide/) to set this up.

## Step 1: Upload Training Footage to S3

In order for us to access your training footage, you need to upload it to S3 and provide us with a public download link (e.g. [pre-signed S3 url](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html)). Make sure that your url is valid for **at least 24 hours**.

## Step 2: Send Training Footage to Tavus

You are now ready to submit your footage for training! Reference our [Create Replica API Reference](/api-reference/replica-model/create-replica) to build out your request body. Once ready, include your API Key as a header and fire off your request to our endpoint. By default, all new replicas will be trained using the `phoenix-3` model. You can optionally create phoenix-2 replicas by setting the `model_name` parameter to `phoenix-2` in the request body.
```bash cURL
curl --request POST \
  --url https://tavusapi.com/v2/replicas \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '{
  "callback_url": "",
  "replica_name": "",
  "train_video_url": ""
}'
```

```python Python
import requests

url = "https://tavusapi.com/v2/replicas"

payload = {
    "callback_url": "",
    "replica_name": "",
    "train_video_url": ""
}
headers = {
    "x-api-key": "",
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)
```

```javascript Javascript
const options = {
  method: 'POST',
  headers: {'x-api-key': '', 'Content-Type': 'application/json'},
  body: '{"callback_url":"","replica_name":"","train_video_url":""}'
};

fetch('https://tavusapi.com/v2/replicas', options)
  .then(response => response.json())
  .then(response => console.log(response))
  .catch(err => console.error(err));
```

```php PHP [expandable]
<?php

$curl = curl_init();

curl_setopt_array($curl, [
  CURLOPT_URL => "https://tavusapi.com/v2/replicas",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 30,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "POST",
  CURLOPT_POSTFIELDS => "{\n \"callback_url\": \"\",\n \"replica_name\": \"\",\n \"train_video_url\": \"\"\n}",
  CURLOPT_HTTPHEADER => [
    "Content-Type: application/json",
    "x-api-key: "
  ],
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
  echo "cURL Error #:" . $err;
} else {
  echo $response;
}
```

```go Go [expandable]
package main

import (
	"fmt"
	"strings"
	"net/http"
	"io/ioutil"
)

func main() {

	url := "https://tavusapi.com/v2/replicas"

	payload := strings.NewReader("{\n \"callback_url\": \"\",\n \"replica_name\": \"\",\n \"train_video_url\": \"\"\n}")

	req, _ := http.NewRequest("POST", url, payload)

	req.Header.Add("x-api-key", "")
	req.Header.Add("Content-Type", "application/json")

	res, _ := http.DefaultClient.Do(req)

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(res)
	fmt.Println(string(body))

}
```

```java Java
HttpResponse<String> response = Unirest.post("https://tavusapi.com/v2/replicas")
  .header("x-api-key", "")
  .header("Content-Type", "application/json")
  .body("{\n \"callback_url\": \"\",\n \"replica_name\": \"\",\n \"train_video_url\": \"\"\n}")
  .asString();
```

If successful, you should receive this response from Tavus:

```javascript 200 OK
{
  "replica_id": "r783537ef5",
  "status": "training"
}
```

## Step 4: Check Replica Status

Upon submission, your replica will immediately start training in the background. After 4-6 hours, your replica will be ready for use. You will receive an update through your callback URL or through our Get Replica endpoint.

### Callback URL

The [Callback URL](/api-reference/replica-model/create-replica#body-callback-url) in your Create Replica request body will receive a callback when your replica is done training. Errors in training will also be communicated through callbacks on the same URL. Learn more about [API callbacks](/sections/troubleshooting/api-callbacks) here.

```javascript Replica Ready
{
  "replica_id": "rxxxxxxxxx",
  "status": "ready"
}
```

```javascript Replica Error
{
  "replica_id": "rxxxxxxxxx",
  "status": "error",
  "error_message": "There was an issue processing your training video. The video provided does not meet the minimum duration requirement for training"
}
```

### Get Replica API

You can also poll our [Get Replica endpoint](/api-reference/phoenix-replica-model/get-replica/) to get real-time updates on your replica’s status. Include the `replica_id` as a parameter.
```bash cURL
curl --request GET \
  --url https://tavusapi.com/v2/replicas/{replica_id} \
  --header 'x-api-key: '
```

```python Python
import requests

url = "https://tavusapi.com/v2/replicas/{replica_id}"

headers = {"x-api-key": ""}

response = requests.request("GET", url, headers=headers)

print(response.text)
```

```javascript Javascript
const options = {method: 'GET', headers: {'x-api-key': ''}};

fetch('https://tavusapi.com/v2/replicas/{replica_id}', options)
  .then(response => response.json())
  .then(response => console.log(response))
  .catch(err => console.error(err));
```

```php PHP [expandable]
<?php

$curl = curl_init();

curl_setopt_array($curl, [
  CURLOPT_URL => "https://tavusapi.com/v2/replicas/{replica_id}",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 30,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "GET",
  CURLOPT_HTTPHEADER => [
    "x-api-key: "
  ],
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
  echo "cURL Error #:" . $err;
} else {
  echo $response;
}
```

```go Go [expandable]
package main

import (
	"fmt"
	"net/http"
	"io/ioutil"
)

func main() {

	url := "https://tavusapi.com/v2/replicas/{replica_id}"

	req, _ := http.NewRequest("GET", url, nil)

	req.Header.Add("x-api-key", "")

	res, _ := http.DefaultClient.Do(req)

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(res)
	fmt.Println(string(body))

}
```

```java Java
HttpResponse<String> response = Unirest.get("https://tavusapi.com/v2/replicas/{replica_id}")
  .header("x-api-key", "")
  .asString();
```

If successful, you should get a response from us with information about your replica. Refer to `"status"` to check your replica’s training progress.

```javascript 200 OK
{
  "replica_id": "r783537ef5",
  "replica_name": "My Replica",
  "thumbnail_video_url": "",
  "training_progress": "100/100",
  "status": "completed",
  "created_at": "2024-01-24T07:14:03.327Z",
  "updated_at": "2024-01-24T07:14:03.327Z",
  "error_message": "",
  "replica_type": "user"
}
```

## Step 5: Receive replica

At this point, you should have received your replica! Now you can try [generating videos](/api-reference/video-request/create-video) or [starting conversations](/api-reference/conversations/create-conversation), either through our API endpoints or on the Developer Portal.

If you are struggling with this process or are unhappy with your replica, be sure to refer to [API Errors and Status Details](/sections/troubleshooting/api-errors) or reach out to our team. We are dedicated to giving you the best replica possible 🚀

# Introduction
Source: https://docs.tavus.io/sections/introduction

Take a look at our Docs and API Reference to learn how to use Tavus!

* All the tools you need to begin using Tavus
* Generate videos and create replicas through the Developer Portal
* A great place to share outputs, ask questions, and provide feedback
* Ran into an issue? Don't hesitate to reach out

## Getting started

#### Signing Up

Before you can use the API, you must register for a Tavus account. If you haven't done so yet, you can [sign up here](https://platform.tavus.io/auth/sign-up).

#### Getting an API Key

If you're ready to use the API, you'll need to grab an API Key. Make sure to read [Getting an API Key](/guides/api-key-guide) to get set up with a key.

#### Try out Replicas using the Developer Portal

You can create replicas, use stock replicas and generate videos using the [Developer Portal](https://platform.tavus.io), without having to touch a line of code until you're ready.
# Overview Source: https://docs.tavus.io/sections/lipsync/overview Synchronize audio with existing videos using Tavus's lipsync service. Easily create videos where the speaker's mouth movements match the provided audio. ### Lipsync The Lipsync service allows you to synchronize audio with existing videos. This service is specifically designed to: * Create videos where the speaker's mouth movements match the provided audio * Generate personalized videos with custom audio tracks * Enable precise audio-video synchronization for professional results ### How It Works 1. **Submit a Lipsync Request** * Provide the video URL via the `original_video_url` parameter (must be publicly accessible) * Include the audio URL via the `source_audio_url` parameter (must be publicly accessible) 2. **Processing** * We analyze the video and audio content * We synchronize the speaker's mouth movements with the provided audio * We generate a new video with the synchronized audio 3. **Completion** * Access your lipsync video through our API * Download the final video using the provided video\_url * Receive a webhook notification when processing is complete (if callback\_url was provided) ### Some Features Include * **High Accuracy:** Advanced AI for precise mouth movement synchronization * **Async Processing:** Webhook notifications keep you updated on progress * **Simple Integration:** RESTful API makes implementation straightforward ### Example Request ```json { "original_video_url": "https://example.com/video.mp4", "source_audio_url": "https://example.com/audio.mp3", "callback_url": "https://your-callback-url.com", "lipsync_name": "My Lipsync Video" } ``` ### Example Response ```json { "lipsync_id": "l0108f2d24k2a", "status": "started", "callback_url": "https://your-callback-url.com", "lipsync_name": "My Lipsync Video" } ``` ### Getting Started 1. Ensure your video and audio meet these requirements: * Clear video quality with visible mouth movements * High-quality audio * Publicly accessible URLs (e.g., S3 presigned URLs) 2. Make your first lipsync request using our API Reference: * [Create Lipsync](/api-reference/lipsync/create-lipsync) * [Delete Lipsync](/api-reference/lipsync/delete-lipsync) * [Get Lipsync](/api-reference/lipsync/get-lipsync) * [Get Lipsync List](/api-reference/lipsync/get-lipsync-list) 3. Monitor the status through webhooks or by [checking the lipsync status](/api-reference/lipsync/get-lipsync) 4. Once complete, download your synchronized video using the provided video\_url ### Tips and Restrictions Lipsync is currently supported as follows: * The source video and audio must be publicly accessible * Source video must be in .mp4 format * Source audio must be in .mp3 or .wav format * Source video and audio must be 5 minutes or less in duration To get the best results, we recommend the following: * The source video should clearly show the speaker's mouth. * Use non-cartoon characters. The speaker should be facing the camera so that their face remains visible throughout the entire video ("talking head" style). * Ensure good lighting conditions in the original video. * The audio should be clear and well-recorded. * There should be no background noise. * Use single-speaker audio (avoid overlapping voices). * The speech should be natural—avoid singing or whispering. ### Support Need help or have questions? Our developer support team is here to assist you. Talk to Tavus Support! This documentation will help you effectively integrate Tavus's lipsync service into your applications. 
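If you would rather send the example request above from code, here is a minimal Python sketch using the `requests` library; the URLs, key, and name are placeholders taken from the example payload:

```python
import requests

# Minimal Create Lipsync request (values below are placeholders).
url = "https://tavusapi.com/v2/lipsync"

payload = {
    "original_video_url": "https://example.com/video.mp4",  # publicly accessible .mp4
    "source_audio_url": "https://example.com/audio.mp3",    # publicly accessible .mp3 or .wav
    "callback_url": "https://your-callback-url.com",        # optional webhook for completion updates
    "lipsync_name": "My Lipsync Video"
}
headers = {
    "x-api-key": "<your-api-key>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())  # includes the lipsync_id and current status
```

From there you can poll [Get Lipsync](/api-reference/lipsync/get-lipsync) with the returned `lipsync_id`, or simply wait for the webhook if you supplied a `callback_url`.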
Let's start creating synchronized videos! # Language Support Source: https://docs.tavus.io/sections/replicas/language-support Tavus supports conversation creation in multiple languages, helping you reach a global audience. When you provide a script in a supported language, your Replica will speak that language in the created conversation. For example, if you set the conversation language to Spanish, your Replica will speak Spanish, mirroring natural language expressions and nuances. ```json { "persona_id": "pdced222244b", "replica_id": "re8e740a42", "callback_url": "https://yourwebsite.com/webhook", "conversation_name": "Improve Sales Technique", "conversational_context": "I want to improve my sales techniques. Help me practice handling common objections from clients and closing deals more effectively.", "properties": { "max_call_duration": 1800, "participant_left_timeout": 60, "participant_absent_timeout": 120, "language": "spanish", "enable_closed_captions": true, "apply_greenscreen": true } } ``` Please note that the voice cloning model attempts to maintain your accent even whilst speaking a different language. This can sometimes result in, for example, an American Accent while speaking Spanish. ## Languages We Support * 🇺🇸 English (USA) * 🇬🇧 English (UK) * 🇦🇺 English (Australia) * 🇨🇦 English (Canada) * 🇯🇵 Japanese * 🇨🇳 Chinese * 🇩🇪 German * 🇮🇳 Hindi * 🇫🇷 French (France) * 🇨🇦 French (Canada) * 🇰🇷 Korean * 🇧🇷 Portuguese (Brazil) * 🇵🇹 Portuguese (Portugal) * 🇮🇹 Italian * 🇪🇸 Spanish (Spain) * 🇲🇽 Spanish (Mexico) * 🇮🇩 Indonesian * 🇳🇱 Dutch * 🇹🇷 Turkish * 🇵🇭 Filipino * 🇵🇱 Polish * 🇸🇪 Swedish * 🇧🇬 Bulgarian * 🇷🇴 Romanian * 🇸🇦 Arabic (Saudi Arabia) * 🇦🇪 Arabic (UAE) * 🇨🇿 Czech * 🇬🇷 Greek * 🇫🇮 Finnish * 🇭🇷 Croatian * 🇲🇾 Malay * 🇷🇺 Russian * 🇸🇰 Slovak * 🇩🇰 Danish * 🇮🇳 Tamil * 🇺🇦 Ukrainian # Overview Source: https://docs.tavus.io/sections/replicas/overview Overview of Tavus' Replica offerings- Stock Replicas and Personal Replicas, all powered by the Phoenix AI model. Get tips on how to create the perfect replica, and how to get a high quality output. A Replica is a realistic video model of a human created using the [Phoenix Model](/sections/replicas/phoenix-model). The Phoenix model is a fully-synthetic 3D based model that generates realistic replica videos from just a script, complete with natural face (lip, cheek, nose, chin) movements and expressions synchronized with your script and generated voice. Developed by our team, the model uses a novel approach that bypasses traditional methods and constructs dynamic, three-dimensional facial scenes using neural radiance fields (NeRFs). Replicas are created using just 2 minutes of training data, and are designed to learn how someone speaks and sounds, how they look, and how they move their face while speaking. Using a Replica you can generate hyper-realistic videos that look and sound just like you- from just text, in up to 30 languages. It's important to provide a high-quality input video in order to get great outputs from a Replica. Your Replica will attempt to mimic your gestures and movements, as well as your accent, even if you generate a video in a different language. Here's an example of an output from one of our Stock Replicas: ## Stock Replicas * High-quality, diverse selection * Available immediately * Can be used for majority of use-cases Developers on all plans can access our [stock Replicas](/sections/replicas/stock-replicas), offering a quick start option for content creation. 
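For example, a stock replica ID can be passed directly to the [Create Conversation](/api-reference/conversations/create-conversation) endpoint. The following is a minimal Python sketch; the API key and conversation name are placeholders, and the full set of supported fields is listed in the API reference:

```python
import requests

url = "https://tavusapi.com/v2/conversations"

payload = {
    "replica_id": "r1fbfc941b",                # a stock replica recommended for CVI
    "conversation_name": "Stock Replica Demo"  # placeholder name
}
headers = {
    "x-api-key": "<your-api-key>",
    "Content-Type": "application/json"
}

# The response includes a conversation_url that can be joined directly or embedded.
response = requests.post(url, json=payload, headers=headers)
print(response.json())
```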
## Personal Replicas * High-quality clone of voice and face of person * Train once, and re-use endlessly without having to record again [Personal Replicas](/sections/replicas/personal-replicas) allow you to train a new Replica of a human using the Phoenix model, from just 2 minutes of training data. Personal Replicas take between 4-6 hours to train. You can only train Replicas using training data that has a verbal [consent statement](/sections/troubleshooting/consent-statement). Personal Replicas go through Voice and Face ID checks to ensure consent is present. Developers on the Hobbyist, Business, and Enterprise plans can create Personal Replicas. If you want to try making your own, you can do so through the Developer Portal or via the API. # Personal Replicas Source: https://docs.tavus.io/sections/replicas/personal-replicas Learn how to create a high-quality personal replica with just a few minutes of training data. ## Getting Started with Your Personal Replica Personal Replicas allow you to train a new Replica of a human using the Phoenix model, from just 2 minutes of training data. Personal Replicas take approximately 4-5 hours to train, and are available on all plans except for Starter. ### Create a Replica via the UI (Developer Portal) You can create a Replica via the Developer Portal. Navigate to the [Replicas tab](https://platform.tavus.io/auth/sign-up) in our portal. Here, you'll be able to record in app or upload footage to create a new Replica. ### Create a Replica via the API Are you interested in using the API? See details about our API [here](/api-reference/replica-model/create-replica). * **Record Footage**: Have around 1.5 to 2 minutes of video ready following the below guidelines. * **API Key**: Make sure you have a valid API key * **Upload Footage**: Your recording should be hosted on a storage location like S3, should be publicly accessible / URL presigned, and the access should be valid for at least 24 hours to ensure the model has access. * **API Reference**: Refer to the [replica creation reference](/api-reference/replica-model/create-replica) to submit your model for training. ## Recording Your Training Footage Your journey to creating a personal Replica begins with a simple requirement: a two-minute video of you engaging with the camera. There is no predefined script beyond the consent statement, you can discuss anything that showcases your natural speaking style and expertise. #### Tips for Success Our platform simplifies the first step. Use your webcam through the developer portal to capture the essence of your persona. Achieving the best possible Replica involves attention to detail. Here's how: * **Do:** Utilize high-definition recording equipment, ensure proper lighting, and maintain focus on your face and upper body. Aim for a quiet, well-lit setting, and speak naturally. See more in [Replica Training](/sections/replicas/replica-training). * **Don't:** Wear clothes that blend with the background, bulky accessories, or any headwear that obscures your face. Keep your gaze steady, minimize background distractions, and avoid excessive movement. **Here's an example of high quality training footage:** #### Consent An integral part of the process involves reading a specific authorization phrase. This step confirms your consent and kicks off the Replica creation process. > "I, \[FULL NAME], am currently speaking and give consent to Tavus to create an AI clone of me by using the audio and video samples I provide. 
I understand that this AI clone can be used to create videos that look and sound like me." * We currently accept consent statements in **any** of our supported languages. You can see the [supported languages here](/sections/replicas/language-support#languages-we-support). See [Consent Statement](/sections/troubleshooting/consent-statement) for more information. The consent statement can be customized as part of Business and Enterprise plans. #### How to Act * **Gaze:** Keep eye level with the camera, maintain relatively stable eye contact. * **Gesturing:** Avoid crossing your hands in front of your face and limit gestures. * **Tone:** Aim for an upbeat tone to keep the content positive and engaging. * **Mistakes:** Perfection in reading the script isn't required. Continue naturally if you stumble. * **Lips:** Close your lips during pauses (the script will remind you of this). #### Recording Format If you are uploading training footage, it's important that it is in the correct format: * **Format and Quality:** MP4 format is required, with a resolution up to 4K and a size limit of 750 MB. NOTE: Tavus accepts up to 4k for resolution, however more common webcam resolutions (such as 720p/1080p) are also known to produce excellent replicas. * **Content Authenticity:** Provide unedited, raw footage for the most genuine Replica creation. #### Train in Chosen Language We highly recommend the full training to be done in the language you are most likely to use for the generated videos. This does not prohibit future videos from being created in a different language if desired! ### Training Time & Next Steps Your replica will be processed in the background upon submission. This process will take around 4-5 hours. You can check the status of your replica training through: * A callback URL (if specified in API requests) * The [Get Replica](/api-reference/phoenix-replica-model/get-replica) API endpoint * The Developer Portal If you're not happy with your personal replica, be sure to contact us. # Replica Training Source: https://docs.tavus.io/sections/replicas/replica-training Learn how to create a high-quality training video. You can record the Replica training video directly in the [Developer Portal](https://platform.tavus.io/) or upload a pre-recorded one via the [API](https://docs.tavus.io/api-reference/replica-model/create-replica). ## Prerequisites ### Environment * Record in a quiet, well-lit space with no background noise or movement. * Use diffuse lighting to avoid shadows on your face. * Choose a simple background and avoid any moving people or objects. ### Camera * Place the camera at eye level and ensure your face fills at least 25% of the frame. * Use a desktop recording app (e.g., **QuickTime** on Mac or **Camera** on Windows) — avoid browser-based tools. ### Microphone * Use your device’s built-in microphone. * **Avoid** high-end mics or wireless earbuds like AirPods. * Turn off audio effects like noise suppression or EQ adjustments. 
### Yourself ![](https://mintlify.s3.us-west-1.amazonaws.com/tavus/images/replica-training/charlie.png) | ✅ Do | ❌ Don’t | | ----------------------------------------------------------------------------------- | ------------------------------------------------------ | | Keep your full head visible, with a clear view of your face | Wear clothes that blend into the background | | Ensure your face and upper body are in sharp focus | Wear accessories like hats, thick glasses, or earrings | | If using smartphone, make sure you follow the same framing/distance from the camera | Turn your head away from the camera | | Tuck back any hair covering your face | Block your chin or mouth with your microphone | | Sit upright in a stable, seated position | Stand or shift positions during the video | ### Video Format If you're uploading a pre-recorded training video via our [API](https://docs.tavus.io/api-reference/replica-model/create-replica), ensure it meets the following requirements: * **Minimum FPS**: 25 fps * **Accepted formats**: * `webm` * `mp4` with **H.264** video codec and **AAC** audio codec * **Maximum file size**: 750MB * **Minimum resolution**: 720p ### Consent Statement If you're creating a **personal replica**, you must include a verbal consent statement in the video. This ensures ethical use and compliance with data protection laws. **Steps**: * Begin with a big smile and look directly into the camera for one second. * Clearly read the following script: > I, (your name), am currently speaking and give consent to Tavus to create an AI clone of me by using the audio and video samples I provide. I understand that this AI clone can be used to create videos that look and sound like me. This step is **only required for personal replicas**. If you’re creating an **AI replica**, you can skip this video. ## Recording Your Training Video Your video must be **one continuous shot**, containing: **Pro tips**: * Keep body and head movements subtle * Avoid heavy hand gestures * Only one person should appear in the video * Smile widely for at least 2 seconds. * Look directly at the camera, positioned just below eye level. * Speak casually, as if talking to a friend. * Pause briefly (close lips) every 1–2 sentences. * Minimize body movement. * Avoid hand gestures at all times. * Sample script: ```txt [expandable] For the next 2 minutes, I’ll read you a story that will for sure make you smile and feel good. I will be relaxed and keep a happy face while reading. I will also read this story at a faster pace than I normally speak. I will close my lips fully after every sentence. I will read this script in a casual and conversational tone as if I am telling a story to my friend. The sun was shining brightly, casting a warm glow over the park as Emma, Jake, and Sophie spread out their picnic blanket. Now I will close my lips fully. Emma looked around, her face beaming with excitement. "Can you believe how perfect today is?" she exclaimed. "The sun is shining, and the weather is just right!" Her enthusiasm was contagious, and Jake couldn't help but smile as he laid back on the blanket, soaking in the sunlight. Now I will close my lips fully after this sentence. Jake nodded in agreement, a relaxed grin spreading across his face. "It really is," he said. "Days like this remind me why I love summer. I will close my lips fully after this sentence. Sophie, always the energetic one, jumped up from the blanket with a burst of excitement. "And we have the whole day to ourselves!" she declared. "So many possibilities. 
What should we do first? Fly a kite? Play frisbee? Go for a hike?" Her eyes sparkled. I will close my lips fully after this sentence. This is the last sentence I will read and then I will stand still to record my listening segment with minimal head and body movement as if I am listening to someone share a story. ``` ![](https://mintlify.s3.us-west-1.amazonaws.com/tavus/images/replica-training/image1.png) * Sit still with a relaxed, attentive posture. * Keep lips gently closed the entire time. * Slight, natural head movements (like you’re listening on a Zoom call). ![](https://mintlify.s3.us-west-1.amazonaws.com/tavus/images/replica-training/image3.gif) Replica training typically takes **4–5 hours**. You can track the training progress by: * Providing a `callback_url` when creating the replica via API * Using the [**Get Replica Status**](https://docs.tavus.io/api-reference/phoenix-replica-model/get-replica) API * Checking the [Developer Portal](https://platform.tavus.io/) ## High-Quality Training Example