Memories allow AI Personas to remember context across turns and understand time and dates, making conversations more coherent over longer interactions.
Memories are enabled using a unique memory_stores identifier that acts as the memory key. Information collected during conversations is associated with this participant and can be referenced in future interactions.
Yes. Cross-conversation Memories are supported as part of this update.
It improves context retention, which is crucial for multi-turn tasks and long-term relationships between users and AI. It unlocks use cases that progress over time, like education or therapy, out of the box.
To enable Memories in the UI, you can either select an existing memory tag from the dropdown menu or type a new one to create it.
Use the memory_stores field in the Create Conversation API call. This should be a stable, unique identifier for the user (e.g., a user email or CRM ID). Example:
{
  "replica_id": "rb17cf590e15",
  "conversation_name": "Follow-up Chat",
  "memory_stores": ["user_123"]
}
Full example here: Memories API Docs
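If you prefer to call the API from code rather than curl, here is a minimal Node.js sketch (Node 18+ with the built-in fetch, run as an ES module; the API key environment variable and IDs are placeholders):

// Create a conversation with a memory store attached.
// Node 18+ (global fetch), ES module for top-level await.
const response = await fetch("https://tavusapi.com/v2/conversations", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.TAVUS_API_KEY, // placeholder: your Tavus API key
  },
  body: JSON.stringify({
    replica_id: "rb17cf590e15",
    conversation_name: "Follow-up Chat",
    memory_stores: ["user_123"], // stable, unique ID for this participant
  }),
});

const conversation = await response.json();
console.log(conversation.conversation_url); // serve or embed this URL for the end user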
Not yet. Editing and reviewing Memories is not supported in this early release. Retrieval endpoints are under development and will be available in a future update.
No. Memories are optional. If you don’t include a memory_stores value, the AI Persona will behave statelessly—like a standard LLM—with no memory across sessions.
No. Memories are tied to unique memory_stores. Sharing this ID across users would cause memory crossover. Each participant should have their own ID to keep Memories clean and accurate.
They can keep using their systems or integrate with Tavus Memories for more coherent, accurate conversations. Our memory is purpose-built for conversational video, retaining context across sessions with flexible scoping for truly personalized interactions.
Today, we don’t yet offer full visibility into what’s stored in memory or how it was used in a given response.
Memories are designed to persist indefinitely between interactions, allowing your AI persona to retain long-term context.
Knowledge Base is where users upload documents to enhance their AI personas’ capabilities using RAG (Retrieval-Augmented Generation). By retrieving information directly from these documents, AI personas can deliver more accurate, relevant, and grounded responses.
Using RAG, the Knowledge Base system continuously:
  • Analyzes the conversation context
  • Retrieves relevant information from your document base
  • Augments the AI’s responses with this contextual knowledge from your documents
With our industry-leading RAG, responses arrive in just 30 ms, up to 15× faster than other solutions. Conversations feel instant, natural, and friction-free.
Yes, users can keep using their systems, but we strongly recommend they integrate with the Tavus Knowledge Base. Our Knowledge Base isn’t just faster: it’s the fastest RAG on the market, delivering answers in just 30 ms. That speed means conversations flow instantly, without awkward pauses or lagging. These interactions feel natural in a way user-built systems can’t match.
An AI recruiter can reference a candidate’s resume uploaded via PDF and provide more accurate responses to applicant questions, using the resume content as grounding.
By having a Knowledge Base, AI personas can respond with facts, unlocking domain-specific intelligence:
  • Faster onboarding (just upload the docs)
  • More trustworthy answers, especially in regulated or high-stakes environments
  • Higher task completion for users, thanks to grounded knowledge
Supported file types (uploaded to a publicly accessible URL like S3):
  • CSV
  • PDF
  • TXT
  • PPTX
  • PNG
  • JPG
You can also enter any site URL, and the Tavus API will scrape the site’s contents and reformat them as a machine-readable document.
Yes. Documents are linked to the API key that was used to upload them. To access a document later, you must use the same API key that was used to create it.
Once your documents have been uploaded and processed, include their IDs in your conversation request. Here’s how:
curl --location 'https://tavusapi.com/v2/conversations/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <API KEY>' \
--data '{
    "persona_id": "<Persona ID>",
    "replica_id": "<Replica ID>",
    "document_ids": ["Document ID"]
}'
Note: You can include multiple document_ids, and your AI persona will dynamically reference those documents during the conversation. You can also attach a document to a Persona.
Upload files by providing a downloadable URL using the Create Documents endpoint. Tags are also supported for organization. This request returns a document_id, which you’ll later use in conversation calls:
curl --location 'https://tavusapi.com/v2/documents/' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <API Key>' \
--data '{
    "document_url": "<publicly accessible link>",
    "document_name": "slides_new.pdf",
    "tags": ["<tag-1>", "<tag-2>"]
}'
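In code, the upload and the returned document_id can be handled like this (a minimal Node.js sketch; reading document_id from the response body follows the description above, but confirm the exact response shape in the API reference):

// Upload a document by URL and capture the returned document_id
// for use in later Create Conversation calls. Node 18+ ES module.
const res = await fetch("https://tavusapi.com/v2/documents/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.TAVUS_API_KEY, // placeholder: your Tavus API key
  },
  body: JSON.stringify({
    document_url: "https://example.com/slides_new.pdf", // publicly accessible link
    document_name: "slides_new.pdf",
    tags: ["onboarding", "slides"],
  }),
});

const { document_id } = await res.json(); // assumed field name, per the description above
console.log("Store this for conversation calls:", document_id);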
  • file_size_too_large – File exceeds the maximum allowed upload size.
  • file_format_unsupported – This file type isn’t supported for upload.
  • invalid_file_url – Provided file link is invalid or inaccessible.
  • file_empty – The uploaded file contains no readable content.
  • website_processing_failed – Website content could not be retrieved or processed.
  • chunking_failed – System couldn’t split file into processable parts.
  • embedding_failed – Failed to generate embeddings for your file content.
  • vector_store_failed – Couldn’t save data to the vector storage system.
  • s3_storage_failed – Error storing file in S3 cloud storage.
  • contact_support – An error occurred; please reach out for help.
A conversation.rag.observability tool call will be sent whenever the conversational LLM decides to use any of the retrieved document chunks in its response, returning the document IDs and document names of those chunks.
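If you want to surface these events on your client, one hedged approach is to listen on the Daily call object, assuming the tool call is broadcast as an app-message carrying an event_type of conversation.rag.observability (the event name comes from above; the delivery channel and payload field names are assumptions):

// `call` is a daily-js call object joined to the conversation_url
// (see the embedding examples later on this page).
const call = window.Daily.createFrame();
call.join({ url: "DAILY_ROOM_URL" });

// Hedged: assumes the observability event arrives as an app-message whose
// data includes event_type and properties fields (assumed shape).
call.on("app-message", (event) => {
  const msg = event.data;
  if (msg && msg.event_type === "conversation.rag.observability") {
    console.log("Document chunks used:", msg.properties); // document IDs/names
  }
});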
When creating a conversation with documents, you can optimize how the system searches through your knowledge base by specifying a retrieval strategy. This strategy determines the balance between search speed and the quality of retrieved information, allowing you to fine-tune the system based on your specific needs. You can choose from three different strategies (see the sketch after this list):
  • Speed: Optimizes for faster retrieval times for minimal latency.
  • Balanced (default): Provides a balance between retrieval speed and quality.
  • Quality: Prioritizes finding the most relevant information, which may take slightly longer but can provide more accurate responses.
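A hedged request sketch follows; the document_retrieval_strategy field name and its placement in the request body are assumptions, so confirm the exact name in the Create Conversation API reference:

// Hedged sketch: request the "quality" retrieval strategy for a conversation
// that uses documents. Node 18+ ES module; field name is an assumption.
const res = await fetch("https://tavusapi.com/v2/conversations", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.TAVUS_API_KEY, // placeholder
  },
  body: JSON.stringify({
    persona_id: "<Persona ID>",
    replica_id: "<Replica ID>",
    document_ids: ["<Document ID>"],
    document_retrieval_strategy: "quality", // assumed field: "speed" | "balanced" | "quality"
  }),
});
console.log(await res.json());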
Maximum of 5 mins.
No. Currently, we only support documents written in English.
Users need AI that can drive conversations to clear outcomes. With Objectives, users can now define objectives with measurable completion criteria, branch automatically based on user responses, and track progress in real time. This unlocks workflow use cases like health intakes, HR interviews, and multi-step questionnaires.
The easiest way is through the Persona Builder, which walks you through designing objectives for your workflow. You can also attach them manually using the API, either during Persona creation by including an objectives_id, or by editing an existing Persona with a PATCH request.
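For the PATCH route, here is a hedged sketch following the JSON Patch style used by the Update Persona API elsewhere on this page; the "/objectives_id" path is an assumption, so confirm it in the API reference:

// Hedged sketch: attach an existing objectives definition to a Persona.
// Node 18+ ES module; the patch path is an assumption.
const res = await fetch("https://tavusapi.com/v2/personas/<persona_id>", {
  method: "PATCH",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.TAVUS_API_KEY, // placeholder
  },
  body: JSON.stringify([
    { op: "replace", path: "/objectives_id", value: "<your objectives_id>" },
  ]),
});
console.log(res.status); // expect a 2xx status on success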
Objectives are a good fit for templated, one-off conversational use cases, such as job interviews or healthcare intake, where the conversation should follow a well-defined path. These use cases usually show up with our Enterprise API customers, who run repetitive workflows at scale. More dynamic, free-flowing conversations usually do not benefit from enabling Objectives; for example, an open-ended chat with a travel advisor would see little benefit. In short, Objectives suit very defined workflows, while complex multi-session experiences don’t fit the current Objectives framework.
Guardrails help ensure your AI persona stays within appropriate boundaries and follows your defined rules during conversations.
You can create them in the Persona Builder for a guided setup, or manually attach them via the API, either during Persona creation by adding a guardrails_id, or by editing an existing Persona with a PATCH request.
Yes. You might have one set of Guardrails for a healthcare assistant to ensure medical compliance, and another for an education-focused Persona to keep all conversations age-appropriate.
The Persona Builder is a guided, conversational setup flow that helps you create your AI persona step by step, no technical skills required. Whether you’re building a virtual SDR, interviewer, or assistant, the builder tailors the experience to your use case.
Previously, building a Persona required navigating multiple tabs and settings, often needing developer expertise. Now, it’s as easy as having a conversation. You’re prompted with clear, use-case-specific questions that streamline setup from start to finish.
Daily is a platform that offers prebuilt video call apps and APIs, allowing you to easily integrate video chat into your web applications. You can embed a customizable video call widget into your site with just a few lines of code and access features like screen sharing and recording. Tavus partners with Daily to power video conversations with our replicas.
  • You do not need to sign up for a Daily account to use Tavus’s Conversational Video Interface.
  • All you need is the Daily room URL (called conversation_url in our system) that is returned by the Tavus API. You can serve this link directly to your end users or embed it.
You can use Daily Prebuilt if you want a full-featured call UI and JavaScript control over the conversation. Once you have the Daily room URL (conversation_url) ready, replace DAILY_ROOM_URL in the code snippet below with your room URL.
<html>
  <head>
    <script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
  </head>
  <body>
    <script>
      const call = window.Daily.createFrame();
      call.join({ url: 'DAILY_ROOM_URL' });
    </script>
  </body>
</html>
That’s it! For more details and options for embedding, check out Daily’s documentation or our implementation guides.
You can use an iframe if you just want to embed the conversation video with minimal setup. Once you have the Daily room URL (conversation_url) ready, replace YOUR_TAVUS_MEETING_URL in the iframe code snippet below with your room URL.
<html>
  <body>
    <iframe
      src="YOUR_TAVUS_MEETING_URL"
      allow="camera; microphone; fullscreen; display-capture"
      style="width: 100%; height: 500px; border: none;">
    </iframe>
  </body>
</html>
That’s it! For more details and options for embedding, check out Daily’s documentation or our implementation guides.
To add a custom LLM layer, you’ll need the model name, base URL, and API key from your LLM provider. Then, include the LLM config in your layers field when creating a persona using the Create Persona API. Example configuration:
{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "context": "Your favorite stories include Little Red Riding Hood and The Three Little Pigs.",
  "pipeline_mode": "full",
  "default_replica_id": "r665388ec672",
  "layers": {
    "llm": {
      "model": "gpt-3.5-turbo",
      "base_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "speculative_inference": true
    }
  }
}
For more details, refer to our Large Language Model (LLM) documentation.
You can integrate with third-party TTS providers by configuring the tts object in your persona. Supported engines include:
  • Cartesia
  • ElevenLabs
Example configuration:
{
  "layers": {
    "tts": {
      "api_key": "your-tts-provider-api-key",
      "tts_engine": "cartesia",
      "external_voice_id": "your-voice-id",
      "voice_settings": {
        "speed": "normal",
        "emotion": ["positivity:high", "curiosity"]
      },
      "tts_emotion_control": true,
      "tts_model_name": "sonic"
    }
  }
}
For more details, see our TTS documentation.
You need to create a webhook endpoint that can receive POST requests from Tavus. This endpoint will receive the callback events for the visual summary after the conversation has ended. Then, add the callback_url property when creating the conversation:
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "p596401c2cf9",
  "replica_id": "rf4703150052",
  "callback_url": "your_webhook_url"
}'
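A minimal webhook receiver might look like the following sketch (Node.js with Express; the framework, port, and route are implementation choices on your side, not requirements from Tavus):

// Minimal callback receiver sketch. Tavus will POST events such as
// application.transcription_ready and application.perception_analysis here
// once the conversation has ended.
import express from "express";

const app = express();
app.use(express.json({ limit: "5mb" })); // transcripts can be fairly large

app.post("/tavus-callback", (req, res) => {
  const { event_type, conversation_id, properties } = req.body;
  console.log(`Received ${event_type} for conversation ${conversation_id}`);
  // e.g., persist properties.transcript or properties.analysis here
  res.sendStatus(200); // acknowledge receipt
});

app.listen(3000, () => console.log("Listening for Tavus callbacks on :3000"));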
You need to create a webhook endpoint that can receive POST requests from Tavus. This endpoint will receive the callback events for the transcript after the conversation has ended. Then, add the callback_url property when creating the conversation:
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "p596401c2cf9",
  "replica_id": "rf4703150052",
  "callback_url": "your_webhook_url"
}'
Your backend will then receive an event with event_type = application.transcription_ready when the transcript is ready.
application.transcription_ready
{
  "properties": {
    "replica_id": "<replica_id>",
    "transcript": [
      {
        "role": "system",
        "content": "You are in a live video conference call with a user. You will get user message with two identifiers, 'USER SPEECH:' and 'VISUAL SCENE:', where 'USER SPEECH:' is what the person actually tells you, and 'VISUAL SCENE:' is what you are seeing when you look at them. Only use the information provided in 'VISUAL SCENE:' if the user asks what you see. Don't output identifiers such as 'USER SPEECH:' or 'VISUAL SCENE:' in your response. Reply in short sentences, talk to the user in a casual way.Respond only in english.   "
      },
      {
        "role": "user",
        "content": " Hello, tell me a story. "
      },
      {
        "role": "assistant",
        "content": "I've got a great one about a guy who traveled back in time.  Want to hear it? "
      },
      {
        "role": "user",
        "content": "USER_SPEECH:  Yeah I'd love to hear it.  VISUAL_SCENE: The image shows a close-up of a person's face, focusing on their forehead, eyes, and nose. In the background, there is a television screen mounted on a wall. The setting appears to be indoors, possibly in a public or commercial space."
      },
      {
        "role": "assistant",
        "content": "Let me think for a sec.  Alright, so there was this mysterious island that appeared out of nowhere,  and people started disappearing when they went to explore it.  "
      }
    ]
  },
  "conversation_id": "<your_conversation_id>",
  "webhook_url": "<your_webhook_url>",
  "message_type": "application",
  "event_type": "application.transcription_ready",
  "timestamp": "2025-02-10T21:30:06.141454Z"
}
You need to create a webhook endpoint that can receive POST requests from Tavus. This endpoint will receive the callback events for the visual summary after the conversation has ended. Then, add the callback_url property when creating the conversation:
curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api_key>' \
  --data '{
  "persona_id": "p596401c2cf9",
  "replica_id": "rf4703150052",
  "callback_url": "your_webhook_url"
}'
Your backend will then receive an event with event_type = application.perception_analysis when the summary is ready.
application.perception_analysis
{
  "properties": {
    "analysis": "Here's a summary of the visual observations from the video call:\n\n*   **Overall Demeanor & Emotional State:** The user consistently appeared calm, collected, and neutral. They were frequently described as pensive, contemplative, or focused, suggesting they were often engaged in thought or listening attentively. No strong positive or negative emotions were consistently detected.\n\n*   **Appearance:**\n    *   The user is a young Asian male, likely in his early 20s, with dark hair.\n    *   He consistently wore a black shirt, sometimes specifically identified as a black t-shirt. One observation mentioned a \"1989\" print on the shirt.\n    *   He was consistently looking directly at the camera.\n\n*   **Environment:** The user was consistently in an indoor setting, most likely an office or home. Common background elements included:\n    *   White walls.\n    *   Windows or glass panels/partitions, often with black frames.\n    *   Another person was partially visible in the background for several observations.\n\n*   **Actions:**\n    *   The user was seen talking and gesturing with his hand in one observation, indicating he was actively participating in a conversation.\n\n*   **Ambient Awareness Queries:**\n    *   **Acne:** Acne was initially detected on the user's face in one observation, but later observations did not detect it. This suggests that acne may have been visible at one point but not throughout the entire call.\n    *   **Distress/Discomfort:** No signs of distress or discomfort were observed at any point during the call."
  },
  "conversation_id": "<your_conversation_id>",
  "webhook_url": "<your_webhook_url>",
  "message_type": "application",
  "event_type": "application.perception_analysis",
  "timestamp": "2025-06-19T06:57:32.480826Z"
}
Tavus offers flexibility in choosing the LLM (Large Language Model) to power your conversational replicas. You can either use one of Tavus’s own models or bring your own!
  • Tavus-Provided LLMs: You can choose between three different models:
    • tavus-llama-4: The default choice if no LLM layer is provided. This is the smartest and fastest model, offering the best utterance-to-utterance (U2U) latency. It’s on-premise, making it incredibly performant.
    • tavus-gpt-4o: Another viable option for complex interactions.
    • tavus-gpt-4o-mini: Faster than tavus-gpt-4o at the slight cost of performance.
  • No LLM Layer: If you don’t include an LLM layer, Tavus will automatically default to a Tavus-provided model.
This allows you to tailor the conversational experience to your specific needs, whether you prioritize speed, intelligence, or a balance of both.
  • The default LLM, tavus-llama-4, has a limit of 32,000 tokens.
  • Contexts over 25,000 tokens will experience noticeable performance degradation (slower response times).
1 token ≈ 4 characters; therefore 32,000 tokens ≈ 128,000 characters (including spaces and punctuation).
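As a rough pre-flight check, you can estimate token usage from character count before sending context. This is only a heuristic; actual tokenization varies by model:

// Rough heuristic: ~4 characters per token for English text.
// Useful for a quick check against the 32,000-token context limit;
// a real tokenizer will give somewhat different counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

const context = "Your conversational context string...";
const estimated = estimateTokens(context);
if (estimated > 25000) {
  console.warn(`~${estimated} tokens: expect slower responses above 25k tokens`);
}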
When recording footage for training conversational replicas, here are some key tips to ensure high quality:
  1. Minimal Head Movement: Aim to keep your head and body as still as possible during the recording. This helps in maintaining consistency and improves the overall quality of the training data.
  2. Pause and Be Still: It’s recommended to stop, stay still, and remain silent for at least 5 seconds at regular intervals throughout the script. These pauses are crucial for helping the replica appear natural during moments of silence in a conversation.
  3. Use a Laptop Camera: Recording on a laptop camera, as if you were on a Zoom call, often yields the most natural results. This setup mimics a familiar conversational setting, enhancing the naturalness of the footage.
You can configure perception tools in the layers.perception object when creating a persona:
{
  "layers": {
    "perception": {
      "perception_model": "raven-0",
      "ambient_awareness_queries": [
        "Is the user showing an ID card?",
        "Is the user wearing a mask?"
      ],
      "perception_tool_prompt": "You have a tool to notify the system when an ID card is detected, named `notify_if_id_shown`. You MUST use this tool when a bright outfit is detected.",
      "perception_tools": [
        {
          "type": "function",
          "function": {
            "name": "notify_if_id_shown",
            "description": "Use this function when a drivers license or passport is detected in the image with high confidence",
            "parameters": {
              "type": "object",
              "properties": {
                "id_type": {
                  "type": "string",
                  "description": "best guess on what type of ID it is"
                }
              },
              "required": ["id_type"]
            }
          }
        }
      ]
    }
  }
}
Or modify perception tools using the Update Persona API:
curl --request PATCH \
  --url https://tavusapi.com/v2/personas/{persona_id} \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '[
    {
      "op": "replace",
      "path": "/layers/perception/perception_tools",
      "value": [
        {
          "type": "function",
          "function": {
            "name": "detect_glasses",
            "description": "Trigger this function if the user is wearing glasses",
            "parameters": {
              "type": "object",
              "properties": {
                "glasses_type": {
                  "type": "string",
                  "description": "Type of glasses (e.g., reading, sunglasses)"
                }
              },
              "required": ["glasses_type"]
            }
          }
        }
      ]
    }
  ]'
Read more on this page
No, it will automatically join as soon as it’s ready!
Out of the box, Tavus handles the complex backend infrastructure for you: LLMs, rendering, video delivery, and conversational intelligence are all preconfigured and production-ready. From there, nearly everything else is customizable:
  • What your AI Persona sees
  • How they look and sound
  • How they behave in conversation
Tavus offers unmatched flexibility: whether you’re personalizing voice, face, or behavior, you’re in control.
Tavus uses WebRTC to power real-time, face-to-face video interactions with extremely low latency. Unlike other platforms that piece together third-party tools, we built the entire pipeline (from LLM to rendering) to keep latency low and responsiveness high. Ironically, by minimizing reliance on multiple APIs, we’ve made everything faster.
Tavus CVI is powered by a tightly integrated stack of components, including:
  • LLMs for natural language understanding
  • Real-time rendering for facial video
  • APIs for Persona creation and conversational control
You can explore key APIs here:
  • Create a Persona
  • Create a Conversation
Tavus supports over 30 spoken languages through a combination of Cartesia (our default TTS engine) and ElevenLabs. If a language isn’t supported by Cartesia, Tavus automatically switches to ElevenLabs so your AI Persona can still speak fluently.
Supported languages include English (all variants), French, German, Spanish, Portuguese, Chinese, Japanese, Hindi, Italian, Korean, Dutch, Polish, Russian, Swedish, Turkish, Indonesian, Filipino, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Hungarian, Norwegian, and Vietnamese.
View the full supported language list for complete details and language-specific information.
Yes to accents. Not quite for regional dialects. When you generate a voice using Tavus, the system will default to the accent used in training. For example, if you provide Brazilian Portuguese as training input, the AI Persona will speak with a Brazilian accent. Tavus’ TTS providers auto-detect and match accordingly.
Tavus supports full orchestration through function calling. That means your AI persona can interact with external tools—calendar apps, CRMs, email systems, and more—based on your setup. Just define the function endpoints and let your AI persona take action.
Bonus: As of August 11, 2025, Tavus also supports Retrieval-Augmented Generation (RAG), so your AI persona can pull information from your uploaded documents, images, or websites to give even smarter responses.
Learn more via Tavus Documentation.
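As a sketch of what a function definition can look like, here is a persona body that exposes a hypothetical book_meeting function to the LLM, mirroring the OpenAI-style function schema shown for perception tools elsewhere on this page; the layers.llm.tools field name should be confirmed against the tool-calling documentation:

// Hedged sketch: a Create Persona body that gives the conversational LLM a
// callable function. The function name and its backend are hypothetical.
const persona = {
  persona_name: "Scheduler",
  system_prompt: "You help users book meetings.",
  layers: {
    llm: {
      tools: [
        {
          type: "function",
          function: {
            name: "book_meeting", // hypothetical function your backend implements
            description: "Book a meeting on the user's calendar",
            parameters: {
              type: "object",
              properties: {
                start_time: { type: "string", description: "ISO 8601 start time" },
              },
              required: ["start_time"],
            },
          },
        },
      ],
    },
  },
};
// POST this body to https://tavusapi.com/v2/personas with your x-api-key header.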
A good prompt is short, clear, and specific, like giving directions to a 5-year-old. Avoid data dumping. Instead, guide the AI with context and intent.
Tavus helps by offering system prompt templates, use-case guidance, and API fields to structure your instructions.
Bonus: As of August 11, 2025, the new Persona Builder lets you craft goal-driven Personas through a conversational, guided experience. This beta release democratizes Persona-building, removing friction and putting the power in our users’ hands.
You can bring your own LLM by configuring the layers field in the Create Persona API. Here’s an example:
{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "context": "Your favorite stories include Little Red Riding Hood and The Three Little Pigs.",
  "pipeline_mode": "full",
  "default_replica_id": "r665388ec672",
  "layers": {
    "llm": {
      "model": "gpt-3.5-turbo",
      "base_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "speculative_inference": true
    }
  }
}
More info here: LLM Documentation
Think of it this way: Tavus is the engine, and you design the car. The UI is 100% up to you.
To make it easier, we offer a full Component Library you can copy and paste into your build—video frames, mic/camera toggles, and more.
You can use third-party text-to-speech (TTS) providers like Cartesia or ElevenLabs. Just pass your voice settings in the tts object during Persona setup:
{
  "layers": {
    "tts": {
      "api_key": "your-tts-provider-api-key",
      "tts_engine": "cartesia",
      "external_voice_id": "your-voice-id",
      "voice_settings": {
        "speed": "normal",
        "emotion": ["positivity:high", "curiosity"]
      },
      "tts_emotion_control": true,
      "tts_model_name": "sonic"
    }
  }
}
Learn more in our TTS Documentation.
Tavus uses Daily’s video engine, which includes built-in noise cancellation. You can enable this through the updateInputSettings() method in the Daily API.
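For example, with the call object created via window.Daily.createFrame() in the embedding snippet above (noise cancellation availability can depend on your Daily configuration):

// Enable Daily's built-in noise cancellation on the local microphone track.
// `call` is the daily-js call object created earlier with window.Daily.createFrame().
call.updateInputSettings({
  audio: {
    processor: { type: "noise-cancellation" },
  },
});

// To turn it back off later:
// call.updateInputSettings({ audio: { processor: { type: "none" } } });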
Yes! Daily supports event listeners you can hook into. Track actions like participants joining, leaving, screen sharing, and more. Great for analytics or triggering workflows.
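A few common listeners on the same call object (the event names come from daily-js; the handlers here just log):

// Track call lifecycle events for analytics or to trigger workflows.
call.on("participant-joined", (ev) => {
  console.log("joined:", ev.participant.user_name);
});

call.on("participant-left", (ev) => {
  console.log("left:", ev.participant.user_name);
});

call.on("left-meeting", () => {
  console.log("local user left the conversation");
});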
Within the Create Conversation API, there is a property for this; refer to the Create Conversation API reference for the exact field.
Tavus is built with enterprise-grade security in mind. We’re:
  • SOC 2 compliant
  • GDPR compliant
  • HIPAA compliant
  • BAA compliant
This ensures your data is handled with the highest levels of care and control.
Tavus is designed with strict privacy and data segregation standards:
  • We never train our models on your data.
  • Any data processed through Tavus is not retained by default.
  • Conversations and Personas are managed using anonymized IDs.
  • You own and store all transcripts, conversation recordings, and outputs.
  • Memory and knowledge base are stored internally and only utilize specific datasets related to the conversation.
This approach allows you to maintain full control and isolation over sensitive data, including for use cases that require private model training or strict regulatory compliance.