    Conversational Video Interface

    FAQ

    Frequently asked questions about Tavus’s Conversational Video Interface

    Daily is a platform that offers prebuilt video call apps and APIs, allowing you to easily integrate video chat into your web applications. You can embed a customizable video call widget into your site with just a few lines of code, and access features like screen sharing and recording. Tavus partners with Daily to power video conversations with our replicas.

    • Transcript: Available for analysis at the end of a conversation.
    • Shutdowns:
      • Max call duration:
        • This is a clock that starts on conversation creation, not when a replica or participant joins.
        • The default duration is 4 minutes. It is recommended to update this.
      • Idle timeout: Referred to as participant_left_timeout.
    • Errors: Monitor for any system errors.
    • Participant join: Keep track of when participants join.
    • You do not need to sign up for a Daily account to use Tavus’s Conversational Video Interface.
    • All you need is the Daily room URL (called conversation_url in our system) that is returned by the Tavus API. You can serve this link directly to your end users or embed it.
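The conversation updates listed above (participant joins, shutdowns, transcripts) arrive as webhook callbacks, so one way to handle them is a small dispatcher keyed on the update name. This is a minimal sketch, not the official payload contract: the `event_type` field name and the namespaced form `system.shutdown` are assumptions made for illustration, and `handleConversationCallback` is a hypothetical helper.

```javascript
// Sketch of a dispatcher for conversation callbacks.
// ASSUMPTION: each callback body carries an `event_type` string. The field
// name and the exact event strings are hypothetical and should be checked
// against the conversation callbacks documentation.
function handleConversationCallback(payload, handlers) {
  const eventType = String(payload.event_type || "");
  // Match on the trailing segment so that "shutdown" and a namespaced
  // form such as "system.shutdown" reach the same handler.
  const name = eventType.split(".").pop();
  const known = Object.prototype.hasOwnProperty.call(handlers, name);
  if (!known || typeof handlers[name] !== "function") {
    return { handled: false, name };
  }
  handlers[name](payload);
  return { handled: true, name };
}

// Usage: record events in memory. A real server would call the dispatcher
// from the HTTP route registered as callback_url.
const seen = [];
const handlers = {
  replica_joined: () => seen.push("joined"),
  shutdown: () => seen.push("shutdown"),
  transcript_ready: () => seen.push("transcript"),
};
const result = handleConversationCallback({ event_type: "system.shutdown" }, handlers);
console.log(result, seen);
```

Unknown update names are reported as unhandled rather than throwing, so new update types would not break the webhook endpoint.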

    Set the enable_recording property to true when creating a conversation to enable recording for that Daily room. To have the recordings sent automatically to your S3 bucket, follow the instructions outlined here.

    Once you have the Daily room URL (called conversation_url when returned by Tavus) ready, replace DAILY_ROOM_URL in the code snippet below with your own room URL (e.g. https://tavus.daily.co/c1234abcd).

    <html>
      <script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
      <body>
        <script>
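          // Create the Daily call iframe and add it to the page.
          const call = window.Daily.createFrame();
          // Replace DAILY_ROOM_URL with the conversation_url returned by Tavus.
          call.join({ url: 'DAILY_ROOM_URL' });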
        </script>
      </body>
    </html>
    

    That’s it! For more details and options for embedding, check out Daily’s documentation here.

    Refer to our custom TTS onboarding doc for more details.

    Refer to our custom LLM onboarding doc for more details.

    Refer to our custom STT onboarding doc for more details.

    • What makes a good conversational replica:
      • Most of our best practices for regular replicas also apply.
      • Stay predominantly still, with minimal head movement.
      • Ideally, the speaker should stop and stay still and silent for at least 5 seconds at regular intervals throughout the script reading.
      • Results tend to look more natural when the recording is done on a laptop camera, as if on a Zoom call.
    • Be sure to specify a callback_url when creating a conversation. Tavus sends conversation updates to this URL via webhook; example updates include replica_joined, shutdown, and transcript_ready. For more details, check out conversation callbacks. We also broadcast a variety of real-time events over the App Message layer through our Interactions Protocol, which a Daily call client can listen to.
    • The default max_call_duration is just 4 minutes (240 seconds). It is recommended to update this in the create conversation call.
    • The max_call_duration is a clock that starts on conversation creation, not when a replica or participant joins.
    • To record a conversation, you need to…
    1. Enable the recording feature by setting the enable_recording property to true. This will allow the conversation to be recorded.
    2. Specify the S3 bucket where the recording will be stored by setting the recording_s3_bucket_name and recording_s3_bucket_region properties.
    3. If your setup requires assuming a specific AWS role to access the S3 bucket, make sure to provide the ARN of the role in the aws_assume_role_arn property.

    These configurations will ensure that your conversation is recorded and securely stored in the designated S3 bucket.
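The settings above can be gathered into a single request body. The sketch below only builds the JSON and does not call the API. Nesting the settings under a `properties` object and placing `callback_url` at the top level are assumptions about the request layout, and the URL, bucket name, and helper names (`buildConversationRequest`, `shutdownDeadlineMs`) are placeholders invented for illustration.

```javascript
// Build a create-conversation request body from the documented settings.
// ASSUMPTION: settings live under `properties` and callback_url sits at the
// top level; verify the layout against the API reference before use.
function buildConversationRequest(opts) {
  if (!Number.isInteger(opts.maxCallDurationSeconds) || opts.maxCallDurationSeconds <= 0) {
    throw new Error("maxCallDurationSeconds must be a positive integer");
  }
  const properties = { max_call_duration: opts.maxCallDurationSeconds };
  if (opts.recording) {
    properties.enable_recording = true;
    properties.recording_s3_bucket_name = opts.recording.bucketName;
    properties.recording_s3_bucket_region = opts.recording.bucketRegion;
    // Only needed when a specific AWS role must be assumed to reach the bucket.
    if (opts.recording.assumeRoleArn) {
      properties.aws_assume_role_arn = opts.recording.assumeRoleArn;
    }
  }
  return { callback_url: opts.callbackUrl, properties };
}

// The max_call_duration clock starts at conversation creation, so the
// shutdown deadline is creation time plus the configured duration.
function shutdownDeadlineMs(createdAtMs, maxCallDurationSeconds) {
  return createdAtMs + maxCallDurationSeconds * 1000;
}

// Usage with placeholder values.
const body = buildConversationRequest({
  callbackUrl: "https://example.com/tavus-callback",
  maxCallDurationSeconds: 1800,
  recording: { bucketName: "my-recordings", bucketRegion: "us-east-1" },
});
console.log(JSON.stringify(body, null, 2));
```

Because the clock starts at creation, time spent waiting for a participant to join counts against the limit, which is one reason to raise the 240-second default.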

    To bring your own Text-to-Speech (TTS) service, you need to create a Persona and configure its tts object. Here’s how you can do it:

    1. API Key (api_key): Provide the custodial API key for the TTS provider of your choice. This key will be used to authenticate requests to the TTS engine.
    2. TTS Engine (tts_engine): Select the TTS engine that matches your provider. Currently, the supported engines are:
       • cartesia
       • elevenlabs
       • playht
    3. External Voice ID (external_voice_id): If you want to use a specific voice from the TTS provider, provide the corresponding voice ID here. This ID must be valid and associated with the chosen TTS engine.
    4. Voice Settings (voice_settings): If you want to customize the voice settings for the TTS engine, you can provide a voice_settings object. This object contains settings such as speed and emotion. Documentation for the supported engines can be found in their respective onboarding guides.
    5. PlayHT User ID (playht_user_id): If you are using the PlayHT TTS engine, you will need to provide your PlayHT user ID here. This ID is required to authenticate your requests to the PlayHT API.
    6. TTS Emotion Control (tts_emotion_control): If you want to control the emotion of the voice, you can set this to true. This is only available for Cartesia TTS.
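The fields above carry a few cross-field rules (a supported engine, a PlayHT user ID only for playht, emotion control only for cartesia). The sketch below assembles a `tts` object and checks those rules before it is sent. `buildTtsLayer` is a hypothetical helper, the key and voice ID are placeholders, and the `speed: "normal"` value is an assumed example, since valid `voice_settings` values depend on the provider.

```javascript
// Engines named in the list above.
const SUPPORTED_TTS_ENGINES = ["cartesia", "elevenlabs", "playht"];

// Assemble a persona `tts` object and enforce the documented constraints.
function buildTtsLayer(opts) {
  if (!SUPPORTED_TTS_ENGINES.includes(opts.ttsEngine)) {
    throw new Error("Unsupported tts_engine: " + opts.ttsEngine);
  }
  if (!opts.apiKey) {
    throw new Error("api_key is required");
  }
  const tts = { api_key: opts.apiKey, tts_engine: opts.ttsEngine };
  if (opts.externalVoiceId) tts.external_voice_id = opts.externalVoiceId;
  if (opts.voiceSettings) tts.voice_settings = opts.voiceSettings;
  if (opts.ttsEngine === "playht") {
    // PlayHT requests are authenticated with a user ID as well as the key.
    if (!opts.playhtUserId) throw new Error("playht_user_id is required for playht");
    tts.playht_user_id = opts.playhtUserId;
  }
  if (opts.ttsEmotionControl) {
    // Emotion control is documented as available for Cartesia only.
    if (opts.ttsEngine !== "cartesia") {
      throw new Error("tts_emotion_control is only available for cartesia");
    }
    tts.tts_emotion_control = true;
  }
  return tts;
}

// Usage with placeholder credentials and an assumed voice_settings value.
const ttsLayer = buildTtsLayer({
  apiKey: "YOUR_TTS_PROVIDER_KEY",
  ttsEngine: "cartesia",
  externalVoiceId: "YOUR_VOICE_ID",
  voiceSettings: { speed: "normal" },
  ttsEmotionControl: true,
});
console.log(ttsLayer);
```

Validating locally surfaces configuration mistakes before the Persona is created, rather than at conversation time.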

    Tavus offers flexibility in choosing the LLM (Large Language Model) to power your conversational replicas. You can either use one of Tavus’s own models or bring your own!

    • No LLM Layer: If you don’t include an LLM layer, Tavus will automatically default to a Tavus-provided model.
    • Tavus-Provided LLMs: You can choose between three different models:
      • tavus-gpt-4o: The smartest option for complex interactions.
      • tavus-gpt-4o-mini: A hybrid model that balances performance and intelligence.
      • tavus-llama: The default choice if no LLM layer is provided. This is the fastest model, offering the best user-to-user (U2U) experience. It’s on-premise, making it incredibly performant.

    This allows you to tailor the conversational experience to your specific needs, whether you prioritize speed, intelligence, or a balance of both.
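The trade-off described above can be written down as a simple lookup. This is only an illustrative sketch: the priority labels (`intelligence`, `balance`, `speed`) and the helper `chooseTavusModel` are hypothetical, while the model names come from the list above.

```javascript
// Map a stated priority to one of the Tavus-provided models.
// The priority labels are hypothetical; the model names are from the list above.
const TAVUS_MODEL_BY_PRIORITY = {
  intelligence: "tavus-gpt-4o",
  balance: "tavus-gpt-4o-mini",
  speed: "tavus-llama",
};

// Defaults to "speed", mirroring the documented default of tavus-llama
// when no LLM layer is provided.
function chooseTavusModel(priority = "speed") {
  if (!Object.prototype.hasOwnProperty.call(TAVUS_MODEL_BY_PRIORITY, priority)) {
    throw new Error("Unknown priority: " + priority);
  }
  return TAVUS_MODEL_BY_PRIORITY[priority];
}

console.log(chooseTavusModel("balance"));
```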

    To bring your own Large Language Model (LLM), you need to create a Persona and configure its llm layer.

    • Compatibility: Your custom LLM must be compatible with the OpenAI API standards. This means it should be able to process API requests in the same format as OpenAI’s models, ensuring smooth integration.

    For detailed instructions, see Custom LLM Onboarding.
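To make the compatibility requirement concrete, the sketch below checks whether a request body has the general shape of an OpenAI-style chat completions request: a `model` string and a non-empty `messages` array of role-tagged entries. Which fields Tavus actually sends to a custom LLM is not specified here, so treat the sample body and the helper `isChatCompletionsRequest` as assumptions for illustration only.

```javascript
// Roles used by OpenAI-style chat completions messages.
const CHAT_ROLES = ["system", "developer", "user", "assistant", "tool", "function"];

// Return true when `body` looks like a chat completions request:
// a model name plus a non-empty list of messages with known roles.
function isChatCompletionsRequest(body) {
  if (body === null || typeof body !== "object") return false;
  if (typeof body.model !== "string" || body.model.length === 0) return false;
  if (!Array.isArray(body.messages) || body.messages.length === 0) return false;
  return body.messages.every(
    (m) => m !== null && typeof m === "object" && CHAT_ROLES.includes(m.role)
  );
}

// Sample request with a placeholder model name.
const sampleRequest = {
  model: "my-custom-model",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
};
console.log(isChatCompletionsRequest(sampleRequest));
```

A custom endpoint that accepts bodies of this shape and answers in the matching response format is the kind of service the compatibility note describes.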

    When recording footage for training conversational replicas, here are some key tips to ensure high quality:

    1. Minimal Head Movement: Aim to keep your head and body as still as possible during the recording. This helps in maintaining consistency and improves the overall quality of the training data.
    2. Pause and Be Still: It’s recommended to stop, stay still, and remain silent for at least 5 seconds at regular intervals throughout the script. These pauses are crucial for helping the replica appear natural during moments of silence in a conversation.
    3. Use a Laptop Camera: Recording on a laptop camera, as if you were on a Zoom call, often yields the most natural results. This setup mimics a familiar conversational setting, enhancing the naturalness of the footage.
    • No, the replica will automatically join as soon as it’s ready!