FAQ
Frequently asked questions about Tavus’s Conversational Video Interface
Daily is a platform that offers prebuilt video call apps and APIs, allowing you to easily integrate video chat into your web applications. You can embed a customizable video call widget into your site with just a few lines of code, and access features like screen sharing and recording. Tavus partners with Daily to power video conversations with our replicas.
- Transcript: Available for analysis at the end of a conversation.
- Shutdowns:
  - Max call duration:
    - This is a clock that starts on conversation creation, not when a replica or participant joins.
    - The default duration is 4 minutes. It is recommended to update this.
  - Idle timeout: Referred to as participant_left_timeout.
- Errors: Monitor for any system errors.
- Participant join: Keep track of when participants join.
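Because the max call duration clock starts at conversation creation, any time that passes before someone joins still counts against the limit. A minimal sketch of that arithmetic, assuming you record the creation time yourself; the helper name and its parameters are illustrative and not part of the Tavus API:

```javascript
// Hypothetical helper: not part of the Tavus API.
// Returns how many seconds of call time remain, given that the clock
// starts at conversation creation rather than when anyone joins.
function secondsRemaining(createdAtMs, maxCallDurationSec, nowMs) {
  const elapsedSec = Math.floor((nowMs - createdAtMs) / 1000);
  return Math.max(0, maxCallDurationSec - elapsedSec);
}

// Example: default 4-minute (240 s) limit, participant joins 90 s after creation.
const createdAt = Date.UTC(2024, 0, 1, 12, 0, 0);
const joinedAt = createdAt + 90 * 1000;
console.log(secondsRemaining(createdAt, 240, joinedAt)); // 150
```

With the default limit, a participant who joins 90 seconds after creation has only 150 seconds left, which is one reason to raise the limit.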
- You do not need to sign up for a Daily account to use Tavus’s Conversational Video Interface.
- All you need is the Daily room URL (called conversation_url in our system) that is returned by the Tavus API. You can serve this link directly to your end users or embed it.
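As a rough sketch of how the conversation_url might be obtained, the snippet below builds and sends a create-conversation request, then reads the URL from the response. The endpoint URL and the x-api-key header name are assumptions that should be checked against the Tavus API reference, and the function names are illustrative only.

```javascript
// Assumed endpoint and header name; confirm both in the Tavus API reference.
const TAVUS_CONVERSATIONS_URL = 'https://tavusapi.com/v2/conversations';

// Build the fetch arguments for a create-conversation call (pure, easy to test).
function buildCreateConversationRequest(apiKey, body) {
  return {
    url: TAVUS_CONVERSATIONS_URL,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the Daily room URL from the response.
// fetchImpl defaults to the global fetch (available in browsers and Node 18+).
async function createConversation(apiKey, body, fetchImpl = fetch) {
  const { url, options } = buildCreateConversationRequest(apiKey, body);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Create conversation failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.conversation_url; // serve this link to users or embed it
}
```

The body argument is whatever your create-conversation call requires. Keeping the request builder separate from the network call makes it possible to inspect the request without contacting the API.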
Set enableRecording=true as a property upon creating a conversation to enable recording for that Daily room. To have the recordings automatically sent to your S3 bucket, follow the instructions outlined here.
Once you have the Daily room URL (called conversation_url when returned by Tavus) ready, replace DAILY_ROOM_URL in the code snippet below with your own room URL (e.g. https://tavus.daily.co/c1234abcd).
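<html>
  <head>
    <!-- Load Daily's client library -->
    <script crossorigin src="https://unpkg.com/@daily-co/daily-js"></script>
  </head>
  <body>
    <script>
      // Create an embedded Daily call frame and join the room.
      const call = window.Daily.createFrame();
      call.join({ url: 'DAILY_ROOM_URL' });
    </script>
  </body>
</html>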
That’s it! For more details and options for embedding, check out Daily’s documentation here.
Refer to our custom TTS onboarding doc for more details.
Refer to our custom LLM onboarding doc for more details.
- What makes a good conversational replica:
  - Most of the best practices for regular replicas also apply.
  - Stay predominantly still, with minimal head movement.
  - Ideally, the speaker should stop and remain still and silent for 5 seconds at intervals throughout the script reading.
  - Naturalness tends to be higher when the recording is done on a laptop camera, as if the speaker were on a Zoom call.
- Be sure to specify a callback_url when creating a conversation. Tavus will return conversation updates to this URL via webhook. Example updates include replica_joined, shutdown, and transcript_ready. For more details, check out conversation callbacks.
- The default max_call_duration is just 4 minutes (240 seconds). It is recommended to update this in the create conversation call.
- The max_call_duration is a clock that starts on conversation creation, not when a replica or participant joins.
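The callback and duration settings above could be combined along these lines. This is a sketch under stated assumptions: callback_url is placed at the top level of the request body, max_call_duration is nested under a properties object, and the update name is read from an event_type field in the webhook payload. None of those placements is confirmed by this page, and the function names are illustrative.

```javascript
// Build a create-conversation body with a longer call limit and a webhook URL.
// Assumption: callback_url is top-level and max_call_duration sits under `properties`.
function buildConversationBody(callbackUrl, maxCallDurationSec) {
  if (!Number.isInteger(maxCallDurationSec) || maxCallDurationSec <= 0) {
    throw new RangeError('max_call_duration must be a positive whole number of seconds');
  }
  return {
    callback_url: callbackUrl,
    properties: { max_call_duration: maxCallDurationSec },
  };
}

// Turn an incoming webhook payload into a short description.
// Assumption: the update name arrives in an `event_type` field.
function describeCallback(payload) {
  switch (payload.event_type) {
    case 'replica_joined':
      return 'Replica joined the conversation';
    case 'shutdown':
      return 'Conversation shut down';
    case 'transcript_ready':
      return 'Transcript is ready for analysis';
    default:
      return `Unhandled update: ${payload.event_type}`;
  }
}

// Example: raise the limit from the 240-second default to 30 minutes.
const exampleBody = buildConversationBody('https://example.com/tavus-webhook', 1800);
console.log(JSON.stringify(exampleBody));
console.log(describeCallback({ event_type: 'shutdown' }));
```

In a real service, describeCallback would be called from whatever HTTP handler receives the webhook at your callback_url.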
- To record a conversation, you need to…
  - Enable the recording feature by setting the enable_recording property to true. This will allow the conversation to be recorded.
  - Specify the S3 bucket where the recording will be stored by setting the recording_s3_bucket_name and recording_s3_bucket_region properties.
  - If your setup requires assuming a specific AWS role to access the S3 bucket, make sure to provide the ARN of the role in the aws_assume_role_arn property.
These configurations will ensure that your conversation is recorded and securely stored in the designated S3 bucket.
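A minimal sketch of assembling those recording settings, assuming the four keys sit together in the conversation's properties object. The helper name is illustrative, and the bucket name, region, and role ARN are placeholder values.

```javascript
// Build the recording-related properties described above.
// Assumption: these keys live together in the conversation's properties object.
function buildRecordingProperties({ bucketName, bucketRegion, assumeRoleArn }) {
  if (!bucketName || !bucketRegion) {
    throw new Error('Both the S3 bucket name and region are required');
  }
  const properties = {
    enable_recording: true,
    recording_s3_bucket_name: bucketName,
    recording_s3_bucket_region: bucketRegion,
  };
  // The role ARN is only needed when access to the bucket requires assuming a role.
  if (assumeRoleArn) {
    properties.aws_assume_role_arn = assumeRoleArn;
  }
  return properties;
}

// Placeholder values for illustration only.
const recordingProps = buildRecordingProperties({
  bucketName: 'my-recordings-bucket',
  bucketRegion: 'us-east-1',
  assumeRoleArn: 'arn:aws:iam::123456789012:role/ExampleRecordingRole',
});
console.log(JSON.stringify(recordingProps, null, 2));
```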
To bring your own Text-to-Speech (TTS) service, you need to create a Persona and configure its tts object. Here’s how you can do it:
- API Key (api_key): Provide the custodial API key for the TTS provider of your choice. This key will be used to authenticate requests to the TTS engine.
- TTS Engine (tts_engine): Select the TTS engine you want to use. Specify one of the currently supported engines, based on your provider:
  - cartesia
  - elevenlabs
  - playht
- External Voice ID (external_voice_id): If you want to use a specific voice from the TTS provider, provide the corresponding voice ID here. This ID must be valid and associated with the chosen TTS engine.
- Voice Settings (voice_settings): If you want to customize the voice settings for the TTS engine, you can provide a voice_settings object. This object contains settings such as speed and emotion that you can use to customize the voice of the TTS engine. Documentation for the supported engines can be found in their respective onboarding guides.
- Playht User ID (playht_user_id): If you are using the Playht TTS engine, you will need to provide your Playht user ID here. This ID is required to authenticate your requests to the Playht API.
Tavus offers flexibility in choosing the LLM (Large Language Model) to power your conversational replicas. You can either use one of Tavus’s own models or bring your own!
- No LLM Layer: If you don’t include an LLM layer, Tavus will automatically default to a Tavus-provided model.
- Tavus-Provided LLMs: You can choose between three different models:
- tavus-gpt-4o: The smartest option for complex interactions.
- tavus-gpt-4o-mini: A hybrid model that balances performance and intelligence.
- tavus-llama: The default choice if no LLM layer is provided. This is the fastest model, offering the best user-to-user (U2U) experience. It’s on-premise, making it incredibly performant.
This allows you to tailor the conversational experience to your specific needs, whether you prioritize speed, intelligence, or a balance of both.
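One way to express that choice in code is a small lookup from priority to model name. The priority labels and the helper are illustrative; only the three model names come from the list above.

```javascript
// Map a priority to one of the Tavus-provided models listed above.
// The priority labels are illustrative, not Tavus terminology.
const TAVUS_MODELS = new Map([
  ['intelligence', 'tavus-gpt-4o'],
  ['balanced', 'tavus-gpt-4o-mini'],
  ['speed', 'tavus-llama'],
]);

function pickTavusModel(priority) {
  // Fall back to tavus-llama, mirroring the default used when no LLM layer is set.
  return TAVUS_MODELS.get(priority) ?? 'tavus-llama';
}

console.log(pickTavusModel('intelligence')); // tavus-gpt-4o
console.log(pickTavusModel(undefined)); // tavus-llama
```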
To bring your own Large Language Model (LLM), you need to create a Persona and configure its llm layer.
- Compatibility: Your custom LLM must be compatible with the OpenAI API standards. This means it should be able to process API requests in the same format as OpenAI’s models, ensuring smooth integration.
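To make the compatibility requirement more concrete: an OpenAI-style chat completions request carries a model name and a messages array of role/content pairs. The check below is a simplified sketch you might use when testing your own endpoint against sample requests. The function name is hypothetical, it only handles plain string content, and this page does not specify exactly what Tavus sends.

```javascript
// Roles commonly used in OpenAI-style chat messages (simplified list).
const CHAT_ROLES = new Set(['system', 'user', 'assistant', 'tool']);

// Simplified shape check for an OpenAI-style chat completions request body.
// Only plain string message content is accepted here.
function isChatCompletionsRequest(body) {
  if (typeof body !== 'object' || body === null) return false;
  if (typeof body.model !== 'string' || body.model.length === 0) return false;
  if (!Array.isArray(body.messages) || body.messages.length === 0) return false;
  return body.messages.every(
    (m) =>
      typeof m === 'object' &&
      m !== null &&
      CHAT_ROLES.has(m.role) &&
      typeof m.content === 'string'
  );
}

// Sample request with a placeholder model name.
const sampleRequest = {
  model: 'my-custom-model',
  messages: [
    { role: 'system', content: 'You are a helpful replica.' },
    { role: 'user', content: 'Hello!' },
  ],
};
console.log(isChatCompletionsRequest(sampleRequest)); // true
```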
For detailed instructions, see Custom LLM Onboarding.
When recording footage for training conversational replicas, here are some key tips to ensure high quality:
- Minimal Head Movement: Aim to keep your head and body as still as possible during the recording. This helps in maintaining consistency and improves the overall quality of the training data.
- Pause and Be Still: It’s recommended to stop, stay still, and remain silent for at least 5 seconds at regular intervals throughout the script. These pauses are crucial for helping the replica appear natural during moments of silence in a conversation.
- Use a Laptop Camera: Recording on a laptop camera, as if you were on a Zoom call, often yields the most natural results. This setup mimics a familiar conversational setting, enhancing the naturalness of the footage.
- No, the replica will automatically join as soon as it’s ready!