Large Language Model (LLM)
Learn how to use Tavus-optimized LLMs or integrate your own custom LLM.
The LLM Layer in Tavus enables your persona to generate intelligent, context-aware responses. You can use Tavus-hosted models or connect your own OpenAI-compatible LLM.
Tavus-Hosted Models
1. model
Select one of the available models:
- tavus-llama (default): runs an optimized variant of Llama 3.3 8B.
- tavus-gpt-4o
- tavus-gpt-4o-mini
Context Window Limit
- All Tavus-hosted models have a limit of 32,000 tokens.
- Contexts over 25,000 tokens will experience noticeable performance degradation (slow response times).
Tip: 1 token ≈ 4 characters, therefore 32,000 tokens ≈ 128,000 characters (including spaces and punctuation).
2. tools
Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.
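Below is a minimal sketch of a tools definition, assuming the OpenAI-compatible function-calling schema; the function name and parameters are illustrative placeholders:
```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given location.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```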
3. speculative_inference
When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness.
This field is optional, but recommended for better performance.
Example Configuration
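Below is a minimal sketch of a persona request body using a Tavus-hosted model. The persona_name and system_prompt values are placeholders, and the sketch assumes the LLM settings nest under layers.llm as described in the Create Persona API:
```json
{
  "persona_name": "Customer Support Agent",
  "system_prompt": "You are a helpful customer support agent.",
  "layers": {
    "llm": {
      "model": "tavus-llama",
      "speculative_inference": true
    }
  }
}
```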
Custom LLMs
Prerequisites
To use your own OpenAI-compatible LLM, you’ll need:
- Model name
- Base URL
- API key
Ensure your LLM:
- Supports streaming responses (i.e., via SSE)
- Uses the /chat/completions endpoint
1. model
Name of the custom model you want to use.
2. base_url
Base URL of your LLM endpoint.
Do not include route extensions in the base_url.
3. api_key
API key to authenticate with your LLM provider.
base_url and api_key are required only when using a custom model.
4. tools
Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.
5. speculative_inference
When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness.
This field is optional, but recommended for better performance.
6. headers
Headers sent with requests to your LLM, typically for authentication.
This field is optional, depending on your provider's requirements.
7. extra_body
Add parameters to customize the LLM request, such as temperature.
This field is optional.
Example Configuration
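Below is a minimal sketch of a custom LLM configuration, using placeholder values for the model name, base URL, API key, and header. Note that base_url stops at the provider root and excludes the /chat/completions route, per the note above:
```json
{
  "persona_name": "Customer Support Agent",
  "system_prompt": "You are a helpful customer support agent.",
  "layers": {
    "llm": {
      "model": "gpt-4o",
      "base_url": "https://api.openai.com/v1",
      "api_key": "<your-api-key>",
      "speculative_inference": true,
      "headers": {
        "X-Custom-Header": "<value>"
      },
      "extra_body": {
        "temperature": 0.5
      }
    }
  }
}
```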
Refer to the Create Persona API for a full list of supported fields.
Perception
When using the raven-0 perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user's video input.
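In chat-completions terms, that context arrives as an extra message with role system. The content below is an illustrative sketch, not the exact format Tavus emits:
```json
{
  "role": "system",
  "content": "Visual context: the user is sitting at a desk in a well-lit room and appears to be smiling."
}
```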
Basic Perception model
If you use the Basic perception model, your LLM will receive the visual context as user messages instead of system messages, as in the sketch below.
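This is an illustrative sketch of such a user message, not the exact wording Tavus sends:
```json
{
  "role": "user",
  "content": "Visual context: the user is looking at the camera and holding up a document."
}
```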
Disabled Perception model
If you disable the perception model, your LLM will not receive any special messages.