Tavus-Hosted Models
1. model
Select one of the available models. tavus-gpt-oss is recommended as a good starting point; the table below helps you choose based on your priorities.
| Model | Speed | Intelligence | Naturalness | Best For |
|---|---|---|---|---|
| tavus-gpt-oss | ⚡⚡⚡ | 🧠 | 💬 | Snappy, low-latency |
| tavus-gpt-4.1 (deprecated) | ⚡⚡ | 🧠🧠🧠 | 💬💬💬 | Long-context reasoning |
| tavus-gpt-4o (deprecated) | ⚡⚡ | 🧠🧠 | 💬💬 | Legacy option |
| tavus-gemini-2.5-flash | ⚡⚡ | 🧠🧠 | 💬💬💬 | Latency + logical deduction |
| tavus-claude-haiku-4.5 | ⚡⚡ | 🧠🧠 | 💬💬 | Grounded, fewer hallucinations |
| tavus-gpt-5.2 | ⚡⚡ | 🧠🧠 | 💬💬 | General use, latency less critical |
| tavus-gpt-4o-mini (deprecated) | ⚡⚡ | 🧠 | 💬💬 | Legacy option |
| tavus-gemini-3-flash | ⚡ | 🧠🧠🧠 | 💬💬💬 | Highest intelligence, lower speed |
Context Window Limit
- Performance and intelligence are best when prompts are limited to 5,000 tokens. You may see degraded speed and instruction following in the 15,000–20,000 token range.
- All Tavus-hosted models support up to 32,000 tokens; staying within 5k is recommended for optimal behavior.
2. tools
Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.
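As an illustration, a tool definition might look like the sketch below, assuming the OpenAI function-calling schema; get_weather and its parameters are hypothetical:

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "City name" }
          },
          "required": ["city"]
        }
      }
    }
  ]
}
```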
3. speculative_inference
When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness. Set it to false to disable.
This field is optional. It defaults to true for better performance.
4. extra_body
Add parameters to customize the LLM request. For Tavus-hosted models, you can pass temperature and top_p:
This field is optional.
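For example, to adjust sampling behavior (the values below are illustrative; tune them for your use case):

```json
{
  "extra_body": {
    "temperature": 0.7,
    "top_p": 0.9
  }
}
```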
Example Configuration
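A combined configuration using the fields above might look like the following sketch. The nesting under a layers.llm object follows the Create Persona API, and all values are illustrative:

```json
{
  "layers": {
    "llm": {
      "model": "tavus-gpt-oss",
      "speculative_inference": true,
      "extra_body": {
        "temperature": 0.7,
        "top_p": 0.9
      }
    }
  }
}
```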
Custom LLMs
Prerequisites
To use your own OpenAI-compatible LLM, you'll need:
- Model name
- Base URL
- API key
- Streamable (i.e., via SSE)
- Uses the /chat/completions endpoint
1. model
Name of the custom model you want to use.
2. base_url
Base URL of your LLM endpoint.
Do not include route extensions in the base_url.
3. api_key
API key to authenticate with your LLM provider.
4. tools
Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.
5. speculative_inference
When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness. Set it to false to disable.
This field is optional. It defaults to true for better performance.
6. headers
Optional headers for authenticating with your LLM.
This field is optional; include it only if your LLM provider requires custom headers.
7. extra_body
Add parameters to customize the LLM request. You can pass any parameters that your LLM provider supports:
This field is optional.
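For instance, to pass sampling parameters through to the provider (illustrative values; check your provider's documentation for which parameters are accepted):

```json
{
  "extra_body": {
    "temperature": 0.5,
    "max_tokens": 512
  }
}
```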
8. default_query
Add default query parameters that get appended to the base URL when making requests to the /chat/completions endpoint.
This field is optional. Useful for LLM providers that require query parameters for authentication or versioning.
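As an example, Azure OpenAI-style endpoints expect an api-version query parameter; a sketch with an illustrative version string:

```json
{
  "default_query": {
    "api-version": "2024-02-01"
  }
}
```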
Example Configuration
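A full custom-LLM configuration combining the fields above might look like this sketch; the model name, URL, key, and header are placeholders, and the layers.llm nesting follows the Create Persona API:

```json
{
  "layers": {
    "llm": {
      "model": "your-model-name",
      "base_url": "https://api.your-provider.com/v1",
      "api_key": "your-api-key",
      "speculative_inference": true,
      "headers": {
        "X-Custom-Header": "value"
      },
      "extra_body": {
        "temperature": 0.5
      }
    }
  }
}
```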
Refer to the Create Persona API for a full list of supported fields.
Perception
When using the raven-1 perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user's video input.