The LLM Layer in Tavus enables your persona to generate intelligent, context-aware responses. You can use Tavus-hosted models or connect your own OpenAI-compatible LLM.

Tavus-Hosted Models

1. model

Select one of the available models:

  • tavus-llama
  • tavus-gpt-4o
  • tavus-gpt-4o-mini

tavus-llama is the default model and runs an optimized variant of Llama 3.3 8B.

Context Window Limit

  • All Tavus-hosted models have a limit of 32,000 tokens.
  • Contexts over 25,000 tokens noticeably degrade performance, with slower response times.

Tip: 1 token ≈ 4 characters, therefore 32,000 tokens ≈ 128,000 characters (including spaces and punctuation).
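The rule of thumb above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the Tavus API: the function names and status labels are hypothetical, and the 4-characters-per-token ratio is only the approximation from the tip, so actual token counts will vary by model and text.

```python
TOKEN_LIMIT = 32_000         # context limit for Tavus-hosted models (from this page)
SLOWDOWN_THRESHOLD = 25_000  # above this, responses are noticeably slower (from this page)
CHARS_PER_TOKEN = 4          # rule of thumb from the tip; an approximation only


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def context_status(text: str) -> str:
    """Classify a context string as 'ok', 'slow', or 'over_limit' (hypothetical labels)."""
    tokens = estimate_tokens(text)
    if tokens > TOKEN_LIMIT:
        return "over_limit"
    if tokens > SLOWDOWN_THRESHOLD:
        return "slow"
    return "ok"
```

Because the estimate is approximate, it is safer to leave headroom below both thresholds than to rely on it near the boundary.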

"model": "tavus-gpt-4o"

2. tools

Optionally enable tool calling by defining functions the LLM can invoke.

Please see LLM Tool Calling for more details.

3. speculative_inference

When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness.

"speculative_inference": true

This field is optional, but recommended for better performance.

Example Configuration

{
  "persona_name": "Health Coach",
  "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
  "context": "You specialize in daily routines, diet advice, and motivational support.",
  "pipeline_mode": "full",
  "default_replica_id": "r665388ec672",
  "layers": {
    "llm": {
      "model": "tavus-gpt-4o",
      "speculative_inference": true
    }
  }
}

Custom LLMs

Prerequisites

To use your own OpenAI-compatible LLM, you’ll need:

  • Model name
  • Base URL
  • API key

Ensure your LLM:

  • Is streamable (i.e., via SSE)
  • Uses the /chat/completions endpoint
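To illustrate what a streamable /chat/completions response looks like to a consumer, here is a minimal sketch that reads SSE lines and yields the text deltas. It assumes the chunk shape used by the OpenAI streaming format (`data:` lines carrying JSON with `choices[].delta.content`, terminated by `data: [DONE]`). The function name is hypothetical, and this page does not specify which fields Tavus itself reads.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feeding the function a few recorded lines from your own endpoint is a quick way to check that it streams in this shape before connecting it to a persona.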

1. model

Name of the custom model you want to use.

"model": "gpt-3.5-turbo"

2. base_url

Base URL of your LLM endpoint.

Do not include route extensions in the base_url.

"base_url": "https://your-llm.com/api/v1"
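A common mistake is pasting the full endpoint URL, route included. The sketch below strips a trailing /chat/completions route before the value is used as base_url. The helper name is hypothetical, and it handles only this one route, on the assumption that it is the route extension meant above.

```python
def normalize_base_url(url: str) -> str:
    """Drop trailing slashes and a trailing /chat/completions route, if present."""
    url = url.strip().rstrip("/")
    suffix = "/chat/completions"
    if url.endswith(suffix):
        url = url[: -len(suffix)]
    return url
```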

3. api_key

API key to authenticate with your LLM provider.

"api_key": "your-api-key"

base_url and api_key are required only when using a custom model.

4. tools

Optionally enable tool calling by defining functions the LLM can invoke.

Please see LLM Tool Calling for more details.

5. speculative_inference

When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness.

"speculative_inference": true

This field is optional, but recommended for better performance.

6. headers

Optional headers for authenticating with your LLM.

"headers": {
  "Authorization": "Bearer your-api-key"
}

This field is optional; whether you need it depends on your LLM provider.

7. extra_body

Add parameters to customize the LLM request, such as temperature.

"extra_body": {
  "temperature": 0.5
}

This field is optional.

Example Configuration

{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "context": "Your favorite stories include Little Red Riding Hood and The Three Little Pigs.",
  "pipeline_mode": "full",
  "default_replica_id": "r665388ec672",
  "layers": {
    "llm": {
      "model": "gpt-3.5-turbo",
      "base_url": "https://api.openai.com/v1",
      "api_key": "your-api-key",
      "speculative_inference": true
    }
  }
}

Refer to the Create Persona API for a full list of supported fields.
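If you assemble persona payloads in code, a small builder can enforce the rule that base_url and api_key are required only for custom models. This is a sketch under assumptions: `build_llm_layer` is a hypothetical helper rather than part of any Tavus SDK, and it treats any model name outside the three hosted models listed on this page as custom.

```python
import json

# Hosted model names as listed on this page.
TAVUS_HOSTED_MODELS = {"tavus-llama", "tavus-gpt-4o", "tavus-gpt-4o-mini"}


def build_llm_layer(model, base_url=None, api_key=None,
                    speculative_inference=True, headers=None, extra_body=None):
    """Build the "llm" layer dict; custom models must supply base_url and api_key."""
    layer = {"model": model, "speculative_inference": speculative_inference}
    if model not in TAVUS_HOSTED_MODELS:
        if not base_url or not api_key:
            raise ValueError("custom models need both base_url and api_key")
        layer["base_url"] = base_url
        layer["api_key"] = api_key
        # headers and extra_body are documented here for custom LLMs only.
        if headers:
            layer["headers"] = headers
        if extra_body:
            layer["extra_body"] = extra_body
    return layer


persona = {
    "persona_name": "Storyteller",
    "system_prompt": "You are a storyteller who entertains people of all ages.",
    "pipeline_mode": "full",
    "default_replica_id": "r665388ec672",
    "layers": {
        "llm": build_llm_layer(
            "gpt-3.5-turbo",
            base_url="https://api.openai.com/v1",
            api_key="your-api-key",
            extra_body={"temperature": 0.5},
        )
    },
}

# Serialize to the JSON body you would send when creating the persona.
payload = json.dumps(persona, indent=2)
```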

Perception

When using the raven-0 perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user’s video input.

{
    "role": "system",
    "content": "<user_appearance>...</user_appearance> <user_emotions>...</user_emotions> <user_screenshare>...</user_screenshare>"
}
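If your custom LLM server wants to inspect this visual context separately from the rest of the prompt, the tagged sections can be pulled out with a regular expression. This is a minimal sketch: the function name is hypothetical, and it assumes the three tags appear exactly as shown above, without nesting.

```python
import re

# Matches the three tags shown in the example message above.
TAG_PATTERN = re.compile(
    r"<(user_appearance|user_emotions|user_screenshare)>(.*?)</\1>",
    re.DOTALL,
)


def parse_visual_context(content: str) -> dict:
    """Map each tag found in the system message content to its inner text."""
    return {tag: body.strip() for tag, body in TAG_PATTERN.findall(content)}
```

Tags missing from a given message are simply absent from the returned dict.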

Basic Perception Model

If you use the Basic perception model, your LLM will receive the following user messages (instead of a system message):

{
    "role": "user",
    "content": "USER_SPEECH: ... VISUAL_SCENE: ..."
}
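A custom LLM server may want to separate the spoken text from the scene description in these messages. The sketch below splits on the two labels shown above. The helper name is hypothetical, and it assumes the content always begins with USER_SPEECH: and contains a single VISUAL_SCENE: label; if the user's speech itself contains that label, the split falls at the first occurrence.

```python
from typing import Optional, Tuple


def split_basic_perception(content: str) -> Optional[Tuple[str, str]]:
    """Return (speech, scene), or None if the content is not in the expected shape."""
    speech_key, scene_key = "USER_SPEECH:", "VISUAL_SCENE:"
    if not content.startswith(speech_key) or scene_key not in content:
        return None
    # Split the remainder at the first VISUAL_SCENE: label.
    speech, _, scene = content[len(speech_key):].partition(scene_key)
    return speech.strip(), scene.strip()
```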

Disabled Perception Model

If you disable the perception model, your LLM will not receive any special messages.