The LLM Layer in Tavus enables your persona to generate intelligent, context-aware responses. You can use Tavus-hosted models or connect your own OpenAI-compatible LLM.

Tavus-Hosted Models

1. model

Select one of the available models. tavus-gpt-oss is recommended as a good starting point; the table below helps you choose based on your priorities.
| Model | Speed | Intelligence | Naturalness | Best For |
| --- | --- | --- | --- | --- |
| tavus-gpt-oss | ⚡⚡⚡ | 🧠 | 💬 | Snappy, low-latency |
| tavus-gpt-4.1 (deprecated) | ⚡⚡ | 🧠🧠🧠 | 💬💬💬 | Long-context reasoning |
| tavus-gpt-4o (deprecated) | ⚡⚡ | 🧠🧠 | 💬💬 | Legacy option |
| tavus-gemini-2.5-flash | ⚡⚡ | 🧠🧠 | 💬💬💬 | Latency + logical deduction |
| tavus-claude-haiku-4.5 | ⚡⚡ | 🧠🧠 | 💬💬 | Grounded, fewer hallucinations |
| tavus-gpt-5.2 | ⚡⚡ | 🧠🧠 | 💬💬 | General use, latency less critical |
| tavus-gpt-4o-mini (deprecated) | ⚡⚡ | 🧠 | 💬💬 | Legacy option |
| tavus-gemini-3-flash | ⚡ | 🧠🧠🧠 | 💬💬💬 | Highest intelligence, lower speed |
Context Window Limit
  • Performance and intelligence are best when prompts are limited to 5,000 tokens. Expect degraded speed and instruction following in the 15,000–20,000 token range.
  • All Tavus-hosted models support up to 32,000 tokens; staying within 5,000 is recommended for optimal behavior.
Tip: 1 token β‰ˆ 4 characters, so 5,000 tokens β‰ˆ 20,000 characters (including spaces and punctuation).
"model": "tavus-gpt-oss"
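The 1-token-≈-4-characters rule of thumb above translates into a quick pre-flight check on your prompt size. A minimal sketch (a real tokenizer will give slightly different counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return len(text) // 4

system_prompt = "You provide wellness tips and encouragement." * 200
if estimate_tokens(system_prompt) > 5_000:
    print("Prompt likely exceeds the recommended 5,000-token budget")
```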

2. tools

Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.
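As an illustration, a tools array typically follows the OpenAI function-calling schema (an assumption here; see LLM Tool Calling for the authoritative format). The get_wellness_tip function below is hypothetical:

```json
"tools": [
  {
    "type": "function",
    "function": {
      "name": "get_wellness_tip",
      "description": "Fetch a wellness tip for a given topic.",
      "parameters": {
        "type": "object",
        "properties": {
          "topic": { "type": "string", "description": "Topic, e.g. sleep or nutrition" }
        },
        "required": ["topic"]
      }
    }
  }
]
```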

3. speculative_inference

When set to true, the LLM begins processing speech transcriptions before the user finishes speaking, improving responsiveness. This field is optional and defaults to true; set it to false to disable.
"speculative_inference": true

4. extra_body

Add parameters to customize the LLM request. For Tavus-hosted models, you can pass temperature and top_p:
"extra_body": {
  "temperature": 0.7,
  "top_p": 0.9
}
This field is optional.

Example Configuration

{
  "persona_name": "Health Coach",
  "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "tavus-gpt-oss",
      "speculative_inference": true,
      "extra_body": {
        "temperature": 0.7,
        "top_p": 0.9
      }
    }
  }
}
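To send this configuration to the Create Persona API, you can build the request in a few lines. The endpoint URL and x-api-key header below are assumptions based on common Tavus usage; confirm both against the Create Persona API reference:

```python
import json
import urllib.request

# Assumed endpoint and auth header; verify against the Create Persona API docs.
TAVUS_PERSONAS_URL = "https://tavusapi.com/v2/personas"

def build_create_persona_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Create Persona request for the example above."""
    payload = {
        "persona_name": "Health Coach",
        "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
        "pipeline_mode": "full",
        "default_replica_id": "rf4e9d9790f0",
        "layers": {
            "llm": {
                "model": "tavus-gpt-oss",
                "speculative_inference": True,
                "extra_body": {"temperature": 0.7, "top_p": 0.9},
            }
        },
    }
    return urllib.request.Request(
        TAVUS_PERSONAS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# To actually create the persona:
# with urllib.request.urlopen(build_create_persona_request("your-api-key")) as resp:
#     print(resp.read())
```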

Custom LLMs

Prerequisites

To use your own OpenAI-compatible LLM, you’ll need:
  • Model name
  • Base URL
  • API key
Ensure your LLM:
  • Is streamable (i.e., supports server-sent events)
  • Exposes the /chat/completions endpoint
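The streaming requirement means your endpoint should emit OpenAI-style SSE chunks: `data: {...}` lines carrying chat.completion.chunk objects, terminated by `data: [DONE]`. A minimal sketch of consuming such a stream (the chunk shape follows the OpenAI streaming format):

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from OpenAI-style chat.completion.chunk SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))  # -> Hello
```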

1. model

Name of the custom model you want to use.
"model": "gpt-3.5-turbo"

2. base_url

Base URL of your LLM endpoint.
Do not include route suffixes such as /chat/completions in the base_url.
"base_url": "https://your-llm.com/api/v1"

3. api_key

API key to authenticate with your LLM provider.
"api_key": "your-api-key"
base_url and api_key are required only when using a custom model.

4. tools

Optionally enable tool calling by defining functions the LLM can invoke.
Please see LLM Tool Calling for more details.

5. speculative_inference

When set to true, the LLM begins processing speech transcriptions before the user finishes speaking, improving responsiveness. This field is optional and defaults to true; set it to false to disable.
"speculative_inference": true

6. headers

Optional headers for authenticating with your LLM.
"headers": {
  "Authorization": "Bearer your-api-key"
}
This field is optional; whether it is needed depends on your LLM provider.

7. extra_body

Add parameters to customize the LLM request. You can pass any parameters that your LLM provider supports:
"extra_body": {
  "temperature": 0.5,
  "top_p": 0.9,
  "frequency_penalty": 0.5
}
This field is optional.

8. default_query

Add default query parameters that get appended to the base URL when making requests to the /chat/completions endpoint.
"default_query": {
  "api-version": "2024-02-15-preview"
}
This field is optional. Useful for LLM providers that require query parameters for authentication or versioning.
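Tavus composes the final request URL from base_url, the /chat/completions route, and default_query. The sketch below is purely illustrative of that composition (the actual joining happens server-side):

```python
from urllib.parse import urlencode

def compose_url(base_url: str, default_query: dict) -> str:
    # Illustrative only: shows how base_url + route + default_query combine.
    return f"{base_url.rstrip('/')}/chat/completions?{urlencode(default_query)}"

url = compose_url(
    "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o",
    {"api-version": "2024-02-15-preview"},
)
print(url)
# -> https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview
```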

Example Configuration

{
  "persona_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "pipeline_mode": "full",
  "default_replica_id": "rf4e9d9790f0",
  "layers": {
    "llm": {
      "model": "gpt-4o",
      "base_url": "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o",
      "api_key": "your-api-key",
      "speculative_inference": true,
      "default_query": {
        "api-version": "2024-02-15-preview"
      }
    }
  }
}
Refer to the Create Persona API for a full list of supported fields.

Perception

When using the raven-1 perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user’s video input.
{
    "role": "system",
    "content": "<user_appearance>...</user_appearance> <user_emotions>...</user_emotions> <user_screenshare>...</user_screenshare>"
}
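If your LLM-side pipeline wants to inspect this visual context separately, the tagged sections can be pulled out with a small parser. A sketch using the tag names from the example above:

```python
import re

def extract_visual_context(content: str) -> dict:
    """Extract tagged visual-context sections from a perception system message."""
    tags = ("user_appearance", "user_emotions", "user_screenshare")
    out = {}
    for tag in tags:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", content, re.DOTALL)
        if m:
            out[tag] = m.group(1).strip()
    return out

msg = "<user_appearance>wearing glasses</user_appearance> <user_emotions>smiling</user_emotions>"
print(extract_visual_context(msg))
```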

Disabled Perception model

If you disable the perception model, your LLM will not receive any special messages.