Large Language Model (LLM)

The LLM layer in Tavus enables your PAL to generate intelligent, context-aware responses. You can use Tavus-hosted models or connect your own OpenAI-compatible LLM. Configure the LLM under layers.llm when you create or update a PAL (see Create PAL). For how a PAL fits together, see PAL overview.

Tavus-Hosted Models

1. `model`

Select one of the available models. tavus-glm-4.7 is recommended as the default; the table below helps you choose based on your priorities.

| Model | Speed | Intelligence | Naturalness | Best For |
| --- | --- | --- | --- | --- |
| `tavus-glm-4.7` | ⚡⚡ | 🧠🧠🧠 | 💬💬 | Default for most PALs; agentic tool use, multi-step reasoning |
| `tavus-gpt-oss` | ⚡⚡⚡ | 🧠 | 💬 | Snappy, low-latency fallback |
| `tavus-gpt-4.1` (deprecated) | ⚡⚡ | 🧠🧠🧠 | 💬💬💬 | Long-context reasoning |
| `tavus-gemini-2.5-flash` | ⚡⚡ | 🧠🧠 | 💬💬💬 | Latency + logical deduction |
| `tavus-claude-haiku-4.5` | ⚡⚡ | 🧠🧠 | 💬💬 | Grounded, fewer hallucinations |
| `tavus-gpt-5.2` | ⚡⚡ | 🧠🧠 | 💬💬 | General use, latency less critical |
| `tavus-gemini-3-flash` | ⚡ | 🧠🧠🧠 | 💬💬💬 | Highest intelligence, lower speed |

Context Window Limit

Performance and intelligence are best when prompts stay at or below 5,000 tokens. Speed and instruction following may degrade in the 15,000–20,000 token range.
Context limits vary by model (for example, tavus-glm-4.7 supports up to 200,000 tokens), but staying within 5,000 tokens is recommended for optimal behavior.

Tip: 1 token ≈ 4 characters, so 5,000 tokens ≈ 20,000 characters (including spaces and punctuation).
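
The rule of thumb in the tip above can be turned into a quick pre-flight check. This is a minimal sketch: the helper names are hypothetical, and the 4-characters-per-token ratio is only an approximation, not the tokenizer any particular model uses.

```python
CHARS_PER_TOKEN = 4             # rule of thumb from the tip, not an exact tokenizer
RECOMMENDED_MAX_TOKENS = 5_000  # recommended prompt budget from this section


def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per four characters, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def within_recommended_budget(text: str, limit: int = RECOMMENDED_MAX_TOKENS) -> bool:
    """Return True if the estimated token count fits the recommended budget."""
    return estimate_tokens(text) <= limit


prompt = "You provide wellness tips and encouragement for people pursuing a healthy lifestyle."
print(estimate_tokens(prompt), within_recommended_budget(prompt))
```

For an exact count you would need the tokenizer of the model you select; the estimate is only meant to flag prompts that are clearly too long.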

"model": "tavus-glm-4.7"

2. `tools`

Legacy field. Do not use for new integrations. Inline LLM tools were historically defined here as OpenAI-style function objects on the PAL body. That approach still runs for existing PALs but is deprecated, and it cannot use the tools registry features (delivery, on_call, on_resolve, API auth, or reuse across PALs).

Use the Tools overview and Tool Calling for LLM instead. If you maintain a PAL that still sets layers.llm.tools, see Legacy inline tool calling.

3. `speculative_inference`

When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness. This is the default value; you can set it to false to disable.

"speculative_inference": true

This field is optional.

4. `extra_body`

Add parameters to customize the LLM request. For Tavus-hosted models, you can pass temperature and top_p:

"extra_body": {
  "temperature": 0.7,
  "top_p": 0.9
}

This field is optional.
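
Since this section names only temperature and top_p for Tavus-hosted models, it can help to check an extra_body against that list before sending it. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and the document does not say whether other keys are rejected or silently ignored.

```python
# Keys this section documents for Tavus-hosted models.
HOSTED_EXTRA_BODY_KEYS = {"temperature", "top_p"}


def split_hosted_extra_body(extra_body: dict) -> tuple:
    """Split extra_body into (documented keys, names of undocumented keys)."""
    kept = {k: v for k, v in extra_body.items() if k in HOSTED_EXTRA_BODY_KEYS}
    dropped = sorted(set(extra_body) - HOSTED_EXTRA_BODY_KEYS)
    return kept, dropped


kept, dropped = split_hosted_extra_body(
    {"temperature": 0.7, "top_p": 0.9, "frequency_penalty": 0.5}
)
print(kept)     # {'temperature': 0.7, 'top_p': 0.9}
print(dropped)  # ['frequency_penalty']
```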

Example Configuration

{
  "pal_name": "Health Coach",
  "system_prompt": "You provide wellness tips and encouragement for people pursuing a healthy lifestyle.",
  "pipeline_mode": "full",
  "default_face_id": "r90bbd427f71",
  "layers": {
    "llm": {
      "model": "tavus-glm-4.7",
      "speculative_inference": true,
      "extra_body": {
        "temperature": 0.7,
        "top_p": 0.9
      }
    }
  }
}

Custom LLMs

Prerequisites

To use your own OpenAI-compatible LLM, you’ll need:

Model name
Base URL
API key

Ensure your LLM:

Is streamable (i.e., via SSE)
Uses the /chat/completions endpoint
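
To illustrate what "streamable via SSE" typically means for an OpenAI-compatible /chat/completions endpoint, here is a minimal sketch that reads streamed lines and joins the text deltas. It assumes the common OpenAI streaming chunk shape (`data: {...}` lines ending with `data: [DONE]`); the function name and sample lines are hypothetical, and this is not a description of how Tavus consumes the stream.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample of what a streaming response body might contain.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello"
```

If your endpoint returns output in this form when called with streaming enabled, it likely satisfies the streaming prerequisite.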

1. `model`

Name of the custom model you want to use.

"model": "gpt-3.5-turbo"

2. `base_url`

Base URL of your LLM endpoint.

Do not include route extensions such as /chat/completions in the base_url.

"base_url": "https://your-llm.com/api/v1"

3. `api_key`

API key to authenticate with your LLM provider.

"api_key": "your-api-key"

base_url and api_key are required only when using a custom model.
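
The rule above can be checked before you submit a PAL. This is a rough sketch with a hypothetical helper; it assumes Tavus-hosted models can be recognized by the `tavus-` prefix seen in the model table, which the document does not state as a guarantee.

```python
from typing import List

HOSTED_PREFIX = "tavus-"  # assumption: hosted model names all start with this prefix


def missing_custom_fields(llm: dict) -> List[str]:
    """Return the required custom-LLM fields that are absent from layers.llm."""
    if llm.get("model", "").startswith(HOSTED_PREFIX):
        return []  # hosted model: base_url and api_key are not required
    return [key for key in ("base_url", "api_key") if not llm.get(key)]


print(missing_custom_fields({"model": "tavus-glm-4.7"}))
print(missing_custom_fields({"model": "gpt-4o", "base_url": "https://your-llm.com/api/v1"}))
```

The first call returns an empty list, and the second reports that api_key is still missing.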

4. `tools`

Legacy field. Do not use for new integrations. It has the same inline shape as layers.llm.tools on Tavus-hosted models and is deprecated in favor of the tools registry. See Legacy inline tool calling if you still patch tools on a custom-LLM PAL this way.

5. `speculative_inference`

When set to true, the LLM begins processing speech transcriptions before user input ends, improving responsiveness. This is the default value; you can set it to false to disable.

"speculative_inference": true

This field is optional.

6. `headers`

Optional additional headers to include when making requests to your LLM. Use this for any extra headers your provider requires beyond the API key (which should be set via the api_key field).

"headers": {
  "X-Organization-ID": "your-org-id",
  "X-Request-Source": "tavus-cvi"
}

This field is optional, depending on your LLM provider’s requirements.

7. `extra_body`

Add parameters to customize the LLM request. You can pass any parameters that your LLM provider supports:

"extra_body": {
  "temperature": 0.5,
  "top_p": 0.9,
  "frequency_penalty": 0.5
}

This field is optional.

8. `default_query`

Add default query parameters that get appended to the base URL when making requests to the /chat/completions endpoint.

"default_query": {
  "api-version": "2024-02-15-preview"
}

This field is optional. Useful for LLM providers that require query parameters for authentication or versioning.
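
To make the interaction between base_url and default_query concrete, the sketch below composes the URL that a request would plausibly target. It is an illustration under assumptions: the helper names are hypothetical, and the document does not describe how Tavus builds the URL internally beyond appending the query parameters for /chat/completions requests.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

ROUTE = "/chat/completions"


def normalize_base_url(base_url: str) -> str:
    """Strip trailing slashes and a mistakenly included route extension."""
    url = base_url.strip().rstrip("/")
    if url.endswith(ROUTE):
        url = url[: -len(ROUTE)]
    return url


def build_request_url(base_url: str, default_query: Optional[Mapping[str, str]] = None) -> str:
    """Join the base URL, the chat completions route, and any default query parameters."""
    url = normalize_base_url(base_url) + ROUTE
    if default_query:
        url += "?" + urlencode(default_query)
    return url


print(build_request_url(
    "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o",
    {"api-version": "2024-02-15-preview"},
))
# https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview
```

The normalization step also shows why the route should stay out of base_url: including it would otherwise duplicate the path.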

Example Configuration

{
  "pal_name": "Storyteller",
  "system_prompt": "You are a storyteller who entertains people of all ages.",
  "pipeline_mode": "full",
  "default_face_id": "r90bbd427f71",
  "layers": {
    "llm": {
      "model": "gpt-4o",
      "base_url": "https://your-azure-openai.openai.azure.com/openai/deployments/gpt-4o",
      "api_key": "your-api-key",
      "speculative_inference": true,
      "default_query": {
        "api-version": "2024-02-15-preview"
      }
    }
  }
}

Refer to Create PAL for a full list of supported fields.

Perception

When using the raven-1 perception model with a custom LLM, your LLM will receive system messages containing visual context extracted from the user’s video input. See Perception for how perception is configured and what is sent to the model.

{
    "role": "system",
    "content": "<user_appearance>...</user_appearance> <user_emotions>...</user_emotions> <user_screenshare>...</user_screenshare>"
}
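
If your custom LLM sits behind your own server, you may want to pull the visual context out of these system messages. The following is a minimal sketch: the function name and sample values are hypothetical, and it assumes the tags appear as simple, non-nested pairs like those in the message shown above.

```python
import re

# Matches <user_xxx>...</user_xxx> pairs such as user_appearance or user_emotions.
TAG_PATTERN = re.compile(r"<(user_\w+)>(.*?)</\1>", re.DOTALL)


def parse_perception_message(message: dict) -> dict:
    """Return a {tag: text} mapping for a perception system message, else {}."""
    if message.get("role") != "system":
        return {}
    content = message.get("content", "")
    return {tag: body.strip() for tag, body in TAG_PATTERN.findall(content)}


# Hypothetical content for illustration only.
example = {
    "role": "system",
    "content": "<user_appearance>wearing glasses</user_appearance> "
               "<user_emotions>calm</user_emotions>",
}
print(parse_perception_message(example))
# {'user_appearance': 'wearing glasses', 'user_emotions': 'calm'}
```

System messages without these tags, such as the regular system prompt, yield an empty mapping.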

Disabled Perception model

If you disable the perception model, your LLM will not receive any special messages.
