
# Perception

> Learn how to configure the perception layer with Raven to enable real-time visual and audio understanding.

The **Perception Layer** in Tavus enhances an AI agent with real-time visual and audio understanding.
By using [Raven](/sections/models#raven%3A-perception-model), the AI agent becomes more context-aware, responsive, and capable of triggering actions based on visual and audio input.

## Configuring the Perception Layer

To configure the Perception Layer, define the following parameters within the `layers.perception` object:

### 1. `perception_model`

Specifies the perception model to use.

* **Options**:
  * `raven-1` **(default and recommended)**: Real-time emotional understanding from user audio, more natural and human-like interactions, plus all visual capabilities from raven-0.
  * `raven-0` (legacy): see the [migration guide](/sections/troubleshooting#migration-from-legacy-perception-to-raven-1) for legacy settings.
  * `off`: Disables the perception layer.

<Note>
  **Screen Share Feature**: When using Raven, screen share is enabled by default without additional configuration.
</Note>

### Audio Perception

Raven-1 (the default) analyzes user tone and emotion in real time. This context is automatically sent to the LLM alongside each utterance, enabling more natural, empathetic responses. For example:

```text  theme={null}
<user_audio_analysis>The user sounded sarcastic when they said this</user_audio_analysis>
Wow, I love Mondays.
```

Audio analysis tags are stripped from transcription callbacks.

<Note>
  Audio analysis output is limited to 32 tokens per utterance.
</Note>

## Perception Analysis Queries

Raven supports three kinds of queries that differ by **when** they run and **how** they affect the call:

* **perception\_analysis\_queries** — Evaluated only at **end of call**. They do not change live behavior; they only shape the summary you get in the [Perception Analysis](/sections/event-schemas/conversation-perception-analysis) event sent to your [conversation callback](/sections/webhooks-and-callbacks#conversation-callbacks).
* **visual\_awareness\_queries** and **audio\_awareness\_queries** — Evaluated **throughout the call**. Their answers are passed to the LLM as context, so the replica can react in real time. You receive this ongoing analysis in each user turn via the [Utterance event](/sections/event-schemas/conversation-utterance) as `user_visual_analysis` and `user_audio_analysis`.

Use **visual\_awareness\_queries** and **audio\_awareness\_queries** when you want the replica to be aware of or focus on something specific during the conversation. Use **perception\_analysis\_queries** when you want your end-of-call summary to address specific points.
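A callback consumer can read the live analysis directly off each user turn. The Python sketch below assumes the field names `user_visual_analysis` and `user_audio_analysis` from above; nesting them under `properties` mirrors other callback payloads and is an assumption, as is the `live_analysis` helper name.

```python
def live_analysis(utterance_event: dict) -> dict:
    """Pull the per-turn perception fields from an utterance event payload.

    Field names come from the Utterance event; the surrounding payload
    shape (a `properties` object) is assumed here for illustration.
    """
    props = utterance_event.get("properties", {})
    return {
        "visual": props.get("user_visual_analysis"),
        "audio": props.get("user_audio_analysis"),
    }
```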

## Visual Perception Configuration

### 2. `visual_awareness_queries`

An array of custom queries that Raven continuously monitors in the visual stream.

```json  theme={null}
"visual_awareness_queries": [
  "Is the user wearing a bright outfit?"
]
```

<AccordionGroup>
  <Accordion title="visual_awareness_queries examples">
    Queries that Raven evaluates **continuously during the call** (on the order of every second). The answers are fed into the rolling visual context for the LLM, so the replica can respond to what it "sees." This same context also supports the end-of-call summary. You can read the ongoing visual analysis for each user utterance in the [Utterance event](/sections/event-schemas/conversation-utterance) as **user\_visual\_analysis**.

    **When to use:** When you want the replica to pay attention to something visual in real time (e.g. expression, clothing, objects on screen).

    **Example:**

    ```json  theme={null}
    "visual_awareness_queries": [
      "What is the main expression on the user's face?",
      "Is the user wearing a jacket?",
      "Does the user appear distressed or uncomfortable?"
    ]
    ```
  </Accordion>
</AccordionGroup>

### 3. `perception_analysis_queries`

An array of custom queries that Raven processes at the end of the call to generate a visual analysis summary of the conversation.

<AccordionGroup>
  <Accordion title="perception_analysis_queries examples">
    Queries that are answered **once, at the end of the call**, by looking at what was observed over the whole conversation. They do not affect the call itself—only the content of the end-of-call summary. (Currently the summary is visual only; naming is kept general for future support.)

    **When to use:** When you want the post-call report to answer specific questions (e.g. "Did the user ever have two people on screen?", "How often was the user looking at the screen?").

    **Example:**

    ```json  theme={null}
    "perception_analysis_queries": [
      "On a scale of 1-100, how often was the user looking at the screen?",
      "Is there any indication that more than one person is present?"
    ]
    ```

    The answers are delivered in a [Perception Analysis](/sections/event-schemas/conversation-perception-analysis) event. Example payload:

    ```json  theme={null}
    {
      "properties": {
        "analysis": "**User's Gaze Toward Screen:** The participant looked at the screen approximately 75% of the time.\n\n**Multiple People Present:** No indication of additional participants was detected during the call."
      },
      "conversation_id": "<conversation_id>",
      "event_type": "application.perception_analysis",
      "timestamp": "2025-07-11T09:13:35.361736Z"
    }
    ```
  </Accordion>
</AccordionGroup>

<Note>
  You do not need to set `visual_awareness_queries` in order to use `perception_analysis_queries`.
</Note>

```json  theme={null}
"perception_analysis_queries": [
  "Is the user wearing multiple bright colors?",
  "Is there any indication that more than one person is present?",
  "On a scale of 1-100, how often was the user looking at the screen?"
]
```

<Tip>
  Best practices for `visual_awareness_queries` and `perception_analysis_queries`:

  * Use simple, focused prompts.
  * Use queries that support your persona's purpose.
</Tip>

<Warning>
  All Raven API parameters (queries, prompts, tool definitions, etc.) have a **10,000 character limit** per entry. Entries exceeding this limit will cause an exception.
</Warning>

### 4. `visual_tool_prompt`

Tells Raven when and how to trigger tools based on what it sees.

```json  theme={null}
"visual_tool_prompt":
  "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected."
```

### 5. `visual_tools`

Defines callable functions that Raven can trigger upon detecting specific visual conditions. Each tool must include a `type` and a `function` object detailing its schema.

```json  theme={null}
"visual_tools": [
  {
    "type": "function",
    "function": {
      "name": "notify_if_bright_outfit_shown",
      "description": "Use this function when a bright outfit is detected in the image with high confidence",
      "parameters": {
        "type": "object",
        "properties": {
          "outfit_color": {
            "type": "string",
            "description": "Best guess on what color of outfit it is"
          }
        },
        "required": ["outfit_color"]
      }
    }
  }
]
```

<Note>
  Please see [Tool Calling](/sections/conversational-video-interface/persona/perception-tool) for more details.
</Note>

## Audio Perception Configuration (Raven-1)

The following fields are available when using `raven-1` and enable custom audio-based perception capabilities.

### 6. `audio_awareness_queries`

An array of custom queries that Raven-1 continuously monitors in the audio stream. Use these to track specific audio patterns or user states.

<Note>
  Audio analysis output is limited to 32 tokens per query response.
</Note>

```json  theme={null}
"audio_awareness_queries": [
  "Does the user sound frustrated or confused?",
  "Is the user speaking quickly as if in a hurry?"
]
```

<AccordionGroup>
  <Accordion title="audio_awareness_queries examples">
    Queries that Raven-1 evaluates **continuously during the call** on the audio stream. The answers are passed to the LLM as context so the replica can respond to tone and delivery. You can read the ongoing audio analysis for each user utterance in the [Utterance event](/sections/event-schemas/conversation-utterance) as **user\_audio\_analysis**. (There is no separate end-of-call summary for audio.)

    **When to use:** When you want the replica to react to how the user sounds (e.g. frustrated, confused, in a hurry).

    **Example:**

    ```json  theme={null}
    "audio_awareness_queries": [
      "Does the user sound frustrated or confused?",
      "Is the user speaking quickly as if in a hurry?"
    ]
    ```
  </Accordion>
</AccordionGroup>

### 7. `audio_tool_prompt`

Tells Raven-1 when and how to trigger tools based on what it hears (beyond the automatic emotion analysis).

```json  theme={null}
"audio_tool_prompt":
  "You have a tool to escalate to a human agent when the user sounds very frustrated, named `escalate_to_human`. Use this tool when detecting sustained frustration."
```

### 8. `audio_tools`

Defines callable functions that Raven-1 can trigger based on audio analysis. Each tool must include a `type` and a `function` object detailing its schema.

```json  theme={null}
"audio_tools": [
  {
    "type": "function",
    "function": {
      "name": "escalate_to_human",
      "description": "Escalate the conversation to a human agent when user frustration is detected",
      "parameters": {
        "type": "object",
        "properties": {
          "reason": {
            "type": "string",
            "description": "The reason for escalation"
          }
        },
        "required": ["reason"]
      }
    }
  }
]
```

## Example Configurations

<AccordionGroup>
  <Accordion title="Visual Perception Example">
    This example demonstrates a persona that monitors for visual cues (bright outfits) and triggers a tool when detected.

    ```json  theme={null}
    {
      "persona_name": "Fashion Advisor",
      "system_prompt": "As a Fashion Advisor, you specialize in offering tailored fashion advice.",
      "pipeline_mode": "full",
      "default_replica_id": "rf4e9d9790f0",
      "layers": {
        "perception": {
          "perception_model": "raven-1",
          "visual_awareness_queries": [
            "Is the user wearing a bright outfit?"
          ],
          "perception_analysis_queries": [
            "Is the user wearing multiple bright colors?",
            "On a scale of 1-100, how often was the user looking at the screen?"
          ],
          "visual_tool_prompt": "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected.",
          "visual_tools": [
            {
              "type": "function",
              "function": {
                "name": "notify_if_bright_outfit_shown",
                "description": "Use this function when a bright outfit is detected in the image with high confidence",
                "parameters": {
                  "type": "object",
                  "properties": {
                    "outfit_color": {
                      "type": "string",
                      "description": "Best guess on what color of outfit it is"
                    }
                  },
                  "required": ["outfit_color"]
                }
              }
            }
          ]
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="Audio Perception Example">
    This example demonstrates a persona that monitors user tone and escalates to a human agent when sustained frustration is detected.

    ```json  theme={null}
    {
      "persona_name": "Support Agent",
      "system_prompt": "You are a helpful customer support agent.",
      "pipeline_mode": "full",
      "default_replica_id": "rf4e9d9790f0",
      "layers": {
        "perception": {
          "perception_model": "raven-1",
          "audio_awareness_queries": [
            "Does the user sound frustrated or confused?",
            "Is the user speaking quickly as if in a hurry?"
          ],
          "audio_tool_prompt": "You have a tool to escalate to a human agent when the user sounds very frustrated, named `escalate_to_human`. Use this tool when detecting sustained frustration.",
          "audio_tools": [
            {
              "type": "function",
              "function": {
                "name": "escalate_to_human",
                "description": "Escalate the conversation to a human agent when user frustration is detected",
                "parameters": {
                  "type": "object",
                  "properties": {
                    "reason": {
                      "type": "string",
                      "description": "The reason for escalation"
                    }
                  },
                  "required": ["reason"]
                }
              }
            }
          ]
        }
      }
    }
    ```
  </Accordion>
</AccordionGroup>

<Note>
  Please see the <a href="/api-reference/personas/create-persona" target="_blank">Create a Persona</a> endpoint for more details.
</Note>
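Putting it together, a persona with a perception layer can be created with a single POST request. The sketch below is a minimal example using only the Python standard library; the `https://tavusapi.com/v2/personas` endpoint and `x-api-key` header are taken from the Create a Persona API reference (verify against the current reference before use), and the API key value is a placeholder.

```python
import json
import urllib.request

API_KEY = "your-api-key"  # placeholder; use your real Tavus API key

# Persona payload mirroring the visual perception example above.
payload = {
    "persona_name": "Fashion Advisor",
    "system_prompt": "As a Fashion Advisor, you specialize in offering tailored fashion advice.",
    "pipeline_mode": "full",
    "layers": {
        "perception": {
            "perception_model": "raven-1",
            "visual_awareness_queries": ["Is the user wearing a bright outfit?"],
        }
    },
}

request = urllib.request.Request(
    "https://tavusapi.com/v2/personas",
    data=json.dumps(payload).encode(),
    headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    method="POST",
)
# Uncomment to send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```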

