The Perception Layer in Tavus enhances AI agent with real-time visual understanding. By using Raven, the AI agent becomes more context-aware, responsive, and capable of triggering actions based on visual input.

Configuring the Perception Layer

To configure the Perception Layer, define the following parameters within the layers.perception object:

1. perception_model

Specifies the perception model to use.

  • Options:
    • raven-0 (default and recommended): Advanced visual capabilities, including screen share support, ambient queries, and perception tools.
    • basic: Legacy model with limited features.
    • off: Disables the perception layer.

Screen Share Feature:When using raven-0, screen share feature is enabled by default without additional configuration.

"layers": {
  "perception": {
    "perception_model": "raven-0"
  }
}

2. ambient_awareness_queries

An array of custom queries that raven-0 continuously monitors in the visual stream.

Best practices for custom queries:

  • Use simple, focused prompts.
  • Use queries that support your persona’s purpose.
"ambient_awareness_queries": [
  "Is the user wearing a bright outfit?"
]

3. perception_tool_prompt

Tell raven-0 when and how to trigger tools based on what it sees.

"perception_tool_prompt":
  "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected."

4. perception_tools

Defines callable functions that raven-0 can trigger upon detecting specific visual conditions. Each tool must include a type and a function object detailing its schema.

"perception_tools": [
  {
    "type": "function",
    "function": {
      "name": "notify_if_bright_outfit_shown",
      "description": "Use this function when a bright outfit is detected in the image with high confidence",
      "parameters": {
        "type": "object",
        "properties": {
          "outfit_color": {
            "type": "string",
            "description": "Best guess on what color of outfit it is"
          }
        },
        "required": ["outfit_color"]
      }
    }
  }
]

Please see Tool/Function Calling for more details.

End-of-call Perception Analysis

raven-0 generates a visual summary at the end of a call. This summary includes all detected visual artifacts and can be sent as:

This feature is exclusive to personas with raven-0 specified in the Perception Layer.

Example Use Case

This example demonstrates a persona designed to identify when a user wears a bright outfit and triggers an internal action accordingly.

{
  "persona_name": "Fashion Advisor",
  "system_prompt": "As a Fashion Advisor, you specialize in offering tailored fashion advice.",
  "pipeline_mode": "full",
  "context": "You're having a video conversation with a client about their outfit.",
  "default_replica_id": "r79e1c033f",
  "layers": {
    "perception": {
      "perception_model": "raven-0",
      "ambient_awareness_queries": [
        "Is the user wearing a bright outfit?"
      ],
      "perception_tool_prompt": "You have a tool to notify the system when a bright outfit is detected, named `notify_if_bright_outfit_shown`. You MUST use this tool when a bright outfit is detected.",
      "perception_tools": [
        {
          "type": "function",
          "function": {
            "name": "notify_if_bright_outfit_shown",
            "description": "Use this function when a bright outfit is detected in the image with high confidence",
            "parameters": {
              "type": "object",
              "properties": {
                "outfit_color": {
                  "type": "string",
                  "description": "Best guess on what color of outfit it is"
                }
              },
              "required": ["outfit_color"]
            }
          }
        }
      ]
    }
  }
}

Please see the Create a Persona endpoint for more details.