# Perception Tool Call Event

> This is an event broadcast by Tavus.

A perception tool call event is broadcast when Raven triggers a perception tool based on **visual** or **audio** input. The event always has `eventType` set to `conversation.perception_tool_call`, and its `data.properties` object contains a `modality` (`"vision"` or `"audio"`), the tool `name`, and the tool `arguments`.

**Modality-specific payload:**
- **`modality: "audio"`** — Triggered by audio tools (`audio_tool_prompt` / `audio_tools`). `arguments` is a JSON **string** (e.g. `"{\"reason\":\"The user said …\"}"`). There is no `frames` array.
- **`modality: "vision"`** — Triggered by visual tools (`visual_tool_prompt` / `visual_tools`). `arguments` is an **object** with tool-defined fields. Includes a `frames` array of objects with `data` (base64-encoded JPEG) and `mime_type` (e.g. `"image/jpeg"`) for the images that triggered the call.
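
Because `arguments` is a string for audio and an object for vision, a consumer may want to normalize both shapes before acting on them. The following is a minimal sketch, assuming the event has already been parsed into a Python dictionary; the function name `normalize_perception_tool_call` is hypothetical and not part of any Tavus SDK.

```python
import json
from typing import Any


def normalize_perception_tool_call(event: dict[str, Any]) -> dict[str, Any]:
    """Return name, modality, arguments (always a dict) and frames (always a list).

    Hypothetical helper: it only relies on the fields described above.
    """
    props = event["data"]["properties"]
    arguments = props["arguments"]
    # Audio events carry `arguments` as a JSON string; vision events carry an object.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    return {
        "name": props["name"],
        "modality": props["modality"],
        "arguments": arguments,
        # Audio events have no `frames` array, so fall back to an empty list.
        "frames": props.get("frames", []),
    }
```

Checking the runtime type of `arguments` rather than branching on `modality` keeps the helper tolerant of either shape.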

Perception tool calls can be used to trigger automated actions in response to visual or audio cues detected by the Raven perception system.
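
One way to wire such automated actions is a small registry that maps tool names to handler functions. This is only a sketch under stated assumptions: the event is already available as a parsed dictionary, and `on_tool`, `dispatch`, and `HANDLERS` are hypothetical names introduced for illustration, not Tavus APIs.

```python
import json
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]

# Hypothetical registry: tool name -> function to run when that tool fires.
HANDLERS: dict[str, Handler] = {}


def on_tool(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one perception tool name."""
    def register(fn: Handler) -> Handler:
        HANDLERS[name] = fn
        return fn
    return register


def dispatch(event: dict[str, Any]) -> bool:
    """Run the registered handler for a perception tool call event.

    Returns True if a handler ran, False if the event was ignored.
    """
    if event.get("eventType") != "conversation.perception_tool_call":
        return False
    props = event["data"]["properties"]
    handler = HANDLERS.get(props["name"])
    if handler is None:
        return False
    arguments = props["arguments"]
    # Audio events deliver `arguments` as a JSON string, so decode it first.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    handler(arguments)
    return True
```

A handler would then be registered with, for example, `@on_tool("notify_hat_detected")` above a function that accepts the arguments dictionary.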


For more on configuring perception tool calls, see [Tool Calling for Perception](/sections/conversational-video-interface/persona/perception-tool) and [Perception](/sections/conversational-video-interface/persona/perception).

## Example: audio tool call

When an **audio** tool is triggered (e.g. sarcasm detection), the event looks like:

```json
{
  "timestamp": "2026-03-02T21:51:47.194Z",
  "eventType": "conversation.perception_tool_call",
  "data": {
    "conversation_id": "c58b46f8646d943f",
    "event_type": "conversation.perception_tool_call",
    "message_type": "conversation",
    "properties": {
      "arguments": "{\"reason\":\"The user said \\\"well, yeah\\\"\"}",
      "modality": "audio",
      "name": "notify_sarcasm_detected"
    }
  }
}
```
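
Since `arguments` in the audio event is itself a JSON string, reading its fields takes two decoding steps: one for the event and one for `arguments`. The sketch below illustrates this with a trimmed copy of the example above; `RAW_EVENT` is a hypothetical variable standing in for the raw message text your application receives.

```python
import json

# Trimmed copy of the audio example, kept as raw text so the escaping is preserved.
RAW_EVENT = r'''
{
  "eventType": "conversation.perception_tool_call",
  "data": {
    "properties": {
      "arguments": "{\"reason\":\"The user said \\\"well, yeah\\\"\"}",
      "modality": "audio",
      "name": "notify_sarcasm_detected"
    }
  }
}
'''

event = json.loads(RAW_EVENT)                # first decode: the event itself
props = event["data"]["properties"]
arguments = json.loads(props["arguments"])   # second decode: the arguments string

print(arguments["reason"])  # The user said "well, yeah"
```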

## Example: vision tool call

When a **visual** tool is triggered (e.g. hat detection), the event includes `frames` with base64-encoded images. The `data` values in the example are shortened for readability.

```json
{
  "timestamp": "2026-03-02T21:51:49.730Z",
  "eventType": "conversation.perception_tool_call",
  "data": {
    "conversation_id": "c58b46f8646d943f",
    "event_type": "conversation.perception_tool_call",
    "message_type": "conversation",
    "properties": {
      "arguments": {
        "hat_type": "baseball cap"
      },
      "frames": [
        { "data": "<base64-encoded-jpeg>", "mime_type": "image/jpeg" },
        { "data": "<base64-encoded-jpeg>", "mime_type": "image/jpeg" }
      ],
      "modality": "vision",
      "name": "notify_hat_detected"
    }
  }
}
```
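
To work with the images, each frame's `data` value can be base64-decoded into raw bytes. The following is a minimal sketch, assuming the event is already a parsed dictionary; `decode_frames` is a hypothetical helper name. Note that the `<base64-encoded-jpeg>` placeholders in the example above are not real image data, so this only applies to actual events.

```python
import base64
from typing import Any


def decode_frames(event: dict[str, Any]) -> list[tuple[str, bytes]]:
    """Return (mime_type, image_bytes) pairs for every frame in the event.

    Hypothetical helper: audio events have no `frames`, so they yield an empty list.
    """
    props = event["data"]["properties"]
    decoded: list[tuple[str, bytes]] = []
    for frame in props.get("frames", []):
        image_bytes = base64.b64decode(frame["data"])
        decoded.append((frame["mime_type"], image_bytes))
    return decoded
```

The returned bytes can then be written to disk or passed to an image library of your choice.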

