# Tool Calling for Perception

> Define vision and audio tools that fire when Raven sees or hears something during a conversation.

**Perception tool calling** lets the PAL trigger functions based on **visual** or **audio** cues that the perception model (Raven) detects during a conversation, in parallel with the main LLM turn. Tools are reusable objects: create them once, then attach them to any number of PALs. Perception tools are defined like LLM tools; what marks a tool as a perception tool is its `origin`.

<Note>
  This page documents the **tools registry** (`/v2/tools` with `origin: "vision"` or `"audio"`). If your PAL still embeds tools under `layers.perception.visual_tools` or `layers.perception.audio_tools`, see [Legacy inline tool calling](/sections/troubleshooting#legacy-inline-tool-calling).
</Note>

<Note>
  Perception tool calling is only available with **Raven** (`perception_model: "raven-1"` on the PAL's `perception` layer).
</Note>

## How Perception Tools Work

Perception runs as a **parallel step** alongside the conversational LLM. Raven analyzes the audio and video streams continuously and fires a tool as soon as it detects a cue that matches one of the tool descriptions you defined.

There are two flavors, picked via the tool's `origin`:

* **Vision tools** (`origin: "vision"`) - triggered by what Raven **sees** in the video stream (e.g. an ID card, a bright outfit, a hat).
* **Audio tools** (`origin: "audio"`) - triggered by what Raven **hears** in the audio stream (e.g. sarcasm, sustained frustration).

Because perception runs in parallel, **the PAL keeps speaking and listening normally** while a perception tool dispatches. Perception tools are **fire-and-forget**: on the conversational side, the PAL does not pause, play filler speech, or react to the result.
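The fire-and-forget behaviour described above can be pictured with a small `asyncio` sketch. This is a conceptual model only, not how Tavus implements Raven: the function names, the simulated delay, and the three-turn loop are all assumptions made for illustration.

```python
import asyncio


async def dispatch_tool(name, log):
    """Stand-in for delivering a perception tool call (hypothetical)."""
    await asyncio.sleep(0.05)  # simulated delivery latency
    log.append(f"tool:{name}")


async def conversation(log):
    """Run three conversational turns while a tool dispatches in the background."""
    pending = set()
    for turn in range(3):
        if turn == 0:
            # Fire the tool without awaiting it: the turn loop is not blocked.
            task = asyncio.create_task(dispatch_tool("notify_if_id_shown", log))
            pending.add(task)  # keep a reference so the task is not garbage-collected
        log.append(f"turn:{turn}")
        await asyncio.sleep(0)  # yield control, as a real turn would while doing I/O
    # Only so the demo does not exit early; the turns above never waited on the tool.
    await asyncio.gather(*pending)
    return log


if __name__ == "__main__":
    print(asyncio.run(conversation([])))
```

The point of the sketch is that the conversational turns are never awaited against the tool's result, which mirrors the "does not pause" behaviour above.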

## Defining a Perception Tool

The `name`, `description`, `parameters`, and `delivery` fields work the same way they do for LLM tools - see [Tool Calling for LLM](/sections/conversational-video-interface/pal/llm-tool#tool-object) for the full reference.

| Field         | Type   | Required | Description                                                                            |
| ------------- | ------ | -------- | -------------------------------------------------------------------------------------- |
| `name`        | string | ✅        | Unique identifier, scoped to your account. Must match `^[a-zA-Z_][a-zA-Z0-9_]{0,63}$`. |
| `description` | string | ✅        | What Raven should look or listen for. Be specific - this is what triggers the tool.    |
| `parameters`  | object | ❌        | JSON Schema for the arguments Raven extracts when the cue is detected.                 |
| `origin`      | string | ✅        | `"vision"` or `"audio"`.                                                               |
| `delivery`    | object | ❌        | Defaults to `{"app_message": true}`. Delivery to your own API endpoint is also supported, using the same shape as for LLM tools. |

<Note>
  You do **not** need to set `on_call`, `on_resolve`, or `static_filler` on a perception tool. Omit them and the API applies the only allowed values (`null`, `"fire_and_forget"`, `null` respectively). Passing any other value returns a 400.
</Note>
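As a minimal sketch of the field rules above, the following Python helper assembles a request body for `POST /v2/tools` and checks `name` and `origin` locally before anything is sent. The helper name `build_perception_tool` and the local validation are illustrative assumptions, not part of the Tavus API; the server remains the authority on what is accepted.

```python
import re

# Pattern copied from the `name` row of the table above.
NAME_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
ALLOWED_ORIGINS = {"vision", "audio"}


def build_perception_tool(name, description, origin, parameters=None, delivery=None):
    """Return a request body for a perception tool (hypothetical helper)."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid tool name: {name!r}")
    if origin not in ALLOWED_ORIGINS:
        raise ValueError(f"origin must be 'vision' or 'audio', got {origin!r}")
    if not description.strip():
        raise ValueError("description is required")

    body = {"name": name, "description": description, "origin": origin}
    # Optional fields are only included when provided.
    if parameters is not None:
        body["parameters"] = parameters
    if delivery is not None:
        body["delivery"] = delivery
    # on_call, on_resolve and static_filler are deliberately left out,
    # so the API applies the only allowed values for perception tools.
    return body
```

Leaving `delivery` unset relies on the documented default of `{"app_message": true}`.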

## Vision Tool Example

```bash Create a vision tool [expandable]
curl --request POST \
  --url https://tavusapi.com/v2/tools \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
    "name": "notify_if_id_shown",
    "description": "Trigger when a driver'\''s license or passport is clearly visible in the video stream with high confidence.",
    "parameters": {
      "type": "object",
      "properties": {
        "id_type": {
          "type": "string",
          "description": "Best guess on what type of ID it is"
        }
      },
      "required": ["id_type"]
    },
    "origin": "vision"
  }'
```

When Raven detects an ID in frame, your application receives a [`conversation.perception_tool_call`](/sections/event-schemas/conversation-perception-tool-call) event with `modality: "vision"`, the `name`, structured `arguments`, and a `frames` array of base64-encoded images that triggered the call.
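A minimal sketch of consuming such an event in Python is shown here. The exact envelope is defined in the linked event schema; this example assumes you pass in the object that directly holds `modality`, `name`, `arguments`, and `frames`, and that each frame is a plain base64 string with no data-URI prefix. The function name `handle_vision_call` is hypothetical.

```python
import base64


def handle_vision_call(event):
    """Decode the frames of a vision perception tool call (hypothetical helper).

    Returns None for events that are not vision tool calls.
    """
    if event.get("modality") != "vision":
        return None
    # Assumption: frames are plain base64 strings; validate=True rejects stray characters.
    frames = [base64.b64decode(frame, validate=True) for frame in event.get("frames", [])]
    return {
        "name": event["name"],
        "arguments": event.get("arguments", {}),
        "frames": frames,  # raw image bytes, ready to store or inspect
    }
```

For the tool defined above, `arguments` would carry the `id_type` value Raven extracted.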

## Audio Tool Example

```bash Create an audio tool [expandable]
curl --request POST \
  --url https://tavusapi.com/v2/tools \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
    "name": "notify_sarcasm_detected",
    "description": "Trigger when the user'\''s tone or phrasing suggests sarcasm.",
    "parameters": {
      "type": "object",
      "properties": {
        "reason": {
          "type": "string",
          "description": "Why you detected sarcasm (e.g. what the user said)"
        }
      },
      "required": ["reason"]
    },
    "origin": "audio"
  }'
```

When Raven hears the cue, your application receives a [`conversation.perception_tool_call`](/sections/event-schemas/conversation-perception-tool-call) event with `modality: "audio"` and the structured `arguments`.
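Since vision and audio calls share one event type, an application will usually route on `modality` and `name`. The following is one possible sketch of such a router. The registry, the decorator, and the handler are all hypothetical names, and the event is again assumed to be the object that directly holds `modality`, `name`, and `arguments`.

```python
# Maps (modality, tool name) to a handler function.
HANDLERS = {}


def on_perception_tool(modality, name):
    """Register a handler for one perception tool (hypothetical decorator)."""
    def register(func):
        HANDLERS[(modality, name)] = func
        return func
    return register


@on_perception_tool("audio", "notify_sarcasm_detected")
def sarcasm_detected(arguments):
    # `reason` is the parameter declared in the audio tool example above.
    return f"sarcasm flagged: {arguments.get('reason', 'no reason given')}"


def route(event):
    """Call the matching handler, or return None if no handler is registered."""
    handler = HANDLERS.get((event.get("modality"), event.get("name")))
    if handler is None:
        return None
    return handler(event.get("arguments") or {})
```

Keying on both fields keeps a vision tool and an audio tool from colliding in the router if they are ever handled by similarly named functions.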

## Attaching to a PAL

Perception tools are attached the same way as LLM tools:

```bash Attach perception tools
curl --request POST \
  --url https://tavusapi.com/v2/pals/{pal_id}/tools \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
    "tool_ids": ["tabc123def456"]
  }'
```

The same PAL can hold both LLM and perception tools. Make sure the PAL's `perception` layer has `perception_model: "raven-1"` for vision and audio tools to fire.
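For applications that attach tools from code rather than from the shell, the same request can be sketched with Python's standard library. The endpoint, headers, and body mirror the curl example above; the function name `build_attach_request` is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://tavusapi.com/v2"


def build_attach_request(pal_id, tool_ids, api_key):
    """Build (but do not send) the POST that attaches tools to a PAL."""
    body = json.dumps({"tool_ids": list(tool_ids)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/pals/{pal_id}/tools",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response shape is not covered on this page, so the sketch makes no assumptions about it.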

## Delivery

Perception tools use the same `delivery` field as LLM tools - see [Tool Delivery](/sections/conversational-video-interface/pal/llm-tool-delivery) and [Tool Authentication](/sections/conversational-video-interface/pal/llm-tool-auth). The one perception-specific difference is the name of the app-message event: `conversation.perception_tool_call` rather than `conversation.tool_call`.

<Note>
  Because perception tools are fire-and-forget, the response body your API returns is **not consumed** by the conversational LLM. A `2xx` is enough to acknowledge receipt; a non-2xx is logged but does not affect the conversation.
</Note>
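Because only the status code matters, a receiving endpoint can be very small. The sketch below uses Python's built-in `http.server` and assumes the call arrives as an HTTP `POST` with a JSON body, which this page does not spell out; see the Tool Delivery page for the actual request format. The names `acknowledge`, `PerceptionToolHandler`, and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def acknowledge(raw_body):
    """Parse a delivered body and pick the status code to return."""
    try:
        payload = json.loads(raw_body or b"{}")
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, None
    return 204, payload  # any 2xx acknowledges receipt; no response body is needed


class PerceptionToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length") or 0)
        status, payload = acknowledge(self.rfile.read(length))
        if payload is not None:
            print("perception tool call received:", payload)
        self.send_response(status)
        self.end_headers()


def serve(port=8000):
    """Run the endpoint locally until interrupted."""
    HTTPServer(("127.0.0.1", port), PerceptionToolHandler).serve_forever()
```

Any work triggered by the call should happen on your side; nothing returned here reaches the conversational LLM.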

<Note>
  Replace `<api-key>` with your actual API key. You can generate one in the <a href="https://maker.tavus.io/dev/api-keys" target="_blank">PAL Maker</a>.
</Note>
