# Create Persona

> Creates a persona that configures how a replica behaves and sounds in CVI for every conversation that uses that persona.

<Info>
  For AI agents, use `https://docs.tavus.io/openapi.yaml` for the full HTTP API contract.
</Info>
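
As a rough illustration of the request this endpoint expects, the sketch below assembles a JSON body and prepares, without sending, a `POST` to `https://tavusapi.com/v2/personas`. It enforces the one rule the schema states in prose: `system_prompt` is required unless `pipeline_mode` is `echo`. The helper names, the `YOUR_API_KEY` placeholder, and the `x-api-key` header name are assumptions made for illustration; confirm the actual security scheme against the OpenAPI contract.

```python
import json
import urllib.request

API_URL = "https://tavusapi.com/v2/personas"  # server URL + path from the spec


def build_persona_payload(system_prompt=None, pipeline_mode="full", **optional):
    """Assemble a Create Persona body (hypothetical helper, not part of any SDK)."""
    if pipeline_mode not in ("full", "echo"):
        raise ValueError("pipeline_mode must be 'full' or 'echo'")
    # The spec says system_prompt is required unless echo mode is used.
    if pipeline_mode != "echo" and not system_prompt:
        raise ValueError("system_prompt is required unless pipeline_mode is 'echo'")
    payload = {"pipeline_mode": pipeline_mode}
    if system_prompt:
        payload["system_prompt"] = system_prompt
    # Optional top-level fields such as persona_name, default_replica_id, layers.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def build_request(payload, api_key):
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header name for the apiKey scheme
        },
        method="POST",
    )


payload = build_persona_payload(
    system_prompt="As a Life Coach, you are a dedicated professional who specializes in...",
    persona_name="Life Coach",
    layers={
        "llm": {"model": "tavus-gpt-oss"},
        "perception": {"perception_model": "raven-1"},
    },
)
request = build_request(payload, api_key="YOUR_API_KEY")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.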


## OpenAPI

````yaml post /v2/personas
openapi: 3.0.3
info:
  title: Tavus Developer API Collection
  version: 1.0.0
  contact: {}
servers:
  - url: https://tavusapi.com
security:
  - apiKey: []
tags:
  - name: Videos
  - name: Replicas
  - name: Voices
  - name: Conversations
  - name: Personas
  - name: Pronunciation Dictionaries
  - name: Replacements
  - name: Transcriptions
  - name: Documents
paths:
  /v2/personas:
    post:
      tags:
        - Personas
      summary: Create Persona
      description: >-
        Creates a persona that configures how a replica behaves and sounds in
        CVI for every conversation that uses that persona.
      operationId: createPersona
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                persona_name:
                  type: string
                  description: A name for the persona.
                  example: Life Coach
                system_prompt:
                  type: string
                  description: >-
                    The system prompt used by the LLM. **Each request must
                    include a `system_prompt` value unless you're using echo
                    mode**.
                  example: >-
                    As a Life Coach, you are a dedicated professional who
                    specializes in...
                pipeline_mode:
                  type: string
                  description: >-
                    The pipeline mode to use for the persona. Possible values:
                    `full`, `echo`. `full` will provide the default end-to-end
                    experience. `echo` will turn off most steps, and allow the
                    replica to sync video with audio passed in through Echo
                    events, which it will speak out.
                  enum:
                    - full
                    - echo
                default_replica_id:
                  type: string
                  description: >-
                    The default `replica_id` associated with this persona, if
                    one exists. When creating a conversation, a `persona_id`
                    with an associated `default_replica_id` can be used to
                    create a conversation without specifying a `replica_id`.
                  example: r90bbd427f71
                document_ids:
                  type: array
                  description: >-
                    Array of document IDs that the persona will have access to.
                    These documents will be available to the persona in all
                    their conversations. The `document_ids` are returned in the
                    response of the [Get
                    Document](/api-reference/documents/get-document) and the
                    [Create Document](/api-reference/documents/create-document)
                    endpoints.
                  items:
                    type: string
                  example:
                    - d1234567890
                    - d2468101214
                document_tags:
                  type: array
                  description: >-
                    Array of document tags that the persona will have access to.
                    Documents matching these tags will be available to the
                    persona in all their conversations. The tags are passed in
                    the `document_tags` parameter of the [Create
                    Document](/api-reference/documents/create-document)
                    endpoint. As soon as at least one document has a tag, you
                    can pass that tag in this parameter.
                  items:
                    type: string
                  example:
                    - product_info
                    - company_policies
                objectives_id:
                  type: string
                  description: >-
                    The unique identifier of the objectives to attach to this
                    persona. Objectives provide goal-oriented instructions that
                    help guide conversations toward specific outcomes. Create
                    objectives using the [Create
                    Objectives](/api-reference/objectives/create-objectives)
                    endpoint.
                  example: o12345
                guardrail_ids:
                  type: array
                  maxItems: 50
                  description: >-
                    Array of guardrail IDs enforced during this persona's
                    conversations. Up to 50 per persona. Guardrail IDs are
                    returned by [Create
                    Guardrails](/api-reference/guardrails/create-guardrails) and
                    [Get Guardrails](/api-reference/guardrails/get-guardrails).
                  items:
                    type: string
                  example:
                    - g1234567890ab
                    - g0987654321cd
                guardrail_tags:
                  type: array
                  maxItems: 50
                  description: >-
                    Array of guardrail tags. Any guardrail you own with a
                    matching tag is attached to this persona dynamically. Up to
                    50 tags per persona, and a persona can have at most 50
                    guardrails total.
                  items:
                    type: string
                  example:
                    - compliance
                    - healthcare
                guardrails_id:
                  type: string
                  description: >-
                    **Legacy.** The unique identifier of a guardrail set to
                    attach to this persona. New integrations should use
                    `guardrail_ids` / `guardrail_tags` instead — see [Legacy
                    guardrail
                    sets](/api-reference/guardrails/legacy-guardrail-sets).
                  example: g12345
                layers:
                  type: object
                  description: >
                    Optional nested settings for each CVI pipeline layer
                    (perception, STT, conversational flow, LLM, TTS). For an
                    overview of what each layer controls, see [Persona overview
                    — CVI
                    layers](/sections/conversational-video-interface/persona/overview#cvi-layer).
                  properties:
                    perception:
                      type: object
                      properties:
                        perception_model:
                          type: string
                          description: >-
                            The perception model to use. `raven-1` (default and
                            recommended) provides real-time emotional
                            understanding from user audio, more natural and
                            human-like interactions, plus all visual
                            capabilities from raven-0. `raven-0` (legacy
                            settings
                            [here](/sections/troubleshooting#migration-from-legacy-perception-to-raven-1))
                            offers advanced visual perception only. `off`
                            disables all perception.
                          enum:
                            - raven-1
                            - raven-0
                            - 'off'
                          default: raven-1
                          example: raven-1
                        visual_awareness_queries:
                          type: array
                          description: >-
                            Custom queries that Raven continuously monitors in
                            the visual stream. These provide ambient visual
                            context without requiring explicit prompting.
                          items:
                            type: string
                          example:
                            - Is the user showing an ID card?
                            - Does the user appear distressed or uncomfortable?
                        visual_tool_prompt:
                          type: string
                          description: >-
                            A prompt that details how and when to use visual
                            tools based on what Raven sees. This helps the
                            replica understand the context of the visual tools.
                          example: >-
                            You have a tool to notify the system when an ID card
                            is detected, named `notify_if_id_shown`. You MUST
                            use this tool when a form of ID is detected.
                        visual_tools:
                          type: array
                          description: >-
                            Tools that can be triggered based on visual context,
                            enabling automated actions in response to visual
                            cues.
                          items:
                            type: object
                            properties:
                              name:
                                type: string
                                description: The name of the tool to be called.
                              description:
                                type: string
                                description: >-
                                  A description of what the tool does and when
                                  it should be called.
                          example:
                            - type: function
                              function:
                                name: notify_if_id_shown
                                description: >-
                                  Use this function when a driver's license or
                                  passport is detected in the image with high
                                  confidence. After collecting the ID,
                                  internally use final_ask()
                                parameters:
                                  type: object
                                  properties:
                                    id_type:
                                      type: string
                                      description: best guess on what type of ID it is
                                  required:
                                    - id_type
                        audio_awareness_queries:
                          type: array
                          description: >-
                            Custom queries that Raven-1 continuously monitors in
                            the audio stream. These provide ambient audio
                            context such as user tone and emotional state. Only
                            available with `raven-1`.
                          items:
                            type: string
                          example:
                            - Does the user sound frustrated or confused?
                            - Is the user speaking quickly as if in a hurry?
                        audio_tool_prompt:
                          type: string
                          description: >-
                            A prompt that details how and when to use audio
                            tools based on what Raven-1 hears. Only available
                            with `raven-1`.
                          example: >-
                            You have a tool to escalate to a human agent when
                            the user sounds very frustrated, named
                            `escalate_to_human`. Use this tool when detecting
                            sustained frustration.
                        audio_tools:
                          type: array
                          description: >-
                            Tools that can be triggered based on audio analysis,
                            enabling automated actions in response to user tone
                            and emotion. Only available with `raven-1`.
                          items:
                            type: object
                            properties:
                              name:
                                type: string
                                description: The name of the tool to be called.
                              description:
                                type: string
                                description: >-
                                  A description of what the tool does and when
                                  it should be called.
                          example:
                            - type: function
                              function:
                                name: escalate_to_human
                                description: >-
                                  Escalate the conversation to a human agent
                                  when user frustration is detected
                                parameters:
                                  type: object
                                  properties:
                                    reason:
                                      type: string
                                      description: The reason for escalation
                                  required:
                                    - reason
                    stt:
                      type: object
                      description: >
                        **Note**: Turn-taking is now configured on the
                        [Conversational Flow
                        layer](/sections/conversational-video-interface/persona/conversational-flow).
                      properties:
                        stt_engine:
                          type: string
                          description: >-
                            The STT engine used for transcription. `tavus-auto`
                            (default, recommended) automatically selects the
                            best model for the conversation's language.
                            `tavus-parakeet` offers the highest throughput and
                            lowest latency for English and European languages.
                            `tavus-soniox` is purpose-built for Indian languages
                            with broad multilingual coverage. `tavus-whisper`
                            provides broad multilingual coverage across all
                            supported languages. `tavus-deepgram-medical` is
                            domain-specific English STT optimized for clinical
                            and healthcare vocabulary. `tavus-advanced` is
                            deprecated and not recommended for new integrations.
                            See the [STT layer
                            documentation](/sections/conversational-video-interface/persona/stt)
                            for details.
                          enum:
                            - tavus-auto
                            - tavus-parakeet
                            - tavus-soniox
                            - tavus-whisper
                            - tavus-deepgram-medical
                            - tavus-advanced
                          default: tavus-auto
                          example: tavus-auto
                        hotwords:
                          type: string
                          description: >
                            The hotwords parameter lets you provide example
                            phrases that guide the STT model to prioritize
                            certain words or phrases—especially names, technical
                            terms, or uncommon language. For instance, including
                            "Roey is the name of the person you're speaking
                            with" helps the model transcribe "Roey" correctly
                            instead of "Rowie."
                          example: Roey is the name of the person you're speaking with.
                    conversational_flow:
                      type: object
                      description: >-
                        Controls conversational flow dynamics for the replica.
                        When not explicitly provided, all fields default to None
                        (turned off). If any parameter is provided, sensible
                        defaults are applied to all other parameters. See more
                        details
                        [here](/sections/conversational-video-interface/persona/conversational-flow).
                      properties:
                        turn_detection_model:
                          type: string
                          description: >-
                            The model used for turn detection. Options include
                            `sparrow-1` (recommended) for advanced turn
                            detection that is faster, more accurate, and more
                            natural, and `sparrow-0` (legacy) for standard turn
                            detection. Default is `sparrow-1` when any
                            conversational flow parameter is provided.
                          enum:
                            - sparrow-1
                            - sparrow-0
                          example: sparrow-1
                        turn_taking_patience:
                          type: string
                          description: >-
                            Controls how eagerly and quickly the replica claims
                            conversational turns. Affects both response latency
                            and likelihood of interrupting during natural
                            pauses. `low` = eager and quick to respond, may
                            interrupt pauses; `medium` (default) = balanced;
                            `high` = patient, waits for clear turn completion.
                          enum:
                            - low
                            - medium
                            - high
                          example: medium
                        replica_interruptibility:
                          type: string
                          description: >-
                            Controls how sensitive the replica is to user speech
                            while the replica is talking. Determines whether the
                            replica stops to listen or keeps speaking. `low` =
                            keeps talking, less interruptible; `medium`
                            (default) = balanced; `high` = stops easily, more
                            interruptible.
                          enum:
                            - low
                            - medium
                            - high
                          example: medium
                        voice_isolation:
                          type: string
                          description: >-
                            Controls the voice isolation model used on
                            participant audio. Voice isolation separates speech
                            from background noise in the participant's
                            microphone audio. `near` (default) = separates
                            speech from background noise for scenarios where the
                            user is less than 1 meter away from the microphone;
                            `off` = no voice isolation; raw audio is sent down
                            the conversational pipeline.
                          enum:
                            - 'off'
                            - near
                          default: near
                          example: near
                        wake_phrase:
                          type: string
                          description: >-
                            A specific phrase the persona listens for before
                            responding. When set, the persona remains silent
                            until it hears the wake phrase, similar to a voice
                            assistant. The persona still records all user
                            utterances in the transcript so it has full
                            conversation context when it does respond. Choose a
                            phrase that is unique enough to avoid
                            over-triggering (avoid generic greetings like
                            `Hey`). Default is `None` (disabled).
                          example: Hey Siri
                        idle_engagement:
                          type: string
                          description: >-
                            Controls whether the replica proactively re-engages
                            the user after a stretch of silence, and how
                            eagerly. `off` (default) = the replica never breaks
                            silence; `patient` = re-engages after longer
                            silences, suited to tutors or contemplative use
                            cases; `eager` = re-engages after shorter silences,
                            suited to SDR or sales-style use cases.
                          enum:
                            - 'off'
                            - patient
                            - eager
                          default: 'off'
                          example: 'off'
                    llm:
                      type: object
                      properties:
                        model:
                          type: string
                          description: >
                            The model name that will be used by the LLM.
                            **tavus-gpt-oss** is recommended as a good starting
                            point. Other Tavus-hosted options include
                            tavus-gemini-2.5-flash, tavus-claude-haiku-4.5,
                            tavus-gpt-5.2, and tavus-gemini-3-flash. See the
                            [LLM layer
                            documentation](/sections/conversational-video-interface/persona/llm)
                            for a full comparison.


                            For your own OpenAI-compatible LLM, provide a
                            `model`, `base_url`, and `api_key`.


                            **Context window:** Performance and intelligence are
                            best when prompts are limited to 5,000 tokens.
                            Degradations in speed and instruction following may
                            occur in the 15,000–20,000 token range. Tavus-hosted
                            models support up to 32,000 tokens. Tip: 1 token ≈ 4
                            characters.
                        base_url:
                          type: string
                          description: The base URL for your OpenAI-compatible endpoint.
                          example: your-base-url
                        api_key:
                          type: string
                          description: The API key for the OpenAI-compatible endpoint.
                          example: your-api-key
                        speculative_inference:
                          type: boolean
                          description: >-
                            When set to `true`, the LLM begins processing speech
                            transcriptions before user input ends, improving
                            responsiveness. Default is `true`.
                          example: true
                          default: true
                        tools:
                          type: array
                          description: >-
                            Optional tools to provide to your custom LLM. See
                            the [LLM tool
                            documentation](/sections/conversational-video-interface/persona/llm-tool)
                            for more details.
                          example:
                            - type: function
                              function:
                                name: get_current_weather
                                description: Get the current weather in a given location
                                parameters:
                                  type: object
                                  properties:
                                    location:
                                      type: string
                                      description: >-
                                        The city and state, e.g. San Francisco,
                                        CA
                                    unit:
                                      type: string
                                      enum:
                                        - celsius
                                        - fahrenheit
                                  required:
                                    - location
                        headers:
                          type: object
                          description: Optional headers to provide to your custom LLM
                          example:
                            Authorization: Bearer your-api-key
                        extra_body:
                          type: object
                          description: >
                            Optional parameters to customize the LLM request.


                            For Tavus-hosted models, you can pass `temperature`
                            and `top_p`:

                            - `temperature`: Controls randomness in the model's
                            output. Range typically 0.0 to 2.0. Lower values
                            make output more deterministic and focused, higher
                            values make it more creative and varied.

                            - `top_p`: Controls diversity via nucleus sampling.
                            Range 0.0 to 1.0. Lower values make output more
                            focused on high-probability tokens, higher values
                            allow more diverse token selection.


                            For custom LLMs, you can pass any parameters that
                            your LLM provider supports (e.g., `temperature`,
                            `top_p`, `frequency_penalty`, etc.).
                          example:
                            temperature: 0.7
                            top_p: 0.9
                    tts:
                      type: object
                      properties:
                        api_key:
                          type: string
                          description: >
                            The API key for the chosen TTS provider. Only
                            required when using private voices.


                            **ElevenLabs:** When using pronunciation
                            dictionaries with your own ElevenLabs key, the key
                            must have the `pronunciation_dictionaries_write`
                            scope (or full account access). See [ElevenLabs API
                            key
                            scopes](https://elevenlabs.io/docs/api-reference/service-accounts/api-keys/create).


                            **Cartesia:** No additional scope required — any
                            valid Cartesia API key works.
                          example: your-api-key
                        tts_engine:
                          type: string
                          description: The TTS engine that will be used.
                          enum:
                            - cartesia
                            - elevenlabs
                        external_voice_id:
                          type: string
                          description: >-
                            The voice ID used for the TTS engine when you want
                            to customize your replica's voice. Choose from
                            Cartesia's stock voices by referring to their [Voice
                            Catalog](https://docs.cartesia.ai/api-reference/voices/list),
                            or if you want more options you can consider
                            [ElevenLabs](https://elevenlabs.io/docs/api-reference/voices/get-all).
                          example: external-voice-id
                        voice_settings:
                          type: object
                          description: >
                            Optional voice settings to customize TTS behavior.
                            For Cartesia, we support inline [Cartesia SSML
                            settings](https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags).
                            For ElevenLabs, we support: `speed` (0.7–1.2),
                            `stability` (0.0–1.0), `similarity_boost` (0.0–1.0),
                            `style` (0.0–1.0), `use_speaker_boost` (boolean).
                            See [ElevenLabs Voice
                            Settings](https://elevenlabs.io/docs/api-reference/voices/settings/get).
                          example:
                            speed: 0.5
                            emotion:
                              - positivity:high
                              - curiosity
                        tts_emotion_control:
                          type: boolean
                          description: >-
                            When true, Tavus automatically handles LLM prompting
                            for emotion tags, enabling expressive vocal delivery
                            and natural emotional facial movements (only
                            available with Phoenix-4 replicas). Defaults to
                            true.
                          example: true
                          default: true
                        tts_model_name:
                          type: string
                          description: >-
                            The model name used by the TTS engine. Confirm
                            valid model names with your TTS provider.
                          example: sonic-3
                        pronunciation_dictionary_id:
                          type: string
                          description: >
                            The unique identifier of a Tavus pronunciation
                            dictionary to attach to this persona. Tavus will
                            apply the dictionary's rules at conversation time.


                            Provider-specific dictionary IDs are managed
                            internally by Tavus and are not exposed in GET
                            responses — only this field is visible.
                          example: pd_abc123def456
            examples:
              Required Parameters Only:
                value:
                  pipeline_mode: full
                  system_prompt: >-
                    As a Life Coach, you are a dedicated professional who
                    specializes in...
              Full Customizations:
                value:
                  persona_name: Life Coach
                  system_prompt: >-
                    As a Life Coach, you are a dedicated professional who
                    specializes in...
                  pipeline_mode: full
                  default_replica_id: r90bbd427f71
                  layers:
                    llm:
                      model: tavus-gpt-oss
                      speculative_inference: true
                      tools:
                        - type: function
                          function:
                            name: life_coach_insight
                            description: >-
                              Offer personalized life coaching advice or
                              guidance based on a user's challenge or goal.
                            parameters:
                              type: object
                              properties:
                                topic:
                                  type: string
                                  description: >-
                                    The area of life or goal the user wants to
                                    improve (e.g. career, relationships,
                                    confidence)
                                urgency_level:
                                  type: string
                                  enum:
                                    - low
                                    - medium
                                    - high
                              required:
                                - topic
                    tts:
                      tts_engine: cartesia
                      voice_settings:
                        speed: normal
                        emotion:
                          - positivity:high
                          - curiosity
                      tts_emotion_control: true
                      tts_model_name: sonic-3
                    perception:
                      perception_model: raven-1
                      visual_awareness_queries:
                        - Is the user showing an ID card?
                        - Does the user appear distressed or uncomfortable?
                      visual_tool_prompt: >-
                        You have a tool to notify the system when an ID card is
                        detected, named `notify_if_id_shown`. You MUST use this
                        tool when a form of ID is detected.
                      visual_tools:
                        - type: function
                          function:
                            name: notify_if_id_shown
                            description: >-
                              Use this function when a driver's license or
                              passport is detected in the image with high
                              confidence. After collecting the ID, internally
                              use final_ask()
                            parameters:
                              type: object
                              properties:
                                id_type:
                                  type: string
                                  description: Best guess at the type of ID shown
                              required:
                                - id_type
                      audio_awareness_queries:
                        - Does the user sound frustrated or confused?
                    stt:
                      stt_engine: tavus-auto
                    conversational_flow:
                      turn_detection_model: sparrow-1
                      turn_taking_patience: medium
                      turn_commitment: medium
                      replica_interruptibility: high
                      voice_isolation: near
                      idle_engagement: 'off'
                    document_ids:
                      - d1234567890
                      - d2468101214
                    document_tags:
                      - product_info
                      - company_policies
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties:
                  persona_id:
                    type: string
                    description: A unique identifier for the persona.
                    example: pcb7a34da5fe
                  persona_name:
                    type: string
                    description: The name of the persona.
                    example: Life Coach
                  created_at:
                    type: string
                    description: The date and time the persona was created.
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: The error message.
                    example: Invalid replica_uuid
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                type: object
                properties:
                  message:
                    type: string
                    description: The error message.
                    example: Invalid access token
      security:
        - apiKey: []
components:
  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: x-api-key

````
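
The spec above can be exercised with a short client. The following is a minimal sketch, not an official SDK: it uses only Python's standard library, takes the base URL, path, and `x-api-key` header from the spec, and sends the "Required Parameters Only" example body. The function names `build_create_persona_request` and `create_persona` and the `TAVUS_API_KEY` environment variable are hypothetical names chosen for illustration.

```python
import json
import urllib.request

# Base URL taken from the spec's `servers` entry.
TAVUS_BASE_URL = "https://tavusapi.com"


def build_create_persona_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/personas request.

    Hypothetical helper: the name is illustrative, not part of any Tavus SDK.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{TAVUS_BASE_URL}/v2/personas",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The spec's security scheme: an API key sent in the `x-api-key` header.
            "x-api-key": api_key,
        },
    )


def create_persona(api_key: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_create_persona_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for 4xx responses such as 400 and 401.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Mirrors the "Required Parameters Only" example in the spec.
minimal_payload = {
    "pipeline_mode": "full",
    "system_prompt": (
        "As a Life Coach, you are a dedicated professional who specializes in..."
    ),
}
```

Assuming the key is stored in an environment variable, a call would look like `create_persona(os.environ["TAVUS_API_KEY"], minimal_payload)["persona_id"]`. On a 400 or 401 response, `urllib` raises `urllib.error.HTTPError`; per the spec, the response body then carries an `error` field (400) or a `message` field (401).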