
# Replica Training

> Guide to recording a high-quality training video for generating Phoenix-4 replicas.

You can record the Replica training video directly in the [Developer Portal](https://platform.tavus.io/dev/replicas/create) or upload a pre-recorded one via the API.

<Note>
  The instructions below have been updated for **Phoenix-4**, the new default model.

  Here are the **KEY DIFFERENCES**:

  * <Icon icon="ear-listen" /> **Listening minute** must be fully neutral with **lips closed** the entire time
  * <Icon icon="user-vneck-hair-long" /> **Neck** and **jawline** must be **fully visible** with clear clothing separation and hair kept **off the face and neck**
  * <Icon icon="teeth" /> **Teeth** must be clearly **visible during speaking** with **strong articulation**
  * <Icon icon="frame" /> **Framing** must be stable, **waist-up**, seated, with **minimal movement**

  <Icon icon="lightbulb-exclamation" /> **Phoenix-4** is a **more precise model** and requires high-quality training footage to yield the best results, whereas Phoenix-3 is slightly more tolerant of imperfect footage. To train on Phoenix-3, set `model_name` to `phoenix-3`.
</Note>

## Talking Head Replica

To ensure the highest quality Phoenix-4 replica, your training video must follow the specifications outlined below.

### Environment

* Record in a quiet, well-lit space with no background noise or movement.
* Use diffuse lighting to avoid shadows on your face.
* Choose a simple background and avoid any moving people or objects.

### Camera

* Place the camera at eye level and ensure your face fills at least 25% of the frame.
* Use a desktop recording app (e.g., **QuickTime** on Mac or **Camera** on Windows) — avoid browser-based tools.
* **Minimum resolution**: 1080p. Anything lower may negatively impact replica quality.

### Microphone

* Use your device’s built-in microphone.
* **Avoid** high-end mics or wireless earbuds like AirPods.
* Turn off audio effects like noise suppression or EQ adjustments.

### Framing & Distance

Your framing should resemble a natural Zoom-style call.

**Positioning**

* Record from the waist up
* Be seated at a desk or table
* Position yourself at least 3 feet from the camera to avoid being too close to the lens

**Camera Setup**

* Camera should be stable (no handheld movement)
* Face centered in frame
* Head, shoulders, and upper chest clearly visible

### Yourself

<Frame>
    <img src="https://mintcdn.com/tavus/_cI_e0wGUkj7b2SY/images/replica-training/charlie.png?fit=max&auto=format&n=_cI_e0wGUkj7b2SY&q=85&s=5280478561a274ea807c3cc05d3b0afa" alt="" width="2087" height="1177" data-path="images/replica-training/charlie.png" />
</Frame>

| ✅ Do                                                                                  | ❌ Don’t                                                              |
| ------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| Keep your full head visible, with a clear view of your face                           | Wear clothes that blend into the background                          |
| Ensure your face and upper body are in sharp focus                                    | Wear accessories like necklaces, hats, glasses, scarves, or earrings |
| If using a smartphone, keep the same framing and distance from the camera             | Turn your head away from the camera                                  |
| Keep longer hair behind shoulders, and tuck in any loose strands in front of the face | Block your chin or mouth with your microphone                        |
| Sit upright in a stable, seated position                                              | Stand or shift positions during the video                            |

### Head & Clothing Separation

There must be a clear visual distinction between your head and clothing, and your neck must be fully visible.

<Frame>
    <img src="https://mintcdn.com/tavus/YTkRCg4w3gZlQwab/images/replica-training/head-clothing-separation.png?fit=max&auto=format&n=YTkRCg4w3gZlQwab&q=85&s=596e3a6e6a3ce6542593b047f61a4d32" alt="" width="955" height="459" data-path="images/replica-training/head-clothing-separation.png" />
</Frame>

* No overlap between neck and clothing
* Avoid high collars or obstructive clothing
* Ensure the jawline and neck are fully visible

### Hair Guidelines

* Avoid complex hairstyles
* No bangs covering the forehead
* Tuck or pin loose strands
* Longer hair must fall behind the shoulders
* Hair should not obscure the face, neck, or shoulders

### Video Format

If you're uploading a pre-recorded training video via our <a href="/api-reference/phoenix-replica-model/create-replica" target="_blank">API</a>, ensure it meets the following requirements:

* **Minimum FPS**: 25 fps
* **Accepted formats**:
  * `webm`
  * `mp4` with **H.264** video codec and **AAC** audio codec
* **Maximum file size**: 750MB
* **Minimum resolution**: 1080p (lower may negatively impact replica quality)
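
The requirements above can be sketched as a local pre-upload check. This is a minimal illustration, not part of the Tavus API: it assumes you have already read the file's metadata with a tool of your choice, and the `VideoInfo` fields and `check_training_video` helper are hypothetical names. It also assumes "1080p" means a short side of at least 1080 pixels and treats 750MB as 750,000,000 bytes, the stricter reading.

```python
from dataclasses import dataclass

MIN_FPS = 25
MIN_SHORT_SIDE_PX = 1080            # assumption: "1080p" = short side of 1080 px
MAX_SIZE_BYTES = 750 * 1000 * 1000  # assumption: decimal megabytes (stricter reading)


@dataclass
class VideoInfo:
    """Hypothetical container for metadata you extracted beforehand."""
    container: str    # "mp4" or "webm"
    video_codec: str  # e.g. "h264"
    audio_codec: str  # e.g. "aac"
    fps: float
    width: int
    height: int
    size_bytes: int


def check_training_video(info: VideoInfo) -> list[str]:
    """Return a list of problems; an empty list means the file passes this check."""
    problems = []
    container = info.container.lower()
    if container == "mp4":
        # The codec requirements are stated for mp4 only.
        if info.video_codec.lower() != "h264":
            problems.append("mp4 must use the H.264 video codec")
        if info.audio_codec.lower() != "aac":
            problems.append("mp4 must use the AAC audio codec")
    elif container != "webm":
        problems.append(f"unsupported format: {info.container}")
    if info.fps < MIN_FPS:
        problems.append(f"frame rate {info.fps} is below {MIN_FPS} fps")
    if min(info.width, info.height) < MIN_SHORT_SIDE_PX:
        problems.append("resolution is below 1080p and may reduce replica quality")
    if info.size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 750MB")
    return problems
```

Returning a list of messages rather than a single pass/fail flag lets you fix every issue in one re-export instead of discovering them one upload at a time.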

### Consent Statement

If you're creating a **personal replica**, you must include a verbal consent statement in the video. This ensures ethical use and compliance with data protection laws. Consent is not required for AI-generated training videos.

Say the following script clearly in your video:

> I, (your name), am currently speaking and give consent to Tavus to create an AI clone of me by using the audio and video samples I provide. I understand that this AI clone can be used to create videos that look and sound like me.

<Note>
  Consent is **only required for personal replicas**. If you're creating an **AI replica** or using AI-generated training video, you can skip this.
</Note>

## Recording Structure

Your video must be **one continuous shot**, containing **1 minute of speaking** followed by **1 minute of listening**. You can use a script provided by Tavus or speak on any topic of your choice.

<Tip>
  **Pro tips**:

  * Keep body and head movements subtle
  * Avoid heavy hand gestures
  * Only one person should appear in the video
</Tip>

<Steps>
  <Step title="Opening">
    * Begin with a big smile showing upper and lower teeth
    * Maintain direct eye contact with the camera for approximately 1 second
  </Step>

  <Step title="Speaking Segment (1 Minute)">
    * Speak on any topic — content does not matter
    * Open your mouth clearly when speaking
    * Enunciate well, ensuring all teeth are fully visible
    * Keep visible space between your top and bottom teeth
    * Keep head and body movement minimal
    * Avoid hand gestures
    * Avoid sudden head turns

    Sample script (optional):

    ```txt expandable theme={null}
    Once upon a time, people built a perfect park in the middle of a busy city. This park was big, bright, and full of playful paths. At sunrise, birds sang above the tall trees. Families carried baskets packed with bread, fruit, and juice.

    Children skipped and shouted, chasing balls and flying paper kites. In the afternoon, people played games. Some tapped paddles and bounced plastic balls. Others kicked soccer balls back and forth, laughing loudly with every point scored.

    As the day went on, friends gathered for friendly competition. Some threw footballs through the warm air, while others tossed frisbees across the open grass, cheering with every perfect catch. At sunset, the park grew quiet again. People packed up their bags and said goodbye. The golden sky made the grass glow, and soft breezes moved through the leaves.

    Today, parks are still places where people gather to play, to talk, and to breathe fresh air. From simple paths to shining playgrounds, parks bring peace, play, and plenty of happy moments. Places like that remain alive with voices, faces, and feelings, promising joy again tomorrow.
    ```

    <Frame>
            <img src="https://mintcdn.com/tavus/_cI_e0wGUkj7b2SY/images/replica-training/image1.png?fit=max&auto=format&n=_cI_e0wGUkj7b2SY&q=85&s=2d681b2bb21ab3ec4cc1c40df63431fa" alt="" width="1826" height="1067" data-path="images/replica-training/image1.png" />
    </Frame>
  </Step>

  <Step title="Listening Segment (1 Minute)">
    * Transition naturally into a listening posture
    * Keep lips neutral and closed throughout
    * Maintain a steady head position
    * Avoid exaggerated expressions
    * Do not lick lips or form unusual mouth shapes
    * An occasional closed-lip smile is recommended

    <Frame>
            <img src="https://mintcdn.com/tavus/YTkRCg4w3gZlQwab/images/replica-training/image3.gif?s=672f46be303689c4260444989495834b" alt="" width="480" height="270" data-path="images/replica-training/image3.gif" />
    </Frame>
  </Step>
</Steps>

<Note>
  Replica training typically takes **4–5 hours**. You can track the training progress by:

  * Providing a `callback_url` when creating the replica via API
  * Using the <a href="/api-reference/phoenix-replica-model/get-replica" target="_blank">**Get Replica Status**</a> API
  * Checking the <a href="https://platform.tavus.io/" target="_blank">Developer Portal</a>
</Note>
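
As a rough sketch of the first option, the request body for creating a replica could be assembled as shown here. `callback_url` and `model_name` are the parameters named in this guide. The `train_video_url` field name, the example URLs, and the `build_create_replica_body` helper are assumptions for illustration only, so confirm the exact schema, endpoint, and authentication in the Create Replica API reference before sending anything.

```python
import json
from typing import Optional


def build_create_replica_body(
    train_video_url: str,
    callback_url: Optional[str] = None,
    model_name: Optional[str] = None,
) -> dict:
    """Assemble a JSON-serializable request body (hypothetical helper)."""
    # Assumption: the training video is referenced by a field named "train_video_url".
    body = {"train_video_url": train_video_url}
    if callback_url:
        # Training status updates would be delivered to this URL.
        body["callback_url"] = callback_url
    if model_name:
        # Omit to use the default model; "phoenix-3" selects Phoenix-3.
        body["model_name"] = model_name
    return body


if __name__ == "__main__":
    body = build_create_replica_body(
        "https://example.com/training-video.mp4",      # placeholder URL
        callback_url="https://example.com/tavus-hook",  # placeholder URL
    )
    print(json.dumps(body, indent=2))
```

Leaving out optional keys instead of sending empty values keeps the request on the API's defaults, including the default model.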

## High-Quality Training Example

<Frame>
  <video src="https://cdn.zappy.app/4832196bc412238b186c67674cbc5618.mp4" controls={true} width="600" height="350" />
</Frame>

## Full Body Replica

To create a full body replica for conversational video, follow these guidelines:

<Frame>
  <img src="https://cdn.zappy.app/5dcc67296ac7948ce99f1385ff05909e.png" className="mx-auto" style={{ width:"50%" }} />
</Frame>

### Framing & Orientation

* The subject must be captured **from head to toe**, with no extra space above or below.
* Record in **vertical format** (portrait mode) or crop appropriately to maintain vertical framing.

### Posture & Movement

* Remain **standing still** throughout the recording.
* **Avoid hand gestures** or exaggerated body movements to maintain consistency and model quality.

### Resolution & Quality

* **4K resolution** is recommended for best results.
* Ensure consistent lighting, with no shadows or sudden changes in exposure.
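
The orientation and resolution guidance for full body recordings can be checked with a few lines of code. This is only a sketch: the `full_body_video_notes` helper is a hypothetical name, and it assumes "4K" means UHD, that is, 2160x3840 pixels in portrait orientation.

```python
UHD_SHORT_SIDE_PX = 2160  # assumption: "4K" = UHD, 3840x2160 (2160x3840 in portrait)


def full_body_video_notes(width: int, height: int) -> list[str]:
    """Return advisory notes for a full body recording; empty means nothing to flag."""
    notes = []
    if height <= width:
        notes.append("record in portrait (vertical) orientation or crop to vertical")
    if min(width, height) < UHD_SHORT_SIDE_PX:
        notes.append("4K resolution is recommended for best results")
    return notes
```

Framing, posture, and lighting still need a visual review, since pixel dimensions alone say nothing about them.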

