Lipsync

The Lipsync service synchronizes a speaker’s mouth movements in an existing video with an audio track you provide. Use it to:

  • Create videos where the speaker’s mouth movements match the provided audio
  • Generate personalized videos with custom audio tracks
  • Enable precise audio-video synchronization for professional results

How It Works

  1. Submit a Lipsync Request

    • Provide the video URL via the original_video_url parameter (must be publicly accessible)
    • Include the audio URL via the source_audio_url parameter (must be publicly accessible)
  2. Processing

    • We analyze the video and audio content
    • We synchronize the speaker’s mouth movements with the provided audio
    • We generate a new video with the synchronized audio
  3. Completion

    • Access your lipsync video through our API
    • Download the final video using the provided video_url
    • Receive a webhook notification when processing is complete (if callback_url was provided)

Some Features Include

  • High Accuracy: Advanced AI for precise mouth movement synchronization
  • Async Processing: Webhook notifications keep you updated on progress
  • Simple Integration: RESTful API makes implementation straightforward

Example Request

{
  "original_video_url": "https://example.com/video.mp4",
  "source_audio_url": "https://example.com/audio.mp3",
  "callback_url": "https://your-callback-url.com",
  "lipsync_name": "My Lipsync Video"
}
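A minimal Python sketch of how this request body could be assembled and sent. The helper names `build_lipsync_payload` and `submit_lipsync` are hypothetical, and the endpoint URL and the `x-api-key` header name are assumptions that this page does not state; take the real values from the API Reference. Treating `lipsync_name` as optional is also an assumption.

```python
import json
import urllib.request
from typing import Optional


def build_lipsync_payload(original_video_url: str,
                          source_audio_url: str,
                          callback_url: Optional[str] = None,
                          lipsync_name: Optional[str] = None) -> dict:
    """Assemble the request body using the parameter names shown above."""
    payload = {
        "original_video_url": original_video_url,
        "source_audio_url": source_audio_url,
    }
    # Optional fields are included only when the caller provides them.
    if callback_url is not None:
        payload["callback_url"] = callback_url
    if lipsync_name is not None:
        payload["lipsync_name"] = lipsync_name
    return payload


def submit_lipsync(endpoint: str, api_key: str, payload: dict) -> dict:
    """POST the payload as JSON and return the decoded JSON response.

    Assumption: `endpoint` and the "x-api-key" header are placeholders,
    not values confirmed by this page.
    """
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes the body easy to inspect or log before anything is sent.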

Example Response

{
  "lipsync_id": "l0108f2d24k2a",
  "status": "started",
  "callback_url": "https://your-callback-url.com",
  "lipsync_name": "My Lipsync Video"
}
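The response can be read into a small typed record so that the `lipsync_id` is easy to keep for later status checks. This is a sketch only: `LipsyncJob` and `parse_lipsync_response` are hypothetical names, and the field list is taken from the example response above rather than from a full schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LipsyncJob:
    """Fields taken from the example response; a real response may carry more."""
    lipsync_id: str
    status: str
    callback_url: Optional[str] = None
    lipsync_name: Optional[str] = None


def parse_lipsync_response(body: str) -> LipsyncJob:
    """Decode a JSON response body into a LipsyncJob."""
    data = json.loads(body)
    return LipsyncJob(
        lipsync_id=data["lipsync_id"],   # required: identifies the job
        status=data["status"],           # required: e.g. "started"
        callback_url=data.get("callback_url"),
        lipsync_name=data.get("lipsync_name"),
    )
```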

Getting Started

  1. Ensure your video and audio meet these requirements:

    • Clear video quality with visible mouth movements
    • High-quality audio
    • Publicly accessible URLs (e.g., S3 presigned URLs)
  2. Make your first lipsync request using our API Reference.

  3. Monitor the status through webhooks or by checking the lipsync status

  4. Once complete, download your synchronized video using the provided video_url
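If you check the status yourself instead of relying on webhooks, a polling loop along these lines is one option. It is a sketch under stated assumptions: `wait_for_lipsync` is a hypothetical helper, `fetch_status` is a caller-supplied function that returns the current job as a dict, and the terminal status names must be passed in because this page documents only the "started" status.

```python
import time
from typing import Callable, Iterable


def wait_for_lipsync(fetch_status: Callable[[], dict],
                     terminal_statuses: Iterable[str],
                     interval_seconds: float = 10.0,
                     max_attempts: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until the job reports a terminal status.

    Assumption: the job dict has a "status" key, as in the example response.
    The terminal status names are supplied by the caller, not by this page.
    """
    terminal = set(terminal_statuses)
    for attempt in range(max_attempts):
        job = fetch_status()
        if job.get("status") in terminal:
            return job
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("lipsync did not reach a terminal status in time")
```

The returned dict is whatever `fetch_status` produced on the final call, so the `video_url` can be read from it once the job is complete. A bounded number of attempts keeps a stuck job from blocking the caller indefinitely.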

Tips and Restrictions

Lipsync currently has the following requirements:

  • The source video and audio must be publicly accessible
  • Source video must be in .mp4 format
  • Source audio must be in .mp3 or .wav format
  • Source video and audio must be 5 minutes or less in duration
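These restrictions can be checked on the client before a request is submitted. The sketch below is illustrative: `check_lipsync_inputs` is a hypothetical helper, durations are supplied by the caller (measuring them requires a media tool that is out of scope here), and the file extension is only a heuristic for the actual format. It does not verify that the URLs are publicly accessible.

```python
import posixpath
from urllib.parse import urlparse

MAX_DURATION_SECONDS = 5 * 60          # 5 minutes or less
VIDEO_EXTENSIONS = {".mp4"}
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def _extension(url: str) -> str:
    """Return the lowercased file extension of the URL path, ignoring any query string."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def check_lipsync_inputs(video_url: str, audio_url: str,
                         video_seconds: float, audio_seconds: float) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if _extension(video_url) not in VIDEO_EXTENSIONS:
        problems.append("source video must be in .mp4 format")
    if _extension(audio_url) not in AUDIO_EXTENSIONS:
        problems.append("source audio must be in .mp3 or .wav format")
    if video_seconds > MAX_DURATION_SECONDS:
        problems.append("source video must be 5 minutes or less")
    if audio_seconds > MAX_DURATION_SECONDS:
        problems.append("source audio must be 5 minutes or less")
    return problems
```

Parsing the URL first means a presigned URL with a long query string is still matched on its path, for example `clip.mp4?X-Amz-Signature=...`.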

To get the best results, we recommend the following:

  • The source video should clearly show the speaker’s mouth.
  • Use non-cartoon characters.
  • The speaker should face the camera so that their face remains visible throughout the entire video (“talking head” style).
  • Ensure good lighting conditions in the original video.
  • The audio should be clear and well-recorded.
  • The audio should be free of background noise.
  • Use single-speaker audio (avoid overlapping voices).
  • The speech should be natural; avoid singing or whispering.

Support

Need help or have questions? Our developer support team is here to assist you.

This documentation will help you effectively integrate Tavus’s lipsync service into your applications. Let’s start creating synchronized videos!