Overview
Synchronize audio with existing videos using Tavus’s lipsync service. Easily create videos where the speaker’s mouth movements match the provided audio.
Lipsync
The Lipsync service allows you to synchronize audio with existing videos. This service is specifically designed to:
- Create videos where the speaker’s mouth movements match the provided audio
- Generate personalized videos with custom audio tracks
- Enable precise audio-video synchronization for professional results
How It Works
- Submit a Lipsync Request
  - Provide the video URL via the original_video_url parameter (must be publicly accessible)
  - Include the audio URL via the source_audio_url parameter (must be publicly accessible)
- Processing
  - We analyze the video and audio content
  - We synchronize the speaker’s mouth movements with the provided audio
  - We generate a new video with the synchronized audio
- Completion
  - Access your lipsync video through our API
  - Download the final video using the provided video_url
  - Receive a webhook notification when processing is complete (if callback_url was provided)
Key Features
- High Accuracy: Advanced AI for precise mouth movement synchronization
- Async Processing: Webhook notifications keep you updated on progress
- Simple Integration: RESTful API makes implementation straightforward
Example Request
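As a rough illustration, the request body can be assembled from the three parameters named on this page (original_video_url, source_audio_url, and the optional callback_url). The helper name build_lipsync_request is hypothetical, and the endpoint URL and authentication header are deliberately left out because this page does not state them; take those from the API Reference.

```python
import json


def build_lipsync_request(original_video_url, source_audio_url, callback_url=None):
    """Build the JSON body for a lipsync request (hypothetical helper).

    Only the parameter names come from this page; the checks below are a
    basic sanity test, not the service's own validation.
    """
    for name, url in (
        ("original_video_url", original_video_url),
        ("source_audio_url", source_audio_url),
    ):
        # A publicly accessible URL is assumed to be an http(s) URL.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be a publicly accessible http(s) URL")

    body = {
        "original_video_url": original_video_url,
        "source_audio_url": source_audio_url,
    }
    # callback_url is optional; include it only when a webhook is wanted.
    if callback_url is not None:
        body["callback_url"] = callback_url
    return body


# Placeholder URLs for illustration only.
example_body = build_lipsync_request(
    "https://example.com/source.mp4",
    "https://example.com/voiceover.mp3",
    callback_url="https://example.com/webhooks/lipsync",
)
print(json.dumps(example_body, indent=2))
```

The resulting dictionary can be sent as the JSON body of the request with whichever HTTP client you already use.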
Example Response
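The exact response schema is not shown on this page, so the following is only a sketch. It assumes the response is a JSON object that carries a video_url field once the video is ready, which is the one field this page names. The helper name extract_video_url and the sample bodies are hypothetical.

```python
import json


def extract_video_url(response_text):
    """Return video_url from a JSON response body, or None if it is absent.

    Assumption: the body is a JSON object and video_url appears once
    processing is complete. Any other fields are ignored here.
    """
    payload = json.loads(response_text)
    if not isinstance(payload, dict):
        return None
    return payload.get("video_url") or None


# Hypothetical bodies for illustration, not real API output.
finished = '{"video_url": "https://example.com/result.mp4"}'
pending = "{}"
print(extract_video_url(finished))
print(extract_video_url(pending))
```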
Getting Started
- Ensure your video and audio meet these requirements:
  - Clear video quality with visible mouth movements
  - High-quality audio
  - Publicly accessible URLs (e.g., S3 presigned URLs)
- Make your first lipsync request using our API Reference.
- Monitor the status through webhooks or by checking the lipsync status
- Once complete, download your synchronized video using the provided video_url
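If you check the status instead of relying on webhooks, a simple polling loop is one option. The sketch below assumes you supply fetch_status, a hypothetical callable that performs the status request and returns the decoded JSON as a dict. It also assumes video_url is present only once the video is ready. The interval and timeout defaults are arbitrary choices, not service recommendations.

```python
import time


def wait_for_video(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status() until the payload contains video_url.

    fetch_status is caller-supplied (hypothetical); it should return a dict.
    Raises TimeoutError if no video_url appears before timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch_status()
        video_url = payload.get("video_url")
        if video_url:
            return video_url
        if time.monotonic() >= deadline:
            raise TimeoutError("lipsync video was not ready before the timeout")
        sleep(interval_s)


# Usage with a stand-in for the real status call: two pending polls, then done.
fake_responses = iter([{}, {}, {"video_url": "https://example.com/result.mp4"}])
url = wait_for_video(lambda: next(fake_responses), sleep=lambda seconds: None)
print(url)
```

Passing sleep as a parameter keeps the loop easy to test without real delays. With a callback_url, the same video_url check could instead run inside your webhook handler.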
Tips and Restrictions
Lipsync currently has the following requirements:
- The source video and audio must be publicly accessible
- Source video must be in .mp4 format
- Source audio must be in .mp3 or .wav format
- Source video and audio must be 5 minutes or less in duration
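These restrictions can be checked on the client before submitting a request. The sketch below is a heuristic, not the service's own validation: it infers the format from the file extension in the URL path, expects the caller to supply durations in seconds, and does not verify that the URLs are publicly accessible. The function name check_lipsync_inputs is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

MAX_DURATION_S = 5 * 60           # 5 minutes, per the restrictions above
VIDEO_EXTENSIONS = {".mp4"}
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def _extension(url):
    """Return the lowercase extension of the URL path, ignoring any query string."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def check_lipsync_inputs(video_url, audio_url, video_duration_s, audio_duration_s):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if _extension(video_url) not in VIDEO_EXTENSIONS:
        problems.append("source video must be in .mp4 format")
    if _extension(audio_url) not in AUDIO_EXTENSIONS:
        problems.append("source audio must be in .mp3 or .wav format")
    if video_duration_s > MAX_DURATION_S:
        problems.append("source video must be 5 minutes or less")
    if audio_duration_s > MAX_DURATION_S:
        problems.append("source audio must be 5 minutes or less")
    return problems


# A presigned-style URL with a query string still resolves to the .mp4 extension.
print(check_lipsync_inputs(
    "https://example.com/clip.mp4?signature=abc",
    "https://example.com/voice.wav",
    video_duration_s=120,
    audio_duration_s=118,
))
```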
To get the best results, we recommend the following:
- The source video should clearly show the speaker’s mouth.
- Use real (non-cartoon) speakers. The speaker should face the camera so that their face stays visible throughout the entire video (“talking head” style).
- Ensure good lighting conditions in the original video.
- The audio should be clear and well-recorded.
- There should be no background noise.
- Use single-speaker audio (avoid overlapping voices).
- The speech should be natural; avoid singing or whispering.
Support
Need help or have questions? Our developer support team is here to assist you.
This documentation will help you effectively integrate Tavus’s lipsync service into your applications. Let’s start creating synchronized videos!