Recrawl Document
Trigger a recrawl of a website document to fetch fresh content.
Authorizations
Path Parameters
Unique ID of the crawl-backed website document to refresh. Use this endpoint when the source site has changed, when you want to refresh content on a schedule, or to retry after crawl or processing errors.
The document must be in the ready or error state; otherwise the request returns 409. It must have been created with a crawl configuration, unless you supply crawl in the request body for this call.
The same document can be recrawled at most once per 1-hour cooldown; a request made too soon returns 429.
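The preconditions above can be checked on the client before calling the endpoint. The following is a minimal sketch, not the service's actual logic: the helper name `recrawl_precheck` is hypothetical, and it assumes the cooldown is measured from `last_crawled_at`, which this page does not state.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(hours=1)              # 1-hour cooldown described above
RECRAWLABLE_STATUSES = {"ready", "error"}  # any other status yields 409


def parse_iso8601(value):
    """Parse timestamps such as "2024-01-01T12:05:00Z" into aware datetimes."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recrawl_precheck(status, last_crawled_at, now=None):
    """Predict the likely HTTP status of a recrawl request: 202, 409, or 429."""
    now = now or datetime.now(timezone.utc)
    if status not in RECRAWLABLE_STATUSES:
        return 409
    if last_crawled_at is not None:
        # Assumption: the cooldown window starts at last_crawled_at.
        if now - parse_iso8601(last_crawled_at) < COOLDOWN:
            return 429
    return 202
```

The server remains the source of truth, so a real client should still handle 409 and 429 responses even when this check predicts 202.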
Body
Optional body. Omit it entirely to reuse the depth and max_pages values stored at document creation, or include crawl to override those values for this run only.
After a 202 response, status is typically recrawling until processing finishes. If you set callback_url when creating the document, webhooks report progress until the document returns to ready or error.
Poll Get Document for current status, crawl_count, and last_crawled_at.
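Polling can be sketched as a small loop that stops once the document settles. This example is illustrative only: `fetch_document` is a hypothetical callable that you supply, expected to call Get Document and return the parsed JSON as a dict with a `status` key; the interval and timeout values are arbitrary assumptions.

```python
import time


def wait_until_settled(fetch_document, interval_s=10.0, timeout_s=1800.0, sleep=time.sleep):
    """Poll until status is ready or error, then return the final document dict."""
    deadline = time.monotonic() + timeout_s
    while True:
        doc = fetch_document()  # hypothetical: wraps a Get Document call
        if doc.get("status") in ("ready", "error"):
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {doc.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

The returned dict can then be inspected for `crawl_count` and `last_crawled_at`. The `sleep` parameter exists so the loop can be exercised in tests without real delays.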
Account limits: at most 5 concurrent crawls per user and at most 100 crawl-backed documents per user.
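One way to stay under the concurrent-crawl limit when refreshing many documents is to cap client-side parallelism. The sketch below assumes a hypothetical callable `recrawl_and_wait(document_id)` that triggers a recrawl and blocks until the document settles. It only bounds crawls started by this process, not other crawls running on the same account.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_CRAWLS = 5  # account limit stated above


def recrawl_all(document_ids, recrawl_and_wait, max_concurrent=MAX_CONCURRENT_CRAWLS):
    """Run recrawl_and_wait for each id, with at most max_concurrent in flight."""
    document_ids = list(document_ids)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() yields results in input order, so zip() pairs them correctly.
        results = list(pool.map(recrawl_and_wait, document_ids))
    return dict(zip(document_ids, results))
```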
Optional depth and max_pages for this recrawl only; overrides stored crawl settings from document creation when provided. If omitted, the original crawl configuration is used.
What runs: the same starting URL as the original crawl, links followed within these limits, fresh page content processed, existing vectors replaced when processing completes, and crawl_count / last_crawled_at updated (see the 202 payload and Get Document while status is recrawling).
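The two body variants (no body versus a crawl override) can be sketched as follows. The helper names are hypothetical, the endpoint URL is passed in by the caller because this page does not show it, and the `x-api-key` header name is an assumption about how the API authenticates.

```python
import json
import urllib.request


def build_recrawl_body(depth=None, max_pages=None):
    """Return None to reuse stored crawl settings, or a dict with a crawl override."""
    crawl = {}
    if depth is not None:
        crawl["depth"] = depth
    if max_pages is not None:
        crawl["max_pages"] = max_pages
    return {"crawl": crawl} if crawl else None


def build_recrawl_request(endpoint_url, api_key, depth=None, max_pages=None):
    """Build (but do not send) a POST request for the recrawl endpoint."""
    body = build_recrawl_body(depth, max_pages)
    headers = {"x-api-key": api_key}  # assumption: auth header name
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # With no override, the request carries no body at all.
    return urllib.request.Request(endpoint_url, data=data, headers=headers, method="POST")
```

The request can be sent with `urllib.request.urlopen(request)`; a 202 response indicates the recrawl was accepted, and the override applies to that run only.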
Response
Recrawl initiated successfully
Unique identifier for the document
"d8-5c71baca86fc"
Name of the document
"Company Website"
URL of the document
"https://example.com/"
Current status of the document. After a successful recrawl request, this is typically recrawling until processing completes, then ready or error. Other possible values: started, processing.
Available options: started, processing, ready, error, recrawling
"recrawling"
Processing progress as a percentage (0-100). Null when processing has not started or is complete.
null
Error code indicating why processing failed. Only present when status is error. Possible values include: file_download_failed, file_format_unsupported, file_size_too_large, file_empty, invalid_file_url, document_processing_failed, website_processing_failed, chunking_failed, embedding_failed, vector_store_failed, contact_support.
ISO 8601 timestamp of when the document was created
"2024-01-01T12:00:00Z"
ISO 8601 timestamp of when the document was last updated
"2024-01-15T10:30:00Z"
If set on Create Document, Tavus POSTs status updates here while this recrawl runs through completion.
"https://your-server.com/webhook"
Array of document tags
["website", "company"]
The crawl configuration being used for the recrawl
List of URLs from the previous crawl (will be updated when recrawl completes)
[
"https://docs.example.com/",
"https://docs.example.com/getting-started"
]
ISO 8601 timestamp of the previous crawl
"2024-01-01T12:05:00Z"
Number of times the document has been crawled (will increment when recrawl completes)
1

