> ## Documentation Index
> Fetch the complete documentation index at: https://docs.tavus.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Create Document

> Upload documents to your knowledge base for personas to reference during conversations.

<Info>
  For AI agents, use `https://docs.tavus.io/openapi.yaml` for the full HTTP API contract.
</Info>


## OpenAPI

````yaml post /v2/documents
openapi: 3.0.3
info:
  title: Tavus Developer API Collection
  version: 1.0.0
  contact: {}
servers:
  - url: https://tavusapi.com
security:
  - apiKey: []
tags:
  - name: Videos
  - name: Replicas
  - name: Voices
  - name: Conversations
  - name: Personas
  - name: Pronunciation Dictionaries
  - name: Replacements
  - name: Transcriptions
  - name: Documents
paths:
  /v2/documents:
    post:
      tags:
        - Documents
      summary: Create Document
      description: >-
        Upload documents to your knowledge base for personas to reference during
        conversations.
      operationId: createDocument
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                document_url:
                  type: string
                  description: >
                    Direct URL to a file or a website for your [Knowledge
                    Base](/sections/conversational-video-interface/knowledge-base).
                    Submitting this URL starts processing **asynchronously**;
                    the document can be used in conversations once processing
                    completes, which may take **a few minutes** depending on
                    file size.


                    <Note>

                    For now, our Knowledge Base only supports documents written
                    in English and works best for conversations in English. We
                    will be expanding our Knowledge Base language support soon.

                    </Note>


                    Maximum file size **50MB**. Supported file formats: `.pdf`,
                    `.txt`, `.docx`, `.doc`, `.png`, `.jpg`, `.pptx`, `.csv`,
                    and `.xlsx`. Website URLs are supported: a snapshot of the
                    page is processed into document content; use the `crawl`
                    object for multi-page crawling from a starting URL.
                  example: https://docs.example.com/
                document_name:
                  type: string
                  description: >-
                    Optional name for the document. If not provided, a default
                    name will be generated.
                  example: Example Docs
                callback_url:
                  type: string
                  description: >-
                    Optional URL that receives status updates while the document
                    processes asynchronously (e.g. started, processing, ready,
                    error).
                  example: https://your-server.com/webhook
                tags:
                  type: array
                  description: >
                    Optional tags to categorize the document for management and
                    for use with document-based access in conversations. After
                    the document is ready, attach it via `document_ids` on
                    [Create Persona](/api-reference/personas/create-persona) or
                    [Create
                    Conversation](/api-reference/conversations/create-conversation).
                  items:
                    type: string
                  example:
                    - docs
                    - website
                crawl:
                  type: object
                  description: >
                    Optional configuration for website crawling. When provided
                    with a website URL, the system follows links from the
                    starting URL and processes multiple pages into a single
                    document. Without this parameter, only the single page at
                    the URL is scraped.


                    **Rate limits:** at most **100** crawl documents per user,
                    at most **5** concurrent crawls at any time, and a **1-hour
                    cooldown** between recrawls of the same document.


                    To fetch fresh content after a crawled document exists, use
                    [Recrawl
                    Document](/api-reference/documents/recrawl-document).
                  properties:
                    depth:
                      type: integer
                      description: >-
                        How many levels deep to follow links from the starting
                        URL (1-10). A depth of 1 means only pages directly
                        linked from the starting URL.
                      minimum: 1
                      maximum: 10
                      example: 2
                    max_pages:
                      type: integer
                      description: >-
                        Maximum number of pages to crawl (1-100). Processing
                        stops once this limit is reached.
                      minimum: 1
                      maximum: 100
                      example: 10
              required:
                - document_url
            examples:
              minimal:
                summary: Required fields only
                value:
                  document_url: https://docs.example.com/
              with_crawl:
                summary: Website with multi-page crawl
                value:
                  document_name: Company Knowledge Base
                  document_url: https://docs.example.com/
                  crawl:
                    depth: 2
                    max_pages: 20
                  callback_url: https://your-server.com/webhook
      responses:
        '200':
          description: Document created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  document_id:
                    type: string
                    description: Unique identifier for the created document
                    example: d8-5c71baca86fc
                  document_name:
                    type: string
                    description: Name of the document
                    example: Example Docs
                  document_url:
                    type: string
                    description: URL of the document or website
                    example: https://docs.example.com/
                  status:
                    type: string
                    description: >-
                      Current status of the document processing. Possible
                      values: `started`, `processing`, `ready`, `error`,
                      `recrawling`.
                    enum:
                      - started
                      - processing
                      - ready
                      - error
                      - recrawling
                    example: started
                  progress:
                    type: integer
                    nullable: true
                    description: >-
                      Processing progress as a percentage (0-100). Null when
                      processing has not started or is complete.
                    example: null
                  error_message:
                    type: string
                    nullable: true
                    description: >-
                      Error code indicating why processing failed. Only present
                      when status is `error`. Possible values include:
                      `file_download_failed`, `file_format_unsupported`,
                      `file_size_too_large`, `file_empty`, `invalid_file_url`,
                      `document_processing_failed`, `website_processing_failed`,
                      `chunking_failed`, `embedding_failed`,
                      `vector_store_failed`, `contact_support`.
                  created_at:
                    type: string
                    description: ISO 8601 timestamp of when the document was created
                    example: '2024-01-01T12:00:00Z'
                  updated_at:
                    type: string
                    description: ISO 8601 timestamp of when the document was last updated
                    example: '2024-01-01T12:00:00Z'
                  callback_url:
                    type: string
                    description: URL that will receive status updates
                    example: https://your-server.com/webhook
                  tags:
                    type: array
                    description: Array of document tags
                    items:
                      type: string
                    example:
                      - docs
                      - website
                  crawl_config:
                    type: object
                    nullable: true
                    description: >-
                      The crawl configuration used for this document (only
                      present for crawled websites)
                    properties:
                      depth:
                        type: integer
                        description: Crawl depth setting
                        example: 2
                      max_pages:
                        type: integer
                        description: Maximum pages setting
                        example: 10
                  crawled_urls:
                    type: array
                    nullable: true
                    description: >-
                      List of URLs that were crawled (only present for crawled
                      websites after processing completes)
                    items:
                      type: string
                    example:
                      - https://docs.example.com/
                      - https://docs.example.com/getting-started
                      - https://docs.example.com/api
                  last_crawled_at:
                    type: string
                    nullable: true
                    description: ISO 8601 timestamp of when the document was last crawled
                    example: '2024-01-01T12:00:00Z'
                  crawl_count:
                    type: integer
                    nullable: true
                    description: Number of times the document has been crawled
                    example: 1
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: The error message
                    example: 'Invalid request: document_url is required'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                type: object
                properties:
                  message:
                    type: string
                    description: The error message
                    example: Invalid access token
        '429':
          description: Too Many Requests - Crawl rate limit exceeded
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: The error message
                    example: >-
                      Crawl document limit reached (100). Contact
                      support@tavus.io to increase your limit.
      security:
        - apiKey: []
components:
  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: x-api-key

````