Paulius, Developer

Extract Data from Documents Using AI

Digitising documents is only half the battle - turning those pixels into structured, actionable data is where real value is created. Manual key‑entry is slow, costly, and prone to error. ConvertAPI’s new Document Data Extraction API applies state‑of‑the‑art AI models to pull the right numbers, dates, names, and line items from your invoices, receipts, statements, forms, and more - delivering clean JSON in seconds and freeing your team to focus on higher‑value work.

Why AI‑powered extraction?

  • Speed - Turn uploads into clean JSON within seconds.
  • Accuracy - Modern vision–language models can match or exceed manual key-entry accuracy on common fields.
  • Scalability - Add new document types or volumes without hiring or training staff.
  • Consistency - Deterministic JSON output keeps downstream automations simple.

Supported document types

The API understands both common and bespoke layouts:

Category | Fields captured
Auto | The system automatically detects the document type and returns the corresponding fields.
Invoices | Invoice No., date, supplier, totals, tax, full line-item tables
Receipts | Merchant name, date, total amount
Contracts / agreements | Party names, effective dates, key terms
Identification documents | Name, date of birth, ID No., expiry
Bank statements | Transaction list, balances, account numbers
Forms (PDF / Word) | Any labelled field: name, email, phone, etc.
Manual | No default fields. Only fields described in CustomExtractionData will be captured.

Three extraction modes

Choose the workflow that fits your use case:

  1. Automatic type detection – The AI infers the document class and returns all standard fields—no configuration required.
  2. Explicit type selection – Force the engine to treat the file as Invoice, Receipt, Bank‑statement, etc., ensuring only the relevant schema is returned.
  3. Custom extraction only – Skip predefined fields entirely by setting DocumentType to Manual, and harvest just the data you request via CustomExtractionData.

Tip: Whatever mode you select, you can always append extra targets with CustomExtractionData.
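As a rough illustration, the three modes could be expressed as a small Python helper that produces the DocumentType and CustomExtractionData values. This is a sketch, not part of any ConvertAPI SDK: the function name extraction_options is hypothetical, the FieldName / Extract entry shape follows the request example later in this post, and sending the literal value "Auto" explicitly (rather than omitting DocumentType) is an assumption.

```python
import json


def extraction_options(document_type="Auto", custom_fields=None):
    """Return DocumentType / CustomExtractionData values for one extraction mode.

    custom_fields maps an output key to a plain-language description of
    what to extract, e.g. {"TotalResult": "total price"}.
    Hypothetical helper; "Auto" as an explicit value is an assumption.
    """
    # Manual mode has no default fields, so it needs at least one custom target.
    if document_type == "Manual" and not custom_fields:
        raise ValueError("Manual mode captures nothing without CustomExtractionData")

    options = {"DocumentType": document_type}
    if custom_fields:
        entries = [
            {"FieldName": name, "Extract": description}
            for name, description in custom_fields.items()
        ]
        # The parameter value is a JSON string, not a nested JSON array.
        options["CustomExtractionData"] = json.dumps(entries)
    return options


automatic = extraction_options()
explicit = extraction_options("Invoice", {"TotalResult": "total price"})
custom_only = extraction_options("Manual", {"PoNumber": "purchase order number"})
```

The guard for Manual mode reflects the table above: with no default fields, a Manual request without custom targets would have nothing to return.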

Predictable JSON output

Each result arrives as an array of objects containing the field name, the extracted value, and a confidence score between 0 and 1.

[
    {
        "FieldName": "Tax",
        "FieldValue": "$8.50",
        "Confidence": 0.9
    }
]
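Because every entry has the same three keys, the array is easy to index by field name. The Python sketch below assumes you already hold the result array as a JSON string; the helper name index_fields is illustrative, and how the array is delivered inside the HTTP response is not covered here.

```python
import json


def index_fields(result_json):
    """Map each FieldName to a (value, confidence) pair."""
    fields = {}
    for item in json.loads(result_json):
        fields[item["FieldName"]] = (item["FieldValue"], item["Confidence"])
    return fields


sample = '[{"FieldName": "Tax", "FieldValue": "$8.50", "Confidence": 0.9}]'
fields = index_fields(sample)
value, confidence = fields["Tax"]
```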

For invoices, detailed line items are returned as nested arrays:

{
    "FieldName": "LineItems",
    "FieldValue": [
        {
            "Quantity": "1.00",
            "Description": "Web Design",
            "Rate": "$85.00",
            "Adjustment": "0.00%",
            "LineTotal": "$85.00"
        }
    ],
    "Confidence": 0.9
}
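Since amounts arrive as formatted strings, a common follow-up is to convert them to numbers before doing arithmetic. The following Python sketch sums the LineTotal column of a LineItems field. The helper names are hypothetical, and the parsing assumes amounts formatted like "$1,234.50" (dollar sign, comma thousands separator); other currencies or locales would need different handling.

```python
from decimal import Decimal


def parse_money(text):
    """Convert a string such as "$1,234.50" to a Decimal (assumed format)."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def line_items_total(field):
    """Sum LineTotal across all rows of a LineItems field object."""
    return sum(
        (parse_money(row["LineTotal"]) for row in field["FieldValue"]),
        Decimal("0"),
    )


line_items = {
    "FieldName": "LineItems",
    "FieldValue": [
        {
            "Quantity": "1.00",
            "Description": "Web Design",
            "Rate": "$85.00",
            "Adjustment": "0.00%",
            "LineTotal": "$85.00",
        }
    ],
    "Confidence": 0.9,
}
total = line_items_total(line_items)
```

Decimal is used instead of float so that currency sums do not pick up binary rounding errors.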

Fine-Tune Extraction with Powerful Parameters

Need granular control? Specify exactly what the engine should look for and how confident it must be before returning a value. In the example below we:

  • Set the DocumentType to Invoice,
  • Add a custom query (CustomExtractionData) to find the total price and return it under the key TotalResult,
  • Require a minimum confidence of 0.70 (70 %).

Request URL:

[POST] https://v2.convertapi.com/document/to/extract

Request JSON body:

You can also send the request as multipart/form-data or as an octet stream; for simplicity, this example uses JSON.

{
  "Parameters": [
    {
      "Name": "File",
      "FileValue": {
        "Name": "my_file.pdf",
        "Data": "<Base64 encoded file content>"
      }
    },
    {
      "Name": "DocumentType",
      "Value": "Invoice"
    },
    {
      "Name": "CustomExtractionData",
      "Value": "[ { \"FieldName\": \"TotalResult\", \"Extract\": \"total price\" } ]"
    },
    {
      "Name": "MinimumConfidence",
      "Value": 0.7
    }
  ]
}
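The same body can be assembled programmatically. The Python sketch below uses only the standard library; build_body and send are hypothetical helper names, and the bearer-token Authorization header is an assumption, since authentication is not described in this post. Check your account settings for the scheme that applies to you.

```python
import base64
import json
import urllib.request

API_URL = "https://v2.convertapi.com/document/to/extract"


def build_body(file_name, file_bytes, document_type, custom_fields, minimum_confidence):
    """Build the JSON request body shown above from raw file bytes."""
    return {
        "Parameters": [
            {
                "Name": "File",
                "FileValue": {
                    "Name": file_name,
                    # File content travels as Base64 text inside the JSON body.
                    "Data": base64.b64encode(file_bytes).decode("ascii"),
                },
            },
            {"Name": "DocumentType", "Value": document_type},
            # CustomExtractionData is passed as a JSON-encoded string.
            {"Name": "CustomExtractionData", "Value": json.dumps(custom_fields)},
            {"Name": "MinimumConfidence", "Value": minimum_confidence},
        ]
    }


def send(body, token):
    """POST the body and return the parsed JSON response.

    Assumption: bearer-token authentication via the Authorization header.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


body = build_body(
    "my_file.pdf",
    b"%PDF-placeholder",  # stand-in bytes; read your real file in practice
    "Invoice",
    [{"FieldName": "TotalResult", "Extract": "total price"}],
    0.7,
)
```

Calling send(body, "<your token>") would submit the request; it is left out of the example so the sketch can be run without network access.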

The engine returns all standard invoice fields and the value associated with total price. Each item is included only when the AI’s confidence score is at least 0.70 (70 %).

Sample Response

Here’s the JSON returned by the request: Download the JSON result

Try It Yourself

Sign up for free and try the converter, including the full parameter playground, at: https://www.convertapi.com/a/api/document-to-extract

Call our API using any HTTP client, or use one of our SDK libraries.

Automate your back-office today. Start a free trial and get 250 free conversions, or purchase a plan at https://www.convertapi.com/prices.

