Extract PDF API
Extract structured data from PDF with AI Data Extraction API
Digitising documents is only half the battle - turning those pixels into structured, actionable data is where real value is created. Manual key‑entry is slow, costly, and prone to error. ConvertAPI’s new Document Data Extraction API applies state‑of‑the‑art AI models to pull the right numbers, dates, names, and line items from your invoices, receipts, statements, forms, and more - delivering clean JSON in seconds and freeing your team to focus on higher‑value work.
The API understands both common and bespoke layouts:
Category | Fields captured |
---|---|
Auto | The system automatically detects the document type and returns the corresponding fields. |
Invoices | Invoice No., date, supplier, totals, tax, full line-item tables |
Receipts | Merchant name, date, total amount |
Contracts / agreements | Party names, effective dates, key terms |
Identification documents | Name, date of birth, ID No., expiry |
Bank statements | Transaction list, balances, account numbers |
Forms (PDF / Word) | Any labelled field: name, email, phone, etc. |
Manual | No default fields. Only fields described in CustomExtractionData will be captured. |
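Choose the workflow that fits your use case:
- Auto: let the system detect the document type and return the corresponding fields.
- Predefined type: set DocumentType to a specific category, such as Invoice, to get that category's standard fields.
- Manual: set DocumentType to Manual and harvest just the data you request via CustomExtractionData.

Tip: Whatever mode you select, you can always append extra targets with CustomExtractionData.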
Each result arrives as an array of objects containing the field name, the extracted value, and a confidence score between 0 and 1.
[ { "FieldName": "Tax", "FieldValue": "$8.50", "Confidence": 0.9 } ]
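Because every entry carries its own field name and confidence score, the array is easy to turn into a lookup table on the client side. The sketch below assumes you already have the result array as JSON text in the shape shown above; the helper name `index_fields` and its `min_confidence` argument are illustrative and not part of the API.

```python
import json


def index_fields(results, min_confidence=0.0):
    """Map FieldName -> FieldValue, skipping entries below min_confidence.

    `results` is the decoded result array shown above. This helper is
    hypothetical; it only post-processes data the API has already returned.
    """
    return {
        item["FieldName"]: item["FieldValue"]
        for item in results
        if item.get("Confidence", 0.0) >= min_confidence
    }


# Sample payload copied from the example above.
raw = '[ { "FieldName": "Tax", "FieldValue": "$8.50", "Confidence": 0.9 } ]'

fields = index_fields(json.loads(raw), min_confidence=0.8)
print(fields)  # {'Tax': '$8.50'}
```

Raising the threshold above 0.9 would drop the sample entry, which is a simple way to route low-confidence values to manual review.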
For invoices, detailed line items are returned as nested arrays:
{
  "FieldName": "LineItems",
  "FieldValue": [
    {
      "Quantity": "1.00",
      "Description": "Web Design",
      "Rate": "$85.00",
      "Adjustment": "0.00%",
      "LineTotal": "$85.00"
    }
  ],
  "Confidence": 0.9
}
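Line-item values arrive as formatted strings rather than numbers, so a little parsing is needed before doing arithmetic on them. The following sketch sums the LineTotal column of the example above. It assumes amounts use a leading "$" and optional comma thousands separators, as in the sample; other currencies or locales would need different handling. The helper names are illustrative.

```python
import json
from decimal import Decimal


def parse_money(text):
    """Convert a string such as "$1,085.00" to a Decimal.

    Assumption: a "$" prefix and comma thousands separators, as in the sample.
    """
    return Decimal(text.replace("$", "").replace(",", "").strip())


def line_items_total(field):
    """Sum the LineTotal of every row in a LineItems field object."""
    return sum(
        (parse_money(row["LineTotal"]) for row in field["FieldValue"]),
        Decimal("0"),
    )


# Sample LineItems object copied from the example above.
line_items = json.loads("""
{ "FieldName": "LineItems",
  "FieldValue": [ { "Quantity": "1.00", "Description": "Web Design",
                    "Rate": "$85.00", "Adjustment": "0.00%",
                    "LineTotal": "$85.00" } ],
  "Confidence": 0.9 }
""")

total = line_items_total(line_items)
print(total)  # 85.00
```

Using `Decimal` instead of `float` avoids rounding drift when the computed sum is compared against the invoice total extracted by the API.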
Need granular control? Specify exactly what the engine should look for and how confident it must be before returning a value. In the example below we:
- set DocumentType to Invoice,
- add a custom field (CustomExtractionData) to find the total price and return it under the key TotalResult,
- set MinimumConfidence to 0.7.

Request URL:
[POST] https://v2.convertapi.com/document/to/extract
Request JSON body:
You can make a request using multipart/form-data or octet stream; however, for simplicity in this example, we will use JSON.
{
  "Parameters": [
    {
      "Name": "File",
      "FileValue": {
        "Name": "my_file.pdf",
        "Data": "<Base64 encoded file content>"
      }
    },
    { "Name": "DocumentType", "Value": "Invoice" },
    {
      "Name": "CustomExtractionData",
      "Value": "[ { \"FieldName\": \"TotalResult\", \"Extract\": \"total price\" } ]"
    },
    { "Name": "MinimumConfidence", "Value": 0.7 }
  ]
}
The engine returns all standard invoice fields plus the value associated with total price, stored under TotalResult. Each item is included only when the AI’s confidence score is at least 0.70 (70%).
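The request body above can be assembled programmatically, which avoids hand-escaping the CustomExtractionData string (it is JSON embedded inside JSON). The sketch below builds the same structure in Python using only the standard library. The function names are illustrative, and the `Authorization: Bearer` header in `send_extract_request` is an assumption about authentication that this page does not cover; check your account documentation before relying on it.

```python
import base64
import json
import urllib.request

EXTRACT_URL = "https://v2.convertapi.com/document/to/extract"


def build_extract_body(file_name, file_bytes, document_type="Invoice",
                       custom_fields=None, minimum_confidence=0.7):
    """Build the JSON request body shown above as a Python dict."""
    parameters = [
        {"Name": "File",
         "FileValue": {"Name": file_name,
                       "Data": base64.b64encode(file_bytes).decode("ascii")}},
        {"Name": "DocumentType", "Value": document_type},
    ]
    if custom_fields:
        # CustomExtractionData is passed as a JSON string, so serialize it here.
        parameters.append({"Name": "CustomExtractionData",
                           "Value": json.dumps(custom_fields)})
    parameters.append({"Name": "MinimumConfidence", "Value": minimum_confidence})
    return {"Parameters": parameters}


def send_extract_request(body, token):
    """POST the body to the extract endpoint and return the decoded JSON.

    ASSUMPTION: bearer-token authentication; not confirmed by this page.
    """
    request = urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder bytes stand in for a real PDF read from disk.
body = build_extract_body(
    "my_file.pdf",
    b"%PDF-1.4 placeholder",
    custom_fields=[{"FieldName": "TotalResult", "Extract": "total price"}],
)
print(json.dumps(body, indent=2))
```

In real use you would read the file with `open(path, "rb").read()` and pass the resulting body to `send_extract_request` together with your own credentials.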
Here’s the JSON returned by the request: Download the JSON result
Sign up for free and try the converter, including the full parameter playground, at: https://www.convertapi.com/a/api/document-to-extract
Call the API with any HTTP client, or use one of our SDK libraries.
Automate your back-office today. Start a free trial and get 250 free conversions, or purchase a plan at https://www.convertapi.com/prices.