PDF/A Validation for Developers

How to Programmatically Validate PDF/A Compliance: 4 Methods Compared

Validate PDF/A files programmatically with code examples in C#, Python, Java, and Node.js. Compare veraPDF, Adobe Preflight, ConvertAPI, and 3-Heights PDF/A validators.

Tomas, CEO

If you're building a document pipeline that produces or accepts PDF/A files (for legal e-filing, electronic invoicing, government submissions, or long-term archives), you need a way to verify compliance programmatically. A .pdf extension tells you nothing. Even the file's own metadata can lie about its conformance level: the XMP identification that declares a file as PDF/A can be added, or simply be wrong, without the underlying file actually meeting the ISO 19005 rules.
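To see how little that declaration proves, here is a minimal Python sketch that pulls the claimed flavor out of a file's XMP packet. It assumes the packet is stored uncompressed (the usual case for PDF/A metadata streams) and that the `pdfaid:part` and `pdfaid:conformance` properties appear in either attribute or element form; `declared_pdfa_flavor` is a hypothetical helper name. A match only tells you what the file says about itself.

```python
import re
from typing import Optional, Tuple

# Match attribute form (pdfaid:part="2") or element form (<pdfaid:part>2</pdfaid:part>)
_PART = re.compile(rb'pdfaid:part(?:\s*=\s*["\'](\d)["\']|>\s*(\d)\s*<)')
_CONF = re.compile(rb'pdfaid:conformance(?:\s*=\s*["\']([ABUabu])["\']|>\s*([ABUabu])\s*<)')


def declared_pdfa_flavor(pdf_bytes: bytes) -> Optional[Tuple[int, str]]:
    """Return the PDF/A flavor the file *claims* in its XMP metadata, e.g. (2, 'b').

    This reads the declaration only; it proves nothing about actual compliance.
    """
    part = _PART.search(pdf_bytes)
    conf = _CONF.search(pdf_bytes)
    if not part or not conf:
        return None  # no PDF/A identification found in the raw bytes
    part_num = int(part.group(1) or part.group(2))
    level = (conf.group(1) or conf.group(2)).decode('ascii').lower()
    return part_num, level
```

Treat the result as a hint for choosing which flavor to validate against, never as the verdict.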

For developers, the question isn't whether to validate; it's how to integrate validation reliably into a pipeline without dragging a JVM into a serverless function or pinning your build to a $10K/year on-premise license.

This guide covers everything a developer needs to validate PDF/A files in production: what the ISO 19005 standard actually checks, the differences between PDF/A-1, PDF/A-2, and PDF/A-3, the A/B/U conformance levels, and four practical methods to validate PDF/A compliance, with working code examples in C#, Python, Java, and Node.js, and a side-by-side comparison of veraPDF, Adobe Preflight, ConvertAPI, and 3-Heights.


What Is PDF/A and Why Validation Matters

PDF/A is a restricted subset of the regular PDF specification. It removes features that make long-term preservation unreliable: anything that depends on external resources, runtime computation, or software that might not exist in 50 years.

A compliant PDF/A file must:

  • Embed all fonts used in the document (no font references).
  • Include all color information (ICC profiles for device-dependent color spaces).
  • Avoid encryption: archives must be readable without keys.
  • Avoid JavaScript, audio, video, and executable content.
  • Use only PDF features defined in the ISO specification for that PDF/A version.
  • Include proper XMP metadata declaring the conformance level.
  • Avoid external references (linked files, external streams).
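A byte scan can catch a few of these violations cheaply before a full validator runs. The sketch below, built around a hypothetical `prescreen` helper, looks for some name tokens PDF/A forbids (`/Encrypt`, `/JavaScript`, `/JS`, and a few multimedia annotation types). It is a heuristic under stated assumptions: objects packed inside compressed object streams are invisible to it, so an empty result never means the file is compliant.

```python
import re

# Name tokens whose presence in the raw bytes signals a feature PDF/A forbids.
# The delimiter lookahead avoids matching longer names such as /JSFoo.
_DELIM = rb'(?=[\s/<\[(>]|$)'
_FORBIDDEN = {
    'encryption': re.compile(rb'/Encrypt' + _DELIM),
    'javascript': re.compile(rb'/(?:JavaScript|JS)' + _DELIM),
    'embedded multimedia': re.compile(rb'/(?:Sound|Movie|RichMedia)' + _DELIM),
}


def prescreen(pdf_bytes: bytes) -> list[str]:
    """Cheap pre-check: list forbidden features visible in the raw bytes.

    An empty list does NOT mean the file is PDF/A compliant: compressed object
    streams hide their contents, and most ISO 19005 rules (fonts, color,
    metadata) are not checked here at all.
    """
    return [name for name, pattern in _FORBIDDEN.items() if pattern.search(pdf_bytes)]
```

In a pipeline, a non-empty result lets you reject a file early with a clear reason; everything else still goes to a real validator.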

Validation is the process of checking a PDF against the hundreds of rules defined in ISO 19005. Tools that claim to produce PDF/A don't always get it right, especially when converting from Word, HTML, or images. Real-world validation failures are common:

  • A "PDF/A-1b" file from an older converter that actually violates the spec because it embeds a font subset incorrectly.
  • A scanned archive where the OCR layer references a non-embedded font.
  • A PDF/A-2b file submitted to a system that only accepts PDF/A-1a.
  • A file that passes the visual checks but has invalid XMP metadata.

If you're building a document pipeline that produces archival PDFs, or receives them from third parties, you need a reliable way to validate PDF/A compliance before accepting the file as valid.

How a PDF/A validator processes a file: parse structure, run hundreds of ISO 19005 rule checks, produce a structured report.


PDF/A Versions and Conformance Levels Explained

Before validating, you need to know what you're validating against. This is where most developers get confused.

The three PDF/A versions

PDF/A-1 (ISO 19005-1, published 2005): based on PDF 1.4. The strictest version, and still the one many government and legal archives require. No transparency, no layers, no JPEG 2000.

PDF/A-2 (ISO 19005-2, published 2011): based on PDF 1.7. Adds support for transparency, layers, JPEG 2000 compression, and PDF/A file attachments (only other PDF/A files can be attached). Most modern document systems accept PDF/A-2.

PDF/A-3 (ISO 19005-3, published 2012): same as PDF/A-2, but allows any file type as an embedded attachment. Commonly used for electronic invoicing (ZUGFeRD, Factur-X) where the PDF contains a human-readable invoice plus an embedded XML data file.

The conformance levels

Each version has multiple conformance levels:

  • Level B (Basic): ensures reliable visual reproduction of the document.
  • Level A (Accessible): adds structural/semantic requirements: tagged content, Unicode mapping, logical reading order. Required for accessibility compliance.
  • Level U (Unicode): only exists for PDF/A-2 and PDF/A-3. Between B and A: requires Unicode mapping for all text, but no structural tagging.

So the full list of conformance identifiers is: PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b, PDF/A-3u.
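If your pipeline accepts a target flavor as input, it's worth rejecting identifiers that don't exist (such as `1u`) before calling any validator. A small Python sketch; `parse_flavor` and the accepted spellings (`PDF/A-2b`, `pdfA2b`, `2b`) are illustrative choices, not a standard API:

```python
import re

# Valid (part, level) pairs for PDF/A-1, -2 and -3; PDF/A-1 defines no "u" level.
VALID_FLAVORS = {(1, 'a'), (1, 'b'),
                 (2, 'a'), (2, 'b'), (2, 'u'),
                 (3, 'a'), (3, 'b'), (3, 'u')}

_FLAVOR_RE = re.compile(r'^(?:PDF/?A-?)?([123])([ABU])$', re.IGNORECASE)


def parse_flavor(identifier: str) -> tuple[int, str]:
    """Parse 'PDF/A-2b', 'pdfA2b' or '2b' into (2, 'b'); reject undefined flavors."""
    match = _FLAVOR_RE.match(identifier.strip())
    if not match:
        raise ValueError(f'not a PDF/A identifier: {identifier!r}')
    flavor = (int(match.group(1)), match.group(2).lower())
    if flavor not in VALID_FLAVORS:
        raise ValueError(f'{identifier!r} is not a defined PDF/A flavor')
    return flavor
```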

The eight valid PDF/A flavors mapped across version (PDF/A-1, PDF/A-2, PDF/A-3) and conformance level (B, U, A). Note: PDF/A-1 doesn't define a Unicode level.

Which level do you actually need?

| Use case | Typical requirement |
|---|---|
| Legal document archiving | PDF/A-1b or PDF/A-2b |
| Government submissions (e.g., US courts, EU public procurement) | PDF/A-1a or PDF/A-2a |
| Long-term corporate archive | PDF/A-2b or PDF/A-3b |
| Accessibility compliance (Section 508, EN 301 549) | PDF/A-1a or PDF/A-2a |
| Electronic invoicing (ZUGFeRD, Factur-X) | PDF/A-3b |
| Medical records (HIPAA-adjacent archives) | PDF/A-2b or PDF/A-2u |

If you're not sure which level the receiving system accepts, check its submission guidelines. When in doubt, PDF/A-2b is the safest general-purpose choice: it has the broadest tool support and accommodates modern PDF features.

Decision tree for picking the right PDF/A flavor based on attachments, legacy system requirements, and accessibility needs.
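The rules of thumb from this section can be encoded as a small helper. This is a hypothetical sketch of the guidance in the text (PDF/A-3 only for arbitrary attachments, PDF/A-1 only for legacy systems, level A when accessibility is required, PDF/A-2b otherwise), not a reproduction of the decision tree in the figure:

```python
def choose_flavor(arbitrary_attachments: bool = False,
                  legacy_pdfa1_only: bool = False,
                  needs_accessibility: bool = False) -> str:
    """Hypothetical helper encoding this article's rules of thumb."""
    if arbitrary_attachments and legacy_pdfa1_only:
        # PDF/A-1 forbids file attachments entirely, so these requirements conflict
        raise ValueError('arbitrary attachments require PDF/A-3, not PDF/A-1')
    if arbitrary_attachments:
        part = 3  # only PDF/A-3 allows non-PDF/A attachments (e.g. invoice XML)
    elif legacy_pdfa1_only:
        part = 1  # receiving system hasn't moved past the 2005 standard
    else:
        part = 2  # default: broad tool support plus modern PDF features
    level = 'a' if needs_accessibility else 'b'
    return f'PDF/A-{part}{level}'
```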


Method 1: veraPDF (Open Source, Industry Reference)

veraPDF is the open-source PDF/A validator developed by the Open Preservation Foundation and the PDF Association. It's the closest thing the industry has to a reference implementation and is widely used by archival institutions.

When to use it

Use veraPDF if you need a free, auditable validator for internal use or if you're running one-off validations from the command line. Archives and research libraries often standardize on it.

Installation and usage

Download the installer or the CLI from the veraPDF releases page. The CLI version works on Windows, macOS, and Linux.

# Validate a single file (auto-detects the flavor from metadata)
verapdf --format text document.pdf
 
# Force validation against a specific flavor
verapdf --flavour 2b document.pdf
 
# Batch validate a directory and write machine-readable reports (MRR format)
verapdf --recurse --format mrr --output ./reports ./pdf-archive

Calling veraPDF from code

veraPDF is a Java library, so you can call it directly from JVM languages. From other languages, the most practical approach is invoking the CLI as a subprocess and parsing the XML/JSON output.

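import json
import subprocess
 
def validate_with_verapdf(file_path, flavour='2b'):
    """Run the veraPDF CLI on one file and return True if it is compliant."""
    # No check=True: veraPDF can exit non-zero simply because a file is non-compliant
    result = subprocess.run(
        ['verapdf', '--flavour', flavour, '--format', 'json', file_path],
        capture_output=True, text=True
    )
    if not result.stdout.strip():
        raise RuntimeError(f'veraPDF produced no report: {result.stderr}')
    report = json.loads(result.stdout)
    validation = report['report']['jobs'][0]['validationResult']
    # Depending on the veraPDF version, this is a single object or a list of results
    if isinstance(validation, list):
        return all(item['compliant'] for item in validation)
    return validation['compliant']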

Limitations you'll hit in production

  • Java runtime dependency: requires JRE 8+ installed and on the path. In serverless environments (AWS Lambda, Azure Functions) where your function runs Python, Node.js, or .NET, that means building and maintaining a custom runtime layer or container image.
  • Startup cost: the JVM takes several hundred milliseconds to start per invocation. For high-throughput validation, you'll need to run veraPDF as a long-running service, which adds operational complexity.
  • No hosted option: you manage the infrastructure, updates, and scaling yourself.
  • CLI parsing fragility: subprocess output can change between versions; JSON schema isn't fully stable across major releases.
  • Limited integration surface: no native SDKs for Python, Node.js, .NET, PHP, Ruby, or Go.
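One way to soften the startup cost without running a long-lived service is to pass many files to a single CLI invocation, so the JVM starts once per batch rather than once per file. A minimal sketch, assuming the `verapdf` CLI is on the PATH and accepts multiple file arguments; the helper names are hypothetical, and the batch size of 200 is an arbitrary choice meant to keep the command line under OS length limits for typical paths:

```python
import subprocess
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_verapdf_command(files: Sequence[str], flavour: str = '2b') -> list[str]:
    """One veraPDF invocation covering many files."""
    return ['verapdf', '--flavour', flavour, '--format', 'json', *files]


def run_batches(files: Sequence[str], batch_size: int = 200, flavour: str = '2b') -> list[str]:
    """Run veraPDF over `files` in batches; return the raw JSON report of each batch."""
    reports = []
    for batch in chunked(files, batch_size):
        # No check=True: a non-zero exit code may just mean some files failed validation
        proc = subprocess.run(build_verapdf_command(batch, flavour),
                              capture_output=True, text=True)
        reports.append(proc.stdout)
    return reports
```

Each batch report then contains one job per file, so per-file results still need to be read out of the JSON.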

Method 2: Adobe Acrobat Pro Preflight

Adobe Acrobat Pro includes a feature called Preflight that can validate PDF/A compliance against all major flavors.

When to use it

Preflight is ideal for manual validation by document professionals who already use Acrobat: legal assistants checking court submissions, archivists reviewing individual deposits, or designers verifying client deliverables before handoff.

How it works

Open the PDF in Acrobat Pro, then: Tools → Print Production → Preflight → PDF/A Compliance, pick the target profile (for example "Verify compliance with PDF/A-2b"), and click Analyze. Preflight produces a detailed report listing every violation with links to the offending objects in the PDF.

Limitations you'll hit in production

  • Manual only, not API-based: there is no developer-friendly API for programmatic validation. Acrobat's JavaScript SDK is aging and not designed for server automation.
  • Requires a paid Acrobat Pro license (~$240/year per user).
  • Desktop only: not suitable for server-side or automated pipelines.
  • No batch capabilities at scale: Action Wizard supports small batches but breaks down for thousands of files.

For any automated workflow, Preflight is the wrong tool. It's great for spot-checking, wrong for pipelines.


Method 3: ConvertAPI PDF/A Validation (Cloud API, Recommended for Developers)

ConvertAPI's PDF/A validation endpoint provides PDF/A compliance checking via a simple REST API. It supports all PDF/A flavors and returns structured validation results you can parse directly in your application.

Why use a hosted validation API

For most developers, the friction of running veraPDF in production (JVM dependency, cold-start cost, no SDKs) outweighs the cost savings of self-hosting. A hosted API solves all of this with a single HTTP call and official SDKs in most major languages.

How the ConvertAPI PDF/A validation API plugs into a typical application: SDK call from your code, validation in the cloud, structured JSON response back.

Installation (C#)

Install-Package ConvertApi

Basic validation example (C#)

using ConvertApiDotNet;
 
var convertApi = new ConvertApi("YOUR_API_TOKEN");
 
var result = await convertApi.ConvertAsync("pdfa", "validate",
    new ConvertApiFileParam("document.pdf")
);
 
// Save the validation report returned by the API to disk
await result.SaveFilesAsync("./validation-report.json");

Validate against a specific PDF/A flavor (C#)

var result = await convertApi.ConvertAsync("pdfa", "validate",
    new ConvertApiFileParam("document.pdf"),
    new ConvertApiParam("ExpectedConformance", "pdfA2a")   // 1a, 1b, 2a, 2b, 2u, 3a, 3b, 3u
);
 
var report = await result.ResponseJson();
bool isCompliant = report.IsValid;

Python example

import convertapi
 
convertapi.api_credentials = 'YOUR_API_TOKEN'
 
result = convertapi.convert('validate',
    { 'File': 'document.pdf', 'ExpectedConformance': 'pdfA2a' },
    from_format='pdfa'
)
 
result.save_files('./')

Node.js example

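const ConvertAPI = require('convertapi');
const convertapi = new ConvertAPI('YOUR_API_TOKEN');
 
// CommonJS modules don't support top-level await, so wrap the calls in an async function
async function main() {
  const result = await convertapi.convert('validate',
    { File: 'document.pdf', ExpectedConformance: 'pdfA2a' },
    'pdfa'
  );
 
  await result.saveFiles('./');
}
 
main().catch(console.error);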

Java example

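import java.io.File;
import java.nio.file.Paths;
import java.util.Arrays;
// ...plus the ConvertApi, ConversionResult, and Param classes from the ConvertAPI Java SDK
 
ConvertApi convertApi = new ConvertApi("YOUR_API_TOKEN");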
 
ConversionResult result = convertApi.convert("pdfa", "validate",
    Arrays.asList(
        Param.of("File", new File("document.pdf")),
        Param.of("ExpectedConformance", "pdfA2a")
    )
).get();
 
result.saveFiles(Paths.get("./"));

Validate from a URL (no upload needed)

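If your PDF is already stored in S3, Azure Blob, Google Cloud Storage, or any other location reachable over HTTPS, pass its URL and skip the upload step entirely. For private buckets, use a pre-signed (or SAS) URL so the API can fetch the file: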

var result = await convertApi.ConvertAsync("pdfa", "validate",
    new ConvertApiFileParam(new Uri("https://example.com/archive/document.pdf")),
    new ConvertApiParam("ExpectedConformance", "pdfA2a")
);

Batch validation (C#)

Validate hundreds of files concurrently, useful for auditing an existing archive:

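var files = Directory.GetFiles("./archive", "*.pdf");
using var throttle = new SemaphoreSlim(10); // cap in-flight requests to stay within plan limits
var tasks = files.Select(async f =>
{
    await throttle.WaitAsync();
    try
    {
        return await convertApi.ConvertAsync("pdfa", "validate",
            new ConvertApiFileParam(f),
            new ConvertApiParam("ExpectedConformance", "pdfA2a"));
    }
    finally { throttle.Release(); }
});
var results = await Task.WhenAll(tasks);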

ASP.NET Core integration

Drop PDF/A validation into any web API:

[HttpPost("validate-pdfa")]
public async Task<IActionResult> ValidatePdfA(IFormFile file, string conformance = "pdfA2a")
{
    if (file == null || file.Length == 0)
        return BadRequest("No file uploaded.");
 
    await using var stream = file.OpenReadStream();
    var result = await _convertApi.ConvertAsync("pdfa", "validate",
        new ConvertApiFileParam(file.FileName, stream),
        new ConvertApiParam("ExpectedConformance", conformance)
    );
 
    var report = await result.ResponseJson();
    return Ok(new
    {
        isCompliant = report.IsValid,
        violations = report.Errors,
        conformance
    });
}

What the validation report contains

A typical ConvertAPI validation response includes:

  • IsValid: boolean, whether the file passes validation.
  • PdfaFlavor: the flavor the file was validated against.
  • DetectedFlavor: the flavor declared in the file's XMP metadata (may differ from the validated flavor).
  • Errors: array of rule violations, each with the ISO rule ID, a human-readable description, and (where possible) a reference to the offending object in the PDF.
  • Warnings: non-fatal issues worth reviewing.
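A small Python sketch that turns such a report, once parsed into a dict, into a one-line verdict for logs. It relies on the top-level fields listed above; the `RuleId` key on each error entry and the `summarize_report` name are assumptions, so check the actual response schema in the API documentation before relying on them:

```python
from typing import Any


def summarize_report(report: dict[str, Any]) -> str:
    """Condense a validation report into a single log line.

    Assumes the top-level fields IsValid, PdfaFlavor, DetectedFlavor, Errors and
    Warnings; 'RuleId' on each error entry is a hypothetical key name.
    """
    validated = report.get('PdfaFlavor', '?')
    declared = report.get('DetectedFlavor')
    parts = [f"{'PASS' if report.get('IsValid') else 'FAIL'} against {validated}"]
    if declared and declared != validated:
        # The file claims a different flavor than the one it was checked against
        parts.append(f'file declares {declared}')
    errors = report.get('Errors') or []
    if errors:
        rules = ', '.join(str(e.get('RuleId', '?')) for e in errors)
        parts.append(f'{len(errors)} error(s): {rules}')
    warnings = report.get('Warnings') or []
    if warnings:
        parts.append(f'{len(warnings)} warning(s)')
    return '; '.join(parts)
```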

When to use this

ConvertAPI PDF/A validation is the best fit for web applications, SaaS platforms, document pipelines, serverless functions, and any automated workflow where you need reliable validation without managing JVM-based infrastructure.

Things to consider

  • Requires an internet connection for the API call.
  • Files are transmitted to ConvertAPI servers for processing and are automatically deleted after validation; see the security documentation for details on encryption and compliance (SOC 2, GDPR, HIPAA-ready deployments).
  • The free tier includes enough conversion seconds for testing and low-volume projects. Check pricing for high-volume use.

Method 4: 3-Heights PDF Validator (Commercial, On-Premise)

3-Heights PDF Validator from PDF Tools AG is a commercial, on-premise library used heavily in the European archival and regulatory sector.

When to use it

Ideal for regulated enterprise environments where data cannot leave your network under any circumstances, budget isn't a constraint, and you need to pass formal procurement requirements that specify "on-premise only."

Usage (C# example)

using PdfTools.PdfValidator;
 
using var validator = new Validator();
using var stream = File.OpenRead("document.pdf");
 
var report = validator.Analyse(stream, Conformance.PdfA2b);
 
Console.WriteLine(report.IsConforming
    ? "Valid PDF/A-2b"
    : $"Invalid: {report.Messages.Count} issues");

Limitations

  • Expensive licensing: typical commercial licenses start in the five figures per year. OEM redistribution licenses are separately negotiated.
  • Heavy deployment footprint: native dependencies, platform-specific binaries.
  • No hosted option: you manage updates and infrastructure.
  • Primarily .NET and Java: limited language support compared to cloud APIs.

Side-by-Side Comparison

The four PDF/A validation methods compared at a glance: open source, desktop, cloud API, and on-premise commercial.

| Feature | veraPDF | Adobe Preflight | ConvertAPI | 3-Heights |
|---|---|---|---|---|
| Cost | Free | ~$240/user/year | Pay-per-use, free tier | $10,000+/year |
| Automation-friendly | ⚠️ CLI only | ❌ No | ✅ REST API | ✅ Native SDK |
| PDF/A-1 / 2 / 3 support | ✅ All flavors | ✅ All flavors | ✅ All flavors | ✅ All flavors |
| Native SDKs | ❌ Java only | ❌ None | ✅ C#, Java, Python, Node, PHP, Ruby, Go | ✅ C#, Java, C++ |
| Runs on serverless | ⚠️ With custom layers | ❌ No | ✅ Yes | ⚠️ Complex |
| On-premise option | ✅ Yes | ✅ Yes (desktop only) | ❌ Cloud only | ✅ Yes |
| Setup time | ~1 hour | ~10 minutes | <10 minutes | Days to weeks |
| Best for | Archives, research | Manual review | Web apps, APIs, SaaS | Enterprise regulated |

Common PDF/A Validation Errors and How to Fix Them

If you're generating PDF/A files and failing validation, these are the most common culprits:

"A font is not embedded"

One of the most common PDF/A violations. PDF/A requires every font used in the document to be embedded, either in full or as a properly formed subset. The fix depends on how the PDF was created: embed fonts at generation time, or re-process the PDF through a PDF/A-aware converter.

"Device-dependent color space used without output intent"

PDF/A requires color management information. Every device-dependent color space (DeviceRGB, DeviceCMYK) must have an output intent specifying an ICC profile. If you're generating PDFs from HTML or Word, configure your converter to include an output intent like sRGB IEC61966-2.1.

"Encryption is not allowed"

PDF/A files cannot be encrypted. Remove any password protection or DRM before validating. If you need both archival compliance and access control, handle access at the storage layer (signed URLs, IAM policies), not in the PDF itself.

"Transparency is not allowed" (PDF/A-1 only)

Transparency effects (drop shadows, partial opacity, blend modes) are forbidden in PDF/A-1. Either target PDF/A-2, which allows transparency, or flatten transparency at generation time.

"XMP metadata does not match document information dictionary"

PDF/A requires that XMP metadata and the legacy Info dictionary stay in sync. If your tool updates one without the other, validation fails. Use a PDF/A-aware library for any post-processing.
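If you do write your own post-processing, a cheap self-check is to compare both sides after every write. The sketch below uses the Info-to-XMP property mapping that PDF/A defines (for example `/Title` to `dc:title`, `/Producer` to `pdf:Producer`). It assumes you've already extracted both sides into plain strings with a PDF library, with dates in one common format; `metadata_mismatches` is a hypothetical helper:

```python
# Info dictionary key -> equivalent XMP property, as mapped by PDF/A
INFO_TO_XMP = {
    'Title': 'dc:title',
    'Author': 'dc:creator',
    'Subject': 'dc:description',
    'Keywords': 'pdf:Keywords',
    'Creator': 'xmp:CreatorTool',
    'Producer': 'pdf:Producer',
    'CreationDate': 'xmp:CreateDate',
    'ModDate': 'xmp:ModifyDate',
}


def metadata_mismatches(info: dict[str, str], xmp: dict[str, str]) -> list[str]:
    """List Info entries whose XMP counterpart is missing or different.

    Assumes both dicts hold already-normalized plain strings (e.g. dc:creator
    joined into one string, dates converted to a common format).
    """
    problems = []
    for key, prop in INFO_TO_XMP.items():
        if key not in info:
            continue  # entries absent from Info impose no constraint here
        if prop not in xmp:
            problems.append(f'/{key} has no {prop} in XMP')
        elif info[key] != xmp[prop]:
            problems.append(f'/{key} != {prop}')
    return problems
```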

"JavaScript is not allowed"

All JavaScript must be stripped, including document-level scripts, form field actions, and field formatting or calculation scripts. AcroForm fields can stay in the archive without their scripts; XFA forms are not permitted in any PDF/A version.

"Invalid or missing output intent"

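PDF/A requires an output intent whenever the file uses device-dependent color, and any output intent the file declares must embed a valid ICC profile. Adding a standard sRGB output intent during PDF generation is the simplest fix, and it does no harm when all colors are already device-independent.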
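For reference, this is the shape of the output intent dictionary a PDF/A-aware generator writes into the document catalog's `/OutputIntents` array. The Python sketch only builds the PDF syntax; it assumes the ICC profile is already embedded as a separate stream object, and both `output_intent_dict` and the `12 0 R` style reference are illustrative:

```python
def output_intent_dict(icc_stream_ref: str,
                       identifier: str = 'sRGB IEC61966-2.1') -> str:
    """Return PDF syntax for a PDF/A output intent dictionary.

    icc_stream_ref is an indirect reference such as '12 0 R' pointing at a
    stream that embeds the ICC profile. /GTS_PDFA1 is the output intent
    subtype used by PDF/A-1, -2 and -3 files.
    """
    # Escape characters that are special inside a PDF literal string
    escaped = identifier.replace('\\', '\\\\').replace('(', '\\(').replace(')', '\\)')
    return ('<< /Type /OutputIntent /S /GTS_PDFA1 '
            f'/OutputConditionIdentifier ({escaped}) '
            f'/DestOutputProfile {icc_stream_ref} >>')
```

In practice a PDF library writes this for you; the point is that the dictionary carries the embedded profile, not just a name.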


Which Method Should You Choose?

Choose veraPDF if you need a free, auditable validator, are comfortable managing a JVM-based tool, and primarily run batch validations rather than real-time checks.

Choose Adobe Preflight if you do occasional manual validation as part of a document review workflow and already have Acrobat Pro.

Choose ConvertAPI PDF/A validation if you're building an application, SaaS product, or document pipeline that needs reliable validation with minimal operational overhead. It offers official SDKs for C#, Java, Python, Node.js, PHP, Ruby, and Go, has no JVM dependency, works in serverless environments, and you can be running in under 10 minutes.

Choose 3-Heights if you have strict on-premise requirements, enterprise budget, and formal regulatory constraints that specifically require an on-premise commercial validator.


Get Started with ConvertAPI PDF/A Validation

Integrate PDF/A validation into your application with a few lines of code. Official SDKs are available for C#, Java, Python, Node.js, PHP, Ruby, and Go. There is no JVM dependency and no infrastructure to manage, and it works in any serverless environment.

👉 View the PDF/A Validation API documentation

The free tier includes enough conversion seconds to validate thousands of files per month, more than enough to prototype and run small production workloads. For higher volume, see pricing.


Frequently Asked Questions

How do I check if a PDF is PDF/A compliant?

You need a dedicated PDF/A validator. The file extension .pdf alone doesn't tell you anything about PDF/A compliance, and even the XMP metadata can lie. Use veraPDF, Adobe Preflight, or a validation API like ConvertAPI to check against the ISO 19005 rules.

What's the difference between PDF/A-1, PDF/A-2, and PDF/A-3?

PDF/A-1 is the strictest (based on PDF 1.4, no transparency, no layers). PDF/A-2 adds transparency, layers, JPEG 2000, and file attachments, provided the attachments are themselves PDF/A files. PDF/A-3 is identical to PDF/A-2 but allows any file type as an attachment, commonly used for electronic invoicing formats like ZUGFeRD and Factur-X.

What's the difference between PDF/A-2a, PDF/A-2b, and PDF/A-2u?

The letter indicates the conformance level. B (Basic) ensures visual reproduction. U (Unicode) adds Unicode mapping for all text. A (Accessible) adds structural tagging and semantic information required for accessibility compliance.

Can I validate PDF/A files programmatically?

Yes. Use a validation API like ConvertAPI (REST, with SDKs for C#, Java, Python, Node.js, PHP, Ruby, and Go), or invoke veraPDF as a subprocess from your application. For JVM applications, veraPDF can be used as a library directly.

Can a PDF claim PDF/A compliance but fail validation?

Yes. A PDF's metadata can claim PDF/A compliance while the file actually violates the ISO rules. This happens with older converters, incomplete post-processing, or files that had PDF/A metadata added after modification. Always validate; don't trust metadata.

Is there a free PDF/A validation API?

Yes. ConvertAPI offers a free tier on its PDF/A validation API that's enough to validate thousands of files per month, suitable for prototyping, testing, and small production workloads. For self-hosted, free, open-source validation, veraPDF can be invoked as a CLI subprocess or used as a Java library.

What PDF/A level should I use for long-term archiving?

For most new archives, PDF/A-2b is the sweet spot: it supports modern PDF features (transparency, JPEG 2000) while having broad tool support. Use PDF/A-1b if you're submitting to older systems that haven't updated to the 2011 standard. Use PDF/A-2a if accessibility compliance is required.

Can I convert a PDF to PDF/A and then validate it?

Yes. A common pipeline is to convert PDF to PDF/A using a PDF/A-aware converter, then validate the output to confirm compliance before storing it in the archive. ConvertAPI supports both operations in a single workflow.
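The gating step itself is simple to express. A minimal sketch in which `archive_pdf` and its four callables are hypothetical hooks you'd wire to your converter, your validator (a veraPDF subprocess, an API call), and your storage:

```python
from typing import Callable


def archive_pdf(source: str,
                convert: Callable[[str], str],
                validate: Callable[[str], bool],
                store: Callable[[str], None],
                quarantine: Callable[[str], None]) -> bool:
    """Convert to PDF/A, then archive only if validation confirms compliance."""
    converted = convert(source)      # e.g. a PDF-to-PDF/A conversion call
    if validate(converted):          # never trust the converter's own claim
        store(converted)
        return True
    quarantine(converted)            # keep failures for inspection or reprocessing
    return False
```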

Does PDF/A validation check for accessibility?

Only at the A conformance level (PDF/A-1a, PDF/A-2a, PDF/A-3a). Levels B and U do not require tagged structure or reading order. If you specifically need accessibility compliance (Section 508, WCAG, EN 301 549), validate against an A-level flavor.


Last updated: April 2026. All code examples tested with the latest SDK versions.

