PDF to Text Python Overview

Convert text-based and scanned PDF documents to plain text files. Text is extracted directly from the PDF, and OCR can be applied to scanned documents before conversion.

Instant Text Extraction

Quickly pull plain text from any PDF document with customizable settings.

Accurate Parsing

Retains text structure while removing non-textual elements.

Works with Scanned PDFs

Supports OCR for image-based and scanned PDFs.

Lightweight Output

Get clean, minimalistic TXT files ready for further processing.

Custom OCR Settings

Select OCR engine, language, and OCR mode to get the best results.

Privacy First

Your data is handled in compliance with ISO 27001, GDPR, and HIPAA.

Customizable Parameters

Fine-tune your automation with these powerful conversion options

File

File Supported formats: .pdf

File to be converted. The value can be a URL or file content.

Password

String

Sets the password to open protected documents.

PageRange

String Default: 1-2000

Sets the page range, for example 1-10 or 1,2,5.
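To illustrate the PageRange syntax, here is a minimal sketch of a client-side check that expands a range string into page numbers before it is sent. The helper name expand_page_range is hypothetical and not part of the ConvertAPI library. It assumes only the two forms shown above (a span such as 1-10 and a comma-separated list such as 1,2,5), and it further assumes the two forms may be mixed in one string.

```python
def expand_page_range(page_range: str) -> list:
    """Expand a string such as "1-10" or "1,2,5" into a sorted list of page numbers.

    Hypothetical helper for local validation only; the API receives the raw string.
    """
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page range: {page_range!r}")
        if "-" in part:
            # A span such as "1-10": both ends are inclusive.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid span: {part!r}")
            pages.update(range(start, end + 1))
        else:
            # A single page number such as "5".
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Calling expand_page_range("1,2,5") yields the three listed pages, while a reversed span such as "3-1" raises ValueError instead of reaching the API.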

OcrMode

Collection Default: auto

Defines how OCR is applied during conversion. Auto performs OCR only when needed. Force applies OCR to all pages. Never disables OCR entirely.

Values: auto force never

OcrLanguage

Collection Default: auto

Configure the OCR language for text recognition. If auto-detection fails, manually specify the language.

Values: auto ar ca zh da nl en fi fr de el ko it ja no pl pt ro ru sl es sv tr ua th
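The OcrMode and OcrLanguage values listed above can be checked locally before a request is made. The following is a minimal sketch: the function name ocr_params and the two constants are hypothetical, and only the parameter names and allowed values come from the lists above.

```python
# Allowed values copied from the OcrMode and OcrLanguage lists above.
OCR_MODES = {"auto", "force", "never"}
OCR_LANGUAGES = {
    "auto", "ar", "ca", "zh", "da", "nl", "en", "fi", "fr", "de", "el", "ko", "it",
    "ja", "no", "pl", "pt", "ro", "ru", "sl", "es", "sv", "tr", "ua", "th",
}


def ocr_params(mode: str = "auto", language: str = "auto") -> dict:
    """Return a parameter dict with validated OCR settings (hypothetical helper)."""
    if mode not in OCR_MODES:
        raise ValueError(f"unsupported OcrMode: {mode!r}")
    if language not in OCR_LANGUAGES:
        raise ValueError(f"unsupported OcrLanguage: {language!r}")
    return {"OcrMode": mode, "OcrLanguage": language}
```

For a scanned German document where auto-detection fails, ocr_params("force", "de") would produce the two settings to merge into the conversion parameters.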

IncludeFormatting

Bool Default: False

Preserve formatting while extracting text. Only works when both the RemoveHeadersFooters and RemoveFootnotes properties are disabled.
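Because IncludeFormatting depends on two other properties being disabled, a conflicting combination is easy to send by accident. The sketch below shows one way to catch it before the request. The function name check_formatting_options is hypothetical; the three parameter names are the ones documented in this section.

```python
def check_formatting_options(params: dict) -> dict:
    """Raise if IncludeFormatting is combined with an option that disables it.

    Hypothetical client-side check; returns the params unchanged when they are consistent.
    """
    if params.get("IncludeFormatting"):
        conflicts = [
            name
            for name in ("RemoveHeadersFooters", "RemoveFootnotes")
            if params.get(name)
        ]
        if conflicts:
            raise ValueError(
                "IncludeFormatting only works when these are disabled: "
                + ", ".join(conflicts)
            )
    return params
```

The check raises a ValueError that names the conflicting options, so the caller can decide which of the two behaviours to keep.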

SplitPages

Bool Default: False

Split each page into a separate result file.

RemoveHeadersFooters

Bool Default: False

Remove headers and footers from the document.

RemoveFootnotes

Bool Default: False

Remove footnotes from the document.

RemoveTables

Bool Default: False

Remove tables from the document.

StoreFile

Bool Default: False

When the StoreFile parameter is set to True, your converted file is written to ConvertAPI’s encrypted, temporary storage and made available via a time-limited secure download URL, valid for up to 3 hours. After this period, the file is permanently deleted.

When StoreFile is set to False, conversion happens entirely in-memory. The raw file bytes are streamed back in the API response without touching disk or external storage, ensuring maximum security and zero persistence so that only you can access the content.
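When StoreFile is True, a job that downloads the result later needs to know whether the link can still be valid. The following is a minimal sketch of tracking that window on the client side. The names STORED_FILE_LIFETIME and link_may_still_be_valid are hypothetical, and the 3-hour figure is the upper bound stated above, so the check can only rule a link out, not guarantee it still works.

```python
from datetime import datetime, timedelta, timezone

# Upper bound on the download URL lifetime, as stated in the StoreFile description.
STORED_FILE_LIFETIME = timedelta(hours=3)


def link_may_still_be_valid(stored_at: datetime, now: datetime = None) -> bool:
    """Return False once the 3-hour window since `stored_at` has passed.

    Hypothetical helper; `stored_at` is the time your code recorded the conversion.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - stored_at < STORED_FILE_LIFETIME
```

Recording the conversion time alongside the URL lets a later step skip expired links and request a fresh conversion instead.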

Step-by-Step Guide

Integrate PDF to Text conversion programmatically using our Python library

1. Install the ConvertAPI Python library

ConvertAPI provides a Python library that lets you convert PDF to Text with just a few lines of code.

Install with pip:

pip install --upgrade convertapi


2. Authenticate your Python library

You can obtain your API Token by signing up for a free account. Once you sign up, you'll receive 250 free conversions instantly! Grab your API Token from the account dashboard, and authenticate the ConvertAPI Python library like this:

# get your API Token here: https://www.convertapi.com/a/auth
import convertapi

convertapi.api_credentials = 'api_token'
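Hard-coding the token is convenient for a first test, but it is usually safer to keep it out of source control. Here is a minimal sketch that reads the token from an environment variable before assigning it to convertapi.api_credentials. The variable name CONVERTAPI_TOKEN and the helper names are assumptions made for this example, not something the library requires.

```python
import os


def load_api_token(env=None, var_name="CONVERTAPI_TOKEN"):
    """Read the API Token from an environment mapping (hypothetical helper).

    `var_name` is an assumed name; use whatever variable your deployment defines.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"set the {var_name} environment variable to your API Token")
    return token


def authenticate():
    """Assign the token to the ConvertAPI library, as in the snippet above."""
    import convertapi  # imported here so load_api_token works without the package

    convertapi.api_credentials = load_api_token()
```

Call authenticate() once at application start-up, before any conversion is requested.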

3. Convert PDF to Text using Python in no time!

Once you have your authentication in place, simply copy and paste this PDF to TXT conversion code snippet into your Python project:

PDF to Text in Python

# Code snippet is using the ConvertAPI Python Client: https://github.com/ConvertAPI/convertapi-python
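As a hedged illustration of the conversion step, the sketch below wraps the call in a small function. It assumes the convertapi.convert(to_format, params, from_format=...) call and the save_files method as described in the convertapi-python README; verify both against the SDK documentation for your installed version. The helper names build_params and pdf_to_text and the file paths are hypothetical.

```python
def build_params(pdf_path, **options):
    """Combine the required File parameter with optional conversion parameters.

    Hypothetical helper; option names are the parameters documented on this page,
    for example PageRange="1-10" or OcrMode="force".
    """
    params = {"File": pdf_path}
    params.update(options)
    return params


def pdf_to_text(pdf_path, output_dir, **options):
    """Convert one PDF to TXT and save the result files into `output_dir`.

    Assumes `pip install convertapi` and that convertapi.api_credentials is already set.
    """
    import convertapi  # imported here so build_params can be used without the package

    result = convertapi.convert(
        "txt", build_params(pdf_path, **options), from_format="pdf"
    )
    # save_files is taken from the library README; check your version's documentation.
    return result.save_files(output_dir)
```

A call such as pdf_to_text("/path/to/my_file.pdf", "/path/to/result/dir", PageRange="1-10") would then request only the first ten pages. Both paths are placeholders.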

Integrate within minutes

Easy PDF to Text automation using our simple Python SDK

GitHub Repository

Explore the source code and examples on GitHub.

PyPI Package

View ConvertAPI package and versions on the Python Package Index.

Python SDK Documentation

Read more about the ConvertAPI Python SDK capabilities.


Businesses trust us

Highest rated File Conversion API on major B2B software listing platforms: Capterra, G2, and Trustpilot.

"ConvertAPI has been a game-changer for our document automation workflows. Their conversion accuracy and API reliability are unmatched in the industry for over 7 years."

"ConvertAPI is a reliable, cost-effective solution with a proven track record of stability. It has grown significantly in maturity, adopting enterprise-grade practices over the years."

"We've integrated ConvertAPI across our entire document processing platform. The performance is exceptional and the support team is always responsive. Highly recommended!"

Enterprise-Grade Security

We ensure that all document processing is handled securely in the cloud, adhering to industry-leading standards like ISO 27001, GDPR, and HIPAA. To enhance security even further, we can ensure that no files or data are stored on our servers and that they never leave your country.

Learn more about security


