Compress PDF using Python

Compress and reduce a PDF file size by up to 90%

The Compress PDF Python library by ConvertAPI is a tool that allows for the compression and reduction of PDF file sizes by up to 90%. This Python library optimizes PDF quality by compressing text, graphics, images, subsetting fonts, and optimizing document structure. It offers various parameters for customization, including compression presets, color and grayscale image compression, and options to remove elements like bookmarks, annotations, forms, and embedded files from the PDF. The Python library also provides options for optimizing the PDF for web viewing and preserving the PDF/A standard.

ConvertAPI Python library install

ConvertAPI provides a Python library that lets you compress PDF files with just a few lines of code. Compress PDF documents using the Python SDK with minimal effort!

Install with pip:
pip install --upgrade convertapi

Authenticate your Python library

You can obtain your secret key by signing up for a free account. Once you sign up, you'll receive 250 free conversions instantly! Grab your authentication secret from the account dashboard, and authenticate the ConvertAPI Python library like this:

# get your secret key here: https://www.convertapi.com/a/auth
import convertapi
convertapi.api_secret = 'your-api-secret'
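
Hard-coding the secret is fine for a quick test, but you may prefer to keep it out of source control. Here is a minimal sketch, assuming the secret is stored in an environment variable. The variable name CONVERTAPI_SECRET and the helper load_api_secret are illustrative choices, not something the ConvertAPI library defines:

```python
import os


def load_api_secret(env=None, var_name="CONVERTAPI_SECRET"):
    """Return the API secret from an environment mapping.

    The variable name is a hypothetical convention, not part of the
    ConvertAPI library.
    """
    env = os.environ if env is None else env
    secret = env.get(var_name, "").strip()
    if not secret:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return secret


# Usage (requires the convertapi package):
#   import convertapi
#   convertapi.api_secret = load_api_secret()
```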

Compress PDF using Python in no time!

Once you have your authentication in place, simply copy-paste this PDF compression code snippet into your Python project:

# Code snippet is using the ConvertAPI Python Client: https://github.com/ConvertAPI/convertapi-python
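
A minimal sketch of such a snippet, modeled on the examples later in this guide. The wrapper function compress_pdf and its client argument are illustrative and not part of the library; the convert('compress', ...) and save_files calls are used the same way as in the examples further down:

```python
def compress_pdf(src, dest_dir, client=None):
    """Compress one PDF and return whatever save_files() reports.

    `client` defaults to the convertapi module. Accepting another object
    with the same convert() interface is a testing convenience only.
    """
    if client is None:
        import convertapi  # imported lazily so the helper can be defined without it
        client = convertapi
    result = client.convert("compress", {"File": src}, from_format="pdf")
    return result.save_files(dest_dir)


# Usage, after setting convertapi.api_secret:
#   compress_pdf("/path/to/sample.pdf", "/path/to/dir")
```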

Upload the file and see how it works

You can set up the advanced conversion parameters and test the conversion result online using our interactive demo tool. It will auto-generate the code snippet for you!


A detailed guide to PDF compression using Python

ConvertAPI provides a wide variety of converters. One of our most popular is the PDF Compression API. It can reduce a document's size by up to 90% while maintaining the same visual clarity. You can use the Python programming language to compress PDF documents easily. We created an SDK library for Python so that you don't have to make explicit HTTP calls; we handle them for you. In this tutorial, we will go through the steps needed to use Python for PDF compression.

PDF Document compression algorithm

Our PDF Compression API applies multiple techniques with configurable options to reduce the size of the PDF. These include optimizing the PDF structure, linearizing the document, and subsetting embedded fonts so that only the characters actually used are kept in the PDF. It also lets you remove a range of objects from the final PDF, such as alternate images, unused fonts, duplicate elements, annotations, and bookmarks. The PDF compressor can preserve the PDF/A format and can compress password-protected documents. All of this is easier than it sounds with the ConvertAPI library for Python.
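
As a rough illustration, the techniques described above correspond to parameters listed in the reference at the end of this guide. The grouping below and the helper name structure_options are only an illustration; the parameter names themselves are taken from that reference:

```python
def structure_options(preserve_pdfa=False, password=None):
    """Build a parameter dict for the structural optimizations described above."""
    params = {
        "Optimize": "true",               # optimize page content streams
        "Linearize": "true",              # optimize for fast web view
        "SubsetEmbeddedFonts": "true",    # keep only the glyphs actually used
        "RemoveAlternateImages": "true",  # keep only the on-screen image
        "RemoveDuplicates": "true",       # duplicate fonts and color profiles
        "RemoveAnnotations": "true",
        "RemoveBookmarks": "true",
    }
    if preserve_pdfa:
        params["PreservePdfa"] = "true"
    if password is not None:
        params["Password"] = password     # needed to open protected documents
    return params
```

The resulting dictionary can be merged with a 'File' entry and passed to convertapi.convert('compress', ...) in the same way as the examples in this guide.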

How to use Python to compress a PDF document?

Compressing PDF documents using Python is super simple. Follow these steps to reduce the document's size programmatically:

  1. Install ConvertAPI library
  2. Get your API secret key
  3. Set up the conversion parameters
  4. Copy-paste the code into your project

Install ConvertAPI library for Python

The first thing you want to do is to install our Python library into your project. Here you have two options. You can install it using pip:

pip install --upgrade convertapi

Or you can install it from the library's source code on GitHub by running the following in the downloaded source directory:

python setup.py install
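
To confirm that the installation worked before going further, you can check whether Python can locate the package. A small sketch using only the standard library; the helper name is_installed is illustrative:

```python
import importlib.util


def is_installed(module_name="convertapi"):
    """Return True if the named top-level module can be found by Python."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("convertapi is available")
    else:
        print("convertapi not found; run: pip install --upgrade convertapi")
```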

Sign up for a free account

Secondly, please sign up for a free account on the ConvertAPI website to retrieve your API secret key.

Set up PDF compression parameters

Once you have installed the library and found your API secret, using the Python library for PDF compression is super simple. You can set up all conversion parameters and test the compression result using our PDF Compression API interactive demo page.

Get your auto-generated code snippet

As soon as you are happy with the conversion result, please find an auto-generated code snippet at the bottom of the page. All parameters in the code snippet are generated dynamically based on your choices, so there is no more coding involved - copy-paste the code snippet into your project, and you are good to go!

Code example

An extended example of PDF compression in Python might look something like this:

import convertapi

convertapi.api_secret = 'your-api-secret'
convertapi.convert('compress', {
    'File': '/path/to/large.pdf',
    'ColorImageCompression': 'zip',
    'ColorImageQuality': '70',
    'ColorImageDownsample': 'true',
    'ColorImageThresholdDpi': '150',
    'ColorImageResampleDpi': '100',
    'RemoveBookmarks': 'true',
    'RemoveAnnotations': 'true',
    'RemoveForms': 'true',
    'RemovePageLabels': 'true',
    'RemoveLayers': 'true',
    'RemoveArticleThreads': 'true',
    'RemoveNamedDestinations': 'true',
    'RemoveEmbeddedFiles': 'false',
    'RemovePieceInformation': 'false',
    'UnembedBaseFonts': 'true',
    'SubsetEmbeddedFonts': 'true',
    'CreateObjectStreams': 'false',
    'Linearize': 'true'
}, from_format = 'pdf').save_files('/path/to/dir')
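
To see how much a particular set of parameters actually saved, you can compare the file sizes before and after compression. A minimal sketch using only the standard library; the helper names are illustrative, and it assumes you know the path of the saved result (the stream example later in this guide suggests save_files returns the saved paths):

```python
import os


def reduction_percent(original_bytes, compressed_bytes):
    """Return the size reduction as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    saved = original_bytes - compressed_bytes
    return round(100.0 * saved / original_bytes, 1)


def file_reduction(original_path, compressed_path):
    """Compare two files on disk and return the reduction in percent."""
    return reduction_percent(
        os.path.getsize(original_path),
        os.path.getsize(compressed_path),
    )
```

A negative result means the output is larger than the input, which is a sign that the chosen settings do not suit that document.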

Advanced Python PDF compression options

You can compress local files from your disk as well as remote files accessible by a public URL, or even pass a file stream for better performance, using the ConvertAPI library for Python.

Compress a local PDF file

To compress a local PDF file stored on your machine, specify the path to the file and the destination folder where you want to store your result like so:

import convertapi

convertapi.api_secret = 'your-api-secret'
convertapi.convert('compress', {
    'File': '/path/to/sample.pdf'
}, from_format = 'pdf').save_files('/path/to/dir')

Compress a remote PDF accessible by a URL

If you want to compress a remote file hosted on a server, please ensure it is publicly accessible by a URL. The URL's response must be a PDF file with the appropriate "application/pdf" content type set in the header.

import convertapi

convertapi.api_secret = 'your-api-secret'
convertapi.convert('compress', {
    'File': 'https://cdn.convertapi.com/cara/testfiles/document-large.pdf'
}, from_format = 'pdf').save_files('/path/to/dir')
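
If you are unsure whether a URL meets the content type requirement, you can check the response header yourself before submitting it. A sketch using only the standard library; the helper names are illustrative, and it assumes the server answers HEAD requests, which not every host does:

```python
import urllib.request


def is_pdf_content_type(header_value):
    """Return True if a Content-Type header value denotes application/pdf."""
    if not header_value:
        return False
    # Ignore optional parameters such as "; charset=binary"
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def url_serves_pdf(url, timeout=10):
    """Send a HEAD request and inspect the Content-Type header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_pdf_content_type(response.headers.get("Content-Type"))
```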

Compress a file stream

For large file processing, consider using file streams. They can increase performance significantly. Here is an example of how to pass a file stream to our converters:

import convertapi
import io
import tempfile

convertapi.api_secret = 'your-api-secret'

# Open the PDF as a raw binary stream and pass it to the compress converter
with io.FileIO("path\\to\\file", 'r') as file_stream:
    result = convertapi.convert('compress', { 'File': file_stream }, from_format = 'pdf')
    saved_files = result.save_files(tempfile.gettempdir())
    print("The PDF saved to %s" % saved_files)

You can find more examples of using the alternative converters, conversion workflows, etc., in our GitHub examples repo.

Conclusion

ConvertAPI makes your Python PDF compression easy. Simply install our library and use the auto-generated code snippets from our PDF Compression API page. You can set up a fully customizable PDF compression level and improve performance with significantly reduced PDF sizes. Check out our ConvertAPI Python library on GitHub, and feel free to contribute if you have an idea of how to make it even better!

Advanced Compress PDF conversion parameters

Password String

Sets the password to open protected documents.

Presets Collection

Choose a compression level from the presets. If a preset is selected, all other compression options are ignored.

Values:   none text archive web ebook printer
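
Because a preset overrides the other compression options, a request that uses one only needs the file and the preset name. A small sketch that checks the name against the values listed above before building the parameters; the helper preset_params is illustrative, and the key 'Presets' is taken from the parameter name in this reference:

```python
PRESETS = ("none", "text", "archive", "web", "ebook", "printer")


def preset_params(path, preset):
    """Build conversion parameters that rely on a single preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return {"File": path, "Presets": preset}


# Usage (requires the convertapi package and an API secret):
#   convertapi.convert('compress', preset_params('/path/to/large.pdf', 'web'),
#                      from_format = 'pdf').save_files('/path/to/dir')
```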

ColorImageCompression Collection

Color image compression algorithm.

Values:   none jpg jpx zip

ColorImageQuality Integer

Color image compression quality. The parameter applies only to JPX and JPG compressions.

ColorImageDownsample Bool

Enables bicubic image downsampling, which decreases the number of pixels in the color image and in turn makes the file smaller.

ColorImageThresholdDpi Integer

Threshold in DPI to activate color images resampling. ColorImageDownsample property must be enabled.

ColorImageResampleDpi Integer

Color image resolution in DPI after Bicubic resampling. ColorImageDownsample property must be enabled.

GrayscaleImageCompression Collection

Grayscale image compression algorithm.

Values:   none jpg jpx zip

GrayscaleImageQuality Integer

Grayscale image compression quality. The parameter applies only to JPX and JPG compressions.

GrayscaleImageDownsample Bool

Enables bicubic image downsampling, which decreases the number of pixels in the grayscale image and in turn makes the file smaller.

GrayscaleImageThresholdDpi Integer

Threshold in DPI to activate grayscale images resampling. GrayscaleImageDownsample property must be enabled.

GrayscaleImageResampleDpi Integer

Grayscale image resolution in DPI after Bicubic resampling. GrayscaleImageDownsample property must be enabled.

MonochromeImageCompression Collection

Monochrome image compression algorithm.

Values:   none jbig2 jbig2l fax zip

MonochromeImageQuality Integer

Monochrome image compression quality. The parameter applies only to jbig2 and jbig2l compressions.

MonochromeImageDownsample Bool

Enables bicubic image downsampling, which decreases the number of pixels in the monochrome image and in turn makes the file smaller.

MonochromeImageThresholdDpi Integer

Threshold in DPI to activate monochrome images resampling. MonochromeImageDownsample property must be enabled.

MonochromeImageResampleDpi Integer

Monochrome image resolution in DPI after Bicubic resampling. MonochromeImageDownsample property must be enabled.
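
The threshold and resample settings only take effect when the matching downsample flag is enabled, and the three image families follow the same naming pattern. A sketch of a helper that always sets the three related parameters together; the function name and the choice to pass values as strings (as the code example in this guide does) are assumptions:

```python
IMAGE_KINDS = ("Color", "Grayscale", "Monochrome")


def downsample_params(kind, threshold_dpi, resample_dpi):
    """Return the downsampling parameters for one image family.

    The downsample flag is always enabled, because the two DPI settings
    require it according to the parameter descriptions.
    """
    if kind not in IMAGE_KINDS:
        raise ValueError(f"kind must be one of {IMAGE_KINDS}")
    prefix = f"{kind}Image"
    return {
        prefix + "Downsample": "true",
        prefix + "ThresholdDpi": str(threshold_dpi),
        prefix + "ResampleDpi": str(resample_dpi),
    }
```

For example, downsample_params('Color', 150, 100) produces the same three color image settings used in the extended code example earlier in this guide.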

RemoveBookmarks Bool

Remove bookmarks from the PDF file.

RemoveAnnotations Bool

Remove text annotations from the PDF file.

RemoveForms Bool

Remove PDF forms from the PDF file.

RemovePageLabels Bool

Remove page labels from the PDF file.

RemoveLayers Bool

Removes hidden layers and flattens visible ones.

RemoveArticleThreads Bool

Remove article threads from the PDF file.

RemoveTaggedInfo Bool

Remove tagged information from the PDF file.

RemovePageThumbnails Bool

Remove page thumbnails from the PDF file.

RemoveDuplicates Bool

Remove duplicate fonts and color profiles from the PDF file.

RemoveAlternateImages Bool

Removes alternate images and leaves only the one for on-screen viewing.

RemoveNamedDestinations Bool

Remove named destinations from the PDF file.

RemoveEmbeddedFiles Bool

Remove embedded files (attachments) from the PDF file.

RemovePieceInformation Bool

Remove piece information dictionaries like Adobe Illustrator or Photoshop private data.

UnembedBaseFonts Bool

Specifies whether to remove the base fonts from the PDF file.

SubsetEmbeddedFonts Bool

Output PDF should only contain font characters utilized in the original document, and any unused glyphs from all fonts in the document should be removed.

CreateObjectStreams Bool

Create object streams in the PDF file. An object stream represents a stream that contains a sequence of PDF objects. This allows a greater number of PDF objects to be compressed. Property compatible with Acrobat 6/PDF v1.5 and later.

Optimize Bool

Optimize page content streams in the PDF file.

LzwToFlate Bool

In streams that use LZW encoding, use Flate instead.

Linearize Bool

Linearize PDF file and optimize for fast Web View.

PreservePdfa Bool

Preserve the PDF/A standard in the PDF file.
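
With this many parameters, a misspelled name is easy to miss. The sketch below collects the names documented in this reference (plus 'File', which the examples use) and reports any key that is not among them. The helper unknown_params is illustrative; how the service itself treats unrecognized names is not covered here:

```python
_IMAGE_SUFFIXES = ("Compression", "Quality", "Downsample", "ThresholdDpi", "ResampleDpi")
_IMAGE_PARAMS = {
    f"{kind}Image{suffix}"
    for kind in ("Color", "Grayscale", "Monochrome")
    for suffix in _IMAGE_SUFFIXES
}
_OTHER_PARAMS = {
    "File", "Password", "Presets",
    "RemoveBookmarks", "RemoveAnnotations", "RemoveForms", "RemovePageLabels",
    "RemoveLayers", "RemoveArticleThreads", "RemoveTaggedInfo",
    "RemovePageThumbnails", "RemoveDuplicates", "RemoveAlternateImages",
    "RemoveNamedDestinations", "RemoveEmbeddedFiles", "RemovePieceInformation",
    "UnembedBaseFonts", "SubsetEmbeddedFonts", "CreateObjectStreams",
    "Optimize", "LzwToFlate", "Linearize", "PreservePdfa",
}
DOCUMENTED_PARAMS = _IMAGE_PARAMS | _OTHER_PARAMS


def unknown_params(params):
    """Return the sorted parameter names that this reference does not list."""
    return sorted(name for name in params if name not in DOCUMENTED_PARAMS)
```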
