HTTP Request and Response Content Types - Making the Right Choice

Jonas, CTO

HTTP request and response content types are used in the Hypertext Transfer Protocol (HTTP) to specify the format of the data being sent or received. They help servers and clients understand how to interpret the exchanged information.

JSON

JSON is a widely used content type in APIs, but it is rarely the optimal choice for file conversion services. JSON has no direct support for binary data, so file contents must be base64-encoded before they can be placed in the request body.

POST https://v2.convertapi.com/convert/docx/to/pdf?secret=XXX
Content-Type: application/json
Content-Length: 133800

{
  "Parameters": [
    {
      "Name": "File",
      "FileValue": {
        "Name": "myfile.docx",
        "Data": "UEsDBBQABgAIAAAAIQCj77sdZQEeeeeextremelylongbase64line..."
      }
    }
  ]
}

However, base64 encoding increases the data size by approximately one-third, which costs both bandwidth and processing time. Take this overhead into account before choosing JSON for file conversion services.
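
The one-third figure follows from how base64 works: every 3 input bytes become 4 output characters. The following minimal Python sketch illustrates the growth; the 300-byte payload is an arbitrary stand-in for real file content, and `base64_overhead` is a helper name chosen for this example.

```python
import base64

def base64_overhead(raw: bytes) -> float:
    """Return the relative size increase caused by base64-encoding `raw`."""
    encoded = base64.b64encode(raw)
    return (len(encoded) - len(raw)) / len(raw)

payload = bytes(300)  # 300 zero bytes standing in for file content
encoded_size = len(base64.b64encode(payload))

print(encoded_size)                         # 400
print(round(base64_overhead(payload), 3))   # 0.333
```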

JSON has a second limitation as a content type for file transfer: in many cases it is difficult to serialize and deserialize as a stream, so the whole document, including the base64 string, has to be held in memory while it is processed.

Other content types, such as form-data and octet-stream, carry binary data without re-encoding and allow files to be streamed. These alternatives reduce memory usage and improve performance, especially when working with large files.
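
Streaming here means processing a file piece by piece instead of loading it whole. As a rough illustration, the Python sketch below reads a binary stream in fixed-size chunks so that only one chunk is in memory at a time. The helper name `iter_chunks` and the 64 KiB chunk size are assumptions made for this example, and an in-memory buffer stands in for a real file.

```python
import io
from typing import BinaryIO, Iterator

def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the stream's content in pieces of at most `chunk_size` bytes."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        yield chunk

source = io.BytesIO(b"x" * 150_000)  # in-memory stand-in for a large file
sizes = [len(chunk) for chunk in iter_chunks(source)]
print(sizes)  # [65536, 65536, 18928]
```

An HTTP client that accepts an iterable or file-like request body can send such chunks directly, which is not possible when the file must first be embedded in a JSON string.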

Octet-Stream

If you want to send generic binary data in a request or response body without any specific serializer or deserializer, consider the application/octet-stream content type. The body is simply the raw file content, but the content type itself provides no information about the nature of the data.

POST https://v2.convertapi.com/convert/docx/to/pdf?secret=XXX
Content-Type: application/octet-stream
Content-Length: 856031

<BINARY FILE CONTENT>

To distinguish between different file formats, you can use the "content-disposition" HTTP header field. This field specifies that the content is a file and includes the file name, from which the format can be inferred. In a response, it also determines whether the file will be displayed in the browser or downloaded.
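
As an illustration of the header's shape, the Python sketch below composes and parses a Content-Disposition value with the standard library's `email.message.Message`. The helper names are hypothetical, and the choice of the `attachment` disposition is an assumption for this example rather than something the API is stated to require.

```python
from email.message import Message
from typing import Optional, Tuple

def build_content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Compose a Content-Disposition header value carrying the file name."""
    msg = Message()
    msg.add_header("Content-Disposition", disposition, filename=filename)
    return msg["Content-Disposition"]

def parse_content_disposition(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (disposition, filename) extracted from a header value."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_content_disposition(), msg.get_filename()

header_value = build_content_disposition("myfile.docx")
print(header_value)                             # attachment; filename="myfile.docx"
print(parse_content_disposition(header_value))  # ('attachment', 'myfile.docx')
```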

The octet-stream format is universally supported by web browsers, allowing the file content from a response to be easily saved to the local file system. However, one limitation of octet-stream is that it can only accommodate a single file. While this suffices in most cases, certain conversions may involve multiple files. To address this, the form-data content type has been implemented.

Form-Data

The multipart/form-data content type enables the transfer of multiple files with minimal data overhead and processing complexity. It is not only suitable for transferring files efficiently but can also be used to transmit other conversion parameters if required.

POST https://v2.convertapi.com/convert/docx/to/pdf?secret=XXX
Content-Type: multipart/form-data; boundary=----7MA4YWxkTrZu0gW
Content-Length: 97532565

------7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="StoreFile"

true
------7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="file"; filename="myfile.docx"

<BINARY FILE CONTENT>
------7MA4YWxkTrZu0gW--

This content type enjoys widespread support from HTTP clients and libraries and is commonly used for HTML form submissions. ConvertAPI supports this content type, enabling our clients to conveniently convert their files using simple HTML forms.
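
To make the wire format above concrete, here is a minimal Python sketch that assembles a multipart/form-data body by hand. It is an illustration only: `encode_multipart` is a hypothetical helper, the boundary is randomly generated, and in practice an HTTP library would normally build this body for you.

```python
import uuid
from typing import Dict, List, Tuple

def encode_multipart(fields: Dict[str, str], file_field: str,
                     filename: str, content: bytes) -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for text fields plus one file."""
    boundary = "----" + uuid.uuid4().hex
    delimiter = f"--{boundary}".encode()  # each part starts with "--" + boundary
    lines: List[bytes] = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",  # blank line separates part headers from part content
            value.encode(),
        ]
    lines += [
        delimiter,
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"",
        content,  # raw bytes, no base64 needed
        delimiter + b"--",  # closing delimiter ends the body
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)

content_type, body = encode_multipart(
    {"StoreFile": "true"}, "file", "myfile.docx", b"\x00\x01 raw bytes")
print(content_type)
print(len(body))
```

The overhead is limited to the delimiter and header lines of each part, regardless of file size, which is why this format stays efficient for large files.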

File Server

Despite the limitations of JSON for file data transfer, it remains the most popular format for API data exchange, and many platforms only support JSON for integration with other systems. To address this challenge, a compromise solution called the "File server" has been introduced. The File server handles the file transfers, so JSON conversion requests only need to carry short property values, such as a file ID, instead of long base64 strings.

Uploading a file to the File server:

POST https://v2.convertapi.com/upload?filename=myfile.docx
Content-Type: application/octet-stream
Content-Length: 1026736

<BINARY FILE CONTENT>

When a file is uploaded to the File server, it returns a unique file ID that can be used as a reference.

HTTP/1.1 200 OK
content-type: application/json
date: Tue, 30 May 2023 09:35:00 GMT

{
  "FileName": "myfile.docx",
  "FileExt": "docx",
  "FileSize": 1026736,
  "FileId": "38491q9qd63pknn7ih0kb19c959n3txq",
  "Url": "https://v2.convertapi.com/d/38491q9qd63pknn7ih0kb19c959n3txq"
}

Using the file ID in the conversion request:

POST https://v2.convertapi.com/convert/docx/to/pdf?secret=XXX
Content-Type: application/json
Content-Length: 175

{
    "Parameters": [
        {
            "Name": "File",
            "FileValue": {
                "Id": "38491q9qd63pknn7ih0kb19c959n3txq"
            }
        }
    ]
}
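
The link between the two steps can be sketched in Python: read the FileId from the upload response and place it in the JSON conversion request. The function name `conversion_body` is an assumption for this example, and the sample response is the one shown above; no network call is made.

```python
import json

def conversion_body(upload_response: str) -> str:
    """Build the JSON conversion request body from the File server's upload response."""
    file_id = json.loads(upload_response)["FileId"]
    body = {"Parameters": [{"Name": "File", "FileValue": {"Id": file_id}}]}
    return json.dumps(body)

# Upload response copied from the example above.
upload_response = """{
  "FileName": "myfile.docx",
  "FileExt": "docx",
  "FileSize": 1026736,
  "FileId": "38491q9qd63pknn7ih0kb19c959n3txq",
  "Url": "https://v2.convertapi.com/d/38491q9qd63pknn7ih0kb19c959n3txq"
}"""

request_body = conversion_body(upload_response)
print(request_body)
```

The resulting body stays small no matter how large the uploaded file is, because it carries only the reference.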

The File server not only accepts files as request payloads but also accepts links to files, downloading and preparing them for conversion. This approach allows clients to avoid large file uploads on their side.

Once a file is uploaded to the File server, it can be reused multiple times in different conversions without the need for reuploading. Files are stored on the File server for a maximum of three hours, ensuring that files are not retained unnecessarily after they are no longer needed for conversion. Files can be deleted at any time by using the HTTP DELETE method.

Closing Thoughts

When selecting the appropriate content type for communication with our API, consider the following guidelines:

  • For one-to-one conversions (one source file to one destination file), the octet-stream content type is the optimal choice. In this case, provide the conversion parameters in the URL.
  • In other scenarios, using JSON in conjunction with the File server is the recommended approach. We made the same design decision when developing the ConvertAPI client libraries and our API user web interface.
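
The guidelines above can be expressed as a small decision helper. This Python sketch is only a restatement of the two bullets; the function name and its parameters are hypothetical.

```python
def recommended_content_type(source_files: int, result_files: int) -> str:
    """Pick a request content type following the guidelines above."""
    if source_files == 1 and result_files == 1:
        # One-to-one conversion: send raw bytes, pass parameters in the URL.
        return "application/octet-stream"
    # Everything else: upload to the File server, reference file IDs in JSON.
    return "application/json"

print(recommended_content_type(1, 1))  # application/octet-stream
print(recommended_content_type(3, 1))  # application/json
```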

In any scenario, the main goal should be to avoid base64 encoding wherever possible, given its performance and bandwidth cost.