ConvertAPI GeoDNS and load balancing

Jonas, CTO

A file conversion REST API differs from a typical REST API in one important way: its HTTP requests and responses are much larger, because they carry the files being converted. This has significant implications for API and infrastructure design. In this article, we describe the challenges we encountered while building the convertapi.com service.

Topology

We use a Kubernetes cluster to manage our infrastructure and enable zero-downtime deployments. However, the traditional Kubernetes setup, in which a single load balancer distributes traffic to the worker nodes, did not suit our needs.

Because our cluster spans the globe, a single entry point would add considerable latency and reduce throughput: every request and response, including large files, would first travel to that entry point and only then on to a worker node. Instead, we built identical, independent worker nodes that handle client traffic directly, with no intermediate hops. This makes the service more responsive, faster, and more fault-tolerant.

Load Balancing

We implemented load balancing with GeoDNS, which resolves our service domain name, v2.convertapi.com, to the IP address of the worker node geographically closest to the client. The client then connects directly to the hardware worker node that processes its file conversion.
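GeoDNS providers perform this selection internally, so the following is only a toy model of the decision, not a description of how our DNS provider works. It assumes the client's coordinates are already known. In practice, a GeoDNS server usually estimates location from the IP address of the querying recursive resolver, or from the EDNS Client Subnet option when the resolver sends one, using a GeoIP database. The node names, coordinates, and IP addresses (taken from the documentation ranges) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class WorkerNode:
    name: str          # hypothetical node name
    ip: str            # address returned in the DNS answer
    lat: float         # degrees
    lon: float         # degrees
    healthy: bool = True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def resolve(client_lat: float, client_lon: float, nodes: Sequence[WorkerNode]) -> Optional[str]:
    """Return the IP of the closest healthy node, or None if no node is healthy."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda n: haversine_km(client_lat, client_lon, n.lat, n.lon))
    return nearest.ip


# Hypothetical node inventory (documentation-range IPs).
NODES = [
    WorkerNode("eu-west", "192.0.2.10", 52.37, 4.90),          # Amsterdam
    WorkerNode("us-east", "198.51.100.10", 38.90, -77.04),     # Washington, D.C.
    WorkerNode("ap-southeast", "203.0.113.10", 1.35, 103.82),  # Singapore
]

if __name__ == "__main__":
    print(resolve(52.52, 13.40, NODES))  # client near Berlin -> 192.0.2.10
```

Unhealthy nodes are filtered out before the distance comparison. A client near Berlin therefore falls back to the next-closest node if the Amsterdam node is marked unhealthy.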

Because there are no intermediary devices, none can slow down the conversion process or be exploited as an attack vector. We continuously monitor the worker nodes, and when a node fails, its address is removed from DNS responses.
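The monitoring step can be sketched as a small state machine per node. This is an illustrative assumption, not our production monitor. Consider the following:

- The failure and recovery thresholds are made-up values.
- The probe is the simplest possible check, a TCP connect to port 443. A real monitor would more likely exercise an HTTP endpoint.

Requiring several consecutive results before changing state (hysteresis) keeps a single dropped probe from flapping a node in and out of DNS.

```python
from __future__ import annotations

import socket
from dataclasses import dataclass, field


def tcp_probe(ip: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class NodeHealth:
    fail_threshold: int = 3      # consecutive failed probes before removal (assumed value)
    recover_threshold: int = 2   # consecutive good probes before re-adding (assumed value)
    in_rotation: bool = True
    _fails: int = field(default=0, init=False, repr=False)
    _oks: int = field(default=0, init=False, repr=False)

    def record(self, probe_ok: bool) -> bool:
        """Update the counters with one probe result; return whether the node stays in DNS answers."""
        if probe_ok:
            self._fails = 0
            self._oks += 1
            if not self.in_rotation and self._oks >= self.recover_threshold:
                self.in_rotation = True
        else:
            self._oks = 0
            self._fails += 1
            if self.in_rotation and self._fails >= self.fail_threshold:
                self.in_rotation = False
        return self.in_rotation


def dns_answer(health: dict[str, NodeHealth]) -> list[str]:
    """IPs of the nodes that may currently be returned for the service name."""
    return [ip for ip, h in health.items() if h.in_rotation]
```

A scheduler would call `tcp_probe` for each node at a fixed interval and pass the result to `record`. The DNS server would then answer only with addresses from `dns_answer`.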

Fault Tolerance

Every architectural choice has strengths and weaknesses, and GeoDNS load balancing is no exception. When a worker node unexpectedly goes offline, a client may keep using that node's IP address for a short time. This happens because resolvers and clients cache DNS answers to improve performance. Each cached record stays valid for its TTL (Time To Live), so until the record expires, the client's requests to the failed node may time out.
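To make the TTL window concrete, here is a toy model of a client-side DNS cache. It uses an injected clock so the timeline is easy to follow. The 60-second TTL and the IP addresses are illustrative assumptions. In practice, caching can also happen in the operating system and in recursive resolvers.

```python
import time
from typing import Callable, Dict, Tuple

Resolver = Callable[[str], Tuple[str, float]]  # name -> (ip, ttl_seconds)


class TtlDnsCache:
    """Toy client-side DNS cache that reuses an answer until its TTL expires."""

    def __init__(self, resolver: Resolver, clock: Callable[[], float] = time.monotonic):
        self._resolver = resolver
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            # Cache hit: returned even if the node behind this IP has failed since.
            return cached[0]
        ip, ttl = self._resolver(name)
        self._entries[name] = (ip, now + ttl)
        return ip


if __name__ == "__main__":
    fake_time = [0.0]
    authoritative = {"ip": "192.0.2.10"}  # hypothetical address of the nearest node
    cache = TtlDnsCache(lambda name: (authoritative["ip"], 60.0), clock=lambda: fake_time[0])

    print(cache.lookup("v2.convertapi.com"))  # t=0: 192.0.2.10, cached for 60 s
    authoritative["ip"] = "198.51.100.10"     # node fails; GeoDNS now answers differently
    fake_time[0] = 10.0
    print(cache.lookup("v2.convertapi.com"))  # t=10: still 192.0.2.10 (stale)
    fake_time[0] = 60.0
    print(cache.lookup("v2.convertapi.com"))  # t=60: TTL expired, 198.51.100.10
```

A shorter TTL narrows this window, at the cost of more DNS queries.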

Network issues can arise not only on our side but also within the client's internal network or anywhere along the path between the client and the worker node. To mitigate these issues, clients can retry failed conversions; retrying failed requests is good practice for remote API calls in general. By combining efficient load balancing, node health monitoring, and client-side retries, we provide our users with a robust, reliable, and responsive file conversion service.
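A client-side retry can be sketched as a generic helper with exponential backoff and full jitter. This is a common pattern, not a ConvertAPI SDK function. The attempt count, delays, and default exception set are assumptions to adjust for your HTTP client. For example, with `urllib.request`, connection failures surface as `urllib.error.URLError`. Because a stale DNS record lasts until its TTL expires, spreading retries over a period comparable to the TTL gives later attempts a chance to pick up a fresh address.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call `call()` up to `attempts` times, backing off exponentially with full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)).
            sleep(min(max_delay, base_delay * 2 ** attempt) * rng())
    raise AssertionError("unreachable")
```

A typical call would be `retry(lambda: convert_document("report.docx"), attempts=5)`, where `convert_document` is a hypothetical function wrapping your HTTP request. Randomizing each delay keeps many clients that failed at the same moment from retrying in synchronized bursts.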

Closing Thoughts

In conclusion, convertapi.com addresses the particular challenges of a file conversion REST API with a distributed Kubernetes cluster of independent worker nodes. This design improves responsiveness, speed, and fault tolerance. GeoDNS load balancing connects each client to the closest healthy worker node, optimizing performance and minimizing latency.