Understanding Latency in Distributed Systems
Causes and Mitigations for Latency in Distributed Systems
Why Latency Occurs
Latency refers to the time it takes for a request initiated by a client to travel across the network to one or more servers, be processed, and for the response to travel back to the client. It is a key metric for evaluating the responsiveness and efficiency of software systems. Several factors contribute to latency in distributed systems.
1) Geographic Distance
Data travels at a finite speed through network cables or wireless signals. The physical distance between components directly impacts this delay. For instance, a request from the US to a European server will inherently experience higher latency than one within the same city.
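To put a floor on this, here is a rough back-of-the-envelope sketch: light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), so distance alone sets a hard lower bound on round-trip time. The distances used below are approximations for illustration.

```python
# Rough back-of-the-envelope estimate of network propagation delay.
# Assumes ~200,000 km/s signal speed in optical fiber and ignores
# routing hops, congestion, and processing time.

SPEED_IN_FIBER_KM_PER_MS = 200.0  # 200,000 km/s == 200 km/ms

def min_round_trip_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time over a straight fiber path."""
    one_way_ms = distance_km / SPEED_IN_FIBER_KM_PER_MS
    return 2 * one_way_ms

# New York to London is roughly 5,600 km as the crow flies.
print(f"NY -> London RTT floor: {min_round_trip_ms(5600):.0f} ms")  # ~56 ms
# A same-city distance of ~50 km.
print(f"Same-city RTT floor:    {min_round_trip_ms(50):.1f} ms")    # ~0.5 ms
```

No amount of server tuning removes this floor; only moving the endpoints closer together does.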
2) Network Congestion
When network links become overloaded with traffic, data packets may experience queuing delays at routers and switches. This congestion can significantly increase latency, especially during peak usage times.
3) Processing Delay
Servers require time to process requests. This includes parsing the request, accessing data, performing computations, and preparing the response. Complex operations or resource-intensive tasks naturally lead to higher processing delays.
4) I/O Delay
Many operations in distributed systems involve reading from or writing to storage systems (like databases or disk drives). Disk I/O operations are generally slower than in-memory operations, contributing to latency.
5) Data Handling
When data is transmitted across a network, it often needs to be converted into a transferable format (serialization) and then back into its original format upon arrival (deserialization). These processes add to the latency, particularly for large data objects or complex data structures.
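As a minimal illustration, the sketch below times a serialize/deserialize round trip with Python's standard json module. The payload is synthetic and the exact numbers will vary by machine; the point is that this step is not free and grows with payload size.

```python
import json
import time

# A hypothetical payload: 10,000 small records, purely for illustration.
payload = [{"id": i, "name": f"user-{i}", "active": i % 2 == 0} for i in range(10_000)]

start = time.perf_counter()
encoded = json.dumps(payload)   # serialization: in-memory object -> wire format
decoded = json.loads(encoded)   # deserialization: wire format -> in-memory object
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"payload size: {len(encoded) / 1024:.0f} KiB, "
      f"serialize + deserialize: {elapsed_ms:.1f} ms")
```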
How to Reduce Latency?
Various techniques can be used to reduce latency in large-scale software systems. The choice of strategy depends on the specific scenario and the primary sources of latency.
Network-Level Optimizations
1) Geo-Specific Routing
Directing user requests to the geographically closest data center reduces network propagation delay. For example, a user in London accessing a service hosted in both the US and Europe should be routed to the European data center. DNS-based routing (like AWS Route 53 Geolocation Routing) or specialized traffic management tools can achieve this.
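The snippet below is a simplified sketch of that routing decision, not how Route 53 itself is implemented; the region names and continent mapping are assumptions for illustration.

```python
# Simplified illustration of a geolocation routing decision.
# Real systems delegate this to DNS (e.g. Route 53 geolocation records)
# or a global load balancer; the mappings here are placeholders.

REGION_BY_CONTINENT = {
    "EU": "eu-west-1",       # Ireland
    "NA": "us-east-1",       # N. Virginia
    "OC": "ap-southeast-2",  # Sydney
}
DEFAULT_REGION = "us-east-1"

def pick_region(client_continent: str) -> str:
    """Route the client to the closest deployed region, falling back to a default."""
    return REGION_BY_CONTINENT.get(client_continent, DEFAULT_REGION)

print(pick_region("EU"))  # -> eu-west-1: a London user hits the European data center
```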
2) Content Delivery Networks (CDNs)
CDNs cache static assets (images, videos, CSS) on servers distributed geographically closer to users. This reduces the need to fetch data from origin servers and minimizes network latency.
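As a small illustration, these are the kinds of response headers an origin server might send so that a CDN (a shared cache) is allowed to serve the asset itself instead of forwarding every request; the specific values are assumptions, not a one-size-fits-all recommendation.

```python
# Sketch of origin response headers that let a CDN and the browser cache
# a static asset. Values are illustrative only.

static_asset_headers = {
    "Cache-Control": "public, max-age=86400, s-maxage=604800",  # 1 day browser, 7 days CDN
    "ETag": '"v2.3-logo"',  # lets the CDN revalidate cheaply with a conditional request
}

for name, value in static_asset_headers.items():
    print(f"{name}: {value}")
```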
Data-Level Optimizations
1) Caching
Caching stores frequently accessed data in memory for faster retrieval, reducing the need for slower operations like database queries. This approach is effective for read-heavy workloads. For example, a frequently accessed user profile can be cached in memory to avoid hitting the database for every request.
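Here is a minimal cache-aside sketch. An in-process dictionary stands in for a real cache such as Redis or Memcached, and fetch_profile_from_db is a hypothetical stand-in for a slow database query.

```python
import time

_cache: dict[str, tuple[float, dict]] = {}  # key -> (stored_at, value)
TTL_SECONDS = 300

def fetch_profile_from_db(user_id: str) -> dict:
    time.sleep(0.05)  # simulate ~50 ms of database latency
    return {"id": user_id, "name": f"user-{user_id}"}

def get_profile(user_id: str) -> dict:
    entry = _cache.get(user_id)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                        # cache hit: served from memory
    profile = fetch_profile_from_db(user_id)   # cache miss: fall back to the database
    _cache[user_id] = (time.time(), profile)   # populate the cache for later requests
    return profile

get_profile("42")  # first call pays the database cost
get_profile("42")  # subsequent calls within the TTL are served from memory
```

The usual caveat applies: cached data can be stale, so the TTL (and any invalidation strategy) has to match how fresh the data needs to be.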
2) Data Compression and Serialization
Reducing the size of data transmitted over the network directly reduces transfer time. Efficient serialization formats also help: formats like Protocol Buffers or Apache Avro are typically more compact and faster to serialize/deserialize than JSON or XML, especially for large data volumes.
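A small sketch of the compression side, using Python's built-in gzip module on a synthetic JSON payload; the actual savings depend heavily on how repetitive the data is.

```python
import gzip
import json

# Compare raw vs gzip-compressed payload sizes for a synthetic dataset.
records = [{"user_id": i, "status": "active", "country": "US"} for i in range(5_000)]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw:       {len(raw) / 1024:.0f} KiB")
print(f"gzip:      {len(compressed) / 1024:.0f} KiB")
print(f"reduction: {100 * (1 - len(compressed) / len(raw)):.0f}%")
```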
System-Level Optimizations
1) Asynchronous Systems
In many scenarios, a user doesn't need an immediate response. Using asynchronous communication patterns, such as message queues (like Kafka, RabbitMQ, or Amazon SQS), allows the system to process requests in the background. The user can receive an acknowledgment quickly, and the actual processing can happen later without blocking the user's interaction. This is useful for tasks like image processing or sending emails.
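The sketch below shows the pattern in-process using Python's standard queue and threading modules; in production the queue would be an external broker such as Kafka, RabbitMQ, or SQS, and the worker would be a separate service. The function and task names are illustrative.

```python
import queue
import threading
import time

task_queue: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        task = task_queue.get()
        time.sleep(1)  # simulate slow work, e.g. image processing
        print(f"processed {task['id']} in the background")
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(task_id: str) -> dict:
    task_queue.put({"id": task_id})               # enqueue the slow work
    return {"status": "accepted", "id": task_id}  # acknowledge immediately

print(handle_request("img-123"))  # the caller is not blocked on the processing
task_queue.join()                 # only so this demo waits for the printed result
```

The user-facing latency is now just the cost of enqueueing and acknowledging, not the full processing time.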
2) Sharding, Replication, and Load Balancing
Horizontally partitioning the data across multiple nodes can allow queries to be executed in parallel across different shards, reducing the amount of data each server needs to process. Read-heavy workloads can be distributed across multiple replicas, reducing the load on the primary node. Moreover, incoming requests can be distributed evenly (load balancing) across multiple instances of an application server to prevent any single server from becoming overloaded.
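As one illustration of the sharding idea, the sketch below maps a user ID to a shard with a hash function so each node stores and scans only a fraction of the data. The shard names are placeholders, and real systems also handle replication and resharding, which are omitted here.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(user_id: str) -> str:
    """Deterministically map a key to one of the shards."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user-1001"))  # every lookup for this user hits the same shard
print(shard_for("user-1002"))  # other users may land on different shards
```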
3) Connection Pooling and Database Indexing
Establishing and tearing down network connections for every request can be expensive. Connection pooling allows applications to reuse existing connections, avoiding the overhead of repeated connection setup. In addition, indexing frequently queried database columns and avoiding full-table scans speeds up data lookups.
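A brief sketch of connection pooling with SQLAlchemy; the connection URL, table, and column names are hypothetical. The commented SQL shows the kind of index that lets the query avoid a full-table scan.

```python
from sqlalchemy import create_engine, text

# The pool keeps connections open so each request reuses one instead of
# paying the TCP/TLS/authentication setup cost every time.
engine = create_engine(
    "postgresql://app:secret@db.example.com/appdb",  # placeholder credentials
    pool_size=10,        # connections kept open and reused
    max_overflow=5,      # extra connections allowed under burst load
    pool_pre_ping=True,  # drop dead connections before handing them out
)

with engine.connect() as conn:
    # An index on the filtered column keeps this lookup fast:
    #   CREATE INDEX idx_orders_user_id ON orders (user_id);
    rows = conn.execute(
        text("SELECT id, total FROM orders WHERE user_id = :uid"), {"uid": 42}
    ).fetchall()
```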