How Amazon Route 53 Handles DDoS Attacks with Shuffle Sharding
Understanding How to Give Clients a Single-Tenant Experience in a Shared Cluster
The Problem
The Domain Name System (DNS) is the Internet's directory, linking web browsers to websites by translating domain names into IP addresses. It is a vital component for businesses, as any issues with DNS can cause the entire business to seem offline. Users cannot access the website if a web browser cannot correctly resolve a domain name to its corresponding IP address.
The biggest challenge for any DNS provider, such as Amazon Route 53, is handling a Distributed Denial of Service (DDoS) attack without impacting businesses. A DDoS attack of a significant magnitude can overwhelm a service to the extent that it’s unable to serve actual clients.
Other Solutions
A simple approach to mitigate DDoS attacks is to provision more server capacity to absorb the attack. However, this approach isn’t scalable and can be costly for a DNS provider.
Another solution is to use specialized network hardware that can filter the traffic generated by DDoS attacks. However, many of these hardware devices would be required to cover all the domains hosted by a DNS provider, making this solution costly.
Let’s Recap Sharding First!
In a distributed system where multiple servers handle requests without sharding, each request can land on any server. If a DNS provider uses no sharding and one customer comes under a DDoS attack, the attack traffic is spread across many or all of the servers. Because every other customer’s requests can land on those same servers, those customers are impacted as well.
Now suppose the servers are divided into shards, each handling the requests of a subset of customers. In that case, a DDoS attack only affects the shard serving the victim customer’s requests, so the impact is limited to the customers served by that shard. In short, ordinary sharding reduces the overall impact of a DDoS attack to a subset of the customers.
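To make the blast radius of ordinary sharding concrete, here is a minimal sketch. The shard names, server names, and customer-to-shard assignment are all hypothetical and chosen only for illustration; they do not describe how Route 53 is laid out.

```python
# Hypothetical layout: 8 servers split into 4 fixed shards of 2 servers each.
SHARD_SERVERS = {
    "shard-1": ["server-1", "server-2"],
    "shard-2": ["server-3", "server-4"],
    "shard-3": ["server-5", "server-6"],
    "shard-4": ["server-7", "server-8"],
}

# Hypothetical assignment: every customer lives on exactly one shard.
CUSTOMER_SHARD = {
    "A": "shard-1", "B": "shard-1",
    "C": "shard-2", "D": "shard-2",
    "E": "shard-3", "F": "shard-3",
    "G": "shard-4", "H": "shard-4",
}


def affected_customers(victim: str) -> list[str]:
    """Return every customer that loses service when the victim's shard is overwhelmed."""
    shard = CUSTOMER_SHARD[victim]
    return sorted(c for c, s in CUSTOMER_SHARD.items() if s == shard)


print(affected_customers("A"))  # ['A', 'B']
```

With four shards, an attack on customer A takes down a quarter of the customers instead of all of them, but customer B still shares A's fate completely.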
What is Shuffle Sharding?
The problem with ordinary sharding is that every customer on a shard shares exactly the same servers. If one of those customers experiences a high influx of traffic, all servers in that shard, and therefore all customers on it, are impacted.
Shuffle Sharding solves this by assigning each customer to more than one shard and varying (shuffling) the combination from customer to customer, which reduces the overlap of resources between different customers. A customer can then continue to serve traffic even if one of the shards handling its traffic is unavailable.
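One way to picture the assignment step is a deterministic, hash-seeded pick of shards per customer. This is only a sketch under assumptions: the function name `shuffle_shard`, the shard labels, and the use of SHA-256 to seed a random generator are illustrative choices, not a description of how Route 53 actually selects shards.

```python
import hashlib
import random

# Eight shard labels, chosen here purely for illustration.
SHARDS = [f"M{i}" for i in range(1, 9)]


def shuffle_shard(customer: str, shards: list[str] = SHARDS, size: int = 2) -> list[str]:
    """Pick `size` distinct shards for a customer, deterministically."""
    # Derive the seed from a stable hash of the customer ID so the same
    # customer maps to the same shards on every call and in every process.
    digest = hashlib.sha256(customer.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    # sample() returns distinct elements, so a customer never gets a shard twice.
    return sorted(rng.sample(shards, size))


for customer in ["customer-a", "customer-b", "customer-c"]:
    print(customer, shuffle_shard(customer))
```

Because the pick depends only on the customer ID, no lookup table is needed to route a request, although a real system might prefer an explicit table so that assignments can be rebalanced.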
Consider an example with eight shards labeled M1 to M8, where each customer, from A to H, is assigned to two different shards. For instance, customer A is mapped to shards M1 and M4, while customer D is assigned to M2 and M3. If customer A experiences a DDoS attack, shards M1 and M4 will be affected. Customers B and F each share one of these shards with A, so some of their requests will also be affected.
However, customers B and F have alternative routes through the healthy shards M6 and M8, which are unaffected by the DDoS attack. As a result, all requests for customers B and F can be redirected to M6 and M8. In this scenario, shuffle sharding effectively limits the impact of the DDoS attack to customer A, and the other customers continue to be served.
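The scenario can be replayed in a few lines. Only the assignments for customers A and D come from the example; every other pairing below, including which of B and F sits on M1 or M4, is an assumption made so the sketch is complete.

```python
# Customer -> the two shards serving it.
ASSIGNMENT = {
    "A": {"M1", "M4"},  # from the example
    "B": {"M1", "M6"},  # assumed
    "C": {"M2", "M5"},  # assumed
    "D": {"M2", "M3"},  # from the example
    "E": {"M3", "M7"},  # assumed
    "F": {"M4", "M8"},  # assumed
    "G": {"M5", "M7"},  # assumed
    "H": {"M6", "M8"},  # assumed
}


def healthy_routes(down: set[str]) -> dict[str, list[str]]:
    """For each customer, list the assigned shards that are still healthy."""
    return {c: sorted(shards - down) for c, shards in ASSIGNMENT.items()}


# A DDoS attack on customer A overwhelms both of A's shards.
routes = healthy_routes(ASSIGNMENT["A"])

# Customers with no healthy shard left are fully down.
outage = [c for c, r in routes.items() if not r]
# Customers that lost one shard but can still be served by the other.
degraded = sorted(c for c, r in routes.items() if len(r) == 1)

print(outage)       # ['A']
print(degraded)     # ['B', 'F']
print(routes["B"])  # ['M6']
print(routes["F"])  # ['M8']
```

Under these assumed assignments, only A is left without a healthy shard, while B and F fall back to M6 and M8 respectively.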
As with ordinary sharding, two shards are unavailable in this scenario, but the overall impact is much smaller because only one customer loses service entirely. With enough servers, the number of possible shard combinations (shuffle shards) can exceed the number of customers, giving each customer what is effectively a single-tenant experience.
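The "more shuffle shards than customers" claim is a counting argument, sketched below. It assumes each customer's pair of shards is chosen uniformly at random; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def shuffle_shard_count(n: int, k: int) -> int:
    """Number of distinct ways to choose k shards out of n."""
    return comb(n, k)


def overlap_probability(n: int, k: int, shared: int) -> Fraction:
    """Chance that a second, randomly assigned customer shares exactly
    `shared` shards with a given customer (hypergeometric probability)."""
    return Fraction(comb(k, shared) * comb(n - k, k - shared), comb(n, k))


print(shuffle_shard_count(8, 2))     # 28
print(overlap_probability(8, 2, 2))  # 1/28
print(overlap_probability(8, 2, 1))  # 3/7
print(overlap_probability(8, 2, 0))  # 15/28
```

With eight shards and two shards per customer there are 28 possible combinations, more than the eight customers in the example, so every customer can receive a unique pair. Under random assignment, another customer shares both shards with a given customer only 1 time in 28.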
Conclusion
This article covered Shuffle Sharding in the context of Amazon Route 53 and how it handles DDoS attacks. The technique is also useful outside of DNS, for example where the backend resources are queues or databases. There, Shuffle Sharding can limit the impact of an invalid request or of the Thundering Herd Problem.
References
Workload isolation using shuffle-sharding. (n.d.). Amazon Web Services, Inc. https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding
Shuffle sharding. (n.d.). Cortex. https://cortexmetrics.io/docs/guides/shuffle-sharding/
Shuffle sharding | Grafana Loki documentation. (n.d.). Grafana Labs. https://grafana.com/docs/loki/latest/operations/shuffle-sharding/