Rapid Scaling in FaaS: High Performance Approach by AWS Lambda
How does AWS Lambda support container images up to 10 GiB in size?
Over the past decade, Function-as-a-Service (FaaS) has revolutionized software development. It empowers engineers to concentrate solely on writing code, without the burden of managing intricate infrastructure provisioning and scaling. From a cost perspective, FaaS takes efficiency to the next level by charging based on function execution time.
While the internet abounds with articles discussing the advantages and disadvantages of FaaS, this article specifically delves into how AWS Lambda, one of the most popular FaaS platforms, supports on-demand loading of container images up to 10 GiB in size while maintaining rapid scalability (up to tens of thousands of new containers per customer), high request rates (millions of requests/sec), and low start-up times (~50ms).
Core Challenge
Before container image support, Lambda functions were limited to 250 MB of code packaged as a simple compressed archive. Scaling up therefore moved little data, which kept the scale-up time, popularly known as cold-start time, within an acceptable limit. Supporting on-demand container images up to 10 GiB in size, however, requires moving significant amounts of data across the network, which makes it hard for such a popular service to meet its scalability and cold-start latency targets.
Solution
TL;DR
The problem was solved by caching and deduplicating the most frequently used container image layers, which reduces data movement for every container image that inherits them.
Details
Lambda achieves this by deterministically flattening the container image into small chunks. The deterministic flattening process produces unique chunks for the customized parts of the image and identical chunks for the common parts (layers inherited by multiple images, such as a Node.js layer). This reduces data storage and movement significantly, because a commonly used chunk needs to be stored only once and can be cached as well.
In simpler terms, say we have two container images, A and B. When they are passed through the deterministic flattening function, it generates unique chunks for the customized layers (layers #2, #3, and #4) and identical chunks for the common layers (layers #0 and #1). The deduplication step then removes the duplicates.
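The idea can be sketched in a few lines of Python. This is a minimal illustration, not Lambda's implementation: it treats each layer as a plain byte string instead of flattening a real filesystem, and `CHUNK_SIZE`, `chunk_image`, and `store_image` are hypothetical names chosen for the example.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; deliberately tiny, an assumed value for the demo


def chunk_image(layers: list[bytes]) -> list[bytes]:
    """Split every layer into fixed-size chunks, in order.

    The same layer bytes always yield the same chunks (deterministic).
    """
    chunks = []
    for layer in layers:
        for start in range(0, len(layer), CHUNK_SIZE):
            chunks.append(layer[start:start + CHUNK_SIZE])
    return chunks


def store_image(layers: list[bytes], chunk_store: dict[str, bytes]) -> list[str]:
    """Store an image's chunks by content hash and return its manifest.

    A chunk that is already in the store is not stored again (deduplication).
    """
    manifest = []
    for chunk in chunk_image(layers):
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)
        manifest.append(digest)
    return manifest


# Two images that share a base OS layer and a runtime layer.
base_os = b"base-os-layer-bytes-0001"
runtime = b"nodejs-runtime-layer"
image_a = [base_os, runtime, b"app-A-code"]
image_b = [base_os, runtime, b"app-B-code-and-assets"]

store: dict[str, bytes] = {}
manifest_a = store_image(image_a, store)
manifest_b = store_image(image_b, store)
print("chunks referenced:", len(manifest_a) + len(manifest_b))
print("chunks stored:", len(store))
```

The manifests of both images begin with the same digests, so the shared layers occupy the store only once, and each image can still be rebuilt by reading its manifest in order.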
Further Details
To prevent unauthorized access to the code, the chunks are encrypted. This complicates deduplication, however, because encrypting the same chunk with different keys produces different ciphertexts. For example, the same chunk encrypted under customer A's key and under customer B's key yields two unrelated ciphertexts that cannot be recognized as duplicates.
This problem is solved using convergent encryption, where
a cryptographic hash of the given chunk is generated, using some metadata as a salt,
this hash is then used as the key to encrypt the chunk,
and finally, the key is encrypted using the customer's unique key.
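Because the key is derived from the chunk's own contents, identical chunks hashed with the same salt encrypt to identical ciphertexts and can still be deduplicated. At the same time, the decryption keys stay unique to the customer, so only the functions authorized by the customer can access them.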
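The three steps above can be sketched as follows. This is a toy under stated assumptions, not Lambda's code: the function names are hypothetical, and a SHA-256 keystream XOR stands in for a real cipher because the Python standard library ships none. It is not secure and only demonstrates the convergent property.

```python
import hashlib
import hmac


def derive_chunk_key(chunk: bytes, salt: bytes) -> bytes:
    """Step 1: a salted (keyed) hash of the chunk becomes its encryption key."""
    return hmac.new(salt, chunk, hashlib.sha256).digest()


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key + offset) blocks.

    Stand-in for a real cipher, for illustration only. NOT secure.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def convergent_encrypt(chunk: bytes, salt: bytes, customer_key: bytes):
    chunk_key = derive_chunk_key(chunk, salt)
    ciphertext = _keystream_xor(chunk_key, chunk)           # step 2
    wrapped_key = _keystream_xor(customer_key, chunk_key)   # step 3
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, customer_key: bytes) -> bytes:
    chunk_key = _keystream_xor(customer_key, wrapped_key)
    return _keystream_xor(chunk_key, ciphertext)


# Two customers encrypt the same chunk with the same salt.
chunk = b"shared runtime bytes " * 5
salt = b"example-salt"
key_a, key_b = b"A" * 32, b"B" * 32
cipher_a, wrapped_a = convergent_encrypt(chunk, salt, key_a)
cipher_b, wrapped_b = convergent_encrypt(chunk, salt, key_b)
```

Here `cipher_a` and `cipher_b` are byte-for-byte identical, so the store can keep one copy, while `wrapped_a` and `wrapped_b` differ and each can only be unwrapped with the matching customer key.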
Production Nuggets
To limit failure points, even the common chunks shared across many images are hashed with different salt values. Although this generates different hashes for the same chunk, it protects the system from hot shards and single points of failure. For example, if a common chunk, say chunk #A, always hashed to the same value, then any failure in the caching service would cause a performance hit for every image that depends on chunk #A. Popular chunks would also create hot shards, because all images would refer to the same hashed value of chunk #A. Hashing a common chunk to several different values spreads the traffic across multiple copies of the same chunk, which in turn avoids single points of failure and hot shards.
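A small sketch of the effect, with a loud caveat: how Lambda actually picks salts is not described here, so `NUM_SALTS` and the rule in `pick_salt` (bucket an image by a hash of its ID) are purely hypothetical.

```python
import hashlib

NUM_SALTS = 4  # hypothetical number of salt values in rotation


def pick_salt(image_id: str) -> bytes:
    """Hypothetical rule: assign each image to one of NUM_SALTS salt buckets."""
    digest = hashlib.sha256(image_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % NUM_SALTS
    return f"salt-{bucket}".encode()


def chunk_cache_key(chunk: bytes, salt: bytes) -> str:
    """Cache key of a chunk under a given salt."""
    return hashlib.sha256(salt + chunk).hexdigest()


def distinct_copies(chunk: bytes, image_ids: list[str]) -> set[str]:
    """All cache keys under which the same chunk ends up being stored."""
    return {chunk_cache_key(chunk, pick_salt(i)) for i in image_ids}


common_chunk = b"bytes of a very popular base layer chunk"
images = [f"image-{n}" for n in range(200)]
copies = distinct_copies(common_chunk, images)
print(f"{len(images)} images share this chunk through {len(copies)} cache keys")
```

The trade-off is explicit: the chunk is stored up to `NUM_SALTS` times instead of once, and in exchange losing one copy affects only the images that map to that salt.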
The chunks are cached locally by the functions. If a chunk is not available locally, it is fetched from a remote availability-zone-level (AZ-level) cache, and if it is not available there either, it is fetched from persistent storage and loaded into the cache.
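The lookup order reads naturally as a read-through cache. In the sketch below, plain dictionaries stand in for the local cache, the AZ-level cache, and persistent storage, and the `TieredChunkReader` class is a hypothetical name. Filling the local cache on every miss is an assumption made for the example.

```python
class TieredChunkReader:
    """Look up a chunk locally, then in the AZ-level cache, then in storage."""

    def __init__(self, local: dict, az_cache: dict, storage: dict):
        self.local = local
        self.az_cache = az_cache
        self.storage = storage

    def get(self, key: str) -> tuple[bytes, str]:
        """Return the chunk and the tier that served it."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.az_cache:
            chunk, tier = self.az_cache[key], "az"
        else:
            # Miss in both caches: read from persistent storage
            # and load the chunk into the AZ-level cache.
            chunk, tier = self.storage[key], "storage"
            self.az_cache[key] = chunk
        self.local[key] = chunk  # assumed: keep a local copy for next time
        return chunk, tier


storage = {"chunk-1": b"payload"}
az_cache: dict = {}
worker_1 = TieredChunkReader({}, az_cache, storage)
worker_2 = TieredChunkReader({}, az_cache, storage)

first = worker_1.get("chunk-1")    # served from persistent storage
second = worker_1.get("chunk-1")   # served from worker_1's local cache
third = worker_2.get("chunk-1")    # another worker hits the shared AZ cache
```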
The preferred cache eviction strategy is LRU-k rather than plain LRU (Least Recently Used). LRU-k ranks entries by their k-th most recent access instead of their latest one, which keeps hot entries in the cache longer and prevents them from being replaced by infrequently used entries. This matters because dropping a hot entry from the cache can cause a significant performance hit for all of its callers.
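To make the difference concrete, here is a minimal LRU-k sketch with k = 2. It is a simplification written for this article, not Lambda's cache: the `LRUKCache` class is hypothetical, and unlike a full LRU-k implementation it forgets an entry's access history as soon as the entry is evicted.

```python
import itertools
from collections import deque


class LRUKCache:
    """Evict the entry whose k-th most recent access is the oldest.

    Entries with fewer than k accesses are evicted first, least recent first.
    """

    def __init__(self, capacity: int, k: int = 2):
        self.capacity = capacity
        self.k = k
        self.data = {}
        self.history = {}               # key -> last k access times
        self.clock = itertools.count()  # logical clock

    def _touch(self, key):
        times = self.history.setdefault(key, deque(maxlen=self.k))
        times.append(next(self.clock))

    def get(self, key):
        if key not in self.data:
            return None
        self._touch(key)
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict()
        self.data[key] = value
        self._touch(key)

    def _evict(self):
        def priority(key):
            times = self.history[key]
            # With fewer than k accesses, rank below every "proven" entry.
            kth = times[0] if len(times) == self.k else -1
            return (kth, times[-1])

        victim = min(self.data, key=priority)
        del self.data[victim]
        del self.history[victim]


cache = LRUKCache(capacity=2, k=2)
cache.put("hot", b"popular chunk")
cache.get("hot")
cache.put("cold-1", b"rarely used chunk")
cache.put("cold-2", b"another rarely used chunk")  # cache is full: evicts "cold-1"
```

Plain LRU would have evicted `"hot"` here, because it is the least recently touched entry. LRU-k keeps it, since it is the only entry that has been accessed at least twice.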