AWS News Blog
AWS Lambda functions now scale 12 times faster when handling high-volume requests
|  | 
Now AWS Lambda scales up to 12 times faster. Each synchronously invoked Lambda function now scales by 1,000 concurrent executions every 10 seconds until the aggregate concurrency across all functions reaches the account’s concurrency limit. In addition, each function within an account now scales independently from each other, no matter how the functions are invoked. These improvements come at no additional cost, and you don’t need to do any configuration in your existing functions.
Building scalable and high-performing applications can be challenging with traditional architectures, often requiring over-provisioning of compute resources or complex caching solutions for peak demands and unpredictable traffic. Many developers choose Lambda because it scales on-demand when applications face unpredictable traffic.
Before this update, Lambda functions could initially scale at the account level by 500–3,000 concurrent executions (depending on the Region) in the first minute, followed by 500 concurrent executions every minute until the account’s concurrency limit is reached. Because this scaling limit was shared between all the functions in the same account and Region, if one function experienced an influx of traffic, it could affect the throughput of other functions in the same account. This increased engineering efforts to monitor a few functions that could burst beyond the account limits, causing a noisy neighbor scenario and reducing the overall concurrency of other functions in the same account.
Now, with these scaling improvements, customers with highly variable traffic can reach concurrency targets faster than before. For instance, a news site publishing a breaking news story or an online store running a flash sale would experience a significant influx of visitors. Thanks to these improvements, they can now scale 12 times faster than before.
In addition, customers that use services such as Amazon Athena and Amazon Redshift with scalar Lambda-based UDFs to perform data enrichment or data transformations will see benefits from these improvements. These services rely on batching data and passing it in chunks to Lambda, simultaneously invoking multiple parallel functions. The enhanced concurrency scaling behavior ensures Lambda can rapidly scale and service level agreement (SLA) requirements are met.
How does this work in practice?
 The following graph shows a function receiving requests and processing them every 10 seconds. The account concurrency limit is set to 7,000 concurrent requests and is shared between all the functions in the same account. Each function scaling-up rate is fixed to 1,000 concurrent executions every 10 seconds. This rate is independent from other functions in the same account, making it easier for you to predict how this function will scale and throttle the requests if needed.
- 09:00:00 – The function has been running for a while, and there are already 1,000 concurrent executions that are being processed.
- 09:00:10 – Ten seconds later, there is a new burst of 1,000 new requests. This function can process them with no problem because the function can scale up to 1,000 concurrent executions every 10 seconds.
- 09:00:20 – The same happens here: a thousand new requests.
- 09:00:30 – The function now receives 1,500 new requests. Because the maximum scale-up capacity for a function is 1,000 requests per 10 seconds, 500 of those requests will get throttled.
- 09:01:00 – At this time, the function is already processing 4,500 concurrent requests. But there is a burst of 3,000 new requests. Lambda processes 1,000 of the new requests and throttles 2,000 because the function can scale up to 1,000 requests every 10 seconds.
- 09:01:10 – After 10 seconds, there is another burst of 2,000 requests, and the function can now process 1,000 more requests. However, the remaining 1,000 requests get throttled because the function can scale to 1,000 requests every 10 seconds.
- 09:01:20 – Now the function is processing 6,500 concurrent requests, and there are 1,000 incoming requests. The first 500 of those requests get processed, but the other 500 get throttled because the function reached the account concurrency limit of 7,000 requests. It’s important to remember that you can raise the account concurrency limit by creating a support ticket in the AWS Management Console.
In the case of having more than one function in your account, the functions scale independently until the total account concurrency limit is reached. After that, all new invocations will be throttled.
Availability
 These scaling improvements will be enabled by default for all functions. Starting on November 26 through mid-December, AWS is gradually rolling out these scaling improvements to all AWS Regions except China and GovCloud Regions.
If you want to learn more about Lambda’s new scaling behavior, read the Lambda scaling behavior documentation page.
– Marcia

