Implementing granular cost analysis for multi-tenant CloudFront distributions

Note: This post refers to multi-tenant or shared distributions, which now have more formalized support through SaaS Manager for CloudFront. Check out the latest blog post to see how multiple-domain delivery can use SaaS Manager.

Amazon CloudFront is the AWS native Content Delivery Network (CDN) that reduces latency, improves availability, and secures web applications by offloading traffic from regional AWS infrastructure to a global edge network. This post focuses on one solution that provides high-granularity CDN cost transparency, which isn't available from traditional AWS cost analysis tools such as AWS Budgets or AWS Cost Explorer. The solution uses CloudFront standard access logs to identify and allocate CDN costs at the request level, so you can track spend and “charge back” costs to a given team or unit.

Prerequisites

This solution uses the AWS Cloud Development Kit (CDK) to help automate its deployment and configuration. You need an AWS account to deploy the reference solution, and familiarity with fundamental Amazon CloudFront content delivery concepts such as data transfer out, proxy data, origins, and edge computing.

You will need access to the following AWS services in addition to the CDN resources:

Amazon QuickSight with an active subscription – Used to create an analysis and dashboard. You also need a QuickSight user registered and active in the AWS Region where you’re going to deploy the solution.
Amazon Athena – Used to query the CloudFront standard access logs. If this is your first time using Athena, you must set up a destination bucket for your query results.
AWS Glue – Used to structure CloudFront standard access logs and provide a target table for your Athena queries.

Architecture overview

A streamlined reference architecture for an example chargeback analysis uses Amazon S3 as a static website origin, Amazon API Gateway to serve dynamic content from a REST API, and edge compute capabilities such as CloudFront Functions and Lambda@Edge, as shown in Figure 1.

Figure 1. Solution architecture diagram for a standard content delivery workflow

Along with the base architecture for serving content, we add a few more services to support granular cost analysis and align with the AWS Well-Architected Framework. These services improve security and standardize log aggregation, querying, and basic dashboarding. We use AWS WAF to help secure the perimeter of the deployment. When CloudFront logs are stored in Amazon S3, we can organize and structure this data using AWS Glue, specifically the AWS Glue Data Catalog. This centralized metadata repository allows the data to be queried using SQL with Amazon Athena. Athena lets users analyze the data in place using standard SQL queries, eliminating the need for complex extract, transform, and load (ETL) processes. Amazon QuickSight is used as the data visualization and dashboarding service. Figure 2 shows an updated view of this architecture, with the portion of the workload that supports chargeback analysis inside the dotted lines.

Figure 2. Content delivery with additional services for chargeback analysis

1. The CloudFront distribution takes requests, which can be associated with different teams/tenants by each request’s path, query, host, or any characteristic available in the standard log line. In the following example implementation, the tenant/team identifier is in the request URI: “cloudfrontExample.com/tenant1” and “cloudfrontExample.com/tenant {+N…}” are received by the same distribution.
2. CloudFront standard logs for all tenants on the distribution are stored in a common S3 bucket.
3. Athena queries the CloudFront standard logs, using SQL to differentiate requests by tenant and enrich the request data with pricing information.
4. AWS Glue Data Catalog provides structure and stores metadata about the CloudFront log dataset in a table.
5. QuickSight visualizations and dashboards pull from the Athena table to aggregate cost metrics based on the different CloudFront tenant characteristics (for example, path, query, host).
6. The middle column of the architecture collects AWS WAF logs and isn’t included in the provided solution/code. Posts such as How to use Amazon Athena queries to analyze AWS WAF logs and provide the visibility needed for threat detection provide the foundation for collecting and charging back activity on shared AWS services.
7. The last column is also not included in the provided solution, but it shows the services needed to aggregate Lambda@Edge application logs across Regions to get more precise estimates of your GB-seconds.
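The tenant attribution in step 1 can be sketched as a small Python function. This is a minimal illustration, not code from the repository: `tenant_from_request` is a hypothetical helper, and it assumes the tenant is either the first path segment of cs-uri-stem (as in the cloudfrontExample.com/tenant1 example), the host name, or a hypothetical `tenant` query-string parameter.

```python
from urllib.parse import parse_qs


def tenant_from_request(uri_stem: str, host: str = "", query: str = "",
                        strategy: str = "path") -> str:
    """Return a tenant key for one logged request (hypothetical helper).

    strategy="path":  first path segment, /tenant1/index.html -> "tenant1"
    strategy="host":  the request's host name, lowercased
    strategy="query": a hypothetical ?tenant=... query-string parameter
    """
    if strategy == "path":
        segments = [s for s in uri_stem.split("/") if s]
        return segments[0] if segments else "untagged"
    if strategy == "host":
        return host.lower() or "untagged"
    if strategy == "query":
        # CloudFront writes "-" in the log when there is no query string.
        values = parse_qs("" if query == "-" else query).get("tenant")
        return values[0] if values else "untagged"
    raise ValueError(f"unknown strategy: {strategy!r}")


print(tenant_from_request("/tenant1/images/logo.png"))  # tenant1
print(tenant_from_request("/", host="Tenant2.example.com", strategy="host"))  # tenant2.example.com
print(tenant_from_request("/api", query="tenant=tenant3", strategy="query"))  # tenant3
```

Requests that carry no identifier fall into an "untagged" bucket rather than being dropped, so their cost stays visible as shared or unallocated spend.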

Chargeback solution cost inputs

The cost of the additional chargeback logging and analysis services depends on the traffic to your workload. The following example cost breakdown assumes 30 million requests per month, with the average request generating a 2 KB standard log entry. At the time of publishing, the AWS Pricing Calculator estimates the following based on public pricing:

Amazon S3: ~$1.50 for 60 GB of stored log data (2 KB × 30,000,000 requests) plus object requests by CloudFront and Athena. Another 60 GB accumulates each month, so we recommend setting up a data archival strategy if the data is no longer used or is only needed for infrequent analysis.
Amazon Athena: ~$6 for 20 queries that each scan 60 GB of data. The total data scanned grows each month and, depending on the frequency and breadth (date range) of your analysis, the number and size of the queries may vary.
Amazon QuickSight: $24/user/month for Authors ($18 with an annual commitment) and $3/user/month for Readers. Author cost is a fixed per-user fee that scales with the number of users creating dashboards, whereas Reader cost depends on the size of the audience consuming the dashboards.
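The S3 and Athena figures above can be reproduced with simple arithmetic. This is a sketch under stated assumptions: the $0.023 per GB-month S3 Standard rate (us-east-1) and the $5 per TB scanned Athena rate are assumed list prices, not values from this post. Decimal units are used to match the round numbers in the text.

```python
# Inputs from the example in the text.
REQUESTS_PER_MONTH = 30_000_000
LOG_ENTRY_KB = 2

# Assumed list prices: S3 Standard in us-east-1 (first 50 TB) and Athena SQL
# queries per TB scanned. Check current pricing for your Region.
S3_USD_PER_GB_MONTH = 0.023
ATHENA_USD_PER_TB_SCANNED = 5.0


def monthly_log_gb(requests: int, entry_kb: float) -> float:
    # Decimal units (1 GB = 1,000,000 KB), matching the text's round numbers.
    return requests * entry_kb / 1_000_000


def storage_cost_in_month(month: int, gb_per_month: float) -> float:
    # Without lifecycle rules, month N stores N months' worth of logs.
    return month * gb_per_month * S3_USD_PER_GB_MONTH


def athena_cost(queries: int, gb_scanned_per_query: float) -> float:
    return queries * gb_scanned_per_query / 1000 * ATHENA_USD_PER_TB_SCANNED


log_gb = monthly_log_gb(REQUESTS_PER_MONTH, LOG_ENTRY_KB)
print(f"{log_gb:.0f} GB of logs per month")                            # 60 GB of logs per month
print(f"S3 storage, month 1: ${storage_cost_in_month(1, log_gb):.2f}")  # S3 storage, month 1: $1.38
print(f"Athena, 20 full scans: ${athena_cost(20, log_gb):.2f}")        # Athena, 20 full scans: $6.00
```

The storage figure excludes S3 request charges from log delivery and Athena reads, which roughly accounts for the gap to the ~$1.50 estimate. Without archival, by month 12 the bucket holds 720 GB and every full-range query scans all of it.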

CloudFront unit cost breakdown

When breaking down CDN costs from the CloudFront Pricing page, the two most common inputs to cost are data transfer out (DTO) of AWS to the internet and the number of HTTP requests. Both dimensions are priced according to the geographic region of the edge location that serves the content. For example, as of March 2025 and not including the free tier, the first 10 TB out of the United States is priced at $0.085 per GB, compared to $0.109 per GB out of India. A separate dimension is DTO to Origin (DTOO, also known as proxy traffic), which measures data and requests flowing in the opposite direction, from edge locations to your origin, for example with POST requests, PUT requests, or WebSocket traffic.
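The two main dimensions combine as a per-GB rate times data transferred plus a per-10,000 rate times requests. The following minimal sketch uses the first-tier DTO rates quoted above. The $0.0100 per 10,000 HTTPS requests figure for the United States is an assumption to verify on the pricing page.

```python
# First-10-TB DTO rates quoted in the text (March 2025, excluding free tier).
DTO_USD_PER_GB = {"United States": 0.085, "India": 0.109}

# Assumption: $0.0100 per 10,000 HTTPS requests (US); check the pricing page
# for current request rates in each geographic region.
US_HTTPS_USD_PER_10K = 0.0100


def dto_cost(gb_out: float, region: str) -> float:
    """Data transfer out to the internet, first pricing tier only."""
    return gb_out * DTO_USD_PER_GB[region]


def request_cost(requests: int, usd_per_10k: float) -> float:
    """HTTP(S) request charges are quoted per 10,000 requests."""
    return requests / 10_000 * usd_per_10k


# The same 500 GB costs more when it is served from edge locations in India.
print(round(dto_cost(500, "United States"), 2))  # 42.5
print(round(dto_cost(500, "India"), 2))          # 54.5
print(round(request_cost(2_000_000, US_HTTPS_USD_PER_10K), 2))  # 2.0
```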

Some user workloads demand lower-latency compute processes, such as on-demand image resizing or light URL redirects/rewrites. These edge compute workflows incur additional costs, which can also be aggregated by tenant/team. CloudFront Functions are more lightweight and are designed for transformations on requests or responses, such as adding a header, whereas Lambda@Edge can perform more complex content personalization, processing, or integration with other AWS or third-party services. CloudFront Functions pricing is based on the number of invocations, and it is included in this solution architecture because it is common in edge workloads.
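Because CloudFront Functions are billed at a flat rate per invocation, each tenant's share is simply proportional to its invocation count. This sketch assumes a list price of $0.10 per 1 million invocations. It also assumes the counts come from logged requests on cache behaviors that have a viewer-request function. A function associated with both viewer request and viewer response runs twice per request. The tenant counts are hypothetical.

```python
# Assumption: CloudFront Functions list price of $0.10 per 1 million invocations.
CF_FUNCTIONS_USD_PER_MILLION = 0.10


def functions_cost_by_tenant(invocations: dict[str, int]) -> dict[str, float]:
    """Flat rate per invocation, so each tenant's cost is proportional to its count."""
    return {tenant: count / 1_000_000 * CF_FUNCTIONS_USD_PER_MILLION
            for tenant, count in invocations.items()}


# Hypothetical monthly invocation counts per tenant.
costs = functions_cost_by_tenant({"tenant1": 12_000_000, "tenant2": 3_000_000})
print({tenant: round(cost, 2) for tenant, cost in costs.items()})
# {'tenant1': 1.2, 'tenant2': 0.3}
```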

Lambda@Edge cost is calculated from the total number of requests and the compute duration, measured in GB-seconds (AWS Lambda compute time multiplied by configured memory). The GB-second duration component is excluded from this solution because centralizing these logs across AWS Regions requires further configuration (explained in the “Solution considerations” section). Other dimensions not integrated into this solution include CloudFront KeyValueStore, Origin Shield, invalidations, and real-time logging.
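The GB-second calculation is straightforward once you have a duration and memory figure per tenant. This is a sketch assuming Lambda@Edge list prices of $0.60 per 1 million requests and $0.00005001 per GB-second. The average billed duration would come from a baseline of Lambda@Edge REPORT log lines (Billed Duration and Memory Size). The tenant workload shown is hypothetical.

```python
# Assumed Lambda@Edge list prices; verify against current pricing.
USD_PER_MILLION_REQUESTS = 0.60
USD_PER_GB_SECOND = 0.00005001


def gb_seconds(billed_duration_ms: float, memory_mb: int) -> float:
    """GB-seconds for one invocation: billed seconds x configured memory in GB."""
    return (billed_duration_ms / 1000) * (memory_mb / 1024)


def lambda_edge_cost(requests: int, avg_billed_ms: float, memory_mb: int) -> dict[str, float]:
    request_cost = requests / 1_000_000 * USD_PER_MILLION_REQUESTS
    duration_cost = requests * gb_seconds(avg_billed_ms, memory_mb) * USD_PER_GB_SECOND
    return {"requests": request_cost, "duration": duration_cost,
            "total": request_cost + duration_cost}


# Hypothetical tenant: 1M invocations, 128 MB function, 50 ms average billed duration.
print({k: round(v, 4) for k, v in lambda_edge_cost(1_000_000, 50, 128).items()})
# {'requests': 0.6, 'duration': 0.3126, 'total': 0.9126}
```

The request component can be counted from the standard logs. Only the duration component needs the regional Lambda@Edge logs, which is why extrapolating from a baseline average may be good enough for chargeback.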

Walkthrough

Refer to the project in the GitHub repository for instructions on deploying the solution using AWS CDK. The reference solution for this architecture deploys the following resources: S3 buckets, API Gateway endpoints, Lambda functions, a CloudFront distribution, AWS WAF web access control lists (web ACLs), and AWS Glue tables. To make full use of this example after deployment, sign in to the AWS Management Console, run the example Athena queries provided, and open QuickSight to finish configuring your visualizations.

Generating log data

After the solution is deployed with AWS CDK, you can start collecting log data by generating traffic to the assorted pages and origins. Visit the CloudFront domain URL directly; the URL is an output of the AWS CDK deployment and is also shown in the Console. Alternatively, you can run local curl commands or use Amazon Q Developer on the command line with Q chat to help generate log data.
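As an alternative to curl, a short Python script can spread requests across tenant paths. This is a minimal sketch using only the standard library. The domain (an example in the d111111abcdef8.cloudfront.net style), the tenant names, and the page paths are hypothetical placeholders; replace them with your CDK output and the paths your origins actually serve.

```python
import urllib.error
import urllib.request

# Hypothetical values: replace with the distribution domain from the CDK
# output and the tenant paths your origins actually serve.
DOMAIN = "d111111abcdef8.cloudfront.net"
TENANTS = ["tenant1", "tenant2", "tenant3"]
PAGES = ["", "index.html", "api/items"]


def build_urls(domain: str, tenants: list[str], pages: list[str]) -> list[str]:
    """One URL per (tenant, page) pair, tenant as the first path segment."""
    return [f"https://{domain}/{tenant}/{page}" for tenant in tenants for page in pages]


def generate_traffic(urls: list[str], rounds: int = 5) -> dict[int, int]:
    """Request each URL `rounds` times and tally the HTTP status codes."""
    statuses: dict[int, int] = {}
    for _ in range(rounds):
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    status = resp.status
            except urllib.error.HTTPError as err:
                status = err.code  # 4xx/5xx responses still appear in the logs
            statuses[status] = statuses.get(status, 0) + 1
    return statuses
```

For example, `generate_traffic(build_urls(DOMAIN, TENANTS, PAGES))` sends 45 requests and returns a status-code tally. Error responses are kept on purpose because they also produce standard log lines.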

As the standard logs are collected from requests and stored in the S3 bucket, the AWS Glue Data Catalog table provides a centralized metadata repository allowing for the data to be queried through SQL with Athena. The Athena documentation provides the DDL statement for creating a structured table for CloudFront standard logs. You should observe the CloudFront standard logs populating your S3 bucket as shown in Figure 3.

Figure 3. CloudFront standard access log requests
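In production the Glue table and Athena do the parsing, but it can help to inspect a downloaded log file locally first. This minimal sketch assumes the standard log layout: a space-separated "#Fields:" header followed by tab-separated records, with files delivered gzip-compressed. The sample lines are hypothetical and truncated to five fields for brevity.

```python
import gzip
from collections.abc import Iterable, Iterator


def parse_standard_log(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per request, keyed by the names in the "#Fields:" header.

    Reading the header avoids hard-coding column positions.
    """
    fields: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))


def read_log_file(path: str) -> list[dict[str, str]]:
    # Standard log files are delivered to S3 gzip-compressed.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return list(parse_standard_log(fh))


# A hypothetical sample with only five of the standard fields.
sample = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes cs-uri-stem",
    "2025-03-01\t12:00:00\tIAD89-C1\t2048\t/tenant1/index.html",
]
record = next(parse_standard_log(sample))
print(record["x-edge-location"], record["cs-uri-stem"])  # IAD89-C1 /tenant1/index.html
```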

Athena query generation

To summarize costs in Athena, requests need to be grouped by tenant/team URI. As mentioned in the CloudFront unit cost breakdown section, the geographic region is needed for several dimensions of cost. The x_edge_location field starts with a three-letter airport code for the edge location (for example, IAD), so a CASE expression over SUBSTRING(x_edge_location, 1, 3) can associate each request with the correct geographic region and cost rate.

CASE SUBSTRING(x_edge_location, 1, 3)
    WHEN 'IAD' THEN 'United States'
    WHEN 'MEX' THEN 'Mexico'
    WHEN 'YUL' THEN 'Canada'
    -- add a WHEN branch for each edge location code you need to price
    ELSE 'Other'
END AS region

The following code takes the region derived above and, in our example, applies the $0.085/GB first-tier rate to data transferred out from the United States, Mexico, and Canada. This logic is repeated for each geographic region to cover traffic across the globe. CloudFront measures data transfer in gigabytes, where 1 GB is 2^30 bytes, which is why the summed bytes are divided by power(2, 30).

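CASE
    WHEN region IN ('United States', 'Mexico', 'Canada')
    -- decimal(18,8): decimal(10,8) cannot hold values of $100 or more
    THEN cast(sum(bytes) / power(2, 30) * 0.085 as decimal(18,8))
    -- repeat WHEN ... THEN for each geographic region and its rate
END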
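The two CASE expressions above can be mirrored in Python to sanity-check query results on a small sample. This is a sketch, not the repository's logic. It maps only the three example airport codes and raises an error for unmapped regions instead of silently pricing them at zero. The tenant and edge location values in the example are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable

# Mirrors the SQL CASE: the first three characters of x-edge-location are an
# airport code. Only the three example codes are mapped here.
EDGE_CODE_TO_REGION = {"IAD": "United States", "MEX": "Mexico", "YUL": "Canada"}
# First-tier rate for the United States, Mexico, and Canada group.
USD_PER_GB = {"United States": 0.085, "Mexico": 0.085, "Canada": 0.085}
BYTES_PER_GB = 2 ** 30  # 1 GB = 2^30 bytes, matching power(2, 30) in the SQL


def region_for(edge_location: str) -> str:
    return EDGE_CODE_TO_REGION.get(edge_location[:3], "Other")


def dto_cost_by_tenant_region(
    records: Iterable[tuple[str, str, int]],
) -> dict[tuple[str, str], float]:
    """records: (tenant, x-edge-location, sc-bytes). Sum bytes, then price."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for tenant, edge_location, sc_bytes in records:
        totals[(tenant, region_for(edge_location))] += sc_bytes
    costs = {}
    for (tenant, region), total in totals.items():
        if region not in USD_PER_GB:
            raise ValueError(f"no rate for region {region!r}; extend the maps")
        costs[(tenant, region)] = total / BYTES_PER_GB * USD_PER_GB[region]
    return costs


print(dto_cost_by_tenant_region([("tenant1", "IAD89-C1", 3 * 2 ** 30)]))
```

Summing bytes per group before multiplying by the rate matches the SQL, which applies the rate to sum(bytes) rather than to each row.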

The SQL included in the chargeback-athena-sql.sql file of the repository uses public pricing for the first pricing tier. This SQL doesn’t account for the lower per-unit rates of higher usage tiers. More complex cost estimates can be added to this solution using a pricing fact table that incorporates tiering or private pricing.
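One way to use a pricing fact table is to price total usage through the tiers once, then split that cost by each tenant's share of GB. Tiers apply to total usage, not to each tenant separately. This is a minimal sketch with a hypothetical tier table. Only the first tier's $0.085 comes from the text; the later rates are placeholders to replace with current public or private pricing. The sketch also assumes 1 TB = 1024 GB and a single geographic region's total.

```python
from typing import Optional, Sequence

# Hypothetical pricing fact table: (tier size in GB, USD per GB).
# Only the first tier's rate comes from the text; None marks the last tier.
TIERS: list[tuple[Optional[float], float]] = [
    (10 * 1024, 0.085),
    (40 * 1024, 0.080),
    (None, 0.060),
]


def tiered_cost(total_gb: float,
                tiers: Sequence[tuple[Optional[float], float]] = TIERS) -> float:
    """Price total usage by filling each tier in order."""
    cost, remaining = 0.0, total_gb
    for size, rate in tiers:
        used = remaining if size is None else min(remaining, size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


def allocate(tenant_gb: dict[str, float],
             tiers: Sequence[tuple[Optional[float], float]] = TIERS) -> dict[str, float]:
    """Price the combined total once, then split it at the blended rate."""
    total_gb = sum(tenant_gb.values())
    if total_gb == 0:
        return {tenant: 0.0 for tenant in tenant_gb}
    blended = tiered_cost(total_gb, tiers) / total_gb
    return {tenant: gb * blended for tenant, gb in tenant_gb.items()}


print(allocate({"tenant1": 15_360, "tenant2": 5_120}))
```

With the blended rate, every tenant benefits equally from the volume discount earned by the combined traffic. An organization could instead decide that the largest tenant earns the lower tiers, but that is a policy choice rather than something the bill dictates.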

Multiple per-request cost statements are joined together to create a single pane of glass for the cost of your different tenants across the different dimensions of cost (DTO, requests, and so on), as shown in Figure 4. The output includes the following cost dimensions, and their total, over time for each standard log characteristic (here, the URI):

Total data transfer out (DTO) to the internet in GB
Cost of data transfer out (DTO) to the internet, aggregated by the geographic region served
Total request costs, aggregated by the geographic region served
Proxy requests (DTOO), proxy request bytes, and costs based on non-GET requests
CloudFront Functions requests and costs, calculated as a flat per-invocation rate multiplied by the number of requests (invocations)

Figure 4. Athena cost chargeback query output breaking down cost per tenant URI per day
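The per-dimension roll-up can be sketched in Python for a small sample of parsed log records. This is an illustration under assumptions, not the repository's query. Records are assumed to be dicts keyed by standard log field names plus a hypothetical `tenant` key derived from the URI. Non-GET requests are treated as proxy (DTOO) traffic, as in the list above, and cs-bytes is used to approximate the bytes sent to the origin. All requests are assumed to be HTTPS and to invoke one CloudFront function. Every rate except $0.085/GB is an assumed US list price.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass

BYTES_PER_GB = 2 ** 30
# Assumed first-tier US rates; replace with your geographic region's rates.
RATES = {
    "dto_per_gb": 0.085,      # from the text
    "dtoo_per_gb": 0.020,     # assumption: DTO to origin
    "https_per_10k": 0.0100,  # assumption: HTTPS requests
    "cff_per_million": 0.10,  # assumption: CloudFront Functions
}


@dataclass
class TenantDay:
    requests: int = 0
    dto_bytes: int = 0
    proxy_requests: int = 0
    proxy_bytes: int = 0

    def costs(self, rates: dict[str, float] = RATES) -> dict[str, float]:
        c = {
            "dto": self.dto_bytes / BYTES_PER_GB * rates["dto_per_gb"],
            "requests": self.requests / 10_000 * rates["https_per_10k"],
            "dtoo": self.proxy_bytes / BYTES_PER_GB * rates["dtoo_per_gb"],
            # Assumes one function invocation per request.
            "functions": self.requests / 1_000_000 * rates["cff_per_million"],
        }
        c["total"] = sum(c.values())
        return c


def summarize(records: Iterable[dict[str, str]]) -> dict[tuple[str, str], TenantDay]:
    """Roll log records up to one TenantDay per (date, tenant)."""
    out: dict[tuple[str, str], TenantDay] = defaultdict(TenantDay)
    for rec in records:
        day = out[(rec["date"], rec["tenant"])]
        day.requests += 1
        day.dto_bytes += int(rec["sc-bytes"])
        if rec["cs-method"] != "GET":
            # Proxy (DTOO) traffic; cs-bytes (viewer request size, headers
            # included) approximates the bytes forwarded to the origin.
            day.proxy_requests += 1
            day.proxy_bytes += int(rec["cs-bytes"])
    return dict(out)
```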

Along with the query constructed for analyzing cost chargeback, we’ve included several other queries for per-tenant analysis, such as cache hit ratio and status code error count/rate, as shown in Figure 5. Cost analysis is just one of the many benefits of having tenant data in an Athena table. A few more advantages from log parsing include enhanced debuggability, user behavior analysis, and general usage exploration across your edge architecture.

Figure 5. Response status by tenant URI with count and percentage of total requests.
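The same per-tenant grouping supports operational metrics. This minimal sketch computes cache hit ratio and error rate from parsed records. It assumes dicts keyed by standard log field names plus a hypothetical `tenant` key. It also assumes that both Hit and RefreshHit in x-edge-result-type count as cache hits and that any status of 400 or above is an error. Adjust either rule to match how your team defines them.

```python
from collections.abc import Iterable


def tenant_health(records: Iterable[dict[str, str]]) -> dict[str, dict[str, float]]:
    """Per-tenant request count, cache hit ratio, and error rate."""
    stats: dict[str, dict[str, int]] = {}
    for rec in records:
        s = stats.setdefault(rec["tenant"], {"requests": 0, "hits": 0, "errors": 0})
        s["requests"] += 1
        # Assumption: Hit and RefreshHit both count as cache hits.
        if rec["x-edge-result-type"] in ("Hit", "RefreshHit"):
            s["hits"] += 1
        # 4xx and 5xx responses count as errors.
        if int(rec["sc-status"]) >= 400:
            s["errors"] += 1
    return {
        tenant: {
            "requests": s["requests"],
            "cache_hit_ratio": s["hits"] / s["requests"],
            "error_rate": s["errors"] / s["requests"],
        }
        for tenant, s in stats.items()
    }
```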

Data visualization with QuickSight

Athena is a powerful tool for exploratory SQL-based analysis and can provide the basis for data visualizations that show trends and patterns more quickly and effectively. The queries constructed in the previous section can be reused within QuickSight to create views for these visualizations.

Go to QuickSight in the Console and choose Add a new dataset on the left side of your screen.
Choose Athena as your data source and provide a name for the dataset.
When choosing your table, choose Use custom SQL and paste the SQL from the chargeback-athena-sql.sql file in the GitHub repository.
When testing the SQL, if there are permission issues, make sure that QuickSight has been granted read access to the log bucket in the QuickSight admin console.
The sample visualization shows metrics on total DTO, edge compute usage, and total requests, all segmented by path (or another standard log characteristic), which is used to associate chargeback cost with a given tenant/team.

Figure 6. Sample QuickSight chargeback visualizations

Solution considerations

Keep the following considerations in mind when building this type of solution.

Standard logs are delivered on a best-effort basis

CloudFront delivers standard access logs on a best-effort basis. They are generally not intended to be an exact or complete accounting of every CloudFront request. The log entry for a particular request might be delivered long after the request was processed and, in rare cases, a log entry might not be delivered at all. Working with a sample of the data rather than the complete set is a common practice in data analysis. Confirm that your use case fits this practice and that an incomplete dataset won’t affect your analysis.
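If exact totals matter, one option is to treat the log-derived figures as shares rather than amounts. You then split the billed CloudFront amount (for example, taken from Cost Explorer) across tenants in proportion to those shares. This is a sketch of that technique, not part of the reference solution. It assumes missing log lines are spread roughly evenly across tenants. The figures in the example are hypothetical.

```python
def scale_to_bill(log_costs: dict[str, float], billed_total: float) -> dict[str, float]:
    """Split an actual billed amount by each tenant's share of log-derived cost.

    Missing or late log lines then shift tenant shares slightly instead of
    making the charged-back total disagree with the bill.
    """
    log_total = sum(log_costs.values())
    if log_total <= 0:
        raise ValueError("no log-derived cost to allocate against")
    return {tenant: billed_total * cost / log_total for tenant, cost in log_costs.items()}


# Hypothetical figures: logs account for $40 of a $42 CloudFront line item.
print(scale_to_bill({"tenant1": 30.0, "tenant2": 10.0}, 42.0))
# {'tenant1': 31.5, 'tenant2': 10.5}
```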

Lambda@Edge, invalidations, Origin Shield, and other caveats

Unlike Lambda@Edge requests (invocations), which can be derived from the standard logs, Lambda@Edge GB-second costs must be extracted from the Lambda@Edge logs in Amazon CloudWatch. However, because Lambda@Edge runs in CloudFront Regional Edge Caches, the application logs that contain the GB-second billing metric are spread across multiple AWS Regions. These logs need to be centralized and parsed to get a complete picture of Lambda@Edge cost. Our previous post, Aggregating Lambda@Edge Logs, reviews how to centralize Lambda@Edge logs and includes examples of bringing them all together for deeper analysis. Depending on your workstream and the chargeback precision you need, it may be sufficient to baseline and extrapolate the GB-second costs. Invalidations also weren’t included in this solution, but they may be part of your total cost of goods calculation. The first 1,000 invalidation paths are free; after that, each path is charged the same regardless of the number of files it invalidates. For Origin Shield chargeback, standard access logs mark requests served from Origin Shield with the OriginShieldHit value in the x-edge-detailed-result-type field.
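Invalidation charges can be shared across tenants by the number of paths each tenant submitted. This sketch assumes the 1,000 free paths are counted per account per month and that each additional path costs $0.005 (an assumed list price). It also splits the billable amount proportionally, which is one allocation policy among several. The path counts are hypothetical.

```python
FREE_PATHS_PER_MONTH = 1_000  # assumption: counted per account per month
USD_PER_PATH = 0.005          # assumption: current list price per path


def invalidation_cost_by_tenant(paths_by_tenant: dict[str, int]) -> dict[str, float]:
    """Split the month's billable invalidation paths across tenants by share.

    A wildcard path such as /tenant1/* counts as one path, however many
    files it invalidates.
    """
    total = sum(paths_by_tenant.values())
    if total == 0:
        return {tenant: 0.0 for tenant in paths_by_tenant}
    bill = max(0, total - FREE_PATHS_PER_MONTH) * USD_PER_PATH
    return {tenant: bill * n / total for tenant, n in paths_by_tenant.items()}


print(invalidation_cost_by_tenant({"tenant1": 1_500, "tenant2": 500}))
# {'tenant1': 3.75, 'tenant2': 1.25}
```

Encouraging tenants to invalidate with a single wildcard per prefix instead of many individual paths keeps both their share and the total down.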

Conclusion

In this post, we demonstrated how to implement a simple cost chargeback solution with Amazon CloudFront standard access logs. The standard access logs are stored in Amazon S3 and analyzed with AWS Glue Data Catalog, Amazon Athena, and Amazon QuickSight to provide granular cost insights by grouping per-request costs at the CloudFront tenant/team level. This solution is meant to improve cost transparency for multi-tenant CloudFront distributions and help derive the total cost of edge services for software as a service (SaaS) businesses and other organizations. To apply this chargeback solution to your existing workload, we recommend deploying this reference implementation in a lower-level environment and tuning it to fit your chargeback needs. For more information about Amazon CloudFront, refer to our developer guides and previous blog posts.

About the authors

Nick McCord

Nick is a Solutions Architect at AWS, primarily focusing on PE and VC funded startup customers. He regularly engages with founders and executives as a trusted technical advisor on best practices for solving complex problems to fuel efficient long-term growth. Prior to AWS, he worked in software development, site reliability, DevOps, and data engineering roles. He lives in Virginia with his three dogs and is passionate about early career development and professional mentorship.

Alex Moening

Alex is a Senior Edge Solutions Architect at AWS who brings innovative thinking and deep problem-solving expertise to cloud technology. He specializes in Content Delivery Networks and has worked on numerous high-scale applications for his customers, including some of the largest names in the delivery ecosystem. Alex actively contributes to industry research, including the annual Web Almanac, and shares his expertise through interactive sessions and presentations at industry events such as AWS re:Invent.

Networking & Content Delivery