AWS Cloud Operations Blog
Analyze Azure Audit Logs with CloudTrail Lake
Introduction
In the ever-evolving world of cloud computing, maintaining robust security and compliance is paramount. As usage of multicloud environments grows, the need for comprehensive monitoring and logging solutions becomes more critical. Enter the synergy of Azure Audit Logs and AWS CloudTrail Lake—a powerful combination that provides comprehensive visibility across your cloud environments.
Azure Audit Logs offer detailed insights into operations performed within your Azure environment, tracking changes and activities to ensure compliance and security. On the other hand, AWS CloudTrail Lake enables you to store, access, and analyze your cloud activity logs in a centralized location, offering advanced querying capabilities and long-term data retention.
In this blog, we’ll walk you through deploying the sample solution to ingest your Azure activity logs into a CloudTrail Lake event data store, and provide example SQL queries to analyze the data and aid in security investigations. Whether you’re a cloud architect, security analyst, or IT manager, this guide will equip you with the knowledge to harness the full potential of your cloud audit logs.
Prerequisites
To follow along with this walkthrough, you must have the following:
• AWS CLI – Install the AWS CLI
• SAM CLI – Install the SAM CLI. The Serverless Application Model Command Line Interface (SAM CLI) is an extension of the AWS CLI that adds functionality for building and testing AWS Lambda applications.
• Python 3.12 or higher – Install Python
• An AWS account with an AWS Identity and Access Management (IAM) role that has sufficient access to provision the required resources.
• An Azure account that has sufficient access to provision the required resources and permissions to send logs to AWS.
AWS Setup
1. Enable CloudTrail Lake by following the steps in this blog post
2. Create an AWS Secrets Manager secret to store the connection string for the Azure Storage account.
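For example, you can create the secret with a short Python and boto3 sketch like the one below. The secret name AzureStorageConnectionString and the placeholder value are illustrative only, and you can just as easily use the Secrets Manager console or AWS CLI; the actual connection string comes from the Azure setup in the next section.
import boto3

# Store the Azure Storage connection string in AWS Secrets Manager.
# The secret name and value below are placeholders; supply your own connection string.
secrets_client = boto3.client("secretsmanager")
secrets_client.create_secret(
    Name="AzureStorageConnectionString",
    SecretString="<your-azure-storage-connection-string>",
)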
Azure Setup
1. Create an Azure storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, and tables. The storage account provides a unique namespace for your Azure Storage data that is accessible from anywhere in the world over HTTP or HTTPS. For more information about Azure storage accounts, see Storage account overview.
2. Register a client application in Microsoft Entra ID.
3. Provide the connection string for the Azure storage account, which includes the DefaultEndpointsProtocol, AccountName, AccountKey, and EndpointSuffix, for the AWS CloudFormation stack deployment.
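For reference, an Azure Storage connection string generally has the following shape (the values shown are placeholders):
DefaultEndpointsProtocol=https;AccountName=<storage-account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net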
Solution Overview
Figure 1: Solutions Architecture Diagram
1. Azure is configured to export activity and audit logs to an Azure blob storage container in a blob named PT1H.json. Each PT1H.json blob contains a JSON object with events from log files that were received during the hour specified in the blob URL.
2. An Amazon EventBridge scheduled rule invokes the Lambda function used for aggregating Azure log files.
3. The aggregator Lambda function queries a DynamoDB table to obtain the timestamp for the last time Azure log files were processed by our solution. This enables the function to retrieve only newer log files and avoid processing duplicate events.
4. The aggregator Lambda function retrieves the Azure Storage connection string from an AWS Secrets Manager secret.
5. The aggregator Lambda function connects to the Azure blob storage container and collects the path of any log file that was modified after the timestamp obtained in step 3.
6. The log file paths collected in the previous step are enqueued in an Amazon SQS queue for further processing by the Lambda function responsible for processing log files.
7. The processor Lambda function is invoked automatically by an event source mapping when Azure log file paths are enqueued to the SQS queue in the previous step.
8. The processor Lambda function retrieves the Azure Storage connection string from an AWS Secrets Manager secret.
9. The processor Lambda function dequeues a batch of up to ten log file paths from the SQS queue and downloads each log file from the Azure blob storage container.
10. The processor Lambda function iterates over each log file, parsing every event and converting it to match the CloudTrail Lake customer event schema. Each event is then ingested into the CloudTrail Lake event data store using the PutAuditEvents API (a minimal sketch of this mapping follows this list).
11. If any events cannot be parsed or converted to the CloudTrail Lake customer event schema, or if any call to the PutAuditEvents API fails, the processor Lambda function enqueues the corresponding events into a second SQS queue that can be used to investigate the cause of the failures.
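The following is a minimal sketch of the mapping performed in step 10, not the exact code from the sample repository: the Azure field names (operationName, time, callerIpAddress, identity), the channel_arn variable, and the helper values are assumptions that may differ from your log export and from the deployed solution.
import json
import uuid
import boto3

# CloudTrail Lake ingestion for non-AWS events goes through the cloudtrail-data API.
cloudtrail_data = boto3.client("cloudtrail-data")

def ingest_azure_event(azure_event, channel_arn, account_id):
    # Map one Azure activity log record to the CloudTrail Lake customer event schema.
    event_data = {
        "version": "1.0",
        "userIdentity": {
            "type": "AzureIdentity",
            "principalId": str(azure_event.get("identity", "unknown")),
        },
        "eventSource": azure_event.get("operationName", "").split("/")[0],  # e.g. MICROSOFT.STORAGE
        "eventName": azure_event.get("operationName", ""),
        "eventTime": azure_event.get("time", ""),
        "sourceIPAddress": azure_event.get("callerIpAddress", ""),
        "UID": str(uuid.uuid4()),          # a unique ID for the event
        "recipientAccountId": account_id,
    }
    # Events can be batched in a single PutAuditEvents call; one event is shown here.
    response = cloudtrail_data.put_audit_events(
        channelArn=channel_arn,
        auditEvents=[{"id": str(uuid.uuid4()), "eventData": json.dumps(event_data)}],
    )
    return response["failed"]  # any events rejected by CloudTrail Lake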
Deploying the Solution
1. Use git to clone this repository to your workspace area. SAM CLI should be configured with AWS credentials from the AWS account where you plan to deploy the example. Run the following commands in your shell to begin the guided SAM deployment process:
git clone https://github.com/aws-samples/aws-cloudtrail-lake-analyze-azure-audit-logs.git
cd aws-cloudtrail-lake-analyze-azure-audit-logs/SAM
sam build
sam deploy --guided
2. Provide values for the CloudFormation stack parameters.
a. CloudTrailEventDataStoreArn (Optional) – ARN of the event data store that will be used to ingest the Azure activity events. If no ARN is provided, a new event data store will be created in the target region.
Note: Please be aware that there is a quota on the number of event data stores that can be created per region. See Quotas in AWS CloudTrail for more details. Additionally, after you delete an event data store, it remains in the PENDING_DELETION state for seven days before it is permanently deleted, and continues to count against your quota during that time. See Manage event data store lifecycles for more details.
b. CloudTrailEventRetentionPeriod – The number of days to retain events ingested into CloudTrail. The minimum is 7 days and the maximum is 2,557 days. Defaults to 7 days.
c. AzureStorageContainerName – Name of the Azure Storage container where Azure activity events are stored.
d. AzureStorageConnectionStringSecret – Name of the Secrets Manager secret which contains the Azure Storage connection string.
3. After SAM has successfully deployed the example, check the outputs and note the EventDataStoreId value that is returned. This EventDataStoreId will be needed to query the CloudTrail Lake event data store.
Figure 2: Successful SAM deployment output
4. The Lambda function responsible for aggregation has a scheduled trigger attached that invokes the function every hour. After the CloudFormation stack is deployed successfully, the function will be automatically invoked by an EventBridge scheduled rule for the first time within 1 hour of the deployment.
You can verify that a Lambda function has been invoked at least once by navigating to the AWS Lambda Console and looking at the Invocations graphs on the Monitoring tab. For more details, see Monitoring functions on the Lambda console. You can also invoke the function manually to ingest events right away. For more details, see Understanding Lambda function invocation methods.
Once the aggregator Lambda function is invoked, it queries the Amazon DynamoDB table to retrieve the timestamp for the last time log files were processed, then searches the Azure storage container for log files that are newer than that timestamp. If no timestamp exists, the function aggregates all existing log files. The function then enqueues the file path of each in-scope log file to the SQS queue.
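As a rough illustration of this aggregation logic, here is a sketch only, assuming the azure-storage-blob Python SDK; the table key, item layout, and queue variable names are placeholders rather than the exact ones created by the stack.
import json
from datetime import datetime, timezone

import boto3
from azure.storage.blob import BlobServiceClient

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")

def aggregate_new_log_files(connection_string, container_name, table_name, queue_url):
    # Read the timestamp of the last successful run; if none exists, process everything.
    table = dynamodb.Table(table_name)
    item = table.get_item(Key={"id": "last_processed"}).get("Item")
    last_processed = (
        datetime.fromisoformat(item["timestamp"])
        if item else datetime.min.replace(tzinfo=timezone.utc)
    )

    # List blobs in the Azure container and enqueue only those modified since the last run.
    container = BlobServiceClient.from_connection_string(connection_string) \
                                 .get_container_client(container_name)
    for blob in container.list_blobs():
        if blob.last_modified > last_processed:
            sqs.send_message(QueueUrl=queue_url,
                             MessageBody=json.dumps({"blob_name": blob.name}))

    # Record this run so the next invocation only picks up newer log files.
    table.put_item(Item={"id": "last_processed",
                         "timestamp": datetime.now(timezone.utc).isoformat()})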
5. When a log file path is enqueued, the Processor Lambda function will be automatically triggered by the SQS queue and will download each enqueued log file from the Azure storage container. Each log file will be parsed for individual events, each of which will be formatted to adhere to the CloudTrail Lake schema for customer events. See CloudTrail Lake integrations event schema for more details.
The Processor function will attempt to ingest each formatted event into the CloudTrail Lake event data store. Any events which cannot be successfully ingested to CloudTrail Lake will be enqueued into a separate SQS queue for failed events.
6. After Azure log events have been successfully ingested to your CloudTrail Lake event data store, you can follow the steps below to analyze your Azure activity logs using the CloudTrail Lake SQL-based sample queries.
Please note that CloudTrail typically delivers events within an average of about 5 minutes of an API call, though this time is not guaranteed. Therefore, after the Lambda function is invoked there may be an additional delay of about 5 minutes before the events can be queried in CloudTrail Lake.
Note: In the event that the Lambda function encounters any issues during execution, you can inspect the CloudWatch logs for the function. For more information, see Accessing Amazon CloudWatch logs for AWS Lambda.
Analyzing the Data
Example: Analyzing a security incident
To verify that the Azure audit logs are available in your AWS CloudTrail Lake event data store, run the sample queries below against the event data store following these instructions: Run a query and save query results
Make sure you replace <EventDataStoreID> with the Id of the event data store, which can be found in the Outputs returned after a successful deployment with SAM.
Finally, ensure the dates are updated to encompass a period following the deployment of the Lambda function.
1. List all the storage accounts in the Azure environment using the query below, replacing <EventDataStoreID> with the ID of the event data store in your AWS account.
select * from <EventDataStoreID> where eventData.eventName Like 'MICROSOFT.STORAGE/STORAGEACCOUNTS%' limit 10
2. List the operations performed on the storage account.
SELECT
eventData.uid,
eventData.eventName,
eventData.sourceipaddress,
eventData.recipientaccountid,
eventData.eventtime
FROM
<EventDataStoreID>
WHERE
eventData.eventSource = 'MICROSOFT.STORAGE'
AND eventData.eventName LIKE 'MICROSOFT.STORAGE/STORAGEACCOUNTS%'
AND eventData.uid IS NOT NULL
limit 10
3. Identify the user who deleted the storage container.
select * from <EventDataStoreID> where eventData.eventName = 'MICROSOFT.STORAGE/STORAGEACCOUNTS/BLOBSERVICES/CONTAINERS/DELETE' and eventTime > '2024-12-10 00:00:00'
Clean-Up
You can use SAM CLI to delete the deployed resources and make sure that you don’t continue to incur charges. To delete the resources, run the following command from your shell and replace <stack-name> with the stack name you provided to SAM when running sam deploy. Follow the prompts to confirm the resource deletion.
sam delete --stack-name <stack-name>
Figure 3: Cleaning up solution using SAM CLI
Conclusion
In this post, we showed how to integrate Azure Audit Logs with Amazon CloudTrail Lake, enabling cloud administrators to analyze logs and monitor security events across cloud environments from a centralized location. This integration, combined with the automation shown in the solution, streamlines security investigations by providing a unified platform to search and analyze activity data from multiple cloud sources, making it easier to respond to and investigate potential security incidents.
About the Authors