AWS Database Blog

Introducing Amazon Keyspaces CDC streams

With Amazon Keyspaces (for Apache Cassandra), you can run your Apache Cassandra workloads on AWS using a fully managed, serverless database service. You can scale your Cassandra applications with virtually unlimited throughput and storage, while maintaining millisecond latency. Because Amazon Keyspaces is Cassandra-compatible, you can use your existing application code and developer tools without having to make additional changes. AWS manages the underlying infrastructure, handling time-consuming tasks like server provisioning, patching, and backups, so you can focus on building innovative applications.

Last week, AWS announced Amazon Keyspaces change data capture (CDC) streams, a new feature that captures real-time data changes in your Amazon Keyspaces tables. You can track and respond to data manipulations (such as insert, update, or delete) instantly, unlocking new possibilities for your applications. With CDC streams, you can stream data changes to services like Amazon OpenSearch Service, enabling advanced search capabilities and generative AI use cases. You can also synchronize data with your data warehouses for analytics and reporting, or build sophisticated event-driven architectures. With this feature, you can access the full potential of your data while maintaining the scalability and reliability of a fully managed service.

In this post, we discuss the architecture of Amazon Keyspaces CDC streams, explore its use cases and benefits, and provide an example demonstrating how to set up CDC streams, stream data, and capture the streamed records.

Solution overview

Amazon Keyspaces CDC streams provide a robust way to capture row-level modifications in tables, storing them as an ordered sequence in a log for 24 hours. Each time a row is inserted, updated, or deleted, the CDC stream generates a detailed record containing the primary key information along with both the previous and new states of the row. This enables applications to consume and process these changes in near real time. When you enable a stream on a table, the stream organizes records into shards based on primary key ranges. Within each shard, records are assigned unique, monotonically increasing sequence numbers to maintain strict ordering. Amazon Keyspaces automatically manages the shard lifecycle, splitting or merging shards based on traffic patterns. This dynamic management provides optimal performance while maintaining the ability to track record lineage through the shard history.
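
Because shards can split or merge, a consumer that cares about ordering typically reads a parent shard before its children. The following is a minimal sketch of that idea, not the service's own algorithm. It assumes shard descriptions shaped like the get-stream output shown later in this post (the shardId and parentShardIds fields); the function name order_shards_by_lineage is hypothetical.

```python
from typing import Dict, List


def order_shards_by_lineage(shards: List[dict]) -> List[str]:
    """Return shard IDs so that every parent comes before its children."""
    known = {s["shardId"] for s in shards}
    # Keep only parents that are still listed; null or missing entries are dropped.
    parents: Dict[str, List[str]] = {
        s["shardId"]: [p for p in (s.get("parentShardIds") or []) if p in known]
        for s in shards
    }
    ordered: List[str] = []
    seen = set()

    def visit(shard_id: str) -> None:
        if shard_id in seen:
            return
        seen.add(shard_id)
        for parent in parents[shard_id]:
            visit(parent)  # emit ancestors first
        ordered.append(shard_id)

    for s in shards:
        visit(s["shardId"])
    return ordered
```

Root shards (those whose parent list is empty or contains only null) keep their listed order; a child is emitted only after every parent that is still present in the shard list.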

The following diagram illustrates this solution.

CDC streams provide two guarantees: each mutation record appears exactly once in the stream, and records for a given primary key appear in the same order as the actual mutations. This provides data consistency and proper ordering for downstream applications. CDC streams retain records for 24 hours, a fixed retention period that applies even after CDC is disabled on a table. Batch operations are automatically decomposed into individual row-level records while maintaining proper sequencing. For static columns, which share values across all rows in a partition, mutations are captured as separate records to accurately reflect the data model.

CDC data is automatically encrypted at rest using the same encryption keys as the underlying table. In multi-Region deployments, CDC streams operate independently in each AWS Region. This means that although each Region maintains its own consistent CDC stream, the order of events might vary across Regions due to the asynchronous nature of multi-Region replication and conflict resolution.

In the following sections, we explore how to enable streams and access them for downstream use cases.

Accessing the endpoints

When a CDC stream is created for the table, Amazon Keyspaces starts to capture information about modifications in the table. The CDC stream is identifiable as an Amazon Resource Name (ARN) with the following format:

arn:<AWS_PARTITION>:cassandra:<REGION>:<AWS_ACCOUNT_ID>:/keyspace/<KEYSPACE_NAME>/table/<TABLE_NAME>/stream/<STREAM_LABEL>
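
As an illustration of the ARN layout above, the following sketch builds and parses a stream ARN. The helper names build_stream_arn and parse_stream_arn are hypothetical, and the Region and account ID in the usage note are placeholder values.

```python
import re
from typing import Dict

# Mirrors the documented layout:
# arn:<partition>:cassandra:<region>:<account>:/keyspace/<ks>/table/<tbl>/stream/<label>
_STREAM_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):cassandra:(?P<region>[^:]+):(?P<account_id>[^:]+):"
    r"/keyspace/(?P<keyspace>[^/]+)/table/(?P<table>[^/]+)/stream/(?P<label>.+)$"
)


def build_stream_arn(partition: str, region: str, account_id: str,
                     keyspace: str, table: str, label: str) -> str:
    """Assemble a stream ARN from its components."""
    return (
        f"arn:{partition}:cassandra:{region}:{account_id}:"
        f"/keyspace/{keyspace}/table/{table}/stream/{label}"
    )


def parse_stream_arn(arn: str) -> Dict[str, str]:
    """Split a stream ARN into its components, or raise ValueError."""
    match = _STREAM_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a Keyspaces stream ARN: {arn!r}")
    return match.groupdict()
```

For example, parse_stream_arn(build_stream_arn("aws", "us-east-1", "111122223333", "media", "media_content", "2025-05-16T20:23:29.918")) returns the six components as a dictionary. The stream label can itself contain colons, which is why it is matched last.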

You can select the type of information (or view type) that the CDC stream carries within each record when you enable the CDC stream. The view type of the stream can’t be changed after the CDC stream is enabled. Amazon Keyspaces supports the following view types:

  • NEW_AND_OLD_IMAGES (default) – Captures the versions of the row before and after the mutation
  • NEW_IMAGE – Captures the version of the row after the mutation
  • OLD_IMAGE – Captures the version of the row before the mutation
  • KEYS_ONLY – Captures the partition and clustering keys of the row that was mutated
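
To make the difference between view types concrete, the following sketch projects a single mutation into what each view type would carry. The output layout (keys, oldImage, newImage as plain dictionaries) is a simplification for illustration and is not the exact record format returned by the service.

```python
from typing import Optional

VIEW_TYPES = ("NEW_AND_OLD_IMAGES", "NEW_IMAGE", "OLD_IMAGE", "KEYS_ONLY")


def project_record(view_type: str, keys: dict,
                   old_row: Optional[dict], new_row: Optional[dict]) -> dict:
    """Return the parts of a mutation that a given view type exposes."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    record = {"keys": keys}  # every view type includes the primary key
    if view_type in ("NEW_AND_OLD_IMAGES", "OLD_IMAGE"):
        record["oldImage"] = old_row
    if view_type in ("NEW_AND_OLD_IMAGES", "NEW_IMAGE"):
        record["newImage"] = new_row
    return record
```

With KEYS_ONLY, a consumer learns which row changed and must read the table to see its current contents; with NEW_AND_OLD_IMAGES, it can compute a diff without an extra read.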

You can consume CDC streams using either the Amazon Keyspaces Streams API or the Kinesis Client Library (KCL). Although both methods have their advantages, the KCL provides built-in features like automatic shard management, fault tolerance, and worker load balancing, which help with complex data processing at scale. The direct API approach offers more granular control and is well-suited for simpler use cases where you need a custom implementation of the streaming logic. For more information on how to set up and use the KCL, refer to Use the Kinesis Client Library (KCL) to process Amazon Keyspaces streams.

Prerequisites

You’ll need an AWS account with appropriate permissions to proceed. Refer to the documentation for the required permissions.

Enable CDC streams

Let’s consider a use case where a digital media platform needs to store and manage uploaded content from its creators. Each piece of content (video, image, or audio) is assigned a unique content_id and tracked with metadata including the upload time and processing status. As content flows into this table, downstream applications use the Streams API to monitor changes and trigger follow-up processing for content recommendation systems, user notifications, or analytics dashboards.

Complete the following steps to enable CDC streams:

  1. On the Amazon Keyspaces console, choose Keyspaces in the navigation pane.
  2. Choose Create keyspace.
  3. Name the keyspace media. It will hold the media_content table, which stores the content metadata.
  4. Choose Create keyspace.
  5. Open the CQL editor to create the table with the following command:
    CREATE TABLE media.media_content (
        content_id uuid,
        title text,
        creator_id uuid,
        media_type text,
        upload_timestamp timestamp,
        status text,
        PRIMARY KEY (content_id)
    );
  6. Choose Run command.
  7. Under Stream details, select Turn on streams to capture data changes to the table, and keep the default view type.
  8. Choose Save changes.

After the stream is enabled, you should see the stream status as on and the view type as New and old images.
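
If you prefer to script this step instead of using the console, the following is a rough sketch using an AWS SDK for Python (Boto3) style client. It assumes that the Amazon Keyspaces UpdateTable operation accepts a cdcSpecification parameter with status and viewType fields; verify the parameter names against the current API reference before relying on them. The helper name enable_cdc_stream is hypothetical.

```python
from typing import Any

VIEW_TYPES = ("NEW_AND_OLD_IMAGES", "NEW_IMAGE", "OLD_IMAGE", "KEYS_ONLY")


def enable_cdc_stream(client: Any, keyspace: str, table: str,
                      view_type: str = "NEW_AND_OLD_IMAGES") -> Any:
    """Request that a CDC stream be turned on for an existing table.

    `client` is expected to behave like boto3.client("keyspaces").
    The cdcSpecification shape below is an assumption, not confirmed here.
    """
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    return client.update_table(
        keyspaceName=keyspace,
        tableName=table,
        cdcSpecification={"status": "ENABLED", "viewType": view_type},
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   enable_cdc_stream(boto3.client("keyspaces"), "media", "media_content")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before you point it at a real account.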

You have completed the setup to enable the streams for the table media_content. Now let’s insert some data and see how you can access the data from the streams.

Access the streams

Before accessing the streams, let’s insert some data from the CQL editor. We use the following INSERT statements:

INSERT INTO media.media_content (content_id, title, creator_id, media_type, upload_timestamp, status) VALUES (uuid(), 'Summer Vacation Video', uuid(), 'video', toTimestamp(now()), 'active');
INSERT INTO media.media_content (content_id, title, creator_id, media_type, upload_timestamp, status) VALUES (uuid(), 'Birthday Party Photos', uuid(), 'image', toTimestamp(now()), 'processing');
INSERT INTO media.media_content (content_id, title, creator_id, media_type, upload_timestamp, status) VALUES (uuid(), 'Podcast Episode 1', uuid(), 'audio', toTimestamp(now()), 'active');
INSERT INTO media.media_content (content_id, title, creator_id, media_type, upload_timestamp, status) VALUES (uuid(), 'Wedding Ceremony', uuid(), 'video', toTimestamp(now()), 'archived');
INSERT INTO media.media_content (content_id, title, creator_id, media_type, upload_timestamp, status) VALUES (uuid(), 'Concert Recording', uuid(), 'audio', toTimestamp(now()), 'processing');

Now you can fetch the change records of the table using the AWS CLI to call the new APIs of Amazon Keyspaces CDC streams.

  1. Retrieve the shards within the stream using get-stream:
    aws keyspacesstreams get-stream \
    --stream-arn arn:aws:cassandra:<AWS_REGION>:<account_id>:/keyspace/media/table/media_content/stream/2025-05-16T20:23:29.918 \
    --endpoint https://cassandra-streams.<AWS_REGION>.api.aws

    The output displays the shards within the CDC stream as follows:

    {
        "streamArn": "arn:aws:cassandra:<AWS_REGION>:<account_id>:/keyspace/media/table/media_content/stream/2025-05-16T20:23:29.918",
        "streamLabel": "2025-05-16T20:23:29.918",
        "streamStatus": "ENABLED",
        "streamViewType": "NEW_AND_OLD_IMAGES",
        "creationRequestDateTime": "2025-05-16T15:23:29.918000-05:00",
        "keyspaceName": "media",
        "tableName": "media_content",
        "shards": [
            {
                "shardId": "shardId-00000001747427011617-c6dcfa62",
                "sequenceNumberRange": {
                    "startingSequenceNumber": "6500003910786539459926"
                },
                "parentShardIds": [
                    null
                ]
            },
            {
                "shardId": "shardId-00000001747427011721-ee994773",
                "sequenceNumberRange": {
                    "startingSequenceNumber": "6300003358018353308682"
                },
                "parentShardIds": [
                    null
                ]
            },
            {
                "shardId": "shardId-00000001747427011822-286d4fbc",
                "sequenceNumberRange": {
                    "startingSequenceNumber": "6400000689805508842349"
                },
                "parentShardIds": [
                    null
                ]
            },
            {
                "shardId": "shardId-00000001747427011926-6cb2833c",
                "sequenceNumberRange": {
                    "startingSequenceNumber": "6400002768480685596628"
                },
                "parentShardIds": [
                    null
                ]
            }
        ]
    }
  2. To fetch records from the CDC stream, you first need a shard iterator, which you obtain with the get-shard-iterator API. The iterator marks the starting position within a shard so you can retrieve the records in sequence. For this example, we use an iterator of type TRIM_HORIZON, which starts at the oldest record still retained in the shard (the trim point):
    aws keyspacesstreams get-shard-iterator \
    --stream-arn arn:aws:cassandra:<AWS_REGION>:<account_id>:/keyspace/media/table/media_content/stream/2025-05-16T20:23:29.918 \
    --endpoint https://cassandra-streams.<AWS_REGION>.api.aws \
    --shard-id 'shardId-00000001747427011617-c6dcfa62' \
    --shard-iterator-type TRIM_HORIZON
    
    Output:
    {
        "shardIterator": "arn:aws:cassandra:<AWS_REGION>:<account_id>:/keyspace/media/table/media_content/stream/2025-05-16T20:23:29.918|1|AAAAAAAAAAGGbMgz4HIyXo9L8aTaJd1ywoUKycEbfPNeN+ZDJZXYVXgBKdAqkaX90P0OAhVOhjqqvmf09IMSsoheb2ej7QcLhziQOxN8O2Syi50SOIpnrcgRgfuoXu1v3Y/ZD6eKeBU8LvrI4gowtZ0FGA7+KPtKYA4o2LxMGNaTvntfqtEDJgCdFPxavIDSpyo2S4pSsY+5o4BHbUTslnsrb0RAj4fs27NyxwLTAO5v8bu7bexMmGWEclpKS6mZ0P8eAJaHLqlRMjlsql1TtryBEVVAUC47JsXQxgNTwBlkbo03i+jFDEG9bDtd7UymbTeeR5mi9K+FzBYOktcBe+1HgIpSzNu2BnqsHqXqAKWjLAUBgjlwTRe3ZaUoF7C3iCNigNjtHD3ySzq0Kh7aI7l5jxKVcDVQTtGXjf9T6V4XM9Upvurva+t9axGSGZ3OSS7EjF5nfayPpkgRFTgf/rkL3yRsNliRjyZ6znsK0KoZxw8xv/TMIOdxUIlw+5QUV5GZV+S9If2KQmxEwQac6tqt+Zcea9ucovR7t2uudHzFyKucW0gqDxC3q+NQEoWKAz8aS7qnRhpwMcFxLBucC2BVfhKFE6QjRNXW+h7+6T80eyYwCc/rNxjFeJoLC9nJ/FoL2ZG7dEqQWxVAfZiuYIJYDT8C0p8A/YPSeUsFMZcjPWyrAj9JD3TprREBa5+xGDi7B9dWPM7jC3xgcFOGhgUUIxgs0M724vR+Ehn+K3QfgxhiOj8LaOOIKS0uWW9pOszXjwkyKHFyk7gHZj59FWjjqToQVshkIj/IKLlFRGD8T1FpH/JiklWmrGkG8FiGKIXM8t+PnvRLGAJU9hopt6Ob70oyUuu9uNxbQKN+wBYQ6SaviAzrsquMcOC/sTq5T0/dF7RDrZHtwPqZ9nBFauc2MF/lc+pk1XbvhHoQ4x8kONI6a54KIxscIA279ZEV51oGH172PyQ="
    }
  3. Using the shard iterator obtained earlier, you can now fetch the CDC records through the get-records API call:
    aws keyspacesstreams get-records \
    --shard-iterator 'arn:aws:cassandra:<AWS_REGION>:<account_id>:/keyspace/media/table/media_content/stream/2025-05-16T20:23:29.918|1|AAAAAAAAAAFTfD2zqaaYW74Jnrddm2Q9Zd3oxxAoYEMz1w5oIr+ynzpDFejQP0BQeJI9hpgW7VEXEFwaXoSodcI04eZpteotlqwEyEu6kSqXX3BAgKZzjdeWIooe0OrGukS5y/IEilRQ36HZSGKIySZBhGj/3RFCWb9X6pKQpnQ5w2kFooOe7RNTdj9DbzIMIc27p2Lx51bI7kDuEwdl/TnC6JbzfXLxx+/Nm2isaeWAaUeLuwN0KVL3UnHwtxYamQkfH586DpAaaplBvEjwg46JhdnehhTqWDg/6IVk4FkpbwT1HaQQ6gmUoVfPcJ7psboD0tBLZj/ei9tlTPUNhXh7o3sYhggUDAhmeP5hbVaOnyzLqXrw/eNpXhltXIS6QbPb3lMBLr+VsVeZHqvzNWEM8BwzgTnkfzgYQ3kkFkGaLdiOwBXeIIXG2hpRvKURUc+iWj1EPZQgPUjbBycA6Zb8YMMnt5ULbUti42HalpGizK8isaCE5Y8SK2C77GcE/r5HSLufFRw4RV/nFES/BuGY8UhEN0yOwTYUQ2PxZ9GD8hjSstb/Q+Y6/DIU14fX8FB7gjxMk1z3+gKizSZbTSzg3+uangnOC5vtOGUhxGTjQo5mrLIIlIuOt0DegciEWFZvwoohZZpnrEW16erHHUhTIPw1Oxe5n65WrMMmLJ5vtlFFXYBWywKKATC5d0kEefwEQhYCRM7zRYE0VzYB+uuoT2+OYqvISn76sSOIk5g+K1Dvp4QAlg2hTVU3VV8HaSGwzpes1KvptlV5vGRTcKaFnQnRteSBrrGtkJDynWBe5Iq/nJ191nyO2GsY9CrWwVg0ITzmmO8UhZU6DaXtWrPKfUPAiUqitwwnYUhwSC9kJ1by3iP6Tl1ldUTN+hlJoYwW2puQkR+XQj6999+r3+kBDfIF0L/SURigkdxsgw3BV5QMkreEZE6O/VAmsUgMNqtTWCuZaTY=' \
    --endpoint https://cassandra-streams.<AWS_REGION>.api.aws

    The output contains the change records for the data that you inserted. Repeat steps 2 and 3 for each of the other shard IDs returned in step 1.
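
The same three calls can be chained in application code. The following is a minimal sketch, assuming a client that behaves like the AWS SDK for Python (Boto3) keyspacesstreams client with get_stream, get_shard_iterator, and get_records methods, and assuming that get_records returns its records under changeRecords together with a nextShardIterator. Check those field names against the API reference. The function names read_shard and read_stream and the max_pages cap are ours.

```python
from typing import Any, Iterator


def read_shard(client: Any, stream_arn: str, shard_id: str,
               max_pages: int = 10) -> Iterator[dict]:
    """Yield change records from one shard, starting at TRIM_HORIZON."""
    iterator = client.get_shard_iterator(
        streamArn=stream_arn,
        shardId=shard_id,
        shardIteratorType="TRIM_HORIZON",
    )["shardIterator"]
    for _ in range(max_pages):  # bounded so the sketch never polls forever
        if not iterator:
            break  # no further iterator was returned for this shard
        page = client.get_records(shardIterator=iterator)
        yield from page.get("changeRecords", [])
        iterator = page.get("nextShardIterator")


def read_stream(client: Any, stream_arn: str, max_pages: int = 10) -> Iterator[dict]:
    """Yield change records from every shard listed by get_stream."""
    stream = client.get_stream(streamArn=stream_arn)
    for shard in stream.get("shards", []):
        yield from read_shard(client, stream_arn, shard["shardId"], max_pages)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for record in read_stream(boto3.client("keyspacesstreams"), "<stream ARN>"):
#       print(record)
```

This sketch reads shards one after another and ignores pagination of the shard list, shard lineage, and retries, which is acceptable for a small demo table but not for production traffic.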

You have now walked through how to set up and access the streams using the APIs. You can also integrate with other AWS services like AWS Lambda.

Considerations

If your media content changes generate several MBps of records, consuming the stream directly through the APIs becomes challenging because you must handle shard management, checkpointing, and scaling yourself. For such high-throughput use cases, we strongly recommend using the KCL instead of direct API calls. The KCL provides built-in features for managing worker coordination, checkpointing, and automatic scaling of stream processors, making it suitable for processing large volumes of streaming data in production environments.
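
To give a sense of what checkpointing involves when you call the APIs directly, the following sketch tracks the last processed sequence number per shard and builds the parameters for resuming from it. It is an in-memory illustration only; a real consumer needs durable, shared storage, which the KCL manages for you. It assumes that sequence numbers are decimal strings (as in the get-stream output earlier) and that AFTER_SEQUENCE_NUMBER is an accepted shard iterator type. The function names are hypothetical.

```python
from typing import Dict


def record_progress(checkpoints: Dict[str, str], shard_id: str,
                    sequence_number: str) -> None:
    """Remember the highest sequence number processed for a shard."""
    previous = checkpoints.get(shard_id)
    # Compare numerically; the values are too large to compare safely as text.
    if previous is None or int(sequence_number) > int(previous):
        checkpoints[shard_id] = sequence_number


def resume_request(stream_arn: str, shard_id: str,
                   checkpoints: Dict[str, str]) -> dict:
    """Build get-shard-iterator parameters that resume after the checkpoint."""
    request = {"streamArn": stream_arn, "shardId": shard_id}
    last = checkpoints.get(shard_id)
    if last is None:
        # Nothing processed yet: start from the oldest retained record.
        request["shardIteratorType"] = "TRIM_HORIZON"
    else:
        request["shardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        request["sequenceNumber"] = last
    return request
```

Recording progress only after a record has been fully processed means that a restart may reprocess a few records but does not skip any, so downstream handlers should tolerate duplicates.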

Benefits and use cases

The following are some of the benefits and common use cases for using Amazon Keyspaces CDC streams:

  • Real-time data integration:
    • Enable seamless synchronization between Amazon Keyspaces and data warehouses
    • Stream changes to analytics platforms for real-time insights
    • Build event-driven architectures with immediate response to data changes
  • Search and analytics use cases:
    • Keep search indexes up-to-date by streaming changes to OpenSearch Service
    • Enable full-text search capabilities on your Cassandra data
    • Power real-time analytics dashboards with fresh data
  • Compliance and audit trail:
    • Maintain detailed history of data modifications
    • Track who changed what and when for compliance requirements
    • Enable point-in-time recovery and data reconciliation

These are some of the use cases that make Amazon Keyspaces CDC streams a powerful tool for building modern, data-driven applications that require real-time processing and integration capabilities.
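
As a small illustration of the search use case, the following sketch turns one change into an action for a search index. It makes two simplifying assumptions that you should check against real records: the keys and the new image have already been flattened into plain dictionaries, and a delete arrives without a new image. The function name to_index_action and the tuple layout are hypothetical and not tied to any particular search client.

```python
from typing import Optional, Tuple


def to_index_action(keys: dict,
                    new_image: Optional[dict]) -> Tuple[str, str, Optional[dict]]:
    """Map a change to ("index", id, document) or ("delete", id, None)."""
    # Derive a stable document ID from the primary key columns.
    doc_id = "|".join(f"{name}={keys[name]}" for name in sorted(keys))
    if new_image is None:
        return ("delete", doc_id, None)
    return ("index", doc_id, dict(new_image))
```

Because inserts and updates both produce an index action with the full new image, replaying the same record twice leaves the index in the same state.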

Clean up

When you’re done with the solution, clean up the resources you created to avoid ongoing charges.

  1. Delete the media_content table:
    aws keyspaces delete-table --keyspace-name media \
    --table-name media_content
  2. Delete the media keyspace:
    aws keyspaces delete-keyspace --keyspace-name media

Conclusion

In this post, we explored Amazon Keyspaces CDC streams—a feature that captures data changes in Amazon Keyspaces tables. We demonstrated how to enable CDC streams and process the resulting records. Streaming data changes from Amazon Keyspaces tables to downstream applications or data sources enables event-driven use cases, provides valuable insights by integrating with various systems, and helps address complex data modeling challenges.

Try out Amazon Keyspaces CDC streams in your own use case and share your feedback in the comments.


About the authors

Rajesh Kantamani

Rajesh is a Senior Database Specialist Solutions Architect. He partners with customers to design, migrate, and optimize their database solutions on Amazon Web Services, focusing on scalability, security, and peak performance. With a passion for distributed databases, he helps organizations transform their data infrastructure. When not architecting database solutions, he enjoys outdoor activities with family and friends.