AWS Database Blog

Introducing pre-warming for Amazon Keyspaces tables

Amazon Keyspaces now supports the pre-warming feature to provide you with proactive throughput management. With pre-warming, you can set minimum warm throughput values that your table can handle instantly, avoiding the cold start delays that occur during dynamic partition splits.

In this post, we discuss the capabilities of the Amazon Keyspaces pre-warming feature and demonstrate how it can enhance your throughput performance. Through a detailed examination of its core functionality, practical implementation patterns, and cost analysis, we show how to effectively prepare your tables for product launches or sales events.

Amazon Keyspaces (for Apache Cassandra) is a fully managed, serverless database service that lets you run your Apache Cassandra workloads on AWS. You can scale your Cassandra applications with virtually unlimited throughput and storage, while maintaining millisecond-level latency. Amazon Keyspaces automatically scales tables based on workload demands, with scaling behavior determined by the table’s capacity mode. Amazon Keyspaces offers two capacity modes, on-demand and provisioned, designed to handle fluctuating and predictable workloads, respectively. However, although on-demand mode excels at automatic scaling, it has a built-in delay when tables need to handle massive traffic spikes immediately upon creation or during unexpected surges such as product launches or sales events.

Understanding warm throughput

Warm throughput (achieved by pre-warming tables) defines the minimum read and write operations your Amazon Keyspaces table can handle instantly without requiring dynamic scaling. Measured in read units per second and write units per second, it establishes a performance baseline rather than a maximum limit, with default values of 12,000 read and 4,000 write units for on-demand tables, and matching your current provisioned capacity for provisioned tables. Unlike provisioned capacity (which sets billable throughput limits), warm throughput represents the capacity your table’s infrastructure can handle immediately. You can configure warm throughput up to 40,000 units for both read and write operations by default, with higher limits available through AWS Support. This pre-warming process works with both capacity modes, allowing on-demand tables to scale from a higher baseline and provisioned tables to scale up to the warm throughput limit without experiencing delays.
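
For example, the following AWS CLI sketch raises the warm throughput of an existing table ahead of an expected surge. It reuses the keyspace and table names from the example later in this post and assumes that update-table accepts the same --warm-throughput-specification parameter shown in the create-table examples; adjust the values to your own traffic estimates.

# Sketch: pre-warm an existing table (keyspace and table names reused from this post's example)
aws keyspaces update-table \
  --keyspace-name iot_demo \
  --table-name sensor_readings_fresh \
  --warm-throughput-specification 'readUnitsPerSecond=30000,writeUnitsPerSecond=25000' \
  --region us-east-1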

How pre-warming works

Pre-warming in Amazon Keyspaces is an asynchronous process that enables tables to handle high throughput immediately upon creation or modification. When you create or update a table with pre-warming settings, Amazon Keyspaces configures the table with the specified throughput values. By proactively pre-warming your table, you set the number of reads and writes it can support instantly, so it can handle a specific level of traffic right from the start and your applications can achieve consistent single-digit millisecond response times for expected traffic patterns. You can monitor the pre-warming status using the GetTable API, which returns real-time information about the pre-warming process along with the configured warm throughput values. Status indicators such as AVAILABLE or UPDATING help you track when your table is ready for high-throughput operations.
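
The following AWS CLI sketch shows one way to check that status for the example table used later in this post; the warm throughput fields appear in the table description returned by GetTable (see the API reference for the exact response shape).

# Sketch: describe the table and review its status and warm throughput settings
aws keyspaces get-table \
  --keyspace-name iot_demo \
  --table-name sensor_readings_fresh \
  --region us-east-1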

For multi-Region deployments, pre-warming settings are automatically propagated to all AWS Regions, facilitating consistent performance across the entire table replication group. When using AddReplica to add a new Region to a keyspace that contains pre-warmed tables, the same configuration is applied to tables in the newly added Region without requiring additional setup. The feature integrates with existing AWS Identity and Access Management (IAM) permissions, using standard actions like cassandra:Create, cassandra:Modify, and cassandra:Select for table management, without introducing new pre-warming specific permissions. Additionally, pre-warming works with both provisioned and on-demand capacity modes, so you can maintain your preferred billing model while gaining immediate high-throughput capabilities, with billing based on a one-time charge model for the difference between requested warm throughput values and current warm throughput values. Pre-warming also integrates with Amazon CloudWatch to provide visibility into table performance, helping you monitor existing Amazon Keyspaces metrics to verify that your pre-warmed tables are handling the expected throughput.
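
As a monitoring sketch, the following AWS CLI call sums write throttle events for the example table over the last hour. It assumes the AWS/Cassandra CloudWatch namespace, the WriteThrottleEvents metric, and the Keyspace and TableName dimensions; confirm the metric and dimension names in the Amazon Keyspaces CloudWatch documentation before relying on them.

# Sketch: sum write throttle events for the example table over the last hour
# (namespace, metric, and dimension names are assumptions to verify)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Cassandra \
  --metric-name WriteThrottleEvents \
  --dimensions Name=Keyspace,Value=iot_demo Name=TableName,Value=sensor_readings_fresh \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Sum \
  --region us-east-1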

Example use case

To illustrate the pre-warming use case, consider a newly launched Internet of Things (IoT) service with 200,000 connected sensors that store sensor readings in an Amazon Keyspaces table configured in on-demand mode. In on-demand mode, the table initially supports up to 4,000 Write Capacity Units (WCUs) and 12,000 Read Capacity Units (RCUs), and requests exceeding this capacity will be throttled until the table scales up to meet the throughput requirements. When all 200,000 sensors come online simultaneously and attempt to send their sensor readings to the Amazon Keyspaces table, the table lacks the capacity to handle 200,000 write requests per second, causing requests to be throttled until the table automatically scales up to accommodate the workload. The following example demonstrates this throttling behavior and the gradual scale-up process.

Create a table with the following code:

aws keyspaces create-table \
  --keyspace-name iot_demo \
  --table-name sensor_readings_fresh \
  --schema-definition 'allColumns=[{name=sensor_id,type=text},{name=reading_time,type=timestamp},{name=temperature,type=double},{name=humidity,type=double}],partitionKeys=[{name=sensor_id}],clusteringKeys=[{name=reading_time,orderBy=DESC}]' \
  --region us-east-1

Use the following script to run a simulated workload. The script simulates the surge of sensor writes by driving a sustained burst of concurrent write requests and retrying throttled writes for the duration of the test.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra_sigv4.auth import SigV4AuthProvider
from ssl import SSLContext, PROTOCOL_TLS
import time
from datetime import datetime
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import json

# Configuration
KEYSPACE = 'iot_demo'
TABLE = 'sensor_readings_fresh'
DURATION_MINUTES = 15
TARGET_WRITES = 10000000  # High number to ensure we run for full duration
WORKERS = 500
BATCH_SIZE = 1000

# Metrics tracking
metrics_lock = threading.Lock()
metrics = {
    'consumed_writes': 0,
    'throttled_writes': 0,
    'total_attempts': 0,
    'timeline': []
}

# Set up the Amazon Keyspaces connection (SigV4 credentials and Region are
# resolved from the default AWS credential chain, for example environment variables)
ssl_context = SSLContext(PROTOCOL_TLS)
auth_provider = SigV4AuthProvider()

cluster = Cluster(
    ['cassandra.us-east-1.amazonaws.com'],
    ssl_context=ssl_context,
    auth_provider=auth_provider,
    port=9142
)

session = cluster.connect(KEYSPACE)
insert_stmt = session.prepare(
    f"INSERT INTO {TABLE} (sensor_id, reading_time, temperature, humidity) VALUES (?, ?, ?, ?)"
)
insert_stmt.consistency_level = ConsistencyLevel.LOCAL_QUORUM

def is_throttle_error(error_str):
    """Check if error indicates throttling"""
    throttle_indicators = ['throttl', 'overload', 'exceeded', 'unavailable', 'timeout']
    return any(indicator in error_str.lower() for indicator in throttle_indicators)

def write_with_retry(sensor_id):
    """Write with retry logic to handle throttling"""
    import random
    import uuid
    max_retries = 100
    retry_count = 0
    
    # Generate highly randomized sensor_id to ensure even partition distribution
    random_uuid = str(uuid.uuid4())[:8]  # Use UUID for better randomization
    random_number = random.randint(1, 10000000)  # Larger range
    random_sensor_id = f"{random_uuid}_{random_number}"
    
    while retry_count < max_retries:
        try:
            session.execute(insert_stmt, (
                random_sensor_id,
                datetime.utcnow(),
                20.0 + (random_number % 10),
                50.0 + (random_number % 20)
            ))
            
            with metrics_lock:
                metrics['consumed_writes'] += 1
                metrics['total_attempts'] += retry_count + 1
                
            return {'success': True, 'retries': retry_count}
            
        except Exception as e:
            retry_count += 1
            
            with metrics_lock:
                if is_throttle_error(str(e)):
                    metrics['throttled_writes'] += 1
                metrics['total_attempts'] += 1
            
            if retry_count < max_retries:
                time.sleep(0.01)  # Minimal backoff
    
    return {'success': False, 'retries': max_retries}

def collect_metrics(start_time):
    """Collect timeline metrics for analysis"""
    end_time = start_time + (DURATION_MINUTES * 60)
    while True:
        time.sleep(0.5)
        
        current_time = time.time()
        with metrics_lock:
            elapsed = current_time - start_time
            metrics['timeline'].append({
                'timestamp': elapsed,
                'consumed_writes': metrics['consumed_writes'],
                'throttled_writes': metrics['throttled_writes'],
                'total_attempts': metrics['total_attempts']
            })
            
            if current_time >= end_time:
                break

print(f"\n{'='*70}")
print(f"KEYSPACES 15-MINUTE PREWARMED BURST TEST")
print(f"{'='*70}\n")
print(f"Table: {KEYSPACE}.{TABLE}")
print(f"Duration: {DURATION_MINUTES} minutes")
print(f"Workers: {WORKERS}")
print(f"Purpose: Demonstrate sustained high throughput with prewarming\n")

start_time = time.time()
end_time = start_time + (DURATION_MINUTES * 60)

# Start metrics collection
metrics_thread = threading.Thread(target=collect_metrics, args=(start_time,))
metrics_thread.daemon = True
metrics_thread.start()

completed = 0
failed_final = 0
total_retries = 0

batch_count = 0
while time.time() < end_time:
    batch_start = batch_count * BATCH_SIZE
    batch_end = batch_start + BATCH_SIZE
    
    with ThreadPoolExecutor(max_workers=WORKERS) as executor:
        futures = [executor.submit(write_with_retry, i) 
                  for i in range(batch_start, batch_end)]
        
        for future in as_completed(futures):
            if time.time() >= end_time:
                break
                
            result = future.result()
            completed += 1
            
            if result['success']:
                total_retries += result['retries']
            else:
                failed_final += 1
            
            # Progress update every 2000 writes
            if completed % 2000 == 0:
                elapsed = time.time() - start_time
                remaining = (end_time - time.time()) / 60
                with metrics_lock:
                    consumed = metrics['consumed_writes']
                    throttled = metrics['throttled_writes']
                    attempts = metrics['total_attempts']
                
                throttle_rate = throttled / max(1, attempts) * 100
                print(f"  Progress: {completed:,} writes | "
                      f"Success: {consumed:,} | Throttled: {throttled:,} ({throttle_rate:.1f}%) | "
                      f"Rate: {consumed/elapsed:.1f}/sec | Remaining: {remaining:.1f}min")
    
    batch_count += 1

time.sleep(2)  # Allow metrics collection to complete
elapsed = time.time() - start_time

print(f"\n{'='*70}")
print(f"15-MINUTE PREWARMED BURST TEST RESULTS")
print(f"{'='*70}")
print(f"Duration:           {DURATION_MINUTES} minutes ({elapsed:.1f}s)")
print(f"Total writes:       {metrics['consumed_writes']:,}")
print(f"Successful:         {metrics['consumed_writes']:,} ({metrics['consumed_writes']/(metrics['consumed_writes']+failed_final)*100:.1f}%)")
print(f"Failed:             {failed_final:,}")
print(f"Throttled attempts: {metrics['throttled_writes']:,}")
print(f"Total attempts:     {metrics['total_attempts']:,}")
print(f"Retry overhead:     {(metrics['total_attempts']-metrics['consumed_writes'])/metrics['consumed_writes']*100:.1f}%")
print(f"Average rate:       {metrics['consumed_writes']/elapsed:.1f} writes/sec")
print(f"Peak capacity used: {max(5000, metrics['consumed_writes']/elapsed):.0f} WCU/sec")

# Save metrics
metrics_file = f'keyspaces_burst_metrics_{int(start_time)}.json'
with open(metrics_file, 'w') as f:
    json.dump({
        'test_config': {
            'duration_minutes': DURATION_MINUTES,
            'workers': WORKERS,
            'batch_size': BATCH_SIZE,
            'table': f'{KEYSPACE}.{TABLE}'
        },
        'summary': {
            'duration_seconds': elapsed,
            'total_writes': metrics['consumed_writes'],
            'consumed_writes': metrics['consumed_writes'],
            'throttled_writes': metrics['throttled_writes'],
            'total_attempts': metrics['total_attempts'],
            'failed_writes': failed_final,
            'average_rate_per_sec': metrics['consumed_writes']/elapsed,
            'retry_overhead_percent': (metrics['total_attempts']-metrics['consumed_writes'])/max(1,metrics['consumed_writes'])*100,
            'throttle_percentage': metrics['throttled_writes']/max(1, metrics['total_attempts'])*100,
            'success_rate': metrics['consumed_writes']/(metrics['consumed_writes']+failed_final)*100
        },
        'timeline': metrics['timeline']
    }, f, indent=2)

print(f"\nMetrics saved to: {metrics_file}")
print(f"Use this data for your blog post about prewarming effectiveness")
print(f"{'='*70}\n")

cluster.shutdown()

The following CloudWatch graphs show that without pre-warming, throttling starts when the table reaches its initial 4,000 WCU baseline or when a storage partition split is required to add more partitions.

Now, delete the previous table, then create it again with pre-warming enabled and run the same load to observe whether throttling still occurs:

aws keyspaces create-table \
  --keyspace-name iot_demo \
  --table-name sensor_readings_fresh \
  --schema-definition 'allColumns=[{name=sensor_id,type=text},{name=reading_time,type=timestamp},{name=temperature,type=double},{name=humidity,type=double}],partitionKeys=[{name=sensor_id}],clusteringKeys=[{name=reading_time,orderBy=DESC}]' \
  --warm-throughput-specification 'readUnitsPerSecond=40000,writeUnitsPerSecond=40000' \
--region us-east-1

After the table is created, Amazon Keyspaces provisions the underlying storage partitions so the table is ready to serve the incoming traffic without throttling. The following CloudWatch graphs show that the table supports the incoming surge of traffic without issues.

Pricing

The pricing for pre-warming is based on the cost of provisioned WCUs and RCUs in the specific Region where your table is located. When you pre-warm a table, the cost is calculated as a one-time charge based on the difference between the new values and the current warm throughput that the table or index can support.

By default, on-demand tables have a baseline warm throughput of 4,000 WCUs and 12,000 RCUs. When pre-warming a newly created on-demand table, you are only charged for the difference between your specified values and these baseline values. The IoT example in this post demonstrates pre-warming a table to 40,000 WCUs and 40,000 RCUs, which incurs a one-time charge for the additional 36,000 (40,000 – 4,000) WCUs and 28,000 (40,000 – 12,000) RCUs needed.

The pre-warming cost calculation for us-east-1 is as follows:

  • Read Capacity Units: 28,000 RCUs × $0.00013 = $3.64
  • Write Capacity Units: 36,000 WCUs × $0.00065 = $23.40
  • Total one-time cost: $27.04

By pre-warming tables, you mitigate operational risk and make sure your application can handle the traffic surge without throttling, providing a smooth customer experience during critical sales events.

Conclusion

Pre-warming provides a powerful capability in Amazon Keyspaces to prepare your tables for immediate high-throughput workloads. Whether you are orchestrating a major product release or gearing up for anticipated traffic surges, pre-warming lets you provision your tables with the necessary capacity from the start, reducing throttling and delivering the consistent, single-digit millisecond performance your applications demand.

To learn more about the pre-warming feature, refer to the Amazon Keyspaces documentation.

About the authors

Rajesh Kantamani

Rajesh is a Senior Database Specialist Solutions Architect at AWS. He partners with customers to design, migrate, and optimize their database solutions, focusing on scalability, security, and peak performance. With a passion for distributed databases, he helps organizations transform their data infrastructure. When not architecting database solutions, he enjoys outdoor activities with family and friends.

Vijaykumar

Vijaykumar is a Software Development Engineer at AWS on the Amazon Keyspaces team and the lead developer for the prewarming capability featured in this blog. With over 6 years at Amazon and 2+ years focused on Keyspaces, he works closely with customers to understand their needs and build solutions that help organizations leverage Amazon Keyspaces for scalable, reliable, and performant Cassandra workloads in the cloud.

Siva Palli

Siva is a Senior Product Manager for Amazon Keyspaces. He works with customers and engineering teams to define and deliver features that help organizations build scalable, high-performance applications on Amazon Web Services. With a passion for NoSQL databases and serverless architecture, he helps shape the future of managed database services. When not building products, he enjoys traveling, exploring new places, and playing tennis.