AWS Database Blog
SQL to NoSQL: Modeling data in Amazon DynamoDB
Building upon our analysis from Part 1 of our series, where we examined existing database structures and access patterns, this post focuses on designing effective Amazon DynamoDB data models. DynamoDB offers distinct approaches to data modeling that align with your application’s specific requirements and usage patterns.
A well-designed data model supports optimal performance and cost-efficiency in DynamoDB. Our social media platform example shows how different modeling approaches influence both application performance and operational costs.
In this post, we explore strategies for designing DynamoDB data models, including entity identification, table design decisions, and relationship modeling approaches. We examine practical scenarios comparing different modeling strategies, helping you make informed decisions for your specific use case.
Identify entities
Define the DynamoDB items, their attributes, and corresponding data types. Although these should generally align with your existing database schema, adapt them to fit DynamoDB modeling patterns. Focus on optimizing for your application’s access patterns rather than exactly mirroring your current schema structure. Consider these aspects:
- Core entities – Start by listing the primary business entities based on how your application accesses and manages data. For example, in a social media application, these would include posts, users, and comments—entities that represent the core data elements the application frequently interacts with.
- Supporting entities – Identify additional entities needed to support your application’s functionality:
  - Entities for tracking metrics and counts.
  - Entities for managing application state.
  - Entities for supporting specific access requirements.
- Attributes – For each entity, list the required attributes based on application needs rather than existing table structures. Understanding attribute characteristics, data types, and their usage in your application helps plan their representation in DynamoDB. For example, datetime fields need mapping to either ISO strings or epoch numbers based on your query and sort requirements.
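As a minimal illustration of that last point, the following sketch shows the two common ways to represent a datetime attribute in DynamoDB. The attribute and key names (PK, SK, createdAtEpoch) are hypothetical, not from the series:

```python
from datetime import datetime, timezone

created_at = datetime(2024, 6, 1, 12, 30, tzinfo=timezone.utc)

# Option 1: ISO 8601 string -- human readable and lexicographically sortable,
# so it can be embedded directly in a sort key.
iso_value = created_at.isoformat()          # '2024-06-01T12:30:00+00:00'

# Option 2: epoch number -- compact, supports numeric range queries, and is
# required if the attribute doubles as a TTL attribute (TTL uses epoch seconds).
epoch_value = int(created_at.timestamp())   # 1717245000

# Hypothetical item shape using the ISO string in the sort key.
item = {
    "PK": "USER#123",
    "SK": f"POST#{iso_value}",
    "createdAtEpoch": epoch_value,
}
```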
Decide on table design
After identifying DynamoDB items, we evaluated whether to use single-table or multi-table design. Multi-table design, being simpler to implement, is often the recommended starting point for most applications. However, for our social media application, we chose single-table design per microservice based on our requirements analysis:
- Application characteristics – The data is highly relational, with frequent needs to retrieve related items together. For instance, displaying posts requires associated comments and user details in the same operation. The application primarily handles transactional data, without significant volumes of audit, historical, or analytical data. These characteristics naturally aligned with a single-table approach.
- Performance and cost analysis – Single-table design provides consistent performance for our complex queries involving related items, because all data resides in one table. For capacity management, we started with On-Demand mode, which is recommended for most workloads due to its automatic scaling and simplified operations. After analyzing our usage patterns and establishing predictable workloads, we evaluated Provisioned mode for cost optimization. The consolidated capacity planning and efficient RCU/WCU allocation across related data access in single-table design helped maximize the benefits of Provisioned mode. From a storage perspective, our data characteristics didn’t present any notable inefficiencies that would justify separate tables.
- Operational benefits – Managing a single table per service has simplified our monitoring, capacity planning, and data model evolution processes, reducing operational overhead.
The impact of table design choice varies based on your specific access patterns and data characteristics. We recommend testing both approaches with representative workloads to accurately assess performance and cost implications for your use case. For more details, refer to Single-table vs. multi-table design in Amazon DynamoDB.
Define partition and sort keys
Identify the primary access patterns for the data, using the insights gained from the SQL query analysis. This can help determine the appropriate partition keys for the DynamoDB tables. Use the information from the ORDER BY clauses and TOP clauses in the SQL queries to guide the selection of sort keys, including the use of composite sort keys for multi-level sorting requirements.
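For example, an ORDER BY ... DESC combined with a TOP or LIMIT clause often maps to a Query against a sort key that encodes the ordering attribute. The following is a hedged sketch assuming a hypothetical table named SocialMedia with generic PK/SK attributes, not the actual schema from this series:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("SocialMedia")  # hypothetical table name

# Roughly equivalent to:
#   SELECT TOP 10 * FROM posts WHERE user_id = '123' ORDER BY created_at DESC
response = table.query(
    KeyConditionExpression=Key("PK").eq("USER#123") & Key("SK").begins_with("POST#"),
    ScanIndexForward=False,  # descending sort-key order (newest first)
    Limit=10,                # TOP 10
)
posts = response["Items"]
```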
Model entity relationships
Although DynamoDB's flexible schema allows for a different approach than traditional relational databases, it's worth examining the following DynamoDB strategies and how to evaluate which one is optimal for your workload.
Single item
The single-item approach in DynamoDB can be efficient when an entity and its related data fit within a single item (up to 400 KB). This model enables fast reads by storing all related data together. However, this approach impacts write operations: any change requires updating the entire item, increasing write costs (WCUs). The more frequent the writes, the higher the cost and the complexity of maintaining data integrity. This approach is particularly well-suited for applications with high read volumes where parent and child data is typically accessed together and the benefits of lower read costs (RCUs) outweigh the increased write costs. It's crucial to evaluate the read-to-write ratio for child data to make sure the advantages surpass the drawbacks.
Filtering on child item collections in a single item can only be done through filter expressions or client-side filtering, which doesn’t reduce RCUs consumed because DynamoDB reads the entire item before applying filters.
By understanding these trade-offs and carefully assessing your access patterns, you can determine if this data modeling pattern suits your use case, balancing fast reads against increased write costs and filtering complexities.
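The following sketch illustrates the trade-off, assuming a hypothetical SocialMedia table in which a user item embeds its posts as a list attribute. The filter on the embedded posts happens client-side, so the RCUs for the full item are consumed regardless:

```python
import boto3

table = boto3.resource("dynamodb").Table("SocialMedia")  # hypothetical table name

# One GetItem returns the user profile and every embedded post in a single read.
resp = table.get_item(Key={"PK": "USER#123", "SK": "PROFILE"})
user = resp["Item"]

# Client-side filter on the embedded collection: only video posts.
# The RCUs for the entire item were already consumed by the GetItem call.
video_posts = [p for p in user.get("posts", []) if p.get("type") == "video"]
```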
Vertical partitioning
Vertical partitioning in DynamoDB is useful for highly targeted queries on specific attributes. This pattern stores related data in adjacent items with the same partition key but different sort keys.
Some of the benefits include:
- Flexibility in querying – Efficiently retrieve child items alone or with the parent item
- Granular control on writes – Update individual child items without rewriting the parent item
However, with this approach, cross-entity filtering becomes more complex. To filter on both parent and child attributes, some parent attributes must be denormalized into child items. This can increase write costs if parent attributes change frequently. For example, imagine we want to find all video posts from active users. This involves two types of filtering: finding users who are active (parent entity) and finding which of their posts are videos (child entity). One solution is to denormalize by adding the user's status directly to each post record. Although this simplifies querying, it comes with a downside: whenever a user's status changes, we need to update that status across all their posts. This increases our write operations and, consequently, the operational costs. This trade-off between query simplicity and write efficiency is a common consideration in DynamoDB design patterns.
The key is to carefully analyze access patterns and balance query flexibility, write management, and the cost of denormalizing data across entities.
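The sketch below shows what vertical partitioning can look like in practice, again assuming a hypothetical SocialMedia table with generic PK/SK attributes: children can be queried on their own or together with the parent, and a single child can be updated without touching the parent item.

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("SocialMedia")  # hypothetical table name

# Fetch only the child items (posts) for a user...
posts = table.query(
    KeyConditionExpression=Key("PK").eq("USER#123") & Key("SK").begins_with("POST#")
)["Items"]

# ...or the whole item collection (profile plus posts) in one Query.
item_collection = table.query(
    KeyConditionExpression=Key("PK").eq("USER#123")
)["Items"]

# Update a single post without rewriting the parent profile item.
table.update_item(
    Key={"PK": "USER#123", "SK": "POST#2024-06-01T12:30:00"},
    UpdateExpression="SET content = :c",
    ExpressionAttributeValues={":c": "edited text"},
)
```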
Imagine a social media platform where users can create multiple posts. The following diagram illustrates the relationship between users and their posts. In this scenario, we explore how to determine the optimal data modeling approach for DynamoDB by carefully analyzing access patterns, usage statistics, and item sizes. This comprehensive evaluation can guide us in selecting a strategy that balances performance, cost-efficiency, and scalability, tailored specifically to our social media application’s needs.
Item size considerations
When designing your DynamoDB data model, estimate the size of your items. DynamoDB has a 400 KB limit per item, so understanding both average and maximum data sizes can be helpful.
For the users and posts model in our social media application, consider the following:
- Estimate the average size of a user’s profile data
- Calculate the average number of posts per user and their typical size
These estimates will guide your decision-making. The RCUs and WCUs consumed by each operation are proportional to item size, so evaluating the average item size is important when choosing the optimal data modeling strategy.
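A back-of-the-envelope helper like the following can make these estimates concrete. This is our own sketch, applying DynamoDB's published capacity rules (1 KB per write request unit, 4 KB per read request unit, halved for eventually consistent reads), not an AWS utility:

```python
import math

def read_units(item_size_kb: float, eventually_consistent: bool = True) -> float:
    """RCUs consumed per read of an item of the given size."""
    units = math.ceil(item_size_kb / 4)      # 4 KB per read request unit
    return units * 0.5 if eventually_consistent else float(units)

def write_units(item_size_kb: float) -> int:
    """WCUs consumed per write of an item of the given size."""
    return math.ceil(item_size_kb)           # 1 KB per write request unit

# A 25 KB item is far more expensive per operation than a 2 KB item.
print(read_units(25), write_units(25))       # 3.5 RCU, 25 WCU
print(read_units(2), write_units(2))         # 0.5 RCU, 2 WCU
```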
Access patterns
Understanding your application’s access patterns helps in designing an efficient DynamoDB model. Consider the following questions:
- Do you need to retrieve users based on filters applied to their posts?
- Do some queries filter both user profile data and posts simultaneously?
- Is quick access to a user’s most recent N posts required?
- Do you need to retrieve the N most commented posts?
These access requirements inform decisions on partition keys, sort keys, and secondary indexes for optimal performance. The filter criteria also help determine the feasibility of a single-item approach. For instance, if users need to be identified based on post filters, a single-item approach might not be efficient, and you might have to explore alternative data modeling approaches.
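For instance, a pattern like "retrieve the N most commented posts" typically calls for a global secondary index whose sort key carries the comment count. The following sketch is illustrative only: the index and attribute names (GSI1, GSI1PK, GSI1SK) are assumptions, and funneling all posts into a single GSI partition value would need to be revisited at very high write rates:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("SocialMedia")  # hypothetical table name

# Assumes posts are written with GSI1PK = "POST" and GSI1SK = comment count,
# so the index orders posts by how many comments they have.
top_commented = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("POST"),
    ScanIndexForward=False,   # highest comment count first
    Limit=10,
)["Items"]
```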
Usage metrics
Analyze your data’s read/write usage patterns to determine the optimal strategy. Consider these questions to understand the read-to-write ratio:
- How often are posts viewed together with user profile data?
- What’s the frequency of post updates compared to user profile changes?
- Which attributes in user profiles and posts change most often?
- How frequently are post counters (such as likes and shares) read?
- How does the update frequency of post counters compare to post content updates?
Select the data modeling strategy
Having collected information on item sizes, access patterns, and usage metrics, the next section examines how to use these data points to evaluate and select the most appropriate data modeling strategy for the social media application in DynamoDB.
Scenario 1: 1:N relationship
The following table shows a 1:N relationship with user information (20 KB each) and post content (5 KB each) stored as a single item compared to a vertical partition.
| | Single Item | Vertical Partition |
| --- | --- | --- |
| Item size for 30 posts | User and posts: 20 KB + 30 * 5 KB = 170 KB | User item = 20 KB; each of the 30 post items = 5 KB |
| Number of DynamoDB API calls to get the top 10 posts of a user with user information: 1,000 reads/hour | 1,000 | 2,000* |
| RCU (eventually consistent) to read the top 10 posts of a user along with user information: 1,000 reads/hour | 170 KB / 4 KB = 42.5, rounded up to 43 read request units; 43 * 0.5 = 21.5 RCU per read; 1,000 * 21.5 = 21,500 RCU | User information: 20 KB / 4 KB = 5 units * 0.5 = 2.5 RCU; 10 posts: 50 KB / 4 KB = 12.5, rounded up to 13 units * 0.5 = 6.5 RCU; 1,000 * (2.5 + 6.5) = 9,000 RCU |
| WCU to update the user email address: 10 writes/hour | 170 WCU (rewriting the 170 KB item); 10 * 170 = 1,700 WCU | 20 WCU (updating only the 20 KB user item); 10 * 20 = 200 WCU |
* The total API calls needed will depend on how the application is structured, specifically the data access framework, DynamoDB table design, and query patterns. For this specific use case, the preceding factors that we considered require two separate API calls per read to fetch the required data.
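The arithmetic in the preceding table can be reproduced with a few lines of Python. This is our own sketch of the capacity math, not code from the application:

```python
import math

def rcu(size_kb: float) -> float:
    """Eventually consistent read units per request for an item of size_kb."""
    return math.ceil(size_kb / 4) * 0.5

READS_PER_HOUR, WRITES_PER_HOUR = 1_000, 10

# Single item: user profile plus 30 posts stored together (170 KB).
single_read = rcu(20 + 30 * 5) * READS_PER_HOUR            # 21,500 RCU
single_write = math.ceil(20 + 30 * 5) * WRITES_PER_HOUR    # 1,700 WCU

# Vertical partitioning: read the 20 KB user item plus the top 10 posts (50 KB).
vertical_read = (rcu(20) + rcu(10 * 5)) * READS_PER_HOUR   # 9,000 RCU
vertical_write = math.ceil(20) * WRITES_PER_HOUR           # 200 WCU

print(single_read, vertical_read, single_write, vertical_write)
```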
Scenario 2: 1:1 relationship
The following table shows a 1:1 relationship of a post (5 KB each) and its post counter (1 KB) stored as a single item compared to vertical partitioning.
| | Single Item | Vertical Partition |
| --- | --- | --- |
| Item size per post | Post + post counter: 5 KB + 1 KB = 6 KB | Post item = 5 KB; post counter item = 1 KB |
| Number of DynamoDB API calls to read a post with its post counter: 1,000 reads/hour | 1,000 | 2,000* |
| RCU (eventually consistent) to read a post with its post counter: 1,000 reads/hour | 6 KB ~ 1 RCU per read; 1,000 * 1 = 1,000 RCU | Post: 5 KB ~ 1 RCU; post counter: 1 KB ~ 0.5 RCU; 1,000 * (1 + 0.5) = 1,500 RCU |
| WCU to update the post counter: 10 updates/hour | 6 KB ~ 6 WCU; 10 * 6 = 60 WCU | 1 KB ~ 1 WCU; 10 * 1 = 10 WCU |
| WCU to update the post counter: 1,000 updates/hour | 6 KB ~ 6 WCU; 1,000 * 6 = 6,000 WCU | 1 KB ~ 1 WCU; 1,000 * 1 = 1,000 WCU |
* The total API calls needed will depend on how the application is structured, specifically the data access framework, DynamoDB table design, and query patterns. For this specific use case, the preceding factors that we considered require two separate API calls per read to fetch the required data.
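One practical note on the vertical-partition column: splitting the counter into its own item lets each like or share become a small atomic update instead of a rewrite of the full post. A hedged sketch, with illustrative key and attribute names:

```python
import boto3

table = boto3.resource("dynamodb").Table("SocialMedia")  # hypothetical table name

# Atomically increment the 1 KB counter item; the 5 KB post item is untouched.
table.update_item(
    Key={"PK": "POST#456", "SK": "COUNTER"},
    UpdateExpression="ADD likeCount :one",
    ExpressionAttributeValues={":one": 1},
)
```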
This analysis compares single-item and vertical partition designs in DynamoDB for both a 1:N relationship (user-posts) and a 1:1 relationship (post-counter). In the 1:N scenario, vertical partitioning requires more API calls but significantly reduces capacity costs (9,000 vs. 21,500 RCU for reads and 200 vs. 1,700 WCU for writes). Similarly, for the 1:1 relationship, API calls double with vertical partitioning, but it offers substantial WCU savings, especially for frequent updates (1,000 vs. 6,000 WCU). However, these findings are highly specific to the use cases discussed; we recommend careful consideration before generalizing them to different scenarios.
Results
For the specific use cases evaluated, we determined the following:
- 1:1 relationship (posts and post counter):
  - Single-item design showed fewer API calls and RCUs for reads but higher WCU consumption for updates
  - Vertical partition design needed more API calls but was more efficient in WCU usage, particularly for frequent updates
- 1:N relationship (user and posts):
  - Single-item design resulted in fewer API calls but significantly higher RCU consumption due to larger item sizes and substantially higher WCU costs for updates
  - Vertical partition design doubled the number of API calls but dramatically reduced both RCU and WCU consumption, proving more cost-effective at scale
Conclusion
The key takeaway isn't just these specific results but rather the analytical process used to arrive at them. For any given use case, a similarly thorough analysis is essential to define an optimized data modeling strategy that balances performance needs with cost considerations. Factors such as update frequency, read patterns, data scaling requirements, and the specific nature of relationships between entities must be carefully evaluated. In Part 3, we will explore how to adapt your application's data access layer to work effectively with these data models, enabling your application to take advantage of DynamoDB capabilities.