AWS Database Blog
JSON database solutions in AWS: Amazon DocumentDB (with MongoDB compatibility)
JSON has become the standard data interchange format in modern applications. Its human-readable format, hierarchical structure, and schema flexibility make it ideal for representing complex, evolving data models. As applications grow more sophisticated, traditional relational databases often struggle with several challenges:
- Rigid schemas that resist frequent changes
- Complex joins for hierarchical data
- Performance bottlenecks when scaling
- Difficulty representing varied data structures
Amazon DocumentDB (with MongoDB compatibility) addresses these challenges by embracing JSON’s inherent flexibility while providing the enterprise-grade reliability and performance organizations demand.
In this post, we explore Amazon DocumentDB, a fully managed, MongoDB API-compatible document database service optimized for storing JSON documents. For organizations with variable workloads, Amazon DocumentDB can support millions of reads and writes per second and petabytes of storage capacity. Its serverless option removes the need for upfront capacity commitments and simplifies resource management: you pay only for the database capacity you use. For read-heavy workloads, you can scale horizontally with up to 15 read replicas within a single AWS Region.
Key advantages of Amazon DocumentDB include:
- Schema flexibility that adapts to evolving business requirements without costly migrations
- Native JSON support that minimizes object-relational mapping complexity
- Powerful query capabilities that handle complex document relationships efficiently
- Enterprise-grade reliability with automated backups, point-in-time recovery, and high availability
- Seamless scaling, with compute (including serverless capacity) and storage scaling independently
JSON document model in Amazon DocumentDB
Amazon DocumentDB is a document database designed specifically for JSON documents. Its JSON document model offers the following features:
- Flexible schema without predefined document structure
- Dot notation for accessing nested fields
- Array operators for querying and updating array elements
- Comprehensive aggregation pipelines for rich document transformation and analysis workflows
- Geospatial indexing and queries for location-aware applications
- Text search capabilities for content-rich applications
- Vector search for AI-powered semantic similarity queries
Real-world example: Hotel booking system
Let’s explore a practical scenario: building a hotel booking system using Amazon DocumentDB. Hotel listings worldwide have complex and varied attributes that benefit from a flexible document model.
Here’s how a hotel document might be structured in Amazon DocumentDB:
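As an illustration, the following hypothetical hotel document is written as a Python dictionary, the form the PyMongo driver accepts for inserts. Every field name and value is an assumption. They were chosen to match the scenarios discussed later: a deluxe room over $300 in Austin, TX, and standard-room inventory for July 15, 2025.

```python
# Hypothetical hotel document (illustrative field names and values).
# Dates are ISO 8601 strings for readability; BSON dates are another option.
hotel = {
    "name": "Example-1",
    "description": "Luxury downtown hotel with a rooftop pool, gym, and bar",
    "address": {"street": "100 Example St", "city": "Austin", "state": "TX"},
    # GeoJSON Point; coordinates are [longitude, latitude]
    "location": {"type": "Point", "coordinates": [-97.7431, 30.2672]},
    # Flat array of strings, convenient for $all / $in filters
    "amenities": ["wifi", "pool", "gym", "bar"],
    # Rooms vary in shape: only some rooms carry a "view" attribute
    "rooms": [
        {"type": "standard", "price": 189, "maxOccupancy": 4},
        {"type": "deluxe", "price": 329, "maxOccupancy": 2, "view": "city"},
    ],
    # Date-specific inventory per room type
    "availability": [
        {"date": "2025-07-15", "roomType": "standard", "available": 3},
        {"date": "2025-07-15", "roomType": "deluxe", "available": 0},
    ],
    # Reviews embedded directly in the hotel document
    "reviews": [
        {"user": "guest-101", "rating": 5, "comment": "Great location"},
    ],
}
```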
This document structure handles the complex relationships in a hotel booking system:
- Nested location data with geographic coordinates
- Arrays of amenities for quick filtering
- Complex room information with varying attributes
- Availability tracking with date-specific inventory
- Customer reviews embedded directly in the hotel document
This structure would be challenging to model efficiently in some traditional relational databases without complex joins or denormalization.
Amazon DocumentDB provides a rich query language designed specifically for working with JSON documents.
Advanced JSON querying in Amazon DocumentDB
Let’s explore the DocumentDB query capabilities using our hotel booking system example.
Querying nested JSON fields
For a hotel search platform, users often need to filter by multiple criteria simultaneously. The following query finds hotels with specific room types and amenities:
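The following is a sketch of such a filter, written as a Python dictionary for PyMongo's find(). The field names (address.city, amenities, rooms.type, rooms.maxOccupancy) and values are assumptions based on a hypothetical hotel schema. A plain-Python function mirrors the filter so you can check each operator's semantics without a cluster.

```python
# Hypothetical filter; field names follow an illustrative hotel schema.
search_filter = {
    # $in on a nested field (dot notation): city is Austin OR Miami
    "address.city": {"$in": ["Austin", "Miami"]},
    # $all: every listed amenity must be present in the array
    "amenities": {"$all": ["pool", "gym"]},
    # $elemMatch: at least one room must satisfy BOTH conditions
    "rooms": {"$elemMatch": {"type": {"$in": ["deluxe", "suite"]},
                             "maxOccupancy": {"$gte": 2}}},
}

def matches_search_filter(doc):
    """Plain-Python reading of search_filter, for illustration only."""
    city_ok = doc["address"]["city"] in ["Austin", "Miami"]
    amenities_ok = all(a in doc["amenities"] for a in ["pool", "gym"])
    room_ok = any(r["type"] in ["deluxe", "suite"] and r["maxOccupancy"] >= 2
                  for r in doc["rooms"])
    return city_ok and amenities_ok and room_ok

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.find(search_filter)
```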
This query demonstrates several powerful capabilities:
- Using $in to match multiple possible values in a nested field
- Using $all to ensure all specified amenities are present
- Using $elemMatch to find matching elements within an array
The query returns the following result:
Complex array filtering
When users are looking for specific room types in a particular price range, we need to filter elements within arrays:
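The following sketch shows the $elemMatch filter as a Python dictionary. It contrasts that filter with a dot-notation filter that looks similar but behaves differently. The field names (rooms.type, rooms.price) are assumptions. The plain-Python functions mirror the matching rules: with dot notation, each condition can be satisfied by a different array element.

```python
# At least one room that is deluxe AND priced over $300
elem_match_filter = {"rooms": {"$elemMatch": {"type": "deluxe", "price": {"$gt": 300}}}}

# Dot notation: each condition may be satisfied by a DIFFERENT room
dot_notation_filter = {"rooms.type": "deluxe", "rooms.price": {"$gt": 300}}

def matches_elem_match(doc):
    # Both conditions must hold for the same room
    return any(r["type"] == "deluxe" and r["price"] > 300 for r in doc["rooms"])

def matches_dot_notation(doc):
    # Conditions are checked independently across all rooms
    return (any(r["type"] == "deluxe" for r in doc["rooms"])
            and any(r["price"] > 300 for r in doc["rooms"]))

# A cheap deluxe room plus an expensive suite: a false positive for dot notation
tricky = {"rooms": [{"type": "deluxe", "price": 250}, {"type": "suite", "price": 450}]}
```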
This query returns hotels that have deluxe rooms priced over $300 per night. The $elemMatch operator makes sure that both conditions apply to the same room object within the array. The query returns a hotel in Austin, TX.
Aggregation pipeline for JSON transformation
For analytics purposes, hotel managers might want to see average room prices across their properties:
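The following sketch builds such a pipeline as a list of stage dictionaries for PyMongo's aggregate(), assuming the hotel schema has name and rooms.price fields. The plain-Python function mirrors the two stages: $unwind emits one entry per room, and $group averages the prices per hotel name.

```python
from collections import defaultdict

avg_price_pipeline = [
    # One output document per element of the rooms array
    {"$unwind": "$rooms"},
    # Group by hotel name and average the unwound room prices
    {"$group": {"_id": "$name", "avgPrice": {"$avg": "$rooms.price"}}},
]

def emulate_avg_price(hotels):
    """Plain-Python equivalent of avg_price_pipeline, for illustration only."""
    prices = defaultdict(list)
    for hotel in hotels:
        # By default, $unwind drops documents whose array is missing or empty
        for room in hotel.get("rooms", []):
            prices[hotel["name"]].append(room["price"])
    return {name: sum(p) / len(p) for name, p in prices.items()}

# With PyMongo, where hotels is a hypothetical Collection:
#   list(hotels.aggregate(avg_price_pipeline))
```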
This aggregation pipeline:
- Uses $unwind to create a separate document for each room in the array
- Groups the results by hotel name
- Calculates the average price across all rooms
The following is a sample output:
Although $unwind is powerful for deconstructing arrays, it can cause performance issues on large collections: it replicates each document once per array element, which increases the number of documents that subsequent stages must process. The following code demonstrates a more efficient approach to aggregation:
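One way to avoid $unwind is to average the array in place with $avg inside $project, so each hotel stays a single document. This sketch assumes that your Amazon DocumentDB engine version supports $avg as an array expression in $project; check the supported MongoDB APIs for your version. Unlike the $group version, it produces one result per hotel document rather than one per hotel name. The two are equivalent only when names are unique.

```python
no_unwind_pipeline = [
    {"$project": {
        "_id": 0,
        "name": 1,
        # "$rooms.price" resolves to the array of prices; $avg averages it
        "avgPrice": {"$avg": "$rooms.price"},
    }},
]

def emulate_no_unwind(hotels):
    """Plain-Python equivalent: one result per document, no duplication."""
    results = []
    for hotel in hotels:
        prices = [room["price"] for room in hotel.get("rooms", [])]
        # $avg yields null (None) when there are no numeric values
        avg = sum(prices) / len(prices) if prices else None
        results.append({"name": hotel["name"], "avgPrice": avg})
    return results
```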
The aggregation framework is particularly powerful for transforming and analyzing JSON data, with operators for grouping, filtering, projecting, and performing calculations.
Availability queries with complex conditions
For a booking system, checking room availability is a critical operation:
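The following sketch shows one way to express this check, assuming availability entries shaped like {"date", "roomType", "available"} with ISO date strings (a hypothetical schema). The projection uses dot notation to return a single nested field. The helper function mirrors the $elemMatch condition in plain Python.

```python
# At least one availability entry matching date, room type, AND count
availability_filter = {
    "availability": {"$elemMatch": {
        "date": "2025-07-15",
        "roomType": "standard",
        "available": {"$gte": 2},
    }}
}
# Projection: return only a few fields; dot notation selects a nested field
availability_projection = {"_id": 0, "name": 1, "address.city": 1}

def has_availability(doc, date, room_type, min_rooms):
    """Plain-Python reading of availability_filter, for illustration only."""
    return any(a["date"] == date and a["roomType"] == room_type
               and a["available"] >= min_rooms
               for a in doc.get("availability", []))

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.find(availability_filter, availability_projection)
```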
This query finds hotels with at least two standard rooms available on July 15, 2025. The combination of $elemMatch and dot notation allows for precise filtering on nested array elements. Note that the query adds a projection to limit the fields returned. The result is:
Full-text search capabilities
Amazon DocumentDB provides text search capabilities through the $text operator and text indexes:
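The following minimal sketch specifies a text index on the description field as a PyMongo key list. The index name in the comment is an assumption (it matches the name PyMongo would generate by default).

```python
# Text index on the description field
text_index_keys = [("description", "text")]

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.create_index(text_index_keys, name="description_text")
```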
The following query finds hotels with luxury or beach in their description:
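In a $text search string, space-separated terms are combined with OR. The following sketch shows the filter and a rough plain-Python illustration of that OR behavior. Real text search also applies language rules such as stemming, which this sketch deliberately leaves out.

```python
import re

# Space-separated terms in $search are OR'ed: luxury OR beach
text_filter = {"$text": {"$search": "luxury beach"}}

def matches_any_term(description, search):
    """Rough illustration of OR semantics (no stemming or stop words)."""
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return any(term in words for term in search.lower().split())

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.find(text_filter)
```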
We get the following result:
The result can be sorted by relevance:
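To sort by relevance, you project the text score with $meta and sort on that projected field. The following sketch expresses the projection and sort in PyMongo's format. The hotels collection in the comment is hypothetical.

```python
text_filter = {"$text": {"$search": "luxury beach"}}
# Expose the relevance score as a "score" field
score_projection = {"_id": 0, "name": 1, "score": {"$meta": "textScore"}}
# Sort on the text score, most relevant first
score_sort = [("score", {"$meta": "textScore"})]

# With PyMongo:
#   hotels.find(text_filter, score_projection).sort(score_sort)
```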
This query returns hotels with beach or luxury in their description, sorted by relevance score. Hotel Example-2 has no room available before July 20, 2025.
Semantic search capabilities
For more advanced semantic search, Amazon DocumentDB 5.0 and later supports vector search with both HNSW and IVFFlat indexing methods. For our hotel collection, we can create a vector index as follows.
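The following sketch expresses an HNSW vector index as PyMongo index keys and options. It assumes the embeddings live in a hypothetical embedding field. The index name and the m and efConstruction values are example choices, not tuned recommendations. PyMongo passes extra keyword arguments to create_index through as index options.

```python
# HNSW vector index on a hypothetical "embedding" field
vector_index_keys = [("embedding", "vector")]
vector_index_options = {
    "name": "hotel_embedding_hnsw",
    "vectorOptions": {
        "type": "hnsw",          # or "ivfflat", which uses "lists" instead of m/efConstruction
        "dimensions": 10,        # must match the length of every stored embedding
        "similarity": "cosine",  # other options: "euclidean", "dotProduct"
        "m": 16,                 # example value: max links per graph node
        "efConstruction": 64,    # example value: candidate list size at build time
    },
}

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.create_index(vector_index_keys, **vector_index_options)
```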
The preceding command uses the following key parameters:
- dimensions: 10 specifies the dimensionality of each vector.
- similarity: "cosine" uses cosine similarity to measure the distance between vectors.
- type: "hnsw" specifies the indexing method (HNSW rather than IVFFlat).
- m is an HNSW parameter for the maximum number of bidirectional links each node has in the index graph.
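After the index is created, you generate embeddings for the description field using an embedding model available in Amazon Bedrock, such as Cohere Embed. To keep them readable, the embeddings in this example have only 10 dimensions. Production embedding models output much longer vectors, and the index's dimensions value must match the model's output length. For our hotels, we use the following embeddings.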
For hotel Example-1, we add the embedding for the description field:
For hotel Example-2, we add the embedding for the description field:
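Whatever the embedding values, each update has the same shape. The following minimal sketch assumes that hotels are matched by name and that the vector goes into a hypothetical embedding field. It checks the length first, because a vector whose length differs from the index's dimensions value would not fit the index. The vector in the usage comment is a placeholder, not a real embedding.

```python
EMBEDDING_DIMENSIONS = 10  # must equal the index's "dimensions" value

def embedding_update(hotel_name, embedding):
    """Return (filter, update) documents that store an embedding on a hotel."""
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}")
    return {"name": hotel_name}, {"$set": {"embedding": [float(x) for x in embedding]}}

# With PyMongo (placeholder vector, hypothetical hotels Collection):
#   flt, upd = embedding_update("Example-2", [0.1] * 10)
#   hotels.update_one(flt, upd)
```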
We perform a vector similarity search for the phrase “water front hotel with gym and bar, in warm state”, represented by the vector embedding [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916].
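The following sketch uses the query vector above to build a vector search stage for PyMongo's aggregate(). It assumes the $search/vectorSearch stage format of Amazon DocumentDB and a hypothetical embedding field; k and efSearch are example values. The cosine_similarity function shows the measure the index approximates.

```python
import math

query_vector = [0.2979, 0.5413, 0.3457, 0.1234, 0.6159,
                0.6741, 0.1873, 0.4317, 0.3741, 0.5916]

vector_search_pipeline = [
    {"$search": {"vectorSearch": {
        "vector": query_vector,
        "path": "embedding",     # hypothetical field name
        "similarity": "cosine",
        "k": 2,                  # number of nearest neighbors to return
        "efSearch": 40,          # HNSW search-time candidate list size
    }}},
    {"$project": {"_id": 0, "name": 1, "description": 1}},
]

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With PyMongo, where hotels is a hypothetical Collection:
#   list(hotels.aggregate(vector_search_pipeline))
```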
This query finds the hotels whose vector embeddings are most similar to the query vector, using cosine similarity. For our use case, the cursor returns the following output:
In a real-world implementation, these embeddings would represent semantic features of the hotels, generated from descriptions, amenities, and other attributes using machine learning models.
Geospatial query capabilities
The hotel booking system benefits significantly from the geospatial search capabilities of Amazon DocumentDB. Location-aware queries let the application offer an intuitive search experience, such as finding hotels near a landmark or within a neighborhood, and give hotel operators valuable business insights. Because Amazon DocumentDB supports these queries natively, you avoid complex application-level implementations or an additional specialized search service.
Let’s explore this geospatial query capability.
First, make sure each location is stored in GeoJSON Point format:
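A common mistake is to store coordinates as [latitude, longitude]. GeoJSON requires [longitude, latitude]. The following small helper (a hypothetical name, not part of any driver) builds a Point and range-checks both values. The range check catches many swapped pairs, although not every swap falls outside the valid ranges. The Austin coordinates are illustrative.

```python
def geojson_point(longitude, latitude):
    """Build a GeoJSON Point. GeoJSON order is [longitude, latitude]."""
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be in [-180, 180]")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be in [-90, 90]")
    return {"type": "Point", "coordinates": [longitude, latitude]}

# Illustrative coordinates for downtown Austin, TX
austin = geojson_point(-97.7431, 30.2672)
```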
To enable spatial queries, create a 2dsphere index:
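The following minimal sketch specifies the index as a PyMongo key list, assuming the GeoJSON points are stored in a hypothetical location field.

```python
# 2dsphere index on the GeoJSON "location" field
geo_index_keys = [("location", "2dsphere")]

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.create_index(geo_index_keys)
```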
This index makes it possible to use operators like $nearSphere, $geoWithin, and $geoIntersects. To find hotels within 5 kilometers of a point, use the following query:
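The following sketch shows a 5 km $nearSphere filter. When $geometry is a GeoJSON point, $maxDistance is in meters. The center point and the location field name are assumptions. The haversine function is a client-side way to sanity-check returned distances; it uses a spherical-Earth approximation.

```python
import math

center = {"type": "Point", "coordinates": [-97.7431, 30.2672]}  # illustrative
within_5_km = {
    "location": {"$nearSphere": {
        "$geometry": center,
        "$maxDistance": 5000,  # meters, because $geometry is GeoJSON
    }}
}

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (approximation)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

# With PyMongo, where hotels is a hypothetical Collection:
#   hotels.find(within_5_km)
```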
We get the following result:
Amazon DocumentDB supports proximity, inclusion, and intersection queries on geospatial data. You can perform the following searches:
- Distance-based sorting and pricing – This includes the following operators:
  - $geoNear – This aggregation stage calculates and outputs the distance, enabling subsequent sorting operations on that distance field.
  - sort() – After distances are calculated (for example, using $geoNear), you can sort on the distance field (with the sort() cursor method or a $sort stage) for features like pricing or ranking by proximity.
- Landmark-based search – This can be implemented using the proximity operators ($nearSphere, $geoNear) by treating the landmark as the reference GeoJSON point.
- Polygon-based neighborhood search – This includes the following operators:
  - $geoWithin – Finds documents with geospatial data entirely within a specified shape, such as a polygon.
  - $geoIntersects – Finds documents whose geospatial data intersects with a specified GeoJSON object, including polygons.
- Radius-based search (proximity search) – This includes the following operators:
  - $nearSphere – Finds points nearest to a GeoJSON point on a sphere.
  - $geoNear – An aggregation stage that calculates distances from a GeoJSON point, often used for proximity searches within an aggregation pipeline.
  - $minDistance and $maxDistance – Used with $nearSphere (or as the minDistance and maxDistance options of $geoNear) to filter results based on minimum and maximum distances from the center point.
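The following sketch shows two of these patterns as PyMongo-style documents. The first is a $geoNear pipeline that treats a landmark as the reference point and outputs each hotel's distance. The second is a $geoWithin filter for a neighborhood polygon. The coordinates, field names, and radius are illustrative assumptions, and $geoNear option support may vary by engine version. A GeoJSON polygon ring must be closed, meaning its first and last points are identical.

```python
# Illustrative reference point (a landmark in Miami, FL)
landmark = {"type": "Point", "coordinates": [-80.1918, 25.7617]}

# Landmark/radius search: $geoNear must be the first pipeline stage
geo_near_pipeline = [
    {"$geoNear": {
        "near": landmark,
        "distanceField": "distanceMeters",  # output field holding the distance
        "maxDistance": 10000,               # meters
        "spherical": True,
    }},
    {"$project": {"_id": 0, "name": 1,
                  "distanceKm": {"$divide": ["$distanceMeters", 1000]}}},
]

# Neighborhood search: a closed ring (first point == last point)
neighborhood_ring = [[-80.20, 25.75], [-80.18, 25.75], [-80.18, 25.77],
                     [-80.20, 25.77], [-80.20, 25.75]]
in_neighborhood = {"location": {"$geoWithin": {
    "$geometry": {"type": "Polygon", "coordinates": [neighborhood_ring]}}}}

# With PyMongo, where hotels is a hypothetical Collection:
#   list(hotels.aggregate(geo_near_pipeline)); hotels.find(in_neighborhood)
```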
Hybrid search
With a hybrid search, you can combine the results of multiple search techniques. For example, suppose you want to search for a waterfront hotel with a gym and bar near Miami:
You get the following result:
If you view the explain plan for this query, you notice that although the 2dsphere index is used, the HNSW index is skipped. The reason for this skip is that in hybrid search operations with Amazon DocumentDB, you can’t combine HNSW vector search with another search. As a result, the system performs an Exact Nearest Neighbor (ENN) search instead of an Approximate Nearest Neighbor (ANN) search for the vector component. See Vector Search Methods Comparison Simulation for a comparison of the two search techniques.
This approach has performance implications. ENN provides higher precision, but potentially slower performance compared to ANN. This behavior is generally acceptable when precision is prioritized over speed, when working with smaller datasets, or when other query conditions in the pipeline heavily pre-filter the data.
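The following plain-Python sketch makes the ENN trade-off concrete. It shows what an exact hybrid search computes: a radius pre-filter, followed by brute-force cosine scoring of every remaining candidate. This is a swapped-in client-side illustration, not how Amazon DocumentDB executes the query internally. The field names (name, location, embedding) are assumptions. The cost grows with the number of candidates that survive the pre-filter, which is why heavy pre-filtering keeps ENN affordable.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (approximation)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_hybrid_search(hotels, center, radius_m, query_vector, k):
    """Geo pre-filter, then exact (brute-force) k nearest neighbors by cosine."""
    lon0, lat0 = center
    # 1. Pre-filter: keep only hotels inside the radius
    nearby = [h for h in hotels
              if haversine_m(lon0, lat0, *h["location"]["coordinates"]) <= radius_m]
    # 2. ENN: score every remaining candidate; no approximate index involved
    scored = [(cosine_similarity(query_vector, h["embedding"]), h["name"]) for h in nearby]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```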
Alternatively, you can use the flexible JSON schema of Amazon DocumentDB to store and process your semi-structured data, and use Amazon OpenSearch Service for its advanced search capabilities, including hybrid search.
Amazon DocumentDB use cases
Amazon DocumentDB is particularly valuable for scenarios where schema flexibility and document-oriented access are core requirements:
- E-commerce platforms – Storing product catalogs with highly variable attributes and handling millions of customer interactions
- Content management systems – Managing diverse media types and metadata while supporting complex content relationships
- IoT applications – Ingesting and analyzing device data with varying schemas across different device types
- Healthcare systems – Maintaining patient records with complex, evolving structures while meeting compliance requirements
- Gaming companies – Storing player profiles and game states that require flexible schemas and low-latency access
Conclusion
Organizations can use Amazon DocumentDB to build modern, document-based applications with the scale and reliability of AWS, turning the complexity of managing MongoDB-compatible databases into a seamless experience. Our hotel booking system example highlights how you can use Amazon DocumentDB to effortlessly handle complex, nested data structures that would require intricate table relationships and joins in traditional SQL databases. This natural fit for hierarchical data simplifies development and accelerates time to market for modern applications.
As we continue our exploration of AWS JSON database solutions, Amazon DocumentDB stands out as a robust choice for applications where document-oriented data access patterns align with business requirements, offering the flexibility developers need and the operational excellence enterprises demand.
Try the JSON capabilities of Amazon DocumentDB discussed in this post, and share your feedback in the comments.