AWS Database Blog

JSON database solutions in AWS: Amazon DocumentDB (with MongoDB compatibility)

JSON has become the standard data exchange format in modern applications. Its human-readable format, hierarchical structure, and schema flexibility make it ideal for representing complex, evolving data models. As applications grow more sophisticated, traditional relational databases often struggle with several challenges:

  • Rigid schemas that resist frequent changes
  • Complex joins for hierarchical data
  • Performance bottlenecks when scaling
  • Difficulty representing varied data structures

Amazon DocumentDB (with MongoDB compatibility) addresses these challenges by embracing JSON’s inherent flexibility while providing the enterprise-grade reliability and performance organizations demand.

In this post, we explore Amazon DocumentDB, a fully managed, MongoDB API-compatible document database service with a serverless option, optimized for JSON document storage. Amazon DocumentDB can support millions of reads and writes per second and petabytes of storage capacity. For organizations with variable workloads, its serverless option removes the need for upfront commitments and simplifies resource management: you pay only for the database capacity you use. The service also allows up to 15 read replicas within a single Region for horizontal scaling of read-heavy workloads.

Key advantages of Amazon DocumentDB include:

  • Schema flexibility that adapts to evolving business requirements without costly migrations
  • Native JSON support that minimizes object-relational mapping complexity
  • Powerful query capabilities that handle complex document relationships efficiently
  • Enterprise-grade reliability with automated backups, point-in-time recovery, and high availability
  • Seamless scaling through independent serverless compute and storage scaling

JSON document model in Amazon DocumentDB

As a document database, Amazon DocumentDB is designed specifically for JSON documents. Amazon DocumentDB offers a comprehensive JSON document model with the following features:

  • Flexible schema without predefined document structure
  • Dot notation for accessing nested fields
  • Array operators for manipulation
  • Comprehensive aggregation pipelines for rich document transformation and analysis workflows
  • Geospatial indexing and queries for location-aware applications
  • Text search capabilities for content-rich applications
  • Vector search for AI-powered semantic similarity queries

Real-world example: Hotel booking system

Let’s explore a practical scenario: building a hotel booking system using Amazon DocumentDB. Hotel listings worldwide have complex and varied attributes that benefit from a flexible document model.

Here’s how a hotel document might be structured in Amazon DocumentDB:

db.hotels.insertMany([
  {
    "_id": "5f8a7b2e9d3b2c1a3e5d7f9b",
    "name": "Example-1",
    "location": {
      "address": "123 Beachfront Avenue",
      "city": "Miami",
      "state": "FL",
      "type": "Point",
      "coordinates": [-80.1918, 25.7617]
    },
    "amenities": ["pool", "spa", "gym", "restaurant", "beachAccess"],
    "rooms": [
      {
        "type": "standard",
        "bedType": "queen",
        "pricePerNight": 199.99,
        "maxOccupancy": 2
      },
      {
        "type": "deluxe",
        "bedType": "king",
        "pricePerNight": 299.99,
        "maxOccupancy": 2,
        "amenities": ["oceanView", "minibar"]
      }
    ],
    "availability": [
      {
        "date": "2025-07-14",
        "roomsAvailable": {
          "standard": 5,
          "suite": 2
        }
      },
      {
        "date": "2025-07-15",
        "roomsAvailable": {
          "standard": 3,
          "suite": 1
        }
      },
      {
        "date": "2025-07-16",
        "roomsAvailable": {
          "standard": 0,
          "suite": 4
        }
      }
    ],
    "reviews": [
      {
        "userId": "123456",
        "rating": 4.5,
        "comment": "Beautiful views and excellent service!"
      }
    ],
    "description": "Beautiful ocean front resort with stunning ocean views, pool, spa, gym, minibar amenities, in beautiful Miami, Florida"
  },
  {
    "_id": "5f8a7b2e9d3b2c1a3e5d7f99",
    "name": "Example-2",
    "location": {
      "address": "123 Downtown Avenue",
      "city": "Austin",
      "state": "TX",
      "type": "Point",
      "coordinates": [-97.7431, 30.2672]
    },
    "amenities": ["gym", "restaurant", "roof"],
    "rooms": [
      {
        "type": "deluxe",
        "bedType": "queen",
        "pricePerNight": 400.00,
        "maxOccupancy": 2,
        "amenities": ["downtownView", "minibar"]
      }
    ],
    "availability": [
      {
        "date": "2025-07-20",
        "roomsAvailable": {
          "standard": 5,
          "suite": 2
        }
      },
      {
        "date": "2025-07-25",
        "roomsAvailable": {
          "standard": 3,
          "suite": 1
        }
      },
      {
        "date": "2025-08-16",
        "roomsAvailable": {
          "standard": 0,
          "suite": 4
        }
      }
    ],
    "description": "Beautiful Austin, Texas downtown hotel with stunning lake views and luxury, restaurant, gym and minibar amenities"
  }
])

This document structure handles the complex relationships in a hotel booking system:

  • Nested location data with geographic coordinates
  • Arrays of amenities for quick filtering
  • Complex room information with varying attributes
  • Availability tracking with date-specific inventory
  • Customer reviews embedded directly in the hotel document

This structure would be challenging to model efficiently in some traditional relational databases without complex joins or denormalization.

Amazon DocumentDB provides a rich query language specifically designed for JSON document manipulation. In the following sections, we explore these query capabilities using our hotel booking system example.

Advanced JSON querying in Amazon DocumentDB

The examples in this section run against the hotels collection created earlier.

Querying nested JSON fields

For a hotel search platform, users often need to filter by multiple criteria simultaneously. The following query finds hotels with specific room types and amenities:

db.hotels.find({
    'rooms.bedType': { $in: ["queen", "king"] }, 
    "amenities": { $all: ["pool", "spa"] },     
    "availability": {
        $elemMatch: {                       
            "roomsAvailable.standard": { $gt: 0 } 
        }
    }
});

This query demonstrates several powerful capabilities:

  • Using $in to match multiple possible values in a nested field
  • Using $all to ensure all specified amenities are present
  • Using $elemMatch to find matching elements within an array

The query returns the following result:

[
  {
    _id: '5f8a7b2e9d3b2c1a3e5d7f9b',
    name: 'Example-1',
    location: { type: 'Point', coordinates: [ -80.1918, 25.7617 ] },
    amenities: [ 'pool', 'spa', 'gym', 'restaurant', 'beachAccess' ],
    rooms: [
      {
        type: 'standard',
        bedType: 'queen',
        pricePerNight: 199.99,
        maxOccupancy: 2
      },
      {
        type: 'deluxe',
        bedType: 'king',
        pricePerNight: 299.99,
        maxOccupancy: 2,
        amenities: [ 'oceanView', 'minibar' ]
      }
    ],
    availability: [
      { date: '2025-07-14', roomsAvailable: { standard: 5, suite: 2 } },
      { date: '2025-07-15', roomsAvailable: { standard: 3, suite: 1 } },
      { date: '2025-07-16', roomsAvailable: { standard: 0, suite: 4 } }
    ],
    reviews: [
      {
        userId: '123456',
        rating: 4.5,
        comment: 'Beautiful views and excellent service!'
      }
    ],
    description: 'Beautiful ocean front resort with stunning ocean views, pool, spa, gym, minibar amenities, in beautiful Miami, Florida',
    vectorEmbedding: [
      0.1791, 0.0874,
      0.1149, 0.0649,
      0.1498, 0.1067,
      0.1319,  0.156,
      0.0477, 0.1399
    ]
  }
]
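
To make the matching rules concrete, the following plain JavaScript sketch re-implements the three conditions as an in-memory predicate. It runs without a database. The helper name matchesSearch and the trimmed-down hotel objects are assumptions made for illustration, and the code only approximates the operator semantics for this one filter.

```javascript
// In-memory approximation of the filter above (illustration only).
// matchesSearch is a hypothetical helper, not part of any driver API.
function matchesSearch(hotel) {
  const rooms = hotel.rooms || [];
  const amenities = hotel.amenities || [];
  const availability = hotel.availability || [];

  // 'rooms.bedType': { $in: [...] } -> at least one room has a listed bed type
  const bedTypeOk = rooms.some((r) => ["queen", "king"].includes(r.bedType));

  // amenities: { $all: [...] } -> every listed amenity is present
  const amenitiesOk = ["pool", "spa"].every((a) => amenities.includes(a));

  // availability: { $elemMatch: {...} } -> one entry has standard rooms left
  const availabilityOk = availability.some(
    (d) => d.roomsAvailable && d.roomsAvailable.standard > 0
  );

  return bedTypeOk && amenitiesOk && availabilityOk;
}

// Trimmed-down versions of the two sample hotels
const miami = {
  amenities: ["pool", "spa", "gym", "restaurant", "beachAccess"],
  rooms: [{ bedType: "queen" }, { bedType: "king" }],
  availability: [{ date: "2025-07-14", roomsAvailable: { standard: 5, suite: 2 } }],
};
const austin = {
  amenities: ["gym", "restaurant", "roof"],
  rooms: [{ bedType: "queen" }],
  availability: [{ date: "2025-07-20", roomsAvailable: { standard: 5, suite: 2 } }],
};

console.log(matchesSearch(miami));  // true
console.log(matchesSearch(austin)); // false: no pool or spa
```

The three conditions are combined with a logical AND, which is why the Austin hotel is excluded even though it has a queen bed and standard rooms available.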

Complex array filtering

When users are looking for specific room types within their budget, we need to filter elements within arrays:

// Find hotels with deluxe rooms over $300
db.hotels.find({
  "rooms": {
    $elemMatch: {
      "type": "deluxe",
      "pricePerNight": { $gt: 300 }
    }
  }
})

This query returns hotels that have deluxe rooms priced over $300 per night. The $elemMatch operator makes sure that both conditions apply to the same room object within the array. The query returns a hotel in Austin, TX.
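
The difference between $elemMatch and two separate dot-notation conditions is easy to miss, so here is a small in-memory sketch of both behaviors. The helper names and the sample hotel are hypothetical. The point is only that without $elemMatch, each condition can be satisfied by a different array element.

```javascript
// Illustration only: both helpers are hypothetical and work on plain objects.

// Mirrors rooms: { $elemMatch: { type: "deluxe", pricePerNight: { $gt: price } } }
// Both conditions must hold for the SAME room.
function hasDeluxeOver(hotel, price) {
  return hotel.rooms.some((r) => r.type === "deluxe" && r.pricePerNight > price);
}

// Mirrors { "rooms.type": "deluxe", "rooms.pricePerNight": { $gt: price } }
// Each condition may be satisfied by a DIFFERENT room.
function hasDeluxeAndAnyOver(hotel, price) {
  return (
    hotel.rooms.some((r) => r.type === "deluxe") &&
    hotel.rooms.some((r) => r.pricePerNight > price)
  );
}

// Hypothetical hotel: the deluxe room is cheap, the suite is expensive
const mixed = {
  rooms: [
    { type: "deluxe", pricePerNight: 250 },
    { type: "suite", pricePerNight: 450 },
  ],
};

console.log(hasDeluxeOver(mixed, 300));       // false
console.log(hasDeluxeAndAnyOver(mixed, 300)); // true
```

For a budget filter, the second form would return a hotel whose deluxe room does not meet the price condition at all, which is why the query above uses $elemMatch.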

Aggregation pipeline for JSON transformation

For analytics purposes, hotel managers might want to see average room prices across their properties:

// Calculate average room price per hotel
db.hotels.aggregate([
  { $unwind: "$rooms"}, 
  {
    $group: {
      _id: "$_id", 
      name: { $first: "$name" }, 
      averageRoomPrice: { $avg: "$rooms.pricePerNight" } 
    }
  }
]);

This aggregation pipeline:

  • Uses $unwind to create a separate document for each room in the array
  • Groups the results by hotel _id and carries the hotel name along with $first
  • Calculates the average price across each hotel's rooms

The following is a sample output:

[
  {
    _id: '5f8a7b2e9d3b2c1a3e5d7f9b',
    name: 'Example-1',
    averageRoomPrice: 249.99
  },
  {
    _id: '5f8a7b2e9d3b2c1a3e5d7f99',
    name: 'Example-2',
    averageRoomPrice: 400
  }
]

Although $unwind is powerful for deconstructing arrays, it can lead to performance issues, especially when dealing with many documents, because it replicates documents, increasing the number of documents processed in subsequent stages. The following code demonstrates a more efficient approach to aggregation:

db.hotels.aggregate([
  {
    $project: {
      _id: 1,
      name: 1,
      averageRoomPrice: {
        $avg: "$rooms.pricePerNight" 
      }
    }
  }
])

The aggregation framework is particularly powerful for transforming and analyzing JSON data, with operators for grouping, filtering, projecting, and performing calculations.
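
As a rough illustration of why the two pipelines agree, the following sketch computes the per-hotel average both ways on in-memory arrays. The helper names and the shortened _id values are assumptions. The code mimics the shape of the computation and says nothing about how Amazon DocumentDB executes either pipeline.

```javascript
// Illustration only: both helpers are hypothetical and operate on in-memory arrays.

// $unwind + $group style: flatten to one row per room, then group by hotel _id
function averageViaUnwind(hotels) {
  const rows = hotels.flatMap((h) =>
    h.rooms.map((room) => ({ _id: h._id, name: h.name, price: room.pricePerNight }))
  );
  const groups = new Map();
  for (const row of rows) {
    const g = groups.get(row._id) || { _id: row._id, name: row.name, sum: 0, count: 0 };
    g.sum += row.price;
    g.count += 1;
    groups.set(row._id, g);
  }
  return [...groups.values()].map((g) => ({
    _id: g._id,
    name: g.name,
    averageRoomPrice: g.sum / g.count,
  }));
}

// $project style: compute the average inside each document, no intermediate rows
function averageViaProject(hotels) {
  return hotels.map((h) => ({
    _id: h._id,
    name: h.name,
    averageRoomPrice:
      h.rooms.length === 0
        ? null
        : h.rooms.reduce((sum, r) => sum + r.pricePerNight, 0) / h.rooms.length,
  }));
}

const sampleHotels = [
  { _id: "a", name: "Example-1", rooms: [{ pricePerNight: 199.99 }, { pricePerNight: 299.99 }] },
  { _id: "b", name: "Example-2", rooms: [{ pricePerNight: 400.0 }] },
];

console.log(averageViaUnwind(sampleHotels));
console.log(averageViaProject(sampleHotels));
```

The flattening step creates one intermediate row per room, which is the extra work the $project form avoids. One difference in this sketch: a hotel with an empty rooms array disappears from the flattened version but is kept, with a null average, in the per-document version.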

Availability queries with complex conditions

For a booking system, checking room availability is a critical operation:

// Find hotels with at least 2 standard rooms available on a specific date
db.hotels.find(
  {
    "availability": {
      $elemMatch: {
        "date": "2025-07-15", 
        "roomsAvailable.standard": {$gte: 2}
      }
    }
  },
  {
    name: 1,
    "location.city": 1,
    "location.state": 1,
    "rooms.type": 1,
    "rooms.pricePerNight": 1,
    "availability": {
      $elemMatch: {
        "date": "2025-07-15"
      }
    }
  }
)

This query finds hotels with at least two standard rooms available on July 15, 2025. The combination of $elemMatch and dot notation allows for precise filtering on nested array elements. Note that the query adds a projection to limit the fields returned. The result is:

[
  {
    _id: '5f8a7b2e9d3b2c1a3e5d7f9b',
    name: 'Example-1',
    location: { city: 'Miami', state: 'FL' },
    rooms: [
      { type: 'standard', pricePerNight: 199.99 },
      { type: 'deluxe', pricePerNight: 299.99 }
    ],
    availability: [
      { date: '2025-07-15', roomsAvailable: { standard: 3, suite: 1 } }
    ]
  }
]
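
In an application, the date and room type usually come from user input rather than being hard-coded. The sketch below shows one way to build the filter and projection from parameters. buildAvailabilityQuery is a hypothetical helper, and the validation rules are assumptions chosen to keep the computed field path safe.

```javascript
// Hypothetical helper: builds the filter and projection for a date and room type.
function buildAvailabilityQuery(date, roomType, minRooms) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error("date must be formatted as YYYY-MM-DD");
  }
  // Letters only, so the value cannot inject dots or $ operators into the field path
  if (!/^[A-Za-z]+$/.test(roomType)) {
    throw new Error("roomType must contain letters only");
  }
  const filter = {
    availability: {
      $elemMatch: {
        date: date,
        // Computed key, for example "roomsAvailable.standard"
        [`roomsAvailable.${roomType}`]: { $gte: minRooms },
      },
    },
  };
  const projection = {
    name: 1,
    "location.city": 1,
    "location.state": 1,
    availability: { $elemMatch: { date: date } },
  };
  return { filter, projection };
}

const { filter, projection } = buildAvailabilityQuery("2025-07-15", "standard", 2);
console.log(JSON.stringify(filter, null, 2));
// In mongosh, the pair could then be passed as db.hotels.find(filter, projection)
```

Because the sample documents store dates as YYYY-MM-DD strings, the helper matches on the exact string. A schema that stores Date values would need a different comparison.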

Full-text search capabilities

Amazon DocumentDB provides text search capabilities through the $text operator and text indexes:

// Create a text index on a single field
db.hotels.createIndex({description: "text"})

The following query finds hotels with luxury or beach in their description:

// Search for hotels with specific terms in description
db.hotels.find(
  {$text: { $search: "luxury beach"}},
  {name: 1}
);

We get the following result:

[
  {_id: '5f8a7b2e9d3b2c1a3e5d7f9b', name: 'Example-1'},
  {_id: '5f8a7b2e9d3b2c1a3e5d7f99', name: 'Example-2'}
]

The result can be sorted by relevance:

// Text search with relevance scoring
db.hotels.aggregate([
  {
    $match: {
      $text: {
        $search: "luxury beach"
      }
    }
  },
  {
    $sort: {
      score: {
        $meta: "textScore"
      }
    }
  },
  {
    $project: {
      _id: 0,
      score: {
        $meta: "textScore"
      },
      name: 1,
      availability: {
        $filter: {
          input: "$availability",
          as: "item",
          cond: { $eq: ["$$item.date", "2025-07-15"] }
        }
      }
    }
  }
]);

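This query returns hotels with beach or luxury in their description, sorted by relevance score. The $filter expression keeps only the availability entry for July 15, 2025. Hotel Example-2 has no entry for that date (its earliest is July 20, 2025), so its availability array is empty: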

[
  {
    name: 'Example-2',
    score: 0.6079270860936958,
    availability: []
  },
  {
    name: 'Example-1',
    score: 0.3039635430468479,
    availability: [
      { date: '2025-07-15', roomsAvailable: { standard: 3, suite: 1 } }
    ]
  }
]

Semantic search capabilities

For more advanced semantic search, Amazon DocumentDB 5.0+ supports vector search with both HNSW and IVFFlat indexing methods. For our hotel collection, we can create a vector index as follows.

db.runCommand({
  createIndexes: "hotels",
  indexes: [
    {
      key: { vectorEmbedding: "vector"},
      name: "hotel_embedding_index",
      vectorOptions: {
        type: "hnsw",
        dimensions: 10, 
        similarity: "cosine",
        m: 16,
        efConstruction: 64,
      },
    },
  ],
});

The preceding command uses the following key parameters:

  • dimensions: 10 specifies the dimensionality of each vector
  • similarity: "cosine" uses cosine similarity for measuring distance
  • type: "hnsw" specifies the indexing method
  • m is an HNSW parameter for the maximum number of bidirectional links per node
  • efConstruction is an HNSW parameter for the size of the candidate list used while building the index

After the index is created, you generate embeddings for the description field using an embedding model available in Amazon Bedrock, such as Cohere.

For example, for our hotels, we have the following embeddings.

For hotel Example-1, we add the embedding for the description field:

db.hotels.updateOne(
  { "name": "Example-1" },
  { $set: { "vectorEmbedding": [0.1791, 0.0874, 0.1149, 0.0649, 0.1498, 0.1067, 0.1319, 0.1560, 0.0477, 0.1399] } }
);

For hotel Example-2, we add the embedding for the description field:

db.hotels.updateOne(
  { "name": "Example-2" },
  { $set: { "vectorEmbedding": [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916] } }
);

We perform a vector similarity search for the phrase "water front hotel with gym and bar, in warm state", represented by the vector embedding [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916].

db.runCommand({
  "aggregate": "hotels",
  "pipeline": [
    {
      $search: {
        "vectorSearch": {
          "vector": [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916], 
          "path": "vectorEmbedding", 
          "similarity": "cosine",
          "k": 2,
          "efSearch": 40
        }
      }
    },
    {
      $project: {
        "name": 1,
        "description": 1 
      }
    }
  ],
  "cursor": {}
});

This query finds the hotels whose vector embeddings are most similar to the query vector, using cosine similarity. For our use case, the cursor output is as follows:

firstBatch: [
      {
        _id: '5f8a7b2e9d3b2c1a3e5d7f99',
        name: 'Example-2',
        description: 'Beautiful Austin, Texas downtown hotel with stunning lake views and luxury, restaurant, gym and minibar amenities'
      },
      {
        _id: '5f8a7b2e9d3b2c1a3e5d7f9b',
        name: 'Example-1',
        description: 'Beautiful ocean front resort with stunning ocean views, pool, spa, gym, minibar amenities, in beautiful Miami, Florida'
      }
    ]

In a real-world implementation, these embeddings would represent semantic features of the hotels, generated from descriptions, amenities, and other attributes using machine learning models.
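
To see why Example-2 ranks first, the cosine similarity between the query vector and each stored embedding can be computed by hand. The following sketch does that in plain JavaScript with the 10-dimensional sample vectors from this post. cosineSimilarity is a hypothetical helper, not an Amazon DocumentDB API.

```javascript
// Illustration only: cosineSimilarity is a hypothetical helper.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Sample vectors copied from the examples in this post
const queryVector = [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916];
const embeddings = {
  "Example-1": [0.1791, 0.0874, 0.1149, 0.0649, 0.1498, 0.1067, 0.1319, 0.1560, 0.0477, 0.1399],
  "Example-2": [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916],
};

// Score every hotel and sort from most to least similar
const ranked = Object.entries(embeddings)
  .map(([name, vector]) => ({ name, score: cosineSimilarity(queryVector, vector) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked.map((r) => r.name)); // Example-2 first, then Example-1
```

In this sample data the query vector is identical to the embedding stored for Example-2, so its similarity is effectively 1 and it ranks ahead of Example-1.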

Geospatial query capabilities

Location is central to hotel search: guests look for properties near a landmark, inside a neighborhood, or within a given distance of a point. Amazon DocumentDB supports these geospatial queries natively, so the booking system doesn't need complex application-level implementations or additional specialized search services.

Let’s explore this geospatial query capability.

First, make sure that each location is stored in GeoJSON Point format:

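db.hotels.updateOne(
  { "name": "Example-1" },
  // $set on "location" replaces the whole embedded document, so the address,
  // city, and state fields are dropped unless they are included here.
  { $set: {
      "location": {
        "type": "Point",
        "coordinates": [-80.1918, 25.7617] // GeoJSON order: [longitude, latitude]
      }
    }
  }
);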

db.hotels.updateOne(
  { "name": "Example-2"}, 
  { $set: { 
      "location": {
        "type": "Point",
        "coordinates": [-97.7431, 30.2672] 
      }
    } 
  }
);
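
Swapped or out-of-range coordinates are a common source of geospatial bugs, so a basic sanity check before writing can help. The sketch below is a minimal, assumed validation. isValidGeoJsonPoint is a hypothetical helper that accepts only two-element positions, and it cannot detect a value that is in range but has the wrong sign.

```javascript
// Hypothetical helper: basic sanity check for a GeoJSON Point.
// GeoJSON orders coordinates as [longitude, latitude].
function isValidGeoJsonPoint(point) {
  if (!point || point.type !== "Point" || !Array.isArray(point.coordinates)) {
    return false;
  }
  const [longitude, latitude] = point.coordinates;
  return (
    point.coordinates.length === 2 &&
    Number.isFinite(longitude) &&
    Number.isFinite(latitude) &&
    longitude >= -180 && longitude <= 180 &&
    latitude >= -90 && latitude <= 90
  );
}

console.log(isValidGeoJsonPoint({ type: "Point", coordinates: [-97.7431, 30.2672] })); // true
console.log(isValidGeoJsonPoint({ type: "Point", coordinates: [30.2672, 97.7431] }));  // false: latitude out of range
```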

To enable spatial queries, create a 2dsphere index:

db.hotels.createIndex({ location: "2dsphere"})

This index makes it possible to use operators like $nearSphere, $geoWithin, and $geoIntersects. To find hotels within 5 kilometers of a point, use the following query:

db.hotels.aggregate([
  {
    "$geoNear": {
      "near": {
        "type": "Point",
        "coordinates": [-80.1918, 25.7617]
      },
      "maxDistance": 5000,
      "distanceField": "distance", 
      "spherical": true
    }
  },
  {
    "$project": { 
      "name": 1
    }
  }
]);

We get the following result:

[ { _id: '5f8a7b2e9d3b2c1a3e5d7f9b', name: 'Example-1' } ]
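
To get a feel for the distances involved, the following sketch estimates the great-circle distance from the search point to each sample hotel using the haversine formula. This is an approximation for illustration only. haversineMeters and the Earth radius constant are assumptions, and the distance that Amazon DocumentDB reports in distanceField may differ slightly.

```javascript
// Illustration only: haversineMeters is a hypothetical helper.
// EARTH_RADIUS_M is the mean Earth radius, a common approximation.
const EARTH_RADIUS_M = 6371000;

// Both arguments are [longitude, latitude] pairs in degrees (GeoJSON order)
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

const searchPoint = [-80.1918, 25.7617];
const hotelPoints = {
  "Example-1": [-80.1918, 25.7617],
  "Example-2": [-97.7431, 30.2672],
};

for (const [name, coords] of Object.entries(hotelPoints)) {
  const meters = haversineMeters(searchPoint, coords);
  console.log(name, Math.round(meters), meters <= 5000 ? "within 5 km" : "outside 5 km");
}
```

Example-1 sits exactly on the search point, while the Austin hotel is well over a thousand kilometers away, so only Example-1 falls inside the 5,000-meter maxDistance.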

Amazon DocumentDB supports proximity, inclusion, and intersection querying of geospatial data. You can perform the following searches:

  • Distance-based sorting and pricing – This includes the following operators:
    • $geoNear – This aggregation stage can calculate and output the distance, enabling subsequent sorting operations on that distance field.
    • sort() – After distances are calculated (for example, using $geoNear), you can use the standard sort() method to order results by distance for features like pricing or ranking by proximity.
  • Landmark-based search – This can be implemented using the proximity operators ($nearSphere, $geoNear) by treating the landmark as the reference GeoJSON point.
  • Polygon-based neighborhood search – This includes the following operators:
    • $geoWithin – Finds documents with geospatial data entirely within a specified shape, like a polygon.
    • $geoIntersects – Finds documents whose geospatial data intersects with a specified GeoJSON object, including polygons.
  • Radius-based search (proximity search) – This includes the following operators:
    • $nearSphere – Finds points nearest to a GeoJSON point on a sphere
    • $geoNear – An aggregation operator that calculates distances from a GeoJSON point, often used for proximity searches within an aggregation pipeline.
    • $minDistance and $maxDistance – Used in conjunction with $nearSphere or $geoNear to filter results based on minimum and maximum distances from the center point.
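
As a sketch of the polygon-based neighborhood search described above, the following helper wraps a ring of coordinates in a $geoWithin filter that uses the standard $geometry syntax. buildNeighborhoodFilter is hypothetical, the rectangle is made up rather than a real neighborhood boundary, and it is worth confirming against the Amazon DocumentDB documentation which geometry types your engine version supports.

```javascript
// Hypothetical helper: wraps a ring of [longitude, latitude] pairs in a $geoWithin filter.
function buildNeighborhoodFilter(ring) {
  if (ring.length < 4) {
    throw new Error("a polygon ring needs at least four positions");
  }
  const first = ring[0];
  const last = ring[ring.length - 1];
  if (first[0] !== last[0] || first[1] !== last[1]) {
    throw new Error("the ring must be closed: first and last positions must match");
  }
  return {
    location: {
      $geoWithin: {
        $geometry: { type: "Polygon", coordinates: [ring] },
      },
    },
  };
}

// Made-up rectangle around the Miami sample point (not a real neighborhood boundary)
const miamiBox = [
  [-80.25, 25.70],
  [-80.13, 25.70],
  [-80.13, 25.82],
  [-80.25, 25.82],
  [-80.25, 25.70],
];

const neighborhoodFilter = buildNeighborhoodFilter(miamiBox);
console.log(JSON.stringify(neighborhoodFilter));
// In mongosh, this could be passed as db.hotels.find(neighborhoodFilter, { name: 1 })
```

Checking that the ring is closed before sending the query gives a clearer error than a server-side rejection of malformed GeoJSON.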

Hybrid search

With a hybrid search, you can combine multiple search techniques in one query. For example, suppose you want to search for "water front hotel with gym and bar" among hotels near Miami:

db.runCommand({
  "aggregate": "hotels",
  "pipeline": [
    {
      "$geoNear": {
        "near": {
          "type": "Point",
          "coordinates": [-80.1918, 25.7617]
        },
        "maxDistance": 5000,
        "distanceField": "distance",
        "spherical": true
      }
    },
    {
      $search: {
        "vectorSearch": {
          "vector": [0.2979, 0.5413, 0.3457, 0.1234, 0.6159, 0.6741, 0.1873, 0.4317, 0.3741, 0.5916], 
          "path": "vectorEmbedding", 
          "similarity": "cosine",
          "k": 2,
          "efSearch": 40
        }
      }
    },
    {
      $project: {
        "name": 1,
        "description": 1 
      }
    }
  ],
  "cursor": {}
});

You get the following result:

firstBatch: [
      {
        _id: '5f8a7b2e9d3b2c1a3e5d7f9b',
        name: 'Example-1',
        description: 'Beautiful ocean front resort with stunning ocean views, pool, spa, gym, minibar amenities, in beautiful Miami, Florida'
      }
    ]

If you view the explain plan for this query, you will notice that although the 2dsphere index is used, the HNSW index is skipped. In hybrid search operations with Amazon DocumentDB, you can't combine HNSW vector search with another search, so the system performs an Exact Nearest Neighbor (ENN) search instead of an Approximate Nearest Neighbor (ANN) search for the vector component. See Vector Search Methods Comparison Simulation for a comparison of the two search techniques.

This approach has performance implications. ENN provides higher precision, but potentially slower performance compared to ANN. This behavior is generally acceptable when precision is prioritized over speed, when working with smaller datasets, or when the data is heavily pre-filtered by other query conditions in the pipeline.
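
Conceptually, an exact nearest neighbor search scores every remaining candidate and keeps the top k, which is why pre-filtering matters so much for its cost. The sketch below illustrates that idea on made-up three-dimensional vectors. The helper names, the nearMiami flag, and the data are all hypothetical, and the code is not a description of Amazon DocumentDB internals.

```javascript
// Illustration only: a brute-force (exact) top-k search over pre-filtered candidates.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every candidate, so the work grows with candidates x dimensions
function exactNearest(candidates, queryVector, k) {
  return candidates
    .map((doc) => ({ name: doc.name, score: cosine(queryVector, doc.vectorEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up data: nearMiami stands in for the result of a geospatial pre-filter
const allHotels = [
  { name: "A", nearMiami: true, vectorEmbedding: [1, 0, 0] },
  { name: "B", nearMiami: true, vectorEmbedding: [0.6, 0.8, 0] },
  { name: "C", nearMiami: false, vectorEmbedding: [0.9, 0.1, 0] },
];

const preFiltered = allHotels.filter((h) => h.nearMiami);
console.log(exactNearest(preFiltered, [1, 0, 0], 1)); // hotel A ranks first
```

Hotel C is never scored because the pre-filter removes it first. The smaller the filtered set, the cheaper the exact scan becomes.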

Alternatively, you can use the flexible JSON schema of Amazon DocumentDB to store and process your semi-structured data, and use Amazon OpenSearch Service for its advanced search capabilities, including hybrid search.

Amazon DocumentDB use cases

Amazon DocumentDB is particularly valuable for scenarios where schema flexibility and document-oriented access are core requirements:

  • E-commerce platforms – Storing product catalogs with highly variable attributes and handling millions of customer interactions
  • Content management systems – Managing diverse media types and metadata while supporting complex content relationships
  • IoT applications – Ingesting and analyzing device data with varying schemas across different device types
  • Healthcare systems – Maintaining patient records with complex, evolving structures while maintaining compliance
  • Gaming companies – Storing player profiles and game states that require flexible schemas and low-latency access

Conclusion

Organizations can use Amazon DocumentDB to build modern, document-based applications with the scale and reliability of AWS, turning the complexity of managing MongoDB-compatible databases into a seamless experience. Our hotel booking system example highlights how you can use Amazon DocumentDB to effortlessly handle complex, nested data structures that would require intricate table relationships and joins in traditional SQL databases. This natural fit for hierarchical data simplifies development and accelerates time to market for modern applications.

As we continue our exploration of AWS JSON database solutions, Amazon DocumentDB stands out as a robust choice for applications where document-oriented data access patterns align with business requirements, offering the flexibility developers need and the operational excellence enterprises demand.

Try the JSON capabilities of Amazon DocumentDB discussed in this post and leave your comments.


About the authors

Ezat Karimi

Ezat is a Senior Solutions Architect at AWS, based in Austin, TX. Ezat specializes in designing and delivering modernization solutions and strategies for database applications. Working closely with multiple AWS teams, Ezat helps customers migrate their database workloads to the AWS Cloud.