AWS Database Blog

PostgreSQL as a JSON database: Advanced patterns and best practices

Modern applications need adaptable data models. For example, e-commerce product catalogs have diverse attributes and constantly changing requirements. How do you design a database that handles this without painful migrations?

JSON adapts to your data’s shape, removing the need for rigid schemas. As your application grows, your data model can evolve alongside it.

PostgreSQL provides JSON capabilities with ACID compliance and advanced indexing, making it a compelling alternative to dedicated NoSQL solutions for JSON workloads. When deployed on AWS through fully managed services such as Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition, it gains enterprise-grade capabilities and seamless integration with other AWS services.

JSON capabilities of PostgreSQL

PostgreSQL provides robust JSON support through two primary data types:

  • JSON – The JSON data type stores an exact copy of the input text, which must be reparsed on every access; it supports only functional (expression) indexes on extracted values.
  • JSONB – The JSONB data type stores data in a decomposed binary format that is more efficient to process but marginally slower to insert. JSONB supports the full range of JSON operators and GIN indexing.
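
To see the behavioral difference, cast the same input to both types; these two statements can be run as-is in psql:

-- JSON keeps the input text exactly, including the duplicate key
SELECT '{"a": 1, "a": 2}'::json;   -- {"a": 1, "a": 2}
-- JSONB normalizes the value, keeping only the last duplicate
SELECT '{"a": 1, "a": 2}'::jsonb;  -- {"a": 2}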

The following table compares the two data types.

Feature            JSON                                              JSONB
Storage format     Text-based; preserves whitespace and key order    Binary format optimized for processing; typically slightly larger on disk
Insertion speed    Faster (no conversion overhead)                   Slightly slower (requires binary conversion)
Query performance  Slower (reparsed on each access)                  Significantly faster for most operations
Indexing support   Functional (expression) indexes only              Supports GIN indexing for efficient searches
Duplicate keys     Preserved                                         Removed (last value kept)
Memory usage       Generally higher for complex operations           More efficient memory utilization
Nested queries     Less efficient for deep nesting                   Optimized for nested structure queries
Use case           When exact input preservation is critical         Most production applications needing performance

Real-world use cases

PostgreSQL is valuable for systems requiring a transactional, object-relational database, offering robust and extensible data types, a rich SQL interface, and advanced search capabilities. The following are example use cases:

  • Web and mobile applications – With its scalability and reliability, PostgreSQL is a popular backend choice for dynamic websites and mobile applications that handle high traffic and require strong transactional integrity.
  • Content management systems (CMS) – Many CMS platforms use PostgreSQL for its ability to handle complex content structures and large databases.
  • Scientific research and data analysis – PostgreSQL is suitable for scientific research and data analysis due to its support for complex data types and advanced analytics.
  • Geospatial applications (GIS) – With the PostGIS extension, PostgreSQL becomes a powerful geospatial database for applications like mapping and location services.
  • Applications requiring advanced search features – PostgreSQL excels for applications demanding sophisticated search capabilities, including full-text search, geospatial search, and semantic search. Examples are job search engines, content recommendation, and product discovery systems.

To keep the focus on PostgreSQL’s JSON capabilities for a CMS application, this post excludes vector (semantic) search and geospatial search. Purpose-built services, such as Amazon OpenSearch Service, provide a more efficient platform for these specialized search techniques.

Solution overview

Let’s build a digital news magazine CMS application. This is a perfect use case for PostgreSQL’s JSON capabilities because news articles have both structured components (titles, publication dates) and unstructured, variable components (different types of metadata depending on the article type).

Prerequisites

To try our examples, you need an AWS account, PostgreSQL 17 or later, and familiarity with SQL. For setup instructions, refer to Creating an Amazon Aurora DB cluster.

Configure data

We start by setting up our database structure. The following table definition demonstrates how PostgreSQL can store both structured and semi-structured data:

CREATE TABLE json_articles (
    article_id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    edition SMALLINT NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB           
);
-- Create a multicolumn GIN index for efficient containment searches on JSON fields
CREATE INDEX idx_news_articles_metadata_fields ON json_articles USING GIN (
    (metadata->'tags') jsonb_path_ops,
    (metadata->'categories') jsonb_path_ops,
    (metadata->'author') jsonb_path_ops
);

PostgreSQL can combine the results from multiple independent index scans (including GIN indexes) by creating bitmaps of potential row locations.

In this example, the metadata field is of type JSONB. This is helpful for performance in a CMS where editors frequently search and filter articles by various metadata attributes. The GIN index we created speeds up containment (@>) queries that filter JSON data by tags, categories, or author, which are common operations in a news platform.
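
To verify that the planner uses the index, you can inspect the query plan. The following sketch assumes the table holds enough rows for index scans to beat a sequential scan; on our two-row sample, the planner may still choose a sequential scan:

-- With sufficient data, expect a bitmap index scan on the GIN index;
-- the planner can also combine multiple independent index scans with a BitmapAnd
EXPLAIN
SELECT title
FROM json_articles
WHERE metadata->'tags' @> '"climate change"'
  AND metadata->'categories' @> '"environment"';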

Next, we insert some sample articles that might appear in our digital magazine:

INSERT INTO json_articles VALUES
(1,
 'Climate Change Summit Results',
 1,
 'World leaders agreed on new emissions targets to fight global warming...',
 '{
   "author": {"name": "Jane Smith", "email": "jane@example.com"},
   "categories": ["environment", "politics", "global"],
   "tags": ["climate change", "summit", "emissions", "policy"],
   "metrics": {"read_time": 4.5, "comments_count": 28, "shares": 156}
  }'),
(2,
 'Global Warming Effects on Marine Life',
 1,
 'Rising ocean temperatures are threatening coral reefs and marine biodiversity...',
 '{
   "author": {"name": "Mark Johnson", "email": "mark@example.com"},
   "categories": ["environment", "science", "marine"],
   "tags": ["global warming", "oceans", "marine life", "climate change"],
   "metrics": {"read_time": 5.2, "comments_count": 42, "shares": 205}
  }');

In this example, if we were using a purely relational approach, we would need separate tables for authors, categories, tags, and metrics, with foreign keys linking everything together. With JSON, we keep related data together, making it straightforward to retrieve complete article information in a single query. Furthermore, if we later decide to add new metadata fields (like “video_duration” for video articles or “interview_subject” for interviews), we can do so without altering the database schema, as the following example shows. This flexibility is invaluable for a CMS that needs to adapt to evolving content types.
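
For instance, a hypothetical video article can carry a video_duration field (in seconds) that no other article has, with no ALTER TABLE required:

INSERT INTO json_articles VALUES
(3,
 'Summit Highlights: Video Recap',
 1,
 'Watch the key moments from the climate summit...',
 '{
   "author": {"name": "Jane Smith", "email": "jane@example.com"},
   "categories": ["environment", "video"],
   "tags": ["climate change", "summit"],
   "metrics": {"read_time": 0.5, "comments_count": 3, "shares": 12},
   "video_duration": 315
  }');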

Query patterns and optimization

In this section, we explore the search capabilities that PostgreSQL provides for querying JSON data.

Query with JSON path operators

In our CMS, editors might need to find all environment-related articles that mention “climate change” to create a special feature. This query combines traditional SQL with JSON path operators:

-- Example of a basic query combining JSON and text search
-- This query finds articles with specific tags in the environment category
SELECT title, metadata->>'author' AS author_name
FROM json_articles
WHERE metadata->'categories' ? 'environment'
AND metadata->'tags' ? 'climate change';
Output:

title                                  author_name
-------------------------------------  ---------------------------------------------------
Climate Change Summit Results          {"name": "Jane Smith", "email": "jane@example.com"}
Global Warming Effects on Marine Life  {"name": "Mark Johnson", "email": "mark@example.com"}

In this query:

  • metadata->'categories' ? 'environment' uses the existence operator ? to check whether the categories array contains the element “environment”
  • metadata->'tags' ? 'climate change' similarly checks whether “climate change” is in the tags array
  • metadata->>'author' extracts the author field as text (note the ->>); because author is itself a JSON object, we get the entire object as a string
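
The same array checks can be written with the containment operator @>. Both forms return identical matches here, but only the @> form can use the jsonb_path_ops GIN index we created earlier (more on operator classes later in this post):

-- Existence check: cannot use a jsonb_path_ops index
SELECT title FROM json_articles WHERE metadata->'tags' ? 'policy';
-- Containment check: can use the jsonb_path_ops index on (metadata->'tags')
SELECT title FROM json_articles WHERE metadata->'tags' @> '"policy"';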

Transforming JSON into tabular data

CMS applications often need to extract structured data from the JSON metadata for analytics purposes. JSON_TABLE, an ANSI SQL standard function available in PostgreSQL 17 and later, presents JSON data in a tabular format:

-- Using JSON_TABLE to extract structured data from JSON (PostgreSQL 17+ feature)
-- This demonstrates how to convert nested JSON data into a relational format
SELECT 
    a.article_id, 
    a.title,  
    jt.author_name, 
    jt.author_email, 
    jt.category
FROM 
    json_articles a,
    JSON_TABLE(a.metadata, '$' COLUMNS (
        author_name VARCHAR(100) PATH '$.author.name',
        author_email VARCHAR(100) PATH '$.author.email',
        NESTED PATH '$.categories[*]' COLUMNS (
            category VARCHAR(50) PATH '$'
        )
    )) AS jt
WHERE 
    (a.metadata -> 'metrics' ->> 'read_time')::float < 5;

Output:

article_id  title                          author_name  author_email      category
----------  -----------------------------  -----------  ----------------  -----------
1           Climate Change Summit Results  Jane Smith   jane@example.com  environment
1           Climate Change Summit Results  Jane Smith   jane@example.com  politics
1           Climate Change Summit Results  Jane Smith   jane@example.com  global

This query uses the JSON_TABLE function with PATH and NESTED PATH expressions to flatten JSON data, providing a relational view of our JSON data that we can use for reporting or exporting data.
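
The same technique works for aggregation. As a minimal sketch, the following query counts articles per category by flattening the categories array:

SELECT jt.category, COUNT(*) AS article_count
FROM json_articles a,
     JSON_TABLE(a.metadata, '$.categories[*]' COLUMNS (
         category VARCHAR(50) PATH '$'
     )) AS jt
GROUP BY jt.category
ORDER BY article_count DESC;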

If using older PostgreSQL versions, we can achieve similar results using lateral joins:

-- Alternative approach for PostgreSQL versions before 17
-- This achieves similar results using lateral joins and jsonb_array_elements
SELECT 
    a.article_id, 
    a.title,
    a.metadata -> 'author' ->> 'name' AS author_name,
    a.metadata -> 'author' ->> 'email' AS author_email,
    c.category
FROM 
    json_articles a,
    LATERAL jsonb_array_elements_text(a.metadata -> 'categories') AS c(category)
WHERE 
    (a.metadata -> 'metrics' ->> 'read_time')::float < 5;
Output:

article_id  title                          author_name  author_email      category
----------  -----------------------------  -----------  ----------------  -----------
1           Climate Change Summit Results  Jane Smith   jane@example.com  environment
1           Climate Change Summit Results  Jane Smith   jane@example.com  politics
1           Climate Change Summit Results  Jane Smith   jane@example.com  global

Full-text search with JSON data

CMS applications also need rich search over article text, not just JSON metadata. PostgreSQL’s full-text search capabilities work seamlessly with JSON content, so readers can search article content efficiently while results are filtered by metadata. For example, see the following query:

-- Create a text search index on content
CREATE INDEX json_articles_content_idx ON json_articles   
USING GIN (to_tsvector('english', content));

-- Full-text search example with JSON filtering
-- This query finds articles about "global" AND "warming" in the environment category
-- and highlights the matching terms in the content
SELECT 
    a.article_id,  
    a.title, 
    ts_headline('english', a.content, q) AS highlighted_content
FROM   
    json_articles a, 
    to_tsquery('english', 'global & warming') AS q
WHERE 
    to_tsvector('english', a.content) @@ q
    AND a.metadata @> '{"categories": ["environment"]}'
ORDER BY  
    ts_rank(to_tsvector('english', a.content), q) DESC;

Output:

article_id  title                          highlighted_content
----------  -----------------------------  ------------------------------------------------------------------------
1           Climate Change Summit Results  World leaders agreed on new emissions targets to fight <b>global</b> <b>warming</b>...

This query showcases several complex search features:

  • Highlighting search terms in results
  • Ranking articles by relevance
  • Filtering by metadata attributes
  • Combining full-text search with structured data conditions

The query uses the following components:

  • Full-text search operators – The @@ operator checks if a tsvector matches a tsquery
  • Text search functions – It uses the following functions:
    • to_tsvector() converts text to a searchable vector, handling stemming and stop words
    • to_tsquery() parses a search query with Boolean operators
    • ts_headline() generates highlighted excerpts with matching terms in bold
    • ts_rank() ranks results by relevance
  • JSON containment – The @> operator checks if the left JSON contains the right JSON structure
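
As an optional refinement, a sketch assuming PostgreSQL 12 or later (column and index names here are illustrative): a stored generated column keeps the tsvector precomputed, so queries and the index don’t recompute to_tsvector() on the fly:

ALTER TABLE json_articles
    ADD COLUMN content_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

CREATE INDEX json_articles_content_tsv_idx ON json_articles USING GIN (content_tsv);

-- Queries can then filter with: WHERE content_tsv @@ q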

More GIN indexing options

Our GIN examples use jsonb_path_ops, which supports only the containment operator @> (and the jsonpath match operators @? and @@). It does not support the key-existence operators ?, ?|, and ?&, and it cannot serve queries over arbitrary keys and paths. If you need key-existence checks or broader query support, use the default jsonb_ops operator class instead.

-- Use jsonb_ops for full-path and composite key queries
CREATE INDEX idx_news_articles_metadata_ops ON json_articles
USING GIN (metadata jsonb_ops);

The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, whereas the latter creates index items only for each value. As a rule, use each index type in the following scenarios:

  • Use jsonb_path_ops for simple containment queries (smaller, faster)
CREATE INDEX idx_articles_tags ON json_articles USING GIN ((metadata->'tags') jsonb_path_ops);
  • Use jsonb_ops for complex path queries and composite operations
CREATE INDEX idx_articles_metadata ON json_articles USING GIN (metadata jsonb_ops);

For more details on jsonb_path_ops and jsonb_ops, refer to JSON Types.

With the jsonb_ops index in place, containment queries like the following can use it:

SELECT title, metadata->'author'->>'name' AS author_name
FROM json_articles
WHERE metadata @> '{"categories": ["environment"]}'
  AND metadata @> '{"tags": ["climate change"]}';
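
To confirm index usage, inspect the plan. A sketch, again assuming the table is large enough that the planner prefers the index over a sequential scan:

EXPLAIN (COSTS OFF)
SELECT title
FROM json_articles
WHERE metadata @> '{"categories": ["environment"]}';
-- With sufficient data, expect a Bitmap Index Scan on idx_news_articles_metadata_ops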

Optimizing GIN indexes

In addition to the rule for using the right GIN operator class that we mentioned in the previous section, consider the general GIN index optimization strategies:

  • Use expression and partial indexes (see the sketch after this list):
    • If you frequently query a specific key, create an expression index on just that key. This uses a standard B-tree index, which is much faster for simple equality checks.
    • Combine an expression index with a WHERE clause to index only a subset of your data. This is useful for large tables where only a fraction of rows needs to be indexed.
  • Optimize bulk data loads – For large data loads, it’s more efficient to load the data first and create the index afterward. Building the index in a single bulk operation is often faster than maintaining it incrementally with each INSERT.
  • Tune gin_pending_list_limit – This configuration parameter controls the size of a GIN index’s pending list and matters most for tables with high write traffic. Increase it to favor fast, frequent writes (entries are merged into the main index in batches), or reduce it to keep search latency low and predictable.
  • Monitor and tune – Track index size and usage (for example, with the pg_stat_user_indexes view) and drop or rebuild indexes that no longer justify their maintenance cost.
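
The following sketch illustrates the expression, partial index, and pending-list points; the index names and edition threshold are illustrative:

-- Expression index (B-tree) for a single frequently queried key
CREATE INDEX idx_articles_author_name ON json_articles ((metadata->'author'->>'name'));

-- Partial GIN index covering only a subset of rows (recent editions)
CREATE INDEX idx_articles_meta_recent ON json_articles
USING GIN (metadata jsonb_path_ops)
WHERE edition >= 10;

-- Per-index pending-list size in kilobytes; larger values favor write throughput,
-- smaller values keep search latency more predictable
ALTER INDEX idx_articles_meta_recent SET (gin_pending_list_limit = 4096);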

Semantic and geospatial search capabilities

When your application requires semantic (vector) or geospatial search over your JSON data, AWS recommends complementary architectural patterns alongside PostgreSQL’s strengths. A common approach is integrating PostgreSQL with Amazon OpenSearch Service, creating a purpose-built search solution that uses each service’s strengths. The integration, implemented using a pull or push approach with an Amazon OpenSearch Ingestion pipeline, offers a fully managed, no-code experience for ingesting data into OpenSearch.

Using separate, optimized services like Amazon OpenSearch Service for vector and geospatial search lets PostgreSQL focus on the tasks where it excels, such as complex queries and transactional integrity. Amazon OpenSearch Service is designed to handle high-dimensional vector embeddings and complex geospatial queries, making it the more performant choice for those specific workloads.

Clean up

When you’re done experimenting with this solution, clean up the resources you created to avoid incurring additional charges. Drop the sample table, and delete the Aurora DB cluster or RDS DB instance if you no longer need it.
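
To drop the table from your database:

-- Remove the sample table created for this walkthrough
DROP TABLE IF EXISTS json_articles;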

AWS-specific benefits

Although PostgreSQL has built-in JSON capabilities, Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL provide infrastructure and management features that significantly enhance how effectively you can work with JSON data:

  • Distributed storage architecture – Aurora’s storage layer reduces I/O bottlenecks when processing large JSON documents, allowing for more efficient read/write operations. Storage automatically grows without manual intervention, which is ideal for applications with unpredictable JSON data growth.
  • Aurora cluster cache management – The shared buffer cache across instances improves performance for frequently accessed JSON documents. Buffer cache persistence across instance restarts prevents performance degradation after maintenance events.
  • Parallel query processing – PostgreSQL’s parallel query processing can improve performance for some JSON operations by distributing work across multiple CPU cores. This feature is most effective for analytical queries on large datasets, though complex JSONB path operations and GIN index lookups might not always benefit from parallelization.
  • Read replicas with minimal lag – Aurora replication enables up to 15 read replicas with minimal lag, helping you scale JSON read operations effectively. Amazon Aurora Serverless v2 dynamically scales to handle varying JSON read operations, automatically adjusting compute capacity as your workload changes.
  • Instance type flexibility – You can choose among the following instance types:
    • Memory-optimized instances for applications with large JSON documents that benefit from in-memory processing
    • Burstable-performance instances for cost-effective handling of workloads with moderate CPU usage and occasional spikes
    • Read-optimized instances for high-throughput JSON ingestion workloads
  • Amazon CloudWatch Database Insights – Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL provide detailed monitoring that is specifically helpful for identifying bottlenecks in complex JSON operations.

Best practices

By following best practices, you can effectively use PostgreSQL’s JSON capabilities while maintaining performance and data integrity in your AWS-hosted databases.

  • Strategic data modeling:
    • Use JSONB for most use cases unless exact text preservation is required
    • Normalize frequently queried fields into regular columns for better performance
    • Consider partial normalization where some structured data lives in columns while variable data stays in JSON (see the sketch at the end of this section)
  • Indexing strategies:
    • Create GIN indexes for frequently queried JSON paths
    • Use expression indexes for specific JSON properties
    • Monitor index usage and remove unused indexes that slow down writes
  • Query optimization:
    • Use containment operators @> and <@ with GIN indexes for best performance
    • Leverage path operators ->, ->> to avoid extracting entire documents
    • Extract values to appropriate types when performing comparisons:
      WHERE data->>'age' > '18'                  -- Incorrect (string comparison)
      WHERE (data->>'age')::integer > 18         -- Correct (numeric comparison)
      
  • Performance considerations:
    • Don’t overuse JSON for data that would be better in relational columns
    • Use jsonb_path_ops for GIN indexes when you only use the containment operator
    • Batch update JSON documents rather than updating individual fields frequently
    • Perform targeted path updates with jsonb_set rather than rewriting the full document, for example:
      UPDATE users SET data = jsonb_set(data, '{address,zip}', '"10001"') WHERE id = 42;  -- scope the update to one row
  • Avoid query anti-patterns:
    • Scanning all documents when more selective filters could be used
    • Extracting entire JSON documents when only specific fields are needed
    • Joining on JSON fields without proper indexing or type casting
  • Schema evolution:
    • Validate your JSON structure at the database level. You can use check constraints to enforce the expected schema. For example, consider the following table definition:
      CREATE TABLE users (
          id SERIAL PRIMARY KEY,
          data JSONB,
          CONSTRAINT chk_user_data_schema CHECK (
              jsonb_typeof(data) = 'object' AND
              data ? 'name' AND jsonb_typeof(data->'name') = 'string' AND
              data ? 'age' AND jsonb_typeof(data->'age') = 'number' AND (data->>'age')::int > 0
          )
      );
      
      INSERT INTO users (data)
      VALUES ('{"name": "Alice", "age": 30}');
      
      INSERT INTO users (data)
      VALUES ('{"name": "Bob", "age": 25, "is_active": true}'); -- Additional keys are allowed
      
      -- Fails: "data" is a JSON array
      INSERT INTO users (data)
      VALUES ('["name", "Alice", "age", 30]');
      
      -- Fails: "data" is a simple string
      INSERT INTO users (data)
      VALUES ('"Alice"');
      
    • PostgreSQL 16 and later offer the IS JSON family of predicates, such as IS JSON, IS JSON ARRAY, and IS JSON OBJECT, for simple validity checks:
      SELECT '{"key": "value"}' IS JSON;       -- Returns true
      SELECT 'not json' IS JSON;               -- Returns false
      SELECT '[1, 2, 3]' IS JSON ARRAY;        -- Returns true
      SELECT '{"key": "value"}' IS JSON ARRAY; -- Returns false
      SELECT '{"key": "value"}' IS JSON OBJECT;-- Returns true
      SELECT '[1, 2, 3]' IS JSON OBJECT;       -- Returns false
      
    • You can also do application-level validation using JSON Schema or similar tools
  • Storage considerations:
    • Avoid document bloat by removing obsolete fields
    • Avoid storing data in both JSON and regular columns
  • AWS-specific considerations:
    • Use Amazon CloudWatch Database Insights to identify bottlenecks in JSON operations.
    • Account for JSON storage size because it can be less compact than normalized tables.
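
Returning to the partial-normalization point under strategic data modeling, a minimal sketch assuming PostgreSQL 12 or later (the column and index names are illustrative): promote a frequently filtered JSON field to a stored generated column, which stays in sync automatically and can take an ordinary B-tree index:

ALTER TABLE json_articles
    ADD COLUMN author_email TEXT
    GENERATED ALWAYS AS (metadata->'author'->>'email') STORED;

CREATE INDEX idx_articles_author_email ON json_articles (author_email);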

Conclusion

PostgreSQL’s JSON capabilities, combined with AWS managed services, offer a compelling alternative to NoSQL databases. You get schema flexibility without sacrificing transactions, powerful querying without learning new languages, and enterprise features without operational overhead.

To migrate your JSON workloads to PostgreSQL, start with Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition, choosing the option that best balances performance and cost for your workload.

Try the PostgreSQL JSON capabilities discussed in this post and leave your comments here. Refer to other posts in this blog series to learn about AWS JSON database solutions.


About the authors

Ezat Karimi

Ezat is a Senior Solutions Architect at AWS, based in Austin, TX. Ezat specializes in designing and delivering modernization solutions and strategies for database applications. Working closely with multiple AWS teams, Ezat helps customers migrate their database workloads to the AWS Cloud.