AWS Database Blog

Building a job search engine with PostgreSQL’s advanced search features

In today’s employment landscape, job search platforms play a crucial role in connecting employers with potential candidates. Behind these platforms lie complex search engines that must process and analyze vast amounts of structured and unstructured data to deliver relevant results. Building such systems requires database technologies that can handle complex queries, full-text search, semantic search for contextual understanding, and geospatial capabilities for location-aware recommendations.

PostgreSQL stands out as an ideal solution for implementing job search platforms due to its comprehensive search capabilities. PostgreSQL offers a unique combination of full-text search, vector similarity search, and geospatial search within a single database system. This convergence of capabilities enables developers to build rich search experiences without managing multiple specialized systems.

This post explores how to use PostgreSQL’s search features to build an effective job search engine. We examine each search capability in detail, discuss how they can be combined in PostgreSQL, and offer strategies for optimizing performance as your search engine scales.

Additionally, we discuss how the techniques presented in this post are broadly applicable to many search-driven applications across different domains, such as e-commerce product discovery and content recommendation.

The anatomy of a modern job search engine

Before diving into PostgreSQL’s specific search capabilities, it’s important to understand the fundamental components and requirements of a modern job search platform. At its core, a job search engine consists of two primary components:

Data repository – The foundation of any job search platform is its database serving as a repository for job listings from various sources. This database is continuously updated through web crawlers, direct employer submissions, and integrations with job boards and recruitment platforms. The repository additionally stores profiles and resumes of job candidates.

Search engine – The search engine supports bidirectional search (employers seeking candidates and candidates seeking opportunities) by processing queries and analyzing and joining structured data (such as job titles, locations, and salary ranges) with unstructured content (such as job descriptions and candidate resumes). An advanced search engine goes beyond simple keyword matching to understand context, handle synonyms, recognize related concepts, and factor in location-based constraints.

An effective job search engine requires:

  • Full-text search – provides precise lexical matching for job titles, skills, and organization names. It supports exact phrase matching and typo-tolerant fuzzy searches for partial matches. Full-text search excels when users can articulate specific search criteria, though it lacks contextual understanding.
  • Semantic search – vector-based similarity search introduces important contextual understanding by interpreting job descriptions and candidate qualifications beyond literal terminology. This dimension captures nuanced professional relationships and implicit requirements and qualifications that keyword matching would miss, enabling more intelligent matching between candidates and positions.
  • Geospatial search – location intelligence refines results by incorporating geographic considerations, allowing users to discover opportunities within specific distance parameters, commute thresholds, or regional boundaries. This connects professional qualifications with physical job market realities.

By integrating these complementary search techniques, job search engines can process elaborate queries that simultaneously evaluate exact terms, contextual meaning, and geographic considerations, delivering more relevant matches in an increasingly complex employment landscape.

PostgreSQL as a comprehensive search solution

PostgreSQL serves dual purposes as both a robust data repository and an advanced search engine. Through built-in features and extensions, PostgreSQL can handle all three essential search dimensions within a single system:

  • Full-text search using built-in types like tsvector, tsquery, and GIN indexes
  • Vector similarity search for semantic matching through the pgvector extension
  • Geospatial queries via the PostGIS extension with GiST indexes

A job search engine uses PostgreSQL to store job listings and candidate profiles, provide real-time full-text and semantic search across millions of resumes and job listings, and find job matches within a specified geographic radius. This unified approach simplifies architecture, reduces operational complexity, and enables hybrid search strategies.

Consider the following simple data model for a job search engine, consisting of a job table and a resume table. These tables include columns of type tsvector, vector, and geometry to hold full-text search vectors, vector embeddings, and the locations of the job and the candidate.

CREATE TABLE job (
    job_id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    company TEXT,
    title TEXT NOT NULL,
    description TEXT,
    title_tsv TSVECTOR  -- Computed full-text search vectors
        GENERATED ALWAYS AS (to_tsvector('english', title)) STORED,
    description_tsv TSVECTOR
        GENERATED ALWAYS AS (to_tsvector('english', description)) STORED,
    semantic_vector VECTOR(3)  -- Computed semantic vector
        GENERATED ALWAYS AS (
          embedding_function(title || ' ' || description) -- replace with actual (IMMUTABLE) embedding generation
        ) STORED,
    skills_vector VECTOR(3)   -- Computed skills vector
        GENERATED ALWAYS AS (
          embedding_function(array_to_string(skills, ' '))
        ) STORED,
    geom GEOMETRY(Point, 4326),  -- Job location as a point
    location TEXT,
    salary_range INT4RANGE,
    experience_level TEXT,
    skills TEXT[]
);
CREATE TABLE resume (
    candidate_id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    candidate_name TEXT,
    raw_resume TEXT,
    resume_tsv TSVECTOR  -- Computed full-text search vector
        GENERATED ALWAYS AS (to_tsvector('english', raw_resume)) STORED,
    skills_vector VECTOR(3)   -- Computed skills vector
        GENERATED ALWAYS AS (
          embedding_function(array_to_string(skills, ' ')) -- replace with actual (IMMUTABLE) embedding generation
        ) STORED,
    geom GEOMETRY(Point, 4326),  -- Candidate location as a point
    location TEXT,
    skills TEXT[],
    job_history TEXT[],
    education_levels TEXT[]
);
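As a quick illustration, a row can be inserted with only the base columns populated; the tsvector and vector columns are computed automatically (assuming the placeholder embedding_function has been defined). The company name, coordinates, and values below are purely illustrative:

```sql
-- Hypothetical sample row; generated columns (tsvector, vector) fill themselves in
INSERT INTO job (company, title, description, geom, location, salary_range, experience_level, skills)
VALUES (
    'Example Corp',
    'Backend Engineer',
    'Design and build microservices in Java and Spring.',
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326),  -- San Francisco
    'San Francisco, CA',
    int4range(140000, 180000),
    'Senior',
    ARRAY['java', 'spring', 'microservices', 'sql']
);
```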

Now, let us explore how a job search engine can implement different search techniques in PostgreSQL.

Full-text search in PostgreSQL

PostgreSQL’s full-text search capabilities provide a solid foundation for matching job listings and candidate profiles based on specific keywords, phrases, and requirements. When the engine ingests job listings and resumes, it employs tokenization techniques, breaking documents into lexemes using predefined linguistic dictionaries. These dictionaries guide the process of normalizing text, removing stop words, and applying stemming to reduce words to their root forms. The resulting standardized lexemes are then mapped into an inverted index, creating an efficient structure for rapid retrieval. When a candidate enters a search query, the engine matches the query tokens against indexed job description tokens. The system ranks results based on term frequency and lexical matches, providing a list of relevant job opportunities. The following diagram explains the process.

Full-Text Search in PostgreSQL


PostgreSQL’s full-text search consists of the following components:

  • Dictionaries – PostgreSQL uses dictionaries to enable language-aware lexeme parsing, stemming, and stop-word removal. These dictionaries transform raw text into standardized lexemes (root forms of words), ensuring that variations like “working,” “worked,” and “works” all match a search for “work.” PostgreSQL includes built-in dictionaries for many languages and allows custom dictionaries for specialized terminology.
  • Text processing – The to_tsvector function converts documents (like job descriptions or resumes) into a special tsvector format that stores normalized lexemes with their positions and optional weights. Similarly, the to_tsquery function processes search queries into a format optimized for matching against these document vectors.
  • Match operator – The match operator (@@) evaluates whether a document vector matches a query, returning true if there’s a match.
  • Ranking functions – Functions like ts_rank and ts_rank_cd determine the relevance of matches based on factors like term frequency and document structure, allowing results to be sorted by relevance.
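The dictionary normalization and the match operator can be demonstrated in isolation:

```sql
-- Stemming maps inflected forms to a common lexeme
SELECT to_tsvector('english', 'working worked works');
-- 'work':1,2,3

-- The @@ operator then matches any inflected form of the query term
SELECT to_tsvector('english', 'works') @@ to_tsquery('english', 'working');
-- true
```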

The following example finds candidates with JavaScript and either React or Angular skills, excluding those mentioning WordPress.

WITH resume AS (SELECT * FROM (VALUES
  ('John', 'react, javascript, wordpress'),
  ('Mary', 'angular, javascript')
) resume(candidate_name, skills))
SELECT candidate_name
FROM resume
WHERE to_tsvector('english', resume.skills) @@ to_tsquery('english', 'javascript & (react | angular) & !wordpress');

candidate_name
------------------
      Mary

Advanced full-text search features

PostgreSQL offers several advanced features for more complex full-text search:

Proximity search – Find words that appear near each other in a document:

SELECT * FROM resume WHERE resume_tsv @@ to_tsquery('software <-> engineering');

This would match “software engineering” but not “software testing engineering” where the terms aren’t adjacent.
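to_tsquery also accepts the related <N> operator, which matches terms exactly N positions apart:

```sql
-- 'senior <2> engineer' matches when 'engineer' appears exactly two
-- positions after 'senior', e.g. 'senior software engineer'
SELECT to_tsvector('english', 'senior software engineer')
       @@ to_tsquery('english', 'senior <2> engineer');
-- true
```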

Simple ranking – ensures that the most relevant results are ranked higher. The ts_rank and ts_rank_cd functions consider the frequency of matching terms; the more query tokens that match the text, the higher the rank. For example, you can rank resumes by occurrences of the word “Amazon”:

WITH resume AS(SELECT * FROM (VALUES
  ('some-company:software engineering'),
  ('Amazon: software engineering')
) resume(skills)) 
SELECT resume.skills, TS_RANK_CD(TO_TSVECTOR(resume.skills), TO_TSQUERY('Amazon')) AS rank FROM resume
ORDER BY rank DESC;

               skills                          rank
---------------------------------------------------
Amazon: software engineering	               0.1 
some-company:software engineering	       0.0 

Weighted ranking – assigns different importance to different parts of a document using four weight labels:

  • A (most important): Highest weight
  • B (high importance): Second highest
  • C (medium importance): Third level
  • D (lowest importance): Default weight

Here is an example:

-- Create a function to generate weighted search vector
 -- This function assigns weight “A” to column “title”, and weight “B” to column “skills” 
CREATE FUNCTION job_search_vector(title TEXT, skills TEXT) 
RETURNS tsvector AS $$
BEGIN
    RETURN 
        setweight(to_tsvector('english', title), 'A') || setweight(to_tsvector('english', skills), 'B');
END
$$ LANGUAGE plpgsql;
WITH job AS (SELECT * FROM (VALUES
  (1, 'programmer', 'java, python, junit'),
  (2, 'QA', 'python, junit')
) job(id, title, skills))
SELECT job.id, job.title, job.skills,
    ts_rank(job_search_vector(job.title, job.skills),
            to_tsquery('QA & python & junit & java')) AS rank
FROM job
ORDER BY rank DESC;

id    title          skills                      rank
-------------------------------------------------------------
2     QA             python, junit               0.915068
1     programmer     java, python, junit         0.77922493

Fuzzy matching – The pg_trgm extension complements full-text search by enabling similarity-based matching. This capability is important for job search platforms where users might misspell technical terms or job titles.

CREATE EXTENSION IF NOT EXISTS pg_trgm;
SET pg_trgm.similarity_threshold = 0.4;
WITH job AS (SELECT * FROM (VALUES
  (1, 'programmer', 'java, python, junit'),
  (2, 'QA', 'python, junit')
) job(id, title, skills))
SELECT * FROM job WHERE job.title % 'programer';

id           title                skills
-------------------------------------------------------
1            programmer           java, python, junit

Indexing for performance

PostgreSQL provides specialized index types for optimizing full-text search performance:

GIN (Generalized Inverted Index) – Ideal for static text data where search speed is prioritized over update speed. GIN indexes excel with tsvector columns and are the preferred choice for most job search scenarios.

GiST (Generalized Search Tree) – More balanced between search and update performance, consuming less space but potentially slower for complex queries. GiST indexes are more appropriate for applications with frequent updates.
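For the data model above, the generated tsvector columns would typically be indexed as follows (index names are illustrative):

```sql
-- GIN indexes for fast full-text lookups on generated tsvector columns
CREATE INDEX job_title_tsv_idx ON job USING GIN (title_tsv);
CREATE INDEX job_description_tsv_idx ON job USING GIN (description_tsv);
CREATE INDEX resume_tsv_idx ON resume USING GIN (resume_tsv);

-- GiST alternative when the table is update-heavy
-- CREATE INDEX job_title_tsv_gist_idx ON job USING GIST (title_tsv);
```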

Semantic search with pgvector

While full-text search excels at finding exact matches, it lacks understanding of meaning and context. For instance, a full-text search wouldn’t naturally understand that “software engineer” and “developer” represent similar roles, or that “cloud architecture” relates to “AWS expertise.” This is where semantic search through vector embeddings is valuable.

Understanding vector embeddings

Vector embeddings represent text as points in a high-dimensional space, where the geometric relationships between these points capture semantic relationships. Similar concepts appear closer together in this vector space, even if they share no common terms. The pgvector extension adds vector data types and operations to PostgreSQL, enabling storage of these embeddings directly in the database and performing efficient similarity searches.

Implementing semantic search

The following diagram shows how semantic search is implemented in PostgreSQL.

Semantic Search in PostgreSQL


The following are the steps for implementing vector search:

  • Generate embeddings – Convert job descriptions and candidate resumes into vector embeddings. This typically involves using machine learning models available through services such as Amazon Bedrock. Here is an example of embeddings generated for job postings.
Vector Embedding


  • Store vectors – Store the embeddings in PostgreSQL using pgvector’s vector data type, as shown in the job and resume table definitions earlier.
  • Similarity search – Use vector operators to find similar items by calculating the distance between the vector embeddings, as depicted in the following diagram.
Vector Similarity Search


The following query calculates the distance between an applicant’s skillset embedding and the embeddings for the job postings.

SELECT * FROM (
	WITH job AS(SELECT * FROM (VALUES
	        (1, 'Data Scientist', 'python, machine learning, sql, statistics, deep learning', '[0.9, 0.7, 0.2]'::vector(3)),
	        (2, 'Frontend Developer', 'javascript, react, css, html, redux', '[0.2, 0.1, 0.9]'::vector(3)),
	        (3, 'Backend Engineer', 'java, spring, microservices, sql, api design', '[0.5, 0.9, 0.3]'::vector(3)),
	        (4, 'DevOps Engineer', 'kubernetes, docker, aws, terraform, ci/cd', '[0.6, 0.8, 0.4]'::vector(3))
	    ) job(job_id, title, skills, skill_vector)
	),
	resume AS(SELECT * FROM (VALUES
	        (1, 'Mary', 'java, spring boot, microservices, postgresql, rest api', '[0.45, 0.95, 0.25]'::vector(3)),
	        (2, 'Jean', 'docker, kubernetes, aws, jenkins, terraform', '[0.65, 0.75, 0.35]'::vector(3)),
	        (3, 'Bill', 'javascript, python, react, express, mongodb, node.js', '[0.35, 0.6, 0.65]'::vector(3))
	    ) resume(candidate_id, name, skills, skill_vector)
	)
	SELECT job.title, resume.name, 1 - (job.skill_vector <-> resume.skill_vector) AS similarity_score  -- Similarity derived from Euclidean distance
	FROM job CROSS JOIN resume
	)scores
WHERE similarity_score > 0.9
ORDER BY similarity_score DESC;

title               name    similarity_score
---------------------------------------------------
DevOps Engineer     Jean    0.9133974713
Backend Engineer    Mary    0.9133974391

The <-> operator calculates the distance between vectors. A smaller distance indicates greater similarity.
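pgvector provides several distance operators, and the one used in a query should match the operator class of the index built later. A small, self-contained comparison:

```sql
-- Distance operators provided by pgvector:
--   <->  Euclidean (L2) distance
--   <=>  cosine distance (1 - cosine similarity)
--   <#>  negative inner product
SELECT '[1, 0, 0]'::vector(3) <-> '[0, 1, 0]'::vector(3) AS l2_distance,       -- sqrt(2), about 1.414
       '[1, 0, 0]'::vector(3) <=> '[0, 1, 0]'::vector(3) AS cosine_distance,   -- 1 (orthogonal vectors)
       '[1, 0, 0]'::vector(3) <#> '[0, 1, 0]'::vector(3) AS neg_inner_product; -- 0
```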

Optimizing vector search performance

As the size of the dataset grows, performance becomes critical for vector searches. PostgreSQL offers specialized index types for vector similarity searches:

IVFFlat index – Divides the vector space into smaller partitions for more efficient searching:

CREATE INDEX ON resume USING ivfflat (skills_vector vector_l2_ops) WITH (lists = 100);

HNSW index – Hierarchical Navigable Small World graph index provides even faster approximate nearest neighbor searches:

CREATE INDEX ON resume USING hnsw (skills_vector vector_cosine_ops) WITH (m=16, ef_construction=64);

These indexes dramatically improve search performance at the cost of some precision, making them ideal for large job search platforms where sub-second response times are important.
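Both index types expose a query-time setting that controls the recall/speed trade-off, which can be tuned per session:

```sql
-- IVFFlat: number of partition lists probed per query (higher = better recall, slower)
SET ivfflat.probes = 10;

-- HNSW: size of the candidate list searched per query (higher = better recall, slower)
SET hnsw.ef_search = 100;
```

Raising these values is a simple first step when approximate results miss relevant matches.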

Geospatial search with PostGIS

Location is often a critical factor in job searches. Candidates typically seek positions within commuting distance, while employers target local talent pools. PostgreSQL’s PostGIS extension provides geospatial capabilities for implementing location-aware job searches.

Geospatial search implementation

The following diagram captures the geospatial search architecture and the implementation steps in PostgreSQL.

Geospatial Search in PostgreSQL


  1. Install and enable PostGIS.
    CREATE EXTENSION postgis;
  2. Add geometry or geography columns.
    --The SRID 4326 refers to the WGS84 coordinate system used by GPS and most mapping applications.
    ALTER TABLE job ADD COLUMN location GEOMETRY(Point, 4326);
    ALTER TABLE resume ADD COLUMN location GEOMETRY(Point, 4326);
  3. Index using GiST indexes.
    CREATE INDEX job_location_idx ON job USING GIST (location);
    CREATE INDEX resume_location_idx ON resume USING GIST (location);
  4. Perform geospatial queries using geospatial functions, such as ST_DWithin and sort locations by distance using ST_Distance, as needed. The following query shows an example.
    WITH job AS (SELECT * FROM (VALUES
            (1, 'Data Scientist', 'python, machine learning, sql, statistics, deep learning', 
             ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)), -- San Francisco
            (2, 'Frontend Developer', 'javascript, react, css, html, redux', 
             ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)), -- New York
            (3, 'Backend Engineer', 'java, spring, microservices, sql, api design', 
             ST_SetSRID(ST_MakePoint(-118.2437, 34.0522), 4326)) -- Los Angeles
        ) job(job_id, title, skills, location)
    ),
    resume AS(    SELECT * FROM (VALUES
            (1, 'John ', 'python, machine learning, tensorflow, statistics, pandas', 
             ST_SetSRID(ST_MakePoint(-122.3321, 37.8272), 4326)), -- Berkeley 
            (3, 'Mary', 'java, spring boot, microservices, postgresql, rest api', 
             ST_SetSRID(ST_MakePoint(-118.4912, 34.0195), 4326)), -- Santa Monica 
            (10, 'Riley', 'javascript, python, node.js, react, graphql, mongodb', 
             ST_SetSRID(ST_MakePoint(-84.3880, 33.7490), 4326)) -- Atlanta 
        ) resume(candidate_id, name, skills, location)
    ),
    -- Calculate distances and rank candidates by proximity
    ranked_candidates AS (
        SELECT j.title AS job_title,
            CASE j.job_id
                WHEN 1 THEN 'San Francisco'
                WHEN 2 THEN 'New York'
                WHEN 3 THEN 'Los Angeles'
                WHEN 4 THEN 'Chicago'
                WHEN 5 THEN 'Dallas'
            END AS job_location,
            r.name AS candidate_name,
            -- Calculate distance in kilometers
            ST_Distance(j.location::geography, r.location::geography) / 1000 AS distance_km,
            -- Rank candidates for each job by distance
            ROW_NUMBER() OVER (PARTITION BY j.job_id ORDER BY ST_Distance(j.location::geography, r.location::geography)) AS rank
        FROM job j CROSS JOIN resume r
    )
    -- Select only the closest candidate for each job
    SELECT job_title, job_location, candidate_name, ROUND(distance_km::numeric, 1) AS distance_km,
        CASE    -- Provide context about the proximity
            WHEN distance_km < 5 THEN 'Walking distance'
            WHEN distance_km < 15 THEN 'Short commute'
            WHEN distance_km < 30 THEN 'Moderate commute'
            ELSE 'Remote work only'
        END AS commute_assessment
    FROM ranked_candidates
    WHERE TO_TSVECTOR ('english', job_title) @@ TO_TSQUERY ('english', 'engineer') -- select engineers only
    AND rank = 1
    ORDER BY job_title;
    
    job_title	            job_location	candidate_name	distance_km	commute_assessment
    ------------------------------------------------------------------------------------------
    Backend Engineer	    Los Angeles	        Mary	        23.1	  Moderate commute 
    
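The query above ranks candidates by exact distance; when you only need to filter by a radius, ST_DWithin is usually cheaper because it can use the GiST index. A sketch against the job table defined earlier (the 50 km radius and San Francisco coordinates are illustrative):

```sql
-- Jobs within 50 km of a candidate location (geography units are meters)
SELECT job_id, title
FROM job
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
    50000  -- 50 km
);
```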

Combining search techniques

While each search technique offers distinct advantages, a hybrid search uses the strengths of each method to deliver more relevant results across diverse use cases. A hybrid of full-text search and semantic search is particularly effective for complex queries where both specific terms and overall meaning matter. Using full-text search to match user preferences and similarity search to expand recommendations to related content gives users a more comprehensive view of the available options: you can match specific skills or job titles (full-text search) while also surfacing related roles or transferable skills (similarity search). For queries where user intent isn’t fully captured by exact keywords, similarity search helps find relevant results while full-text search makes sure no exact matches are missed. To make a job search location aware, geo-contextual search can be combined with full-text or semantic search to further refine the results.

In PostgreSQL’s hybrid search, different search methods independently rank results using their own relevance algorithms. To combine these diverse rankings meaningfully, the Reciprocal Rank Fusion (RRF) algorithm merges them by assigning each result a unified score: the sum of 1/(k + rank) across methods, where k is a smoothing constant (commonly 60).
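A minimal RRF sketch over the job table defined earlier (k = 60; the query terms and the query vector are placeholders):

```sql
WITH text_ranked AS (
    -- Rank by full-text relevance
    SELECT job_id,
           ROW_NUMBER() OVER (
               ORDER BY ts_rank(title_tsv, to_tsquery('english', 'backend & engineer')) DESC) AS r
    FROM job
    WHERE title_tsv @@ to_tsquery('english', 'backend & engineer')
),
vector_ranked AS (
    -- Rank by cosine distance to the query embedding
    SELECT job_id,
           ROW_NUMBER() OVER (ORDER BY skills_vector <=> '[0.5, 0.9, 0.3]'::vector) AS r
    FROM job
)
SELECT job_id,
       COALESCE(1.0 / (60 + t.r), 0) + COALESCE(1.0 / (60 + v.r), 0) AS rrf_score
FROM text_ranked t
FULL OUTER JOIN vector_ranked v USING (job_id)
ORDER BY rrf_score DESC
LIMIT 10;
```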

The following is an example of a hybrid search that shows the top candidate for each engineering position, combining skill similarity and distance into a weighted score.

WITH job AS(SELECT * FROM (VALUES
        (1, 'Data Scientist', 'python, machine learning, sql, statistics, deep learning', 
         '[0.9, 0.7, 0.2]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)), -- San Francisco
        (2, 'Frontend Developer', 'javascript, react, css, html, redux', 
         '[0.2, 0.1, 0.9]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)), -- New York
        (3, 'Backend Engineer', 'java, spring, microservices, sql, api design', 
         '[0.5, 0.9, 0.3]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-118.2437, 34.0522), 4326)), -- Los Angeles
        (4, 'DevOps Engineer', 'kubernetes, docker, aws, terraform, ci/cd', 
         '[0.6, 0.8, 0.4]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-87.6298, 41.8781), 4326)), -- Chicago
        (5, 'Full Stack Developer', 'javascript, python, react, node.js, mongodb', 
         '[0.4, 0.5, 0.7]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-96.7970, 32.7767), 4326)) -- Dallas
    ) job(job_id, title, skills, skill_vector, location)
),
resume AS(SELECT * FROM (VALUES
        (1, 'John', 'python, machine learning, tensorflow, statistics, pandas',  '[0.85, 0.6, 0.25]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-122.3321, 37.8272), 4326)), -- Berkeley (near SF)
        (2, 'Sam', 'javascript, react, css, html, typescript, sass', '[0.15, 0.2, 0.95]'::vector(3),
         ST_SetSRID(ST_MakePoint(-73.9352, 40.7306), 4326)), -- Queens, NY
        (3, 'Mary', 'java, spring boot, microservices, postgresql, rest api', '[0.45, 0.95, 0.25]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-118.4912, 34.0195), 4326)), -- Santa Monica (near LA)
        (4, 'Jean', 'docker, kubernetes, aws, jenkins, terraform', '[0.65, 0.75, 0.35]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-87.9065, 41.9742), 4326)), -- Evanston (near Chicago)
        (5, 'Bill', 'javascript, python, react, express, mongodb, node.js', '[0.35, 0.6, 0.65]'::vector(3), 
         ST_SetSRID(ST_MakePoint(-84.3880, 33.7490), 4326)) -- Atlanta (far from Dallas)
    ) resume(candidate_id, name, skills, skill_vector, location)
),
all_matches AS ( -- Calculate all matches with scores
    SELECT job.job_id, job.title AS job_title, job.skills AS job_skills, resume.candidate_id, resume.name AS candidate_name, resume.skills AS candidate_skills,
        1 - (job.skill_vector <=> resume.skill_vector) AS skill_similarity, -- Cosine similarity via pgvector's cosine distance operator
        ST_Distance(job.location::geography, resume.location::geography) / 1000 AS distance_km,
        -- Location score (inversely related to distance)
        GREATEST(0, 1 - (ST_Distance(job.location::geography, resume.location::geography) / 1000 / 500)) AS location_score,
        ((1 - (job.skill_vector <=> resume.skill_vector)) * 0.7) + -- Combined score (70% skill, 30% location)
        (GREATEST(0, 1 - (ST_Distance(job.location::geography, resume.location::geography) / 1000 / 500)) * 0.3) AS combined_score
    FROM job CROSS JOIN resume
),
ranked_matches AS ( -- Rank candidates for each job
    SELECT job_id, job_title, job_skills, candidate_id, candidate_name,candidate_skills, skill_similarity, distance_km, location_score, combined_score,
        -- Rank within each job_id group by combined score
        ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY combined_score DESC) AS rank
    FROM all_matches
)
SELECT job_title, candidate_name, 
    ROUND(skill_similarity::numeric, 2) AS skill_score, ROUND(distance_km::numeric, 1) AS distance_km,
    ROUND(location_score::numeric, 2) AS location_score, ROUND(combined_score::numeric, 2) AS combined_score,
    CASE  -- Detailed match assessment
        WHEN skill_similarity >= 0.9 AND distance_km <= 50 THEN 'Perfect Match'
        WHEN skill_similarity >= 0.8 AND distance_km <= 100 THEN 'Excellent Match'
        WHEN skill_similarity >= 0.7 AND distance_km <= 200 THEN 'Good Match'
        WHEN skill_similarity >= 0.6 OR distance_km <= 300 THEN 'Potential Match'
        ELSE 'Low Match'
    END AS match_quality
FROM ranked_matches 
WHERE TO_TSVECTOR ('english', job_title) @@ TO_TSQUERY ('english', 'engineer')
AND rank = 1 ORDER BY job_id;

job_title           candidate_name   skill_score   distance_km   location_score   combined_score   match_quality
-------------------------------------------------------------------------------------------------------------------
Backend Engineer    Mary             1             23.1          0.95             0.98             Perfect Match
DevOps Engineer     Jean             1             25.3          0.95             0.98             Perfect Match

Performance and scaling considerations

As your job search platform grows in terms of data volume, user base, and query complexity, performance optimization becomes increasingly important. PostgreSQL offers various mechanisms to ensure the search engine remains responsive at scale.

Job search applications face several specific performance challenges:

  • Computational complexity – Hybrid search queries combining multiple techniques can be resource-intensive, especially when they involve complex operations like vector similarity calculations or geospatial distance measurements.
  • Indexing overhead – Maintaining specialized indexes for different search techniques increases storage requirements and can slow write operations.
  • Result merging – Combining results from different search algorithms often requires complex join operations and scoring calculations.
  • Concurrent query load – Popular job search platforms must handle many simultaneous search requests, especially during peak usage periods.

PostgreSQL offers several features to address performance challenges:

  • Parallel query execution – Distribute query workloads across multiple CPU cores
    -- Enable parallel query with 4 workers
    SET max_parallel_workers_per_gather = 4;
     
    -- Execute a complex search query with parallel processing
    SELECT * FROM job WHERE <complex_search_condition>;
  • Query pipelining – Process multiple query stages concurrently:
    -- Use Common Table Expressions (CTEs) for pipelining
    WITH text_matches AS (
        SELECT * FROM job WHERE title_tsv @@ to_tsquery('data & science')
    )
    SELECT * FROM text_matches 
    WHERE skills_vector <-> '[0.2, 0.7, 0.1]'::vector < 0.5;
    
  • Materialized views – Pre-compute common search operations:
    -- Create a materialized view for frequently used search results
    CREATE MATERIALIZED VIEW popular_tech_jobs AS
    SELECT * FROM job 
    WHERE 
        title_tsv @@ to_tsquery('software | developer | engineer') 
        AND salary_range && int4range(100000, 150000); -- salary_range is INT4RANGE
  • Right indexing – Choose the right index types for each search dimension
    -- Full-text search index
    CREATE INDEX ON job USING GIN (title_tsv);
     
    -- Vector search index
    CREATE INDEX ON job USING ivfflat (skills_vector vector_cosine_ops) WITH (lists = 100);
     
    -- Geospatial index
    CREATE INDEX ON job USING GIST (location);
  • Table partitioning – Divide large tables into more manageable chunks based on logical divisions, so that queries can prune partitions they don’t need to scan.
    -- Partition job listings by posting date
    CREATE TABLE job (
        job_id BIGINT,
        title TEXT,
        description TEXT,
        posting_date DATE
        -- other columns
    ) PARTITION BY RANGE (posting_date);
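Partitions then need to be created (or attached) for each range; for example, one partition per quarter:

```sql
-- Quarterly partitions; queries filtered on posting_date prune the rest
CREATE TABLE job_2025_q1 PARTITION OF job
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE job_2025_q2 PARTITION OF job
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
```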
    

What about other applications?

While our focus in this post is job search platforms, the architecture discussed applies to a wide range of applications. The following table captures some examples.

| Application | Full-text search | Vector search | Geospatial search | How |
| --- | --- | --- | --- | --- |
| E-commerce product discovery | Product names, descriptions, and specifications | “Similar products” recommendations based on product embeddings | Local availability and delivery time estimation | Helps shoppers find products matching their specific requirements while also discovering related items they might be interested in, filtered by what’s available in their region. |
| Real estate platforms | Property features, amenities, and descriptions | Find properties with similar overall characteristics | Neighborhood analysis and proximity to points of interest | Helps homebuyers find properties meeting their explicit criteria while also discovering neighborhoods they hadn’t considered but match their lifestyle preferences. |
| Content recommendation systems | Topic-specific articles or videos | Thematically similar content based on embeddings | Locally relevant news and events | Enables both precise content discovery and serendipitous recommendations contextually relevant to the user’s location and interests. |
| Travel and hospitality | Accommodation amenities and features | “Places similar to this one” recommendations | Proximity to attractions, transportation, and activities | Helps travelers find accommodations that meet their specific requirements while also discovering options in areas they might not have initially considered. |
| Healthcare provider matching | Medical specialties and treatments | Providers with similar practice patterns and patient reviews | Proximity and accessibility | Helps patients find providers who match their specific medical needs while considering factors like practice style and convenient location. |

Conclusion

PostgreSQL’s combination of full-text search, vector similarity search, and geospatial capabilities makes it a versatile platform for building sophisticated search applications. By integrating these search techniques within a single database system, PostgreSQL enables developers to implement complex, multi-dimensional search experiences without managing multiple specialized systems.

For job search platforms, this integrated approach allows matching candidates to jobs based on explicit skills and requirements (full-text search), conceptually related experience and qualifications (vector search), and practical location considerations (geospatial search). The ability to combine these dimensions can potentially deliver more relevant matches than any single search technique could provide alone.

Try out these search techniques for your own use case and share your feedback and questions in the comments.


About the authors

Ezat Karimi


Ezat is a Senior Solutions Architect at AWS, based in Austin, TX. Ezat specializes in designing and delivering modernization solutions and strategies for database applications. Working closely with multiple AWS teams, Ezat helps customers migrate their database workloads to the AWS cloud.