AWS Big Data Blog
Amazon OpenSearch Service 101: Create your first search application with OpenSearch
Organizations today face the challenge of managing and deriving insights from an ever-expanding universe of data in real time. Industrial Internet of Things (IoT) sensors stream millions of temperature, pressure, and performance metrics from field equipment every second. Ecommerce platforms need to surface relevant products from vast catalogs instantly. Security teams must analyze system logs in real time to detect threats. As data volumes grow, organizations increasingly struggle with fragmented monitoring tools that create critical visibility gaps and slow incident response times. The cost of commercial observability solutions becomes prohibitive, forcing teams to manage multiple separate tools and increasing both operational overhead and troubleshooting complexity. Across these diverse scenarios, the ability to efficiently search, analyze, and visualize data in real time has become crucial for business success.
Amazon OpenSearch Service addresses these challenges by providing a fully managed search and analytics service. This managed service configures, manages, and scales OpenSearch clusters so you can focus on your search workloads and end customers. Amazon OpenSearch Serverless further makes it straightforward to run search and log analytics workloads by automatically scaling compute and storage resources up and down to match your application’s demands—with no infrastructure to manage. Whether you’re processing continuous streams of IoT telemetry, enabling product discovery, or performing security analytics, OpenSearch Service scales to meet your needs.
In this post, we walk you through a search application building process using Amazon OpenSearch Service. Whether you’re a developer new to search or looking to understand OpenSearch fundamentals, this hands-on post shows you how to build a search application from scratch—starting with the initial setup; diving into core components such as indexing, querying, result presentation; and culminating in the execution of your first search query.
Components of OpenSearch Service
Before building your first search application, it’s important to understand some key architectural components in OpenSearch. The fundamental unit of information in OpenSearch is a document stored in JSON format. These documents are organized into indices—collections of related documents that function similar to database tables. When you search for information, OpenSearch queries these indices to find matching documents.
OpenSearch operates on a distributed architecture where multiple servers, called nodes, work together in a cluster or domain. Each cluster can utilize dedicated master nodes that focus solely on cluster management tasks, such as maintaining cluster state, managing indices, and orchestrating shard allocation. These specialized nodes enhance cluster stability by offloading cluster management duties from data nodes. Data nodes, on the other hand, handle the storage, indexing, and querying of data—essentially performing the heavy lifting of data operations. Together, they provide scalability, availability, and efficient data processing in the cluster. Configure dedicated coordinator nodes that specialize in routing and distributing search and indexing requests across the cluster. These nodes reduce the load on data nodes, which allows them to focus on data storage, indexing, and search operations.
Coordinator nodes in OpenSearch are most beneficial in the following scenarios:
- Large cluster deployments – When managing substantial data volumes across many nodes.
- Query-intensive workloads – For environments handling frequent search queries or aggregations, especially those with complex date histograms or multiple aggregations, benefit from faster query processing.
- Heavy dashboard utilization – OpenSearch Dashboards can be resource-intensive. Offloading this responsibility to dedicated coordinator nodes reduces the strain on data nodes.
To manage large datasets efficiently, OpenSearch splits indices into smaller pieces called shards. Each shard is distributed across the cluster, with a recommended size of 10–50 GB for optimal performance. For reliability and high availability, OpenSearch maintains replica copies of these shards on different nodes, which means that your data remains accessible even if some nodes fail.
Search operations in OpenSearch are powered by inverted indices, a data structure that maps terms to the documents containing them. The BM25 ranking algorithm helps make sure that search results are relevant to users’ queries. Although searches happen in near real time, with configurable refresh intervals, individual document retrievals are immediate.
This architecture provides the foundation for handling high-volume IoT data streams, complex full-text search operations, and real-time analytics, all while maintaining fault tolerance. Understanding these components will help you make informed decisions as you build your search application.OpenSearch Dashboards is a visualization and analytics tool for exploring, analyzing, and visualizing data in real time. It provides an intuitive interface for querying, monitoring, and reporting on OpenSearch data using visualizations such as charts, graphs, and maps. Key features include interactive dashboards, alerting, anomaly detection, security monitoring, and trace analytics.
Sample Amazon OpenSearch Service tutorial application overview
The following architecture diagram demonstrates how to build and deploy a scalable, fully managed search application on Amazon Web Services (AWS). The architecture uses Amazon OpenSearch Service for indexing and searching data. The UI application is deployed on AWS App Runner and interacts with Amazon OpenSearch Service through secure serverless Amazon API Gateway and AWS Lambda.
Here is the end-to-end workflow for our application detailing how user requests are handled from initial access through to data retrieval or indexing:
- Users access the application through AWS App Runner, which hosts the frontend interface.
- Amazon Cognito handles user authentication and authorization for secure access to the application.
- When users interact with the application, their requests are sent to API Gateway. API Gateway communicates with Amazon Cognito to verify user authentication status. It serves as the primary entry point for all API operations and routes the requests appropriately. It forwards requests to Lambda functions within the virtual private cloud (VPC).
- Lambda functions process the requests, performing either:
- Data indexing operations into OpenSearch Service
- Search queries against the OpenSearch Service cluster
- The OpenSearch Service cluster resides within a private subnet in a VPC for enhanced security.
Prerequisites
Before you deploy the solution, review the prerequisites.
Install the sample app
The entire infrastructure is deployed using AWS Cloud Development Kit (AWS CDK), with cluster configurations customizable through the cdk.json file on GitHub. This deployment approach provides consistent and repeatable infrastructure creation while maintaining security best practices. The steps to deploy this infrastructure are available in this README file. After deployment, you’ll access a comprehensive search application built with Cloudscape React components that includes:
- Interactive search functionality – Test various OpenSearch query methods including prefix match keyword searches, phrase matching, fuzzy searches, and field-specific queries against the sample product dataset
- Document management tools – Bulk index the product catalog with a single click or delete and recreate the index as needed for testing purposes
- Educational resources – Access embedded guides explaining OpenSearch concepts, query syntax, and best practices
Index the documents
After you’ve deployed this search application, the first step is to index some documents into OpenSearch Service. Sign in to the search application UI and follow these steps:
- To trigger a bulk index process, under Index Documents in the navigation pane, choose Bulk Index Product Catalog.
- Choose Index Product catalog, as shown in the following screenshot.
The Lambda function indexes a comprehensive ecommerce product catalog into your newly created OpenSearch Service cluster. This sample dataset includes detailed fashion and lifestyle products spanning multiple categories. Each product record contains rich metadata, including title, detailed description, category, color, and price.
Keyword searches
OpenSearch Service offers multiple search features. For an exhaustive list, refer to Search features. We focus on a few keyword search types to help you get started with OpenSearch.
With the product catalog in OpenSearch, you can perform prefix searches through the search application’s intuitive interface. To better understand the search functionality, expand the Guide section at the top of the interface. This interactive guide explains how various kinds of searches work, complete with a practical example in context of the product catalog dataset. The guide includes best practices and a link to the detailed documentation to help you make the most of OpenSearch’s powerful query capabilities.
You can do a prefix search on any of the three key search fields: Title, Description, or Color.
A typical prefix match query looks like this:
You can use this query pattern to find documents where specific fields begin with your search term, offering an intuitive “starts with” search experience.
The following image illustrates a practical example of the Prefix Match search. Entering “Ru” in the title field matches products with titles such as “Running”, “Runners” and “Ruby.” Prefix Match search is particularly useful when users only remember the beginning of a product name or are searching across multiple variations or simply exploring product categories.
Multi Match search enables searching across multiple fields simultaneously. For example, you can search for “Coral” across product title, description, and color fields simultaneously. The search query can be customized using field boosting in which matches in certain fields carry more weight than others.
A typical multi match query looks like this:
You can explore Wildcard Match, Range Filter, and other search features through the search application. For developers and administrators managing this search infrastructure, OpenSearch Dashboards is a native, developer-friendly interface for indexing, searching, and managing your data. It serves as a comprehensive control center where you can interact directly with your indices, test queries, and monitor performance in real time. The following screenshot shows OpenSearch Dashboards which provides an interactive UI to explore, analyze and visualize search and log data.
While our example demonstrates lexical search functionality on a sample product catalog, OpenSearch Service is equally powerful for observability usecases. When handling time-series data from logs, metrics, or traces, OpenSearch excels at real-time analytics and visualization. For instance, DevOps teams can index application logs and system telemetry data, then use date histograms and statistical aggregations to identify performance bottlenecks or security anomalies as they occur. This real-time search allows IT teams to detect and respond to incidents with minimal delay. Using OpenSearch Dashboards, teams can create live operational dashboards that update automatically as new data streams in. For IoT applications monitoring thousands of sensors, this means temperature anomalies or equipment failures can trigger immediate alerts through OpenSearch’s alerting capabilities. These observability workloads benefit from the same distributed architecture that powers our product search example, with the added advantage of time-series optimized indices and retention policies for managing high-volume streaming data efficiently.
Beyond search management, you can configure alerts for specific conditions, set up notification channels for operational events, and enable data discovery features. If you want to experiment with the same search queries we implemented in our application, you can launch OpenSearch Dashboards and use relevant index and search APIs from the Dev Tools section, which is an ideal environment for developing and testing before implementing in your production application. Because our OpenSearch Service cluster resides within a private subnet, you need to create a Secure Shell (SSH) tunnel to access the dashboard. For more information and steps to do this, refer to How do I use an SSH tunnel to access OpenSearch Dashboards with Amazon Cognito authentication from outside a VPC? in the Knowledge Center. So far, we’ve explored OpenSearch’s query domain-specific language (DSL). However, for those coming in from a traditional database background, OpenSearch also offers SQL and Piped Processing Language (PPL) functionality, making the transition smoother. You can explore more on this at SQL and PPL in the OpenSearch documentation.
In this post, we introduced you to different types of keyword searches. You can also store documents as vector embeddings in OpenSearch and use it for semantic search, hybrid search, multimodal search, or to implement Retrieval Augmented Generation (RAG) pattern.
Conclusion
You can now build sample search applications by following the steps outlined in this post and the implementation details available at sample-for-amazon-opensearch-service-tutorials-101 on GitHub. By using the distributed architecture of Amazon OpenSearch Service, an AWS managed service, you get fast, scalable search capabilities that grow with your business, built-in security and compliance controls, and automated cluster management—all with pay-only-for-what-you-use pricing flexibility.
Ready to learn more? Check out the Amazon OpenSearch Service Developer Guide. For more insights, best practices and architectures, and industry trends, refer to Amazon OpenSearch Service blog posts and hands-on workshops at AWS Workshops. Please also visit the OpenSearch Service Migration Hub if you are ready to migrate legacy or self-managed workloads to OpenSearch Service.
We hope this detailed guide and accompanying code will help you get started. Try it out, let us know your thoughts in the comments section, and feel free to reach out to us for questions!
About the authors
Sriharsha Subramanya Begolli works as a Senior Solutions Architect with Amazon Web Services (AWS), based in Bengaluru, India. His primary focus is assisting large enterprise customers in modernizing their applications and developing cloud-based systems to meet their business objectives. His expertise lies in the domains of data and analytics.
Fraser Sequeira is a Startups Solutions Architect with Amazon Web Services (AWS) based in Melbourne, Australia. In his role at AWS, Fraser works closely with startups to design and build cloud-native solutions on AWS, with a focus on analytics and streaming workloads. With over 10 years of experience in cloud computing, Fraser has deep expertise in big data, real-time analytics, and building event-driven architecture on AWS. He enjoys staying on top of the latest technology innovations from AWS and sharing his learnings with customers. He spends his free time tinkering with new open source technologies.