Amazon OpenSearch Ingestion
Ingest, transform, and route data at scale to Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections
Why Amazon OpenSearch Ingestion?
Benefits of Amazon OpenSearch Ingestion
Cost optimization
Data quality
Data protection
Security and compliance
Key features
AWS is a leading contributor to the OpenSearch project, which many customers use. This managed service includes the latest OpenSearch Data Prepper innovations. Beyond those community-driven features, Amazon OpenSearch Ingestion also provides these capabilities:
- AWS-managed software installation and patching
- AWS monitors and repairs the service, 24x7
- AWS upgrades versions
- Zero downtime for updates and upgrades
- Availability SLA: 99.9%
- Serverless, with automatic scaling for ingestion workloads
Customers and partners
CyberArk customer review
“At CyberArk EPM (Endpoint Privilege Manager), a cloud-based multi-tenant system, we manage millions of endpoints and collect high-traffic data events using AWS OpenSearch. By leveraging Amazon OpenSearch Ingestion, we replaced our previous self-managed Logstash pipeline with an AWS-managed one, which eliminated the burden of managing our own infrastructure and provided us with a more scalable, cost-effective, reliable, and secure architecture for our data ingestion. This decision was made with the added advantage of CyberArk EPM achieving FedRAMP High In-Process status, while Amazon OpenSearch Ingestion already being FedRAMP compliant, allowing us to keep high level of security in our offering."
Ori Doolman, Senior Software Architect - CyberArk EPM
Calyptia customer review
“At Calyptia we’ve been working with data ingestion for 12+ years as the creators and maintainers of the Cloud Native Computing Foundation project, Fluentd and Fluent Bit. With the latest versions of these projects we are excited for users to gain more control in their first mile with the combination of the Fluent projects and OpenSearch Ingestion Service. With the ingestion service users can continue to scale agents and processing without having to worry about managing and maintaining infrastructure.”
Anurag Gupta, Co-founder - Calyptia
Confluent customer review
“We are thrilled to partner with the Amazon OpenSearch team as they build their OpenSearch Ingestion service, which will provide a native integration with Apache Kafka and Confluent. This integration will help our joint customers access real-time data via Apache Kafka inside OpenSearch so they can rethink customer experiences, build real-time backend operations, or launch new products and services. As the leading contributor to Apache Kafka, Confluent has 10X’ed Kafka by building a complete and cloud-native data streaming platform that allows you to move data from wherever it is created to where businesses can take action in the multi-SaaS world we all live in. This allows OpenSearch users to benefit from the 100's of data sources that Confluent is integrated with. We are excited to see what our joint customers build as they set data in motion with Confluent and OpenSearch.”
Paul Mac Farland, VP of Partner & Innovation Ecosystem - Confluent
Ingestion FAQs
Why should I use Amazon OpenSearch Ingestion?
Amazon OpenSearch Ingestion is a data ingestion tier that enables you to filter, enrich, transform, normalize, and aggregate data for downstream analytics and visualization in Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. You can use it to create custom data pipelines that improve the operational view of your applications. Because OpenSearch Ingestion is serverless, it removes the complexity of self-managing data pipelines and automatically scales the processing capacity of your pipelines to match the demands of your workloads. With Amazon OpenSearch Ingestion, you can:
- Reduce storage costs by deduplicating and sampling data so that noisy data is not indexed in Amazon OpenSearch Service.
- Enforce data quality and adopt common schemas by transforming, formatting, and enriching data before it is indexed in Amazon OpenSearch Service domains, making it easier to troubleshoot issues.
- Redact or obfuscate sensitive information before it reaches a destination, helping you comply with data residency laws.
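The deduplication, sampling, and redaction ideas in the list above can be illustrated with a few plain Python generator functions. This is a conceptual sketch only, not how OpenSearch Ingestion implements these features (in the service, processing steps are declared in the pipeline configuration). The function names, the `id` key, and the email-matching pattern are assumptions chosen for the example.

```python
import random
import re
from typing import Iterable, Iterator

# Assumption for this sketch: anything that looks like an email address is sensitive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def deduplicate(events: Iterable[dict], key: str) -> Iterator[dict]:
    """Yield only the first event seen for each value of `key`."""
    seen = set()
    for event in events:
        value = event.get(key)
        if value in seen:
            continue  # duplicate: never reaches the index
        seen.add(value)
        yield event


def sample(events: Iterable[dict], rate: float, rng: random.Random) -> Iterator[dict]:
    """Keep roughly `rate` (0.0 to 1.0) of the events, chosen at random."""
    for event in events:
        if rng.random() < rate:
            yield event


def redact(events: Iterable[dict], field: str) -> Iterator[dict]:
    """Return copies of the events with email-like substrings in `field` masked."""
    for event in events:
        masked = dict(event)  # shallow copy so the caller's data is not modified
        if isinstance(masked.get(field), str):
            masked[field] = EMAIL_RE.sub("***", masked[field])
        yield masked


events = [
    {"id": 1, "message": "login by alice@example.com"},
    {"id": 1, "message": "login by alice@example.com"},  # duplicate id
    {"id": 2, "message": "password reset for bob@example.org"},
]
cleaned = list(redact(deduplicate(events, key="id"), field="message"))
print(cleaned)
# [{'id': 1, 'message': 'login by ***'}, {'id': 2, 'message': 'password reset for ***'}]
```

Chaining generators mirrors the idea that each step filters or rewrites records before anything is indexed; passing a seeded `random.Random` makes the sampling step reproducible.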
What are the major components of an Amazon OpenSearch Ingestion pipeline?
An Amazon OpenSearch Ingestion pipeline consists of three major components:
- Source is the input component of a pipeline. It defines the mechanism through which a pipeline consumes records. The source can consume records either by receiving data over HTTP/HTTPS or by reading from external third-party endpoints.
- Processors are intermediate processing units that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline. If you don't define a processor, records are published in the format defined in the source. You can have more than one processor. Processors are executed in the order that you define them in the pipeline.
- Sink is the output component of a pipeline. It defines one or more destinations to which a pipeline publishes records. A sink can also be another pipeline, which allows you to chain multiple pipelines together.
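To make the source, processor, and sink roles concrete, here is a toy in-memory model in Python. It illustrates the concepts only and is not the service's implementation or API (real pipelines are defined in YAML); `Pipeline`, `ListSink`, and the sample processors are hypothetical names. It shows processors running in the order defined, a pipeline with no processors passing records through unchanged, and one pipeline acting as the sink of another.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Union

Record = dict
Processor = Callable[[List[Record]], List[Record]]


@dataclass
class ListSink:
    """Hypothetical sink that collects records in memory (stands in for an index)."""
    records: List[Record] = field(default_factory=list)

    def write(self, records: List[Record]) -> None:
        self.records.extend(records)


@dataclass
class Pipeline:
    """Toy pipeline: a source (any iterable), ordered processors, one or more sinks."""
    name: str
    processors: List[Processor] = field(default_factory=list)
    sinks: List[Union[ListSink, "Pipeline"]] = field(default_factory=list)

    def write(self, records: List[Record]) -> None:
        # A pipeline can itself be used as a sink, which is how pipelines are chained.
        self.run(records)

    def run(self, source: Iterable[Record]) -> None:
        batch = list(source)
        for processor in self.processors:  # executed in the order they were defined
            batch = processor(batch)
        for sink in self.sinks:  # every sink receives the processed batch
            sink.write(batch)


def drop_debug(records: List[Record]) -> List[Record]:
    return [r for r in records if r.get("level") != "DEBUG"]


def add_env(records: List[Record]) -> List[Record]:
    return [{**r, "env": "prod"} for r in records]


archive = ListSink()
index = ListSink()
downstream = Pipeline("enrich", processors=[add_env], sinks=[index])
upstream = Pipeline("filter", processors=[drop_debug], sinks=[archive, downstream])
upstream.run([{"level": "INFO", "msg": "ok"}, {"level": "DEBUG", "msg": "noise"}])
print(index.records)  # [{'level': 'INFO', 'msg': 'ok', 'env': 'prod'}]
```

Here `upstream` writes its filtered records both to `archive` and to `downstream`, which enriches them before they reach `index`.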
What kind of data can I ingest using Amazon OpenSearch Ingestion?
Amazon OpenSearch Ingestion supports ingesting all types of data that you would normally index in an Amazon OpenSearch Service domain, including but not limited to structured, unstructured, textual, numerical, and geospatial data. OpenSearch Ingestion also supports all three pillars of observability data: logs, metrics, and traces. You can use OpenSearch Ingestion and its rich ecosystem of sources, processors, and sinks to transform your data before storing it in Amazon OpenSearch Service domains. With OpenSearch Ingestion, you no longer have to write custom AWS Lambda functions or self-manage Logstash or Elasticsearch ingest nodes to ingest data that needs to be indexed in Amazon OpenSearch Service clusters. Refer to the documentation for the list of sources, processors, and sinks that Amazon OpenSearch Ingestion supports.
How does Amazon OpenSearch Ingestion relate to the OpenSearch project?
Amazon OpenSearch Ingestion is a data ingestion tier that pre-processes data before it is indexed in Amazon OpenSearch Service. OpenSearch Ingestion is built on Data Prepper, a component of the OpenSearch project, and supports all data formats, sources, processors, and sinks supported by Data Prepper.
How do I get started with Amazon OpenSearch Ingestion?
To get started with Amazon OpenSearch Ingestion, you define a data pipeline. An OpenSearch Ingestion pipeline holds your business logic and consists of a source, optionally one or more processors, and a sink. You define the pipeline configuration in a YAML file that describes your source, processors, and sinks. You also set the minimum and maximum capacity, in OpenSearch Compute Units for Ingestion (OCUs), for each pipeline. Finally, you choose how your data reaches your OpenSearch Ingestion pipelines:
- VPC access: AWS establishes an AWS PrivateLink connection from your VPC to the Amazon OpenSearch Ingestion pipeline. This provides private connectivity to your pipelines without exposing your traffic to the public internet.
- Public access: In this network configuration, data flows to your OpenSearch Ingestion pipelines over the public internet.
You can create a data pipeline using the AWS Management Console or the AWS Command Line Interface (AWS CLI).
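The setup steps above (a YAML pipeline definition, minimum and maximum OCUs, and a network choice) can also be expressed through the AWS SDK for Python. The sketch below assumes boto3's `osis` client and its `create_pipeline` operation, which takes `PipelineName`, `MinUnits`, `MaxUnits`, `PipelineConfigurationBody`, and optional `VpcOptions`. It only builds the request and leaves the actual API call commented out. The domain endpoint, IAM role ARN, index name, subnet IDs, and security group IDs are hypothetical placeholders, and the YAML is a minimal illustrative Data Prepper configuration, not a recommended one.

```python
# Hypothetical placeholder values: substitute your own endpoint, role, and network settings.
PIPELINE_NAME = "log-pipeline"
PIPELINE_YAML = """\
version: "2"
log-pipeline:
  source:
    http:
      path: "/log-pipeline/logs"
  processor:
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "application-logs"
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/pipeline-role"
          region: "us-east-1"
"""


def build_create_pipeline_request(min_ocus, max_ocus, subnet_ids=None, security_group_ids=None):
    """Assemble keyword arguments for a create_pipeline call (sketch only)."""
    # Basic sanity check for the example; consult the service quotas for real limits.
    if not 1 <= min_ocus <= max_ocus:
        raise ValueError("expected 1 <= min_ocus <= max_ocus")
    request = {
        "PipelineName": PIPELINE_NAME,
        "MinUnits": min_ocus,
        "MaxUnits": max_ocus,
        "PipelineConfigurationBody": PIPELINE_YAML,
    }
    if subnet_ids:
        # VPC access: the pipeline endpoint is reachable privately from your VPC.
        request["VpcOptions"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids or []),
        }
    # Without VpcOptions, the pipeline is created with public access.
    return request


request = build_create_pipeline_request(1, 4, subnet_ids=["subnet-0abc"], security_group_ids=["sg-0abc"])
# To actually create the pipeline (requires boto3 and AWS credentials):
# import boto3
# boto3.client("osis").create_pipeline(**request)
```

Keeping request construction separate from the API call makes it easy to inspect or test the configuration before anything is provisioned.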