Overview
DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises including Apple, CVS Health, Netflix, and Visa. Innovated jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context of AI and data assets with best-in-class scalability and extensibility. The company's enterprise SaaS offering, DataHub Cloud, delivers a fully-managed solution with AI-powered discovery, observability, and governance capabilities. Organizations rely on DataHub solutions to accelerate time-to-value from their data investments, ensure AI system reliability, and implement unified governance - enabling AI & data to work together and bring order to data chaos.
For Data Analysts, developers, data scientists, and automated workflows:
Easily find trusted datasets with the most current data
- Access data where you work with a Chrome extension for BI tools
- Discover data your way - personalization for multiple business and technical user profiles
- Support AI models and automations with a metadata graph that keeps up with today's data volume and velocity
- Understand data provenance with table, column, and job level lineage graphs
- Auto-enrich metadata with no-code automation
- Use AI-generated documentation and propagation to better understand context
- Always stay up-to-date with subscriptions to assets, activity and notifications
For Data Engineers:
Deliver reliable data quality
- Provide end-to-end observability with user-created data quality checks and reports
- Surface data quality results and impact analysis across all points in lineage
- Monitor freshness SLAs, data volume, table schemas, column quality, and custom SQL
- Use AI Anomaly Detection for freshness, volume, and column stats
- Easily keep an eye on data quality with assertions and AI-based smart assertions
- Evaluate data contracts and quality checks on-demand with API
- Get notified where you work (Slack, email, and more)
- Easily manage data quality with a data health dashboard
For Data Governance:
Ensure continuous AI & data governance in production rather than episodic compliance checks
- Ensure every AI & data asset is accounted for by defining and enforcing documentation standards
- Integrate governance practices early with automated shift-left governance
- Automatically classify your data as it moves and transforms with lineage-driven compliance
- Keep tags harmonized with seamless metadata flow between DataHub and source systems
- Deliver continuous compliance monitoring with forms, impact analysis, and reporting
- Create and implement bespoke compliance approval workflows
Highlights
- Search All Corners of Your Data Stack - DataHub's unified search experience surfaces results across databases, data lakes, BI platforms, ML feature stores, orchestration tools, and more.
- Trace End-to-End Lineage - Quickly understand the end-to-end journey of data by tracing lineage across platforms, datasets, ETL/ELT pipelines, charts, dashboards, and beyond.
- View Metadata 360 at a Glance - Combine technical, operational, and business metadata to provide a 360-degree view of your data entities. Generate Dataset Stats to understand the shape & distribution of the data.
Pricing
| Dimension | Description | Cost/12 months |
|---|---|---|
| Discover & Govern | Up to 20 Monthly Active Users | $75,000.00 |
Vendor refund policy
All fees are non-cancellable and non-refundable except as required by law.
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Support
Vendor support
Email support is offered Monday - Friday during regular business hours.
marketplace@datahub.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Customer reviews
Centralizes data knowledge for all teams but has faced heavy infrastructure and maintenance demands
What is our primary use case?
Many developers were creating data models, and no DDLs were left in the company, so we had to recreate all the table descriptions and clarify what each column meant. We made Data Hub a place where all stakeholders in our company could log in and see which data were used for which data marts, what each column's values meant, and how they were measured. We primarily used Data Hub for sharing information and increasing the data literacy of our company.
What is most valuable?
The data ingestion feature was valuable to me. If comments were added to the tables, it would automatically gather and enter all the necessary data into Data Hub. Additionally, the data lineage graphs were really helpful in showing how data flowed to data marts and which columns were used to create other columns. Those features were helpful but somewhat hard to manage.
What needs improvement?
I am not familiar with how people use Data Hub as a knowledge system for developing LLM models or RAG, so I am uncertain what improvements could be made in that sense. However, the way we used it, the deployment was quite heavy because it consisted of multiple components such as graph DBs, Elasticsearch, Kafka for data ingestion, and MySQL for metadata storage. This made it somewhat bulky on our server when we deployed Data Hub, and we had difficulty managing the memory constraints and disk usage.
When we used Data Hub, we attempted to run different components on different servers, but we could not find good manuals on how to set it up across servers more productively. Having guidelines or manuals on this could be helpful.
For how long have I used the solution?
I used Data Hub for about a year.
What do I think about the stability of the solution?
Other than out-of-memory issues, Data Hub was stable; we rarely had to restart services for any other reason. The overall service itself was stable, but it was very bulky.
What do I think about the scalability of the solution?
For growing data graphs or data lineage, scalability depended on our manpower. From that understanding, Data Hub's scalability was not as great as we expected because we could not enter or obtain better data for all our tables because it was too much for our data teams. We had to be selective in that matter. From our understanding, we could not really enjoy the scalability of the data.
How are customer service and support?
We were fortunate not to need customer service, but we also did not know that technical support for Data Hub was available. If we had been aware of that, we could have asked for support or guidance.
How was the initial setup?
The initial deployment of Data Hub itself was easy because we could get the Docker Compose YAML file and run the Docker command to get the service up. However, since Data Hub uses various types of components, we had trouble assigning a server to each component and connecting them over the network. We overcame that problem.
Which other solutions did I evaluate?
We searched for other use cases and how other companies were developing their own solutions for using data, but we have not been able to use other products similar to Data Hub itself.
What other advice do I have?
I am not familiar with how the data is priced, so I cannot answer that question. Since we were running all the components on one server, there were issues when data ingestion was occurring too frequently, and sometimes the server could run out of memory or the disk storage could become full. We had maintenance issues that we had to handle in terms of memory and disk storage. We had to alter some ingestion strategies and timelines so that the data would not grow from too many tables at the same time. These maintenance jobs took place regularly. I would rate this product a 7.
Centralized metadata has enabled us to build an enterprise catalog and streamline data discovery
What is our primary use case?
Our main use case for Data Hub is to build our enterprise data catalog within Visa using the open-source version.
We use Data Hub to build pipelines and to construct our enterprise data catalog, so we can see where the data comes from, how the lineage flows, and how metadata propagation occurs. With this metadata and the descriptions of all the fields, we have built a layer on top that performs natural language querying, so people can find where the data comes from and how.
What is most valuable?
The best features that Data Hub offers include metadata propagation and lineage propagation.
These features have specifically helped my team and our workflows by enabling people to find the right data. We have different sets of data that include business data and application data. People who are new, including data analysts, machine learning scientists, or data scientists, can easily find the specific data they are looking for because it is all centralized in one place.
Data Hub has positively impacted our organization by centralizing and co-locating all data through metadata, and we have made this our enterprise metadata catalog rather than having disorganized information across different teams. It has saved time for many data analysts and data scientists to find the right data.
What needs improvement?
I have no comments on how Data Hub can be improved at this time.
For how long have I used the solution?
I have been using Data Hub for the past four years.
What other advice do I have?
On a scale of one to ten, I rate Data Hub a nine.
I chose nine out of ten because Data Hub is a single solution that we could adopt easily and build our platform on top of it. It provided all the features that we needed, which is why I gave that rating.
Regarding Data Hub's AI capabilities, I have not explored its governance and security features, but I would like to explore them. I have not gone through the AI features of Data Hub concerning the accuracy and reliability of output.
My advice to others looking into using Data Hub is that it is a fantastic tool for people who want to centralize and keep all the data discoverable in one single place. I would highly recommend using it. I give this review an overall rating of nine out of ten.
Data mapping has improved metadata completeness and now supports faster business data discovery
What is our primary use case?
Within the engineering teams of Renault, there was a lot of data without sufficient metadata, such as descriptions of tables and columns. The objective was to complete the definitions and descriptions of business data objects within the glossary and map these descriptions to the tables and columns that make up this engineering department's data sets, so that anyone searching for data finds adequate definitions and descriptions of the data used in this department.
I use Data Hub with two of my clients. Renault, the car manufacturer, changed its data catalog from Zeenea to Data Hub, and my mission is to contribute to the enrichment of this data catalog by conducting workshops with data providers, data stewards, and all the stakeholders involved. The aim of this mission is to map real data to the definitions and descriptions of business data objects available in the company's glossary. My second mission was with Hitachi Rail, a company that provides rail services, where the mission involved benchmarking several data catalogs including OpenMetadata and Data Galaxy. Data Hub was chosen for its available functionalities, with the task of implementing this data catalog with a specific scope and then extending its use if everything works well.
What is most valuable?
The best feature that Data Hub offers in my experience is the ability to map between real data and data sets.
The mapping feature helps my team and clients significantly because it addresses the lack of metadata information about the tables and columns used in the company's data lake, enriching the data catalog considerably through this mapping.
Data Hub positively impacts my organization and clients by making it easier to search for data. It facilitates easier collaboration and helps save time. However, concerning data quality, it is not sufficiently equipped as it lacks components to evaluate the data quality level, which is a feature available in other data catalogs, indicating an area for improvement.
What needs improvement?
Data Hub can be improved in several ways, primarily by enhancing the data quality evaluation capabilities. Additionally, I would suggest improving the hierarchy of business glossary terms, as understanding the characteristics of each business data object can be challenging within the current structure of business glossary terms in Data Hub.
For how long have I used the solution?
What do I think about the scalability of the solution?
How are customer service and support?
What other advice do I have?
I have conducted benchmarks with OpenMetadata and Data Galaxy