
    DataHub

    Sold by: Datahub 
    Deployed on AWS
    DataHub's vision is to bring clarity to your data through its next-generation multi-cloud metadata management platform. The technology is based on LinkedIn DataHub and Apache Gobblin, two successful open-source projects incubated at LinkedIn and battle-hardened in production at scale at major enterprises.
    Rating: 4.1

    Overview

    DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises including Apple, CVS Health, Netflix, and Visa. Innovated jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context on AI and data assets with best-in-class scalability and extensibility. The company's enterprise SaaS offering, DataHub Cloud, delivers a fully managed solution with AI-powered discovery, observability, and governance capabilities. Organizations rely on DataHub solutions to accelerate time-to-value from their data investments, ensure AI system reliability, and implement unified governance, enabling AI and data to work together and bringing order to data chaos.

    For Data Analysts, developers, data scientists, and automated workflows:
    Easily find trusted datasets with the most current data

    • Access data where you work with a Chrome extension for BI tools
    • Discover data your way - personalization for multiple business and technical user profiles
    • Support AI models and automations with a metadata graph that keeps up with today's data volume and velocity
    • Understand data provenance with table, column, and job level lineage graphs
    • Auto-enrich metadata with no-code automation
    • Use AI-generated documentation and propagation to better understand context
    • Always stay up-to-date with subscriptions to assets, activity and notifications

    For Data Engineers:
    Deliver reliable data quality

    • Provide end-to-end observability with user-created data quality checks and reports
    • Surface data quality results and impact analysis across all points in lineage
    • Monitor freshness SLAs, data volume, table schemas, column quality, and custom SQL
    • Use AI Anomaly Detection for freshness, volume, and column stats
    • Easily keep an eye on data quality with assertions and AI-based smart assertions
    • Evaluate data contracts and quality checks on-demand with API
    • Get notified where you work (Slack, email, and more)
    • Easily manage data quality with a data health dashboard

    For Data Governance:
    Ensure continuous AI & data governance in production versus episodic compliance checks

    • Ensure every AI & data asset is accounted for by defining and enforcing documentation standards
    • Integrate governance practices early with automated shift-left governance
    • Automatically classify your data as it moves and transforms with lineage-driven compliance
    • Keep tags harmonized with seamless metadata flow between DataHub and source systems
    • Deliver continuous compliance monitoring with forms, impact analysis, and reporting
    • Create and implement bespoke compliance approval workflows

    Highlights

    • Search All Corners of Your Data Stack: DataHub's unified search experience surfaces results across databases, data lakes, BI platforms, ML feature stores, orchestration tools, and more.
    • Trace End-to-End Lineage: Quickly understand the end-to-end journey of data by tracing lineage across platforms, datasets, ETL/ELT pipelines, charts, dashboards, and beyond.
    • View Metadata 360 at a Glance: Combine technical, operational, and business metadata to provide a 360-degree view of your data entities. Generate Dataset Stats to understand the shape and distribution of the data.

    Details

    Sold by: Datahub

    Delivery method: Deployed on AWS

    Features and programs

    Buyer guide

    Gain valuable insights from real users who purchased this product, powered by PeerSpot.

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Pricing is based on the duration and terms of your contract with the vendor. This entitles you to a specified quantity of use for the contract duration. If you choose not to renew or replace your contract before it ends, access to these entitlements will expire.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    12-month contract (1)

    Dimension | Description | Cost/12 months
    Discover & Govern | Up to 20 Monthly Active Users | $75,000.00

    Vendor refund policy

    All fees are non-cancellable and non-refundable except as required by law.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    Software as a Service (SaaS)

    SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.

    Resources

    Support

    Vendor support

    Email support is offered Monday - Friday during regular business hours.
    marketplace@datahub.com 

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Product comparison

    Updated weekly

    Accolades

    Top 10 in Data Catalogs
    Top 10 in Data Catalogs, Data Governance, Master Data Management
    Top 10 in Data Catalogs, Data Governance

    Customer reviews

    Review sentiment (AI generated from actual customer reviews on AWS and G2) for functionality, ease of use, customer service, and cost effectiveness, split into positive, mixed, and negative reviews.

    Overview

    AI generated from product descriptions
    Unified Search Across Data Stack
    Search functionality that surfaces results across databases, data lakes, BI platforms, ML feature stores, and orchestration tools within a multi-cloud environment.
    End-to-End Lineage Tracing
    Lineage tracking capability that traces data journey across platforms, datasets, ETL/ELT pipelines, charts, and dashboards at table, column, and job levels.
    AI-Powered Metadata Management
    Metadata graph with AI-generated documentation, AI anomaly detection for freshness and volume metrics, and smart assertions for data quality monitoring.
    Data Quality Monitoring and Observability
    End-to-end observability with user-created data quality checks, freshness SLA monitoring, schema tracking, column quality assessment, and custom SQL evaluation through API.
    Automated Governance and Compliance
    Lineage-driven compliance classification, automated shift-left governance integration, continuous compliance monitoring with forms and impact analysis, and metadata harmonization across source systems.
    Metadata Centralization
    Centralizes metadata from disparate sources into a unified platform for discovering, describing, governing, and managing data assets including data, BI reports, and AI models.
    Behavioral Analysis Engine
    Incorporates a Behavioral Analysis Engine to provide advanced analytics and insights across data assets.
    Data Lineage and Tracking
    Enables documentation of insights and tracking of data lineage across teams for transparency and compliance purposes.
    Self-Service Analytics
    Supports self-service analytics capabilities allowing users to independently discover and analyze data assets.
    AI Governance Framework
    Provides an AI governance framework that ensures data quality, transparency, and compliance for AI initiatives.
    AI Governance Framework
    Active metadata-based governance with rules, processes, and responsibilities to ensure ethical AI practices, mitigate risk, adhere to legal requirements, and protect privacy.
    Automated Data Lineage
    End-to-end lineage tracking providing transparency into data transformation and flow across systems, including both summary-level business lineage and detailed technical lineage.
    Unified Data Catalog
    Multi-cloud and hybrid environment data discovery with business context including data origin, ownership, usage patterns, and access to reports, AI models, and data products.
    Data Quality Automation
    Automated monitoring and rule management system for enterprise-wide data quality management, replacing manual processes.
    Privacy and Compliance Workflow
    Centralized automation of privacy workflows to operationalize privacy requirements and address global regulatory compliance.

    Contract

    Standard contract
    No
    No

    Customer reviews

    Ratings and reviews

    Average rating: 4.1 (20 ratings)
    5 star: 40%
    4 star: 60%
    3 star: 0%
    2 star: 0%
    1 star: 0%
    12 AWS reviews | 8 external reviews
    External reviews are from PeerSpot.
    reviewer2866170

    Centralizes data knowledge for all teams but has faced heavy infrastructure and maintenance demands

    Reviewed on Jun 29, 2026
    Review provided by PeerSpot

    What is our primary use case?

    Since many developers were developing and creating data models, and there were no DDLs left in the company, we had to recreate all the table descriptions and clarify what each column meant. We made Data Hub a place where all stakeholders in our company could log in and see which data were used for which data marts, what the values in each column meant, and how they were measured. We primarily used Data Hub for sharing information and increasing the data literacy of our company.

    What is most valuable?

    The data ingestion feature was valuable to me. If comments were added to the tables, ingestion would automatically collect them and bring all the necessary information into Data Hub. Additionally, the data lineage graphs were really helpful in showing how data flowed to data marts and which columns were used to create other columns. These features were helpful but somewhat hard to manage.

    What needs improvement?

    I am not familiar with how people use Data Hub as a knowledge system for developing LLMs or RAG, so I am uncertain what improvements could be made in that sense. However, in the way we used it, the deployment was quite heavy because it consisted of multiple components such as graph DBs, Elasticsearch, Kafka for data ingestion, and MySQL for metadata storage. This made Data Hub somewhat bulky on our server, and we had difficulty managing memory constraints and disk usage.

    When we used Data Hub, we attempted to run different components on different servers, but we could not find good manuals on how to deploy it that way in production. Guidelines or manuals on this would be helpful.

    For how long have I used the solution?

    I used Data Hub for about a year.

    What do I think about the stability of the solution?

    Apart from out-of-memory issues, Data Hub was stable; we rarely had to restart services except for memory problems. The overall service was stable, but very bulky.

    What do I think about the scalability of the solution?

    For growing the data graph or data lineage, scalability depended on our manpower. In that sense, Data Hub's scalability was not as great as we expected: documenting and enriching all our tables was too much work for our data teams, so we had to be selective. As a result, we could not really take advantage of its scalability.

    How are customer service and support?

    We were fortunate not to need customer service, but we did not know that technical support for Data Hub was available. If we had been aware of that, we could have asked for support or guidance.

    How was the initial setup?

    The initial deployment of Data Hub itself was easy because we could get the Docker Compose YAML file and run the Docker command to bring the service up. However, since Data Hub uses various types of components, we had trouble assigning a server to each component and connecting them over the network. We eventually overcame that problem.

    Which other solutions did I evaluate?

    We searched for other use cases and how other companies were developing their own solutions for using data, but we have not been able to use other products similar to Data Hub itself.

    What other advice do I have?

    I am not familiar with how Data Hub is priced, so I cannot answer that question. Since we were running all the components on one server, there were issues when data ingestion occurred too frequently: sometimes the server ran out of memory or the disk filled up. We had to handle these memory and disk storage issues ourselves, and we altered some ingestion strategies and schedules so that data from too many tables would not be ingested at the same time. These maintenance jobs took place regularly. I would rate this product a 7.

    Somashekar Venkataramaiah

    Centralized metadata has enabled us to build an enterprise catalog and streamline data discovery

    Reviewed on Jun 28, 2026
    Review provided by PeerSpot

    What is our primary use case?

    Our main use case for Data Hub is to build our enterprise data catalog within Visa using the open-source version.

    We use Data Hub to build pipelines and to construct our enterprise data catalog, so that we can see where data comes from, how lineage flows, and how metadata propagates. Using this metadata and the descriptions of all the fields, we have built a layer on top that supports natural language querying, so people can find where and how the data comes from.

    What is most valuable?

    The best features that Data Hub offers include metadata propagation and lineage propagation.

    These features have specifically helped my team and our workflows by enabling people to find the right data. We have different sets of data that include business data and application data. People who are new, including data analysts, machine learning scientists, or data scientists, can easily find the specific data they are looking for because it is all centralized in one place.

    Data Hub has positively impacted our organization by centralizing and co-locating all data through metadata, and we have made this our enterprise metadata catalog rather than having disorganized information across different teams. It has saved time for many data analysts and data scientists to find the right data.

    What needs improvement?

    I have no comments on how Data Hub can be improved at this time.

    For how long have I used the solution?

    I have been using Data Hub for the past four years.

    What other advice do I have?

    On a scale of one to ten, I rate Data Hub a nine.

    I chose nine out of ten because Data Hub is a single solution that we could adopt easily and build our platform on top of it. It provided all the features that we needed, which is why I gave that rating.

    Regarding Data Hub's AI capabilities, I have not explored its governance and security features, but I would like to explore them. I have not gone through the AI features of Data Hub concerning the accuracy and reliability of output.

    My advice to others looking into using Data Hub is that it is a fantastic tool for people who want to centralize and keep all the data discoverable in one single place. I would highly recommend using it. I give this review an overall rating of nine out of ten.

    Chakib Bekhouche

    Data mapping has improved metadata completeness and now supports faster business data discovery

    Reviewed on Jun 24, 2026
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Data Hub in these projects is the mapping of business glossary terms to real data for the first project and the calibration and enrichment of all the necessary information within a specific scope in the second project, which involves real data and the business glossary.

    Within the engineering teams of Renault, there was a lot of data without sufficient metadata, such as descriptions of tables and columns. The objective was to complete the definitions and descriptions of business data objects within the glossary and map these descriptions to the tables and columns that comprise the data sets of this engineering department to ensure a comprehensive experience when searching for data, providing adequate definitions and descriptions of the data used in this department.

    I use Data Hub with two of my clients. With Renault, the car manufacturer, they changed their data catalog from Zeenea to Data Hub, and I have a mission to contribute to the enrichment of this data catalog by conducting workshops with data providers, data stewards, and all the stakeholders involved in this data catalog. The aim of this mission is to map real data to the definitions and descriptions of business data objects available in the company's glossary. My second mission was with Hitachi Rail, a company that provides rail services, where the mission involved benchmarking several data catalogs including OpenMetadata and Data Galaxy. Data Hub was chosen for its available functionalities, with the task of implementing this data catalog with a specific scope and then completing the usage of this data if everything works well.

    What is most valuable?

    I find that my main use case for Data Hub is easy to execute because the tool is user-friendly and its functionalities are simple to understand.

    The best feature that Data Hub offers in my experience is the ability to map between real data and data sets.

    The mapping feature helps my team and clients significantly because it addresses the lack of metadata information about the tables and columns used in the company's data lake, enriching the data catalog considerably through this mapping.

    Data Hub positively impacts my organization and clients by making it easier to search for data. It facilitates easier collaboration and helps save time. However, concerning data quality, it is not sufficiently equipped as it lacks components to evaluate the data quality level, which is a feature available in other data catalogs, indicating an area for improvement.

    What needs improvement?

    One aspect that could be improved is the ability to have more specific KPIs regarding the enrichment, completeness, and accuracy of the information.

    Data Hub can be improved in several ways, primarily by enhancing the data quality evaluation capabilities. Additionally, I would suggest improving the hierarchy of business glossary terms, as understanding the characteristics of each business data object can be challenging within the current structure of business glossary terms in Data Hub.

    For how long have I used the solution?

    I have been using Data Hub across these projects for a little less than six months.

    What do I think about the scalability of the solution?

    In my experience, Data Hub offers good scalability.

    How are customer service and support?

    The customer support for Data Hub is robust. I had full support and did not use it extensively, relying primarily on Slack for questions and the documentation, which was sufficient since I utilized the open-source version.

    What other advice do I have?

    I do not have information about Data Hub's AI capabilities. However, I can mention that the documentation of Data Hub is usable within an AI tool, specifically an LLM tool, which would simplify finding information in the documentation.

    I have conducted benchmarks with OpenMetadata and Data Galaxy, but I have never used them for a mission with my clients. Before choosing Data Hub, I evaluated all the principal tools on the market, including Castor, Data Galaxy, and OpenMetadata.

    I have no experience with pricing as I used the free license. My advice for others looking into using Data Hub is to consider the paid version for enhanced options related to data quality and the availability of KPIs regarding the completeness and accuracy of metadata, which results in a superior experience with this tool. I would rate this product an eight out of ten.

    Jueun Moon

    Cataloging data and business terms has reduced questions and speeds up KPI tracking

    Reviewed on Jun 24, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is as a catalog system: we are integrating all of our data sources into Snowflake, and we want to catalog the data and share business glossary terms with our company's employees.

    A specific example of how I use Data Hub in my daily workflow: we have all of our data in Snowflake, but the employees using Snowflake did not know what data it contained. They did not know what tables existed or what columns, metrics, and KPI definitions were available, so we use Data Hub to search the data in Snowflake and to identify who is using Snowflake.

    My main use case is covered.

    How has it helped my organization?

    Data Hub has positively impacted my organization: there are many data analysts in each team, and the time spent on Q&A has decreased significantly since we started using Data Hub. This improvement is also visible in our KPI tracking.

    I cannot provide specific time savings, but for example, we used to have 100 user requests for questions, which required searching Snowflake tables to determine what tables should be used, but now it is down to almost 10 questions.

    What is most valuable?

    In my opinion, the best features Data Hub offers are search and tagging. If I add a tag to some tables or columns, it is very easy for the people who need that information to find it.

    I am trying to apply tagging to all of our data; we are still working on it, and so far we have covered almost 70% of our data.

    What needs improvement?

    We are using the free version of Data Hub with Docker Compose, so it is somewhat difficult to trace lineage. With the free version, we can only see table-level lineage and cannot search column-level lineage, which is why I would like column-level lineage to be added.

    I need the lineage function to support column-level lineage. I also think more example documentation would be very useful: there are many glossary terms and features in Data Hub, and I did not know which ones were most essential for us.

    I also have one more concern regarding running Data Hub with Docker Compose: memory issues come up sometimes because it consumes a lot of memory, so I need a more efficient way to run Data Hub without these issues.

    For how long have I used the solution?

    I have been using Data Hub for almost one year.

    What other advice do I have?

    We are using private clouds in AWS, and we have deployed Data Hub on an AWS EC2 server with Docker Compose.

    The cloud provider we use is AWS.

    I did not purchase Data Hub through the AWS Marketplace; I am just using an EC2 server and deploying it with Docker Compose.

    My advice for others looking into using Data Hub is that if there is no catalog system or data dictionary system and if there are many KPIs or metrics within their company, then I recommend Data Hub to those kinds of teams.

    I give Data Hub an overall rating of 8.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    RohitJoshi1

    Metadata lineage tracking has improved governance and currently supports clear data observability

    Reviewed on Jun 22, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Data Hub is data lineage tracking. With Data Hub, we track multiple ingestion sources, including the different S3 locations where our data resides. We bring all that metadata into Data Hub to track lineage across our data ingestion patterns and transformations, and how data moves between tables, assets, and data pipelines. Whatever transformations we do with Spark, S3, and Snowflake are tracked via Data Hub. We have S3 buckets and Snowflake tables, and all that lineage tracking is managed through the platform.

    My main use case is mostly covered as we used Data Hub for metadata tracking and lineage for whatever transformations that we do so that we can track each transformation down the line.

    What is most valuable?

    In my experience, the best features Data Hub offers include lineage tracking, which is mostly on the asset level, a good glossary, and good connector support.

    Regarding asset-level lineage and the glossary, we need a glossary of our products so that it is easy to track each product: what happened to it and when, how many assets are related to it, and so on. For asset integrations, Data Hub makes it easy to ingest the metadata of those assets from S3 via connectors. It has good connector support, although it is limited in some cases.

    Overall, Data Hub is a good tool. If we talk about lineage, metadata, and observability on some high level, including domain descriptions, PII classification, datasets, and keeping datasets in one place along with policies, it is good in that particular sense. We do have a plan based on project-to-project usage, but in some of the projects, we do use Data Hub as well.

    What needs improvement?

    I would like to add that for the connectors, there is sometimes limited support for using wildcards to get the items or assets ingested from sources like S3; it does not support very good wildcard filters. Additionally, Data Hub has a problem with column-level lineage support, especially regarding non-pro users or those without any plans. If I talk about the free features of Data Hub open source, those two I found could be improved during my use case.

    Regarding improvements, I have already mentioned the limited support for wildcards in ingestion connectors; that could be worked on, especially in the open-source version of Data Hub. Otherwise, I find the UI quite good.

    For how long have I used the solution?

    I used Data Hub for one and a half years.

    What other advice do I have?

    My advice for others looking into using Data Hub is that it is a good tool if you want to capture all that metadata, lineage, keep track of governance, security, and observability. It just depends on how you want to use it; you can choose the open-source version or the paid version and subscription-based model. The paid versions have more features, but open-source Data Hub, which most people will try to go for, has some limitations, such as the missing column-level lineage with Spark. You need to consider those points, but overall, it is good. I would rate this product an 8 out of 10.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
