
    Data Inspector

    Sold by: Cleanlab 
    Deployed on AWS
    Find erroneous values in any column of a tabular dataset

    Overview

    Data Inspector is an AI tool to automatically identify entries in any tabular dataset (CSV file) that are likely incorrect.

    Simply provide any data table (with columns that are text, numeric, or categorical), and ML models will be trained to flag any entry (cell value) that is likely erroneous. Data Inspector returns three CSV files with quality assessments for each entry (cell value) in your dataset: whether the value appears corrupted, how likely the entry is to be correct rather than erroneous/corrupted, and an alternative predicted/imputed value for the entry.
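    For illustration only, here is a minimal Python/pandas sketch of how the three returned CSV files might be combined to review flagged entries. The file names and column names below are assumptions made for this example, not the exact output schema; see the documentation linked below for the real format.

        import pandas as pd

        # Hypothetical file names and layout -- the real output files and columns come from
        # the results written to your S3 bucket (see the linked documentation and notebooks).
        flagged = pd.read_csv("is_erroneous.csv")         # per-cell flag: value looks corrupted
        scores = pd.read_csv("quality_scores.csv")        # per-cell score: likelihood the value is correct
        predicted = pd.read_csv("predicted_values.csv")   # per-cell alternative (imputed) value

        # Example: review the most suspect entries in one column (column name is made up here).
        column = "age"
        suspects = scores[column].nsmallest(10).index
        review = pd.DataFrame({
            "flagged": flagged.loc[suspects, column],
            "score": scores.loc[suspects, column],
            "suggested_value": predicted.loc[suspects, column],
        })
        print(review)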

    The Data Inspector audit is especially useful for catching errors in applications involving data entry, measurement error (surveys, sensor noise, etc.), or a Quality Assurance team that spends time reviewing data. AI can inspect your data more systematically, detecting issues with consistent coverage -- all in a fully automated way!

    Documentation and examples: https://github.com/cleanlab/aws-marketplace/ 

    Highlights

    • Data Inspector works for any standard tabular dataset (including columns that are: text, numeric, or categorical — with missing values allowed). It trains state-of-the-art ML models to automatically detect any erroneous values in the dataset.
    • Documentation and example usage notebooks for the latest version are available here: https://github.com/cleanlab/aws-marketplace/
    • Cleanlab invents novel solutions to assess and improve data quality for applications with messy real-world data. Many of our algorithms are published in top-tier venues for transparency: https://cleanlab.ai/research/. We have created the most popular library for Data-Centric AI: https://github.com/cleanlab/cleanlab

    Details

    Delivery method

    Latest version

    Deployed on AWS


    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Data Inspector

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (12)

    Dimension                                        | Description                                                          | Cost/host/hour
    ml.m5.xlarge Inference (Batch) (Recommended)     | Model inference on the ml.m5.xlarge instance type, batch mode        | $5.00
    ml.m5.xlarge Inference (Real-Time) (Recommended) | Model inference on the ml.m5.xlarge instance type, real-time mode    | $5.00
    ml.m5.xlarge Training (Recommended)              | Algorithm training on the ml.m5.xlarge instance type                 | $5.00
    ml.p3.2xlarge Inference (Batch)                  | Model inference on the ml.p3.2xlarge instance type, batch mode       | $5.00
    ml.p3.16xlarge Inference (Batch)                 | Model inference on the ml.p3.16xlarge instance type, batch mode      | $5.00
    ml.m5.24xlarge Inference (Batch)                 | Model inference on the ml.m5.24xlarge instance type, batch mode      | $5.00
    ml.p3.2xlarge Inference (Real-Time)              | Model inference on the ml.p3.2xlarge instance type, real-time mode   | $5.00
    ml.p3.16xlarge Inference (Real-Time)             | Model inference on the ml.p3.16xlarge instance type, real-time mode  | $5.00
    ml.m5.24xlarge Inference (Real-Time)             | Model inference on the ml.m5.24xlarge instance type, real-time mode  | $5.00
    ml.p3.2xlarge Training                           | Algorithm training on the ml.p3.2xlarge instance type                | $5.00
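    As a back-of-the-envelope example at the rates above: a one-hour training job plus a 30-minute batch inference job, each on a single ml.m5.xlarge host, would incur $5.00 + $2.50 = $7.50 in software charges, before any additional SageMaker infrastructure costs.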

    Vendor refund policy

    We do not currently support refunds, but you can cancel your subscription to the service at any time.


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Amazon SageMaker algorithm

    An Amazon SageMaker algorithm is a machine learning model that requires your training data to make predictions. Use the included training algorithm to generate your unique model artifact. Then deploy the model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

    Deploy the model on Amazon SageMaker AI using the following options (a rough sketch of this flow follows the list):
    • Train the model: Before deploying the model, train it with your data using the algorithm training process. You're billed for software and SageMaker infrastructure costs only during training. Duration depends on the algorithm, instance type, and training data size. When training completes, the model artifacts save to your Amazon S3 bucket. These artifacts load into the model when you deploy for real-time inference or batch processing. For more information, see Use an Algorithm to Run a Training Job.
    • Real-time inference: Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns results by API response. The endpoint runs continuously until you delete it. You're billed for software and SageMaker infrastructure costs while the endpoint runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.
    • Batch transform: Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns results to Amazon S3. When complete, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job. Duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.
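    As a rough sketch of the train-then-batch-transform flow with the SageMaker Python SDK (the algorithm ARN, S3 paths, role, and channel name below are placeholders and assumptions; the official example notebooks in the GitHub repository above show the supported end-to-end usage):

        import sagemaker
        from sagemaker.algorithm import AlgorithmEstimator

        session = sagemaker.Session()

        # Placeholder ARN -- use the algorithm ARN shown for your region after subscribing.
        algorithm_arn = "arn:aws:sagemaker:<region>:<account>:algorithm/<data-inspector-version>"

        # 1) Train: fit the error-detection models on your CSV; billed only while the job runs.
        estimator = AlgorithmEstimator(
            algorithm_arn=algorithm_arn,
            role="<your-sagemaker-execution-role>",
            instance_count=1,
            instance_type="ml.m5.xlarge",
            sagemaker_session=session,
        )
        estimator.fit({"training": "s3://<your-bucket>/input/dataset.csv"})  # channel name assumed

        # 2) Batch transform: score a CSV in S3 and write the result files back to S3.
        transformer = estimator.transformer(
            instance_count=1,
            instance_type="ml.m5.xlarge",
            output_path="s3://<your-bucket>/output/",
        )
        transformer.transform("s3://<your-bucket>/input/dataset.csv", content_type="text/csv")
        transformer.wait()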
    Version release notes

    Automatically detect potential errors in any column of your tabular dataset.

    Additional details

    Inputs

    Summary

    Your data should be in a CSV file with a header row containing column names. If your data contains an index column, specify it using the index_col hyperparameter; otherwise, it is assumed that there is no index column.

    By default, all categorical and numeric columns will be inspected for issues. If you want to inspect only specific columns, pass them as a list to the columns_to_inspect hyperparameter. Text columns that cannot be inspected will be skipped automatically.
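    For example, with the SageMaker Python SDK these hyperparameters might be set on the estimator before training; the column names here are made up for illustration, and the exact value format for the list may differ (see the example notebooks):

        # Hypothetical column names -- replace with names from your own CSV header.
        estimator.set_hyperparameters(
            index_col="record_id",                                # your index column, if any
            columns_to_inspect='["age", "income", "plan_type"]',  # restrict the audit to these columns
        )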

    Input MIME type: text/csv
    Sample input: https://github.com/cleanlab/aws-marketplace/blob/main/label-inspector/data/input/dataset.csv

    Input data descriptions

    The following table describes supported input data fields for real-time inference and batch transform.

    Field name | Description                                                                                                                                                                                                                                                                 | Constraints    | Required
    Dataset    | Each row in the input data must represent a single example. Columns may contain numeric, categorical, or text (arbitrary string) values; however, data errors can only be detected in numeric or categorical columns. Datasets with multiple text columns and missing values are supported. | Type: FreeText | Yes
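    A minimal toy dataset consistent with this description might look like the following (hypothetical columns; it mixes numeric, categorical, and text values and includes a missing entry):

        import pandas as pd

        # Hypothetical toy dataset: numeric, categorical, and text columns, with one missing value.
        df = pd.DataFrame({
            "record_id": [1, 2, 3, 4],
            "age": [34, 29, 212, 41],                          # 212 is an obviously suspect entry
            "plan_type": ["basic", "premium", "basic", None],  # missing values are allowed
            "notes": [
                "asked about upgrade options",
                "billing address updated",
                "new signup via referral",
                "plan field left blank",
            ],
        })
        df.to_csv("dataset.csv", index=False)  # header row with column names, as required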

    Support

    Vendor support

    For questions or support, please email support@cleanlab.ai. Free trials and subscription plans are available! Email us for more details.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.


    Customer reviews

    Ratings and reviews

    0 ratings | 0 AWS reviews | 13 external reviews
    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.
    Oil & Energy

    A decent Large Language Model if we are keen on tracking our responses on a score basis

    Reviewed on Aug 14, 2025
    Review provided by G2
    What do you like best about the product?
    Its scoring mechanism for all the generated responses is a great feature among GPTs.
    What do you dislike about the product?
    A context mapping check would have been better, along with the scoring mechanism. And report generation would have been a great addition if it were included.
    What problems is the product solving and how is that benefiting you?
    Its real-time tracking and scoring mechanism, and its compatibility with a variety of LLMs, make it more useful.
    Ashish A.

    CleanLab: Best ML Modules Optimizer

    Reviewed on May 21, 2025
    Review provided by G2
    What do you like best about the product?
    The best part of Cleanlab is its AI models, which optimize any pretrained module with a great level of efficiency. Another strong point is its documentation: any type of user can use Cleanlab by reading it. And the TLM module is the best; it optimizes any LLM. Its API makes the integration part much easier.
    What do you dislike about the product?
    As of now, I find it a bit hard to dislike such a great module. But still, talking about dislikes: it is expensive, and some small startups may not be able to afford it. Also, TLM doesn't do great with unstructured data.
    What problems is the product solving and how is that benefiting you?
    I work as a Data Manager at a company that works with US healthcare data. We train modules on healthcare datasets. Cleanlab helps us identify and flag incorrect labels. The modules we train sometimes misinterpret the inputs, and here Cleanlab plays a vital role. It optimizes our ML modules and also helps identify outliers. In general, Cleanlab helps us optimize our AI models.
    Ritesh S.

    Powerful label-cleaning with a slight learning curve

    Reviewed on May 16, 2025
    Review provided by G2
    What do you like best about the product?
    Accurate error detection. The ability to automatically spot mislabeled and low-confidence examples has saved me countless hours of manual review.

    Seamless pandas integration. Working directly on DataFrames makes it trivial to plug Cleanlab into existing preprocessing pipelines.

    Clear, example-driven docs. The step-by-step tutorials helped me get up and running in under an hour.
    What do you dislike about the product?
    Initial setup complexity. Installing all dependencies (and configuring environments) can feel a bit involved if you’re just experimenting.

    Performance on very large datasets. Label-error detection can be slow without additional tuning or sampling.
    What problems is the product solving and how is that benefiting you?
    Cleanlab tackles the hidden “label noise” in your datasets—mislabeled, ambiguous or low-confidence examples that quietly drag down model accuracy. By automatically flagging and ranking these problematic records (and even suggesting which labels to trust), Cleanlab lets me:

    Catch mistakes early, before they poison training, so my models learn from clean, reliable data.

    Streamline data audits, turning hours of manual review into minutes of focused corrections.

    Boost final performance, since models trained on higher-quality labels consistently deliver better accuracy and robustness.

    Overall, Cleanlab empowers me to maintain a trustworthy, production-ready dataset with far less effort—and to iterate on models faster and with greater confidence.
    Hemant R.

    Best and easy to use AI

    Reviewed on May 06, 2025
    Review provided by G2
    What do you like best about the product?
    Easy to use. Not much hardware setup is required, and the way it helps in refining data on the e-commerce side is wonderful.
    What do you dislike about the product?
    Nothing as such that I can think of. I need to look more into the product before making any statement.
    What problems is the product solving and how is that benefiting you?
    We have a lot of customer data, but it's mostly messy and not linked properly; with Cleanlab, we get properly formatted data.
    nageen n.

    The AI tool that eases my job of cleaning data from raw to a smart dataset and helps our team

    Reviewed on Apr 19, 2025
    Review provided by G2
    What do you like best about the product?
    The time we spend on datasets decreased significantly after using Cleanlab. I would say it saves a lot of time.
    What do you dislike about the product?
    Sometimes it gets slow on large datasets, but we don't have those datasets very frequently. Still, there is a need for improvement.
    What problems is the product solving and how is that benefiting you?
    Our existing datasets were mostly cleaned manually, and sometimes with heuristics, so this is best suited to solve our problem. Our time and human effort decreased by 70-80%.