
    The Data Cloner

    The Data Cloner is a system designed to create synthetic data: it processes data from various sources, identifies relationships, categorizes metadata, and generates realistic but non-sensitive data for cloud storage.

    Overview

    Key Features

    1. Multi-Source Data Integration: Supports relational databases (Oracle, MySQL, PostgreSQL, SQL Server) and file-based data (CSV, TSV, etc.).

    2. Machine Learning Model: Utilizes ML algorithms to analyze metadata and identify table relationships.

    3. Business Categorization: Determines business relevance of table columns using automated classification techniques.

    4. Validation GUI: Clients can review and validate metadata via an interactive GUI.

    5. Metadata Storage: Captures and stores metadata in a MySQL database for further analysis.

    6. Synthetic Data Generation: Uses multiple libraries to generate synthetic datasets for testing and development.

    7. Cloud Storage Integration: Exports processed and synthetic data to cloud-based storage (AWS S3, Blob Storage, etc.).
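
    To make feature 2 more concrete, the sketch below guesses table relationships with a simple value-containment rule. This is only a stand-in: the listing does not describe the actual ML model, and the function name `infer_relationships` and the sample tables are hypothetical.

```python
from itertools import permutations


def infer_relationships(tables):
    """Guess foreign-key style links between tables.

    `tables` maps table name -> {column name -> list of values}.
    A column is reported as referencing another when every non-null value
    it holds appears in a unique (key-like) column of a different table.
    """
    links = []
    for child, parent in permutations(tables, 2):
        for p_col, p_vals in tables[parent].items():
            # Only unique, non-empty columns can act as a referenced key.
            if not p_vals or len(set(p_vals)) != len(p_vals):
                continue
            for c_col, c_vals in tables[child].items():
                present = [v for v in c_vals if v is not None]
                if present and set(present) <= set(p_vals):
                    links.append((f"{child}.{c_col}", f"{parent}.{p_col}"))
    return sorted(links)


# Hypothetical sample data, not taken from the product.
sample = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, None]},
}
relationships = infer_relationships(sample)
```

    A rule like this can report coincidental matches on small tables, which is one reason a review step such as the Validation GUI is useful.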

    Development Options

    • Programming Language: Python

    • Data Processing SDKs: Microsoft Presidio (for sensitive data processing and anonymization)

    • Database Systems: MySQL, PostgreSQL, Oracle, SQL Server

    • GUI Development: Client-side web application

    • Synthetic Data Libraries: Faker, Mimesis

    • Cloud Storage: AWS S3, Blob Storage
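
    As a rough illustration of what the synthetic data libraries provide, the sketch below builds fake customer rows from a seeded random generator. It uses only the Python standard library as a stand-in for Faker or Mimesis; the name lists, field names, and the `make_customers` helper are assumptions made for this example.

```python
import random

# Hypothetical value pools; a library such as Faker ships far larger ones.
FIRST_NAMES = ["Alex", "Sam", "Priya", "Chen", "Maria"]
LAST_NAMES = ["Lee", "Patel", "Garcia", "Novak", "Okafor"]


def make_customer(rng, customer_id):
    """Build one fake customer row from a seeded random generator."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "id": customer_id,
        "name": f"{first} {last}",
        # example.com is reserved for documentation, so no real inbox is hit.
        "email": f"{first}.{last}{customer_id}@example.com".lower(),
        "age": rng.randint(18, 90),
    }


def make_customers(count, seed=0):
    """Return `count` fake rows; the same seed always gives the same rows."""
    rng = random.Random(seed)
    return [make_customer(rng, i) for i in range(1, count + 1)]


rows = make_customers(3)
```

    Fixing the seed keeps generated fixtures reproducible between test runs.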

    AWS Tools for Logging & Monitoring

    • Amazon CloudWatch: Monitors logs, metrics, and alerts for application health tracking.

    • AWS Lambda Logging: Logs function executions and errors for debugging.

    • Amazon S3 Logging: Tracks access and modification history of stored data.

    • AWS IAM Policies: Ensures secure access control for data storage and processing services.
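
    For the IAM item above, a least-privilege policy might look like the following sketch, which builds a policy document that allows uploads under a single S3 prefix. The bucket name, prefix, `Sid`, and helper name are hypothetical; the listing does not publish the policies the product actually uses.

```python
import json


def s3_write_only_policy(bucket, prefix):
    """Return an IAM policy document allowing uploads under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSyntheticDataUpload",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object ARNs take the form arn:aws:s3:::bucket/key.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


# Hypothetical bucket and prefix names.
policy = s3_write_only_policy("example-synthetic-data", "exports/")
policy_json = json.dumps(policy, indent=2)
```

    Granting only `s3:PutObject` means the exporting component can write synthetic data without being able to read or delete what is already stored.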

    Key Benefits

    • Automated Metadata Processing: Reduces manual effort in analyzing database schema and relationships.

    • Enhanced Data Privacy: Uses Microsoft Presidio to anonymize sensitive data.

    • Scalable Architecture: Supports cloud-based storage for large-scale data processing.

    • Faster Insights: Enables quick validation and categorization of metadata.

    • Synthetic Data Creation: Generates realistic yet non-sensitive data for testing and development.

    • Cloud-Ready Solution: Facilitates seamless data transfer to cloud storage solutions.
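
    The privacy benefit rests on detecting sensitive values and replacing them. The sketch below shows that idea with two simplified regular expressions; it is a standard-library stand-in rather than Microsoft Presidio itself, and the patterns and placeholder labels are assumptions for illustration.

```python
import re

# Simplified patterns for illustration; real detectors cover many more cases.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def anonymize(text):
    """Replace each detected value with a placeholder naming its type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


masked = anonymize("Reach Ann at ann@example.com or 555-123-4567.")
```

    Note that the name "Ann" passes through untouched: regex rules only catch well-structured values, which is why dedicated tooling is typically used for names and other free-form entities.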

    How It Works

    1. Source System Integration: Imports data from RDBMS or files (CSV, TSV, etc.).

    2. Metadata Extraction: Collects schema information from the data sources.

    3. Machine Learning Processing: Identifies table relationships and business categories.

    4. Client Validation: Provides an interactive GUI for metadata verification.

    5. Metadata Storage: Saves processed metadata in a MySQL database.

    6. Synthetic Data Generation: Uses Faker and Mimesis to create realistic but anonymized datasets.

    7. Cloud Storage: Transfers synthetic data to AWS S3 or Blob Storage.
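
    The steps above can be sketched end to end on a small scale. The example below uses an in-memory SQLite database as a stand-in for the source RDBMS, reads schema metadata, generates rows from column types alone, and renders them as CSV. The table definitions and function names are hypothetical, the ML, validation, and metadata-storage steps are skipped, and the final upload to cloud storage is left out.

```python
import csv
import io
import random
import sqlite3


def extract_metadata(conn):
    """Collect column names/types and declared foreign keys per table."""
    meta = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        meta[table] = {
            # table_info rows: (cid, name, type, notnull, dflt_value, pk)
            "columns": [(c[1], c[2]) for c in cols],
            # foreign_key_list rows: (id, seq, table, from, to, ...)
            "foreign_keys": [(f[3], f[2], f[4]) for f in fks],
        }
    return meta


def synthesize(columns, count, rng):
    """Generate rows from column types alone, never from source values."""
    rows = []
    for i in range(1, count + 1):
        row = []
        for name, col_type in columns:
            if col_type.upper() == "INTEGER":
                row.append(i if name == "id" else rng.randint(1, 100))
            else:
                row.append(f"{name}_{rng.randint(1000, 9999)}")
        rows.append(row)
    return rows


def to_csv(columns, rows):
    """Render a header line plus rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([name for name, _ in columns])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical source schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    """
)
metadata = extract_metadata(conn)
customer_columns = metadata["customers"]["columns"]
customer_csv = to_csv(customer_columns, synthesize(customer_columns, 2, random.Random(0)))
```

    Because generation depends only on the extracted metadata, no source value can leak into the output in this sketch; a fuller version would also honor the foreign keys when filling `orders.customer_id`.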

    Highlights

    • The system integrates machine learning, open-source SDKs, and database management tools to achieve automated metadata processing and synthetic data generation.
    • Uses AES-256 encryption for secure data storage and transmission.
    • Implements role-based access control (RBAC) for data access.
    • Ensures data handling aligns with global privacy regulations.
    • Tracks user activities and changes for compliance reporting.
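
    Role-based access control and activity tracking can be combined in one check, as the minimal sketch below shows. The role names, permissions, user, and the in-memory `audit_log` are all hypothetical; the listing does not describe how the product implements either feature.

```python
from datetime import datetime, timezone

# Hypothetical roles and permissions, chosen only for this example.
ROLE_PERMISSIONS = {
    "admin": {"read_metadata", "edit_metadata", "export_data"},
    "reviewer": {"read_metadata", "edit_metadata"},
    "viewer": {"read_metadata"},
}

audit_log = []


def authorize(user, role, action):
    """Check the role's permissions and record the attempt either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "allowed": allowed,
        }
    )
    return allowed


can_edit = authorize("dana", "reviewer", "edit_metadata")
can_export = authorize("dana", "reviewer", "export_data")
```

    Logging denied attempts as well as granted ones gives a compliance report the full picture of who tried to do what.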

    Details

    Delivery method

    Deployed on AWS

    Pricing

    Custom pricing options

    Pricing is based on your specific requirements and eligibility. To get a custom quote for your needs, request a private offer.

    Legal

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Support

    Vendor support

    The Data Cloner service is customized based on the scope and complexity of each engagement. [Contact us](https://aws.amazon.com/marketplace/management/products/prod-2anm55pwihoxi/overview/) at awsmarketplacesales@altimetrik.com for a personalized quote that fits your specific needs.