Overview
Key Features
- Multi-Source Data Integration: Supports relational databases (Oracle, MySQL, PostgreSQL, SQL Server) and file-based data (CSV, TSV, etc.).
- Machine Learning Model: Uses ML algorithms to analyze metadata and identify table relationships.
- Business Categorization: Determines the business relevance of table columns using automated classification techniques.
- Validation GUI: Lets clients review and validate metadata through an interactive GUI.
- Metadata Storage: Captures and stores metadata in a MySQL database for further analysis.
- Synthetic Data Generation: Uses multiple libraries to generate synthetic datasets for testing and development.
- Cloud Storage Integration: Exports processed and synthetic data to cloud-based storage (AWS S3, Blob storage, etc.).
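To make the metadata-capture idea concrete, here is a minimal sketch of collecting column-level metadata from a delimited file and from a relational source. It is not the product's implementation: the function names (`infer_type`, `csv_metadata`, `sqlite_metadata`) are hypothetical, the type inference is deliberately coarse, and SQLite from the Python standard library stands in for the listed databases (Oracle, MySQL, PostgreSQL, SQL Server), which expose schema information through their own catalogs.

```python
import csv
import io
import sqlite3


def infer_type(values):
    """Guess a coarse column type ("integer", "float", "text") from string samples."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "text"


def csv_metadata(table, text, delimiter=","):
    """Return one metadata record per column of a delimited text (CSV or TSV)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    return [
        {
            "table": table,
            "column": col,
            # Missing trailing fields come back as None; treat them as empty.
            "type": infer_type([(r[col] or "") for r in rows]),
        }
        for col in (reader.fieldnames or [])
    ]


def sqlite_metadata(conn):
    """Return one metadata record per column of every table in a SQLite database."""
    meta = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        for _cid, col, col_type, _notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            meta.append(
                {"table": table, "column": col, "type": col_type.lower() or "unknown"}
            )
    return meta
```

Both functions return records of the same shape, so file-based and database sources can feed a single downstream metadata store. For TSV input, pass `delimiter="\t"` to `csv_metadata`.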
Development Options
• Programming Language: Python
• Data Processing SDKs: Microsoft Presidio (for sensitive data processing and anonymization)
• Database Systems: MySQL, PostgreSQL, Oracle, SQL Server
• GUI Development: Client-side web application
• Synthetic Data Libraries: Faker, Mimesis
• Cloud Storage: AWS S3, Blob Storage
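As a dependency-free illustration of schema-driven synthetic data, the sketch below generates rows from a column-to-type mapping using only Python's standard `random` module. It does not use Faker or Mimesis and does not reproduce their APIs; the function name `synthetic_rows`, the type labels, and the value ranges are all assumptions made for this example.

```python
import random
import string


def synthetic_rows(schema, count, seed=None):
    """Generate `count` synthetic rows for a {column: type} schema.

    Supported type labels (assumed for this sketch): "integer", "float";
    anything else is treated as free text.
    """
    rng = random.Random(seed)  # a seeded generator makes runs reproducible

    def make(kind):
        if kind == "integer":
            return rng.randint(1, 10_000)
        if kind == "float":
            return round(rng.uniform(0, 1000), 2)
        # Fallback: an 8-letter lowercase string carrying no real-world data.
        return "".join(rng.choices(string.ascii_lowercase, k=8))

    return [{col: make(kind) for col, kind in schema.items()} for _ in range(count)]
```

Calling `synthetic_rows({"id": "integer", "name": "text"}, 100, seed=1)` yields 100 rows that match the schema's shape without copying any source values. A library such as Faker or Mimesis would replace the `make` helper with domain-aware generators (names, addresses, dates) to make the output more realistic.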
AWS Tools for Logging & Monitoring
• Amazon CloudWatch: Monitors logs, metrics, and alerts for application health tracking.
• AWS Lambda Logging: Logs function executions and errors for debugging.
• Amazon S3 Logging: Tracks access and modification history of stored data.
• AWS IAM Policies: Enforce secure access control for data storage and processing services.
Key Benefits
• Automated Metadata Processing: Reduces manual effort in analyzing database schema and relationships.
• Enhanced Data Privacy: Uses Microsoft Presidio to anonymize sensitive data.
• Scalable Architecture: Supports cloud-based storage for large-scale data processing.
• Faster Insights: Enables quick validation and categorization of metadata.
• Synthetic Data Creation: Generates realistic yet non-sensitive data for testing and development.
• Cloud-Ready Solution: Facilitates seamless data transfer to cloud storage solutions.
How It Works
- Source System Integration: Imports data from RDBMS or files (CSV, TSV, etc.).
- Metadata Extraction: Collects schema information from the data sources.
- Machine Learning Processing: Identifies table relationships and business categories.
- Client Validation: Provides an interactive GUI for metadata verification.
- Metadata Storage: Saves processed metadata in a MySQL database.
- Synthetic Data Generation: Uses Faker and Mimesis to create realistic but anonymized datasets.
- Cloud Storage: Transfers synthetic data to AWS S3 or Blob storage.
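The relationship-identification step can be illustrated with a simple value-containment heuristic: a column is a candidate foreign key when all of its values appear in a unique column of another table. This is a rule-based stand-in chosen for clarity, not the machine learning model the listing describes; the function name `candidate_relationships` and the `min_overlap` parameter are hypothetical.

```python
def candidate_relationships(tables, min_overlap=1.0):
    """Find candidate foreign-key links between tables.

    `tables` maps table name -> {column name -> list of values}.
    Returns (child_column, parent_column, overlap) tuples, where overlap is
    the fraction of distinct child values found in the parent column.
    """
    # Columns with no repeated values can act as referenced (parent) keys.
    keys = []
    for table, cols in tables.items():
        for col, values in cols.items():
            if values and len(set(values)) == len(values):
                keys.append((table, col, set(values)))

    found = []
    for child_table, cols in tables.items():
        for child_col, values in cols.items():
            child_values = set(values)
            if not child_values:
                continue
            for parent_table, parent_col, parent_values in keys:
                if parent_table == child_table:
                    continue  # only look for links across tables
                overlap = len(child_values & parent_values) / len(child_values)
                if overlap >= min_overlap:
                    found.append(
                        (
                            f"{child_table}.{child_col}",
                            f"{parent_table}.{parent_col}",
                            overlap,
                        )
                    )
    return found
```

A heuristic like this produces false positives, for example two unrelated columns that both hold small integers, which is one reason a review step such as the client validation GUI is useful before the results are stored.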
Highlights
- The system integrates machine learning, open-source SDKs, and database management tools to achieve automated metadata processing and synthetic data generation.
- Uses AES-256 encryption for secure data storage and transmission. Implements role-based access control (RBAC) for data access. Ensures data handling aligns with global privacy regulations. Tracks user activities and changes for compliance reporting.
Pricing
Custom pricing options
Support
Vendor support
The Data Cloner service is customized based on the scope and complexity of each engagement. [Contact us](https://aws.amazon.com/marketplace/management/products/prod-2anm55pwihoxi/overview/) at awsmarketplacesales@altimetrik.com for a personalized quote that fits your specific needs.