Overview
Why Choose cloudimg AMIs?
This is a repackaged open-source software product; additional charges apply for cloudimg support services.
DuckDB on AWS - Pre-Configured AMI with 24/7 Expert Support
Get a production-ready DuckDB analytical database running on EC2 in minutes.
Why This AMI Instead of DIY?
Installing DuckDB from source means configuring drivers, monitoring, networking, and security yourself. This AMI eliminates that work and adds ongoing expert support:
- Pre-hardened security - configured with restricted access, optimized for analytical workloads in production environments
- Monitoring built in - CloudWatch Agent pre-installed for resource and query performance visibility
- S3 integration ready - AWS extension pre-configured for direct Parquet/CSV querying from S3 buckets
- Expert support included - DuckDB architecture guidance, query optimization, data pipeline design, and Python integration assistance from cloudimg's team
- Multiple versions available - choose from several DuckDB versions across several OS variants at launch
Unlike the free open-source binary or competing AMIs without support, this listing gives you a direct line to specialists who can help with query optimization, S3 configuration, migration from other databases, and performance tuning.
Getting Started
- Subscribe and launch the AMI on your chosen instance type (recommended: r5/r6i for analytics, c5/c6i for compute, m5/m6i for balanced workloads; minimum t3.medium)
- SSH into your instance using your key pair
- Verify DuckDB is installed by running: duckdb --version
- Start the DuckDB CLI with duckdb and run your first query: SELECT * FROM 'your-file.parquet';
- For S3 access, the AWS extension is pre-configured - query remote Parquet files directly
Python 3 with the DuckDB package is pre-installed, so you can immediately use DuckDB in Jupyter notebooks or Python scripts with zero-copy Arrow integration.
Real-World Use Case: Large-Scale File Analytics
Consider an analytics team processing hundreds of gigabytes of daily clickstream data stored as Parquet files on S3. Using a single r6i.2xlarge instance with this AMI, they can query that data directly without loading it into a separate data warehouse - no ETL pipeline to maintain, no cluster to manage. DuckDB's vectorized execution, projection pushdown, and use of Parquet row-group statistics mean that only the needed columns and the matching row groups are read, keeping queries fast and costs low compared to always-on cluster solutions.
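The row-group skipping described above can be illustrated with a deliberately simplified model. This is a conceptual sketch in plain Python, not DuckDB's implementation: RowGroup, make_row_group, and scan are hypothetical names, and real Parquet readers work on encoded column chunks rather than Python lists.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    columns: dict  # column name -> list of values
    stats: dict    # column name -> (min, max), as kept in a Parquet footer


def make_row_group(**columns):
    # A writer computes per-column min/max when it writes the row group.
    return RowGroup(columns, {k: (min(v), max(v)) for k, v in columns.items()})


def scan(row_groups, filter_col, lo, hi, project_col):
    """Return project_col values where lo <= filter_col <= hi.

    Row groups whose [min, max] range cannot overlap [lo, hi] are skipped
    using the statistics alone, and only the two named columns are touched.
    """
    results, skipped = [], 0
    for rg in row_groups:
        cmin, cmax = rg.stats[filter_col]
        if cmax < lo or cmin > hi:
            skipped += 1  # pruned from metadata; its data is never read
            continue
        keys = rg.columns[filter_col]
        vals = rg.columns[project_col]
        results.extend(v for k, v in zip(keys, vals) if lo <= k <= hi)
    return results, skipped


row_groups = [
    make_row_group(day=[1, 5, 10], user=[100, 101, 102]),
    make_row_group(day=[11, 15, 20], user=[200, 201, 202]),
    make_row_group(day=[21, 25, 30], user=[300, 301, 302]),
]
users, skipped = scan(row_groups, "day", 12, 16, "user")
print(users, skipped)  # [201] 2
```

Only the middle row group's range overlaps days 12 to 16, so the other two are skipped without touching their values.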
Performance
DuckDB ranks #1 in ClickBench and TPC-H benchmarks. Its columnar storage and vectorized execution engine deliver analytical query performance that rivals dedicated data warehouses, while running in-process on a single EC2 instance with no client-server network round trips. Larger-than-memory workloads are handled by spilling to disk, with queries still executed in parallel.
Pre-Configured Components
- DuckDB CLI and the DuckDB package for Python 3
- CloudWatch Agent for monitoring
- Systems Manager Agent for management
- ENA drivers for enhanced networking
- NVMe drivers for optimal I/O performance
- Pre-configured S3 access via AWS extension
Key Capabilities
- File format support - Native Parquet with metadata pushdown, CSV with auto type detection, JSON/NDJSON, Excel via extension, Delta Lake and Iceberg support
- SQL dialect - Full SQL:2016 with window functions, CTEs, recursive queries, PIVOT/UNPIVOT, JSON operators, and user-defined functions
- Data science integration - Python API with zero-copy Arrow, Pandas interchange, Polars integration, NumPy support, Jupyter-ready
- Extensions - Spatial for GIS (PostGIS compatible), full-text search, HTTP/HTTPS, community ecosystem
- Processing - Parallel query execution, larger-than-memory with spilling, full ACID transactions, partitioned dataset support including Hive partitioning
Use Cases
- Interactive analytics on large Parquet/CSV datasets stored in S3
- ETL pipelines replacing heavyweight cluster solutions for single-node workloads
- Data science workflows with Python/R needing fast SQL on local or remote files
- Log analysis and aggregation without standing up Elasticsearch or similar
- Business intelligence querying S3 data lakes directly
- Geospatial data processing with the spatial extension
Support Included
24/7 cloudimg support with a guaranteed 24-hour response SLA and an average one-hour response for critical issues. Coverage includes DuckDB architecture guidance, query optimization, data pipeline design, Python integration assistance, performance tuning, S3 configuration, and migration from other databases.
FAQ
What instance should I use? r5/r6i for memory-intensive analytics, c5/c6i for compute-heavy workloads, m5/m6i for balanced use. Minimum t3.medium.
Can I query S3 directly? Yes. The AWS extension is pre-configured for direct Parquet/CSV querying from S3 with metadata pushdown.
Highlights
- 24/7 expert support with a guaranteed 24-hour response SLA and an average one-hour response for critical issues. Coverage includes DuckDB architecture guidance, query optimization, data pipeline design, Python integration, S3 configuration, and migration assistance - a direct line to specialists rather than community forums or self-support.
- Pre-configured and production-ready AMI - DuckDB, Python 3, CloudWatch monitoring, Systems Manager, ENA/NVMe drivers, and S3 access all set up so you skip hours of manual configuration. Multiple DuckDB versions available across multiple OS variants. Launch on recommended r5/r6i instances and run your first query in minutes.
- Benchmark-leading analytical performance on a single EC2 instance - DuckDB ranks #1 in ClickBench and TPC-H. Query Parquet, CSV, and JSON files directly from S3 without import steps. Handle larger-than-memory datasets with parallel execution and intelligent spilling, replacing heavyweight cluster solutions for single-node analytical workloads.
Details
Pricing
Free trial
- ...
| Dimension | Description | Cost/hour |
|---|---|---|
| m5.large (Recommended) | m5.large instance type | $0.10 |
| t3.micro | t3.micro instance type | $0.06 |
| t2.micro | t2.micro instance type | $0.06 |
| r7iz.8xlarge | r7iz.8xlarge instance type | $0.28 |
| r6idn.12xlarge | r6idn.12xlarge instance type | $0.28 |
| c6i.8xlarge | c6i.8xlarge instance type | $0.28 |
| m5d.2xlarge | m5d.2xlarge instance type | $0.28 |
| m6idn.metal | m6idn.metal instance type | $0.28 |
| c6a.xlarge | c6a.xlarge instance type | $0.15 |
| i3.metal | i3.metal instance type | $0.28 |
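The hourly rates in the pricing table translate into monthly figures with simple multiplication. A sketch assuming an instance that runs around the clock and the common 730-hour month; it also assumes, as is usual for AWS Marketplace AMI listings, that these rates are the listing's software charge and that EC2 instance usage is billed separately by AWS. The function name is illustrative.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12 = 730

# Hourly rates copied from the pricing table.
HOURLY_RATE = {
    "m5.large": Decimal("0.10"),
    "t3.micro": Decimal("0.06"),
    "t2.micro": Decimal("0.06"),
    "r7iz.8xlarge": Decimal("0.28"),
    "r6idn.12xlarge": Decimal("0.28"),
    "c6i.8xlarge": Decimal("0.28"),
    "m5d.2xlarge": Decimal("0.28"),
    "m6idn.metal": Decimal("0.28"),
    "c6a.xlarge": Decimal("0.15"),
    "i3.metal": Decimal("0.28"),
}


def listing_cost(instance_type: str, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Listing charge for running one instance for the given number of hours."""
    return HOURLY_RATE[instance_type] * hours


print(listing_cost("m5.large"))    # 73.00
print(listing_cost("c6a.xlarge"))  # 109.50
```

Decimal is used instead of float so that currency arithmetic stays exact.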
Vendor refund policy
Refunds available on request.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Security update: full dnf update to current el9 (0 advisories remaining), rpm database reconciled, kernel refreshed. No application or configuration changes; DuckDB validated on a fresh boot.
Additional details
Usage instructions
Please visit the User Guide for this product on the cloudimg website (https://www.cloudimg.co.uk).
Resources
Vendor resources
Support
Vendor support
24/7 Support from cloudimg
This AMI includes 24/7x365 expert support from cloudimg with a guaranteed 24-hour response SLA and average one-hour response for critical issues.
Contact: support@cloudimg.co.uk
Coverage includes:
- DuckDB architecture guidance and best practices
- Query optimization and performance tuning
- Data pipeline design and implementation
- Python integration assistance
- S3 configuration and access troubleshooting
- Migration from other databases
- Instance sizing recommendations
- General troubleshooting and issue resolution
- Refund requests and billing inquiries
Recommended Instance Types:
- r5/r6i - memory-intensive analytical workloads
- c5/c6i - compute-heavy processing
- m5/m6i - balanced workloads
- Minimum: t3.medium
Getting Started After Launch:
- Launch the AMI on your chosen instance type
- SSH into the instance using your key pair
- Run "duckdb --version" to verify installation
- Start the DuckDB CLI with duckdb and execute your first query: SELECT * FROM 'your-file.parquet';
- For S3 queries, the AWS extension is pre-configured and ready to use
Python 3 with the DuckDB package is pre-installed for immediate data science workflows.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.