
Overview
Contact Us | Pentaho:
https://pentaho.com
For Private Offer Pricing, please contact:
PrivateOfferPricing@pentaho.com
Datasheet: Pentaho Data Integration
Datasheet: Pentaho Business Analytics
With Pentaho Data Integration - Managing the enormous volumes, variety, and velocity of data is simplified
By allowing data preparation from any source and automating your data pipeline, Pentaho Data Integration allows you to curate data better for your business user. This software delivers business analytics to end users faster with visual tools that reduce time and complexity - without writing SQL or coding in Java or Python. Organizations immediately gain real value from their various data sources in the cloud or on premises, including files, relational databases, big data sets and more.
With Pentaho Business Analytics integrated - Dynamic Exploration, Simplified integration, and Innovative Reporting Tools are at your fingertips
Visualize your data, create reports, and design dashboards without depending on IT or developers with Pentaho Business Analytics, a powerful, cost-effective, and customizable no-code data visualization tool. Enable seamless, data-driven decision making and break down barriers to data access with real-time self-service capabilities and built-in data modeling.
Turn Data Into Actionable Insights
More than just ETL (Extract, Transform, Load), Pentaho Data Integration is a codeless data orchestration tool that blends diverse data sets into a single source of truth as a basis for analysis and reporting. Data is managed in a drag-and-drop graphical interface, so you can easily track where it is coming from, where it is going, and how it is being transformed.
Data Processing Performance and Productivity
PDI speeds up processing, reduces the complexity of integrating big data sources, and provides:
- Code-free data transformation
- Template-based approach to rapidly onboard data sources into Hadoop
Scalability, Simplicity, and Self-Service
With broad connectivity to any data type and high-performance Spark and MapReduce execution, PDI simplifies and speeds the process of integrating existing databases with new sources of data.
- Intuitive, drag-and-drop designer
- Rich library of prebuilt components
- Powerful orchestration capabilities
Integration and Extensibility
- API Integration: Comprehensive REST and SOAP APIs
- Plugin Architecture: Extend capabilities with a rich plugin ecosystem
- Third-Party Tool Integration: BI tools, databases, etc
Broad Connectivity and Data Delivery
PDI offers broad connectivity to a variety of diverse data, including structured, unstructured and semi-structured data.
- Relational database management system (RDBMS): Oracle, IBM DB2, MySQL, Microsoft SQL Server, Postgres, IBM MQ
- Spark and Hadoop: Cloudera, Hortonworks, Amazon EMR, MapR (HPE Ezmeral Data Fabric), Microsoft Azure HDInsight, and Elasticsearch
- NoSQL databases and object stores: MongoDB, Cassandra, HBase, Hitachi Content Platform, AWS S3, Google Cloud Storage, Microsoft Azure ADLS Gen 2
- Analytic databases: Snowflake, Vertica, Greenplum, Teradata, SAP HANA, Amazon Redshift, Google BigQuery
- Business applications: SAP, Salesforce, Google Analytics
- Files: XML, JSON, Microsoft Excel, CSV, txt, Avro, Parquet, ORC, EBCDIC (mainframe), unstructured files with metadata, including audio, video and visual files
Highlights
- Code-free data transformation design that delivers 15x faster productivity versus hand-coding and executes in-cluster for high performance
- Template-based approach to rapidly onboard data sources into Hadoop via the metadata injection feature set
- Ability to seamlessly switch between execution engines, such as Spark and the PDI native engine, to fit data volume and transformation complexity
- Support for advanced analytics models from R, Python, Scala and Weka to operationalize predictive intelligence while reducing data prep time
- Robust dataflow orchestration of pipelines
- Support for both structured and unstructured data
Pricing
| Dimension | Cost/hour |
|---|---|
| m5.2xlarge (Recommended) | $14.27 |
| m5.8xlarge | $43.66 |
| m5.4xlarge | $24.97 |
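
As a rough illustration of how the hourly rates above add up, the following minimal sketch multiplies a listed rate by a number of hours. The function name `estimate_cost` and the 30-day month are assumptions made for this example. The listing does not state whether these rates include underlying EC2 infrastructure charges, so treat the result as an estimate of the listed rate only.

```python
from decimal import Decimal

# Hourly rates copied from the pricing table above (USD per hour).
HOURLY_RATES = {
    "m5.2xlarge": Decimal("14.27"),
    "m5.4xlarge": Decimal("24.97"),
    "m5.8xlarge": Decimal("43.66"),
}


def estimate_cost(instance_type: str, hours: int) -> Decimal:
    """Return the listed hourly rate multiplied by the hours of use.

    Hypothetical helper for illustration; it ignores taxes, storage,
    data transfer, and any other charges not shown in the table.
    """
    return HOURLY_RATES[instance_type] * hours


if __name__ == "__main__":
    hours_per_month = 24 * 30  # assumption: a 30-day month of continuous use
    for instance_type in HOURLY_RATES:
        print(instance_type, estimate_cost(instance_type, hours_per_month))
```

Using `Decimal` instead of floats keeps the currency arithmetic exact.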
Vendor refund policy
No Refunds
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Additional details
Usage instructions
Application access: http://IP_ADDRESS:8080
Launching a Pentaho instance in a hyperscaler: https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia001/pentaho-installation/hyperscalers
Product Documentation: https://docs.hitachivantara.com/p/pentaho-dia
Getting Started: https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia000
Administration: https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia002
Pentaho website: https://pentaho.com/
Support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Customer reviews
Pentaho: an ETL tool for business
Has drag-and-drop functionality and good integration while being easy to use
What is our primary use case?
I use Pentaho Data Integration for data integration and ETL processes. I developed with Pentaho from CoproSema. I use Pentaho in various machine learning projects, such as forecasting for clients who have not paid their credit.
What is most valuable?
I find the drag-and-drop feature in Pentaho Data Integration very useful for integration. I can use JavaScript and Java in some nodes for ETL development. It's easy to use and friendly, especially for larger data sets.
I use Pentaho for ETLs while relying on other tools like Power BI for data visualization and Microsoft Fabric for other tasks.
What needs improvement?
While Pentaho Data Integration is very friendly, it is not very useful when there isn't a lot of data to handle. Communicating with the vendor is challenging, and this hinders its performance in free tool setups.
What do I think about the stability of the solution?
It's pretty stable; however, it struggles when dealing with smaller amounts of data.
What do I think about the scalability of the solution?
Pentaho Data Integration handles larger datasets better. It's not very useful for smaller datasets.
How are customer service and support?
Communication with the vendor is challenging, which makes customer service less satisfactory despite being a free tool.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
I use Pentaho for data integration; however, for machine learning and business intelligence, I rely on other tools such as Power BI and Microsoft Fabric.
How was the initial setup?
The initial setup of Pentaho is easy and straightforward.
What about the implementation team?
Deploying Pentaho usually requires around two people, possibly with roles such as server administrator or technical lead.
Which other solutions did I evaluate?
I use Power BI for business intelligence, Microsoft Fabric for other tasks, and AWS Glue for data processing in other projects. I do not have experience with Azure Data Box.
What other advice do I have?
On a scale of one to ten, I would rate Pentaho Data Integration around an eight.
Transform data efficiently with rich features, but there are challenges with large datasets
What is our primary use case?
Currently, I am using Pentaho Data Integration for transforming data and then loading it into different platforms. Sometimes, I use it in conjunction with AWS, particularly S3 and Redshift, to execute the copy command for data processing.
What is most valuable?
Pentaho Data Integration is easy to use, especially when transforming data. I can find the necessary steps for any required transformation, and it is very efficient for pivoting, such as transforming rows into columns. It is also free of cost and rich in available transformations, allowing extensive data manipulations.
What needs improvement?
I experience difficulties when handling millions of rows, as the data movement from one source to another becomes challenging. The processing speed slows down significantly, especially when using a table output for Redshift. The availability of Python code integration as an inbuilt function would be beneficial.
For how long have I used the solution?
I have been using Pentaho Data Integration since 2018.
What do I think about the stability of the solution?
I would rate the stability of Pentaho Data Integration as eight out of ten.
What do I think about the scalability of the solution?
Pentaho Data Integration has a scalability rating around 8.5 out of ten, as noted from our experience.
How are customer service and support?
I have contacted customer support once or twice but did not receive a response. Therefore, I have not had much interaction with the support team, and their assistance does not seem frequent.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Pentaho Data Integration's main competitor is Talend. Many companies are moving towards cloud-based ETL solutions.
How was the initial setup?
The initial setup is simple. It involves downloading the tool, installing necessary libraries, like the JDBC library for your databases, and then creating a connection to start working.
What's my experience with pricing, setup cost, and licensing?
Pentaho Data Integration is low-priced, especially since it is free of cost.
What other advice do I have?
I rate Pentaho Data Integration seven out of ten. I definitely recommend it for small to medium organizations, especially if you are looking for a cost-effective product.
Efficient data integration with cost savings but may be less efficient
What is our primary use case?
I have a team who has experience with integration. We are service providers and partners. Generally, clients buy the product directly from the company.
How has it helped my organization?
It is easy to use, install, and start working with. This is one of the advantages compared to other key vaulting products. The relationship between price and functionality is excellent, resulting in time and money savings of between twenty-five and thirty percent.
What is most valuable?
One of the advantages is that it is easy to use, install, and start working with. For certain volumes of data, the solution is very efficient.
What needs improvement?
Pentaho may be less efficient for large volumes of data compared to other solutions like Talend or Informatica. Larger data jobs take more time to execute.
Pentaho is more appropriate for jobs with smaller volumes of data.
For how long have I used the solution?
I have used the solution for more than ten years.
What do I think about the stability of the solution?
The solution is stable. Generally, one person can manage and maintain it.
What do I think about the scalability of the solution?
Sometimes, for large volumes of data, a different solution might be more appropriate. Pentaho is suited for smaller volumes of data, while Talend is better for larger volumes.
How are customer service and support?
Based on my experience, the solution has been reliable.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We did a comparison between Talend and Pentaho last year.
How was the initial setup?
The initial setup is straightforward. It is easy to install and start working with.
What about the implementation team?
A team with experience in integration manages the implementation.
What was our ROI?
The relationship between price and functionality is excellent. It results in time and money savings of between twenty-five and thirty percent.
What's my experience with pricing, setup cost, and licensing?
Pentaho is cheaper than other solutions. The relationship between price and functionality means it provides good value for money.
Which other solutions did I evaluate?
We evaluated Talend and Pentaho.
What other advice do I have?
I'd rate the solution seven out of ten.
Loads data into the required tables and can be plug-and-played easily
What is our primary use case?
The use cases involve loading the data into the required tables based on the transformations. We do a couple of transformations, and based on the business requirement, we load the data into the required tables.
What is most valuable?
It's a very lightweight tool. It can be plug-and-played easily and read data from multiple sources. It's a very good tool for small to large companies. People or customers can learn very easily to do the transformations for loading and migrating data. It's a fantastic tool in the open-source community.
When compared to other commercial ETL tools, this is a free tool where you can download and do multiple things that the commercial tools are doing. It's a pretty good tool when compared to other commercial tools. It's available in community and enterprise editions. It's very easy to use.
What needs improvement?
It is difficult to process huge amounts of data. We need to test it end-to-end to determine how much data it can process. With the Enterprise Edition, we can process the data.
For how long have I used the solution?
I have been using Pentaho Data Integration and Analytics for 11-12 years.
What do I think about the stability of the solution?
We process a small amount of data, but it's pretty good.
What do I think about the scalability of the solution?
It's scalable across any machine.
How are customer service and support?
Support is satisfactory. A few of my colleagues are also there, working with Hitachi to provide solutions whenever a ticket or Jira is raised for them.
How would you rate customer service and support?
Positive
How was the initial setup?
Installation is very simple. When you go to the community and enterprise edition, it's damn simple. Even you can install it very easily.
One person is enough for the installation.
What's my experience with pricing, setup cost, and licensing?
The product is quite cheap.
What other advice do I have?
It can quickly implement slowly changing dimensions and efficiently read flat files, loading them into tables quickly. Additionally, the "number of copies to start" setting enables parallel partitioning. In the Enterprise Edition, you can restart your jobs from where they left off, a valuable feature for ensuring continuity. Detailed metadata integration is also very straightforward, which is an advantage. It is lightweight and can work on various systems.
Any technical guy can do everything end to end.
Overall, I rate the solution a ten out of ten.