
Pentaho Data Integration and Analytics

Hitachi Vantara LLC | 10.2.0.3 (10.2 with Service Pack 3)

Linux/Unix, Ubuntu Ubuntu 20.04 LTS - 64-bit Amazon Machine Image (AMI)

Reviews from AWS customers

1 AWS review
  • 5 star
    0
  • 4 star
    0
  • 3 star
    1
  • 2 star
    0
  • 1 star
    0

External reviews

21 reviews

External reviews are not included in the AWS star rating for the product.


    Sandeep C.

Pentaho, an ETL tool for business

  • March 12, 2025
  • Review provided by G2

What do you like best about the product?
Pentaho is one of the best ETL tools for extracting, transforming, and loading data across various sources. It just requires database connections and transfers data very fast. It also executes SQL and generates reports into Excel or any other required target. It has all the basic components, such as Execute SQL, Table input, Excel input, Excel output, text output, and HDFS output.
What do you dislike about the product?
Pentaho jobs run fast, but modifying a job is time-consuming; the tool is usually quite slow. It would be better if it were as fast as other applications, and some tutorials from Pentaho's side are also needed.
What problems is the product solving and how is that benefiting you?
It is best for transferring huge amounts of data from one source to another, such as Oracle to Hue, and for generating Excel reports from SQL queries.
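As an illustration of the SQL-to-Excel reporting flow described above (not the reviewer's actual job), a minimal Python sketch might look like the following; the connection string, query, and output file are placeholder assumptions.

```python
# Minimal sketch of a "run SQL, write an Excel report" flow like the one
# described in the review; connection string, query, and file name are
# placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("oracle+oracledb://etl_user:REPLACE_ME@db-host:1521/?service_name=ORCL")

query = "SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region"
report = pd.read_sql(query, engine)

# Equivalent of PDI's Excel output step: write the result to a workbook.
report.to_excel("sales_report.xlsx", index=False)
```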


    Jefferson Hernandez

Has drag-and-drop functionality and good integration while being easy to use

  • December 03, 2024
  • Review provided by PeerSpot

What is our primary use case?

I use Pentaho Data Integration for data integration and ETL processes. I developed with Pentaho at CoproSema. I work on machine learning projects using Pentaho, such as forecasting for clients who have not paid their credit.

What is most valuable?

I find the drag-and-drop feature in Pentaho Data Integration very useful for integration. I can use JavaScript and Java in some steps for ETL development. It's easy to use and friendly, especially for larger data sets.

I use Pentaho for ETLs while relying on other tools like Power BI for data visualization and Microsoft Fabric for other tasks.

What needs improvement?

While Pentaho Data Integration is very friendly, it is not very useful when there isn't a lot of data to handle. Communicating with the vendor is challenging, and this hinders its performance in free tool setups.

What do I think about the stability of the solution?

It's pretty stable; however, it struggles when dealing with smaller amounts of data.

What do I think about the scalability of the solution?

Pentaho Data Integration handles larger datasets better. It's not very useful for smaller datasets.

How are customer service and support?

Communication with the vendor is challenging, which makes customer service less satisfactory despite being a free tool.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I use Pentaho for data integration, however, for machine learning and business intelligence, I rely on other tools such as Power BI and Microsoft Fabric.

How was the initial setup?

The initial setup of Pentaho is easy and straightforward.

What about the implementation team?

Deploying Pentaho usually requires around two people, possibly with roles such as server administrator or technical lead.

Which other solutions did I evaluate?

I use Power BI for business intelligence, Microsoft Fabric for other tasks, and AWS Glue for data processing in other projects. I do not have experience with Azure Data Box.

What other advice do I have?

On a scale of one to ten, I would rate Pentaho Data Integration around an eight.


    Aqeel UR Rehman

Transforms data efficiently with rich features, but there are challenges with large datasets

  • November 28, 2024
  • Review provided by PeerSpot

What is our primary use case?

Currently, I am using Pentaho Data Integration for transforming data and then loading it into different platforms. Sometimes, I use it in conjunction with AWS, particularly S3 and Redshift, to execute the copy command for data processing.
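For readers unfamiliar with the S3-to-Redshift copy step mentioned above, a minimal sketch of issuing the COPY command from Python follows; the cluster endpoint, credentials, table, bucket, and IAM role are placeholder assumptions, not the reviewer's actual setup.

```python
# Minimal sketch of the S3 -> Redshift COPY step mentioned above;
# endpoint, credentials, table, bucket, and IAM role are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="REPLACE_ME",
)

copy_sql = """
    COPY staging.orders
    FROM 's3://example-bucket/exports/orders.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV IGNOREHEADER 1;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # Redshift loads the file directly from S3
conn.close()
```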

What is most valuable?

Pentaho Data Integration is easy to use, especially when transforming data. I can find the necessary steps for any required transformation, and it is very efficient for pivoting, such as transforming rows into columns. It is also free of cost and rich in available transformations, allowing extensive data manipulations.

What needs improvement?

I experience difficulties when handling millions of rows, as the data movement from one source to another becomes challenging. The processing speed slows down significantly, especially when using a table output for Redshift. The availability of Python code integration as an inbuilt function would be beneficial.

For how long have I used the solution?

I have been using Pentaho Data Integration since 2018.

What do I think about the stability of the solution?

I would rate the stability of Pentaho Data Integration as eight out of ten.

What do I think about the scalability of the solution?

Based on our experience, I would rate the scalability of Pentaho Data Integration around 8.5 out of ten.

How are customer service and support?

I have contacted customer support once or twice; however, I did not receive a response. Therefore, I have not had much interaction with the support team, and their assistance does not seem frequent.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

Pentaho Data Integration's main competitor is Talend. Many companies are moving towards cloud-based ETL solutions.

How was the initial setup?

The initial setup is simple. It involves downloading the tool, installing necessary libraries, like the JDBC library for your databases, and then creating a connection to start working.
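As an illustration of the "install the JDBC library, then create a connection" step described above (in PDI itself the driver jar is simply dropped into the tool's lib directory and the connection is defined in the UI), the sketch below checks a JDBC driver from Python using jaydebeapi; the driver class, URL, credentials, and jar path are placeholder assumptions.

```python
# Illustrative check that a JDBC driver and connection work; in PDI the
# jar goes into the lib/ directory and the connection is configured in
# the UI. Class name, URL, credentials, and jar path are placeholders.
import jaydebeapi

conn = jaydebeapi.connect(
    "org.postgresql.Driver",
    "jdbc:postgresql://db-host:5432/analytics",
    ["etl_user", "REPLACE_ME"],
    "/opt/drivers/postgresql-42.7.3.jar",
)

curs = conn.cursor()
curs.execute("SELECT 1")
print(curs.fetchone())
curs.close()
conn.close()
```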

What's my experience with pricing, setup cost, and licensing?

Pentaho Data Integration is low-priced, especially since it is free of cost.

What other advice do I have?

I rate Pentaho Data Integration seven out of ten. I definitely recommend it for small to medium organizations, especially if you are looking for a cost-effective product.


    MARIA PILAR CANDA

Efficient data integration with cost savings, but may be less efficient for large data volumes

  • September 20, 2024
  • Review provided by PeerSpot

What is our primary use case?

I have a team who has experience with integration. We are service providers and partners. Generally, clients buy the product directly from the company.

How has it helped my organization?

It is easy to use, install, and start working with. This is one of the advantages compared to other competing products. The relationship between price and functionality is excellent, resulting in time and money savings of between twenty-five and thirty percent.

What is most valuable?

One of the advantages is that it is easy to use, install, and start working with. For certain volumes of data, the solution is very efficient.

What needs improvement?

Pentaho may be less efficient for large volumes of data compared to other solutions like Talend or Informatica. Larger data jobs take more time to execute.

Pentaho is more appropriate for jobs with smaller volumes of data.

For how long have I used the solution?

I have used the solution for more than ten years.

What do I think about the stability of the solution?

The solution is stable. Generally, one person can manage and maintain it.

What do I think about the scalability of the solution?

Sometimes, for large volumes of data, a different solution might be more appropriate. Pentaho is suited for smaller volumes of data, while Talend is better for larger volumes.

How are customer service and support?

Based on my experience, the solution has been reliable.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did a comparison between Talend and Pentaho last year.

How was the initial setup?

The initial setup is straightforward. It is easy to install and start working with.

What about the implementation team?

A team with experience in integration manages the implementation.

What was our ROI?

The relationship between price and functionality is excellent. It results in time and money savings of between twenty-five and thirty percent.

What's my experience with pricing, setup cost, and licensing?

Pentaho is cheaper than other solutions. The relationship between price and functionality means it provides good value for money.

Which other solutions did I evaluate?

We evaluated Talend and Pentaho.

What other advice do I have?

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other


    KrishnaBorusu

Loads data into the required tables and can be plug-and-played easily

  • July 24, 2024
  • Review provided by PeerSpot

What is our primary use case?

The use cases involve loading the data into the required tables based on the transformations. We do a couple of transformations, and based on the business requirement, we load the data into the required tables.

What is most valuable?

It's a very lightweight tool. It can be plug-and-played easily and read data from multiple sources. It's a very good tool for small to large companies. People or customers can learn very easily to do the transformations for loading and migrating data. It's a fantastic tool in the open-source community.

When compared to other commercial ETL tools, this is a free tool that you can download and use to do many of the things the commercial tools do. It's a pretty good tool when compared to other commercial tools. It's available in Community and Enterprise editions. It's very easy to use.

What needs improvement?

It is difficult to process huge amounts of data. We need to test it end-to-end to determine how much data it can process. With the Enterprise Edition, we can process the data.

For how long have I used the solution?

I have been using Pentaho Data Integration and Analytics for 11-12 years.

What do I think about the stability of the solution?

We process a small amount of data, but it's pretty good.

What do I think about the scalability of the solution?

It's scalable across any machine.

How are customer service and support?

Support is satisfactory. A few of my colleagues are also there, working with Hitachi to provide solutions whenever a ticket or Jira is raised for them. 

How would you rate customer service and support?

Positive

How was the initial setup?

Installation is very simple. Whether you use the Community or the Enterprise edition, it's extremely simple; anyone can install it very easily.

One person is enough for the installation.

What's my experience with pricing, setup cost, and licensing?

The product is quite cheap.

What other advice do I have?

It can quickly implement slowly changing dimensions and efficiently read flat files, loading them into tables quickly. Additionally, the "number of copies to start" setting enables parallel partitioning. In the Enterprise Edition, you can restart your jobs from where they left off, a valuable feature for ensuring continuity. Detailed metadata integration is also very straightforward, which is an advantage. It is lightweight and can work on various systems.

Any technical guy can do everything end to end.

Overall, I rate the solution a ten out of ten.


    Ahad Ahmed

Offers features for data integration and migration

  • May 27, 2024
  • Review provided by PeerSpot

What is our primary use case?

I have used the solution to gather data from multiple sources, including APIs, databases like Oracle, and web servers. There are a bunch of data providers available who can provide you with datasets to export in JSON format from clouds or APIs. 

What is most valuable?

The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive to use for almost any business. 

What needs improvement?

The solution should provide additional control for the data warehouse and reduce its size, as our organization's clients have expressed concerns regarding it. The vendor can focus on reducing capacity and compensate for it by enhancing product efficiency. 

For how long have I used the solution?

I have been using Pentaho Data Integration and Analytics for a year.  

How are customer service and support?

I have never encountered any issues with Pentaho Data Integration and Analytics. 

What's my experience with pricing, setup cost, and licensing?

I believe the pricing of the solution is more affordable than the competitors. 

Which other solutions did I evaluate?

I have worked with IBM DataStage along with Pentaho Data Integration and Analytics. I found the IBM DataStage interface outdated in comparison to the Pentaho tool. IBM DataStage requires the user to drag and drop the services as well as the pipelines, similar to the process in SSIS platforms. Pentaho Data Integration and Analytics is also easier to comprehend from the first use than IBM DataStage.

What other advice do I have?

The solution's ETL capabilities make data integration tasks easier and are used to export data from a source to a destination. At my company, I am using IBM DataStage and the overall IBM tech stack for compatibility among the integrations, pipelines, and user levels.

I would absolutely recommend Pentaho Data Integration and Analytics to others. I would rate the solution a seven out of ten. 


    Ridwan Saeful Rohman

Good abstraction and useful drag-and-drop functionality but can't handle very large data amounts

  • June 26, 2022
  • Review from a verified AWS customer

What is our primary use case?

I still use this tool on a daily basis. Comparing it to my experience with other ETL tools, the system I created using this tool was quite straightforward. It involves extracting data from MySQL, exporting it to CSV, storing it on S3, and then loading it into Redshift.

The PDI Kettle Job and Kettle Transformation are bundled by a shell script, then scheduled and orchestrated by Jenkins.
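For context, a Kettle job like this is typically launched through PDI's kitchen.sh command-line tool; the sketch below shows a minimal Python wrapper of the kind a shell script or Jenkins step might call. The install path, job file, and parameter are placeholder assumptions, not the reviewer's actual pipeline.

```python
# Minimal sketch of launching a Kettle job the way a shell script or
# Jenkins step might; install path, job file, and parameter are placeholders.
import subprocess

result = subprocess.run(
    [
        "/opt/pentaho/data-integration/kitchen.sh",
        "-file=/opt/etl/jobs/export_orders.kjb",  # Kettle job to run
        "-param:RUN_DATE=2024-01-01",             # named job parameter
        "-level=Basic",                           # logging verbosity
    ],
    capture_output=True,
    text=True,
)

print(result.stdout)
# kitchen.sh exits non-zero when the job fails, so surface that to Jenkins.
result.check_returncode()
```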

We continue to use this tool primarily because many of our legacy systems still rely on it. However, our new solution is mostly based on Airflow, and we are currently in the transition phase. Airflow is a data orchestration tool that predominantly uses Python for ETL processes, scheduling, and issue monitoring—all within a unified system.


How has it helped my organization?

In my current company, this solution has a limited impact as we predominantly employ it for handling older and simpler ETL tasks.

While it serves well in setting up ETL tools on our dashboard, its functionalities can now be found in several other tools available in the market. Consequently, we are planning a complete transition to Airflow, a more versatile and scalable platform. This shift is scheduled to be implemented over the next six months, aiming to enhance our ETL capabilities and align with modern data management practices.


What is most valuable?

This solution offers drag-and-drop tools with a minimal script. Even if you do not come from an IT background or have no software engineering experience, it is possible to use. It is quite intuitive, allowing you to drag and drop many functions.

The abstraction is quite good.

If you're familiar with the product itself, it has transformational abstractions and job abstractions. We can create smaller transformations in the Kettle transformation and larger ones in the Kettle job. Whether you're familiar with Python or have no scripting background at all, the product is useful.

For larger data, we use Spark.

The solution enables us to create pipelines with minimal manual or custom coding efforts. Even without advanced scripting experience, it is possible to create ETL tools. I recently trained a graduate from a management major who had no experience with SQL. Within three months, he became quite fluent, despite having no prior experience using ETL tools.

The importance of handling pipeline creation with minimal coding depends on the team. If we switch to Airflow, more time is needed to teach fluency in the ETL tool. With these product abstractions, I can compress the training time to three months. With Airflow, it would take more than six months to reach the same proficiency.

We use the solution's ability to develop and deploy data pipeline templates and reuse them.

The old system, created by someone prior to me in my organization, is still in use. It was developed a long time ago and is also used for some ad hoc reporting.

The ability to develop and deploy data pipeline templates once and reuse them is crucial to us. There are requests to create pipelines, which I then deploy on our server. The system needs to be robust enough to handle scheduling without failure.

We appreciate the automation. It's hard to imagine how data teams would work if everything were done on an ad hoc basis. Automation is essential. In my organization, 95% of our data distributions are automated, and only 5% are ad hoc. With this solution, we query data manually, process it on spreadsheets, and then distribute it within the organization. Robust automation is key.

We can easily deploy the solution on the cloud, specifically on AWS. I haven't tried it on another server. We deploy it on our AWS EC2, but we develop it on local computers, including both Windows and MacBooks.

I have personally used it on both. Developing on Windows is easier to navigate. On MacBooks, the display becomes problematic when enabling dark mode.

The solution has reduced our ETL development time compared to scripting. However, this largely depends on your experience.

What needs improvement?

Five years ago, when I had less experience with scripting, I would have definitely used this product over Airflow, as the abstraction is quite intuitive and easier for me to work with. Back then, I would have chosen this product over other tools that use pure scripting, as it would have significantly reduced the time required to develop ETL tools. However, this is no longer the case, as I now have more familiarity with scripting.

When I first joined my organization, I was still using Windows. Developing the ETL system on Windows is quite straightforward. However, when I switched to a MacBook, it became quite a hassle. To open the application, we had to first open the terminal, navigate to the solution's directory, and then run the executable file. Additionally, the display becomes quite problematic when dark mode is enabled on a MacBook.

Therefore, developing on a MacBook is quite a hassle, whereas developing on Windows is not much different from using other ETL tools on the market, like SQL Server Integration Services, Informatica, etc.

For how long have I used the solution?

I have been consistently using this tool since I joined my current company, which was approximately one year ago.

What do I think about the stability of the solution?

The performance is good. I have not tested the product at its bleeding edge. We only perform simple jobs. In terms of data, we extract it from MySQL and export it to CSV. There are only millions of data points, not billions. So far, it has met our expectations and is quite good for a smaller number of data points.

What do I think about the scalability of the solution?

I'm not sure that the product could keep up with significant data growth. It can be useful for millions of data points, but I haven't explored its capability with billions of data points. I think there are better solutions available on the market. This applies to other drag-and-drop ETL tools as well, like SQL Server Integration Services, Informatica, etc.

How are customer service and support?

We don't really use technical support. The current version that we are using is no longer supported by their representatives. We haven't updated to the newer version yet.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We're moving to Airflow. The switch was mostly due to debugging problems. If you're familiar with SQL for integration services, the ETL tools from Microsoft have quite intuitive debugging functions. You can easily identify which transformation has failed or where an error has occurred. However, in our current solution, my colleagues have reported that it is difficult to pinpoint the source of errors directly.

Airflow is highly customizable and not as rigid as our current product. We can deploy simple ETL tools as well as machine learning systems on Airflow. Airflow primarily uses Python, which our team is quite familiar with. Currently, only two out of 27 people on our team handle this solution, so not enough people know how to use it.

How was the initial setup?

There are no separations between the deployment and other teams. Each of our teams acts as individual contributors. We handle the entire implementation process, from face-to-face business meetings, setting timelines, developing the tools, and defining the requirements, to production deployment.

The initial setup is straightforward. Currently, the use of version control in our organization is quite loose. We are not using any version control software. The way we deploy it is as simple as putting the Kettle transformation file onto our EC2 server and overwriting the old file, that's it.
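As an illustrative aside, the "copy the Kettle file over the old one" deployment described above could be scripted along these lines in Python; the host name, key, and file paths are placeholder assumptions.

```python
# Illustrative sketch of overwriting a transformation file on the EC2
# server, as described above; host name, key, and paths are placeholders.
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(
    "ec2-203-0-113-10.compute-1.amazonaws.com",
    username="ubuntu",
    key_filename="/home/me/.ssh/etl-server.pem",
)

sftp = ssh.open_sftp()
# Overwrite the existing transformation file on the server.
sftp.put("build/load_orders.ktr", "/opt/etl/transformations/load_orders.ktr")
sftp.close()
ssh.close()
```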

What's my experience with pricing, setup cost, and licensing?

I'm not really sure about the pricing of the product. I'm not involved in procurement or commissioning.

What other advice do I have?

We put it on our AWS EC2 server; however, during development, it was on our local server. We deploy it onto our EC2 server. We bundle it in our shell scripts, and the shell scripts are run by Jenkins.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)


    Karthick V.

Totally worth it!!

  • March 31, 2022
  • Review provided by G2

What do you like best about the product?
Best price on the market, backed by Hitachi, and high quality in data integration.
What do you dislike about the product?
Limited features; the connector has portability issues and is less user-friendly.
What problems is the product solving and how is that benefiting you?
We used PDI for data integration for designing reports. So far, we have had the best experience.


    Information Technology and Services

ETL for Dashboards

  • October 08, 2020
  • Review provided by G2

What do you like best about the product?
Pentaho Data Integration (aka Kettle) is a tool included in the Pentaho suite that we use in our Smart Cities projects to obtain data from various data sources. It has a large number of tools already built for Input, Output, Transform, and so on, which allow developers to save a lot of time. It is easy to use even for inexperienced users.
What do you dislike about the product?
If we want support with the Pentaho suite, we should not use its free Community version; however, some of our clients' Smart Cities specifications require a free and open-source tool with associated support.
What problems is the product solving and how is that benefiting you?
PDI allows us to obtain data from various data sources such as databases, Excel files, CSV, and big data/Hadoop-type databases, and to use preconfigured tools so that obtaining this data is simple and parameterizable. Other languages such as Python require writing complete modules; with PDI, implementation and debugging are integrated through plug-and-play tools.
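To make that comparison concrete, a hand-written Python module equivalent to a simple PDI "CSV file input -> Table output" flow might look like the sketch below; the file name, table, and connection string are placeholder assumptions, not the reviewer's code.

```python
# Sketch of the hand-written Python equivalent of a simple PDI
# "CSV file input -> Table output" flow; file, table, and connection
# string are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://etl_user:REPLACE_ME@db-host:5432/smartcity")

# Extract: read the source file (PDI's "CSV file input" step).
readings = pd.read_csv("sensor_readings.csv", parse_dates=["measured_at"])

# Transform: a trivial cleanup step.
readings = readings.dropna(subset=["sensor_id", "value"])

# Load: write into the target table (PDI's "Table output" step).
readings.to_sql("sensor_readings", engine, if_exists="append", index=False)
```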
Recommendations to others considering the product:
The Pentaho suite has a Community version that is free and open source, so our recommendation is to download and test it to verify that the tool meets your requirements. For our part, we recommend it, as we use it practically whenever we need to extract data from a data source quickly and easily.


    Information Technology and Services

ETL with graphical interface

  • June 10, 2020
  • Review provided by G2

What do you like best about the product?
Pentaho Data Integration is one of the most powerful tools we use for building ETL processes within our Smart Cities projects. It is a tool with a graphical interface that allows you to debug quickly and easily, and it has a multitude of preconfigured modules. Furthermore, it combines very well with the Hitachi Pentaho CDE tool for generating dashboards.
What do you dislike about the product?
For very simple developments, you may prefer to write Python source code directly. There are also other powerful alternatives, such as Talend Studio.
What problems is the product solving and how is that benefiting you?
Pentaho Data Integration allows us to collect data from different data sources, including both relational and non-relational databases such as big data (HDFS), and it lets us bring in information from Excel files and almost any other source of information we need. Also, its debugging tools save us a lot of time.
Recommendations to others considering the product:
Pentaho has a Community edition that is free and available to everyone. In addition, there are many examples and plenty of information available. We recommend trying it out before deciding whether you need to purchase the paid version. It is a great tool, and we recommend it.