Cloudera on AWS

Cloudera

Reviews from AWS customer

1 AWS review
  • 5 star
    0
  • 4 star
    0
  • 3 star
    1
  • 2 star
    0
  • 1 star
    0

External reviews

9 reviews
from PeerSpot and G2

External reviews are not included in the AWS star rating for the product.


    Mohammad_Ahmad

Has improved resource efficiency and lowered costs but still lacks full AI workload support

  • October 16, 2025
  • Review from a verified AWS customer

What is our primary use case?

My main use case for Cloudera Data Platform is data analytics and AI workloads.

We have different data sources where the data arrives in tabular format or CSV, structured, semi-structured, or unstructured, and as Kafka streaming messages. We store it, then process and transform it, apply the business logic, and make the data ready for consumers.

What is most valuable?

Cloudera Data Platform offers excellent architectures in terms of decoupling the storage layer from the compute. It is flexible in terms of scaling to your storage account or compute. Additionally, we have different streaming services as part of the ecosystem, and they have added Ranger for security controls, which is a valuable feature.

Decoupling storage from compute has helped my team significantly. Before using Cloudera Data Platform, we were using Cloudera Distribution for Hadoop (CDH), where we had to add on-premises virtual machines or Linux boxes to the cluster, which required a lot of effort. We had a defined maximum storage per system; for example, one machine could have a maximum of 8 TB, and scaling up by adding more compute to the cluster was very challenging. In the current Cloudera Data Platform, the backend storage is a data lake that auto-scales, so we don't have to add more storage. In terms of security, we used Sentry in traditional CDH, but in Cloudera Data Platform, Ranger provides a more granular level of security, allowing us to manage who can access data at different levels, for example at the table or column level.

Streaming services are provided by NiFi, which is one of the best ecosystems for streaming and ETL support.

Cloudera Data Platform has positively impacted our organization by reducing overall manual intervention, requiring fewer efforts and resources to build a big data cluster compared to traditional methods. It is also cost-effective and more stable than the traditional ways of handling big data workload.

In terms of resources, we have reduced from ten resources to four or five resources, making it an effective reduction in manual effort. Regarding cost saving, since we are in the cloud, we are saving significant money compared to maintaining infrastructure on-premises.

What needs improvement?

Cloudera Data Platform could improve by innovating more in terms of full-fledged support for AI workloads, enriching machine learning or LLM, as there haven't been updates in that aspect over the last one and a half years.

For how long have I used the solution?

I have been using Cloudera Data Platform for almost four years.

What do I think about the scalability of the solution?

Cloudera Data Platform's scalability is very good.

How are customer service and support?

Customer support is good. However, having a common chat channel between firms and service providers would make communication faster and more efficient.

How would you rate customer service and support?

Neutral

What other advice do I have?

My advice to others looking into using Cloudera Data Platform is that if they are looking for big data workloads on the cloud where they can do analysis and achieve cost savings and resource reductions, it is definitely a good use case. It can vary based on business needs, but it is a good option for big data workloads.

I rated Cloudera Data Platform a six out of ten because I wish that it would keep up with market trends and release AI technology and AI-enabled workloads. Sometimes we struggle to get support, and having a common chat channel between firms and service providers would make communication and support more effective, especially in production.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?


    Ciro Porzio

Manages large-scale data ingestion and transformation while improving job performance in hybrid environments

  • October 10, 2025
  • Review provided by PeerSpot

What is our primary use case?

My main use case for Cloudera Data Platform is monitoring HDFS and Impala SQL queries, troubleshooting errors in Spark-based YARN applications, and controlling the data flows between Informatica and Cloudera that move data from Oracle and MongoDB into Impala and HDFS on CDP.

For monitoring HDFS, I use Cloudera Data Platform, specifically Cloudera Manager, to analyze small files in HDFS so that we can reduce their number and shorten the duration of jobs that read those files and their date partitions.
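
The small-file check described above can be sketched roughly as follows. This is an illustration only, not the reviewer's tooling: the function name `small_files_per_directory`, the sample paths, and the 128 MB threshold (a common HDFS block size) are assumptions, and in practice the listing would come from an HDFS file listing or a Cloudera Manager report.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed threshold: files smaller than one 128 MB block count as "small".
BLOCK_SIZE_BYTES = 128 * 1024 * 1024


def small_files_per_directory(listing, threshold=BLOCK_SIZE_BYTES):
    """Count files below `threshold` bytes, grouped by parent directory.

    `listing` is an iterable of (path, size_in_bytes) pairs.
    """
    counts = Counter()
    for path, size in listing:
        if size < threshold:
            counts[str(PurePosixPath(path).parent)] += 1
    return counts


# Hypothetical listing of date-partitioned files.
listing = [
    ("/data/events/dt=2025-10-01/part-0000", 4_000_000),
    ("/data/events/dt=2025-10-01/part-0001", 2_500_000),
    ("/data/events/dt=2025-10-02/part-0000", 300_000_000),
    ("/data/events/dt=2025-10-02/part-0001", 1_000_000),
]

report = small_files_per_directory(listing)
for directory, count in report.most_common():
    print(directory, count)
```

Partitions with the highest counts are the first candidates for compaction, since every small file adds overhead for the jobs that read it.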

I mainly use Cloudera Data Platform as part of a large-scale data processing and analytics pipeline in a hybrid cloud environment, primarily on Azure, which involves managing the YARN cluster, monitoring workloads, troubleshooting performance issues, and integrating data ingestion and transformation processes from various enterprise systems. We leverage CDP for its scalability, security, and strong integration with Looker, Informatica, Hive, and Spark.

How has it helped my organization?

Cloudera Data Platform (CDP) has helped our organization improve data management consistency and scalability across multiple environments. The unified control plane and centralized governance have reduced operational overhead and made it easier to manage workloads between on-premise and cloud environments.

We’ve also seen clear benefits in resource optimization — auto-scaling and workload isolation features have allowed better use of infrastructure, while tools like Cloudera Manager and Workload XM improved monitoring and troubleshooting efficiency.

That said, there’s still room for improvement in integration speed and UI responsiveness, especially when managing large clusters or hybrid deployments.

What is most valuable?

In my opinion, the best features of Cloudera Data Platform are its strong integration, scalability, and unified management capabilities. What stands out the most are Cloudera Manager and SDX, which provide centralized control for governance, security, and data lineage across multiple sources, simplifying operations significantly. Finally, the YARN and Spark resource management in CDP is robust and efficient, which is essential for handling heavy data transformation workloads at scale.

Cloudera Data Platform has positively impacted my organization by providing a single storage point in HDFS for a lot of data from various databases. With Hive or Impala, it is possible to read and integrate data among all the other platforms, making it a great platform for us to hold the data and create integrations.

What needs improvement?

I don't have any challenges or areas I think could use enhancement.

For how long have I used the solution?

I have been using Cloudera Data Platform for one year, and I have experience with the last version of Cloudera Data Platform for four years.

What was our ROI?

A specific example of the positive impact of Cloudera Data Platform is the time saved and the improved performance, which are the main results. Costs rise at the start of the project but fall once the platform is secured, and the most significant benefit is the availability of data under governance and management.

What other advice do I have?

For the centralized governance of Spark management, we use a dashboard on SAS or Power BI to integrate the data that is stored in HDFS.

My advice to others looking into using Cloudera Data Platform is that it's a great product to save time and reduce costs in the long term.

On a scale of one to ten, I rate Cloudera Data Platform a nine.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure


    reviewer2763942

Has improved data accessibility and control but still needs better innovation for AI and machine learning

  • October 08, 2025
  • Review provided by PeerSpot

What is our primary use case?

My main use case for Cloudera Data Platform is data analytics and AI.

For data analytics and AI in my day-to-day work, we have a multi-source system where the data keeps coming from different source systems: from RDBMS in tabular format, as semi-structured data, or as streaming data from Kafka. We process and store data in the backend ADLS, then apply business rule logic to create a golden table, which is published for business or end users who consume the data for analytics. Some AI engineers develop and run Python code or LLMs against that data to gain insights.

What is most valuable?

The most unique feature I love about Cloudera Data Platform is its integration with Ranger services. Ranger is more flexible compared to Cloudera's previous data distribution component, Sentry, making it more reliable and allowing for access control at a more granular level.

The Ranger integration makes it more flexible and reliable for me by allowing control over data access, specifying who can access at what level, such as table level, masking, or data layer level. This is crucial for managing all data inside the farm.

In terms of integration, it is very easy with Cloudera Data Platform. We just hook it up since it comes with the package when we install the CDP runtime, allowing us to select the ecosystem we want in our farm depending on our use cases. It is not a standalone installation requirement; it is an easy job. Scalability and flexibility are very good.

What needs improvement?

From a holistic view in the market, I have not seen enough innovation in Cloudera Data Platform, particularly in support for machine learning. It supports it, but not to a robust extent compared to other tech providers, such as Databricks, which are more flexible and in tune with current trends in AI and machine learning. I wish Cloudera would innovate and keep pace with market demands.

Regarding the user interface of Cloudera Data Platform, I have not faced any challenges, though we definitely look forward to innovation to support varied data models and scalability.

For how long have I used the solution?

I have been using Cloudera Data Platform for almost four years.

What do I think about the stability of the solution?

Cloudera Data Platform is generally stable; however, we occasionally face minor network connectivity issues as confirmed by the vendor. Sometimes a node goes down, but it automatically returns to a healthy state.

What do I think about the scalability of the solution?

Cloudera Data Platform has positively impacted my organization by eliminating challenges we faced with CDH, which had not been supported for a cloud journey. When adding scalability, such as horizontal scalability to our existing cluster, the process was time-consuming and required upfront costs for procuring servers. In contrast, CDP allows for easy, mostly automated scalability where I can schedule job workflows, fine-tune system resource metrics, and add nodes with just a click.

How are customer service and support?

Customer support depends on the case severity, but from my experience, it is great. Cloudera support is timely and responsive, adhering to the SLAs they provide.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

Previously, we used Cloudera Data Distribution, known as CDH, which was on-premises and required more manual efforts among multiple teams, taking almost a month to set up a cluster. We switched primarily for cost-effectiveness, flexibility, and the reduced time required for setup.

What about the implementation team?

A solution architect from the vendor helps us resolve any ongoing issues such as bugs or vulnerabilities, and we appreciate the flexibility of the cloud journey.

What was our ROI?

In terms of return on investment, I see great changes in operational effectiveness measured by RTO when comparing on-premises solutions with cloud solutions. The difference is notable.

What's my experience with pricing, setup cost, and licensing?

I have not been involved overall in cost negotiation, but we find Cloudera Data Platform to be cost-effective. We work with the Cloudera vendor to secure one or two-year licenses upfront for discounts.

Which other solutions did I evaluate?

We evaluated Databricks three years ago, but it was not up to market standards in feature support at that time, particularly lacking an account console, which was introduced afterward. We have seen clients migrating from Cloudera to Databricks since the rollout of that console.

What other advice do I have?

My advice for those considering Cloudera Data Platform is to evaluate their business use case and budget, as these two factors are crucial. If the organization does not need advanced features such as LLM or machine learning, Cloudera Data Platform may be suitable. However, based on the current market, if ranking Databricks against Cloudera, I would put Databricks first and Cloudera second.

There are lots of challenges I face while using Cloudera Data Platform. Sometimes, vulnerabilities depend on which version of CDP runtime I am using, so we work with the Cloudera vendor side to remediate any vulnerabilities based on that version. Along with that, we use it for data audit purposes, gathering all inflow data such as how data is being used, who has access, and how many times.

In terms of cost savings with Cloudera Data Platform, moving from on-premises to cloud is very cost-effective. We can use bare metal servers or on-spot servers, which makes it economical. In performance, it is superior to previous versions since multiple Spark versions are added to the CDP runtime, improving data distribution, handling, and fault tolerance, requiring no code fine-tuning.

I rate Cloudera Data Platform six out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?


    Dhananjay Koyani

Processes large volumes of heterogeneous data efficiently but faces challenges in cloud adoption and future readiness

  • September 30, 2025
  • Review provided by PeerSpot

What is our primary use case?

Handling and processing big volumes of data is my main use case for Cloudera Data Platform.

We get instrument data from various providers, reconcile it, and use Cloudera Data Platform to process and ingest it in a structured manner for our downstream consumers.

One unique aspect about my main use case with Cloudera Data Platform involves multiple application teams building their workflows on the platform. I don't have all the insights into other aspects.

What is most valuable?

The best features Cloudera Data Platform offers are the processing power with Spark and the distributed data storage, HDFS, which helps us handle massive volumes of data.

Cloudera Data Platform has positively impacted my organization by making it easier to handle data at a massive scale alongside our existing data warehouse systems and by allowing us to store heterogeneous data sources.

What needs improvement?

Cloudera Data Platform can be improved by addressing the feasibility of using it in the cloud; there are some complexities around the components used in cloud by Cloudera Data Platform that are not really convenient. If those can be resolved, it could be widely adopted, similar to Databricks.

For how long have I used the solution?

I have been using Cloudera Data Platform for around 10 years.

What do I think about the stability of the solution?

Cloudera Data Platform is stable functionality-wise, but it needs some bug fixes for security, which we are expecting Cloudera to provide.

What do I think about the scalability of the solution?

The scalability of Cloudera Data Platform could be enhanced.

How are customer service and support?

The customer support for Cloudera Data Platform is good.

How would you rate customer service and support?

Neutral

What other advice do I have?

I don't have any specific advice for others looking into using Cloudera Data Platform as I don't see any negatives coming to my mind.

On a scale of one to ten, I rate Cloudera Data Platform a seven out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other


    Shan Hasan

ETL processes benefit from cost-effective offloading and could see improved deployment capabilities

  • May 05, 2025
  • Review provided by PeerSpot

What is our primary use case?

The primary usage of Cloudera Data Platform is to offload ETL processes because it's cheaper compared to data warehouse solutions like Teradata or Oracle. Furthermore, basic reporting can be done, and some real-time processes can be managed.

What is most valuable?

The foremost benefit is offloading data from the warehouse to Cloudera Data Platform, which allows for cheaper storage. We use it to push transformations and run ETL processes, leveraging tools like Spark. Cloudera also supports various functionalities, including AI and Gen AI tools. Basic reporting and some real-time functions are manageable on the platform.

What needs improvement?

Cloudera Data Platform should include additional capabilities and features similar to those offered by other data management solutions like Azure and Databricks.

For how long have I used the solution?

I have been using Cloudera Data Platform for more than five years.

What was my experience with deployment of the solution?

The installation of Cloudera Data Platform had some challenges, but this is common with many products. An improved deployment process would help deliver solutions more quickly.

What do I think about the stability of the solution?

I would rate the stability of Cloudera Data Platform as eight out of ten.

What do I think about the scalability of the solution?

Integration with other tools works well for us and we successfully scaled the solution after two to three years without any issues. I would rate the scalability as eight out of ten.

How are customer service and support?

I have communicated with technical support, and they are responsive and helpful. I would rate their support as seven out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

Initially, the decision for Cloudera was driven by pricing and the support they provided.

How was the initial setup?

The initial setup may take several hours or days, depending on the challenges faced during installation. It's not always a smooth process due to potential complexities.

What about the implementation team?

The implementation involved multiple teams, including Cloudera support, with three to four people from our client's side involved.

What other advice do I have?

I recommend Cloudera Data Platform. Overall, I would rate it a seven out of ten despite the complexities in deployment.

Which deployment model are you using for this solution?

On-premises


    Miodrag-Stanic

Distributed computing improves data processing while upgrade complexity needs addressing

  • April 14, 2025
  • Review provided by PeerSpot

What is our primary use case?

We heavily use Cloudera Data Platform for data science activities. Various departments in the company utilize it as a sandbox for data discovery. We have multiple data pipelines running on a daily and hourly basis, along with some real-time data pipelines.

What is most valuable?

Cloudera Data Platform has significantly improved our data management. Distributed computing with Spark has enabled many processing types that were not possible before. By using the Hadoop File System for distributed storage, we have 1.5 petabytes of physical storage with 500 terabytes of effective storage due to a replication factor of three.
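
The storage figures above follow from dividing raw capacity by the replication factor. A minimal sketch of that arithmetic, assuming 1 PB = 1000 TB and ignoring any other overhead; the function name is hypothetical:

```python
def effective_capacity_tb(raw_tb: float, replication_factor: int) -> float:
    """Usable capacity when every block is stored `replication_factor` times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


raw_tb = 1500.0  # 1.5 PB of physical storage, taking 1 PB = 1000 TB
usable_tb = effective_capacity_tb(raw_tb, 3)
print(usable_tb)  # 500.0
```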

What needs improvement?

There are challenges with upgrading or updating various services like Spark, Impala, and Hive on on-premise and bare metal solutions. We aim to address these issues with a Kubernetes-based platform that will simplify the task of upgrading services. We also wish to implement lakehouse capabilities with Iceberg or Delta Lake frameworks.

For how long have I used the solution?

I have been using Cloudera Data Platform since 2021. We began with a project a year prior, but it has been in production since then.

What do I think about the stability of the solution?

I would rate the stability of Cloudera Data Platform as seven out of ten.

What do I think about the scalability of the solution?

For scalability, I rate Cloudera Data Platform at an eight out of ten as it is an on-premise solution.

How are customer service and support?

I would rate the technical support from Cloudera as seven out of ten. Their support is helpful.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

Before Cloudera, we did not work with other big data platforms. This is our first big data platform, and we also have a classical data warehouse.

What about the implementation team?

We employed local vendors for the implementation, and from our company's side, around ten to twenty people were involved, including engineers, data scientists, and business personnel.

What's my experience with pricing, setup cost, and licensing?

The pricing model for Cloudera Data Platform is complex and has increased significantly compared to CDH. Initially, CDH had a straightforward pricing model based on nodes, but CDP includes factors like processors, cores, terabytes, and drives, making it difficult to calculate costs.

What other advice do I have?

For on-premise use, I would not recommend Cloudera Data Platform as it is expensive and complicated to upgrade. However, for cloud usage, I am uncertain as I do not use it on the cloud. Currently, around thirty to forty people use Cloudera Data Platform in our organization. My final rating for Cloudera Data Platform is seven out of ten.

Which deployment model are you using for this solution?

On-premises


    Sachin Shukre

Good for secure containerization, and governance capabilities

  • December 06, 2023
  • Review provided by PeerSpot

What is our primary use case?

We use it for multiple domains, including oil & gas, finance (Morgan Stanley), and healthcare. We process around 186 TB of data per day for analytics purposes.

Currently, we use it for the healthcare domain.

What is most valuable?

Distributed computing, secure containerization, and governance capabilities are the most valuable features.

What needs improvement?

Since Cloudera acquired HDP, it's been bundled with CDH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer competitive storage solutions like Gen2, Gen1, Bigtable, BigQuery, Lightstore, S3 buckets, etc., which pose significant competition to HDP.

For how long have I used the solution?

I have experience with this product. The short form is HDP 2.7. I have been using it since 2011. 

It was on-premises and hybrid for the first three months, then we migrated it to AWS and Azure.

What do I think about the stability of the solution?

In terms of storing data in different formats, it's been somewhat unstable. But when compared to Azure Gen2 and its support and features, it's much more advanced. The suitability depends on specific use cases, but overall, HDP seems more mature than it was in the past.

What do I think about the scalability of the solution?

From my experience with both HDP and CDH, they are both scalable. Currently, most people in my company have shifted to Azure, so they are using Gen2 primarily and discarding Gen1.  

How are customer service and support?

I have frequently contacted technical support for both Cloudera and Hortonworks.

We have an IT system to raise issues against their team. Issues usually get attended by someone at an L1, L2, or L3 support level. They connect with us directly.

Which solution did I use previously and why did I switch?

Previously, we used Cloudera Data Platform (CDP), which turned out to be a cloud-based Azure infrastructure, and implemented metadata solutions like Hive and others.

How was the initial setup?

The setup was very difficult on non-cloud platforms. We had to implement a version-based approach. However, it became simpler with the use of Docker. In the old days, we used to do it with HDP sandboxes and VM boxes and then create clusters. Now, on cloud platforms, it's much easier, just a matter of a few clicks. That's another approach we can take.

What's my experience with pricing, setup cost, and licensing?

I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high.

It was particularly expensive for Cloudera and Hortonworks Data Platform. Both options were quite resource-intensive.

So, seven, or even nine or ten years ago, it was quite expensive.

What other advice do I have?

I recommend a mature decision-making model. Assess your specific needs and use cases. If HDP suits your requirements, use it. Otherwise, there are many advanced options available. Review and choose the best one for your use case.

Overall, I would rate the solution a nine out of ten. 

I simply love this technology when it comes to new developments. And I've been working with it for the past twelve to thirteen years. However, with the emergence of new technologies, there might be a chance that I would reduce one point because there's room for improvement.  

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure


    Leslie Mavonyani

Helps with data management and has good scalability

  • August 31, 2023
  • Review provided by PeerSpot

What is our primary use case?

We use Hortonworks Data Platform for data management, significant data ingestion, and analytics.

What needs improvement?

Hortonworks Data Platform has a limited user community. I haven't seen much discussion about user experiences. More information could be there to simplify the process of running the product.

For how long have I used the solution?

We have been using Hortonworks Data Platform for a couple of months.

What do I think about the stability of the solution?

I rate the product's stability an eight out of ten.

What do I think about the scalability of the solution?

We have five Hortonworks Data Platform users in our organization. It is a scalable platform.

How was the initial setup?

The initial setup could be more straightforward. It would help if you are technically inclined to follow the necessary steps. There could be easy ways to set it up. It takes 45 minutes to complete and requires a team of five people to execute the process.

What about the implementation team?

We implement the product in-house.

What's my experience with pricing, setup cost, and licensing?

Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results.

What other advice do I have?

I recommend Hortonworks Data Platform to others and rate it an eight out of ten.


    TonyOladipo

Upgrades and patching are addressed by the solution, and they offer a sandbox for testing

  • July 19, 2023
  • Review provided by PeerSpot

What is our primary use case?

There are a lot of use cases for the Hortonworks Data Platform. We use it alongside GPFS, so most of the information we use for operational analytics is primarily on the Hortonworks Data Platform.

What is most valuable?

The upgrades and patches must come from Hortonworks. Therefore, if we encounter any problems, they will be responsible for addressing them. This is one of the instances where we have to rely on them for all the upgrades.

What needs improvement?

The cost of the solution is high and there is room for improvement.

For how long have I used the solution?

I have been using the Hortonworks Data Platform for two years.

What do I think about the scalability of the solution?

Hortonworks Data Platform is scalable, but it lacks the capability for horizontal scaling. Therefore, we need to add more servers to increase its capacity.

How was the initial setup?

I am responsible for setting up the infrastructure, but I don't handle the engineering work.

What other advice do I have?

I would rate Hortonworks Data Platform an eight out of ten. The solution delivers on its promises, and Hortonworks provides a sandbox for testing before making a purchase.

The maintenance requires a lot of people, including the DRE and IRE teams.

It is not practical for most organizations that lack large amounts of resources to maintain their own data platform. The Hortonworks Data Platform makes it easier for such organizations.


    Anubhav A.

Cloudera is a great Hadoop environment

  • July 09, 2015
  • Review provided by G2

What do you like best about the product?
Ease of use and setup. You are easily able to diagnose problems with the cluster through the GUI. Spark integration as well as HBase is great for our needs. Kafka integration has helped us test a new feature in our application, thereby increasing performance. All the metrics related to the environment really give us an idea about our cluster's health, thereby reducing surprises.
What do you dislike about the product?
A couple of small setup steps that are tightly integrated with our old system were hard to figure out. A little more documentation is needed. SparkSQL is not fully supported, and there is no way for us to upgrade an individual component ourselves. The change in the location of libraries from the given VirtualBox image to the production environment caused small issues. It might be better if the VM replicated the production environment as closely as possible.
What problems is the product solving and how is that benefiting you?
Storing large amounts of data and processing it in a reasonable time frame. Being able to use our old code base with small changes to libraries as opposed to rewriting our entire code. The option of having MapReduce or YARN is great, as our code does not work with YARN. Installing Cloudera 5.4 reduces our time to deployment from 4 days to 1, which is great.
Recommendations to others considering the product:
Storm integration, and support for SparkSQL and its newer components. Allowing users to upgrade individual components to match the open-source release. It may not be compatible, but it will give us users a chance to fix and learn in the meantime.

