AWS Marketplace: Databricks Data Intelligence Platform Reviews

Rishabh P.

I am using databricks on my daily routine , experienced a wonderful experience

December 14, 2022
Review provided by G2

What do you like best about the product?

I like delta live table the most because of its working and the exposure it gave to the customer like data constraints and data quality check , that is best

What do you dislike about the product?

I dislike the python syntax and code to create the delta live tables , so confusing and need to be change the logic , sql syntax is best

What problems is the product solving and how is that benefiting you?

As Delta live tables comes into picture , we dont have to focused on the data quality just only focus is to read the file and all the work will be done by delta live tables pipelines

David H.

Swiss-Army Knife of Data Analytics

December 13, 2022
Review provided by G2

What do you like best about the product?

Databricks' versatility is its best feature. The range of languages and functionality afforded by Databricks is impressive. Thus far, I've written code in R, Python, SQL and Scala in Databricks. And im just getting started. But I've composed SQL code in both R and Python, executed in Databricks. And then we come to interoperability. Data written to SQL can be accessed by either R or Python. Parameters can be passed across SQL, R and Python via widgets or environmental variables. If you have an intractable data or analytics problem, Databricks would be my 'go to' to maximise the options as to how you could potentially code your way under, around or over the obstacles standing between your project and successful execution.

What do you dislike about the product?

The options for deployment of Databricks code from dev >> qa >> uat >> prod aren't as intuitive as I might like. This might have more to do with our current use of Azure Data Factory for orchestration. Setting up workflow natively in Databricks was quite straightforward. It seems to be accessing Databricks notebooks from Azure Data Factory in dev >> qa >> uat >> prod where we are perhaps creating problems for ourselves. Perhaps not a shortcoming in Databricks at all. Curious as to how Databricks would operate with AWS rather than Azure. Perhaps a better experience?

What problems is the product solving and how is that benefiting you?

Data migration, data modelling & reporting.

Malathi M.

Best Data Engineering, ML, Data Science & analytics lakehouse platform

December 06, 2022
Review provided by G2

What do you like best about the product?

Autoloader
Change Data Feed
DLT pipelines
Schema evolution
Jobs Multitask
Integration with leading Git Providers, Data Governance and security tools
MLflow AutoML
Serverless SQL endpoints for analyst
Photon accelerated engine

What do you dislike about the product?

No GUI-based drag & drop
Complete Data Lineage visualization at the metadata level is still no there
NO serverless Cluster for data engineering pipelines if you use existing interactive clusters, only available through job clusters through DLT
Every feature has some limitations involved
More work is needed on orchestration workflows

What problems is the product solving and how is that benefiting you?

Unified Batch & streaming pipeline
Delta Lake
Versioning & History
ACID transaction through delta log
Data curation through Validation & quarantine
Data Ingestion through Autoloader

TANVI M.

Databricks : Best Unified Platform for Data Engineering

December 04, 2022
Review provided by G2

What do you like best about the product?

Delta Table is the best. Spark in a very curated format

What do you dislike about the product?

Nothing as now . Its very good overallll

What problems is the product solving and how is that benefiting you?

We wanted to have a unified platform . The Partner connect is the very good feature of Databricks

Sudarsan S.

Databricks - Best Unified Delta Lakehouse Platform in Data & AI Analytics space

November 23, 2022
Review provided by G2

What do you like best about the product?

Unified Batch & Streaming for source systems data
Autoloader capability, along with Schema Evolution
Delta Live Table & orchestrating with Pipelines
CDC Event streams for SCD1 & SCD2 using DELTA apply changes
Databricks Workflows - Multi-task jobs
Serverless SQL Photon cluster along with Re-dash integrated Visualization
Unity Catalog
Delta Sharing & Data MarketPlace
Data Quality expectations
Integration with Collibra, Privacera & other security & governance tools

What do you dislike about the product?

Issue in running multiple streaming jobs in same cluster
Job clusters can't be reused even for the same retry in PRODUCTION, since shutdown immediately after the job run/fail is set by default - Need to check any options to increase this limit
Multi-Task jobs requires TASK output should be passed to next input TASK and also need to support FAIL on trigger and setting OR dependent predecessors to trigger ,Currently supports only AND
No serverless option for Data engineering jobs outside DLT
DLT need to be matured to handle wide variety of integrating source & target, currently only support DELTA table in databricks. Expecting that be supported for any tool/service/product which supports DELTA format filesystems

What problems is the product solving and how is that benefiting you?

Easier integration from various source system starting from IOT, Streaming connectors & batch connectors
Helpful to easily design the lakehosue medallion architecture RAW, REFINED AND GOLD to contextualize the enterpise common data model & warehosue systems
Data quality expectations in DLT is very helful to speed up the quality check process & displays in monitoring dashboard lineage process.
Auto tuning - compaction is helpful along with VACCUM
Able to integarted metastore well with Collibra for data governance

Martand S.

Scalable, fast and easy to use

November 15, 2022
Review provided by G2

What do you like best about the product?

Databricks delta lake is the default storage for databricks which makes it very useful. Time travel, transaction, partitioning makes it very efficient.

What do you dislike about the product?

Until now I have not faced any limitations for my use case.

What problems is the product solving and how is that benefiting you?

We are using databricks lakehouse to manage batch and realtime data pipeline to fetch data from elasticsearch & Azure datalake.

Nhat H.

Very friendly with Jupyter users.

November 10, 2022
Review provided by G2

What do you like best about the product?

Data Engineer & Machine Learning is very easy to use.

What do you dislike about the product?

It takes time to start a job cluster so I must create a cluster for live update dashboard.

What problems is the product solving and how is that benefiting you?

Schedule ETL jobs.
Many teams can use with different programming languages.

Computer & Network Security

Great product

November 09, 2022
Review provided by G2

What do you like best about the product?

Integarted UI for SQL , Spark , Python . This makes the job really seamless

What do you dislike about the product?

Nothing as of now . Enjoying the product

What problems is the product solving and how is that benefiting you?

We can leverage the concept of delta lake for our use cases

Axel Richier

Simple to set up, fast to deploy, and with regular product updates

November 07, 2022
Review provided by PeerSpot

What is our primary use case?

We're using it to provide a unified development experience for all our data experts, including all data engineers, data scientists, and IT engineers. With the Databrick Platform we allows teams to collaborate easily towards building Data Science models for our clients. The development environment allows us to ingest data from various data sources, scale the data processing and expose them either trough API or through enriched datasets made available to web app or dashboard leveraging the serverless capacities of SQL warehouse endpoints.

How has it helped my organization?

Databricks allowed us to offer an homogeneous development environment accross different accounts and domains, and also across different clouds. The upskilling of our employees is far more linear and faster, while removing the complexity of infrastructure management. This lead to an increased collaboration between domain thanks to a better onboarding experience, more performant pipelines and a smoother industrialization process. Overall client satisfaction has increased and the time to first insight has been reduced.

What is most valuable?

The shared experience of collaborative notebooks is probably the most useful aspect since, as an expert, it allows me to help my juniors debug their books and their code live. I can do some live coding with them or help them find the errors very efficiently.

It has become very simple to set up thanks to its official Terraform provider and the open-source modules made available on GitHub.

I love Databricks due to the fact that we can now deploy it in 15 minutes and it's ready to use. That's very nice since we often help our clients in deploying their first Data Platform with Databricks.

The solution is stable, with LTS Runtimes that have proven to remain stable over the years.

What needs improvement?

I would love to be able to declare my workflows as-code, in an Airflow-like way. This would help creating more robust ingestion python modules we can test, share and update within the company.

We would also love to have access to cluster metrics in a programmatic way, so that we can analyse hardware logs and identify potential bottlenecks to optimize.

Lastly, the latest VS Code extension has proven to be useful and appreciated by the community, as it allows to develop locally and benefits from traditional software best-practices tools like pre-commits for example.

For how long have I used the solution?

I've been using the solution for more than four years now, in the context of PoC to full end-to-end Data Platform deployment.

What do I think about the stability of the solution?

The product is very stable. I've been using it for three years now, and I have projects that have been running for three years without any big issues.

What do I think about the scalability of the solution?

It's very scalable. I have a project that started as a proof of concept on connected cars. We had 100 cars to track at first - just for the proof of concept. Now we have millions of cars that are being tracked. It scales very well. We have terabytes of data every day and it doesn't even flinch.

How are customer service and support?

I've had very good experiences with technical support where they answer me in a couple of hours. Sometimes it takes a bit longer. It's usually a matter of days, so it's very good overall.

Even if it took a bit of time, I got my answer. They never left me without an answer or a solution.

How would you rate customer service and support?

Positive

How was the initial setup?

The implementation is very simple to set up. That's why we choose it over many other tools. Its Terraform provider is our way-to-go for the initial setup has we are reusing templates to get a functional workspace in minutes.

Usually, we have two to five data engineers handling the maintenance and running of our solutions.

What about the implementation team?

We deploy it in-house.

What's my experience with pricing, setup cost, and licensing?

The solution is a bit expensive. That said, it's worth it. I see it as an Apple product. For example, the iPhone is very expensive, yet you get what you pay for.

The cost depends on the size of your data. If you have lots of data, it's going to be more expensive since your paper compute units will be more. My smallest project is around a hundred euros, and my most expensive is just under a thousand euros a week. That is based on terabytes of data processed each month.

Which other solutions did I evaluate?

We looked into Azure Synapse as an alternative, as well as Azure ML and Vertex on GCP. Vertex AI would be the main alternative.

Some people consider Snowflake a competitor; however, we can't deploy Snowflake ourselves just like we deploy Databricks ourselves. We use that as an advantage when we sell Databricks to our clients. We say, "If you go with us, we are going to deploy Databricks in your environment in 15 minutes," and they really like it.

Lately Fabric was released and can offer quite a similar product as Databricks. Yet, the user experience, the CI/CD capabilities and the frequent release cycle of Databricks remains a strong advantage.

What other advice do I have?

We're a partner.

We use the solution on various clouds. Mostly it is Aure. However, we also have Google and AWS as well.

One of the big advantages is that it works across domains. I'm responsible for a data engineering team. However, I work on the same platform with data scientists, and I'm very close to my IT team, who is in charge of the data access and data access control, and they can manage all the accesses from one point to all the data assets. It's very useful for me as a data engineer. I'm sure that my IT director would say it's very useful for him too. They managed to build a solution that can very easily cross responsibilities. It unifies all the challenges in one place and solves them all mostly.

I'd rate the solution nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Hospital & Health Care

It's really useful for big datasets

October 15, 2022
Review provided by G2

What do you like best about the product?

Lesser Running time, handling big datsets, user-friendly platform

What do you dislike about the product?

Cluster active time is less, active time should be increased when not in use

What problems is the product solving and how is that benefiting you?

Helps in Data Warehousing on big datasets, building data engineering pipeline and ML models end to end

Databricks Data Intelligence Platform

Reviews from AWS customer

External reviews

I am using databricks on my daily routine , experienced a wonderful experience

Swiss-Army Knife of Data Analytics

Best Data Engineering, ML, Data Science & analytics lakehouse platform

Databricks : Best Unified Platform for Data Engineering

Databricks - Best Unified Delta Lakehouse Platform in Data & AI Analytics space

Scalable, fast and easy to use

Very friendly with Jupyter users.

Great product

Simple to set up, fast to deploy, and with regular product updates

What is our primary use case?

How has it helped my organization?

What is most valuable?

What needs improvement?

For how long have I used the solution?

What do I think about the stability of the solution?

What do I think about the scalability of the solution?

How are customer service and support?

How would you rate customer service and support?

How was the initial setup?

What about the implementation team?

What's my experience with pricing, setup cost, and licensing?

Which other solutions did I evaluate?

What other advice do I have?

Which deployment model are you using for this solution?

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

It's really useful for big datasets