I work as a data engineer at Fractal. On a daily basis, I work on Azure Cloud and use Databricks frequently. We have ADF pipelines and use Synapse for our daily tasks.

External reviews
External reviews are not included in the AWS star rating for the product.
Best platform for data engineering and data science
Capability to integrate diverse coding languages in a single notebook greatly enhances workflow
What is our primary use case?
What is most valuable?
Databricks supports various languages that I can use, whether it's PySpark, Scala, or R. I can use all of these languages in a single notebook, which is beneficial for clients because they can access various tools in one place whenever needed. This is quite significant.
I usually work with PySpark based on client requirements. After coding, I feed the Databricks notebooks into the ADF pipeline for updates. Databricks' capability to process data in parallel enhances data processing speed. Furthermore, I can connect our Databricks notebook directly with Power BI and other visualization tools like Qlik. Once we develop code, it allows us to transform raw data into visualizations for clients using analysis diagrams, which is very helpful.
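To illustrate how a notebook can be fed into an ADF pipeline, here is a minimal sketch that builds the JSON definition of an ADF Databricks Notebook activity as a Python dictionary. The helper name `notebook_activity`, the notebook path, the linked service name, and the parameter values are all hypothetical placeholders, not details from this project; only the overall activity shape (`DatabricksNotebook`, `notebookPath`, `baseParameters`) follows the ADF activity format.

```python
import json


def notebook_activity(name, notebook_path, linked_service, parameters):
    """Return an ADF activity definition that runs one Databricks notebook.

    All argument values are supplied by the caller; nothing here talks to
    Azure. The result is only the JSON-serializable activity body.
    """
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,  # hypothetical linked service name
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Passed to the notebook as widget values at run time.
            "baseParameters": dict(parameters),
        },
    }


# Hypothetical example values, for illustration only.
activity = notebook_activity(
    name="TransformRawData",
    notebook_path="/Shared/transform_raw_data",
    linked_service="AzureDatabricksLinkedService",
    parameters={"run_date": "2024-01-01"},
)
print(json.dumps(activity, indent=2))
```

Inside the notebook, the PySpark code would typically read `run_date` from a widget, so the same notebook can be reused across pipeline runs.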
What needs improvement?
As a data engineer, I see cluster failure in our Databricks environment as a major issue. I am unsure why, but our flow, which typically involves three to four notebooks, sometimes leads to cluster failure. Despite attempts to identify the problem, there are times when the reason remains unclear. Adjusting settings such as the number of worker nodes and node utilization during cluster creation could mitigate these failures.
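As a rough sketch of what adjusting worker nodes at cluster creation can look like, the snippet below builds a cluster definition using field names from the Databricks Clusters API (`num_workers`, `autoscale`, `min_workers`, `max_workers`). The helper `build_cluster_spec` is hypothetical, and the runtime version, node type, and worker counts are illustrative assumptions, not the settings used in this project or a recommended fix for the failures described.

```python
def build_cluster_spec(name, spark_version, node_type_id, min_workers, max_workers):
    """Build a cluster definition with a fixed or autoscaling worker count.

    If min_workers equals max_workers the cluster has a fixed size;
    otherwise it may scale between the two bounds.
    """
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")

    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
    }
    if min_workers == max_workers:
        spec["num_workers"] = min_workers
    else:
        spec["autoscale"] = {
            "min_workers": min_workers,
            "max_workers": max_workers,
        }
    return spec


# Illustrative values only: runtime and node type are assumptions.
scaling = build_cluster_spec("etl-flow", "13.3.x-scala2.12", "Standard_DS3_v2", 2, 4)
fixed = build_cluster_spec("etl-flow-fixed", "13.3.x-scala2.12", "Standard_DS3_v2", 3, 3)
```

A wider autoscaling range gives a multi-notebook flow more headroom, at the price of higher potential cost, which is the trade-off the review raises.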
For how long have I used the solution?
I have been using the solution for three years now.
What do I think about the stability of the solution?
Cluster failure is one of the biggest weaknesses I notice in our Databricks.
Which solution did I use previously and why did I switch?
Databricks is beneficial for cost saving: the clients I work for moved from AWS Cloud to Azure Cloud for this reason.
How was the initial setup?
The initial setup is very straightforward for us.
What's my experience with pricing, setup cost, and licensing?
I am not very aware of the pricing. We use three to four clusters in our project. Increasing the number or size of clusters, such as adding more workers, would result in higher costs. That's why we limit ourselves to four clusters for our business.
Which other solutions did I evaluate?
In terms of cost efficiency, it's very useful because our clients switched from AWS Cloud to Azure Databricks to save costs.
What other advice do I have?
I would rate the overall product eight out of ten.
Everything is probably good as far as I have used it, but there's room for improvement in cluster integration. Enhancing cluster capabilities while keeping costs lower would be beneficial.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
The Best Data Engineering Tool uses Delta Lake
AutoML and Delta Lake are the best features.
The best Big Data processing tool
Revolutionizing Data analytics and AI integration
Big Data processing using Databricks
Sharing my experience as an Associate Data Engineer
Easy to build data pipeline
Very likely to recommend the Databricks Data Intelligence Platform
Provides resources to users quickly without much hassle
What is our primary use case?
I have recently gotten into Databricks and trained on one model. I started using Databricks because of its hardware support and the other things it provides, and because it is easy to get into. Earlier, when I had to test some part of my code to see whether it worked (not a full production run, just a fair test), I had to get a machine, raise a request, and go through the whole process. With Databricks, I can simply create one myself. I can get whatever resources are required, test everything there, and then go ahead, and that is why I have been using it primarily.
What is most valuable?
The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle.
What needs improvement?
I think setting up the whole account for one person and giving access are areas that can be difficult to manage and should be made a little easier.
For how long have I used the solution?
I have experience with Databricks.
What do I think about the stability of the solution?
I think there is a period after which our training expires if there is no activity, which I think is fair, and that is the only situation in which it stops. I haven't come across many problems with Databricks.
What do I think about the scalability of the solution?
The tool is not used as frequently as PyTorch. I don't know why I am comparing Databricks to PyTorch, but I think around five people use it.
How are customer service and support?
I have not contacted the solution's technical support team.
Which solution did I use previously and why did I switch?
Before Databricks, I used a cloud support platform.
How was the initial setup?
The solution is deployed on the cloud.
Which other solutions did I evaluate?
I chose Databricks over other products, considering the hardware support it offers.
What other advice do I have?
A little bit of time will be needed to get comfortable with Databricks.
I rate the tool an eight out of ten.