I use Starburst Galaxy to support interactive queries and dashboards.
Compared to Databricks, which is also deployed to serve ETL pipelines, Starburst is much faster and much friendlier to non-technical employees.
Starburst is the most important portal for both technical and non-technical employees to access the data lake.
Starburst also provides a user-permission system which protects sensitive data.
The most fundamental feature is the query engine, which is much faster than any of the competitors.
Starburst is able to finish most queries within 10 seconds, which is especially important for many non-technical employees.
I would like Starburst to leverage AI to improve usability.
Data lakes are complicated and difficult for users to explore. AI would help a lot in this respect.
I have used the solution for over three years.
I used other tools before, and I switched because Starburst is faster.
The price is reasonable and controllable.
For example, you have a fixed-size cluster and the cost is predictable. Queries may slow during rush hours, but there is no spike in billing.
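To make the billing point concrete, here is a minimal sketch comparing a fixed-size cluster billed by uptime with a usage-based model billed per query. All rates and query counts are made-up numbers for illustration only; they are not Starburst pricing or any competitor's pricing.

```python
def fixed_cluster_cost(hourly_rate: float, hours_up: float) -> float:
    """Cost of a fixed-size cluster billed purely by uptime."""
    return hourly_rate * hours_up


def per_query_cost(rate_per_query: float, query_count: int) -> float:
    """Cost under a hypothetical usage-based model billed per query."""
    return rate_per_query * query_count


# Hypothetical numbers, chosen only for illustration.
HOURLY_RATE = 8.0       # assumed price per cluster-hour
RATE_PER_QUERY = 0.05   # assumed price per query
HOURS_UP = 10           # the cluster runs 10 hours in both scenarios

quiet_day_queries = 1_000
rush_day_queries = 5_000

# The fixed cluster costs the same no matter how many queries arrive.
quiet_fixed = fixed_cluster_cost(HOURLY_RATE, HOURS_UP)
rush_fixed = fixed_cluster_cost(HOURLY_RATE, HOURS_UP)

# The usage-based bill grows with query volume.
quiet_usage = per_query_cost(RATE_PER_QUERY, quiet_day_queries)
rush_usage = per_query_cost(RATE_PER_QUERY, rush_day_queries)

print(f"fixed cluster: quiet={quiet_fixed:.2f} rush={rush_fixed:.2f}")
print(f"per-query:     quiet={quiet_usage:.2f} rush={rush_usage:.2f}")
```

In this sketch the rush-hour load shows up as slower queries on the fixed cluster rather than as a larger bill.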
I evaluated other solutions such as Databricks.
I wish there were more products available in the ecosystem.
We use Starburst Galaxy to query data across our diverse data ecosystem. Our data has evolved over many years and is spread across many data sources. Starburst enables us to query across this ecosystem without having to move everything into a single location.
Our teams require a method for integrating data from various systems for reporting and ad-hoc analysis, and Starburst Galaxy fundamentally meets this need.
The biggest win has been the ability to combine data from multiple sources and deliver it to the business at record speed.
This capability has allowed us to query directly through Starburst Galaxy, enabling teams to access integrated data that would otherwise be hard to pull together.
This has reduced both our ETL processing time and storage costs. We are answering questions that would have been hard, if not impossible, to answer previously because the data came from disparate, disconnected sources.
Federated querying through Starburst Galaxy has unlocked our ability to move data using SQL, keeping data in the data layer. The ability to use SQL to query multiple data sources and then write to a single destination has been essential.
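As a rough illustration of the pattern described above, the sketch below builds a single Trino-style `CREATE TABLE ... AS SELECT` statement that reads from two catalogs and writes to a third. The catalog, schema, table, and column names are hypothetical, and the helper `build_federated_ctas` is my own illustration rather than anything from Starburst. The script only builds and prints the SQL; running it against a cluster would need a Trino client, which is not shown.

```python
def build_federated_ctas(target: str, orders: str, customers: str) -> str:
    """Build one CTAS statement that joins two sources into a target table.

    Each argument is a fully qualified name in catalog.schema.table form.
    """
    return (
        f"CREATE TABLE {target} AS\n"
        f"SELECT c.customer_id, c.region, sum(o.amount) AS total_amount\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.customer_id\n"
        f"GROUP BY c.customer_id, c.region"
    )


# Hypothetical catalogs: two operational sources and one lake destination.
sql = build_federated_ctas(
    target="lake.analytics.customer_totals",
    orders="postgres_prod.public.orders",
    customers="sqlserver_crm.dbo.customers",
)
print(sql)
```

Because the join and the write happen in one statement, no intermediate copy of either source needs to be staged elsewhere first.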
Additionally, setting up new data connections is straightforward.
I would like to see per-model cluster routing selection when using dbt. Cluster startup time can be slow, sometimes taking over a minute.
We started using Trino, which worked, but we wanted a reliable managed solution to help us scale.
The pricing is transparent and reasonable.
We considered using open source Trino.
Starburst Galaxy addresses our primary problem of managing and working with data spread across multiple systems. Our teams can access and combine data from any source, enabling faster insights and reducing the time spent on manual data wrangling.
Starburst Galaxy is becoming a cornerstone of our data platform, empowering us to make smarter and faster decisions across the organization.
Our primary use case is to manage hundreds of terabytes of data efficiently across a wide range of internal use cases, including ingestion/ETL, machine learning pipelining, and customer-facing product workflows.
It is a top priority to enable all engineers to have access to this volume of data without the concern of overspending on expensive cloud warehouse providers.
We have experienced several improvements across our organization.
Our data ingestion processes previously involved copying data from S3 to Snowflake, which was fairly costly and required constant vigilance to purge old data so that our source tables would not bloat.
Now we are able to move ingestion staging data to Iceberg tables, resulting in a much better experience in terms of both compute and storage costs as well as maintenance.
Data transformation has also become more efficient.
Starburst on Trino, combined with our SQL-native data transformation tool SQLMesh, has delivered a two- to five-fold improvement in compute performance across our transformation DAG.
This improvement is largely due to how efficiently Trino scans relevant data without requiring any additional setup, such as defining partitions in Snowflake.
In terms of cost effectiveness, we are already forecasting a 25% reduction in cloud data provider spending, even while continuing to use both Snowflake and Starburst.
This is because we are able to shift a significant amount of compute to Galaxy, and the cost difference compared to our previous approach of running jobs exclusively on Snowflake is substantial.
Cross-catalog querying and compatibility with AWS Glue have both significantly enhanced the user experience.
We operate several accounts within our AWS organization, each containing substantial volumes of data, and the onboarding process with Starburst has been fairly quick, even in the face of AWS IAM complexities.
The most persistent issue is the cluster spin-up time.
Coming from Snowflake, where warehouse spin-ups are nearly instantaneous, it has been a challenge to adapt.
However, I believe the Starburst team is working on solutions for this.
Additionally, the cluster and query monitoring UI lacks an optimal user experience.
I would recommend that the Starburst team invest in forking the Trino console and enhancing that tool, as observability is very important to us.
More Starburst-specific documentation would also be helpful.
I understand that some Trino functionality, such as certain parameters, is not supported, so clearer guidance would be appreciated.
We previously used only Snowflake but are now shifting toward a more hybrid architecture.
We primarily added Starburst to our stack due to the potential for significant cost savings and because implementing a lakehouse is a more effective long-term data strategy.
The setup cost is fairly transparent.
There are many opportunities to find cost savings or discounts, especially for a startup like ours.
I appreciate that the pricing is available online, although I will note that comparable compute is only slightly cheaper than Snowflake warehouse costs, for example.
We considered Onehouse and ClickHouse as alternative solutions.
We are in the early phases of our Starburst relationship and are looking forward to how we can grow with it in the future.
I use Starburst as a cost-efficient hosted option for Trino for data integration and ad-hoc analysis across a broad range of data sources. It is surprisingly useful to query SQL Server, a Google Sheet, and data in a blob store together, and then persist the result in Postgres for downstream consumption.
In addition, Galaxy platform features such as job scheduling, a data catalog, and easy permission and access control management, together with the strong technical support from Starburst, make it a breeze to use compared to something like Athena.
I have removed data silos, sped up my pipelines (three to five times the speed of Redshift on a per-cost basis), and now have a single point of entry with consistent SQL semantics to all of my data systems.
Query federation coupled with excellent performance is the best feature by far. A consistent interface to all my data systems and a friendly UI that supports data personas from Analyst to Architect and everyone in between is extremely valuable.
As a hosted option, I wish I had more control over the cluster configuration, specifically regarding some of the more advanced options. Trino is extremely flexible and powerful, but some of this functionality is gated on the Galaxy platform.
Most users and admins will never need these features, but on occasion I have encountered issues that could have been resolved by a configuration change in five minutes rather than redesigning a data product. That said, I have a high degree of expertise with the tool, and this is more of a quibble than a major issue.
I have used the solution for four years.
If I have the choice of tooling for managing and interacting with data systems, I always choose Trino first and Starburst Galaxy if I am responsible for managing the deployment. My current team deployed on Redshift before I joined, and the first and best architectural choice I made was to migrate to Galaxy.
You pay for cluster uptime. It is important to be aggressive about autoscaling, as a single worker will get you a long way. I recommend never connecting a BI tool to your Galaxy cluster. Instead, write the data to Postgres or a hot database and serve it from there so you don't pay for expensive uptime to serve dashboards.
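The reasoning behind that recommendation can be sketched with simple arithmetic. The hourly rate and the uptime hours below are assumptions picked for illustration, not real Galaxy pricing, and the sketch ignores the cost of the Postgres instance itself.

```python
def monthly_uptime_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Monthly cost of a cluster that is billed only while it is running."""
    return hourly_rate * hours_per_day * days


HOURLY_RATE = 8.0  # hypothetical price per cluster-hour

# Scenario A: a BI tool queries the cluster directly, so the cluster
# stays up through the working day to keep dashboards responsive.
bi_connected = monthly_uptime_cost(HOURLY_RATE, hours_per_day=12)

# Scenario B: scheduled jobs write results to Postgres, dashboards read
# from Postgres, and the cluster suspends once the jobs finish.
batch_only = monthly_uptime_cost(HOURLY_RATE, hours_per_day=2)

savings = bi_connected - batch_only
print(f"BI-connected: {bi_connected:.2f}")
print(f"batch-only:   {batch_only:.2f}")
print(f"difference:   {savings:.2f}")
```

Whether the difference outweighs the cost of running the serving database depends on your own rates and dashboard traffic.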
Having a good amount of expertise in the domain, I knew that Galaxy was the right choice for quick deployment. Having managed data at scale (hundreds of terabytes) in the past, I know Trino will get the job done without a lot of hassle.
Athena specifically has two major issues. First, connectors are restricted on write functionality and are more difficult to configure. Not being able to write through connectors is a deal breaker. Second, if you scale out enough, you will encounter issues due to Athena's shared tenancy model and then need to migrate to Trino eventually. It is better to save yourself the hassle.
If you are unsure about the service, try the free trial. You can be up to speed with your existing systems in half an hour.
I use Starburst Galaxy to connect to many Amazon S3 and RDS data sources, exposing that data for query and analysis by data engineering teams, as well as executive stakeholders in the organization.
I also use the product to serve many Tableau dashboards used by different teams within the organization.
I am able to combine data from across the organization to create integrated dashboards that are difficult to construct otherwise.
The on-demand nature of Starburst Galaxy has greatly reduced the computation and operational costs to achieve this compared to other open source tools I have used in the past. With Starburst Galaxy, the data is ready and available 24/7.
Starburst Warp Speed has helped me reduce overall operating expenses compared to standard query performance.
I am now able to answer questions in a couple of minutes that would otherwise take hours or days of time for my data engineering teams. I have found the cluster management to be extremely useful. I am able to create clusters configured for various workloads and then turn each one on as needed and let it turn itself off when idle.
This has enabled a number of new use cases. I am able to run much larger jobs than in the past without blocking small concurrent tasks. All of my processes are now running on a cluster that is right-sized instead of trying to manage my own infrastructure by scaling up or down numerous times throughout the day. I am also able to segment costs by product, which I was not able to do in the past.
I am able to connect Starburst Galaxy to other tools such as Tableau using the connector, but I would like to see better support for spinning up a cold Starburst Galaxy cluster via Tableau, as it currently just times out.
I would like the Starburst connector in Tableau to have the capability to hold the connection open while Starburst Galaxy starts up.
I have used Starburst Galaxy for 1.5 years.
I previously used Starburst Enterprise, self-managed in Amazon Web Services.
I switched to Starburst Galaxy to take advantage of automatic feature upgrades and to shift infrastructure costs to Starburst's cloud environment, which operates data workloads more efficiently than my self-managed Starburst Enterprise deployment did.
Pricing for Starburst Galaxy is competitive compared to running my own workloads using open source alternatives.
I recommend you consider the total cost of ownership when deciding whether Starburst Galaxy is a good fit for your organization.
I compared Starburst Galaxy to Starburst Enterprise and decided to make the switch to their cloud offering.
This product is worth your time to investigate and evaluate.
I highly recommend Starburst Galaxy to any organization with a need to work with data at scale in the cloud, even in a multi-cloud environment.