For security reasons, we collect millions of signals and store them in an S3 bucket. Once a Spark job has run on the raw data, we send all of that data to ScyllaDB.
ScyllaDB allows fine-tuning of the table structure. Speed is probably the most critical factor because we perform a lot of heavy data ingestion. One of its core features is its ability to handle high volumes and maintain speed when accessing data. Additionally, high availability and partitioning are built-in features of ScyllaDB.
Designing your queries first and then building your data model around them ensures optimal performance. Another issue to consider is deletions. ScyllaDB does not handle heavy deletions well, which is understandable since it is built for heavy ingestion and fast queries. It works like a charm once users understand the product and its capabilities.
ScyllaDB needs to improve its handling of delete operations. When data is deleted, it is not removed immediately; a background process handles the actual removal, which takes time. This delay can slow down queries, because they must account for data that is still pending deletion. ScyllaDB should ensure that deletions are processed more efficiently to avoid this issue.
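To make the deletion delay concrete, here is a toy sketch in Python. It is not ScyllaDB's storage engine or API; the class `ToyPartition` and its methods are hypothetical names invented for illustration. It assumes a simplified append-only store in which a delete writes a marker, reads must step over those markers, and a separate cleanup pass removes them later.

```python
class ToyPartition:
    """Hypothetical append-only store: a delete writes a marker instead of removing data."""

    def __init__(self):
        # Each cell is (key, value, is_delete_marker); newest cells are last.
        self.cells = []

    def write(self, key, value):
        self.cells.append((key, value, False))

    def delete(self, key):
        # The old cell stays in place; only a delete marker is appended.
        self.cells.append((key, None, True))

    def read_live(self):
        """Return the live rows and the number of cells the read had to examine."""
        latest = {}
        for key, value, is_delete_marker in self.cells:
            latest[key] = (value, is_delete_marker)
        live = {k: v for k, (v, dead) in latest.items() if not dead}
        return live, len(self.cells)

    def compact(self):
        """Stand-in for the background cleanup: keep only the live cells."""
        live, _ = self.read_live()
        self.cells = [(k, v, False) for k, v in live.items()]


partition = ToyPartition()
for i in range(1000):
    partition.write(f"signal-{i}", i)
for i in range(990):
    partition.delete(f"signal-{i}")

# Before cleanup, the read walks every write and every delete marker.
live_before, scanned_before = partition.read_live()

# After cleanup, the same read touches only the surviving rows.
partition.compact()
live_after, scanned_after = partition.read_live()

print(len(live_before), scanned_before, scanned_after)
```

In this toy model the same ten live rows are returned both times, but the first read examines far more cells than the second. That is the general shape of the slowdown described above for delete-heavy workloads.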
Additionally, ScyllaDB's data modeling needs improvement. If a poorly written query is executed, it can severely impact server performance, causing the server to lock up. ScyllaDB needs to enhance its robustness to handle such scenarios better.
I have been using ScyllaDB for two years. We use a managed service.
Since we use a managed service, they can catch issues much more quickly than we can. They also deploy new patches and updates regularly. We don't need extensive bug testing because they handle all that. However, when our server breaks, it is usually because of something we did. Nonetheless, the database releases are very stable.
We have thousands of users on our platform.
Support is very responsive. They handle any issue, regardless of severity. We pay for the managed service, which is not cheap, but the reliability is excellent for those who can afford it.
ScyllaDB is worth the investment if the returns you get from the product benefit your company. However, despite the product's quality, our company is struggling because we are not seeing the expected returns from our customers.
When it comes to performance, ScyllaDB requires you to model your queries first. You need to know what kind of queries you will be running. If you get that part right, data modeling in ScyllaDB will become much more efficient and work well for you. However, running ad hoc queries or queries that were not planned for can lead to increased latency.
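The difference between a planned query and an ad hoc one can be sketched with a toy Python model. This is not ScyllaDB's API; `ToyTable`, `by_device`, and `by_event_type` are hypothetical names, and the assumption is a table partitioned by a single key (`device_id`) that was chosen because the planned query is "all events for one device".

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical table whose rows are grouped by a partition key (device_id)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, device_id, event_type, payload):
        row = {"device_id": device_id, "event_type": event_type, "payload": payload}
        self.partitions[device_id].append(row)

    def by_device(self, device_id):
        """Planned query: go straight to one partition."""
        rows = self.partitions.get(device_id, [])
        return rows, len(rows)

    def by_event_type(self, event_type):
        """Unplanned query: no partition key, so every row must be examined."""
        examined = 0
        matches = []
        for rows in self.partitions.values():
            for row in rows:
                examined += 1
                if row["event_type"] == event_type:
                    matches.append(row)
        return matches, examined


table = ToyTable()
for d in range(100):
    for e in range(10):
        event_type = "login" if e == 0 else "heartbeat"
        table.insert(f"device-{d}", event_type, payload=e)

planned_rows, planned_examined = table.by_device("device-7")
adhoc_rows, adhoc_examined = table.by_event_type("login")

print(planned_examined, adhoc_examined)
```

Here the planned query examines only the rows of one partition, while the ad hoc query walks the whole table to find its matches. In CQL, a filter on a non-key column like this typically needs a secondary index, a materialized view, or `ALLOW FILTERING`, which is why knowing the queries up front matters.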
Anybody coming from PostgreSQL, Oracle, or MySQL will experience a learning curve. Typically, with Oracle or PostgreSQL, you design your data model first and then your queries. However, with ScyllaDB, you need to know your queries first and then create your data model accordingly. The learning curve is not too steep; you can learn it within a month or even less. Once you have set it up, switching to another database is hard. ScyllaDB has significant advantages in handling high-volume data ingestion and providing breakneck query speeds. We were struggling with the high volume of data on Postgres.
We moved from Postgres to ScyllaDB. We had to rewrite our queries and data models, which took significant effort. For us, this migration took almost six months. However, for someone starting fresh with ScyllaDB, this extensive effort might not be necessary.
If the use case involves heavy data ingestion and requires very low latency, I would definitely recommend ScyllaDB, provided there is a budget for hosting or managed services. If the requirements fit, it's a great choice. However, you need a developer team that knows how to use it, as well as people for maintenance and database administration. ScyllaDB is worth it, but it does not run by itself. You need people to manage it.
If you have a high volume of data, high ingest rates, and low delete requirements, ScyllaDB is a great choice. It offers features like auto partitioning and many other benefits. However, if your data volume is not very high and latency is not a significant concern, you should evaluate other options. It's important to understand your specific needs and what ScyllaDB has to offer before making a decision.
Overall, I rate the solution a seven out of ten.