We are using StreamSets to migrate our on-premise data to the cloud.
IBM StreamSets
IBM Software
External reviews
External reviews are not included in the AWS star rating for the product.
Capable streaming data processing tool
a. Built-in connectors (on-premise version), which make it usable with almost every source and target system.
b. The GUI is user-friendly, and it has certainly helped my platform team create streaming data pipelines faster (previously we were using PySpark).
c. Along with the tool, the StreamSets support team is also excellent.
d. The availability of StreamSets Academy, through which we can get our resources trained easily.
The inability to support "exactly once" delivery of data creates limitations in a few use cases. Although we have managed this through a workaround, having this ability in StreamSets would certainly help.
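The review does not say which workaround was used. One common general approach, when a pipeline can only guarantee at-least-once delivery, is to make the target idempotent so that redelivered records have no effect. The following is a minimal sketch of that idea in plain Python, not StreamSets code; the `IdempotentSink` class and the record ID field are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Tuple

# (unique record ID, payload); having a stable unique ID is an assumption
Record = Tuple[str, dict]


class IdempotentSink:
    """Hypothetical target that ignores records whose ID it has already stored."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def write(self, batch: Iterable[Record]) -> int:
        """Write a batch and return how many records were new."""
        written = 0
        for record_id, payload in batch:
            if record_id in self.rows:
                continue  # duplicate from a redelivered batch: skip it
            self.rows[record_id] = payload
            written += 1
        return written


sink = IdempotentSink()
batch = [("evt-1", {"amount": 10}), ("evt-2", {"amount": 25})]
first = sink.write(batch)   # initial delivery
second = sink.write(batch)  # same batch redelivered after a simulated retry
```

With this design the upstream pipeline may deliver a batch more than once, yet the target ends up holding each record only once.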
2. It has helped us create event-based, real-time pipelines, which are being used to generate marketing prompts to the customers.
3. The development time (as compared to PySpark) has been reduced, as it is a low-code GUI tool.
4. It has also helped in reducing our dependency on other ELT tools e.g. Informatica DEI.
StreamSets makes the day-to-day easier
A great tool to work with Streaming Data
2. Its ability to handle and transform streaming data easily and effectively.
3. Topologies are quite good and provide visibility into how systems are connected and how data flows across the enterprise.
4. Orchestrating and scheduling jobs is quite easy.
2. Latency should be reduced as working with large datasets takes a bit of time.
It has also helped to reduce dependency on costly Informatica and Ab Initio tools.
Graphical interface for easier use
Ease of configuring and managing pipelines centrally
What is our primary use case?
What is most valuable?
I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.
What needs improvement?
StreamSets should provide a mechanism to perform data quality assessment while data is being moved from the source to the target, that is, the ability to validate the data against various data rules. Then, based on failures of the data quality assessment, it should be able to send alerts or information to help people understand the data validation issues.
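The kind of in-flight check described above can be sketched in plain Python. This is only an illustration of the requested behavior, not a StreamSets feature or API; the rule names, the `id` and `amount` fields, and the alert message format are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]

# Hypothetical data rules; real rules would come from the data owners.
RULES: Dict[str, Rule] = {
    "id_present": lambda r: bool(r.get("id")),
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def validate(records: List[dict], rules: Dict[str, Rule] = RULES) -> Tuple[List[dict], List[str]]:
    """Split records into those that pass every rule and alert messages for the rest."""
    passed: List[dict] = []
    alerts: List[str] = []
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        if failed:
            alerts.append(f"record {record.get('id')!r} failed: {', '.join(failed)}")
        else:
            passed.append(record)
    return passed, alerts


records = [
    {"id": "a1", "amount": 5},
    {"id": "", "amount": 3},
    {"id": "a3", "amount": -1},
]
passed, alerts = validate(records)
```

Here only the passing records would continue to the target, while `alerts` holds one message per failing record that could be forwarded to whoever owns the data.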
For how long have I used the solution?
I have been using StreamSets for a year and a half.
What do I think about the stability of the solution?
It's reasonably stable.
What do I think about the scalability of the solution?
It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.
How are customer service and support?
Customer service and support are good.
How would you rate customer service and support?
Positive
How was the initial setup?
It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance. So we had a maintenance contract.
In the financial industry, we have very strict regulations around deploying something in the cloud. So, it requires a lot of permission and other processes.
Just one person is enough for the maintenance.
What's my experience with pricing, setup cost, and licensing?
The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.
What other advice do I have?
It's a very good tool. Overall, I would rate the solution an eight out of ten.
Provides a good bifurcation rate and accuracy, and saves time and money
What is our primary use case?
We were receiving data from hospitals and other healthcare service providers in the country. We were predominantly operating in the US. When we received that data, we had to classify it into different repositories or different datasets. This data was sent to different vendors, and for that, the data needed to get processed in different ways. We needed to bifurcate data at many steps with different kinds of filters. For that, we used StreamSets.
How has it helped my organization?
We could bifurcate the datasets that we received from different hospitals. We could bifurcate it on the basis of the medical requirements of the hospitals, and sometimes, on the basis of the schedule or purpose. We were obtaining data that we could then supply to some consulting firms or other sources.
StreamSets saved us time. The accuracy was pretty good, and it was definitely better than what we were using previously. Earlier, we had hired two people who were doing the job manually, and we were also using some other platform. We had to pay for them. Overall, we have saved a lot of time, and the accuracy has improved as well. We didn't calculate the time savings, but I believe we saved about three days in a week, so there were about 30% to 40% time savings.
StreamSets reduced the workload. There was a 10% to 15% reduction in the workload.
StreamSets helped us to scale our data operations. The limit at which we purchased this solution was incredible. We were never able to reach the limit that we purchased, but it helped us to increase or scale our operation. Especially in months when we received a higher number of entries, we were able to perform our work on time.
What is most valuable?
The ability to have a good bifurcation rate and fewer mistakes is valuable. In the scenario we had, when we had to bifurcate the data, we did not completely cut the data. We made a different route for one set of data, which went into a different operating system. There was also a complete set of data along with the original data that got cut, which once again went through the filtration process, and in this way, it kept on happening. Different solutions that were in place were not providing this feasibility. With the other solutions that we were using earlier, we had to reuse the data again and again from the start. It was a time-taking process.
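One reading of the non-destructive bifurcation described above is that each filter copies its matching records to a separate route while the complete dataset stays available for the next filter. The sketch below illustrates that reading in plain Python; it is not StreamSets code, and the `bifurcate` function, the route names, and the `department` and `scheduled` fields are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = dict
Filter = Callable[[Record], bool]


def bifurcate(records: List[Record], stages: List[Tuple[str, Filter]]) -> Dict[str, List[Record]]:
    """Apply each named filter to the full dataset and collect one route per filter."""
    routes: Dict[str, List[Record]] = {}
    for name, predicate in stages:
        # Matching records are copied to this route; `records` itself is
        # never cut down, so the next filter still sees the complete set.
        routes[name] = [r for r in records if predicate(r)]
    return routes


records = [
    {"id": 1, "department": "cardiology", "scheduled": True},
    {"id": 2, "department": "oncology", "scheduled": True},
    {"id": 3, "department": "cardiology", "scheduled": False},
]
stages = [
    ("cardiology", lambda r: r["department"] == "cardiology"),
    ("scheduled", lambda r: r["scheduled"]),
]
routes = bifurcate(records, stages)
```

Because no stage removes anything from the input, adding another filter later does not require reprocessing the data from the start.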
Their support system was pretty good. When we were setting up the bifurcation protocols that we wanted to set up, we had a few support calls with them, and those were really helpful.
What needs improvement?
The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer.
Its initial setup could also be a bit easier.
For how long have I used the solution?
I used this solution for about a year.
What do I think about the stability of the solution?
It's a stable product. We used it for about a year, and we hardly had to shut it down.
What do I think about the scalability of the solution?
We are a medium enterprise. We only have three departments in our company, and only one of the departments is using it. Salespeople don't use it. The development people don't use it. We are the ones using it, and our job is to process the information, so only one department is using the solution. We have about 18 people in the department.
Up to medium enterprises, it's a good choice. You can scale from one million to ten million data files. I don't believe they offer the service for a hundred million or one billion datasets. It isn't too scalable for large enterprises, but for small and medium enterprises, it's good.
How are customer service and support?
I'd rate them an eight out of ten. The only reason for not giving them a ten out of ten is that if you're doing very important work and you need to get the solution the same day, it's a bit tough to have the team support you in a very short period of time. They usually give you appointments about a day or two days later. Other than that, everything is good.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We were using another solution previously. The major reason for switching to StreamSets was that we needed to scale our operations. Our prior solution could have been scaled, but the cost of scaling was a bit higher. We would have had to hire one more person to be able to scale, but we did not want to hire more people, so we decided to use a completely automated solution for this part so that it could be handled by only one of our team members. That was the primary requirement. The cost-benefit analysis was done by one of our peers. His proposal was pretty good, and everyone agreed to it.
How was the initial setup?
Its initial setup is a bit tough. You need to have the technical expertise to do that. The support team is good. They help you out, but if they could make it a bit easier, it would be better.
I believe it operates only from the cloud. We also received the data from our associations on the cloud. We processed it on the cloud, and everything happened on the cloud.
The initial setup was complex because we were not able to directly link the data we were receiving with the StreamSets solution. Linking it required us to fill in or enter some information in StreamSets, but we were not able to figure out what to enter. For that part, we needed their help.
We spent about a week. For the first three days, our team members were trying their best to do it, but then we had to schedule a meeting with them. In terms of the number of people, only one person was working with our team, and there were three people working with the product. I was also involved in the product as a product manager, but I was not directly operating that system.
It didn't require any maintenance as such. Any maintenance activities were related to our side of things. There were mistakes on our end. When we were entering different data, we had to do different configurations in the system.
What was our ROI?
We did the cost-benefit analysis before buying the solution, and it performed even better than that. We were able to replace two of our staff members who were doing this work. The cost that we paid for this solution was much lower than their salaries, so on the cost-benefit side of things, it was a good deal. We saved the wages of about two people doing manual work, which is about $6,000 a month, and we also saved 15% of a week's time. These two were the biggest returns on the investment. The accuracy was also a bit higher.
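As a rough worked example of the wage figure above, the annual saving can be computed as follows. The review does not state what the solution cost, so the license cost is left as a parameter; `annual_savings` is a hypothetical helper name, and only the $6,000 monthly figure comes from the review.

```python
def annual_savings(monthly_wage_savings: float, monthly_license_cost: float) -> float:
    """Net yearly saving: wages no longer paid minus what the tool costs."""
    return (monthly_wage_savings - monthly_license_cost) * 12


MONTHLY_WAGE_SAVINGS = 6_000  # figure stated in the review, in USD

# Gross saving before subtracting the (unstated) license cost.
gross = annual_savings(MONTHLY_WAGE_SAVINGS, monthly_license_cost=0)
```

The gross figure is $72,000 a year; the net return depends on the actual license cost, which would be passed as `monthly_license_cost`.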
What's my experience with pricing, setup cost, and licensing?
Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Series B or Series C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled. Simultaneously, you need a solution that you don't want to use on a very long-term basis. This solution could not be applied if we were operating with all the hospital chains in the US. We were operating just with one hospital. That's why it worked pretty well, so for medium enterprises, I believe it's very good.
What other advice do I have?
To those evaluating StreamSets, I'd advise doing a cost-benefit analysis because the way of using StreamSets differs from person to person. Someone else might have a very different use case, and they may not run into profit using the solution. For us, it was a good solution because we were hiring people for this work. People were doing the job manually. We saved both time and money, so doing a cost-benefit analysis would be the best thing.
If you are looking to expand your domain or range of operations, StreamSets is very helpful. If you are just looking for a better data analytics tool that can do bifurcation on data, I believe there are other tools or services available in the market that do not focus on the expansion of operations. They focus on doing better and more complex bifurcations.
StreamSets enables you to build data pipelines without knowing how to code. After generating a few responses, you have to enter some basic syntax or code, but generally, one can do a lot of no-code stuff, which was not an important aspect for us because we were operating in the IT space, and our entire team was capable of entering all the syntaxes that were required. It was not an issue for us at any point in time. In fact, in the operations that we were performing, we only used code. When we were testing out our initial datasets, we used some no-code features that were there, but at the later stage, we used only syntaxes.
We did not connect to the messaging systems, but we connected some enterprise databases. We were operating with a set of hospitals in the US, and we had to connect with them only the first time. Afterward, it was the data that was passing through the pipeline. Initially, for a completely new user, it's a bit tricky. Some technical expertise is required. It's a bit tough, but because the support team is there, one would be able to do it.
Overall, I would rate StreamSets an eight out of ten.
StreamSets Data Collector & Transformer Review
1. Data Collector: Collected data from on-premise sources to the cloud.
2. Applied transformations to prepare data for analytics.