Overview
Build scalable, real-time data pipelines with IBM StreamSets on AWS. IBM StreamSets enables seamless data movement across hybrid and multi-cloud environments, integrating with key AWS services to ensure continuous, high-quality data delivery. Automate data integration with scheduled jobs, adapt to evolving sources, and power your enterprise with intelligent, resilient data pipelines built for speed and scale.
Secure your data with enterprise-grade protection through control/data plane separation, role-based access, and governance compliance.
Maintain pipeline integrity and continuity with resilient pipelines that adapt to evolving environments, automatically handling data drift as real-time ingestion scales.
Centralize operations in StreamSets Control Hub to build, run, monitor and manage all your pipelines while minimizing operational overhead and promoting extended team collaboration.
Highlights
- IBM StreamSets enables seamless data ingestion from diverse sources including Kafka, HDFS, databases, files, applications and more into Azure Storage, Azure Event Hub, Azure Synapse and more. It helps enterprises migrate and modernize data platforms to power ML use cases and build cloud-ready applications.
- Reduce breakages by 80% with smart streaming data pipelines that are resilient to change. Eliminate blind spots and control gaps with a single pane of glass for managing and monitoring all pipelines across hybrid and cloud architectures.
- StreamSets powers millions of pipelines from the most trusted global names in healthcare, financial services, and technology. The platform design offers new levels of enterprise visibility across the full lifecycle of data integration.
Details
Pricing
Dimension | Description | Cost/12 months |
---|---|---|
IBM StreamSets, 1 VPC | IBM StreamSets, 1 VPC | $12,600.00 |
The following dimensions are not included in the contract terms and will be charged based on your usage.
Dimension | Cost/unit |
---|---|
IBM StreamSets overage | $1,160.00 |
Vendor refund policy
Please contact your client account team for refund information.
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Resources
Support
Vendor support
Connect with experts and peers to elevate technical expertise, solve problems, and share insights about Data Integration: https://community.ibm.com/community/user/dataops/communities/community-home?CommunityKey=3bfc9f2f-4a5e-470a-9295-4b7cc90c9518
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Enables effective batch loading with visual interface and enterprise support
What is our primary use case?
We are using StreamSets for batch loading.
What is most valuable?
StreamSets is GUI-based and takes care of load balancing. It allows a hybrid installation approach, rather than being completely cloud-based or on-premises. Additionally, StreamSets provides good enterprise support with a quick turnaround.
What needs improvement?
One issue I observed with StreamSets is that the memory runs out quickly when processing large volumes of data. Because of this memory issue, we have to upgrade our EC2 boxes in the AWS infrastructure. I had to switch to a new EC2 box, even though the processor was not fully utilized. It would be beneficial if StreamSets addressed any potential memory leak issues to prevent unnecessary upgrades. Additionally, it would be a great enhancement if StreamSets could produce a lineage graph to visualize how the data has passed through the system.
For how long have I used the solution?
I started using StreamSets in 2022, so it's been almost four years now.
What do I think about the stability of the solution?
From one to ten, I would rate the stability of the product at eight point five.
What do I think about the scalability of the solution?
For scalability, I would also rate it at eight point five.
How are customer service and support?
IBM technical support sometimes transfers tickets between different teams due to shift changes, which can be frustrating. The transition can make resolution slow, as I have to explain the issue multiple times. Overall, I would rate the technical support as eight out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of StreamSets isn't simple, but it's not too complex either. It’s a standard setup and is fine.
Which other solutions did I evaluate?
StreamSets is the leader in the market. There are many products, and the choice depends on needed features and use cases, but I view StreamSets as the leader due to its capabilities.
What other advice do I have?
If asked, I definitely recommend StreamSets to other users. My overall rating for the solution is nine.
Useful for data transformation and helps with column encryption
What is our primary use case?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data is loaded into Amazon Redshift or other data warehousing solutions.
What is most valuable?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up.
What needs improvement?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drift wasn't addressed, which made things worse. Licensing was another issue we encountered.
For how long have I used the solution?
I have been working with the product for five years.
What do I think about the scalability of the solution?
The tool's flexibility and performance are good. It allows for task dependency management so others won't be affected if one task fails. It can handle large volumes of data and supports features like change data capture for tracking changes.
Around six months ago, many people in my company were using StreamSets. In the US team, about 42 people across different projects were using it. Similarly, in 2021, there were around 43 users. About 16-18 people in Mumbai used it in my previous company.
How are customer service and support?
The tool's support is good.
How was the initial setup?
Installing StreamSets can take time because it has two engines: Data Collector and Transformer. Data Collector is easier to install, but Transformer is more complicated and requires more steps, like setting up tasks and configurations.
It is best to make sure the environment is ready first, including that it works well with other servers. The process can be both easy and difficult, but if you follow the documentation, it should be manageable.
What was our ROI?
Whether the tool is worth the money depends on the situation. If you don't want to spend a lot on competing products like Databricks or Glue, then StreamSets might be a better option. It's particularly valuable if you prefer not to invest heavily in training your team on new technologies. If your ETL developers or data engineers are comfortable with StreamSets, it can be worth the money.
What's my experience with pricing, setup cost, and licensing?
The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good.
What other advice do I have?
We use various tools and alerting systems to notify us of pipeline errors or failures. StreamSets supports data governance and compliance by allowing us to encrypt incoming data based on specified rules. We can easily encrypt columns by providing the column name and hash key.
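The column protection described above, driven by a column name and a hash key, can be pictured with a minimal Python sketch. This is not StreamSets' API: the helper `hash_columns`, the sample record, and the choice of HMAC-SHA-256 as the keyed hash are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Iterable


def hash_columns(record: dict, columns: Iterable[str], key: bytes) -> dict:
    """Return a copy of `record` with the named columns replaced by keyed hashes.

    Hypothetical helper for illustration; HMAC-SHA-256 is an assumed choice.
    """
    protected = dict(record)
    for name in columns:
        if name in protected and protected[name] is not None:
            value = str(protected[name]).encode("utf-8")
            protected[name] = hmac.new(key, value, hashlib.sha256).hexdigest()
    return protected


# Sample data and key are made up for the example.
row = {"patient_id": 1017, "ssn": "123-45-6789", "city": "Austin"}
masked = hash_columns(row, ["ssn"], key=b"example-hash-key")
```

A keyed hash like this is one-way: the same input and key always give the same digest, which keeps joins on the column possible, but the original value cannot be recovered from it.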
If you're considering using StreamSets for the first time, I would advise first understanding why you want to use it and how it will benefit you. If you're dealing with change tracking or handling large amounts of data, it could be cost-effective compared to services like Amazon. It's easy to schedule and manage tasks with the tool, and you can enhance your skills as an ETL developer. You can easily migrate traditional pipelines built on platforms like Informatica or Talend to StreamSets. I rate the overall solution an eight out of ten.
Ease of configuring and managing pipelines centrally
What is our primary use case?
We are using StreamSets to migrate our on-premise data to the cloud.
What is most valuable?
I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.
What needs improvement?
StreamSets should provide a mechanism for assessing data quality while data is being moved from source to target, that is, the ability to validate the data against various data rules. Then, when a data quality assessment fails, it should be able to send alerts or information that help people understand the data validation issues.
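As a rough picture of the kind of in-flight check being requested, here is a minimal Python sketch. It does not use any StreamSets feature: the rule names, the `validate` function, and the sample batch are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data rules: each maps a description to a check on one record.
RULES: Dict[str, Callable[[dict], bool]] = {
    "amount is non-negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
    "currency is a 3-letter code": lambda r: isinstance(r.get("currency"), str)
    and len(r["currency"]) == 3,
}


def validate(records: List[dict]) -> Tuple[List[dict], List[str]]:
    """Split records into those passing every rule and a list of alert messages."""
    passed, alerts = [], []
    for index, record in enumerate(records):
        failures = [name for name, check in RULES.items() if not check(record)]
        if failures:
            alerts.append(f"record {index}: failed {', '.join(failures)}")
        else:
            passed.append(record)
    return passed, alerts


# Made-up batch: one clean record and two that break a rule each.
batch = [
    {"amount": 120.5, "currency": "USD"},
    {"amount": -3, "currency": "USD"},
    {"amount": 10, "currency": "dollars"},
]
clean, alerts = validate(batch)
```

In this sketch the alert messages are only collected in a list; a real setup would forward them to whatever notification channel the team already uses.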
For how long have I used the solution?
I have been using StreamSets for a year and a half.
What do I think about the stability of the solution?
It's reasonably stable.
What do I think about the scalability of the solution?
It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.
How are customer service and support?
Customer service and support are good.
How would you rate customer service and support?
Positive
How was the initial setup?
It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance. So we had a maintenance contract.
In the financial industry, we have very strict regulations around deploying something in the cloud. So, it requires a lot of permission and other processes.
Just one person is enough for the maintenance.
What's my experience with pricing, setup cost, and licensing?
The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.
What other advice do I have?
It's a very good tool. Overall, I would rate the solution an eight out of ten.
Provides a good bifurcation rate and accuracy, and saves time and money
What is our primary use case?
We were receiving data from hospitals or any kind of healthcare service providers in the country. We were dominantly operating in the US. When we received that data, we had to classify it into different repositories or different datasets. This data was sent to different vendors, and for that, the data needed to get processed in different ways. We needed to bifurcate data at many steps with different kinds of filters. For that, we used StreamSets.
How has it helped my organization?
We could bifurcate the datasets that we received from different hospitals. We could bifurcate it on the basis of the medical requirements of the hospitals, and sometimes, on the basis of the schedule or purpose. We were obtaining data that we could then supply to some consulting firms or other sources.
StreamSets saved us time. The accuracy was pretty good, and it was definitely better than what we were using previously. Earlier, we had hired two people who were doing the job manually, and we were also using some other platform. We had to pay for them. Overall, we have saved a lot of time, and the accuracy has improved as well. We didn't calculate the time savings, but I believe we saved about three days in a week, so there were about 30% to 40% time savings.
StreamSets reduced the workload. There was a 10% to 15% reduction in the workload.
StreamSets helped us to scale our data operations. The limit at which we purchased this solution was incredible. We were never able to reach the limit that we purchased, but it helped us to increase or scale our operation. Especially in months when we received a higher number of entries, we were able to perform our work on time.
What is most valuable?
The ability to have a good bifurcation rate and fewer mistakes is valuable. In our scenario, bifurcating the data did not mean cutting it completely. We made a separate route for one set of data, which went into a different operating system. A complete set of data was kept alongside the portion that was split off, and it went through the filtration process again, and so on. The other solutions that were in place did not offer this capability. With the solutions that we were using earlier, we had to reprocess the data again and again from the start, which was time-consuming.
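One way to picture this kind of non-destructive split is the short Python sketch below. It is an illustration only, not a StreamSets pipeline definition: the `bifurcate` function, the field names, and the filters are hypothetical.

```python
from typing import Callable, Dict, List


def bifurcate(
    records: List[dict], filters: Dict[str, Callable[[dict], bool]]
) -> Dict[str, List[dict]]:
    """Copy matching records onto a named route for each filter.

    Every filter sees the full input, so nothing is removed from the original
    stream and one record may land on several routes.
    """
    routes: Dict[str, List[dict]] = {name: [] for name in filters}
    for record in records:
        for name, matches in filters.items():
            if matches(record):
                routes[name].append(dict(record))  # copy; the original stays intact
    return routes


# Made-up records and filters for the example.
admissions = [
    {"hospital": "A", "department": "cardiology", "scheduled": True},
    {"hospital": "A", "department": "oncology", "scheduled": False},
    {"hospital": "B", "department": "cardiology", "scheduled": False},
]
routes = bifurcate(
    admissions,
    {
        "cardiology": lambda r: r["department"] == "cardiology",
        "scheduled": lambda r: r["scheduled"],
    },
)
```

Because each route receives copies, the complete dataset is still available for the next round of filtering without reloading it from the start.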
Their support system was pretty good. When we were setting up the bifurcation protocols that we wanted to set up, we had a few support calls with them, and those were really helpful.
What needs improvement?
The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer.
Its initial setup could also be a bit easier.
For how long have I used the solution?
I used this solution for about a year.
What do I think about the stability of the solution?
It's a stable product. We used it for about a year, and we hardly had to shut it down.
What do I think about the scalability of the solution?
We are a medium enterprise. We only have three departments in our company, and only one of the departments is using it. Salespeople don't use it. The development people don't use it. We are the ones using it, and our job is to process the information, so only one department is using the solution. We have about 18 people in the department.
Up to medium enterprises, it's a good choice. You can scale between one million and ten million data files. I don't believe they offer the service for a hundred million or one billion datasets. It isn't too scalable for large enterprises, but for small and medium enterprises, it's good.
How are customer service and support?
I'd rate them an eight out of ten. The only reason for not giving them a ten out of ten is that if you're doing very important work and you need to get the solution the same day, it's a bit tough to have the team support you in a very short period of time. They usually give you appointments about a day or two days later. Other than that, everything is good.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We were using another solution previously. The major reason for switching to StreamSets was that we needed to scale our operations. Our prior solution could have been scaled, but the cost of scaling was a bit higher. We would have had to hire one more person to be able to scale, but we did not want to hire more people, so we decided to use a completely automated solution for this part so that it could be handled by only one of our team members. That was the primary requirement. The cost-benefit analysis was done by one of our peers. His proposal was pretty good, and everyone agreed to it.
How was the initial setup?
Its initial setup is a bit tough. You need to have the technical expertise to do that. The support team is good. They help you around, but if they could make it a bit easier, it would be better.
I believe it operates only from the cloud. We also received the data from our associations on the cloud. We processed it on the cloud, and everything happened on the cloud.
The initial setup was complex because we were not able to directly link the data we were receiving with the StreamSets solution. Linking it required us to fill in or enter some information in StreamSets, but we were not able to figure out what to enter. For that part, we needed their help.
We spent about a week. For the first three days, our team members were trying their best to do it, but then we had to schedule a meeting with them. In terms of the number of people, only one person was working with our team, and there were three people working with the product. I was also involved in the product as a product manager, but I was not directly operating that system.
It didn't require any maintenance as such. Any maintenance activities were related to our side of things. There were mistakes on our end. When we were entering different data, we had to do different configurations in the system.
What was our ROI?
We did the cost-benefit analysis before buying the solution, and it performed even better than that. We were able to replace two of our staff members who were doing this work. The cost that we paid for this solution was much lower than their salaries, so on the cost-benefit side of things, it was a good deal. We saved the wages of about two people, which is about $6,000 a month, and we also saved 15% of a week's time. These two were the biggest returns on the investment. The accuracy was also a bit higher.
What's my experience with pricing, setup cost, and licensing?
Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Series B or Series C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled. Simultaneously, you need a solution that you don't want to use on a very long-term basis. This solution could not be applied if we were operating with all the hospital chains in the US. We were operating just with one hospital. That's why it worked pretty well, so for medium enterprises, I believe it's very good.
What other advice do I have?
To those evaluating StreamSets, I'd advise doing a cost-benefit analysis because the way of using StreamSets differs from person to person. Someone else might have a very different use case, and they may not run into profit using the solution. For us, it was a good solution because we were hiring people for this work. People were doing the job manually. We saved both time and money, so doing a cost-benefit analysis would be the best thing.
If you are looking to expand your domain or range of operations, StreamSets is very helpful. If you are just looking for a better data analytics tool that can do bifurcation on data, I believe there are other tools or services available in the market that do not focus on the expansion of operations. They focus on doing better and more complex bifurcations.
StreamSets enables you to build data pipelines without knowing how to code. After generating a few responses, you have to enter some basic syntax or code, but generally, one can do a lot of no-code stuff, which was not an important aspect for us because we were operating in the IT space, and our entire team was capable of entering all the syntaxes that were required. It was not an issue for us at any point in time. In fact, in the operations that we were performing, we only used code. When we were testing out our initial datasets, we used some no-code features that were there, but at the later stage, we used only syntaxes.
We did not connect to the messaging systems, but we connected some enterprise databases. We were operating with a set of hospitals in the US, and we had to connect with them only the first time. Afterward, it was the data that was passing through the pipeline. Initially, for a completely new user, it's a bit tricky. Some technical expertise is required. It's a bit tough, but because the support team is there, one would be able to do it.
Overall, I would rate StreamSets an eight out of ten.
It's lightweight and well-integrated, and it saves a lot of money and time
What is our primary use case?
StreamSets is being used in the IT department to make sure that we have a stable solution and that our configuration is secure and running smoothly. We are using it for our data analytics tool as well as for real-time prediction for various real-life business use cases. It's helping us in generating new business ideas. It's a tool that allows us to share data between platforms, which also removes the dependency on other ETL tools, such as SSIS.
How has it helped my organization?
StreamSets is straightforward to use for implementing batch, streaming, or ETL pipelines once you know how to use it. The pipeline can be integrated with Azure Key Vault, which eliminates the need to share credentials with developers. The same goes for parameters. It's very easy and straightforward.
It's easy for me to connect StreamSets to enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka. I've had a good experience with it, and I've been working with it for a long time. It's very easy for me to connect and integrate. However, if you are a beginner, it might not go that well at first.
It's easy to move data into analytics platforms using StreamSets.
StreamSets enables us to build data pipelines without knowing how to code. We don't require the best coding skills. We can use the code-free environment to quickly create pipelines. It's very helpful for that.
StreamSets is a helpful tool for pipelines. It's very easy to register Data Collectors with Control Hub using provisioning agents.
StreamSets has helped to break down data silos within our organization. It hasn't negatively affected our business. It has fortunately enhanced our development time. We are able to develop secure, stable platforms faster and even remotely.
StreamSets has saved us a lot of time. It saved us the time that we were spending developing applications manually. One budget can be used by the team to come up with a stable solution. Our time savings are 30%. Out of five hours, it has saved us around two hours.
StreamSets has reduced our workload by 35%. It has also saved us money. When you subscribe to StreamSets, it seems very expensive, but when you get to know how their integration and documentation are and how things move, it's definitely efficient. It saves a lot of money. Before implementing it, we spent around 10,000 USD to hire experts. It has saved us 10,000 USD that we would have spent on hiring experts.
What is most valuable?
What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes.
It has a very easy and user-friendly interface. It only takes a few days for new developers to start and deploy their first pipeline. It provides an easy and powerful integrated environment with different platforms such as Kafka, Salesforce, Oracle Database, REST API, etc. The user interface is a powerful feature of StreamSets.
What needs improvement?
There are so many things that need to be improved. For the StreamSets cloud user interface, there aren't enough use cases and examples for the main problems. In addition, the hybrid data sets cannot be joined in a data connector, which is a significant limitation.
There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline. It isn't helpful when you need to apply the same logic for multiple sources. It becomes difficult because you need to create more pipelines and then add coordination between them.
Initially, it's hard to find out or master the logic behind it. It can be hard if you aren't technical enough. There is scope for improvement because it's not straightforward. You need to go through the documentation and make sure that you understand every step. For me, it was a challenging model.
For how long have I used the solution?
I've been using StreamSets for two and a half years.
What do I think about the stability of the solution?
It's stable enough.
What do I think about the scalability of the solution?
It's good enough. We don't use it at multiple locations. We use it at one location, and it's being used by the IT and development departments. We have five users who are using it.
How are customer service and support?
Its deployment was hard. I had to contact them so that they could help me set things up. They are good people. They make sure that you are getting the best experience and that you are getting things in the right way. Their support is good and technical. I'd rate them a 10 out of 10 because of the fact that they were able to troubleshoot the issue.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We did not use a different solution.
How was the initial setup?
In the beginning, it's very hard, but after reading the documentation, you can set up things easily. The documentation is very good and helpful.
For me, deployment was initially very hard because it required a lot of technical skills that I didn't have at that time. I had to contact the team, and they helped me with how to deploy it. The following day, I was able to set up everything. So, deployment is initially very hard, but after you become familiar with StreamSets, you can deploy it more easily.
What about the implementation team?
I deployed it myself. It doesn't require any maintenance because they take care of that.
What was our ROI?
There has been a great return on investment. We can use a single package of one thousand USD to have different applications with different people and different skills. It has saved us the money that we would have spent individually to develop those applications. Using StreamSets has saved us expenses. We have seen 40% ROI.
What's my experience with pricing, setup cost, and licensing?
It's not so favorable for small companies.
Which other solutions did I evaluate?
We didn't evaluate other options. We found StreamSets to be aligned with our expectations.
What other advice do I have?
To those evaluating this solution, I'd advise making sure that you have someone who is an expert in StreamSets so that you can deploy it in less time. Otherwise, it won't be a great option.
I'd recommend StreamSets if you want to design a very good pipeline, but you also have to think about the budget. Its budget is not so favorable for small companies, but it's great software for businesses that want to create good data pipelines and have secure platforms. It will help your business in making sure that you are providing a stable solution to your clients.
Overall, I'd rate StreamSets a 10 out of 10.