IBM DataStage as a Service
IBM Software
Reviews from AWS customers
0 AWS reviews
External reviews
71 reviews
External reviews are not included in the AWS star rating for the product.
Unmatched Performance and Reliability for Enterprise Data Workloads
What do you like best about the product?
The most impressive aspect of DataStage is its high-performance parallel processing engine, which allows it to handle massive enterprise data volumes with ease. By utilizing "pipelining" and "partitioning," the system can process different stages of a job simultaneously across multiple CPU nodes. This means that instead of waiting for one task to finish before the next begins, data flows through the pipeline like an assembly line, ensuring that even petabyte-scale workloads are completed within tight processing windows.
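The pipelining and partitioning described above can be illustrated with a small Python sketch. This is not DataStage code and does not use any DataStage API. The function names (hash_partition, read_stage, clean_stage, run_job), the "name" cleanup rule, and the use of threads as stand-ins for processing nodes are all assumptions made for illustration. Rows are hash-partitioned by key across a configurable number of "nodes". Within each partition, generator-based stages hand rows downstream one at a time, so a later stage starts working before an earlier stage has consumed all of its input.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def hash_partition(rows: Iterable[Row], key: str, num_nodes: int) -> List[List[Row]]:
    """Split rows into num_nodes partitions; rows with equal keys share a partition."""
    partitions: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 gives the same bucket on every run, unlike Python's salted str hash.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        partitions[bucket].append(row)
    return partitions


def read_stage(rows: Iterable[Row]) -> Iterator[Row]:
    """Upstream stage: yields one row at a time (copied so the input is not mutated)."""
    for row in rows:
        yield dict(row)


def clean_stage(rows: Iterator[Row]) -> Iterator[Row]:
    """Downstream stage: processes each row as soon as it is yielded (pipelining)."""
    for row in rows:
        row["name"] = str(row["name"]).strip().upper()
        yield row


def run_partition(partition: List[Row]) -> List[Row]:
    """Chain the stages for a single partition and drain the pipeline."""
    return list(clean_stage(read_stage(partition)))


def run_job(rows: List[Row], key: str, num_nodes: int) -> List[Row]:
    """Partition the input, then run every partition's pipeline concurrently."""
    partitions = hash_partition(rows, key, num_nodes)
    # Threads keep the sketch simple; CPU-bound work in CPython would need processes.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(run_partition, partitions))
    return [row for part in results for row in part]
```

For example, run_job(rows, key="id", num_nodes=3) returns every input row with its name trimmed and upper-cased. The output order depends on which partition each key hashed to. Hash partitioning is only one choice. It keeps equal keys together, which matters for joins and aggregations. Other methods, such as round-robin, spread rows more evenly but do not co-locate keys.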
Furthermore, its visual design environment offers a sophisticated balance between simplicity and power. The drag-and-drop interface allows engineers to build complex ETL logic using pre-built "Stages" for joins, lookups, and transformations without needing to write manual code. However, it remains highly extensible for developers; if a specific requirement isn't met by a standard component, you can integrate custom Python scripts or SQL, making it flexible enough for both standard reporting and complex data science pipelines.
Finally, DataStage excels in enterprise-grade reliability and governance, which is why it remains a staple in highly regulated industries like finance and healthcare. It integrates seamlessly with metadata catalogs to provide end-to-end data lineage, allowing users to track exactly how data has changed from source to target. Combined with robust error-handling and "Reject Links" that capture bad data without crashing the entire job, it provides a level of stability and auditability that many lightweight or open-source tools struggle to match.
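The reject-link idea can be sketched in the same way. Rows that fail a transformation are routed to a separate output along with the reason, and the rest of the job keeps running. The parse_amount rule, the StageResult container, and the run_with_reject_link helper are hypothetical. They illustrate the pattern only and are not part of DataStage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


@dataclass
class StageResult:
    """Main output link plus a reject link carrying (row, reason) pairs."""
    output: List[Row] = field(default_factory=list)
    rejects: List[Tuple[Row, str]] = field(default_factory=list)


def parse_amount(row: Row) -> Row:
    """Hypothetical rule: 'amount' must be present, numeric, and non-negative."""
    # KeyError if the field is missing; ValueError or TypeError if it is not numeric.
    amount = float(row["amount"])
    if amount < 0:
        raise ValueError("negative amount")
    return {**row, "amount": amount}


def run_with_reject_link(rows: Iterable[Row], transform: Callable[[Row], Row]) -> StageResult:
    """Apply transform to each row; bad rows go to the reject link instead of failing the job."""
    result = StageResult()
    for row in rows:
        try:
            result.output.append(transform(row))
        except (KeyError, TypeError, ValueError) as exc:
            result.rejects.append((row, f"{type(exc).__name__}: {exc}"))
    return result
```

Suppose the helper receives one valid row and three bad ones. One row reaches the main output, and the other three land on the reject link, each tagged with the exception that caused it. That reject list is what downstream auditing or reprocessing would consume.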
What do you dislike about the product?
One of the most significant drawbacks of IBM DataStage is its prohibitive cost and complex licensing model, which often makes it inaccessible for small-to-medium businesses. Beyond the high initial purchase price, the "IBM Tax" includes ongoing maintenance and specialized infrastructure requirements that scale aggressively with data volume. Furthermore, because the tool is highly proprietary, organizations face heavy vendor lock-in; migrating logic out of DataStage to a modern, open-source-friendly stack like dbt or Airbyte is notoriously difficult and time-consuming.
From a technical standpoint, many engineers find the platform increasingly clunky and "legacy" compared to agile, cloud-native alternatives. While its parallel engine is powerful, it requires deep, specialized expertise to tune—settings like partition methods and buffer sizes are manual and unintuitive, leading to a steep learning curve for new hires. Additionally, while the newer "Next Gen" versions have improved, the ecosystem is still criticized for being batch-heavy, making it less agile for teams that require modern real-time streaming or "DataOps" automation.
What problems is the product solving and how is that benefiting you?
IBM DataStage primarily solves the challenge of data fragmentation and processing bottlenecks in massive enterprise environments. Large organizations often have data trapped in "silos" across legacy mainframes, modern cloud databases, and various third-party applications; DataStage provides a unified, high-performance bridge to extract and harmonize this information. Its parallel processing engine solves the "time problem" by breaking down petabyte-scale datasets into smaller chunks and processing them simultaneously, ensuring that critical business reports and data warehouses are updated within strict overnight windows rather than taking days to complete.
The primary benefit to you and your organization is data trust and operational efficiency. Because the platform includes built-in data quality and governance tools, it automatically cleanses and validates records as they move through the pipeline, reducing the risk of making business decisions based on "dirty" or inaccurate data. Furthermore, its "design once, run anywhere" architecture allows your team to build a data flow once and deploy it across on-premises servers or multiple cloud providers without rewriting code. This saves significant development time and future-proofs your infrastructure, allowing you to focus on gaining insights rather than troubleshooting manual data transfers.
Exceptional Performance and Connectivity with Intuitive Interface
What do you like best about the product?
Wide Connectivity, High Performance and Scalability, Intuitive Graphical Interface
What do you dislike about the product?
High Learning Curve, Infrastructure Dependency
What problems is the product solving and how is that benefiting you?
Complex data integration, Data transformation and cleaning
Data Integration and Quality with DataStage
What do you like best about the product?
Best data integration tool on the market with a wide range of connectors and advanced data integration and quality features.
What do you dislike about the product?
I quite like the platform as a whole, but I believe it can improve regarding data lineage (it should indeed improve now with the arrival of Manta to the IBM portfolio).
What problems is the product solving and how is that benefiting you?
Help our clients work with integrated, qualified, and reliable data.
IBM Datastage for ETL
What do you like best about the product?
IBM InfoSphere DataStage is a simple yet efficient tool for ETL processing.
It has a variety of stages to implement your designs and test them at runtime.
It has additional features compared to other ETL tools, which help with debugging and error handling.
What do you dislike about the product?
DataStage's UI lags a little behind other ETL tools.
Stages could be categorised by functionality.
What problems is the product solving and how is that benefiting you?
It solves data integration problems across a variety of platforms and delivers data to end users in appropriate formats, such as JSON, files, text files, databases, and big data stores.
Good product
What do you like best about the product?
Its speed. It is very fast and responsive. Support is good.
What do you dislike about the product?
A little hard to use and implement. Has a few bugs.
What problems is the product solving and how is that benefiting you?
Fast data integration and processing.
Analyzing vendor data
What do you like best about the product?
There are two reasons we use it: lower cost, and it's user-friendly.
What do you dislike about the product?
Customer support is excellent; however, there is some room for improvement in the number of features.
We did not face any problems during its implementation and integration.
Our frequency of use is not high because we are not relying solely on it yet, but we might in the future.
What problems is the product solving and how is that benefiting you?
I cannot disclose details because of company policy, but in brief, we use it to analyse data from multiple vendors.
Data Stage review
What do you like best about the product?
- Excellent performance in executing ETL processes on large amounts of data.
What do you dislike about the product?
- Lack of documentation and available knowledge for study and learning.
- Lack of support from the supplier (various problems with the product and also lack of support for functionalities like the quality stage).
- The interface is not at all intuitive and is difficult to use.
What problems is the product solving and how is that benefiting you?
Execution of ETL processes and data quality.
IBM InfoSphere DataStage
What do you like best about the product?
Ease of use, ease of implementation, compact product. Very good customer support team.
Great performance for large data volumes; allows parallelism.
What do you dislike about the product?
It does not support code versioning without Git integration.
What problems is the product solving and how is that benefiting you?
Data Transformation, Data governance, interconnection of non-homogeneous origins, quick creation of interfaces between applications.
Creation and implementation of data quality rules.
Using Datastage for ETL
What do you like best about the product?
We use InfoSphere DataStage for ETL in our organisation because it can easily handle large data volumes (terabytes) and lets us transform our data easily. It is easy to design and run our jobs in DataStage.
What do you dislike about the product?
As a beginner, I found DataStage hard to use. There are so many functionalities that it takes time to get the hang of it, but once you start practicing, it becomes easy.
What problems is the product solving and how is that benefiting you?
Our organisation handles very large data volumes, and extracting, transforming, and loading them requires a powerful tool. DataStage solves this problem by handling it perfectly, and we are able to build our ETL jobs easily.
Review on IBM Infosphere Datastage
What do you like best about the product?
DataStage helps us construct a source model that describes the rules for querying the source database. We have used several stages, such as Transformer, Lookup, and Join, while building dimension and fact tables. The stages are easy to use: we only need to drag and drop the ones required to build the tables.
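The lookup step used when building fact tables can be sketched as follows, assuming a small dimension table that fits in memory. The column names (customer_code, customer_sk), the helper names, and the choice to route unmatched fact rows to a separate list are hypothetical. A real Lookup stage has its own configurable behaviour for lookup failures.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def build_lookup(dimension: Iterable[Row], natural_key: str, surrogate_key: str) -> Dict[object, object]:
    """Index the (small) dimension table in memory: natural key -> surrogate key."""
    return {row[natural_key]: row[surrogate_key] for row in dimension}


def lookup_stage(
    facts: Iterable[Row], index: Dict[object, object], natural_key: str, surrogate_key: str
) -> Tuple[List[Row], List[Row]]:
    """Attach the surrogate key to each fact row; unmatched rows are kept separately."""
    matched: List[Row] = []
    unmatched: List[Row] = []
    for fact in facts:
        key = fact.get(natural_key)
        if key in index:
            matched.append({**fact, surrogate_key: index[key]})
        else:
            unmatched.append(fact)
    return matched, unmatched
```

Keeping unmatched rows, rather than silently dropping them, means late-arriving dimension members can be investigated or reloaded later.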
What do you dislike about the product?
What I don't like about IBM InfoSphere DataStage is that its pricing plan is costly. Also, metadata propagation in jobs is somewhat complex for some users, and there are issues with XML processing.
What problems is the product solving and how is that benefiting you?
IBM InfoSphere DataStage is used to develop jobs that move data from source systems to target systems in simple steps. It is not limited to data warehousing; we can also use InfoSphere for analysis and to view the overall architecture of our OLTP systems.