AWS Marketplace: IBM DataStage as a Service Reviews

Poojasree M.

Unmatched Performance and Reliability for Enterprise Data Workloads

December 21, 2025
Review provided by G2

What do you like best about the product?

The most impressive aspect of DataStage is its high-performance parallel processing engine, which allows it to handle massive enterprise data volumes with ease. By utilizing "pipelining" and "partitioning," the system can process different stages of a job simultaneously across multiple CPU nodes. This means that instead of waiting for one task to finish before the next begins, data flows through the pipeline like an assembly line, ensuring that even petabyte-scale workloads are completed within tight processing windows.
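As a rough illustration of the partitioning and pipelining idea described above, here is a minimal Python sketch. It is not DataStage code: the function names, the `id`/`qty`/`price` fields, and the modulo-based partitioning rule are all assumptions made for the example. Rows are split into partitions by key, and each partition flows through generator-based stages so a row can be transformed as soon as it has been extracted.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(rows, key, num_partitions):
    """Split rows into partitions using the integer key modulo the partition count."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


def extract(rows):
    """First stage: hand rows downstream one at a time."""
    for row in rows:
        yield row


def transform(rows):
    """Second stage: derive a total as each row arrives from the previous stage."""
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}


def load(rows):
    """Final stage: collect the transformed rows (stands in for a target write)."""
    return list(rows)


def run_pipeline(partition):
    """Chain the stages; generators pass rows along without waiting for a full batch."""
    return load(transform(extract(partition)))


def run_job(rows, num_partitions=2):
    """Run one pipeline per partition. Threads are used here only for illustration;
    a real engine would spread partitions across CPU nodes."""
    partitions = partition_by_key(rows, "id", num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        return list(pool.map(run_pipeline, partitions))


if __name__ == "__main__":
    sample = [{"id": i, "qty": i, "price": 2} for i in range(6)]
    for part in run_job(sample):
        print(part)
```

The generator chain stands in for pipelining and the per-partition worker stands in for partitioning; how DataStage schedules these internally is not described in the review.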
Furthermore, its visual design environment offers a sophisticated balance between simplicity and power. The drag-and-drop interface allows engineers to build complex ETL logic using pre-built "Stages" for joins, lookups, and transformations without needing to write manual code. However, it remains highly extensible for developers; if a specific requirement isn't met by a standard component, you can integrate custom Python scripts or SQL, making it flexible enough for both standard reporting and complex data science pipelines.
Finally, DataStage excels in enterprise-grade reliability and governance, which is why it remains a staple in highly regulated industries like finance and healthcare. It integrates seamlessly with metadata catalogs to provide end-to-end data lineage, allowing users to track exactly how data has changed from source to target. Combined with robust error-handling and "Reject Links" that capture bad data without crashing the entire job, it provides a level of stability and auditability that many lightweight or open-source tools struggle to match.
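The reject-link behaviour mentioned above can be sketched in a few lines of Python. This is only an analogy under stated assumptions, not DataStage's implementation: `transform_with_reject` and the `id`/`amount` fields are hypothetical. Rows that fail conversion are routed to a separate output with the error recorded, and the run continues.

```python
def transform_with_reject(rows):
    """Return (accepted, rejected): bad rows are captured instead of aborting the run."""
    accepted, rejected = [], []
    for row in rows:
        try:
            # Conversion may fail on a missing key or a non-numeric amount.
            accepted.append({"id": row["id"], "amount": int(row["amount"])})
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the original row and the error type for later inspection.
            rejected.append({"row": row, "error": type(exc).__name__})
    return accepted, rejected


if __name__ == "__main__":
    good, bad = transform_with_reject(
        [{"id": 1, "amount": "10"}, {"id": 2, "amount": "abc"}]
    )
    print(good)
    print(bad)
```

Keeping the original row alongside the error type is one possible design choice; it makes the rejected output usable for auditing or reprocessing.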

What do you dislike about the product?

One of the most significant drawbacks of IBM DataStage is its prohibitive cost and complex licensing model, which often makes it inaccessible for small-to-medium businesses. Beyond the high initial purchase price, the "IBM Tax" includes ongoing maintenance and specialized infrastructure requirements that scale aggressively with data volume. Furthermore, because the tool is highly proprietary, organizations face heavy vendor lock-in; migrating logic out of DataStage to a modern, open-source-friendly stack like dbt or Airbyte is notoriously difficult and time-consuming.
From a technical standpoint, many engineers find the platform increasingly clunky and "legacy" compared to agile, cloud-native alternatives. While its parallel engine is powerful, it requires deep, specialized expertise to tune—settings like partition methods and buffer sizes are manual and unintuitive, leading to a steep learning curve for new hires. Additionally, while the newer "Next Gen" versions have improved, the ecosystem is still criticized for being batch-heavy, making it less agile for teams that require modern real-time streaming or "DataOps" automation.

What problems is the product solving and how is that benefiting you?

IBM DataStage primarily solves the challenge of data fragmentation and processing bottlenecks in massive enterprise environments. Large organizations often have data trapped in "silos" across legacy mainframes, modern cloud databases, and various third-party applications; DataStage provides a unified, high-performance bridge to extract and harmonize this information. Its parallel processing engine solves the "time problem" by breaking down petabyte-scale datasets into smaller chunks and processing them simultaneously, ensuring that critical business reports and data warehouses are updated within strict overnight windows rather than taking days to complete.
The primary benefit to you and your organization is data trust and operational efficiency. Because the platform includes built-in data quality and governance tools, it automatically cleanses and validates records as they move through the pipeline, reducing the risk of making business decisions based on "dirty" or inaccurate data. Furthermore, its "design once, run anywhere" architecture allows your team to build a data flow once and deploy it across on-premises servers or multiple cloud providers without rewriting code. This saves significant development time and future-proofs your infrastructure, allowing you to focus on gaining insights rather than troubleshooting manual data transfers.

Ivan S.

Exceptional Performance and Connectivity with Intuitive Interface

December 03, 2025
Review provided by G2

What do you like best about the product?

Wide Connectivity, High Performance and Scalability, Intuitive Graphical Interface

What do you dislike about the product?

High Learning Curve, Infrastructure Dependency

What problems is the product solving and how is that benefiting you?

Complex data integration, Data transformation and cleaning

Max R.

Data Integration and Quality with DataStage

June 18, 2025
Review provided by G2

What do you like best about the product?

Best data integration tool on the market with a wide range of connectors and advanced data integration and quality features.

What do you dislike about the product?

I quite like the platform as a whole, but I believe it can improve regarding data lineage (it should indeed improve now with the arrival of Manta to the IBM portfolio).

What problems is the product solving and how is that benefiting you?

Help our clients work with integrated, qualified, and reliable data.

Banking

IBM Datastage for ETL

March 08, 2024
Review provided by G2

What do you like best about the product?

IBM InfoSphere DataStage is a simple yet efficient tool for ETL processing.
It has a variety of stages for implementing your designs and testing them at runtime.
It has additional features compared to other ETL tools, which help with debugging and error handling.

What do you dislike about the product?

The DataStage UI takes a bit of a back seat compared to other ETL tools.
Stages could be categorised by functionality.

What problems is the product solving and how is that benefiting you?

It solves data integration problems across a variety of platforms and provides appropriate data formats to the end user, such as JSON, files, text files, databases, and big data.

Information Technology and Services

Good product

January 31, 2024
Review provided by G2

What do you like best about the product?

Its speed. It is very fast and responsive. Support is good.

What do you dislike about the product?

A little hard to use and implement. Has a few bugs.

What problems is the product solving and how is that benefiting you?

Fast data integration and processing.

Computer Software

Analyzing vendor data

January 25, 2024
Review provided by G2

What do you like best about the product?

We use it for two reasons: it costs less and it is user friendly.

What do you dislike about the product?

Customer support is excellent. That said, the number of features could be improved.

We did not face any problems during its implementation and its integration.

Our frequency of use is not high, since we do not rely on it alone, but that may change in the future.

What problems is the product solving and how is that benefiting you?

I cannot disclose it because of the company's policy, but in brief we are using it to analyse multiple vendor data.

Financial Services

Data Stage review

December 06, 2023
Review provided by G2

What do you like best about the product?

- Excellent performance in executing ETL processes for large amounts of data.

What do you dislike about the product?

- Lack of documentation and available knowledge for study and learning.
- Lack of support from the supplier (various problems with the product and also lack of support for functionalities like the quality stage).
- The interface is not at all intuitive and is difficult to use.

What problems is the product solving and how is that benefiting you?

Execution of ETL processes and data quality.

Information Technology and Services

IBM InfoSphere DataStage

November 20, 2023
Review provided by G2

What do you like best about the product?

Ease of use, ease of implementation, compact product. Very good customer support team.
Great performance for large data volumes; allows parallelism.

What do you dislike about the product?

It does not support code versioning without Git integration.

What problems is the product solving and how is that benefiting you?

Data Transformation, Data governance, interconnection of non-homogeneous origins, quick creation of interfaces between applications.
Creation and implementation of data quality rules.

Kapil K.

Using Datastage for ETL

September 13, 2023
Review provided by G2

What do you like best about the product?

We use InfoSphere DataStage for ETL in our organisation because it can easily handle large data volumes (terabytes) and lets us transform our data easily. It is also easy to design and run our jobs in DataStage.

What do you dislike about the product?

As a beginner, I found DataStage hard to use. There are so many functionalities that it takes time to get the hang of it. But once you start practicing, it becomes easy.

What problems is the product solving and how is that benefiting you?

Our organisation handles very large data volumes, so we need a powerful tool to extract, transform, and load them. DataStage solves this problem by handling it perfectly, and we can easily build our ETL jobs.

Simran T.

Review on IBM Infosphere Datastage

February 10, 2023
Review provided by G2

What do you like best about the product?

DataStage helps us construct a source model that describes the rules for querying the source database. We have used several stages, such as Transformer, Lookup, and Join, while building dimension and fact tables. The steps are so easy that we only need to drag and drop the stages required to build the tables.

What do you dislike about the product?

What I don't like about IBM InfoSphere DataStage is that its plans are costly. Also, metadata propagation in jobs is somewhat complex for some users, and there are issues with XML processing.

What problems is the product solving and how is that benefiting you?

IBM InfoSphere DataStage is used to develop jobs that move data from source systems to target systems using simple steps. It is not only for data warehousing; we can also use InfoSphere for analysis and to see the enormous architecture of your OLTP systems.
