Standardized data pipelines have streamlined ETL workflows but still need clearer logs and UI
What is our primary use case?
My main use case for Apache NiFi is basic ETL pipelines before ingesting data into the data lake. For example, if we want to ingest anything into Snowflake, we first do basic data massaging and transformations through Apache NiFi. This may include consuming a file, converting it from one format to another, breaking it into chunks, and then pushing it to Snowflake.
A specific example is that we consume all caller data through Apache NiFi by connecting to Kafka topics and consuming those messages. We then convert those messages into JSON format and push them to Snowflake by running Snowflake procedures, which ingest the data into Snowflake by reading it from an S3 bucket. We also push the actual JSON messages to the S3 buckets through Apache NiFi itself.
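The transformation steps in that pipeline (consume a message, convert it to JSON, break it into chunks, and stage the batches for S3 and Snowflake) can be sketched outside NiFi roughly as follows. This is a minimal illustration, not our actual flow: the pipe-delimited field layout, field names, and chunk size are all hypothetical assumptions.

```python
import json


def caller_record_to_json(raw: str) -> str:
    """Convert one pipe-delimited caller record into a JSON document.

    The field layout here (caller_id|timestamp|duration) is a
    hypothetical example, not the real caller-data schema.
    """
    caller_id, timestamp, duration = raw.split("|")
    return json.dumps({
        "caller_id": caller_id,
        "timestamp": timestamp,
        "duration_seconds": int(duration),
    })


def chunk(records, size):
    """Break records into fixed-size chunks, mirroring NiFi's
    split-before-staging step (the size is an illustrative choice)."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


# Simulated Kafka messages; in NiFi these arrive via a ConsumeKafka processor.
raw_messages = [
    "123|2024-01-01T10:00:00Z|45",
    "456|2024-01-01T10:01:00Z|90",
]
docs = [caller_record_to_json(m) for m in raw_messages]
# Each batch would be written to the S3 bucket that Snowflake reads from.
batches = list(chunk(docs, 1))
```

In the real flow the same steps are configured as NiFi processors rather than written as code, which is what makes the pipeline easy to standardize and copy.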
What is most valuable?
The best features Apache NiFi offers are its integration capabilities, because we have use cases where we use Apache NiFi to integrate with and consume from multiple sources. It can connect to any API, NAS drive, or database, and it can consume streaming data. We can also connect to Kafka topics and queues through Apache NiFi. First, there is flexibility in consuming from multiple sources. Second, we can easily transfer information from one block to another. Finally, controller services can be maintained separately, so we do not have to repeat connections or create duplicate connections in different processor groups.
Apache NiFi has positively impacted our organization by significantly helping us streamline our processes. Earlier, each team was creating its own ETL pipelines with no standard being followed. Apache NiFi gave us the opportunity to streamline that process: each team can request access to Apache NiFi and be onboarded separately based on its needs, and we have standardized the pipeline design so that no one can deviate from it.

For example, everyone is supposed to use specific variables when enabling the alerting system. We have a separate tool called Moogsoft that every team onboards to using the InvokeHTTP processor of Apache NiFi, sending alerts to a centralized system that can generate incidents. No one is working in silos anymore, and a consistent pattern is followed by everyone.

Furthermore, everyone must configure Apache NiFi so that they can consume from a specific source but eventually push the data to our strategic cloud partner, AWS, first into a shared AWS bucket, from which subsequent processing is done in Snowflake. The standard followed since Apache NiFi was introduced has allowed teams to copy successfully created pipelines, saving a lot of time in our overall software development life cycle by reducing redundancy and effort.
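The standardized alerting pattern above, where every pipeline posts failures to Moogsoft through InvokeHTTP, can be approximated as a small helper. This is a sketch under assumptions: the endpoint URL and payload field names are hypothetical placeholders, since the real values come from internal shared variables.

```python
import json

# Hypothetical endpoint; in NiFi this would be the InvokeHTTP "Remote URL".
MOOGSOFT_ENDPOINT = "https://moogsoft.example.com/events"


def build_alert(pipeline: str, severity: str, message: str) -> dict:
    """Assemble the standardized alert body that InvokeHTTP would POST.

    The field names are assumptions; in NiFi they would be populated
    from the specific variables every team is required to set.
    """
    return {
        "source": "nifi",
        "pipeline": pipeline,
        "severity": severity,
        "description": message,
    }


alert = build_alert("caller-data-ingest", "critical", "PutS3Object failed")
payload = json.dumps(alert)  # InvokeHTTP would send this to MOOGSOFT_ENDPOINT
```

Centralizing the payload format this way is what lets a single incident system receive consistent alerts from every team's pipeline.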
What needs improvement?
Two improvements for Apache NiFi would be helpful. First, better logging: the logs and events do not always make sense when you look at them, so I strongly recommend improved logging. Second, when looking at file states, the history of processed files should be more readable, so that not only the centralized teams managing Apache NiFi but also application folks who are new to the platform can see how a specific document traverses Apache NiFi.
For how long have I used the solution?
I have been using Apache NiFi for almost four and a half years now.
What do I think about the scalability of the solution?
Apache NiFi's scalability is great; we can easily scale it without encountering any challenges.
How are customer service and support?
Customer support for Apache NiFi has been excellent, with minimal response times whenever we raise cases that cannot be directly addressed by logs. The support team has consistently provided great assistance with processor failures and helped us create ad-hoc processors as needed.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
We did not use any previous solution before Apache NiFi. Apache NiFi has significantly helped us improve our processes and streamline them, while we are also using SnapLogic for integration purposes in parallel.
I am not aware of any other solutions that were explored before onboarding Apache NiFi. Being from the analytics team, we were earlier using Tableau Prep for ETL and transformations and Informatica as well, but I am unsure if anything else was explored in the organization prior to using Apache NiFi.
What was our ROI?
Apache NiFi provides huge relief for all teams with similar use cases for ETL purposes, and it supports not just ETL but also ELT, allowing us to save significant time.
What's my experience with pricing, setup cost, and licensing?
I cannot comment on the pricing, setup cost, and licensing for Apache NiFi, but I can say there was significant time saved across the development life cycle due to reusable pipelines offered by Apache NiFi.
What other advice do I have?
The reason I rate Apache NiFi a seven is that most development folks are afraid to start using Apache NiFi because it is not always intuitive to begin with. For example, in other integration tools such as SnapLogic, you can simply search for a specific processor, but that is not the case with Apache NiFi. You need a basic understanding of which processors exist before you can fetch what you need. A better UI design would allow newcomers to search with relevant keywords, such as API, and retrieve the appropriate processors, which currently does not happen without some prior understanding of Apache NiFi. The other challenge I mentioned is logging, especially processor-related logs, which should be improved to help newcomers navigate effectively.
I think Apache NiFi is a great tool that one should definitely explore. It is essential to perform basic checks regarding requirements, but if someone is looking for ETL and ELT functionality and needs to connect to CSV, JSON, and Excel files as well as databases, they can onboard to Apache NiFi. It offers great connectivity for consuming or pushing data through queues and cloud workloads. Overall, it is an excellent product, especially for basic data massaging and processing before pushing data to Snowflake or creating reports. I rate Apache NiFi a seven out of ten.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)