Efficient real-time log analysis and resilient adaptation to open source dynamics improve operational workflows
What is our primary use case?
We manage log processing with Grafana because we found it much easier to run on our own infrastructure on AWS. We maintain only what we need and leave out the rest. DataDog told us we would have to wait for them to develop the features we need, so it does not suit our needs.
We are still using DataDog, but for important assets whose logs we need to analyze, we send the logs to Grafana.
The challenges we face with DataDog compared to Grafana come from our workload: we manage very important brands, analyze network traffic, and maintain many websites, most of them valuable domains that cost a lot of money. We are attacked every day and need to analyze all of the logs. Sometimes we get false positives, so we have to make sure we are making the correct decision when blocking or mitigating attacks. It is much easier for us to analyze the logs with Grafana than with DataDog. DataDog has its own query language and wants you to plot things using its own vocabulary, and we do not have time to memorize it. We especially wanted something that started out as open source. Other people have since taken the product and modified it at extra cost, but it is still a better solution for us.
What is most valuable?
We can find information with Grafana much more quickly than with DataDog because it is open source and extensively documented, which lets us fetch data or information much more quickly using AI tools. With DataDog, they always wanted to set up a meeting or a call with us. It was unnecessary. We just wanted to get to the solution without making a big deal out of it.
Grafana saves us hours compared to DataDog. It took about two weeks to figure out what was going on with DataDog, but with Grafana, we initialized the service, had a few issues, fixed them, and that was it. I did not have any major problems that forced me to halt everything in my work. DataDog cost me hundreds of hours because I needed to read through all the documentation and all the special caveats they have there.
What needs improvement?
I would rate Grafana overall as an eight out of ten. It is pretty good, and I would recommend it. I would give it a ten if it were much simpler for users who just want to achieve a simple objective in Grafana and are not experienced with technical configuration. It would be better if users could simply state, through an AI implementation, that they want to see the number of requests on a graph. For example, when under attack, users should be able to easily filter all requests to a specific site or resource, or identify IPs that recently attempted access.
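To make that request concrete, here is a minimal sketch of the kind of filter I mean, written in plain Python outside Grafana. It assumes the logs are already parsed into records with `ip`, `host`, and `time` fields; those field names, the `recent_ips` helper, and the sample addresses are all hypothetical and not part of any Grafana API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def recent_ips(records, host, now, window=timedelta(minutes=15)):
    """Count requests per client IP for one host inside the recent window.

    `records` is an iterable of dicts with hypothetical keys
    "ip", "host", and "time" (a timezone-aware datetime).
    """
    cutoff = now - window
    counts = Counter(
        r["ip"]
        for r in records
        if r["host"] == host and cutoff <= r["time"] <= now
    )
    # Busiest IPs first.
    return counts.most_common()


# Sample data using documentation-only IP ranges and example domains.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"ip": "203.0.113.5", "host": "shop.example.com", "time": now - timedelta(minutes=1)},
    {"ip": "203.0.113.5", "host": "shop.example.com", "time": now - timedelta(minutes=2)},
    {"ip": "198.51.100.7", "host": "shop.example.com", "time": now - timedelta(minutes=5)},
    {"ip": "203.0.113.5", "host": "shop.example.com", "time": now - timedelta(minutes=60)},
    {"ip": "192.0.2.9", "host": "blog.example.com", "time": now - timedelta(minutes=1)},
]

top = recent_ips(sample, "shop.example.com", now)
print(top)
```

The point is that the question itself is simple (one site, one time window, count by IP), so a non-technical user should be able to ask it in one sentence.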
For me, Grafana's ease of use is a necessity for getting things fixed quickly. We work at a marketing company, and when something does not work well or people are waiting on a solution, I look for a temporary fix until I can fix it properly. If I cannot fix it for the long term, I can then reach out to support.
For how long have I used the solution?
I have been using Grafana for quite a long time, but we have only recently started using all of its features.
What do I think about the stability of the solution?
I find Grafana much more stable than DataDog because when something is not working, I get much broader visibility into what is failing, rather than having to ask DataDog to check it out. When something in a dashboard does not work, because Grafana is open source, I can find the related issues other people have run into, which makes it much easier for me to fix.
What do I think about the scalability of the solution?
How well Grafana scales for us really depends on the size of our infrastructure. We are considered small to medium, so it is quite easy for us. Our infrastructure uses two availability zones in AWS, one of them in US East, and we are looking to expand. With only two availability zones to deploy to, it is quite easy for us right now.
How are customer service and support?
I do not use Grafana's support for technical issues because I have found solutions on Stack Overflow and ChatGPT helps me as well.
On a day-to-day basis with Grafana, I tend to refrain from using their support, though not because of any concern about their level of professionalism.
Which solution did I use previously and why did I switch?
We switched from DataDog to Grafana because we wanted to reduce the logs costs, as we are streaming approximately five million logs or even less.
How was the initial setup?
I initially learned to use Grafana by working locally, seeing how it works, and presenting a proof of concept. We then made a small plan for how to make the change and deploy the switch, and we checked the changes in the other environments that we have. We work with Agile, so changes go through development first and then staging. It is a long process, but in the end we made the switch quite quickly: the deployment itself took a single day.
What about the implementation team?
Three people manage Grafana in our organization: one is a sysadmin, another is a full-time DevOps engineer, and I manage the entire operation.
What's my experience with pricing, setup cost, and licensing?
The costs associated with using Grafana are somewhere in the tens of thousands, because we are able to control the logs more efficiently and so reduce the cost. That works out great for us.
What other advice do I have?
My recommendation for future users of Grafana is that it is all great. I am hoping that all the companies, at least the open-source projects, do not go closed source because then users will have to find something else. That is what we love to do - we love to use open source projects and improve them for our usage, not something that follows an agenda of another company's product.
My recommendation to other users of Grafana is to not be afraid and to always look online. I started using it a long time ago for different projects. Users need to learn the basics before they can really understand what is going on. There are people who start a self-hosted web server and see all the metrics going to their server, but they do not really understand what is happening; they just see a small count on a graph. Users need to understand that configuration is necessary. I experienced this when I started and was clueless, thinking of going to DataDog. Then I realized I just did not understand it correctly. There is much more you can do once you see the bigger picture.
Overall rating: 8/10
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Dashboard integration and data source setup simplify monitoring tasks
What is our primary use case?
My main use case is monitoring. We use a lot of different products for monitoring, but Grafana is specifically for monitoring Kubernetes. We use Grafana mainly for Prometheus.
What is most valuable?
The features I appreciate most are the dashboards and the integrations with multiple data sources. The feature that sets Grafana apart from its competitors is how easy it is to set up data sources. The integration helps our organization in centralizing and analyzing data from diverse sources.
What needs improvement?
Joining queries could be simpler. Merging two queries that return the same information is possible in a few ways, but an even easier method would be great.
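As an illustration of the merge I have in mind, here is a minimal Python sketch of the logic, done outside Grafana. It assumes each query result can be reduced to a mapping of timestamp to value; the `merge_series` helper and the sample numbers are hypothetical, and this is not how Grafana implements it.

```python
def merge_series(primary, secondary):
    """Outer-join two {timestamp: value} series into one.

    Where both series have a point for the same timestamp,
    the value from `primary` wins; gaps are filled from `secondary`.
    """
    merged = dict(secondary)
    merged.update(primary)  # primary overrides overlapping timestamps
    return dict(sorted(merged.items()))


# Two queries reporting the same metric, with partial overlap.
query_a = {10: 1.0, 20: 2.0}
query_b = {20: 9.0, 30: 3.0}

combined = merge_series(query_a, query_b)
print(combined)
```

Having that as a single, obvious option when two queries describe the same metric is the simplification I would like to see.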
For how long have I used the solution?
Grafana was deployed before I joined the organization. We have been using it for a couple of years.
What do I think about the stability of the solution?
We never had any issues with Grafana at all.
How are customer service and support?
Grafana's customer support is mainly for developers. We didn't need to reach out to them for troubleshooting or any issues.
What other advice do I have?
We centralize all the metrics from Prometheus and also from Graphite and all other data sources. We have dashboards to integrate Grafana's real-time metrics with visualization capabilities. We're not using Grafana's role-based access control and multi-tenancy features.
Seeing the metrics helps in finding issues, such as memory leaks or spikes and some optimization. We use Grafana on a day-to-day basis to get a better look at our environments and the usage of our resources.
We don't use Grafana for alerting, just for visualization. For alerting, we have different tools. On a scale of 1-10, I would rate Grafana as 8.5.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?