Splunk Infrastructure Monitoring provides end-to-end visibility into our cloud-native environments, which is very important for us.
Splunk Infrastructure Monitoring has helped reduce our mean time to resolve.
It is fully digitalized and has been beneficial for our IT infrastructure.
The security could be better.
I have been using Splunk Infrastructure Monitoring for 11 years.
I usually use the community site. I find that helpful.
We did not use any other solution previously.
We set it up ourselves.
I would rate Splunk Infrastructure Monitoring a ten out of ten.
My customers used the solution for application performance, uptime, and network monitoring.
Splunk Infrastructure Monitoring has helped our customer's organization by making troubleshooting easier. The solution helped them have a centralized place where they could dig in across multiple other tools and consolidate all the information in one place.
Splunk Infrastructure Monitoring provided our customers with visibility into their overall infrastructure. They could quickly start identifying where the problems were coming from. If something was going sideways, they could more easily target the specific pathways.
One of our customers was on-premises. The other was a hybrid with on-premises and private cloud.
I was on a team helping one customer build a brand-new tool, and the value there was almost instantaneous. Another team had gotten the tool a while ago and wasn't sure what to do with it, so we came in and helped them over a six-week engagement. We pivoted them from feeling like they weren't getting much value to getting good value. It was more of a learning-curve situation.
Splunk's unified platform has helped our customers consolidate networking, security, and IT observability tools. I was on the team of a company that was helping build a brand-new monitoring solution. They had probably a dozen separate stand-alone silo tools that could not talk to each other.
Instead of logging on to 12 different places to check each tool individually, Splunk Infrastructure Monitoring helped consolidate everything into a single location for viewing. We didn't get them to the point where they were ready to fully decommission the other systems.
They were going to decommission 12 systems on a six-month plan. By now, they would have realized the cost savings, which would have been multimillion-dollar savings for them.
Our customer with 12 separate systems was all on-premises, while part of our other customer's footprint was in AWS, so it was a combination of cloud and on-premises. It was incredibly easy for our customers to monitor multiple cloud environments using Splunk Infrastructure Monitoring.
The solution provided them with a single pane of glass where they didn't have to log into multiple places and see everything in a single location. You can develop dashboards that give you cross-platform visibility, which is a huge win.
A wide variety of logging makes log onboarding difficult. Over the years, Splunk has done various things to make it easier, so I want to give them props for that. However, the reality is that every vendor has its own logging format. Some vendors have multiple log formats because they change their own products over time.
They have different log formats for different products in their own suites, and no industry standard makes it chaotic. Splunk is probably the best product out there in terms of how they handle it, but it's not perfect yet. They need to keep pushing that cutting edge and trying to improve it. I have no idea how they could do that because they're trying to wrangle chaos, and it's hard.
I have been using Splunk Infrastructure Monitoring for two years.
I think Splunk Infrastructure Monitoring is a solid product from an infrastructure perspective. I haven't seen any bugs in the tool. Like many things with Splunk, everybody knows there will be patches when there's a core upgrade. However, that's more with Splunk Core and not specifically the Splunk Infrastructure Monitoring part.
The solution's scalability is wonderful. I've worked with customers as small as 25 gigs a day, which is tiny, all the way up to close to a petabyte a day. You have to make sure you scale the tool intelligently, but it's more of a budgetary constraint than a technical one. The solution handles the big ones beautifully if you have the budget to have the needed hardware.
Splunk's technical support has significantly improved in the last year. The support went through a rough patch about a year and a half ago. I had to coerce customers to use it because it was really bad there for a while. Splunk's support has vastly improved recently, and I hope it continues to improve.
Those people who changed the attitude, mindset, and processes need all the accolades because it's so much better than it was. Unfortunately, that does mean that it was really bad at one point.
Splunk's technical support still has some room for improvement in certain areas. Mostly, you can tell the more junior people who just read off of a script and really don't know where to go. I always introduce myself as a consultant to let the support person know that I have already done the basic introductory troubleshooting, and they can skip the first ten pages in their script.
Some frontline people in Splunk's support team are wonderful and clearly have more experience. However, it is still obvious that they occasionally bring in somebody brand new who's a little lost.
I rate the technical support seven and a half to eight out of ten.
Positive
I've worked with Core Splunk as a consultant for seven years and was a customer for seven years before that. So I've seen it all: the good, the bad, the ugly, and everything in between. Usually, the actual building of Splunk is super easy because I've done it so many times. Every customer's environment is unique in terms of how to get the data.
It's more about navigating the local customer's politics and archaic technical debts. Somebody thought that a certain architecture was a good idea ten years ago, but today, that doesn't make any sense whatsoever. Wrangling customer chaos is hard, but the Splunk piece is usually easy.
There's always room for improvement, but Splunk Infrastructure Monitoring is a solid product overall. It definitely helps customers who have a lot of legacy systems that don't work well together.
Overall, I rate the solution an eight out of ten.
We use the solution for a lot of email checking. We also use the tool to monitor different embassies, server IPs, and some of our teams.
Splunk Infrastructure Monitoring has helped our organization tremendously. We onboarded Splunk four years ago, and we have 30 to 40 contractors who use Splunk daily. The solution has helped not just a small organization like ours but the whole DOS (Department of State).
The solution monitors attacks or unauthorized access to the information we want to protect. There is a dashboard called ISSO that monitors pretty much everything worldwide. We also monitor almost 300 embassies and consulates.
The solution's machine learning deployment is hard and should be made more user-friendly. Even if a team doesn't have a data scientist, they should be able to use the machine learning toolkit for monitoring purposes. The solution should also include more algorithms and SPL commands that people can use.
I have been using Splunk Infrastructure Monitoring for four months.
We haven’t faced any issues with the solution’s stability.
Splunk Infrastructure Monitoring is highly scalable. We were able to do monitoring and some of the advanced analytics.
I have not contacted Splunk's technical support. We have contacted our account manager for issues, and she's been awesome.
We have different vendors who do deployments, which works differently for the government than for regular businesses.
We have seen a return on investment with Splunk Infrastructure Monitoring regarding the kind of threats we can identify.
Splunk Infrastructure Monitoring is an expensive solution.
Our organization monitors multiple cloud environments using Splunk Infrastructure Monitoring, which works well. This is the only tool we use, and we aren't considering moving or having additional tools.
It is important for our organization that Splunk Infrastructure Monitoring has end-to-end visibility into our cloud-native environments. Our job is critical and very sensitive, so having end-to-end visibility is really helpful.
Splunk Infrastructure Monitoring has helped reduce our mean time to resolve. Looking at the solution's dashboards has helped tremendously because we don't have to look at the individual index or events.
Our business is different from that of a private organization, and Splunk Infrastructure Monitoring has helped improve our organization's business resilience. The machine learning toolkit allows us to do clustering, and we have a couple of deployments on the clusters. That has helped cluster different events based on their critical or security threats.
We have seen time to value using Splunk Infrastructure Monitoring.
Splunk's unified platform has helped consolidate networking, security, and IT observability tools. We don't have to integrate Splunk with a different tool and worry whether those two will integrate. Having everything in one platform helps us create dashboards, alerts, and monitoring tools in one place.
Overall, I rate the solution an eight or nine out of ten.
We are monitoring our servers and their health. We are monitoring their functionality and supporting the Kubernetes platform.
Our team supports multiple different projects. They all have their own clusters and ways of operating, but we just use one Splunk Infrastructure Monitoring system.
Splunk Infrastructure Monitoring has helped improve our organization’s business resilience.
I have primarily used it to go back into the past and understand why something happened. It provides enough information to do research and figure things out.
One thing I recently ran into was that the logs on the server most often get gzipped after they have been rotated. We found that we were not monitoring some of those files, so we had to go back and pull them in. Right now, the tool pulls them one at a time and unzips or untars each one, so I cannot look at the entire history at once. There can be an improvement in that area.
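As a rough illustration of one possible approach to that kind of backfill, here is a minimal Python sketch that decompresses rotated .gz logs and replays them through Splunk's HTTP Event Collector; the HEC URL, token, sourcetype, and file pattern are placeholders, and a real job would batch events instead of posting one line at a time.

```python
import glob
import gzip

import requests

# Placeholders (assumptions): a reachable HEC endpoint and a token that is
# allowed to write to the target index. Adjust the pattern to the rotation scheme.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def backfill_rotated_logs(pattern="/var/log/myapp/*.log.*.gz"):
    """Decompress each rotated .gz log and replay its lines to Splunk via HEC."""
    headers = {"Authorization": f"Splunk {HEC_TOKEN}"}
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", errors="replace") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue
                payload = {
                    "event": line,
                    "source": path,
                    "sourcetype": "myapp:backfill",
                }
                response = requests.post(HEC_URL, headers=headers, json=payload, timeout=10)
                response.raise_for_status()


if __name__ == "__main__":
    backfill_rotated_logs()
```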
I have been using Splunk Infrastructure Monitoring for four years.
It is stable.
About a year ago, we added another 600 servers and scaled up. We are getting more in the next year or later this year. It works smoothly.
They are good. I have a ticket open now. I told them to go ahead and close it because we thought it was a hardware issue, but they said they would keep the case open until the hardware replacement to see if the issue goes away. That was pretty nice.
All of our hardware is HPE-based. We rely mostly on OneView, but it does not give us the service aggregation and other things that Splunk Infrastructure Monitoring is giving us.
One of the gentlemen on other teams came to ours. He is very knowledgeable about Splunk, so he helped with the implementation.
All of our servers are RHEL-based.
A different group within our organization had Splunk, and they liked it, so we just went with Splunk.
I would rate Splunk Infrastructure Monitoring a ten out of ten.
I use the solution in my company for customers who use the tool for auditing and compliance in the DoD/AC area. My company's customers have compliance controls and STIG controls that they have to satisfy for their ETL processes.
The tool has helped our customer's organization in achieving compliance control. When our customer's organization has an inspection or when the DoD inspects their infrastructure, they can show their auditors that they are compliant. They can show the auditors the dashboards and verify that they are ingesting data from the sources and how all their hosts are being monitored. They can show everything to auditors, check the box, make sure that everything looks green, and then they continue to have authorization to operate.
The most valuable piece of Splunk Infrastructure Monitoring for our company's customers revolves around the data for everything. Everything produces data, and all the data can get ingested, whether it is Windows, RHEL, VMware products, Pure Storage products, or a custom product. Configuring data ingestion and performing everything in Splunk Infrastructure Monitoring is possible. At the same time, a lot of the other SIEM tools focus on a specific type of data. The benefit of Splunk Infrastructure Monitoring is that one can see all their data in one place.
There is not a lot of support for the tool's on-premises version, especially since everything is moving to the cloud. We saw a really good demo at the keynote this morning that touched on the APM part, and it was super cool. There was also a demo of the AI assistant, which was super cool. It is hard to expand the options for a particular customer when so much of that functionality is limited to the cloud and there is so much focus on the cloud.
I have been using Splunk Infrastructure Monitoring for three years with my customer, who has been using it for longer than that.
The tool's stability is great.
The tool's scalability is great. My company just moved Splunk from VMs to containers for our customers, so I would say that we have put it on Kubernetes on Tanzu, which has been great for them.
Support is an area I have not really reached out to on behalf of our customers. I usually just go to Splunk Answers or rely on my colleagues to get what I need. My company has never opened a support ticket with Splunk for our customers.
I don't know what one of my company's customers had used before Splunk Infrastructure Monitoring. They may have used some other solutions, but I have been on contract with them for three years.
In terms of ROI, I have seen a decrease in the time we spend validating ingested data from an auditing perspective, especially when we are talking about the customer's authorization to operate. With the tool, it is much quicker to view all your data in one place than to show an auditor 15 different data sources; you can show it all together.
Licensing cost is the biggest argument I get from those divesting from Splunk. There are people within the organization who say they are going to move to other tools because Splunk is too expensive. So far, I have been able to ask them to look at the value Splunk adds and convince them that it is worth it, but that might not always be the case if licensing continues to be an issue, especially if costs keep rising and other solutions offer more competitive pricing for similar results.
The tool is not used to monitor multiple cloud environments.
It is not important for our company that Splunk Infrastructure Monitoring provides end-to-end visibility into our cloud-native environment.
The tool has helped improve our organization's business resilience.
The tool does the job very well, and it is easy for me to use, especially as someone trained in Splunk products. With the tool in place, I can ingest data from Windows or RHEL hosts and do things like scripted inputs on a forwarder. The Splunk Universal Forwarder gives me so much more than if I just used Syslog to get data; I can do a lot more with Splunk than just ingesting data via something like Syslog.
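To make "scripted input" concrete, here is a hypothetical sketch of what such a script could look like; the stanza, paths, sourcetype, and index in the comments are assumptions, not this customer's actual configuration. The forwarder runs the script on a schedule and indexes whatever it writes to stdout.

```python
#!/usr/bin/env python3
# Hypothetical scripted input for a Universal Forwarder. An inputs.conf stanza
# (assumed; adjust the app and paths as needed) might look like:
#
#   [script://./bin/disk_usage.py]
#   interval = 300
#   sourcetype = custom:disk_usage
#   index = infra
#
import shutil
import socket
import time


def main():
    """Emit one key=value event per mount point describing disk usage."""
    host = socket.gethostname()
    now = time.strftime("%Y-%m-%dT%H:%M:%S%z")
    for mount in ("/", "/var", "/opt"):
        try:
            usage = shutil.disk_usage(mount)
        except FileNotFoundError:
            continue
        pct_used = 100.0 * usage.used / usage.total
        print(
            f'{now} host="{host}" mount="{mount}" '
            f"total_bytes={usage.total} used_bytes={usage.used} pct_used={pct_used:.1f}"
        )


if __name__ == "__main__":
    main()
```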
I rate the tool an eight out of ten.
We use the solution to monitor and keep count of the systems, applications, and DR sites we have. Then, if there is any problem, we can identify which server belongs to which application. This really helps us.
We have seen a 28% to 29% improvement in optimization and performance with Splunk Infrastructure Monitoring. You know the moment there is any anomaly in the system, the server, or the infrastructure. The solution has given us more visibility not only from the infrastructure or server point of view but also from the network perspective.
Splunk's GUI and dashboard capacity are the most valuable features of Splunk Infrastructure Monitoring.
Compared to Microsoft Azure, Splunk Infrastructure Monitoring can ingest all the log sources. You can ingest all the data into one single place. It then accumulates the data, calculates internally, and gives you the right information you're looking for. Splunk Infrastructure Monitoring is the optimal solution, where you can see everything on one screen.
Our organization monitors multiple cloud environments, including GCP (Google Cloud Platform) and AWS (Amazon Web Services).
We're completely dependent on Splunk's end-to-end visibility into our cloud-native environment to see everything, including any incident that comes in.
Splunk Infrastructure Monitoring has helped drastically improve our mean time to resolve, detect, and investigate.
The solution has helped reduce our mean time to resolve by 28%, which is a huge number. We aim to reduce it by 30% to 37%, but that would definitely require some AI concept and new enterprise security. That's our plan for next year.
Splunk Infrastructure Monitoring has helped improve our organization's business resilience. The moment you receive an incident, you have full visibility. You can go deep into the investigation, do threat hunting, and find the root cause analysis. That's the visibility and performance we look for in enterprise security solutions like Splunk.
Splunk's unified platform helps consolidate networking, security, and IT observability tools. When you have multidimensional solutions and a multi-cloud environment, you have specific applications for finance and patient care. You can see everything consolidated in one solution.
DevOps and GRC compliance come into one solution, and visibility extends across all of it. That gives you confidence, and we build trust with the business. Because we have full visibility, we can assure our patients and healthcare entities that we are safe.
Use-case utilization reporting is not available out of the box; you have to build those reports yourself. There is no standard view that shows how many of your use cases are actually being utilized. For example, if you have 200 use cases, you don't know whether all 200 are in use and firing at the right time.
If only 20 of the 200 use cases are actually working, that is 20% utilization. I would rather focus on those 20 and optimize them based on my business requirements than spread effort across all 200.
I have been using Splunk Infrastructure Monitoring for six years.
The solution's scalability is marvelous because we can just add on. We are currently using two TB, and the solution gives us the flexibility to add an extra 500 GB next month.
Sometimes, we face technical difficulties because of the limitations of the connectors. Integrating Splunk with post-relational databases like InterSystems is challenging because such applications or databases are not very much publicly exposed. The technical team faces a lot of challenges when integrating because they need to write some custom connectors to integrate the data.
We have some clinical applications specific to particular specialties, and there are different applications and databases for those. You need to write custom connectors for them. Sometimes the technical team takes longer because they are also still exploring.
I rate the solution's technical support seven and a half out of ten.
Neutral
We previously used a different solution called RSA. We switched to Splunk because RSA was not providing the latest changes and many of the upgrades we were expecting. Also, a lot of functionality we were expecting, like XDR, optimization processes, and connectors, was not available. We used RSA for four and a half years. RSA had performance issues, and a lot of use cases were not met because it was an old solution.
We had a system integrator who initially helped us integrate and deploy the solution. They helped us to deploy the solution, and we take their help to develop any new use cases.
We have seen a return on investment with the solution. Our KPIs have become smooth. When we have more visibility, our KPIs definitely improve. We can easily measure mean time to detect and mean time to resolve. You will definitely be up to the mark when your incident response capability increases. Our performance has increased, and our IT environment and DevOps team have more visibility and are more transparent now.
The solution's pricing is costly. We're now looking for a cloud version that would have a completely different pricing calculation.
Splunk Infrastructure Monitoring has use case capability, visibility capability, and performance. It also has a vast dashboard capability that no other solution currently provides. There are many solutions in the market, but Splunk stands out separately. With Splunk Infrastructure Monitoring, you can correlate data and ingest any kind of data with your connectors. Flexibility is another important functionality of Splunk.
Overall, I rate the solution an eight out of ten.
Splunk Infrastructure Monitoring helps identify bottlenecks within the network domain, including issues related to server databases, application response times, and code. These problems can be resolved by our customers promptly.
It is easy to use. It offers a unique dashboard and reporting tool called Ollie, which is essentially an observability tool. It's important to note that this product is agent-based only.
Splunk Infrastructure Monitoring helps improve the efficiency and performance of applications by up to 70 percent.
It has helped reduce our mean time to detect. It has helped to reduce our mean time to resolve by around 50 percent.
Splunk helps us focus on business-critical initiatives.
It integrates well with multiple sets of products.
The vibrant dashboards are valuable.
The main drawback of Splunk for network monitoring is its limited agent deployment. Splunk excels at collecting data from servers and databases where agents can be installed. However, it cannot directly monitor network devices, unlike Broadcom.
Broadcom offers Spectrum and Performance Management tools that primarily work on SNMP to collect data from network devices. Splunk doesn't have a directly comparable functionality for network devices.
While Splunk offers a wider range of data collection, including metrics, logs, and more, it can be more expensive. Splunk's licensing model is based on data volume (terabytes) rather than the number of devices. This can be costlier compared to Broadcom or similar tools, which often use device-based licensing.
The end-to-end visibility is lacking because Splunk cannot directly monitor network devices.
Broadcom provides a topology-based root cause analysis that is not available with Splunk.
I have been using Splunk Infrastructure Monitoring for 10 years.
Splunk Infrastructure Monitoring is stable.
Splunk deployment is simplified because it is cloud-based. The deployment takes no more than 15 days to complete.
Splunk's infrastructure monitoring costs can be high because our billing is based on data volume measured in terabytes, rather than the number of devices being monitored.
Replacing legacy systems with Splunk could cost up to $200,000.
I would rate Splunk Infrastructure Monitoring 7 out of 10.
The decision to move from another infrastructure monitoring solution to Splunk should be based on a customer's specific needs. While Splunk offers visually appealing dashboards and access to a wider range of data compared to Broadcom products, pricing can be a significant factor, especially in the Indian market.
Deploying Splunk for a customer can involve higher upfront infrastructure costs. This is because implementing Splunk effectively often requires writing custom queries to filter data and optimize license usage. While this approach minimizes licensing costs, it can be labor-intensive.
We use Splunk APM to understand and know the inner workings of our cloud-based and on-premises applications. We use the solution mainly for troubleshooting purposes and to understand where the bottlenecks and limits are. It's not used for monitoring purposes or sending an alert when the number of calls goes above or below some threshold.
The solution is used more for understanding and knowing where your bottlenecks are. So, it's used more for observability rather than for pure monitoring.
The solution's service map feature allows us to have a holistic overview and to see quickly where the issues are. It also allows us to look at every session without considering the sampling policy and see if a transaction contains any errors. It's also been used when we instrument real user monitoring from the front end and then follow the sessions back into the back-end systems.
Splunk APM should include better correlation between resources and infrastructure monitoring. The solution should make it easier to define service level indicators and service level objectives. It should also let you define workloads, so you can say an environment is divided into a back-end area and an integration area, and then see the service impact of a problem.
I've been using Splunk APM in my current organization for the last 2 years, and I've used it for 4-5 years in total.
Splunk APM is a remarkably stable solution. We have only once encountered an outage of the ingestion, which was very nicely explained and taken care of by the Splunk team.
I rate the solution a 9 out of 10 for stability.
Around 50 to 80 users use the solution in our organization. The solution's scalability fits what we are paying for. On the level of what we pay for, we have discovered both the soft limit and the hard limit of our environment. I would say we are abusing the system in terms of how scalable it is. Considering what we are paying for, we are able to use the landscape very well.
We have plans to increase the usage of Splunk APM.
Splunk support itself leaves room for improvement. We have excellent support from the sales team, the sales engineers, the sales contact person, and our customer success manager. They are our contact when we need to escalate any support tickets. Since Splunk support is bound not to touch the customer's environment, they cannot fix issues for us. It's pretty straightforward to place a support ticket.
Positive
We have previously used AppDynamics, Dynatrace, and New Relic. We see more and more that Splunk APM is the platform for collaboration. New Relic is more isolated, and each account or team has its own part of New Relic. It's very easy to correlate and find the data within an account. Collaborating across teams, their data, and their different accounts is very troublesome.
With Splunk APM, there is no sensitivity in the data. We can share the data and find a way to agree on how to collaborate. If two environments are named differently, we can still work together without affecting each other's operations.
If you're using the more common languages, the initial deployment of Splunk APM is pretty straightforward.
The solution's deployment time depends on the environment. If the team uses cloud-native techniques with Terraform and Ansible, it's pretty straightforward. A normal engagement is within a couple of weeks. Once you assess the tooling they need and look at the architecture, the deployment time itself is very minimal. Most of the time spent internally is caused by our own overhead.
We have a very good conversation with our vendor for Splunk APM. We have full transparency regarding the different license and cost models. We have found a way to handle both the normal average load and the high peak that some of our tests can cause. Splunk APM is a very cost-efficient solution. We have also changed the license model from a host-based license model to a more granular way to measure it, such as the number of metric time series or the traces analyzed per minute.
We have quite a firm statement that for every cost caused within Splunk, you need to be able to correlate it to an IT project or a team to see who the biggest cost driver is. As per our current model, we are buying a capacity, and we eventually want to have a pay-as-you-go model. We cannot use that currently because we have renewed our license for only one year.
We are using Splunk Observability Cloud as a SaaS solution, but we have implemented Splunk APM on-premises, hybrid, and in the cloud. We are using it for Azure, AWS, and Google. Initially, the solution's implementation took a couple of months. Now, we are engaging more and more internal consumers on a weekly basis.
We implement the code and services and send the data into the Splunk Observability Cloud. This helps us understand who is talking to whom, where you have any latencies, and where you have the most error types of transactions between the services.
Most of the time, we do verification tests in production to see if we can scale up the number of transactions to a system and handle the number of transactions a business wants us to handle at a certain service level. It's both for verification and to understand where the slowness occurs and how it is replicated throughout the different services.
We can have full fidelity and totality of the information in the tool, and we don't need to think about the big variations of values. We can assess and see all the data. Without the solution's trace search and analytics feature, you will be completely blind. It's critical as it is about visibility and understanding your service.
Splunk APM offers end-to-end visibility across our environment because we use it to coexist with both synthetic monitoring and real user monitoring. What we miss today is the correlation to logs. We can connect to Splunk Cloud, but we are missing the role-based access control to the logs so that each user can see their related logs.
Visualizing and troubleshooting our cloud-native environment with Splunk APM is easy. A lot of out-of-the-box knowledge is available that is preset for looking at certain standard data sets. That's not only for APM but also for the available pre-built dashboards.
We are able to use distributed tracing with Splunk APM, and it is for the totality of our landscape. A lot of different teams can coexist and work with the same type of data and easily correlate with other systems' data. So, it's a platform for us to collaborate and explore together.
We use Splunk APM Trace Analyzer to better understand where the errors originate and the root cause of the errors. We use it to understand whether we are looking at the symptom or the real root cause. We identify which services have the problem and understand what is caused by code errors.
The Splunk Observability Cloud as a platform has improved over time. It allows us to use profiling together with Splunk Distribution of OpenTelemetry Collector, which provides a lot of insights into our applications and metadata. The tool is now a part of our natural workbench of different tools, and it's being used within the organization as part of the process. It is the tool that we use to troubleshoot and understand.
Our organization's telemetry data is interesting, not only from an IT operational perspective but also to understand how the tools are being used and how they have been providing value for the business. It is a multifaceted view of the data we have, and it is being generated and collected by the solution.
Splunk APM has helped reduce our mean time to resolve. Something that used to take 2-3 weeks to troubleshoot is now done within hours. Splunk APM has also freed up resources when we troubleshoot. Previously, if we spent a lot of time troubleshooting something and couldn't find the problem, we couldn't close the ticket by saying there was no resolution. With Splunk APM, we now know for sure where the problem is rather than just ignoring it.
Splunk APM has saved our organization around 25% to 30% time. It's a little bit about moving away from firefighting to be preventive and estimate more for the future. That's why we are using it for performance. The solution allows us to help and support the organization during peak hours and be preventative with the bottlenecks rather than identify them afterward.
Around 5-10 people were involved in the solution's initial deployment. Integrating the solution with our existing DevOps tools is not part of the developer's IDE environment, and it's not tightly connected. We have both subdomains and teams structured. Normally, they also compartmentalize the environment, and we use the solution in different environments.
Splunk APM requires some life cycle management, which is natural. In general, once you have set it up, you don't need to put much effort into it. I would recommend Splunk APM to other users. That is mainly due to how you collaborate with the data and do not isolate it. There is a huge advantage with Splunk. We are currently using Splunk, Sentry, and New Relic, and part of our tool strategy is to move to Splunk.
As a consumer, you need to consider whether you are going to rely on OpenTelemetry as part of your standard observability framework. If that is the case, you should go for Splunk because Splunk is built on OpenTelemetry principles.
Compared to other tools using proprietary agents and proprietary techniques, you may have more insights into some implementations. However, you will have a tighter vendor lock-in, and you won't have the portability of the back end. If you rely on OpenTelemetry, then Splunk is the tool for you.
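As an illustration of what relying on OpenTelemetry can look like in practice, here is a minimal Python sketch that instruments a function with the OpenTelemetry SDK and exports spans over OTLP to a local collector (such as the Splunk distribution of the OpenTelemetry Collector); the service name, endpoint, and span names are assumptions for the example.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Assumption: an OpenTelemetry Collector listens for OTLP/gRPC on localhost:4317
# and forwards the traces to the APM back end.
resource = Resource.create({"service.name": "checkout-service", "deployment.environment": "dev"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)


def place_order(order_id: str) -> None:
    """Each span becomes part of a trace the APM back end can break down."""
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge_payment"):
            pass  # the call to the payment service would go here


if __name__ == "__main__":
    place_order("demo-123")
    provider.shutdown()  # flush any pending spans before exit
```

Because the instrumentation is OpenTelemetry rather than a proprietary agent, the same spans could later be pointed at a different OTLP-compatible back end, which is the portability point made above.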
Overall, I rate the solution a 9 out of 10.
We use Splunk APM for performance testing.
Splunk offers end-to-end visibility across our environment.
Splunk APM simplifies application performance monitoring. It also provides insights into data quality, including data security, integration, ingestion, and versioning of trace logs. We can directly inject data for monitoring purposes, trace the data flow, and monitor metric values.
Splunk can ingest data in any format, allowing us to easily monitor logs and identify blockages through timestamps, which saves us time.
The most valuable feature is dashboard creation. This allows us to easily monitor everything by setting the data we want to see. For example, imagine we're working on a project within the application. There might be different environments, such as development, testing, and production environments. In the production environment, we can use dashboards to monitor customer activity, like account creation or other user data. This gives us a clear view of how transactions are performing and user response times. This dashboard creation feature is one of the most beneficial aspects of Splunk that I've used in a long time. While Splunk offers many features, including integration with various DevOps tools, its core strength lies in data monitoring and collection.
Splunk's functionality could be improved by adding database connectors for other platforms like AWS and Azure.
I have been using Splunk APM for one year.
We previously used a legacy application for monitoring and when it was decommissioned we adopted Splunk APM.
Splunk offers a 14-day free trial and after that, we have to pay but the cost is reasonable.
I would rate Splunk APM eight out of ten.
Splunk APM requires minimal maintenance and can be monitored by a team of three.
We use Splunk APM to monitor the performance of our applications.
Splunk APM offers end-to-end visibility across our entire environment. We control which types of metrics Splunk APM ingests from incoming requests. For the metrics we do collect, Splunk APM can track each request from its starting point to its endpoint at every stage.
Splunk APM trace analyzer allows us to analyze a request by providing its trace ID. This trace ID gives us a detailed breakdown of how the request entered the system, how many services it interacted with along the way, and its overall path within the system. We can also identify any errors that occurred during the request's processing and track any slowness or latency issues. This information is very helpful for troubleshooting performance problems in our application.
Splunk APM telemetry data has been incredibly valuable. While we faced challenges with Splunk Enterprise, such as the lack of a trace analyzer, Splunk APM's user interface is modern and highly flexible. The wide range of data it provides has significantly improved our incident response times, allowing us to quickly create alerts and adhere to the infrastructure as code principle. Splunk APM also proves beneficial during load testing, contributing to a positive impact on our overall infrastructure performance analysis.
Splunk APM helps us reduce our mean time to resolution. With its fast and accurate alerting system, we can quickly identify the exact location of issues. This pinpoint accuracy streamlines the investigation process, leading to faster root-cause analysis.
Splunk APM has helped us save significant time. We're now spending less time resolving production incidents and analyzing performance data. This focus on Splunk APM allows us to dedicate more time to other areas.
Detectors are a powerful feature. They create signal flow code in a format similar to Splunk APM language. For example, if we select five conditions, the detector can automatically generate the code for that signal flow. This code can then be directly integrated into our Terraform modules, streamlining the creation of detectors using Terraform. This is particularly helpful because our infrastructure adheres to a well-defined practice, and detectors help automate this process.
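As a hedged illustration of the same idea outside Terraform, the sketch below submits a small SignalFlow program as a detector through the SignalFx-era REST API; the realm, token, endpoint, and payload fields are assumptions and should be verified against the current Splunk Observability Cloud documentation.

```python
import requests

# Assumptions: an org access token with API rights and the org's realm; the
# /v2/detector endpoint and field names follow the SignalFx-era API.
REALM = "us1"
API_TOKEN = "YOUR_ORG_TOKEN"

# SignalFlow: average CPU utilization per host, alert when above 90% for 10 minutes.
PROGRAM_TEXT = """
cpu = data('cpu.utilization').mean(by=['host'])
detect(when(cpu > 90, '10m')).publish('cpu_high')
""".strip()


def create_detector() -> dict:
    """Create a detector whose SignalFlow body could just as well live in Terraform."""
    payload = {
        "name": "High CPU utilization (sketch)",
        "programText": PROGRAM_TEXT,
        "rules": [
            {"detectLabel": "cpu_high", "severity": "Warning", "disabled": False},
        ],
    }
    response = requests.post(
        f"https://api.{REALM}.signalfx.com/v2/detector",
        headers={"X-SF-TOKEN": API_TOKEN, "Content-Type": "application/json"},
        json=payload,
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(create_detector())
```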
APM dashboards are another valuable tool. They provide more comprehensive information than traditional spotlights. One particularly useful feature is the breakdown of a trace ID. This breakdown allows us to see the entire journey of a request, including where it originated, any slowdowns it encountered, and any issues it faced. This level of detail enables us to track down the root cause of performance problems for every request.
We currently lack log analysis capabilities in Splunk APM. Implementing this functionality would be very beneficial. With log analysis, we could eliminate our dependence on Splunk Enterprise and rely solely on APM. The user interface design of APM seems intuitive, which would likely simplify setting up log-level alerts. Currently, all log-level alerting is done through Splunk Enterprise, while infrastructure-level alerting has already transitioned to Splunk APM.
The Splunk APM documentation on the official Splunk website could benefit from additional resources. Specifically, including more examples of adapter creation and management using real-world use cases would be helpful. During our setup process, we found the documentation lacked specific implementation details. While some general information was available on public platforms like Google and YouTube, it wasn't comprehensive. This suggests that others using Splunk APM in the future might face similar challenges due to the limited information available on social media. It's important to remember that many users rely on social media for setup guidance these days.
I have been using Splunk APM for 1.5 years.
While Splunk APM occasionally experiences slowdowns, it recovers on its own. Fortunately, these haven't resulted in major incidents because most maintenance is scheduled for weekends, with ample notice provided in advance. We have never experienced any data loss during these slowdowns.
Splunk APM customer support is helpful. They promptly acknowledge requests and provide regular updates. They've been able to fulfill all our information requests so far. However, Splunk APM is a constantly evolving product. This means there are some limitations due to ongoing industry advancements. They are actively working on incorporating customer feedback, such as the CV request. Overall, the customer support is excellent, but the desired features may not all be available yet.
Positive
Previously, we used Grafana, but we faced challenges that led us to switch to Splunk APM. Since then, Splunk has become our primary tool for data analysis. In our experience, Splunk offers several advantages over Grafana. Setting up and using Splunk is significantly easier than Grafana. Splunk provides a user-friendly interface that allows anyone to start working immediately, while Grafana's setup can be more complex. Splunk also boasts superior reliability. Its architecture utilizes a master-slave node structure, with the ability to cluster for redundancy. This ensures that if a node goes down, another available node automatically takes over, minimizing downtime. Ultimately, our decision to switch to Splunk was driven by several factors: user-friendliness, a wider range of features, cost-effectiveness, and its established reputation. Splunk is a globally recognized and widely used tool, which suggests a higher level of trust and support from the industry.
We use Splunk Enterprise and Splunk APM. Splunk APM offers a comprehensive view of various application elements. We primarily migrated to APM to gain application-level metrics. This includes latency issues, which are delays in processing user requests. Splunk APM generates a unique trace ID for each user request. This allows us to track the request from the user to our servers and identify any delays or errors that occur along the way.
Additionally, Splunk APM utilizes detectors to create alerts based on specific metrics. We've implemented alerts for CPU and memory usage, common issues in our Kubernetes infrastructure. We can also track container restarts within the cluster and pinpoint the causes. Another crucial area for us is subscription latency. Splunk APM allows us to monitor this metric and identify any performance bottlenecks. This capability was absent in Splunk Enterprise, necessitating the switch to APM. Furthermore, Splunk APM enables us to track application status codes, such as 404 errors.
Splunk APM facilitates the creation of informative dashboards using collected metrics. Additionally, the Metrics Explorer tool allows us to investigate specific metrics of interest and generate alerts or customized spotlights.
Spotlights are tailored visualizations that track metrics for critical application areas. They can trigger alerts based on unexpected changes, such as a sudden increase in error codes over a set timeframe. This provides a more proactive approach to identifying potential issues compared to traditional detector-based alerts.
Splunk APM empowers us to effectively monitor various metrics during load testing. This includes analyzing memory usage across ten to eleven metrics, tracking container restarts during flow testing, and verifying the functionality of auto scaling mechanisms. The comprehensive visualization capabilities of Splunk APM surpass those of Splunk Enterprise, making it ideal for analyzing large sets of metrics and graphs.
We're currently exploring the integration of an OpenTelemetry agent with Splunk APM. This will enable us to collect and transmit a wider range of data, including application metrics, latency metrics, and basic infrastructure metrics such as CPU, memory, etc.
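As a rough sketch of what that integration might look like on the application side, the Python example below uses the OpenTelemetry metrics SDK to export a request counter and a latency histogram over OTLP to a local collector; the endpoint, metric names, and attributes are assumptions for illustration.

```python
import time

from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

# Assumption: an OpenTelemetry Collector listens for OTLP/gRPC on localhost:4317
# and relays the metrics to the observability back end.
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True),
    export_interval_millis=10_000,
)
provider = MeterProvider(
    resource=Resource.create({"service.name": "checkout-service"}),
    metric_readers=[reader],
)
metrics.set_meter_provider(provider)
meter = metrics.get_meter(__name__)

request_counter = meter.create_counter("app.requests", unit="1", description="Handled requests")
latency_histogram = meter.create_histogram("app.request.latency", unit="ms", description="Request latency")


def handle_request(endpoint: str) -> None:
    """Record one request and how long it took, tagged with the endpoint name."""
    start = time.monotonic()
    # ... real request handling would go here ...
    elapsed_ms = (time.monotonic() - start) * 1000
    request_counter.add(1, {"endpoint": endpoint})
    latency_histogram.record(elapsed_ms, {"endpoint": endpoint})


if __name__ == "__main__":
    handle_request("/checkout")
    provider.shutdown()  # flush the final export before exit
```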
During the initial Splunk deployment, I found that most information available on social media platforms catered to enterprise deployments. Fortunately, many of our new hires had prior Splunk experience, which eased the initial learning curve. Splunk's widespread adoption across industries also meant there was a general familiarity with the tool among the team. Additionally, the comprehensive documentation proved helpful. Overall, the initial rollout went smoothly, though there were some challenges that we were able to resolve.
The Splunk deployment was done on multiple environments. We started with development and then deployed to a staging environment, which sits between development and production. As expected, the development deployment took the longest. The total time for the entire deployment, including my cloud setup, was 2 to 3 weeks. It's important to note that this timeframe isn't solely dependent on Splunk implementation. Other factors can influence the timeline, such as network requests, firewall changes, and coordination with IT teams for license purchases. While the development deployment took longer, promoting Splunk to the staging and production environments was significantly faster. It only took 1 week for each environment.
Our cloud deployment didn't require a consultant, but we used one for our on-premise enterprise deployment, which was a bit more complex.
I would rate Splunk APM 9 out of 10.
The maintenance required is minimal because the cluster deployment helps ensure there is always 1 node working.