We use Splunk APM to monitor our applications. We integrated it into our systems to enhance our monitoring and observability capabilities, especially for our microservices.
Splunk APM integrates well with Splunk's other observability solutions. Combining logs with application performance monitoring benefits our business in several ways, such as troubleshooting and root cause analysis.
Using these logs with APM, we can correlate performance metrics with log data. That lets us pinpoint the exact cause of issues, such as identifying specific errors in the logs. Having detailed logs alongside performance metrics enables quicker diagnosis and resolution of problems, which helps us minimize downtime and improve system reliability.
Additionally, it improves performance optimization: analyzing historical log data alongside APM metrics gives us detailed insights into long-term trends and lets us make informed decisions about performance improvements and a better user experience, such as error reduction and proactive monitoring.
Splunk has reduced our mean time to resolution by 30%.
If there is any issue in Splunk, we identify it first and look for error messages, whether alerts in the Splunk user interface or entries in the logs, that might indicate what the problem is, and then determine which part of Splunk is affected.
Then we refer to the official Splunk documentation, check the system's health, and review the logs. By following these steps, I can resolve issues with Splunk and keep our monitoring and analytics capabilities effective.
Mainly, I like Splunk APM because it surfaces errors better than other tools. We use the dashboards to monitor our applications; they show us the errors, and we can solve them quickly.
I have used APM but haven't used Trace Analyzer, though I have some knowledge of it, and we would be able to implement it. We have trace log points in Splunk APM to catch errors, and there is a dedicated graph where we can see the red points.
We use OpenTelemetry. OpenTelemetry and Splunk APM are similar in terms of observability and monitoring. We use OpenTelemetry for observability standardization: it allows us to collect traces and metrics, making it easier to work with different monitoring tools, including Splunk APM. It is also more flexible because it lets us instrument our applications without being locked into a specific monitoring vendor.
It supports collecting traces, metrics, and logs from our applications, providing a comprehensive view of application performance and health. That data can be fed into Splunk APM, giving us in-depth analysis and insights about our applications.
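As a rough illustration of that pipeline (a sketch, not this reviewer's actual setup), the OpenTelemetry Python SDK can export traces over OTLP to a local collector, which a Splunk distribution would then forward to Splunk APM. The service name, endpoint, and span attribute below are hypothetical placeholders.

```python
# Minimal OpenTelemetry tracing setup: spans are batched and shipped over
# OTLP to a collector on localhost, which forwards them to the chosen
# backend (Splunk APM in this review's setup).
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Identify the service so traces group correctly in the APM service map.
resource = Resource.create({"service.name": "checkout-service"})  # hypothetical name

provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "12345")  # example span attribute
    # ... application work happens here ...
```

Because the exporter speaks standard OTLP, the same instrumentation works with any OpenTelemetry-compatible backend, which is the vendor flexibility the reviewer describes.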
Splunk APM is a robust tool with many capabilities. There are always areas for potential improvement to enhance its functionality and user experience.
Splunk APM could simplify navigation by streamlining the user interface to make it more intuitive, especially for users new to APM, which would enhance usability. It could also provide more customization options for dashboards and visualizations to help users tailor the platform to their specific needs.
Broader integration with a wider range of third-party tools and platforms would also be beneficial. By focusing on these areas, Splunk APM could enhance its value proposition, improve user satisfaction, and better meet the evolving needs of organizations monitoring their application performance.
I have been using it for a year.
I never had an issue with the stability. It worked fine.
My team has used alternatives to Splunk APM, like Datadog and New Relic.
The initial setup was easy. To fully deploy it, we had to add the SignalFx instrumentation to our applications and just deploy them. It took about 20 minutes. That's it.
We took some help from our teams and my senior manager and also from other teams across our company. We connected and did all this together.
One person can actually handle the deployment, but as junior developers, we took help from our senior manager and others, about three to four people in total.
Splunk is good as it is now. I don't think any changes are required beyond the regular updates and upgrades Splunk APM already receives, like software updates and version upgrades.
These provide more powerful monitoring capabilities and help ensure the system remains reliable, secure, and aligned with organizational needs. Regular updates, performance tuning, and proactive management help maximize the benefits of the Splunk solution.
We saw results soon after deployment. The time to value is short, and that is one reason we use Splunk APM.
Splunk has made our job easier. The dashboards give us the data points we need without any performance delays. It surfaces error issues very clearly and monitors 24/7. It pinpoints the exact cause of issues and helps us troubleshoot them very quickly.
It also benefits IT staff in other teams, like operations, improving efficiency and helping them manage IT environments more effectively. By centralizing logs and search analytics, its powerful capabilities allow IT teams to perform in-depth troubleshooting, identify root causes, and analyze complex issues with ease.
Splunk also provides real-time visibility into IT infrastructure, and we have connected with cross-functional teams to work with Splunk APM. It supports proactive management, enhances security, improves operational efficiency, and facilitates better collaboration across teams.
The pricing is based on several factors, including the scale of deployment. The pricing model typically includes considerations like the number of hosts, features, and capabilities.
Overall, I would rate the solution a nine out of ten.
We use Splunk to monitor devices in the company. We have several cloud groups that monitor the energy companies in the state. The stack has several devices to monitor when there is a problem, and there is a mixture of solutions.
The solution monitors the system in real time. We can find the resources and investigate security incidents. Splunk, along with another solution, AppDynamics, monitors several devices.
We integrate Splunk with a data collection solution that plugs in to collect data at several points in the network and infrastructure. The data is indexed in Splunk and can be visualized in different dashboards. Monitoring for fraud is critical for the company because you have to resolve many infrastructure problems using the information federated in the dashboard.
The company has many systems that customers pay to access. Splunk APM, used alongside AppDynamics, helps find problems in the feed. It reduces the risk of supervising all the devices. I can supervise the flow and simulate the conditions of the repository across several dashboards to show what is happening at the moment.
The dashboards are used mainly to visualize information about the infrastructure, but it isn't easy to construct or use them. We tried to resolve the issue by calling support, but it would be easier if they had an AI co-pilot to identify the problem and help you solve it.
I have been using Splunk APM.
Splunk APM isn't easy to scale because you have to follow the steps and implement best practices, which can be a little awkward.
I rate Splunk support 10 out of 10. We had good documentation, and the support team at Splunk has a lot of experience with code and the tool.
Positive
I haven't had any problems deploying Splunk. When I installed Splunk for the first time, I thought the product was complex because I had to build the solution myself. After working with it for a while, it has become easier each subsequent time.
Splunk APM is a crucial tool because it controls all the systems and solves a lot of problems.
I rate Splunk APM 8.5 out of 10. It's an excellent solution.
We utilize Splunk APM for security purposes, monitoring all transactions within the organization to prevent potential attacks. Additionally, we leverage Splunk APM to analyze application logs, gaining insights into application behaviour and facilitating a reduction in Mean Time To Resolution should any issues arise in the production environment.
OpenTelemetry provides more accurate information about an application by combining views from the customer perspective, infrastructure metrics, and application-specific data. This holistic view enables full telemetry observability, allowing us to analyze and strategize effectively for our company or clients.
Once configured correctly, the analysis and reporting Splunk APM provides are better than those of other APM tools. Once the correct fields are defined, we can create different report dashboards.
Splunk isn't an ideal tool for application performance management due to the extensive setup required. It necessitates various configurations to gather diverse information from applications, networks, or other sources. Creating the right tables and defining the appropriate fields to extract comprehensive data involves a significant amount of setup within the tool. Managing this process can be quite challenging. However, once configured, the collected information is invaluable, although not easily manageable.
Splunk falls short compared to other APM tools such as AppDynamics or Datadog. It does not collect online information in real time and relies heavily on log files. Unlike Datadog, which collects real-time application behaviour data like CPU, memory, load, and response time, Splunk requires additional configuration to obtain similar information. This makes using Splunk for APM purposes significantly more difficult compared to the automatic data collection capabilities of AppDynamics or Datadog.
I have been using Splunk APM for more than a decade.
Splunk APM lacks scalability, requiring the administrator to constantly monitor or create specific alerts to ensure sufficient disk space, CPU, and memory for data collection and transaction processing. This results in a tool that is challenging to manage and costly to maintain.
Splunk support is responsive and provides quick resolutions when tickets are opened. Their service has left a positive impression on me.
Positive
The initial deployment is complex, requiring the definition of the switch, storage, correct host, and working with certification. This necessitates at least one expensive specialist, costing approximately $5,000 per month to hire and work with our team.
Splunk APM is expensive. Even before we begin, we need substantial infrastructure investment to collect comprehensive logs. For example, to gather log data, we must create specific tables in Splunk, starting at 50 gigabytes. In a cloud environment, this storage requirement becomes very costly.
I would rate Splunk APM six out of ten.
Cisco recently acquired Splunk, and its roadmap for the coming year includes incorporating aspects of Splunk into AppDynamics. Cisco's intention behind combining these two tools is to showcase its commitment to OpenTelemetry and comprehensive observability to the market and its customers.
I use the solution in my company primarily for distributed tracing and metrics troubleshooting. I use the tool to troubleshoot incidents and find the root cause of errors when something goes wrong. I also personally use it to get a developer's understanding of what is going on in my application. Sometimes you add a library, or a new version of a library, to your application, and that library also makes calls somewhere. Splunk APM's monitoring can show you that you are now making a call you never made in the prior version of the library. In these cases, which you cannot see just by looking at the application code from the outside, the tracing part traces everything, down to the lowest layers.
The main benefit I have noticed is reduced time to resolve incidents. The mean time to resolve improves because you see the connections on the graph in Tag Spotlight, so it is easier to pinpoint who is responsible for an incident, especially in a larger organization where teams write services that must talk to services from other teams, rather than handing off incident resolution from one team to another. With the product in place, you can often find the problem much more quickly from the first instance of it occurring. The tool helps keep your sites up more of the time, and when a site does go down, it helps find the root cause and get it back up as fast as possible.
The most valuable feature of the solution, and my favorite, is always Tag Spotlight, especially the way it slices and dices all of Splunk APM's traces by span attributes.
I like the tool because it looks at a whole set of traces in aggregate, which means it can find statistical similarities between different traces. Often you will find traces that show an error and share some other common attribute, which is much more apparent in Tag Spotlight than in an overall metric. I like Tag Spotlight because it is one of the simplest features to use.
Seeing the connections on the graph in Tag Spotlight helps pinpoint the root causes of issues, which improves the mean time to resolve, or MTTR. I don't personally have metrics for MTTR; I am more the implementer, making certain that all the data is going in and handling the debugging part. I am not among the people who track the tool's MTTR.
In our company's case, we have reasonably good metrics for mean time to detect. I can't give even a rough number, so I don't know for sure, but my guess is that we already detect problems reasonably well. We figure out that there is some problem; we just don't know where it is. So if there is an improvement, it is mostly in mean time to resolve. For mean time to detect, our existing metrics are probably sufficient, and once a problem is detected, Splunk APM makes it much easier to shorten the resolution time.
The tool has improved our organization's business resilience. It helps us avoid downtime and keep things up and running. The faster you get web pages working again, the more people can do the things they want to do, such as trade players on their NFL Fantasy teams. In general, that produces a better business result.
In our company's case, we have some very high-throughput services that might receive 10,000 requests per second. Currently, Splunk APM and Splunk Observability want you to send every single span for every single one of those 10,000 requests per second. That may give you all the data on the back end, but sending it involves a lot of CPU, memory, and network cost. It would be nice if there were an easier way to send only a sample of my traces, say 5 or 10 percent, and have Splunk extrapolate on the back end: with 10 percent of traces, the real numbers are roughly ten times what you see, with a plus or minus margin of error. I am okay with that margin of error because, at a high enough request rate, real problems will still show up even in a smaller sample. The process is like political polling: you don't call all 150,000,000 people in the US and ask whom they are going to vote for; you take a sample of maybe 10,000 and extrapolate your findings to the rest. I feel the same should apply to tracing in Splunk APM.
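As a sketch of the head-based sampling being asked for here, the standard OpenTelemetry Python SDK already ships a probability sampler; the part the reviewer wants beyond this is for the backend to extrapolate the sampled counts. The 10 percent ratio below simply mirrors the figure in the example above.

```python
# Keep roughly 10% of new traces at the source; ParentBased makes child
# spans follow the root span's decision, so a trace is kept or dropped
# as a whole rather than arriving half-sampled.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

sampler = ParentBased(root=TraceIdRatioBased(0.10))
provider = TracerProvider(sampler=sampler)
# Exporters attach to this provider exactly as in the earlier sketch; only
# about one request in ten now produces spans, cutting CPU, memory, and
# network cost proportionally.
```

With sampling like this in place, a backend would need to multiply observed counts by the inverse of the sample rate (ten here) to estimate true request volume, which is the extrapolation step the reviewer says is missing today.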
I have been using Splunk APM for two years.
I really haven't noticed anything going wrong with the tool's stability, and I haven't seen any downtime. I don't know if my company is measuring stability itself, but at least for me, it is a pretty solid solution.
There is one issue with the tool's scalability. Our company is fairly big in terms of the number of containers we have, and we can run very large clusters. When you look at some of the charts, they will say something like 30,000 time series, limit reached, cannot show more, or state that the data may not be complete. For me, that is a problem I would like to see fixed. I have spoken to Splunk's team about it, and they told me they recognize the issue and that other people have mentioned the same problem.

Once you see that scalability warning triangle, you have to realize that you cannot trust any of the numbers in the chart, because it is not a complete data set. I want the tool either to tell me plainly that it can't show me the numbers or to find some way to show them in a more summarized view. It asks you to filter things down, but it would be nice if it offered specific suggestions about what to filter on to get to a more reasonable number. In some cases, we just need a number: we have 100,000 containers, and if I want to know how many are running, the back end currently needs to know how many different time series there are. It just says the 30,000 limit has been reached, and I can't tell whether that means 100,000, 120,000, or 80,000 containers.
The technical support team for the solution is good for our company. My company has a weekly meeting with Splunk's sales support team, and if there are any issues, we bring them up for discussion. I have seen that the technical support team is super responsive.
My company has its own internal solution, which was built ten to fifteen years ago and has progressed over time, but it only ever supported metrics and events, not tracing. In short, it did not cover the Splunk APM-related functionality, which is a big change that makes a difference for us.
The product's deployment phase is good and very easy because it is done with OpenTelemetry for the most part. The deployment is not some custom process where you have to put a particular company's proprietary agent on every single host; it is very easy to follow OpenTelemetry's models. Splunk is a very big contributor to OpenTelemetry, and I value that. It is one of the reasons I recommend using Splunk as a backend provider. In my company, we would rather be an OpenTelemetry-compliant organization than tie ourselves to other vendors.
I can't speak to the tool's ROI; I am paid to use it rather than being the one who spends money on the product.
I don't have much insight into the costs and licensing attached to the tool. I am an engineer and developer, not the person who writes the checks in the company. I know that my company has a Splunk Enterprise Security license, which is used for logging and even for Splunk Observability.
I think the tool has the best trace aggregation features compared to what I have seen in other products, and Tag Spotlight is a good example of that. A lot of other products support tracing, but they show one trace at a time. I can deep dive into one trace at a time, but what I want to find is commonality across traces, and for that, I would give the tool a high grade. I also rate it highly for its very good Kubernetes integration: with a lot of data, you can see which Kubernetes host a workload is running on, switch between them, and see both the application metrics and the actual infrastructure metrics. Seeing it all together can be very useful.
I rate the tool a nine out of ten.
Splunk Infrastructure Monitoring reduces our mean time to resolve. We are more proactive than reactive. I would be very confident to say that there is about a 25% reduction in time. We get things way quicker than when we were just doing it reactively.
It has the ability to identify and solve problems in real time. It saves time.
There is no one feature that stands out more than the others. We use a little bit of everything. When we started using it, it was new and fresh and we did not know it exactly, so we just started gathering everything, and we did not end up doing anything differently. All of the features we use have had an effect on our monitoring. Everything is very effective.
We never had any issues with the type of use cases we have. We did not need more advanced capabilities, though I know that, in general, everything can be improved. There are tiny tweaks that could be made, whether to make it look better or give it a different flow than it has right now, but it works well for what we use it for.
I have been using Splunk Infrastructure Monitoring for two to three years.
It is stable.
It is scalable. As we continue to grow and expand, the stability and the scalability are there.
They have been very helpful whenever we have had any issues. Only once or twice did they not know the answer. That does happen; we are all human, but their support is the best you can get.
Positive
I got onto the team when we started using it, so I am not sure what we were using before.
I would rate Splunk Infrastructure Monitoring a ten out of ten.
We have a lot of applications that we monitor. We have a lot of hardware that runs on VMware. We monitor all of that as well.
Dashboards have been helpful because people can go and look for themselves how their systems are running. The requests for us to go look at something have gone down because people can go and do it themselves.
It is important for us that Splunk Infrastructure Monitoring has end-to-end visibility. Developers and similar teams can look at and troubleshoot any kind of issue quickly.
Splunk Infrastructure Monitoring has helped reduce our mean time to resolve, but I do not know how much. We just help as needed, but for the most part, it is just the teams going in there and looking at things themselves.
Splunk Infrastructure Monitoring has helped improve our organization’s business resilience.
Different teams can see a lot of different aspects of what is going on. They can see network traffic. They can see applications, and they can see hardware peaks and performances. They can see everything they need.
We could see the value of Splunk Infrastructure Monitoring within a couple of weeks of implementing it.
Dashboards help the application support teams to have a quick look at how their systems are running. It helps other teams as well.
They could add integrations with a few more products.
They could also update some of the dashboards that are in there now.
It is pretty good in terms of the ability to predict, identify, and solve problems in real-time, but there is always room for improvement.
I am in a new role. I have been there for two months. That is as long as I have been using it.
It is very stable. It is good.
Its scalability is great.
It is very good. I would rate them a nine out of ten. They are usually pretty helpful and knowledgeable.
Positive
We have it on-prem, and we also have a cloud instance. Our cloud provider is AWS. We do not monitor multiple cloud environments.
Deploying it was pretty straightforward. We just had to make sure that we were getting the logs right and setting the apps right. That was pretty much it.
We have seen an ROI in terms of man-hours and less work for everyone.
I have always used Splunk.
I would rate Splunk Infrastructure Monitoring a ten out of ten. It is great. It is much better than a lot of other products, so it is definitely up there.
Splunk Infrastructure Monitoring provides end-to-end visibility into our cloud-native environments. It is very important for us.
Splunk Infrastructure Monitoring has helped reduce our mean time to resolve.
It is digitalized. It has been beneficial for our IT infrastructure.
The security could be better.
I have been using Splunk Infrastructure Monitoring for 11 years.
I usually use the community site. I find that helpful.
We did not use any other solution previously.
We set it up ourselves.
I would rate Splunk Infrastructure Monitoring a ten out of ten.
My customers used the solution for application performance, uptime, and networking.
Splunk Infrastructure Monitoring has helped our customer's organization by making troubleshooting easier. The solution helped them have a centralized place where they could dig in across multiple other tools and consolidate all the information in one place.
Splunk Infrastructure Monitoring provided our customers with visibility into their overall infrastructure. They could quickly start identifying where the problems were coming from. If something was going sideways, they could more easily target the specific pathways.
One of our customers was on-premises. The other was a hybrid with on-premises and private cloud.
I was on a team helping one customer build a brand-new monitoring solution, where the value was instantaneous. Another team had gotten the tool a while ago and weren't sure what to do with it, so we came in and helped them over a six-week engagement. We pivoted them from feeling like they weren't getting much value to getting good value. It was more of a learning-curve situation.
Splunk's unified platform has helped our customers consolidate networking, security, and IT observability tools. The company where I helped build the brand-new monitoring solution had probably a dozen separate, stand-alone, siloed tools that could not talk to each other.
Instead of logging on to 12 different places to check each tool individually, Splunk Infrastructure Monitoring helped consolidate everything into a single location for viewing. We didn't get them to the point where they were ready to fully decommission the other systems.
They were going to decommission 12 systems on the six-month game plan. By now, they would have realized the cost savings. It would have been a multimillion-dollar savings for them.
The customer with 12 separate systems was all on-premises. Part of our other customer's footprint was in AWS, a combination of cloud and on-premises, and it was incredibly easy for them to monitor multiple cloud environments using Splunk Infrastructure Monitoring.
The solution provided a single pane of glass: they didn't have to log into multiple places and could see everything in a single location. You can develop dashboards that give you cross-platform visibility, which is a huge win.
The wide variety of logging formats makes log onboarding difficult. Over the years, Splunk has done various things to make it easier, so I want to give them props for that. However, the reality is that every vendor has its own logging format, and some vendors have multiple log formats because they change their own products over time.
They even have different log formats for different products within their own suites, and the lack of an industry standard makes it chaotic. Splunk is probably the best product out there in terms of how it handles this, but it's not perfect yet. They need to keep pushing that cutting edge and trying to improve it. I have no idea how they could do that because they're trying to wrangle chaos, and it's hard.
I have been using Splunk Infrastructure Monitoring for two years.
I think Splunk Infrastructure Monitoring is a solid product from an infrastructure perspective. I haven't seen any bugs in the tool. Like many things with Splunk, everybody knows there will be patches when there's a core upgrade. However, that's more with Splunk Core and not specifically the Splunk Infrastructure Monitoring part.
The solution's scalability is wonderful. I've worked with customers as small as 25 gigs a day, which is tiny, all the way up to close to a petabyte a day. You have to make sure you scale the tool intelligently, but it's more of a budgetary constraint than a technical one. The solution handles the big ones beautifully if you have the budget to have the needed hardware.
Splunk's technical support has significantly improved in the last year. The support went through a rough patch about a year and a half ago. I had to coerce customers to use it because it was really bad there for a while. Splunk's support has vastly improved recently, and I hope it continues to improve.
Those people who changed the attitude, mindset, and processes need all the accolades because it's so much better than it was. Unfortunately, that does mean that it was really bad at one point.
Splunk's technical support still has some room for improvement in certain areas. Mostly, you can tell the more junior people who just read off of a script and really don't know where to go. I always introduce myself as a consultant to let the support person know that I have already done the basic introductory troubleshooting, and they can skip the first ten pages in their script.
Some frontline people on Splunk's support team are wonderful and clearly have more experience. However, it is still obvious that they occasionally bring in somebody brand new who is a little lost.
I rate the technical support seven and a half to eight out of ten.
Positive
I've worked with Core Splunk as a consultant for seven years and was a customer for seven years before that. So I've seen it all: the good, the bad, the ugly, and everything in between. Usually, the actual building of Splunk is super easy because I've done it so many times. Every customer's environment is unique in terms of how to get the data.
It's more about navigating the customer's local politics and archaic technical debt. Somebody thought a certain architecture was a good idea ten years ago, but today it doesn't make any sense whatsoever. Wrangling customer chaos is hard, but the Splunk piece is usually easy.
There's always room for improvement, but Splunk Infrastructure Monitoring is a solid product overall. It definitely helps customers who have a lot of legacy systems that don't work well together.
Overall, I rate the solution an eight out of ten.
We use the solution to do a lot of email checking. We also use the tool to monitor different embassies, server IPs, and some of the teams.
Splunk Infrastructure Monitoring has helped our organization tremendously. We onboarded Splunk four years ago, and 30 to 40 contractors use it daily. The solution has helped not just a small organization like ours but the whole DOS (Department of State).
The solution monitors attacks or unauthorized access to the information we want to protect. There is a dashboard called ISSO that monitors pretty much everything worldwide. We also monitor almost 300 embassies and consulates.
The solution's machine learning deployment is hard and should be made user-friendly. Even if a team doesn't have a data scientist, they should be able to use the machine learning toolkit for monitoring purposes. The solution should include more algorithms and SPL commands that people can use.
I have been using Splunk Infrastructure Monitoring for four months.
We haven’t faced any issues with the solution’s stability.
Splunk Infrastructure Monitoring is highly scalable. We were able to do monitoring and some of the advanced analytics.
I have not contacted Splunk's technical support. We have contacted our account manager for issues, and she's been awesome.
We have different vendors who do deployments, which is different for the government than regular businesses.
We have seen a return on investment with Splunk Infrastructure Monitoring regarding the kind of threats we can identify.
Splunk Infrastructure Monitoring is an expensive solution.
Our organization monitors multiple cloud environments using Splunk Infrastructure Monitoring, which works well. This is the only tool we use, and we aren't considering moving or having additional tools.
It is important for our organization that Splunk Infrastructure Monitoring has end-to-end visibility into our cloud-native environments. Our job is critical and very sensitive, so having end-to-end visibility is really helpful.
Splunk Infrastructure Monitoring has helped reduce our mean time to resolve. Looking at the solution's dashboards has helped tremendously because we don't have to look at the individual index or events.
Our business is different from that of a private organization, and Splunk Infrastructure Monitoring has helped improve our organization's business resilience. The machine learning toolkit allows us to do clustering, and we have a couple of deployments on the clusters. That has helped cluster different events based on their critical or security threats.
We have seen time to value using Splunk Infrastructure Monitoring.
Splunk's unified platform has helped consolidate networking, security, and IT observability tools. We don't have to integrate Splunk with a different tool and worry whether those two will integrate. Having everything in one platform helps us create dashboards, alerts, and monitoring tools in one place.
Overall, I rate the solution an eight or nine out of ten.
We are monitoring our servers and their health. We are monitoring their functionality and supporting the Kubernetes platform.
Our team supports multiple different projects. They all have their own clusters and ways of operating, but we just use one Splunk Infrastructure Monitoring system.
Splunk Infrastructure Monitoring has helped improve our organization’s business resilience.
I have primarily used it to go back into the past and understand why something happened. It provides enough information to do research and figure things out.
One thing I recently ran into is that logs on the server usually get gzipped after they are rotated. We found that we were not monitoring some things, so we had to go back and pull the old logs in. Right now, it pulls one file at a time and untars or unzips it, so I cannot look at the entire history at once. There can be an improvement in that area.
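A hedged sketch of the workaround this implies: walk every rotated, gzipped log file and replay each line into Splunk's HTTP Event Collector so the whole history becomes searchable at once. The host, token, and log path are placeholders, not details from this review, and real code would batch events rather than posting line by line.

```python
# Backfill rotated .gz logs into Splunk via the HTTP Event Collector (HEC).
import glob
import gzip

import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # hypothetical host
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # hypothetical token

def send_rotated_logs(pattern: str = "/var/log/myapp/app.log.*.gz") -> None:
    # Sorted glob keeps events in roughly chronological rotation order.
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", errors="replace") as fh:
            for line in fh:
                # One HEC event per log line; a production script would
                # batch many events into a single POST body.
                requests.post(
                    HEC_URL,
                    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
                    json={"event": line.rstrip("\n"), "source": path},
                    timeout=10,
                )

if __name__ == "__main__":
    send_rotated_logs()
```

The HEC endpoint and `Authorization: Splunk <token>` header are the standard Splunk ingestion API; everything else here is an illustrative assumption about how such a backfill could be scripted.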
I have been using Splunk Infrastructure Monitoring for four years.
It is stable.
About a year ago, we added another 600 servers and scaled up. We are getting more in the next year or later this year. It works smoothly.
They are good. I have a ticket open now. I told them to go ahead and close it because we thought it was a hardware issue, but they said that they would keep the case open till the hardware replacement to see if the issue goes away. That was pretty nice.
All of our hardware is HPE-based. We rely mostly on OneView, but it does not give us the service aggregation and other things that Splunk Infrastructure Monitoring is giving us.
One of the gentlemen from another team came to ours. He is very knowledgeable about Splunk, so he helped with the implementation.
All of our servers are RHEL-based.
A different group within our organization had Splunk, and they liked it, so we just went with Splunk.
I would rate Splunk Infrastructure Monitoring a ten out of ten.