
Reviews from AWS customer

7 AWS reviews

External reviews

49 reviews

External reviews are not included in the AWS star rating for the product.


    Kenneth Dozier Jr.

Improves monitoring and observability with actionable alerts

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

We are using Datadog to improve our monitoring and observability so we can hopefully improve our customer experience and reliability.  

I have been using Datadog to build better actionable alerts to help teams across the enterprise. By using Datadog we also hope to improve observability into our apps, and we are taking advantage of this process to improve our tagging strategy so teams can troubleshoot incidents faster, with a much reduced mean time to resolve.

We have a lot of different resources we use like Kubernetes, App Gateway and Cosmos DB just to name a few.

How has it helped my organization?

As soon as we started implementing Datadog into our cloud environment, people really liked how it looked and how easy it was to navigate. We could see more data in our Kubernetes environments than we ever could before.

Some people liked how the logs were color coded so it was easy to see what kind of log you were looking at. The ease of making dashboards has also been greatly received as a benefit. 

People have commented that there is so much information that it takes time to digest it, get used to what you are looking at, and find what you are looking for.

What is most valuable?

The selection of monitors is a big feature I have been working with. Previously with Azure Monitor we couldn't do a whole lot with their alerts. The log alerts can sometimes take a while to ingest. Also, we couldn't do any math with the metrics we received from logs to make better alerts from logs.  

The metric alerts are ok but are still very limited. With Datadog, we can make a wide range of different monitors that we can tweak in real time because there is a graph of data as you are creating the alert, which is very beneficial. The ease of making dashboards has saved a lot of people a lot of time. There are no KQL queries needed to put together the information you are looking for, and the ability to pin any info you see into a dashboard is very convenient.

RUM is another feature we are looking forward to using this upcoming tax season, as we will have a front-row view into what frustrates customers or where things go wrong in their process of using our site. 

What needs improvement?

The PagerDuty integration could be a little bit better. If there was a way to format the monitors to different incident management software that would be awesome. As of right now, it takes a lot of manipulating of PagerDuty to get the monitors from Datadog to populate all the fields we want in PagerDuty.  

I love the fact you can query data without using something like KQL. However, it would also be helpful if there was a way to convert a complex KQL query into Datadog to be able to retrieve the same data - especially for very specific scenarios that some app teams may want to look for.

For how long have I used the solution?

I've used the solution for about two years.

Which solution did I use previously and why did I switch?

We previously used Azure Monitor, App Insights, and Log Analytics. We switched because it was a lot for developers and SREs to switch between three screens to try to troubleshoot, and when you add in the slow load times from Azure, it can take a while to get things done.

What's my experience with pricing, setup cost, and licensing?

I would advise taking a close look at logging costs, man-hours needed, and the amount of time it takes for people to get comfortable navigating Datadog because there is so much information that it can be overwhelming to narrow down what you need.

Which other solutions did I evaluate?

We did evaluate Dynatrace and looked into New Relic before settling on Datadog.

Which deployment model are you using for this solution?

Hybrid Cloud


    Victor Chen1

Good for log ingestion and analyzing logs with easy searchability of data

  • September 19, 2024
  • Review from a verified AWS customer

What is our primary use case?

We use Datadog as our main log ingestion source, and Datadog is one of the first places we go to for analyzing logs. 

This is especially true for cases of debugging, monitoring, and alerting on errors and incidents, as we send traffic logs from K8s, Amazon Web Services, and many other services at our company to Datadog. In addition, many products and teams at our company have dashboards for monitoring statistics (sometimes based on these logs directly, other times we set queries for these metrics) to alert us if there are any errors or health issues.

How has it helped my organization?

Overall, at my company, Datadog has made it easy to search for and look up logs at an impressively quick search rate over a large amount of logs. 

It seamlessly allows you to set up monitoring and alerting directly from log queries which is convenient and helps for a good user experience, and while there is a bit of a learning curve, given enough time a majority of my company now uses Datadog as the first place to check when there are errors or bugs. 

However, the cost aspect of Datadog is tricky to gauge because it's related to usage, and thus, it is hard to tell the relative value of Datadog year to year.

What is most valuable?

The feature I've found most valuable is the log search feature. It's set up with our ingestion to be a quick one-stop shop, is reliable and quick, and seamlessly integrates into building custom monitors and alerts based on log volume and timeframes. 

As a result, it's easy to leverage this to triage bugs and errors, since we can pinpoint the logs around the time that they occur and get metadata/context around the issue. This is the main feature that I use the most in my workflow with Datadog to help debug and triage issues.

What needs improvement?

More log search keywords/tips would help improve Datadog's log dashboard. I recently struggled a lot to parse text from raw line logs that didn't seem to match directly with facets. There should be smart searching capabilities. However, it's not intuitive to learn how to leverage them, and I instead had to resort to a Python script to do some simple regex parsing (I was trying to parse "file:folder/*/*" from the logs and didn't seem to be able to do this in Datadog; maybe I'm just not familiar enough with the logs, but I couldn't easily find resources on how to do this either).
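As a rough illustration of the kind of Python fallback described above, here is a minimal sketch. The review does not show the actual script or log format, so the pattern, the `extract_paths` helper, and the sample lines are all assumptions made up for this example.

```python
import re

# Assumed pattern: "file:" followed by "folder/" and exactly two more path
# segments, i.e. the "file:folder/*/*" shape mentioned in the review.
FILE_PATH = re.compile(r"file:(folder/[^/\s]+/[^/\s]+)")


def extract_paths(lines):
    """Return every folder/<a>/<b> path that follows a 'file:' prefix."""
    found = []
    for line in lines:
        found.extend(FILE_PATH.findall(line))
    return found


# Hypothetical raw log lines, e.g. exported from a log search.
sample = [
    "2024-09-19 12:00:01 INFO upload ok file:folder/a/b.txt size=10",
    "2024-09-19 12:00:02 WARN no path here",
]

print(extract_paths(sample))
```

The capture group keeps only the path, so lines without a matching `file:` token simply contribute nothing to the result.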

For how long have I used the solution?

I've used the solution for 10 months.

What's my experience with pricing, setup cost, and licensing?

Beware that the cost will fluctuate (and it often only gets more expensive very quickly).


    reviewer2543758

Good visibility into application performance, understanding of end-user behavior, and a single pane of glass view

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

The primary use case for this solution is to enhance our monitoring visibility, determine the root cause of incidents, understand end-user behaviour from their point of view (RUM), and understand application performance.

Our technical environment consists of a local dev environment where Datadog is not enabled, plus deployed environments that range from UAT testing with our product org to ephemeral stacks that our developers use to test their code away from their own computers. We also have a mobile app where testing is also performed.

How has it helped my organization?

Datadog has greatly improved our organization in many ways. Some of those ways include greater visibility into application performance, understanding of end-user behavior, and a single pane of glass view into our entire infrastructure.  

Regarding visibility, our organization previously used New Relic, and when incidents or regressions happened, New Relic's query language was very hard to use. End-user behavior in RUM has improved our ability to know what to focus on. Lastly, the single pane of glass view with maneuvering between products has helped us truly understand root causes after incidents.

What is most valuable?

APM has been a top feature for us. I can speak for all developers here: they use it more often than other products. Due to a standard in tracing (even though it is customizable), engineers find it easier to walk a trace than to understand what went wrong when looking at logging.  

Another feature that I find valuable, though it isn't the first one that comes to mind, is Watchdog. I have found that it has been a good source for understanding anomalies and where we (as an organization) may need more monitoring coverage.

What needs improvement?

I am not 100% sure how this could be done, or if it can be, though I've had to do a lot of education to ramp developers up on the platform. This feels like the nature of the sheer growth and number of products Datadog now offers.

When I first started using the Datadog platform, I thought a big pro of the company was that the ramp-up time was much quicker, with no query language to learn. I still believe that to be true when comparing the product to something like New Relic, though with the wide range of products Datadog now offers, it can be a bit intimidating for developers to know where to go to find what they want.

For how long have I used the solution?

I have been using the solution at my current company for almost four years, and have used it at my previous company as well.

Which solution did I use previously and why did I switch?

A while ago, we used New Relic, and we switched due to Datadog being a better product.

What about the implementation team?

We did the implementation in-house.

What's my experience with pricing, setup cost, and licensing?

The value compared to pricing is reasonable, though it can be a bit of a sticker shock to some.

Which other solutions did I evaluate?

We did not evaluate other options. 

Which deployment model are you using for this solution?

Public Cloud


    reviewer1974104

Centralized pipeline with synthetic testing and a customized dashboard

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications. Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge.

Datadog agents on each web host, and native integrations with GitHub, AWS, and Azure, get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

Centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. 

The ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

These features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view.

I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed. 

In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environmental variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution has been very scalable and customizable.

How are customer service and support?

Sales service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux, Windows, or container, cloud or on-prem hosted.

How was the initial setup?

Generally simple, but .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house. 

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

Excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure


    Ajay Thomas

Great features and synthetic testing but pricing can get expensive

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host, and native integrations with GitHub, AWS, and Azure, get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through use of Datadog across all of our apps we were able to consolidate a number of alerting and error tracking apps and Datadog ties them all together in cohesive dashboards. 

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting edge .NET Core with streaming logs all work. 

The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. The centralized pipeline tracking and error logging provides a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed. In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environmental variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution is very scalable, very customizable.

How are customer service and support?

Service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux, Windows, or container, cloud or on-prem hosted.

How was the initial setup?

The setup was generally simple. However, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house. 

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

I'm excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure


    reviewer254673

Good monitoring capabilities, centralizing of logs, and making data easily searchable

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

Our primary use of Datadog involves monitoring over 50 microservices deployed across three distinct environments. These services vary widely in their functions and resource requirements. 

We rely on Datadog to track usage metrics, gather logs, and provide insight into service performance and health. Its flexibility allows us to efficiently monitor both production and development environments, ensuring quick detection and response to any anomalies. 

We also have better insight into metrics like latency and memory usage.

How has it helped my organization?

Datadog has significantly improved our organization’s monitoring capabilities by centralizing all of our logs and making them easily searchable. This has streamlined our troubleshooting process, allowing for quicker root cause analysis. 

Additionally, its ease of implementation meant that we could cover all of our services comprehensively, ensuring that logs and metrics were thoroughly captured across our entire ecosystem. This has enhanced our ability to maintain system reliability and performance.

What is most valuable?

The intuitive user interface has been one of the most valuable features for us. Unlike other platforms like Grafana, as an example, where learning how to query either involves a lot of trial and error or memorization almost like learning a new language, Datadog’s UI makes finding logs, metrics, and performance data straightforward and efficient. This ease of use has saved us time and reduced the learning curve for new team members, allowing us to focus more on analysis and troubleshooting rather than on learning the tool itself.

What needs improvement?

While the UI and search functionality are excellent, further improvement could be made in the querying of logs by offering more advanced templates or suggestions based on common use cases. This would help users discover powerful queries they might not think to create themselves. 

Additionally, enhancing alerting capabilities with more customizable thresholds or automated recommendations could provide better insights, especially when dealing with complex environments like ours with numerous microservices.

For how long have I used the solution?

I've used the solution for five years.

What do I think about the stability of the solution?

We have never experienced any downtime.

Which solution did I use previously and why did I switch?

We previously used Sumo Logic.

Which deployment model are you using for this solution?

Public Cloud


    Reviewer 76

Enhances efficiency with robust alerting and visualization tools

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

Our primary use case for Datadog is to monitor and manage our fully cloud-native infrastructure. We utilize Datadog to gain real-time visibility into our cloud environments, ensuring that all our services are running smoothly and efficiently.

The platform’s extensive integration capabilities allow us to seamlessly track performance metrics across various cloud services, containers, and microservices. 

With Datadog’s robust alerting and visualization tools, we can proactively identify and resolve issues, minimizing downtime and optimizing our system’s performance. This has been crucial in maintaining the reliability and scalability of our cloud-native applications.

How has it helped my organization?

Datadog has significantly enhanced our organization’s operational efficiency and reliability. By providing real-time visibility into our cloud-native infrastructure, Datadog enables us to monitor performance metrics, detect anomalies, and resolve issues swiftly. 

The platform’s robust alerting system ensures that potential problems are addressed before they impact our services, reducing downtime and improving overall system stability. Additionally, Datadog’s comprehensive dashboards and reporting tools have streamlined our troubleshooting processes and facilitated better decision-making.

What is most valuable?

The most valuable feature of Datadog for our organization has been its real-time monitoring capabilities. This feature provides us with instant visibility into our cloud-native infrastructure, allowing us to track performance metrics and detect anomalies as they occur. The ability to monitor our systems in real-time means we can quickly identify and address issues before they escalate, minimizing downtime and ensuring the reliability of our services. 

Additionally, the real-time data helps us make informed decisions and optimize our operations, ultimately enhancing our overall efficiency and performance.

What needs improvement?

While Datadog has been instrumental in enhancing our operational efficiency, there are areas where it could be improved. 

One area is the user interface, which could be more intuitive and user-friendly, especially for new users. 

Additionally, the pricing model can be quite complex and might benefit from more flexible options tailored to different organizational needs. 

For future releases, it would be beneficial to include more advanced machine learning capabilities for predictive analytics, helping us anticipate issues before they occur. 

More third-party tools would also be valuable additions.

For how long have I used the solution?

I've used the solution for six years.

What do I think about the stability of the solution?

Datadog has proven to be a highly stable solution for our monitoring needs. Throughout our usage, we have experienced minimal downtime and consistent performance, even during peak traffic periods. The platform’s reliability ensures that we can continuously monitor our cloud-native infrastructure without interruptions, which is crucial for maintaining the health and performance of our services.

What do I think about the scalability of the solution?

Datadog’s scalability has been impressive and instrumental in supporting our growing cloud-native infrastructure. The platform effortlessly handles increased workloads and scales alongside our expanding services without compromising performance. Its ability to integrate with a wide range of cloud services and technologies ensures that as we grow, Datadog continues to provide comprehensive monitoring and insights.

How are customer service and support?

Our experience with Datadog’s customer service and support has been exceptional. The support team is highly responsive and knowledgeable, providing timely assistance whenever we’ve encountered issues or had questions. 

Their proactive approach to offering solutions and guidance has been invaluable in helping us maximize the platform’s capabilities.

How would you rate customer service and support?

Positive

How was the initial setup?

The setup is straightforward.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

The pricing model can be quite complex and might benefit from more flexible options tailored to different organizational needs.

What other advice do I have?

One area is the user interface, which could be more intuitive and user-friendly, especially for new users.

Which deployment model are you using for this solution?

Public Cloud


    Kevin Palmer

Useful log aggregation and management with helpful metrics aggregation

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

We use Datadog for log aggregation and management, metrics aggregation, application performance monitoring, infrastructure monitoring (serverless (Lambda functions), containers (EKS), standalone hosts (EC2)), database monitoring (RDS) and alerting based on metric thresholds and anomalies, log events, APM anomalies, forecasted threshold breaches, host behaviors and synthetics tests.

Datadog serves a whole host of purposes for us, with an all-in-one UI and integrations between them built in and handled without any effort required from us.

We use Datadog for nearly all of our monitoring and information analysis from the infrastructure level up through the application stack.

How has it helped my organization?

Datadog provides us value in three major ways:

First, Datadog provides best-in-class functionality in many, if not all, of the products to which we subscribe (infrastructure, APM, log management, serverless, synthetics, real user monitoring, DB monitoring). In my experience with other tools that provide similar functionality, Datadog provides the largest feature set with the most flexibility and the best performance.

Second, Datadog allows us to access all of those services in one place. Having to learn and manage only one tool for all of those purposes is a major benefit.

Third, Datadog provides significant connectivity between those services so that we can view, summarize, organize, translate and correlate our data with maximum effect. Not needing to manually integrate them to draw lines between those pieces of information is a huge time savings for us.

What is most valuable?

I use log management and monitors most often.

Log management is a great way for me to identify changes in behavior across services and environments as we make changes or as user behavior evolves. I can filter out excess or unhelpful logs, in part or in full; I can look for trends; and I can group by multiple facets.

Monitors allow me to rest easy knowing that I'll be alerted to unexpected changes in behavior throughout our environments so that I can be proactive without having to dedicate active cycles to watching all facets of our environments.

What needs improvement?

In my four years using the product, the only feature request I, or anyone on my team, has had was the ability to view query parameters in query samples. 

Otherwise, improvements are already released faster than we can give them sufficient time and attention, so I'm very happy with the product and don't have any specific requests at this time.

The cost does add up quickly, so it can be some effort to justify the necessary outlay to those paying the bills. That said, Datadog provides sufficient benefits to warrant our continued use.

For how long have I used the solution?

I've used the solution for four years.

What do I think about the stability of the solution?

In four years of daily use I haven't noticed any periods of downtime.

What do I think about the scalability of the solution?

It's amazing to me how performant Datadog is given how much data we pass to it.

How are customer service and support?

We've opened probably six or eight support tickets in four years of use. In some cases, the problem or question was complex and took some time to resolve. That said, customer support was always able to debug the issue and find a solution for us, so my experience has been very positive.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I've used New Relic, Honeycomb, Grafana, Splunk, Prometheus, Graylog and others.

How was the initial setup?

Given the breadth of configuration options, the initial setup was fairly involved for us. We also use several services and deploy the agent in various ways because we're using traditional servers, serverless, and K8s.

What about the implementation team?

We implemented the solution in-house.

What's my experience with pricing, setup cost, and licensing?

The solution can be pricey if you're using many services and/or shipping lots of data, but in my opinion, the value is greater than the cost, so I would suggest doing an evaluation before making a decision.

Which deployment model are you using for this solution?

Public Cloud


    reviewer902462

Excellent for monitoring, analyzing, and optimizing performance

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

Our primary use case for Datadog is monitoring, analyzing, and optimizing the performance and health of our applications and infrastructure. 

We leverage its logging, metrics, and tracing capabilities to pinpoint issues, track system performance, and improve overall reliability. Datadog’s ability to provide real-time insights and alerting on key metrics helps us quickly address issues, ensuring smooth operations. 

It’s integral for visibility across our microservices architecture and cloud environments.

How has it helped my organization?

Datadog has been incredibly valuable to our organization. Its ability to pinpoint warnings and errors in logs and provide detailed context is essential for troubleshooting. 

The platform's request tracing feature offers comprehensive insights into user flows, allowing us to quickly identify issues and optimize performance. 

Additionally, Datadog's real-time monitoring and alerting capabilities help us proactively manage system health, ensuring operational efficiency across our applications and infrastructure.

What is most valuable?

Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization. This feature helps us quickly identify performance bottlenecks and prioritize improvements. 

Additionally, the ability to filter requests by user email is extremely useful for tracking down user-specific issues faster. It streamlines the troubleshooting process and enables us to provide more targeted support to individual users, improving overall customer satisfaction.

What needs improvement?

The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency. Additionally, the interface can sometimes feel overwhelming, with so much happening at once, which may discourage users from exploring new features. Simplifying the layout or providing clearer guidance could enhance user experience. Any improvements related to query optimization would be highly beneficial, as they would further streamline workflows and boost productivity.

For how long have I used the solution?

I've used the solution for five years.


    Kenneth Dozier

Easy to use with good speed and helpful dashboards

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

We are using Datadog to improve our cloud monitoring and observability across our enterprise apps.  We have integrated a lot of different resources into Datadog, like Kubernetes, App Gateways, App Service Environments, App Service Plans, and other Web App resources. 

I will be using the monitoring and observability features of Datadog. Dashboards are used very heavily by teams and SREs. We really have seen that Datadog has already improved both our monitoring and our observability.

How has it helped my organization?

The ease and speed with which you can create a dashboard have been a huge improvement.

The different types of monitors we can create have been huge, too. We can do so many different things with monitors that we couldn't do before with our alerts. 

Being able to click on a trace or log and drill down on it to see what happened has been great.  

Some have found the learning curve a bit steep. That said, they are coming around slowly. There is just a lot of information to learn to navigate.

What is most valuable?

The different types of monitors have been very valuable. We have been able to make our alerts (monitors) more actionable than we were able to previously.  

Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue. 

RUM is another feature a lot of us are looking forward to. We want to see how it can help us improve our customer experience during tax season.

We hope to enable the code review feature at some point so we can see what code caused the issue.

What needs improvement?

I would like to see the integration between PagerDuty and Datadog improved. The tags in Datadog don't match those in PagerDuty, and we have to make it work. Also, I would like it to be easier to replicate a KQL query in Datadog.

I would like to see the alert communications to email or phones made better so we could hopefully move off PagerDuty and just use Datadog for that. 

There are also a lot of features that we haven't budgeted for yet and I would like for us to be able to use them in the future.

For how long have I used the solution?

I've used the solution for about two years.

Which deployment model are you using for this solution?

Hybrid Cloud