Helpful support, with centralized pipeline tracking and error logging

What is our primary use case?
Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing, and alerting.
How has it helped my organization?
Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards.
What is most valuable?
The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.
Synthetic testing is great, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.
What needs improvement?
While the documentation is very good, some areas require close reading to pick up the key details, and in some cases the screenshots no longer match the text after updates are made.
I spent longer than I should have trying to figure out how to correlate logs to traces, mostly related to environment variables.
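For readers hitting the same wall, log-to-trace correlation usually comes down to a handful of tracer environment variables. This is a minimal sketch following Datadog's unified service tagging convention; the service name and version are illustrative, and the exact variables supported can vary by tracer and runtime, so verify against the docs for your language.

```shell
# Hypothetical service configuration -- names follow Datadog's unified
# service tagging; values here are placeholders for illustration.
export DD_ENV=production        # environment tag shared by logs and traces
export DD_SERVICE=checkout-api  # service name used to join logs and traces
export DD_VERSION=1.4.2         # version tag for deploy correlation
export DD_LOGS_INJECTION=true   # ask the tracer to inject trace IDs into logs
```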
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
We have been impressed with the uptime.
What do I think about the scalability of the solution?
It's scalable and customizable.
How are customer service and support?
Support is helpful. They help us tune our committed costs and alert us when we start spending out of the on-demand budget.
Which solution did I use previously and why did I switch?
We used a mix of SolarWinds, UptimeRobot, and GitHub actions. We switched to find one platform that could give deep app visibility.
How was the initial setup?
Setup is generally simple. .NET profiling of IIS, and aligning logs to traces and profiles, were the main challenges.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
There has been significant time saved by the development team in terms of assessing bugs and performance issues.
What's my experience with pricing, setup cost, and licensing?
I'd advise others to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost.
Which other solutions did I evaluate?
New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.
What other advice do I have?
We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.
Consistent, centralized service for varied cloud-based applications
What is our primary use case?
The current use case for Datadog in our environment is observability. We use Datadog as the primary log ingestion and analysis point, along with consolidation of application and infrastructure metrics across cloud environments and real-time alerting on issues that arise in production.
Datadog integrates with all aspects of our infrastructure and applications to provide valuable insights into containers, serverless functions, deep log analysis, virtualized hardware, and cost optimization.
How has it helped my organization?
Datadog improved our observability layer by creating a consistent, centralized service for all of our varied cloud-based applications. All of our production and non-production environment applications and infrastructure send metrics directly to Datadog for analysis and determination of any issues that would need to be looked at by the Infrastructure, Platform and Development teams for quick remediation. Using Datadog as this centralized Observability platform has enabled us to become leaner without sacrificing project timelines when issues arise and require triage for efficient resolution.
What is most valuable?
All of Datadog's features have become valuable tools in our cloud environments.
Our primary alerts, based on metrics and synthetic transactions, are the most used and relied upon for decreased MTTA/MTTR across all of our platforms. This is followed by deep log analysis that enables us to quickly and easily get to a preliminary root cause that someone on the infrastructure, platform or development teams can take and focus their attention on the precise target that Datadog revealed as the issue to be remediated.
What needs improvement?
Two areas could use improvement or new features to add value. The first is a more robust SIEM, including container scanning, to rival competing products so we would not need to extend functionality to another third-party provider. The second is expanded alerting: direct SMS notifications, on-call rotation scheduling, and the like could replace the need for an external third-party integration.
For how long have I used the solution?
I've been a Datadog user for almost ten years.
What do I think about the stability of the solution?
Datadog is very stable, and we've only come across a few items that needed to be addressed quickly when there were issues.
What do I think about the scalability of the solution?
Scalability is very favorable, aside from cost/budget, which limits the scalability of this platform.
How are customer service and support?
Both customer service and support need a little work, as we have had a number of requests/issues that were not addressed as we needed them to be.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Being an observability SME, I have used many native and third-party solutions, including Dynatrace, New Relic, CloudWatch, and Zabbix. As previously mentioned, Datadog provides a superior platform for centralizing and consolidating our observability layer. Switching to Datadog was a no-brainer when most other solutions either didn't provide the same maturity of functions or didn't have them available at all.
How was the initial setup?
The initial setup was very straightforward, and the integrations were easily configured.
What about the implementation team?
We implemented Datadog in-house.
What was our ROI?
For the most part, Datadog's ROI is quite impressive when you consider all of the features and functions that are centralized on the platform. It doesn't require us to purchase additional third-party solutions to fill in the gaps.
What's my experience with pricing, setup cost, and licensing?
The setup was dead simple once the cloud integrations and agent components were identified and executed. Licensing falls into our normal third-party processes, so it was a familiar feeling when we started with Datadog. Cost is the only outlier when it comes to a perfect solution. Datadog is expensive, and each add-on drives that cost further into the realm of requiring justifications to finance expanding the core suite of features we would like to enable.
What other advice do I have?
They should provide more inclusive pricing, or an "all you can eat" tier that includes all relevant features, as opposed to individual cost increases. That would let Datadog become more valuable and replace even more third-party solutions that have a lower cost of entry.
Easy, more reliable, and transparent monitoring
What is our primary use case?
We use the solution to monitor and investigate issues with production services at work. We periodically review the service catalog view for the various applications, and I use it to identify any anomalies with service metrics, any changes in user behavior evident via API calls, and/or spikes in errors.
We use monitors to trigger alerts for on-call engineers to act upon. The monitors have set thresholds for request latency, error rates, and throughput.
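The threshold logic those monitors encode can be sketched locally. This is an illustrative model only; the metric names and limit values below are assumptions, not our actual monitor definitions, and a real Datadog monitor is configured as a query in the platform rather than in application code.

```python
# Illustrative thresholds -- ("max", x) alerts above x, ("min", x) below x.
THRESHOLDS = {
    "latency_p95_ms": ("max", 500.0),  # request latency ceiling
    "error_rate_pct": ("max", 1.0),    # error-rate ceiling
    "throughput_rps": ("min", 10.0),   # alert on a traffic drop
}

def breached(metrics):
    """Return the metric names that would page the on-call engineer."""
    alerts = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported this interval
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            alerts.append(name)
    return alerts

print(breached({"latency_p95_ms": 620.0, "error_rate_pct": 0.4, "throughput_rps": 42.0}))
# -> ['latency_p95_ms']
```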
We also use automated rules to block bad actors based on request volume or patterns.
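Volume-based blocking of this kind boils down to a sliding-window counter per client. The sketch below shows the idea in plain Python; the window size and request limit are assumptions for illustration, and in practice the rule lives in Datadog/our edge layer, not in application code.

```python
import time
from collections import defaultdict, deque

# Illustrative limits -- not our production values.
WINDOW_S = 60        # sliding window length in seconds
MAX_REQUESTS = 100   # allowed requests per client per window

_hits = defaultdict(deque)

def allow(client_ip, now=None):
    """Record one request; return False once the client exceeds the limit."""
    now = time.monotonic() if now is None else now
    q = _hits[client_ip]
    while q and now - q[0] > WINDOW_S:
        q.popleft()  # drop hits that fell out of the window
    q.append(now)
    return len(q) <= MAX_REQUESTS
```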
How has it helped my organization?
Datadog has made setting up monitors easier, more reliable, and more transparent. This has helped standardize our on-call process and set all of our on-call engineers up for success.
It has also standardized the way we evaluate issues with our applications by encouraging all teams to use the service catalog.
It makes it easier for our platforms and QA teams to get other engineering teams up to speed with managing their own applications' performance.
Overall, Datadog has been very helpful for us.
What is most valuable?
The service catalog view is very helpful for periodic reviews of our application. It has also standardized the way we evaluate issues with our applications. Having one page with an easy-to-scan view of app metrics, error patterns, package vulnerabilities, etc., is very helpful and reduces friction for our full-stack engineers.
Monitors have also been very valuable when setting up our on-call processes. It makes it easy to set up and adjust alerting to keep our teams aware of anything going wrong.
What needs improvement?
Datadog is great overall. One thing to improve would be making it easier to see common patterns across traces. I sometimes end up in a trace but have a hard time finding other errors or requests that share common features with it. This could be easier to get to, though in that case it's partly an education issue.
Another thing that could be improved: the service list page sometimes refreshes slowly, and I accidentally click the wrong environment because the sort order updates late.
For how long have I used the solution?
I've used the solution for about a year.
What do I think about the stability of the solution?
It is very stable. I have not seen any issues with Datadog.
What do I think about the scalability of the solution?
It seems very scalable.
How are customer service and support?
I've had no specific experience with technical support.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We used Honeycomb before. We switched since Datadog offered more tooling.
How was the initial setup?
Each application has been easy to instrument.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
Engineers save an unquantifiable amount of time by having one standard view for all applications and monitors.
What's my experience with pricing, setup cost, and licensing?
I am not exposed to this aspect of Datadog.
Which other solutions did I evaluate?
We did not evaluate other options.
Very good custom metrics, dashboards, and alerts
What is our primary use case?
Our primary use case for Datadog involves utilizing its dashboards, monitors, and alerts to monitor several key components of our infrastructure.
We track the performance of AWS-managed Airflow pipelines, focusing on metrics like data freshness, data volume, pipeline success rates, and overall performance.
In addition, we monitor Looker dashboard performance to ensure data is processed efficiently. Database performance is also closely tracked, allowing us to address any potential issues proactively. This setup provides comprehensive observability and ensures that our systems operate smoothly.
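The pipeline health numbers described above (freshness and success rate) can be sketched as a small computation over run records. The field names and the two-hour freshness SLO here are assumptions for illustration; in our setup the equivalent values are charted from AWS-managed Airflow metrics in Datadog.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(hours=2)  # illustrative SLO, not our real target

def pipeline_health(runs, now):
    """runs: list of dicts with 'finished_at' (datetime) and 'success' (bool)."""
    succeeded = sum(1 for r in runs if r["success"])
    success_rate = succeeded / len(runs) if runs else 0.0
    # Data is "fresh" if the most recent successful run is within the SLO.
    last_ok = max((r["finished_at"] for r in runs if r["success"]), default=None)
    fresh = last_ok is not None and now - last_ok <= FRESHNESS_SLO
    return {"success_rate": success_rate, "data_fresh": fresh}

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    {"finished_at": now - timedelta(hours=1), "success": True},
    {"finished_at": now - timedelta(hours=3), "success": False},
]
print(pipeline_health(runs, now))
# -> {'success_rate': 0.5, 'data_fresh': True}
```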
How has it helped my organization?
Datadog has significantly improved our organization by providing a centralized platform to monitor all our key metrics across various systems. This unified observability has streamlined our ability to oversee infrastructure, applications, and databases from a single location.
Furthermore, the ability to set custom alerts has been invaluable, allowing us to receive real-time notifications when any system degradation occurs. This proactive monitoring has enhanced our ability to respond swiftly to issues, reducing downtime and improving overall system reliability. As a result, Datadog has contributed to increased operational efficiency and minimized potential risks to our services.
What is most valuable?
The most valuable features we’ve found in Datadog are its custom metrics, dashboards, and alerts. The ability to create custom metrics allows us to track specific performance indicators that are critical to our operations, giving us greater control and insights into system behavior.
The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues. Additionally, the alerting system ensures we are promptly notified of any system anomalies or degradations, enabling us to take immediate action to prevent downtime.
Beyond the product features, Datadog’s customer support has been incredibly timely and helpful, resolving any issues quickly and ensuring minimal disruption to our workflow. This combination of features and support has made Datadog an essential tool in our environment.
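Custom metrics like the ones described above reach the local Datadog agent as UDP datagrams in the DogStatsD wire format, so they can be emitted without any client library. The sketch below builds and sends one such datagram; the metric name and tags are illustrative, and 8125 is the agent's conventional default port.

```python
import socket

def dogstatsd_line(name, value, metric_type="g", tags=()):
    """Format one metric in the DogStatsD wire protocol, e.g. 'name:1|g|#a:b'."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(tags)
    return line

def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to the local Datadog agent (default port assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(line.encode("utf-8"), (host, port))

# e.g. a gauge tracking queue depth, tagged by env and service (illustrative names)
send(dogstatsd_line("checkout.queue_depth", 42, "g", ("env:prod", "service:checkout")))
```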
What needs improvement?
One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization. These metrics are critical for understanding the performance and resource usage of our Airflow infrastructure, and having them directly in Datadog would provide a more comprehensive view of our system’s health. This would enable us to diagnose issues faster, optimize resource allocation, and improve overall system performance. Including these metrics in Datadog would greatly enhance its utility for teams working with AWS-managed Airflow.
For how long have I used the solution?
I've used the solution for four months.
What do I think about the stability of the solution?
The stability of Datadog has been excellent. We have not encountered any significant issues so far.
The platform performs reliably, and we have experienced minimal disruptions or downtime. This stability has been crucial for maintaining consistent monitoring and ensuring that our observability needs are met without interruption.
What do I think about the scalability of the solution?
Datadog is generally scalable, allowing us to handle and display thousands of custom metrics efficiently. However, we’ve encountered some limitations in the table visualization view, particularly when working with around 10,000 data points. In those cases, the search functionality doesn’t always return all valid results, which can hinder detailed analysis.
How are customer service and support?
Datadog's customer support plays a crucial role in easing the initial setup process. Their team is proactive in assisting with metric configuration, providing valuable examples, and helping us navigate the setup challenges effectively. This support significantly mitigates the complexity of the initial setup.
Which solution did I use previously and why did I switch?
We used New Relic before.
How was the initial setup?
The initial setup of Datadog can be somewhat complex, primarily due to the learning curve associated with configuring each metric field correctly for optimal data visualization. It often requires careful attention to detail and a good understanding of each option to achieve the desired graphs and insights.
What about the implementation team?
We implemented the solution in-house.