
    Datadog Enterprise

    Sold by: Datadog
    Deployed on AWS
    Datadog is a SaaS-based unified observability and security platform providing full visibility into the health and performance of each layer of your environment at a glance.

    Overview

    Datadog is a SaaS-based unified observability and security platform providing full visibility into the health and performance of each layer of your environment at a glance. Datadog allows you to customize this insight to your stack by collecting and correlating data from more than 600 vendor-backed technologies and APM libraries, all in a single pane of glass. Monitor your underlying infrastructure, supporting services, and applications alongside security data in a single observability platform.

    Prices are based on committed use per month over the total term of the agreement (the Total Expected Use).

    Highlights

    • Get started in minutes from AWS Marketplace with our enhanced integration for account creation and setup. Turn-key integrations and an easy-to-install agent let you start monitoring all of your servers and resources in minutes.
    • Quickly deploy modern monitoring and security in one powerful observability platform.
    • Create actionable context to speed up resolution, reduce costs, mitigate security threats, and avoid downtime at any scale.

    Details

    Delivery method

    Deployed on AWS


    Features and programs

    Trust Center

    Access real-time vendor security and compliance information through their Trust Center powered by Drata. Review certifications and security standards before purchase.

    Buyer guide

    Gain valuable insights from real users who purchased this product, powered by PeerSpot.

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    AWS PrivateLink

    Get next-level security: connect VPCs and AWS services without exposing data to the internet.

    Quick Launch

    Leverage AWS CloudFormation templates to reduce the time and resources required to configure, deploy, and launch your software.

    Vendor Insights

    Skip the manual risk assessment. Get verified and regularly updated security information on this product with Vendor Insights.
    Security credentials achieved: 2

    Pricing

    Datadog Enterprise

    Pricing is based on the duration and terms of your contract with the vendor, and additional usage. You pay upfront or in installments according to your contract terms with the vendor. This entitles you to a specified quantity of use for the contract duration. Usage-based pricing is in effect for overages or additional usage not covered in the contract. These charges are applied on top of the contract price. If you choose not to renew or replace your contract before the contract end date, access to your entitlements will expire.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    1-month contract (21)

    | Dimension | Description | Cost/month | Overage cost |
    |---|---|---|---|
    | Infra Enterprise Hosts | Centralize your monitoring of systems and services (per host) | $27.00 | |
    | APM Hosts | Optimize end-to-end application performance (per APM host) | $36.00 | |
    | App Analytics | Analyze performance metrics (per 1M analyzed spans, 15-day retention) | $2.04 | |
    | Custom Metrics | Monitor your own custom business metrics (per 100 custom metrics) | $5.00 | |
    | Indexed Logs | Analyze and explore log data (per 1M log events, 15-day retention) | $2.04 | |
    | Ingested Logs | Ingest all your logs (per 1 GB ingested logs) | $0.10 | |
    | Synthetics API Tests | Proactively monitor site availability (per 10K test runs) | $6.00 | |
    | Synthetics Browser Tests | Easily monitor critical user journeys (per 1K test runs) | $15.00 | |
    | Serverless Functions | Deprecated; not available to new customers | $6.00 | |
    | Fargate Tasks | Monitor your Fargate environment (per Fargate task) | $1.20 | |

    Additional usage costs (2)

    The following dimensions are not included in the contract terms and are charged based on your usage.

    | Dimension | Description | Cost/unit |
    |---|---|---|
    | Custom dimension used for select private offers | Custom dimension used for select private offers | $1.00 |
    | consumption_unit | Additional Datadog Consumption Units | $0.01 |
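    The contract-plus-overage model described above can be sketched in a few lines. This is an illustration only: the rates are copied from the 1-month contract table, the committed and actual quantities are invented, and real AWS Marketplace metering and proration work differently.

    ```python
    # Hypothetical sketch of contract-plus-overage billing. Rates come from
    # the 1-month contract table above; quantities are made up. For this
    # sketch, overage is billed at the same per-unit rate as the contract,
    # which may not match the real (unshown) overage rates.

    CONTRACT_RATES = {            # cost per unit per month
        "infra_hosts": 27.00,     # per host
        "apm_hosts": 36.00,       # per APM host
        "ingested_logs_gb": 0.10, # per GB ingested
    }

    def monthly_charge(committed: dict, actual: dict, overage_rates: dict) -> float:
        """Committed quantities are billed at contract rates up front;
        usage beyond the commitment is billed at overage rates on top."""
        total = 0.0
        for dim, qty in committed.items():
            total += qty * CONTRACT_RATES[dim]
            over = max(0, actual.get(dim, 0) - qty)
            total += over * overage_rates.get(dim, CONTRACT_RATES[dim])
        return round(total, 2)

    committed = {"infra_hosts": 10, "apm_hosts": 5, "ingested_logs_gb": 500}
    actual = {"infra_hosts": 12, "apm_hosts": 5, "ingested_logs_gb": 650}
    print(monthly_charge(committed, actual, CONTRACT_RATES))  # 569.0
    ```

    Here the 10 committed hosts cost $270, the 2 extra hosts add $54 of overage, APM stays within commitment at $180, and logs cost $50 committed plus $15 overage.
    
    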

    Custom pricing options

    Request a private offer to receive a custom quote.

    How can we make this page better?

    We'd like to hear your feedback and ideas on how to improve this page.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Software as a Service (SaaS)

    SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.

    Support

    Vendor support

    Contact our knowledgeable support engineers via email, live chat, or in-app messages.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Product comparison

    Updated weekly

    Accolades

    • Top 10 in Log Analysis
    • Top 10 in Monitoring and Observability, Migration
    • Top 10 in Application Performance and UX Monitoring

    Customer reviews

    Sentiment is AI-generated from actual customer reviews on AWS and G2.

    | Reviews | Functionality | Ease of use | Customer service | Cost effectiveness |
    |---|---|---|---|---|
    | 2 reviews | Insufficient data | Insufficient data | Insufficient data | Insufficient data |

    Overview

    AI-generated from product descriptions.

    • Observability Platform: Unified monitoring and security platform supporting data collection from over 600 vendor-backed technologies and APM libraries
    • Infrastructure Monitoring: Comprehensive monitoring capabilities for underlying infrastructure, services, and applications in a single interface
    • Data Correlation: Advanced data aggregation and correlation mechanism across multiple technology layers and components
    • Agent-Based Monitoring: Lightweight, easy-to-install agent for collecting performance and health metrics from servers and resources
    • Multi-Technology Integration: Supports monitoring and data collection across diverse technology ecosystems and vendor platforms
    • Infrastructure Auto-Discovery: Automated device recognition and configuration for over 2,000 technologies with instant performance metric collection
    • Hybrid Cloud Monitoring: Comprehensive visibility across on-premises, hybrid, and cloud infrastructures with agentless monitoring capabilities
    • Performance Metrics Collection: Flexible data collection mechanism capable of pulling metrics from diverse devices and APIs with customizable graphing and dashboarding
    • Monitoring Coverage: Granular performance monitoring for thousands of technologies with preconfigured alert thresholds
    • Monitoring Automation: Automatic device detection, configuration, and performance tracking with intelligent, actionable monitoring capabilities
    • Data Ingestion Capability: Supports petabyte-scale telemetry ingestion with high-performance processing across logs, metrics, and traces
    • AI-Powered Troubleshooting: Utilizes natural language processing and AI-driven root cause analysis for complex incident investigation
    • Knowledge Graph Technology: Implements a proprietary Knowledge Graph for structured data correlation and advanced search capabilities
    • Open Data Lake Architecture: Built on Snowflake data lake infrastructure enabling flexible and scalable telemetry storage and analysis
    • Multi-Dimensional Telemetry Analysis: Enables context-aware correlation across different observability data types including logs, metrics, and distributed traces

    Security credentials

    Validated by AWS Marketplace: FedRAMP, GDPR, HIPAA, ISO/IEC 27001, PCI DSS, SOC 2 Type 2

    Contract

    Standard contract: No

    Customer reviews

    Ratings and reviews

    4.3 out of 5
    15 ratings

    5 star: 13%
    4 star: 87%
    3 star: 0%
    2 star: 0%
    1 star: 0%

    15 AWS reviews | 87 external reviews
    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.
    Benjamin Martin

    Custom dashboards and alerts have made server issue detection faster

    Reviewed on Oct 20, 2025
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for Datadog is monitoring our servers.

    A specific example of how I'm using Datadog to monitor my server is that we are tracking requests and latency and looking for errors.

    What is most valuable?

    I really enjoy the user interface of Datadog, and it makes it easy to find what I need. In my opinion, the best features Datadog offers are the customizable dashboards and the Watchdog.

    The customizable dashboards and Watchdog help me in my daily work because they're easy to find and easy to look at to get the information I need. Datadog has positively impacted my organization by making finding and resolving issues a lot easier and more efficient.

    What needs improvement?

    I think Datadog can be improved by continually finding errors and making things easy to see and customize.

    For how long have I used the solution?

    I have been using Datadog for one month.

    What do I think about the stability of the solution?

    Datadog is stable.

    What do I think about the scalability of the solution?

    Datadog's scalability has been easy to put on each server that we want to monitor.

    How are customer service and support?

    I have not had to contact customer support yet, but I've heard they are great.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    We previously used our own custom solution, but Datadog is a lot easier.

    What was our ROI?

    I'm not sure if I've seen a return on investment.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing is that it was easy to find, easy to purchase, and easy to estimate.

    Which other solutions did I evaluate?

    I did not make the decision to evaluate other options before choosing Datadog.

    What other advice do I have?

    I would rate Datadog a nine out of ten.

    I give it this rating because I think just catching some of the data delays and latency live could be a little bit better, but overall, I think it's been great.

    I would recommend Datadog and say that it's easy to customize and find what you're looking for.

    Which deployment model are you using for this solution?

    Private Cloud


    Carson Waldrop

    Has resolved user errors faster by reviewing behavior with replay features

    Reviewed on Oct 17, 2025
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Datadog involves working on projects related to our sales reps in terms of registering new clients, and I've been using Datadog to pull up instances of them while they're beta testing our product that we're rolling out just to see where their errors are occurring and what their behavior was leading up to that.

    I can't think of all of the specific details, but there was a sales rep who was running into a particular error message through their sales registration process, and they weren't giving us a lot of specific screenshots or other error information to help us troubleshoot. I went into Datadog and looked at the timestamp and was able to look at the actual steps they took in our platform during their registration and was able to determine what the cause of that error was. I believe if I remember correctly, it was user error; they were clicking something incorrectly.

    One thing I've seen in my main use case for Datadog is an option that our team can add on, and it's the ability to track behavior based on the user ID. I'm not sure at this time if our team has turned that on, but I do think that's a really valuable feature to have, especially with the real-time user management where you can watch the replay. Because we have so many users that are using our platform, the ability to filter those replay videos based on the user ID would be so much more helpful. Especially in terms where we're testing a specific product that we're rolling out, we start with smaller beta tests, so being able to filter those users by the user IDs of those using the beta test would be much more helpful than just looking at every interaction in Datadog as a whole.

    What is most valuable?

    The best features Datadog offers are the replay videos, which I really find super helpful as someone who works in QA. So much of testing is looking at the UI, and being able to look back at the actual visual steps that a user is taking is really valuable.

    Datadog has impacted our organization positively in a major way because not even just as a QA engineer having access to the real-time replay, but just as a team, all of us being able to access this data and see what parts of our system are causing the most errors or resulting in the most frustration with users. I can't speak for everybody else because I don't know how each other segment of the business is using it, but I can imagine just in terms of how it's been beneficial to me; I can imagine that it's being beneficial to everybody else and they're able to see those areas of the system that are causing more frustration versus less.

    What needs improvement?

    I think Datadog can be improved, but it's a question that I'm not totally sure what the answer is. Being that my use case for it is pretty specific, I'm not sure that I have used or even really explored all of the different features that Datadog offers. So I'm not sure that I know where there are gaps in terms of features that should be there or aren't there.

    I will go back to just the ability to filter based on user ID as an option that has to be set up by an organization, but I would maybe recommend that being something part of an organization's onboarding to present that as a first step. I think as an organization gets bigger or even if the organization starts using Datadog and is large, it's going to be potentially more difficult to troubleshoot specific scenarios if you're sorting through such a large amount of data.

    For how long have I used the solution?

    I have been working in this role for a little over a year now.

    What do I think about the stability of the solution?

    As far as I can tell, Datadog has been stable.

    What do I think about the scalability of the solution?

    I believe we have about 500 or so employees in our organization using our platform, and Datadog seems to be able to handle that load sufficiently, as far as I can tell. So I think scalability is good.

    How are customer service and support?

    I haven't had an instance where I've reached out to customer support for Datadog, so I do not know.


    Which solution did I use previously and why did I switch?

    I do not believe we used a different solution previously for this.

    What was our ROI?

    I cannot answer if I have seen a return on investment; I'm not part of the leadership in terms of making that decision. Regarding time saved, in my specific use case as a QA engineer, I would say that Datadog probably didn't save me a ton of time because there are so many replay videos that I had to sort through in order to find the particular sales reps that I'm looking for for our beta test group. That's why I think the ability to filter videos by the user ID would be so much more helpful. I believe features that would provide a lot of time savings, just enabling you to really narrow down and filter the type of frustration or user interaction that you're looking for. But in regards to your specific question, I don't think that's an answer that I'm totally qualified to answer.

    Which other solutions did I evaluate?

    I was not part of the decision-making process before choosing Datadog, so I cannot speak to whether we evaluated other options.

    What other advice do I have?

    Right now our users are in the middle of the beta test. At the beginning of rolling the test out, I probably used the replay videos more just as the users were getting more familiar with the tool. They were probably running into more errors than they would be at this point now that they're more used to the tool. So it kind of ebbs and flows; at the beginning of a test, I'm probably using it pretty frequently and then as it goes on, probably less often.

    It does help resolve issues faster, especially because our sales reps are used to working really quickly in terms of the sales registration, as they're racing through it. They're more likely to accidentally click something or click something incorrectly and not fully pay attention to what they're doing because they're just used to their flow. Being able to go back, watch the replay, and see that a person clicked one button when they intended to click another, or to identify the action that caused an error rather than going off their memory, helps resolve issues much faster.

    I have not noticed any measurable outcomes in terms of reduction in support tickets or faster resolution times since I started using Datadog. For myself, looking at the users in our beta test group, none of those issues came through a support ticket; they came from messages in Microsoft Teams with all the people in the beta group. We have seen fewer messages in relation to the beta test because the users are more familiar with the tool. Now that they know there might be differences between their usual flow and their flow during the beta test, they send fewer messages, probably because they are being more careful or have figured out the inflection points that would result in an error.

    My biggest piece of advice for others looking into using Datadog would be to use the filters based on user ID; it will save so much time in terms of troubleshooting specific error interactions or occurrences. I would also suggest having a UI that's more simple for people that are less technical. For example, logging into Datadog, the dashboard is pretty overwhelming in terms of all of the bar charts and options; I think having a more simplified toggle for people that are not looking for all of the options in terms of data, and then having a more technical toggle for people that are looking for more granular data, would be helpful.

    I rate Datadog 10 out of 10.

    Which deployment model are you using for this solution?

    Public Cloud
    Dhroov Patel

    Has improved incident response with better root cause visibility and supports flexible on-call scheduling

    Reviewed on Oct 17, 2025
    Review from a verified AWS customer

    What is our primary use case?

    We use Datadog for all of our observability needs and application performance monitoring. We recently transitioned our logs to Datadog. We also use it for incident management and on-call paging. We use Datadog for almost everything monitoring and observability related.

    We use Datadog for figuring out the root cause of incidents. One of the more recent use cases was when we encountered a failure where one of our main microservices kept dying and couldn't give a response. Every request to it was getting a 500. We dug into some of the traces and logs, used the Kubernetes Explorer in Datadog, and found out that the application couldn't reach some metric due to its scaling. We were able to figure out the root cause because of the Kubernetes Event Explorer in Datadog. We pushed out a hotfix which restored the application to working condition.

    Our incident response team leverages Datadog to page relevant on-calls for whatever service is down that's owned by that team, so they can get the appropriate SMEs and bring the service back up. That's the most common use case for our incident response. All of our teams appreciate using Datadog on-call for incident response because there are numerous notification settings to configure. The on-call schedules are very flexible with overrides and different paging rules, depending on urgency of the matter at stake.

    What is most valuable?

    As an administrator of Datadog, I really appreciate Fleet Automation. I also value the overall APM page for each service, including the default dashboards on the service page because they provide exactly what you need to see in terms of request errors and duration latency. These two are probably my favorite features because the service page gives a perfect look at everything you'd want to see for a service immediately, and then you can scroll down and see more infrastructure specific metrics. If it's a Java app, you can see JVM metrics. Fleet Automation really helps me as an administrator because I can see exactly what's going on with each of my agents.

    My SRE team is responsible for upgrading and maintaining the agents, and with Fleet Automation, we've been able to leverage remote agent upgrades, which is fantastic because we no longer need to deploy to our servers individually, saving us considerable time. We can see all the integration errors on Fleet Automation, which is super helpful for our product teams to figure out why certain metrics aren't showing up when enabling certain integrations. On Fleet Automation, we can see each variant of the Datadog configuration we have on each host, which is very useful as we can try to synchronize all of them to the same version and configuration.

    The Kubernetes Explorer in Datadog is particularly valuable. It gives us a look at each live pod YAML and we can see specific metrics related to each pod. I appreciate the ability to add custom Kubernetes objects to the Orchestration Explorer. It gives our team an easier time to see pods without having to kubectl because sometimes you have permission errors related to that. Sometimes it's just quicker than using kubectl.

    Our teams use Datadog more than they used their old observability tool. They're more production-aware, conscious of how their changes are impacting customers, how the changes they make to their application speed up or slow down their app, and the overall request flow. It's a much more developer-friendly tool than other observability tools.

    What needs improvement?

    Datadog needs to introduce more hard limits to cost. If we see a huge log spike, administrators should have more control over what happens to save costs. If a service starts logging extensively, I want the ability to automatically direct that log into the cheapest log bucket. This should be the case with many offerings. If we're seeing too much APM, we need to be aware of it and able to stop it rather than having administrators reach out to specific teams.
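    The cost-control behavior the reviewer is asking for is conceptually close to log exclusion filters, which drop or sample logs before they are indexed. A minimal sketch of the idea follows; this is an illustration of the concept, not Datadog's implementation or API.

    ```python
    # Illustrative sketch (not Datadog's implementation) of an exclusion
    # filter: drop or sample matching log lines before indexing, so a spike
    # from a noisy service doesn't drive up indexed-log costs.
    import random

    def make_exclusion_filter(pattern: str, sample_rate: float):
        """Return a predicate: True = index the log line, False = exclude it.
        Lines containing `pattern` are kept only `sample_rate` of the time."""
        def should_index(line: str) -> bool:
            if pattern in line:
                return random.random() < sample_rate
            return True
        return should_index

    f = make_exclusion_filter("DEBUG", 0.0)   # exclude all DEBUG lines
    logs = ["DEBUG cache miss", "ERROR timeout", "DEBUG retry"]
    indexed = [line for line in logs if f(line)]
    print(indexed)  # ['ERROR timeout']
    ```

    In this sketch the filter keys on a substring; a production filter would typically match on service, status, or a query, and a sample rate between 0 and 1 keeps a representative slice of the noisy traffic.
    
    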

    Datadog has become significantly slower over the last year. They could improve performance at the risk of slowing down feature work. More resources need to go into Fleet Automation because we face many problems with things such as the Ansible role to install Datadog on non-containerized hosts.

    We mainly want to see performance improvements, less time spent looking at costs, the ability to trust that costs will stay reasonable, and an easier way to manage our agents. It is such a powerful tool with much potential on the horizon, but cost control, performance, and agent management need improvement. The main issues are with the administrative side rather than the actual application.

    For how long have I used the solution?

    I have been using Datadog for about a year and nine months.

    What do I think about the stability of the solution?

    We face a high number of issues with niche, product-specific outages that appear to be quite common. AWS metrics being delayed is something that Datadog posts on their status page. We face a relatively high number of Datadog issues, but they tend to be small and limited in scope.

    What do I think about the scalability of the solution?

    We have not experienced any scalability issues.

    How are customer service and support?

    I have interacted with support. Support quality varies significantly. Some support agents are fantastic, but some tickets take months to resolve.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    We used Dynatrace previously, and I believe the switch was due to cost, but that decision was outside my scope as I'm not a decision-maker in that situation.

    How was the initial setup?

    The initial setup in Kubernetes is not particularly difficult.

    What other advice do I have?

    I cannot definitively say MTTR has improved as I don't have access to those numbers and don't want to make misleading statements. Developers use it significantly more than our old observability tool. We've seen some cost savings, but we have to be significantly more cost-aware with Datadog than with our previous observability tool because there's more fluctuation and variation in the cost.

    One pain point is that it has caused us to spend too much time thinking about the bill. Understand that while it is an administrative hassle, it is very rewarding to developers.

    On a scale of 1-10, I rate Datadog an 8 out of 10.

    Which deployment model are you using for this solution?

    On-premises


    reviewer2767362

    Have improved incident response and centralized observability while optimizing resource usage

    Reviewed on Oct 16, 2025
    Review provided by PeerSpot

    What is our primary use case?

    Our main use case for Datadog includes monitoring and logs, custom metrics, as well as utilizing the APM feature and synthetic tests in our day-to-day operations.

    A specific example of how Datadog helps with our monitoring and logs: all our applications send logs into Datadog for troubleshooting purposes, with alerts built on top of the logs. For custom metrics, we send metrics from the applications via Prometheus to Datadog and build alerts on top of those as well, sometimes sending critical alerts directly to PagerDuty.

    We generally have monitors and alerts set up for our applications and specifically rely on them to check our critical business units, such as databases: in GCP we use Cloud SQL, in AWS we use RDS, and we also monitor Scylla databases and EC2 instances running Kafka services, which we heavily depend upon. Recently, we migrated from the US1 site to US5, which was a significant shift, requiring us to migrate all alerts and monitors to US5 and validate their functionality in the new site.

    What is most valuable?

    The best feature Datadog offers is its user-intuitive interface, making it very easy to track logs and custom metrics. We also appreciate the APM feature, which has helped reduce our log volumes and custom metric volumes, allowing us to turn off some custom metrics.

    We recently learned how tags contribute to custom metrics volume, which led us to exclude certain tags to further reduce that volume, and we implement log indexing and exclusion filters, leaving us with much to explore and optimize in our use of Datadog as our major observability platform.
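    The interaction between tags and custom-metric volume mentioned above comes down to cardinality: each unique combination of metric name and tag values counts as a distinct custom metric. A small sketch of the DogStatsD plain-text format that the Datadog agent accepts over UDP makes this concrete; the metric names and tag values here are invented for illustration.

    ```python
    # Sketch of the DogStatsD plain-text wire format the Datadog agent
    # listens for on UDP port 8125 (metric name and tags below are made up).
    # Every distinct name + tag-value combination is a separate timeseries,
    # which is why high-cardinality tags inflate custom-metric counts.
    import socket

    def dogstatsd_payload(name: str, value: float, metric_type: str = "g",
                          tags: dict | None = None) -> bytes:
        msg = f"{name}:{value}|{metric_type}"   # e.g. "g" = gauge, "c" = count
        if tags:
            msg += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
        return msg.encode()

    payload = dogstatsd_payload("checkout.latency", 42.5,
                                tags={"env": "prod", "endpoint": "/pay"})
    print(payload)  # b'checkout.latency:42.5|g|#env:prod,endpoint:/pay'

    # Fire-and-forget UDP send to a local agent (no error if nothing listens).
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload, ("127.0.0.1", 8125))
    ```

    Dropping a high-cardinality tag such as a per-request ID from the `tags` dict collapses many timeseries into one, which is the volume reduction the review describes.
    
    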

    What needs improvement?

    Regarding metrics showing our improvements, the MTTR has been reduced by about 40% after integrating Datadog with PagerDuty, and we've seen our costs significantly drop in the most recent renewal after three years' contract.

    Operationally, we spend about 30-40% less time correlating logs and metrics across services, while potential areas for improvement in Datadog include its integration depth and providing more flexible pricing models for large metric and log volumes.

    I would suggest having an external Slack channel for urgent requests, which would enable quicker access to support or a dedicated support team for our needs.

    I rate Datadog an eight because, while we have used it for three years and experienced growth in our business and services, the cost has also increased with the growth in metrics and log volumes, and proactive cost management feedback has not been provided to help manage or budget those rising costs. Thus, I'd like to see more proactive cost management in the future, as the pricing model seems to escalate quickly with increasing metrics ingestion and monitoring across clouds. Datadog is a powerful and reliable observability platform, but there is still room for improvement in cost efficiency and usability at scale.

    Regarding pricing, setup costs, and licensing, I find Datadog's pricing model transparent but quick to scale: the base licensing for host integration is straightforward, but costs can rapidly climb as we add custom metrics and log ingestion, especially in dynamic Kubernetes or multi-cloud environments. Pricing is moderate to high, and while cost visibility is straightforward, it could become challenging with growing workloads. The upfront setup cost is minimal, mainly involving fine-tuning dashboards, tags, and alerts, and licensing is very flexible, enabling features as needed.

    For how long have I used the solution?

    I have been working in my current field for roughly 10 years, starting my AWS journey about 10 years ago, mainly focused on infrastructure and observability.

    What do I think about the stability of the solution?

    I believe Datadog is stable.

    What do I think about the scalability of the solution?

    Datadog's scalability is impressive, as it has the necessary integrations, supports agent-based and cloud-native solutions, and accommodates multi-cloud, multi-region features, making overall performance very good.

    How are customer service and support?

    Customer support has improved recently with online support available through a portal, allowing for quicker access to help.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    Previously, we used Splunk SignalFx for a couple of years. We switched to Datadog because of its more intuitive user interface, which SignalFx lacked at the time.

    What was our ROI?

    Datadog has had a significant positive impact on our organization, particularly in visibility, reliability, and cost efficiency. It has allowed us to centralize metrics, logs, and traces across our cloud and to move from reactive to proactive monitoring. Improvements include faster incident detection and resolution, enhanced service reliability, better cost and resource optimization, and shared dashboards that give the engineering and product teams a single source of truth for system health and performance, enhancing our overall observability and operational efficiency.

    I believe Datadog has delivered more than its value through reduced downtime, faster recovery, and infrastructure optimization. Although we sometimes miss critical alerts, overall it has improved our team's efficiency, with roughly 30% less time spent troubleshooting logs and custom metrics, while providing measurable ROI through enhanced system reliability, reduced incident costs, and optimized infrastructure spending.

    Which other solutions did I evaluate?

    We only evaluated SignalFx before choosing Datadog, as Datadog offered simpler scaling, better management, broader integrations, and dashboards, allowing for easier monitoring of our multi-cloud setup.

    What other advice do I have?

    After reducing log and custom metric volumes, we noticed a significant reduction in costs without any performance issues on our end.

    I strongly recommend using Datadog, but suggest being proactive about resource usage and tracking anomalies monthly.

    My rating for Datadog is 8 out of 10.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Google
    reviewer2767335

    Has helped monitor performance across services and enabled faster issue investigation with custom dashboards

    Reviewed on Oct 16, 2025
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Datadog is monitoring the performance of Grainger.com and all the components involved in it.

    A specific example of how I use Datadog to monitor performance is when we found an issue with an internal bot that we use. We had some issues with some of the commands, and we looked into the logs, which showed the events from that Slack bot. This was quite useful.

    I use Datadog day-to-day to monitor the performance of key services, endpoints, and resources. Currently, we have a migration project for which I created a dashboard to help visualize the performance of key services and endpoints being migrated. At a high level, it helps to capture the performance and health of the services and endpoints.

    How has it helped my organization?

    Datadog has impacted my organization positively, as this is our main observability tool for monitoring services, traces, and all resources within key services. It is our go-to tool, and it has helped us to pinpoint issues. One aspect of Datadog that needs improvement is Watchdog: if there are any escalated conditions or errors happening, it does not indicate which service is causing the issue or which line of code is responsible unless we recreate the Watchdog monitors and add the dependency on the GitHub repo for that service.

    When pinpointing issues, it helps us focus on where the problem is. Sometimes it's like finding a needle in a haystack, especially with network issues, which have been our key concern lately. During network outages, we don't know exactly which device has the issue, but network observability is an area we're working to improve. For regular issues within services, we can see the errors, but we must configure the GitHub repo associated with that service to see the key issue. Overall, it helps us to pinpoint issues; while I'm not certain about the exact timing of resolution, it does help overall.

    What is most valuable?

    In my opinion, the best features Datadog offers are its APM traces and the ability to create dashboards with many customizable metrics, from CPU to thread count to errors by host and errors by service. Having customized dashboards is really useful, and exploring traces is one of my favorite parts.

    We have a list of dashboards primarily showing the key services and APIs related to orders, order generation, customer direct, and main customer services. Within that list, we also have a RUM dashboard, which shows us customer impact and the performance of the key services that can directly affect customers. During a code red or major escalation, I refer to these dashboards for quick analysis of any issues with the services or endpoints.

    What needs improvement?

    To make Datadog better, it should be able to pick up error codes automatically. Currently, you have to programmatically configure every single step. Our previous tool, Dynatrace, could pick up error codes without developers having to explicitly code that into the configuration. Sometimes the APM traces are missing the exact error code and error message, which is frustrating.

    Some minor improvements could include adjusting unit display on dashboards. When request counts go from 900,000 to 1.5 million or 2.2 million for endpoints, the graph keeps all units in thousands rather than converting to millions, which would be more useful and visually appealing.
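    The unit conversion described above is simple to state precisely. As a purely hypothetical illustration (this is not Datadog functionality, just a sketch of the desired display behavior), a formatter that scales raw request counts into K/M units might look like:

    ```python
    def format_count(value: float) -> str:
        """Format a raw count with K/M suffixes, e.g. 2200000 -> '2.2M'."""
        if value >= 1_000_000:
            scaled, suffix = value / 1_000_000, "M"
        elif value >= 1_000:
            scaled, suffix = value / 1_000, "K"
        else:
            return str(value)
        # Drop a trailing ".0" so 900000 renders as "900K", not "900.0K"
        text = f"{scaled:.1f}".rstrip("0").rstrip(".")
        return f"{text}{suffix}"

    print(format_count(900_000))    # 900K
    print(format_count(1_500_000))  # 1.5M
    print(format_count(2_200_000))  # 2.2M
    ```

    With this kind of autoscaling, the 900,000 to 2.2 million range the review mentions would render as 900K, 1.5M, and 2.2M on the same graph.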

    Datadog Watchdog hasn't been as effective as Dynatrace Davis, which pinpoints key errors or latency within a specific service and drills down to the specific endpoint. This is an area where Datadog could improve.

    For how long have I used the solution?

    We fully migrated to Datadog last year.

    What do I think about the stability of the solution?

    In my experience, Datadog is stable, though there is typically at least one incident per week, amounting to approximately four incidents per month that cause disruption. These incidents relate to the log service, indexes, and metric capturing, and they occur on the Datadog platform more frequently than with the other tools we have.

    What do I think about the scalability of the solution?

    Datadog's scalability for my organization is pretty straightforward. For installation, we just have to install it on the respective service hosts and configure it. There's a newer way of installing the agents, though I haven't worked with it in a while, but the installation process is straightforward.

    How are customer service and support?

    I rate the customer support eight out of ten. They require all information upfront, and there is still some back-and-forth communication, but overall they provide good service.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    We switched from Dynatrace to Datadog after conducting a survey among team members from various service teams. We found that developers preferred Datadog over Dynatrace: the user interface was more intuitive, modern, and more cloud-focused. Since everybody was moving to the cloud, we determined that Datadog would be a suitable tool for us.

    How was the initial setup?

    When comparing the setup between Dynatrace and Datadog, Datadog required more time and effort. Dynatrace was more straightforward: you simply install the agent and it picks up all the traffic, with minimal configuration needed to capture specific things. Overall, the Datadog setup was more challenging than the Dynatrace setup.

    What other advice do I have?

    I would rate Datadog overall as eight out of ten.

    My advice for others looking into Datadog is to be ready to spend a lot of time setting it up, and to make sure you have a good financial plan, because it can easily cost a lot of money to install agents on your service hosts.

    Which deployment model are you using for this solution?

    Hybrid Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Other