
Datadog Pro

Datadog | 1

Reviews from AWS customer

9 AWS reviews

External reviews

678 reviews

External reviews are not included in the AWS star rating for the product.


4-star reviews

    reviewer1599867

Great technology with a nice interface

  • January 20, 2025
  • Review provided by PeerSpot

What is most valuable?

The technology itself is generally very useful and the interface is great.

What needs improvement?

There should be a clearer view of the expenses.

For how long have I used the solution?

I have used the solution for four years.

What do I think about the stability of the solution?

The solution is stable.

How are customer service and support?

I have not personally interacted with customer service. I am satisfied with tech support.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I am using ThousandEyes and Datadog. Datadog supports AI-driven data analysis, with some AI elements to analyze, like data processing tools and so on. AI helps in Datadog primarily for resolving application issues.

How was the initial setup?

It was not difficult to set up for me. There was no problem.

What was our ROI?

I can confirm there is a return on investment.

What's my experience with pricing, setup cost, and licensing?

I find the setup cost to be too expensive. The setup cost for Datadog is more than $100. I am evaluating the usage of this solution; however, it is too expensive.

What other advice do I have?

I would rate this solution eight out of ten. 


    Timothy Spangler

Makes it easy to track down a malfunctioning service, diagnose the problem, and push a fix

  • January 07, 2025
  • Review provided by PeerSpot

What is our primary use case?

We use Datadog for monitoring and observing all of our systems, which range in complexity from lightweight, user-facing serverless lambda functions with millions of daily calls to huge, monolithic internal applications that are essential to our core operations. The value we derive from Datadog stems from its ability to handle and parse a massive volume of incoming data from many different sources and tie it together into a single, informative view of reliability and performance across our architecture.

How has it helped my organization?

Adopting Datadog has been fantastic for our observability strategy. Where previously we were grepping through gigabytes of plaintext logs, now we're able to quickly sort, filter, and search millions of log entries with ease. When an issue arises, Datadog makes it easy to track down the malfunctioning service, diagnose the problem, and push a fix.

Consequently, our team efficiency has skyrocketed. No longer does it take hours to find the root cause of an issue across multiple services. Shortened debugging time, in turn, leads to more time for impactful, user-facing work.

What is most valuable?

Our services have many moving parts, all of which need to talk to each other. The Service Map makes visualizing this complex architecture - and locating problems - an absolute breeze. When I reflect on the ways we used to track down issues, I can't imagine how we ever managed before Datadog.

Additionally, our architecture is written in several languages, and one area where Datadog particularly shines is in providing first-class support for a multitude of programming languages. We haven't found a case yet where we needed to roll out our own solution for communicating with our instance.

What needs improvement?

A tool as powerful as Datadog is, understandably, going to have a bit of a learning curve, especially for new team members who are unfamiliar with the bevy of features it offers. Bringing new team members up to speed on its abilities can be challenging and sometimes requires too much hand-holding. The documentation is adequate, but team members coming into a project could benefit from more guided, interactive tutorials, ideally leveraging real-world data. This would give them the confidence to navigate the tool and make the most of all it offers.

For how long have I used the solution?

The company was using it before I arrived; I'm unsure of how long before.


    JOSEPH ROBERT POMPA

Very useful Network Hosts

  • October 06, 2024
  • Review from a verified AWS customer

The user interface is intuitive, making it easy to manage domains, emails, and databases. The dashboard is well-organized, which is a plus for beginners who might feel overwhelmed by technical details.


    reviewer820579

Single pane of glass, easy to share dashboards, and good for monitoring

  • September 20, 2024
  • Review from a verified AWS customer

What is our primary use case?

We primarily use the solution for a variety of purposes, including:

  • Watching RUM data for frontend site, using LCP and INP metrics to compare across the old and new architecture to inform rollout decisions.
  • Watching APM data for backend services, observing how the backend server reacts (CPU util, memory, requests/second) to make sure the backend can handle the load.
  • Using Datadog CCM during our free trial period to get visibility over our AWS spend across accounts and resources and looking at recommendations and acting on those.
  • Browsing the service catalog to look at the current state of services that are running and what resources they use.

How has it helped my organization?

This provides a single place to find monitoring data. Prior to DD, we had some metrics living in New Relic, some in Grafana, and some in Circonus, and it was very confusing to navigate across them. Understanding different query languages is challenging. Here, there's a single UI to get used to, and everything is so sharable.

DD has led to teams making more decisions based on data that they observe about their service metrics and RUM metrics. I've seen decisions get made based on what has been observed in DD, and less based on anecdotal data.

What is most valuable?

I really enjoyed using CCM since it showed cloud cost data easily next to other metrics, and I could correlate the two.

Across CCM and the rest of Datadog, I like how sharable everything is. It's so easy to share dashboards and links with my teammates so we can quickly get up to speed on debugging/solving an issue.

I also have really enjoyed the K8s view of pods and pod health. It's very visual, and as a non-K8s platform owner at my company, I can still observe the overall health of the system. I can then drill in, and I have learned things about K8s by exploring that part of the product and talking with the team.

What needs improvement?

We've had some issues where we had Datadog automatically turned on in AWS regions that we weren't using, which incurred a small but steady cost that amounted to tens of thousands of dollars spent over a few weeks. I wish there was a global setting that lets an admin restrict which regions DD is turned on in as a default setup step.

Sometimes, the APM service dashboard link isn't sharable. I click something in the service catalog, and on that service's APM default view, I try to share a link to that with a teammate, and they reach a blank or error screen. 

I wish there was more organization and detail in the suggestions when I use the query editor. I'm never quite sure when the autofill dropdown shows up if I'm seeing some custom tag or some default property, so I have to know exactly what I'm looking for in order to build a chart. It's hard to navigate and explore using the query autofill suggestions without knowing exactly what tag to look for.

It's been a bit hard to understand how data gets sampled or how many data points a particular dashboard value is using. We've had questions about the RUM metrics that we see, and we had to ask for help with how values are calculated, bin sizes, etc., to get confidence in our data.

For how long have I used the solution?

I've used the solution for six months.

What do I think about the stability of the solution?

I've only been aware of a recent outage that affected the latency of data collection for one of our production tests. Outside of that, the solution seems stable.

What do I think about the scalability of the solution?

The solution seems like it can scale very well and beyond our needs.

How are customer service and support?

Technical support has been stellar. We love working with a team that responds fast, in great detail, and with great empathy. I trust what they say.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used New Relic, Grafana, and Circonus. Circonus was flaky, always having downtime, and we were always on the phone with them. With New Relic and Grafana, different metrics lived in each, and it was hard for consumers of the data to easily find what they needed. We also had licensing issues across the three, so not everybody could easily access all of them.

What's my experience with pricing, setup cost, and licensing?

I didn't do this portion of the product setup.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)


    reviewer3796153

Intuitive user interface with good log management and a helpful Log Explorer feature

  • September 20, 2024
  • Review provided by PeerSpot

What is our primary use case?

In our fast-paced environment, managing and analyzing log data and performance metrics is crucial. That’s where Datadog comes in. We rely on it not just for monitoring but for deeper insights into our systems, and here’s how we make the most of it. 

One of the first things we appreciate about Datadog is its ability to centralize logs from various sources—think applications, servers, and cloud services. This means we can access everything from one dashboard, which saves us a lot of time and hassle. Instead of digging through multiple platforms, we have all our log data in one place, making it much easier to track events and troubleshoot issues.

How has it helped my organization?

Before Datadog, we faced the common challenge of fragmented data. Our logs, metrics, and traces were spread across different tools and platforms, making it difficult to get a complete picture of our system’s health. 

With Datadog, we now have a centralized monitoring solution that aggregates everything in one place. This has streamlined our workflow immensely. Whether it’s logs from our servers, metrics from our applications, or traces from user transactions, we can access all this information easily. This unified view has made it simpler for our teams to identify and troubleshoot issues quickly.

What is most valuable?

In my experience with Datadog, one feature stands out above the rest: the Log Explorer. It has completely transformed the way I interact with our log data and has become an essential part of my daily workflow.

The user interface is incredibly intuitive. When I first started using it, I was amazed at how easy it was to navigate. The design is clean and straightforward, allowing me to focus on the data rather than getting lost in complicated menus. Whether I’m searching for specific log entries or filtering by certain criteria, everything feels seamless. 

This ease of use helped me get up to speed with log management quickly, even though it's my first time using Datadog.

What needs improvement?

Interactive tutorials could be a game changer. Instead of just reading about how to use query filters, users could engage with step-by-step guides that walk them through the process. For example, a tutorial could start with a simple query and gradually introduce more complex filtering techniques, allowing users to practice along the way. These tutorials could include pop-up tips and hints that provide additional context or best practices as users work through examples. This hands-on approach not only reinforces learning but also builds confidence in using the tool.

For how long have I used the solution?

My company has recently made Datadog available to its software engineers, and I personally have been using it for almost a year now.


    Tony Martinez1

Great logging, session replays, and alerting

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

Our primary use cases include:

  • Alert on errors customers encounter in our product. We've set up logs that go to Slack to tell us when a certain error threshold is hit.
  • Investigate slow page load times. We have pages in our app that are loading slowly and the logs help us figure out which queries are taking the longest time.
  • Metrics. We collect metrics on product usage.
  • Session replays. We watch session replays to see what a user was doing when a page took a long time to load or hit an error. This is helpful.

How has it helped my organization?

It's helped us find bugs that customers are experiencing before they're reported to us. Sometimes, customers don't report errors, so being able to catch errors before they're reported helps us investigate before other users find errors.

Datadog has helped us investigate slow page loading times and even see the specific queries that are taking a long time to load.

Logging lets us see the context around an error. For example, we can see whether a backend service had an error before it surfaced on the frontend.

Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening.

What is most valuable?

The most valuable aspects include: 

  • Logging. Being able to view detailed logs helps debug issues.
  • Session replays. They are helpful for seeing what a customer was doing before they saw an error or had a slow page load.
  • Alerting. This is an important part of our on-call process to send alerts to Slack when an error threshold is crossed. Alerts/monitors are easy to configure to only alert when we want them to alert.
  • Dashboards. It's helpful to pull up dashboards that show our most common errors or page performance. It's a good way to see how the app is performing from a bird's-eye view.

What needs improvement?

The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog.

The log querying syntax can be confusing. Usually, I filter by finding a facet in a log and selecting to filter by that facet - but I'm not sure how to write the filter myself.

The monitor/alert syntax is also somewhat hard to understand.

Overall, it should be easier to learn how to use the product while you're using the product. Perhaps tooltips or a link to learn more about whatever section you're using.

For how long have I used the solution?

I've used the solution for two years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

Which other solutions did I evaluate?

We did not evaluate other options. 


    Caleb Parks

Lots of features with a rapid log search and an easy setup process

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.

How has it helped my organization?

The solution has improved our observability by giving us rapid log search, a correlation between hosts/logs/APM, and tons of features in one website.

What is most valuable?

I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.

What needs improvement?

It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not cost-effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.

It would be cool to have an OpenTelemetry agent in the next release that automatically instruments everything for APM.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

I'd rate the stability ten out of ten.

What do I think about the scalability of the solution?

I'd rate the scalability ten out of ten.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The setup is very straightforward. Users just install the helm chart, and boom, you're done.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.

Which other solutions did I evaluate?

We've never evaluated other solutions.

What other advice do I have?

It's a great product. However, you have to pay for quality.

Which deployment model are you using for this solution?

Public Cloud


    Jason Karuza

Great dashboards, lots of integrations, and helps trace data between components

  • September 19, 2024
  • Review provided by PeerSpot

What is our primary use case?

We use the product for instrumentation, observability, monitoring, and alerting of our system. 

We have multiple environments and a variety of pieces of infrastructure including servers, databases, load balancers, cache, etc. and we need to be able to monitor all of these pieces, while also retaining visibility into how the various pieces interact with each other. 

Tracing data between components and user interactions that trigger these data flows is particularly important for understanding where problems arise and how to resolve them quickly.

How has it helped my organization?

It provides a lot of options for integrations and tooling to observe what is happening within the system, making diagnosis and triage easier/faster. 

Each user can set up their own dashboards and share them with other users on the team. We can set up monitors based on various patterns that we care about, which then notify us through platforms such as Slack or PagerDuty when an event triggers an alert.

The ability to rapidly become aware of problems based on the symptoms being observed, along with entry points into the tool that quickly show where to investigate further, is important for our team and our users.

What is most valuable?

The most valuable aspects of the solution include log search to help triage specific problems that we get notified about (whether by alerts we have configured or users that have contacted us), APM traces (to view how user interactions trace through the various layers of our infrastructure and services to be able to reproduce and identify the source of problems), general performance/system dashboards (to regularly monitor for stability or deviation), and alerting (to be automatically informed when a problem occurs). We also use the incident tools for tracking production incidents.

What needs improvement?

In some ways, the tool has a pretty steep learning curve. Discovering the various capabilities available, then learning how to utilize them for particular use cases can be challenging. Thankfully, there is a good amount of documentation with some good examples (more are always welcome), and support is very helpful. 

While Datadog has started adding more correlation mapping between services and parts of our system, it is still tricky to understand what the ultimate root cause is when multiple views/components spike. Additionally, there are lots of views and insights that are available but hard to find or discover. One of the best ways to discover them is to just click around a lot and get familiar with views that are useful, but that takes time and isn't ideal when in the middle of fighting a fire.

For how long have I used the solution?

I've used the solution for about four years.

What do I think about the stability of the solution?

It seems stable.

What do I think about the scalability of the solution?

It seems to scale well. Performance for aggregating or searching is usually very fast.

How are customer service and support?

Technical support is helpful and pretty responsive.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We did not use a different solution. 

What was our ROI?

It's hard to say what ROI would be as I have not managed our system without it to compare to.

What's my experience with pricing, setup cost, and licensing?

I don't manage licensing.

Which other solutions did I evaluate?

We did not evaluate other options. 

What other advice do I have?

It's a great tool with new features and improvements continuously being added. It is not simple to use or set up; however, if you have the right personnel, you can get a lot of value from what Datadog has to offer.

Which deployment model are you using for this solution?

Public Cloud


    Scott Palmer

Good query filtering and dashboards to make finding data easier

  • September 19, 2024
  • Review from a verified AWS customer

What is our primary use case?

We use the solution for monitoring microservices in a complex AWS-based cloud service.  

The system is comprised of about a dozen services. This involves processing real-time data from tens of thousands of internet-connected devices that are providing telemetry. Thousands of user interactions are processed along with real-time reporting of device data over transaction intervals that can last for hours or even days. The need to view and filter data over periods of several months is not uncommon.

Datadog is used for daily monitoring and R&D research as well as during incident response.

How has it helped my organization?

The query filtering and improved search abilities offered by Datadog are by far superior to other solutions we were using, such as AWS CloudWatch. We find that we can simply get at the data we need more quickly and easily than before. This has made responding to incidents or investigating issues a much more productive endeavour. We simply have fewer roadblocks in the way when we need to "get at the data". It is also used occasionally to extract data while researching requirements for new features.

What is most valuable?

Datadog dashboards are used to provide a holistic view of the system across many services. Customizable views, as well as the ability to "dive in" when we see something anomalous, have improved the workflow for handling incidents.

Log filtering, pattern detection and grouping, and extracting values from logs for plotting on graphs all help to improve our ability to visualize what is going on in the system. The custom facets allow us to tailor the solution to fit our specific needs.

What needs improvement?

There are some areas on log filtering screens where the user interface can take some getting used to. Perhaps having the option for a simple vs advanced user interface would be helpful in making new or less experienced users comfortable with making their own custom queries.

Maybe it is just how our system is configured, yet finding the valid values for a key/value pair is not always intuitively obvious to me. While there is a pop-up window with historical or previously used values and saved views from previous query runs, I don't see a simple list or enumeration of the set of valid values for keys that have such a restriction.

For how long have I used the solution?

I've used the solution for one year.

What do I think about the stability of the solution?

The solution is very stable.

What do I think about the scalability of the solution?

The product is reasonably scalable, although costs can get out of hand if you aren't careful.

How are customer service and support?

I have not had the need to contact support.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did use AWS CloudWatch. It was too awkward to use effectively and simply didn't have the features.

How was the initial setup?

We had someone experienced do the initial setup.  However, with a little training, it wasn't too bad for the rest of us.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

Take care with how you extract custom values from logs. You can do things without much thought to make your life easier and not realize how expensive they have become compared to where you started.

Which other solutions did I evaluate?

I'm not aware of evaluating other solutions.

What other advice do I have?

Overall I recommend the solution. Just be mindful of costs.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)


    Charlie W.

Helpful support, with centralized pipeline tracking and error logging

  • September 19, 2024
  • Review from a verified AWS customer

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

What is most valuable?

The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing is great, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

What needs improvement?

While the documentation is very good, there are areas that need a lot of focus to pick up on the key details. In some cases the screenshots don't match the text when updates are made. 

I spent longer than I should have trying to figure out how to correlate logs to traces, mostly related to environment variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime.

What do I think about the scalability of the solution?

It's scalable and customizable. 

How are customer service and support?

Support is helpful. They help us tune our committed costs and alert us when we start spending out of the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility.

How was the initial setup?

Setup is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

There has been significant time saved by the development team in terms of assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

I'd advise others to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)