Internally our primary usage of Datadog pertains around APM/tracing, logging, RUM (real user monitoring), synthetic testing of service/application health and state, overall general monitoring + observability, and custom dashboards for aggregate observability. We also are more frequently leveraging the more recent service catalog feature.
We have several microservices, several databases, and a few web applications (both external and internal facing), and all of these within our systems are contained within several environments ranging from dev, sit, eat, and production.

External reviews
External reviews are not included in the AWS star rating for the product.
Prompt support with good logging and helps with standardization
What is our primary use case?
How has it helped my organization?
Datadog has had a massive impact on our department. Before, we had loose logging dumped into a sea of GCP logs with haphazard custom solutions for traceability between logs and network calls. Datadog has helped standardize and normalize our processes around observability while providing fantastic tools for aggregating insight around what is monitored regularly, all wrapped in an easy-to-use UI.
Additionally, a range of types of users exist within our department, each with its own positive impact on Datadog. DevOps leverages it to easily manage infra, developers leverage it to easily monitor/debug services and applications, and business leverages it for statistics.
What is most valuable?
Personally I've found the RUM (real user monitoring) to be above and beyond what I've worked with before. Client-side monitoring has always been on the short end of the stick but the information collected and ease of instrumentation provided by Datadog is second to none.
Having a live dynamic service map is also one of my favourite features; it provides real-time insights into which services/applications are connected to which.
We are also investigating the new API catalog feature set, which I believe will provide a high-value impact for real-time documentation and information about all of our shared microservices that other dev teams can use.
What needs improvement?
In production, we intend to use trace IDs generated by RUM to attach to support tickets when a user experiences a traceable network error, and we want to display this trace ID to the user so if they were to contact us about a specific issue, they can provide us an exact ID displayed to them back to us. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response.
For how long have I used the solution?
I've used the solution for approximately two years across our department and around a year or so of it being used practically and fully integrated into our systems.
What do I think about the stability of the solution?
Aside from one very brief bad update from the Datadog team around RUM where they broke the native 'fetch' for node in an update to RUM (which was resolved quickly) as it used to -- and may still -- modified the global 'fetch'; Datadog as a whole solution has been highly stable.
What do I think about the scalability of the solution?
It's easy to implement and scale provided a there's a solid IaC solution in place to integrate across your system.
How are customer service and support?
The Datadog support team is prompt and helpful when tickets have been submitted from our end. When their support team have been unsure, they've properly reached out internally to the relevant SME to help answer any questions we've had prior.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I've personally dabbled with some other open-source observability and monitoring solutions; however, prior to Datadog, our department did not have any solutions other than log dumps to GCP.
How was the initial setup?
The initial setup was straightforward from my own experience, helping integrate within the application and service levels; however, our DevOps team handled most of the infra process with minimal complaints.
What about the implementation team?
We handled the solution in-house.
What's my experience with pricing, setup cost, and licensing?
I personally am not involved in the decision around costing; however, I am aware that when we first set up Datadog, we explicitly configured our services/applications to have a master switch to enable Datadog integration so that we can dynamically enable/disable targeted environments as need due to the costs being associated on a per service basis for APM/logging/etc.
Which other solutions did I evaluate?
I was not involved in the decision-making regarding the evaluation of other options.
What other advice do I have?
I highly recommend Datadog, and I would explore it for my own individual projects in the future, provided the cost is within reason. Otherwise, I would highly recommend it for any medium-to-large-sized org.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Good query filtering and dashboards to make finding data easier
What is our primary use case?
We use the solution for monitoring microservices in a complex AWS-based cloud service.
The system is comprised of about a dozen services. This involves processing real-time data from tens of thousands of internet connected devices that are providing telemetry. Thousands of user interactions are processed along with real-time reporting of device date over transaction intervals that can last for hours or even days. The need to view and filter data over periods of several months is not uncommon.
Datadog is used for daily monitoring and R&D research as well as during incident response.
How has it helped my organization?
The query filtering and improved search abilities offered by Datadog are by far superior to other solutions we were using, such as AWS CloudWatch. We find that we can simply get at the data we need quicker and easier than before. This has made responding to incidents or investigating issues a much more productive endeavour. We simply have less roadblocks in the way when we need to "get at the data". It is also used occasionally to extract data while researching requirements for new features.
What is most valuable?
Datadog dashboards are used to provide a holistic view of the system across many services. Customizable views as well as the ability to "dive in" when we see someting anomalous has improved the workflow for handling incidents.
Log filtering, pattern detection and grouping, and extracting values from logs for plotting on graphs all help to improve our ability to visualize what is going on in the system. The custom facets allow us to tailor the solution to fit our specific needs.
What needs improvement?
There are some areas on log filtering screens where the user interface can take some getting used to. Perhaps having the option for a simple vs advanced user interface would be helpful in making new or less experienced users comfortable with making their own custom queries.
Maybe it is just how our system is configured, yet finding the valid values for a key/value pair is not always intuitively obvious to me. While there is a pop-up window with historical or previously used values and saved views from previous query runs, I don't see a simple list or enumeration of the set of valid values for keys that have such a restriction.
For how long have I used the solution?
I've used the solution for one year.
What do I think about the stability of the solution?
The solution is very stable.
What do I think about the scalability of the solution?
The product is reasonably scalable, although costs can get out of hand if you aren't careful.
How are customer service and support?
I have not had the need to contact support.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We did use AWS CloudWatch. It was to awkward to use effectively and simply didn't have the features.
How was the initial setup?
We had someone experienced do the initial setup. However, with a little training, it wasn't too bad for the rest of us.
What about the implementation team?
We handled the setup in-house.
What's my experience with pricing, setup cost, and licensing?
Take care of how you extract custom values from logs. You can do things without thought to make your life easier and not realize how expensive it can be from where you started.
Which other solutions did I evaluate?
I'm not aware of evaluating other solutions.
What other advice do I have?
Overall I recommend the solution. Just be mindful of costs.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Good centralization with helpful monitoring and streamlined investigation capabilities
What is our primary use case?
We utilize Datadog to monitor both some legacy products and a new PaaS solution that we are building out here at Icario which is Micro-Service arch.
All of our infrastructure is in AWS with very few legacies being rackspace. For the PaaS we mainly just utilize the K8s Orchestrator which implements the APM libraries into services deployed there as well as giving us infra info regarding the cluster.
For legacies, we mainly just utilize the Agent or the AWS integration. With APM in specific places. We monitor mainly prod in Legacy and the full scope in the PaaS for now.
How has it helped my organization?
Datadog has greatly improved the time needed to investigate issues. Putting everything into a single pane of glass. Allowing us to get ahead of infra/app-based issues before they affect customer experience with our products.
Outside of that, the ease of management, deployment of agents, integrations etc. has greatly helped the teams. There isn't much leg work needed by the devs to manage or deploy Datadog into their stacks. This is with the use of Terraform, pipelines and the orchestrator. All in all, it has been an improvement.
What is most valuable?
The two most valuable aspects are the Terraform provider for Datadog and the K8s Orchestrator. People don't take that into account when buying into a tooling product like Datadog in this age where scalability, management, and ease of implementation is key. Other tools not having good IaC products or options is a ball drop. Orchestration for the tools agent is good. Not having to use another tool to manage the agents and config files in mutiple places/instances is a huge win!
What needs improvement?
A big problem with Datadog is the billing. They need to make the billing more user-friendly. I know it like the back of my hand at this point, yet trying to explain it to the C-suite as to why costs went up or are what they are is many times more complicated than it needs to be. I can't even say "why" due to of the lack of metadata tied to billing. For instance, with the AWS Integration Host ingestion, I cant say well this month THESE host got added and thats what caused cost to go up. The billing visibility really needs to be resolved!
For how long have I used the solution?
I'd rate the solution for more than four years.
What do I think about the stability of the solution?
Datadog has always been extremely stable, with outages really only ever creating delays, never actual downtime of the service, which is amazing and impressive.
What do I think about the scalability of the solution?
The solution is very scalable if implemented right and not on top of complicated architecture.
How are customer service and support?
Support is excellent. They are always looking for a resolution, and a ticket is never left unresolved unless the feature just can't exist or isn't currently possible.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We did have New Relic, Datadog, Sumo Logic, Pingdom, and some other custom or third-party tooling. We switched because we wanted everything to be in a single pane and because Datadog is a better solution than the competitors.
How was the initial setup?
For us, set-up is a mixed bag as we support legacy apps and architectures as well as a new microservice architecture. That being said, legacy is somewhat complex just due to the nature of how those apps stack and the underlying infra and configuration and setup. Microservice is a breeze and straight-forward for most of the out-of-the-box stuff.
What about the implementation team?
Our Team of SRE Engineers, Platform Engineers and Cloud Engineers implemented the solution.
What was our ROI?
I can't really speak to ROI; however, from my perspective, we definitely get our money's worth from the product.
What's my experience with pricing, setup cost, and licensing?
Users just just really need to make sure they stay on top of costs and don't let all of the engineers do as they please. Billing with Datadog can get out of hand if you let them. Not everything needs to be monitored.
Which other solutions did I evaluate?
We didn't really need to evaluate other options.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Improves monitoring and observability with actionable alerts
What is our primary use case?
We are using Datadog to improve our monitoring and observability so we can hopefully improve our customer experience and reliability.
I have been using Datadog to build better actionable alerts to help teams across the enterprise. Also by using Datadog we are hoping to have improved observability into our apps and we are also taking advantage of this process to improve our tagging strategy so teams can hopefully troubleshoot incidents faster and a much reduced mean time to resolve.
We have a lot of different resources we use like Kubernetes, App Gateway and Cosmos DB just to name a few.
How has it helped my organization?
As soon as we started implementing Datadog into our cloud environment people really like how it looked and how easy it was to navigate. We could see the most data in our Kubernetes environments than we ever could.
Some people liked how the logs were color coded so it was easy to see what kind of log you were looking at. The ease of making dashboards has also been greatly received as a benefit.
People have commented that there is so much information that it takes a time to digest and get used to what you are looking at and finding what you are looking for.
What is most valuable?
The selection of monitors is a big feature I have been working with. Previously with Azure Monitor we couldn't do a whole lot with their alerts. The log alerts can sometimes take a while to ingest. Also, we couldn't do any math with the metrics we received from logs to make better alerts from logs.
The metric alerts are ok but are still very limited. With Datadog, we can make a wide range of different monitors that we can tweak in real time because there is a graph of data as you are creating the alert which is very beneficial. The ease of making dashboards has saved a lot of people a lot of time. No KQL queries to put together the information you are looking for and the ability to pin any info you see into a dashboard is very convenient.
RUM is another feature we are looking forward to using this upcoming tax season, as we will have a front-row view into what frustrates customers or where things go wrong in their process of using our site.
What needs improvement?
The PagerDuty integration could be a little bit better. If there was a way to format the monitors to different incident management software that would be awesome. As of right now, it takes a lot of manipulating of PagerDuty to get the monitors from Datadog to populate all the fields we want in PagerDuty.
I love the fact you can query data without using something like KQL. However, it would also be helpful if there was a way to convert a complex KQL query into Datadog to be able to retrieve the same data - especially for very specific scenarios that some app teams may want to look for.
For how long have I used the solution?
I've used the solution for about two years.
Which solution did I use previously and why did I switch?
We previously used Azure Monitor, App Insights, and Log Analytics. We switched because it was a lot for developers and SREs to switch between three screens to try troubleshoot and when you add in the slow load times from Azure it can take a while to get things done.
What's my experience with pricing, setup cost, and licensing?
I would advise taking a close look at logging costs, man-hours needed, and the amount of time it takes for people to get comfortable navigating Datadog because there is so much information that it can be overwhelming to narrow down what you need.
Which other solutions did I evaluate?
We did evaluate DynaTrace and looked into New Relic before settling on Datadog.
Which deployment model are you using for this solution?
Good for log ingestion and analyzing logs with easy searchability of data
What is our primary use case?
We use Datadog as our main log ingestion source, and Datadog is one of the first places we go to for analyzing logs.
This is especially true for cases of debugging, monitoring, and alerting on errors and incidents, as we use traffic logs from K8s, Amazon Web Services, and many other services at our company to Datadog. In addition, many products and teams at our company have dashboards for monitoring statistics (sometimes based on these logs directly, other times we set queries for these metrics) to alert us if there are any errors or health issues.
How has it helped my organization?
Overall, at my company, Datadog has made it easy to search for and look up logs at an impressively quick search rate over a large amount of logs.
It seamlessly allows you to set up monitoring and alerting directly from log queries which is convenient and helps for a good user experience, and while there is a bit of a learning curve, given enough time a majority of my company now uses Datadog as the first place to check when there are errors or bugs.
However, the cost aspect of Datadog is tricky to gauge because it's related to usage, and thus, it is hard to tell the relative value of Datadog year to year.
What is most valuable?
The feature I've found most valuable is the log search feature. It's set up with our ingestion to be a quick one-stop shop, is reliable and quick, and seamlessly integrates into building custom monitors and alerts based on log volume and timeframes.
As a result, it's easy to leverage this to triage bugs and errors, since we can pinpoint the logs around the time that they occur and get metadata/context around the issue. This is the main feature that I use the most in my workflow with Datadog to help debug and triage issues.
What needs improvement?
More helpful log search keywords/tips would be helpful in improving Datadog's log dashboard. I recently struggled a lot to parse text from raw line logs that didn't seem to match directly with facets. There should be smart searching capabilities. However, it's not intuitive to learn how to leverage them, and instead had to resort to a Python script to do some simple regex parsing (I was trying to parse "file:folder/*/*" from the logs and yet didn't seem to be able to do this in Datadog, maybe I'm just not familiar enough with the logs but didn't seem to easily find resources on how to do this either).
For how long have I used the solution?
I've used the solution for 10 months.
What's my experience with pricing, setup cost, and licensing?
Beware that the cost will fluctuate (and it often only gets more expensive very quickly).
Good visibility into application performance, understanding of end-user behavior, and a single pane of glass view
What is our primary use case?
The primary use case for this solution is to enhance our monitoring visibility, determine the root cause of incidents, understand end-user behaviour from their point of view (RUM), and understand application performance.
Our technical environment consists of a local dev env where Datadog is not enabled, we have deployed environments that range from UAT testing with our product org to ephemeral stacks that our developers use to test there code not on there computer. We also have a mobile app where testing is also performed.
How has it helped my organization?
Datadog has greatly improved our organization in many ways. Some of those ways include greater visibility into application performance, understanding of end-user behavior, and a single pane of glass view into our entire infrastructure.
Regarding visibility, our organization previously used New Relic, and when incidents or regressions happened, New Relic's query language was very hard to use. End-user behavior in RUM has improved our ability to know what to focus on. Lastly, the single pane of glass view with maneuvering between products has helped us truly understand root causes after incidents.
What is most valuable?
APM has been a top feature for us. I can speak for all developers here: they use it more often than other products. Due to a standard in tracing (even though it is customizable), engineers find it easier to walk a trace than to understand what went wrong when looking at logging.
Another feature that I find valuable, though it isn't the first one that comes to mind, is Watchdog. I have found that has been a good source of understanding anomalies and where maybe we (as an organization) need more monitoring coverage.
What needs improvement?
I am not 100% sure how this is done or if it can be though I've had a lot of education I've had to do to ramp developers up on the platform. This feels like the nature of just the sheer growth and number of products Datadog now offers.
When I first started using the Datadog platform, I thought that was a big pro of the company that the ramp-up time was much quicker, not having to learn a query language. I still believe that to be true when comparing the product to someone like New Relic though with the wide range of products Datadog now offers it can be a bit intimidating to developers to know where to go to find what they want.
For how long have I used the solution?
I have been using the solution at my current company for almost four years, and have used it at my previous company as well.
Which solution did I use previously and why did I switch?
A while ago, we used New Relic, and we switched due to Datadog being a better product.
What about the implementation team?
We did the implementation in-house.
What's my experience with pricing, setup cost, and licensing?
The value compared to pricing is reasonable, though it can be a bit of a sticker shock to some.
Which other solutions did I evaluate?
We did not evaluate other options.
Which deployment model are you using for this solution?
Easy to use with good speed and helpful dashboards
What is our primary use case?
We are using Datadog to improve our cloud monitoring and observability across our enterprise apps. We have integrated a lot of different resources into Datadog, like Kubernetes, App Gateways, App Service Environments, App Service Plans, and other Web App resources.
I will be using the monitoring and observability features of Datadog. Dashboards are used very heavily by teams and SREs. We really have seen that Datadog has already improved both our monitoring and our observability.
How has it helped my organization?
The ease and speed of which you can create a dashboard has been a huge improvement.
The different types of monitors we can create have been huge, too. We can do so many different things with monitors that we couldn't do before with our alerts.
Being able to click on a trace or log and drill down on it to see what happened has been great.
Some have found the learning curve a bit steep. That said,they are coming around slowly. There is just a lot of information to learn how to navigate.
What is most valuable?
The different types of monitors have been very valuable. We have been able to make our alerts (monitors) more actionable than we were able to previously.
Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue.
RUM is another feature a lot of us are looking forward to seeing how it can help us improve our customer experience during tax season.
We hope to enable the code review feature at some point to so we can see what code caused the issue.
What needs improvement?
I would like to see the integration between PagerDuty and Datadog improved. The tags in Datadog don't match those in PagerDuty, and we have to make it work. Also, I would like to see if the ability to replicate a KQL query in Datadog is made easier or better.
I would like to see the alert communications to email or phones made better so we could hopefully move off PagerDuty and just use Datadog for that.
There are also a lot of features that we haven't budgeted for yet and I would like for us to be able to use them in the future.
For how long have I used the solution?
I've used the solution for about two years.
Which deployment model are you using for this solution?
Helpful support, with centralized pipeline tracking and error logging
What is our primary use case?
Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting.
How has it helped my organization?
Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards.
What is most valuable?
The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.
Synthetic testing is great, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.
What needs improvement?
While the documentation is very good, there are areas that need a lot of focus to pick up on the key details. In some cases the screenshots don't match the text when updates are made.
I spent longer than I should trying to figure out how to correlate logs to traces, mostly related to environmental variables.
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
We have been impressed with the uptime.
What do I think about the scalability of the solution?
It's scalable and customizable.
How are customer service and support?
Support is helpful. They help us tune our committed costs and alert us when we start spending out of the on-demand budget.
Which solution did I use previously and why did I switch?
We used a mix of SolarWinds, UptimeRobot, and GitHub actions. We switched to find one platform that could give deep app visibility.
How was the initial setup?
Setup is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
There has been significant time saved by the development team in terms of assessing bugs and performance issues.
What's my experience with pricing, setup cost, and licensing?
I'd advise others to set up live trials to asses cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.
Which other solutions did I evaluate?
NewRelic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.
What other advice do I have?
We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Centralized pipeline with synthetic testing and a customized dashboard
What is our primary use case?
Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting.
We run a mix of AWS EC2, Azure serverless, and colocated VMWare servers to support higher education web applications. Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge.
Datadog agents on each web host, and native integrations with GitHub, AWS, and Azure gets all of our instrumentation and error data in one place for easy analysis and monitoring.
How has it helped my organization?
Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards.
Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.
What is most valuable?
Centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.
Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most.
The ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.
These features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.
What needs improvement?
I'd like to see an expansion of the Android and IOS apps to have a simplified CI/CD pipeline history view.
I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed.
In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environmental variables.
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
We have been impressed with the uptime and clean and light resource usage of the agents.
What do I think about the scalability of the solution?
The solution has been very scalable and customizable.
How are customer service and support?
Sales service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.
Which solution did I use previously and why did I switch?
We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux or Windows or Container, cloud or on-prem hosted.
How was the initial setup?
Generally simple, but .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.
What's my experience with pricing, setup cost, and licensing?
Set up live trials to asses cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.
Which other solutions did I evaluate?
NewRelic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.
What other advice do I have?
Excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Consistent, centralized service for varied cloud-based applications
What is our primary use case?
The current use case for Datadog in our environment is observability. We use Datadog as the primary log ingestion and analysis point, along with consolidation of application/infrastructure metrics across cloud environments and realtime alerting to issues that arise in production.
Datadog integrates within all aspects of our infrastructure and applications to provide valuable insights into Containers, Serverless functions, Deep Logging Analysis, Virtualized Hardware and Cost Optimizations.
How has it helped my organization?
Datadog improved our observability layer by creating a consistent, centralized service for all of our varied cloud-based applications. All of our production and non-production environment applications and infrastructure send metrics directly to Datadog for analysis and determination of any issues that would need to be looked at by the Infrastructure, Platform and Development teams for quick remediation. Using Datadog as this centralized Observability platform has enabled us to become leaner without sacrificing project timelines when issues arise and require triage for efficient resolution.
What is most valuable?
All of Datadog's features have become valuable tools in our cloud environments.
Our primary alerts, based on metrics and synthetic transactions, are the most used and relied upon for decreased MTTA/MTTR across all of our platforms. This is followed by deep log analysis that enables us to quickly and easily get to a preliminary root cause that someone on the infrastructure, platform or development teams can take and focus their attention on the precise target that Datadog revealed as the issue to be remediated.
What needs improvement?
The two areas I could see needing improvement or a feature to add value are building a more robust SIM that would include container scanning to rival other such products on the market so we do not need to extend functionality to another third-party provider. The other expands the alerting functions by creating a new feature to add direct SMS notifications, on-call rotation scheduling, etc., that could replace the need to have this as an external third party solution integration.
For how long have I used the solution?
I've been a Datadog user for almost ten years.
What do I think about the stability of the solution?
Datadog is very stable, and we've only come across a few items that needed to be addressed quickly when there were issues.
What do I think about the scalability of the solution?
Scalability is very favorable, aside from cost/budget, which limits the scalability of this platform.
How are customer service and support?
Both customer service and support need a little work, as we have had a number of requests/issues that were not addressed as we needed them to be.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Being an Observability SME, I have used many native and third party solutions, including Dynatrace, New Relic, CloudWatch and Zabbix. As previously mentioned, Datadog provides a superior platform for centralizing and consolidating our Observability layer. Switching to Datadog was a no-brainer when most other solutions either didn't provide the maturity of functions, or have them available, at all.
How was the initial setup?
The initial setup was very straightforward, and the integrations were easily configured.
What about the implementation team?
We implemented Datadog in-house.
What was our ROI?
For the most part, Datadog's ROI is quite impressive when you consider all of the features and functions that are centralized on the platform. It doesn't require us to purchase additional third-party solutions to fill in the gaps.
What's my experience with pricing, setup cost, and licensing?
The setup was dead simple once the cloud integrations and agent components were identified and executed. Licensing falls into our normal third-party processes, so it was a familiar feeling when we started with Datadog. Cost is the only outlier when it comes to a perfect solution. Datadog is expensive, and each add-on drives that cost further into the realm of requiring justifications to finance expanding the core suite of features we would like to enable.
Which other solutions did I evaluate?
What other advice do I have?
They should provide more inclusive pricing, or an "all you can eat" tier that would include all relevant features, as opposed to individual cost increases to let Datadog to become more valuable and replace even more third-party solutions that have a lower cost of entry.