Our tech stack consists mostly of backend services written in TS/Node, and as a full-stack engineer I need to keep track of new and existing errors. Our logs are consolidated in Datadog, where they are easy to search and review, so the service has become a daily tool in my work.
More recently, my company adopted session replay, but I do not like it much: UI elements are rendered out of place in the replay, so it is very hard to see what users of the web app are actually clicking on.
The biggest benefit has been consolidating all our logs in a single place, which makes errors much easier to find. Previously we used AWS CloudWatch, which was cumbersome and time-consuming.
Log search is the most valuable feature because it has brought the logs from all of our backend services into one place. We can now see the relationship between services as requests travel from one service to its dependencies.
One issue I have with logs is how long they are retained on the platform. Some issues happen sporadically, so it would be good to keep logs for longer than one month by default, or to make the retention period configurable. I have yet to try rehydrating logs, so that may be an option worth exploring. Another issue is the search syntax: it is cumbersome, it could be simpler, and it feels like there are too many ways to do the same thing. There is also no intuitive way to save searches so that similar ones can be reused in the future.
Finally, although my company replaced a different session replay tool with Datadog's version, I find it clunky and in need of further improvement. For example, when troubleshooting a web portal issue it is very important to know what the user clicked, but the elements are not where they should be in the replay.
It is also hard to find session details and metadata, such as user email and account, that other services with replay features make available.
I have been using Datadog for approximately five years.
So far we haven't had any issues with uptime and Datadog has been available when needed.
It seems to scale well as we continue to add services that need monitoring.
I haven't had to contact support.
CloudWatch was not a great tool for the kind of troubleshooting we need to do.
We deployed it in-house with intermediate expertise.
I am not sure how much we are paying, but I use the app often enough to feel like we are getting a good ROI.
As a software engineer, I was not involved in the selection process.