New Relic APM evaluation experience

Over the past few days I finally got around to evaluating New Relic APM. It seems to be the go-to solution for monitoring application performance, and we have had performance issues in the past, so it felt natural to give it a try.

Starting point

Our motivating questions are the following: given a web application,

  • Are some parts of it (a page, an Ajax call or similar) really slow? This would point us to a place that needs more tuning.
    • Does the performance vary a lot in some parts?
  • Did we break the performance with some specific release? This would point to a probable bug.
  • Is some part of the system slowing down gradually over time? This could mean that performance depends (too much) on the data size. We want O(1) or O(log n) most of the time.
  • And finally, if we attempt to fix the performance, did the performance actually improve?
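To make the questions concrete, here is a minimal Python sketch of the offline analysis they call for. It is an illustration only, not the code we actually run. The record layout (endpoint, release, timestamp in seconds, duration in milliseconds) and the function names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def slope(points):
    """Least-squares slope of duration over time, in ms per second."""
    if len(points) < 2:
        return 0.0
    mean_t = mean([t for t, _ in points])
    mean_d = mean([d for _, d in points])
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        return 0.0  # all samples share one timestamp, no trend to fit
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in points)
    return sxy / sxx


def summarize(records):
    """Summarize response times per endpoint.

    records: iterable of (endpoint, release, timestamp_s, duration_ms),
    a hypothetical layout chosen for this sketch.
    """
    points = defaultdict(list)
    by_release = defaultdict(lambda: defaultdict(list))
    for endpoint, release, ts, duration in records:
        points[endpoint].append((ts, duration))
        by_release[endpoint][release].append(duration)

    summary = {}
    for endpoint, pts in points.items():
        durations = [d for _, d in pts]
        avg = mean(durations)
        summary[endpoint] = {
            "count": len(durations),
            # Slow parts: a high mean.
            "mean_ms": avg,
            # Varying parts: coefficient of variation (std dev / mean).
            "cv": pstdev(durations) / avg if avg else 0.0,
            # Broken by a release: compare the means across releases.
            "mean_by_release": {
                rel: mean(ds) for rel, ds in by_release[endpoint].items()
            },
            # Gradual slowdown: a positive trend, in ms per day.
            "slope_ms_per_day": slope(pts) * 86400,
        }
    return summary
```

Sorting the result by `mean_ms` or `slope_ms_per_day` gives a first list of suspects. Checking whether a fix helped comes down to comparing `mean_by_release` before and after the fix.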

We have been logging response times for a very long time, so we know who did what, how many DB queries it took and how much time it took. So far so good; the problem is that this is all very offline. Once we have a suspicion about a certain part of the system, I dig into the database, plot charts, compare them and draw trendlines. There always needs to be a probable cause to start from, and each investigation takes some work. What we would hope to gain from a system like New Relic is:

  • Show me the data with a few clicks.
  • Alert me when performance starts going down.
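As an illustration of the second wish, an alert can be as simple as comparing a recent window of response times against a baseline window. The sketch below is a hypothetical rule, not how New Relic decides when to alert. The function name, the 1.5x factor and the minimum sample count are all assumptions.

```python
from statistics import median


def should_alert(baseline_ms, recent_ms, factor=1.5, min_samples=20):
    """Return True when the recent median exceeds factor x baseline median.

    baseline_ms, recent_ms: lists of response times in milliseconds.
    factor and min_samples are arbitrary defaults for this sketch.
    """
    # Too few samples would make the medians noisy, so stay quiet.
    if len(baseline_ms) < min_samples or len(recent_ms) < min_samples:
        return False
    return median(recent_ms) > factor * median(baseline_ms)
```

The median is used rather than the mean so that a single very slow request does not trigger the alert on its own.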

This brings us to the evaluation part.

Evaluation

Signup is very fast, and so is integration. It should work out of the box with most common languages and, in the case of Java, with all the servers I’ve ever heard of. For our Tomcat, this amounted to downloading, unpacking and running one installer (which simply adds a Java agent to the run script).

Coming from a Java performance background (SAPE at USI Lugano), I can certainly appreciate the amount of work that had to go into such a simple installation. Even better: it starts working right away! I’d have no second thoughts about trying this out on a production system.

In a minute or so the cloud-based monitoring part of New Relic starts receiving the data and the interface starts showing the charts and whatnot.

The good

  • The GUI is outstanding, showing the key information at a glance while being configurable (you can view different time ranges, compare to the past, build a dashboard…).
  • The performance data is correctly split into time spent in Java and in the database.
  • The system correctly guessed that the local deployment consists of one Tomcat and one Postgres and drew a nice map :)
  • Tracing the running system can be started from the web interface, and even though it’s only sampling (profiling would be better), it is still impressive.
  • You can drill down to the specific underperforming DB queries.
  • The system identifies APIs used by your application and integrates their performance into the overall view.
  • Alerts!

The bad

  • To be fair, this is our bad: realPad uses a home-grown framework, and all the Ajax requests show up bundled up together. It can be configured, but it does not seem worth the effort.
  • A dedicated profiler like VisualVM mops the floor with the tracing in New Relic.
  • The overall price might be too much: $1800 / year, per host.

Verdict

realPad is going to stay with our own monitoring system. Every other project should definitely use at least the 14-day “free pro” period just before the project is finished, to zap all the performance bugs before it’s too late.