HeapAudit – JVM Memory Profiler for the Real World

HeapAudit is not a monitoring tool, but rather an engineering tool: it collects actionable data, that is, information sufficient to drive code changes directly. It is built for the real world and can be applied to live production servers.

HeapAudit is a foursquare open source project designed for understanding JVM heap allocations. It is implemented as a Java agent built on top of ASM.

Understanding JVM Memory Allocations

Performance and scalability issues are generally attributed to bottlenecks in code execution (CPU), memory allocations (RAM) and I/O (disk, network, etc.). Setting aside system level performance problems (i.e. memory fragmentation, NUMA, etc.), memory issues are typically caused by the rate and size of heap allocations. The concept is easy to understand, but hard to track, especially in complex systems. Unlike CPU and I/O, memory allocations often cannot be understood by taking a short-windowed snapshot. Objects currently on the heap, or about to be released, may have been allocated long ago, so more sophisticated analysis is required to trace them back and pinpoint root causes.

Garbage Collection & Memory Profilers

Excessive Garbage Collection (GC) frequently becomes one of the main sources of memory related performance bottlenecks in the Java Virtual Machine (JVM). The JVM runtime exposes hooks and counters to examine the heap objects as well as GC rates and sizes. These pieces of information generally provide an overview of the symptom, but not the root of the performance issue. In other words, they describe the effect, not the cause. In small code bases, this may be sufficient to identify the problem. In large and complex systems, like those we run here at foursquare, however, this type of information is rather non-actionable.

For instance, if the GC information tells you the JVM is garbage collecting hundreds of thousands of String objects every second, or that the JVM heap summary shows several millions of active String objects, it is not apparent where those String objects were allocated from.
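
For reference, the standard java.lang.management API is one such built-in hook. A minimal sketch (the class name here is ours, not part of HeapAudit) shows the kind of aggregate view it provides: counts and sizes, but no allocation sites.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcCounters {

    public static void main(String[] args) {

        // Aggregate heap usage: how much is allocated, not where it came from
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("heap: " + memory.getHeapMemoryUsage());

        // Per-collector counts and cumulative pause times: the effect, not the cause
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                               + " timeMs=" + gc.getCollectionTime());
        }
    }
}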

There exist several sophisticated JVM profilers that address this by showing where each object is allocated from, thus pointing out the cause as opposed to only the effect. This is typically achieved by instrumenting code or sampling JVM processes and associating heap allocations with callstacks. This empowers engineers with abundant data to examine JVM-wide allocations and analyze hotspots. Unfortunately, the downside of acquiring such complete information is the overhead. A sub-millisecond operation may take several minutes while also generating gigantic log files. The log becomes even harder to comprehend if concurrent logic were to execute during the collection. In other words, it’s incredibly resource consuming to process and doesn’t always help identify the source of the problem.

As a result, these JVM profilers are restricted to pre-production, internal engineering use where the environment is highly controlled, and are unsuitable for scenarios like production servers where the issue is triggered by some unanticipated, user-driven set of activities.

JVM hooks and counters
  pros: natively supported by the runtime, no extra overhead
  cons: relatively insufficient and non-actionable information

JVM memory profiling tools
  pros: provide a complete understanding of memory allocations mapped to callstacks
  cons: execution is extremely slow and only suitable for internal engineering purposes

At foursquare, we have tried a handful of memory profilers in conjunction with JVM GC information to analyze our memory usage and have mostly come to the following conclusions:

  • GC information tells us we allocate too much memory too frequently;
  • Memory profiler data is so gigantic we do not have the manpower to analyze all the logs and understand most of the allocation patterns;
  • Our primary programming language, Scala, generates a massive number of anonymous functions, making it extremely hard to associate allocations with source code;
  • It is nearly impossible to build a system to track memory regressions due to code changes by processing the large profiler data or coarse GC data; and,
  • Of the small amount of data we manually analyzed, it is hard to distinguish what portion ties back to particular high level logic (e.g. a specific endpoint request).

HeapAudit Java Agent

HeapAudit was created at foursquare precisely to fill this gap: we needed a tool that gives us enough information to understand our allocation patterns, but that can also be applied to our production machines to understand the heap activity caused by our users.

At foursquare, we wrap the HeapAudit recorder around all user driven requests, attributing all heap allocations during the entire request to this particular recorder.

// HeapAudit recorder classes; package name assumed from the open source repo layout
import com.foursquare.heapaudit.HeapQuantile;
import com.foursquare.heapaudit.HeapRecorder;

void handleRequest() {

    HeapQuantile recorder = new HeapQuantile();

    // Register to record on local thread only
    HeapRecorder.register(recorder, false);

    try {
        // process request...
    } finally {
        // Make sure to unregister recording from local thread;
        // otherwise reference to the recorder will leak
        HeapRecorder.unregister(recorder, false);
    }

    // Tally heap allocations for the local thread with your
    // customized log output function
    log(recorder.tally(false, false));

}

We enable the HeapAudit Java agent on one machine in our server pool at all times and monitor for build-to-build allocation changes or anomalies. We’ve also enabled this for our internal tests and staging servers to immediately notify engineers if we have an unexpected surge in allocations due to code changes.
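
Enabling the agent uses the JVM’s standard -javaagent mechanism; the jar path below is illustrative and depends on how you build or obtain HeapAudit:

java -javaagent:heapaudit.jar -jar yourserver.jar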

The HeapQuantile recorder used in the above example stores allocation statistics broken into quantiles. It is lightweight and performs reasonably well under a high volume of heap allocations. HeapAudit does not log the allocations in its own proprietary format; it is up to the consumer of the library to store the information and optionally further dissect the data.
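
For illustration only, the log call in the example above could be as simple as the hypothetical sink below; the exact type returned by tally is defined by HeapAudit, so it is kept generic here.

// Hypothetical consumer-side sink: forward the tallied allocation statistics
// to whatever logging or metrics pipeline you already use.
void log(Object talliedAllocations) {
    System.out.println("request allocations: " + talliedAllocations);
}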

The recorders can be registered globally to capture all allocations across all threads, or registered manually on specific threads (for example, when passing execution to a child thread for background processing). This is precisely what we do at foursquare to correlate all heap allocations pertaining to specific endpoints.
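
A sketch of the per-thread case, reusing only the register and unregister calls from the example above (processInBackground is a hypothetical helper; per the earlier comment, the false flag restricts recording to the registering thread):

// Hypothetical background task: the child thread registers the same recorder,
// so its allocations are attributed back to the originating request.
void processInBackground(final HeapQuantile recorder) {

    Thread worker = new Thread(new Runnable() {
        public void run() {

            // Record on the child thread only
            HeapRecorder.register(recorder, false);

            try {
                // background work for the request...
            } finally {
                HeapRecorder.unregister(recorder, false);
            }
        }
    });

    worker.start();
}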

Performance Overhead

The performance overhead is highly dependent on the code the recorders audit. In particular, the overhead is directly tied to the concentration of heap allocation bytecodes within a block of source code. Anecdotally, when HeapAudit is applied to the foursquare production servers, we observe 50~100% latency overhead. This is much lower than the other JVM memory profilers we’ve examined and within a reasonable range for servicing actual user requests on production servers.

License and Availability

HeapAudit is open sourced under the Apache License and can be used against any JVM processes. See https://github.com/foursquare/heapaudit for more information.

Have feedback or suggestions? Let us know! And if this is interesting to you, we’re hiring!

- Norbert Hu (@norberthu)