JEP draft: Provide a low-overhead way of sampling Java heap allocations, accessible via JVMTI

Author: JC Beyler
Owner: Chuck Rasbold
Created: 2016/12/12 21:31
Updated: 2018/01/11 08:48
Component: hotspot / jvmti
Reviewed by: Staffan Larsen


Provide a low-overhead way of sampling Java heap allocations, accessible via JVMTI.


The overall goal of this proposal is to provide a way of getting information about Java object heap allocations from the JVM that:


Users have a strong need to understand the contents of their heaps. Poor heap management can lead to problems such as heap exhaustion and GC thrashing. As a result, a number of tools have been developed to let users inspect their heaps, such as Java Flight Recorder, jmap, YourKit, and VisualVM.

One piece of information missing from most of the existing tooling is the call site of particular allocations. Heap dumps and heap histograms do not contain this information. It can be critical to debugging memory issues, because it tells developers exactly where in their code particular (and particularly bad) allocations occurred.

There are currently two ways of getting this information out of HotSpot:

First, you can instrument all of the allocations in your application using a bytecode rewriter (like the one at You can then have the instrumentation take a stack trace (when you want one).

Second, you can use Java Flight Recorder, which takes a stack trace on TLAB refills and when allocating directly into the old generation. The downsides of this are that a) it is tied to a particular allocation implementation (TLABs), and misses allocations that don't meet that pattern; b) it doesn't allow the user to customize the sampling rate; c) it only logs allocations, so you cannot distinguish between live and dead objects; and d) it is proprietary, so it cannot be user-extended.

This proposal mitigates those problems by providing an extensible JVMTI interface that allows the user to define the sampling rate and that returns a set of live stack traces.


The user-facing API for the heap sampling feature proposed by this JEP consists of an extension to JVMTI that allows for heap profiling. The following structure represents a single heap sample:

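struct jvmtiStackTrace {
  jvmtiFrameInfo *frames;  /* stack frames at the allocation site */
  jint frame_count;        /* number of entries in frames */
  jint size;               /* allocation size, in bytes */
  jlong thread_id;         /* Java thread id */
};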

where frames holds the stack trace at the point where the allocation happened; frame_count is the number of frames in that trace; size is the size of the allocation (in bytes); and thread_id is the Java thread id.

The new API also includes several new JVMTI methods. The first method added by the API enables tracing:

jvmtiError StartHeapSampling(jvmtiEnv *env, jint monitoring_rate, jint max_gc_storage);

Sampling can be stopped via:

jvmtiError StopHeapSampling(jvmtiEnv *env);

The JVM handles bookkeeping, with two more functions providing a means to inspect the current allocation behavior:

jvmtiError GetLiveTraces(jvmtiEnv *env, jvmtiStackTraces* stack_traces);

jvmtiError GetGarbageTraces(jvmtiEnv *env, jvmtiStackTraces* stack_traces);

These functions get the information about sampled objects. GetLiveTraces gets sampled information associated with objects that have not been garbage collected yet. GetGarbageTraces returns some of the objects that have recently been garbage collected.

Internally, the system remembers the last X garbage-collected sampled objects, where X is the value passed to the StartHeapSampling method via the max_gc_storage parameter. In our local implementation, we have set X to 200 and have gotten good results. We have used two replacement policies: recently garbage collected, and statistically sampled garbage collected.

Recently garbage collected keeps the traces in a ring buffer; it simply discards the oldest sampled trace when there is a new one.

Statistically sampled garbage collected decides whether a new trace evicts an old one with a probability that diminishes over time: it replaces a random entry with probability 1/samples_seen. This strategy tends to preserve the most frequently occurring traces over time.
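The two replacement policies described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the implementation used in the JVM: the GarbageBuffer type, the add_recent and add_frequent functions, and the MAX_GC_STORAGE constant are hypothetical names, integer trace ids stand in for full stack traces, and rand() stands in for whatever random source a real sampler would use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; the JEP does not define these types. */
#define MAX_GC_STORAGE 4   /* stands in for the max_gc_storage parameter */

typedef struct {
    int traces[MAX_GC_STORAGE]; /* trace ids stand in for full stack traces */
    int count;                  /* number of slots filled so far */
    long samples_seen;          /* total traces offered to this buffer */
} GarbageBuffer;

/* Recently garbage collected: a ring buffer that overwrites the oldest entry. */
void add_recent(GarbageBuffer *buf, int trace_id) {
    assert(buf != NULL);
    int slot = (int)(buf->samples_seen % MAX_GC_STORAGE);
    buf->traces[slot] = trace_id;
    if (buf->count < MAX_GC_STORAGE) {
        buf->count++;
    }
    buf->samples_seen++;
}

/* Statistically sampled: once the buffer is full, a new trace replaces a
 * random entry with probability 1/samples_seen. */
void add_frequent(GarbageBuffer *buf, int trace_id) {
    assert(buf != NULL);
    buf->samples_seen++;
    if (buf->count < MAX_GC_STORAGE) {
        buf->traces[buf->count++] = trace_id;   /* still filling up */
        return;
    }
    if (rand() % buf->samples_seen == 0) {      /* true about 1/samples_seen of the time */
        buf->traces[rand() % MAX_GC_STORAGE] = trace_id;
    }
}
```

Both functions take a zero-initialized GarbageBuffer. The ring buffer always reflects the most recent collections, while the probabilistic buffer changes less and less often as samples_seen grows.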

To get the frequently garbage collected objects, the user can call:

jvmtiError GetFrequentGarbageTraces(jvmtiEnv *env, jvmtiStackTraces* stack_traces);

All stack traces can be released via the following method:

jvmtiError ReleaseTraces(jvmtiEnv* env, jvmtiStackTraces* stack_traces);

Finally, the API provides one last method:

jvmtiError GetHeapSamplingStats(jvmtiEnv* env, jvmtiHeapSamplingStats* stats);

This provides information about the sampling internals, which is useful for debugging and also provides the user with key information on the sampler.

Internal Object Allocation Tracking

This section describes how the JVM tracks this information internally.

A. Object Allocation

Object allocation is sampled using the threshold provided to the StartHeapSampling API explained above. The system piggy-backs on the TLAB overflow mechanism, where the JVM tries to allocate a new TLAB. At that point, the heap sampler is called and a stack trace is obtained.

The key is to move the end of the TLAB to the sampling point required to achieve the given sampling rate. When an allocation crosses that point, the slow path is taken and a sample can be recorded. The TLAB end pointer is then bumped forward, either to the actual TLAB end or to the next sampling point.
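The end-pointer technique can be modeled with a toy bump-pointer allocator. The sketch below is an assumption-laden illustration, not HotSpot code: the Tlab struct, its fields, and the tlab_init and tlab_alloc functions are hypothetical, offsets stand in for real addresses, and a counter stands in for capturing a stack trace.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a TLAB; the field names are hypothetical, not HotSpot's. */
typedef struct {
    size_t top;         /* next free offset (bump pointer) */
    size_t end;         /* limit checked by the fast path; may be a sampling point */
    size_t actual_end;  /* real end of the buffer */
    size_t interval;    /* bytes between samples */
    int samples;        /* number of samples taken so far */
} Tlab;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

void tlab_init(Tlab *t, size_t capacity, size_t interval) {
    assert(t != NULL && interval > 0);
    t->top = 0;
    t->actual_end = capacity;
    t->interval = interval;
    t->samples = 0;
    t->end = min_size(interval, capacity);  /* artificially early end */
}

/* Returns the offset of the new object, or (size_t)-1 when the buffer is
 * really full (a real JVM would allocate a new TLAB at that point). */
size_t tlab_alloc(Tlab *t, size_t size) {
    if (t->top + size <= t->end) {          /* fast path: no sampling logic */
        size_t obj = t->top;
        t->top += size;
        return obj;
    }
    /* Slow path: either the buffer is exhausted or a sampling point was hit. */
    if (t->top + size > t->actual_end) {
        return (size_t)-1;
    }
    size_t obj = t->top;
    t->top += size;
    t->samples++;                           /* a real sampler would record a stack trace */
    /* Bump the end to the next sampling point or to the actual TLAB end. */
    t->end = min_size(t->top + t->interval, t->actual_end);
    return obj;
}
```

The fast path stays a single comparison against end, which is why this approach adds no work to ordinary allocations; sampling cost is paid only on the slow path.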

B. Garbage Collection

During reference processing, the system walks the list of internal sampled objects. The list walk checks if the objects are still live. If the object is no longer live, the system removes it from the list of currently live objects and pushes it to a garbage collected list.
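The list walk can be sketched as a single pass that keeps live samples and moves dead ones to a garbage list. This is a minimal sketch under assumptions: the SampledObject type and the sweep_samples function are hypothetical, a boolean flag stands in for the collector's liveness check, and the garbage array is assumed to be large enough to hold every dead sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one sampled allocation. */
typedef struct {
    int trace_id;  /* stands in for the recorded stack trace */
    bool live;     /* stands in for the collector's liveness check */
} SampledObject;

/* Walks the sampled objects, compacts the live ones to the front of
 * `samples`, and appends the dead ones to `garbage`.
 * Returns the new number of live samples.
 * Assumes `garbage` has room for every dead sample. */
size_t sweep_samples(SampledObject *samples, size_t count,
                     SampledObject *garbage, size_t *garbage_count) {
    assert(samples != NULL && garbage != NULL && garbage_count != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (samples[i].live) {
            samples[kept++] = samples[i];             /* still live: keep it */
        } else {
            garbage[(*garbage_count)++] = samples[i]; /* dead: move to garbage list */
        }
    }
    return kept;
}
```

In a real implementation the garbage list would be bounded, with one of the replacement policies deciding which dead samples are retained.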

In our implementation, two lists are maintained for user perusal: recently garbage collected, and frequently garbage collected.

C. New Capability

To protect the new feature, a new capability called can_sample_heap is introduced into the jvmtiCapabilities.


There are multiple alternatives to the system presented in this JEP. The introduction presented two already. First, the JFR system provides an alternative, but it has a licensing limitation that makes it not usable by all interested parties. Second, bytecode instrumentation using ASM is an alternative, but its prohibitive overhead makes it unworkable.

The JFR system also uses the TLAB creation as a means to track memory allocation.

Finally, a third alternative would, instead of implementing the bookkeeping internally, have the JVM simply expose a set of callbacks when allocations or GCs happen and let the user handle all of the bookkeeping. The advantage is that the user controls the extent of what is maintained. One disadvantage is the potential overhead of extra calls to the outside world. Another is the risk of user error: many things are not possible during an allocation, such as the creation of weak references. To enable such a callback system, the documentation would have to be crystal clear and provide sufficient warning to reduce the risk of complex JVM crashes. Because providing a callback is error-prone, the flexibility advantage might be outweighed by the risk of user mistakes. A study will be conducted to assess what real risks exist, how the Java toolchain could mitigate them, and what the extra overhead would be.


We have an implementation that we’ve validated on x86 Linux, and are using in production at Google. Further testing needs to be done in four parts: Overhead testing (performance non-regression) needs to be done with benchmarks such as: SPECjvm98, SPECjvm2008, SPECjbb2005.

There are 11 tests in the JTreg framework for this feature. They cover turning sampling on and off with multiple threads, multiple threads allocating at the same time, whether the data is sampled at the right rate, and whether the stacks are consistent with what is expected.

Risks and Assumptions

There is a performance/memory hit with the feature. In the prototype implementation at Google, the overhead is minimal (<2%), but this was using a mechanism that modified JIT’d code. In the version presented here, the system piggy-backs on the TLAB code and should not have that regression.


None known.