JEP 404: Generational Shenandoah

Authors: Bernd Mathiske, Kelvin Nilsen, William Kemper
Owner: Roman Kennke
Type: Feature
Scope: Implementation
Status: Candidate
Component: hotspot / gc
Discussion: hotspot dash gc dash dev at openjdk dot java dot net
Effort: L
Duration: L
Reviewed by: Roman Kennke
Created: 2021/02/01 22:49
Updated: 2021/08/04 16:44
Issue: 8260865

Summary

Enhance the Shenandoah garbage collector with generational collection capabilities to improve sustainable throughput, load-spike resilience, and memory utilization.

Goals

Non-Goals

Success Metrics

Motivation

Garbage collectors with concurrent compaction are capable of completely blending GC pause times into the single-digit millisecond range of other common JVM pauses, while also leaving mutator execution speed nearly unfettered. The Shenandoah garbage collector already provides this ideal GC behavior for latency-sensitive Java applications. However, it can only achieve this within limited operational envelopes (i.e., combinations of heap occupancy and allocation rate). In other words, Shenandoah is relatively CPU-hungry and memory-hungry, mostly the latter. Compared to the generational collectors G1, CMS, and Parallel, it tends to require more heap headroom and to work harder to recover space occupied by unreachable objects.

A generational collector is not necessarily at a noticeable disadvantage when the generational hypothesis, i.e., that most objects die young, does not hold. Auto-tuning in region-based collectors can adjust generation sizes and copying (i.e., promotion) policies dynamically, based on observed object demographics. In the worst case, copying surviving objects a few times too often becomes the norm rather than the exception. This cost is, however, dwarfed by the repeated concurrent marking of long-lived objects in the old generation, so in that case not much difference remains compared to a single-generation collector.

A concurrent collector that is also generational and can dynamically adjust its young generation’s size and related operational parameters can both achieve low pause times and stay competitive in all other performance aspects.

Description

This enhancement of the Shenandoah garbage collector separates the Java heap into two generations. As in other generational collectors, GC efforts then focus on the young generation, i.e., the one in which allocations by the mutator occur and where ephemeral objects can be reclaimed with reduced effort. We propose the following approaches for an initial implementation.

The collection algorithms operating on each generation are closely based on traditional Shenandoah. Within the young generation, Generational Shenandoah uses the same heuristics as traditional Shenandoah to distinguish areas of memory that hold newly allocated objects from areas of memory holding objects that survived one or more recent young-generation GC passes.

Each generation is formed by a subset of the Shenandoah heap’s regions. At any given time, a region is considered either free or dedicated to either the young or the old generation. The size of each generation is given by its occupied regions plus a quota of free regions. Overreach into the free quota of the respective other generation is tolerated, but it accelerates collection triggering and can lead to degenerated and full collections. We will refine the algorithms to control collection-phase scheduling, young-generation sizing, tenuring age, and other auto-tuning mechanisms based on tests, once a prototype is available.
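
The sizing rule above can be illustrated with a minimal sketch. It assumes that free regions are split into two per-generation quotas and that a generation which has used up its own quota may take a region from the other generation's quota. The class and method names are hypothetical and do not correspond to HotSpot code; the real policy for triggering collections on overreach is not modeled.

```java
// Hypothetical model of per-generation region accounting (not HotSpot code).
class GenerationAccounting {
    private int youngUsed;
    private int oldUsed;
    private int youngFreeQuota;
    private int oldFreeQuota;

    GenerationAccounting(int youngFreeQuota, int oldFreeQuota) {
        this.youngFreeQuota = youngFreeQuota;
        this.oldFreeQuota = oldFreeQuota;
    }

    /** Claims one free region for the young generation; returns true on overreach. */
    boolean claimForYoung() {
        if (youngFreeQuota > 0) { youngFreeQuota--; youngUsed++; return false; }
        if (oldFreeQuota > 0) { oldFreeQuota--; youngUsed++; return true; }
        throw new IllegalStateException("no free regions left");
    }

    /** Claims one free region for the old generation; returns true on overreach. */
    boolean claimForOld() {
        if (oldFreeQuota > 0) { oldFreeQuota--; oldUsed++; return false; }
        if (youngFreeQuota > 0) { youngFreeQuota--; oldUsed++; return true; }
        throw new IllegalStateException("no free regions left");
    }

    // Size of a generation = its occupied regions + its remaining free quota.
    int youngSize() { return youngUsed + youngFreeQuota; }
    int oldSize() { return oldUsed + oldFreeQuota; }

    public static void main(String[] args) {
        GenerationAccounting acct = new GenerationAccounting(2, 3);
        acct.claimForYoung();
        acct.claimForYoung();
        boolean overreach = acct.claimForYoung(); // young quota is exhausted here
        System.out.println("overreach=" + overreach
                + " young=" + acct.youngSize() + " old=" + acct.oldSize());
    }
}
```

In this model an overreach shrinks the other generation's size by one region, which is one way to picture why overreach would bring the next collection forward.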

Shenandoah has a unique Load Reference Barrier (LRB) that supports 32-bit builds and compressed 32-bit object pointers (“compressedOops”) in 64-bit builds. To constrain the impact on the mutator, we use this same LRB for both generations, without any changes, and use a single evacuator for both old and young collection efforts. Typical evacuation phases collect garbage either exclusively from young regions or from a combination of young and old regions. This behavior mimics G1’s young and mixed collections. The principal improvement over G1 is that Generational Shenandoah’s mixed collections run concurrently with the mutator.

The generation-specific marking phases are largely decoupled from each other. Concurrent old-generation marking proceeds in the background while young-generation marking occurs multiple times. It can also be suspended as needed to execute other collection phases. Once old-generation marking has completed, subsequent evacuations and reference updates will include old-generation regions until the entire old-generation collection set has been processed.
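
A rough sketch of how completed old-generation marking could feed into later evacuations follows. It assumes that the old-generation collection set is drained a few regions at a time alongside each young collection; the per-cycle limit, the class name, and the use of plain integers as region identifiers are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical planner for young and mixed collection sets (not HotSpot code).
class MixedCollectionPlanner {
    private final Deque<Integer> pendingOldRegions = new ArrayDeque<>();
    private final int maxOldRegionsPerCycle; // assumed tuning knob

    MixedCollectionPlanner(int maxOldRegionsPerCycle) {
        this.maxOldRegionsPerCycle = maxOldRegionsPerCycle;
    }

    /** Called once concurrent old-generation marking has completed. */
    void oldMarkingCompleted(List<Integer> oldCollectionSet) {
        pendingOldRegions.addAll(oldCollectionSet);
    }

    /** Young regions always; old regions only while some are still pending. */
    List<Integer> nextCollectionSet(List<Integer> youngCandidates) {
        List<Integer> cset = new ArrayList<>(youngCandidates);
        for (int i = 0; i < maxOldRegionsPerCycle && !pendingOldRegions.isEmpty(); i++) {
            cset.add(pendingOldRegions.poll());
        }
        return cset;
    }

    boolean hasPendingOldRegions() {
        return !pendingOldRegions.isEmpty();
    }

    public static void main(String[] args) {
        MixedCollectionPlanner planner = new MixedCollectionPlanner(2);
        System.out.println(planner.nextCollectionSet(List.of(1, 2))); // young only
        planner.oldMarkingCompleted(List.of(10, 11, 12));
        System.out.println(planner.nextCollectionSet(List.of(3)));    // mixed
        System.out.println(planner.nextCollectionSet(List.of(4)));    // mixed, drains the rest
        System.out.println(planner.nextCollectionSet(List.of(5)));    // young only again
    }
}
```

Before old marking completes, and again after the pending old regions are exhausted, every collection set in this model contains young regions only.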

For the remembered-set implementation, we use the existing card-marking code and supplement it with code for remembered-set scanning that runs concurrently with mutator execution.
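
For readers unfamiliar with card marking, the following is a simplified, single-threaded sketch of the general technique. It is not the HotSpot implementation: the 512-byte card size, the use of `java.util.BitSet`, and all names are assumptions made for illustration, and the concurrency with the mutator that this proposal calls for is deliberately left out (`BitSet` is not thread-safe).

```java
import java.util.BitSet;
import java.util.function.LongConsumer;

// Hypothetical card table illustrating card marking in general (not HotSpot code).
class CardTableSketch {
    static final int CARD_SHIFT = 9; // assumed card size: 2^9 = 512 bytes

    private final long heapBase;
    private final BitSet dirty;

    CardTableSketch(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        long cards = (heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        this.dirty = new BitSet((int) cards);
    }

    /** Maps a heap address to the index of the card that covers it. */
    int cardIndex(long address) {
        return (int) ((address - heapBase) >>> CARD_SHIFT);
    }

    /** Write-barrier side: mark the card containing the updated reference field. */
    void recordStore(long fieldAddress) {
        dirty.set(cardIndex(fieldAddress));
    }

    /** Collector side: visit the start address of every dirty card, then clear all. */
    void scanAndClear(LongConsumer visitCard) {
        for (int i = dirty.nextSetBit(0); i >= 0; i = dirty.nextSetBit(i + 1)) {
            visitCard.accept(heapBase + ((long) i << CARD_SHIFT));
        }
        dirty.clear();
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(0x1000L, 4096L);
        table.recordStore(0x1000L + 10);   // falls into card 0
        table.recordStore(0x1000L + 1030); // falls into card 2
        table.scanAndClear(addr -> System.out.println("scan card at 0x" + Long.toHexString(addr)));
    }
}
```

During a young collection, only the old-generation memory covered by dirty cards needs to be scanned for references into the young generation, rather than the whole old generation.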

Shenandoah’s existing SATB barriers are generalized to serve the combined needs of young-generation and old-generation concurrent marking. The post-processing of SATB buffers treats references to old-generation memory differently than references to young-generation memory, but the fast path through these barriers remains unchanged.

Building and Invoking

The new generational feature is part of the Shenandoah code base, but it has no runtime effect unless it is activated by the JVM command line option

-XX:ShenandoahGCMode=generational

in which case these existing options for generational garbage collectors (such as Parallel, G1, and CMS) go into effect:

-XX:NewRatio=<n>
-XX:NewSize=<size>
-XX:MaxNewSize=<size>
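
As a rough illustration of how these options relate, the sketch below computes a young-generation size from a heap size. In HotSpot's existing generational collectors, `NewRatio=n` denotes an old-to-young size ratio of n to 1. Clamping the result between `NewSize` and `MaxNewSize` is a simplifying assumption here, and the exact interpretation in Generational Shenandoah may differ; the class and method names are hypothetical.

```java
// Hypothetical helper showing the arithmetic behind the sizing options.
class YoungSizing {
    /**
     * Young size derived from NewRatio (old:young = newRatio:1), then clamped
     * to [newSize, maxNewSize]. The clamping order is an assumption.
     */
    static long youngBytes(long heapBytes, int newRatio, long newSize, long maxNewSize) {
        long fromRatio = heapBytes / (newRatio + 1);
        return Math.max(newSize, Math.min(fromRatio, maxNewSize));
    }

    public static void main(String[] args) {
        long heap = 3L << 30; // 3 GiB
        long young = youngBytes(heap, 2, 0L, Long.MAX_VALUE);
        System.out.println("young generation bytes: " + young);
    }
}
```

With a 3 GiB heap and `NewRatio=2`, the ratio alone yields a 1 GiB young generation and a 2 GiB old generation.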

Please see the project wiki (once available) for more information on how to set up and tune Generational Shenandoah.

Alternatives

Azul Systems’ C4 collector is already generational, but not available in open source. ZGC would be another excellent code base to further develop into a generational concurrent collector. Neither of these options supports compressedOops, however, and the vast majority of Java heaps that we see (e.g., in cloud services) are well below 32 GB in size and thus able to take advantage of this space-saving and performance-improving feature.

Testing

Most existing functional and stress tests are collector-agnostic and can be reused as-is. We will integrate additional test run configurations for the new generational mode along with new mode-specific functional, performance, and stress tests.

Risks and Assumptions