JEP draft: Generational Shenandoah
Authors | Bernd Mathiske, Kelvin Nilsen, William Kemper |
Owner | Bernd Mathiske |
Type | Feature |
Scope | Implementation |
Status | Submitted |
Component | hotspot / gc |
Discussion | hotspot dash gc dash dev at openjdk dot java dot net |
Effort | L |
Duration | L |
Reviewed by | Roman Kennke |
Created | 2021/02/01 22:49 |
Updated | 2021/03/03 18:49 |
Issue | 8260865 |
Summary
Enhance the Shenandoah garbage collector with generational collection capabilities to improve sustainable throughput, load spike resilience, and memory utilization.
Goals
- No trade-off between the memory frugality and elasticity of G1 and the short pause times of single-generation Shenandoah.
- Bring the Java heap headroom needed to maintain Shenandoah's pause time targets (typically below 10 ms) in line with the headroom G1 needs at its respective pause targets (typically above 100 ms).
- Sustain ephemeral object allocation rates an order of magnitude higher than single-generation Shenandoah can.
- Decrease the risk of incurring degenerated collections during allocation spikes.
- Less than 5% application throughput reduction (aka additional “mutator overhead”) compared to single-generation Shenandoah.
- Continue to support compressedOops.
- Lay the foundation for a zero-tuning garbage collector.
- Initially supported: x86_64 and aarch64. Other instruction sets will be added later.
Non-Goals
- Making generations mandatory. The option to use single-generation Shenandoah will remain, without performance or functionality degradations.
- Improving performance for every conceivable workload. The generational system will dynamically adapt to approximate a single-generation system as needed, but for some workloads, starting out and remaining with a single generation may still be a superior option. However, we expect the majority of use cases to benefit from generational collection.
- Minimizing CPU/power usage. If longer pauses can be tolerated, other collectors like G1 may still provide more energy-efficient behavior. G1 is in many ways designed to avoid collecting for as long as possible, and it can perform compactions with maximum parallelism while the mutator is paused. Generational Shenandoah can only approximate, never match, these tactics when used as intended, given its mandate to keep pause times much lower and to avoid stop-the-world compactions entirely. Still, Generational Shenandoah will fare much closer to G1 in this respect than single-generation Shenandoah does.
- Maximizing mutator throughput. If longer pauses can be tolerated, other collectors like Parallel will still provide superior throughput.
Success Metrics
- General performance benchmark scores of Generational Shenandoah will be compared to single-generation Shenandoah. Candidate suites: SPECjbb2015 and SPECjvm2008. (These are registered trademarks of the Standard Performance Evaluation Corporation.)
- Operational envelopes (combinations of allocation rate, heap occupancy, and pause time target) for HyperAlloc or similar workloads will be compared to the other collectors in OpenJDK. Successful runs must not contain any full or degenerated collections.
- Allocation stalls and evacuation stalls will be induced with Extremem or similar workloads and then profiled. Their contribution to mutator overhead should not exceed that observed with single-generation Shenandoah under a comparable normal load.
Motivation
Garbage collectors with concurrent compaction are capable of completely blending GC pause times into the single-digit millisecond range of other common JVM pauses, while also leaving mutator execution speed nearly unfettered. The Shenandoah garbage collector in OpenJDK provides this ideal GC behavior for latency-sensitive Java applications. However, it can only achieve this within a limited operational envelope (combinations of heap occupancy and allocation rate). In other words, Shenandoah is relatively CPU-hungry and memory-hungry, mostly the latter. Compared to the generational collectors G1, CMS, and Parallel, it tends to require more heap headroom and to work harder to recover space occupied by unreachable objects.
A generational collector is not necessarily noticeably disadvantaged if the generational hypothesis, “most objects die young” [Ungar ’84], does not hold. Auto-tuning in region-based collectors can adjust generation sizes and copying (“promotion”) policies dynamically, based on observed object demographics. In the worst case, surviving objects get copied a few times “too often”, and this becomes the norm rather than the exception. This extra copying cost is, however, dwarfed by the cost of repeatedly marking long-lived objects concurrently in the old generation, which is work that a single-generation collector performs as well. In that case, there is not much difference left compared to a single-generation collector.
A concurrent collector that is also generational and can dynamically adjust its young generation’s size and related operational parameters can both achieve low pause times and stay competitive on all other performance aspects.
Description
This enhancement of the Shenandoah garbage collector separates the Java heap into two generations. As in other generational collectors, GC efforts then focus on the "young" generation, where the mutator allocates objects and where ephemeral objects can be reclaimed with reduced effort. We propose the following approaches for an initial implementation.
The collection algorithms operating on each generation are closely based on traditional Shenandoah. Within the young generation, Generational Shenandoah replicates the same heuristics as traditional Shenandoah to distinguish areas of memory that hold newly allocated objects from areas of memory holding objects that survived one or more recent young-generation GC passes.
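The JEP does not spell out these heuristics. As a purely illustrative sketch, assume each young region carries an age counter. The counter grows each time the region survives a young-generation GC pass and resets when the region is reclaimed. Regions that reach a tenuring threshold become promotion candidates. `YoungRegion`, `age_regions`, and `tenuring_threshold` are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-region record (not HotSpot's ShenandoahHeapRegion).
struct YoungRegion {
  unsigned age = 0;   // young GC passes this region has survived
  bool live = false;  // set by young marking if the region holds live objects
};

// Called at the end of each young-generation GC pass. Regions without live
// objects were reclaimed and restart at age 0 when reused for new
// allocations. Surviving regions age by one. Indices of regions that reach
// the tenuring threshold are returned as promotion candidates.
std::vector<std::size_t> age_regions(std::vector<YoungRegion>& regions,
                                     unsigned tenuring_threshold) {
  assert(tenuring_threshold > 0);
  std::vector<std::size_t> promote;
  for (std::size_t i = 0; i < regions.size(); ++i) {
    YoungRegion& r = regions[i];
    if (r.live) {
      ++r.age;  // survived one more young collection
      if (r.age >= tenuring_threshold) promote.push_back(i);
    } else {
      r.age = 0;  // reclaimed: will hold only newly allocated objects
    }
    r.live = false;  // reset for the next marking pass
  }
  return promote;
}
```

In this model, a fixed threshold stands in for the tenuring age that the JEP expects to auto-tune.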
Each generation is formed by a subset of the Shenandoah heap’s regions. At any given time, a region is considered either free or dedicated to either the young or the old generation. The size of each generation is given by its occupied regions plus a quota of free regions. Overreach into the free quota of the respective other generation is tolerated, but it accelerates collection triggering and can lead to degenerated and full collections. The algorithms to control collection phase scheduling will be refined based on tests once a prototype is available. This also applies to dynamic young generation sizing and other auto-tuning mechanisms such as tenuring age.
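Here is a minimal sketch of the sizing rule above. It assumes that a generation is tracked as a count of affiliated regions plus a free-region quota. Taking a free region from the other generation's quota succeeds but signals that a collection should be triggered earlier. Once no free regions remain, only a degenerated or full collection can help. `Generation`, `AllocResult`, and `take_free_region` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one generation, counted in heap regions.
struct Generation {
  std::size_t used_regions;  // regions currently affiliated with this generation
  std::size_t free_quota;    // free regions set aside for this generation
  // Size of the generation: occupied regions plus its free-region quota.
  std::size_t size() const { return used_regions + free_quota; }
};

enum class AllocResult { Ok, OkTriggerEarly, Failed };

// Affiliate one free region with `gen`, e.g. when it needs a fresh region
// for allocation. Overreach into `other`'s quota is tolerated but reported,
// so that the caller can trigger the next collection sooner.
AllocResult take_free_region(Generation& gen, Generation& other) {
  assert(&gen != &other);
  if (gen.free_quota > 0) {
    --gen.free_quota;
    ++gen.used_regions;
    return AllocResult::Ok;
  }
  if (other.free_quota > 0) {
    --other.free_quota;  // overreach into the other generation's quota
    ++gen.used_regions;
    return AllocResult::OkTriggerEarly;
  }
  return AllocResult::Failed;  // no free regions left anywhere
}
```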
Shenandoah has a unique Load Reference Barrier (LRB) that supports 32-bit builds and compressedOops in 64-bit builds. To avoid increasing its mutator impact, this same LRB is used for both generations without any changes. This implies a single evacuator, shared between old and young collection efforts. Typical evacuation phases collect garbage either exclusively from young memory regions or from a combination of young and old memory regions. This behavior mimics G1’s young and mixed collections. The principal improvement over G1 is that Generational Shenandoah’s “mixed” collections run concurrently with the mutator.
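To illustrate how one shared evacuator can serve both young-only and mixed cycles, the following sketch builds a collection set. It takes the young regions that have enough garbage, plus a bounded slice of old-generation candidates. It assumes old candidates are queued once old marking completes and are drained over successive cycles. The garbage threshold, the per-cycle bound, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical per-region summary produced by marking (not HotSpot code).
struct RegionInfo {
  std::size_t id;
  std::size_t garbage_bytes;
};

// Choose the regions for one evacuation cycle. The same evacuator then
// processes every chosen region, regardless of its generation.
//  - young_regions: young regions with garbage totals from young marking
//  - old_candidates: old regions selected after old marking completed;
//    each cycle takes at most max_old_per_cycle of them until none remain
// An empty candidate queue yields a young-only cycle, otherwise a mixed one.
std::vector<std::size_t> build_collection_set(const std::vector<RegionInfo>& young_regions,
                                              std::deque<RegionInfo>& old_candidates,
                                              std::size_t young_garbage_threshold,
                                              std::size_t max_old_per_cycle) {
  std::vector<std::size_t> cset;
  for (const RegionInfo& r : young_regions) {
    if (r.garbage_bytes >= young_garbage_threshold) cset.push_back(r.id);
  }
  for (std::size_t n = 0; n < max_old_per_cycle && !old_candidates.empty(); ++n) {
    cset.push_back(old_candidates.front().id);
    old_candidates.pop_front();
  }
  return cset;
}
```

Capping the number of old regions per cycle is one plausible way to keep a mixed cycle from taking much longer than a young-only one. The JEP leaves this policy open.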
The generation-specific marking phases, however, are largely decoupled from each other. Concurrent old generation marking proceeds in the background while young generation marking occurs multiple times. It can also be suspended as needed to execute other collection phases. Once old generation marking has completed, subsequent evacuations and reference updates will include old generation regions until the entire old-generation collection set has been processed.
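The interleaving described above can be modeled with a toy scheduler that assumes the simplest policy. Old marking advances by a fixed amount of work per tick, and it is suspended on any tick where a young collection runs. The tick-based model and the names `Outcome` and `simulate_old_marking` are hypothetical. A real scheduler would react to heuristics rather than follow a fixed cadence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct Outcome {
  unsigned young_cycles;  // young collections that ran while old marking was in progress
  unsigned ticks;         // scheduling ticks until old marking completed
};

// Every `young_every`-th tick runs a young collection with old marking
// suspended. On all other ticks old marking resumes and completes up to
// `step` units of its remaining work.
Outcome simulate_old_marking(std::size_t old_work, std::size_t step, unsigned young_every) {
  assert(step > 0 && young_every > 1);  // otherwise old marking never progresses
  Outcome o{0, 0};
  std::size_t remaining = old_work;
  while (remaining > 0) {
    ++o.ticks;
    if (o.ticks % young_every == 0) {
      ++o.young_cycles;  // old marking suspended this tick
    } else {
      remaining -= std::min(step, remaining);  // old marking makes progress
    }
  }
  return o;
}
```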
For the remembered set, we use the existing card marking code and supplement it with code for remembered set scanning that runs concurrently with mutator execution.
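The following sketch shows the card-marking idea on a toy heap of reference slots. Any store into old memory dirties the card covering that slot. A young collection then scans only the dirty cards to find old-to-young references, which act as additional roots. The card size, the single young/old boundary, the card-retention policy, and all names are assumptions made for illustration. The sketch is single-threaded and does not model scanning concurrently with the mutator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr long kNull = -1;             // an empty reference slot
constexpr std::size_t kCardSlots = 4;  // slots covered by one card (tiny, for illustration)

// Toy heap of reference slots. Each slot holds kNull or the index of the slot
// it refers to. Slots [0, young_begin) are old, [young_begin, size) are young.
struct ToyHeap {
  std::vector<long> slots;
  std::size_t young_begin;
  std::vector<bool> dirty_cards;  // one card per kCardSlots old slots

  ToyHeap(std::size_t size, std::size_t young_begin_)
      : slots(size, kNull),
        young_begin(young_begin_),
        dirty_cards((young_begin_ + kCardSlots - 1) / kCardSlots, false) {
    assert(young_begin_ <= size);
  }

  // Reference store plus card-marking post-write barrier: any store into an
  // old slot dirties its card, without checking the stored value.
  void store(std::size_t slot, long target) {
    slots[slot] = target;
    if (slot < young_begin) dirty_cards[slot / kCardSlots] = true;
  }

  // Remembered-set scan at the start of a young collection: visit only dirty
  // cards and report old slots that refer into young memory (extra roots).
  // A card stays dirty while it still holds an old-to-young reference.
  std::vector<std::size_t> scan_remembered_set() {
    std::vector<std::size_t> roots;
    for (std::size_t card = 0; card < dirty_cards.size(); ++card) {
      if (!dirty_cards[card]) continue;
      bool points_young = false;
      std::size_t begin = card * kCardSlots;
      std::size_t end = std::min(begin + kCardSlots, young_begin);
      for (std::size_t s = begin; s < end; ++s) {
        if (slots[s] != kNull && static_cast<std::size_t>(slots[s]) >= young_begin) {
          roots.push_back(s);
          points_young = true;
        }
      }
      dirty_cards[card] = points_young;
    }
    return roots;
  }
};
```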
Shenandoah’s existing SATB barriers are generalized to serve the combined needs of young-generation and old-generation concurrent marking. The post-processing of SATB buffers treats references to old-generation memory differently than references to young-generation memory. But the fast path through these barriers remains unchanged.
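The following is a hedged sketch of such post-processing. The pre-write barrier's fast path still just records the overwritten reference. Only when a buffer is drained is each entry routed to young or old marking, according to the generation it points into. Entries for a generation that is not currently being marked are dropped. For example, a young-only cycle treats old objects as live anyway. `Gen`, `MarkQueues`, `process_satb_buffer`, and the `gen_of` callback are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Gen { Young, Old };

// Pending marking work, kept separately per generation.
struct MarkQueues {
  std::vector<std::uintptr_t> young;
  std::vector<std::uintptr_t> old;
};

// Drain one SATB buffer filled by the pre-write barrier fast path, which
// records overwritten references without looking at their generation.
// GenOf is any callable mapping an address to the generation it lies in.
template <typename GenOf>
void process_satb_buffer(const std::vector<std::uintptr_t>& buffer, GenOf gen_of,
                         bool young_marking, bool old_marking, MarkQueues& queues) {
  assert(young_marking || old_marking);  // buffers are only drained during marking
  for (std::uintptr_t ref : buffer) {
    if (gen_of(ref) == Gen::Young) {
      if (young_marking) queues.young.push_back(ref);
    } else if (old_marking) {
      queues.old.push_back(ref);
    }
    // Otherwise the referent's generation is not being marked: drop the entry.
  }
}
```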
Building and Invoking
The new generational feature forms part of the Shenandoah code base in OpenJDK. But it has no runtime effect unless it is activated by this JVM command line option:
-XX:ShenandoahGCMode=generational
The following existing options for generational garbage collectors (such as Parallel, G1, and CMS) also take effect:
-XX:NewRatio=<n>
-XX:NewSize=<size>
-XX:MaxNewSize=<size>
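As a worked example, assume these flags keep their conventional HotSpot meaning. NewRatio is the ratio of old to young generation size, and NewSize and MaxNewSize bound the young generation. Under that assumption, a 3 GB heap with -XX:NewRatio=2 yields a 1 GB young generation. How Generational Shenandoah combines the three options is not specified here, and `young_size_bytes` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: derive a young generation size from NewRatio and clamp
// it into [NewSize, MaxNewSize]. NewRatio=N means old:young = N:1, so the
// young generation gets 1/(N+1) of the heap.
std::size_t young_size_bytes(std::size_t heap_bytes, unsigned new_ratio,
                             std::size_t new_size, std::size_t max_new_size) {
  assert(new_size <= max_new_size);
  std::size_t by_ratio = heap_bytes / (new_ratio + 1);
  return std::min(std::max(by_ratio, new_size), max_new_size);
}
```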
Please see the project wiki (once available) for more information on how to set up and tune Generational Shenandoah.
Alternatives
Azul Systems’ C4 is already generational, but it is not available as open source. ZGC would be another excellent code base to develop further into a generational concurrent collector. However, neither of these options supports “compressedOops” (32-bit object references in 64-bit JVMs). Yet the vast majority of Java heaps we see (e.g., in cloud services) are well below 32 GB in size, which is the upper bound for using this space-saving and performance-improving feature.
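The 32 GB bound follows from the encoding arithmetic. A 32-bit compressed reference is scaled by the object alignment, so with the default 8-byte alignment it can address 2^32 × 8 bytes = 32 GiB. The sketch below shows base-plus-shift decoding under that assumption. It is illustrative only and is not HotSpot's CompressedOops code.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kShift = 3;  // log2 of the default 8-byte object alignment

// Widen a 32-bit compressed reference to a 64-bit address.
std::uint64_t decode(std::uint64_t heap_base, std::uint32_t narrow) {
  return heap_base + (static_cast<std::uint64_t>(narrow) << kShift);
}

// Compress an aligned address inside the heap into 32 bits.
std::uint32_t encode(std::uint64_t heap_base, std::uint64_t addr) {
  assert(addr >= heap_base);
  std::uint64_t offset = addr - heap_base;
  assert(offset % (std::uint64_t{1} << kShift) == 0);  // object alignment
  assert((offset >> kShift) <= UINT32_MAX);            // must fit in 32 bits
  return static_cast<std::uint32_t>(offset >> kShift);
}

// Largest heap reachable this way: 2^32 references * 8 bytes = 32 GiB.
constexpr std::uint64_t kMaxCompressedHeap = (std::uint64_t{1} << 32) << kShift;
```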
Testing
Most of our existing functional and stress tests are collector-agnostic and can be reused as-is. Additional test run configurations for the new generational mode will be integrated. Furthermore, functional, performance, and stress tests specific to Generational Shenandoah will be added.
Risks and Assumptions
- Remembered set operations, in particular scanning, may add to pause times.
- Remembered set-related barriers add to mutator overhead.
- Heuristics to automatically configure generation sizes, the object promotion policy, and the timing as well as the balancing of efforts dedicated to young generation and old generation collections have not yet been fully developed and have not been tested with real-world workloads.
Dependencies
- Generational Shenandoah does not require changes to shared code outside the Shenandoah-specific code base of the HotSpot JVM.