JEP draft: Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector

OwnerAleksey Shipilev
Created2017/02/14 08:23
Updated2017/07/19 10:57
TypeFeature
StatusSubmitted
Componenthotspot / gc
ScopeImplementation
EffortS
DurationS
Priority4
Issue8174901
Relates toJEP 304: Garbage-Collector Interface

Summary

Develop a GC that only handles memory allocation, but does not implement any actual memory reclamation mechanism. Once available Java heap is exhausted, perform the JVM shutdown.

Goals

Provide a completely passive GC implementation with bounded allocation limit, and lowest runtime performance overhead possible. A successful implementation is an isolated code change, not touching other GCs, and having minimal changes in the rest of JVM.

Non-Goals

It is not the goal to introduce manual memory management features in Java language and/or JVM. It is not the goal to introduce new APIs to manage Java heap. It is not the goal to change/cleanup internal JVM interfaces to fit this GC.

Motivation

OpenJDK and other JDK implementations are well known for a broad choice of highly configurable GC implementations. There are four use cases where a trivial no-op GC proves useful.

Performance testing. Having the GC that does almost nothing is the useful tool to do differential performance analysis for other real GCs. Having a no-op GC can help to filter out GC-induced performance artifacts.

Functional testing. For Java code testing, having a way to establish a threshold for allocated memory is useful to assert memory pressure invariants. Today, we have to pick up the allocation data from MXBeans, or even resort to GC logs parsing. Having a GC that accepts only the bounded number of allocations, and fails on heap exhaustion simplifies testing.

VM interface testing. For VM development purposes, having a simplest GC helps to understand the absolute minimum required from VM-GC interface to have a functional allocator. This serves as the proof that VM-GC interface is sane, which is important in lieu of JEP-304 ("GC interface").

Last-drop performance improvements. For ultra-performance-sensitive applications, where developers are conscious about memory allocations, and know the application memory footprint exactly, or even arrive to (almost) completely garbage-free applications. In those applications, GC cycle may be considered an implementation bug that wastes cycles for no good reason. Extremely short lived jobs are one example of this. There are also cases when restarting the JVM -- letting load balancers figure out failover -- is sometimes a better recovery strategy than accepting a GC cycle. Even for non-allocating workloads, the choice of GC means choosing the set of GC barriers that the workload has to use, even if no GC cycle actually happens. Most OpenJDK GCs are generational, and they emit at least one reference write barrier. Avoiding this barrier brings the last bit of performance improvement.

Description

Epsilon GC looks and feels like any other OpenJDK GC, enabled with -XX:+UseEpsilonGC.

Epsilon GC works by implementing linear allocation in a single contiguous chunk of allocated memory. This allows a trivial lock-free TLAB (thread-local allocation buffers) issuance code in the GC, which can then reuse the lock-free within-TLAB allocation handled by existing VM code. Issuing TLABs also helps to keep the resident memory taken by a process bounded by what had been actually allocated. Humongous/out-of-TLAB allocations are handled by the same code, because there is little difference between allocating TLAB and allocating the large objects in this scheme.

The barrier set used by Epsilon is completely empty/no-op, because GC does not do any GC cycles, and therefore does not care about the object graph, object marking, object copying, etc. Introducing a new barrier set implementation is likely to be the most spread out JVM change in this implementation.

Since the only important part of runtime interface where Epsilon is taking part of is issuing TLABs, its performance largely depends on the TLAB sizes issued. With arbitrarily large TLABs and arbitrarily large heap, the performance overhead can be described by an arbitrarily low positive value, hence the name.

Once Java heap is exhausted, no allocation is possible, no memory reclamation is possible, therefore we have to fail. There are several options at that point, most are in line with what existing GCs do:

The prototype runs prove the concept by surviving small workloads, and failing predictably on larger ones. The prototype implementation and some tests can be found at in sandbox repository:

$ hg clone http://hg.openjdk.java.net/jdk10/sandbox sandbox 
$ cd sandbox/ 
$ sh ./get_source.sh
$ sh ./common/bin/hgforest.sh up -r JDK-8174901-epsilon-branch
$ sh ./configure 
$ make images

One can see the difference between the baseline and patched runtime by using:

$ cd hotspot/
$ hg diff -r default:JDK-8174901-epsilon-branch

Automatically generated webrev is at: https://builds.shipilev.net/patch-openjdk-epsilon-jdk10/

Sample binary builds are at: https://builds.shipilev.net/openjdk-epsilon-jdk10-release/ https://builds.shipilev.net/openjdk-epsilon-jdk10-fastdebug/

Alternatives

There are no OpenJDK alternatives that disable all GC barriers.

If barriers are not the issue, then using Serial or Parallel(Old) GC should fit the same performance profile, assuming we can configure their respective heuristics to never do GC cycles before they face the complete heap exhaustion (i.e. by presizing the heap, setting a very large young gen size, disabling adaptive heuristics, etc). This is hard to reliably guarantee with the multitude of GC options they provide, and this is arguably not the target mode for those GCs anyway.

Further improvements in Parallel, G1 and Shenandoah GC projects may render the overheads imposed by GC negligible to warrant a completely no-op GC. If/when that happens, Epsilon would still be useful for internal functional/performance testing.

Testing

Common GC tests would not be suitable for Epsilon GC, because most tests assume they can allocate an arbitrary amount of garbage. New tests would need to be developed to test that GC indeed works well on low-allocating workloads, and that it fails on heap exhaustion in a predictable manner. New jtreg tests under hotspot/gc/epsilon would be enough to assert correctness.

One-off performance testing during the Epsilon development would be enough to assert the performance characteristics when running with interpreter, C1 and C2 compilers. On-going performance testing is not required since implementation is supposed to be unchanged after the initial implementation, and its performance-sensitive paths are implicitly tested by other GCs.

Risks and Assumptions

Usefulness vs. maintenance costs. It can be argued that such an implementation is useless to have in the product, because no one needs it. The experience, however, tells that many players in the Java ecosystem already did this exercise with expunging GC from their custom-built JVMs. That means, having a standard no-op GC option would help that part of the ecosystem. Coupled with low maintenance costs if implementation proves trivial, this risk is minimal. We also think this risk is minimal if feature remains available in non-product builds only, under "develop" flag. Users and downstream distributions may change it to "product" or "experimental" to expose Epsilon to their applications.

Public expectations. Providing the garbage collection implementation that does not in fact do garbage collection may be seen as the dangerous practice. Accidentally enabling Epsilon GC in production may lead to surprise JVM failures when heap is exhausted. We think this risk is minimal if feature remains unavailable by default in product builds, either under "develop" or "experimental" option.

Implementation complexity. It might be the case that implementation would need more changes in the shared code than anticipated, for example in compilers and platform-specific backends. Our prototyping code seems to indicate these changes are isolated enough to be benign. If that proves to be a risk, it should be mitigated by JEP-304 ("GC interface").

Dependencies

This work may depend on JEP-304 ("GC Interface") to minimize shared code changes. It is might not require the GC interface, however, if the shared code changes are minimal.