JEP draft: Lazy Static Final Fields

Owner: John Rose
Type: Feature
Scope: JDK
Status: Draft
Component: tools
Created: 2018/08/25 06:49
Updated: 2018/08/25 15:17
Issue: 8209964

Summary

Expand the behavior of final variables to include optional lazy evaluation patterns, in both the language and the JVM. In doing so, extend Java's pre-existing lazy evaluation mechanisms from their current per-class granularity to per-variable granularity.

Motivation

Java uses lazy evaluation pervasively. Almost every linkage operation potentially triggers a lazy evaluation, such as the execution of a <clinit> method (class initializer bytecode) or invocation of a bootstrap method (for an invokedynamic call site or CONSTANT_Dynamic constant).

Class initializers are coarse-grained compared to mechanisms using bootstrap methods, because their contract is to run all initialization code for a whole class, rather than some initialization that may pertain to a particular field of that class. Such coarse-grained initialization effects make it especially difficult to predict and isolate the side effects of using one static field from the class, since computing the value of one field entails computation of all static fields in the same class.

So touching one field touches them all. In AOT compilers, this makes it difficult to optimize a static field reference, even if the field has a clearly analyzable constant value. It only takes one extra-complicated static field in a class to make all fields non-optimizable. A similar problem appears with proposed mechanisms for constant-folding (at javac time) constant fields with complex initializers.
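The "touch one, touch all" effect can be observed in today's Java. In this minimal sketch (the class and field names are hypothetical), reading the cheap field Config.NAME forces Config's class initializer to run, and that also evaluates the unrelated, side-effecting EXPENSIVE initializer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event log, kept in its own class so that reading it
// does not initialize Config.
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class Config {
    // Not a constant variable (its initializer is a method call),
    // so a getstatic of NAME triggers Config's <clinit>.
    static final String NAME = String.valueOf("config");

    // An "extra-complicated" field whose initializer has a visible side effect.
    static final long EXPENSIVE = computeExpensive();

    private static long computeExpensive() {
        InitLog.EVENTS.add("EXPENSIVE computed");
        return 42L;
    }
}
```

Before Config.NAME is read, InitLog.EVENTS is empty; afterwards it contains "EXPENSIVE computed", even though EXPENSIVE was never referenced.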

As an example of an extra-complicated static field initialization, which in some codebases appears in almost every file, consider logger initialization:

private final static Logger LOGGER = Logger.getLogger("com.foo.Bar");

This harmless-looking initialization triggers a tremendous amount of behind-the-scenes activity at class initialization time -- though it is unlikely that the logger is needed at class initialization time, or even at all. Deferring the creation to first use would streamline initialization, and might result in optimizing away the initialization entirely.

Final variables are very useful; they are the main mechanism for Java APIs to denote constant values. Lazy variables are also well-proven. Since Java 7 they have been an increasingly important part of JDK internals, expressed via the internal @Stable annotation. The JIT can optimize both final and "stable" variables more fully than other variables. Adding lazy finals will make these useful design patterns usable in more places. Finally, their adoption will allow libraries such as the JDK to reduce their reliance on <clinit> code, with likely improvements to startup and AOT optimizations.

Description

A field may be declared with a new modifier lazy, a contextual keyword recognized only as a modifier. Such a field is called a lazy field, and must also be static and final.

A lazy field must be supplied with an initializer. The compiler and runtime arrange to execute the initializer on the first use of the variable, not when the containing scope (the class) is initialized.

Each lazy static final field is associated at compile time with a constant pool entry which supplies its value. Since constant pool entries are themselves lazily computed, this is sufficient to assign a well-defined value to any lazy static final variable associated with the entry. (More than one lazy variable can be associated with a single entry, although this is not envisioned as a useful feature.) The association is recorded in a field attribute named LazyValue, which must refer to a constant pool entry that can be loaded with ldc to a value convertible to the type of the lazy field. The allowed conversions are the same as those used by MethodHandle.invoke.

Thus, a lazy static field may be viewed as a named alias of a constant pool entry within the class that defined the field. Tools such as compilers may exploit this property.
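The MethodHandle.invoke conversions mentioned above can be tried directly with the existing java.lang.invoke API. In this sketch, a resolved constant pool entry is stood in for by MethodHandles.constant; that is purely an illustrative assumption, not how a JVM would implement LazyValue. The class and method names are hypothetical. The same boxed value is converted to several possible field types:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

class LazyValueConversions {
    // Models "ldc of a constant pool entry" as a handle of type ()Object.
    private static MethodHandle entry(Object resolvedValue) {
        return MethodHandles.constant(Object.class, resolvedValue);
    }

    // Field type int: Integer is unboxed to int.
    static int asInt(Object resolvedValue) throws Throwable {
        return (int) entry(resolvedValue).invoke();
    }

    // Field type long: Integer is unboxed to int, then widened to long.
    static long asLong(Object resolvedValue) throws Throwable {
        return (long) entry(resolvedValue).invoke();
    }

    // Field type CharSequence: a reference cast from Object.
    static CharSequence asCharSequence(Object resolvedValue) throws Throwable {
        return (CharSequence) entry(resolvedValue).invoke();
    }
}
```

For example, asLong(42) yields 42L, because invoke (unlike invokeExact) adapts the ()Object handle to the ()long call-site type.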

A lazy field is never a constant variable (in the sense of JLS 4.12.4) and is explicitly excluded from contributing to a constant expression (in the sense of JLS 15.28). Thus, it never possesses a ConstantValue attribute, even if its initializer is a constant expression. Instead, a lazy field possesses a new kind of classfile attribute called LazyValue, which the JVM consults when linking a reference to that particular field. The format of this new attribute is similar to that of ConstantValue, because it also points to a constant pool entry; in this case, the entry that resolves the field's value.

When linking a lazy static field, the normal process of executing class initializers is not bypassed. Instead, the declaring class is initialized, running its <clinit> method if any, according to the rules of JVMS 5.5. In other words, a getstatic bytecode of a lazy static field performs any linkage actions associated with any static field. After initialization (or during an already-started initialization in the current thread), the JVM then resolves the constant pool entry associated with the field and stores the resulting value into that field.

Since lazy static final fields cannot be blank finals, they cannot be assigned to, even in those limited contexts where blank finals may be assigned to.

At compile time, all lazy static fields are taken to be initialized independently of all non-lazy static fields, regardless of their placement in the source file. This means that the ordering constraints among static fields do not apply to lazy static fields. A lazy static field's initializer can refer to any static field of the same class, regardless of location in their common source file. Any non-lazy static field initializer or class initializer block may also refer to a lazy static field value, regardless of relative source order. Doing so unconditionally is usually undesirable, since it forces the lazy field to be evaluated during class initialization and so cancels its benefit, but such references may be useful in conditional expressions or control flow. Thus, lazy static fields are treated much like fields of another class, insofar as they may be referenced in any order by any part of their declaring class.

Lazy fields may be recognized by the core reflection API by use of two new API points on java.lang.reflect.Field. The new query method isLazy returns true if and only if the field was declared lazy. The new query method isAssigned returns false if and only if the field is lazy and has not been initialized, at the moment the method is called. (It may return true on the very next call in the same thread, depending on race conditions.) Other than isAssigned, there is no way to observe whether a lazy field has been initialized yet.

(The isAssigned reflective call is provided only to assist with occasional problems with circular initialization dependencies. Perhaps we can get away without implementing it, although people who code with lazy variables occasionally want to ask gently whether a lazy variable is set yet, in the same way that users of mutexes occasionally want to ask whether a mutex is locked, but without actually seizing the lock.)

There is one irregular restriction on lazy finals: they must never be initialized to their default value. Thus, a lazy field of reference type must not be assigned a null value by its initializer, and a lazy field of integral type must not be assigned zero. A lazy boolean can only be assigned a single value, true, since false is its default value. If a lazy static field's initializer returns the default value, the linkage of the field will fail with an appropriate linkage error.

This restriction against default values is made in order to allow JVM implementations to reserve the default value as an internal sentinel denoting the state of not having been initialized. The default value is already specified as the initial value of any field, set at preparation time (as described in JVMS 5.4.2). Thus, this value is already present at the beginning of any field's lifetime, which makes it a natural choice if the JVM needs a sentinel to help it track the field's state. Under these rules, the initial default value of a lazy static field is never accessible. For this reason, a JVM may alternatively implement a lazy field as an immutable reference to the relevant constant pool entry.
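To make the sentinel idea concrete, the following sketch models one lazy static field of type long in ordinary Java. The class LazyLongField and its methods are hypothetical, the model is single-threaded for clarity, and IllegalStateException stands in for the linkage error a JVM would throw. The field's default value, zero, doubles as the "not yet initialized" marker, which is exactly why the initializer may not produce zero:

```java
import java.util.function.LongSupplier;

// Hypothetical model of a lazy static final long field (not thread-safe).
final class LazyLongField {
    private final LongSupplier initializer;
    // Starts at the default value 0L, which serves as the "unset" sentinel.
    private long value;

    LazyLongField(LongSupplier initializer) {
        this.initializer = initializer;
    }

    // Analogue of getstatic: run the initializer on first use only.
    long get() {
        long v = value;
        if (v == 0L) {
            v = initializer.getAsLong();
            if (v == 0L) {
                // Zero would be indistinguishable from "not yet initialized".
                throw new IllegalStateException("lazy initializer returned default value 0");
            }
            value = v;
        }
        return v;
    }

    // Rough analogue of the proposed isAssigned query.
    boolean isAssigned() {
        return value != 0L;
    }
}
```

Because no legal value equals the sentinel, a single field check suffices to decide whether the initializer still has to run; no separate state flag is needed.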

The restriction against default values can be worked around by wrapping possibly-default values in boxes or containers of some desired form. A zero integer could be wrapped in a non-null Integer reference. A non-primitive could be wrapped in an Optional which is empty in the case of null.
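As an illustration, the proposed (not yet valid) declarations below, shown only as comments, use wrapping to make possibly-null or possibly-zero initializers legal. The runnable part of this sketch checks only the wrapping step; the property names and the class LazyWrapping are hypothetical:

```java
import java.util.Optional;

final class LazyWrapping {
    // Proposed syntax, shown as comments because it does not compile today:
    //   private static lazy final Optional<String> HOME =
    //       Optional.ofNullable(System.getProperty("app.home"));
    //   private static lazy final Integer RETRIES =
    //       Integer.getInteger("app.retries", 0);

    // Wraps a possibly-null reference; the result is never null.
    static Optional<String> wrapReference(String maybeNull) {
        return Optional.ofNullable(maybeNull);
    }

    // Boxes a possibly-zero int; the result is a non-null Integer even for 0.
    static Integer wrapInt(int maybeZero) {
        return Integer.valueOf(maybeZero);
    }
}
```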

To preserve implementation freedom, the contract of isAssigned is minimized. If a JVM can prove that a lazy static variable can be initialized without observable side effects, it may initialize the variable at any time; in such a case the isAssigned query will report true even before any getstatic is executed. The minimized contract for isAssigned is that if it returns false, none of the side effects from initializing that variable have yet been observed by the current thread, whereas if it returns true, then the current thread can, in the future, observe all side effects of initialization. This contract allows compilers to substitute ldc for getstatic of their own fields, and allows JVMs to avoid tracking detailed initialization states of finals with shared or degenerate constant pool entries.

Multiple threads may race to initialize a lazy final. As is already the case with CONSTANT_Dynamic constant pool entries, the JVM picks an arbitrary winner of such a race and provides the value from that winner to all racing threads, as well as recording it for all future accesses. Thus, JVM implementations may elect to use CAS operations, if the platform supports those, to resolve races; the winner of a race will see a prior default value and the losers will see the non-default winning value.

In this way, the pre-existing rules for single assignment of final variables are preserved and extended across the complexities of lazy evaluation.

The same holds for safe publication through finals: it works the same way for both lazy and non-lazy finals. This may require some JVM implementations to place memory fences around the lazy initialization operation, just as they do for putstatic on non-lazy final fields.
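The race rule and the safe-publication requirement can be modeled in ordinary Java with a compare-and-set on a reference whose default value, null, serves as the sentinel. In this sketch the class name LazyRefField is hypothetical, and AtomicReference merely stands in for whatever CAS and fencing mechanism a JVM might use internally. Every racing thread may run the initializer, but only the first successful compareAndSet publishes its value, and all threads return that same value:

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical model of a lazy static final reference field.
final class LazyRefField<T> {
    private final Supplier<? extends T> initializer;
    // null is the default value and therefore the "unset" sentinel.
    private final AtomicReference<T> value = new AtomicReference<>();

    LazyRefField(Supplier<? extends T> initializer) {
        this.initializer = initializer;
    }

    T get() {
        T current = value.get();      // volatile read
        if (current != null) {
            return current;           // already initialized: fast path
        }
        // Several threads may reach this point and compute candidates.
        T candidate = Objects.requireNonNull(initializer.get(),
                "lazy initializer returned default value null");
        // The winner sees the prior default (null) and installs its candidate;
        // losers fail the CAS and adopt the winner's value instead.
        if (value.compareAndSet(null, candidate)) {
            return candidate;
        }
        return value.get();
    }
}
```

The volatile semantics of AtomicReference.get and compareAndSet supply the fencing that makes the winning value safely published to every thread that reads it.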

Note that a class can convert a static to a lazy static without breaking binary compatibility. A client's getstatic instruction is identical in both cases; when the variable's declaration changes to lazy, only the way that instruction links changes.

Alternatives

Use nested classes as holders for single lazy variables.

Define some sort of library API for managing lazy values or (more generally) monotonic data.

Refactor would-be lazy static variables as nullary static methods and populate their bodies with ldc of CONSTANT_Dynamic constants, by some means.

Use non-final variables for publication of lazily evaluated data, being careful not to modify them, and to fence their initialization for safe publication.

(N.B. The above workarounds do not provide a binary-compatible way to evolve existing static constants away from their current reliance on <clinit>.)
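For comparison, the first alternative above (a nested holder class) applied to the logger example looks like the following sketch of the familiar initialization-on-demand holder idiom. The holderInitialized flag is hypothetical and exists only so the laziness can be observed. Clients must call Bar.logger() rather than read a static field of Bar, which is exactly the binary-compatibility limitation just noted:

```java
import java.util.logging.Logger;

class Bar {
    // Set by the holder's initializer; exists only to make laziness observable.
    static volatile boolean holderInitialized;

    // A static field that can be used without initializing the holder.
    static final String GREETING = String.valueOf("hello");

    // The JVM initializes LoggerHolder on the first getstatic of
    // LoggerHolder.LOGGER, not when Bar itself is initialized.
    private static final class LoggerHolder {
        static final Logger LOGGER = createLogger();

        private static Logger createLogger() {
            holderInitialized = true;
            return Logger.getLogger("com.foo.Bar");
        }
    }

    static Logger logger() {
        return LoggerHolder.LOGGER;
    }
}
```

Each lazy variable costs one extra class this way, which is part of why the idiom scales poorly to many fields.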

In the direction of adding more functionality, we could allow lazy fields to be non-static and/or non-final, preserving current correspondences and analogies between static and non-static field behaviors. The constant pool cannot be a backing store for non-static fields, but it can still contribute bootstrap methods (that depend on the current instance). Frozen arrays (if implemented) could be given lazy variations, perhaps. Such investigations seem plausible as follow-on projects to the current proposal. Keeping such options open is one reason for our decision to forbid lazy variables from taking on default values.

Lazy variables must be initialized by their initializer expressions. This is sometimes an onerous restriction, which led in the past to the invention of blank final variables. Recall that blank finals can be initialized by arbitrary blocks of code, such as try/finally logic, and can be initialized as groups rather than one at a time. Future work may attempt to extend these coding patterns to lazy final variables. Perhaps one or more lazy variables could be associated with a private block of initialization code whose contract is to assign each variable exactly once, as if it were a class initializer or object constructor. The design of such a feature may become clearer after deconstructors are introduced, since the design problems seem to overlap.

An earlier variation of this proposal may be found at http://cr.openjdk.java.net/~jrose/draft/lazy-final.html.