JEP draft: Lazy Static Final Fields
Owner | John Rose |
Type | Feature |
Scope | JDK |
Status | Draft |
Component | tools |
Created | 2018/08/25 06:49 |
Updated | 2021/11/05 21:51 |
Issue | 8209964 |
Summary
Expand the behavior of final variables to include optional lazy evaluation patterns, in language and JVM. In doing so, extend Java's pre-existing lazy evaluation mechanisms to per-variable granularity, from its current per-class granularity.
Motivation
Java uses lazy evaluation pervasively. Almost every linkage operation potentially triggers a lazy evaluation, such as the execution of a <clinit> method (class initializer bytecode) or invocation of a bootstrap method (for an invokedynamic call site or CONSTANT_Dynamic constant).
Class initializers are coarse-grained compared to mechanisms using bootstrap methods, because their contract is to run all initialization code for a whole class, rather than some initialization that may pertain to a particular field of that class. Such coarse-grained initialization effects make it especially difficult to predict and isolate the side effects of using one static field from the class, since computing the value of one field entails computation of all static fields in the same class.
So touching one field touches them all. In AOT compilers, this makes it difficult to optimize a static field reference, even if the field has a clearly analyzable constant value: it takes only one extra-complicated static field in a class to make all of its fields non-optimizable. A similar problem affects proposed mechanisms for constant-folding (at javac time) constant fields with complex initializers.
As an example of an extra-complicated static field initialization, which in some codebases appears in almost every file, consider logger initialization:
private static final Logger LOGGER = Logger.getLogger("com.foo.Bar");
This harmless-looking initialization triggers a tremendous amount of behind-the-scenes activity at class initialization time – though it is unlikely that the logger is needed at class initialization time, or even at all. Deferring the creation to first use would streamline initialization, and might result in optimizing away the initialization entirely.
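Today, the usual way to get this per-field deferral is the nested holder class idiom, the first workaround listed under Alternatives. The following is a minimal sketch of that idiom applied to the logger, not of the proposed lazy modifier. The counter loggerCreations and the accessor logger() are hypothetical additions used only to make the deferral observable.

```java
import java.util.logging.Logger;

class Bar {
    // Hypothetical counter, used only to observe when the logger is created.
    static int loggerCreations = 0;

    static final String NAME = "com.foo.Bar";   // ordinary static field

    // LoggerHolder is initialized only on first access, not when Bar is initialized.
    private static final class LoggerHolder {
        static final Logger LOGGER = createLogger();
    }

    private static Logger createLogger() {
        loggerCreations++;
        return Logger.getLogger(NAME);
    }

    static Logger logger() {
        return LoggerHolder.LOGGER;   // first call runs LoggerHolder's <clinit>
    }

    public static void main(String[] args) {
        System.out.println(NAME + " " + loggerCreations);   // com.foo.Bar 0
        logger().fine("first use");
        logger().fine("second use");
        System.out.println(loggerCreations);                 // 1
    }
}
```

The cost is an extra class per deferred field, and clients must call logger() instead of reading a field. So the idiom cannot be retrofitted onto an existing public static without changing its binary interface.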
Final variables are very useful; they are the main mechanism for Java APIs to denote constant values. Lazy variables are also well-proven. Since Java 7 they have been an increasingly important part of JDK internals, expressed via the internal @Stable annotation. The JIT can optimize both final and "stable" variables more fully than other variables. Adding lazy finals will make these useful design patterns usable in more places. Finally, their adoption will allow libraries such as the JDK to reduce their reliance on <clinit> code, with likely improvements to startup and AOT optimizations.
Description
A field may be declared with a new modifier lazy, a contextual keyword recognized only as a modifier. Such a field is called a lazy field, and must also be static and final.
A lazy field must be supplied with an initializer. The compiler and runtime arrange to execute the initializer on the first use of the variable, not when the containing scope (the class) is initialized.
Each lazy static final field is associated at compile time with a constant pool entry which supplies its value. Since constant pool entries are themselves lazily computed, this is sufficient to assign a well-defined value to any lazy static final variable associated with the entry. (More than one lazy variable can be associated with a single entry, although this is not envisioned as a useful feature.) The association is recorded in a field attribute named LazyValue, which must refer to a constant pool entry that can be ldc-ed to a value convertible to the type of the lazy field. The allowed conversions are the same as those used by MethodHandle.invoke.
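To make "the same conversions as MethodHandle.invoke" concrete, here is a small sketch of those conversions using the real java.lang.invoke API. It does not exercise LazyValue itself. The three static methods are hypothetical stand-ins for values a constant pool entry might produce, and the cast at each call site plays the role of the lazy field's declared type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class InvokeConversions {
    // Hypothetical stand-ins for resolved constant values.
    static Object boxedAnswer() { return 42; }      // static type Object, runtime Integer
    static int primitiveAnswer() { return 42; }
    static Object greeting() { return "hello"; }

    private static MethodHandle handle(String name, Class<?> returnType)
            throws ReflectiveOperationException {
        return MethodHandles.lookup()
                .findStatic(InvokeConversions.class, name, MethodType.methodType(returnType));
    }

    static String convertAll() throws Throwable {
        int a = (int) handle("boxedAnswer", Object.class).invoke();      // Object -> int: cast, then unbox
        long b = (long) handle("primitiveAnswer", int.class).invoke();   // int -> long: primitive widening
        String s = (String) handle("greeting", Object.class).invoke();   // Object -> String: reference cast
        return a + " " + b + " " + s;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(convertAll());   // 42 42 hello
    }
}
```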
Thus, a lazy static field may be viewed as a named alias of a constant pool entry within the class that defined the field. Tools such as compilers may exploit this property.
A lazy field is never a constant variable (in the sense of JLS 4.12.4) and is explicitly excluded from contributing to a constant expression (in the sense of JLS 15.28). Thus, it never possesses a ConstantValue attribute, even if its initializer is a constant expression. Instead, a lazy field possesses a new kind of classfile attribute called LazyValue, which the JVM consults when linking a reference to that particular field. The format of this new attribute is similar to that of ConstantValue, because it also points to a constant pool entry, in this case the one which resolves the field value.
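The exclusion from constant variables matters because javac folds a reference to a constant variable into the client's own code, so no getstatic and no class initialization ever happen at that use site. The sketch below uses today's Java, not lazy fields, to show the difference. ConstHolder and the holderInitialized flag are hypothetical names used only to observe whether <clinit> has run.

```java
class ConstantDemo {
    // Set by ConstHolder's static initializer; records whether its <clinit> has run.
    static boolean holderInitialized;

    public static void main(String[] args) {
        int c = ConstHolder.CONSTANT;           // folded to 42 by javac; ConstHolder is not initialized
        System.out.println(c + " " + holderInitialized);   // 42 false
        Integer n = ConstHolder.NOT_CONSTANT;   // a real getstatic, which triggers ConstHolder's <clinit>
        System.out.println(n + " " + holderInitialized);   // 42 true
    }
}

class ConstHolder {
    static final int CONSTANT = 42;            // constant variable: gets a ConstantValue attribute
    static final Integer NOT_CONSTANT = 42;    // boxed type, so not a constant variable
    static { ConstantDemo.holderInitialized = true; }
}
```

A lazy field must always behave like NOT_CONSTANT here. Clients must emit a genuine getstatic that the JVM can link through LazyValue.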
When linking a lazy static field, the normal process of executing class initializers is not bypassed. Rather, the declaring class is initialized, running its <clinit> method if any, according to the rules of JVMS 5.5. In other words, a getstatic bytecode of a lazy static field performs the same linkage actions as a getstatic of any other static field. After initialization (or during an already-started initialization in the current thread), the JVM then resolves the constant pool entry associated with the field and stores the value of that entry into the field.
Since lazy static final fields cannot be blank finals, they cannot be assigned to, even in those limited contexts where blank finals may be assigned to.
There is a rule in Java which requires that a static variable may only appear in the initializers of static variables which occur later in the class body. This rule reduces (but does not eliminate) the possibility that an untimely read of a static variable may obtain the default value of that variable, rather than its initial value.
class C {
static int x = y; //error: illegal forward reference
static int y = 42;
}
These ordering constraints are observed even for lazy static fields, as if they were not declared lazy. Thus, a lazy static field's initializer can only refer to a static field of the same class that occurs earlier in the same source file.
If in some case two lazy values must depend on each other in a circular relationship, the cycle can be hidden by the use of a private static method. In that case, a true cyclic dependency will cause a stack overflow error. In the case of non-lazy statics, an analogous cycle would cause a default value to become visible.
class C {
//lazy static final Object x = y, y = x; //error
lazy static final Object x = ycycle(), y = x;
private static Object ycycle() { return y; }
}
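The contrast between the two outcomes can be reproduced in today's Java. EagerCycle below is the example above without the lazy modifier, and it observes a default null. LazyEmulation is a hypothetical stand-in for lazy semantics: each would-be lazy static becomes a memoizing accessor that is not thread-safe and assumes non-null values, so the same cycle recurses until the stack overflows.

```java
class LazyCycle {
    public static void main(String[] args) {
        System.out.println("eager x = " + EagerCycle.x);   // eager x = null
        System.out.println("lazy x = " + probeLazy());     // lazy x = StackOverflowError
    }

    static String probeLazy() {
        try {
            return String.valueOf(LazyEmulation.x());
        } catch (StackOverflowError e) {
            return "StackOverflowError";
        }
    }
}

// The example above without `lazy`; legal because the forward read is hidden in a method.
class EagerCycle {
    static final Object x = ycycle(), y = x;
    private static Object ycycle() { return y; }   // runs before y is assigned, so returns null
}

// Hypothetical emulation of lazy statics: null means "not yet computed".
class LazyEmulation {
    private static Object xValue, yValue;

    static Object x() {
        if (xValue == null) xValue = ycycle();
        return xValue;
    }

    static Object y() {
        if (yValue == null) yValue = x();
        return yValue;
    }

    private static Object ycycle() { return y(); }   // x() -> y() -> x() -> ... never terminates
}
```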
Any non-lazy static field initializer or class initializer block may also refer to a lazy static field that precedes it in the source file. This is usually not desirable, as it would tend to cancel the benefit of the lazy field, but may be useful in combination with conditional expressions or control flow.
The purpose of the ordering rule is to require the user to specify a nominal initialization order for lazy statics. The actual dynamic initialization order may differ, but the nominal order serves to demonstrate statically that there are no unintentional cyclic dependencies between the statics, lazy and otherwise.
Lazy fields may be recognized by the core reflection API by use of two new API points on java.lang.reflect.Field. The new query method isLazy returns true if and only if the field was declared lazy. The new query method isAssigned returns false if and only if the field is lazy and has not been initialized at the moment the method is called. (It may return true on the very next call in the same thread, depending on race conditions.) Other than isAssigned, there is no way to observe whether a lazy field has been initialized yet.
(The isAssigned reflective call is provided only to assist with occasional problems with circular initialization dependencies. Perhaps we can get away without implementing it, although people who code with lazy variables occasionally want to ask gently whether a lazy variable is set yet, in the same way that users of mutexes occasionally want to ask whether a mutex is locked, but without actually seizing the lock.)
To preserve implementation freedom, the contract of isAssigned is minimized. If a JVM can prove that a lazy static variable can be initialized without observable side effects, it may do so at any time; in such a case the isAssigned query will report true even before any getstatic is executed. The minimized contract for isAssigned is that if it returns false, none of the side effects from initializing that variable have yet been observed by the current thread, whereas if it returns true, then the current thread can, in the future, observe all side effects of initialization. This contract allows compilers to substitute ldc for getstatic of their own fields, and allows JVMs to avoid tracking detailed initialization states of finals with shared or degenerate constant pool entries.
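A library-level analogue helps picture the isAssigned query. The sketch below shows a hypothetical LazyRef class, not a proposed API, that uses double-checked locking on a volatile field and offers a non-blocking isAssigned() probe in the spirit of asking whether a mutex is locked without seizing it. Its contract is stricter than the one above: it reports true only after get() has actually stored a value. It also assumes the initializer never yields null.

```java
import java.util.function.Supplier;

class LazyRefDemo {
    public static void main(String[] args) {
        LazyRef<String> greeting = new LazyRef<>(() -> "hello");
        System.out.println(greeting.isAssigned());   // false
        System.out.println(greeting.get());          // hello
        System.out.println(greeting.isAssigned());   // true
    }
}

// Hypothetical helper: a lazily initialized, write-once reference.
final class LazyRef<T> {
    private final Supplier<? extends T> init;
    private volatile T value;   // null means "not yet assigned"

    LazyRef(Supplier<? extends T> init) { this.init = init; }

    T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    v = init.get();
                    value = v;   // volatile write publishes the value safely
                }
            }
        }
        return v;
    }

    // Non-blocking probe: never runs the initializer and never takes the lock.
    boolean isAssigned() { return value != null; }
}
```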
Multiple threads may race to initialize a lazy final. As is already the case with CONSTANT_Dynamic constant pool entries, the JVM picks an arbitrary winner of such a race and provides the value from that winner to all racing threads, as well as recording it for all future accesses. Thus, JVM implementations may elect to use CAS operations, if the platform supports those, to resolve races.
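The following user-level sketch shows CAS-based race resolution with java.util.concurrent.atomic.AtomicReference. RacyLazy and RaceDemo are hypothetical names, and the sketch assumes the initializer never yields null. Several racing threads may each compute a candidate, but compareAndSet lets exactly one candidate win, and every caller returns that winner's value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class RaceDemo {
    static final AtomicInteger computations = new AtomicInteger();

    // Returns how many distinct values the racing threads observed.
    static int distinctResults(int threads) throws InterruptedException, ExecutionException {
        RacyLazy<Object> lazy = new RacyLazy<>(() -> {
            computations.incrementAndGet();
            return new Object();   // every computation yields a distinct object
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Object>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) tasks.add(lazy::get);
            Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            for (Future<Object> f : pool.invokeAll(tasks)) seen.add(f.get());
            return seen.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(distinctResults(8));    // 1
        System.out.println(computations.get());   // between 1 and 8, depending on the race
    }
}

// Hypothetical helper: the first successful CAS wins; losers adopt the winner's value.
final class RacyLazy<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();
    private final Supplier<? extends T> init;

    RacyLazy(Supplier<? extends T> init) { this.init = init; }

    T get() {
        T v = ref.get();
        if (v != null) return v;
        T candidate = init.get();                                  // may run in several threads
        if (ref.compareAndSet(null, candidate)) return candidate;  // this thread won the race
        return ref.get();                                          // lost: use the recorded winner
    }
}
```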
When the JVM stores a value into a lazy final field, it performs a freeze operation. This freeze happens before any getstatic instruction is allowed to see the field value. This is how pre-existing rules for safe publication apply to lazy finals.
The effect of a lazy final is very similar to that of a static final defined in its own class, with no other static finals.
class C { lazy static final Object x = xval(), y = yval(); }
f() { ... getstatic C.x ... }
=>
class C_x { static final Object x = xval(); }
class C_y { static final Object y = yval(); }
f() { ... getstatic C_x.x ... }
The difference is that a true cyclic dependency between lazy statics will cause a stack overflow, rather than the observation of a default value.
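The holder translation also shows where that difference comes from. In the sketch below, the classes C_x and C_y from the translation above are given a hypothetical cyclic pair of initializers. When one thread initializes C_x, it starts C_y's initialization, and C_y then reads C_x.x while C_x is still being initialized by the same thread. JVMS 5.5 lets that read proceed, so it sees the default null. Under the proposed lazy semantics, the draft says the same cycle would overflow the stack instead.

```java
class CycleDemo {
    public static void main(String[] args) {
        System.out.println(C_x.x);   // xynull
        System.out.println(C_y.y);   // ynull
    }
}

// Initializing C_x triggers C_y's initialization partway through.
class C_x { static final Object x = "x" + C_y.y; }

// C_x is mid-initialization in this thread, so C_x.x still reads as null here.
class C_y { static final Object y = "y" + C_x.x; }
```

Which field ends up holding the default value depends on which class is touched first. Touching C_y first would instead produce "yxnull" and "xnull".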
Note that a class can convert a static to a lazy static without breaking binary compatibility. A client's getstatic instruction is identical in both cases. When the variable's declaration changes to lazy, the getstatic instruction simply links differently.
Alternatives
Use nested classes as holders for single lazy variables.
Define some sort of library API for managing lazy values or (more generally) monotonic data.
Refactor would-be lazy static variables as nullary static methods and populate their bodies with ldc of CONSTANT_Dynamic constants, by some means.
Use non-final variables for publication of lazily evaluated data, being careful not to modify them, and to fence their initialization for safe publication.
(N.B. The above workarounds do not provide a binary-compatible way to evolve existing static constants away from their current reliance on <clinit>.)
In the direction of adding more functionality, we could allow lazy fields to be non-static and/or non-final, preserving current correspondences and analogies between static and non-static field behaviors. The constant pool cannot be a backing store for non-static fields, but it can still contribute bootstrap methods (that depend on the current instance). Frozen arrays (if implemented) could perhaps be given lazy variations. Such investigations seem plausible as follow-on projects to the current proposal.