JEP draft: Hidden Classes

Owner: Mandy Chung
Type: Feature
Scope: SE
Status: Draft
Component: core-libs / java.lang.invoke
Discussion: valhalla dash dev at openjdk dot java dot net
Effort: L
Duration: L
Reviewed by: David Holmes, John Rose, Maurizio Cimadamore, Paul Sandoz
Endorsed by: John Rose
Created: 2019/03/13 17:37
Updated: 2019/12/10 18:54
Issue: 8220607

Summary

Allow the JVM to define hidden classes that cannot be used directly by the bytecode of other classes. Hidden classes are intended to support frameworks that generate classes at run time and use them indirectly, via reflection. A hidden class may be defined as a member of an access control nest, and may be weakly referenced by its class loader.

Goals

  1. Allow frameworks to define classes as non-discoverable implementation details of the framework, so that they cannot be linked against by other classes nor discovered through reflection.
  2. Support extending an access control nest with non-discoverable classes.
  3. Support aggressive unloading of non-discoverable classes, so that frameworks have the flexibility to define as many as they need.
  4. Deprecate the non-standard API sun.misc.Unsafe::defineAnonymousClass, with the intent to deprecate it for removal in a future JDK release.

Non-Goals

Motivation

Many language implementations on the JVM rely on dynamic class generation for flexibility and efficiency. For example, javac does not translate a lambda expression into a dedicated class file at compile time, but rather, emits bytecode that will dynamically generate and instantiate a class to yield the object corresponding to the lambda expression only when needed. On a similar note, runtimes for non-Java languages often implement the higher-order features of those languages by using dynamic proxies, which also generate classes dynamically.

Language implementers usually intend for a dynamically generated class to be logically part of the implementation of an existing (statically generated) class. This intention suggests various properties for dynamically generated classes:

Unfortunately, the JVM APIs that define a class -- ClassLoader::defineClass and Lookup::defineClass -- are indifferent to whether the bytes of the class were generated dynamically or statically (i.e. at compile time). The APIs always define a visible class that will be used every time another class in the same loader hierarchy tries to link a class of that name. Consequently, the class may be more discoverable or have a longer lifecycle than desired. In addition, the APIs can only define a class that will act as a member of a nest if the nest's host class knows the name of the class in advance; practically speaking, this prevents dynamically generated classes from being members of a nest.

If the JVM APIs could define "hidden" classes that are not discoverable and have a limited lifecycle, then the myriad of JDK frameworks which generate classes dynamically would shift over to defining classes that way. This would improve the efficiency of language implementations which rely on the frameworks. For example:

Description

The Lookup API introduced in Java SE 7 allows a class to obtain a lookup object that provides reflective access to classes, methods, and fields. Crucially, no matter what code ends up using a lookup object, the reflective access always occurs in the context of the class which originally obtained the lookup object -- the lookup class. In effect, a lookup object transmits the access rights of the lookup class to any code which receives the object.

Java SE 9 enhanced the transmission capabilities of lookup objects by introducing a method Lookup::defineClass(byte[]). From the bytes supplied, this method defines a new class in the same context as the class which originally obtained the lookup object. That is, the newly-defined class has the same defining class loader, run-time package, and protection domain as the lookup class.
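As a point of comparison for what follows, here is a minimal sketch of Lookup::defineClass defining an ordinary class in the lookup class's context. The class name DefineClassSketch, the helpers minimalClassBytes and inThisPackage, and the generated names Ordinary1, Ordinary2, ... are hypothetical; the class file is assembled by hand only so that the example needs no bytecode library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;
import java.util.concurrent.atomic.AtomicInteger;

public class DefineClassSketch {
    // Ordinary class names must be unique per loader, so each call uses a fresh name.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Hypothetical helper: an empty public class extending Object, in ClassFile form.
    public static byte[] minimalClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(5);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: the new class must live in the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = DefineClassSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    public static Class<?> defineOrdinary() {
        String name = inThisPackage("Ordinary" + COUNTER.incrementAndGet());
        try {
            return MethodHandles.lookup().defineClass(minimalClassBytes(name));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the class found by name through the given loader, or null if none.
    public static Class<?> findByName(String name, ClassLoader loader) {
        try {
            return Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Class<?> c = defineOrdinary();
        System.out.println("same loader:  " + (c.getClassLoader() == DefineClassSketch.class.getClassLoader()));
        System.out.println("same package: " + c.getPackageName().equals(DefineClassSketch.class.getPackageName()));
        System.out.println("found by name: " + (findByName(c.getName(), c.getClassLoader()) == c));
    }
}
```

The last check is the property that hidden classes are meant to remove: an ordinary class defined this way can be found by name.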

This JEP proposes to extend the Lookup API to support defining a hidden class that can only be accessed by reflection. A hidden class is not discoverable by the JVM during bytecode linkage, nor by programs making explicit use of class loaders (e.g., via Class::forName or ClassLoader::loadClass). Optionally, a hidden class can be created as a member of an access control nest, and can be weakly referenced by its defining class loader.

Hidden Classes

Whereas an ordinary class is created by calling ClassLoader::defineClass, a hidden class is created by calling Lookup::defineHiddenClass. This causes the JVM to derive a hidden class from the supplied bytes, link the hidden class, and return a lookup object that provides reflective access to the hidden class. The program should store the lookup object carefully, for it is the only way to obtain the Class object of the hidden class.

The bytes supplied to Lookup::defineHiddenClass must be a ClassFile structure. The JVM's derivation of a hidden class from those bytes is identical to the derivation of an ordinary class by ClassLoader::defineClass.
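A minimal sketch of the call described above, assuming a JDK in which Lookup::defineHiddenClass takes the class bytes, an initialize flag, and optional class options. The names HiddenDefineSketch, minimalClassBytes, inThisPackage, and Template are hypothetical, and the ClassFile is assembled by hand purely to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;

public class HiddenDefineSketch {

    // Hypothetical helper: an empty public class extending Object, in ClassFile form.
    public static byte[] minimalClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(5);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: this_class must name the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = HiddenDefineSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    // Derives and links a hidden class, returning the lookup object on it.
    public static MethodHandles.Lookup defineHidden() {
        byte[] bytes = minimalClassBytes(inThisPackage("Template"));
        try {
            // false: do not initialize the hidden class as part of this call
            return MethodHandles.lookup().defineHiddenClass(bytes, false);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MethodHandles.Lookup hiddenLookup = defineHidden();
        // The returned lookup object is the route to the hidden Class object.
        Class<?> hidden = hiddenLookup.lookupClass();
        System.out.println("hidden: " + hidden.isHidden());
    }
}
```

Because no name is registered with a class loader, the same bytes can be passed repeatedly; each call yields a distinct hidden class.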

Name of a hidden class

A hidden class is not anonymous; it has a name that is available by invoking getName on its Class object. The name is shown in diagnostics (such as the output of java -verbose:class), in JVM TI class loading events, in JFR events, and potentially in stack traces (see subsection below).

The name of a hidden class is the concatenation of:

  1. The binary name whose internal form is given by this_class in the ClassFile structure;
  2. The / character;
  3. A non-empty suffix chosen by the JVM implementation. This suffix must be an unqualified name, as specified in JVMS 4.2.2, and must be unique.

For example, if this_class is com/example/Foo (the internal form of the binary name com.example.Foo), then the hidden class may be named com.example.Foo/1234.
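The naming rule can be observed with a short sketch like the following. The names HiddenNameSketch, minimalClassBytes, inThisPackage, and Template are hypothetical, and the exact suffix printed is whatever the JVM implementation chooses.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;

public class HiddenNameSketch {

    // Hypothetical helper: an empty public class extending Object, in ClassFile form.
    public static byte[] minimalClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(5);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: internal-form name in the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = HiddenNameSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    // Defines a hidden class from bytes whose this_class is <package>/Template
    // and returns the name reported by Class::getName.
    public static String hiddenName() {
        byte[] bytes = minimalClassBytes(inThisPackage("Template"));
        try {
            return MethodHandles.lookup().defineHiddenClass(bytes, false)
                    .lookupClass().getName();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String binaryName = inThisPackage("Template").replace('/', '.');
        String name = hiddenName();
        System.out.println("binary name from this_class: " + binaryName);
        System.out.println("hidden class name:           " + name);
    }
}
```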

Although the name of a hidden class is visible in many places, it must never be used by another class. For example, if a class (whether ordinary or hidden) names a hidden class as its declaring class or class member (via InnerClasses attribute), or as its nest host (via NestHost attribute), or as a member of its nest (via NestMembers attribute), the class may successfully be created. At run time, any attempt to resolve a hidden class by name may result in a LinkageError which is no different from resolving a symbolic reference to an ordinary class that cannot be found.

The namespace of hidden class names is disjoint from the namespace of ordinary class names. A hidden class name is not a binary name (JVMS 4.2.1) and thus such a name is forbidden when passed as the first argument to ClassLoader::defineClass. Invoking cl.defineClass("com.example.Foo/1234", <bytes>, ...) will fail because the first argument is not a binary name. On the other hand, if a ClassFile is manufactured with a this_class of com/example/Foo/1234

We acknowledge that not using binary names for the names of hidden classes is potentially a source of problems, but at the same time it is compatible with the longstanding practice of Unsafe::defineAnonymousClass (see https://mail.openjdk.java.net/pipermail/valhalla-dev/2019-August/006273.html). The use of / to indicate a hidden class is also aligned stylistically with the use of / in stack traces to "qualify" a class by its defining module and loader (see StackTraceElement::toString). The error log below reveals two hidden classes, both in code of module m1: one hidden class has a method test, the other has a method apply.

java.lang.Error: thrown from hidden class com.example.Foo/0x0000000800b7a470
    at m1/com.example.Foo/0x0000000800b7a470.toString(Foo.java:16)
    at m1/com.example.Foo_0x0000000800b7a470$$Lambda$29/0x0000000800b7c040.apply(<Unknown>:1000001)
    at m1/com.example.Foo/0x0000000800b7a470.test(Foo.java:11)

Hidden classes and class loaders

Despite the fact that a hidden class has a corresponding Class object, and the fact that a hidden class's supertypes are created by class loaders, no class loader is involved in the creation of the hidden class itself. (Notice that this JEP never says a hidden class is "loaded".) No class loaders are recorded as initiating loaders of a hidden class, and no loading constraints are generated that involve hidden classes. Consequently, hidden classes are not known by any class loader: a symbolic reference in the run-time constant pool of a class D to a class C denoted by N will never resolve to a hidden class for any value of D, C, and N. The reflective methods Class::forName, ClassLoader::findLoadedClass, and Lookup::findClass will not find hidden classes.

Notwithstanding this detachment from class loaders, a hidden class is deemed to have a defining class loader. This is necessary to resolve types used by the hidden class's own fields and methods. In particular, a hidden class has the same defining class loader, run-time package, and protection domain as the lookup class, which is the class that originally obtained the lookup object (the object on which Lookup::defineHiddenClass is invoked).
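Both halves of this relationship can be checked with a sketch along these lines: lookup by name fails, yet the hidden class reports the lookup class's loader and package. The names HiddenLoaderSketch, minimalClassBytes, inThisPackage, discoverableByName, and Template are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;

public class HiddenLoaderSketch {

    // Hypothetical helper: an empty public class extending Object, in ClassFile form.
    public static byte[] minimalClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(5);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: internal-form name in the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = HiddenLoaderSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    public static Class<?> defineHidden() {
        try {
            return MethodHandles.lookup()
                    .defineHiddenClass(minimalClassBytes(inThisPackage("Template")), false)
                    .lookupClass();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if Class::forName can find the class through its defining loader.
    public static boolean discoverableByName(Class<?> c) {
        try {
            return Class.forName(c.getName(), false, c.getClassLoader()) == c;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?> hidden = defineHidden();
        System.out.println("found by name: " + discoverableByName(hidden));
        System.out.println("same loader:   " + (hidden.getClassLoader() == HiddenLoaderSketch.class.getClassLoader()));
        System.out.println("same package:  " + hidden.getPackageName().equals(HiddenLoaderSketch.class.getPackageName()));
    }
}
```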

Linking a hidden class

The JVM links a hidden class immediately after deriving it. The class is verified (JVMS 5.4.1) and prepared (JVMS 5.4.2) as for an ordinary class, except that no loading constraints are imposed. After linking, a hidden class is initialized if the appropriate parameter of Lookup::defineHiddenClass was set. If the parameter was not set, then the hidden class will be initialized when reflective methods are used to instantiate it or access its members. (This is the same regime as initialization of ordinary classes, where Class::forName has the same parameter.)

Using a hidden class

Once created and linked, a hidden class can be used via the returned Class object. The hidden class can be instantiated and its members accessed as if it were an ordinary class, except for two restrictions:

  1. The Class object is not modifiable by instrumentation agents, and cannot be redefined or retransformed by JVM TI agents. (However, JVM TI and JDI will be extended to support hidden classes, such as testing whether a class is hidden and including hidden classes in the list of all loaded classes.)
  2. getCanonicalName returns null, indicating the hidden class has no canonical name. (The Class object for an anonymous class in the Java language has the same behavior.)

It is important to realize that the only way for other classes to use a hidden class is indirectly, via the Class object. The hidden class cannot be used directly by bytecode instructions in other classes because it cannot be referenced nominally (that is, by name, "statically"). For example, suppose a framework learns of a hidden class named com.example.Foo/1234, and manufactures a class D that attempts to instantiate it. Code in D would contain a new instruction that points to a constant pool entry in D that denotes the name in internal form, com/example/Foo/1234. The JVM would resolve the constant pool entry by converting the name in internal form to a binary name, com.example.Foo.1234, and trying to load a class of that name. Since the hidden class is not named com.example.Foo.1234, class loading will fail. The hidden class is not truly anonymous, since its name is exposed, but it is effectively invisible.
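Indirect use might look like the following sketch, which calls a static method of a hidden class through a method handle obtained from the returned lookup object. Everything named here is hypothetical (HiddenUseSketch, answerClassBytes, inThisPackage, Template, and the method answer); the hand-assembled ClassFile declares a single method equivalent to public static int answer() { return 42; }.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HiddenUseSketch {

    // Hypothetical helper: ClassFile for a class with one static method, answer()I.
    public static byte[] answerClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(8);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeByte(1); out.writeUTF("answer");            // #5 method name
            out.writeByte(1); out.writeUTF("()I");               // #6 method descriptor
            out.writeByte(1); out.writeUTF("Code");              // #7 attribute name
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(1);                                   // methods_count
            out.writeShort(0x0009);                              // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                   // name_index
            out.writeShort(6);                                   // descriptor_index
            out.writeShort(1);                                   // method attributes_count
            out.writeShort(7);                                   // Code attribute
            out.writeInt(15);                                    // attribute_length
            out.writeShort(1);                                   // max_stack
            out.writeShort(0);                                   // max_locals
            out.writeInt(3);                                     // code_length
            out.write(new byte[] {0x10, 42, (byte) 0xAC});       // bipush 42; ireturn
            out.writeShort(0);                                   // exception_table_length
            out.writeShort(0);                                   // Code attributes_count
            out.writeShort(0);                                   // class attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: internal-form name in the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = HiddenUseSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    public static MethodHandles.Lookup defineHidden() {
        try {
            return MethodHandles.lookup()
                    .defineHiddenClass(answerClassBytes(inThisPackage("Template")), true);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reaches the hidden class only through its lookup and Class objects, never by name.
    public static int callAnswer(MethodHandles.Lookup hiddenLookup) {
        try {
            MethodHandle answer = hiddenLookup.findStatic(
                    hiddenLookup.lookupClass(), "answer", MethodType.methodType(int.class));
            return (int) answer.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandles.Lookup hiddenLookup = defineHidden();
        System.out.println("answer() = " + callAnswer(hiddenLookup));
        System.out.println("canonical name: " + hiddenLookup.lookupClass().getCanonicalName());
    }
}
```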

Without the ability of the constant pool to refer nominally to a hidden class, there is no way to use a hidden class as a superclass, field type, return type, or parameter type. This lack of usability is reminiscent of anonymous classes in the Java language, but hidden classes go further: an anonymous class can enclose other classes in order to let them access its members, but a hidden class cannot enclose other classes (their ClassFile attributes cannot name it). Even a hidden class is unable to use itself as a field type, return type, or parameter type in its own field and method declarations.

Importantly, code in a hidden class can use the hidden class directly, without relying on the Class object. This is because bytecode instructions in a hidden class can refer to the hidden class symbolically (without concern for its name) rather than nominally. For example, a new instruction in a hidden class can instantiate the hidden class via a constant pool entry which refers directly to the this_class item in the current ClassFile. Other instructions, such as getstatic, getfield, putstatic, putfield, invokestatic, and invokevirtual, can access members of the hidden class via the same constant pool entry. Direct use inside the hidden class is important because it simplifies generation of hidden classes by language runtimes and frameworks.

A hidden class generally has the same powers of reflection as an ordinary class: code in a hidden class may define ordinary classes and hidden classes (via ClassLoader::defineClass and Lookup::defineHiddenClass), and may reflectively manipulate ordinary classes and hidden classes (via their Class objects). A hidden class may even act as a lookup class, that is, code in a hidden class may obtain a lookup object on itself, which helps with hidden nestmates (see below).

Hidden classes in stack traces

Methods of hidden classes are not shown in stack traces by default. They represent implementation details of language runtimes, and are never expected to be useful to developers diagnosing application issues. However, they can be included in stack traces with -XX:+UnlockDiagnosticVMOptions -XX:+ShowHiddenFrames.

There are three APIs which reify stack traces: Throwable::getStackTrace, Thread::getStackTrace, and the newer StackWalker API introduced in Java SE 9. For the Throwable::getStackTrace and Thread::getStackTrace APIs, stack frames for hidden classes are omitted by default; they can be included with the same options as for stack traces above. For the StackWalker API, stack frames for hidden classes should be included by a JVM implementation only if SHOW_HIDDEN_FRAMES is set. This allows stack trace filtering to omit unnecessary information when developers are diagnosing application issues, as requested by JDK-8212620.
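The StackWalker behavior can be sketched by counting frames from inside a lambda body. This assumes a JDK on which the class that LambdaMetaFactory spins for a lambda is a hidden class, as this JEP proposes; the names HiddenFramesSketch and framesSeenFromLambda are hypothetical.

```java
import java.util.function.Supplier;

public class HiddenFramesSketch {

    // Counts the frames the given walker reports when it is invoked from a
    // lambda body. The call passes through a frame of the class generated
    // for the lambda, which a default walker is expected to omit.
    public static long framesSeenFromLambda(StackWalker walker) {
        Supplier<Long> viaLambda = () -> walker.walk(frames -> frames.count());
        return viaLambda.get();
    }

    public static void main(String[] args) {
        long byDefault = framesSeenFromLambda(StackWalker.getInstance());
        long withHidden = framesSeenFromLambda(
                StackWalker.getInstance(StackWalker.Option.SHOW_HIDDEN_FRAMES));
        System.out.println("frames by default:              " + byDefault);
        System.out.println("frames with SHOW_HIDDEN_FRAMES: " + withHidden);
    }
}
```

The absolute counts depend on how the program is launched; only the difference between the two walkers is of interest.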

Hidden nestmate classes

Introduced to the JVM in Java SE 11, a nest is a set of classes that allow access to each other's private members, without any of the backdoor accessibility-broadening methods usually associated with nested classes in the Java language. The set is defined statically: one class serves as the nest host, its class file enumerating the other classes that are nest members; in turn, the nest members indicate in their class files which class hosts the nest. While static membership works well for class files generated from Java source code, it is usually insufficient for class files generated dynamically by language runtimes. To help such runtimes, and to encourage the use of Lookup::defineHiddenClass over Unsafe::defineAnonymousClass, a hidden class can join a nest at run time; an ordinary class cannot.

A hidden class can be created as a member of an existing nest by passing the NESTMATE enum constant to Lookup::defineHiddenClass. The nest which the hidden class joins is not determined by an argument to Lookup::defineHiddenClass. Instead, the nest to be joined is inferred from the lookup class, that is, from the class whose code initially obtained the lookup object: the hidden class is a member of the same nest as the lookup class (see below).

In order for Lookup::defineHiddenClass to add hidden classes to the nest, the lookup object must have the proper permissions, namely PRIVATE and MODULE access. These permissions "prove" that the lookup object was obtained by the lookup class with the intent of allowing other code to expand the nest.
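A minimal sketch of joining a nest, assuming the NESTMATE constant is exposed as Lookup.ClassOption.NESTMATE. A lookup object from MethodHandles.lookup() carries the PRIVATE and MODULE access required. The names NestmateSketch, minimalClassBytes, inThisPackage, and Template are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup.ClassOption;

public class NestmateSketch {

    // Hypothetical helper: an empty public class extending Object, in ClassFile form.
    public static byte[] minimalClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(5);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: internal-form name in the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = NestmateSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    // The nest is not passed as an argument; it is inferred from the lookup
    // class, which here is NestmateSketch itself.
    public static Class<?> defineNestmate() {
        byte[] bytes = minimalClassBytes(inThisPackage("Template"));
        try {
            return MethodHandles.lookup()
                    .defineHiddenClass(bytes, false, ClassOption.NESTMATE)
                    .lookupClass();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> hidden = defineNestmate();
        System.out.println("nest host is the lookup class's host: "
                + (hidden.getNestHost() == NestmateSketch.class.getNestHost()));
        System.out.println("nestmate of lookup class: " + hidden.isNestmateOf(NestmateSketch.class));
    }
}
```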

The JVM disallows "nested nests". A member of one nest cannot serve as the host of another nest, regardless of whether nest membership is defined statically or dynamically.

The lookup class's membership of a nest may be indicated statically (via NestHost) if the lookup class is an ordinary class; or may have been set dynamically if the lookup class is a hidden class. Static nest membership is validated lazily. It is important for a language runtime or framework library to be able to add hidden classes to the nest of a lookup class that may have a bad nest membership. As an example, consider the LambdaMetaFactory framework introduced in Java SE 8. When the source code of a class C contains a lambda expression, the corresponding C.class file uses LambdaMetaFactory at run time to define a hidden class that holds the body of the lambda expression and implements the required functional interface. C.class may have a bad NestHost attribute but the execution of C never references the class H named in the NestHost attribute. Since the lambda body may access private members of C, the hidden class needs to be able to access them too; accordingly, LambdaMetaFactory attempts to define the hidden class as a member of the nest hosted by C. If C's static nest membership is valid, then H is C's nest host and the hidden class is added as a member of the nest of H. If C's static nest membership is invalid, C becomes the nest host of its own nest, and the hidden class is added as a member of the nest of C; and the error occurring during static nest membership validation is not propagated.

If a hidden class is created without the NESTMATE enum constant, then the hidden class is the host of its own nest. This aligns with the JVM policy that every class is either a member of a nest with another class as nest host, or else is itself the nest host of a nest. The hidden class can create additional hidden classes as members of its nest: code in the hidden class first obtains a lookup object on itself, then invokes Lookup::defineHiddenClass on the object and passes the NESTMATE enum constant.

Given the Class object for a hidden class created as a member of a nest, Class::getNestHost and Class::isNestmateOf will work as expected. Class::getNestMembers can be called on the Class object of any class in the nest - whether member or host, whether ordinary or hidden - but returns only the members defined statically (that is, the ordinary classes enumerated by NestMembers in the host) along with the nest host.

Class::getNestMembers does not include the hidden classes added to the nest dynamically because hidden classes are non-discoverable and should only be of interest to the code that created them (which knows the nest membership already). This prevents a hidden class from leaking through the nest membership if intended to be private internals.
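The nest rules above can be checked with a sketch like this one, which defines one hidden class without NESTMATE and one with it. It again assumes the option is exposed as Lookup.ClassOption.NESTMATE; the names NestMembersSketch, minimalClassBytes, inThisPackage, and Template are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup.ClassOption;
import java.util.Arrays;

public class NestMembersSketch {

    // Hypothetical helper: an empty public class extending Object, in ClassFile form.
    public static byte[] minimalClassBytes(String internalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version
            out.writeShort(5);                                   // constant_pool_count
            out.writeByte(1); out.writeUTF(internalName);        // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: internal-form name in the lookup class's package.
    public static String inThisPackage(String simpleName) {
        String pkg = NestMembersSketch.class.getPackageName();
        return pkg.isEmpty() ? simpleName : pkg.replace('.', '/') + "/" + simpleName;
    }

    // Defines a hidden class with the given options (possibly none).
    public static Class<?> defineHidden(ClassOption... options) {
        byte[] bytes = minimalClassBytes(inThisPackage("Template"));
        try {
            return MethodHandles.lookup().defineHiddenClass(bytes, false, options).lookupClass();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if getNestMembers, called on the lookup class, lists the given class.
    public static boolean listedAsNestMember(Class<?> c) {
        return Arrays.asList(NestMembersSketch.class.getNestMembers()).contains(c);
    }

    public static void main(String[] args) {
        Class<?> standalone = defineHidden();
        Class<?> nestmate = defineHidden(ClassOption.NESTMATE);
        System.out.println("standalone hosts its own nest: " + (standalone.getNestHost() == standalone));
        System.out.println("nestmate listed by getNestMembers: " + listedAsNestMember(nestmate));
    }
}
```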

Weak classes

As specified in JLS 12.7, a class may be unloaded if and only if its defining loader may be reclaimed by the garbage collector. To maximize the chance of unloading a class, it is important to minimize references to its defining loader. This means minimizing reuse of class loaders, so language runtimes will often dedicate an entire class loader to defining one class (or perhaps a small handful of related classes). When all instances of the class are reclaimed, both the class and its defining loader can be reclaimed. However, the resulting large number of "per-class" loaders is demanding on memory, and using ClassLoader::defineClass is considerably slower than Unsafe::defineAnonymousClass according to microbenchmarks.

A better way to loosen the relationship between a class and its defining loader is to introduce the notion of a weak class: a class that is weakly referenced by its defining loader. When all instances of the weak class are reclaimed and the weak class is no longer strongly reachable, it may be unloaded even though its defining loader is still alive. Language runtimes that use weak classes will see an improvement in both footprint and performance.

To capitalize on the loose connection between a hidden class and its defining loader, and to encourage the use of Lookup::defineHiddenClass over Unsafe::defineAnonymousClass, a hidden class can be created as a weak class; an ordinary class cannot.

A hidden class can be created as a weak class via the WEAK class option to Lookup::defineHiddenClass. A hidden nestmate class can also be a weak class.

Alternatives

There is no alternative to injecting a nestmate at run time other than keeping the existing workaround, which generates package-private access bridges so that the proxy class can access private members of a target class. Nor is there an alternative way to hide a class from other classes if it is visible to a class loader.

Testing

LambdaMetaFactory, StringConcatFactory, and Nashorn will be updated to use the new APIs. Performance tests will be run to ensure that there is no regression in lambda linkage or string concatenation.

Unit tests for the new APIs will be developed.

Risks and Assumptions

We assume that developers who currently use Unsafe::defineAnonymousClass will be able to migrate to Lookup::defineHiddenClass easily. Developers should be aware of three minor constraints on the functionality of hidden classes relative to VM-anonymous classes.

(1) Protected access

Surprisingly, a VM-anonymous class can access protected members of its host class even if the VM-anonymous class exists in a different run-time package and is not a subclass of the host class. In contrast, access control rules are applied properly for hidden classes: a hidden class can only access protected members of another class if the hidden class is in the same run-time package as, or a subclass of, the other class. There is no special access for a hidden class to the protected members of the lookup class.

(2) Constant pool patching

A VM-anonymous class can be defined with its constant pool entries already resolved to concrete values. This allows critical constants to be shared between a VM-anonymous class and the language runtime that defines it, and between multiple VM-anonymous classes. For example, a language runtime will often have MethodHandle objects in its address space that would be useful to newly-defined VM-anonymous classes. Instead of the runtime "serializing" the objects to constant pool entries in VM-anonymous classes, then generating bytecode in those classes to laboriously ldc the entries, the runtime can simply supply Unsafe::defineAnonymousClass with references to its live objects. The relevant constant pool entries in the newly-defined VM-anonymous class are pre-linked to those objects, improving performance and reducing footprint. In addition, this allows VM-anonymous classes to refer to each other: constant pool entries in a class file are based on names, so cannot refer to nameless VM-anonymous classes, but a language runtime can easily track the live Class objects for its VM-anonymous classes and supply them to Unsafe::defineAnonymousClass, thus pre-linking the new class's constant pool entries to other VM-anonymous classes. The Lookup::defineHiddenClass API will not have these capabilities because a future project may offer pre-linking of constant pool entries to all classes uniformly.

(3) Self-control of optimization

VM-anonymous classes were designed on the assumption that only JDK code would define them. Consequently, VM-anonymous classes have an unusual ability that was previously available only to classes in the JDK: control of their own optimization by the HotSpot JVM. Control is exerted through annotation attributes in a VM-anonymous class's defining bytes: @ForceInline or @DontInline causes HotSpot to always-inline or never-inline a method, while @Stable causes HotSpot to treat a non-null field as a foldable constant. However, very few of the VM-anonymous classes dynamically defined by JDK code have needed this ability. It is even possible that future enhancements in the Java SE Platform will make these HotSpot optimizations obsolete. Accordingly, hidden classes will not have the ability to control their optimization, even when defined by JDK code. (This is not thought to present any risk to the migration of JDK code from defining VM-anonymous classes to defining hidden classes.)

As a related matter, VM-anonymous classes can use annotation attributes to prevent their methods appearing in stack traces (@Hidden). Of course, this functionality is automatic for hidden classes, and may be offered to other classes in future.

Migration should take the following into account:

Dependencies

JEP 181 introduces the nest-based access control context where all classes and interfaces in a nest share private access among the nestmates.