JEP 401: Primitive Objects (Preview)

Owner: Dan Smith
Type: Feature
Scope: SE
Status: Candidate
Discussion: valhalla dash dev at openjdk dot java dot net
Effort: XL
Duration: XL
Reviewed by: Brian Goetz
Created: 2020/08/13 19:31
Updated: 2021/08/10 18:16
Issue: 8251554

Summary

Enhance the Java object model with user-declared primitive objects, which are class instances that lack object identity and can be stored and passed directly, without object headers or indirections. This is a preview language and VM feature.

Goals

This JEP proposes substantial changes to the Java programming language and the Java virtual machine, including:

Non-Goals

This JEP is concerned with the core treatment of user-declared primitive classes and primitive types. Additional features to improve integration with the Java programming language are not covered here, but are expected to be developed in parallel. Specifically:

An important followup effort, not covered by these JEPs, will enhance the JVM to specialize generic classes and bytecode for different primitive value type layouts.

Other followup efforts may enhance existing APIs to take advantage of primitive objects, or introduce new language features and APIs built on top of primitive objects.

Motivation

Java developers work with two kinds of values: basic primitives (i.e., numeric and boolean values) and references to objects.

Primitives offer better performance, because they are typically stored directly (without headers or pointers) in variables, on the computation stack, and, ultimately, in CPU registers. Hence, memory reads do not have additional indirections, primitive arrays are stored densely and contiguously in memory, primitive values do not require garbage collection, and primitive operations are performed within the CPU.

Object references offer better abstractions, including fields, methods, access control, instance validation, nominal typing, and subtype polymorphism. They also come with identity, enabling features such as field mutation and locking.

In certain domains, developers need the performance that primitives offer, but to get it they must give up the valuable abstractions of object-oriented programming. This can lead to bugs such as misinterpreting untyped numbers or mishandling arrays of heterogeneous data. (The loss of the Mars Climate Orbiter dramatically illustrates the potential costs of such bugs.)

Ideally, we would like Java virtual machines to run object-oriented code with primitive-like performance. Unfortunately, object identity is a major impediment to such optimizations, even though many objects do not actually need identity. Without identity, JVMs would be free to treat objects in much the same way they treat the basic primitives—stored directly in variables and operated on directly in the CPU.

Concrete examples of objects that do not need identity and would benefit from primitive-like performance include:

We can also expect that new programming patterns and API designs will evolve as it becomes practical for programs to operate on many more objects.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

Primitive objects and classes

A primitive object is a class instance that does not have identity. That is, a primitive object does not have a fixed memory address or any other property to distinguish it from other instances of the same class whose fields store the same values. Primitive objects cannot mutate their fields or be used for synchronization. The == operator on primitive objects compares their fields. Concrete classes whose instances are primitive objects are called primitive classes.

An identity object is a class instance or array that does have identity—the traditional behavior of objects in Java. An identity object can mutate its non-final fields and is associated with a synchronization monitor. The == operator on identity objects compares their identities. Concrete classes whose instances are identity objects are called identity classes.
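The identity distinction can be approximated in today's Java, assuming Java 16 or later. A record is only a rough analogue: its equals method compares component values, much as == on a primitive object compares fields, but a record is still an identity object, so == on two separately created records is false. The names PointRecord and IdentityDemo are illustrative and not part of this JEP.

```java
// Rough analogue only: a record's equals() compares fields, but == still compares identity.
record PointRecord(double x, double y) {}

final class IdentityDemo {
    // Field-based comparison: true because both records hold the same component values.
    static boolean equalByFields() {
        return new PointRecord(1.0, -0.5).equals(new PointRecord(1.0, -0.5));
    }

    // Identity comparison: false because each `new` creates a distinct identity object.
    static boolean equalByIdentity() {
        PointRecord a = new PointRecord(1.0, -0.5);
        PointRecord b = new PointRecord(1.0, -0.5);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(equalByFields());   // true
        System.out.println(equalByIdentity()); // false
    }
}
```

Under this JEP, declaring Point as a primitive class would make the second comparison behave like the first.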

Primitive class declarations

A class can be declared primitive with the primitive contextual keyword. Such a class is also implicitly final and must not be abstract.

A class is an identity class if it is neither primitive nor abstract.

primitive class Point implements Shape {
    private double x;
    private double y;

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double x() { return x; }
    public double y() { return y; }

    public Point translate(double dx, double dy) {
        return new Point(x+dx, y+dy);
    }

    public boolean contains(Point p) {
        return equals(p);
    }
}

interface Shape {
    boolean contains(Point p);
}

A primitive class declaration is subject to the following restrictions:

In most other ways, a primitive class declaration is just like an identity class declaration. It can have superinterfaces, type parameters, enclosing instances, inner classes, overloaded constructors, static members, and the full range of access restrictions on its members.

Working with primitive objects

Primitive objects are created with normal class instance creation expressions.

Point p1 = new Point(1.0, -0.5);

Instance fields and methods of primitive classes are accessed as usual.

Point p2 = p1.translate(p1.y(), 0.0);

Primitive classes can inherit methods from superclasses and superinterfaces, or they can override them. Instances can be assigned to superclass and superinterface types.

System.out.println(p2.toString());
Shape s = p2;
assert !s.contains(p1);

The == operator compares primitive objects in terms of their field values, not object identity. Fields with basic primitive types are compared by their bit patterns. Other field values—both identity and primitive objects—are recursively compared with ==.

assert new Point(1.0, -0.5) == p1;
assert p1.translate(0.0, 0.0) == p1;
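As a rough sketch of the bit-pattern rule, the helper below compares double fields with Double.doubleToRawLongBits. Using raw bits is an assumption about how "bit pattern" might be realized, not a description of the JVM's implementation (Double.equals, by contrast, uses doubleToLongBits, which collapses all NaN encodings into one). BitwiseEquality, Pt, and fieldwiseEq are hypothetical names. The interesting cases are NaN, which is unequal to itself under the double == operator but has identical bits, and 0.0 versus -0.0, which are equal as doubles but differ in the sign bit.

```java
// Hypothetical sketch of "compare basic primitive fields by bit pattern" for a two-double Point.
final class BitwiseEquality {
    record Pt(double x, double y) {}

    // Assumes raw bit comparison; no NaN canonicalization.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    // Field-by-field comparison, in the spirit of == on primitive objects.
    static boolean fieldwiseEq(Pt p, Pt q) {
        return sameBits(p.x(), q.x()) && sameBits(p.y(), q.y());
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(nan == nan);                                           // false
        System.out.println(fieldwiseEq(new Pt(nan, 0.0), new Pt(nan, 0.0)));      // true
        System.out.println(0.0 == -0.0);                                          // true
        System.out.println(fieldwiseEq(new Pt(0.0, 0.0), new Pt(-0.0, 0.0)));     // false
    }
}
```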

The equals, hashCode, and toString methods, if inherited from Object, along with System.identityHashCode, behave consistently with this definition of equality.

Point p3 = p1.translate(0.0, 0.0);
assert p1.equals(p3);
assert p1.hashCode() == p3.hashCode();
assert System.identityHashCode(p1) == System.identityHashCode(p3);
assert p1.toString().equals(p3.toString());

Attempting to synchronize on a primitive object results in an exception.

Object obj = p1;
try { synchronized (obj) { assert false; } }
catch (RuntimeException e) { /* expected exception */ }

The PrimitiveObject and IdentityObject interfaces

We introduce two new interfaces, PrimitiveObject and IdentityObject, as essential preview APIs.

All primitive classes implicitly implement PrimitiveObject. All identity classes—including all preexisting concrete classes in the Java ecosystem—implicitly implement IdentityObject. Array types are also subtypes of IdentityObject.

These interfaces help to distinguish between identity objects and primitive objects in three ways:

An interface can explicitly extend either IdentityObject or PrimitiveObject if the author determines that all implementing objects are expected to have or not have identity. It is an error if a class ends up implementing both interfaces implicitly, explicitly, or by inheritance. By default, an interface extends neither of these interfaces and can be implemented by both kinds of concrete classes.

An abstract class can similarly be declared to implement either IdentityObject or PrimitiveObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject (perhaps with a warning). Otherwise, the abstract class extends neither interface and can be extended by both kinds of concrete classes.

The class Object implements neither IdentityObject nor PrimitiveObject, but is effectively, and perhaps explicitly, abstract. (As described above, concrete classes always implement one or the other.) Calls to new Object() are re-interpreted as instance creations of a new, empty identity subclass of Object (name TBD).
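The rule that no class may end up implementing both interfaces can be modeled with reflection. In this sketch, IdentityMarker and PrimitiveMarker are hypothetical stand-ins declared locally (the real interfaces exist only in preview builds), and today's compiler happily accepts the class Clash. The check illustrates what "implicitly, explicitly, or by inheritance" covers, because Class.isAssignableFrom follows both superclasses and superinterfaces.

```java
interface IdentityMarker {}   // hypothetical stand-in for IdentityObject
interface PrimitiveMarker {}  // hypothetical stand-in for PrimitiveObject

interface Identified extends IdentityMarker {}
interface Flattenable extends PrimitiveMarker {}

class Base implements Identified {}
final class Fine extends Base {}                          // reaches IdentityMarker only
final class Clash extends Base implements Flattenable {}  // reaches both, partly via inheritance

final class MarkerCheck {
    // True if c reaches both markers through any chain of supertypes.
    static boolean implementsBoth(Class<?> c) {
        return IdentityMarker.class.isAssignableFrom(c)
                && PrimitiveMarker.class.isAssignableFrom(c);
    }

    // Models the error the JEP would report for such a class.
    static void requireConsistent(Class<?> c) {
        if (implementsBoth(c)) {
            throw new IllegalStateException(c.getName() + " implements both markers");
        }
    }
}
```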

Primitive values and references

Primitive objects can be stored in variables and operated on directly, without headers or pointers, as primitive values. The types of these values are called primitive value types.

Primitive objects can also be stored and operated on as references to objects. The types of these references are called primitive reference types.

Thus, there are two distinct types—a value type and a reference type—associated with each primitive class. Instances of the class are handled directly or by reference, depending upon which type is used.

Primitive value types

The name of a primitive class typically denotes that class's primitive value type (but see the discussion about reference-favoring classes below). Unlike a traditional class type, the values of a primitive value type are not references to objects, but the objects themselves. This has two important consequences:

Primitive value types are monomorphic—all of the values of a type are instances of the same class and have the same layout.

Primitive class instance creation expressions have primitive value types. So does the expression this when it appears in the body of a primitive class.

Primitive value types allow field and method accesses, as illustrated above. They also support the == and != operators when comparing two values of the same type.

An expression of a primitive value type cannot be used as the operand of a synchronized statement.

The basic primitive types int, boolean, double, etc., are not affected by this JEP, and are a distinct kind of type.

Default values of primitive value types

Every type has a default value, used to populate newly-allocated fields and array components of that type. The default value of every reference type is null, and the default value of each basic primitive type is 0 or false. The default value of a primitive class's value type is that class's default instance, which is the instance produced by setting each of the class's fields to that field's default value.

The expression Point.default refers to the default instance of primitive class Point.

assert new Point(0.0, 0.0) == Point.default;
Point[] ps = new Point[100];
assert ps[33] == Point.default;

Note that the default instance of a primitive class is created without invoking any constructors or instance initializers, and is available to anyone with access to the class (but see Enforcing instance validation, below). Primitive classes are not able to define a default instance that sets fields to something other than their default values.
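The idea of a default instance can be approximated in today's Java for a record: each component receives the default value of its type, obtained here by reading element 0 of a fresh one-element array. This is only a sketch. Unlike the JEP's default instance, it has to call the canonical constructor, so it would run any validation that constructor contains. DefaultInstanceSketch and its nested records are illustrative names.

```java
import java.lang.reflect.Array;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;

final class DefaultInstanceSketch {
    record Pt(double x, double y) {}
    record Named(String name, int count) {}

    // Default value of any non-void type: element 0 of a fresh one-element array.
    static Object defaultValueOf(Class<?> type) {
        return Array.get(Array.newInstance(type, 1), 0);
    }

    // Builds a record whose components all hold their types' default values.
    // Unlike the JEP's default instance, this invokes the canonical constructor.
    static <R extends Record> R allDefaults(Class<R> recordClass) {
        Class<?>[] types = Arrays.stream(recordClass.getRecordComponents())
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Object[] args = Arrays.stream(types)
                .map(DefaultInstanceSketch::defaultValueOf)
                .toArray();
        try {
            return recordClass.getDeclaredConstructor(types).newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(allDefaults(Pt.class));    // Pt[x=0.0, y=0.0]
        System.out.println(allDefaults(Named.class)); // Named[name=null, count=0]
    }
}
```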

Reference types

As usual, a variable of a reference type holds either a reference to an object or null. But, in general, the referenced object may now be either an identity object or a primitive object.

The primitive reference type of a primitive class is typically spelled with the class name followed by .ref. The values of a primitive reference type are references to instances of the named class, or null. A primitive reference type is a subtype of all of the named class's declared supertypes.

Point pi; // stores a Point object
Point.ref pr; // stores a reference to a Point
Shape s; // stores a reference to a Shape, which may be a Point

Many programs that work with primitive objects will not need to explicitly mention primitive reference types like Point.ref, but these types are an important part of the model, so Java programmers should understand them.

A primitive class's reference type has the same members as the class's value type, and supports all the usual operations of reference types. In particular, the runtime behaviors of == and the methods of Object are the same whether operating on the primitive object as a value or a reference.

It is an error if the operand type of a synchronized statement is any subtype of PrimitiveObject, including any primitive reference type.

References to primitive objects are created by primitive reference conversions from primitive values. Like boxing conversions, primitive reference conversions are implicit in the Java language. But primitive reference conversions can be much lighter-weight, because they do not introduce a new identity.

Point p1 = new Point(3.0, -2.1);
Point.ref[] prs = new Point.ref[1];
prs[0] = p1; // convert Point to Point.ref

A primitive value conversion, like an unboxing conversion, converts from a reference to a primitive value, throwing an exception if the reference is null.

Point p2 = prs[0]; // Convert Point.ref to Point
prs[0] = null;
p2 = prs[0]; // NullPointerException
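The int/Integer pair already follows this conversion pattern, so it can serve as a runnable analogy, with int standing in for Point and Integer for Point.ref. The analogy is imperfect: boxing may allocate an object with its own identity, which a primitive reference conversion does not. ConversionAnalogy is a hypothetical name.

```java
final class ConversionAnalogy {
    // Store boxes, load unboxes: compare Point -> Point.ref -> Point.
    static int roundTrip(int v) {
        Integer[] refs = new Integer[1];
        refs[0] = v;    // boxing conversion (analogue of primitive reference conversion)
        return refs[0]; // unboxing conversion (analogue of primitive value conversion)
    }

    // Unboxing a null reference throws, as converting a null Point.ref to Point would.
    static boolean nullUnboxThrows() {
        Integer[] refs = new Integer[1]; // components start out null
        try {
            int v = refs[0]; // throws NullPointerException
            return false;    // not reached
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```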

A method invocation may perform an implicit primitive reference or value conversion to ensure the type of the receiver matches the expected type of this in the method declaration.

p1.toString(); // Convert Point to Object
Shape s = p1;
s.contains(p1); // Convert Shape to Point

Often, users of a primitive class can simply operate on its primitive values. But references are useful in the following situations:

Java's generics are designed to work only with reference types, but a future JEP will enhance generics to interoperate with primitive value types.

Overload resolution and type argument inference

Primitive reference conversion and primitive value conversion are allowed in loose, but not strict, invocation contexts. This follows the pattern of boxing and unboxing: a method overload that is applicable without applying the conversions takes priority over one that requires them.

void m(Point p, int i) { ... }
void m(Point.ref pr, Integer i) { ... }

void test(Point.ref pr, Integer i) {
    m(pr, i); // prefers the second declaration
    m(pr, 0); // ambiguous
}

Type argument inference also treats primitive reference and value conversions the same as boxing and unboxing. A primitive value passed where an inferred type is expected will lead to a reference-typed inference constraint.

var list = List.of(new Point(1.0, 5.0));
// infers List<Point.ref>

(This inference behavior will change when, in a future JEP, type arguments are allowed to be inferred as primitive value types.)
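Because these conversions follow the boxing and unboxing rules, the overload example above behaves the same way with int in place of Point and Integer in place of Point.ref, which can be checked with today's compiler. The ambiguous call is kept as a comment because it does not compile. OverloadAnalogy and its methods are illustrative names.

```java
import java.util.List;

final class OverloadAnalogy {
    // Mirrors m(Point, int) / m(Point.ref, Integer), with int standing in for Point.
    static String m(int p, int i) { return "values"; }
    static String m(Integer p, Integer i) { return "references"; }

    static String bothRefs(Integer pr, Integer i) {
        return m(pr, i); // strict phase: only m(Integer, Integer) applies without conversion
    }

    static String bothValues(int p, int i) {
        return m(p, i); // strict phase: only m(int, int) applies without conversion
    }

    // m(pr, 0) with pr an Integer would not compile: neither overload is strictly
    // applicable, both are loosely applicable, and neither is more specific.

    static Object inferredElement() {
        var list = List.of(1); // inferred as List<Integer>: the int argument is boxed
        return list.get(0);
    }
}
```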

Array subtyping

Arrays of primitive class instances are covariant—the type Point[] is a subtype of Point.ref[], which is a subtype of Object[].

When a reference is stored in an array of static type Object[], if the array's runtime component type is Point, then the operation performs both an array store check (checking that the reference is to an instance of class Point) and a primitive value conversion (converting the reference to a primitive value).

Similarly, reading from an array of static type Object[] will cause a primitive reference conversion if the array stores primitive values.

Object replace(Object[] objs, int i, Object val) {
    Object result = objs[i]; // may perform reference conversion
    objs[i] = val; // may perform value conversion
    return result;
}

Point[] ps = new Point[]{ new Point(3.0, -2.1) };
replace(ps, 0, new Point(-2.1, 3.0));
replace(ps, 0, null); // NullPointerException from value conversion
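Today's reference arrays are already covariant and already perform a runtime store check, so the replace method can be exercised with a String[] as a stand-in. This sketch shows only the existing ArrayStoreException path; the NullPointerException path described above has no counterpart today, because a String[] accepts null. CovarianceAnalogy is a hypothetical name.

```java
final class CovarianceAnalogy {
    // Same shape as the replace method above.
    static Object replace(Object[] objs, int i, Object val) {
        Object result = objs[i];
        objs[i] = val; // runtime store check against the array's actual component type
        return result;
    }

    static boolean storeCheckFails() {
        Object[] strings = new String[]{ "a" }; // String[] is a subtype of Object[]
        try {
            replace(strings, 0, 42); // an Integer cannot be stored in a String[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    static boolean nullStoreSucceeds() {
        Object[] strings = new String[]{ "a" };
        replace(strings, 0, null); // allowed today; with a Point[] this would throw
        return strings[0] == null;
    }
}
```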

Reference-favoring primitive classes and migration

Some classes could be declared primitive (they are immutable and do not need identity), but their authors expect many clients to want a "normal" reference class type, in particular one that does not require adjusting for the absence of null. The main use case is a class that was declared as an identity class, but could be compatibly refactored into a primitive class. (Many classes in the standard libraries are designated value-based classes in anticipation of such a migration.)

In these cases, a class can be declared primitive, but with a special name (syntax subject to change):

primitive class Time.val {
    ...
}

Here, the type spelled with the syntax Time.val is a primitive value type, while the corresponding primitive reference type is just spelled Time.

Time[] trefs = new Time[]{ new Time(...) };
Time.val t = trefs[0]; // primitive value conversion

Summarizing the relationship between class names and types:

Primitive class kind | Class name | Value type | Reference type
Standard             | Foo        | Foo        | Foo.ref
Reference-favoring   | Bar        | Bar.val    | Bar

(Open question: should it be legal to redundantly apply .val to a standard primitive class name, or .ref to a reference-favoring primitive class name?)

Other than the interpretation of the class's name when used as a type, a reference-favoring primitive class is just like any other primitive class.

Authors who intend to migrate an existing identity class to be primitive should keep in mind that, even when the refactored class is reference-favoring, clients will be able to observe some differences:

Enforcing instance validation

Primitive classes have constructors; as usual, a constructor is responsible for initializing the class's fields, and can ensure the fields' values are valid.

By default, there are a few "back doors" that allow the creation of an instance without invoking a constructor for that instance or validating its field values. These include:

Tentative feature: If it is important for correctness, a primitive class may declare that instances must be validated through a constructor call. In this case, the compiler and JVM will ensure that backdoor instance creation is either prevented or detected before any instance methods of the class are executed.

This is specified with a class modifier:

[KEYWORD TBD] primitive class DatabaseConnection {
    private Database db;
    private String user;

    public DatabaseConnection(Database db, String user) {
        // validation code...
        this.db = db;
        this.user = user;
    }

    ...
}
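Without the tentative modifier, a class author could only detect back-door instances by hand. The following sketch assumes a plain final class in place of the primitive class, with String standing in for Database, and uses a hypothetical DEFAULT_LIKE constant to mimic an instance whose fields hold default values because no validating constructor ran. Each instance method checks for that state first. This illustrates the problem the proposed modifier would automate; it is not how the compiler or JVM would enforce it.

```java
import java.util.Objects;

// Hypothetical stand-in for DatabaseConnection; String replaces Database.
final class ConnectionSketch {
    private final String db;
    private final String user;

    // Mimics a back-door instance: fields keep their default value (null), no validation runs.
    static final ConnectionSketch DEFAULT_LIKE = new ConnectionSketch();

    private ConnectionSketch() {
        this.db = null;
        this.user = null;
    }

    ConnectionSketch(String db, String user) {
        this.db = Objects.requireNonNull(db, "db");       // validation
        this.user = Objects.requireNonNull(user, "user");
    }

    // Manual detection: a validated instance never has a null db.
    private void requireValidated() {
        if (db == null) {
            throw new IllegalStateException("instance did not pass through a validating constructor");
        }
    }

    String describe() {
        requireValidated();
        return user + "@" + db;
    }
}
```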

Compilation and run time

Primitive classes are compiled to class files, with special treatment deeply integrated into the Java virtual machine.

class file representation & interpretation

A primitive class is declared in a class file using the ACC_PRIMITIVE modifier (0x0100). (Encoding of modifiers indicating a reference-favoring or validation-enforcing class is TBD.) At class load time, the class is considered to implement the interface PrimitiveObject; an error occurs if a primitive class is not final, has a non-final instance field, or implements—directly or indirectly—IdentityObject. At preparation time, an error occurs if a primitive class has a circularity in its instance field types.
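The flag-based load-time checks can be sketched as bit tests. ACC_STATIC (0x0008) and ACC_FINAL (0x0010) are the standard class-file values; ACC_PRIMITIVE (0x0100) is the value proposed here and may change. The sketch covers only the final-class and final-instance-field rules, not the IdentityObject or circularity checks, and PrimitiveFlagCheck is a hypothetical name.

```java
final class PrimitiveFlagCheck {
    static final int ACC_STATIC = 0x0008;    // standard field flag
    static final int ACC_FINAL = 0x0010;     // standard class and field flag
    static final int ACC_PRIMITIVE = 0x0100; // proposed class flag (preview, may change)

    // Returns null if the flags pass, otherwise a description of the first violation.
    static String check(int classFlags, int... fieldFlags) {
        if ((classFlags & ACC_PRIMITIVE) == 0) {
            return null; // not a primitive class: these rules do not apply
        }
        if ((classFlags & ACC_FINAL) == 0) {
            return "primitive class is not final";
        }
        for (int f : fieldFlags) {
            boolean instanceField = (f & ACC_STATIC) == 0;
            if (instanceField && (f & ACC_FINAL) == 0) {
                return "primitive class has a non-final instance field";
            }
        }
        return null;
    }
}
```

For example, check(0x0110, 0x0012) describes a final primitive class with one private final instance field and passes.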

An abstract class that allows primitive subclasses declares this capability in its class file (details TBD). At class load time, an error occurs if the class is not abstract, declares an instance field, declares a synchronized method, or implements—directly or indirectly—IdentityObject.

At class load time, a class (not an interface) is considered to implement the interface IdentityObject if it is not primitive and does not explicitly allow primitive subclasses. Every array type is also considered to implement IdentityObject. It is a load time error if any class or interface implements or extends—directly or indirectly—both PrimitiveObject and IdentityObject.

A primitive class's reference type is represented using the usual L descriptor (LPoint;). The class's value type is represented with a new Q descriptor prefix (QPoint;).

Verification treats a Q type as a subtype of the corresponding L type—e.g., QPoint; is a subtype of LPoint;.
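The Q/L relationship can be modeled as a small function on descriptor strings. This hypothetical sketch implements only the rule stated above (QPoint; is a subtype of LPoint;) plus reflexivity. A real verifier also consults the class hierarchy, so, for example, this sketch returns false for QPoint; against Ljava/lang/Object; even though a verifier would be expected to accept it.

```java
final class DescriptorSketch {
    // Maps a value-type descriptor to its reference form: "QPoint;" -> "LPoint;".
    static String toReference(String desc) {
        if (desc.startsWith("Q") && desc.endsWith(";")) {
            return "L" + desc.substring(1);
        }
        return desc; // already a reference (or other) descriptor
    }

    // Only reflexivity plus "Q X; <: L X;"; no class hierarchy lookup.
    static boolean isSubtype(String sub, String sup) {
        if (sub.equals(sup)) {
            return true;
        }
        return sup.startsWith("L") && toReference(sub).equals(sup);
    }
}
```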

Classes mentioned in Q descriptors of fields and methods are loaded during preparation or perhaps at a later point, but before the first access of that field or method.

A CONSTANT_Class constant pool entry may also describe a primitive value type by using a Q descriptor as a "class name".

(These details about type encodings may evolve.)

The this parameter of a primitive class's instance method has a primitive value verification type.

Primitive value types are one-slot stack values, even though they may represent aggregates of much more than 32 or 64 bits. No particular encoding of primitive objects is mandated. Implementations are free to use different encodings in different contexts, such as stack vs. heap, as long as the values of the objects' fields are preserved. References to primitive objects, on the other hand, should be encoded in a way that is compatible with traditional object references.

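Two new opcodes facilitate instance creation: defaultvalue, which produces the default instance of a named primitive class, and withfield, which produces a new primitive object that copies an existing one but with one field set to a new value.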

It is a linkage error to use the opcode new with a primitive class. Instance initialization methods can be declared in a primitive class, but verification prevents their invocation.

A new kind of special method, a factory method, can be declared to return instances of the class. Factory methods are named <new> (or, alternatively, <init> with a non-void return) and are static. They are invoked with invokestatic.

The anewarray and multianewarray instructions can be used to create arrays of primitive value types.

The defaultvalue instruction, and each of the anewarray and multianewarray instructions, when used with a primitive value type, can trigger initialization of the named primitive class. During initialization, a field of a primitive class with a different primitive value type can recursively trigger initialization of the primitive class of that type.

The checkcast, instanceof, and aastore opcodes support primitive value types, performing primitive value conversions (including null checks) when necessary.

The if_acmpeq and if_acmpne operations implement the == test for primitive objects, as described above. The monitorenter instruction throws an exception if applied to a primitive object.

Java language compilation

Primitive reference conversions in the Java language are implicit in bytecode: QPoint; is a subtype of LPoint;. Primitive value conversions are achieved with checkcast.

Constructors of primitive classes compile to factory methods, not instance initialization methods. In the constructor body, the compiler treats this as a mutable local variable, initialized by defaultvalue, modified by withfield, and ultimately returned as the method result.
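The compilation strategy can be mimicked in ordinary Java with an immutable record. In this sketch a hypothetical DEFAULT constant stands in for defaultvalue, the hypothetical withX and withY methods stand in for withfield, and newPoint plays the role of the <new> factory method. One difference: each withX call runs the record's canonical constructor, whereas withfield does not invoke a constructor.

```java
final class FactorySketch {
    // Stands in for primitive class Point; records are immutable like primitive objects.
    record Point(double x, double y) {
        static final Point DEFAULT = new Point(0.0, 0.0); // plays the role of defaultvalue

        Point withX(double newX) { return new Point(newX, y); } // plays the role of withfield on x
        Point withY(double newY) { return new Point(x, newY); } // plays the role of withfield on y
    }

    // Roughly the shape of the compiled <new> factory for Point(double, double):
    // "this" is a local that starts at the default instance and is replaced per field.
    static Point newPoint(double x, double y) {
        Point self = Point.DEFAULT;
        self = self.withX(x);
        self = self.withY(y);
        return self;
    }
}
```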

Core reflection

Every primitive class has a primary java.lang.Class object representing the primitive class. The primary Class object also represents the primitive reference runtime type.

Every primitive class also has a secondary Class object that represents the primitive value runtime type of the class, and otherwise mimics the behavior of the primary Class object.

A new reflective preview API method, Class.isPrimitiveClass, indicates whether a class object corresponds to a primitive class. (This is distinct from isPrimitive, which indicates whether a Class represents a basic primitive type.)

Methods like Class.isValueType, Class.getValueType, and Class.getReferenceType provide a way to distinguish between and access the primary and secondary Class objects.

The getClass method of a primitive object returns the primary Class object.

Value and reference types can be used in Java class literals to refer to the two Class instances:

Primitive class kind | Primary object (reference type) | Secondary object (value type)
Standard             | Foo.ref.class                   | Foo.class
Reference-favoring   | Bar.class                       | Bar.val.class

The method Class.getDeclaredConstructors, and related methods, search for factory methods rather than instance initialization methods when invoked on a primitive class.
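The preview reflection methods (isPrimitiveClass, isValueType, and so on) are not available in mainline JDKs, but the existing int.class and Integer.class pair shows a similar two-mirror arrangement, which JEP 402 proposes to fold into primitive classes. In this loose analogy, int.class plays the secondary (value) Class object and Integer.class the primary (reference) one, and getClass on a boxed value reports the reference mirror, much as getClass on a primitive object reports the primary object. MirrorAnalogy is an illustrative name.

```java
final class MirrorAnalogy {
    // Two distinct Class objects describe the int/Integer pair.
    static boolean distinctMirrors() {
        return int.class != Integer.class;
    }

    // Only the value mirror reports isPrimitive(), loosely as isValueType
    // would distinguish the two mirrors of a primitive class.
    static boolean onlyValueMirrorIsPrimitive() {
        return int.class.isPrimitive() && !Integer.class.isPrimitive();
    }

    // Integer.TYPE exposes the value mirror from the reference class.
    static boolean typeFieldIsValueMirror() {
        return Integer.TYPE == int.class;
    }

    // getClass on a boxed value reports the reference mirror.
    static boolean getClassReportsReferenceMirror() {
        Object boxed = 3;
        return boxed.getClass() == Integer.class;
    }
}
```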

Other APIs

The following APIs also gain new behaviors:

Performance model

Because primitive objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collection performance.

In typical usage, developers can expect the following:

Note, however, that no particular optimizations are guaranteed. For a certain primitive class with a large number of fields, for example, a JVM may encode primitive values as references to heap-allocated objects.

Alternatives

JVMs have long performed escape analysis to identify objects that provably do not rely on identity and can be flattened. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization.

Hand-coded optimizations are possible to improve performance, but as noted in the Motivation section, these techniques require giving up valuable abstractions.

We investigated many different approaches to boxing and polymorphism before settling on a model in which primitive class instances are first-class objects, with a few behavioral changes, and reference and value types provide two different views of the same objects. Approaches that impose identity onto boxed primitive objects have nondeterministic behavior. Approaches that intern boxed primitive objects to a canonical memory location are too heavyweight. Approaches that distinguish between an identity class and interface graph (rooted at Object) and a primitive class and interface graph (rooted at some new class or interface) prevent interoperability with the entire body of existing Java code.

The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike primitive objects, the values of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

Risks and Assumptions

The feature makes significant changes to the Java object model—in particular, what it means to be an Object. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. It will be important to validate that such disruptions are rare and tractable.

Some changes could potentially affect the performance of identity objects. The if_acmpeq and aaload instructions, for example, typically only cost one instruction cycle, but will now need an additional check to detect primitive objects. The identity class case should be optimized as the fast path, and we will need to minimize any performance regressions.

There is a security risk that == can indirectly expose private field values to anyone who can create class instances. There are also security risks involved in allowing instance creation outside of constructors, via default instances and non-atomic reads. Developers will need to understand the tools we offer to mitigate these risks, and when it would be unsafe to declare a class primitive.

This JEP does not address the interaction of primitive classes with the basic primitives or generics; these features will be addressed by other JEPs (see below). But, ultimately, all three JEPs will need to be completed to deliver a cohesive language design.

Dependencies

Prerequisites

In support of this JEP, we are working on separate efforts to improve the JVM Specification (in particular its treatment of class file validation) and the Java Language Specification (in particular its treatment of types). These changes address technical debt and facilitate the specification of these new features.

In anticipation of this feature, we have already added warnings to javac and HotSpot, via JEP 390, about code that could be affected by incompatible changes to primitive class candidates.

In JEP 402 we propose to update the basic primitive types (int, boolean, etc.) to be represented by primitive classes, allowing basic primitive values to become primitive objects. The existing wrapper classes will be repurposed to represent the corresponding types' primitive classes.

In another JEP we will propose modifying the generics model in Java to make type parameters universal—instantiable by all types, both reference and value.

Future work

JVM class and method specialization (JEP 218) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by primitive value types.

Many existing language features and APIs could be enhanced with primitive classes, and many more new features will be enabled.