JEP draft: Primitive Objects (Preview)

Owner: Dan Smith
Type: Feature
Scope: SE
Status: Draft
Effort: XL
Duration: XL
Reviewed by: Brian Goetz
Created: 2020/08/13 19:31
Updated: 2021/03/04 02:02
Issue: 8251554

Summary

Enhance the Java object model with user-declared primitive objects, class instances that lack object identity and can be stored and passed directly, without object headers or indirections. This is a preview language and VM feature.

Goals

This JEP proposes substantial changes to the Java Programming Language and Java Virtual Machine, including:

Non-Goals

This JEP is concerned with the core treatment of user-declared primitive classes and primitive types. Additional features to improve integration with the Java Programming Language are not covered here, but are expected to be developed in parallel. Specifically:

An important followup effort, not covered by these JEPs, will enhance the JVM to specialize generic classes and bytecode for different primitive value type layouts.

Other followup efforts may enhance existing APIs to take advantage of primitive objects, or introduce new language features and APIs built on top of primitive objects.

Motivation

Java programmers work with two kinds of values: basic primitives (numeric and boolean values) and references to objects.

Primitives offer better performance, because they are typically stored directly (without headers or pointers) in variables, on the computation stack, and, ultimately, in CPU registers. Hence, memory reads don't have additional indirections, primitive arrays are stored densely and contiguously in memory, primitive values don't require garbage collection, and primitive operations are performed entirely within the CPU.

Object references offer better abstractions—fields, methods, access control, instance validation, nominal typing, subtype polymorphism, etc. They also come with identity, enabling features like field mutation and locking.

In certain domains, programmers need the kind of performance offered by primitives, but in exchange have to give up the valuable abstractions of object-oriented programming. This can lead to bugs like misinterpreting untyped numbers or mishandling an array of heterogeneous data. (The loss of the Mars Climate Orbiter dramatically illustrates the potential costs of such bugs.)

Ideally, we'd like Java Virtual Machines to run object-oriented code with primitive-like performance. Unfortunately, object identity is a major impediment to such optimizations, even though many objects don't actually need identity. Without identity, JVMs would be free to treat objects in much the same way they treat the basic primitives—stored directly in variables and operated on directly in the CPU.

Concrete examples of objects that don't need identity and would benefit from primitive-like performance include:

We can also expect that new programming patterns and API designs will evolve as it becomes practical for programs to operate on many more objects.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

Primitive objects and classes

A primitive object is a class instance that does not have identity. That is, a primitive object does not have a fixed memory address or any other property to distinguish it from other instances of the same class whose fields store the same values. Primitive objects cannot mutate their fields or be used for synchronization. The == operator on primitive objects is a recursive field value comparison. Classes whose instances are primitive objects are called primitive classes.

An identity object is a class instance or array that does have identity—the traditional behavior of objects in Java. An identity object can mutate its fields and is associated with a synchronization monitor. The == operator on identity objects is an identity comparison. Classes whose instances are identity objects are called identity classes.

Primitive class declarations

A class can be declared primitive with the primitive contextual keyword. Such a class is also implicitly final and must not be abstract.

A class is an identity class if it is neither primitive nor abstract (nor the special class Object).

primitive class Point implements Shape {
    private double x;
    private double y;
    
    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
    
    public double x() { return x; }
    public double y() { return y; }
    
    public Point translate(double dx, double dy) {
        return new Point(x+dx, y+dy);
    }
    
    public boolean contains(Point p) {
        return equals(p);
    }
}

interface Shape {
    boolean contains(Point p);
}

A primitive class declaration is subject to the following restrictions:

In most other ways, a primitive class declaration is just like an identity class declaration. It can have superinterfaces, type parameters, enclosing instances, inner classes, overloaded constructors, static members, and the full range of access restrictions on its members.

Working with primitive objects

Primitive objects are created with normal class instance creation expressions.

Point p1 = new Point(1.0, -0.5);

Instance fields and methods of primitive classes are accessed as usual.

Point p2 = p1.translate(p1.y(), 0.0);

Primitive classes can inherit methods from superclasses and superinterfaces, or they can override them. Instances can be assigned to superclass and superinterface types.

System.out.println(p2.toString());
Shape s = p2;
assert !s.contains(p1);

The == operator compares primitive objects in terms of their field values, not object identity. Fields with basic primitive types are compared by their bit patterns. Other fields are recursively compared with ==.

assert new Point(1.0, -0.5) == p1;
assert p1.translate(0.0, 0.0) == p1;
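
As a rough illustration of comparison by bit pattern, the following sketch emulates the test in today's Java with an ordinary final class. The names BitwisePoint, sameBits, and substitutable are hypothetical, and the use of Double.doubleToRawLongBits is an assumption about what "bit patterns" means here; the real == operator on primitive objects would be implemented by the JVM, not by library code.

```java
// Hypothetical emulation in standard Java; this is not the preview feature itself.
final class BitwisePoint {
    final double x;
    final double y;

    BitwisePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Compare two doubles by raw bit pattern rather than by numeric value.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    // Field-by-field comparison, standing in for == on primitive objects.
    static boolean substitutable(BitwisePoint a, BitwisePoint b) {
        return sameBits(a.x, b.x) && sameBits(a.y, b.y);
    }
}
```

Under this reading, two points whose x field holds Double.NaN compare equal even though NaN == NaN is false for doubles, while 0.0 and -0.0 are distinguished even though 0.0 == -0.0 is true.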

The equals, hashCode, and toString methods, if inherited from Object, along with System.identityHashCode, behave consistently with this definition of equality.

Point p3 = p1.translate(0.0, 0.0);
assert p1.equals(p3);
assert p1.hashCode() == p3.hashCode();
assert System.identityHashCode(p1) == System.identityHashCode(p3);
assert p1.toString().equals(p3.toString());

Attempting to synchronize on a primitive object results in an exception.

Object obj = p1;
try { synchronized (obj) { assert false; } }
catch (RuntimeException e) { /* expected exception */ }

The PrimitiveObject and IdentityObject interfaces

There are two new interfaces introduced as essential preview APIs:

All primitive classes implicitly implement PrimitiveObject. All identity classes (including all preexisting concrete classes in the Java ecosystem) implicitly implement IdentityObject. Array types are also subtypes of IdentityObject.

These interfaces help to distinguish between identity objects and primitive objects in three ways:

An interface can explicitly extend either IdentityObject or PrimitiveObject if the author determines that all implementing objects are expected to have or not have identity. It is an error if a class ends up implementing both interfaces (implicitly, explicitly, or by inheritance). By default, an interface extends neither interface and can be implemented by both kinds of concrete classes.

An abstract class can similarly be declared to implement either IdentityObject or PrimitiveObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject. Otherwise, it extends neither interface and can be extended by both kinds of concrete classes.

The class Object behaves like a simple abstract class: it implements neither IdentityObject nor PrimitiveObject. Calls to new Object() are re-interpreted as instance creation of a new, empty identity subclass of Object (name TBD).

Primitive values and references

Primitive objects can be stored in variables and operated on directly (without headers or pointers) as primitive values. The types of these values are called primitive value types.

Primitive objects can also be stored and operated on as references to objects. The types of these references are called primitive reference types.

Thus, there are two distinct types—a value type and a reference type—associated with each primitive class. Instances of the class are handled directly or by reference, depending on which type is used.

Primitive value types

The name of a primitive class denotes that class's primitive value type. Unlike a traditional class type, the values of a primitive value type are not references to objects, but the objects themselves. This has two important consequences:

Primitive value types are monomorphic—all of the values of a type are instances of the same class and have the same layout.

Primitive class instance creation expressions have primitive value types. So does the expression this when it appears in the body of a primitive class.

Primitive value types allow field and method accesses, as illustrated above. They also support the == and != operators when comparing two values of the same type.

An expression of a primitive value type cannot be used as the operand of a synchronized statement.

The basic primitive types int, boolean, double, etc. are not affected by this JEP, but may be considered another kind of primitive value type.

Reference types

As usual, a variable of a reference type holds either a reference to an object or null. But, in general, the referenced object may now be either an identity object or a primitive object.

The primitive reference type of a primitive class is spelled with the class name followed by .ref. The values of a primitive reference type are references to instances of the named class, or null. A primitive reference type is a subtype of all of the named class's declared supertypes.

Point pi; // stores a Point object
Point.ref pr; // stores a reference to a Point
Shape s; // stores a reference to a Shape, which may be a Point

A primitive class's reference type has the same members as the class's value type, and supports all the usual operations of reference types, except that it is (probably) an error for an expression of the reference type to be used as the operand of a synchronized statement.

References to primitive objects are created by primitive reference conversions from primitive values. Like boxing conversions, primitive reference conversions are implicit in the Java language. But primitive reference conversions can be much lighter-weight, because they don't introduce a new identity.

Point p1 = new Point(3.0, -2.1);
Point.ref[] prs = new Point.ref[1];
prs[0] = p1; // convert Point to Point.ref

A primitive value conversion, like an unboxing conversion, converts from a reference to a primitive value, throwing an exception if the reference is null.

Point p2 = prs[0]; // Convert Point.ref to Point
prs[0] = null;
p2 = prs[0]; // NullPointerException
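
The analogy with unboxing can be observed in today's Java. The sketch below uses Integer and int in place of Point.ref and Point; the class and method names are hypothetical, and the behavior shown is that of existing unboxing conversion, which primitive value conversion is described as resembling.

```java
// Analogy only: Integer/int stand in for Point.ref/Point.
final class UnboxingAnalogy {
    // Returns true if unboxing the element at index i throws NullPointerException.
    static boolean unboxingThrows(Integer[] boxes, int i) {
        try {
            int value = boxes[i]; // unboxing conversion: Integer to int
            return false;         // reached only if the reference was non-null
        } catch (NullPointerException e) {
            return true;          // the reference was null
        }
    }
}
```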

A method invocation may perform an implicit primitive reference or value conversion to ensure the type of the receiver matches the expected type of this in the method declaration.

p1.toString(); // Convert Point to Object
Shape s = p1;
s.contains(p1); // Convert Shape to Point

In many programs, it's unnecessary to operate on primitive references—primitive value types provide all needed functionality. Primitive references are useful in the following situations:

Java's generics are designed to only work with reference types, but a separate JEP will enhance generics to interoperate with primitive value types.

Overload resolution and type argument inference

Primitive reference conversion and primitive value conversion are allowed in loose, but not strict, invocation contexts. This follows the pattern of boxing and unboxing: a method overload that is applicable without applying the conversions takes priority over one that requires them.

void m(Point p, int i) { ... }
void m(Point.ref pr, Integer i) { ... }

void test(Point.ref pr, Integer i) {
    m(pr, i); // prefers the second declaration
    m(pr, 0); // ambiguous
}
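
The same pattern can be reproduced today with boxing. In the sketch below, int and Integer stand in for Point and Point.ref; the class and method names are hypothetical. The ambiguous call is left as a comment because it does not compile.

```java
// Analogy only: int/Integer stand in for Point/Point.ref.
final class OverloadAnalogy {
    static String m(int p, int i) { return "primitive"; }
    static String m(Integer pr, Integer i) { return "boxed"; }

    static String callWithBoxes(Integer pr, Integer i) {
        return m(pr, i); // applicable by strict invocation: m(Integer, Integer)
    }

    static String callWithInts(int p, int i) {
        return m(p, i); // applicable by strict invocation: m(int, int)
    }

    // m(pr, 0) with an Integer pr would be rejected as ambiguous: neither
    // overload is applicable by strict invocation, both are applicable by
    // loose invocation, and neither is more specific than the other.
}
```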

Type argument inference also treats primitive reference and value conversions the same as boxing and unboxing. A primitive value passed where an inferred type is expected will lead to a reference-typed inference constraint.

var list = List.of(new Point(1.0, 5.0));
// infers List<Point.ref>

(This inference behavior will change when, in a separate JEP, type arguments are allowed to be inferred as primitive value types.)

Array subtyping

Arrays of primitive class instances are covariant: the type Point[] is a subtype of both Point.ref[] and Object[].

When a reference is stored in an array of static type Object[], if the array's runtime component type is Point, the operation will perform both an array store check (checking that the reference is to an instance of class Point) and a primitive value conversion (converting the reference to a primitive value).

Similarly, reading from an array of static type Object[] will cause a primitive reference conversion if the array stores primitive values.

Object replace(Object[] objs, int i, Object val) {
    Object result = objs[i]; // may perform reference conversion
    objs[i] = val; // may perform value conversion
    return result;
}

Point[] ps = new Point[]{ new Point(3.0, -2.1) };
replace(ps, 0, new Point(-2.1, 3.0));
replace(ps, 0, null); // NullPointerException from value conversion
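
The array store check half of this behavior already exists for covariant reference arrays. The sketch below shows it in today's Java with an Integer[] viewed as Object[]; the class and method names are hypothetical. The value conversion half cannot be shown in current Java: storing null into an Integer[] succeeds, whereas the text above describes a NullPointerException when the runtime component type is a primitive value type.

```java
// Analogy only: shows the existing array store check on a covariant array.
final class ArrayStoreAnalogy {
    // Stores val into objs[i] and reports whether the runtime check rejected it.
    static String tryStore(Object[] objs, int i, Object val) {
        try {
            objs[i] = val; // checked against the array's runtime component type
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }
}
```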

Reference-favoring primitive classes and migration

Some classes could be declared primitive (they're immutable and do not need identity), but their authors expect many clients to want a "normal" reference class type, in particular without having to adjust for the lack of null. The main use case is a class that was originally declared as an identity class and that its author would like to refactor compatibly into a primitive class. (Many classes in the standard libraries are designated value-based classes in anticipation of such a migration.)

In these cases, a class can be declared primitive, but with a special name:

primitive class Time.val {
    ...
}

Here, the type spelled with the syntax Time.val is a primitive value type, while the corresponding primitive reference type is just spelled Time.

Time[] trefs = new Time[]{ new Time(...) };
Time.val t = trefs[0]; // primitive value conversion

Other than the interpretation of the class's name when used as a type, a reference-favoring primitive class is just like any other primitive class.

Authors who intend to migrate an existing identity class to be primitive should keep in mind that, even when the refactored class is reference-favoring, clients will be able to observe some differences:

Default values of primitive value types

Every type has a default value, used to populate newly-allocated fields and array components of that type. The default value of every reference type is null, and the default value of each basic primitive type is 0 or false. The default value of a primitive class's value type is that class's default instance, which is the instance produced by setting each of the class's fields to its default value.

The expression Point.default refers to the default instance of primitive class Point.

assert new Point(0.0, 0.0) == Point.default;
Point[] ps = new Point[100];
assert ps[33] == Point.default;
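
For comparison, the existing default values mentioned above can be observed in today's Java as follows. The class and method names are hypothetical; the sketch covers only basic primitive and reference types, since Point.default is not available outside the preview feature.

```java
// Default values of array components in current Java.
final class DefaultValuesToday {
    static double numericDefault() {
        double[] ds = new double[100];
        return ds[33]; // never assigned, so it holds the default value
    }

    static boolean booleanDefault() {
        boolean[] bs = new boolean[100];
        return bs[33];
    }

    static Object referenceDefault() {
        String[] ss = new String[100];
        return ss[33];
    }
}
```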

Note that the default instance of a primitive class is created without invoking any constructors or instance initializers, and is available to anyone with access to the class. Primitive classes are not able to define a default instance that sets fields to something other than their default values.

Enforcing instance validation

Primitive classes have constructors; as usual, a constructor is responsible for initializing the class's fields, and can ensure the fields' values are valid.

By default, there are a few "back doors" that allow the creation of an instance without invoking a constructor for that instance or validating its field values. These include:

(Tentative feature): if it is important for correctness, a primitive class may declare that instances must be validated through a constructor call. In this case, the compiler and JVM will ensure that backdoor instance creation is either prevented or detected before any instance methods of the class are executed.

This is specified with a class modifier:

[KEYWORD TBD] primitive class DatabaseConnection {
    private Database db;
    private String user;
    
    public DatabaseConnection(Database db, String user) {
        // validation code...
        this.db = db;
        this.user = user;
    }
    
    ...
}

Compilation and run time

Primitive classes are compiled to class files, with special treatment deeply integrated into the Java Virtual Machine.

class file representation & interpretation

A primitive class is declared in a class file using the ACC_PRIMITIVE modifier (0x0100). (Encoding of modifiers indicating a reference-favoring or validation-enforcing class is TBD.) At class load time, the class is considered to implement the interface PrimitiveObject; an error occurs if a primitive class is not final, has a non-final instance field, or implements—directly or indirectly—IdentityObject. At preparation time, an error occurs if a primitive class has a circularity in its instance field types.

An abstract class that allows primitive subclasses declares this capability in its class file (details TBD). At class load time, an error occurs if the class is not abstract, declares an instance field, declares a synchronized method, or implements—directly or indirectly—IdentityObject.

At class load time, a class (not an interface) is considered to implement the interface IdentityObject if it is not primitive and does not explicitly allow primitive subclasses. Every array type is also considered to implement IdentityObject. It is a load time error if any class or interface implements or extends—directly or indirectly—both PrimitiveObject and IdentityObject.

Primitive value types are represented in descriptors using a Q prefix rather than the usual L prefix (QPoint;). The corresponding L type (LPoint;) is not intended to be used (details on restrictions TBD).
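
To make the descriptor spelling concrete, the sketch below prints an existing L descriptor using java.lang.invoke.MethodType and builds the corresponding Q spelling by hand. The class DescriptorSketch and the helper qDescriptor are hypothetical; no current API produces Q descriptors, and as noted these encodings may evolve.

```java
// Sketch: an L descriptor from an existing API, and a hand-built Q spelling.
final class DescriptorSketch {
    // Descriptor of a method that takes a String and returns void.
    static String todayDescriptor() {
        return java.lang.invoke.MethodType
                .methodType(void.class, String.class)
                .toMethodDescriptorString();
    }

    // Hypothetical helper: spells a Q descriptor for a class's binary name.
    static String qDescriptor(String binaryName) {
        return "Q" + binaryName.replace('.', '/') + ";";
    }
}
```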

Verification treats a Q type as a subtype of the named class's superclass type—e.g., QPoint; is a subtype of Ljava/lang/Object;.

Classes mentioned in Q descriptors of fields and methods are loaded during preparation (or perhaps at a later point, but before the first access of that field or method).

A CONSTANT_Class constant pool entry may also describe a primitive value type by using a Q descriptor as a "class name".

(These details about type encodings may evolve.)

The this parameter of a primitive class's instance method has a primitive value verification type.

No particular encoding of primitive objects is mandated. Implementations are free to use different encodings in different contexts, as long as the values of the objects' fields are preserved. References to primitive objects, on the other hand, should be encoded in a way that is compatible with traditional object references.

Two new opcodes, defaultvalue and withfield, facilitate instance creation.

It is a linkage error to use the opcode new with a primitive class. Instance initialization methods can be declared in a primitive class, but verification prevents their invocation.

A new kind of special method, a factory method, can be declared to return instances of the class. Factory methods are named <new> (or, alternatively, <init> with a non-void return) and are static. They are invoked with invokestatic.

The anewarray and multianewarray instructions can be used to create arrays of primitive value types.

Each of the defaultvalue, anewarray, and multianewarray instructions, when used with a primitive value type, can trigger initialization of the named primitive class. During initialization, a field of a primitive class with a (different) primitive value type can recursively trigger initialization of that named primitive class.

The checkcast, instanceof, and aastore opcodes support primitive value types, performing primitive value conversions (including null checks) when necessary.

The if_acmpeq and if_acmpne operations implement the == test for primitive objects, as described above. The monitorenter instruction throws an exception if applied to a primitive object.

Java language compilation

javac encodes primitive reference types like Point.ref as synthetic abstract superclasses of their primitive class, with names like Point$ref. Casts are inserted as needed to access the members of the concrete subclass.

In the case of a reference-favoring primitive class, the simple class name (Time) is given to the abstract superclass, and the concrete subclass has a name like Time$val.

Primitive reference conversions are implicit: QPoint; is a subtype of LPoint$ref;. Primitive value conversions are achieved with checkcast.

Constructors of primitive classes compile to factory methods, not instance initialization methods. In the constructor body, the compiler treats this as a mutable local variable, initialized by defaultvalue, modified by withfield, and ultimately returned as the method result.
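
The shape of such a compiled constructor can be approximated in ordinary Java. In the sketch below, the class PointSketch, the constant DEFAULT, the methods withX and withY, and the factory make are all hypothetical stand-ins: DEFAULT plays the role of defaultvalue, the with methods play the role of withfield, and make plays the role of the factory method. It illustrates the translation scheme only and says nothing about what javac actually emits.

```java
// Hypothetical emulation of a constructor compiled as a factory method.
final class PointSketch {
    final double x;
    final double y;

    private PointSketch(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Stands in for defaultvalue: every field holds its default value.
    static final PointSketch DEFAULT = new PointSketch(0.0, 0.0);

    // Stand in for withfield: produce a copy with one field replaced.
    PointSketch withX(double newX) { return new PointSketch(newX, y); }
    PointSketch withY(double newY) { return new PointSketch(x, newY); }

    // Stands in for the factory compiled from "Point(double x, double y)".
    static PointSketch make(double x, double y) {
        PointSketch self = DEFAULT; // "this" starts as the default instance
        self = self.withX(x);       // this.x = x
        self = self.withY(y);       // this.y = y
        return self;                // the finished value is the method result
    }
}
```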

Core reflection

The getClass method of a primitive object returns a java.lang.Class object representing the object's primitive class. This Class object also represents the class's primitive value type.

(Tentatively:) In the case of a reference-favoring primitive class, the class object's name is given as ClassName.val. In the Java language, ClassName.val.class is a supported class literal.

A new reflective preview API method, Class.isPrimitiveClass, indicates whether a class is a primitive class. (This is distinct from isPrimitive, which indicates whether a Class represents a basic primitive type.)
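
For reference, the existing isPrimitive method behaves as follows in today's Java. The class and method names in this sketch are hypothetical, and Class.isPrimitiveClass is deliberately not called, since it exists only as a proposed preview API.

```java
// Existing behavior of Class.isPrimitive in current Java.
final class IsPrimitiveToday {
    static boolean basicPrimitive() {
        return int.class.isPrimitive();     // int is a basic primitive type
    }

    static boolean wrapperClass() {
        return Integer.class.isPrimitive(); // Integer is an ordinary class today
    }
}
```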

The method Class.getDeclaredConstructors, and related methods, searches for factory methods rather than instance initialization methods when invoked on a primitive class.

(Tentatively:) The result of getSuperclass on a primitive class is a synthetic abstract class used to model the primitive class's reference type. The class object's name is given as ClassName.ref. In the Java language, ClassName.ref.class is a supported class literal.

(Tentatively:) Methods Class.valueType and Class.referenceType provide a convenient way to map from a reference type class object to a value type class object, and vice versa.

Performance model

Because primitive objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collection performance.

In typical usage, programmers can expect the following:

Note, however, that no particular optimizations are guaranteed. For a certain primitive class with a large number of fields, for example, a JVM may encode primitive values as references to heap-allocated objects.

Alternatives

JVMs have long performed escape analysis to identify objects that provably do not rely on identity and can be "flattened". These optimizations are somewhat unpredictable, and do not help with objects that "escape" the scope of the optimization.

Hand-coded optimizations are possible to improve performance, but as noted in the "Motivation" section, these techniques give up valuable abstractions.

We investigated many different approaches to "boxing" and polymorphism before settling on a model in which primitive class instances are first-class objects (with a few behavioral changes), and reference and value types provide two different views of the same objects. Approaches that impose identity onto boxed primitive objects have nondeterministic behavior. Approaches that "intern" boxed primitive objects to a canonical memory location are too heavyweight. Approaches that distinguish between an identity class and interface graph (rooted at Object) and a primitive class and interface graph (rooted at some new class or interface) prevent interoperability with the entire body of existing Java code.

The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike primitive objects, the values of these abstractions have identity, meaning they support operations like field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

Risks and Assumptions

The feature makes significant changes to the Java object model—in particular, what it means to be an Object. Programmers may be surprised by, or encounter bugs due to, changes in the behavior of operations like == and synchronized. It will be important to validate that such disruptions are rare and tractable.

Some changes could potentially affect the performance of identity objects. The if_acmpeq and aaload instructions, for example, typically only cost one instruction cycle, but will now need an additional check to detect primitive objects. The identity class case should be optimized as the "fast path", and we'll need to minimize any performance regressions.

There is a security risk that == can indirectly expose private field values to anyone who can create class instances. There are also security risks involved in allowing instance creation outside of constructors, via default instances and non-atomic reads. Programmers will need to understand the tools we offer to mitigate these risks, and when it would be unsafe to declare a class primitive.

This JEP does not address the interaction of primitive classes with the basic primitives or generics; these features will be addressed by other JEPs (see "related features", below). But, ultimately, all three JEPs will need to be completed to deliver a cohesive language design.

Dependencies

Prerequisites

Separate efforts to improve the JVM Specification (in particular its treatment of class file validation) and the Java Language Specification (in particular its treatment of types) are expected to occur before this JEP, addressing technical debt and facilitating specification of the new features.

Warnings about potential incompatible changes to primitive class candidates have been added to javac and HotSpot by JEP 390, in anticipation of this feature.

Related features

In a separate JEP, we anticipate updating the basic primitive types (int, boolean, etc.) to be represented by primitive classes, allowing basic primitive values to become primitive objects. The existing wrapper classes will be repurposed to represent the corresponding types' primitive classes.

In another JEP, we anticipate modifying the generics model in Java to make type parameters universal—instantiable by all types, both reference and value.

Future work

JVM class and method specialization (JEP 218) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by primitive value types.

Many existing language features and APIs could be enhanced with primitive classes, and many more new features will be enabled.