JEP draft: Optimized invocation of String::format and Objects::hash

Author: Brian Goetz
Owner: Jim Laskey
Type: Feature
Scope: SE
Status: Submitted
Component: tools
Discussion: amber dash dev at openjdk dot java dot net
Reviewed by: Alex Buckley, Brian Goetz, Vicente Arturo Romero Zaldivar
Endorsed by: Alex Buckley
Created: 2018/06/25 21:23
Updated: 2018/11/12 13:36
Issue: 8205637

Summary

Enable Java compilers to use alternate translation strategies, such as invokedynamic, for the invocation of certain JDK methods designated as compiler intrinsic candidates, with the goal of providing substantially improved runtime performance for these methods. Specifically, intrinsify the invocation of String::format and Objects::hash.

Motivation

In most cases, the JVM does an excellent job of optimizing bytecode at run time. However, for certain kinds of methods, the Java compiler's standard translation strategy results in bytecode which is hard to optimize. A prime example is String::format, whose signature is:

public static String format(String formatString, Object... args) { ... }

The bytecode that javac generates for an invocation of String::format is hard to optimize, despite the best efforts of the JVM's JIT compiler. It is common to have primitive arguments; they must be boxed. A varargs array must be created and initialized with all the arguments. The format string will almost always be a constant string, but it is parsed every time by the implementation of String::format. That implementation is, unsurprisingly, too large to inline. As a result, the bytecode is much slower than we'd like.
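To make these costs concrete, the following sketch writes out by hand roughly what the standard translation of a String::format invocation does. The class and method names (FormatDesugaring, viaVarargs, viaExplicitArray) are illustrative and not part of the JDK; the sketch shows the source-level equivalent, not the bytecode javac actually emits.

```java
public class FormatDesugaring {
    // What the programmer writes: a varargs call with a primitive argument.
    public static String viaVarargs(String name, int age) {
        return String.format("%s: %d", name, age);
    }

    // Roughly what the standard translation does, spelled out by hand.
    public static String viaExplicitArray(String name, int age) {
        Object[] args = new Object[2];        // varargs array allocated on every call
        args[0] = name;
        args[1] = Integer.valueOf(age);       // primitive int boxed to an Integer
        return String.format("%s: %d", args); // format string is parsed again on every call
    }

    public static void main(String[] unused) {
        System.out.println(viaVarargs("Ada", 36));
        System.out.println(viaExplicitArray("Ada", 36));
    }
}
```

Both methods produce the same string; the second only makes visible the allocation and boxing that the first hides.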

Methods like String::format and Objects::hash (which has a similar signature) are critically important, as they are concise and reliable ways to implement toString and hashCode. Some developers shy away from using these methods, and instead use more verbose and error-prone mechanisms purely out of performance considerations. By optimizing the invocation of String::format and Objects::hash, the most readable and maintainable way to implement toString and hashCode also becomes the most performant way.

JEP 280 replaced the translation of string concatenation with invokedynamic, resulting in faster bytecode, less allocation churn, and more uniform optimizability. We can apply the same technique to methods like String::format, by compiling invocations of String::format using an alternate translation strategy that is customized for the specific invocation, based on information available at compile time, such as the static types and values of the arguments present in the invocation.

Goals

Enable JDK developers to (i) tag methods as candidates for compile-time intrinsification, and (ii) describe appropriate alternate translations of intrinsification candidates that conform to the specification of the candidate method.

Non-Goals

It is a non-goal to expose the intrinsification mechanism for use outside the JDK libraries.

Description

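There are two separate aspects to enabling compile-time intrinsification: tagging methods as candidates for intrinsification, and describing the alternate translations that a compiler may select for invocations of those candidates.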

The first can be accomplished by creating a Java SE annotation @IntrinsicCandidate which JDK library authors use to tag suitable methods as candidates for intrinsification. A compiler is thereby authorized to select an alternate, but behavior-preserving, translation for invocations of those methods. This specifies only that a compiler may do so, not how a compiler does it. JLS 13.1 would be updated to acknowledge this opt-in.
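As a rough illustration, such an annotation and its use on a library method might look like the sketch below. This draft does not specify the annotation's package, retention, or targets, so everything here is an assumption: the nested declaration, the RUNTIME retention (chosen only so the example can check the tag reflectively), and the helper names hash, plain, and isCandidate are all hypothetical.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class IntrinsicCandidateSketch {
    // Hypothetical declaration; retention and target are assumptions.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface IntrinsicCandidate {}

    // A library author opts a method in by tagging it.
    @IntrinsicCandidate
    public static int hash(Object... values) {
        return Arrays.hashCode(values);
    }

    // An untagged method: a compiler must always use the standard translation.
    public static int plain(Object... values) {
        return Arrays.hashCode(values);
    }

    // Stand-in for the check a compiler would make when it sees an invocation.
    public static boolean isCandidate(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getDeclaredMethod(name, params)
                        .isAnnotationPresent(IntrinsicCandidate.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

The tag itself carries no translation logic; it only records the library author's permission.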

We propose to accomplish the second in the OpenJDK javac implementation by creating a mechanism for declaring and registering intrinsic processors. They will be invoked when the compiler encounters an invocation of an intrinsic candidate, and will instruct the compiler as to whether and how to replace the standard translation with an optimized translation. Such intrinsification is entirely optional; a compiler may choose to not intrinsify at all, or may choose to provide command-line options for enabling or disabling intrinsification.
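The shape of such a mechanism can be sketched in a few lines. This is not javac's API: the IntrinsicProcessor interface, the tryIntrinsify method, the string keys, and the enabled flag (standing in for a command-line option) are all hypothetical names chosen for illustration. The sketch shows only the control flow described above: a processor is consulted per invocation and may either propose a replacement or decline.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class IntrinsicRegistrySketch {
    /** Hypothetical processor: inspects one call site and may propose a replacement. */
    public interface IntrinsicProcessor {
        // constantArgs has one entry per argument; an entry is null when that
        // argument is not a compile-time constant. An empty result means
        // "keep the standard translation".
        Optional<String> tryIntrinsify(Object[] constantArgs);
    }

    private final Map<String, IntrinsicProcessor> processors = new HashMap<>();
    private final boolean enabled; // models a switch that disables intrinsification

    public IntrinsicRegistrySketch(boolean enabled) {
        this.enabled = enabled;
    }

    public void register(String candidate, IntrinsicProcessor processor) {
        processors.put(candidate, processor);
    }

    // Called when the compiler encounters an invocation of a candidate method.
    public Optional<String> onInvocation(String candidate, Object[] constantArgs) {
        if (!enabled) {
            return Optional.empty();
        }
        IntrinsicProcessor processor = processors.get(candidate);
        if (processor == null) {
            return Optional.empty();
        }
        return processor.tryIntrinsify(constantArgs);
    }

    // Example registration: intrinsify only when the format string is constant.
    public static IntrinsicRegistrySketch withFormatProcessor(boolean enabled) {
        IntrinsicRegistrySketch registry = new IntrinsicRegistrySketch(enabled);
        registry.register("String::format", args -> {
            if (args.length > 0 && args[0] instanceof String) {
                return Optional.of("concatenation recipe for \"" + args[0] + "\"");
            }
            return Optional.empty();
        });
        return registry;
    }
}
```

Returning an empty Optional keeps the standard translation, which matches the requirement that intrinsification be entirely optional.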

We intend to intrinsify String::format (and related methods, such as PrintStream::printf) to avoid the boxing overhead, varargs overhead, and repeated analysis of constant format strings. Consider the following invocation of String::format:

String name = ...
int age = ...
String s = String.format("%s: %d", name, age);

This results in boxing age to an Integer, allocating a varargs array, storing name and the boxed age into the varargs array, and then parsing and interpreting the format string -- on every invocation. When the format string is constant, which it almost always is, the compile-time analysis can select an alternate translation, as follows:

String s = name + ": " + Integer.toString(age);

which can be further optimized to an invokedynamic using the mechanics from JEP 280. Note that neither name nor age needs to be a constant variable in order to select the alternate translation; only the format string must be constant.
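The equivalence of the two forms can be checked directly. One caveat worth making explicit: String.format without a Locale argument uses the default format locale, and %d output is locale-sensitive, so this sketch pins Locale.ROOT to keep the comparison deterministic. The class and method names (FormatVsConcat, standard, alternate) are illustrative only.

```java
import java.util.Locale;

public class FormatVsConcat {
    // Standard form, with the locale pinned so %d always yields ASCII digits.
    public static String standard(String name, int age) {
        return String.format(Locale.ROOT, "%s: %d", name, age);
    }

    // Candidate alternate translation: plain concatenation, no boxing or varargs.
    public static String alternate(String name, int age) {
        return name + ": " + Integer.toString(age);
    }

    public static void main(String[] unused) {
        String[] names = {"Ada", "", null};
        int[] ages = {0, 36, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (String name : names) {
            for (int age : ages) {
                if (!standard(name, age).equals(alternate(name, age))) {
                    throw new AssertionError("mismatch for " + name + ", " + age);
                }
            }
        }
        System.out.println("all cases agree");
    }
}
```

A null name is rendered as "null" by both %s and string concatenation, so the two forms agree on that case as well.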

Similarly, invoking Objects::hash has two of the three problems that String::format has: boxing and varargs. The invocation:

int hashCode() { return Objects.hash(name, age); }

will similarly box age and then store name and the boxed age in a varargs array (which many varargs methods will defensively copy). However, we could instead translate it as follows:

int hashCode() { return 31 * (31 + Objects.hashCode(name)) + Integer.hashCode(age); }

which avoids these unnecessary costs.
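Objects::hash is specified to return the same value as Arrays.hashCode applied to its arguments, so a behavior-preserving expansion has to reproduce that computation exactly: an accumulator that starts at 1, is multiplied by 31 before each element's hash is added, and treats a null element as 0. The following check compares Objects.hash with such an expansion for two arguments. The class and method names (HashExpansion, standard, expanded) are illustrative only.

```java
import java.util.Objects;

public class HashExpansion {
    // Standard form: boxes age and allocates a varargs array.
    public static int standard(String name, int age) {
        return Objects.hash(name, age);
    }

    // Expanded form: same arithmetic as Arrays.hashCode over {name, age},
    // with no boxing and no array. Objects.hashCode maps null to 0.
    public static int expanded(String name, int age) {
        int result = 1;
        result = 31 * result + Objects.hashCode(name);
        result = 31 * result + Integer.hashCode(age);
        return result;
    }

    public static void main(String[] unused) {
        String[] names = {"Ada", "", null};
        int[] ages = {0, 36, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (String name : names) {
            for (int age : ages) {
                if (standard(name, age) != expanded(name, age)) {
                    throw new AssertionError("mismatch for " + name + ", " + age);
                }
            }
        }
        System.out.println("all cases agree");
    }
}
```

Integer overflow wraps identically in both forms, so the results agree across the full int range.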

Risks and Assumptions

If not properly implemented, the alternate translation may not be perfectly behaviorally compatible with the specification or original implementation.

Even if properly implemented, an alternate implementation may not properly track changes made to the original implementation in the future.

Even if properly implemented and tracked, maintenance of intrinsic candidate methods and their alternate translations is made more difficult, as changes may need to be made in two places and must be behaviorally identical.