JEP 286: Local-Variable Type Inference

AuthorBrian Goetz
OwnerDan Smith
Created2016/03/08 15:37
Updated2018/03/23 01:48
TypeFeature
StatusClosed / Delivered
Componentspecification / language
ScopeSE
Discussionamber dash dev at openjdk dot java dot net
EffortM
DurationS
Priority3
Reviewed byAlex Buckley, Mark Reinhold
Endorsed byMark Reinhold
Release10
Issue8151454
Relates toJEP 301: Enhanced Enums
JEP 323: Local-Variable Syntax for Lambda Parameters

Summary

Enhance the Java Language to extend type inference to declarations of local variables with initializers.

Goals

We seek to improve the developer experience by reducing the ceremony associated with writing Java code, while maintaining Java's commitment to static type safety, by allowing developers to elide the often-unnecessary manifest declaration of local variable types. This feature would allow, for example, declarations such as:

var list = new ArrayList<String>();  // infers ArrayList<String>
var stream = list.stream();          // infers Stream<String>

This treatment would be restricted to local variables with initializers, indexes in the enhanced for-loop, and locals declared in a traditional for-loop; it would not be available for method formals, constructor formals, method return types, fields, catch formals, or any other kind of variable declaration.

Success Criteria

Quantitatively, we want that a substantial percentage of local variable declarations in real codebases can be converted using this feature, inferring an appropriate type.

Qualitatively, we want that the limitations of local variable type inference, and the motivations for these limitations, be accessible to a typical user. (This is, of course, impossible to achieve in general; not only will we not be able to infer reasonable types for all local variables, but some users imagine type inference to be a form of mind reading, rather than an algorithm for constraint solving, in which case no explanation will seem sensible.) But we seek to draw the lines in such a way that it can be made clear why a particular construct is over the line -- and in such a way that compiler diagnostics can effectively connect it to complexity in the user's code, rather than an arbitrary restriction in the language.

Motivation

Developers frequently complain about the degree of boilerplate coding required in Java. Manifest type declarations for locals are often perceived to be unnecessary or even in the way; given good variable naming, it is often perfectly clear what is going on.

The need to provide a manifest type for every variable also accidentally encourages developers to use overly complex expressions; with a lower-ceremony declaration syntax, there is less disincentive to break complex chained or nested expressions into simpler ones.

Nearly all other popular statically typed "curly-brace" languages, both on the JVM and off, already support some form of local-variable type inference: C++ (auto), C# (var), Scala (var/val), Go (declaration with :=). Java is nearly the only popular statically typed language that has not embraced local-variable type inference; at this point, this should no longer be a controversial feature.

The scope of type inference was significantly broadened in Java SE 8, including expanded inference for nested and chained generic method calls, and inference for lambda formals. This made it far easier to build APIs designed for call chaining, and such APIs (such as Streams) have been quite popular, showing that developers are already comfortable having intermediate types inferred. In a call chain like:

int maxWeight = blocks.stream()
                      .filter(b -> b.getColor() == BLUE)
                      .mapToInt(Block::getWeight)
                      .max();

no one is bothered (or even notices) that the intermediate types Stream<Block> and IntStream, as well as the type of the lambda formal b, do not appear explicitly in the source code.

Local variable type inference allows a similar effect in less tightly structured APIs; many uses of local variables are essentially chains, and benefit equally from inference, such as:

var path = Paths.get(fileName);
var bytes = Files.readAllBytes(path);

Description

For local variable declarations with initializers, enhanced for-loop indexes, and index variables declared in traditional for loops, allow the reserved type name var to be accepted in place of manifest types:

var list = new ArrayList<String>(); // infers ArrayList<String>
var stream = list.stream();         // infers Stream<String>

The identifier var is not a keyword; instead it is a reserved type name. This means that code that uses var as a variable, method, or package name will not be affected; code that uses var as a class or interface name will be affected (but these names are rare in practice, since they violate usual naming conventions).

Forms of local variable declarations that lack initializers, declare multiple variables, have extra array dimension brackets, or reference the variable being initialized are not allowed. Rejecting locals without initializers narrows the scope of the feature, avoiding "action at a distance" inference errors, and only excludes a small portion of locals in typical programs.

The inference process, substantially, just gives the variable the type of its initializer expression. Some subtleties:

Applicability and impact

Scanning the OpenJDK code base for local variable declarations, we found that 13% cannot be written using var, since there is no initializer, the initializer has the null type, or (rarely) the initializer requires a target type. Among the remaining local variable declarations:

Alternatives

We could continue to require manifest declaration of local variable types.

Rather than supporting var, we could limit support to uses of diamond in variable declarations; this would address a subset of the cases addressed by var.

The design described above incorporates several decisions about scope, syntax, and non-denoteable types; alternatives for those choices which were also considered are documented here.

Scope Choices

There are several other ways we could have scoped this feature. We considered restricting the feature to effectively final locals (val). However, we backed off from this position because:

On the other hand, we could have expanded this feature to include the local equivalent of "blank" finals (i.e., not requiring an initializer, instead relying on definite assignment analysis.) We chose the restriction to "variables with initializers only" because it covers a significant fraction of the candidates while maintaining the simplicity of the feature and reducing "action at a distance" errors.

Similarly, we also could have taken all assignments into account when inferring the type, rather than just the initializer; while this would have further increased the percentage of locals that could exploit this feature, it would also increase the risk of "action at a distance" errors.

Syntax Choices

There was a diversity of opinions on syntax. The two main degrees of freedom here are what keywords to use (var, auto, etc), and whether to have a separate new form for immutable locals (val, let). We considered the following syntactic options:

After gathering substantial input, var was clearly preferred over the Groovy, C++, or Go approaches. There was a substantial diversity of opinion over a second syntactic form for immutable locals (val, let); this would be a tradeoff of additional ceremony for additional capture of design intent. In the end, we chose to support only var. Some details on the rationale can be found here.

Non-denotable types

Sometimes the type of the initializer is a non-denotable type, such as a capture variable type, intersection type, or anonymous class type. In such cases, we have a choice of whether to i) infer the type, ii) reject the expression, or iii) infer a denotable supertype.

Compilers (and attentive programmers!) must already be comfortable reasoning about non-denoteable types. However, their use as the types of local variables will significantly increase their exposure, revealing compiler/specification bugs and forcing programmers to confront them more frequently. Pedagogically, it's nice to have a simple syntactic transformation between explicitly-typed and implicitly-typed declarations.

That said, it's unhelpfully pedantic to simply reject initializers with non-denotable types (often to the surprise of the programmer, such as in the declaration var c = getClass()). And a mapping to a supertype can be unexpected and lossy.

These considerations led us to different answers:

Risks and Assumptions

Risk: Because Java already does significant type inference on the RHS (lambda formals, generic method type arguments, diamond), there is a risk that attempting to use var on the LHS of such an expression will fail, and possibly with difficult-to-read error messages.

We've mitigated this by using simplified error messages when the LHS is inferred.

Examples:

Main.java:81: error: cannot infer type for local
variable x
        var x;
            ^
  (cannot use 'val' on variable without initializer)

Main.java:82: error: cannot infer type for local
variable f
        var f = () -> { };
            ^
  (lambda expression needs an explicit target-type) 

Main.java:83: error: cannot infer type for local
variable g
        var g = null;
            ^
  (variable initializer is 'null')

Main.java:84: error: cannot infer type for local
variable c
        var c = l();
            ^
  (inferred type is non denotable)

Main.java:195: error: cannot infer type for local variable m
        var m = this::l;
            ^
  (method reference needs an explicit target-type)

Main.java:199: error: cannot infer type for local variable k
        var k = { 1 , 2 };
            ^
  (array initializer needs an explicit target-type)

Risk: source incompatibilities (someone may have used var as a type name.)

Mitigated with reserved type names; names like var do not conform to the naming conventions for types, and therefore are unlikely to be used as types. The name var is commonly used as an identifier; we continue to allow this.

Risk: reduced readability, surprises when refactoring.

Like any other language feature, local variable type inference can be used to write both clear and unclear code; ultimately the responsibility for writing clear code lies with the user. See the style guidelines for using var.