JEP 286: Local-Variable Type Inference

Author: Brian Goetz
Owner: Dan Smith
Created: 2016/03/08 15:37
Updated: 2016/12/06 18:08
Type: Feature
Status: Candidate
Component: specification / language
Scope: SE
Discussion: platform dash jep dash discuss at openjdk dot java dot net
Effort: M
Duration: S
Priority: 3
Reviewed by: Alex Buckley, Mark Reinhold
Endorsed by: Mark Reinhold
Issue: 8151454
Relates to: JEP 301: Enhanced Enums

Summary

Enhance the Java Language to extend type inference to declarations of local variables with initializers.

Goals

We seek to improve the developer experience by reducing the ceremony associated with writing Java code, while maintaining Java's commitment to static type safety, by allowing developers to elide the often-unnecessary manifest declaration of local variable types. This feature would allow, for example, declarations such as:

var list = new ArrayList<String>();  // infers ArrayList<String>
var stream = list.stream();          // infers Stream<String>

This treatment would be restricted to local variables with initializers, indexes in the enhanced for-loop, and locals declared in a traditional for-loop; it would not be available for method formals, constructor formals, method return types, fields, catch formals, or any other kind of variable declaration.
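As a minimal sketch of the three permitted positions, assuming a compiler that implements this proposal (the class and method names are illustrative and do not come from this JEP):

```java
import java.util.ArrayList;
import java.util.List;

public class VarPositions {
    // Sums the lengths of the given words, using var for a local and a for-each index.
    static int totalLength(List<String> words) {
        var total = 0;                        // local with initializer: int
        for (var word : words) {              // enhanced for-loop index: String
            total += word.length();
        }
        return total;
    }

    // Counts from 0 up to (but excluding) n with a traditional for-loop.
    static int countUpTo(int n) {
        var count = 0;
        for (var i = 0; i < n; i++) {         // traditional for-loop local: int
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        var words = new ArrayList<String>();  // ArrayList<String>
        words.add("alpha");
        words.add("be");
        System.out.println(totalLength(words));
        System.out.println(countUpTo(3));
    }
}
```

Method parameters and return types in the sketch keep their manifest types, since those positions are outside the scope of the feature.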

Success Criteria

Quantitatively, we want a substantial percentage of local variable declarations in real codebases to be convertible using this feature, with an appropriate type inferred.

Qualitatively, we want the limitations of local variable type inference, and the motivations for these limitations, to be accessible to a typical user. (This is, of course, impossible to achieve in general; not only will we not be able to infer reasonable types for all local variables, but some users imagine type inference to be a form of mind reading, rather than an algorithm for constraint solving, in which case no explanation will seem sensible.) But we seek to draw the lines in such a way that it can be made clear why a particular construct is over the line, and in such a way that compiler diagnostics can effectively connect it to complexity in the user's code, rather than an arbitrary restriction in the language.

Motivation

Developers frequently complain about the degree of boilerplate coding required in Java. Manifest type declarations for locals are often perceived to be unnecessary or even in the way; given good variable naming, it is often perfectly clear what is going on.

The need to provide a manifest type for every variable also accidentally encourages developers to use overly complex expressions; with a lower-ceremony declaration syntax, there is less disincentive to break complex chained or nested expressions into simpler ones.

Nearly all other popular statically typed "curly-brace" languages, both on the JVM and off, already support some form of local-variable type inference: C++ (auto), C# (var), Scala (var/val), Go (declaration with :=). Java is nearly the only popular statically typed language that has not embraced local-variable type inference; at this point, this should no longer be a controversial feature.

The scope of type inference was significantly broadened in Java SE 8, including expanded inference for nested and chained generic method calls, and inference for lambda formals. This made it far easier to build APIs designed for call chaining, and such APIs (such as Streams) have been quite popular, showing that developers are already comfortable having intermediate types inferred. In a call chain like:

OptionalInt maxWeight = blocks.stream()
                              .filter(b -> b.getColor() == BLUE)
                              .mapToInt(Block::getWeight)
                              .max();

no one is bothered (or even notices) that the intermediate types Stream<Block> and IntStream, as well as the type of the lambda formal b, do not appear explicitly in the source code.

Local variable type inference allows a similar effect in less tightly structured APIs; many uses of local variables are essentially chains, and benefit equally from inference, such as:

var path = Path.of(fileName);
var fileStream = Files.newInputStream(path);
var bytes = fileStream.readAllBytes();

Description

For local variable declarations with initializers, enhanced for-loop indexes, and index variables declared in traditional for loops, allow the reserved type name var to be accepted in place of manifest types:

var list = new ArrayList<String>(); // infers ArrayList<String>
var stream = list.stream();         // infers Stream<String>

The type is inferred based on the type of the initializer. If there is no initializer, the initializer is the null literal, or the initializer is a poly expression that requires a target type (lambda, method reference, implicit array initializer), then the declaration is rejected.
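The accepted and rejected forms can be sketched as follows, assuming a compiler that implements the rules above. The rejected declarations are left as comments so that the class compiles; the class and variable names are illustrative only.

```java
import java.util.function.Supplier;

public class VarInitializers {
    static String describe() {
        var count = 42;                         // accepted: initializer has type int
        var name = "duke";                      // accepted: String
        var numbers = new int[] { 1, 2 };       // accepted: explicit array creation gives int[]
        Supplier<String> greeter = () -> "hi";  // a lambda still needs a manifest target type
        var greeting = greeter.get();           // accepted: String, from the method's return type

        // Each of the following would be rejected:
        // var x;                   // no initializer
        // var g = null;            // initializer is the null literal
        // var f = () -> "hi";      // lambda requires a target type
        // var m = String::length;  // method reference requires a target type
        // var k = { 1, 2 };        // implicit array initializer requires a target type

        return name + ":" + count + ":" + numbers.length + ":" + greeting;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```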

The identifier var is not a keyword; instead it is a reserved type name or a context-sensitive keyword. This means that code that uses var as a variable, method, or package name will not be affected; code that uses var as a class or interface name will be affected (but these names are rare in practice, since they violate usual naming conventions).
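A short sketch of what this distinction means in practice, assuming a compiler that treats var as a reserved type name (the class name is hypothetical):

```java
public class VarAsIdentifier {
    // A method named var, with a parameter named var, remains legal.
    static int var(int var) {
        return var + 1;
    }

    static int demo() {
        var var = var(1);   // an inferred local that happens to be named var: int
        return var;
    }

    public static void main(String[] args) {
        System.out.println(demo());

        // By contrast, using var as a type name would no longer compile:
        // class var { }
    }
}
```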

Excluding locals with no initializers eliminates "action at a distance" inference errors, and only excludes a small portion of locals in typical programs.

Elided type arguments

In addition to var, we may consider supporting partially-typed local variable declarations that elide type arguments with a "diamond":

Collection<> coll = Arrays.asList(1, 2, 3);

In this case, after determining the type of the initializer, the type is mapped to whatever supertype it has that is a parameterization of Collection.
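Because the `Collection<>` form is only under consideration here, the following sketch writes out by hand the type it would stand for and contrasts it with what var infers for the same initializer. It assumes a compiler that implements var; the class name is illustrative.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ElidedTypeArguments {
    static int sizes() {
        // Manifest supertype: the parameterization of Collection that the
        // elided form would be mapped to for this initializer.
        Collection<Integer> coll = Arrays.asList(1, 2, 3);

        // var keeps the initializer's own type, List<Integer>.
        var list = Arrays.asList(1, 2, 3);
        List<Integer> sameType = list;  // compiles because list is a List<Integer>
        int first = list.get(0);        // get(int) is declared on List, not Collection

        return coll.size() + sameType.size() + first;
    }

    public static void main(String[] args) {
        System.out.println(sizes());
    }
}
```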

Non-denotable types

Sometimes the type of the initializer is a non-denotable type, such as a capture variable. In this case, the algorithm may be designed to i) infer the type, ii) reject the expression, or iii) infer a denotable supertype.

Inferring the type adds new expressive power to the language, but the results may sometimes be surprising.

For example:

void test(List<?> l1, List<?> l2) {
    var l3 = l1; // List<CAP> or List<?>?
    l3 = l2; // error?
    l3.add(l3.get(0)); // error?
}

There will probably not be a uniform strategy applied to all non-denotable types. Instead, considering them case-by-case:

Applicability and impact

Scanning the OpenJDK code base for local variable declarations, we found that 13% cannot be written using var, since there is no initializer, the initializer has the null type, or (rarely) the initializer requires a target type. Among the remaining local variable declarations:

Alternatives

We could continue to require manifest declaration of local variable types.

Rather than supporting var, we could limit support to uses of diamond in variable declarations; this would address a subset of the cases addressed by var.

The design described above incorporates several decisions about scope, syntax, and non-denotable types; the alternatives that were considered for those choices are documented here.

Scope Choices

There are several other ways we could have scoped this feature. One, which we considered, was restricting the feature to effectively final locals (val). However, we backed off from this position because:

On the other hand, we could have expanded this feature to include the local equivalent of "blank" finals (i.e., not requiring an initializer, instead relying on definite assignment analysis). We chose the restriction to "variables with initializers only" because it covers a significant fraction of the candidates while maintaining the simplicity of the feature and reducing "action at a distance" errors.

Similarly, we also could have taken all assignments into account when inferring the type, rather than just the initializer; while this would have further increased the percentage of locals that could exploit this feature, it would also increase the risk of "action at a distance" errors.
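To illustrate the initializer-only rule, here is a minimal sketch assuming a compiler that implements this proposal. The type is fixed at the declaration, so a later incompatible assignment is reported where it occurs instead of changing what the declaration means. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializerOnly {
    static List<String> build() {
        var names = new ArrayList<String>();  // type fixed here: ArrayList<String>
        names.add("a");
        names.trimToSize();                   // ArrayList-specific method is available

        // Later assignments do not widen the inferred type, so this would be rejected
        // at the assignment itself:
        // names = new java.util.LinkedList<String>();

        var n = 1;                            // int, from the initializer alone
        // n = 2.5;                           // rejected: a double cannot be assigned to an int
        n = n + 1;
        names.add(String.valueOf(n));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```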

Syntax Choices

There will inevitably be a diversity of opinions on syntax. The two main degrees of freedom here are what keywords to use (var, auto, etc), and whether to have a separate new form for immutable locals (val, let). We considered the following syntactic options:

Whether or not to have a second form for immutable locals (val, let) is a tradeoff of additional ceremony for additional capture of design intent. We already have effectively-immutable analysis for lambda and inner class capture, and the majority of local variables are already effectively immutable. Some people like that var and val are so similar, so that the difference recedes into the background when reading code, while others find them distractingly similar. Similarly, some like that var and let are clearly different, while others find the difference distracting. (If we are to support new forms, they should ideally have equal syntactic weight (both val and let qualify), so that laziness is less likely to entice users to omit the additional declaration of immutability.)

Using auto is a viable choice, but Java developers are more likely to have experience with JavaScript, C#, or Scala than they are with C++, so we do not gain much by emulating C++ here.

Using const or final seems initially attractive because it doesn't involve new keywords. However, going in this direction effectively closes the door on ever doing inference for mutable locals. Using def has the same defect.

The Go syntax (a different kind of assignment operator) seems pretty un-Javaish.

Non-Denotable Types

We considered simply reporting an error whenever the type of the initializer is non-denotable.

Arguments for rejecting them include:

Arguments for accepting them include:

While we were initially drawn to the "reject them" approach, we found that there was a significant class of cases involving capture variables that users would ultimately find to be mystifying restrictions. For example, when inferring

var c = Class.forName("com.foo.Bar")

inference produces a capture type Class<CAP>, even though the type of this expression is "obviously" Class<?>.
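The practical difference can be sketched as follows, assuming a compiler that maps the capture variable to its bound (the upward projection mentioned under Dependencies), so that the variable is given the type Class<?>. The class name and the choice of java.lang.String are illustrative.

```java
public class CaptureProjection {
    static String names() {
        try {
            var c = Class.forName("java.lang.String");  // declared return type is Class<?>
            var first = c.getName();

            // This reassignment compiles only if c was given the type Class<?>;
            // a variable of capture type Class<CAP> could not accept Integer.class.
            c = Integer.class;
            return first + "," + c.getName();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names());
    }
}
```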

Risks and Assumptions

Risk: Because Java already does significant type inference on the RHS (lambda formals, generic method type arguments, diamond), there is a risk that attempting to use var on the LHS of such an expression will fail, and possibly with difficult-to-read error messages.

We've mitigated this by using simplified error messages when the LHS is inferred.

Examples:

Main.java:81: error: cannot infer type for local variable x
        var x;
            ^
  (cannot use 'var' on variable without initializer)

Main.java:82: error: cannot infer type for local variable f
        var f = () -> { };
            ^
  (lambda expression needs an explicit target-type) 

Main.java:83: error: cannot infer type for local variable g
        var g = null;
            ^
  (variable initializer is 'null')

Main.java:84: error: cannot infer type for local variable c
        var c = l();
            ^
  (inferred type is non denotable)

Main.java:195: error: cannot infer type for local variable m
        var m = this::l;
            ^
  (method reference needs an explicit target-type)

Main.java:199: error: cannot infer type for local variable k
        var k = { 1 , 2 };
            ^
  (array initializer needs an explicit target-type)

Risk: source incompatibilities (someone may have used var as a type name).

Mitigated with reserved type names; names like var do not conform to the naming conventions for types, and therefore are unlikely to be used as types. The name var is commonly used as an identifier; we continue to allow this.

Risk: reduced readability, surprises when refactoring.

Like any other language feature, local variable type inference can be used to write both clear and unclear code; ultimately the responsibility for writing clear code lies with the user.

Dependencies

The proper mapping of capture variables to their bounds, termed upward projection, is being designed as part of an effort to address bugs in the type system, JDK-8154901.