JEP 305: Pattern Matching

AuthorBrian Goetz
OwnerGavin Bierman
Created2017/05/30 19:48
Updated2017/06/16 16:25
TypeFeature
StatusCandidate
Componentspecification / language
ScopeSE
Discussionamber dash dev at openjdk dot java dot net
Priority3
Reviewed byMark Reinhold
Issue8181287

Summary

Enhance the Java programming language with pattern matching. Initial support will include type-test and constant patterns, supported by the switch statement and a matches expression. Follow-on efforts will broaden the range of patterns supported and the language constructs that support them.

Motivation

Nearly every program includes some sort of logic that combines testing if an expression has a certain type or structure, and then conditionally extracting components of its state for further processing. For example, all Java programmers are familiar with the instanceof-and-cast idiom:

if (obj instanceof Integer) {
    int intValue = ((Integer) obj).intValue();
    // use intValue
}

There are three things going on here: a test (is x an Integer), a conversion (casting obj to Integer), and a destructuring (extracting the intValue component from the Integer). This pattern is straightforward and understood by all Java programmers, but is suboptimal for several reasons. It is tedious; doing both the type test and cast should be unnecessary (what else would you do after an instanceof test?). The accidental boilerplate of casting and destructuring obfuscates the more significant logic that follows. But most importantly, the repetition provides opportunities for errors to creep unnoticed into programs.

This problem gets worse when we want to test against multiple possible target types. We can extend the above example using a chain of if...else tests:

String formatted = "unknown";
if (obj instanceof Integer) {
    int i = (Integer) obj;
    formatted = String.format("int %d", i);
}
else if (obj instanceof Byte) {
    byte b = (Byte) obj;
    formatted = String.format("byte %d", b);
}
else if (obj instanceof Long) {
    long l = (Long) obj;
    formatted = String.format("long %d", l);
}
else if (obj instanceof Double) {
    double d = (Double) obj;
    formatted = String.format(“double %f", d);
}
else if (obj instanceof String) {
    String s = (String) obj;
    formatted = String.format("String %s", s);
}

The above code is familiar, but has many undesirable properties. As already mentioned, repeating the cast in each arm is an irritation. The business logic can too easily get lost in the boilerplate. But most importantly, the approach allows coding errors to remain hidden -- because we've used an overly-general control construct. The intent of the above code is to assign something to formatted in each arm of the if...else chain. But, there is nothing here that enables the compiler to verify this actually happens. If some block -- perhaps one that is executed rarely in practice -- forgets to assign to formatted, we have a bug. (Leaving formatted as a blank local or blank final would at least enlist the "definite assignment" analysis in this effort, but this is not always done.) Finally, the above code is less optimizable; absent compiler heroics, it will have O(n) time complexity, even though the underlying problem is often O(1).

Description

Rather than reach for ad-hoc solutions, we believe it is time for Java to embrace pattern matching. Pattern matching is a technique that has been adapted to many different styles of programming languages going back to the 1960s, including text-oriented languages like SNOBOL4 and AWK, functional languages like Haskell and ML, and more recently extended to object-oriented languages like Scala and C#.

A pattern is a combination of a predicate that can be applied to a target, along with a set of binding variables that are extracted from the target if the predicate applies to it. One form of binding pattern is a type test pattern, illustrated below (in conjunction with a theoretical matches operator):

if (x matches Integer i) {
    // can use i here
}

The phrase Integer i is a type test pattern; the i is a declaration of a new variable, not a use of an existing one. The target is tested to see if it is an instance of Integer, and if so, it is cast to Integer and its int component bound to the binding variable i.

As mentioned earlier, the chain of if...else is undesirable because it is using an overly general control construct. We have an existing mechanism for a multi-armed equality test in the language -- switch. But switch is (at present) very limited. You can only switch on a small set of types -- numbers, strings, and enums -- and you can only test for exact equality against constants. But these limitations are mostly accidents of history; the switch statement is a perfect "match" for pattern matching. If we allow a case label to specify a pattern, we can express the above with switch:

String formatted;
switch (obj) {
    case Integer i: formatted = String.format("int %d", i); break;
    case Byte b:    formatted = String.format("byte %d", b); break;
    case Long l:    formatted = String.format("long %d", l); break;
    case Double d:  formatted = String.format(“double %f", d); break;
    case String s:  formatted = String.format("String %s", s); break
    default:        formatted = obj.toString();
}

Now, the intent of the code is far clearer, because we're using the right control construct -- we're saying "the expression obj matches at most one of the following conditions, figure it out and execute the corresponding arm". As a bonus, it is more optimizable too; in this case we are more likely to be able to do the dispatch in O(1) time.

Traditional case labels (comparing against a compile-time constant, such as a number, String, or enum) can be thought of as constant patterns, where a target matches a constant pattern if they are equal according to Object.equals(), and matching a constant pattern produces no bindings.

As an initial target, we aim to support constant patterns and type-test patterms with bindings in the switch statement and in a new matches expression. We may support guards (ancillary boolean expressions that must be true in order to match, such as case String s && !s.isEmpty(), and/or a continue statement in switches.

Future Work

Type-test patterns and patterns in switch are the obvious first steps, but these are only a start. Possible areas for future work (to be the subject of future JEPs) include:

Deconstruction Patterns. Many classes are simply carriers for their data. We construct them with constructors, which take a vector of N arguments and produce an aggregate, but we generally fetch the components one at a time, with accessors. Just as we can combine the type-test/cast/bind operations into a single type-test pattern, we can combine the type-test/cast/extract-multiple into a single deconstruction pattern. If we have a hierarchy of Node with subtypes for IntNode (containing a single int), AddNode and MulNode (containing two nodes), and NegNode (containing a single node), we can match against a Node and act on the specific subtypes all in one step:

int eval(Node n) {
    switch(n) {
        case IntNode(int i): return i;
        case NegNode(Node n): return -eval(n);
        case AddNode(Node left, Node right): return eval(left) + eval(right);
        case MulNode(Node left, Node right): return eval(left) * eval(right);
        default: throw new IllegalStateException(n);
    };
}

Today, to express ad-hoc polymorphic calculations like this, we would use the "Visitor" pattern. Using pattern matching is generally more transparent and straightforward.

Nested Patterns. Patterns compose nicely. In fact, in the above example, we are already using nested patterns; the "arguments" to the deconstruction patterns, such as Node n, are already patterns (in this case, type-test patterns.) If we want to match on an AddNode whose left argument is an IntNode containing a zero, we can just add another level of nesting here:

case AddNode(IntNode(0), Node right)

In this example, we have a deconstruction pattern (AddNode(...)) whose left component must match another deconstruction pattern (IntNode(...)) whose sole component must match the constant pattern 0.

Expression switch. The switch statement is currently a statement, but very often when we are choosing between several alternatives, we want to populate a result and keep going. Having an expression form of switch would eliminate the contortions needed to simulate an expression with a switch statement.

Sealed types. Knowing that the cases of a switch are exhaustive is useful; it means that we need not manually code default arms that should never be executed under normal situations. Being able to seal a hierarchy helps communicate a valuable constraint to clients, and assists the compiler in exhaustiveness analysis.

Alternatives

The benefits of type-test patterns (but not deconstruction patterns) could be obtained by flow typing in if and switch statements, or by a type switch construct. Patterns generalize both of these constructs.

Dependencies

The implementation will likely make use of Dynamic Constants in the JVM (https://bugs.openjdk.java.net/browse/JDK-8177279).