JEP 406: Pattern Matching for switch (Preview)

Owner: Gavin Bierman
Type: Feature
Scope: SE
Status: Candidate
Component: specification / language
Discussion: amber dash dev at openjdk dot java dot net
Relates to: JEP 405: Record Patterns & Array Patterns (Preview)
Reviewed by: Alex Buckley, Brian Goetz
Endorsed by: Brian Goetz
Created: 2018/10/29 08:07
Updated: 2021/04/27 16:06
Issue: 8213076

Summary

Enhance the Java programming language with pattern matching for switch expressions and statements, along with extensions to the language of patterns. Extending pattern matching to switch allows an expression to be tested against a number of patterns, each with a specific action, so that complex data-oriented queries can be expressed concisely and safely.

Goals

Motivation

In Java 16, JEP 394 extended the instanceof operator to take a type pattern and perform pattern matching. This modest extension allows the familiar instanceof-and-cast idiom to be simplified:

// Old code
if (o instanceof String) {
    String s = (String)o;
    ... use s ...
}

// New code
if (o instanceof String s) {
    ... use s ...
}

We often want to compare a variable such as o against multiple alternatives. Java supports multi-way comparisons with switch statements and, since Java 14, switch expressions (JEP 361), but unfortunately switch is very limited. You can only switch on values of a few types — numeric types, enum types, and String — and you can only test for exact equality against constants. We might like to use patterns to test the same variable against a number of possibilities, taking a specific action on each, but since the existing switch does not support that, we end up with a chain of if...else tests such as:

static String formatter(Object o) {
    String formatted = "unknown";
    if (o instanceof Integer i) {
        formatted = String.format("int %d", i);
    } else if (o instanceof Long l) {
        formatted = String.format("long %d", l);
    } else if (o instanceof Double d) {
        formatted = String.format("double %f", d);
    } else if (o instanceof String s) {
        formatted = String.format("String %s", s);
    }
    return formatted;
}

This code benefits from using pattern instanceof expressions, but it is far from perfect. First and foremost, this approach allows coding errors to remain hidden because we have used an overly general control construct. The intent is to assign something to formatted in each arm of the if...else chain, but there is nothing that enables the compiler to identify and verify this invariant. If some block — perhaps one that is executed rarely — does not assign to formatted, we have a bug. (Declaring formatted as a blank local would at least enlist the compiler’s definite-assignment analysis in this effort, but such declarations are not always written.) In addition, the above code is not optimizable; absent compiler heroics it will have O(n) time complexity, even though the underlying problem is often O(1).
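The parenthetical remark about a blank local can be made concrete. The sketch below is a hypothetical variant of formatter (the class name BlankLocalSketch and the method name formatterBlankLocal are illustrative, not part of this proposal) and assumes Java 16 or later for pattern instanceof. Because formatted has no initializer, definite-assignment analysis rejects the method if any arm, including the final else, fails to assign it.

```java
public class BlankLocalSketch {
    static String formatterBlankLocal(Object o) {
        String formatted; // blank local: must be assigned on every path before use
        if (o instanceof Integer i) {
            formatted = String.format("int %d", i);
        } else if (o instanceof Long l) {
            formatted = String.format("long %d", l);
        } else if (o instanceof Double d) {
            formatted = String.format("double %f", d);
        } else if (o instanceof String s) {
            formatted = String.format("String %s", s);
        } else {
            // Removing this assignment makes the return below a compile-time error.
            formatted = "unknown";
        }
        return formatted;
    }
}
```

This catches a forgotten assignment at compile time, but it does nothing about the linear chain of tests.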

But switch is a perfect match for pattern matching! If we extend switch statements and expressions to work on any type, and allow case labels with patterns rather than just constants, then we could rewrite the above code more clearly and reliably:

static String formatterPatternSwitch(Object o) {
    return switch (o) {
        case Integer i -> String.format("int %d", i);
        case Long l    -> String.format("long %d", l);
        case Double d  -> String.format("double %f", d);
        case String s  -> String.format("String %s", s);
        default        -> o.toString();
    };
}

The semantics of this switch are clear: A case label with a pattern matches the value of the selector expression o if the value matches the pattern. (We have shown a switch expression for brevity but could instead have shown a switch statement; the switch block, including the case labels, would be unchanged.)

The intent of this code is clearer because we are using the right control construct: We are saying, "the parameter o matches at most one of the following conditions, figure it out and evaluate the corresponding arm." As a bonus, it is optimizable; in this case we are more likely to be able to perform the dispatch in O(1) time.

Pattern matching and null

Traditionally, switch statements and expressions throw NullPointerException if the selector expression evaluates to null, so testing for null must be done outside of the switch:

static void testFooBar(String s) {
    if (s == null) {
        System.out.println("oops!");
        return;
    }
    switch (s) {
        case "Foo", "Bar" -> System.out.println("Great");
        default           -> System.out.println("Ok");
    }
}

This was reasonable when switch supported only a few reference types. However, if switch allows a selector expression of any type, and case labels can have type patterns, then the standalone null test feels like boilerplate. It would be better to integrate the null test into the switch:

static void testFooBar(String s) {
    switch (s) {
        case null         -> System.out.println("Oops");
        case "Foo", "Bar" -> System.out.println("Great");
        default           -> System.out.println("Ok");
    }
}

The behavior of the switch when the value of the selector expression is null is always determined by its case labels. With a case null, the switch executes the code associated with that label; without a case null, the switch throws NullPointerException, just as before. (To maintain backward compatibility with the current semantics of switch, the default label does not match a null selector.)
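The existing behavior that this rule preserves can be observed today. The following sketch uses illustrative names (NullSelectorDemo, classify, throwsNpe) that are assumptions, and it needs Java 14 or later for the multi-constant case label. It shows that a default label does not catch a null selector.

```java
public class NullSelectorDemo {
    static String classify(String s) {
        switch (s) {              // throws NullPointerException when s is null
            case "Foo", "Bar":
                return "Great";
            default:              // matches every non-null string, but not null
                return "Ok";
        }
    }

    // Helper for the demonstration: reports whether classify(s) throws NPE.
    static boolean throwsNpe(String s) {
        try {
            classify(s);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```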

We may wish to handle null in the same way as another case label. For example, in the following code, case null, String s would match both the null value and all String values:

static void testStringOrNull(Object o) {
    switch (o) {
        case null, String s -> System.out.println("String: " + s);
    }
}

Refining patterns in switch

Experimentation with patterns in switch suggests it is common to want to refine patterns. Consider the following code that switches over a Shape value:

class Shape {}
class Rectangle extends Shape {}
class Triangle  extends Shape { int calculateArea() { ... } }

static void testTriangle(Shape s) {
    switch (s) {
        case null:
            break;
        case Triangle t:
            if (t.calculateArea() > 100) {
                System.out.println("Large triangle");
                break;
            }
        default:
            System.out.println("A shape, possibly a small triangle");
    }
}

The intent of this code is to have a special case for large triangles (with area over 100), and a default case for everything else (including small triangles). However, we cannot express this directly with a single pattern. We first have to write a case label that matches all triangles, and then place the test of the area of the triangle rather uncomfortably within the corresponding statement group. Then we have to use fall-through to get the correct behavior when the triangle has an area of 100 or less. (Note the careful placement of break; inside the if block.)

The problem here is that using a single pattern to discriminate among cases does not scale beyond a single condition. We need some way to express a refinement to a pattern. One approach might be to allow case labels to be refined; such a refinement is called a guard in other programming languages. For example, we could introduce a new keyword where to appear at the end of a case label and be followed by a boolean expression, e.g., case Triangle t where t.calculateArea() > 100.

However, there is a more expressive approach. Rather than extend the functionality of case labels, we can extend the language of patterns themselves. We can add a new kind of pattern called a guarded pattern, written p && b, that allows a pattern p to be refined by an arbitrary boolean expression b.

With this approach, we can revisit the testTriangle code to express the special case for large triangles directly. This eliminates the use of fall-through in the switch statement, which in turn means we can enjoy concise arrow-style (->) rules:

static void testTriangle(Shape s) {
    switch (s) {
        case Triangle t && (t.calculateArea() > 100) ->
            System.out.println("Large triangle");
        default ->
            System.out.println("A shape, possibly a small triangle");
    }
}

The value of s matches the pattern Triangle t && (t.calculateArea() > 100) if, first, it matches the type pattern Triangle t and, if so, the expression t.calculateArea() > 100 evaluates to true.

Using switch makes it easy to understand and change case labels when application requirements change. For example, we might want to split triangles out of the default path; we can do that by using both a refined pattern and a non-refined pattern:

static void testTriangle(Shape s) {
    switch (s) {
        case Triangle t && (t.calculateArea() > 100) ->
            System.out.println("Large triangle");
        case Triangle t ->
            System.out.println("Small triangle");
        default ->
            System.out.println("Non-triangle");
    }
}

Description

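We enhance switch statements and expressions in two ways:

  1. Extend case labels to include patterns in addition to constants.
  2. Introduce two new kinds of patterns: guarded patterns and parenthesized patterns.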

Patterns in case labels

We revise the grammar for switch labels in a switch block to read (compare JLS §14.11.1):

SwitchLabel:
  case CaseConstant { , CaseConstant }
  case Pattern
  case null
  case null, TypePattern
  case null, default
  default

A normal switch block is a switch block where one or more switch labels are case labels with constants, and all the other switch labels are case null, case null, default, or default. (That is, no case labels contain patterns.)

A normal switch is a switch statement or expression with a normal switch block.

A pattern switch block is a switch block where every switch label is either a case label with a pattern, or a case label with null and a type pattern, or case null, or case null, default. (That is, no case labels contain constants.)

A pattern switch is a switch statement or expression with a pattern switch block.

Less formally, a switch can have either patterns as labels (pattern switch) or constants as labels (normal switch), but not a mix. This partitioning is analogous to how a switch may contain arrow-style rules (->) or colon-style statement groups (:), but not a mix.

The grammar makes it clear that a case label may have multiple constants but only one pattern. This means that a case label in a normal switch is somewhat more expressive than a case label in a pattern switch, since only the former can share the same action across different values of the selector expression. This is an inevitable consequence of how patterns introduce names, which we consider further below.

Both a normal switch and a pattern switch allow either arrow-style rules or colon-style statement groups in the switch block.

The behavior of a pattern switch is, broadly, the same as the behavior of a normal switch: The value of the selector expression is compared to the switch labels, one of the labels is selected, and the code associated with that label is executed. The difference in a pattern switch is that selection is determined by pattern matching rather than by checking equality. For example, in the following code, the value of o will match the pattern Long l, and the code associated with case Long l will be executed:

Object o = 123L;
String formatted = switch (o) {
    case Integer i -> String.format("int %d", i);
    case Long l    -> String.format("long %d", l);
    case Double d  -> String.format("double %f", d);
    case String s  -> String.format("String %s", s);
    default        -> o.toString();
};

There are three major design issues when case labels can have patterns:

  1. Enhanced type checking
  2. Scope of pattern variable declarations
  3. Dealing with null

1. Enhanced type checking

1a. Selector expression typing

The type of the selector expression is broader in a pattern switch than in a normal switch. Namely:

For example, in the following pattern switch the selector expression o is matched with type patterns involving a class type, an enum type, a record type, and an array type:

record Point(int i, int j) {}
enum Color { RED, GREEN, BLUE; }

static void typeTester(Object o) {
    switch (o) {
        case null     -> System.out.println("null");
        case String s -> System.out.println("String");
        case Color c  -> System.out.println("Color with " + c.values().length + " values");
        case Point p  -> System.out.println("Record class: " + p.toString());
        case int[] ia -> System.out.println("Array of ints of length " + ia.length);
        default       -> System.out.println("Something else");
    }
}

Every case label in the switch block must be compatible with the selector expression. For a case label with a pattern, known as a pattern label, we use the existing notion of compatibility of an expression with a pattern (JLS §14.30.1).

1b. Dominance of pattern labels

It is possible for the selector expression to match multiple patterns in a pattern switch block. Consider this problematic example:

static void error(Object o) {
    switch(o) {
        case CharSequence cs ->
            System.out.println("A sequence of length " + cs.length());
        case String s ->    // Error - pattern is dominated by previous pattern
            System.out.println("A string: " + s);

    }
}

The first pattern label case CharSequence cs dominates the second pattern label case String s because every value that matches the pattern String s also matches the pattern CharSequence cs, but not vice versa. This is because the type of the second pattern, String, is a subtype of the type of the first pattern, CharSequence.
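To see why dominated labels are worth rejecting, consider the same ordering written as an if...else chain, which compiles without complaint on Java 16 or later. This is a hypothetical illustration with invented names (DominanceSketch, describe). The String branch is dead code, and nothing tells the programmer so.

```java
public class DominanceSketch {
    static String describe(Object o) {
        if (o instanceof CharSequence cs) {
            return "A sequence of length " + cs.length();
        } else if (o instanceof String s) {
            // Never reached: every String is also a CharSequence,
            // so the first test has already succeeded.
            return "A string: " + s;
        }
        return "Something else";
    }
}
```

Calling describe("abc") takes the CharSequence branch, which is exactly the situation the dominance check turns into a compile-time error in a pattern switch.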

A pattern label of the form case p dominates a pattern label of the form case p && e, i.e., where the pattern is a guarded version of the original pattern. For example, the pattern label case String s dominates the pattern label case String s && s.length() > 0, since every value that matches the guarded pattern String s && s.length() > 0 also matches the pattern String s.

The compiler checks all pattern labels. It is a compile-time error if a pattern label in a switch block is dominated by an earlier pattern label in that switch block. (For this purpose, case null, T t is treated as if it were case T t.)

The notion of dominance is analogous to conditions on the catch clauses of a try statement, where it is an error if a catch clause that catches an exception class E is preceded by a catch clause that can catch E or a superclass of E (JLS §11.2.3). Logically, the preceding catch clause dominates the subsequent catch clause.
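The catch-clause rule mentioned above is already enforced by the compiler. In this small sketch (class and method names are illustrative assumptions), the more specific clause comes first, so both clauses are reachable. Swapping the two clauses would be rejected because FileNotFoundException is a subclass of IOException.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class CatchOrderDemo {
    static String handle(boolean missing) {
        try {
            if (missing) {
                throw new FileNotFoundException("no such file");
            }
            throw new IOException("read failed");
        } catch (FileNotFoundException e) {   // subclass first
            return "missing";
        } catch (IOException e) {             // superclass second
            return "io";
        }
    }
}
```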

It is also a compile-time error if a switch block has more than one match-all switch label. The two match-all switch labels are default and case null, default.

1c. Completeness of pattern labels in switch expressions

A switch expression requires that all possible values of the selector expression are handled in the switch block. This maintains the property that successful evaluation of a switch expression will always yield a value. For normal switch expressions, this is enforced by a fairly straightforward set of extra conditions on the switch block. For pattern switch expressions, we define a notion of type coverage of a switch block.

Consider this (erroneous) pattern switch expression:

static int coverage(Object o) {
    return switch (o) {         // Error - incomplete
        case String s -> s.length();
    };
}

The switch block has only one case label, case String s. This matches any value of the selector expression whose type is a subtype of String. We therefore say that the type coverage of this arrow rule is every subtype of String. This pattern switch expression is incomplete because the type coverage of its switch block does not include the type of the selector expression.

Consider this (still erroneous) example:

static int coverage(Object o) {
    return switch (o) {         // Error - incomplete
        case String s  -> s.length();
        case Integer i -> i;
    };
}

The type coverage of this switch block is the union of the coverage of its two arrow rules. In other words, the type coverage is the set of all subtypes of String and the set of all subtypes of Integer. But, again, the type coverage still does not include the type of the selector expression, so this pattern switch expression is also incomplete and causes a compile-time error.

The type coverage of default is all types, so this example is (at last!) legal:

static int coverage(Object o) {
    return switch (o) {
        case String s  -> s.length();
        case Integer i -> i;
        default -> 0;
    };
}

If the type of the selector expression is a sealed class (JEP 397), then the type coverage check can take into account the permits clause of the sealed class to determine whether a switch block is complete. Consider the following example of a sealed interface S with three permitted subclasses A, B, and C:

sealed interface S permits A, B, C {}
final class A implements S {}
final class B implements S {}
record C(int i) implements S {}  // Implicitly final

static int testSealedCoverage(S s) {
    return switch (s) {
        case A a -> 1;
        case B b -> 2;
        case C c -> 3;
    };
}

The compiler can determine that the type coverage of the switch block is the types A, B, and C. Since the type of the selector expression, S, is a sealed interface whose permitted subclasses are exactly A, B, and C, this switch block is complete. As a result, no default label is needed.

To defend against incompatible separate compilation, the compiler automatically adds a default label whose code throws an IncompatibleClassChangeError. This label will only be reached if the sealed interface is changed and the switch code is not recompiled. In effect, the compiler hardens your code for you.
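The effect of such a synthetic default can be approximated by hand. The sketch below is not what the compiler emits; it is an assumed, hand-written analogue that uses only features that are final in Java 17 (sealed types, records, and instanceof). The class name SealedCoverageSketch, the error message, and the explicit null check are additions for illustration.

```java
import java.util.Objects;

public class SealedCoverageSketch {
    sealed interface S permits A, B, C {}
    static final class A implements S {}
    static final class B implements S {}
    record C(int i) implements S {}  // implicitly final

    static int testSealedCoverage(S s) {
        Objects.requireNonNull(s);   // a switch without case null throws NPE on null
        if (s instanceof A) return 1;
        if (s instanceof B) return 2;
        if (s instanceof C) return 3;
        // Reachable only if S gains a new permitted subclass and this
        // code is not recompiled against the changed hierarchy.
        throw new IncompatibleClassChangeError("unexpected subtype of S: " + s.getClass());
    }
}
```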

The requirement for a pattern switch expression to be complete is analogous to the treatment of a switch expression whose selector expression is an enum class, where a default label is not required if there is a clause for every constant of the enum class.
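For comparison, the enum behavior referred to here has been available since switch expressions were finalized in Java 14 (JEP 361). In this sketch the names EnumCoverageDemo and hex are illustrative assumptions. No default is needed because every constant has a clause, and removing any one of the three clauses makes the switch expression a compile-time error.

```java
public class EnumCoverageDemo {
    enum Color { RED, GREEN, BLUE }

    static String hex(Color c) {
        return switch (c) {      // complete: all constants of Color are covered
            case RED   -> "#FF0000";
            case GREEN -> "#00FF00";
            case BLUE  -> "#0000FF";
        };
    }
}
```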

2. Scope of pattern variable declarations

Pattern variables (JEP 394) are local variables that are declared by patterns. Pattern variable declarations are unusual in that their scope is flow-sensitive. As a recap, consider the following example, where the type pattern String s declares the pattern variable s:

static void test(Object o) {
    if ((o instanceof String s) && s.length() > 3) {
        System.out.println(s);
    } else {
        System.out.println("Not a string");
    }
}

The declaration of s is in scope in the right-hand operand of the && expression, as well as in the "then" block. However, it is not in scope in the "else" block; in order for control to transfer to the "else" block the pattern match must fail, in which case the pattern variable will not have been initialized.
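Flow-sensitive scoping also works in the other direction: when a negated test exits early, the pattern variable is in scope in the code that follows. A minimal sketch with hypothetical names (FlowScopeDemo, lengthOrMinusOne), valid since Java 16:

```java
public class FlowScopeDemo {
    static int lengthOrMinusOne(Object o) {
        if (!(o instanceof String s)) {
            // s is not in scope here: the match failed on this path.
            return -1;
        }
        // s is in scope here: this point is reached only if the match succeeded.
        return s.length();
    }
}
```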

We extend this flow-sensitive notion of scope for pattern variable declarations to encompass pattern declarations occurring in case labels with two new rules:

  1. The scope of a pattern variable declaration which occurs in a case label of a switch rule includes the expression, block, or throw statement that appears to the right of the arrow.

  2. The scope of a pattern variable declaration which occurs in a case label of a switch labeled statement group, where there are no further switch labels that follow, includes the block statements of the statement group.

This example shows the first rule in action:

static void test(Object o) {
    switch (o) {
        case Character c -> {
            if (c.charValue() == 7) {
                System.out.println("Ding!");
            }
            System.out.println("Character");
        }
        case Integer i ->
            throw new IllegalStateException("Invalid argument");
    }
}

The scope of the declaration of the pattern variable c is the block to the right of the first arrow.

The scope of the declaration of the pattern variable i is the throw statement to the right of the second arrow.

The second rule is more complicated. Let us first consider an example where there is only one case label for a switch labeled statement group:

static void test(Object o) {
    switch (o) {
        case Character c:
            if (c.charValue() == 7) {
                System.out.print("Ding ");
            }
            if (c.charValue() == 9) {
                System.out.print("Tab ");
            }
            System.out.println("character");
        default:
            System.out.println();
    }
}

The scope of the declaration of the pattern variable c includes all the statements of the statement group, namely the two if statements and the println statement. The scope does not include the statements of the default statement group, even though the execution of the first statement group can fall through the default switch label and execute these statements.

The possibility of falling through a case label that declares a pattern variable must be excluded as a compile-time error. Consider this erroneous example:

static void test(Object o) {
    switch (o) {
        case Character c:
            if (c.charValue() == 7) {
                System.out.print("Ding ");
            }
            if (c.charValue() == 9) {
                System.out.print("Tab ");
            }
            System.out.println("character");
        case Integer i:                 // Compile-time error
            System.out.println("An integer " + i);
    }
}

If this were allowed and the value of the selector expression o was a Character, then execution of the switch block could fall through to the second statement group (after case Integer i:), where the pattern variable i would not have been initialized. Allowing execution to fall through a case label that declares a pattern variable is therefore a compile-time error.

This is why case Character c: case Integer i: ... is not permitted. Similar reasoning applies to the prohibition of multiple patterns in a case label: Neither case Character c, Integer i: ... nor case Character c, Integer i -> ... is allowed. If such case labels were allowed then both c and i would be in scope after the colon or arrow, yet only one of c and i would have been initialized depending on whether the value of o was a Character or an Integer.

On the other hand, falling through a label that does not declare a pattern variable is safe, as this example shows:

void test(Object o) {
    switch (o) {
        case String s:
            System.out.println("A string");
        default:
            System.out.println("Done");
    }
}

3. Dealing with null

Traditionally, a switch throws NullPointerException if the selector expression evaluates to null. This is well-understood behavior and we do not propose to change it for any existing switch code.

However, given that there is a reasonable and non-exception-bearing semantics for pattern matching and null values, there is an opportunity to make pattern switch more null-friendly while remaining compatible with existing switch semantics.

We introduce three new null-matching case labels, only one of which may occur in any given switch block:

  1. case null — matches when the value of the selector expression is null.

  2. case null, T t — matches when the value of the selector is null, or it matches the type pattern T t.

  3. case null, default — matches when the value of the selector is null, or if no other case labels match.

We lift the blanket rule that a switch immediately throws NullPointerException if the value of the selector expression is null. Instead, we inspect the case labels to determine the behavior of a switch:

For example, given the declaration below, evaluating test(null) will print null! rather than throw NullPointerException:

static void test(Object o) {
    switch (o) {
        case null     -> System.out.println("null!");
        case String s -> System.out.println("String");
        default       -> System.out.println("Something else");
    }
}

This new behavior around null is as if the compiler automatically enriches the switch block with a case null whose body throws NullPointerException. In other words, this code:

static void test(Object o) {
    switch (o) {
        case String s  -> System.out.println("String: " + s);
        case Integer i -> System.out.println("Integer");
    }
}

is equivalent to:

static void test(Object o) {
    switch (o) {
        case null      -> throw new NullPointerException();
        case String s  -> System.out.println("String: "+s);
        case Integer i -> System.out.println("Integer");
    }
}

In both examples, evaluating test(null) will cause NullPointerException to be thrown.

We preserve the intuition from the existing switch construct that performing a switch over null is an exceptional thing to do. The difference in a pattern switch is that you have a mechanism to directly handle this case inside the switch rather than outside. If you choose not to have a null-matching case label in a switch block then switching over a null value will throw NullPointerException, as before.

Guarded and parenthesized patterns

After a successful pattern match we often further test the result of the match. This can lead to cumbersome code, such as:

static void test(Object o) {
    switch (o) {
        case String s:
            if (s.length() == 1) { ... }
            else { ... }
            break;
        ...
    }
}

The desired test — that o is a String of length 1 — is unfortunately split between the case label and the ensuing if statement. We could improve readability if a pattern switch supported the combination of a pattern and a boolean expression in a case label.

Rather than add another special case label, we enhance the pattern language by adding guarded patterns, written p && e. This allows the above code to be rewritten so that all the conditional logic is lifted into the case label:

static void test(Object o) {
    switch (o) {
        case String s && (s.length() == 1) -> ...
        case String s                      -> ...
    }
}

The first case matches if o is both a String and of length 1. The second case matches if o is a String of some other length.

Sometimes we need to parenthesize patterns to avoid parsing ambiguities. We therefore extend the language of patterns to support parenthesized patterns written (p), where p is a pattern.

More precisely, we change the grammar of patterns. Assuming that the record patterns and array patterns of JEP 405 are added, the grammar for patterns will become:

Pattern:
  PrimaryPattern
  GuardedPattern

GuardedPattern:
  PrimaryPattern && ConditionalAndExpression

PrimaryPattern:
  TypePattern
  RecordPattern
  ArrayPattern
  ( Pattern )

A guarded pattern is of the form p && e, where p is a pattern and e is a boolean expression. In a guarded pattern any local variable, formal parameter, or exception parameter that is used but not declared in the subexpression e must either be final or effectively final.

A guarded pattern p && e introduces the union of the pattern variables introduced by pattern p and expression e. The scope of any pattern variable declaration in p includes the expression e. This allows for patterns such as String s && (s.length() > 1), which matches a value that can be cast to a String such that the string has a length greater than one.

A value matches a guarded pattern p && e if, first, it matches the pattern p and, second, the expression e evaluates to true. If the value does not match p then no attempt is made to evaluate the expression e.
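The same match-then-test order can be observed with the pattern instanceof of Java 16, whose && short-circuits in the same way. The following sketch is an analogue rather than a guarded pattern. The counter guardEvaluations and the helper isLongerThanOne are assumptions added only to make the evaluation order visible.

```java
public class GuardSketch {
    static int guardEvaluations = 0;

    // Stands in for the guard expression e and counts how often it runs.
    static boolean isLongerThanOne(String s) {
        guardEvaluations++;
        return s.length() > 1;
    }

    // Analogue of matching against the guarded pattern
    // String s && (s.length() > 1).
    static boolean matches(Object o) {
        return o instanceof String s && isLongerThanOne(s);
    }
}
```

When the argument is not a String, the counter does not change, mirroring the rule that e is not evaluated if the value does not match p.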

A parenthesized pattern is of the form (p), where p is a pattern. A parenthesized pattern (p) introduces the pattern variables that are introduced by the subpattern p. A value matches a parenthesized pattern (p) if it matches the pattern p.

We also change the grammar for instanceof expressions to:

InstanceofExpression:
  RelationalExpression instanceof ReferenceType
  RelationalExpression instanceof PrimaryPattern

This change, and the non-terminal ConditionalAndExpression in the grammar rule for a guarded pattern, ensure that, for example, the expression e instanceof String s && s.length() > 1 continues to unambiguously parse as the expression (e instanceof String s) && (s.length() > 1). If the trailing && is intended to be part of a guarded pattern then the entire pattern should be parenthesized, e.g., e instanceof (String s && s.length() > 1).

The use of the non-terminal ConditionalAndExpression in the grammar rule for a guarded pattern also removes another potential ambiguity concerning a case label with a guarded pattern. For example:

boolean b = true;
switch (o) {
    case String s && b -> s -> s;
}

If the guard expression of a guarded pattern were allowed to be an arbitrary expression then there would be an ambiguity as to whether the first occurrence of -> is part of a lambda expression or part of the switch rule, whose body is a lambda expression. Since a lambda expression can never be a valid boolean expression, it is safe to restrict the grammar of the guard expression.

Future work

Alternatives

Dependencies

This JEP builds on pattern matching for instanceof (JEP 394) and also the enhancements offered by switch expressions (JEP 361). We intend this JEP to coincide with JEP 405, which defines two new kinds of patterns that support nesting. The implementation will likely make use of dynamic constants (JEP 309).