JEP draft: Templated Strings and Template Policies (Preview)

Owner: Jim Laskey
Type: Feature
Scope: SE
Status: Draft
Component: specification / language
Discussion: jdk dash dev at openjdk dot java dot net
Effort: M
Duration: M
Reviewed by: Maurizio Cimadamore
Created: 2021/09/17 13:41
Updated: 2021/11/26 19:32
Issue: 8273943

Summary

Add the elements to the Java language necessary to support template processing, including an interface for capturing expressions embedded in templated strings and an interface for invoking template processing.

This is a preview language and runtime feature.

Goals

Non-Goals

Motivation

One of the most commonly requested Java features is to support some sort of string templating, which is useful for formatting log messages and snippets of HTML, JSON, XML, or SQL. While Java already has many ways to combine constant strings with non-constant values (concatenation, String::format, MessageFormat), developers would prefer something more direct, for several reasons:

However, there are reasons to hesitate adding such a feature, including:

Template processing as implemented by many popular languages offers the desired convenience in the simple cases, but falls afoul of many of these downsides. We may want the convenience of templates, but we also want safety and flexibility across a range of domains.

Language        Example

C#              $"{x} plus {y} equals {x + y}"
Groovy          "$x plus $y equals ${x + y}"
JavaScript      `${x} plus ${y} equals ${x + y}`
Kotlin          "$x plus $y equals ${x + y}"
Scala           f"$x%d plus $y%d equals ${x + y}%d"
Python          f"{x} plus {y} equals {x + y}"
Ruby            "#{x} plus #{y} equals #{x + y}"
Swift           "\(x) plus \(y) equals \(x + y)"
Visual Basic    $"{x} plus {y} equals {x + y}"

We are not interested in merely doing “string concatenation”, which is how other languages have interpreted templates. We would like to do better.

What's wrong with simple string concatenation?

The only case handled by most other languages that support templates is the simplest one -- uninterpreted concatenation.

var greeting = `Hello, $name, I am $age years old`

The feature illustrated here is constrained in many ways: the format string is not validated, the parameters are not validated or transformed in any way, the parts are combined by a very constrained mechanism (the result must be exactly the segments of the format string concatenated with the string value of the parameters), and finally, the result must be a String. While these might be convenient defaults, not being able to customize any of these behaviors is a severe limitation.

In addition, the surfacing of the feature in the language is confusingly ad-hoc; it requires a different delimiter from "regular" strings, as well as a different set of rules for separating verbatim content from embedded expressions. An important goal with the delivery of text blocks was that string literals and text blocks be different stackings of the same basic feature, rather than wholly separate features (this is one reason "raw string literals" was withdrawn). We would like to follow the same discipline here; parameters should be part of the overall string expression feature, not a separate thing.

Another level of indirection

We can meet our diverse goals by separating mechanism from policy. How we introduce parameters into string expressions is mechanism; how we combine the parameters and the string into a final result is policy. The language may need to have an opinion about how a templatized expression is expressed, but the semantics of how parameters are validated, transformed, and combined should remain in the hands of ordinary library code. Users should be able to select the templating policy they want, and be able to capture templating policies in libraries for reuse.

A templating policy might be described by an interface like:

interface TemplatePolicy<T> {
    T apply(String template, List<Object> parameters);
}

An implementation of a template policy is an ordinary object that implements some instantiation of TemplatePolicy. The simplest template policy is what every other language does -- concatenation -- and can be exposed by the standard libraries.

We can express template processing as instance behavior on a policy object:

String s = CONCAT."Hello \{name}, I am \{age} years old.";

where CONCAT is a static instance of TemplatePolicy which captures the obvious policy.

The escape sequence \{ is currently unused (and therefore currently illegal in string literals and text blocks), so this choice of parameter carrier is compatible with the existing string literal and text block features. (Swift uses \(, which would also be a valid choice.) This means we do not need to invent a new form (or two) of "string template expression" with a different delimiter or prefix character.

The policy object has the flexibility to validate the format string and parameters, interpret the format string and parameters as it sees fit, combine them as it sees fit (not just sequential concatenation), and produce a result that is not even a String. The compiler shreds a parameterized string expression into the constant and non-constant parts, and arranges for the combination method on the policy object to be invoked.
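To make the division of labor concrete, here is a minimal sketch of what the shredding and the policy invocation might amount to for the CONCAT example. It runs on current Java by spelling out the call by hand. The interface is a local copy of the one sketched above, while the ConcatSketch class, the HOLE marker, and the greet method are assumptions made for illustration only.

```java
import java.util.List;

// Local stand-in for the interface sketched above; not a real JDK type.
interface TemplatePolicy<T> {
    T apply(String template, List<Object> parameters);
}

class ConcatSketch {
    // Hypothetical marker for a hole in the template (here U+FFFC).
    static final char HOLE = '\uFFFC';

    // The "obvious" policy: substitute each parameter for its hole, in order.
    static final TemplatePolicy<String> CONCAT = (template, parameters) -> {
        StringBuilder sb = new StringBuilder();
        int next = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == HOLE) {
                sb.append(parameters.get(next++));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    };

    // Roughly what CONCAT."Hello \{name}, I am \{age} years old." might turn into:
    // the constant part with holes, plus the evaluated expressions.
    static String greet(String name, int age) {
        return CONCAT.apply("Hello \uFFFC, I am \uFFFC years old.", List.of(name, age));
    }
}
```

A different policy could receive exactly the same two inputs and validate them, reorder them, or return something other than a String.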

Examples

Delegating control to a policy object dramatically expands the expressiveness and safety of the feature.

String formatting. Formatting libraries like String::format offer more than just concatenation; they offer rich formatting options such as field-width management, leading-zero fills, hex conversion, locale-specific presentation and more. Making straight concatenation easier while offering no improvement for formatting libraries leaves users with an unpleasant choice between convenience and rich formatting. If we wanted to format the number age using the various modifiers supported by the %d format specifier, we wouldn't want to abandon the convenience of the straightforward expression.

On the other hand, it would be folly to bake the String::format descriptor language into the Java language; representation and interpretation of the format specifiers should be under the control of the template policy. The answer is to encapsulate this in a library that implements this set of format specifiers, and exposes a constant policy object. Here, FORMAT is a policy object that interprets a set of format specifiers that are similar to printf / String::format, using the convention that the format specifier appears immediately before the "hole":

String s = FORMAT."Hello %s\{name}, I am %10d\{age} years old.";

When the format string is shredded into constant and variable parts, the end of each constant part should contain a format descriptor which is used to condition the formatting of the following parameter (and the policy object can validate this). The Java language knows nothing of the format descriptor language; this is interpreted solely by the formatter library.
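As a rough illustration of the "specifier before the hole" convention, the sketch below takes the constant parts and the values as two lists, checks that each constant part preceding a hole ends in a specifier, and then hands the joined text to String.format. The FormatSketch class and its simplified specifier pattern are assumptions; a real formatter would need to handle the full java.util.Formatter grammar and stray % characters elsewhere in the text.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class FormatSketch {
    // Simplified check: a segment that precedes a hole must end with a
    // printf-style specifier such as %s, %10d or %-8.2f.
    private static final Pattern TRAILING_SPEC =
            Pattern.compile(".*%[-#+ 0,(]*\\d*(\\.\\d+)?[a-zA-Z]", Pattern.DOTALL);

    static String format(List<String> segments, List<Object> values) {
        if (segments.size() != values.size() + 1) {
            throw new IllegalArgumentException("expected one more segment than values");
        }
        for (int i = 0; i < values.size(); i++) {
            if (!TRAILING_SPEC.matcher(segments.get(i)).matches()) {
                throw new IllegalArgumentException(
                        "segment " + i + " does not end with a format specifier");
            }
        }
        // Each specifier sits right where its hole was, so the joined segments
        // form an ordinary format string. Locale.ROOT keeps the output stable.
        return String.format(Locale.ROOT, String.join("", segments), values.toArray());
    }
}
```

For the FORMAT example above, the segments would be "Hello %s", ", I am %10d" and " years old.", with name and age as the values.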

Even ignoring the choice of format descriptor language, library methods like String::format often embody difficult choices, such as whether or not to use the currently selected Locale to format numeric quantities. Some users like the flexibility they get from such automatic localization; others resent the performance overhead of Locale processing. By exposing a mechanism by which users and libraries can implement their own formatters, users are not constrained by these choices made by libraries on their behalf -- there could be both locale-sensitive and locale-insensitive formatters for the same domain, and the user can choose the one they want.

Validation and normalization. SQL statements are often parameterized by some dynamic data value. Unfortunately, the data being injected is often tainted by user input. The JDBC framework includes builders for prepared statements, which sanitize inputs and compose the query in a SQL-aware manner:

PreparedStatement ps
    = connection.prepareStatement("SELECT * FROM Person p where p.last_name = ?");
ps.setString(1, name);

This will escape any ' characters in name and surround it with ' characters before performing the injection. If name is "Bobby", the resulting query will be SELECT * FROM Person p where p.last_name = 'Bobby'.

With a convenient string concatenation feature, it is sorely tempting to construct SQL queries with:

String query = "SELECT * FROM Person p where p.last_name = '$name'";
ResultSet rs = connection.createStatement().executeQuery(query);

Unfortunately, this now exposes the application to potentially disastrous SQL injection attacks unless name has been previously sanitized. Trading security for convenience is not a good trade.

We can get the best of both worlds with a SQL-specific policy object that performs the sanitization that PreparedStatement does, and more:

SQL databases generally follow a common set of rules around single-quotes, but some databases also have other supported forms of quotes. To the extent that a given database has its own nonstandard quoting rules, we would like to defend against attacks that exploit those as well. This means that we don't just need a SQL-specific policy object; we need a Connection-specific policy object, because the Connection comes from the JDBC driver for the specific database we're talking to.

While there are many API choices that JDBC might select, one might be to make Connection also be a policy object; then we could ask the connection to format the query directly:

var query = connection."SELECT * FROM \{table}";
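One way such a policy might work internally is to turn every hole into a bind marker and keep the values out of the SQL text entirely. The sketch below shows that idea under stated assumptions: SqlSketch, ParameterizedQuery, toQuery and prepare are hypothetical names, and only the prepareStatement and setObject calls are existing JDBC API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class SqlSketch {
    // Hypothetical result type: SQL text with '?' markers plus the values to bind.
    record ParameterizedQuery(String sql, List<Object> binds) {}

    // Every hole becomes a bind marker; the values never enter the SQL text.
    static ParameterizedQuery toQuery(List<String> segments, List<Object> values) {
        if (segments.size() != values.size() + 1) {
            throw new IllegalArgumentException("expected one more segment than values");
        }
        return new ParameterizedQuery(String.join("?", segments), values);
    }

    // Hands the query to JDBC. Needs a live database, so it is shown but not run here.
    static PreparedStatement prepare(Connection connection, ParameterizedQuery query)
            throws SQLException {
        PreparedStatement ps = connection.prepareStatement(query.sql());
        for (int i = 0; i < query.binds().size(); i++) {
            ps.setObject(i + 1, query.binds().get(i)); // JDBC parameters are 1-based
        }
        return ps;
    }
}
```

Bind markers only stand for values, so a hole that supplies an identifier, such as the table name in the example above, would need separate validation by the policy.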

Non-string results. One could easily imagine a JSON or XML library providing a similar level of quote discipline and injection protection in those domains (they are vulnerable to injection attacks too):

String s = JSON."""
                {
                   "a": \{a},
                   "b": \{b}
                }
                """;

The policy referred to by JSON would perform the proper validation of the format string, and quoting and escaping of the parameters a and b before composing the final string.

But do we even want to produce a string at all? Many JSON libraries allow us to represent JSON documents through a Json type; it might be more efficient for the JSON policy object to go directly to that representation rather than first constructing a (potentially large) string and then parsing the resulting string. While some policy objects will surely want to produce strings, there's no reason all of them must. Our policy interface can be parameterized by the type it returns, as TemplatePolicy<T> illustrates. So this JSON example could be:

Json j = JSON."""
              {
                 "a": \{a},
                 "b": \{b}
              }
              """;

which is more direct and potentially more efficient.

Another use for non-string results is when formatting messages for logging. Many logging calls are for debug information, and often debug logging is turned off. Many frameworks allow you to provide a Supplier<String> for log messages that is only invoked if the message is actually going to be logged, to avoid the overhead of formatting a string that is going to be thrown away. A lazy policy object could produce Supplier<String> rather than String itself.
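A lazy policy of this kind could be as small as the following sketch, which defers the string building until the supplier is asked for its value. LazySketch, lazy and renderCount are illustrative names, not part of the proposal.

```java
import java.util.List;
import java.util.function.Supplier;

class LazySketch {
    // Counts how often a message is actually built; for demonstration only.
    static int renderCount = 0;

    // Returns a supplier instead of a string; nothing is concatenated yet.
    static Supplier<String> lazy(List<String> segments, List<Object> values) {
        return () -> {
            renderCount++;
            StringBuilder sb = new StringBuilder(segments.get(0));
            for (int i = 0; i < values.size(); i++) {
                sb.append(values.get(i)).append(segments.get(i + 1));
            }
            return sb.toString();
        };
    }
}
```

The result can be passed to an existing method such as java.util.logging.Logger.fine(Supplier<String>). Note that in this sketch the embedded expressions have already been evaluated; only the formatting work is deferred.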

Localization. The examples so far have been about template processing enhanced with validation and transformation, but this can be taken further. The JDK contains APIs such as ResourceBundle to support localization of messages. A resource bundle is a mapping from key names to localizable templated strings. (These templated strings use a different format than String::format, in part because they must support changing the order of parameters as part of the localization process; the placeholder in the localized template contains the index of the corresponding parameter.)

If resource bundles had a TemplatePolicy, then they could use the format string as a key to look up the localized string, and then perform the template processing, all in one go:

String message = resourceBundle."error: file \{filename} not found";

which would have the effect of using the string "error: file \{} not found" as the key, mapping it to an appropriate localized error message for the current locale, reordering the parameters according to the {nn} holes in the localized messages, and formatting the result using the MessageFormat rules.
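The lookup-then-format step can be sketched with the existing ResourceBundle and MessageFormat classes. The sketch assumes the key is the template with U+FFFC at each hole and uses a hypothetical in-memory bundle; BundleSketch, FRENCH and localize are illustrative names.

```java
import java.text.MessageFormat;
import java.util.List;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleSketch {
    // Hypothetical in-memory bundle; each key is a template with U+FFFC at every hole.
    static final ResourceBundle FRENCH = new ListResourceBundle() {
        @Override protected Object[][] getContents() {
            return new Object[][] {
                { "error: file \uFFFC not found", "erreur : fichier {0} introuvable" },
                { "copied \uFFFC to \uFFFC", "vers {1}, copie de {0}" },
            };
        }
    };

    // Uses the template as the key, then lets MessageFormat place (and possibly
    // reorder) the values according to the indexed holes of the localized pattern.
    static String localize(ResourceBundle bundle, String template, List<Object> values) {
        String pattern = bundle.getString(template);
        return new MessageFormat(pattern, Locale.ROOT).format(values.toArray());
    }
}
```

The second entry shows the reordering: the localized pattern mentions parameter 1 before parameter 0.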

Description

The uncoupling of template description from template handling is correspondingly implemented as the new interfaces java.lang.TemplatedString and java.lang.TemplatePolicy. A TemplatedString instance captures the constituent parts of template and parameters garnered from a string literal or text block. A TemplatePolicy instance can be applied to a TemplatedString instance for validation and composition.

Templated strings

Instantiation of a templated string instance captures a template from the original string literal or text block by replacing expressions with placeholder characters. The templated string also captures parameters: the values that result from evaluating the embedded expressions.

Language changes for templated strings

A new escape sequence, \{, is introduced to indicate that a Java expression will follow. The expression continues until scanning encounters the corresponding } character. A string literal or text block is automatically reframed as a templated string if the literal contains a \{ escape sequence. A templated string literal goes through a secondary scan to extract expressions.

Examples:

// String literal
String s = "x + y = z";

// Templated string
int x = 10;
int y = 20;
TemplatedString ts = "\{x} + \{y} = \{x + y}";

Expressions must be valid. As with parameters to an invocation, expressions are evaluated left to right. The only limitation is that escape sequences cannot be used in embedded expressions. Example:

TemplatedString ts = "\{\tx}";

will produce an illegal character: '\' error.

String.translateEscapes is enhanced to translate all "\{...}" sequences to the Unicode OBJECT_REPLACEMENT_CHARACTER (\uFFFC). Thus, the template derived from a string literal or text block will have OBJECT_REPLACEMENT_CHARACTER placeholders where corresponding expressions existed. The compiler reports an error if a developer independently uses \uFFFC in a templated string.

A templated string can occur anywhere a string literal can occur. However, the type of a templated string is always an implementation of TemplatedString. A TemplatedString instance supplies the template and a values list derived from the embedded expressions. A developer may apply the predefined java.lang.TemplatePolicy.CONCAT template policy if a String built from concatenation is required.

TemplatedString

The primary API for TemplatedString is

public interface TemplatedString {
    public static final char OBJECT_REPLACEMENT_CHARACTER = '\uFFFC';

    String template();

    List<Object> values();

    List<String> segments();

    public static List<String> split(String string) {...}

    String concat();

...more
}

The OBJECT_REPLACEMENT_CHARACTER constant represents the character used as a placeholder.

The template() method returns the template containing placeholders.

The values() method returns the list of values that result from evaluating the embedded expressions.

The List<String> segments() method returns the template split at placeholders. Because this method can be called frequently in some policies (using StringBuilder), the list construction is guaranteed only to occur once.

segments() uses the specialized TemplatedString.split(String string) method instead of String.split to guarantee that edge cases (empty string and strings ending with OBJECT_REPLACEMENT_CHARACTER) still produce a list containing one more element than the number of expressions. That is, every expression has a corresponding segment before and after. This arrangement is significant to formatting policies that either prefix or suffix expressions with specifiers.
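The edge cases are easy to see in a small sketch. The split below is one plausible way to get the "one more segment than expressions" guarantee; it is an assumption about the behavior described, not the draft's implementation, and SplitSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    static final char OBJECT_REPLACEMENT_CHARACTER = '\uFFFC';

    // Splits at every placeholder and keeps empty leading and trailing segments,
    // so the result always has (number of placeholders + 1) elements.
    static List<String> split(String template) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        int hole;
        while ((hole = template.indexOf(OBJECT_REPLACEMENT_CHARACTER, start)) >= 0) {
            segments.add(template.substring(start, hole));
            start = hole + 1;
        }
        segments.add(template.substring(start)); // text after the last placeholder
        return List.copyOf(segments);
    }
}
```

By contrast, the one-argument String.split drops trailing empty strings, so a template consisting of a single placeholder would yield an empty array instead of two empty segments.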

concat() returns the simple but optimal concatenation of the segments with interleaved values. For TemplatedString instances constructed by the compiler, concat() is guaranteed to have better performance than building the string in Java code (e.g., with StringBuilder) and can be used by policies that merely want to post-transform the result of concatenation.

There are additional convenience methods that are part of the TemplatedString API that are contextually described in later sections.

Templated string code generation

Code generation for templated strings utilizes the equivalent of anonymous classes to capture the context of the embedded expressions.

Example:

int x = 10;
int y = 20;

TemplatedString ts = "Adding \{x} plus \{y} equals \{x + y}.";

lowers to the equivalent of:

int x = 10;
int y = 20;

TemplatedString ts = new TemplatedString() {
    private static final String TEMPLATE = "Adding \uFFFC plus \uFFFC equals \uFFFC.";

    private static final List<String> SEGMENTS = TemplatedString.split(TEMPLATE);

    @Override public String template() { return TEMPLATE; }

    @Override public List<Object> values() { return List.of(x, y, x + y); }

    @Override public List<String> segments() { return SEGMENTS; }

    @Override public String concat() {
        return "Adding " + x + " plus " + y + " equals " + (x + y) + ".";
    }

    ...
};

Template policy

A template policy embodies the rules for the validation and composition of a result based on the inputs provided by a templated string.

Template policy language changes

Method invocation (JLS 15.12) is extended to allow a templated string, string literal or text block on the RHS of a dot separator and an instance of TemplatePolicy on the LHS. This variation of method invocation is termed apply template policy. If the RHS is a string literal or a text block, the literal is reframed as a templated string with no placeholders and no values.

Example:

String ts = CONCAT."Adding \{x} plus \{y} equals \{x + y}.";

is equivalent to

String ts = CONCAT.apply("Adding \{x} plus \{y} equals \{x + y}.");
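Since the new syntax does not compile on existing Java releases, the equivalence can be tried out with hand-written stand-ins. In the sketch below the two interfaces are local, reduced copies of the proposed ones, the anonymous class plays the role of the compiler-generated templated string, and ApplySketch and demo are illustrative names.

```java
import java.util.List;

// Local stand-ins for the proposed interfaces, reduced to what this sketch needs.
interface TemplatedString {
    List<String> segments();
    List<Object> values();
}

interface TemplatePolicy<R, E extends Throwable> {
    R apply(TemplatedString templatedString) throws E;
}

class ApplySketch {
    // A concatenating policy in the spirit of CONCAT.
    static final TemplatePolicy<String, RuntimeException> CONCAT = ts -> {
        StringBuilder sb = new StringBuilder(ts.segments().get(0));
        for (int i = 0; i < ts.values().size(); i++) {
            sb.append(ts.values().get(i)).append(ts.segments().get(i + 1));
        }
        return sb.toString();
    };

    static String demo(int x, int y) {
        // Hand-written equivalent of "Adding \{x} plus \{y} equals \{x + y}."
        TemplatedString ts = new TemplatedString() {
            @Override public List<String> segments() {
                return List.of("Adding ", " plus ", " equals ", ".");
            }
            @Override public List<Object> values() {
                return List.of(x, y, x + y);
            }
        };
        // The apply template policy form is just this call.
        return CONCAT.apply(ts);
    }
}
```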

TemplatePolicy

The primary API for TemplatePolicy is:

public interface TemplatePolicy<R, E extends Throwable> {

    R apply(TemplatedString templatedString) throws E;

... more
}

A TemplatePolicy's apply method is responsible for validating inputs from the templated string and then composing, possibly transforming, a result. The R parameter allows the policy developer to specify the result type. The E parameter indicates the thrown exception type when the policy validation fails. RuntimeException can be used to suggest that the apply method throws no exceptions or only throws unchecked exceptions.

Example:

import java.util.Iterator;
import java.util.List;

class SimplePolicy implements TemplatePolicy<String, IllegalArgumentException> {
    @Override
    public String apply(TemplatedString templatedString) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> segmentsIter = templatedString.segments().iterator();
        List<Object> values = templatedString.values();

        // Interleave: each value is preceded by its segment.
        for (Object value : values) {
            sb.append(segmentsIter.next());

            // Validation step: reject values this policy does not accept.
            if (value instanceof Boolean) {
                throw new IllegalArgumentException("I don't like Booleans");
            }

            sb.append(value);
        }

        // There is always one more segment than there are values.
        sb.append(segmentsIter.next());

        return sb.toString();
    }
}
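SimplePolicy uses an unchecked exception for E. When E is a checked type instead, every application of the policy has to catch or declare it. The sketch below illustrates this with local stand-ins for the two interfaces; TemplateValidationException, CheckedSketch, NO_NULLS, of and describe are all hypothetical names.

```java
import java.util.Collections;
import java.util.List;

// Local stand-ins for the proposed interfaces, reduced to what this sketch needs.
interface TemplatedString {
    List<String> segments();
    List<Object> values();
}

interface TemplatePolicy<R, E extends Throwable> {
    R apply(TemplatedString templatedString) throws E;
}

// Hypothetical checked exception reported when validation fails.
class TemplateValidationException extends Exception {
    TemplateValidationException(String message) { super(message); }
}

class CheckedSketch {
    // E is a checked type here, so callers must catch or declare it.
    static final TemplatePolicy<String, TemplateValidationException> NO_NULLS = ts -> {
        StringBuilder sb = new StringBuilder(ts.segments().get(0));
        for (int i = 0; i < ts.values().size(); i++) {
            Object value = ts.values().get(i);
            if (value == null) {
                throw new TemplateValidationException("value " + i + " is null");
            }
            sb.append(value).append(ts.segments().get(i + 1));
        }
        return sb.toString();
    };

    // Hand-built templated string standing in for compiler output.
    static TemplatedString of(List<String> segments, List<Object> values) {
        return new TemplatedString() {
            @Override public List<String> segments() { return segments; }
            @Override public List<Object> values() { return values; }
        };
    }

    static String describe(Object value) {
        try {
            return NO_NULLS.apply(of(List.of("value = ", "."), Collections.singletonList(value)));
        } catch (TemplateValidationException e) {
            return "rejected: " + e.getMessage();
        }
    }
}
```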

Use of invokedynamic

A drawback of using a TemplatedString, as described above, is that values must be boxed and added to a list on each call to values(). To avoid this performance penalty, the code generation for apply template policy uses invokedynamic instead of invokeinterface. This gives the policy an opportunity to construct a MethodHandle providing an optimal implementation.
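To give a feel for what a boxing-free MethodHandle can look like, the sketch below links one through the existing java.lang.invoke.StringConcatFactory bootstrap method, which javac already uses for the + operator on strings. This is only an analogy for the kind of handle a policy might construct; the linkage protocol for template policies is not described here, and IndySketch and its members are illustrative names.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;

class IndySketch {
    // Handle of type (int, int, int) -> String; the ints are never boxed.
    private static final MethodHandle ADDING = link();

    private static MethodHandle link() {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "adding", // the name is not significant for this bootstrap
                    MethodType.methodType(String.class, int.class, int.class, int.class),
                    "Adding \1 plus \1 equals \1."); // U+0001 marks each argument
            return site.getTarget();
        } catch (StringConcatException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String adding(int x, int y) {
        try {
            // invokeExact requires the call shape to match the handle type exactly.
            return (String) ADDING.invokeExact(x, y, x + y);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```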

Simple template policies

Two factory methods are provided to simplify the creation of composing or transforming policies.

public interface TemplatePolicy<R, E extends Throwable> {
...
    public static <R> TemplatePolicy<R, RuntimeException>
            ofComposed(BiFunction<List<String>, List<Object>, R> policy) { ... }

    public static <R> TemplatePolicy<R, RuntimeException>
            ofTransformed(Function<String, R> policy) { ... }
...
}

TemplatePolicy.ofComposed is invoked providing a lambda expecting a segments list and a values list.

var composer = TemplatePolicy.ofComposed((segments, values) -> { ... });

TemplatePolicy.ofTransformed is invoked providing a lambda expecting the String result of basic concatenation.

var json = TemplatePolicy.ofTransformed(JSONObject::new);

Policies generated using these factories will have an apply return type matching the return type of the provided lambda.
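The two factories could plausibly be implemented as thin adapters over apply, as in the following sketch. The interfaces are local stand-ins carrying only the members used here, and the factory bodies, the Sample record, and FactorySketch are assumptions for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.BiFunction;
import java.util.function.Function;

// Local stand-ins for the proposed interfaces.
interface TemplatedString {
    List<String> segments();
    List<Object> values();
    String concat();
}

interface TemplatePolicy<R, E extends Throwable> {
    R apply(TemplatedString templatedString) throws E;

    // Assumed implementation: hand the segments and values to the lambda.
    static <R> TemplatePolicy<R, RuntimeException>
            ofComposed(BiFunction<List<String>, List<Object>, R> policy) {
        return ts -> policy.apply(ts.segments(), ts.values());
    }

    // Assumed implementation: concatenate first, then transform the string.
    static <R> TemplatePolicy<R, RuntimeException>
            ofTransformed(Function<String, R> policy) {
        return ts -> policy.apply(ts.concat());
    }
}

class FactorySketch {
    // Hand-built templated string standing in for compiler output.
    record Sample(List<String> segments, List<Object> values) implements TemplatedString {
        @Override public String concat() {
            StringBuilder sb = new StringBuilder(segments.get(0));
            for (int i = 0; i < values.size(); i++) {
                sb.append(values.get(i)).append(segments.get(i + 1));
            }
            return sb.toString();
        }
    }

    static final TemplatedString TS =
            new Sample(List.of("x = ", ", y = ", ""), List.of(10, 20));

    static int countValues() {
        TemplatePolicy<Integer, RuntimeException> count =
                TemplatePolicy.ofComposed((segments, values) -> values.size());
        return count.apply(TS);
    }

    static String shout() {
        TemplatePolicy<String, RuntimeException> upper =
                TemplatePolicy.ofTransformed(s -> s.toUpperCase(Locale.ROOT));
        return upper.apply(TS);
    }
}
```

In both cases the result type of the policy follows from the lambda: Integer for the composed policy and String for the transformed one.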

Predefined template policies

Two predefined template policies are supplied to simplify common scenarios.

import static java.lang.TemplatePolicy.CONCAT;

...

int x = 10;
int y = 20;

String s = CONCAT."Adding \{x} plus \{y} equals \{x + y}.";

TemplatePolicy.CONCAT is a policy that returns the optimal concatenation of the segments with interleaved values. The CONCAT example above is equivalent to:

String ts = "Adding " + x + " plus " + y + " equals " + (x + y) + ".";

Developers should use CONCAT over the previously mentioned concat method since the CONCAT policy makes the intended use clear up front. The necessity for this will become more evident as more template policies become available.

java.util.FormatterPolicy is a template policy that interprets format specifiers that precede values. The specifier format used is the same as defined in the class java.util.Formatter.

Example:

FormatterPolicy fmt = new FormatterPolicy();

for (int i = 1; i <= 10000; i *= 10) {
    String s = fmt."This answer is %5d\{i}";
    System.out.println(s);
}

Output:

This answer is     1
This answer is    10
This answer is   100
This answer is  1000
This answer is 10000

It is also possible to specify a locale:

FormatterPolicy fmt = new FormatterPolicy(Locale.CANADA);

There is also a predefined FormatterPolicy that uses the current locale.

import static java.util.FormatterPolicy.FORMAT;

...

String s = FORMAT."This answer is %5d\{i}";

It is also possible to use a templated string as the only argument of an existing format method.

System.out.format("This answer is %5d\{i}");

Alternatives

Not having a prefix apply policy invocation syntax was considered. This decision would mean that the policy would have to be applied using a suffix method call.

String s = "This answer is %5d\{i}".apply(FORMAT);

Prefixing the templated string is preferable for the simple reason that the policy is clear up front. Template policy should not be an afterthought.

It is reasonable to consider letting a templated string expression without a policy evaluate as basic concatenation. However, this choice would contradict the goal of being safe against vulnerabilities. Defaulting to a concatenation policy would make it too easy for developers to circumvent protective policies. Always requiring a policy ensures that the developer has, at least, thought about the templated string's circumstances.

Use of ${...} expression delimiters was considered, but would require special string tagging (a prefix or new quotes) to avoid conflicts with legacy code. As mentioned in the Motivation, we want to avoid creating additional string types.

Use of \[...] and \(...) was considered, but these were thought to make the content of the expression harder to read. Braces, however, are not allowed in template expressions.

Testing

Full coverage testing of new APIs. Combination testing of expressions similar to expression tests used elsewhere.

Risks and Assumptions

The implementation of java.util.FormatterPolicy depends heavily on java.util.Formatter and requires significant rewriting of established code. An independent effort might be needed.