Design Goals

The primary design goals for the project were to provide a single, easy-to-read and testable grammar for the Java programming language and, when integrating that grammar into javac, to have minimal impact on the rest of javac.

Minimal impact

To see how parsing fits into the general javac architecture, see this Compilation Overview for javac. In that overview, the term "parse" is used for the general process of converting source text into syntax trees.

javac flow

It was a requirement that integrating Java.g into javac should have minimal impact on the rest of javac. This meant that the generated parser had to create standard javac syntax trees and interact with the standard javac error reporting mechanisms. In the context of the Compilation Overview, only the "parse" part of the pipeline is updated to use Java.g. The rest of the pipeline (entering classes into the symbol tables, performing annotation processing, analysing the program and generating class files) is unchanged.

If nothing else, this requirement made it possible to test the parser by comparing the trees generated by the ANTLR parser with those generated by the standard parser. With a few limited exceptions, the trees generated by the two parsers are identical. The differences arise either from limitations in the standard parser (for example, it does not maintain accurate source positions for every node in an expression) or from premature optimizations such as string folding (for example, combining "a" + "b" into "ab" directly in the parser). Both may be seen as issues arising from javac's original use as (just) a batch compiler, in which high fidelity between the source text and the syntax tree was not a primary requirement. The more tools there are that process syntax trees, the greater the demand for fidelity.
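The comparison described above can be sketched as a structural walk over two trees. This is only an illustration: Node and TreeDiff are hypothetical, much simplified stand-ins and not javac's tree classes, and the kind names are invented for the example. The sketch reports the first place where two trees disagree, using the string-folding case as input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified tree node; javac's real syntax trees are far richer.
final class Node {
    final String kind;          // invented kind name, e.g. "PLUS" or "STRING_LITERAL"
    final String value;         // literal text or identifier name; may be null
    final List<Node> children;

    Node(String kind, String value, Node... children) {
        this.kind = kind;
        this.value = value;
        this.children = Arrays.asList(children);
    }
}

final class TreeDiff {
    // Returns null if the trees are structurally identical,
    // otherwise a description of the first mismatch found.
    static String firstDifference(Node a, Node b, String path) {
        if (!a.kind.equals(b.kind)) {
            return path + ": kind " + a.kind + " vs " + b.kind;
        }
        if (!Objects.equals(a.value, b.value)) {
            return path + ": value " + a.value + " vs " + b.value;
        }
        if (a.children.size() != b.children.size()) {
            return path + ": " + a.children.size() + " vs "
                    + b.children.size() + " children";
        }
        for (int i = 0; i < a.children.size(); i++) {
            String diff = firstDifference(
                    a.children.get(i), b.children.get(i), path + "/" + i);
            if (diff != null) {
                return diff;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "a" + "b" kept as written in the source text.
        Node unfolded = new Node("PLUS", null,
                new Node("STRING_LITERAL", "a"),
                new Node("STRING_LITERAL", "b"));
        // The same expression after string folding in the parser.
        Node folded = new Node("STRING_LITERAL", "ab");

        System.out.println(firstDifference(unfolded, unfolded, "root"));
        System.out.println(firstDifference(unfolded, folded, "root"));
    }
}
```

A real comparison would also need to decide how to treat the known exceptions, such as source positions, which this sketch ignores entirely.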

There was one minor way in which the design was allowed to impact the high-level compiler architecture. Originally, the scanner and parser were separate steps in the actual compiler pipeline. An early milestone with Java.g was to integrate it into that pipeline using the standard javac lexer. However, to accommodate the possibility of using an ANTLR lexer as well, the scanner and parser steps were combined in the pipeline into a single step that reads files and returns syntax trees. For the standard javac, the responsibility for using the standard scanner has been delegated to the standard parser, while the ANTLR-based javac uses both the lexer and the parser generated by ANTLR from Java.g.

As an implementation detail, javac was changed so that the Parser class was replaced by a Parser interface with different implementations for the original standard parser and for Java.g. At least in principle, this should also make it easier to plug in additional implementations in future, should that be desirable. The change to use a Parser interface is available as part of the standard OpenJDK version of javac.
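The shape of this change can be illustrated with a minimal sketch. All of the names here (ToyTree, ToyParser, HandWrittenToyParser, GeneratedToyParser, ToyPipeline) are hypothetical and do not reflect javac's actual Parser interface or its signatures. The "tree" is reduced to a list of whitespace-separated tokens so that the example stays self-contained. The point is only that each implementation owns its own scanning, and the rest of the pipeline sees a single step from source text to tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Toy stand-in for a syntax tree: just the sequence of tokens read.
final class ToyTree {
    final List<String> tokens;

    ToyTree(List<String> tokens) {
        this.tokens = tokens;
    }
}

// Hypothetical interface: a single step from source text to a tree.
interface ToyParser {
    ToyTree parse(String sourceText);
}

// Stands in for the hand-written parser, which drives its own scanner.
final class HandWrittenToyParser implements ToyParser {
    @Override
    public ToyTree parse(String sourceText) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer scanner = new StringTokenizer(sourceText);
        while (scanner.hasMoreTokens()) {
            tokens.add(scanner.nextToken());
        }
        return new ToyTree(tokens);
    }
}

// Stands in for a lexer and parser pair generated from a grammar.
final class GeneratedToyParser implements ToyParser {
    @Override
    public ToyTree parse(String sourceText) {
        String trimmed = sourceText.trim();
        List<String> tokens = new ArrayList<>();
        if (!trimmed.isEmpty()) {
            tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return new ToyTree(tokens);
    }
}

final class ToyPipeline {
    // The only place that knows which implementation is in use.
    static ToyParser select(boolean useGenerated) {
        if (useGenerated) {
            return new GeneratedToyParser();
        }
        return new HandWrittenToyParser();
    }

    // Later phases depend only on the interface and the tree it returns.
    static int countTokens(ToyParser parser, String sourceText) {
        return parser.parse(sourceText).tokens.size();
    }

    public static void main(String[] args) {
        String source = "class A { }";
        System.out.println(countTokens(select(false), source));
        System.out.println(countTokens(select(true), source));
    }
}
```

Because both implementations return the same tree type, the later phases do not need to know which parser produced it.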

One clear and testable Java grammar

javac embodies a grammar which is tested, but which only exists in javac. The JLS has grammars which are publicly accessible but not testable. As a result, it was a goal to try to unify these in a grammar which was both testable and readable.

The requirement to integrate with javac's standard syntax trees and error reporting mechanisms meant that the original Java.g had to be augmented with actions to create javac tree nodes and to report javac errors. For those wishing to (just) read the grammar, these actions can be at best a distraction and at worst an obstacle when using ANTLRWorks to browse through the grammar or to create a printable PDF of just the grammar rules.

It would be possible to maintain a separate grammar for use with ANTLRWorks and similar tools, but this could cause maintenance issues later on. Instead, the grammar is decorated with comments that identify the code required for javac integration. This makes it easy to remove such code and derive a stripped-down version of the file suitable for use with ANTLRWorks, while being sure that the derived file is still fundamentally the same as the grammar used by and tested within javac itself.
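As a rough sketch of how such a derivation might work, the following filter drops every region enclosed by a pair of marker comments. The marker strings "// BEGIN javac" and "// END javac" are assumptions made for this example. The actual comment convention used in Java.g is not described here and may differ. The sample rule is likewise invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GrammarStripper {
    // Hypothetical markers; the real decoration used in Java.g may differ.
    static final String BEGIN = "// BEGIN javac";
    static final String END = "// END javac";

    // Returns the input lines with every marked region, and the markers
    // themselves, removed. Unmarked lines are passed through unchanged.
    static List<String> strip(List<String> lines) {
        List<String> kept = new ArrayList<>();
        boolean skipping = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.equals(BEGIN)) {
                skipping = true;
            } else if (trimmed.equals(END)) {
                skipping = false;
            } else if (!skipping) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // An invented rule with an embedded action, for illustration only.
        List<String> decorated = Arrays.asList(
                "expression",
                "    : term ('+' term)*",
                "    // BEGIN javac",
                "    { /* build a tree node here */ }",
                "    // END javac",
                "    ;");
        for (String line : strip(decorated)) {
            System.out.println(line);
        }
    }
}
```

Deriving the readable file mechanically from the tested one, rather than editing it by hand, is what keeps the two from drifting apart.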