The parser currently in the javac compiler is a hand-written recursive-descent parser. It is somewhat fragile, and it is not always easy to extend when prototyping potential new language features. It is also poorly suited to analysis, such as comparison against the grammar rules in the Java Language Specification (JLS).
There is a separate but related problem with the JLS itself, which contains two slightly different grammars. One is the "exposition grammar" used throughout the body of the book in chapters 1-17. The other is the "reference grammar" in chapter 18, which is nominally suitable as the basis for an implementation. The two differ in a number of places, and neither matches exactly what the javac compiler does. A subsidiary goal is therefore to understand the differences among these three formulations, with a view to aligning them and being able to test the result formally.
The project is available in the Mercurial repositories at:
Within that forest, all the changes to the javac compiler are in
The original Java.g ANTLR grammar has been integrated into the javac compiler. The resulting compiler passes all the compiler regression tests and all the relevant JCK tests for a Java compiler. The grammar has also been marked up with comments so that the javac-specific parts (such as the actions that build javac AST nodes) can be stripped out, leaving a grammar that can be used with tools such as ANTLRWorks.
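The exact comment markup used in Java.g is not described here, so the following is only a minimal sketch of how such stripping could work. It assumes hypothetical marker lines `// javac:begin` and `// javac:end` that bracket javac-specific regions, and it uses a hypothetical class name `StripJavacMarkup`; the real markup and tooling may differ. It removes each bracketed region, markers included, and rejects nested, unmatched, or unterminated markers.

```java
import java.util.ArrayList;
import java.util.List;

public class StripJavacMarkup {
    // Hypothetical marker comments; the real Java.g markup may differ.
    static final String BEGIN = "// javac:begin";
    static final String END = "// javac:end";

    /** Returns the grammar lines with every BEGIN..END region (markers included) removed. */
    static List<String> strip(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inJavacRegion = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.equals(BEGIN)) {
                if (inJavacRegion) {
                    throw new IllegalArgumentException("nested " + BEGIN);
                }
                inJavacRegion = true;
            } else if (t.equals(END)) {
                if (!inJavacRegion) {
                    throw new IllegalArgumentException("unmatched " + END);
                }
                inJavacRegion = false;
            } else if (!inJavacRegion) {
                out.add(line); // keep tool-neutral grammar text
            }
        }
        if (inJavacRegion) {
            throw new IllegalArgumentException("unterminated " + BEGIN);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative rule; the action body stands in for javac AST construction.
        List<String> grammar = List.of(
            "expression",
            "    :   conditionalExpression",
            "// javac:begin",
            "        { /* javac-specific AST construction */ }",
            "// javac:end",
            "    ;");
        strip(grammar).forEach(System.out::println);
    }
}
```

Run on the sample rule, this prints the rule without its action block. The result is the kind of tool-neutral grammar text that could then be loaded into a tool such as ANTLRWorks.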
This Project is sponsored by the Compiler Group.