Xantlr, Representing the Result of LA-LL(k) Parsing as a Sequence of SAX Events
(related API documentation: package xantlr, package xantlrtdom )
1 Xantlr's Basic Functionality
2 Specifying the Generated SAX Events
2.1 Normalization of Content Models
3 Running Xantlr
3.1 Messages and Unreachable Nonterminals
3.2 antlrC Grammar Inheritance and Xantlr
4 Running the Resulting Compiler
5 Notes on the Implementation
Xantlr can be thought of as a preprocessor to the famous antlrC tool: it automatically inserts certain "semantic actions" into the grammar definition. When the parser generated by antlrC runs, these actions cause the emission of XML SAX events ([Sax04]) that represent the abstract syntax tree resulting from the parsing process.
Furthermore, an XML DTD is generated by Xantlr, which exactly defines the structure of the generated sequence of SAX events.
The advantages of this technique are ...
In nearly all cases we pipe the output of an Xantlr parser into a Tdom-generated model for further visitor-based processing.
More details on all these issues can be found in [tlw01a].
Please also refer to the antlr documentation of the version employed by Xantlr ([antlr2doc]).
Basically, Xantlr emits a sequence of SAX Events which represent the "abstract syntax tree" (or "AST") of the recognized parsing result. An AST is a thinned-out version of the parse tree, in which redundant front-end non-terminals do not appear anymore.
Every inner node in the AST is represented as an XML element. Its subnodes are represented as this element's content, respecting sequential order.
Every leaf node that corresponds to a recognized terminal is by default not represented in the output data at all, but can be configured to appear in the SAX output stream with its character data.
For controlling the kind of SAX events generated, new options have been added to the existing antlr options.
The definition of rule options from antlr is extended by two new options, each of which may be given at most once for each non-terminal in the parser definition:
rule_options ::= ... sax_event_type xml_tag
sax_event_type ::= xmlNodeType = ( pcdata | content | entity | abstract )
xml_tag ::= xmlNodeName = stringValue
The effect of these options is ...
The regular expressions as they appear in the non-terminal rules of the original Xantlr grammar and as they appear in the resulting DTD may differ subtly. (These differences are especially important if the DTD is fed into a Tdom model, because name mangling will be affected.)
You should always read the resulting DTD carefully.
The first issue is that alternatives with empty content ("epsilon") cannot be expressed in a DTD, but have to be modelled by modifying neighbours or parents. So a standard transformation is of type ...
A | #eps | B --> (A | B)?
Secondly, there is an important and widespread simplifying transformation which corresponds to the transition from the front-end representation ("parse tree") to the semantics ("AST"):
Consider e.g. the front-end syntax definition ...
parameterlist ::= ( parameter ( "," parameter ) * )?
This is a typical case in the parsing of programming languages: you can enter no "parameter" at all, or just one, or more than one, separated by a front-end token which does not appear in the semi-AST.
The DTD content model corresponding verbatim to the parser grammar is ...
parameterlist ::= ( parameter ( parameter ) * )?
In Xantlr, each regular (sub-)expression of the form "X X*" is rewritten to "X+". So we get ...
parameterlist ::= ( parameter + )?
Each regular (sub-)expression of the form "(X+)?" is rewritten to "X*". So we finally get ...
parameterlist ::= parameter *
So we get the most convenient form for further processing.
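The three rewriting rules of this section can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the expression tree (Name, Eps, Seq, Alt, Star, Plus, Opt) and the class ContentModelNormalizer are hypothetical names invented for this sketch and are not the classes Xantlr uses internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ContentModelNormalizer {
    // Hypothetical expression tree for content models; NOT Xantlr's own classes.
    interface Expr {}
    record Name(String name) implements Expr {}
    record Eps() implements Expr {}
    record Seq(List<Expr> items) implements Expr {}
    record Alt(List<Expr> items) implements Expr {}
    record Star(Expr body) implements Expr {}
    record Plus(Expr body) implements Expr {}
    record Opt(Expr body) implements Expr {}

    // "(X+)?" becomes "X*"; any other body simply becomes optional.
    static Expr optional(Expr body) {
        return body instanceof Plus p ? new Star(p.body()) : new Opt(body);
    }

    static Expr normalize(Expr e) {
        if (e instanceof Alt a) {
            // "A | #eps | B" becomes "(A | B)?": drop the empty alternative
            // and make the remaining alternatives optional.
            List<Expr> kept = new ArrayList<>();
            for (Expr item : a.items()) {
                if (!(item instanceof Eps)) kept.add(normalize(item));
            }
            if (kept.isEmpty()) return new Eps();
            Expr rest = kept.size() == 1 ? kept.get(0) : new Alt(kept);
            return kept.size() < a.items().size() ? optional(rest) : rest;
        }
        if (e instanceof Seq s) {
            List<Expr> out = new ArrayList<>();
            for (Expr item : s.items()) {
                Expr n = normalize(item);
                // "X X*" becomes "X+": merge a star with an identical predecessor.
                if (n instanceof Star st && !out.isEmpty()
                        && out.get(out.size() - 1).equals(st.body())) {
                    out.set(out.size() - 1, new Plus(st.body()));
                } else {
                    out.add(n);
                }
            }
            return out.size() == 1 ? out.get(0) : new Seq(out);
        }
        if (e instanceof Opt o) return optional(normalize(o.body()));
        if (e instanceof Star st) return new Star(normalize(st.body()));
        if (e instanceof Plus p) return new Plus(normalize(p.body()));
        return e; // Name and Eps are already in normal form
    }

    // Renders the tree in DTD-like notation.
    static String render(Expr e) {
        if (e instanceof Name n) return n.name();
        if (e instanceof Eps) return "#eps";
        if (e instanceof Seq s) return join(s.items(), ", ");
        if (e instanceof Alt a) return join(a.items(), " | ");
        if (e instanceof Star st) return render(st.body()) + "*";
        if (e instanceof Plus p) return render(p.body()) + "+";
        if (e instanceof Opt o) return render(o.body()) + "?";
        throw new IllegalArgumentException("unknown node: " + e);
    }

    static String join(List<Expr> items, String sep) {
        return items.stream().map(ContentModelNormalizer::render)
                .collect(Collectors.joining(sep, "(", ")"));
    }

    public static void main(String[] args) {
        Expr p = new Name("parameter");
        // ( parameter ( parameter )* )?
        Expr list = new Opt(new Seq(List.of(p, new Star(p))));
        System.out.println(render(normalize(list))); // prints parameter*
        // A | #eps | B
        Expr alt = new Alt(List.of(new Name("A"), new Eps(), new Name("B")));
        System.out.println(render(normalize(alt)));  // prints (A | B)?
    }
}
```

The sketch works bottom-up, so the "X X*" rule fires before the enclosing "(X+)?" rule, which is the order used in the parameterlist example above.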
In principle, you should always read the DTD generated by Xantlr carefully and, whenever a program evolves, consider how changes in the grammar definition file affect the DTD.
Xantlr is applied to a grammar source file exactly like the underlying antlr tool, but with the meta_tools classes preceding the antlrC classes in the classpath:
/usr/bin/java -classpath metatools.jar:antlr.jar antlr.Tool mygrammar.g
This will generate the sources for the parser, lexer, vocabulary etc. as usual with antlrC ("MyParser.java", "MyLexer.java", "MyTokenTypes.java", "MyTokenTypes.txt", etc.).
The generated source text for the parser is enhanced with the above-mentioned SAX event generating code.
Additionally, a DTD file named "MyParser.dtd" is generated for each parser in the grammar source file.
In addition to the known progress messages and error messages generated by antlrC, Xantlr outputs the following:
In principle the Xantlr mechanisms are compatible with the inheritance mechanism defined by antlrC .
Please note that if you have a grammar "X extends Y", the parsers and lexers belonging to "X" normally must be produced, even if they are not used themselves. Mostly the generated files "X_parser_TokenTypes.java" and/or "X_parser_TokenTypes.txt" are required.
Each compiler generated by Xantlr is derived from a base class called <METATOOLS>/xantlr/runtime/X_LLkParser.
This class is an extension of the original antlr-class "LLkParser", and has two additional fields, which hold the receivers (1) of the generated events, and (2) of all error messages.
The first field must be set to an object of type <METATOOLS>/xantlr/runtime/EventGenerator. This interface offers the methods which are called by the automatically inserted semantic actions, namely EventGenerator.startElement(tag), EventGenerator.endElement(tag), etc.
An implementation of this interface has to be set by calling X_LLkParser.setEventGenerator() . Currently this is always an instance of SAXEventGenerator, which maps the start/end-calls mentioned above to the corresponding SAX events.
A SAXEventGenerator must in turn be linked to the target of these SAX events by calling SAXEventGenerator.setContentHandler(org.xml.sax.ContentHandler).
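Any implementation of the standard org.xml.sax.ContentHandler interface can serve as such a target. The following minimal sketch collects the incoming element events as an indented outline of the abstract syntax tree. The class name AstOutlineHandler is hypothetical, and the events are fired by hand in main() so that the example runs without Xantlr; in real use the SAXEventGenerator would make these calls. Whether the element name arrives in qName or in localName is an assumption here, so the sketch accepts either.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example handler; not part of meta_tools.
public class AstOutlineHandler extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: use qName if present, otherwise fall back to localName.
        String name = qName.isEmpty() ? localName : qName;
        lines.add("  ".repeat(depth) + name);
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data of terminals that were configured to appear in the output.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            lines.add("  ".repeat(depth) + '"' + text + '"');
        }
    }

    public List<String> outline() {
        return lines;
    }

    public static void main(String[] args) {
        AstOutlineHandler h = new AstOutlineHandler();
        Attributes none = new AttributesImpl();
        // Events fired by hand; a SAXEventGenerator would produce them while parsing.
        h.startElement("", "parameterlist", "parameterlist", none);
        h.startElement("", "parameter", "parameter", none);
        h.characters("x".toCharArray(), 0, 1);
        h.endElement("", "parameter", "parameter");
        h.endElement("", "parameterlist", "parameterlist");
        h.outline().forEach(System.out::println);
    }
}
```

An instance of such a handler would be passed to setContentHandler() in place of a printing handler.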
For the error messages one has to call X_LLkParser.setMessageReceiver() .
antlr itself has an option "defaultErrorHandler", which can be set "for an entire (.g) file, ...for a grammar, ...for a rule", as explained in http://www.antlr2.org/doc/options.html#File, Grammar, and Rule. The option itself is described in http://www.antlr2.org/doc/options.html#defaultErrorHandler
When set to false, antlrC throws some exceptions; when set to true, these are caught and fed into the methods defined in the base class antlr/Parser (reportError(String), reportError(RecognitionException), reportWarning(String)), which simply print to System.err. These methods are redefined in <METATOOLS>/xantlr/runtime/X_LLkParser to generate correct message objects and to send these to the above-mentioned message receiver.
Note 1: reportWarning() can be called explicitly from any of the user-defined rules by antlr semantic actions.
Note 2: You can substitute a class of your own as the base class of the generated parser, as defined in the last paragraph of http://www.antlr2.org/doc/metalang.html#Parser Class Definitions, by starting the grammar file with the declaration
class MyParser extends Parser("path.of.my.own.ParserBaseClass");
But in case of Xantlr, your "ParserBaseClass" must also derive from xantlr.X_LLkParser.
When an Xantlr-generated parser is to be run, it must not be connected directly to an antlrC-generated lexer, but via an intermediate HistoryToken.
Assuming that myInStream is a Java InputStream object which delivers the text to parse, and MyParser and MyLexer are the names of the generated classes, the following code will print the abstract syntax tree on the terminal:
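// The lexer is wrapped by HistoryToken.chain() before being handed to the parser.
MyLexer lexer = new MyLexer(myInStream);
MyParser parser = new MyParser(HistoryToken.chain(lexer));
// Maps the parser's start/end calls to SAX events.
SAXEventGenerator gen = new SAXEventGenerator(parser);
// Receivers for error messages and for the generated SAX events.
parser.setMessageReceiver(new MessagePrinter(new PrintWriter_flushing(System.err)));
gen.setContentHandler(new ContentPrinter(new PrintWriter_flushing(System.err), false, false));
// Start parsing at the top-level rule.
parser.topNonTerminal();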
(Please note: Whenever the result of the parsing process is to be fed into Tdom, there is a gluing class <METATOOLS>/xantlrtdom/XantlrTdom, which does all this plugging automatically. See the section on the co-operation of xantlr and tdom.)
Currently Xantlr works as a modification of antlrC only in version 2.7.4.
Since this version can no longer be found on the net, we keep a copy at http://bandm.eu/software/mirror/index.html
In the original antlrC implementation, the semantic actions directly called modification methods of a (non-specified) automaton, which plugs together the generated parser code.
From the sources in antlr.g we created xantlr.g, splitting up this process into two phases:
Because the latter is named identically to the parser of the original antlrC implementation, this splitting operation is transparent to the rest of the antlrC code, which can therefore be executed as usual.
On this intermediate model the newly defined Xantlr modifications can be implemented in a clean and maintainable way. They include (1) interpreting the newly defined options, (2) translating them into semantic actions (i.e. Java method calls) and (3) deleting the options so as not to confuse the genuine antlr process.
These are the sources processed by antlrC:
xantlr.g ---> ANTLRSyntaxChecker ---> ANTLRParser.java
visitor.g ---> XANTLRVisitor.java
dtd.g ---> DTDGenerator.java extends XANTLRVisitor
expander.g ---> XmlRepresentationExpander extends XANTLRVisitor
filter.g ---> XmlRepresentationFilter extends XANTLRVisitor
All these phases are plugged together in ANTLRParser.grammar(), generated by antlrC from xantlr.g. Since the name of the generated class is antlr.ANTLRParser, the unmodified antlr.Tool will run as usual and perform all the additional Xantlr tasks "without even noticing it", provided that the meta_tools jar file (containing the new ANTLRParser) precedes the antlrC jar file in the classpath.
made 2025-01-09_11h54 by lepper on happy-ubuntu, produced with eu.bandm.metatools.d2d and XSLT