[all pages:] introduction message / location / muli format dtd xantlr tdom ops paisley metajava umod option auxiliaries d2d downloads & licenses people bibliography APPENDICES:: white papers white papers 2 white papers 3 project struct proposal cygwin tips SOURCE:option.dtd SOURCE:dtd.umod DOC:deliverables.ddf DOC-DE:deliverables.ddf DOC:mtdocpage.ddf DOC-DE:mtdocpage.ddf SOURCE:basic.dd2 SOURCE:xslt.dd2
bibliography | bandm meta_tools | white papers 2 |
Collected White Papers on Technical Details
1 Intentions
2 Identifying, Searching and Finding of Resources
2.1 XML Document Identifier
2.2 XML-DIs in Documents
2.3 Locating Documents by URL/URI/URN
3 Locations and Locators, i.e. position IN a particular document
3.1 Locators in SAX
3.2 Locations in the d2d/xml/xslt pipelines
4 XML Namespaces
4.1 Namespace Definition, Encoding in Documents
4.2 Namespaces in SAX
4.3 Reserved Namespaces
5 XML Namespaces in D2d processing
6 Error Signalling and Processing
6.1 General Strategies
6.2 The ANTLR / xantlr / tdom / xantlrtdom Error Signalling Pipeline
6.3 Errors in SAX and JAXP/TRAX
In meta_tools , most areas of processing are neatly separated and cleanly implemented. But there are some processes and data-flows the details of which are not only rather complicated, but also distributed over different modules and layers of the architecture.
(In most cases, this is due to external standards and tools we have to interface !-)
For these topics, a central "White Paper" is desirable which describes at one central place the coding principles (the basic design decisions as well as the details) and sketches the co-operation of the modules involved.
Some of these "little white papers" will be collected here. The collection itself is rather ad hoc, i.e. neither carefully structured nor aiming at completeness. It will grow as soon as new topics are identified as useful and desired.
The XML Document Identifier (=XML-DI in this text) is implicitly defined in
http://www.w3.org/TR/xml11/#NT-ExternalID [=XML]
It is implemented in meta_tools in message.XMLDocumentIdentifier
The "SYSTEM" part must be a URL.
http://webdesign.about.com/od/dtds/qt/tipdoctype.htm
says that when a PUBLIC id is present,
"The rest of the DOCTYPE identifier is optional..",
but the syntax graph in [XML] says otherwise: there the system literal is mandatory in both forms!
The "PUBLIC" part is not defined in the XML standard. Only traditionally it is a "Formal Public Identifier" = "FPI" as defined in SGML (= ISO 8879:1986) / (ISO 9070:1991)
See http://en.wikipedia.org/wiki/Formal_Public_Identifier[=WIKIFPI]
http://www.ietf.org/rfc/rfc3151.txt contains a proposal for mapping FPIs into an URN namespace
Following [WIKIFPI], WE as a domain owner have a registered (!!)
FPI, namely
+//IDN eu.bandm//...
This is supported in some methods in <METATOOLS>/dtd/Utilities.html
A doctype declaration of an XML encoded text consists of a
TAG, namely the tag of the root element, and a DTD reference.
This is in form of an XML-DI.
Our implementation of DTD includes a field for this XML-DI.
This is conceptually NOT CLEAN, because the "system" part of "the same"
DTD may differ between computer systems or even applications or even runs,
while the "public" part must be identical.
Nevertheless, XML requires the system part to be always present, and to
treat both parts as a unit.
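The pairing of a mandatory "system" part with an optional "public" part can be sketched as follows. This is only an illustration: the class DocIdSketch and its method toDoctype are hypothetical names and do NOT reproduce message.XMLDocumentIdentifier. The sketch also assumes that neither identifier contains a double quote.

```java
import java.util.Objects;

/** Hypothetical stand-in for an XML document identifier; NOT the meta_tools class. */
final class DocIdSketch {
    final String publicId; // may be null: the PUBLIC part is optional
    final String systemId; // required by the ExternalID production of XML

    DocIdSketch(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = Objects.requireNonNull(systemId, "system id is mandatory");
    }

    /** Renders the DOCTYPE declaration for the given root element tag. */
    String toDoctype(String rootTag) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(rootTag);
        if (publicId != null) {
            sb.append(" PUBLIC \"").append(publicId).append("\" \"");
        } else {
            sb.append(" SYSTEM \"");
        }
        return sb.append(systemId).append("\">").toString();
    }

    public static void main(String[] args) {
        DocIdSketch id = new DocIdSketch("-//W3C//DTD XHTML 1.0 Strict//EN",
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        System.out.println(id.toDoctype("html"));
    }
}
```

Keeping both parts in one immutable object mirrors the requirement to treat them as a unit, even though only the public part is stable across systems.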
The XML-DI of a DTD can be encoded into a PI for tdom, so that TDOM gets to know it. This is done in the WRAPPER DTD, in case a third-party DTD shall be processed by tdom. The TypedDOMGenerator compiles this id into a static final field in the generated sub-class of TypedDTD. It can be retrieved by calling <METATOOLS>/tdom/runtime/TypedDTD.html#getDocumentId()
The method dtd.Utilities.addTdomPI_documentId(Dtd) generates such a PI for a dtd as an internal model.
Whenever a ContentPrinter gets the first "startElement" SAX call,
it prints XML version and encoding, followed by a synthesized "DOCTYPE"
declaration. For this, the "document locator" must have been
set in advance by a call to the method
<METATOOLS>/util2/ContentPrinter.html#setDocumentLocator(org.xml.sax.Locator)
This is a standard function from
org.xml.sax.ContentHandler [=SAX-ContentHandler]
In our context, this method is called e.g. by DUMPER CODE GENERATED by tdom. See e.g. <METATOOLS>/src/eu/bandm/tools/doctypes/xhtml/Dumper.java, method "match (Document_html)"
BUT THIS SEEMS AN ABUSE, because [SAX-ContentHandler] says this location is the ORIGIN of the SAX events, i.e. the document itself, not the origin of its type definition !?!?!?
The class <METATOOLS>/doctypes/DocTypes.html implements a "URI Resolver" interface. The procedure definitions in "calltools.mk" named "xml2text", "xml2html", "xml2xml" (more may come!) all call "xalan". They pass the resolver to the Xalan command line by "-URIRESOLVER eu.bandm.tools.doctypes.DocTypes".
The code of resolve(uri,base) delivers, for requests for URIs like
bandm.eu/doctypes/<XXX>/<YYY> and
http://bandm.eu/doctypes/<XXX>/<YYY>
a "javax.xml.transform.Source" instance which points to
a "getResourceAsStream()" resource named <XXX>/<YYY>, relative to the
class file of DocTypes.
This is employed e.g. to find some files with xslt-procedures which are included in the main transformation file.
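A resolver of this kind can be sketched with the standard javax.xml.transform.URIResolver interface. The class name ResourceUriResolverSketch and the helper resourceName are assumptions made for illustration. The real DocTypes class may differ in every detail, e.g. in how it treats URIs it does not recognize.

```java
import java.io.InputStream;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Sketch in the spirit of DocTypes; all names here are hypothetical. */
class ResourceUriResolverSketch implements URIResolver {
    private static final String PREFIX = "bandm.eu/doctypes/";

    /** Returns the class-relative resource name, or null if the URI is not ours. */
    static String resourceName(String href) {
        String s = href.startsWith("http://") ? href.substring("http://".length()) : href;
        return s.startsWith(PREFIX) ? s.substring(PREFIX.length()) : null;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        String name = resourceName(href);
        if (name == null) {
            return null; // null lets the XSLT processor apply its default resolution
        }
        // the lookup is relative to the location of this class file
        InputStream in = ResourceUriResolverSketch.class.getResourceAsStream(name);
        if (in == null) {
            throw new TransformerException("no such resource: " + name);
        }
        return new StreamSource(in, href);
    }
}
```

Returning null for foreign URIs follows the contract of URIResolver, so ordinary file and http references in a stylesheet keep working.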
remark 1)
The "PUBLIC ID" (which does belong to us !-) is currently NOT DECODED.
remark 2)
The currently used "CMDLINE-WRAPPER" for xantlr does NOT use the uri decoder
for the top-level (=commandline argument) files! But it should be eliminated anyhow by a wrapper around the JAXP/TRAX interface.
There is an interface definition org.xml.sax.Locator, which offers different methods for inquiring the different coordinates of the current input position (e.g. getColumnNumber(), getSystemId()).
The function ContentHandler.setDocumentLocator(org.xml.sax.Locator locator) should be called "under the hood" by an XML "stream" parser to set the implementing object. It is NOT guaranteed that every XML parser does so.
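Because the locator is not guaranteed to arrive, consuming code has to tolerate its absence. The following sketch (class name LocatorSketch is hypothetical) stores the locator when it is offered and records the line number at every start tag, falling back to -1 otherwise. It uses whatever parser the JAXP factory of the running JDK returns.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class LocatorSketch extends DefaultHandler {
    private Locator locator; // stays null if the parser never supplies one
    final Map<String, Integer> lineOfElement = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // -1 marks "position unknown", the same convention Locator itself uses
        lineOfElement.put(qName, locator == null ? -1 : locator.getLineNumber());
    }

    /** Parses the text and returns the line recorded for each element tag. */
    static Map<String, Integer> linesOf(String xml) {
        LocatorSketch handler = new LocatorSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.lineOfElement;
    }

    public static void main(String[] args) {
        System.out.println(linesOf("<a>\n<b/>\n</a>"));
    }
}
```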
(This is a more internal topic, but we need a kind of memo, so we put it here!-)
XML "Namespaces" and "Namespace Names" are complicated because they reflect a complex history of parallel/sequential, competing/co-operative, clean/hacking developments. Different interfaces and third-party-modules (eg SAX, Xalan) impose different constraints on their input wrt. name spaces, and these are not always fully and exactly specified!
Basically, some tools and sub-specifications of the XML family do not consider namespaces at all (e.g. the XML base specification, the DTD mechanism, [xml]). Others can work in two different modes, either namespace aware or not. A third group does always consider name spaces (XPath, XSL-T).
Whenever name spaces are considered, this applies to all kinds of identifiers, namely element tags, attribute names, and (eg. in XPath and XSLT) names of variables, functions, templates, modes, etc.
Our class NamespaceName in a first step seems to make things even more complicated, in order to make them easier, namely interoperable, in a second step! The basic philosophy of this class is to combine requirements from different sources:
In XML in general [xml] , in non-namespace mode the colon ":" is a normal character which may occur arbitrarily often in any identifier (tag or attribute name, etc.)
But in namespace mode there may be maximally ONE(1) colon ":", separating a non-empty prefix and a non-empty local name, see [xml-ns] .
Identifiers without colon have an "empty prefix", but the canonical notation ":localname" is nevertheless not allowed.
In every textual representation of a document, every prefix, including the empty prefix, must be mapped to a certain namespace URI. This namespace URI defines the "identity" of a certain namespace, i.e. is used for "equals()" tests. The assignments from prefixes to URIs are valid in nested scopes. In a textual representation of a document, such a scope is defined by the contents of an element (including its own name and the names of its attributes, see below).
A special case is the empty URI "" which can only be represented by the
empty prefix.
Per default, the empty prefix is mapped to the empty URI.
Further mappings are established (in the textual representation)
using the following syntax, which looks like an attribute:
<pref:ELEMENT xmlns="http://uri1" xmlns:pref="http://uri2"> ... </pref:ELEMENT> |
In this example, the empty prefix is mapped to "uri1", the prefix pref to "uri2". This mapping is valid "backwards", already for the tag of the containing element!
These assignments only look like attributes, but are no attributes in the sense of
SAX and DOM and tdom.
(FIXME says they ARE attributes, but FIXME says they are not!?)
<ELEMENT xmlns="" xmlns:pref="" > |
Here the first assignment is valid, the second NOT.
The mapping of the empty prefix is ONLY VALID for element tags, not for attribute names. The latter stay with the empty URI iff they have an empty prefix. This table shows the allowed combinations:
              | pref="" | pref non-empty |
uri=""        | ATT     | --             |
              | EL      | --             |
uri non-empty | --      | ATT            |
              | EL      | EL             |
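The asymmetry between element tags and attribute names in the table above can be observed with org.xml.sax.helpers.NamespaceSupport from the JDK, whose processName() takes an explicit "is attribute" flag. The sketch below is independent of NamespaceName. The class name DefaultNamespaceSketch is hypothetical, and the two URIs are those of the example element above.

```java
import org.xml.sax.helpers.NamespaceSupport;

class DefaultNamespaceSketch {
    /** Resolves a qName in a scope where "" maps to uri1 and "pref" to uri2. */
    static String uriOf(String qName, boolean isAttribute) {
        NamespaceSupport ns = new NamespaceSupport();
        ns.pushContext(); // open the scope of the declaring element
        ns.declarePrefix("", "http://uri1");
        ns.declarePrefix("pref", "http://uri2");
        // parts = { namespace URI, local name, qName }, or null for an undeclared prefix
        String[] parts = ns.processName(qName, new String[3], isAttribute);
        return parts == null ? null : parts[0];
    }

    public static void main(String[] args) {
        System.out.println("element   e      -> '" + uriOf("e", false) + "'");
        System.out.println("attribute a      -> '" + uriOf("a", true) + "'");
        System.out.println("attribute pref:a -> '" + uriOf("pref:a", true) + "'");
    }
}
```

The unprefixed element picks up the default namespace, while the unprefixed attribute stays in the empty URI. This is exactly the "ATT" entry in the upper left cell.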
An instance of our implementation,
NamespaceName, represents
all possible identifiers in an XML context, either in namespace mode or
in non-namespace mode.
In namespace mode, an instance MUST have a local name and a (possibly empty) URI.
It MAY additionally have a prefix value.
All field values are always != null. All arguments to a constructor call must be !=null ! "Not having" means being the empty string "". This corresponds to the requirements of SAX.
Following combinations are possible, selected by different constructor signatures.
                                        | URI        | localName        | prefix     | enableNameSpace |
new NamespaceName(uri,localname)        | "NONEMPTY" | "NONEMPTY"       | ""         | true            |
                                        | ""         | "NONEMPTY"       | ""         | true            |
new NamespaceName(uri,prefix,localname) | "NONEMPTY" | "NONEMPTY"       | "NONEMPTY" | true            |
                                        | ""         | "NONEMPTY"       | "NONEMPTY" | --> ERROR       |
new NamespaceName(qname/localname)      | ""         | "::NON::EMPTY::" | ""         | false           |
("NONEMPTY" does not contain colons!)
Only URI and localname are considered for the definition of "equals()"! As a consequence, non-namespace-names and namespace names in the no-uri-space are considered EQUAL iff they have the same local name!
(
The prefix is only used to memorize some previous external representation, for
writing out the name with the same prefix again, iff possible. It is only
used for "ergonomic" reasons, but semantically not significant!
When later serializing namespaces, it can serve as a MERE HINT for constructing
an external representation, resembling the one originally read in.
It is not really
clean to keep it here, but the basic philosophy of namespace names is not ours.
Anyhow, this partitioning is much less redundant and entangled than
the SAX "qnames", which contain the prefix and double the local name.
)
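The rule that the prefix takes no part in equality can be illustrated by a miniature value class. NameSketch is a hypothetical name, and the code is NOT the NamespaceName implementation. It only demonstrates the three principles stated above: no null fields, equality over URI and local name, and the prefix as a serialization hint.

```java
import java.util.Objects;

/** Hypothetical miniature of the idea behind NamespaceName; not the real class. */
final class NameSketch {
    final String uri, localName, prefix; // never null; "" means "absent"

    NameSketch(String uri, String prefix, String localName) {
        this.uri = Objects.requireNonNull(uri);
        this.prefix = Objects.requireNonNull(prefix);
        this.localName = Objects.requireNonNull(localName);
        if (localName.isEmpty()) {
            throw new IllegalArgumentException("empty local name");
        }
        if (uri.isEmpty() && !prefix.isEmpty()) {
            throw new IllegalArgumentException("the empty URI needs the empty prefix");
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof NameSketch)) return false;
        NameSketch n = (NameSketch) o;
        return uri.equals(n.uri) && localName.equals(n.localName); // prefix ignored
    }

    @Override
    public int hashCode() {
        return Objects.hash(uri, localName); // consistent with equals: no prefix
    }

    /** Rebuilds a qName, using the stored prefix as a mere hint. */
    String toQName() {
        return prefix.isEmpty() ? localName : prefix + ":" + localName;
    }
}
```

Two instances that differ only in their prefix thus collapse to one key in a HashMap or HashSet, which is the intended consequence of treating the prefix as semantically insignificant.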
The two kinds of SAX events which represent tags (startElement() and
endElement()) have "uri", "localname" and "qName" as three independent arguments,
e.g. to the "startElement()" method in
ContentHandler.
Its doc text says (not very clearly):
"If the http://xml.org/sax/features/namespaces-property is true (default!),
then uri and localname must be provided.
If false, both may together be there or may be not there.
If the http://xml.org/sax/features/namespace-prefixes-property is true
then the qName is required.
If false (default!) then it is optional."
To repeat: "not being there" always means being the empty string value, but not null!
It is not clear in which mode which data takes which precedence! So, to be on the safe side, all arguments should be supplied redundantly when using badly documented third-party software. Esp. "qName" should comply (redundantly) with the current mapping of prefixes to URIs.
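Supplying all three arguments redundantly could look like the sketch below. The class RedundantEventsSketch and its methods are hypothetical helpers. Only the org.xml.sax types are standard API. The qName is derived from prefix and local name, so it cannot contradict the prefix mapping that is announced just before.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class RedundantEventsSketch {
    static String qName(String prefix, String localName) {
        return prefix.isEmpty() ? localName : prefix + ":" + localName;
    }

    /** Emits one empty element with uri, localName and qName all filled in. */
    static void emitEmptyElement(ContentHandler out, String uri, String prefix,
                                 String localName) throws SAXException {
        String q = qName(prefix, localName);
        out.startPrefixMapping(prefix, uri);
        out.startElement(uri, localName, q, new AttributesImpl());
        out.endElement(uri, localName, q);
        out.endPrefixMapping(prefix);
    }

    /** Runs the emitter against a recording handler and returns the trace. */
    static List<String> trace(String uri, String prefix, String localName) {
        List<String> log = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                log.add("start " + u + " " + l + " " + q);
            }
            @Override
            public void endElement(String u, String l, String q) {
                log.add("end " + u + " " + l + " " + q);
            }
        };
        try {
            emitEmptyElement(recorder, uri, prefix, localName);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        trace("http://uri2", "pref", "ELEMENT").forEach(System.out::println);
    }
}
```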
SAX includes the events
startPrefixMapping(String,String)
and
endPrefixMapping(String)
to grow and shrink this map.
It is NOT SPECIFIED what a "start" does when a prefix is already in use,
and whether an "end" will let old, shadowed mappings pop up again.
OUR code which consumes SAX events treats these questions systematically and in a compositional way.
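One compositional reading is a stack of scopes: an inner declaration shadows the outer one, and leaving the inner scope restores it. The sketch below shows this discipline with org.xml.sax.helpers.NamespaceSupport from the JDK. That the meta_tools code uses this class, or behaves identically in all corner cases, is NOT claimed here. The class name PrefixShadowingSketch is hypothetical.

```java
import org.xml.sax.helpers.NamespaceSupport;

class PrefixShadowingSketch {
    /** Returns the URI seen for prefix "p" before, inside and after a nested scope. */
    static String[] demo() {
        NamespaceSupport ns = new NamespaceSupport();
        ns.pushContext();                        // outer element
        ns.declarePrefix("p", "http://outer");
        String before = ns.getURI("p");
        ns.pushContext();                        // inner element re-declares "p"
        ns.declarePrefix("p", "http://inner");
        String inside = ns.getURI("p");
        ns.popContext();                         // leaving the inner element
        String after = ns.getURI("p");
        return new String[] { before, inside, after };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", demo()));
    }
}
```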
The api doc of org.xml.sax.ContentHandler.startPrefixMapping(String,String)
says :
"There should never be start/endPrefixMapping events for the
"xml" prefix, since it is predeclared and immutable."
But all library functions which make up the SAX processing pipeline do not treat the "xml:" ns-prefix specially. It is NOT YET CLEAR at which points, and how often, an explicit
startPrefixMapping("xml", "http://www.w3.org/XML/1998/namespace") |
is permitted, or even required!
FIXME !!
It is also not yet clear how to avoid the following situation
when translating from a less idiosyncratic model, e.g. writing out
d2d "Udom" structures:
If a (non-empty) namespace-uri is assigned to the empty prefix, this is fine for all
elements. But when later an ATTRIBUTE has to be written with this namespace-uri,
then that must be matched to some non-empty prefix
additionally to the empty one.
So in this case, "synonyms" seem really necessary.
((
Perhaps it would be the best strategy to leave the empty prefix immutably
assigned to the empty URI, because the latter cannot be expressed in another way!?!?
And, even better, do never use the empty URI, if not required by some
legacy format!?
))
Generally, our SAX processing pipelines operate in namespace mode. The value of "prefix" is normally ignored. But there are some applications in the outer world which do require a certain prefix! In this context, the original prefix (eg. recognized by a third-party parser and communicated to a metatools internal structure involving NamespaceName instances, via a SAX event) can be stored in the "prefix" field of the NamespaceName instance. It then can be passed on transparently, and at last will be used on final re-serialization. BUT this only as a hint, not carrying real semantics!
The serialization of NamespaceName instances includes the emitting of
the corresponding "prefix mapping" events.
In meta_tools, the serialization is performed by ContentPrinter.
But this device is NOT namespace-aware, it prints only q-names!
So, in order to use it in namespace mode, a NamespaceEmbedder
is placed in front of it! This consumes the start/end mapping events, keeps track of all
open scopes, and maps them to "PSEUDO-ATTRIBUTE" definitions. It
fills in the resulting qName for all attributes and elements. This is
of course an ABUSE of the SAX interface and does only work because
the behaviour of our ContentPrinter is known and specified BEYOND the SAX convention!
((
Consequently, in a first version, this code
did remove the Uri and the local name from the sax events. But this DOES NOT
WORK for Xalan Template Construction, see below, so for this case
we have to produce redundant output!
))
Its code requires that there are NO HOMONYMS, ie. nested assignments of the same prefix to different (or the same?) uris. To ensure this, NamespaceHomonymResolver can be included in the pipeline.
The Udom2Sax-serializer from d2d, eg, looks only whether the Uri of the namespaceName to write out next is already mapped to any prefix. If so, it does nothing! Only if not, a currently unused prefix is selected and assigned to this Uri. For this, the prefix stored in the namespace name is the first candidate.
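The selection strategy just described (reuse an existing mapping, otherwise try the stored prefix first) can be sketched as a small allocator. Everything in it is an assumption made for illustration: the class name PrefixAllocatorSketch, the fallback prefix "ns", and the numeric suffix used when the hint is already taken. The sketch is also flat, i.e. it ignores nested scopes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixAllocatorSketch {
    private final Map<String, String> prefixOfUri = new LinkedHashMap<>();

    /** Returns the prefix bound to uri, binding a fresh one only if none exists yet. */
    String prefixFor(String uri, String hint) {
        String known = prefixOfUri.get(uri);
        if (known != null) {
            return known;                        // already mapped: nothing to do
        }
        String candidate = hint.isEmpty() ? "ns" : hint;
        String chosen = candidate;
        for (int i = 1; prefixOfUri.containsValue(chosen); i++) {
            chosen = candidate + i;              // hint taken: derive an unused variant
        }
        prefixOfUri.put(uri, chosen);
        return chosen;
    }

    public static void main(String[] args) {
        PrefixAllocatorSketch alloc = new PrefixAllocatorSketch();
        System.out.println(alloc.prefixFor("http://u1", "x"));
        System.out.println(alloc.prefixFor("http://u2", "x"));
    }
}
```

Since a prefix is only ever bound to a URI that has none yet, this strategy never creates two prefixes for the same URI within one allocator.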
In d2d namespaces can be declared in genuine "ddf" definitions
by prefix and ns-uri. Both must be non-empty.
In the "with xmlrep element = <XMLTAG>" part, the xml tag can have
a prefix, which then will be mapped back to the declared namespace uri
to construct a namespace name.
If there is no prefix in the xml tag, and no namespace has been
declared to be "default", then a namespace name is constructed
with empty prefix, mapped to the empty uri.
Otherwise, the declared default is taken.
If a dtd is imported to d2d as a definition module, then the "<?tdom xmlns:... ?>" namespace declarations are respected and required.
The following graphic tries to summarize the situation between d2d output and ContentPrinter:
STARTPREFIXMAPPING uuuu pppp
STARTELEMENT       uuuu llll
ENDELEMENT         uuuu llll
ENDPREFIXMAPPING   pppp

==================[namespaceEmbedder] ======== >

STARTPREFIXMAPPING uuuu pppp
STARTELEMENT       uuuu llll pppp:llll
  ATTRIBUTE        ---- pppp xmlns:pppp = uuuu
ENDELEMENT         uuuu llll pppp:llll
ENDPREFIXMAPPING   pppp

//original intention of namespaceEmbedder, to be used with content printer,
// was different, namely:

==================[namespaceEmbedder] ======== >

STARTELEMENT       ---- ---- pppp:llll
  ATTRIBUTE        ---- ---- xmlns:pppp = uuuu
ENDELEMENT         ---- ---- pppp:llll
((
The original design did NOT WORK when the sax events were piped into
a JAXP-TemplateHandler receiver for constructing an XSLT Transformer.
This SEEMS TO require the redundant version, as we found out
after two days' reverse engineering !-(
))
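The redundant variant can be sketched as a SAX filter built on org.xml.sax.helpers.XMLFilterImpl. This is a simplified reconstruction from the description above, NOT the NamespaceEmbedder source. The class name EmbedderSketch is hypothetical, qNames are filled in for elements only, and the sketch relies on the stated precondition that there are no homonyms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class EmbedderSketch extends XMLFilterImpl {
    private final Map<String, String> uriOfPrefix = new LinkedHashMap<>();
    private final List<String> pending = new ArrayList<>(); // declared for next element

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        uriOfPrefix.put(prefix, uri);
        pending.add(prefix);
        super.startPrefixMapping(prefix, uri); // redundant variant: keep the event
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        uriOfPrefix.remove(prefix);
        super.endPrefixMapping(prefix);
    }

    private String qName(String uri, String localName) {
        for (Map.Entry<String, String> e : uriOfPrefix.entrySet()) {
            if (e.getValue().equals(uri)) {
                return e.getKey().isEmpty() ? localName : e.getKey() + ":" + localName;
            }
        }
        return localName; // unmapped URI: fall back to the bare local name
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl out = new AttributesImpl(atts);
        for (String p : pending) { // one pseudo-attribute per fresh mapping
            out.addAttribute("", p, p.isEmpty() ? "xmlns" : "xmlns:" + p,
                             "CDATA", uriOfPrefix.get(p));
        }
        pending.clear();
        super.startElement(uri, localName, qName(uri, localName), out);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName(uri, localName));
    }

    /** Feeds the events of the graphic through the filter and returns what arrives. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        EmbedderSketch f = new EmbedderSketch();
        f.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                StringBuilder sb = new StringBuilder("start " + q);
                for (int i = 0; i < a.getLength(); i++) {
                    sb.append(' ').append(a.getQName(i)).append('=').append(a.getValue(i));
                }
                log.add(sb.toString());
            }
            @Override
            public void endElement(String u, String l, String q) {
                log.add("end " + q);
            }
        });
        try {
            f.startPrefixMapping("pppp", "uuuu");
            f.startElement("uuuu", "llll", "", new AttributesImpl());
            f.endElement("uuuu", "llll", "");
            f.endPrefixMapping("pppp");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }
}
```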
The most frequent reserved namespace is
xmlns:xml="http://www.w3.org/XML/1998/namespace" |
It is special in very different concerns, which make it hard to treat it consistently:
See [NS in XML:sect 3, last paras].
Esp. the last points are crucial, because our normal pipeline cannot treat this prefix and uri in the canonical way. Current firefox (36.0.1) indeed rejects documents which attempt this. VERY UGLY!
In the context of d2d , there are some very different settings involving name spaces:
1--
The fundamental task of d2d is to produce an XML encoded output from a "readable" d2d source. This is described in d2d.html in detail. A dedicated section describes how name spaces are defined in the text type definition, and how element and attribute definitions can refer to name spaces (via prefixes) when defining an explicit xml tag.
2--
When a DTD is used to define the text format and the parsing rules, then, for namespaces to be active, the tags (=element names/attribute names) in the DTD must have the form with one colon ("ab:cde"), and there must be TDOM-PIs like "<?tdom xmlns:ab="nameSpaceUri" ?>" in the DTD. The d2d parser will do as it always does: accept (and need) only the pure "local" tags, like "cde", and generate the XML output in the intended name space.
3--
When a d2d
source text is used to describe an XSLT transformation, then
the target format is given by the header:
"#d2d 2.0 xslt text producing <module> : <toplevelelement>"
By this the (zero or more) namespace declarations occurring in the target text type
are known, i.e. the corresponding mappings of prefixes to uris.
The namespace declaration for xslt itself is known anyhow and uses some fixed prefix.
But additional name spaces can be required, namely those occurring in the input document, to which the defined xslt code later shall be applied, or for calling some "tpath runtime extension functions", which can live in any name space. For this the xslt mode defines an additional syntax, see the dedicated section in the d2d xslt mode documentation.
4--
When the xslt rules are defined in the ddf text format definition (with "docu to_<X>" constructs, as explained in the d2d documentation), then the namespaces of the input document are known, because this is the d2d model just created, and the target type is indicated by the "<X>" in the transformation definition.
All additional name spaces needed for tpath extension functions etc. can currently only be added by extending the namespace declaration mechanism: All namespace declarations of all modules involved will be combined into the namespace declarations of the synthesized xslt source. Of course this is not an optimal solution, because conflicts of prefixes must be "manually" avoided in advance.
(An automated solution is of course possible as soon as the xslt sources are parsed by ourselves and not left to an external tool. But this is not the general case.)
5--
Sometimes a d2d
text type definition (ddf) is translated into a DTD.
This is executed as a "Task" from the d2d
man tool, see
the tool doc.
(This is frequently done to compile the DTD via
TDOM into a strongly typed Java
model.)
In this case, the namespace declarations from the ddf text type definition will re-appear as "<?tdom xmlns:ab=".." ?>" processing instructions, as described above.
Errors and Exceptions can be treated by different mechanisms, depending on the usage context of the code, whether it is sensible to continue the operation of a certain module after the error, and depending on the size of the software module.
Small, low-level and general purpose modules may signal an error by throwing an exception without any declaration, ie. a subclass of <JAVA-API>/java.lang.RuntimeException. With throwing an exception, the execution of the current method call is finished. So this is really a very limited protocol.
Then some higher layer in the software architecture is responsible for translating this error into some diagnosis sensible for a user. An exception should NEVER reach the user's level.
Larger, specialized software modules get a MessageReceiver as a constructor parameter. They send warnings, errors and hints explicitly to this channel.
This can happen arbitrarily often, without terminating execution of the module's functionality.
Additionally, a result may be set ==null or to some other dedicated value, to indicate that the execution has failed and any pipeline processing cannot be continued without special reactions.
The message receiver can "tee" the messages to a MessageCounter, for easily detecting if severe errors have occurred in a transformation step.
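The protocol for larger modules (report, continue, count, signal failure by a dedicated result) can be sketched with plain java.util.function.Consumer. All names below (MessagingSketch, Kind, Message, Counter, tee, parseSum) are hypothetical and do NOT reproduce the meta_tools message API. The sketch only shows how the pieces described above interact.

```java
import java.util.function.Consumer;

class MessagingSketch {
    enum Kind { HINT, WARNING, ERROR }

    static final class Message {
        final Kind kind;
        final String text;
        Message(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Counts the errors passing through; a hypothetical analogue of a counter. */
    static final class Counter implements Consumer<Message> {
        int errors;
        @Override
        public void accept(Message m) {
            if (m.kind == Kind.ERROR) errors++;
        }
    }

    /** Forwards every message to both receivers. */
    static Consumer<Message> tee(Consumer<Message> a, Consumer<Message> b) {
        return a.andThen(b);
    }

    /** A "module" that reports problems but keeps going; null signals failure. */
    static Integer parseSum(String csv, Consumer<Message> msg) {
        int sum = 0;
        boolean failed = false;
        for (String field : csv.split(",")) {
            try {
                sum += Integer.parseInt(field.trim());
            } catch (NumberFormatException e) {
                msg.accept(new Message(Kind.ERROR, "not a number: " + field.trim()));
                failed = true; // keep scanning, so that ALL errors get reported
            }
        }
        return failed ? null : sum;
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        Integer result = parseSum("1, x, 3, y",
            tee(counter, m -> System.err.println(m.kind + ": " + m.text)));
        System.out.println("result=" + result + ", errors=" + counter.errors);
    }
}
```

In contrast to an exception, which ends the call at the first problem, the module here reports every faulty field in one run and still tells the caller that the result is unusable.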
MORE TO COME
org.xml.sax.XMLReader defines a method "setErrorHandler(org.xml.sax.ErrorHandler)".
Such an error handler must implement three methods (warning(), error() and fatalError()), all parametrized with a SAXParseException object.
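A minimal handler that merely counts the three kinds of events is sketched below. The class name CountingErrorHandlerSketch is hypothetical, and the parser is whatever the JAXP factory of the running JDK delivers. Note that a parser is free to abort with an exception of its own after it has reported a fatal error, so the sketch catches that case and lets the counters speak.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class CountingErrorHandlerSketch implements ErrorHandler {
    int warnings, errors, fatals;

    @Override public void warning(SAXParseException e) { warnings++; }
    @Override public void error(SAXParseException e) { errors++; }
    @Override public void fatalError(SAXParseException e) { fatals++; }

    /** Parses the text and returns the handler with its counters filled in. */
    static CountingErrorHandlerSketch check(String xml) {
        CountingErrorHandlerSketch h = new CountingErrorHandlerSketch();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(h);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e); // a setup problem, not a document problem
        } catch (SAXException | IOException e) {
            // the parser may abort after a fatal error; the counters already record it
        }
        return h;
    }

    public static void main(String[] args) {
        CountingErrorHandlerSketch h = check("<a><b></a>");
        System.out.println("fatal errors: " + h.fatals);
    }
}
```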
made
2018-12-30_11h02 by
lepper on
linux-q699.site
produced with
eu.bandm.metatools.d2d
and
XSLT