umod --- an Automated Model Generator
(related API documentation: package umod.runtime )
1 Purpose and Way of Operation
2 Input Notation Syntax and Generated Java Classes
  2.1 Model Declaration
  2.2 Import Declaration
  2.3 Enumerations
  2.4 Class Hierarchy
    2.4.1 Extending and implementing external classes.
    2.4.2 Declaring Classes as ABSTRACT and ALGEBRAIC
    2.4.3 Verbatim Java Source Text in Class Definitions
    2.4.4 Per-Class Generated Methods: doClone() and initFrom().
  2.5 Field Definitions
    2.5.1 Field Names
    2.5.2 Initial Field Values
    2.5.3 Per-Field Generated Methods
    2.5.4 Abstract Fields, Generalized Setter and Getter Methods
    2.5.5 Pragmas for Field Definitions
  2.6 Types
    2.6.1 Primitive Types
    2.6.2 Reference Types
    2.6.3 Constructed Type OPT
    2.6.4 Constructed Types in General
    2.6.5 Special "Un-Curry-ed" Treatment of Cascaded MAPs
    2.6.6 Overloading of a null function result in standard java runtime libraries
  2.7 Documentation
  2.8 Constructors and Default Values for Fields
    2.8.1 Ambiguity in Overloading Resolution for Constructors
    2.8.2 Defining Constructors with Verbatim Inserted Java Source
  2.9 Pattern Handling Methods
  2.10 Visitors
    2.10.1 Common Base Class and Calling of Visitors
    2.10.2 "Simple Kind" Visitor
    2.10.3 Multiphase Visitor
    2.10.4 Rewriters
    2.10.5 Rewriting of Aggregates
    2.10.6 Visiting and Rewriting "null"
    2.10.7 Diagnosis
    2.10.8 Optimization
  2.11 Visualization
    2.11.1 User-Defined Visualization by the Modifier TOSTRING
    2.11.2 User-Defined Visualization by the Modifier FORMAT
    2.11.3 Automated Swing Tree Generation
    2.11.4 User-Directed Visitor-Based Dump Routines
  2.12 XML Encoded Serialization/Deserialization
3 Using the umod Tool
  3.1 Command Line Options
  3.2 Splitting the input text into input files
  3.3 Error Messages
umod compiles a data model from a high-level definition language into Java code. The input language is designed for utmost compactness. Although proprietary, it closely follows mathematical notation.
Java is a rather verbose language, so defining large data models is a tedious, monotonous and error-prone task that requires lots of redundant typing. umod automates this typing by generating, from a compact, non-redundant notation, the vast amount of code which is needed to realize ...
Some of these features can be enabled separately, according to the user's need.
In contrast to other similar projects, e.g. pizza, umod is not realized as a pre-processor to arbitrarily arranged java source files. Instead, it is a compiler which generates one(1) model from one(1) single, central source file. (This is true conceptually; technically see Section 3.2.)
This approach has both advantages and disadvantages. E.g. you cannot use the convenient umod notation for complicated nested typed collection classes ad hoc in any source file, but only in such a central model definition file.
As an advantage you have one central, compact definition of the main model of a project, mostly not much more than one single screen page! This turned out to be really helpful when developing the further code or writing documentation.
Please note:
1) Some consistency conditions are easily checked by the Java compiler, but only at much higher cost by the umod tool. Therefore in most such cases the corresponding checks are left to the former, and error messages will come from the attempt to compile the generated source text, not from the umod generation process. This requires some experience and "type checking" skills from the user (but still much less than the C++/STL error messages !-)
1
2) The generated code can always be by-passed by inserting verbatim Java code. So the data models generated by umod are largely, but not totally, fool-proof. The usage of umod does not replace responsibility and careful review.
The input syntax for umod definition files is somewhat graphically oriented, for the sake of maximal compactness and clear arrangement. The umod definition file is also intended to serve as documentation, especially for programmers during their coding work.
A typical example, which also demonstrates most of the syntactic means described in the following, is the umod model for DTDs; see its pretty-printed umod source.
For the basic data types, i.e. the elements which make up the data model, umod supports two flavours of definitions:
Additionally, there can be
After their declaration, most of these entities are referred to by an identifier.
identifier ::= identifier_lower | identifier_upper
identifier_lower ::= lowerCaseAlpha ( lowerCaseAlpha | upperCaseAlpha | digit | _ )*
identifier_upper ::= upperCaseAlpha ( lowerCaseAlpha | upperCaseAlpha | digit | _ )*
There is only one single name space for the identifiers of all these different categories, and duplicate usage of an identifier will yield an error.
All umod identifiers must start with a Latin letter, and may be continued with letters, Arabic digits and the underscore "_". Consequently, all generated Java objects with an identifier starting with an underscore do not correspond to a umod definition, but are additional and ancillary.
In umod there is a difference between lower and upper case identifiers: class, enumeration and visitor names must start with an upper case letter, field names must start with a lower case letter. External declarations, enumeration items and types can be lower or upper case.
The top level syntax of each umod source file contains the declaration of the model name, like ...
MODEL myModel =
  ..
  ..
END MODEL
The underlying grammar can be described as ...
umodFile ::= MODEL identifier_upper = docEntry modelLines END MODEL
modelLines ::= ( visitorDeclaration | importDeclaration | typeDef | enumDef | toplevelClassDef | extendClassDef | interfaceDef )*
The umod compiler then generates source files of a certain package (in the java sense.) The name of the package must be given as a command line parameter when calling the tool, cf. Section 3.1 below.
For each class definition and type definition the umod compiler generates one single Java class. Additionally, it generates sources of further Java classes, realizing the different purposes mentioned above and explained in detail further below, e.g. visitors, serializers, visualizations, etc.
The umod compiler can work either in package mode or in monolithic mode. This mode also is selected by a command line switch.
In package mode all generated classes are contained in the named package, and each class is contained in its own source file as a top-level public class. Additionally, a model class with the name of the model is created, which only contains some central entry points and attributes valid for the model as a whole.
In monolithic mode, this model class is created in the same way. But all other generated classes are realized as static inner classes (in the java sense) of that one model class.
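The difference between the two modes can be pictured with a small hand-written sketch. The names MyModel, A and B1 are hypothetical, and real umod output contains far more members; the sketch only shows the nesting that monolithic mode is described to produce.

```java
// Hypothetical sketch of the shape produced in monolithic mode; not actual umod output.
class MyModel {
    // Every model element class becomes a static nested class of the model class.
    public static class A {
    }

    public static class B1 extends A {
    }
}

class MonolithicDemo {
    public static void main(String[] args) {
        MyModel.A a = new MyModel.B1();
        // The enclosing class of each element class is the model class itself.
        System.out.println(a.getClass().getEnclosingClass() == MyModel.class);
    }
}
```

In package mode the same two classes would instead be top-level public classes in their own source files, next to a separate model class.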
Attention: Type definitions "should work", but have not yet been tested thoroughly and are not covered in the following documentation. Up to now, our extensive applications of umod did not make them appear really necessary.
A umod model can refer to any existing Java class for defining field types and data types, and for declaring it as a superclass of, or as an interface implemented by, a model class.
This is prepared by an import declaration:
importDeclaration ::= EXT [ SOURCE ] identifier = pathToClass
pathToClass ::= identifier ( . identifier )*
The identifier must begin with an ASCII letter, either upper or lower case, and can contain ASCII letters, digits and the underscore "_".
The pathToClass is the fully qualified class name (in the Java sense). In the rest of the data model declaration this identifier will be used to refer to the specified external class, e.g. when declaring superclasses, interfaces or field types.
If the modifier SOURCE is not given, then the class must be reachable and is loaded by the umod compiler.
umod uses a second, dedicated class loader for this purpose. In case the class loading context in which umod itself is running and the future context of the generated classes differ, this class loader can be parametrized with a command line switch, cf. Section 3.1.
The modifier SOURCE must always be added whenever the external class cannot be loaded at all during the run of umod. In this case the umod compiler will insert references to the class without testing whether it is reachable. The user has to take care that the Java compiler will later reach either the source or the class file.
As auxiliary classes for field values, umod supports the easy definition of simple enumeration types.
enumDef ::= ENUM identUpper docEntry = enumItem , enumItem
enumItem ::= ident docEntry
The identifiers can be used in the type language for fields, as described in Section 2.6.
The syntax of class definitions is somewhat "graphically oriented", for the sake of compactness. The fragment of input text ...
TOPLEVEL CLASS
A
| B1
| | C1
| | C2
| B2
...defines "graphically" a generated hierarchy of classes, namely A extending java.lang.Object, B1 and B2 extending A, and C1 and C2 extending B1.
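The inheritance skeleton described by this fragment corresponds to the following hand-written Java. This is only a sketch of the class relationships; the generated classes additionally carry fields, accessors and further members that are omitted here, and HierarchyDemo is a hypothetical helper.

```java
// Sketch of the inheritance relations only; generated members are omitted.
class A {}
class B1 extends A {}
class B2 extends A {}
class C1 extends B1 {}
class C2 extends B1 {}

class HierarchyDemo {
    public static void main(String[] args) {
        A a = new C1();
        System.out.println(a instanceof B1); // true: C1 extends B1
        System.out.println(a instanceof B2); // false: B2 is a sibling branch
    }
}
```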
In contrast to Java, class names must begin with an upper-case letter. Character sequences which are used by the umod front-end syntax as keywords, as described in this document (e.g. "TOPLEVEL", "CLASS", "JAVA", ...), are rejected as identifiers. This rejection is (currently) implicit in the parsing process, and reported as a syntax error.
Please note that, in spite of the graphic appearance, the syntax definition and the implemented parser of umod do not treat line-breaks as significant. The same example could have been written as
A | B1 | | C1 | | C2 | B2
For the sake of clarity the definitions can be broken into sub-graphs, which are automatically combined, like:
TOPLEVEL CLASS
A
| B1
| B2
EXTEND CLASS
B1
| C1
| C2
The appearance of each class definition in the first block should leave out most details, so that this top-level class tree can serve as documentation for the basic skeleton of a model. More details may then be added later, in one or more "EXTEND CLASS" blocks.
Every field definition (see Section 2.5) can be placed individually at either appearance of "B1". The same holds for every modifying attribute (see Section 2.4.1). Please note that an extensive usage of this feature will result in less instead of more clarity.
The syntax of class definitions can be described more formally as ...
toplevelClassDef ::= TOPLEVEL CLASS classDef
extendClassDef ::= EXTEND CLASS classDef
classDef ::= identifier_upper classModifiers docEntry fieldDef fieldDoc superField subClassesDef
subClassesDef ::= | classDef
Following the class name there can appear different modifying attributes, defined by ...
classModifiers ::= ABSTRACT ALGEBRAIC EXTENDS identifier IMPLEMENTS identifier
Their usage is explained by the following examples:
TOPLEVEL CLASS
A EXTENDS SomeExternalClass
| B1 IMPLEMENTS Interface0, Interface1
Only toplevel class definitions can be given an explicit superclass by the EXTENDS keyword. If none is given, then a toplevel class extends java.lang.Object. Only external classes can be used for such a superclass declaration.
Every class can be given a list of interfaces it implements by the IMPLEMENTS keyword. Only external classes can be used for interface declaration.
TOPLEVEL CLASS
A ALGEBRAIC
| B1
Every toplevel class can be declared ALGEBRAIC. This enforces "algebraic semantics" on the equality relation. Consequently, the generated code includes an equals() method which defines equality by comparing all field contents, regardless of pointer identity. A hashCode() method is constructed accordingly.
The algebraic property is automatically distributed to all sub-classes of the class it appears with.
Currently it is not defined what an ALGEBRAIC modifier does when applied to a non-toplevel class, and the authors can not imagine what this should mean !-)
Currently it is not defined what happens when a float field appears in an algebraic data type, because floats do not have a precise and canonical notion of "identity". There must be some "epsilon" value to implement algebraic identity, and we do not yet know where to get this from.
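One reason why such an "epsilon" is hard to pick is that tolerance-based comparison is not transitive, while the contract of Object.equals() requires transitivity. The following sketch illustrates this with an arbitrary tolerance; the class name, the method approxEquals and the value of EPS are illustrative assumptions and not part of umod.

```java
class EpsilonEquality {
    static final double EPS = 0.1; // arbitrary tolerance, chosen for this demo only

    // Tolerance-based comparison of two floating point values.
    static boolean approxEquals(double x, double y) {
        return Math.abs(x - y) <= EPS;
    }

    public static void main(String[] args) {
        double a = 0.0, b = 0.075, c = 0.15;
        System.out.println(approxEquals(a, b)); // true
        System.out.println(approxEquals(b, c)); // true
        System.out.println(approxEquals(a, c)); // false: not transitive
    }
}
```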
Please note that the algebraic semantics of class definitions which have fields of compound types (sequences, sets, maps, etc.) rely on the correct implementation of the "equals()" method in the corresponding runtime libraries. For those employed as default by the automatically generated code, this is guaranteed.
Even more important: the objects which realize the values of these fields, i.e. the employed collection objects, are still modifiable! This is of course not optimal, since in-place updates are forbidden and would better be prevented by the generated code. The alternative would have been to copy these container objects into non-modifiable variants by default. But since this must be done fully recursively, it can turn out to be very expensive. So we decided that the programmer stays responsible for ensuring that all collection objects which are referred to from any algebraic umod value indeed stay unaltered!
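The practical consequence of a forbidden in-place update can be seen with a plain java.util.HashSet. The class Bag below is a hand-written stand-in for a generated ALGEBRAIC class with a single sequence field; its name, its accessor and the helper MutationHazard are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hand-written stand-in for an ALGEBRAIC class: equality is defined by field content.
final class Bag {
    private final List<String> items = new ArrayList<>();

    List<String> get_items() { return items; }

    @Override public boolean equals(Object o) {
        return o instanceof Bag && items.equals(((Bag) o).items);
    }

    @Override public int hashCode() { return items.hashCode(); }
}

class MutationHazard {
    // Returns whether the set still finds the object after its collection was altered.
    static boolean stillFound() {
        Bag bag = new Bag();
        bag.get_items().add("x");
        Set<Bag> seen = new HashSet<>();
        seen.add(bag);
        bag.get_items().add("y"); // forbidden in-place update: the hash code changes
        return seen.contains(bag);
    }

    public static void main(String[] args) {
        System.out.println(stillFound()); // false: the object is "lost" inside the set
    }
}
```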
Every class can be declared ABSTRACT. This (a) is translated into an "abstract" declaration for the Java compiler, and (b) some parts of the code will not be generated for this class, e.g. constructors. The ABSTRACT attribute applies only to the class level on which it appears explicitly.
You can insert free Java source text into each class definition. This text will be inserted "verbatim" into the generated Java class. It is subject to syntax check and re-formatting. (This is implemented by calling the GeneratedClass.addDeclarations() method from the metajava model.) It is not subject to context check or type check. Therefore some kinds of error will only be reported by the subsequent attempt to run the Java compiler.
TOPLEVEL CLASS
A
| B1
    JAVA public String myfunction(int i){ return ""+this+i;} $$
| B2
Alternatively, you can insert free Java source text into the Java source generated for the top-level, model representation class:
TOPLEVEL CLASS
A
| B1
    <<JAVA public static String myfunction(int i){ return ""+i;} $$
| B2
Please note that also this construct has to appear inside the class hierarchy, in spite of resulting in top-level code. It can not be placed on the syntactic top-level of the umod input file.
Both kinds are esp. useful for declaring instances of model classes, because umod itself has no language constructs on the instance level:
TOPLEVEL CLASS
A
| B1
    <<JAVA public static final B1 CONST_B1 = new B1(); $$
| B2
A special case is the toString() method (see also Section 2.11.1), which can be defined by simply giving the method body:
TOPLEVEL CLASS
A
| B1
    TOSTRING JAVA return "[B1:"+super.toString()+"]"; $$
| B2
((Remark: The syntax JAVA...$$ is certainly not very pretty. A markup of this kind is required, because these Java fragments are by-passed already on the lexer level of the employed antlrC lexer/parser architecture.))
For each generated class "C" umod provides ...
class C {
  ...
  public C doclone(){ }
  public C initFrom(Object o){ }
}
"doclone()" returns a shallow copy of the object it is called upon.
"initFrom()" copies the values of all those umod defined fields from the argument object o to the object this , which are defined on the level of the "most special common superclass" of both objects, and on all levels above.
Field definitions are interspersed into the class definitions. They follow the syntax
fieldDef ::= identifier_lower abstrfield type defaultValue fieldPragmas docEntry
fieldPragmas ::= ! traversalPragma constructorPragma ;
Field name and type must be given.
For every field declaration (in the umod source) the java class generated for the umod class definition will be given a corresponding field (in the java sense), together with a zoo of getter, setter and auxiliary functions, depending on the type of the field.
E.g.:
A
| B1    b11 int = "12"
        b12 int = "my.package.Global.function(\"string\")"
| | C1  c1 MAP string TO C2
| | C2
| B2
...defines two attributes for the class B1, named b11 and b12, each of the simple Java type int, and an attribute of class C1 which is named c1 and is of an aggregate type, namely a map from string values to references to objects of class C2.
In contrast to Java ...
The generated Java classes will contain a field with the given name. There is no mangling of field names. Therefore all lower-case identifiers which are reserved words in Java are not allowed as field identifiers. They are rejected explicitly, during the context analysis phase (in contrast to forbidden class names, see above, which do not even pass through the umod parser).
The fields themselves will nevertheless not be accessible. In package mode they will be declared "protected", and in monolithic mode they will be declared "private".
Instead, code for getter and setter methods will be produced. This guarantees certain integrity conditions, esp. strictness of non-opt values, see Section 2.6.3 below.
Initial values can be given to each field. They have to be denoted as string constants in double quotes which contain directly inserted Java source text (see example above). Double quotes and backslashes can be used in the contained text by escaping them with backslashes.
The Java source text undergoes a syntax check, but no type check. The syntax check is done by calling <METATOOLS>FormatClosure.expression() from the metajava model. Most errors in these initialization texts will be reported by the subsequent run of the Java compiler.
For each field declaration
EXTEND CLASS
C   f T
...there will be ...
class C {
  ...
  public T get_f() {...}
     // returns the current value
  public boolean set_f(T arg)
     // raises umod.runtime.StrictnessException iff arg==null
     //   and T is not "OPT xxx"
     // returns true IFF a change is caused by the assignment,
     //   ie. oldvalue!=newvalue
     {...}
  ...
}
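The contract of these two methods can be sketched by hand as follows. This is not the generated code: the local StrictnessException is only a stand-in for umod.runtime.StrictnessException, the field types are fixed to String for the demo, and comparing old and new value with equals() is an assumption, since the text above only says "oldvalue!=newvalue".

```java
import java.util.Objects;

// Stand-in for umod.runtime.StrictnessException; the real class is not reproduced here.
class StrictnessException extends RuntimeException {
    StrictnessException(String msg) { super(msg); }
}

class C {
    private String f = "";   // umod type "string": never null
    private String g = null; // umod type "OPT string": null is a legal value

    public String get_f() { return f; }

    public boolean set_f(String arg) {
        if (arg == null) throw new StrictnessException("f is not OPT");
        boolean changed = !arg.equals(f); // assumption: change is detected via equals()
        f = arg;
        return changed;
    }

    public String get_g() { return g; }

    public boolean set_g(String arg) {
        boolean changed = !Objects.equals(arg, g);
        g = arg; // null is accepted because the type is OPT
        return changed;
    }
}
```

A caller can thus write `if (c.set_f(v)) { ... }` to react only to real changes, and can rely on get_f() never returning null.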
If the command line switch (see Section 3.1) "--getterfunctions" is set to true, there will be additionally ...
class C {
  ...
  public static final ops.Function<C,T> get_f
    = new ops.Function<C,T>(){ public T apply(C c){return c.f;}} ;
  ...
}
If the command line switch (see Section 3.1) "--setterfunctions" is set to true, there will be additionally ...
class C {
  ...
  public static final ops.Consumer<C,T> set_f
    = new ops.Consumer<C,T>(){
        public C consume (T arg, C state){ state.set_f(arg); return state; }} ;
  ...
}
Both these objects are very convenient for using the elegant way of programming offered by metatools' "ops" package.
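The benefit of such function objects can be sketched with the standard library alone. The example below uses java.util.function.Function as a stand-in for ops.Function, whose exact interface is not reproduced here; the class Person and its field are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class Person {
    private final String name;

    Person(String name) { this.name = name; }

    // Ordinary getter method.
    public String get_name() { return name; }

    // Getter as a first-class function object; fields and methods live in
    // separate name spaces in Java, so both may be called "get_name".
    public static final Function<Person, String> get_name = p -> p.name;
}

class GetterFunctionDemo {
    static List<String> names(List<Person> persons) {
        // The function object can be passed around, e.g. to map over a collection.
        return persons.stream().map(Person.get_name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(names(Arrays.asList(new Person("Ada"), new Person("Bob"))));
    }
}
```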
By inserting an abstrfield construct into a fieldDef, a "virtual" field can be declared:
abstrfield ::= ABSTRACT GETTER SETTER |
The ABSTRACT keyword alone causes the generation of both a getter and a setter method. No field is generated, but these functions rely on the content of a field defined on a subclass level. With the keywords GETTER and SETTER you can restrict the generation to that method.
The definitions of the concrete fields may employ a more specific type, iff the type is a model element class and the more specific type is a sub-class thereof. The concrete fields do not need to be present in every branch of the sub-class tree.
The different cases are as follows:
B
| B1
| B2
A     f ABSTRACT B
      g ABSTRACT OPT B
| A1  f B1
      g B1
| A2  f B2
      g OPT B2
| A3
This code will lead to getter and setter functions in A, A1 and A2, with different Java signatures:
class A  { ... B  get_f(); boolean set_f(B);  B  get_g(); boolean set_g(B);  ...}
class A1 { ... B1 get_f(); boolean set_f(B1); B1 get_g(); boolean set_g(B1); ...}
class A2 { ... B2 get_f(); boolean set_f(B2); B2 get_g(); boolean set_g(B2); ...}
For A1 and A2, the getter functions will simply return the current value of the corresponding field. "A3.get_g()" will return null as the default value for every OPT type. "A3.get_f()" will throw an UnsupportedOperationException, since the value cannot be delivered, and there is no global default.
With the setter functions it's a little bit more complicated, but also quite canonical:
x.set_f(a) [/x.set_g(a)]
           | a.class== null | a.class== B1 | a.class== B2 |
x.class==  |                |              |              |
A1         | XPstrict       | OK           | XPtype       |
A2         | XPstrict [/OK] | XPtype       | OK           |
A3         | XPunsp         | XPunsp       | XPunsp       |
OK       = store value and return change flag, as usual
XPunsp   = a special Unsupported Operation Exception
XPstrict = the special umod Strictness Exception
XPtype   = "normal" java runtime typing error, "class cast exception"
This shows that this feature does lose some static type safety. But in practice this turned out to be not really significant, and the benefits of more specialization in the subclasses and elegant abstraction pay off.
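The behaviour listed for set_f can be reproduced by a hand-written sketch. It is only one possible realization under stated assumptions: the setter of A1 overrides the general signature set_f(B) and casts internally, the local StrictnessException stands in for the umod runtime class, and change detection uses reference comparison. The code actually generated by umod may be organized differently; A2 would be symmetrical to A1 and is omitted.

```java
class B {}
class B1 extends B {}
class B2 extends B {}

// Stand-in for the umod strictness exception ("XPstrict").
class StrictnessException extends RuntimeException {}

class A {
    // Abstract field f: no storage here, so the default is "unsupported" ("XPunsp").
    B get_f() { throw new UnsupportedOperationException("get_f"); }
    boolean set_f(B v) { throw new UnsupportedOperationException("set_f"); }
}

class A1 extends A {
    private B1 f = new B1(); // concrete field with the more specific type B1

    @Override B1 get_f() { return f; }

    @Override boolean set_f(B v) {
        if (v == null) throw new StrictnessException(); // XPstrict
        B1 nv = (B1) v;                                 // XPtype for a B2 argument
        boolean changed = nv != f;
        f = nv;
        return changed;                                 // OK
    }
}

class A3 extends A {
    // No concrete field for f: inherits the "unsupported" behaviour.
}
```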
A common practice is to define an abstract field on the level of some superclass and to realize its getters and setters by a umod field definition in some of its sub-branches, but by verbatim given Java methods in some others, see Section 2.4.3.
It is not necessary for non-algebraic types that a declared abstract getter function is also implemented in every subclass, as long as it is not used during program execution. (But this seems bad programming style ?-) But it is necessary for algebraic types, because their hashCode() method calls the getter function for each abstract field and not the concrete implementations.
The pragmas are used to control the generation of constructors and visitors, and are explained in detail together with these, cf. Section 2.8 and Section 2.10.
The same kinds of type expressions can be used to declare the type of a field of some class, or for creating Java classes on their own with a type definition statement. In both cases the syntax is ...
type ::= reference | primitiveType | constructedType
reference ::= classReference | enumReference | typeReference | externalClassReference
classReference ::= // a reference to a class (by its identifier) defined locally in this model
enumReference ::= // a reference to an enumeration type (by its identifier) defined locally in this model
externalClassReference ::= // a reference to a class declared as "EXT" in an importDeclaration
primitiveType ::= int | float | char | string | bool
constructedType ::= OPT type | type -> type | MAP type TO type | type <-> type | REL type TO type | SET type | SEQ type | type * type | type + type | type / type
The primitive types are mapped to Java types currently as follows
umod : | int | float | char | string | bool |
Java : | int | double[float] | char | String | boolean |
boxed type | Integer | Double[Float] | Character | --- | Boolean |
missing/not yet supported: | long |
Please note that "string" in the umod sense is a scalar type, and is therefore written with a lower-case initial character.
With the command line switch --floatNotDouble t the umod "float" type is realized by a Java "float", cf. Section 3.1. Without this, the Java type "double" is employed. (The Java types "long", "BigInteger" and "BigDecimal" are currently not yet supported.)
Reference types are denoted by identifiers. They have to correspond to either an external declaration, a class definition or a type definition. They are translated into a reference to a Java object of the corresponding Java class, but excluding any reference to "null", cf. Section 2.6.3.
In java reference types implicitly always contain the additional object called "null", but all primitive types never do. With umod this is treated in a more orthogonal way: Types of both kinds do not include the null value. But by applying the OPT constructor you get a type which is "optional", i.e. which includes the value "null" as an additional value in its "carrier set".
The OPT type constructor does not create new Java class definitions on its own, but it (a) modifies the code realizing the "set_<>()" functions and the constructors, and (b) determines which proxy class will be selected for aggregate types (lists, sets, maps, etc.), when applied to their argument(s).
By these means a umod model always guarantees strictness, i.e. that a value the type of which is not OPT will never take the value null.
All type constructors are fully compositional, i.e. can be nested arbitrarily !-) (Of course, some combinations do not make any sense, e.g. a multiple application of OPT, which is idempotent.)
Most constructed types are translated into parameterized instances of generic classes, either directly from the "java.util" zoo, or from our own proxy classes in umod/runtime. (The latter are needed to guarantee the strictness condition ("!=null") for all fields and values which are not of type "OPT".)
In both cases the interfaces for constructing, changing and inquiring follow the interface definitions of the corresponding collection types from "java.util".
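The idea of such a proxy class can be sketched as a list that follows the java.util.List interface but refuses null elements. NonNullList is a hypothetical name invented for this illustration, and the choice of IllegalArgumentException is an assumption; the real proxy classes live in umod.runtime and may differ in naming and in the exception they raise.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a strictness-checking proxy of "SEQ t" with non-OPT t.
class NonNullList<E> extends AbstractList<E> {
    private final List<E> data = new ArrayList<>();

    private static <T> T check(T e) {
        if (e == null) {
            throw new IllegalArgumentException("null element in a non-OPT sequence");
        }
        return e;
    }

    @Override public E get(int i) { return data.get(i); }
    @Override public int size() { return data.size(); }

    // All modifying operations of AbstractList funnel through these three methods.
    @Override public E set(int i, E e) { return data.set(i, check(e)); }
    @Override public void add(int i, E e) { data.add(i, check(e)); }
    @Override public E remove(int i) { return data.remove(i); }
}
```

Because the class still implements List, client code can use it exactly like any other list; only the insertion of null is rejected.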
The different type constructors and their notation are ...
SEQ t | Sequence (= list). |
SET t | (Finite) power set. |
MAP t1 TO t2, t1 -> t2 | finite (possibly partial) map |
REL t1 TO t2, t1 <-> t2 | multimap, as defined in ops/Multimap. |
t1 * t2 | pair, i.e. simultaneous combination of two instances of the two types. |
t1 + t2 | co-pair, i.e. alternative selection of left or right side. |
The default value for every field of a (non-optional!) aggregate type is an empty instance of this aggregate, cf. Section 2.8 below.
Whenever a new instance for such a field needs to be created explicitly, the constructor call of the correctly instantiated run-time class must be hand-coded. This can be very tedious, cf. CheckedMap_LR<String, CheckedMap_L<Integer,CheckedSet<Integer>>>. It may be easier to create a dummy instance of some class definition and call its "get_<field>()" method to retrieve a correctly typed empty instance.
In case of sparse data and for the sake of efficiency, the following type transformation, called "Currying", is often applied on the conceptual level:
(A * B * C) -> D ==> A -> B -> C -> D
So the data is realized as a map of maps of maps, but the way we want to operate is "on one single three-dimensional" map. This interpretation requires the following operations:
m.containsKey(a,b,c) = m.containsKey(a)
                         ? m.get(a).containsKey(b)
                             ? m.get(a).get(b).containsKey(c)
                             : false
                         : false
m.get(a,b,c)         = m.get(a).get(b).get(c)
m.put(a,b,c,d)       = m.get(a).get(b).put(c,d)
Let "F" be the name of a field definition and "o" an object reference. Each field can be defined as "strict", i.e. non-null, simply by not prefixing its type with "OPT". So the default for any top-level field of type "map" is an empty map. But this is not the case on the further levels of nesting: Initially, the map "o.get_F().get(a)" does not exist, i.e. "a" is not contained in the domain ("as a key") of the map returned by o.get_F(), and o.get_F().get(a).get(b) consequently throws a null pointer exception.
To support this "un-curried" view of the map, umod generates code for two methods which are safe: You can always call "o.put_F(a,b,c,d)", and the necessary intermediate maps will be constructed automatically. You can always call "o.containsKey_F(a,b,c)". If this returns true, then you can safely call "o.get_F().get(a).get(b).get(c)".
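What these two safe methods have to do can be sketched with plain java.util maps. The class Holder and the concrete key and value types are assumptions chosen for the demo, and the generated code uses the umod proxy classes rather than HashMap; the sketch only illustrates the intended behaviour.

```java
import java.util.HashMap;
import java.util.Map;

class Holder {
    // Field F with a curried type like  string -> string -> string -> int.
    private final Map<String, Map<String, Map<String, Integer>>> F = new HashMap<>();

    public Map<String, Map<String, Map<String, Integer>>> get_F() { return F; }

    // Creates all missing intermediate maps on the way down.
    public void put_F(String a, String b, String c, int d) {
        F.computeIfAbsent(a, k -> new HashMap<>())
         .computeIfAbsent(b, k -> new HashMap<>())
         .put(c, d);
    }

    // Never fails, even if an intermediate map does not exist.
    public boolean containsKey_F(String a, String b, String c) {
        Map<String, Map<String, Integer>> ma = F.get(a);
        if (ma == null) return false;
        Map<String, Integer> mb = ma.get(b);
        return mb != null && mb.containsKey(c);
    }
}
```

After `h.put_F("x", "y", "z", 42)` the chained access `h.get_F().get("x").get("y").get("z")` is safe, because containsKey_F("x", "y", "z") holds.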
The analog mechanism exists for sets, lists and multimaps:
A   f A -> B -> SET C
    g A -> B -> SEQ C
    h A -> B <-> C
...generates code for ...
class A {
  ....
  public void add_f (a, b, c) {
    // add c to the set selected by a and b, and create this and all intermediate maps iff necessary
  }
  public void add_g (a, b, c) {
    // append c to the end of the sequence selected by a and b, and create this and all intermediate maps iff necessary
  }
  public void add_h (a, b, c) {
    // add c as a value for the key b to the multi-map set selected by a, and create this iff necessary
  }
  ...
}
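A hand-written approximation of add_f and add_g, again with plain java.util collections and with hypothetical names and element types, could look like this. The multimap variant add_h is left out, since the interface of ops/Multimap is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Nested {
    private final Map<String, Map<String, Set<Integer>>> f = new HashMap<>();
    private final Map<String, Map<String, List<Integer>>> g = new HashMap<>();

    public Map<String, Map<String, Set<Integer>>> get_f() { return f; }
    public Map<String, Map<String, List<Integer>>> get_g() { return g; }

    // Adds c to the set selected by a and b; missing maps and the set are created.
    public void add_f(String a, String b, int c) {
        f.computeIfAbsent(a, k -> new HashMap<>())
         .computeIfAbsent(b, k -> new HashSet<>())
         .add(c);
    }

    // Appends c to the sequence selected by a and b; missing maps and the list are created.
    public void add_g(String a, String b, int c) {
        g.computeIfAbsent(a, k -> new HashMap<>())
         .computeIfAbsent(b, k -> new ArrayList<>())
         .add(c);
    }
}
```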
Please note that for the standard java implementations it holds that
m.containsKey(a) == false ==> m.get(a) == null
We do not follow this rather confusing overloading of "null". Indeed, it does not make real sense when thinking in a "strongly typed way": In case of A->B (with B not optional) you want to be guaranteed never to get a null. In case of A->OPT B, a value of null contained in the map, and the key not being in the map at all, are two very different things.
Therefore with umod the attempt of a "get" without "containsKey==true" is undefined, and may result in an exception.
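The overloading in the standard library is easy to reproduce with java.util.HashMap, which permits null values. In the sketch below the helper lookup() is a hypothetical name; it separates the cases that a bare get() cannot distinguish.

```java
import java.util.HashMap;
import java.util.Map;

class NullOverloading {
    // Distinguishes "key absent" from "key present with value null".
    static String lookup(Map<String, String> m, String key) {
        if (!m.containsKey(key)) {
            return "absent";
        }
        String v = m.get(key);
        return v == null ? "present, null value" : "present: " + v;
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k1", null);
        m.put("k2", "v");
        System.out.println(m.get("k1") == null);   // true
        System.out.println(m.get("nope") == null); // true as well: same answer, different situation
        System.out.println(lookup(m, "k1"));
        System.out.println(lookup(m, "nope"));
        System.out.println(lookup(m, "k2"));
    }
}
```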
Documentation can be entered in the umod source. It will be attached to the generated code as "Java doc comment", and thus re-appear when generating API documentation of the generated sources by applying "javadoc" etc. Therefore the usual stylistic rules for the writing of doc comment should be considered, esp. that the first sentence up to the character sequence ". " is quoted in survey tables.
The docEntry defines the format for any doc text entry.
docEntry ::= DOC characters $$
As shown in the rule umodFile, documentation can be attached to the model as a whole. Similarly, this is allowed by rule classDef for every class definition, by fieldDef for every field definition, by enumDef for an enumeration type as a whole, and by enumItem for every single enumeration value. Additionally, the construction fieldDoc has been introduced into the definition classDef especially for the separation of field declarations and their documentation.
fieldDoc ::= identifier_lower DOC characters $$
For all these places, more than one such construct may appear: They will be concatenated in text order.
Whenever at least one such doc entry is contained, then umod runs in documented mode, otherwise undocumented.
In documented mode, a stream-lined pretty print of the original source (as navigable HTML) will be included in the "doc-files" directory. The API doc generated from the "Java doc comments" will frequently refer to this (reconstructed) source, and vice versa. See the pretty-printed umod source for DTDs as an instructive example.
The toplevel docu will be attached (as doc comment) to the model class; additionally it will be written into the file "package-info.java", iff umod runs in non-monolithic mode (=package mode). The toplevel docu will be followed by a second, synthesized comment which reflects date and time of creation of the Java sources, and the command line parameters. This synthesized comment will always be attached, also in non-documented mode.
All docu attached to fields and classes will be followed by a second, synthesized comment which gives the link into the pretty-printed source.
If running in documented mode, all classes not having documentation text will insert a warning text into the generated Java API doc.
Whenever a new object instance is constructed, the value for every single field must be defined. This can be done by
For implicit defaults it holds that ...
An explicit default can be given to any field by notation mentioned above in Section 2.5.2. A field with neither implicit nor explicit default value must appear as a constructor argument and is called obligate field for the rest of this section.
A minimal constructor is a constructor the parameters of which are exactly the values for the obligate fields. The user can specify one minimal constructor explicitly, or more than one, if the sequential order of the field values yields different type signatures. If there is no single explicit minimal constructor, one minimal constructor is supplied by the umod compiler implicitly; its parameters are the values for the obligate fields in the source text order. 2
A special subcase of a minimal constructor is the 0ary constructor, iff there are no obligate fields. The automated generation is suppressed iff the user defines a 0ary constructor by explicit Java source, see Section 2.4.3. 3
All non-minimal constructors
must be declared explicitly by the user.
If the class defines no obligate fields, all constructors of the superclass are
inherited (which differs from Java).
Otherwise, the signatures of these constructors can be expanded explicitly to
make them applicable. At least all obligate fields must be added.
(Such inheritance is only supported from a superclass which is a umod
class definition, not from
an external, imported class.)
The declaration of a constructor is done by pragmas following the field definition, as mentioned above in Section 2.5, by appending fieldPragmas . The syntax for constructor declaration is defined as ...
    constructorPragma ::= C constructorNumber / sequentialOrder
E.g. ...
    TOPLEVEL CLASS A
        f int     ! C 0/0 C 1/0 ;
        g OPT int ! C 0/1 ;
Declarations of constructors use a pragma starting with the keyword "C".
The first number following the keyword is a number identifying the constructor.
The second number, after the slash, indicates the position of the argument
which will be used to initialize the field to which the pragma belongs.
These position indications only stand for their sequential order.
The numbers can increase with arbitrary step width.
Every combination of constructor and argument number may only appear once
with all field definitions of the same class definition level.
Every constructor must initialize all obligate fields.
NB: Since the character "C" in these pragmas is parsed as an "identifier", there must be whitespace between it and the first digit.
So the example above creates two constructors:
    public A (int arg0, int arg1){ f = arg0 ; g = arg1 ; }
    public A (int arg0)          { f = arg0 ; }
Please note that constructor "1" can only be defined because field "g" has a default value (namely null).
Please note further that constructor "1" is the only possible minimal constructor and would be synthesized implicitly if not specified explicitly.
In contrast to Java, constructors are inherited from (umod-defined) superclasses. This happens in different ways:
The special notation
    superField ::= ^ identifier_lower fieldPragmas
(as contained in classDef) allows referring to a field of some higher level class definition, so that its initialization can be included in a new constructor:
    A
        f OPT int
        g OPT int
    | B
    | | C
          name string ! C 2/0 ;
          ^f ! C 2/1 ;
Also with this device, one particular field name may appear at most once in a constructor signature.
In Java, constructors are identified by their signature, i.e. by the sequence of their parameter types. This can lead to overloading situations which cannot be resolved by a Java compiler. E.g.
    A
        a1 OPT int ! C 0/0 ;
        a2 OPT int ! C 1/0 ;
umod does not warn you in these cases, but lets the java compiler discover the problem.
If the verbatim inserted java source (cf. Section 2.4.3) defines a 0ary constructor explicitly, then the implicit creation of such is suppressed.
All other conflicts between explicitly specified constructors and verbatim inserted Java source are not discovered until the Java compiler runs.
For use with the Paisley pattern matching library, special pattern handling methods can be generated, two for every class and one for every field definition. This is controlled by the command line switch --patterns, see Section 3.1.
The names and meanings of the generated methods are as follows:
    class A extends B {
      protected F1 f1 ;
      protected F2 f2 ;
      public static Pattern<A> get_f1 (Pattern<F1> p){..}
      public static Pattern<A> get_f2 (Pattern<F2> p){..}
      // These patterns match the object iff the pattern argument "p"
      // matches the resp. field value.
    }
    class __Patterns {
      // ...
      public static Pattern<Object> cast_A (Pattern<? super A> p){..}
      // matches iff p matches and argument is instance of A
      public static Pattern<? super A> term_A
        ( Pattern<? super B> superpattern,
          Pattern<? super F1> f1_pattern,
          Pattern<? super F2> f2_pattern){..}
      // matches iff object matches superpattern (= a pattern defined for the
      // superclass) and all field values match the resp. patterns.
      // NOTE: one can treat any of these as "don't care" by setting them
      // to "Pattern.any"
      // ...
    }
umod supports the code generation for different types of visitors. They are used in the traditional way, i.e. used as a superclass for a user-defined class, which overrides only those methods which deal with those parts of the model the user is interested in.
In a umod source, the declaration of a visitor involves two steps:
The different traversal orders are identified by numbers. They are declared within the class definitions, by appending pragmas to the field definitions. This is similar to constructor declarations (Section 2.8).
The syntax is
traversalPragma ::= V traversalNumber / sequentialOrder lrCode WS |
lrCode ::= L R |
So the first number after the leading "V" identifies the traversal order. The second number, after the slash, indicates the sequential order in which the corresponding field's contents will be visited, relative to the contents of the other fields of this level of class definition.
(As with constructor declarations, there must be white-space after the "V".)
As with constructor declarations, the position numbers are only relevant w.r.t. their sequential order; they can increase with arbitrary step width.
In contrast to constructor declarations, they are not related to position numbers used in the definition of the superclass. They only define the traversal order among the fields of this level of class definition. The traversal order w.r.t. the superclass cannot be influenced by their selection, but is defined by the "kind" of generated visitor, declared as described below. So their sequence can start with an arbitrary numeric value.
Visitor and constructor defining pragmas can arbitrarily be mixed in the pragma section "!...;" at the end of a field definition.
Example:
    A
    | B1
         b1 B1     ! C 0/0 V 0/1 1/0 ;
         b2 SEQ B1 ! V 0/0 ;
    | | C1
           c1 string -> C2 ! V 0/2 V 1/20 C 0/20 ;
           c2 OPT int
Whenever the type of a field which a visitor shall follow contains MAP or REL constructs, an additional lrCode can be inserted after the numeric code. It indicates whether to visit the left or the right side of each level of these binary type constructor applications. The codes can enable leaves or whole sub-branches, as in
    A
    | a1 (A -> int) -> SEQ (A <-> A) ! V 0 LL V 1 LL R ;
where traversal code 0 will only select the references to A in the
domain of the domain, while 1 will visit additionally both sides of all
contained multi-maps.
Please note that the lrCode does not alter the sequential
order of visiting, but constitutes only an enabling condition.
(In general: whenever an algorithm depends on
a "local" consequence of a "global" property, as is the case with
the sequential order of visiting, the code is easier to maintain
when this order-respecting behaviour is realized explicitly and
locally. The required extra code is in most cases only a three-liner !-)
At the beginning of each umod definition file the generated visitors are declared, as already mentioned in the grammar rule modelLines above. The syntax for these declarations is ...
visitorDeclaration ::= VISITOR
int
identifier_upper
MULTIPHASE IS PRINTER IS REWRITER IS COREWRITER docEntry ; |
identifier_upper directly gives the name of the generated java class which
realizes the visitor.
The int indicates which traversal order is used by the
generated visitor. Of course, one and the same number can be
used for more than one visitor.
Then follows the optional indication of the visitor kind. If this is omitted,
a "simple kind" visitor is generated.
The common base class generated by umod
for all visitors is
one and the same abstract class. Let this be called
"BaseVisitor" in this and subsequent paragraphs. (Indeed, in the generated code it is
currently named "MATCH_ONLY_00",
but this name is normally not visible to the user and may change without notice.)
For each class C of the model, BaseVisitor provides a method
public void match (C x){}.
Additionally, it provides a method
public void match (java.lang.Object x){}, which dispatches on the
dynamic type of an arbitrary object. Neither the class of this object
needs to be known statically, nor even whether it is an instance of any
model class at all.
Calling match(X) on an arbitrary object from the model is the most common way of activating a visitor. Internally, the specialization on the argument is performed explicitly by a chain of if(x instanceof C'){...}else-statements.
Whenever the most special model class
C' of the visited object is identified, this information
is carried over into the static type information of the visitor's source code,
and the corresponding "action(C')" method is called by explicit casting.
The default "action(C')" methods in BaseVisitor either
call the action(C'') method with the argument cast statically to its
superclass C'', or call a special nomatch() method if the class is a
top-level class of the model.
nomatch() raises a RuntimeException unless the partial flag is
set to true. This feature can be used to discover forgotten cases when
all cases are assumed to be covered.
The different kinds of derived visitors, automatically generated or user defined, differ in the contents of this action() method, as described in the following sections.
In case the visited object is neither an instance of any model class nor of an imported external class, the method BaseVisitor.foreignObject(Object o) is called. This method must be overridden whenever match() shall also be applied to objects of unknown classes. By default, this method throws a RuntimeException with the message that "o" is not an instance of a model class.
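The dispatch described above can be sketched by hand as follows. This is only an illustration under simplifying assumptions, not the code umod generates: the model classes A, B, C and the visitor CountBs are hypothetical, and the per-class match(C x) methods are folded into the single match(Object) cascade.

```java
// Hand-written sketch, NOT umod output. Only the method names match(),
// action(), nomatch(), foreignObject() and the flag "partial" are taken
// from the text; everything else is hypothetical.
class DispatchSketch {
  static class A {}
  static class B extends A {}
  static class C extends B {}

  static class BaseVisitor {
    protected boolean partial = false;

    // Entry point: finds the most special model class by instanceof tests,
    // then calls the statically selected action() by explicit casting.
    public void match(Object x) {
      if (x instanceof C) action((C) x);
      else if (x instanceof B) action((B) x);
      else if (x instanceof A) action((A) x);
      else foreignObject(x);
    }

    // Default actions delegate to the superclass; the top level calls nomatch().
    public void action(C x) { action((B) x); }
    public void action(B x) { action((A) x); }
    public void action(A x) { nomatch(x); }

    protected void nomatch(Object x) {
      if (!partial)
        throw new RuntimeException("no action for " + x.getClass().getSimpleName());
    }

    protected void foreignObject(Object o) {
      throw new RuntimeException(o + " is not an instance of a model class");
    }
  }

  // A user-defined classifier which overrides only the case it cares about.
  static class CountBs extends BaseVisitor {
    int seen = 0;
    { partial = true; }                 // uncovered cases are silently accepted
    @Override public void action(B x) { seen++; }
  }
}
```

With this sketch, matching a C instance also increments the counter, because the default action(C) delegates to the overridden action(B); matching an A instance ends in nomatch(), which is silent here because partial is set.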
Furthermore, for each field definition "f" which refers to an
aggregate (list, set, map) of instances of model classes,
the generated code (for the class which contains this field)
provides the method "public void descend_f(final BaseVisitor visitor){..}".
This code loops over the contents of the aggregate automatically and can be
used from anywhere for explicitly applying a visitor to all
elements contained in a given aggregate field.
If a field f carries a visitor code with number n and an
lrCode which selects only a subset of all branches, then
a descend_n_f(BaseVisitor) method is generated which respects this
selection.
Instances of "BaseVisitor" itself can be used for just classifying model objects by overriding some "action()" methods, while not providing any default descending behaviour.
The action() method of a generated visitor of the "simple kind" first calls the match() method on the sequence of fields, as determined by the selected traversal order, and then calls action() with the parameter cast statically to its superclass.
For example, assume a model definition like ...
    MODEL example =

    VISITOR 0 V0               // simple kind
    VISITOR 0 V1 MULTIPHASE

    TOPLEVEL CLASS X
    A
        a1 X           ! V 0/0 ;
        a2 SEQ (X->X)  ! V 0/1 ;
    | B
        b1 X           ! V 0/0 ;
        b2 SEQ (X->X)  ! V 0/1 ;
    | | C
        c1 X           ! V 0/0 ;
        c2 SEQ (X->X ) ! V 0/1 ;
    <<< JAVA
      static class Derived extends V0 {
        public void action (B x){
          //do something
          super.action(x);
        }
      }
    $$
Then a typical control flow when calling "Derived.match(o)" with an argument which happens to be of class "C" can be depicted as follows:
    Derived.match(o)
      -> BaseVisitor.match(Object o)
      -> BaseVisitor.match(A x) -> BaseVisitor.match(B x) -> BaseVisitor.match(C x)
      -> V0.action(C x)          match(x.c1); x.descend_c2(this); this.action((B)x);
      -> Derived.action(B x)     //do something
                                 super.action(x);
      -> V0.action(B x)          match(x.b1); x.descend_b2(this); this.action((A)x);
      -> V0.action(A x)          match(x.a1); x.descend_a2(this)
    (For an object whose most special class is A or B, the cascade ends
    at match(A x) resp. match(B x), which enters V0.action(A x) resp.
    Derived.action(B x) directly.)
With these simple visitors, the code re-usage (induced by a common superclass and thus a common "action()" method) takes place only after the specific descends have been carried out (by the more specific "action()" method).
Visitors of the MULTIPHASE kind separate the code abstraction and the descending into different phases. The generated code basically looks like this:
    public static class V1 extends BaseVisitor {
      protected boolean haspre=true;
      protected boolean hasdescend=true;
      protected boolean haspost=true;

      public void action (C x){
        if (haspre) pre(x);
        if (hasdescend) descend(x);
        if (haspost) post(x);
      }
      public void pre (C x) {pre((B)x); }
      public void pre (B x) {pre((A)x); }
      public void pre (A x) {}

      public void descend (C x) {
        match(x.c1); x.descend_c2(this);
        descend((B) x);
      }
      public void descend (B x) {
        match(x.b1); x.descend_b2(this);
        descend((A) x);
      }
      public void descend (A x) { ... }

      public void post (C x) {post((B)x); }
      public void post (B x) {post((A)x); }
      public void post (A x) {}
    }
The variables haspre, hasdescend and haspost are "global switches" to enable these three phases independently. They can be overwritten by the derived visitor's code. E.g. they can be set to "false" once, at initialization time, or switched on and off dynamically during execution.
This more complex schedule allows specialization/inheritance of activities, without disturbing the inheritance w.r.t. descending. Again, this may become clear when looking at a graphical representation of the resulting control flow:
    match(Object o) -> match(A x) -> match(B x) -> match(C x) -> action(C x)
    action(C x)  -> pre(C x); descend(C x); post(C x)
    pre(C x)     -> pre(B x)       (user-defined, specialized processing)
    descend(C x) -> call match() for fields on "C"-level
                 -> descend(B x) -> call match() for fields on "B"-level
Please note that with this variant there is no inheritance between the action() methods. For code re-usage you always have to program the pre() methods, possibly disabling both descend() and post().
Esp. when adding the MULTIPHASE behaviour to an existing "simple" visitor, the existing inheritance between the action() methods will be lost and replaced by the three separate inheritances described above!
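How a derived visitor might use the three phases and their switches can be sketched as follows. The class V1 below is a hand-written stand-in modelled on the listing above, reduced so that it runs on its own; the log list, the empty model classes and MyVisitor are hypothetical additions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a MULTIPHASE visitor; not umod output.
class MultiphaseSketch {
  static class A {}
  static class B extends A {}
  static class C extends B {}

  static class V1 {
    protected boolean haspre = true, hasdescend = true, haspost = true;
    final List<String> log = new ArrayList<>();   // hypothetical, for tracing only

    public void action(C x) {
      if (haspre) pre(x);
      if (hasdescend) descend(x);
      if (haspost) post(x);
    }
    public void pre(C x) { pre((B) x); }
    public void pre(B x) { pre((A) x); }
    public void pre(A x) {}

    // The real descend() methods call match() on the fields of each level.
    public void descend(C x) { log.add("descend C"); descend((B) x); }
    public void descend(B x) { log.add("descend B"); descend((A) x); }
    public void descend(A x) {}

    public void post(C x) { post((B) x); }
    public void post(B x) { post((A) x); }
    public void post(A x) {}
  }

  static class MyVisitor extends V1 {
    MyVisitor() { haspost = false; }    // post phase switched off once, at initialization
    @Override public void pre(B x) { log.add("pre B"); super.pre(x); }
    @Override public void post(B x) { log.add("post B"); }  // never reached here
  }
}
```

Calling action() on a C instance runs the overridden pre(B) through the pre() inheritance chain, then descends through the C and B levels, and skips the post phase entirely because the switch is off.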
There are two kinds of rewriters: A visitor declared as COREWRITER can
deal with cycles, but always creates copies, even if nothing changes.
A visitor declared as REWRITER cannot deal with cycles, but does
cloning only if necessary. It is most convenient for transforming
"term-like" data, and preserves sharing as far as possible.
Both kinds of rewriters are non-destructive: Whenever only a single value must be changed due to rewriting, a new copy of the containing object is created, altered and used for the further rewriting process.
The usage of both kinds of rewriters follows the pattern ...
    MyRW rw = new MyRW();
    rw.match(o);
    Object rewritten_object = rw.get_result();
For convenience this is the same as ...
    Object rewritten_object = (new MyRW()).rewrite(o);
...and there is also a typed variant ...
    A original ;
    A rewritten_a = (new MyRW()).rewrite_typed(original);
In case of a (non-co-)rewriter there is a second constructor
    public RW (RW parent){..}
which takes an existing rewriter as its argument. This is made the "parent" rewriter, and all cache look-ups will be passed to this parent, iff they are unsuccessful in the local cache. So things like "nesting and inheritance of scopes" can easily be modelled.
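The delegation of unsuccessful look-ups to the parent can be pictured with the following minimal sketch. It is not the generated rewriter: the class name, the put() method and the identity-based map are assumptions made for illustration; only the idea of lookUp() falling back to the parent's cache comes from the text.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of a rewriter cache with a parent scope.
class ScopedCacheSketch {
  // Assumption: originals are cached by object identity.
  private final Map<Object, Object> cache = new IdentityHashMap<>();
  private final ScopedCacheSketch parent;   // null for the outermost scope
  private Object result;

  ScopedCacheSketch() { this(null); }
  ScopedCacheSketch(ScopedCacheSketch parent) { this.parent = parent; }

  void put(Object original, Object rewritten) { cache.put(original, rewritten); }

  // Looks in the local cache first, then in the chain of parents.
  boolean lookUp(Object original) {
    for (ScopedCacheSketch s = this; s != null; s = s.parent) {
      if (s.cache.containsKey(original)) {
        result = s.cache.get(original);
        return true;
      }
    }
    return false;
  }

  Object getResult() { return result; }
}
```

An entry stored in the inner scope shadows an entry for the same original in the outer scope, while the outer scope never sees entries of the inner one; this is the "nesting and inheritance of scopes" behaviour mentioned above.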
When the generated (non-co-)rewriter is called directly,
without any method being overridden by the user,
it performs an identity transformation, i.e. it changes nothing.
But when the generated co-rewriter is called directly, without any
overriding by the user, it creates a deep copy of its argument.
All generated rewriters contain two methods for every class definition
C, namely action(C) and rewriteFields(C).
action() is called by the match() cascade, as defined for the
general case and described above in
Section 2.10.1.
It (1) performs the preparatory steps of rewriting, and (2) is not
specially concerned with the fields selected by the traversal order.
Both kinds of methods may be overridden by the user. A certain contract must be kept, which is best explained by looking at the behaviour of the generated code.
For the (non-co-)rewriter, the interface to use consists of ...
    boolean lookUp(Object)   // sets result/ismulti flag and returns true,
                             // iff an entry exists in the rewriter's cache,
                             // or in that of its parent.
    void putToCache()        // memorize the currently set result (single or multiple)
    void useCache(boolean)   // sets whether the generated "action()" may read the cache
    Object original ;        // must be readable and writeable by "action()"
    Object getResult();      // returns the most recently set result
    boolean isMulti();       // returns whether this is multiple (= a list)
    void revert();           // reset result to original
    void substitute(Object newresult);
                             // set newresult as result
    void substitute_multiple(List<Object> newresults);
                             // set newresults as multiple result
    void substitute_empty(); // set empty list as multiple result
In case of the (non-co-)rewriter, the generated method RW.action(C c) does the following:
The method rewriteFields(C c) generated for every class C performs the non-generic, field structure specific rewriting. Its argument is the clone of the object. It first calls rewriteFields((D)c) for the superclass "D". Then it saves the current value of "result" into a local variable. It assumes that this points either to the original or to the clone, depending on whether changes to any field have happened in the super-class(es).
Then for all those fields which are selected by the chosen traversal order, match() is called on their contents.
Whatever this method returns in the variable result is compared with the original value contained in the field. Iff a change has happened, the result is stored into the field of the clone, and the overall local result of the method is overwritten to point to the clone.
Finally, after all fields from the corresponding traversal selection have been rewritten, the local result is copied to result, for communicating it to the caller (which may be a rewriteFields() of a sub-class or the action() method of the same class).
If a field value is an aggregate (i.e., is of a "container type"), (1) a temporary new aggregate object is constructed. Then (2) match() is called sequentially on the contained objects, and (3) result is treated accordingly step by step, i.e. stored into the temporary aggregate. Here, too, the occurrence of changes is monitored in a similar way as described for simple values. Whenever a change happens at an arbitrarily deep nesting level, the local result is re-adjusted to point to the clone.
Any user-defined overriding method may behave similarly, e.g. define the result of the rewriting by calling the methods from the interface above.
A user-defined overriding method may return more or less than one object by calling substitute_multiple(List<Object>) or substitute_empty(). This list of objects will be inserted into the nearest enclosing list or map structure. Up to that level, multiplicity distributes!
E.g. having a structure and code like (in a symbolic notation!)
    A
        a SEQ (B * C)

    action (B b){ original = b ; substitute_multiple(new List(b1, b2)); }
    action (C c){ original = c ; substitute_multiple(new List(c1, c2, c3)); }

    a1 = { (B1,C2) }
...this will yield ...
    rewrite(a1) = { (b1, c1)(b1, c2)(b1, c3),
                    (b2, c1)(b2, c2)(b2, c3) }
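The way multiplicity distributes over a pair can be sketched as a plain Cartesian product. This is only an illustration of the result shown above, not the rewriter's implementation; the class and method names are hypothetical, and strings stand in for the model objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: every replacement of the left component is
// combined with every replacement of the right component.
class DistributeSketch {
  static List<String> distribute(List<String> lefts, List<String> rights) {
    List<String> pairs = new ArrayList<>();
    for (String l : lefts)
      for (String r : rights)
        pairs.add("(" + l + ", " + r + ")");
    return pairs;
  }
}
```

Calling distribute with the two replacement lists of the example yields the six pairs in the order listed above. If either list is empty (the substitute_empty() case), the product is empty, so the pair disappears from the enclosing sequence in this sketch.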
Please note that the level which calls "match" always needs the pointer value comparison original!=getResult() for determining whether a relevant change has happened. So it is part of the contract of action() to set both values before returning!
The operations of the co-rewriter are much simpler. The interface is
    boolean lookUp(Object)   // sets the variable "result" and returns true,
                             // iff an entry exists in the co-rewriter's cache.
    void putToCache(origObj, newObj)
                             // memorize newObj as the rewriting result of origObj
                             // set the value of result=newObj
                             // This can ONLY BE CALLED ONCE for each key
    Object getResult()       // returns the most recently set result
    rewriteDone(Object key)  // restores the "result" value to the clone of "key"
                             // (this is called before returning from the
                             // rewriting method, mostly "action(Object)")
The generated action(o) method ...
The method rewriteFields(clone) calls match(clone.get_f()) on all selected fields, as in the non-co-rewriter case. But it need not monitor whether changes occur, since all objects are copied anyhow.
When the user overrides the generated action(Object o) method, the user's code should ...
When rewriting the field contents of SET and SEQ type, the rewriting is done element-wise, from "left to right", in the case of SEQ. But of course a declarative, sequence independent style of coding is always more robust and better readable.
This is esp. true when rewriting MAP and REL type values. The Java libraries which realize the corresponding data structures have imperative behaviour: E.g. the map class has overwriting semantics: Whatever is put LAST determines the current value. Of course this can make programs very hard to understand.
Here our approach is more declarative, and independent of this sequential order. Consider the following diagram:
                M
      --------------------->
      |                    |
      | L                  | R
      V                    V
      =====================>
                M'
Let "M" be the mapping which shall be rewritten. It can be a map or a
multi-map.
Then rewriting is applied to the domain of M, yielding a new, auxiliary
mapping "L", and then to the range of M, yielding "R".
When we allow "substitute_multiple()", then one single (1) element can be
re-written to more than one (>1) element, and we get multimaps for L and/or R.
Otherwise we get maps.
As can easily be seen in the diagram, the result M' of rewriting M is defined as
L-inv o M o R
If the declared type of M (and consequently M') is a multimap, this works in any case.
If the declared type of M (and M') is a map, then L-inv and R must be maps, i.e. L can be a multi-map, but must be injective, while R must be a map. If these conditions are violated by the user-defined rewriting rules (which define L and R) when applied to the current data M, then an exception is thrown. The result is independent of any sequential order of API calls.
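For the case that M is declared as a map, the composition and its side conditions can be sketched as follows. This is a hand-written illustration, not umod's implementation: the class name, the use of String keys and values, and the choice of IllegalStateException are assumptions. The sketch further assumes that l and r contain an entry for every key resp. value of m, and that R must yield exactly one element per value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of M' = L-inv o M o R for a map-typed M.
// l: old key -> rewritten keys (may be a multi-map, but must be injective)
// r: old value -> rewritten values (must be a map, i.e. exactly one element)
class MapRewriteSketch {
  static Map<String, String> rewrite(Map<String, String> m,
                                     Map<String, Set<String>> l,
                                     Map<String, Set<String>> r) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : m.entrySet()) {
      Set<String> newValues = r.get(e.getValue());
      if (newValues.size() != 1)
        throw new IllegalStateException("R is not a map at " + e.getValue());
      String newValue = newValues.iterator().next();
      for (String newKey : l.get(e.getKey())) {
        // A second old key arriving at the same new key means L-inv is not a map.
        if (out.putIfAbsent(newKey, newValue) != null)
          throw new IllegalStateException("L is not injective at " + newKey);
      }
    }
    return out;
  }
}
```

Because every entry of the result is determined by L, M and R alone, and a clash is reported instead of silently overwriting, the outcome does not depend on the order in which the entries of M are processed.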
"The invention of null was a billion dollar mistake".
Of course you should avoid using it whenever possible. One major achievement of umod is to eliminate illegal null references; legal null references must be declared explicitly by the type constructor "OPT", see Section 2.6.3.
null behaves in a very irregular way.
On the static, text level, there are always different "types" of null,
so we could have supported
match((A)null) and action((A)null).
The latter could even make a static cast to the superclass, say "B",
symbolically written as
action((A)null){
match((B)null);
}
But of course you cannot descend to any field in the null case, so at least when action() starts doing so, the value null must be treated specially.
But, even worse, you cannot store all these different null values to a cache! They all are the same, when seen as a runtime value! (Of course you COULD introduce auxiliary wrappers which tag all these different nulls. But this would be a lot of work for a construct which, as demonstrated above, is better avoided anyhow!)
So we decided not to visit nor to rewrite any null value! As soon as any visitor or rewriter finds a value (contained in a field or an aggregate) to be ==null, it does not do anything but leaves it unchanged.
This is not really a problem, because you can treat the (rare!) cases when field values may be ==null explicitly, one step earlier, when visiting the containing object itself! This is much more sensible also because at this point the context of the null value is still known. A (theoretically possible, but not implemented) visitation of a null value would require explicit passing of additional information, anyhow, to be of any worth.
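The recommended treatment "one step earlier" can be sketched like this. The classes are hypothetical and hand-written, not generated by umod; the sketch only shows a visitor that never visits null, and a derived visitor that handles the null case in the action() of the containing object, where the context is still known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: null values are skipped by match(); the containing
// object's action() deals with them explicitly.
class NullSketch {
  static class Item {
    final String name;
    Item next;                          // may be null (an "OPT" field)
    Item(String name) { this.name = name; }
  }

  static class Visitor {
    final List<String> log = new ArrayList<>();
    void match(Object o) {
      if (o == null) return;            // null is never visited
      if (o instanceof Item) action((Item) o);
    }
    void action(Item x) {
      log.add(x.name);
      match(x.next);                    // silently does nothing for null
    }
  }

  static class NullAware extends Visitor {
    @Override void action(Item x) {
      // The null case is handled here, in the context of the containing object.
      if (x.next == null) log.add(x.name + " has no successor");
      super.action(x);
    }
  }
}
```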
Each visitor-based processing code is somewhere in the middle between a "declarative" and an "imperative" style of programming. Being determined by the selection of the overridden methods as well as by the processed data, the resulting flow of control can be quite surprising. Therefore interactive debugging is supported by the generated code.
Every visitor/rewriter has a field
    protected java.io.PrintStream _visitor_debug_stream = null;
Whenever this value is !=null, some intermediate steps
(currently: only the replacing of the result by the clone and
sub-changes in MAPs in the (non-co-)rewriting process)
are dumped to this PrintStream.
When the command line switch "--visitordebug" is set to true for code generation, more debugging code will be included, see Section 3.1.
By setting the command line switch "--visitoroptimize" to true (see Section 3.1), all generated visitor and rewriter code will incorporate the following optimization tactics:
This optimization should make sense with data models which decompose into disjoint spheres with only few connections, for all those visitors/rewriters which do not process all of these spheres. These results were presented in [lt11a] at "ICMT 2011"; for further info please refer to the paper or to the slides of the talk.
Visualization of a data model (or parts thereof) is supported by different means.
    A
    | B1
         b1 B1
         b2 char
    | | C1
           c1 MAP string TO C2
           c2 int
           TOSTRING JAVA return b1.toString().substring(2)+">>"+c1 ; $$
           FORMAT "c2 '==>' b1 ; c1"
The TOSTRING directive can appear in a class definition at any
position like a field definition.
It is followed by a fragment of java source text enclosed in "JAVA...$$", like
all other verbatim java.
This source text fragment must consist of a statement or a
sequence of statements (in the java sense).
It will be encapsulated verbatim in the method declaration
public String toString(){ ... }, so it has to end with the
type-correct return statement.
W.r.t. error reporting, the same rules apply as with other verbatim Java source text, cf. Section 2.4.3 above.
The FORMAT directive can appear in a class definition at any position like a field definition. The FORMAT keyword is followed by a string constant which contains a format description. This is compiled into a method of a dedicated visitor, which constructs a Format object for visualizing an instance of this class.
The syntax of the format directives is an instance of the generic syntax for format front-ends.
The Format object which represents an object a of class A of a model class M can be generated by calling ...
    class A { public Format format() {...} }
...which is a wrapper for ...
    class M { public static Format toFormat (Object o) {...} }
    --or--
    class M { class Formatter { public static Format process (Object o) {...} } }
...which is a wrapper for (the protected method!) ...
    new M.__Formatter().toFormat(Object o)
Further there is a static function with a mode parameter. It first sets a global variable in the Formatter object, which normally defaults to 0 (zero) and which is used in the $switch $mode{..} expressions as described in the format front-end documentation.
    class M { public static Format toFormat (Object o, int mode) {...} }
Whenever a format directive needs to embed a format for an object of a model class for which no FORMAT directive is given, a call is compiled to ...
    protected Format M.__Formatter.defaultformat(Object o){
      result = Format.literal(String.valueOf(o)) ;
    }
Whenever a format directive needs to embed a format for an object which is not part of the umod model (i.e. whenever a field with a type defined by an EXT import appears in the format directive), a call is compiled to ...
    protected Format M.__Formatter.foreignObject(Object o)
This method tests whether o implements <METATOOLS>/format/Formattable; in this case format() is called, otherwise Format.literal(String.valueOf(o)) is used.
Both methods can be overridden by deriving a new formatting visitor from M.__Formatter.
Whenever the user wants to change the values of nulltext, mode, format_empty and/or default_indent, the above-mentioned wrappers cannot be used. Instead the user (1) has to create an explicit instance (e.g. by formatter = new M.__Formatter()), (2) assign to these public fields, and (3) create the format by calling format = formatter.process(o) explicitly.
If activated by a command line switch (see Section 3.1), code is generated for a java swing tree representation.
For a certain model definition Mymod and a reference to an object myobj, a swing tree is generated by calling
    JTree tree = new JTree(new Mymod.__TREEGEN__().growRoot(myobj)) ;
For convenience, the runtime class umod.runtime.SwingBrowser offers the static method model2swingpanel(...), which opens a top-level window with scroll bars etc. and all you need for browsing a umod model via a GUI.
Esp. for debugging purposes, a dedicated visitor can be generated which is declared "IS PRINTER", cf. the syntax of visitor declarations in Section 2.10.
The constructor of each such visitor takes a java.io.PrintStream as its only argument. Whenever the match() method of this visitor is called for a certain object, this object is printed to this stream as follows:
When a field has to be printed which is of aggregate type, then (1) a new line is opened for every item in the aggregate, and (2) the name of this field and the current index position is printed in "[....]" before the call to match() of the current value.
Please note that this is currently still a primitive implementation, and cannot deal with cyclic data!
The basic umod xml-serialization is based on rules which try to combine simplicity, readability and non-redundancy.
Additionally, there is a special empty reference element which realizes (by an "idref" attribute) a pointer to some object defined at some other place (i.e. earlier when writing, or earlier or later when reading). The code generated by umod starts every serialization of objects with one single root object, descending in a depth-first discipline. Therefore back-patching is never required in this case, but the de-serialization code does support it.
The first reference to an object always leads to an "in-place" dump of the complete object structure, as defined above. This is fine in case of objects which are only referred to once. It is esp. fine for human readers in case of "algebraic" objects, which do not have an "identity" beyond their structure. In this case, the usage of "id", "idref" and "reference" would only serve as a kind of shorthand notation. This is different with non-algebraic objects, where identity (and "self-identity" and "non-identity" !-) does carry more semantics than the collection of field values!
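The discipline "first occurrence in place, every later occurrence as a reference" can be sketched as follows. This is a hand-written illustration, not the generated SAX_Writer: the Node class, the element name "node", the attribute "name" and the id format are hypothetical; only "id", "idref" and "reference" are taken from the text. Attribute values are not escaped in this sketch.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical depth-first writer: the first visit dumps the object in
// place, every later visit emits only an empty reference element.
class IdRefSketch {
  static class Node {
    final String name;
    Node left, right;
    Node(String name) { this.name = name; }
  }

  private final Map<Node, Integer> ids = new IdentityHashMap<>();
  private final StringBuilder out = new StringBuilder();

  static String dump(Node root) {
    IdRefSketch s = new IdRefSketch();
    s.write(root);
    return s.out.toString();
  }

  private void write(Node n) {
    if (n == null) return;
    Integer known = ids.get(n);
    if (known != null) {                // already written: reference only
      out.append("<reference idref=\"n").append(known).append("\"/>");
      return;
    }
    int id = ids.size();
    ids.put(n, id);                     // registered before descending, so cycles terminate
    out.append("<node id=\"n").append(id)
       .append("\" name=\"").append(n.name).append("\">");
    write(n.left);
    write(n.right);
    out.append("</node>");
  }
}
```

Since an object is registered before its fields are written, every reference emitted while writing points to an id that has already appeared in the output, which is why no back-patching is needed on the writing side.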
The writing out of an instance of model "M" is started by some code like
    final java.io.PrintWriter p0 = new java.io.PrintWriter(outstream);
    final eu.bandm.tools.util.ContentPrinter cp = new ContentPrinter(p0);
    final eu.bandm.umod.runtime.XMLconfiguration conf = new XMLconfiguration();
    final M.SAX_Writer dumper = new M.SAX_Writer(cp, conf);
    dumper.match(myTopLevelObjectForWriting);
The concrete tag strings and attribute names to use are configurable and are initialized via the XMLconfiguration object.
When writing, no errors should occur, only failures. These are reported via thrown exceptions.
The reading works as follows:
    FIXME MISSING
On reading of course errors can occur, esp. when the external representation is not "valid" w.r.t. the implicit syntax rules of the DTD, which reflect the object structure of the umod model.
The options for the current implementation of the umod tool are as follows:
( definitions from file ../../src/eu/bandm/tools/umod/umodOptions.xml )
Model substantial information
short | long | type (default) | description
-0 | --sourceroot | uri | file system directory which is the root of the source code hierarchy
-1 | --packagename | string | name of the package of the generated model class
-2 | --sourcefiles | (uri)+ | source files containing the model declaration
-m | --monolithic | bool(=true) | generate a single class instead of a package
-p | --targetclasspath | (uri)* | classpath to search for existing class files, if different from current classpath. NOT YET ACTIVE!!
Code generation modification
short | long | type (default) | description
   | --floatNotDouble | bool(=false) | implement the umod type "float" not by doubles but by floats.
-G | --getterfunctions | bool(=false) | whether to generate getter function objects DEPRECATED
-S | --setterfunctions | bool(=false) | whether to generate setter function objects DEPRECATED
-x | --xmlgen | bool(=true) | whether to generate SAX de-/serialization
-s | --swingtree | bool(=false) | whether to generate a swing tree visualization
   | --patterns | | Generate methods for lifting and applying "Paisley" patterns to model elements and their fields
   | --constructorsPre20181214 | | Use old constructor code generation scheme, prior to 20181214 redefinition.
-V | --visitordebug | bool(=false) | whether to add logging code to the generated visitors
-v | --visitoroptimize | bool(=false) | whether to cut superfluous descends
   | --visitorcompile | bool(=false) | whether to cut superfluous descends by code generation
   | --rewriterwarnings | | whether all warnings from rewriter code shall be printed, even when the code generation algorithm takes the responsibility.
-w | --linewidth | int(=80) | line width for the formatting of the generated source code
Attention: --setterfunctions and --getterfunctions are deprecated.
Currently more than one input file can be supplied. All these files must follow the same syntax, as described above, and use the same module name. All contained declarations will be processed as if they were contained in one single source file. This allows the separation e.g. of documentation and declaration, or of different trees of the forest. But it is only a provisional means, until real modularization and parametrization are introduced.
The basic philosophy is to delegate most error messages to the subsequent step, i.e. the execution of a Java compiler. Some problems cannot be detected without detailed analysis, and we do not want to re-implement things done by the Java compiler anyhow. This implies that the error messages generated there must be "calculated backwards" to find their cause in the umod source. Nevertheless basic errors and warnings will be generated by the umod tool on its own.
(More to come)
1 For example: the "ORDERED" keyword initiates the generation of a "compareTo(..)" Java method. The generated code first compares the constructors, then steps through the field values. In the case of object references, simply recursive calls to "compareTo(..)" will be generated. umod does not guarantee that such a method exists. This is because the test is quite different depending on whether the type of the field is a reference to (a) a class of the same model, (b) an external class given in binary form, or (c) an external class given in source text form. So we deliberately leave the test to the further processing.
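To make the shape of such a method concrete, here is a hand-written sketch of the scheme "compare the constructors first, then the fields, recursing into object references". It is not actual umod output; all class, field and method names (Shape, Circle, Labeled, rank, compareFields) are assumptions for this illustration. If the type of a referenced field did not offer "compareTo(..)", the recursive call in Labeled would simply fail to compile, which is the delegation to the Java compiler described above.

```java
/** Hypothetical illustration only; not code generated by umod. */
final class CompareToSketch {

  abstract static class Shape implements Comparable<Shape> {
    /** Position of the variant ("constructor") in the declaration order. */
    abstract int rank();

    /** Field-wise comparison; only called when both objects are the same variant. */
    abstract int compareFields(Shape other);

    @Override
    public final int compareTo(Shape other) {
      int c = Integer.compare(rank(), other.rank()); // step 1: compare the constructors
      return c != 0 ? c : compareFields(other);      // step 2: step through the fields
    }
  }

  static final class Circle extends Shape {
    final int radius;
    Circle(int radius) { this.radius = radius; }
    @Override int rank() { return 0; }
    @Override int compareFields(Shape other) {
      return Integer.compare(radius, ((Circle) other).radius);
    }
  }

  static final class Labeled extends Shape {
    final String label;
    final Shape inner; // object reference: compared by a recursive call
    Labeled(String label, Shape inner) { this.label = label; this.inner = inner; }
    @Override int rank() { return 1; }
    @Override int compareFields(Shape other) {
      Labeled o = (Labeled) other;
      int c = label.compareTo(o.label);
      return c != 0 ? c : inner.compareTo(o.inner);
    }
  }
}
```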
2 Please note that this implicit constructor may lead to overloading conflicts between constructor type signatures, e.g. in the case
TOPLEVEL CLASS
A
f1 int ! C 0/0 ;
f2 int
f3 int = 3 ! C 0/1 ;
See Section 2.8.1.
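The kind of conflict meant here can be sketched in plain Java. The sketch rests on an assumed reading of the example, namely that "! C 0/0" and "! C 0/1" request an explicit constructor taking f1 and f3, while the implicit constructor takes the fields without initial value, f1 and f2. Under that assumption both constructors have the parameter types (int, int), which Java does not allow within one class. The class below is hand-written for illustration and is not umod output.

```java
/** Hypothetical illustration only; assumes the reading of the example given in the lead-in. */
final class A {
  int f1;
  int f2;
  int f3 = 3;

  /** Explicitly requested constructor: f1 at position 0, f3 at position 1. */
  A(int f1, int f3) {
    this.f1 = f1;
    this.f3 = f3;
  }

  // The implicit constructor for the fields without initial value would be
  //
  //   A(int f1, int f2) { this.f1 = f1; this.f2 = f2; }
  //
  // Its parameter types are again (int, int). Declaring both constructors in one
  // class is rejected by the Java compiler, because parameter names do not take
  // part in overloading resolution.
}
```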
3 In an older implementation, the implicit supply of minimal constructors was restricted to 0-ary constructors. A source text which failed to define at least one constructor covering all obligatory fields (of the class itself and of all superclasses) was rejected. This behaviour is still available by the command line switch --constructorsPre20181214 t
made 2025-01-09_11h55 by lepper on happy-ubuntu, produced with eu.bandm.metatools.d2d and XSLT