






umod --- an Automated Model Generator



(related API documentation: package umod.runtime   )


1          Purpose and Way of Operation
2          Input Notation Syntax and Generated Java Classes
2.1          Model Declaration
2.2          Import Declaration
2.3          Enumerations
2.4          Class Hierarchy
2.4.1          Extending and implementing external classes.
2.4.2          Declaring Classes as ABSTRACT and ALGEBRAIC
2.4.3          Verbatim Java Source Text in Class Definitions
2.4.4          Per-Class Generated Methods: doClone() and initFrom().
2.5          Field Definitions
2.5.1          Field Names
2.5.2          Initial Field Values
2.5.3          Per-Field Generated Methods
2.5.4          Abstract Fields, Generalized Setter and Getter Methods
2.5.5          Pragmas for Field Definitions
2.6          Types
2.6.1          Primitive Types
2.6.2          Reference Types
2.6.3          Constructed Type OPT
2.6.4          Constructed Types in General
2.6.5          Special "Un-Curry-ed" Treatment of Cascaded MAPs
2.6.6          Overloading of a null function result in standard java runtime libraries
2.7          Documentation
2.8          Constructors and Default Values for Fields
2.8.1          Ambiguity in Overloading Resolution for Constructors
2.8.2          Defining Constructors with Verbatim Inserted Java Source
2.9          Pattern Handling Methods
2.10          Visitors
2.10.1          Common Base Class and Calling of Visitors
2.10.2          "Simple Kind" Visitor
2.10.3          Multiphase Visitor
2.10.4          Rewriters
2.10.5          Rewriting of Aggregates
2.10.6          Visiting and Rewriting "null"
2.10.7          Diagnosis
2.10.8          Optimization
2.11          Visualization
2.11.1          User-Defined Visualization by the Modifier TOSTRING
2.11.2          User-Defined Visualization by the Modifier FORMAT
2.11.3          Automated Swing Tree Generation
2.11.4          User-Directed Visitor-Based Dump Routines
2.12          XML Encoded Serialization/Deserialization
3          Using the umod Tool
3.1          Command Line Options
3.2          Splitting the input text into input files
3.3          Error Messages

^ToC 1 Purpose and Way of Operation

umod compiles a data model from a high-level definition language into java code. The input language is designed for utmost compactness. Although proprietary, it closely follows mathematical notation.

Java is a rather verbose language. So the definition of large data models is a rather tedious, monotonous and error-prone task, requiring lots of redundant typing. umod automates this typing process, by generating from a compact, non-redundant notation the vast amount of code which is needed to realize ...

  1. the data type definitions,
  2. methods for creating, modifying and inquiring the model,
  3. different kinds of visitors,
  4. methods for visualization,
  5. methods for a "soap"-like (de-)serialization [soap].

Some of these features can be enabled separately, according to the user's need.

In contrast to other similar projects, e.g. pizza, umod is not realized as a pre-processor to arbitrarily arranged java source files. Instead, it is a compiler which generates one(1) model from one(1) single, central source file. (This is true conceptually; technically see Section 3.2.)

This approach has both advantages and disadvantages. E.g. you cannot use the convenient umod notation for complicated nested typed collection classes ad hoc in any source file, but only in such a central model definition file.

As an advantage you have one central compact definition of the main model of a project, --- mostly not much more than one single screen page! This turned out to be really helpful when developing the further code or writing documentation.

Please note:
1) Some consistency conditions are easily checked by the java compiler, but only at much higher cost by the umod tool. Therefore in most such cases the corresponding checks are left to the former, and error messages will come from the attempt to compile the generated source text, not from the umod generation process. This requires some experience and "type checking" skills from the user (but still much less than the C++/STL error messages !-)

2) The generated code can always be by-passed by inserting verbatim java code. So the data models generated by umod are largely, but not totally, foolproof. The usage of umod does not replace responsibility and supervision.

^ToC 2 Input Notation Syntax and Generated Java Classes

The input syntax for umod definition files is somewhat graphic-oriented, for the sake of maximal compactness and clear arrangement.
The umod definition file is intended to be used also as documentation, esp. for the programmers during their coding work.

A typical example, which also demonstrates most of the syntactic means described in the following, is the umod model for DTDs, --- see its pretty-printed umod source.

^ToC 2.1 Model Declaration

For the basic data types, i.e. the elements which make up the data model, umod supports two flavours of definitions:

  1. either "class" definitions, using a graphic oriented input format, and supporting inheritance/specialization,
  2. or "type" definitions, denoted by a pure term notation and not supporting specialization. ((THE "TYPE" CONSTRUCT IS CURRENTLY NOT YET WELL SUPPORTED!))

Additionally, there can be

  1. definitions of visitors,
  2. definitions of simple interfaces,
  3. definitions of simple enumeration types,
  4. import declarations of external, predefined classes,
  5. and documentation text for most of these constructs.

After their declaration, most of these entities are referred to by an identifier.

identifier ::= identifier_lower | identifier_upper
identifier_lower ::= lowerCaseAlpha ( lowerCaseAlpha| upperCaseAlpha| digit| _ ) *
identifier_upper ::= upperCaseAlpha ( lowerCaseAlpha| upperCaseAlpha| digit| _ ) *

There is only one single name space for the identifiers of all these different categories, and duplicate usage of an identifier will yield an error.

All umod identifiers must start with a Latin character, and may be continued with characters, Arabic digits and the underscore "_". Consequently, all generated Java objects with an identifier starting with an underscore do not correspond to a umod definition, but are additional and ancillary.
In umod there is a difference between lower and upper case identifiers: Class, enumeration and visitor names must start with an upper case, field names must start with lower case. External declarations, enumeration items and types can be lower or upper case.

The top level syntax of each umod source file contains the declaration of the model name, like ...

MODEL myModel =
  .. 
  ..
END MODEL

The underlying grammar can be described as ...

umodFile ::= MODEL identifier_upper = ( docEntry ) ? modelLines END MODEL
modelLines ::= ( visitorDeclaration ) * ( importDeclaration ) *
        ( typeDef| enumDef | toplevelClassDef | extendClassDef | interfaceDef ) +

The umod compiler then generates source files of a certain package (in the java sense.) The name of the package must be given as a command line parameter when calling the tool, cf. Section 3.1 below.

For each class definition and type definition the umod compiler generates one single java class.
Additionally, it generates sources of further java classes, realizing the different purposes mentioned above and explained in detail further below, e.g. visitors, serializers, visualizations, etc.

The umod compiler can work either in package mode or in monolithic mode. This mode also is selected by a command line switch.

In package mode all generated classes are contained in the named package, and each class is contained in its own source file as a top-level public class. Additionally, a model class with the name of the model is created, which only contains some central entry points and attributes valid for the model as a whole.

In monolithic mode, this model class is created in the same way. But all other generated classes are realized as static inner classes (in the java sense) of that one model class.

Attention: Type definitions "should work", but have not yet been tested thoroughly and are not covered in the following documentation. Up to now, our extensive applications of umod did not make them appear really necessary.

^ToC 2.2 Import Declaration

An umod model can refer to any existing java class for defining field types and data types, and for declaring them as superclasses to and interfaces implemented by a model class.

This is prepared by an import declaration :

importDeclaration ::= EXT ( SOURCE ) ? identifier = pathToClass
pathToClass ::= identifier ( . identifier ) *

The identifier must begin with an ascii character, either upper or lower case, and can contain ascii characters, digits and the underscore "_".

The pathToClass is the fully qualified class name (in the java sense). In the rest of the data model declaration this identifier will be used to refer to the specified external class, e.g. when declaring superclasses, interfaces or field types.

If the modifier SOURCE is not given, then the class must be reachable and is loaded by the umod compiler.

umod uses a second, dedicated class loader for this purpose. In case the class loading context in which umod itself is running and the future context of the generated classes differ, this class loader can be parametrized with a command line switch, cf. Section 3.1.

The modifier SOURCE must always be added whenever the external class cannot be loaded at all during the run of umod . In this case the umod compiler will insert references to the class without testing whether it is reachable. The user has to care that the java compiler will later reach either the source or the class file.

^ToC 2.3 Enumerations

As auxiliary classes, for use as field values, umod supports the easy definition of simple enumeration types.

enumDef ::= ENUM identifier_upper ( docEntry ) ? = enumItem ( , enumItem ) *
enumItem ::= identifier ( docEntry ) ?

The identifiers can be used in the type language for fields, as described in Section 2.6.

^ToC 2.4 Class Hierarchy

The syntax of class definitions is somewhat "graphic oriented", for the sake of compactness. The fragment of input text ...

TOPLEVEL CLASS
   A
   | B1
   | | C1
   | | C2
   | B2

...defines "graphically" a generated hierarchy of classes, namely A extending java.lang.Object, B1 and B2 extending A, and C1 and C2 extending B1.
In contrast to Java, class names must begin with an upper-case letter. Character sequences which are used by the umod front-end syntax as keywords, as described in this document (e.g. "TOPLEVEL", "CLASS", "JAVA", ...), are rejected as identifiers. This rejection is (currently) implicit in the parsing process, and reported as a syntax error.
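As a rough illustration, the hierarchy above corresponds to Java declarations like the following. This is a hand-written sketch, not actual umod output: the class bodies are left empty, the model class name `MyModel` is hypothetical, and the nesting mimics monolithic mode only so that the sketch fits into one file.

```java
// Sketch only: real generated classes carry fields, accessors, visitors etc.
class MyModel {                       // hypothetical model class name
    static class A {}                 // implicitly extends java.lang.Object
    static class B1 extends A {}
    static class B2 extends A {}
    static class C1 extends B1 {}
    static class C2 extends B1 {}
}
```

In package mode the same five classes would each live in their own source file as top-level public classes.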

Please note that, in spite of the graphic appearance, the syntax definition and the implemented parser of umod do not treat line-breaks as significant.
The same example could have been written as

 A | B1 | | C1 | | C2 | B2 

For sake of clarity the definitions can be broken into sub-graphs, which are automatically combined, like:

TOPLEVEL CLASS
   A
   | B1
   | B2

EXTEND CLASS
   B1
   | C1
   | C2

The appearance of each class definition in the first block should leave out most details, so that this top-level class tree can serve as documentation for the basic skeleton of a model. More details may then be added later, in one or more "EXTEND CLASS" blocks.

Every field definition (see Section 2.5) can be placed individually at either appearance of "B1". The same holds for every modifying attribute (see Section 2.4.1). Please note that an extensive usage of this feature will result in less instead of more clarity.

The syntax of class definitions can be described more formally as ...

toplevelClassDef ::= TOPLEVEL CLASS classDef
extendClassDef ::= EXTEND CLASS classDef
classDef ::= identifier_upper classModifiers ( docEntry ) ?
        ( fieldDef | fieldDoc | superField ) * subClassesDef
subClassesDef ::= ( | ) + classDef

^ToC 2.4.1 Extending and implementing external classes.

Following the class name there can appear different modifying attributes, defined by ...

classModifiers ::= ABSTRACT ? ALGEBRAIC ?
        ( EXTENDS identifier ) ? ( IMPLEMENTS identifier + ) ?

Their usage is explained by the following examples:

TOPLEVEL CLASS
   A EXTENDS SomeExternalClass 
   | B1 IMPLEMENTS Interface0, Interface1

Only toplevel class definitions can be given an explicit superclass by the EXTENDS keyword. If none is given, then a toplevel class extends java.lang.Object.
Only external classes can be used for such a superclass declaration.

Every class can be given a list of interfaces it implements by the IMPLEMENTS keyword.
Only external classes can be used for interface declaration.

^ToC 2.4.2 Declaring Classes as ABSTRACT and ALGEBRAIC

TOPLEVEL CLASS
   A ALGEBRAIC
   | B1

Every toplevel class can be declared ALGEBRAIC. This enforces "algebraic semantics" on the equality relation. Consequently, the generated code includes an equals() method which defines equality by comparing all field contents, regardless of pointer identity. A hashCode() method is constructed accordingly.

The algebraic property is automatically distributed to all sub-classes of the class it appears with.
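To make "algebraic semantics" concrete, here is a minimal hand-written sketch of what such an equals()/hashCode() pair can look like for a class with two fields. It is not actual umod output: the field names `x` and `name`, the wrapper class `AlgebraicSketch`, and the decision to treat objects of different classes as unequal are all assumptions of this sketch.

```java
import java.util.Objects;

// Hand-written sketch, not actual umod output.
class AlgebraicSketch {
    // Stands in for an ALGEBRAIC class with fields "x int" and "name string".
    static class A {
        protected int x;
        protected String name;

        A(int x, String name) {
            this.x = x;
            this.name = Objects.requireNonNull(name); // non-OPT field: never null
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            // Assumption of this sketch: different classes are never equal.
            if (o == null || getClass() != o.getClass()) return false;
            A other = (A) o;
            return x == other.x && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            // Built from the same fields as equals(), so equal objects hash equally.
            return Objects.hash(x, name);
        }
    }
}
```

Two separately constructed objects with the same field contents then compare as equal and can be used interchangeably as keys in hash-based collections.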

Currently it is not defined what an ALGEBRAIC modifier does when applied to a non-toplevel class, and the authors can not imagine what this should mean !-)

Currently it is not defined what happens when a float field appears in an algebraic data type, because floats do not have a precise and canonical notion of "identity". There must be some "epsilon" value to implement algebraic identity, and we do not yet know where to get this from.

Please note that the algebraic semantics of class definitions which have fields of compound types (sequences, sets, maps, etc.) rely on the correct implementation of the "equals()" method in the corresponding runtime libraries. For those employed as default by the automatically generated code, this is guaranteed.

Even more important: the objects which realize the values of these fields, i.e. the employed collection objects, are still modifiable! This is of course not optimal, since in-place updates are forbidden and would better be prevented by the generated code. The alternative would have been to copy, by default, these container objects into non-modifiable variants. But since this must be done fully recursively, it can turn out to be very expensive. So we decided that the programmer stays responsible for ensuring that all collection objects which are referred to from any algebraic umod value indeed stay unaltered!

Every class can be declared ABSTRACT. This (a) is translated into an "abstract" declaration for the java compiler, and (b) some parts of the code will not be generated for this class, e.g. constructors.
The ABSTRACT attribute does only apply to the class level it appears on explicitly.

^ToC 2.4.3 Verbatim Java Source Text in Class Definitions

You can insert free java source text into each class definition. This text will be inserted "verbatim" into the generated java class. It is subject to syntax check and re-formatting. (This is implemented by calling the GeneratedClass.addDeclarations() method from the metajava model.)
It is not subject to context check or type check. Therefore some kinds of error will only be reported by the subsequent attempt to run the java compiler.

TOPLEVEL CLASS
   A
   | B1 
     JAVA public String myfunction(int i){ return ""+this+i;} $$
   | B2

Alternatively, you can insert free java source text into the java source generated for the top-level, model representation class:

TOPLEVEL CLASS
   A
   | B1 
<<JAVA public static String myfunction(int i){ return ""+i;} $$
   | B2

Please note that also this construct has to appear inside the class hierarchy, in spite of resulting in top-level code. It can not be placed on the syntactic top-level of the umod input file.

Both kinds are esp. useful for declaring instances of model classes, because umod itself has no language constructs on the instance level:

TOPLEVEL CLASS
   A
   | B1 
<<JAVA public static final B1 CONST_B1 = new B1(); $$
   | B2

A special case is the toString() method (see also Section 2.11.1), which can be defined by simply giving the method body:

TOPLEVEL CLASS
   A
   | B1 
        TOSTRING JAVA return "[B1:"+this+"]"; $$
   | B2

((Remark:
The syntax JAVA...$$ is certainly not very pretty. A markup of this kind is required, because these java fragments are by-passed already on the lexer level of the employed antlrC lexer/parser architecture. ))

^ToC 2.4.4 Per-Class Generated Methods: doClone() and initFrom().

For each generated class "C" umod provides ...

  class C {
    ...
    public C doclone() {...}
    public C initFrom(Object o) {...}
    ...
  }

"doclone()" returns a shallow copy of the object it is called upon.

"initFrom()" copies the values of all those umod defined fields from the argument object o to the object this , which are defined on the level of the "most special common superclass" of both objects, and on all levels above.
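The "most special common superclass" rule can be pictured with a small hand-written sketch. It is not actual umod output: the classes `A`, `B1`, `B2`, their fields, and the wrapper class `CloneSketch` are invented for illustration, and the real generated code may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the described behaviour, not actual umod output.
class CloneSketch {
    static class A {
        protected int a;
        protected List<String> items = new ArrayList<>();

        // Shallow copy: the list object is shared with the original.
        public A doclone() { return new A().initFrom(this); }

        // Copies only the fields declared on level A.
        public A initFrom(Object o) {
            if (o instanceof A) {
                A src = (A) o;
                a = src.a;
                items = src.items;
            }
            return this;
        }
    }

    static class B1 extends A {
        protected int b;

        @Override
        public B1 doclone() { return new B1().initFrom(this); }

        @Override
        public B1 initFrom(Object o) {
            super.initFrom(o);                    // all levels above B1
            if (o instanceof B1) b = ((B1) o).b;  // level B1, only if o has it
            return this;
        }
    }

    static class B2 extends A {
        protected int c;

        @Override
        public B2 doclone() { return new B2().initFrom(this); }

        @Override
        public B2 initFrom(Object o) {
            super.initFrom(o);
            if (o instanceof B2) c = ((B2) o).c;
            return this;
        }
    }
}
```

Calling `initFrom()` on a `B2` object with a `B1` argument copies only the fields of `A`, since `A` is the most special class both objects share. The field `c` keeps its old value.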

^ToC 2.5 Field Definitions

Field definitions are interspersed into the class definitions. They follow the syntax

fieldDef ::= identifier_lower ( abstrfield ) ? type ( defaultValue ) ?
        ( fieldPragmas ) ? ( docEntry ) ?
fieldPragmas ::= ! ( traversalPragma | constructorPragma ) + ;

Field name and type must be given.

For every field declaration (in the umod source) the java class generated for the umod class definition will be given a corresponding field (in the java sense), together with a zoo of getter, setter and auxiliary functions, depending on the type of the field.

E.g.:

   A
   | B1
     b11 int = "12" 
     b12 int = "my.package.Global.function(\"string\")"
   | | C1
       c1 MAP string TO C2 
   | | C2
   | B2

...defines two attributes of the class B1, named b11 and b12, which have the simple Java type int, and an attribute of class C1 which is named c1 and has an aggregate type, namely a map from string values to references to objects of class C2.

^ToC 2.5.1 Field Names

In contrast to Java ...

  1. ... a field name must begin with a lower case letter,
  2. ... a certain field name may not appear more than once in a single upward path of class definitions. I.e., the attempt of "shadowing" is treated as an error.

The generated java classes will contain a field with the given name. There is no mangling of field names.
Therefore all lower-case identifiers which are reserved words in java are not allowed as field identifiers. They are rejected explicitly, during the context analysis phase (in contrast to forbidden class names, see above, which do not even pass the umod parser).

The fields themselves will nevertheless not be accessible. In package mode they will be declared "protected", and in monolithic mode they will be declared "private".

Instead, code for getter and setter methods will be produced. This guarantees certain integrity conditions, esp. strictness of non-opt values, see Section 2.6.3 below.

^ToC 2.5.2 Initial Field Values

Initial values can be given to each field. They have to be denoted as string constants in double quotes which contain directly inserted java source text (see example above). Double quotes and backslashes can be used in the contained text by escaping them with backslashes.
The java source text undergoes a syntax check, but no type check. The syntax check is done by calling
<METATOOLS>FormatClosure.expression() from the metajava model.
Most errors in these initialization texts will be reported by the subsequent run of the java compiler.

^ToC 2.5.3 Per-Field Generated Methods

For each field declaration

EXTEND CLASS C
               f T 

...there will be ...

  class C {
    ...
    public T get_f() {...}
      // returns the current value
    public boolean set_f(T arg) 
      // raises umod.runtime.StrictnessException iff arg==null
      //   and T is not "OPT xxx"
      // returns true IFF a change is caused by the assignment,
      //   ie. oldvalue!=newvalue
      {...}
    ...
  }
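The contract of the setter can be sketched by hand as follows. This is not actual umod output: `StrictnessException` here is a local stand-in for `umod.runtime.StrictnessException` (whose real superclass and constructors are not shown in this document), the field type `String` and the initial value are arbitrary, and the sketch compares old and new value with `equals()`, whereas the text above only says "oldvalue!=newvalue".

```java
// Hand-written sketch of the accessor contract, not actual umod output.
class SetterSketch {
    // Local stand-in for umod.runtime.StrictnessException (assumption).
    static class StrictnessException extends RuntimeException {
        StrictnessException(String msg) { super(msg); }
    }

    // Stands in for a class C with a non-OPT field "f string".
    static class C {
        protected String f = "";

        public String get_f() { return f; }

        public boolean set_f(String arg) {
            // Strictness: a non-OPT field never takes the value null.
            if (arg == null) throw new StrictnessException("field f is not OPT");
            // Assumption: change detection by equals(), not by reference.
            boolean changed = !arg.equals(f);
            f = arg;
            return changed;
        }
    }
}
```

The returned flag makes it cheap for a caller to detect whether an assignment actually changed the model, e.g. to decide whether dependent data must be recomputed.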

If the command line switch (see Section 3.1) "--getterfunctions" is set to true, there will be additionally ...

  class C {
    ...
    public static final ops.Function<C,T> get_f 
      = new ops.Function<C,T>(){ public T apply(C c){return c.f;}} ;
    ...
  }

If the command line switch (see Section 3.1) "--setterfunctions" is set to true, there will be additionally ...

  class C {
    ...
    public static final ops.Consumer<C,T> set_f
      = new ops.Consumer<C,T>(){ public C consume (T arg, C state){
          state.set_f(arg); return state; }} ;
    ...
  }

Both these objects are very convenient for using the elegant way of programming offered by metatools' "ops" package.
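The benefit of having accessors available as objects can be illustrated without the "ops" package. The following sketch substitutes `java.util.function.Function` and `java.util.function.BiFunction` from the standard library for `ops.Function` and `ops.Consumer`. That substitution, the wrapper class `FunctionObjectSketch`, and the helper `allF` are assumptions made only for this illustration.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch using java.util.function as a stand-in for metatools' "ops" package.
class FunctionObjectSketch {
    static class C {
        protected String f;

        C(String f) { this.f = f; }

        public String get_f() { return f; }

        public boolean set_f(String arg) {
            boolean changed = !arg.equals(f);
            f = arg;
            return changed;
        }

        // Function objects with the same names as the accessor methods.
        // Java keeps fields and methods in separate name spaces, so this is legal.
        public static final Function<C, String> get_f = c -> c.f;
        public static final BiFunction<String, C, C> set_f =
            (arg, state) -> { state.set_f(arg); return state; };
    }

    // Hypothetical helper: project a list of C objects onto their f values.
    static List<String> allF(List<C> cs) {
        return cs.stream().map(C.get_f).collect(Collectors.toList());
    }
}
```

Because the getter is a value, it can be passed directly to higher-order operations such as `map`, with no need for a wrapper lambda at each call site.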

^ToC 2.5.4 Abstract Fields, Generalized Setter and Getter Methods

By inserting an abstrfield construct into a fieldDef, a "virtual" field can be declared:

abstrfield ::= ABSTRACT ( GETTER | SETTER ) ?

The ABSTRACT keyword alone causes the generation of both a getter and a setter method. No field is generated, but these functions rely on the content of a field defined on a subclass level. With the keywords GETTER and SETTER you can restrict the generation to that method.
The definitions of the concrete fields may employ a more specific type, iff the type is a model element class and the more specific type a sub-class thereof.
The concrete fields do not need to be present in every branch of the sub-class tree.

The different cases are as follows:

       B
       | B1
       | B2

       A
          f     ABSTRACT B 
          g     ABSTRACT OPT B 
       | A1
          f     B1
          g     B1
       | A2
          f     B2
          g     OPT B2
       | A3

This code will lead to getter and setter functions in A, A1 and A2, with different Java signatures:

     class A  { ... B  get_f(); boolean set_f(B);  B  get_g(); boolean set_g(B); ...}
     class A1 { ... B1 get_f(); boolean set_f(B1); B1 get_g(); boolean set_g(B1); ...}
     class A2 { ... B2 get_f(); boolean set_f(B2); B2 get_g(); boolean set_g(B2); ...}

For A1 and A2, the getter functions will simply return the current value of the corresponding field.
"A3.get_g()" will return null as the default value for every OPT type.
"A3.get_f()" will throw an UnsupportedOperationException, since the value can not be delivered, and there is no global default.

With the setter functions it is a little bit more complicated, but also quite canonical:

   x.set_f(a) [/x.set_g(a)]

           a.class==     null            B1       B2 
   x.class==
        A1               XPstrict        OK       XPtype
        A2               XPstrict [/OK]  XPtype   OK
        A3               XPunsp          XPunsp   XPunsp

 OK = store value and return change flag, as usual
 XPunsp = a special Unsupported Operation Exception 
 XPstrict = the special umod Strictness Exception
 XPtype = "normal" java runtime typing error, "class cast exception"

This shows that this feature does lose some static type safety. But in practice this turned out to be not really significant, and the benefits of more specialization in the subclasses and of elegant abstraction pay off.
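The behaviour tabulated above for the field f can be reproduced with a small hand-written sketch. It is not actual umod output: `StrictnessException` is a local stand-in for the umod runtime class, the constructor of `A1` is invented, and only the classes `A`, `A1` and `A3` are spelled out.

```java
// Hand-written sketch of abstract-field accessors, not actual umod output.
class AbstractFieldSketch {
    static class B {}
    static class B1 extends B {}
    static class B2 extends B {}

    // Local stand-in for the umod strictness exception (assumption).
    static class StrictnessException extends RuntimeException {}

    // "f ABSTRACT B": accessors exist on level A, but no field.
    static class A {
        public B get_f() {
            throw new UnsupportedOperationException("no concrete field f");
        }
        public boolean set_f(B arg) {
            throw new UnsupportedOperationException("no concrete field f");
        }
    }

    // "f B1": concrete field with a more specific type.
    static class A1 extends A {
        protected B1 f;

        A1(B1 initial) { f = initial; }

        @Override
        public B1 get_f() { return f; }       // covariant return type

        public boolean set_f(B1 arg) {
            if (arg == null) throw new StrictnessException();
            boolean changed = arg != f;
            f = arg;
            return changed;
        }

        // Entry point when called through a reference of static type A:
        // the cast fails with ClassCastException for a B2 argument.
        @Override
        public boolean set_f(B arg) { return set_f((B1) arg); }
    }

    // No concrete field: inherits the throwing accessors from A.
    static class A3 extends A {}
}
```

The loss of static type safety shows up in `A1.set_f(B)`: the compiler accepts any `B` through a reference of type `A`, and the mismatch is only detected by the cast at run time.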

A common practice is to define an abstract field on the level of some superclass and to realize its getters and setters by a umod field definition in some of its sub-branches, but by verbatim given Java methods in some others, see Section 2.4.3.

It is not necessary for non-algebraic types that a declared abstract getter function is also implemented in every subclass, as long as it is not used during program execution. (But this seems bad programming style ?-)
But it is necessary for algebraic types, because their "hashCode()" method does call the getter function for each abstract field and not for the concrete implementations.

^ToC 2.5.5 Pragmas for Field Definitions

The pragmas are used to control the generation of constructors and visitors, and are explained in detail together with these, cf. Section 2.8 and Section 2.10.

^ToC 2.6 Types

The same kinds of type expressions can be used to declare the type of a field of some class, or for creating Java classes on their own with a type definition statement. In both cases the syntax is ...

type ::= reference | primitiveType | constructedType
reference ::= classReference | enumReference | typeReference| externalClassReference
classReference ::= // a reference to a class (by its identifier) defined locally in this model
enumReference ::= // a reference to an enumeration type (by its identifier) defined locally in this model
externalClassReference ::= // a reference to a class declared as "EXT" in an importDeclaration
primitiveType ::= int| float| char| string| bool
constructedType ::= OPT type | type -> type | MAP type TO type | type <-> type | REL type TO type | SET type | SEQ type | type * type | type + type | type ( / type ) +

^ToC 2.6.1 Primitive Types

The primitive types are mapped to Java types currently as follows

umod       : int      float           char       string  bool
Java       : int      double [float]  char       String  boolean
boxed type : Integer  Double [Float]  Character  ---     Boolean
missing/not yet supported: long

Please note that "string" in the umod sense is a scalar type, and therefore written with a lower-case initial character.

The command line switch --floatNotDouble t causes the umod "float" type to be realized by a Java "float", cf. Section 3.1. Without this, the Java type "double" is employed.
(The Java types "long", "BigInteger" and "BigDecimal" are currently not yet supported.)

^ToC 2.6.2 Reference Types

Reference types are denoted by identifiers.
They have to correspond to either an external declaration, a class definition or a type definition.
They are translated into a reference to a java object of the corresponding java class, but excluding any reference to "null", cf. Section 2.6.3.

^ToC 2.6.3 Constructed Type OPT

In java reference types implicitly always contain the additional object called "null", but all primitive types never do. With umod this is treated in a more orthogonal way: Types of both kinds do not include the null value. But by applying the OPT constructor you get a type which is "optional", i.e. which includes the value "null" as an additional value in its "carrier set".

The OPT type constructor does not create new java class definitions on its own, but it (a) modifies the code realizing the "set_<>()" functions and the constructors, and (b) determines which proxy class will be selected for aggregate types (lists, sets, maps, etc.), when applied to their argument(s).

By these means a umod model always guarantees strictness, i.e. that a value the type of which is not OPT will never take the value null.

^ToC 2.6.4 Constructed Types in General

All type constructors are fully compositional, i.e. can be nested arbitrarily !-)
(Of course, some combinations do not make any sense, e.g. a multiple application of OPT, which is idempotent.)

Most constructed types are translated into parameterized instances of generic classes, either directly from the "java.util" zoo, or from our own proxy classes in umod/runtime.(The latter are needed to guarantee the strictness condition ("!=null") for all fields and values which are not of type "OPT".)

In both cases the interfaces for constructing, changing and inquiring follow the interface definitions of the corresponding collection types from "java.util".

The different type constructors and their notation are ...

SEQ t                     sequence (= list).
SET t                     (finite) power set.
MAP t1 TO t2, t1 -> t2    finite (possibly partial) map.
REL t1 TO t2, t1 <-> t2   multimap, as defined in ops/Multimap.
t1 * t2                   pair, i.e. simultaneous combination of two instances of the two types.
t1 + t2                   co-pair, i.e. alternative selection of left or right side.

The default value for every field of a (non-optional!) aggregate type is an empty instance of this aggregate, cf. Section 2.8 below.

Whenever a new instance for such a field needs to be created explicitly, the constructor call of the correctly instantiated run-time class must be hand-coded explicitly. This can be very tedious, cf. CheckedMap_LR<String, CheckedMap_L<Integer,CheckedSet<Integer>>>. It may be easier to create a dummy instance of some class definition and call "get_<field>()" for retrieving a correctly typed empty instance.

^ToC 2.6.5 Special "Un-Curry-ed" Treatment of Cascaded MAPs

In case of sparse data and for the sake of efficiency, the following type transformation, called "Currying", is often applied on the conceptual level:

   (A * B * C) -> D  ==>  A -> B -> C -> D

So the data is realized as a map of maps of maps, but the way we want to operate is "on one single three-dimensional" map. This interpretation requires the following operations:

   m.containsKey(a,b,c) 
      = m.containsKey(a) ? m.get(a).containsKey(b) ? m.get(a).get(b).containsKey(c)
                                                   : false 
                         : false 

   m.get(a,b,c)   = m.get(a).get(b).get(c)
   m.put(a,b,c,d) = m.get(a).get(b).put(c,d) 
                                             

Let "F" be the name of a field definition and "o" an object reference. Each field can be defined as "strict", i.e. non-null, simply by not prefixing its type with "OPT". So the default for any top-level field of type "map" is an empty map. But this is not the case on the further levels of nesting: Initially, the map "o.get_F().get(a)" does not exist, i.e. "a" is not contained in the domain ("as a key") of the map returned by o.get_F(), and o.get_F().get(a).get(b) consequently throws a null pointer exception.

To support this "un-curried" view of the map, umod generates code for two methods which are safe: You can always call "o.put_F(a,b,c,d)", and the necessary intermediate maps will be constructed automatically. You can always call "o.containsKey_F(a,b,c)". If this returns true, then you can safely call "o.get_F().get(a).get(b).get(c)".
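The two safe operations can be sketched by hand with plain `java.util` maps. This is not actual umod output: the generated code uses umod's own checked proxy classes, whereas this sketch uses `HashMap`, and it fixes the key types to `String` and the value type to `Integer` purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written sketch with java.util.HashMap, not actual umod output.
class CurriedMapSketch {
    // Stands in for a field "F  A -> B -> C -> D" with A = B = C = String, D = Integer.
    private final Map<String, Map<String, Map<String, Integer>>> F = new HashMap<>();

    public Map<String, Map<String, Map<String, Integer>>> get_F() { return F; }

    // Safe put: intermediate maps are created on demand.
    public void put_F(String a, String b, String c, Integer d) {
        F.computeIfAbsent(a, k -> new HashMap<>())
         .computeIfAbsent(b, k -> new HashMap<>())
         .put(c, d);
    }

    // Safe lookup: never dereferences a missing intermediate map.
    public boolean containsKey_F(String a, String b, String c) {
        Map<String, Map<String, Integer>> ma = F.get(a);
        if (ma == null) return false;
        Map<String, Integer> mb = ma.get(b);
        return mb != null && mb.containsKey(c);
    }
}
```

After `containsKey_F(a,b,c)` has returned true, the chained call `get_F().get(a).get(b).get(c)` cannot fail on a missing intermediate map.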

The analog mechanism exists for sets, lists and multimaps:

  A
    f  A -> B -> SET C 
    g  A -> B -> SEQ C 
    h  A -> B <->  C 

...generates code for ...

  class A { ....
     // add c to the set selected by a and b,
     // and create this and all intermediate maps iff necessary
     public void add_f (a, b, c) {...}

     // append c to the end of the sequence selected by a and b,
     // and create this and all intermediate maps iff necessary
     public void add_g (a, b, c) {...}

     // add c as a value for the key b to the multi-map selected by a,
     // and create this iff necessary
     public void add_h (a, b, c) {...}
  }
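For the set and sequence cases, the "create iff necessary" behaviour can be sketched by hand as follows. This is not actual umod output: plain `java.util` collections replace the umod runtime classes, the element types are fixed arbitrarily, and the multimap case `add_h` is omitted because the interface of ops/Multimap is not shown in this document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hand-written sketch with java.util collections, not actual umod output.
class NestedAddSketch {
    private final Map<String, Map<String, Set<Integer>>> f = new HashMap<>();
    private final Map<String, Map<String, List<Integer>>> g = new HashMap<>();

    public Map<String, Map<String, Set<Integer>>> get_f() { return f; }
    public Map<String, Map<String, List<Integer>>> get_g() { return g; }

    // Add c to the set selected by a and b, creating the set and the
    // intermediate map if they do not exist yet.
    public void add_f(String a, String b, Integer c) {
        f.computeIfAbsent(a, k -> new HashMap<>())
         .computeIfAbsent(b, k -> new HashSet<>())
         .add(c);
    }

    // Append c to the sequence selected by a and b, creating the sequence
    // and the intermediate map if they do not exist yet.
    public void add_g(String a, String b, Integer c) {
        g.computeIfAbsent(a, k -> new HashMap<>())
         .computeIfAbsent(b, k -> new ArrayList<>())
         .add(c);
    }
}
```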

^ToC 2.6.6 Overloading of a null function result in standard java runtime libraries

Please note that for the standard java implementations it holds that

   m.containsKey(a) == false  ==> m.get(a) == null

We do not follow this rather confusing overloading of "null". Indeed, it does not make real sense when thinking in a "strongly typed way": In case of A->B (with B not optional) you want to be guaranteed never to get a null. In case of A->OPT B, a value of null contained in the map and the key not being in the map at all are two very different things.

Therefore with umod the attempt of a "get" without "containsKey==true" is undefined, and may result in an exception.
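
The overloading described above can be reproduced with the standard library alone. The sketch below uses only java.util.HashMap; the class name NullOverloadingDemo and its helper methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class NullOverloadingDemo {
    // A map in which "present" is a key whose value happens to be null.
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("present", null);
        return m;
    }

    // get() alone cannot tell "mapped to null" from "not mapped at all".
    static boolean getReturnsNull(String key) { return sample().get(key) == null; }

    // containsKey() is the only reliable way to distinguish the two cases.
    static boolean isMapped(String key) { return sample().containsKey(key); }

    public static void main(String[] args) {
        System.out.println(getReturnsNull("present") + " " + getReturnsNull("absent"));
        System.out.println(isMapped("present") + " " + isMapped("absent"));
    }
}
```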

^ToC 2.7 Documentation

Documentation can be entered in the umod source. It will be attached to the generated code as "Java doc comments", and thus re-appears when API documentation is generated from the generated sources by applying "javadoc" etc. Therefore the usual stylistic rules for writing doc comments should be observed, esp. that the first sentence, up to the character sequence ". ", is quoted in survey tables.

The docEntry defines the format for any doc text entry.

docEntry ::= DOC ( characters ) * $$

As shown in the rule umodFile, documentation can be attached to the module as a whole. Similarly, this is allowed by rule classDef for every class definition, by fieldDef for every field definition, by enumDef for an enumeration type as a whole, and by enumItem for every single enumeration value. Additionally, the construction fieldDoc has been introduced into the definition classDef, especially for separating field declarations from their documentation.

fieldDoc ::= identifier_lower DOC ( characters ) * $$

For all these places, more than one such construct may appear: They will be concatenated in text order.

Whenever at least one such doc entry is contained, then umod runs in documented mode, otherwise undocumented.

In documented mode, a stream-lined pretty print of the original source (as navigable HTML) will be included in the "doc-files" directory. The API doc generated from the "Java doc comments" will frequently refer to this (reconstructed) source, and vice versa. See the pretty-printed umod source for DTDs as an instructive example.

The toplevel docu will be attached (as doc comment) to the model class; additionally it will be written into the file "package-info.java", iff umod runs in non-monolithic mode (=package mode).
The toplevel docu will be followed by a second, synthesized comment which reflects date and time of creation of the java sources, and the command line parameters. This synthesized comment will always be attached, also in non-documented mode.

All docu attached to fields and classes will be followed by a second, synthesized comment which gives the link into the pretty-printed source.

If running in documented mode, a warning text is inserted into the generated Java API doc for every class which has no documentation text.

^ToC 2.8 Constructors and Default Values for Fields

Whenever a new object instance is constructed, the value for every single field must be defined. This can be done by

  1. implicit default
  2. explicit default
  3. constructor argument.

For implicit defaults it holds that ...

  1. a field of type OPT(x) has the value null as its default.
  2. a field of a non-optional aggregate type (set, list, map, multimap, etc.) has the empty aggregate as its default.
  3. all other fields (primitive types and references, which are not optional) do not have an implicit default.

An explicit default can be given to any field by notation mentioned above in Section 2.5.2. A field with neither implicit nor explicit default value must appear as a constructor argument and is called obligate field for the rest of this section.
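
As an illustration of these rules, here is a hand-written Java class which behaves like a class with one obligate field, one OPT field and one aggregate field. It is a sketch only: the class name DefaultsSketch, the field names and the mapping of the umod types to String and List<Integer> are assumptions, and the code really generated by umod may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical equivalent of a class with the three fields
//   name   string       (obligate: no implicit default)
//   alias  OPT string   (implicit default: null)
//   items  SEQ int      (implicit default: empty list)
class DefaultsSketch {
    String name;
    String alias = null;
    List<Integer> items = new ArrayList<>();

    // Minimal constructor: its parameters are exactly the obligate fields.
    DefaultsSketch(String name) {
        this.name = name;
    }

    public static void main(String[] args) {
        DefaultsSketch d = new DefaultsSketch("example");
        System.out.println(d.name + " " + d.alias + " " + d.items);
    }
}
```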

A minimal constructor is a constructor the parameters of which are exactly the values for the obligate fields. The user can specify one minimal constructor explicitly, or more than one, if the sequential order of the field values yields different type signatures. If there is no single explicit minimal constructor, one minimal constructor is supplied by the umod compiler implicitly; its parameters are the values for the obligate fields in the source text order. 2

A special subcase of a minimal constructor is the 0ary constructor, iff there are no obligate fields. The automated generation is suppressed iff the user defines a 0ary constructor by explicit Java source, see Section 2.4.3. 3

All non-minimal constructors must be declared explicitly by the user. If the class defines no obligate fields, all constructors of the superclass are inherited (which is different from Java). Otherwise, the signatures of these constructors can be expanded explicitly to make them applicable. At least all obligate fields must be added.
(Such inheritance is only supported from a superclass which is a umod class definition, not from an external, imported class.)

The declaration of a constructor is done by pragmas following the field definition, as mentioned above in Section 2.5, by appending fieldPragmas . The syntax for constructor declaration is defined as ...

constructorPragma ::= C ( constructorNumber / sequentialOrder ) +

E.g. ...

TOPLEVEL CLASS
A
   f   int      ! C 0/0  C 1/0 ;
   g   OPT int  ! C 0/1        ;

Declarations of constructors use a pragma starting with the keyword "C".
The first number following the keyword is a number identifying the constructor.
The second number, after the slash, indicates the position of the argument which will be used to initialize the field to which the pragma belongs.
These position indications only stand for their sequential order. The numbers can increase with arbitrary step width.
Every combination of constructor and argument number may only appear once with all field definitions of the same class definition level.
Every constructor must initialize all obligate fields.

NB: Since the character "C" in these pragmas is parsed as an "identifier", there must be whitespace between it and the first digit.

So the example above creates two constructors:

  public A (int arg0, int arg1){ f = arg0 ; g = arg1 ; }
  public A (int arg0)          { f = arg0 ; }

Please note that constructor "1" can only be defined because field "g" has a default value (namely null).

Please note further that constructor "1" is the only possible minimal constructor and would be synthesized implicitly if not specified explicitly.

In contrast to Java, constructors are inherited from (umod-defined) superclasses. This happens in different ways:

  1. if no pragma with the same constructor number appears in the subclass:
    1. if the subclass introduces no obligate fields, then the constructor is inherited "as is" for this subclass.
    2. if there are new obligate fields on this class level, then the constructor is not inherited by this subclass or any further subclass, and a corresponding warning is emitted.
  2. if one or more pragmas with the same constructor number do appear in the subclass:
    1. if the lowest argument number is larger than the largest argument number used in the superclass, then the constructor is "extended": The new arguments are appended to the list of the arguments of the superclass; the generated code first calls the constructor of the superclass with the sequence of inherited arguments and then assigns the values of the "new" arguments to the corresponding fields.
      Again, all obligate fields must be included.
    2. if the lowest argument number is equal to 0, then the constructor number is "recycled" and a totally new constructor chain is started here. Please note that all obligate fields must be included in such a constructor explicitly, including those of all superclasses, using the "^" notation, see below.
    3. if the lowest argument number is not equal to 0, but lower than the highest argument number used on the superclass level for this constructor, then it is an error.

The special notation

superField ::= ^ identifier_lower fieldPragmas

(as contained in classDef) makes it possible to refer to a field of some higher level class definition, in order to include its initialization in a new constructor:

A
   f   OPT int
   g   OPT int
| B
| | C
   name string  ! C 2/0 ;
   ^f           ! C 2/1 ;

Also with this device, one particular field name may appear at most once in a constructor signature.
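
In plain Java, the constructor "2" declared above corresponds roughly to the following hand-written classes. This is only a sketch under assumptions: that "OPT int" is represented as java.lang.Integer, that the fields are protected, and that the inherited field is assigned directly. The generated code may well differ in all three points.

```java
// Hand-written sketch; all names besides A, B, C, f, g and name are hypothetical.
class SuperFieldSketch {
    static class A {
        protected Integer f = null;   // OPT int: implicit default null
        protected Integer g = null;   // OPT int: implicit default null
        A() {}                        // 0ary constructor: no obligate fields
    }

    static class B extends A {
        B() { super(); }
    }

    static class C extends B {
        protected String name;        // obligate field

        // Constructor "2": argument 0 initializes "name" (C 2/0),
        // argument 1 initializes the field "f" inherited from A (^f ! C 2/1).
        C(String name, Integer f) {
            super();
            this.name = name;
            this.f = f;
        }
    }

    public static void main(String[] args) {
        C c = new C("n", 7);
        System.out.println(c.name + " " + c.f + " " + c.g);
    }
}
```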

^ToC 2.8.1 Ambiguity in Overloading Resolution for Constructors

In Java, constructors are identified by their signature, i.e. by the sequence of their parameter types. This can lead to overloading situations which cannot be resolved by a Java compiler. E.g.

A 
  a1 OPT int     ! C 0/0      ;
  a2 OPT int     ! C     1/0  ;

umod does not warn you in these cases, but lets the java compiler discover the problem.

^ToC 2.8.2 Defining Constructors with Verbatim Inserted Java Source

If the verbatim inserted java source (cf. Section 2.4.3) defines a 0ary constructor explicitly, then the implicit creation of such is suppressed.

All other conflicts between explicitly specified constructors and verbatim inserted Java source are not discovered until the Java compiler is run.

^ToC 2.9 Pattern Handling Methods

For use with the Paisley pattern matching library, special pattern handling methods can be generated, two for every class and one for every field definition. This is controlled by the command line switch --patterns, see Section 3.1.

The names and meanings of the generated methods are as follows:

   class A extends B {
     protected F1 f1 ; 
     protected F2 f2 ; 

     public static Pattern<A> get_f1 (Pattern<F1> p){..}
     public static Pattern<A> get_f2 (Pattern<F2> p){..}
     // These patterns match the object iff the pattern argument "p"
     //   matches the resp. field value.
   }

   class __Patterns {
     // ...

     public static Pattern<Object> cast_A (Pattern<? super A> p){..}
     // matches iff p matches and argument is instance of A 

     public static Pattern<? super A> term_A
        ( Pattern<? super B> superpattern,
          Pattern<? super F1> f1_pattern,
          Pattern<? super F2> f2_pattern){..}
     // matches iff object matches superpattern (= a pattern defined for the
     //   superclass) and all field values match the resp. patterns.
     // NOTE: one can treat any of these as "don't care" by setting them
     //         to "Pattern.any"

     // ...
   }


^ToC 2.10 Visitors

umod supports the code generation for different types of visitors. They are used in the traditional way, i.e. used as a superclass for a user-defined class, which overrides only those methods which deal with those parts of the model the user is interested in.

In a umod source, the declaration of a visitor involves two steps:

  1. definition of one or more different traversal orders.
  2. declaration of the visitor classes.

The different traversal orders are identified by numbers. They are declared within the class definitions, by appending pragmas to the field definitions. This is similar to constructor declarations (Section 2.8).

The syntax is

traversalPragma ::= V ( traversalNumber / sequentialOrder ( lrCode WS ) * ) +
lrCode ::= ( L | R ) +

So the first number after the leading "V" identifies the traversal order. The second number, after the slash, indicates the sequential order in which the corresponding field's contents will be visited, relative to the contents of the other fields of this level of class definition.

(As with constructor declarations, there must be white-space after the "V".)

As with constructor declarations, the position numbers are only relevant w.r.t. their sequential order; they can increase with arbitrary step width.

In contrast to constructor declarations, they are not related to position numbers used in the definition of the superclass. They only define the traversal order among the fields of this level of class definition. The traversal order w.r.t. the superclass cannot be influenced by their selection, but is defined by the "kind" of generated visitor, declared as described below. So their sequence can start with an arbitrary numeric value.

Visitor and constructor defining pragmas can arbitrarily be mixed in the pragma section "!...;" at the end of a field definition.

Example:

   A
   | B1
     b1  B1               ! C 0/0  V 0/1 1/0 ;
     b2 SEQ B1            !        V 0/0     ;
   | | C1
       c1 string -> C2    !        V 0/2 V 1/20 C 0/20 ;
       c2 OPT int 

Whenever the type of a field which a visitor shall follow contains MAP or REL constructs, an additional lrCode can be inserted after the numeric code. It indicates whether to visit the left or the right side of each level of these binary type constructor applications. The codes can enable leaves or whole sub-branches, as in

   A
   | a1  (A -> int) -> SEQ (A <-> A)  ! V 0 LL   V 1 LL R  ; 

where traversal code 0 will only select the references to A in the domain of the domain, while 1 will visit additionally both sides of all contained multi-maps. Please note that the lrCode does not alter the sequential order of visiting, but constitutes only an enabling condition.
(In general: Whenever you want to program an algorithm which depends on a "local" consequence of a "global" property, as it is the case with the sequential order of visiting, then the code is better maintainable when realizing this explicitly order-respecting behaviour locally. The required extra code is in most cases only a three-liner !-)

At the beginning of each umod definition file the generated visitors are declared, as already mentioned in the grammar rule modelLines above. The syntax for these declarations is ...

visitorDeclaration ::= VISITOR int identifier_upper
        ( MULTIPHASE | IS PRINTER | IS REWRITER | IS COREWRITER ) ? ( docEntry ) ? ;

identifier_upper directly gives the name of the generated java class which realizes the visitor.
The int indicates which traversal order is used by the generated visitor. Of course, one and the same number can be used for more than one visitor.
Then follows the optional indication of the visitor kind. If this is omitted, a "simple kind" visitor is generated.

^ToC 2.10.1 Common Base Class and Calling of Visitors

The common base class generated by umod for all visitors is one and the same abstract class. Let this be called "BaseVisitor" in this and subsequent paragraphs. (Indeed, in the generated code it is currently named "MATCH_ONLY_00", but this name is normally not visible to the user and may change without notice.)
For each class C of the model, BaseVisitor provides a method public void match (C x){}.
Additionally, it provides a method public void match (java.lang.Object x){}, which allows to dynamically type an arbitrary object. Neither the class of this object needs to be known statically, nor even whether it is an instance of any model class at all.

Calling match(X) on an arbitrary object from the model is the most common way of activating a visitor. Internally, the specialization on the argument is performed explicitly by a chain of if(x instanceof C'){...}else-statements.

Whenever the most special model class C' of the visited object is identified, this information is carried over into the static type information of the visitors source code, and the corresponding "action(C')" method is called by explicit casting.
BaseVisitor contains default "action(C')" methods. Each of them either calls the action(C'') method with the argument cast statically to its superclass C'', or, if C' is a top-level class of the model, calls a special nomatch() method. nomatch() raises a RuntimeException unless the partial flag is set to true. This feature can be used to discover forgotten cases when all cases are assumed to be covered.

The different kinds of derived visitors, automatically generated or user defined, differ in the contents of this action() method, as described in the following sections.

In case the visited object is neither an instance of any model class nor of an imported external class, the method BaseVisitor.foreignObject(Object o) is called. This method must be overridden whenever match() shall also be applied to objects of unknown classes. By default, this method throws a RuntimeException with the message that "o" is not an instance of a model class.
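
The dispatch scheme described so far can be imitated in a few lines of hand-written Java. The sketch below is not generated code. The model classes A and B, the visitor IsB and the representation of the partial flag as a protected boolean field are assumptions for illustration; only the method names match(), action(), nomatch() and foreignObject() are taken from the text.

```java
// Hand-written imitation of the dispatch scheme; not umod output.
class DispatchSketch {
    static class A {}                 // top-level model class
    static class B extends A {}       // more special model class

    static class BaseVisitor {
        protected boolean partial = false;   // assumed form of the "partial flag"

        // Dynamic entry point: finds the most special model class first.
        public void match(Object x) {
            if (x instanceof B) action((B) x);
            else if (x instanceof A) action((A) x);
            else foreignObject(x);
        }

        // Default actions: delegate to the superclass, or report "no match".
        public void action(B x) { action((A) x); }
        public void action(A x) { nomatch(x); }

        protected void nomatch(Object x) {
            if (!partial) throw new RuntimeException("no action defined for " + x);
        }

        protected void foreignObject(Object o) {
            throw new RuntimeException(o + " is not an instance of a model class");
        }
    }

    // Classifying visitor: overrides one action() and tolerates all other cases.
    static class IsB extends BaseVisitor {
        boolean found = false;
        IsB() { partial = true; }
        @Override public void action(B x) { found = true; }
    }

    public static void main(String[] args) {
        IsB v = new IsB();
        v.match(new B());
        System.out.println(v.found);
    }
}
```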

Furthermore, for each field definition "f" which refers to an aggregate (list, set, map) of instances of model classes, the generated code (for the class which contains this field) provides the method "public void descend_f(final BaseVisitor visitor){..}".
This code loops over the contents of the aggregate automatically and can be used from anywhere for explicitly applying a visitor to all elements contained in a given aggregate field.
If a field f carries a visitor code with number n and an lrCode which selects only a subset of all branches, then a descend_n_f(BaseVisitor) method is generated which respects this selection.

Instances of "BaseVisitor" itself can be used for just classifying model objects by overriding some "action()" methods, while not providing any default descending behaviour.

^ToC 2.10.2 "Simple Kind" Visitor

The action() method of a generated visitor of the "simple kind" first calls the match() method on the sequence of fields, as determined by the selected traversal order, and then calls action() with the parameter casted statically to its superclass.

For example, assume a model definition like ...

MODEL example = 
VISITOR 0  V0  // simple kind
VISITOR 0  V1 MULTIPHASE

TOPLEVEL CLASS 
X
A
      a1 X             ! V 0/0 ; 
      a2 SEQ (X->X)    ! V 0/1 ; 
| B
      b1 X             ! V 0/0 ; 
      b2 SEQ (X->X)    ! V 0/1 ; 
| | C
      c1 X             ! V 0/0 ; 
      c2 SEQ (X->X )   ! V 0/1 ; 
<<< JAVA static class Derived extends V0 {
           public void action (B x){
             //do something
             super.action(x);
           }
         }   $$

Then a typical control flow when calling "Derived.match(o)" with an argument which happens to be of class "C" can be depicted as follows:

                    Derived.match(o)
                    /
         __________/
        /          
       V  
BaseVisitor.match(Object o) 
       |        
       V        
BaseVisitor.match(A x)  ........................> V0.action(A x)
       |           		             ^\                            
       |           		             | \                           
       |           		             |  +-> match(x.a1);
       |           		             |      x.descend_a2(this)
       V                                     |  
BaseVisitor.match(B x) ....> Derived.action(B x){     |
       |           //do something            |                             
       |           super.action(x);          |                      
       |         }   |                       |                          
       |             +-----------------> V0.action(B x)                       
       |         ^                            \                            
       |         | 		             . \                           
       |         | 		             .  +-> match(x.b1);      
       |         | 		             .      x.descend_b2(this);
       |         |		             .      this.action((A)x); 
       V         |		                  
BaseVisitor.match(C x) -- |  -------------------> V0.action(C x)                          
                 |		              \                            
                  \      	               \                           
                   \	   	                +-> match(x.c1);
                    \	   	                    x.descend_c2(this);
                     -----------------------------< this.action((B)x);                  		                  
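
The same control flow can be reproduced with a small hand-written stand-in for the generated visitor. The sketch below is an assumption-laden simplification: the SEQ fields a2, b2, c2 and their descend_ methods are left out, the fields are pre-filled with dummy X objects, and a log list (hypothetical) records the order of calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the generated simple-kind visitor; not umod output.
class SimpleVisitorSketch {
    static class X { final String id; X(String id) { this.id = id; } }
    static class A { X a1 = new X("a1"); }
    static class B extends A { X b1 = new X("b1"); }
    static class C extends B { X c1 = new X("c1"); }

    static class V0 {
        final List<String> log = new ArrayList<>();

        public void match(Object o) {
            if (o instanceof C) action((C) o);
            else if (o instanceof B) action((B) o);
            else if (o instanceof A) action((A) o);
            else if (o instanceof X) action((X) o);
        }

        public void action(X x) { log.add("visit " + x.id); }
        // First visit the fields of this level, then continue with the superclass.
        public void action(A x) { match(x.a1); }
        public void action(B x) { match(x.b1); action((A) x); }
        public void action(C x) { match(x.c1); action((B) x); }
    }

    static class Derived extends V0 {
        @Override public void action(B x) {
            log.add("Derived.action(B)");   // "do something"
            super.action(x);
        }
    }

    static List<String> run() {
        Derived d = new Derived();
        d.match(new C());
        return d.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the user-defined action(B) is reached only after the field c1 of the more special class C has been visited.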

^ToC 2.10.3 Multiphase Visitor

With these simple visitors the code re-usage (induced by a common superclass and thus a common "action()" method) does only take place after the specific descends have been carried out (by the more specific "action()" method).

Visitors of the MULTIPHASE kind do separate the code abstraction and the descending into different phases. The generated code basically looks like this:

  public static class V1 extends BaseVisitor {
    protected boolean haspre=true;
    protected boolean hasdescend=true;
    protected boolean haspost=true;

    public void action (C x){
      if (haspre) pre(x);
      if (hasdescend) descend(x);
      if (haspost) post(x);
    }

    public void pre (C x) {pre((B)x); }
    public void pre (B x) {pre((A)x); }
    public void pre (A x) {}

    public void descend (C x) { match(x.c1);
                                x.descend_c2(this);
                                descend((B) x);
                              }
    public void descend (B x) { match(x.b1);
                                x.descend_b2(this);
                                descend((A) x);
                              } 
    public void descend (A x) {  ...
                              }

    public void post (C x) {post((B)x); }
    public void post (B x) {post((A)x); }
    public void post (A x) {}
  }

The variables haspre, hasdescend and haspost are "global switches" to enable these three phases independently. They can be overwritten by the derived visitor's code. E.g. they can be set to "false" once, at initialization time, or switched on and off dynamically during execution.

This more complex schedule allows specialization/inheritance of activities, without disturbing the inheritance w.r.t. descending. Again, this may become clear when looking at a graphical representation of the resulting control flow:

match(Object o)       :
       |              :                   ..............
       V              :  user-defined ... :            :
match(A x)            --------------------+            :
       |                         ^             ^       :              
       |                         |             |       : ...specialized
       V                         |             |       :      processing           .
match(B x)                   pre(B x)  descend(B x);   +------------------
       |                         ^             ^\                  ^
       |                         |             | \=> call match()  |
       |                         |             |     for fields on |
       |                         |             |      "B"-level    |
       V                         |             |                   |
match(C x)--->action(C x)--> pre(C x); descend(C x);         post(C x)
                                             \
                                              \=> call match() for 
                                                   fields on "C"-level

Please note that with this variant there is no inheritance between the action() methods. For code re-usage you always have to program the pre() methods, possibly disabling both descend() and post().

Especially when adding the MULTIPHASE behaviour to an existing "simple" visitor, the existing inheritance between the action() methods is lost and replaced by the three separate inheritances described above!
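
The effect of the three switches and of the separate pre() inheritance can be tried out with a reduced, hand-written imitation of such a visitor. The classes X, A, B, the derived visitor PreOnly and the log list are hypothetical; only haspre, hasdescend, haspost and the method names pre(), descend(), post() follow the generated code shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written imitation of a MULTIPHASE visitor; not umod output.
class MultiphaseSketch {
    static class X {}
    static class A { X a1 = new X(); }
    static class B extends A { X b1 = new X(); }

    static class V1 {
        protected boolean haspre = true, hasdescend = true, haspost = true;
        final List<String> log = new ArrayList<>();

        public void match(Object o) {
            if (o instanceof B) action((B) o);
            else if (o instanceof A) action((A) o);
            else if (o instanceof X) log.add("X");
        }

        public void action(B x) { if (haspre) pre(x); if (hasdescend) descend(x); if (haspost) post(x); }
        public void action(A x) { if (haspre) pre(x); if (hasdescend) descend(x); if (haspost) post(x); }

        public void pre(B x) { pre((A) x); }
        public void pre(A x) {}
        public void descend(B x) { match(x.b1); descend((A) x); }
        public void descend(A x) { match(x.a1); }
        public void post(B x) { post((A) x); }
        public void post(A x) {}
    }

    // User-defined visitor: re-uses code via pre(), never uses post().
    static class PreOnly extends V1 {
        PreOnly(boolean descendToo) { hasdescend = descendToo; haspost = false; }
        @Override public void pre(B x) { log.add("pre(B)"); super.pre(x); }
        @Override public void pre(A x) { log.add("pre(A)"); }
    }

    public static void main(String[] args) {
        PreOnly v = new PreOnly(true);
        v.match(new B());
        System.out.println(v.log);
    }
}
```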

^ToC 2.10.4 Rewriters

There are two kinds of rewriters: A visitor declared as COREWRITER can deal with cycles, but always creates copies, even if nothing changes.
A visitor declared as REWRITER cannot deal with cycles, but does cloning only if necessary. It is most convenient for transforming "term-like" data, and preserves sharing as far as possible.

Both kinds of rewriters are non-destructive: Whenever only a single value must be changed due to rewriting, a new copy of the containing object is created, altered and used for the further rewriting process.

The usage of both kinds of rewriters follows the pattern ...

  MyRW rw = new MyRW();
  rw.match(o);
  Object rewritten_object = rw.get_result();

For convenience this is the same as ...

  Object rewritten_object = (new MyRW()).rewrite(o);

...and there is also a typed variant ...

  A original ; 
  A rewritten_a = (new MyRW()).rewrite_typed(original);

In case of a (non-co-)rewriter there is a second constructor

  public RW (RW parent){..}

which takes an existing rewriter as its argument. This is made the "parent" rewriter, and all cache look-ups will be passed to this parent, iff they are unsuccessful in the local cache. So things like "nesting and inheritance of scopes" can easily be modelled.

When the generated (non-co-)rewriter is called directly, without any method being overridden by the user, it performs an identity transformation, i.e. it changes nothing.
But when the generated co-rewriter is called directly, without any overriding by the user, it creates a deep copy of its argument.

All generated rewriters contain two methods for every class definition C, namely action(C) and rewriteFields(C).
action() is called by the match() cascade, as defined for the general case and described above in Section 2.10.1. It (1) performs the preparatory steps of rewriting, and (2) is not specially concerned with the fields selected by the traversal order.

Both kinds of methods may be overwritten by the user. A certain contract must be kept, which is best explained by looking at the behaviour of the generated code.

For the (non-co-)rewriter, the interface to use consists of ...


   boolean lookUp(Object) // sets result/ismulti flag and returns true,
                          //   iff an entry exists in the rewriter's cache, 
                          //   or in that of its parent.
   void putToCache()      // memorize the currently set result (single or multiple)
   void useCache(boolean) // sets whether the generated "action()" may read the cache

   Object original ;      // must be readable and writeable by "action()"
   Object getResult();    // returns the most recently set result
   boolean isMulti();     // returns whether this is multiple (= a list)

   void revert();         // reset result to original
   void substitute(Object newresult);
                          // set newresult as result
   void substitute_multiple(List<Object> newresults);
                          // set newresults as multiple result
   void substitute_empty(); // set empty list as multiple result

In case of the (non-co-)rewriter, the generated method RW.action(C c) does the following:

  1. It looks up in the cache RW.cache whether the object o has already been processed by this rewriter instance (or by some rewriter in the parent chain). In this case the result of the earlier visiting process is drawn from the cache and stored as result (single or multiple), and the method returns immediately.
    This cache look-up can explicitly be disabled by calling useCache(false).
  2. If no result is retrieved from the cache, then the object o itself is stored into original and memorized as the (likely only intermediate!) value of result.
  3. A copy (i.e. a shallow clone) of o is created,
  4. and the method rewriteFields(Object) is called with that clone as an argument.
  5. When this method returns, whatever is currently the value of result is left there (for the caller of action(Object)) and saved to the cache as the rewriting result of the visited object o, by calling putToCache()

The method rewriteFields(C c) generated for every class C performs the non-generic, field structure specific rewriting. Its argument is the clone of the object. It first calls rewriteFields((D)c) for the superclass "D". Then it saves the current value of "result" into a local variable. It assumes that this points either to the original or to the clone, depending on whether changes to any field have happened in the super-class(es).

Then for all those fields which are selected by the chosen traversal order, match() is called on their contents.

Whatever this method returns in the variable result is compared with the original value contained in the field. Iff a change has happened, the result is stored into the field of the clone, and the overall local result of the method is overwritten to point to the clone.

Finally, after all fields from the corresponding traversal selection have been rewritten, the local result is copied to result, for communicating it to the caller (which may be a rewriteFields() of a sub-class or the action() method of the same class).

If a field value is an aggregate (i.e., is of a "container type"), (1) a temporary new aggregate object is constructed. Then (2) match() is called sequentially on the contained objects, and (3) result is step by step treated accordingly, i.e. stored into the temporary aggregate. Here also the occurrence of changes is monitored in a similar way as described for simple values. Whenever a change happens in an arbitrarily deep nesting level, then the local result is re-adjusted to point to the clone.
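
The clone-only-if-changed scheme for simple (non-aggregate) fields can be sketched by hand as follows. This is not the generated code: the data class Term, the rewriter RenameX and the simplified signatures are assumptions, the parent chain and multiple results are left out, and the cache is modelled by a java.util.IdentityHashMap.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hand-written sketch of the non-co-rewriter scheme; not umod output.
class RewriterSketch {
    static class Term {
        final String label;
        Term left, right;                       // null for leaves
        Term(String label, Term left, Term right) {
            this.label = label; this.left = left; this.right = right;
        }
        Term shallowCopy() { return new Term(label, left, right); }
    }

    static Term leaf(String label) { return new Term(label, null, null); }

    static class RW {
        private final Map<Term, Term> cache = new IdentityHashMap<>();
        protected Term original, result;

        public Term rewrite(Term t) { match(t); return result; }
        public void match(Term t) { action(t); }
        protected void substitute(Term newResult) { result = newResult; }

        public void action(Term t) {
            if (cache.containsKey(t)) {         // step 1: cache look-up
                original = t; result = cache.get(t); return;
            }
            original = t; result = t;           // step 2: intermediate result
            Term clone = t.shallowCopy();       // step 3: shallow clone
            rewriteFields(clone);               // step 4: field-wise rewriting
            original = t;                       // nested calls have overwritten it
            cache.put(t, result);               // step 5: memorize the result
        }

        protected void rewriteFields(Term clone) {
            Term local = result;                // still the original object here
            if (clone.left != null) {
                Term old = clone.left;
                match(old);
                if (result != old) { clone.left = result; local = clone; }
            }
            if (clone.right != null) {
                Term old = clone.right;
                match(old);
                if (result != old) { clone.right = result; local = clone; }
            }
            result = local;                     // original or clone
        }
    }

    // User-defined rewriter: replaces every leaf "x" by a fresh leaf "y".
    static class RenameX extends RW {
        @Override public void action(Term t) {
            if (t.left == null && t.right == null && t.label.equals("x")) {
                original = t;
                substitute(leaf("y"));
            } else {
                super.action(t);
            }
        }
    }
}
```

In this sketch, sub-terms which contain no "x" are returned as the very same objects, so sharing is preserved, while every object on the path to a replaced leaf is copied.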

Any user-defined, overwritten method may behave similarly, e.g. define the result of the rewriting by calling the methods from the interface above.

A user-defined, overwritten method may return more or less than one object by calling substitute_multiple(List<Object>) or substitute_empty(). This list of objects will be inserted in the nearest enclosing list or map structure. Up to this, multiplicity distributes!

E.g. having a structure and code like (in a symbolic notation!)

     A     a  SEQ (B * C)

     action (B b){ original = b ; substitute_multiple(new List(b1, b2)); }
     action (C c){ original = c ; substitute_multiple(new List(c1, c2, c3)); }

     a1 = { (B1,C2) }

...this will yield ...

     rewrite(a1) = { (b1, c1)(b1, c2)(b1, c3), (b2, c1)(b2, c2)(b2, c3) }
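
The distribution of multiplicity over a pair amounts to forming all combinations of the component results. A minimal sketch in plain Java, with a hypothetical helper distribute() and strings standing in for the rewritten objects:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiplicitySketch {
    // Combines every result for the first component with every result
    // for the second component; the order follows the first component.
    static List<String> distribute(List<String> bs, List<String> cs) {
        List<String> out = new ArrayList<>();
        for (String b : bs) {
            for (String c : cs) {
                out.add("(" + b + ", " + c + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distribute(Arrays.asList("b1", "b2"),
                                      Arrays.asList("c1", "c2", "c3")));
    }
}
```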

Please note that the level which calls "match" always needs the pointer value comparison original!=getResult() for determining whether a relevant change has happened. So it is part of the contract of action() to set both values before returning!

The operations of the co-rewriter are much simpler. The interface is

   boolean lookUp(Object) // sets the variable "result" and returns true,
                          //   iff an entry exists in the co-rewriter's cache, 

   void putToCache(origObj, newObj)
           // memorize newObj as the rewriting result of origObj
           // set the value of result=newObj
	   // This can ONLY BE CALLED ONCE for each key

   Object getResult()     // returns the most recently set result
   rewriteDone(Object key)// restores the "result" value to the clone of "key"
                          //   (this is called before returning from the 
                          //    rewriting method, mostly "action(Object)")

The generated action(o) method ...

  1. ...first creates clone=o.doClone().
  2. Then it enters the clone into the cache in advance. Only this enables the generated co-rewriter to deal with cyclic data.
  3. Finally it calls rewriteFields(clone).

The method rewriteFields(clone) calls match(clone.get_f()) on all selected fields, as in the non-co-rewriter case. But it needs not monitor whether changes occur, since all objects are copied anyhow.

When the user overrides the generated action(Object o) method, the user's code should ...

  1. look-up in the cache whether o has already been visited.
  2. enter a new rewriting result into the cache by calling putToCache(object, object) before descending into sub-fields, whenever there could be a ("cyclic") path in the sub-structure which leads back to the currently rewritten object! This method can only be called once per key and will throw an InvalidStateException iff the key is already contained in the map.
  3. Call match() on field contents and update the fields of the clone by setting them to the value of getResult()
  4. Immediately before returning, if any recursive descent has happened, rewriteDone(orig) must finally be called to restore the result variable to the clone which has been cached for orig.
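
The four steps can be illustrated by a hand-written co-rewriter for a small cyclic structure. This is only a sketch: the class Node, the typed signatures and the use of java.lang.IllegalStateException as a stand-in for the exception mentioned above are assumptions; the method names follow the interface listed before.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hand-written sketch of the co-rewriter scheme; not umod output.
class CoRewriterSketch {
    static class Node {
        final String name;
        Node next;                                  // may close a cycle
        Node(String name) { this.name = name; }
        Node doClone() { Node n = new Node(name); n.next = next; return n; }
    }

    static class CoRW {
        private final Map<Node, Node> cache = new IdentityHashMap<>();
        private Node result;

        boolean lookUp(Node o) {
            if (!cache.containsKey(o)) return false;
            result = cache.get(o);
            return true;
        }

        void putToCache(Node orig, Node copy) {
            if (cache.containsKey(orig)) throw new IllegalStateException("key already cached");
            cache.put(orig, copy);
            result = copy;
        }

        Node getResult() { return result; }
        void rewriteDone(Node key) { result = cache.get(key); }

        public void match(Node o) { if (o == null) result = null; else action(o); }

        public void action(Node o) {
            if (lookUp(o)) return;                  // 1. already visited?
            Node clone = o.doClone();
            putToCache(o, clone);                   // 2. cache BEFORE descending
            rewriteFields(clone);                   // 3. rewrite the fields of the clone
            rewriteDone(o);                         // 4. restore result to the clone
        }

        protected void rewriteFields(Node clone) {
            match(clone.next);
            clone.next = getResult();
        }

        public Node rewrite(Node o) { match(o); return getResult(); }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b");
        a.next = b; b.next = a;                     // cycle a -> b -> a
        Node copy = new CoRW().rewrite(a);
        System.out.println(copy.next.next == copy);
    }
}
```

Because the clone is entered into the cache before the descent starts, the second visit of "a" terminates with a cache hit, and the copy is again a cycle of two fresh nodes.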

^ToC 2.10.5 Rewriting of Aggregates

When rewriting the field contents of SET and SEQ type, the rewriting is done element-wise, from "left to right", in the case of SEQ. But of course a declarative, sequence independent style of coding is always more robust and better readable.

This is esp. true when rewriting MAP and REL type values. The Java libraries which realize the corresponding data structures have imperative behaviour: E.g. the map class has overwriting semantics: Whatever is put LAST determines the current value. Of course this can make programs very hard to understand.

Here our approach is more declarative, and independent of this sequential order. Consider the following diagram:

                    M
           --------------------->
         |                        |
         |  L                     | R
         V                        V
           =====================>
                    M'

Let "M" be the mapping which shall be rewritten. It can be a map or a multi-map.
Then rewriting is applied to the domain of M, yielding a new, auxiliary mapping "L", and then to the range of M, yielding "R". When we allow "substitute_multiple()", then one single element can be rewritten to more than one element, and we get multimaps for L and/or R. Otherwise we get maps.

As can easily be seen in the diagram, the result M' of rewriting M is defined to be
L-inv o M o R
(read from left to right: first L-inv, then M, then R).

If the declared type of M (and consequently M') is a multimap, this works in any case.

If the declared type of M (and M') is a map, then L-inv and R must be maps, i.e. L can be a multi-map, but must be injective, while R must be a map. If these conditions are violated by the user-defined rewriting rules (which define L and R) when applied to the current data M, then an exception is thrown. The result is independent of any sequential order of API calls.
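
The composition for the case of a declared map type can be sketched as follows. This is a simplified illustration and not the umod runtime code: the class MapRewrite, its method signature and the choice of IllegalStateException are assumptions. L is given as a multimap from old keys to sets of new keys, R as a plain map which is assumed to be defined on all values of M.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of the umod runtime.
class MapRewrite {
    // Computes M' = L-inv o M o R for a result of map (not multimap) type.
    // l maps each old key to the set of its new keys (may be a multimap);
    // r maps each old value to its single new value (assumed total on M's values).
    static <K, V> Map<K, V> rewrite(Map<K, V> m, Map<K, Set<K>> l, Map<V, V> r) {
        // Build L-inv and check that L is injective.
        Map<K, K> lInv = new HashMap<>();
        for (Map.Entry<K, Set<K>> e : l.entrySet()) {
            for (K newKey : e.getValue()) {
                K prev = lInv.put(newKey, e.getKey());
                if (prev != null && !prev.equals(e.getKey()))
                    throw new IllegalStateException("L is not injective at " + newKey);
            }
        }
        // Compose: new key -> old key -> old value -> new value.
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, K> e : lInv.entrySet()) {
            K oldKey = e.getValue();
            if (m.containsKey(oldKey))
                result.put(e.getKey(), r.get(m.get(oldKey)));
        }
        return result;
    }
}
```

Since the result is computed from L-inv, M and R as a whole, it does not depend on the order in which the single entries are processed; a violation of injectivity is reported instead of being silently resolved by overwriting.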

^ToC 2.10.6 Visiting and Rewriting "null"

"The invention of null was a billion dollar mistake".

Of course you should avoid using it whenever possible. One major achievement of umod is to eliminate illegal null references; legal null references must be declared explicitly by the type constructor "OPT", see Section 2.6.3.

null behaves in a very irregular way.

  1. It is a value which has a special type, which is sub-type of any other type.
  2. It can be cast into a certain, special type and thus used for controlling overloading resolution.
  3. But the "instanceof" test again shows a different behaviour.
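
These three observations can be checked in plain Java. The class and method names in the following small demonstration are chosen for illustration only and have nothing to do with generated umod code.

```java
// Illustration only; the names NullDemo, A, B and which() are hypothetical.
class NullDemo {
    static class B {}
    static class A extends B {}

    static String which(A a) { return "A"; }
    static String which(B b) { return "B"; }

    public static void main(String[] args) {
        // The cast selects the overloaded variant statically ...
        System.out.println(which((A) null));   // prints "A"
        System.out.println(which((B) null));   // prints "B"
        // ... but at runtime the value carries no type at all.
        A a = null;
        System.out.println(a instanceof A);    // prints "false"
        // Both "typed" nulls are one and the same runtime value.
        System.out.println((A) null == (B) null);
    }
}
```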

On the static, text level, there are always different "types" of null, so we could have supported match((A)null) and action((A)null). The latter could even make a static cast to the superclass, say "B", symbolically written as
  action((A)null){
    match((B)null);
  }

But of course you cannot descend to any field in the null case, so at least when action() starts doing so, the value null must be treated specially.

But, even worse, you cannot store all these different null values to a cache! They all are the same, when seen as a runtime value! (Of course you COULD introduce auxiliary wrappers which tag all these different nulls. But this would be a lot of work for a construct which, as demonstrated above, is better avoided anyhow!)

So we decided neither to visit nor to rewrite any null value! As soon as any visitor or rewriter finds a value (contained in a field or an aggregate) to be ==null, it does nothing and leaves the value unchanged.

This is not really a problem, because you can treat the (rare!) cases when field values may be ==null explicitly, one step earlier, when visiting the containing object itself! This is much more sensible also because at this point the context of the null value is still known. A (theoretically possible, but not implemented) visitation of a null value would require explicit passing of additional information, anyhow, to be of any worth.

^ToC 2.10.7 Diagnosis

All visitor-based processing code lies somewhere in the middle between "declarative" and "imperative" style of programming. Being determined by the selection of the overridden methods as well as by the processed data, the resulting flow of control can be quite surprising. Therefore interactive debugging is supported by the generated code.

Every visitor/rewriter has a field

    protected java.io.PrintStream _visitor_debug_stream = null;

Whenever this value is !=null, some intermediate steps
(currently only the replacing of the result by the clone and sub-changes in MAPs in the (non-co-)rewriting process)
are dumped to this PrintStream.

When the command line switch "--visitordebug" is set to true for code generation, more debugging code will be included, see Section 3.1.
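
Since the field is protected, it is most naturally set from the user's derived class. The following sketch only illustrates this pattern: VisitorStub is a hypothetical stand-in for a generated visitor, and the text it prints is invented; only the field declaration is taken from the description above.

```java
import java.io.PrintStream;

// Hypothetical stand-in for a generated visitor; only the field
// declaration corresponds to the documented generated code.
class VisitorStub {
    protected PrintStream _visitor_debug_stream = null;

    void match(Object o) {
        // Stand-in for the generated tracing: print only if a stream is set.
        if (_visitor_debug_stream != null)
            _visitor_debug_stream.println("match " + o);
    }
}

// User-defined visitor which switches the tracing on.
class TracingVisitor extends VisitorStub {
    TracingVisitor(PrintStream debug) {
        _visitor_debug_stream = debug;
    }
}
```

A call like new TracingVisitor(System.err).match(obj) would then send the trace to the standard error stream.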

^ToC 2.10.8 Optimization

By setting the command line switch "--visitoroptimize" to true (see Section 3.1), all generated visitor and rewriter code will incorporate the following optimization tactics:

  1. When compiling the model, umod makes an "SCC" analysis of all model classes w.r.t. the "associations" (in UML speak) defined by the types of the field definitions,
  2. and then analyses which SCCs are reachable by every distinct field.
  3. The results of this analysis are encoded into some static final data, and thus available at runtime.
  4. Whenever (at runtime) the code of a user defined class, derived from a certain generated visitor/rewriter, is loaded, this code is inspected for the set of classes for which an override of any method exists.
    (This analysis is performed on the binary code, at class loading time, but this is only for technical reasons. The semantically identical results could be drawn out of the source text.)
  5. From the "overridden classes" we can abstract to "overridden SCCs",
  6. and finally can conclude which fields ("associations") never need to be followed, because they only lead to one or more SCCs for which no user-defined visitor/rewriter method exists.

This optimization should make sense with data models which decompose into disjoint spheres with only few connections, for all those visitors/rewriters which do not process all of these spheres. These results were presented in [lt11a] at ICMT 2011; for further info please refer to the paper or to the slides of the talk.
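
The effect of the analysis can be sketched in a few lines. The following is a deliberately simplified illustration and not the algorithm implemented in umod: it computes plain reachability instead of first condensing the classes into SCCs, it ignores inheritance, and the class FieldPruning with all its members is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT the umod implementation.
class FieldPruning {
    // A field "name" declared in class "owner" whose type refers to class "target".
    static final class Field {
        final String owner, name, target;
        Field(String owner, String name, String target) {
            this.owner = owner; this.name = name; this.target = target;
        }
    }

    // Returns "owner.name" for every field which must be followed,
    // i.e. through which some class with a user-defined override is reachable.
    static Set<String> fieldsToFollow(List<Field> fields, Set<String> overridden) {
        Set<String> result = new HashSet<>();
        for (Field f : fields)
            if (reaches(f.target, fields, overridden))
                result.add(f.owner + "." + f.name);
        return result;
    }

    // Work-list search over the "association" graph; safe for cyclic models.
    static boolean reaches(String start, List<Field> fields, Set<String> overridden) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (!seen.add(c)) continue;          // already inspected
            if (overridden.contains(c)) return true;
            for (Field f : fields)
                if (f.owner.equals(c)) todo.push(f.target);
        }
        return false;
    }
}
```

For a model with the fields A.b of type B, B.c of type C, A.d of type D and D.e of type D, and with a user-defined method for class C only, the fields A.d and D.e never need to be followed.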

^ToC 2.11 Visualization

Visualization of a data model (or parts thereof) is supported by different means.

^ToC 2.11.1 User-Defined Visualization by the Modifier TOSTRING

   A 
   | B1
     b1 B1 
     b2 char  
   | | C1
       c1 MAP string TO C2
       c2 int 
       TOSTRING JAVA return b1.toString().substring(2)+">>"+c1 ; $$
       FORMAT "c2 '==>' b1 ; c1"

The TOSTRING directive can appear in a class definition at any position like a field definition. It is followed by a fragment of java source text enclosed in "JAVA...$$", like all other verbatim java.
This source text fragment must consist of a statement or a sequence of statements (in the java sense).
It will be encapsulated verbatim in the method declaration public String toString(){ ... }, so it has to end with a type-correct return statement.

W.r.t error reporting, the same rules apply as with other verbatim java source text, cf. Section 2.4.3 above.

^ToC 2.11.2 User-Defined Visualization by the Modifier FORMAT

The FORMAT directive can appear in a class definition at any position like a field definition. The FORMAT keyword is followed by a string constant which contains a format description. This is compiled into a method of a dedicated visitor, which constructs a Format object for visualizing an instance of this class.

The syntax of the format directives is an instance of the generic syntax for format front-ends.

The Format object which represents an object a of class A of a model class M can be generated by calling ...

 
  class A {
    public Format format() {...}
  }

...which is a wrapper for ...

  class M {
    public static Format toFormat (Object o) {...}
  }
--or--
  class M {
    class Formatter {
      public static Format process (Object o) {...}
    }
  }

...which is a wrapper for (the protected method!) ...

  new M.__Formatter().toFormat(Object o)

Further there is a static function with a mode parameter. It first sets a global variable in the Formatter object, which normally defaults to 0 (zero) and which is used in the $switch $mode{..} expressions as described in the format front-end documentation.

  class M {
    public static Format toFormat (Object o, int mode) {...}
  }

Whenever a format directive needs to embed a format for an object of a model class for which no FORMAT directive is given, a call is compiled to ...

  protected Format M.__Formatter.defaultformat(Object o){
    result = Format.literal(String.valueOf(o)) ;
  }

Whenever a format directive needs to embed a format for an object which is not part of the umod model (i.e. whenever a field with a type defined by an EXT import appears in the format directive), a call is compiled to ...

  protected  Format M.__Formatter.foreignObject(Object o)

This method tests whether o implements <METATOOLS>/format/Formattable; in this case format() is called, otherwise Format.literal(String.valueOf(o)) is used.

Both of these methods can be overridden by deriving a new formatting visitor from M.__Formatter.

Whenever the user wants to change the values of nulltext, mode, format_empty and/or default_indent, he/she cannot use the above-mentioned wrappers, but (1) has to create an explicit instance anyhow (e.g. by formatter = new M.__Formatter()), (2) assign to these public fields, and (3) create the format by calling format = formatter.process(o) explicitly.

^ToC 2.11.3 Automated Swing Tree Generation

If activated by a command line switch (see Section 3.1), code is generated for a java swing tree representation.

For a certain model definition Mymod and a reference to an object myobj, a swing tree is generated by calling

 JTree tree = new JTree(new Mymod.__TREEGEN__().growRoot(myobj)) ;

For convenience, there is a runtime class umod.runtime.SwingBrowser offering the static method model2swingpanel(...), which opens a top-level window with scroll bars etc. and all you need for browsing a umod model via a GUI.

^ToC 2.11.4 User-Directed Visitor-Based Dump Routines

Esp. for debugging purposes, a dedicated visitor can be generated which is declared "IS PRINTER", cf. the syntax of visitor declarations in Section 2.10.

The constructor of each such visitor takes a java.io.PrintStream as its only argument. Whenever the match() method of this visitor is called for a certain object, this object is printed to this stream as follows:

  1. first a sequence of characters like "| | |...", indicating the match call's nesting level,
  2. then a simple toString() representation of the object,
  3. then the name/value pairs of all fields which are not marked for descending (using toString()),
  4. followed by the output caused by (1) recursively matching all fields marked for descending (by the traversal order selected for this visitor), after (2) the above-mentioned nesting level has been increased.

When a field has to be printed which is of aggregate type, then (1) a new line is opened for every item in the aggregate, and (2) the name of this field and the current index position is printed in "[....]" before the call to match() of the current value.

Please note that this is currently still a primitive implementation, and cannot deal with cyclic data!

^ToC 2.12 XML Encoded Serialization/Deserialization

The basic umod xml-serialization is based on rules which try to combine simplicity, readability and non-redundancy.

  1. all primitive types are encapsulated into their type name used as a tag.
  2. all sequences, sets, maps, multimaps simply serialize their content (in canonical order), and encapsulate this into one and the same generic "aggregate" tag
  3. all fields are tagged with the field name
  4. the type-driven tags as listed above (=primitives and generic aggregates), are omitted when directly under a field tag.
  5. all objects are tagged with the class name
  6. on top-level of field contents, null values MAY be read, but are not written. Instead, fields with a null value are simply omitted.
  7. empty structures on top-level, which are not optional, are also omitted. (this corresponds to the default rules when constructing an object: only in case of OPT types there is a difference between null and an empty aggregate!)
  8. the "right-not-left" case of a co-pair is wrapped into a dedicated element.

Additionally, there is a special empty reference element which realizes (by an "idref" attribute) a pointer to some object defined at some other place (i.e. earlier when writing, or earlier or later when reading). The code generated by umod starts every serialization of objects with one single root object and descends in a depth-first discipline. Therefore back-patching is never required in this case, but the de-serialization code does support it.

The first reference to an object always leads to an "in-place" dump of the complete object structure, as defined above. This is fine in case of objects which are only referred to once. It is esp. fine for human readers in case of "algebraic" objects, which do not have an "identity" beyond their structure. In this case, the usage of "id", "idref" and "reference" would only serve as a kind of shorthand notation. This is different with non-algebraic objects, where identity (and "self-identity" and "non-identity" !-) does carry more semantics than the collection of field values!
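
The depth-first discipline with an in-place dump at the first occurrence and a mere reference at every later one can be sketched as follows. This is not the generated SAX_Writer: the classes Item and RefWriter, the element layout and the form of the identifiers are assumptions, the real tag and attribute names are taken from the XMLconfiguration object, and no escaping of the label text is done here.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a non-algebraic model class.
class Item {
    final String label;
    final List<Item> parts = new ArrayList<>();
    Item(String label) { this.label = label; }
}

// Hypothetical writer, NOT the code generated by umod.
class RefWriter {
    // Identity-based table: object -> number used in its "id" attribute.
    private final Map<Item, Integer> ids = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    String write(Item root) {
        emit(root);
        return out.toString();
    }

    private void emit(Item it) {
        Integer known = ids.get(it);
        if (known != null) {
            // Already dumped: write only the empty reference element.
            out.append("<reference idref=\"o").append(known).append("\"/>");
            return;
        }
        int id = ids.size();
        ids.put(it, id);                 // register before descending
        out.append("<Item id=\"o").append(id)
           .append("\" label=\"").append(it.label).append("\">");
        for (Item p : it.parts) emit(p); // depth-first descent
        out.append("</Item>");
    }
}
```

Because every object is registered before its parts are written, a reference can only point to an element which has been opened earlier in the output, which is why no back-patching is needed on writing.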

The writing out of an instance of model "M" is started by some code like

   final java.io.PrintWriter p0 = new java.io.PrintWriter(outstream);
   final eu.bandm.tools.util.ContentPrinter cp = new eu.bandm.tools.util.ContentPrinter(p0);
   final eu.bandm.umod.runtime.XMLconfiguration conf = new eu.bandm.umod.runtime.XMLconfiguration();
   final M.SAX_Writer dumper = new M.SAX_Writer(cp, conf);
   dumper.match(myTopLevelObjectForWriting);

The concrete tag strings and attribute names to use are configurable and are initialized via the XMLconfiguration object.

When writing, errors should not occur, only failures. These are reported via thrown exceptions.

The reading works as follows:

  FIXME MISSING 

On reading, of course, errors can occur, esp. when the external representation is not "valid" w.r.t. the implicit syntax rules of the DTD, which reflect the object structure of the umod model.

^ToC 3 Using the umod Tool

^ToC 3.1 Command Line Options

The options for the current implementation of the umod tool are as follows:

( definitions from file ../../src/eu/bandm/tools/umod/umodOptions.xml )

Model substantial information
-0 --sourceroot uri
  file system directory which is the root of the source code hierarchy
-1 --packagename string
  name of the package of the generated model class
-2 --sourcefiles ( uri) +
  source files containing the model declaration
-m --monolithic bool(=true)
  generate a single class instead of a package
-p --targetclasspath ( uri) *
  classpath to search for existing class files, if different from current classpath. NOT YET ACTIVE!!
Code generation modification
--floatNotDouble bool(=false)
  implement the umod type "float" not by doubles but by floats.
-G --getterfunctions bool(=false)
  whether to generate getter function objects DEPRECATED
-S --setterfunctions bool(=false)
  whether to generate setter function objects DEPRECATED
-x --xmlgen bool(=true)
  whether to generate SAX de-/serialization
-s --swingtree bool(=false)
  whether to generate a swing tree visualization
--patterns
  Generate methods for lifting and applying "Paisley" patterns to model elements and their fields
--constructorsPre20181214
  Use old constructor code generation scheme, prior to 20181214 redefinition.
-V --visitordebug bool(=false)
  whether to add logging code to the generated visitors
-v --visitoroptimize bool(=false)
  whether to cut superfluous descends
--visitorcompile bool(=false)
  whether to cut superfluous descends by code generation
--rewriterwarnings
  whether all warnings from rewriter code shall be printed, even when the code generation algorithm takes the responsibility.
-w --linewidth int(=80)
  line width for the formatting of the generated source code

Attention: --setterfunctions and --getterfunctions are deprecated.

^ToC 3.2 Splitting the input text into input files

Currently more than one input file can be supplied. All these files must follow the same syntax, as described above, and use the same module name. All contained declarations will be processed as if they were contained in one single source file. So this allows the separation e.g. of documentation and declaration, or of different trees of the forest. But it is only a provisional means, until real modularization and parametrization will be introduced.

^ToC 3.3 Error Messages

The basic philosophy is to delegate most error messages to the subsequent step, i.e. the execution of a Java compiler. Some problems cannot be detected without detailed analysis, and we do not want to re-implement things done by the Java compiler anyhow.
This implies that the error messages generated there must be "calculated backwards" to find their cause in the umod source.
Nevertheless basic errors and warnings will be generated by the umod tool on its own.

(More to come)



1 For example: the "ORDERED" keyword initiates the generation of a "compareTo(..)" Java method. The generated code first compares the constructors, then steps through the field values. In case of object references, simply recursive calls to "compareTo(..)" will be generated. umod does not guarantee that such a method does exist. This is because this test is quite different when the type of the field is a reference to (a) a class of the same model, (b) an external class given in binary form, or (c) an external class given in source text form. So we deliberately leave the test to the further processing.

2 Please note that this implicit constructor may lead to overloading conflicts between constructor type signatures, e.g. in the case
TOPLEVEL CLASS
A
f1 int ! C 0/0 ;
f2 int
f3 int = 3 ! C 0/1 ;
See Section 2.8.1.

3 In an older implementation implicit supply of minimal constructors had been restricted to 0ary constructors. A source text which failed to define at least one constructor covering all obligate fields (here and of all superclasses) had been rejected. This behaviour is still available by the command line switch --constructorsPre20181214 t







made 2024-08-30_17h50 by lepper on happy-ubuntu

produced with eu.bandm.metatools.d2d and XSLT