#d2d 2.0 text using mtdocpage : webpage
#title #umod --- an Automated Model Generator
// #htmlTitle bandm metatools umod, the Micro Model Generator
#lang en
#tableOfContents
#h1 #title Purpose and Way of Operation
#p #umod compiles a data model from a high-level definition language into java code. The input language is designed for utmost compactness. Although proprietary, it closely follows mathematical notation.
#p Java is a rather verbose language, so defining large data models in it is a tedious, monotonous and error-prone task which requires lots of redundant typing. #umod automates this typing: from a compact, non-redundant notation it generates the large amount of code which is needed to realize #ldots
#list
#i the data type definitions,
#i methods for creating, modifying and inquiring the model,
#i different kinds of visitors,
#i methods for visualization,
#i methods for a "soap"-like (de-)serialization #cite soap.
#/list
Some of these features can be enabled separately, according to the user's needs.
#p#kind missing instead of "tsoap", refer to mathml etc. ??
#p In contrast to other similar projects, e.g. pizza, #umod is not realized as a pre-processor for arbitrarily arranged java source files. Instead, it is a compiler which generates one (1) model from one (1) single, central source file. (This is true conceptually; for the technical details see #ref txt_splitInput.)
#p This approach has both advantages and disadvantages. For example, you cannot use the convenient #umod notation for complicated nested typed collection classes #ital ad hoc#/ in arbitrary source files, but only in such a central model definition file.
#p As an advantage, you have one central, compact definition of the main model of a project --- mostly not much more than one single screen page! This turned out to be really helpful when developing further code or writing documentation.
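#p To give an impression of the typing that #umod saves, the following hand-written Java sketch shows a small part of what the Java side of a class with a mandatory field "#src!f int!" and an optional field "#src!g OPT string!" has to contain. It is an illustration under assumptions, not actual #umod output: the class and field names are hypothetical, and only the #src!get_!/#src!set_! naming and the boolean "change flag" returned by setters follow conventions described later in this document. Real generated code additionally contains visitor, serialization and visualization support.
#commentchar\
```java
import java.util.Objects;

// Hypothetical, hand-written approximation of what a umod class
//   A
//     f int
//     g OPT string
// needs on the Java side; real umod output contains much more.
class A {
    private int f;     // "f int": primitive, never null
    private String g;  // "g OPT string": optional, may be null (default)

    A(int f) {         // f has no default, so it must be a constructor argument
        this.f = f;
    }

    int get_f() {
        return f;
    }

    // Stores the new value and returns true iff the value changed.
    boolean set_f(int f) {
        boolean changed = this.f != f;
        this.f = f;
        return changed;
    }

    String get_g() {
        return g;
    }

    // Same contract as set_f; null is a legal value because g is OPT.
    boolean set_g(String g) {
        boolean changed = !Objects.equals(this.g, g);
        this.g = g;
        return changed;
    }
}
```
#commentchar/
#p Multiplied by dozens of classes and fields, plus equality, visitors and serialization, this is the boilerplate that a few lines of #umod source replace.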
#p Please note:#nl 1) Some consistency conditions are easily checked by the java compiler, but only at much higher cost by the #umod tool. Therefore in most such cases the corresponding checks are left to the former, and error messages will come from the attempt to compile the generated source text, not from the #umod generation process. This requires some experience and "type checking" skills from the user (but still much less than the C++/STL error messages !-)
#footnote For example: the "#src!ORDERED!" keyword initiates the generation of a "#src!compareTo(..)!" #java method.
// FIXME DOC MISSING
The generated code first compares the constructors, then steps through the field values. For object references, plain recursive calls to "#src!compareTo(..)!" are generated. #umod does #emph!not guarantee! that such a method exists, because this test is quite different depending on whether the type of the field is a reference to (a) a class of the same model, (b) an external class given in binary form, or (c) an external class given in source text form. So we deliberately leave the test to the further processing. #/footnote
#p 2) The generated code can always be by-passed by inserting #ital!verbatim! java code. So the data models generated by #umod are largely, but not totally, fool-proof. Using #umod does not relieve the programmer of responsibility and care.
// ------------------------------------------------------------------
#h1 #title Input Notation Syntax and Generated #java Classes
#p The input syntax for #umod definition files is somewhat graphic-oriented, for the sake of maximal compactness and clear arrangement.
#nl The #umod definition file is also intended to serve as #emph!documentation!, esp. for the programmer during her/his coding work.
#p#kind src The syntax of #umod input is defined in #link 3/eu/bandm/tools/umod/parser/umod.g #text this grammar file#/.
// ------------------------------------------------------------------
#h2 #title Model Declaration
#p For the basic data types, i.e. the elements which make up the data model, #umod supports two flavours of definitions:
#list
#i either "class" definitions, using a graphic-oriented input format and supporting inheritance/specialization,
#i or "type" definitions, denoted by a pure term notation and not supporting specialization. ((THE "TYPE" CONSTRUCT IS CURRENTLY NOT YET WELL SUPPORTED!))
#/list
#p Additionally, there can be
#list
#i definitions of visitors,
#i definitions of simple interfaces,
#i definitions of simple enumeration types,
#i import declarations of external, predefined classes,
#i and documentation text for most of these constructs.
#/list
After their declaration, most of these entities are referred to by an #nonterminal identifier.
#cfRule identifier ::= identifier_lower | identifier_upper ;
#cfRule identifier_lower ::= lowerCaseAlpha (lowerCaseAlpha | upperCaseAlpha | digit | "_" )* ;
#cfRule identifier_upper ::= upperCaseAlpha (lowerCaseAlpha | upperCaseAlpha | digit | "_" )* ;
#p There is #emph!only one single name space! for the identifiers of all these different categories, and using the same identifier twice yields an error.
#p All #umod identifiers must start with a Latin letter, which may be followed by letters, Arabic digits and the underscore "#src!_!". Consequently, all generated Java objects with an identifier #emph!starting! with an underscore do not correspond to a #umod definition, but are additional and ancillary.
#nl In #umod the case of the initial letter is significant: class, enumeration and visitor names must start with an upper-case letter, field names must start with a lower-case letter. External declarations, enumeration items and types can start with either.
#p The top-level syntax of each #umod source file contains the declaration of the model name, like #ldots
#source
MODEL myModel =
  ..
  ..
END MODEL
#/source
#p The underlying grammar can be described as #ldots
#cfRule umodFile ::= "MODEL" identifier_upper "=" (docEntry)? modelLines "END" "MODEL" ;
#cfRule modelLines ::= (visitorDeclaration)* (importDeclaration)* \nl (typeDef | enumDef | toplevelClassDef | extendClassDef | interfaceDef )+ ;
#p The #umod compiler then generates source files belonging to one #emph!package! (in the java sense). The name of the package must be given as a command line parameter when calling the tool, cf. #ref txtcommandlineoptions below.
#p For each class definition and type definition the #umod compiler generates one single java class.
#nl Additionally, it generates the sources of further java classes, realizing the different purposes mentioned above and explained in detail further below, e.g. visitors, serializers, visualizations, etc.
#p The #umod compiler can work either in #emph!package mode! or in #emph!monolithic mode!. This mode is also selected by a command line switch.
#p In package mode all generated classes are contained in the named package, and each class is contained in its own source file as a top-level #src!public! class. Additionally, a #emph!model class! with the name of the model is created, which only contains some central entry points and attributes valid for the model as a whole.
#p In monolithic mode, this model class is created in the same way, but all other generated classes are realized as #emph!static nested classes! (in the java sense) of that one model class.
#p #xemph!Attention:! Type definitions "should work", but have not yet been tested thoroughly and are not covered in the following documentation. So far, our extensive applications of #umod have not made them appear really necessary.
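#p The identifier rules given above translate directly into regular expressions. The following Java sketch is not part of #umod: the class name #src!UmodIdentifiers! is hypothetical, it assumes that "letter" means an ASCII letter (as stated for import declarations below), and it does not reject keywords such as #src!CLASS!, which the #umod parser reports as syntax errors.
#commentchar\
```java
import java.util.regex.Pattern;

// Hypothetical checker for the umod identifier rules (ASCII letters assumed).
// Keywords such as TOPLEVEL or CLASS are NOT rejected here.
final class UmodIdentifiers {
    // identifier_lower ::= lowerCaseAlpha (lowerCaseAlpha | upperCaseAlpha | digit | "_")*
    private static final Pattern LOWER = Pattern.compile("[a-z][A-Za-z0-9_]*");
    // identifier_upper ::= upperCaseAlpha (lowerCaseAlpha | upperCaseAlpha | digit | "_")*
    private static final Pattern UPPER = Pattern.compile("[A-Z][A-Za-z0-9_]*");

    private UmodIdentifiers() {}

    // Class, enumeration and visitor names.
    static boolean isUpper(String s) {
        return UPPER.matcher(s).matches();
    }

    // Field names.
    static boolean isLower(String s) {
        return LOWER.matcher(s).matches();
    }

    // External declarations and enumeration items: either case.
    static boolean isIdentifier(String s) {
        return isUpper(s) || isLower(s);
    }
}
```
#commentchar/
#p Any name starting with an underscore fails both checks, which is consistent with the rule that such names are ancillary and never stem from a #umod definition.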
// ------------------------------------------------------------------
#h2 #title Import Declaration
#p A #umod model can refer to any existing java class for defining field types and data types, and for declaring it as the superclass of a model class or as an interface implemented by one.
#p This is done by an import declaration:
#cfRule importDeclaration ::= "EXT" ("SOURCE")? identifier "=" pathToClass ;
#cfRule pathToClass ::= identifier ("." identifier)* ;
#p
//The identifier must begin with an upper case ascii character.#nl
The identifier must begin with an ASCII letter, either upper or lower case, and can contain ASCII letters, digits and the underscore "#src!_!".#nl
#p The pathToClass is the fully qualified class name (in the java sense). In the rest of the data model declaration this identifier will be used to refer to the specified external class, e.g. when declaring superclasses, interfaces or field types.
#p If the modifier #src!SOURCE! is #emph!not! given, then the class must be reachable and is loaded by the #umod compiler.
#p #umod uses a second, #emph dedicated class loader#/ for this purpose. If the class loading context in which #umod itself is running differs from the future context of the generated classes, then this class loader can be parametrized with a command line switch, cf. #ref txtcommandlineoptions.
#p The modifier #src!SOURCE! must always be added whenever the external class cannot be loaded at all during the run of #umod . In this case the #umod compiler will insert references to the class without testing whether it is reachable. The user has to ensure that the java compiler will later reach either the source or the class file.
// ------------------------------------------------------------------
#h2 #title Enumerations
#p As auxiliary classes, e.g. for field values, #umod supports the easy definition of simple enumeration types.
#cfRule enumDef ::= "ENUM" identifier_upper (docEntry)? "=" enumItem ("," enumItem)* ;
#cfRule enumItem ::= identifier (docEntry)? ;
#p The enumeration identifiers can be used in the type language for fields, as described in #ref txt_typeExprs.
// ------------------------------------------------------------------
#h2 #title Class Hierarchy
#p The syntax of class definitions is somewhat "graphic oriented", for the sake of compactness. The fragment of input text #ldots
#source
TOPLEVEL CLASS
A
| B1
| | C1
| | C2
| B2
#/source
#p #ldots defines "graphically" a generated hierarchy of classes, namely #src!A! extending #src!java.lang.Object!, #src!B1! and #src!B2! extending #src!A!, and #src!C1! and #src!C2! extending #src!B1!.
#nl In contrast to #java, class names #emph!must! begin with an upper-case letter. Character sequences which are used by the #umod front-end syntax as keywords, as described in this document (e.g. "#src!TOPLEVEL!", "#src!CLASS!", "#src!JAVA!", #ldots) are rejected as identifiers. This rejection is (currently) #emph!implicit! in the parsing process, and reported as a syntax error.
#p Please note that, in spite of the graphic appearance, the syntax definition and the implemented parser of #umod do #emph!not! treat line-breaks as significant.
#nl The same example could have been written as
#source
A | B1 | | C1 | | C2 | B2
#/source
#p For the sake of clarity the definitions can be broken into sub-graphs, which are automatically combined, like:
#source
TOPLEVEL CLASS
A
| B1
| B2
EXTEND CLASS
B1
| C1
| C2
#/source
#p The appearance of each class definition in the #emph!first! block should leave out most details, so that this top-level class tree can serve as documentation of the basic skeleton of a model. More details may then be added later, in one or more "#src!EXTEND CLASS!" blocks.
#p Every field definition (see #ref txtfielddefs) can be placed individually at either appearance of "B1". The same holds for every modifying attribute (see #ref abstractalgebraic).
Please note that an extensive usage of this feature will result in #emph!less! instead of more clarity.
#p The syntax of class definitions can be described more formally as #ldots
#cfRule toplevelClassDef ::= "TOPLEVEL" "CLASS" classDef ;
#cfRule extendClassDef ::= "EXTEND" "CLASS" classDef ;
#cfRule classDef ::= identifier_upper classModifiers (docEntry)? \nl (fieldDef | fieldDoc | superField)* subClassesDef ;
#cfRule subClassesDef ::= ("|")+ classDef ;
// ------------------------------------------------------------------
#h3 #title Extending and Implementing External Classes
// #src!EXTENDS! and #sr.IMPLEMENTS .
#label abstractalgebraic
#p Following the class name, different modifying attributes can appear, defined by #ldots
#cfRule classModifiers ::= "ABSTRACT"? "ALGEBRAIC"? \nl ("EXTENDS" identifier)? ("IMPLEMENTS" identifier+)? ;
#p Their usage is explained by the following example:
#source
TOPLEVEL CLASS
A EXTENDS SomeExternalClass
| B1 IMPLEMENTS Interface0, Interface1
#/source
#p Only #emph!toplevel! class definitions can be given an explicit superclass by the #src!EXTENDS! keyword. If none is given, then a toplevel class extends #src!java.lang.Object!.#nl Only #emph!external! classes can be used for such a superclass declaration.
#p Every class can be given a list of interfaces it implements by the #src!IMPLEMENTS! keyword.#nl Only #emph!external! classes can be used for interface declarations.
// ------------------------------------------------------------------
#h3 #title Declaring Classes as #src!ABSTRACT! and #src!ALGEBRAIC!
#source
TOPLEVEL CLASS
A ALGEBRAIC
| B1
#/source
#p Every #emph!toplevel! class can be declared #src!ALGEBRAIC!. This enforces "algebraic semantics" on the equality relation. Consequently, the generated code includes an #src!equals()! method which defines equality by comparing all field contents, regardless of object (pointer) identity. A #src!hashCode()! method is constructed accordingly.
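#p For illustration, here is a hand-written Java sketch of the kind of #src!equals()!/#src!hashCode()! pair that algebraic semantics call for, for a hypothetical class #src!Node! with a #src!string! field and a #src!SET string! field. It is not actual #umod output; in particular, whether the generated code compares the concrete classes via #src!getClass()!, as assumed here, is not stated in this document.
#commentchar\
```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of algebraic equality for a model class
//   Node ALGEBRAIC
//     label string
//     tags  SET string
// Not actual umod output.
class Node {
    private final String label;                       // strict: never null
    private final Set<String> tags = new HashSet<>(); // default: empty aggregate

    Node(String label) {
        this.label = Objects.requireNonNull(label);
    }

    String get_label() { return label; }

    Set<String> get_tags() { return tags; }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // Assumption: objects of different concrete classes are never equal.
        if (o == null || o.getClass() != getClass()) {
            return false;
        }
        Node other = (Node) o;
        // Field-by-field comparison; the set relies on its own equals().
        return label.equals(other.label) && tags.equals(other.tags);
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals(), so equal objects hash equally.
        return Objects.hash(label, tags);
    }
}
```
#commentchar/
#p Two separately constructed objects with the same field contents are then equal and have the same hash code, although they are distinct objects.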
#p The algebraic property is automatically passed on to all sub-classes of the class it appears with.
#p Currently it is #emph not defined#/ what an #src!ALGEBRAIC! modifier does when applied to a non-toplevel class, and the authors can #emph!not! imagine what this #emph!should! mean !-)
#p Currently it is #emph not defined#/ what happens when a #src!float! field appears in an algebraic data type, because floats do #emph!not! have a precise and canonical notion of "identity". Some "epsilon" value would be needed to implement algebraic identity, and we do not yet know where to get it from.
#p#kind missing FIXME algebraic identity for floats missing
#p Please note that the algebraic semantics of class definitions which have fields of compound types (sequences, sets, maps, etc.) rely on the correct implementation of the "#src!equals()!" method in the corresponding runtime libraries. For those employed by default in the automatically generated code, this is guaranteed.
#p Even more important: the objects which realize the values of these fields, i.e. the employed #bold!collection objects are still modifiable!! This is of course not optimal, since in-place updates are forbidden and would better be prevented by the generated code. The alternative would have been to copy these container objects into unmodifiable variants by default. But since this must be done fully recursively, it can turn out to be very expensive, so we decided that the programmer remains responsible for ensuring that all collection objects which are referred to from any algebraic umod value indeed stay unaltered!
#p Every class can be declared #src!ABSTRACT!. This (a) is translated into an "#src!abstract!" declaration for the java compiler, and (b) suppresses the generation of some parts of the code for this class, e.g. constructors.#nl The #src!ABSTRACT! attribute applies only to the class level on which it explicitly appears.
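#p The risk of modifiable collection objects inside algebraic values, described above, can be made concrete. In the following hypothetical Java sketch (the classes #src!Route! and #src!MutationHazard! are illustrative and not part of #umod or its runtime), a value with algebraic equality is put into a #src!java.util.HashSet!; modifying its list in place changes its hash code, so the set no longer finds it, although it is still contained.
#commentchar\
```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical algebraic value with a collection-valued field (not umod output).
final class Route {
    private final List<String> steps = new ArrayList<>();

    List<String> get_steps() { return steps; } // hands out the modifiable list

    @Override
    public boolean equals(Object o) {
        return o instanceof Route && steps.equals(((Route) o).steps);
    }

    @Override
    public int hashCode() {
        return steps.hashCode();
    }
}

final class MutationHazard {
    // Stores r in a HashSet, then modifies its list in place and
    // reports whether the set still finds r.
    static boolean stillFound() {
        Route r = new Route();
        r.get_steps().add("a");
        Set<Route> set = new HashSet<>();
        set.add(r);
        r.get_steps().add("b"); // in-place update: the hash code changes
        return set.contains(r); // looked up under the new hash code
    }
}
```
#commentchar/
#p Here #src!MutationHazard.stillFound()! returns #src!false!, which is exactly the kind of silent inconsistency the programmer must avoid by never updating such collections in place.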
// ------------------------------------------------------------------
#h3 #title #ital!Verbatim! Java Source Text in Class Definitions
#label txtverbatimjava
#p You can insert free java source text into each class definition. This text will be inserted "#ital!verbatim!" into the generated java class. It is subject to syntax check and re-formatting. This is implemented by calling the #link 2/eu/bandm/tools/metajava/GeneratedClass.html #loc addDeclarations(java.lang.String) #text #src!GeneratedClass.addDeclarations()!#/link method from the #link metajava.html #text metajava model#/.#nl It is not subject to context check or type check. Therefore some kinds of errors will only be reported by the subsequent attempt to run the java compiler.
#source
TOPLEVEL CLASS
A
| B1
    JAVA public String myfunction(int i){ return ""+this+i;} $$
| B2
#/source
#p Alternatively, you can insert free java source text into the java source generated for the top-level, model representation class:
#source TOPLEVEL CLASS A | B1 < get_f = new ops.Function{ public T apply(C c){return c.f;}} ; ... } #/source
#p If the command line switch (see #ref txtcommandlineoptions) "#src!--setterfunctions!" is set to true, there will additionally be #ldots
#source class C { ... public static final C opx.Consumer set_f = new ops.Consumer{ public C consume (T arg, C state){ state.set_f(arg); return state; }} ; ... } #/source
#p Both these objects are very convenient for the elegant way of programming offered by #link ops.html #text metatools' "#src!ops!" package#/.
//#link 2/eu/bandm/tools/ops/package-summary.html
//#text metatools' "#src!ops!" package#/.
// ------------------------------------------------------------------
#h3 #title Abstract Fields, Generalized Setter and Getter Methods
#p By inserting an #nonterminal abstrfield construct into a #nonterminal fieldDef, a "virtual" field can be declared:
#cfRule abstrfield ::= "ABSTRACT" ("GETTER" | "SETTER")? ;
#p The #src!ABSTRACT!
keyword alone causes the generation of both a getter and a setter method. No field is generated, but these methods rely on the content of a field defined on a subclass level. With the keywords #src!GETTER! and #src!SETTER! you can restrict the generation to the respective method.
#nl The definitions of the concrete fields may employ a more specific type, iff the type is a model element class and the more specific type a sub-class thereof.
#nl The concrete fields do not need to be present in every branch of the sub-class tree.
#p The different cases are as follows:
#source
B
| B1
| B2
A
    f ABSTRACT B
    g ABSTRACT OPT B
| A1
    f B1
    g B1
| A2
    f B2
    g OPT B2
| A3
#/source
#p This code will lead to getter and setter functions in A, A1 and A2, with different #java signatures:
#source
class A  { ... B  get_f(); boolean set_f(B);  B  get_g(); boolean set_g(B);  ...}
class A1 { ... B1 get_f(); boolean set_f(B1); B1 get_g(); boolean set_g(B1); ...}
class A2 { ... B2 get_f(); boolean set_f(B2); B2 get_g(); boolean set_g(B2); ...}
#/source
#p For A1 and A2, the getter functions will simply return the current value of the corresponding field.
#nl "#src!A3.get_g()!" will return #src!null!, the default value for every #src!OPT! type.
#nl "#src!A3.get_f()!" will throw an #src!UnsupportedOperationException!, since the value cannot be delivered, and there is no global default.
#p With the setter functions it is a little bit more complicated, but also quite canonical (an entry in brackets after "/" applies to #src!set_g! where it differs from #src!set_f!):
#source
x.set_f(a) [/x.set_g(a)]

                 a.class ==
                 null            B1        B2
x.class == A1    XPstrict        OK        XPtype
           A2    XPstrict [/OK]  XPtype    OK
           A3    XPunsp          XPunsp    XPunsp

OK       = store value and return change flag, as usual
XPunsp   = a special Unsupported Operation Exception
XPstrict = the special umod Strictness Exception
XPtype   = "normal" java runtime typing error, "class cast exception"
#/source
#p This shows that this feature does lose some static type safety.
But in practice this turned out not to be really significant, and the benefits of more specialization in the subclasses and of elegant abstraction pay off.
#p A common practice is to define an abstract field on the level of some superclass and to realize its getters and setters by a #umod field definition in some of its sub-branches, but by verbatim Java methods in others, see #ref txtverbatimjava.
#p For non-algebraic types it is not necessary that a declared abstract getter function is also implemented in every subclass, as long as it is not called during program execution. (But this seems bad programming style ?-)
#nl But it is necessary for algebraic types, because their "#src!hashCode()!" method calls the getter function of each abstract field rather than accessing the concrete implementations.
// ------------------------------------------------------------------
#h3 #title Pragmas for Field Definitions
#p The pragmas are used to control the generation of constructors and visitors, and are explained in detail together with these, cf. #ref txtconstructors and #ref txtvisitors.
// ------------------------------------------------------------------
#h2 #title Types
#label txt_typeExprs
#p The same kinds of type expressions can be used to declare the type of a field of some class, or to create #java classes on their own with a type definition statement.
In both cases the syntax is #ldots
#cfRule type ::= reference | primitiveType | constructedType ;
#cfRule reference ::= classReference | enumReference | typeReference | externalClassReference ;
#cfInf classReference #def a reference to a class (by its identifier) defined locally in this model #/cfInf
#cfInf enumReference #def a reference to an enumeration type (by its identifier) defined locally in this model #/cfInf
#cfInf externalClassReference #def a reference to a class declared as "EXT" in an #nonterminal importDeclaration #/cfInf
#cfRule primitiveType ::= "int" | "float" | "char" | "string" | "bool" ;
#cfRule constructedType ::= "OPT" type | type "->" type | "MAP" type "TO" type | type "<->" type | "REL" type "TO" type | "SET" type | "SEQ" type | type "*" type | type "+" type | type ("/" type)+ ;
// ------------------------------------------------------------------
#h3 #title Primitive Types
#p The primitive types are currently mapped to #java types as follows:
#table#border 1
#tr#td #umod : #td#src!int! #td#src!float! #td#src!char!#td#src!string!#td#src!bool!
#tr#td #java : #td#src!int! #td#src!double[float]!#td#src!char!#td#src!String!#td#src!boolean!
#tr#td boxed type #td#src!Integer!#td#src!Double[Float]!#td#src!Character!#td --- #td#src!Boolean!
#tr#td missing/not yet supported: #td#src!long! #td #td #td #td
#/table
#p Please note that "#src!string!" in the #umod sense is a scalar type, and is therefore written with a #emph!lower-case! initial character.
#p The command line switch #src!--floatNotDouble t! causes the umod "#src!float!" type to be realized by a Java "#src!float!" (the bracketed alternatives in the table above). Cf. #ref txtcommandlineoptions. Without this switch, the Java type "#src!double!" is employed.
#nl (The Java types "#src!long!", "#src!BigInteger!" and "#src!BigDecimal!" are currently not yet supported.)
// ------------------------------------------------------------------
#h3 #title Reference Types
#p Reference types are denoted by identifiers.#nl They have to correspond to either an external declaration, a class definition or a type definition.#nl They are translated into a reference to a java object of the corresponding java class, #emph!but excluding any reference to "null"!, cf. #ref txtstrictness.
// ------------------------------------------------------------------
#h3 #title Constructed Type #src!OPT!
#label txtstrictness
#p In java, reference types implicitly #emph!always! contain the additional value "#src!null!", but primitive types #emph!never! do. #umod treats this in a more orthogonal way: types of both kinds do #emph!not! include the #src!null! value. But by applying the #src!OPT! constructor you get a type which is "optional", i.e. which includes the value "#src!null!" as an additional value in its "carrier set".
#p The #src!OPT! type constructor does not create new java class definitions on its own, but it (a) modifies the code realizing the "#src!set_<>()!" functions and the constructors, and (b) determines which proxy class is used for an aggregate type (lists, sets, maps, etc.) when it is applied to the argument(s) of that aggregate.
#p By these means a #umod model always guarantees #emph!strictness!, i.e. that a value whose type is not #src!OPT! will never be #src!null!.
// ------------------------------------------------------------------
#h3 #title Constructed Types in General
#p All type constructors are fully compositional, i.e. they can be nested arbitrarily !-)
#nl (Of course, some combinations do not make any sense, e.g. a multiple application of #src!OPT!, which is idempotent.)
#p Most constructed types are translated into parameterized instances of generic classes, either directly from the "#src!java.util!"
zoo, or from our own proxy classes in #link 2/eu/bandm/tools/umod/runtime/package-summary.html #text #src!umod/runtime!.#/link (The latter are needed to guarantee the strictness condition ("#src !=null#/src") for all fields and values which are not of type "#src!OPT!".)
#p In both cases the interfaces for constructing, changing and inquiring follow the interface definitions of the corresponding collection types from "#src!java.util!".
#p The different type constructors and their notation are #ldots
#table
#tr#td #src!SEQ! #ital!t! #td Sequence (= list).
#tr#td #src!SET! #ital!t! #td (Finite) power set.
#tr#td #src!MAP! #ital!t1! #src!TO! #ital!t2! #nl #ital!t1! #src!->! #ital!t2! #td finite (possibly partial) map
#tr#td #src!REL! #ital!t1! #src!TO! #ital!t2! #nl #ital!t1! #src!<->! #ital!t2! #td multimap, as defined in #link 2/eu/bandm/tools/ops/Multimap.html #text #src!ops/Multimap!.#/link
#tr#td #ital!t1! #src!*! #ital!t2! #td pair, i.e. simultaneous combination of two instances of the two types.
#tr#td #ital!t1! #src!+! #ital!t2! #td co-pair, i.e. alternative selection of left or right side.
#/table
#p The default value for every field of a (non-optional!) aggregate type is an #emph!empty instance! of this aggregate, cf. #ref txtconstructors below.
#p Whenever a new instance for such a field needs to be created explicitly, the constructor call of the correctly instantiated run-time class must be hand-coded. This can be very tedious, cf. #src!CheckedMap_LR>>!. It may be easier to create a dummy instance of some class definition and call #src!get_()! to retrieve a correctly typed empty instance.
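#p The strictness condition guaranteed by the proxy classes of #src!umod/runtime! can be pictured with the following minimal Java sketch of a list that rejects #src!null! elements. The class #src!StrictList! is hypothetical and does not reproduce the actual runtime classes or their names; it only shows the principle, built on #src!java.util.AbstractList!.
#commentchar\
```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical null-rejecting list illustrating umod's strictness idea
// (not the actual umod/runtime API).
final class StrictList<E> extends AbstractList<E> {
    private final List<E> elements = new ArrayList<>();

    @Override
    public E get(int index) {
        return elements.get(index);
    }

    @Override
    public int size() {
        return elements.size();
    }

    // AbstractList routes add(E), addAll(..) and listIterator().add(..) through here.
    @Override
    public void add(int index, E element) {
        elements.add(index, Objects.requireNonNull(element, "null in non-OPT list"));
        modCount++;
    }

    @Override
    public E set(int index, E element) {
        return elements.set(index, Objects.requireNonNull(element, "null in non-OPT list"));
    }

    @Override
    public E remove(int index) {
        E removed = elements.remove(index);
        modCount++;
        return removed;
    }
}
```
#commentchar/
#p A plain #src!java.util.ArrayList!, by contrast, accepts #src!null! elements, which is why such wrappers are needed for element types that are not #src!OPT!.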
// ------------------------------------------------------------------
#h3 #title Special "Un-Curry-ed" Treatment of Cascaded #src!MAP!s
#p For sparse data and for the sake of efficiency, the following type transformation, called "currying", is often applied on the conceptual level:
#source
(A * B * C) -> D   ==>   A -> B -> C -> D
#/source
#p So the data is #emph!realized! as a map of maps of maps, but we want to operate on it as on "one single three-dimensional" map. This interpretation requires the following operations:
#commentchar\
#source
m.containsKey(a,b,c) = m.containsKey(a)
                         ? m.get(a).containsKey(b)
                             ? m.get(a).get(b).containsKey(c)
                             : false
                         : false
m.get(a,b,c)   = m.get(a).get(b).get(c)
m.put(a,b,c,d) = m.get(a).get(b).put(c,d)  // <- AND CREATE all intermediate maps
                                           //    as necessary
#/source
#commentchar/
#p Let "#src!F!" be the name of a field definition and "#src!o!" an object reference. Each field can be defined as "strict", i.e. non-null, simply by not prefixing its type with "#src!OPT!". So the default for any top-level field of type "map" is an #emph!empty! map. But this is not the case on the further levels of nesting: initially, the map "#src!o.get_F().get(a)!" does not exist, i.e. "#src!a!" is not contained in the domain (as a key) of the map returned by #src!o.get_F()!, and #src!o.get_F().get(a).get(b)! consequently throws a null pointer exception.
#p To support this "un-curried" view of the map, #umod generates code for two safe methods: you can always call "#src!o.put_F(a,b,c,d)!", and the necessary intermediate maps will be constructed automatically. You can always call "#src!o.containsKey_F(a,b,c)!". If this returns #src!true!, then you can safely call "#src!o.get_F().get(a).get(b).get(c)!".
#p An analogous mechanism exists for sets, lists and multimaps:
#source
A
| f A -> B -> SET C
| g A -> B -> SEQ C
| h A -> B <-> C
#/source
#p #ldots generates code for #ldots
#commentchar\
#source
class A {
  ....
  public void add_f (a, b, c) {// add c to the set selected by a and b, and create this and all intermediate maps iff necessary
  public void add_g (a, b, c) {// append c to the end of the sequence selected by a and b, and create this and all intermediate maps iff necessary
  public void add_h (a, b, c) {// add c as a value for the key b to the multimap selected by a, and create this iff necessary
#/source
#commentchar/
// ------------------------------------------------------------------
#h3 #title Overloading of a #src!null! Function Result in Standard Java Runtime Libraries
#p Please note that for the standard java implementations it holds that
#source
m.containsKey(a) == false   ==>   m.get(a) == null
#/source
#p We do #emph!not! follow this rather confusing overloading of "null". Indeed, it does not make real sense when thinking in a "strongly typed way": in case of #src!A->B! (with #src!B! not optional) you want to be guaranteed #emph!never! to get a #src!null!. In case of #src!A->OPT B!, a value of #src!null! contained in the map and the key not being in the map at all are two very different things.
#p Therefore the result of a "get" for a key for which "containsKey" is not #src!true! is undefined in #umod and may be an exception.
// ------------------------------------------------------------------
#h2 #title Documentation
#p Documentation can be entered in the #umod source. It will be attached to the generated code as javadoc comments, and thus re-appear when generating API documentation of the generated sources by applying "#src!javadoc!" etc. Therefore the usual stylistic rules for writing doc comments should be followed, esp. that the first sentence, up to the character sequence "#src!. !", is quoted in summary tables.
#p The #nonterminal docEntry defines the format for any doc text entry.
#cfRule docEntry ::= "DOC" (characters)* "$$" ;
#p As shown in the rule #nonterminal umodFile, documentation can be attached to the module as a whole.
Similarly, this is allowed by rule #nonterminal classDef for every class definition, by #nonterminal fieldDef for every field definition, by #nonterminal enumDef for an enumeration type as a whole, and by #nonterminal enumItem for every single enumeration value.
// visitor docu ??
// type definition, ??
Additionally, the construct #nonterminal fieldDoc has been added to the definition of #nonterminal classDef specifically to allow separating field declarations from their documentation.
#cfRule fieldDoc ::= identifier_lower "DOC" (characters)* "$$" ;
#p At all these places, more than one such construct may appear: they will be concatenated in text order.
#p Whenever at least one such doc entry is contained, #umod runs in #emph!documented mode!, otherwise in #emph!undocumented mode!.
#p In documented mode, a streamlined pretty print of the original source (as navigable HTML) will be included in the "#src!doc-files!" directory, and the generated doc comments will frequently refer to this (reconstructed) source.
#p The top-level documentation will be attached (as a doc comment) to the model class; additionally it will be written into the file "#src!package-info.java!", iff #umod runs in non-monolithic mode (= package mode).
#nl The top-level documentation will be followed by a second, synthesized comment which records the date and time of creation of the java sources, and the command line parameters. This synthesized comment will always be attached, also in undocumented mode.
#p All documentation attached to fields and classes will be followed by a second, synthesized comment which gives the link into the pretty-printed source.
#p When running in documented mode, a warning text will be inserted into the generated Java API doc for every class #emph!not! having documentation text.
#p#kind missing DOC of interface def
#p#kind missing DOC of type declarations
// ------------------------------------------------------------------
#h2 #title Constructors and Default Values for Fields
#label txtconstructors
#p Whenever a new object instance is constructed, the value for every single field must be defined. This can be done by
#list
#i implicit default,
#i explicit default,
#i constructor argument.
#/list
#p For #emph!implicit! defaults it holds that #ldots
#list
#i a field of type #src!OPT(x)! has the value #src!null! as its default.
#i a field of a non-optional aggregate type (set, list, map, multimap, etc.) has the #emph!empty aggregate! as its default.
#i all other fields (primitive types and references, which are not optional) do #emph!not! have an implicit default.
#/list
#p An #emph!explicit! default can be given to any field by the notation mentioned above in #ref txtfieldinitvalues. A field with neither an implicit nor an explicit default value must appear as a constructor argument and is called an #emph!obligate field! for the rest of this section.
// NEW since 20181218
#p A #emph!minimal constructor! is a constructor whose parameters are exactly the values for the obligate fields. The user can specify one minimal constructor explicitly, or more than one, if different sequential orders of the field values yield different type signatures. If no minimal constructor is specified explicitly, one minimal constructor is supplied by the #umod compiler implicitly; its parameters are the values for the obligate fields in source text order.
#footnote Please note that this implicit constructor may lead to overloading conflicts between constructor type signatures, e.g. in the case
#nl#src(TOPLEVEL CLASS)
#nl#src(A)
#nl#src( f1 int ! C 0/0 ; )
#nl#src( f2 int )
#nl#src( f3 int = 3 ! C 0/1 ; )
#nl See #ref txt_consabmig.
#/footnote
#p A special subcase of a minimal constructor is the 0ary constructor, iff there are no obligate fields.
The automatic generation is suppressed iff the user defines a 0ary constructor by explicit Java source, see #ref txtverbatimjava.
#footnote In older implementations, the implicit supply of minimal constructors was restricted to 0ary constructors. A source text which failed to define at least one constructor covering all obligate fields (of this class and of all superclasses) was rejected. This behaviour is still available via the command line switch #src!--constructorsPre20181214 t!.
#/footnote
/* OLD VERSION =================================================================
#p In case that #emph!all! fields of a class definition (including all those inherited from a superclass) do have a default value, then a #emph!0ary! constructor (a constructor with zero arguments) is created for this class automatically.
=== */
#p
// pre 20181214 All other constructors, i.e. those with parameters,
All non-minimal constructors must be declared #emph!explicitly! by the user. If the class defines no obligate fields, all constructors of the superclass are inherited (which differs from Java). Otherwise, the signatures of these constructors can be extended explicitly to make them applicable; at least all obligate fields introduced by the class must be added.
#nl (Such inheritance is only supported from a superclass which is a #umod class definition, not from an external, imported class.)
#p A constructor is declared by pragmas following the field definitions, as mentioned above in #ref txtfielddefs, i.e. by appending #nonterminal fieldPragmas. The syntax of a constructor declaration is defined as #ldots
#cfRule constructorPragma ::= "C" (constructorNumber "/" sequentialOrder)+ ;
#p E.g. #ldots
#source
TOPLEVEL CLASS
A
 f int     ! C 0/0 C 1/0 ;
 g OPT int ! C 0/1 ;
#/source
#p Declarations of constructors use a pragma starting with the keyword "#src!C!".
#nl The first number following the keyword identifies the constructor.
#nl The second number, after the slash, indicates the position of the argument which is used to initialize the field to which the pragma belongs.
#nl These position indications only stand for their #emph!sequential order!; the numbers can increase with arbitrary step width.
#nl Every combination of constructor number and argument number may appear only once among all field definitions of the same class definition level.
#nl Every constructor must initialize all obligate fields.
#p NB: Since the character "#src!C!" in these pragmas is parsed as an "identifier", there must be whitespace between it and the first digit.
#p#kind src cf. #link 3/eu/bandm/tools/umod/parser/umod.g #text umod.g
#p So the example above creates two constructors:
#source
public A (int arg0, int arg1) {
  f = arg0;
  g = arg1;
}
public A (int arg0) {
  f = arg0;
}
#/source
#p Please note that constructor "1" can only be defined because field "#src!g!" has a default value (namely #src!null!).
#p Please note further that constructor "1" is the only possible minimal constructor and would be synthesized implicitly if it were not specified explicitly.
/* ==== The #emph!definedness requirement! says that an attempt to define a constructor which does #emph!not supply an argument for all obligate fields
// PRE 20181214 those fields which do not have a default value
! is considered an error and signalled as such.
==*/
#p In contrast to Java, constructors are #emph!inherited! from (#umod -defined) superclasses. This happens in different ways:
#list
#i if no pragma with the same constructor number appears in the subclass:
#list
#i if the subclass introduces no obligate fields,
// only fields with default values (implicit or explicit),
then the constructor is inherited "as is" for this subclass.
#i if there are new obligate fields on this class level,
// fields which would require constructor arguments,
then the constructor is #emph!not!
inherited to this subclass or any further subclass, and a corresponding #emph!warning! is emitted.
#/list
#i if one or more pragmas with the same constructor number do appear in the subclass:
#list
#i if the lowest argument number is larger than the largest argument number used in the superclass, then the constructor is "extended": the new arguments are appended to the list of arguments of the superclass, and the generated code, after calling the constructor of the superclass with the sequence of inherited arguments, assigns the values of the "new" arguments to the corresponding fields.
#nl Again, all obligate fields must be included.
// Please note that the "definedness" requirement mentioned above applies
// here accordingly.
#i if the lowest argument number is equal to #src!0!, then the constructor number is "recycled" and a totally new constructor chain is started here.
//NEW 20181214
Please note that all obligate fields must be included in such a constructor explicitly, including those of all superclasses, using the "#src!^!" notation, see below.
/* === pre 20181213 Please note that, according to the definedness requirement, #emph!every start of a constructor chain is only possible if the superclass has a 0ary constructor. ==*/
#i if the lowest argument number is not equal to #src!0!, but not larger than the highest argument number used on the superclass level for this constructor, then it is an error.
#/list
#/list
The special notation
// "#src!^!"
#cfRule superField ::= "^" identifier_lower fieldPragmas ;
#p (as contained in #nonterminal classDef) allows referring to a field of some higher-level class definition, in order to include its initialization in a new constructor:
#source
A
 f OPT int
 g OPT int
| B
| | C
   name string ! C 2/0 ;
   ^f          ! C 2/1 ;
#/source
#p
// NEU 20181214:
Also with this device, a particular field name may appear at most once in a constructor signature.
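#p To make these rules more concrete, the following Java sketch shows constructors of the kind the rules above call for, for a small hypothetical model. It is hand-written and only approximates the code generated by #umod. The classes #src!P!, #src!Q!, #src!S! and all field names are invented for illustration. Class #src!P! has an obligate #src!int! field #src!x! (pragma #src!C 0/0!) and an #src!OPT int! field #src!y!. Subclass #src!Q! adds an obligate string field #src!z! with pragma #src!C 0/1!, so constructor 0 is #emph!extended!. Subclass #src!S! adds only a sequence field #src!w!, whose implicit default is the empty aggregate, so constructor 0 is inherited "as is". In Java, such inheritance can only take the form of a delegating constructor with the same signature.
#commentchar \
#source
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, hand-written; it only approximates what umod generates.
class P {
  protected int x;        // obligate field, pragma "C 0/0"
  protected Integer y;    // OPT int: implicit default null

  // Constructor 0, the minimal constructor of P.
  public P(int arg0) {
    x = arg0;
  }
}

class Q extends P {
  protected String z;     // obligate field, pragma "C 0/1"

  // Constructor 0, extended: the inherited argument is passed to the
  // superclass constructor, then the new argument is assigned.
  public Q(int arg0, String arg1) {
    super(arg0);
    z = arg1;
  }
}

class S extends P {
  protected List<String> w = new ArrayList<>();  // SEQ string: empty by default

  // Constructor 0, inherited "as is": S adds no obligate field, so the
  // same signature simply delegates to the superclass constructor.
  public S(int arg0) {
    super(arg0);
  }
}

class ConstructorDemo {
  public static void main(String[] args) {
    Q q = new Q(1, "one");
    S s = new S(2);
    System.out.println(q.x + " " + q.y + " " + q.z);  // 1 null one
    System.out.println(s.x + " " + s.w.isEmpty());      // 2 true
  }
}
#/source
#commentchar /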
#p#kind missing link to non-umod superclass (e.g. using 0ary constructor, which is called by all others !!)
// ------------------------------------------------------------------
#h3 #title Ambiguity in Overloading Resolution for Constructors
#label txt_consabmig
#p In Java, constructors are distinguished by their parameter type signature. This can lead to overloading situations which cannot be resolved by the Java compiler. E.g.
#source
A
 a1 OPT int ! C 0/0 ;
 a2 OPT int ! C 1/0 ;
#/source
#p #umod does #emph!not! warn you in these cases, but lets the Java compiler discover the problem.
// ------------------------------------------------------------------
#h3 #title Defining Constructors with #ital!Verbatim! Inserted Java Source
#p If the #ital!verbatim! inserted Java source (cf. #ref txtverbatimjava) defines a #emph!0ary! constructor explicitly, then its implicit creation is suppressed.
#p All other conflicts between explicitly specified constructors and #ital!verbatim! inserted Java source are discovered #emph!only when the Java compiler runs!.
// ------------------------------------------------------------------
#h2 #title Pattern Handling Methods
#p For use with the #link 2/eu/bandm/tools/paisley/package-summary.html #text Paisley#/link pattern matching library, special pattern handling methods can be generated: two for every class and one for every field definition. This is controlled by the command line switch #src!--patterns!, see #ref txtcommandlineoptions.
#p The names and meanings of the generated methods are as follows:
#commentchar \
#source
class A extends B {
  protected F1 f1 ;
  protected F2 f2 ;
  public static Pattern get_f1 (Pattern p){..}
  public static Pattern get_f2 (Pattern p){..}
  // These patterns match the object iff the pattern argument "p"
  // matches the respective field value.
}

class __Patterns {
  // ...
  public static Pattern cast_A (Pattern p){..}
  // matches iff p matches and the argument is an instance of A

  public static Pattern term_A (Pattern superpattern,
                                Pattern f1_pattern,
                                Pattern f2_pattern){..}
  // matches iff the object matches superpattern (= a pattern defined for the
  // superclass) and all field values match the respective patterns.
  // NOTE: any of these can be treated as "don't care" by setting it
  // to "Pattern.any"
  // ...
}
#/source
#commentchar /
// ------------------------------------------------------------------
#h2 #title Visitors
#label txtvisitors
#p #umod supports the generation of code for different kinds of visitors. They are used in the traditional way, i.e. as a superclass for a user-defined class which overrides only those methods that deal with the parts of the model the user is interested in.
#p In a #umod source, the declaration of a visitor involves two steps:
#list
#i definition of one or more different #emph.traversal orders.,
#i declaration of the visitor classes.
#/list
#p The different traversal orders are identified by numbers. They are declared within the class definitions, by appending pragmas to the field definitions, similar to constructor declarations (#ref txtconstructors).
#p The syntax is
#cfRule traversalPragma ::= "V" (traversalNumber "/" sequentialOrder (lrCode WS)* )+ ;
#cfRule lrCode ::= ( "L" | "R" )+ ;
#p So the first number after the leading "#src!V!" identifies the traversal order. The second number, after the slash, indicates the sequential order in which the corresponding field's contents are visited, relative to the contents of the other fields on this level of class definition.
#p (As with constructor declarations, there must be whitespace after the "#src!V!".)
#p As with constructor declarations, the position numbers are only relevant w.r.t. their sequential order; they can increase with arbitrary step width.
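#p The effect of the position numbers can be illustrated by a small Java sketch which is not part of #umod. It models the traversal pragmas of #emph!one! class definition level as (field, traversal, position) entries. For a given traversal number, it computes the fields in visiting order. The class names #src!TraversalEntry! and #src!TraversalOrder! are invented. In this sketch, a field without a pragma for the requested traversal is simply skipped. Only the relative order of the positions matters, so the positions 2 and 10 below behave exactly as 0 and 1 would.
#commentchar \
#source
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "V n/pos" pragmas on ONE class definition level.
final class TraversalEntry {
  final String field;
  final int traversal;
  final int position;

  TraversalEntry(String field, int traversal, int position) {
    this.field = field;
    this.traversal = traversal;
    this.position = position;
  }
}

final class TraversalOrder {
  // Fields visited by traversal n, sorted by ascending position number.
  // In this sketch, fields without a pragma for n are skipped.
  static List<String> fieldsFor(List<TraversalEntry> entries, int n) {
    List<TraversalEntry> selected = new ArrayList<>();
    for (TraversalEntry e : entries) {
      if (e.traversal == n) {
        selected.add(e);
      }
    }
    selected.sort(Comparator.comparingInt(e -> e.position));
    List<String> result = new ArrayList<>();
    for (TraversalEntry e : selected) {
      result.add(e.field);
    }
    return result;
  }

  public static void main(String[] args) {
    // Corresponds to:  p ... ! V 0/10 ;   q ... ! V 0/2 V 1/5 ;   r ... ! V 1/1 ;
    List<TraversalEntry> entries = List.of(
        new TraversalEntry("p", 0, 10),
        new TraversalEntry("q", 0, 2),
        new TraversalEntry("q", 1, 5),
        new TraversalEntry("r", 1, 1));
    System.out.println(fieldsFor(entries, 0));  // [q, p]
    System.out.println(fieldsFor(entries, 1));  // [r, q]
  }
}
#/source
#commentchar /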
#p In contrast to constructor declarations, these numbers are not related to position numbers used in the definition of the superclass. They only define the traversal order among the fields of this level of class definition. The traversal order w.r.t. the superclass cannot be influenced by their selection, but is defined by the "kind" of generated visitor, declared as described below. So their sequence can start with an arbitrary numeric value.
#p Visitor and constructor pragmas can be mixed arbitrarily in the pragma section "#src!!!...#src!;!" at the end of a field definition.
#p Example:
#source
A
| B1
   b1 B1           ! C 0/0 V 0/1 1/0 ;
   b2 SEQ B1       ! V 0/0 ;
| | C1
   c1 string -> C2 ! V 0/2 V 1/20 C 0/20 ;
   c2 OPT int
#/source
#p Whenever the type of a field which a visitor shall follow contains #src!MAP! or #src!REL! constructs, an additional #nonterminal lrCode can be inserted after the numeric code. It indicates whether to visit the left or the right side of each level of these binary type constructor applications. The codes can enable leaves or whole sub-branches, as in
#source
A
| a1 (A -> int) -> SEQ (A <-> A) ! V 0 LL V 1 LL R ;
#/source
#p where traversal code #src!0! will only select the references to #src!A! in the domain of the domain, while #src!1! will additionally visit both sides of all contained multi-maps. #emph!Please note! that the #nonterminal lrCode does #emph!not! alter the sequential order of visiting; it is only an enabling condition.
#nl (In general: whenever you want to program an algorithm which depends on a "local" consequence of a "global" property, as is the case with the sequential order of visiting, the code is more maintainable when this explicitly order-respecting behaviour is realized #emph!locally!. The required extra code is in most cases only a three-liner !-)
#p At the beginning of each #umod definition file the generated visitors are declared, as already mentioned in the grammar rule #nonterminal modelLines above.
#label txt_syntax_visitor_declaration
The syntax for these declarations is #ldots
#cfRule visitorDeclaration ::= "VISITOR" int identifier_upper \nl ( "MULTIPHASE" | "IS" "PRINTER" | "IS" "REWRITER" | "IS" "COREWRITER" )? (docEntry)? ";" ;
#p #nonterminal identifier_upper directly gives the name of the generated Java class which realizes the visitor.
#nl The #nonterminal int indicates which traversal order is used by the generated visitor. Of course, one and the same number can be used for more than one visitor.
#nl Then follows the optional indication of the visitor kind. If it is omitted, a "simple kind" visitor is generated.
// ------------------------------------------------------------------
#h3 #title Common Base Class and Calling of Visitors
#label txtvisitorbasics
#p All visitors generated by #umod share one and the same abstract base class. Let this class be called "#src!BaseVisitor!" in this and the subsequent paragraphs. (In the generated code it is currently named "#src!MATCH_ONLY_00!", but this name is normally not visible to the user and may change without notice.)
#nl For each class #src!C! of the model, #src!BaseVisitor! provides a method #src.public void match (C x){}..
#nl Additionally, it provides a method #src!public void match (java.lang.Object x){}!, which dispatches on the dynamic type of an arbitrary object. Neither the class of this object needs to be known statically, nor even whether it is an instance of any model class at all.
#p Calling #src!match()! on an arbitrary object from the model is the most common way of activating a visitor. Internally, the specialization on the argument is performed explicitly by a chain of #src!if(x instanceof C'){...}else!-statements.
#p Whenever the #emph!most specific! model class #src!C'! of the visited object has been identified, this information is carried over into the static type information of the visitor's source code, and the corresponding "#src!action()!" method is called by explicit casting.
#nl There is a default "#src!action()!" method in #src!BaseVisitor! which calls the #src!action()! method with the argument cast statically to its superclass. If the class is a top-level class of the model, it instead calls a special #src!nomatch()! method. The latter raises a #src!RuntimeException! unless the #src!partial! flag is set to true. This feature can be used to discover forgotten cases when all cases are assumed to be covered.
#p The different kinds of derived visitors, whether automatically generated or user-defined, differ in the contents of this #src!action()! method, as described in the following sections.
#p If the visited object is neither an instance of any model class nor of an imported external class, the method #src!BaseVisitor.foreignObject(Object o)! is called. This method #emph!must! be overridden whenever #src!match()! shall also be applied to objects of unknown classes. By default, this method #emph!throws a RuntimeException! with the message that "o" is not an instance of a model class.
#p Furthermore, for each field definition "#src!f!" which refers to an aggregate (list, set, map) of instances of model classes, the generated code (for the class which contains this field) provides the method "#src!public void descend_f(final BaseVisitor visitor){..}!".
#nl This code loops over the contents of the aggregate automatically. It can be used #emph!from anywhere! for explicitly applying a visitor to all elements contained in a given aggregate field.
#nl If a field #src!f! carries a visitor code with number #src!n! and an #nonterminal lrCode which selects only a subset of all branches, then a #src!descend_n_f(BaseVisitor)! method is generated which respects this selection.
#p Direct subclasses of "#src!BaseVisitor!" can be used merely for classifying model objects, by overriding some "#src!action()!" methods, while not providing any default descending behaviour.
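#p The dispatching scheme just described can be imitated by hand for a hypothetical three-level model #src!C extends B extends A!. The following Java sketch is not the generated #src!BaseVisitor!, whose real name and details differ. It only mirrors the described behaviour:
#list
#i the #src!instanceof! chain in #src!match()!,
#i default #src!action()! methods delegating to the superclass,
#i #src!nomatch()!, controlled by the #src!partial! flag,
#i #src!foreignObject()!.
#/list
Imported external classes are ignored here. The names #src!BaseVisitorSketch!, #src!Classifier! and #src!DispatchDemo! are invented.
#commentchar \
#source
// Hypothetical model classes: C extends B extends A, A is top-level.
class A {}
class B extends A {}
class C extends B {}

// Hand-written imitation of the generated base visitor; not umod output.
abstract class BaseVisitorSketch {
  // If true, reaching nomatch() is tolerated instead of raising an exception.
  protected boolean partial = false;

  // Entry point for arbitrary objects; model membership is tested at runtime.
  public void match(Object x) {
    if (x instanceof A) { match((A) x); } else { foreignObject(x); }
  }

  // The instanceof chain narrows the static type one level at a time.
  public void match(A x) {
    if (x instanceof B) { match((B) x); } else { action(x); }
  }

  public void match(B x) {
    if (x instanceof C) { match((C) x); } else { action(x); }
  }

  public void match(C x) {
    action(x);
  }

  // Default actions delegate to the superclass; A is top-level, so nomatch().
  protected void action(C x) { action((B) x); }
  protected void action(B x) { action((A) x); }
  protected void action(A x) { nomatch(x); }

  protected void nomatch(Object x) {
    if (!partial) {
      throw new RuntimeException("no action for " + x.getClass().getSimpleName());
    }
  }

  // Called for objects which are not instances of any model class.
  protected void foreignObject(Object o) {
    throw new RuntimeException(o + " is not an instance of a model class");
  }
}

// A user-defined visitor which merely classifies objects.
class Classifier extends BaseVisitorSketch {
  String last = "none";

  @Override
  protected void action(B x) {
    last = "B-level action for " + x.getClass().getSimpleName();
  }
}

class DispatchDemo {
  public static void main(String[] args) {
    Classifier v = new Classifier();
    v.match((Object) new C());
    System.out.println(v.last);   // B-level action for C
  }
}
#/source
#commentchar /
#p Since #src!Classifier! overrides only #src!action(B x)!, an object of class #src!C! ends up there via the default #src!action(C x)!. An object of class #src!A!, by contrast, reaches #src!nomatch()! and raises an exception unless #src!partial! is set.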
#p#kind missing "partial" flag behaviour not supported on the next class levels ????
// ------------------------------------------------------------------
#h3 #title "Simple Kind" Visitor
#p The #src!action()! method of a generated visitor of the "simple kind" first calls the #src!match()! method on the sequence of fields, as determined by the selected traversal order. It then calls #src!action()! with the parameter cast statically to its superclass.
#p #commentchar \
For example, assume a model definition like #ldots
#source
MODEL example =
VISITOR 0 V0 ;                // simple kind
VISITOR 0 V1 MULTIPHASE ;
TOPLEVEL CLASS
X
A
   a1 X           ! V 0/0 ;
   a2 SEQ (X->X)  ! V 0/1 ;
| B
   b1 X           ! V 0/0 ;
   b2 SEQ (X->X)  ! V 0/1 ;
| | C
   c1 X           ! V 0/0 ;
   c2 SEQ (X->X ) ! V 0/1 ;
<<< JAVA
  static class Derived extends V0 {
    public void action (B x){
      //do something
      super.action(x);
    }
  }
#/source
#commentchar /
#p Then a typical control flow when calling "#src!Derived.match(o)!" with an argument which happens to be of class "#src!C!" can be depicted as follows:
#commentchar \
#source
Derived.match(o)
 -> BaseVisitor.match(Object o)
     -> BaseVisitor.match(A x)
         -> BaseVisitor.match(B x)
             -> BaseVisitor.match(C x)
                 -> V0.action(C x)
                      match(x.c1);
                      x.descend_c2(this);
                      this.action((B)x);
                       -> Derived.action(B x){
                            //do something
                            super.action(x);
                             -> V0.action(B x)
                                  match(x.b1);
                                  x.descend_b2(this);
                                  this.action((A)x);
                                   -> V0.action(A x)
                                        match(x.a1);
                                        x.descend_a2(this);
                          }
#/source
#commentchar /
/* ????????????????????????????????????????????? #p The picture shows clearly #bold that there is no inheritance/code reusage#/ between the #src!action()! methods: In spite of being a sublass of #src!B!, the processing of an object of class #src!C!
never reaches the method #src!action(B x)!.#nl With this kind of visitors, all abstraction/code re-usage has to be coded explicitly, by local subroutines called from multiple #src!Derived.action()!-methods. ??????????????????????????????????????????????? */
// ------------------------------------------------------------------
#h3 #title Multiphase Visitor
#p With simple visitors, code re-use (induced by a common superclass and thus a common "#src!action()!" method) only takes place #emph!after! the specific descents have been carried out (by the more specific "#src!action()!" method).
#p Visitors of the #src!MULTIPHASE! kind separate the code abstraction and the descending into different phases.
//support a finer granularity of
//control by separating the visting process into three phases.
The generated code basically looks like this:
/* ========================== ??????????????????????? The #java "inheritance" axis is employed for deriving one visitor from another. Therefore explicit coding is needed to the support the #emph!re-usage! of code w.r.t. the #em.visited objects.. This code is generated in the #src!MULTIPHASE! kind of visitors. #nl Their #src!action! method is defined by #ldots ======================================= */
#source
public static class V1 extends BaseVisitor {
  protected boolean haspre=true;
  protected boolean hasdescend=true;
  protected boolean haspost=true;

  public void action (C x){
    if (haspre) pre(x);
    if (hasdescend) descend(x);
    if (haspost) post(x);
  }

  public void pre (C x) { pre((B)x); }
  public void pre (B x) { pre((A)x); }
  public void pre (A x) {}

  public void descend (C x) {
    match(x.c1);
    x.descend_c2(this);
    descend((B) x);
  }
  public void descend (B x) {
    match(x.b1);
    x.descend_b2(this);
    descend((A) x);
  }
  public void descend (A x) {
    match(x.a1);
    x.descend_a2(this);
  }

  public void post (C x) { post((B)x); }
  public void post (B x) { post((A)x); }
  public void post (A x) {}
}
#/source
#p The variables #src!haspre!, #src!hasdescend! and #src!haspost!
are "global switches" which enable these three phases independently. They can be set by the derived visitor's code, e.g. to "false" once at initialization time, or switched on and off dynamically during execution.
#p This more complex schedule allows specialization/inheritance of activities without disturbing the inheritance w.r.t. descending. Again, this may become clear when looking at a representation of the resulting control flow:
#source
match(Object o)
 -> match(A x)
     -> match(B x)
         -> match(C x)
             -> action(C x)
                  -> pre(C x)      -> pre(B x)      -> pre(A x)
                     (user-defined, specialized processing, e.g. an override of pre(B x))
                  -> descend(C x)  -> descend(B x)  -> descend(A x)
                     (each level calls match() for the fields on its own level,
                      "C"-level first, then "B"-level, ...)
                  -> post(C x)     -> post(B x)     -> post(A x)
#/source
#p Please note that with this variant there is #bold no inheritance#/ between the #src!action()! methods. For code re-usage you always have to program the #src!pre()! methods, possibly disabling both #src!descend()! and #src!post()!.
#p In particular, when the #src!MULTIPHASE! behaviour is added to an existing "simple" visitor, the existing #bold inheritance between#/bold #src!action()! #bold will be lost#/bold and replaced by the three(3) separate inheritance chains described above!
/* ========================== ????????????????????????????? #p #bold Please note#/ that there is (as with simple kind visitors) still #bold no inheritance#/ between the #src!action()! methods, --- a fact the author of #umod himself quite frequently forgot when using it #src!!-)!
// For code re-usage you always have to to program the #src!pre()! methods, possibly disabling both #src!descend()! and #src!post()!.
==================================================== */
// ------------------------------------------------------------------
#h3 #title Rewriters
#label txt_rewriters
#p There are two kinds of rewriters. A visitor declared as #src!COREWRITER! can deal with cycles, but #emph!always! creates copies, even if nothing changes.
#nl A visitor declared as #src!REWRITER! cannot deal with cycles, but clones only if necessary. It is most convenient for transforming "term-like" data, and preserves sharing as far as possible.
#p Both kinds of rewriters are non-destructive: whenever a single value must be changed due to rewriting, a new copy of the containing object is created, altered and used for the further rewriting process.
#p The usage of both kinds of rewriters follows the pattern #ldots
#source
MyRW rw = new MyRW();
rw.match(o);
Object rewritten_object = rw.get_result();
#/source
#p For convenience this is the same as #ldots
#source
Object rewritten_object = (new MyRW()).rewrite(o);
#/source
#p #ldots and there is also a typed variant #ldots
#source
A original ;
A rewritten_a = (new MyRW()).rewrite_typed(original);
#/source
#p In case of a (non-co-)rewriter there is a second constructor
#source
public RW (RW parent){..}
#/source
#p which takes an existing rewriter as its argument. This rewriter becomes the "parent" rewriter, and every cache look-up which is unsuccessful in the local cache is passed on to this parent. So things like "nesting and inheritance of scopes" can easily be modelled.
#p When the generated (non-co-)rewriter is called directly, without any method overridden by the user, it performs the #emph!identity! transformation, i.e. it changes nothing.
#nl But when the generated #emph!co-!rewriter is called directly, without any overriding by the user, it creates a #emph!deep copy! of its argument.
#p All generated rewriters contain two methods for every class definition #src!C!, namely #src!action(C)! and #src!rewriteFields(C)!.
#nl #src!action()!
is called by the #src!match()! cascade, as defined for the general case and described above in #ref txtvisitorbasics. It (1) performs the preparatory steps of rewriting, and (2) is not specially concerned with the fields selected by the traversal order.
#p Both kinds of methods may be overridden by the user. A certain contract must be kept, which is best explained by looking at the behaviour of the generated code.
#p #bold!For the (non-co-)rewriter!, the interface to use consists of #ldots
#commentchar\
#source
\\private
Map cache                  // can be set to null for disabling caching
\\private
                           // it must not be manipulated explicitly,
boolean lookUp(Object)     // sets result/ismulti flag and returns true,
                           // iff an entry exists in the rewriter's cache,
                           // or in that of its parent.
void putToCache()          // memorize the currently set result (single or multiple)
void useCache(boolean)     // sets whether the generated "action()" may read the cache
Object original ;          // must be readable and writeable by "action()"
Object getResult();        // returns the most recently set result
boolean isMulti();         // returns whether this is multiple (= a list)
void revert();             // reset result to original
void substitute(Object newresult);          // set newresult as result
void substitute_multiple(List newresults);  // set newresults as multiple result
void substitute_empty();                    // set empty list as multiple result
#/source
#commentchar/
#p In case of the (non-co-)rewriter, the generated method #src!RW.action(C o)! does the following:
#list
#i It looks up in the cache #src!RW.cache! whether the object #src!o! has already been processed by this rewriter instance (or by some rewriter in the #src!parent! chain). In this case the result of the earlier visiting process is taken from the cache and stored as result (single or multiple), and the method returns immediately.
#nl This cache look-up can explicitly be disabled by calling #src!useCache(false)!.
#i If no result is retrieved from the cache, then the object #src!o!
itself is stored into #src!original! and memorized as the (likely only intermediate!) value of #src!result!.
#i A copy (i.e. a shallow clone) of #src!o! is created,
#i and the method #src!rewriteFields(Object)! is called with that clone as its argument.
#i When this method returns, whatever is currently the value of #src!result! is left there (for the caller of #src!action(Object)!). It is also saved to the cache as the rewriting result of the visited object #src!o!, by calling #src!putToCache()!.
#/list
The method #src!rewriteFields(C c)!, generated for every class #src!C!, performs the non-generic, field-structure-specific rewriting. Its argument is the clone of the object. It first calls #src!rewriteFields((D)c)! for the superclass "D". Then it saves the current value of "result" into a local variable. This value points either to the original or to the clone, depending on whether changes to any field have happened in the superclass(es).
#p Then, for all those fields which are selected by the chosen traversal order, #src!match()! is called on their contents.
#p Whatever this call leaves in the variable #src!result! is compared with the original value contained in the field. Iff a change has happened, the result is stored into the field of the clone, and the overall local result of the method is overwritten to point to the clone.
#p Finally, after all fields from the corresponding traversal selection have been rewritten, the local result is copied to #src!result!, for communicating it to the caller. The caller may be a #src rewriteFields()#/ of a subclass or the #src action()#/ method of the same class.
#p If a field value is an aggregate (i.e. is of a "container type"), then:
#nl (1) a temporary new aggregate object is constructed,
#nl (2) #src!match()! is called sequentially on the contained objects,
#nl (3) #src!result! is treated accordingly step by step, i.e. stored into the temporary aggregate.
Here, too, the occurrence of changes is monitored in a similar way as described for simple values. Whenever a change happens on an arbitrarily deep nesting level, the local result is re-adjusted to point to the clone.
#p Any user-defined, overriding method may behave similarly, e.g. define the result of the rewriting by calling the methods from the interface above.
#p A user-defined, overriding method may return more or fewer than one object by calling #src!substitute_multiple(List)! or #src!substitute_empty()!. This list of objects will be inserted into the nearest enclosing list or map structure. Up to that level, multiplicity distributes!
#p E.g. having a structure and code like (in a symbolic notation!)
#source
A    a SEQ (B * C)

action (B b){ original = b ; substitute_multiple(new List(b1, b2)); }
action (C c){ original = c ; substitute_multiple(new List(c1, c2, c3)); }

a1 = { (B1,C2) }
#/source
#p #ldots this will yield #ldots
#source
rewrite(a1) = { (b1, c1), (b1, c2), (b1, c3), (b2, c1), (b2, c2), (b2, c3) }
#/source
#p Please note that the level which calls "#src!match!" always needs the #emph!pointer value! comparison #src original!=getResult()#/ for determining whether a relevant change has happened. So it is part of the contract of #src!action()! to set #emph!both! values before returning!
#p #bold The operations of the #ital!co-!rewriter#/bold are much simpler. The interface is
#commentchar\
#source
boolean lookUp(Object)            // sets the variable "result" and returns true,
                                  // iff an entry exists in the co-rewriter's cache.
void putToCache(origObj, newObj)  // memorize newObj as the rewriting result of origObj
                                  // set the value of result=newObj
                                  // This can ONLY BE CALLED ONCE for each key
Object getResult()                // returns the most recently set result
void rewriteDone(Object key)      // restores the "result" value to the clone of "key"
                                  // (this is called before returning from the
                                  // rewriting method, mostly "action(Object)")
#/source
#commentchar/
#p The generated #src!action(o)!
method #ldots
#list
#i #ldots first creates #src!clone=o.doclone()!.
#i Then it enters the clone into the cache #emph!in advance!. Only this enables the generated co-rewriter to deal with cyclic data.
#i Finally it calls #src!rewriteFields(clone)!.
#/list
The method #src!rewriteFields(clone)! calls #src!match(clone.get_f())! on all selected fields, as in the non-co-rewriter case. However, it need not monitor whether changes occur, since all objects are copied anyhow.
#p When the user overrides the generated #src!action(Object o)! method, the user's code should #ldots
#list
#i look up in the cache whether #src!o! has already been visited.
#i enter a new rewriting result into the cache by calling #src!putToCache(object, object)! #emph!before! descending into sub-fields, whenever there could be a ("cyclic") path in the sub-structure which leads back to the currently rewritten object! This method can only be called #emph!once! and will throw an #src!InvalidStateException! iff the key is already contained in the map.
#i call #src!match()! on field contents and update the fields of the clone by setting them to the value of #src!getResult()!.
#i immediately before returning, if any recursive descent has happened, call #src!rewriteDone(orig)!. This restores the result variable to the clone which has been cached for #src!orig!.
#/list
// ------------------------------------------------------------------
#h3 #title Rewriting of Aggregates
#p When rewriting the field contents of #src!SET! and #src!SEQ! type, the rewriting is done element-wise, in the case of #src!SEQ! from "left to right". But of course a declarative, sequence-independent style of coding is always more robust and more readable.
#p This is especially true when rewriting #src!MAP! and #src!REL! type values. The #java/ libraries which realize the corresponding data structures have imperative behaviour. E.g. the #src!Map! implementations have overwriting semantics: whatever is put LAST determines the current value.
Of course this can make programs very hard to understand.
#p Here our approach is more declarative, and independent of this sequential order. Consider the following diagram:
#source
       M
  ---------------->
  |               |
L |               | R
  V               V
  ================>
       M'
#/source
#p Let "M" be the mapping which shall be rewritten. It can be a map or a multi-map.
#nl Then rewriting is applied to the domain of M, yielding a new, auxiliary mapping "L", and to the range of M, yielding "R". When we allow "#src!substitute_multiple()!", one single (1) element can be rewritten to more than one (>1) elements, and we get multimaps for L and/or R. Otherwise we get maps.
#p As can easily be seen in the diagram, the result M' of rewriting M is defined as
#nl #src! L-inv o M o R!
#nl with the composition read from left to right. That is, a pair (x', y') is contained in M' iff there are x and y such that L rewrites x to x', M relates x to y, and R rewrites y to y'.
#p If the declared type of M (and consequently M') is a multimap, this works in any case.
#p If the declared type of M (and M') is a map, then L-inv and R must be maps. So L may be a multi-map, but must be injective, while R must be a map. If the user-defined rewriting rules (which define L and R) violate these conditions when applied to the current data M, an exception is thrown. The result is independent of any sequential order of API calls.
// ------------------------------------------------------------------
#h3 #title Visiting and Rewriting "#src!null!"
#p "The invention of #src!null! was a billion dollar mistake" (after Tony Hoare).
#p Of course you should avoid using it whenever possible. One major achievement of #umod is to eliminate illegal null references: legal ones must be declared explicitly by the type constructor "#src!OPT!", see #ref txtstrictness.
#p #src!null! behaves in a very irregular way.
#list
#i It is a value which has a special type, which is a subtype of every reference type.
#i It can be cast to a certain, special type and thus used for controlling overloading resolution.
#i But the "#src!instanceof!" test again shows a different behaviour.
#/list #p On the static, textual level, there are always different "types" of null, so we could have supported #src!match((A)null)! and #src!action((A)null)!. The latter could even make a static cast to the superclass, say "B", symbolically written as #nl #src!action((A)null){ ! #nl #src! match((B)null); ! #nl #src!} ! #nl #p But of course you cannot descend into any field in the null case, so at the latest when #src!action()! starts doing so, the value #src!null! must be treated specially. #p But, even worse, you cannot store all these different null values in a cache! They are all the same when seen as runtime values! (Of course you COULD introduce auxiliary wrappers which tag all these different nulls. But this would be a lot of work for a construct which, as demonstrated above, is better avoided anyhow!) #p So we decided #bold neither to visit nor to rewrite any null value!#/ As soon as any visitor or rewriter finds a value (contained in a field or an aggregate) to be #src!==null!, it does nothing and leaves the value unchanged. #p This is not really a problem, because you can treat the (rare!) cases when field values may be #src!==null! explicitly, one step earlier, when visiting the containing object itself! This is also much more sensible, because at this point the context of the #src!null! value is still known. A (theoretically possible, but not implemented) visitation of a #src!null! value would anyhow require explicit passing of additional information to be of any worth.
// ------------------------------------------------------------------
#h3 #title Diagnosis #p Every piece of visitor-based processing code lies somewhere between the "declarative" and the "imperative" style of programming. Since it is determined both by the selection of overridden methods and by the processed data, the resulting flow of control can be quite surprising. Therefore the generated code offers support for debugging.
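#p To illustrate why a trace of the individual steps helps, here is a hand-written sketch (not code generated by #umod) of a visitor over a hypothetical tree model. The classes #src!Node!, #src!Leaf!, #src!Pair!, #src!TracingVisitor! and the field #src!debugStream! are assumptions made up for this example; #src!debugStream! only plays the role of the generated debug field described below. The sketch also follows the null policy above: a #src!null! field value is never visited.

```java
import java.io.PrintStream;

// Hypothetical model (not generated by umod): leaves and pairs forming a tree.
abstract class Node {}

final class Leaf extends Node {
    final int value;
    Leaf(int value) { this.value = value; }
    @Override public String toString() { return "Leaf(" + value + ")"; }
}

final class Pair extends Node {
    final Node left;
    final Node right; // may be null, playing the role of an OPT field
    Pair(Node left, Node right) { this.left = left; this.right = right; }
    @Override public String toString() { return "Pair"; }
}

// Hand-written sketch of a tracing visitor; the generated code differs.
class TracingVisitor {
    // Hypothetical stand-in for the generated debug field: no tracing while null.
    protected PrintStream debugStream = null;
    int leafSum = 0;

    void match(Node n) {
        if (n == null) {          // null is never visited; only the skip is traced
            trace("skip null");
            return;
        }
        trace("match " + n);
        if (n instanceof Leaf) action((Leaf) n);
        else if (n instanceof Pair) action((Pair) n);
    }

    // A user override: sum up the leaf values.
    void action(Leaf l) { leafSum += l.value; }

    // Default behaviour for pairs: descend into both fields, left to right.
    void action(Pair p) {
        match(p.left);
        match(p.right);
    }

    private void trace(String step) {
        if (debugStream != null) debugStream.println(step);
    }
}
```

#p Matching #src!new Pair(new Leaf(1), new Pair(new Leaf(2), null))! with a stream set prints the five steps #src!match Pair!, #src!match Leaf(1)!, #src!match Pair!, #src!match Leaf(2)! and #src!skip null!, which makes the actual order of dispatch visible.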
#p Every visitor/rewriter has a field
#source
protected java.io.PrintStream _visitor_debug_stream = null;
#/source
#p Whenever this value is #src(!=null), some intermediate steps #nl ((currently: only the replacement of the result by the clone AND sub-changes in MAPs in the (non-co-)rewriting process)) #nl are dumped to this #src!PrintStream!. #p When the #emph!command line switch! "#src!--visitordebug!" is set to true for code generation, more debugging code will be included, see #ref txtcommandlineoptions. #p#kind missing Clean up !!
//for (non-co-)rewriting sets and lists the results of each descending,
//and #emph!before! descending in a multiphase visitor. #nl
// ------------------------------------------------------------------
#h3 #title Optimization #p By setting the command line switch "#src!--visitoroptimize!" to true (see #ref txtcommandlineoptions), all generated visitor and rewriter code will incorporate the following optimization tactics: #list #i When compiling the model, #umod computes the strongly connected components ("SCCs") of the graph of all model classes w.r.t. the "associations" (in UML speak) defined by the types of the field definitions, #i and then analyses which SCCs are reachable from each individual field. #i The results of this analysis are encoded into some static final data, and are thus available at runtime. #i Whenever (at runtime) the code of a user-defined class, derived from a certain generated visitor/rewriter, is #emph!loaded!, this code is inspected to determine the set of classes for which any method is overridden. #nl (This analysis is performed on the #emph!binary! code, at class loading time, but only for technical reasons. Semantically identical results could be derived from the source text.) #i From the "overridden classes" we can abstract to "overridden SCCs", #i and finally conclude which fields ("associations") #emph!never! need to be followed, because they only lead to one or more SCCs for which no user-defined visitor/rewriter method exists.
#/list #p This optimization should pay off for data models which decompose into disjoint spheres with only a few connections, for all those visitors/rewriters which do not process all of these spheres. These results were presented in #cite lt11a at ICMT 2011; for further information please refer to the paper or to #link 0/markuslepper.eu/papers/zuerich.pdf #text the slides of the talk#/link.
// FIXME better use a BIB ENTRY here ?????
#p#kind missing What about co-rewriters ????
// ------------------------------------------------------------------
#h2 #title Visualization #p Visualization of a data model (or parts thereof) is supported by several means.
// ------------------------------------------------------------------
#h3 #title User-Defined Visualization by the Modifier #src!TOSTRING!#label txttostring
#source
A
  | B1        b1   B1
              b2   char
  | | C1      c1   MAP string TO C2
              c2   int
              TOSTRING JAVA return b1.toString().substring(2)+">>"+c1 ; $$
              FORMAT "c2 '==>' b1 ; c1"
#/source
#p The #src!TOSTRING! directive can appear in a class definition at any position like a field definition. It is followed by a fragment of java source text enclosed in "#src!JAVA...$$!", like all other #emph!verbatim! java. #nl This source text fragment must consist of a statement or a sequence of statements (in the java sense). #nl It will be #ital verbatim #/ encapsulated in the method declaration #src!public String toString(){ ... }!, so it has to end with a type-correct #src!return! statement. #p W.r.t. error reporting, the same rules apply as with other #ital!verbatim! java source text, cf. #ref txtverbatimjava above.
// ------------------------------------------------------------------
#h3 #title User-Defined Visualization by the Modifier #src.FORMAT. #label txt_format_frontend_language #p The #src!FORMAT! directive can appear in a class definition at any position like a field definition. The #src!FORMAT! keyword is followed by a string constant which contains a format description.
This is compiled into a method of a dedicated visitor, which constructs a #src!Format! object for visualizing an instance of this class. #p The syntax of the format directives is an instance of the #link format.html #loc txt_format_frontend_language #text generic syntax for format front-ends#/. #p The #src!Format! object which represents an object #src!a! of a class #src!A! of a model #src!M! can be generated by calling #ldots
#source
class A {
  public Format format() {...}
}
#/source
#p #ldots which is a wrapper for #ldots
#source
class M {
  public static Format toFormat (Object o) {...}
}
--or--
class M {
  class Formatter {
    public static Format process (Object o) {...}
  }
}
#/source
#p #ldots which is a wrapper for (the protected method!) #ldots
#source
new M.__Formatter().toFormat(Object o)
#/source
#p Furthermore, there is a static function with an additional #src!mode! parameter. It first sets a field in the #src!Formatter! object which normally defaults to #src!0! (zero) and which is used in the #src!$switch $mode{..}! expressions as described #link format.html #loc txt_switch_mode#text in the format front-end documentation#/.
#source
class M {
  public static Format toFormat (Object o, int mode) {...}
}
#/source
#p Whenever a format directive needs to embed a format for an object of a model class for which no #src!FORMAT! directive is given, a call is compiled to #ldots
#source
protected Format M.__Formatter.defaultformat(Object o){
  return Format.literal(String.valueOf(o)) ;
}
#/source
#p Whenever a format directive needs to embed a format for an object which is not part of the #umod model (i.e. whenever a field with a type defined by an #src!EXT! import appears in the format directive), a call is compiled to #ldots
#source
protected Format M.__Formatter.foreignObject(Object o)
#/source
#p This method tests whether #src!o!
implements #link 2/eu/bandm/tools/format/Formattable.html #text !/format/Formattable #/link; if so, its #src!format()! method is called, otherwise #src!Format.literal(String.valueOf(o))! is returned. #p Both methods can be overridden by deriving a new formatting visitor from #src!M.__Formatter!. #p Whenever the user wants to change the values of #src!nulltext!, #src!mode!, #src!format_empty! and/or #src!default_indent!, the above-mentioned wrappers cannot be used. Instead the user has to (1) create an explicit instance (e.g. by #src!formatter = new M.__Formatter()!), (2) assign to these public fields, and (3) create the format by calling #src!format = formatter.process(o)! explicitly.
// ------------------------------------------------------------------
#h3 #title Automated Swing #src!Tree! Generation #p If activated by a command line switch (see #ref txtcommandlineoptions), code is generated for a Java Swing tree representation. #p For a certain model definition #src!Mymod! and a reference to an object #src!myobj!, a Swing tree is generated by calling
#source
import javax.swing.JTree;
JTree tree = new JTree(new Mymod.__TREEGEN__().growRoot(myobj)) ;
#/source
#p For convenience, there is a runtime class offering the static method #link 2/eu/bandm/tools/umod/runtime/SwingBrowser.html #loc model2swingpanel(java.lang.String,java.lang.Object,java.lang.Class) #text #src!umod.runtime.SwingBrowser model2swingpanel(...)! #/link, which opens a top-level window with scroll bars etc. and all you need for browsing a #umod model via a GUI. /* ============================ OLD #p An example can be found at the end of #link 1/src/de/tu_berlin/cs/uebb/tools/umod/UModMain.java#blank text !/.../umod/UModMain.java#/. ===*/ #p#kind src Our implementation is based on #emph!suspensions!, i.e. it implements #emph!lazy! generation of tree objects. The source can be found in #link 3/eu/bandm/tools/util/SwingForester.java#/link, and is rather instructive to read.
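#p The idea of suspensions can be sketched independently of Swing: a node keeps a pending computation for its children, evaluates it only on first access, and caches the result. The class #src!LazyTreeNode! below is a hypothetical illustration of that technique, not the code found in #src!SwingForester.java!.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a tree node with a suspended (lazily computed) child list.
// It illustrates the technique only; it is not the code of SwingForester.
final class LazyTreeNode {
    final String label;
    private Supplier<List<LazyTreeNode>> suspension; // pending computation, null once forced
    private List<LazyTreeNode> children;             // memoized result

    LazyTreeNode(String label, Supplier<List<LazyTreeNode>> suspension) {
        this.label = label;
        this.suspension = suspension;
    }

    // Forces the suspension on first access only; later calls return the cached list.
    List<LazyTreeNode> children() {
        if (suspension != null) {
            children = suspension.get();
            suspension = null; // release the closure and whatever it captured
        }
        return children;
    }

    // An infinite chain n0 -> n1 -> n2 -> ...: building it eagerly would never
    // terminate, but with suspensions only the inspected prefix is created.
    static LazyTreeNode chain(int n) {
        return new LazyTreeNode("n" + n, () -> Collections.singletonList(chain(n + 1)));
    }
}
```

#p A tree model built on such nodes would create node objects only for those parts of a #umod model that are actually looked at, e.g. when a branch is expanded.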
// ------------------------------------------------------------------
#h3 #title User-Directed Visitor-Based Dump Routines #p Especially for debugging purposes, a dedicated visitor can be generated which is declared "#src!IS PRINTER!", cf. the syntax of visitor declarations in #ref txt_syntax_visitor_declaration. #p The constructor of each such visitor takes a #src!java.io.PrintStream! as its only argument. Whenever the #src!match()! method of this visitor is called for a certain object, this object is printed to this stream as follows: #list #i first a sequence of characters like "#src!| | |...!", indicating the nesting level of the #src!match()! call, #i then a simple #src!toString()! representation of the object, #i then the name/value pairs of all fields which are #emph!not! marked for descending (using #src!toString()!), #i followed by the output caused by recursively matching all fields marked for descending (in the traversal order selected for this visitor), after the above-mentioned nesting level has been increased. #/list #p When a field of aggregate type has to be printed, then (1) a new line is opened for every item in the aggregate, and (2) the name of this field and the current index position are printed in "#src![....]!" before the call to #src!match()! on the current value. #p Please note that this is currently still a primitive implementation, and #bold cannot deal with cyclic data!#/bold /* === OLD For an example please refer to #link 1/src2/de/tu_berlin/cs/uebb/tools/dtd/Test.java#blank #/, which also demonstrates the TSoap-serialization. == */ #p#kind missing The name of the field currently causing the descending is #emph!not! printed.
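#p The output layout described above can be mimicked by a small hand-written sketch. Everything in it is hypothetical: the model classes #src!Expr!, #src!Num! and #src!Sum!, the choice of which fields count as "marked for descending", and the exact spacing of the nesting prefix; the generated #src!IS PRINTER! visitors may differ in detail. Like the generated code, this sketch has no protection against cyclic data.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical model classes, standing in for generated umod classes.
abstract class Expr {}

final class Num extends Expr {
    final int value;                      // not marked for descending
    Num(int value) { this.value = value; }
    @Override public String toString() { return "Num"; }
}

final class Sum extends Expr {
    final String name;                    // not marked for descending
    final List<Expr> args;                // marked for descending (aggregate)
    Sum(String name, Expr... args) { this.name = name; this.args = Arrays.asList(args); }
    @Override public String toString() { return "Sum"; }
}

// Sketch of a PRINTER-style visitor; not the generated code.
class DumpPrinter {
    private final PrintStream out;
    private int level = 0;                // nesting level of match() calls

    DumpPrinter(PrintStream out) { this.out = out; }

    private String prefix() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) sb.append("| ");
        return sb.toString();
    }

    void match(Expr e) {
        // nesting prefix, then toString(), then the non-descended fields
        if (e instanceof Num) {
            out.println(prefix() + e + " value=" + ((Num) e).value);
        } else if (e instanceof Sum) {
            Sum s = (Sum) e;
            out.println(prefix() + s + " name=" + s.name);
            level++;                      // descend with increased nesting level
            for (int i = 0; i < s.args.size(); i++) {
                out.println(prefix() + "[args " + i + "]"); // field name and index
                match(s.args.get(i));
            }
            level--;
        }
    }
}
```

#p For #src!new Sum("s", new Num(1), new Sum("t", new Num(2)))! this sketch prints seven lines, the last one being #src!| | Num value=2!.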
// ------------------------------------------------------------------
#h2 #title XML Encoded Serialization/Deserialization #label txt_tsoap
//#h2 #title "Typed SOAP" #xml -Based Serialization
// DISCUSSION of the variants in MeMo090331
// and in umod/TODO_umod.txt
#p The basic #umod XML serialization is based on rules which try to combine simplicity, readability and non-redundancy. #p #list #i values of primitive types are wrapped into an element whose tag is their type name. #i all sequences, sets, maps and multimaps simply serialize their content (in canonical order), and wrap it into one and the same generic "aggregate" tag. #i all fields are tagged with the field name. #i the type-driven tags listed above (i.e. those of primitives and generic aggregates) are #emph!omitted! when directly under a field tag. #i all objects are tagged with the class name. #i at the top level of field contents, null values MAY be read, but are not written. Instead, fields with a null value are simply omitted. #i empty structures at the top level which are not optional are also omitted. (This corresponds to the default rules when constructing an object: only for #src!OPT! types is there a difference between null and an empty aggregate!) #i the "right-not-left" case of a co-pair is wrapped into a dedicated element. #/list Additionally, there is a special empty #emph!reference! element which realizes (by an "idref" attribute) a pointer to some object defined at some other place (i.e. earlier when writing, or earlier or later when reading). The code generated by #umod starts every serialization of objects with one (1) single root object, descending in a #emph!depth-first! discipline. Therefore #emph!back-patching! is never required in this case, but the de-serialization code does support it. The first reference to an object always leads to an "in-place" dump of the complete object structure, as defined above. This is fine in case of objects which are only referred to once. It is esp.
fine for human readers in case of "algebraic" objects, which do not have an "identity" beyond their structure. In this case, the usage of "id", "idref" and "reference" would only serve as a kind of shorthand notation. This is different for non-algebraic objects, where identity (and "self-identity" and "non-identity" !-) does carry more semantics than the collection of field values! #p Writing out an instance of model "#src!M!" is started by some code like
#source
import eu.bandm.tools.util.ContentPrinter;
import eu.bandm.tools.umod.runtime.XMLconfiguration;
// inside some method, with a java.io.OutputStream outstream:
final java.io.PrintWriter p0 = new java.io.PrintWriter(outstream);
final ContentPrinter cp = new ContentPrinter(p0);
final XMLconfiguration conf = new XMLconfiguration(); // tag and attribute names
final M.SAX_Writer dumper = new M.SAX_Writer(cp, conf);
dumper.match(myTopLevelObjectForWriting);
p0.flush(); // push buffered output through to outstream
#/source
#p The concrete tag strings and attribute names to use are configurable and are initialized via the #link 2/eu/bandm/tools/umod/runtime/XMLconfiguration.html #text XMLconfiguration #/ object. #p When writing, no errors should occur, only failures; these are reported via thrown exceptions. #p#kind src An example is in #link 3/eu/bandm/tools/d2d2/base/Main.java #text the main file of triple-dee#/. #p#kind src CURRENTLY there is a test like #src! ~/metatools/src/eu/bandm/tools/d2d2/test>make test_ddf2tsoap MODULE=basic OUT=dumpbasic.xml ; less dumpbasic.xml ! which works quite nicely! #p Reading works as follows:
#source
FIXME MISSING
#/source
#p When reading, errors can of course occur, esp. when the external representation is not "valid" w.r.t. the implicit syntax rules of the DTD, which reflect the object structure of the #umod model. #p#kind missing EXPLANATION HOW TO READ ?? WHERE IS AN EXAMPLE ??? #p#kind missing Link to EXTERNALLY defined serialization/deserialization for external classes.
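#p The writing rules listed above can be illustrated by a small hand-written sketch for a hypothetical model with a single class #src!Person! (fields #src!name!, #src!age! and an optional #src!friend!). It is not the generated #src!SAX_Writer!: objects are tagged with the class name, fields with the field name, primitive type tags are omitted directly under a field tag, null fields are omitted, and an object that has already been written is replaced by an empty reference element. The strings #src!reference!, #src!id! and #src!idref! are assumptions here (in #umod they come from the #src!XMLconfiguration!), and XML escaping of text is left out for brevity.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model class: Person with fields name, age and an optional friend.
final class Person {
    String name;
    int age;
    Person friend; // OPT field: may be null
    Person(String name, int age) { this.name = name; this.age = age; }
}

// Sketch of the writing rules; not the generated SAX_Writer.
final class XmlSketchWriter {
    private final StringBuilder out = new StringBuilder();
    private final Map<Object, String> ids = new IdentityHashMap<>(); // objects written so far

    String write(Person root) { // single root object, depth-first descent
        writeObject(root);
        return out.toString();
    }

    private void writeObject(Person p) {
        String id = ids.get(p);
        if (id != null) { // already written: emit an empty reference element only
            out.append("<reference idref=\"").append(id).append("\"/>");
            return;
        }
        id = "o" + ids.size();
        ids.put(p, id); // register before descending, so cycles end in a reference
        out.append("<Person id=\"").append(id).append("\">");
        // primitive values directly under a field tag get no type tag
        out.append("<name>").append(p.name).append("</name>");
        out.append("<age>").append(p.age).append("</age>");
        if (p.friend != null) { // null fields are simply omitted
            out.append("<friend>");
            writeObject(p.friend);
            out.append("</friend>");
        }
        out.append("</Person>");
    }
}
```

#p Because the descent starts at one root and proceeds depth-first, every reference written this way points back to an element that has already been emitted, which is why no back-patching is needed on the writing side.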
// -------------------------------------------------------
#h1 #title Using the #umod Tool
// -------------------------------------------------------------
// -------------------------------------------------------------
#h2 #title Command Line Options #label txtcommandlineoptions #p The options for the current implementation of the #umod tool are as follows: #p #cmdline_option_documentation ../../src/eu/bandm/tools/umod/umodOptions.xml #lang en #p #xemph!Attention! #src!--setterfunctions! and #src!--getterfunctions! are deprecated.
// -------------------------------------------------------------
#h2 #title Splitting the Input Text into Several Files #label txt_splitInput #p Currently more than one input file can be supplied. All these files must follow the same syntax, as described above, and use the same module name. All contained declarations will be processed as if they were contained in one single source file. This allows e.g. the separation of documentation and declarations, or of different trees of the forest. But it is only a #emph!provisional! means, until real modularization and parametrization are introduced.
// -------------------------------------------------------------
#h2 #title Error Messages #p The basic philosophy is to delegate most error messages to the subsequent step, i.e. the execution of a Java compiler. Some problems cannot be detected without detailed analysis, and we do not want to re-implement things the Java compiler does anyhow. #nl This implies that the error messages generated there must be "calculated backwards" to find their cause in the #umod source. #nl Nevertheless, basic errors and warnings are generated by the #umod tool on its own. #p (More to come) #p#kind missing FIXME ERROR MESSAGES MISSING #eof