






option --- Standardized Command Line Parsing and GUI Editing



(related API documentation: package option.runtime   )


1          Purpose and Way of Operation
2          Data Model and Description Format
2.1          Historic Predecessors and Paradigms
2.2          General Data Model
2.2.1          Data Types for Arguments
2.2.2          Positional Options
2.3          Input Data Format
2.3.1          Multilingual Text for Documentation
2.3.2          Enumeration Types
2.3.3          Grouping Documentation Comments
2.3.4          Option Declarations
2.3.5          Special Categories of Options
2.3.6          Types
2.3.7          Default Values
2.3.8          Special Empty Type
2.3.9          Enabling Conditions
2.4          Survey on Features and Design Decisions Compared to Predecessors
3          Operation of the Tool and the Generated Code
3.1          Compiler for Model and Gui Code
3.2          Generated Model Code
3.2.1          Setter Functions
3.3          Command Line Parsing
3.3.1          String Arguments containing Whitespace
3.3.2          Representing Boolean values
3.3.3          Multiple File Names, e.g. for "class path"
3.3.4          Argument values starting with a "minus-sign" Character
3.3.5          Fragmented Lists
3.4          Help Function "usage()"
3.5          Unparsing
3.6          Usage of the GUI
3.7          Automated Integration of Options' Descriptions into HTML Based Manuals

^ToC 1 Purpose and Way of Operation

The purpose of the option package is the parametrization of applications by command line parameters and/or by a graphical user interface (GUI).

Historically, there have been several attempts at standardization in both areas, but none has definitively prevailed. The option package is our approach to automating both by source code generation.

The option compiler is realized by the class <mt>.option.Compiler. The required run-time code is in <mt>.option.runtime.

The compiler takes a specification (including documentation) of all options of an application, encoded in XML, and generates ...

  1. code for command line parsing,
  2. code for the user help function "usage()" in different languages,
  3. a "swing"-based graphic user interface for configuring these options,
  4. integration into user documentation html texts
  5. TO COME: an html- and ecma-script based gui.

The central aim of this package is to treat different operating systems, different ways of starting an application, different styles of use, different languages, and differently socialized users in a transparent and freely combinable way.
At first glance, one might assume the task to be rather trivial, but it is not.

^ToC 2 Data Model and Description Format

^ToC 2.1 Historic Predecessors and Paradigms

Historically there exist several standardizations for the parametrization of an application by command line arguments:

  1. The old X-11 way, using long names with only one dash as a prefix, like "-option...".
    This style has recently been revived by java/bin/.... It is unclear to what end.
  2. IEEE "posix getopt()", as implemented in nearly every programming language as a standard library. It is documented in the IEEE posix standard http://www.opengroup.org/onlinepubs/9699919799/, therein esp. Utility Argument Syntax and Utility Syntax Guidelines .
    Since the posix specification is full of "shoulds and coulds", we daresay the implementations do differ in many details, without having looked at all of them.
  3. The GNU extension "getopt_long()", allowing long option names with two(2) minus signs as a lead-in, like "--option...".

See also the valuable discussions at http://stackoverflow.com/questions/367309/which-command-line-commands-style-do-you-prefe (2014) and http://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html

Our approach basically follows getopt_long(), but goes beyond it w.r.t. typing and type checking of the options' parameters.

^ToC 2.2 General Data Model

The terminology in this area has grown historically, and many words are heavily overloaded. We use in the following:

  1. "Parametrization" for the total configuration information for one certain instantiation of an application.
  2. "Option" for the different syntactical parts of such a parametrization, as identified by a name (long and/or short). On a command line, these options can be permuted and are recognized by the lead-in minus sign character.
  3. "Argument" for the concrete data related to a certain option in a certain parametrization. In case of the command line interface, this can be e.g. a list of integers following an option name.

The main issue with our data model is that it is more abstract than a merely command-line oriented one. It shall serve for different representations, text as well as GUI, which makes the task non-trivial. From this, the most important consequences for the following model are ...

  1. the choice between presence and absence of a certain option exists by the most trivial means in every command-line version (simply leave it out!) but must be realized explicitly in a gui version.
  2. a sequential order of the options' value definitions exists naturally when using command-line input, but is hard to express in a gui version.

The resulting underlying data model can be more easily described following the command-line version:

  1. any application instantiation is determined by
    1. the identity of the application,
    2. and a concrete parametrization
  2. the concrete parametrization has to follow the parametrization specification for this application.
  3. the parametrization specification defines a collection of options, together with the types of their arguments.
  4. each option has for identification purpose
    1. a short name only, made up from one(1) character,
    2. or a long name only, made up from at least two(2) characters, the first of which must be a letter, followed by letters or digits,
    3. or both of them.
    The short name character may not be a whitespace or a minus sign. The long name may contain minus signs, but not at first position.
  5. both names must be unique among all options of a certain application.
  6. each option is assigned a sequence of data types for its arguments.
  7. in a concrete parametrization, the option is represented in the command line text
    1. either by one(1) single "minus-sign" character "-" = 0x2D, followed (directly, without intervening whitespace) by the option's short name,
    2. or by two(2) of these "minus-sign" characters, followed (directly, without intervening whitespace) by the option's long name.
    (This corresponds to the "getopt_long(...)" function in GNU.)
  8. Both representations are followed by a sequence of the representations of the option's arguments, which may be more than one, according to the option's specification. An empty list of arguments is only possible as an exception, see Section 2.3.8.
  9. The representations of the single arguments are separated by whitespace.
  10. Each option can appear (conceptually/semantically) at most once in each concrete parametrization. This is in contrast to the other models mentioned in Section 2.1, but is compensated by the possibility to give multiple arguments instead. (But see Section 3.3.5 for continuing the list of a "repeating group" by a second appearance of the option name.)
  11. The sequential order of options as they appear on a command line is never significant.
    (This is a severe difference to many other command line systems, but is unavoidable, since in a gui there is no corresponding notion of sequential order. At least not a natural one !-)
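
The naming rules from point 4 and the command line representation from point 7 can be sketched in a few lines of Java. This is only an illustration under assumptions: the class OptionNames and its methods are hypothetical and not part of the option package, and the uniqueness rule (point 5) is not checked here.

```java
final class OptionNames {
    private OptionNames() {}

    /** Short name: exactly one character, neither whitespace nor a minus sign. */
    static boolean isValidShortName(String s) {
        if (s == null || s.length() != 1) return false;
        char c = s.charAt(0);
        return !Character.isWhitespace(c) && c != '-';
    }

    /** Long name: at least two characters, a leading letter, then letters, digits or '-'. */
    static boolean isValidLongName(String s) {
        if (s == null || s.length() < 2) return false;
        if (!Character.isLetter(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '-') return false;
        }
        return true;
    }

    /** How an option name is written on the command line: "-x" or "--long-name". */
    static String commandLineForm(String name) {
        if (isValidShortName(name)) return "-" + name;
        if (isValidLongName(name)) return "--" + name;
        throw new IllegalArgumentException("not a valid option name: " + name);
    }
}
```

For instance, commandLineForm("mode") yields "--mode" and commandLineForm("0") yields "-0".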

^ToC 2.2.1 Data Types for Arguments

The data type assigned to each option can be defined by the following grammar:

    simpletypes = int | float | rat | string | bool | enum | enumset | specialtypes
    specialtypes = uri // more to come: | file | action | ...

    type = simpletypes*, rep?
    rep = RepKind, simpletypes+ 
    RepKind = star | plus 

The sequence of arguments of a certain option must comply with the sequence of types as specified.

The types and corresponding arguments contained in the "rep" construct are called the "repeating group" in the following. This group may be repeated arbitrarily often. It needs to appear at least once in the "plus" case, and may appear zero times in the "star" case.
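
As an illustration of this grammar, the following Java sketch models a type as a fixed prefix plus an optional repeating group and answers two questions: can a given number of arguments match at all, and which simple type is expected at a given position? The class ArgType and all its members are hypothetical names chosen for this sketch; they do not claim to mirror the classes of the option package.

```java
import java.util.List;

final class ArgType {
    enum Simple { INT, FLOAT, RAT, STRING, BOOL, ENUM, ENUMSET, URI }
    enum RepKind { NONE, STAR, PLUS }

    final List<Simple> prefix;  // the leading "simpletypes*"
    final RepKind kind;         // NONE if the type has no "rep" part
    final List<Simple> group;   // the repeating group, empty iff kind == NONE

    ArgType(List<Simple> prefix, RepKind kind, List<Simple> group) {
        if ((kind == RepKind.NONE) != group.isEmpty())
            throw new IllegalArgumentException("a rep needs simpletypes+, and only a rep has a group");
        this.prefix = prefix;
        this.kind = kind;
        this.group = group;
    }

    /** True if a sequence of n arguments can match this type at all. */
    boolean acceptsCount(int n) {
        int rest = n - prefix.size();
        if (rest < 0) return false;
        if (kind == RepKind.NONE) return rest == 0;
        if (rest % group.size() != 0) return false;        // only complete groups
        return kind == RepKind.STAR || rest >= group.size(); // "plus" needs one group
    }

    /** The simple type expected at argument position i (counting from zero). */
    Simple typeAt(int i) {
        if (i < prefix.size()) return prefix.get(i);
        if (kind == RepKind.NONE) throw new IndexOutOfBoundsException("no argument " + i);
        return group.get((i - prefix.size()) % group.size());
    }
}
```

For the type "int float string (REP * rat bool)", three or five arguments are acceptable, four are not.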

^ToC 2.2.2 Positional Options

Concerning the command line parsing, there are some further rules for modeling the historic behaviour:
The posix standard and other publications frequently describe the syntax of a certain application instantiation (e.g. here) by examples like

utility_name[-a][-b][-c option_argument]
    [-d|-e][-f[option_argument]][operand...]

Therein "utility_name" corresponds to our notion of "application", and "option_argument" is nearly the same as with us.
Now "operands" are all those tokens on the command line following the last option, if any, which (1) are not arguments to the preceding (last) option, and (2) are not options themselves.
For us, (2) means that their text does not start with a minus-sign character. (In posix they could also be explicitly declared not to be options by the special separator "--" which ends the option list. This case we ignore.)

These "operands" are parameters to the application, but are not options in the sense defined so far. But of course we want to treat all command line components in a unified way, and we have to map them to some gui. Therefore we define some further rules:

  1. An option can be identified by a short name which is one(1) single numeric ASCII digit, i.e. taken from "0, 1, ..., 9". In this case it represents a "positional option" (This corresponds to "operand" in the posix nomenclature from above).
  2. When after parsing all explicitly mentioned options there is still character data in the contents of the command line, then this data is considered to represent arguments for positional options.
  3. All those positional options which have not yet appeared in the command-line (by explicitly using their long or short name) are mapped to this rest input data in ascending order of their short names.
  4. As long as there is any text left to be parsed as an option argument, the next positional option is recognized implicitly.
  5. From the last point it follows that a positional option may not have the empty type for its arguments, ...
  6. and that none but the last positional option may have a repeating pattern type for its arguments. (Any non-last positional option with repeating type would swallow all the rest of the command line, because there is no further "minus-sign" to stop it! See Section 3.3 below.)

A standard way to support this historic way of parametrizing is to give those positional parameters "speaking" long names, as in

   -m / --mode      : what to do with the files
   -C / --classpath : where the libraries are searched for 
   -0 / --inputfile
   -1 / --outputfile
   -2 / --logfile 
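
The mapping of rest input to positional options (rules 2 to 4 above) may be sketched as follows. The sketch makes a simplifying assumption that is not part of the specification: every positional option takes exactly one argument. The class PositionalAssign is a hypothetical name, not part of the option package.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PositionalAssign {
    private PositionalAssign() {}

    /**
     * Maps the rest of the command line to those positional options
     * which have not been given explicitly, in ascending order of their digit.
     */
    static Map<Character, String> assign(Set<Character> declared,
                                         Set<Character> alreadyGiven,
                                         List<String> rest) {
        List<Character> open = new ArrayList<>();
        for (char c : new TreeSet<>(declared)) {          // ascending order of short names
            if (c < '0' || c > '9')
                throw new IllegalArgumentException("not a positional short name: " + c);
            if (!alreadyGiven.contains(c)) open.add(c);
        }
        if (rest.size() > open.size())
            throw new IllegalArgumentException(
                "superfluous input: " + rest.subList(open.size(), rest.size()));
        Map<Character, String> result = new LinkedHashMap<>();
        for (int i = 0; i < rest.size(); i++) result.put(open.get(i), rest.get(i));
        return result;
    }
}
```

With the declarations above, if "--outputfile" (short name "1") has already been given explicitly, a rest input of two tokens is assigned to the options "0" and "2".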

^ToC 2.3 Input Data Format

All option specifications must be contained in one single XML document. This must be declared as

<!DOCTYPE optionlist PUBLIC "+//IDN bandm.eu//DTD option//EN"  "">

The syntax of the input is defined by the dtd, see here a navigatable pretty-print.

The toplevel definition is

<!ELEMENT optionlist (enumeration|option|comment)+ >

^ToC 2.3.1 Multilingual Text for Documentation

A text element type is employed ubiquitously for multi-lingual documentation:

<!ELEMENT text (#PCDATA)>
<!ATTLIST text  lang  NMTOKEN #REQUIRED>

The lang attribute should be used following the same rules as common practice with xml:lang, i.e. following RFC 4646 / RFC 4647 / "[IETF BCP 47], Tags for the Identification of Languages".

(( Currently, we refer to ISO 639-2, as listed e.g. in [isolanguage], instead, because the URL for IETF BCP 47 as referred to at the end of [xml] is BROKEN !!))

The set of supported languages may be chosen arbitrarily (including the empty set), but it should be the same in all lists of text elements.

^ToC 2.3.2 Enumeration Types

Enumeration types which shall serve as data type for option arguments must be declared according to ...

<!ELEMENT enumeration (desc?, (enumitem)+) >
<!ATTLIST enumeration name NMTOKEN #IMPLIED>


<!ELEMENT enumitem (desc)? >
<!ATTLIST enumitem value      CDATA #REQUIRED
                   compilable NMTOKEN #IMPLIED
>

Multilingual documentation text may be added to enumeration types as a whole, and/or to every item separately. (As mentioned above, the support of the different languages should be consistent over all objects carrying doc texts.)

The front-end appearance of an enumeration item (on the command line or shown in a GUI) may be an arbitrary string, containing arbitrary character data, like "<->" or "0abc".
If this string does not make up a valid Java identifier, the attribute compilable must be given, which is used in the generated enum{..} code instead.
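
Whether an enumeration item needs the compilable attribute can be decided mechanically. The following sketch is only an approximation under assumptions: it uses the character tests of java.lang.Character and does not detect reserved words such as "class". The class EnumItemNames is a hypothetical helper, not part of the option compiler.

```java
final class EnumItemNames {
    private EnumItemNames() {}

    /** True if the front-end value cannot serve as a Java identifier as it stands. */
    static boolean needsCompilable(String value) {
        if (value.isEmpty() || !Character.isJavaIdentifierStart(value.charAt(0))) return true;
        for (int i = 1; i < value.length(); i++)
            if (!Character.isJavaIdentifierPart(value.charAt(i))) return true;
        return false; // caution: Java keywords are not detected by this check
    }

    /** The name to be used in generated enum code for one enumitem. */
    static String javaName(String value, String compilable) {
        if (compilable != null) return compilable;
        if (needsCompilable(value))
            throw new IllegalArgumentException(
                "enumitem '" + value + "' needs the attribute 'compilable'");
        return value;
    }
}
```

Both example values from above, "<->" and "0abc", need a compilable name, whereas a value like "left" does not.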

^ToC 2.3.3 Grouping Documentation Comments

comment elements can be interspersed into the sequence of option declarations. Currently, they only lead to additional text labels in the generated GUI.

<!ELEMENT comment (text)+>
<!ATTLIST comment name  NMTOKEN #IMPLIED>

(The "name" attribute shall be used later to switch on/off a whole group of options in the GUI.)

^ToC 2.3.4 Option Declarations

In the following definitions it holds that ...

  1. desc is a multi-lingual description. It should cover the meaning and the correct usage of the option. As mentioned above, all text lists should support the same set of languages.
  2. The short name of an option (cf. above Section 2.2) is given by the attribute abbrev, the long name by the attribute name. At least one of them must be specified.
  3. If type is omitted, the type of the option's arguments is the empty type, i.e. there are no arguments.
  4. condition describes logic conditions on other options and their arguments which must be fulfilled to enable this option, cf. Section 2.3.9 below.

<!ELEMENT option (noGui?, isMeta?, desc, type?, condition?) >
<!ATTLIST option name     NMTOKEN #IMPLIED
                 abbrev   NMTOKEN #IMPLIED
                 required (yes|no) "no"   >
<!ELEMENT desc (text)+>

^ToC 2.3.5 Special Categories of Options

Currently three special categories can be assigned to an option, as in

    <option name='lang' required="yes">
    <noGui />
    <isMeta />
    <desc><text lang="de"> ....

The category noGui prevents this option from appearing in the generated gui.
The category meta marks this option as addressing the outer execution conditions under which the "payload task" of the application will be performed. Such an option does not contribute to the definition of this task, to "what really to do".
The category required indicates that a command line input which does not specify this option is invalid.

meta and noGui appear together in many cases. Typical examples are

   language        : the user language in which GUI and command line error messages will be presented
   geometry        : initial window size of the GUI etc.
   debug           : level of verbosity / of debug facilities presented by the GUI
   version         : trigger the Posix "version" behaviour, as described by http://www.gnu.org/prep/standards/html_node/002d_002d_version.html
   gui             : start the interactive GUI in spite of complete payload options given in the command line
   help            : trigger the Posix "help" behaviour, as described by http://www.gnu.org/prep/standards/html_node/002d_002d_help.html
   clearPersistent : clear from the machine's persistency cache all memorized data related to this application

(For instance, <METATOOLS>/DocumentedDistribution2 has its own, dedicated way of presenting the value of the language option in the GUI, namely as a drop-down in the ubiquitous menu line, supporting flags, and clearPersistent as a function in the help menu. Therefore these options (and others) do appear in the GUI, but automated generation is suppressed by noGui.)

All non-meta arguments refer to the definition of the payload task. A boolean flag in the model reflects the fact that such options are present.

^ToC 2.3.6 Types

Basically each option can have arbitrarily many arguments. The prefix of the argument sequence has to follow a first sequence of types, the rest of the arguments can follow a repeated pattern of types. By this means argument lists of varying length can be specified.
Currently "optional" arguments are not supported. Therefore, after every option name, an external value representation must follow in the command line input text for each declared argument type. (There is only one exception, see Section 2.3.8 below.)

<!ENTITY %  simpletypes '(int | float | rat | bool | string | uri 
                          | enum | enumset | action)'>
<!ELEMENT simpletypes (%simpletypes;)>

<!ELEMENT type ( (%simpletypes;)*, rep?   )>
<!ELEMENT rep ((%simpletypes;)+, defaults?) >
<!ATTLIST rep kind     (plus|star) #REQUIRED>

Only the enum and enumset type identifiers have a required attribute, namely the name of the enumeration declaration it refers to.

All option arguments are parsed into a variable with the intuitively corresponding Java type. For rat the meta_tools library type Rational is used.

<!ELEMENT int EMPTY>
<!ATTLIST int default      NMTOKEN #IMPLIED>
<!ELEMENT float EMPTY>
<!ATTLIST float default    NMTOKEN #IMPLIED>
<!ELEMENT rat EMPTY>
<!ATTLIST rat default    NMTOKEN #IMPLIED>
<!ELEMENT bool EMPTY>
<!ATTLIST bool default     NMTOKEN #IMPLIED>
<!ELEMENT string EMPTY>
<!ATTLIST string default   NMTOKEN #IMPLIED>

<!ELEMENT enum  EMPTY>
<!ATTLIST enum name        NMTOKEN #REQUIRED
               default     NMTOKEN #IMPLIED>
<!ELEMENT enumset  EMPTY>
<!ATTLIST enumset name    NMTOKEN  #REQUIRED
                  default NMTOKENS #IMPLIED>

<!ELEMENT uri EMPTY>
<!ATTLIST uri default      NMTOKEN #IMPLIED>
<!ELEMENT action EMPTY>
<!ATTLIST action name      NMTOKEN #IMPLIED>

^ToC 2.3.7 Default Values

Each single argument may optionally be given a default value. This value will be used as initial value for the corresponding fields of the model class (Section 3.2).

Please note that the default value must verbatim be a valid Java expression for initializing a field of this type. (Only String values are given as the string's contents, i.e. as typed on the command line.) The check for correctness is left to the subsequent application of the Java compiler. Regrettably, this may show technical details to the user when displaying the option definitions by usage(). (For an argument of rational type the default value must be something like Rational.valueOf(1,4); the compiler imports this class into the visible name space if necessary.)
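
The difference between String defaults and all other defaults can be illustrated by a small generator sketch. It is an assumption about how a field initializer could be emitted, not a description of what the option compiler really does; the class DefaultInit and the emitted declaration format are hypothetical.

```java
final class DefaultInit {
    private DefaultInit() {}

    /** Turns the text of a default attribute into Java source for an initializer. */
    static String initializer(String javaType, String defaultText) {
        if (!javaType.equals("String")) return defaultText; // verbatim Java expression
        // String defaults are given as plain contents, so quote and escape them here.
        StringBuilder sb = new StringBuilder("\"");
        for (char c : defaultText.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\');
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** A field declaration as it might appear in generated code (format is illustrative). */
    static String fieldDeclaration(String javaType, String fieldName, String defaultText) {
        return javaType + " " + fieldName + " = " + initializer(javaType, defaultText) + ";";
    }
}
```

Any type or syntax error in a non-String default would pass through this step unchanged and only surface when the generated source is compiled.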

With rep elements the situation is more complicated. First method: the elements which represent the single simple types of the repeated type pattern can each be given their own default, as described above. They define the default values for any newly created instance of the Java class which is generated to represent this repetition group. One or zero repetition groups (with these values) are created implicitly, depending on the "plus" or "star" flavour.

Second method: the rep element itself may be given a defaults element, consisting of a sequence of v elements, each of which initializes one of the repeated primitive arguments. The text values must correspond to the repeated types, and their number must be an integral multiple of the length of the repeated sequence.

For example, the following defaults declaration leads to three instantiations of the argument pattern:

<option name="vle">
   <desc><text lang="en">variable length default example</text></desc>
   <type>
     <int default="0"/>
     <rep kind="star">
       <int/>
       <string/>
       <defaults>
         <v>1</v>   <v>"c"</v>
         <v>1+1</v> <v>"d"</v>
         <v>3</v>   <v>"\n"</v>
       </defaults>
     </rep>
   </type>
</option>

Again, type and syntax errors are left to the Java compiler. (Here also String types must be verbatim valid Java source, including the double quotes.)
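
The "integral multiple" requirement on the v elements amounts to cutting their sequence into chunks of the length of the repeating group. A minimal sketch, with the hypothetical class RepDefaults standing in for whatever the compiler does internally:

```java
import java.util.ArrayList;
import java.util.List;

final class RepDefaults {
    private RepDefaults() {}

    /** Splits the texts of the v elements into one list per instance of the repeating group. */
    static List<List<String>> chunk(List<String> values, int groupLength) {
        if (groupLength <= 0)
            throw new IllegalArgumentException("empty repeating group");
        if (values.size() % groupLength != 0)
            throw new IllegalArgumentException(
                values.size() + " default values are not a multiple of " + groupLength);
        List<List<String>> instances = new ArrayList<>();
        for (int i = 0; i < values.size(); i += groupLength)
            instances.add(new ArrayList<>(values.subList(i, i + groupLength)));
        return instances;
    }
}
```

For the example above, six v elements and a group of length two result in three instances, the second of which holds the texts 1+1 and "d".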

^ToC 2.3.8 Special Empty Type

An option with no argument at all carries as only semantics its presence or absence on a command-line input. Historically, they are often called "switches" or "flags".
But there is no natural equivalent to "presence" when using a GUI. Therefore we decided above, that the mere presence or absence of an option with additional parameters should better not carry any semantics. Instead, the parameters should be given a default value which indicates their "absence" in the application's logic.

Since nevertheless parameter-less "switches" are often indispensable, we define a dedicated default value for boolean arguments named "presence", and implicitly give every "empty type" ...

 
  <option abbrev="x"><desc><text lang="en">a simple switch</text></desc>
    <type/>
  </option>

...the meaning of something like ...

 
  <option abbrev="x"><desc><text lang="en">a simple switch</text></desc>
    <type><bool default="presence"/></type>
  </option>

The value "presence" is virtual and cannot be used explicitly by the user. On later parsing it will result in a default value of "true" if the option is present, and of "false" if not (thereby also defining the actual value as "false"!). This is the only case of an optional argument!
So these all are valid command lines:

 
  applic                    // leads to x = false
  applic -x                 // leads to x = true
  applic -x false           // leads to x = false
  applic -x true            // leads to x = true

The explicit argument may turn out useful when a command line is constructed programmatically. The price is that the compactification of one-character flags as known from Posix cannot be supported:

 
  -abc 

... could be interpreted as three switch keys "a", "b", "c", but

 
  -stu

...would be the explicit value "true" for the switch key "s", followed by a parsing error ("superfluous argument input 'u'").
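
The four valid command lines listed above can be reproduced by the following sketch of the "presence" rule. It assumes that the explicit values are spelled exactly "true" and "false" and it looks at one switch in isolation; the class SwitchParse is a hypothetical illustration, not the generated parser.

```java
import java.util.List;

final class SwitchParse {
    private SwitchParse() {}

    /**
     * Value of the switch "-<shortName>" under the "presence" default.
     * The tokens are the command line without the application name.
     */
    static boolean valueOf(List<String> tokens, String shortName) {
        int i = tokens.indexOf("-" + shortName);
        if (i < 0) return false;                    // option absent
        if (i + 1 < tokens.size()) {
            String next = tokens.get(i + 1);
            if (next.equals("true")) return true;   // explicit argument
            if (next.equals("false")) return false; // explicit argument
        }
        return true;                                // present, no explicit argument
    }
}
```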

^ToC 2.3.9 Enabling Conditions

Some options only make sense in certain modes of operation, as defined by other options. For convenience, this can be indicated by a GUI, e.g. by deactivating the input widgets for a currently not applicable option.

In our model, each option can be assigned an "enabling condition", according to the following grammar:

  Condition ::= PrimeCondition | CompoundCondition
  CompoundCondition ::= not Condition | and (Condition)+ | or (Condition)+
  PrimeCondition ::= OptArg | testEqual TestValue TestValue | testGreater TestValue TestValue 
  TestValue ::= constant String | OptArg
  OptArg ::= option Ident (Number)?

(This is a mere symbolic representation; for the real grammar see the DTD.)

So (1) it can be tested whether a certain argument of a certain option is equal to or greater than a constant value or some other argument, and (2) these tests can be combined by standard logical operators.
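
An evaluator for such conditions can be sketched as a small set of combinators. Several points are assumptions made for this sketch only: a bare OptArg is taken to be true if its value is the text "true", testGreater compares numerically, and argument values are looked up under keys like "level.0" (option name and argument number). None of these names or conventions is taken from the option package.

```java
import java.util.Map;
import java.util.Objects;

interface Condition {
    boolean holds(Map<String, String> args);

    /** A value to be compared: either a constant or the current value of an option argument. */
    interface Value {
        String of(Map<String, String> args);
        static Value constant(String s) { return a -> s; }
        static Value optArg(String key) { return a -> a.get(key); }
    }

    // PrimeCondition
    static Condition optArg(String key) { return a -> Boolean.parseBoolean(a.get(key)); }
    static Condition testEqual(Value l, Value r) {
        return a -> Objects.equals(l.of(a), r.of(a));
    }
    static Condition testGreater(Value l, Value r) {
        return a -> Double.parseDouble(l.of(a)) > Double.parseDouble(r.of(a));
    }

    // CompoundCondition
    static Condition not(Condition c) { return a -> !c.holds(a); }
    static Condition and(Condition... cs) {
        return a -> { for (Condition c : cs) if (!c.holds(a)) return false; return true; };
    }
    static Condition or(Condition... cs) {
        return a -> { for (Condition c : cs) if (c.holds(a)) return true; return false; };
    }
}
```

A GUI could re-evaluate the condition of each option after every change of a value and enable or disable the corresponding input widgets accordingly.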

Please note that there should be no cyclic dependencies, because then the current, straightforward implementation of the GUI could run into a deadlock.
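
Cyclic dependencies between enabling conditions can be detected before the GUI is started by a depth-first search over the "refers to" relation. The following sketch assumes that this relation has already been extracted into a map from each option name to the names of the options mentioned in its condition; the class ConditionCycles is hypothetical, and such a check is not claimed to be part of the current tool.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ConditionCycles {
    private ConditionCycles() {}

    /** deps maps each option to the options its enabling condition refers to. */
    static boolean hasCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();    // nodes visited at least once
        Set<String> onPath = new HashSet<>();  // nodes on the current search path
        for (String start : deps.keySet())
            if (visit(start, deps, done, onPath)) return true;
        return false;
    }

    private static boolean visit(String n, Map<String, Set<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(n)) return true;   // reached a node of the current path again
        if (!done.add(n)) return false;        // already explored, no cycle through it
        onPath.add(n);
        for (String m : deps.getOrDefault(n, Collections.emptySet()))
            if (visit(m, deps, done, onPath)) return true;
        onPath.remove(n);
        return false;
    }
}
```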

^ToC 2.4 Survey on Features and Design Decisions Compared to Predecessors

The following list tries to summarize the design decisions of meta_tools options, which partly differ from, partly follow the above-mentioned preceding approaches (see Section 2.1).

  1. Long names must be completely written out; a unique prefix does not suffice.
  2. Long names may contain the minus character "-".
  3. Several names of "switches" (with no arguments) cannot be aggregated into one input token.
  4. Each option can have a sequence of more than one argument, the types of which must be fixed. A trailing subsequence can be repeated.
  5. Options as such may only appear once, but may appear more than once to extend such a repetition list.
  6. The sequential order of options on the command line does not have any significance.
  7. The presence or absence of an option on the command line should not have any significance, iff the option has arguments.
  8. Options without arguments (standing just for their presence or absence) can be specified with an optional explicit boolean type.
  9. "Operands" are called "positional parameters" and can alternatively be specified by their names or by an abbreviation, which is a decimal digit.
  10. Relevance of options can be defined dependent on the values of some arguments of other options. In the GUI, this is reflected by active/inactive input fields.

^ToC 3 Operation of the Tool and the Generated Code

^ToC 3.1 Compiler for Model and Gui Code

The option compiler is realized by the class <mt>.option.Compiler.
Its run-time code is in <mt>.option.runtime.

The compiler is called like

 $(JAVA) eu.bandm.tools.option.Compiler <inputFile> \
            <packageName> <modelClassName> <guiClassName> \
            <rootOfJavaSources>

The input file is an XML file, as described above. The compiler generates one or two(2) Java source files: the model class source is generated anyhow, with the given name, in a package with the given name. The source is written to the file system, relative to the given root, descending the directory levels which correspond to the package name.

The gui class is only generated if the gui class name is not the empty string.
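
Where the generated source file ends up follows the usual Java convention of one directory level per package name component. A minimal sketch of this computation, assuming nothing about the compiler's internals (the class SourceLocation and the package name used in the remark below are made up for illustration):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SourceLocation {
    private SourceLocation() {}

    /** The file a class of the given package is written to, relative to the source root. */
    static Path sourceFile(String root, String packageName, String className) {
        Path dir = Paths.get(root);
        if (!packageName.isEmpty())
            for (String part : packageName.split("\\."))
                dir = dir.resolve(part);   // descend one directory level per package component
        return dir.resolve(className + ".java");
    }
}
```

With the root "src", a package "eu.bandm.demo" and the model class name "MyOpts", the result is the path src/eu/bandm/demo/MyOpts.java (in the notation of the local file system).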

In the following examples, let "MyOpts" be the name of the model class, and "MyGui" be the name of the GUI class.

^ToC 3.2 Generated Model Code

The generated model class MyOpts will offer the following API:

  1. public MyOpts()
    This constructor creates a new model instance. The field values which correspond to the options' arguments (see next list point) are initialized by the default values, as specified in the XML file (Section 2.3.7).
  2. parse(String[], MessageReceiver<SimpleMessage>)
    This method takes an array of strings (normally exactly what has been passed to public static main(...) as its method argument!) and performs the parsing process. Therein, the arguments for recognized options will overwrite the initial default values. For all options not appearing in the input, the default values will survive.
    Please note that the values of the arguments after parsing should be the only point where the presence or absence of an option is reflected. The default values should be selected accordingly.
    The message receiver is needed for different kinds of warnings and errors. By counting critical errors the programmer can decide whether parsing failed:
  public static void main (final String[] args){
    final MessageReceiver<Message> m0 = new MessagePrinter<Message>();
    final MessageCounter<Message>  m1 = new MessageCounter<Message>();
    final MessageTee<Message> m3 = new MessageTee<Message>(m0,m1);
    final MyOpts myOpts = new MyOpts();
    myOpts.parse(args, m3);
    if(m1.getCriticalCount()>0)
      System.exit(99);
    // ---- e.t.c.---
  }
  3. Let "grmmpf" be the long name of a certain option.
    Let the type of its arguments be specified as a sequence of three types, with a trailing repeating group:
    "int float string (REP * rat bool)".
    Then the API will offer the following functions by which, after successful parsing (or after turning the knobs of the gui, see below) the current argument values for this option can be obtained:
public class MyOpts extends <mt>/options/runtime/Model {
   ...
   public int      get_grmmpf_0(); 
   public double   get_grmmpf_1(); 
   public String   get_grmmpf_2(); 
   public int      repcount_grmmpf();
   // repeating group starts counting arguments from zero again:
   public Rational get_grmmpf_0(int); 
   public boolean  get_grmmpf_1(int);

   // additionally indicating whether the option "grmmpf" did appear at all:
   // DEPRECATED, this SHOULD NOT be used!
   public boolean  has_grmmpf ;
}

In case a certain option does not have a long name, the short name is taken for constructing these identifiers.
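
How application code might read the arguments of such an option, including the repeating group, is shown by the following sketch. Since the generated class is not available here, a hand-written stand-in with fixed values is used; it only imitates the method names from above, and the Rational values are represented by plain strings. Everything in this block is illustrative.

```java
final class GrmmpfStandIn {
    // Fixed data instead of parsed input; strings stand in for Rational values.
    private final String[] rats = { "1/4", "3/2" };
    private final boolean[] flags = { true, false };

    public int     get_grmmpf_0() { return 7; }
    public double  get_grmmpf_1() { return 0.5; }
    public String  get_grmmpf_2() { return "abc"; }
    public int     repcount_grmmpf() { return rats.length; }
    // The repeating group starts counting its arguments from zero again:
    public String  get_grmmpf_0(int i) { return rats[i]; }
    public boolean get_grmmpf_1(int i) { return flags[i]; }

    /** Reads the fixed prefix first, then every instance of the repeating group. */
    static String describe(GrmmpfStandIn o) {
        StringBuilder sb = new StringBuilder();
        sb.append(o.get_grmmpf_0()).append(' ')
          .append(o.get_grmmpf_1()).append(' ')
          .append(o.get_grmmpf_2());
        for (int i = 0; i < o.repcount_grmmpf(); i++)
            sb.append(" (").append(o.get_grmmpf_0(i)).append(' ')
              .append(o.get_grmmpf_1(i)).append(')');
        return sb.toString();
    }
}
```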

^ToC 3.2.1 Setter Functions

Whenever the generated option model class shall be used for representing configuration data / task specifications in a more advanced mode, e.g. programmatically controlled, the declaration

   <optionlist setterFunction="yes"  >...

will cause the compiler to additionally generate setter functions which allow changing the options' parameter values.

Currently not yet supported!

^ToC 3.3 Command Line Parsing

The parsing of the options and their parameters may seem trivial, but is not. This is because different OSs are involved. The ieee posix specification "Utility Argument Syntax" (esp. point 2.b, to make it more confusing!) uses the notion of "argument string" for the portioning already done by some "shell" program, before the getopt()-code starts working. This is the portioning reflected by the array structure of "static main(final String[] args)" in case of Java.

This also has to be considered here, as well as the different shell processors and system services which lead to the appearance of this "array of strings" or "argument strings".

We took a brute-force approach, which is always a good thing to do when things get too historically determined. So we arrived at the following rules, which might appear somewhat crude, but do ensure portability:

  1. All "argument strings" are concatenated into one long string, separated by whitespace.
    All whitespace appearing in the inner data of such an argument string is temporarily masked and is not recognized as "whitespace" in the following algorithm!
  2. Every option on the command line begins with one or two "minus-sign" characters, followed by its short name or its long name, resp., and some whitespace.
  3. A long name must appear completely; in contrast to other approaches, an unambiguous prefix is not sufficient. This differs from most Posix interpretations and is done to simplify the future addition of new options.
  4. The following text is separated into tokens, as defined by the appearance of (unmasked) whitespace
  5. Each token is parsed as an argument for the current option. The parsing is determined by the type declared for this argument.
  6. Whenever a token cannot be parsed as a denotation for the expected type, an error is generated. In this case the parsing process skips all tokens not starting with a minus-sign character, and tries to continue with the parsing of the next option.
    (Mostly this is only done to inform the user about subsequent, further syntax errors. The result of the parsing so far must in most cases be discarded, anyhow.)
  7. When the start of the repeating pattern of the current option's argument types is reached, and the next token does not start with a minus-sign, then a sequence of further arguments according to the repeating pattern is recognized and parsed, as described.
  8. If the first token to read does start with a minus-sign, it is assumed to be the start of the next option.
  9. If the repeating group is of "plus" flavour, it must be entered at least once. Otherwise an error is generated, and parsing is continued with the next option, as described above.
  10. From this simple rule for detecting options, i.e. the end of the repeating arguments, it follows that negative numbers cannot stand at the beginning of a repeating group of arguments (only of the very first in a "plus"-flavour repeating group!)
  11. When, contrarily, the arguments of an option are completely parsed, and there is no repeating group, and the next input token does not start with a minus-sign character, then the explicit options are considered to be complete, and the positional options start parsing.
  12. The parsing process continues as described, but the positional option with the lowest digit as its short name which has not yet appeared in the command line (by explicit mentioning or implicitly, because this process will continue looping until all input is consumed!) is substituted, i.e. assumed to be recognized just before.
    The parsing process continues as described above, with the difference that from now on minus-sign characters will no more be significant, i.e. no more explicit option names will be recognized.
    When the input to some options is found in the positional way, but other positional options are given by explicit keys ("mixed input"), a warning is issued.
  13. When all input is consumed, the parsing process terminates.
  14. When (a) the arguments of the last option are completely parsed, (b) no un-parsed input ("garbage") is left over on the input line, and (c) all options marked as "required" have been defined, then the parsing process was successful.
  15. The caller must check the message receiver to see whether any error messages have been emitted during parsing. If so, the parsing result should be discarded.
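
The core of rules 2 and 8 can be sketched in a few lines of Java. This is only an illustration under strong simplifying assumptions, not the code of the option module: the class OptionSplitSketch and its method split are hypothetical names, every option is treated as if all its arguments were untyped strings in a repeating group (so a leading minus-sign always starts a new option), and neither positional options nor error recovery are modelled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT part of the option runtime.
final class OptionSplitSketch {

  /** Groups the tokens of a command line by the option name preceding them. */
  static List<Map.Entry<String, List<String>>> split(String[] tokens) {
    List<Map.Entry<String, List<String>>> result = new ArrayList<>();
    List<String> current = null;
    for (String token : tokens) {
      if (token.startsWith("-")) {
        // Rule 2: one minus-sign precedes a short name, two precede a long name.
        String name = token.startsWith("--") ? token.substring(2) : token.substring(1);
        current = new ArrayList<>();
        result.add(new AbstractMap.SimpleEntry<>(name, current));
      } else if (current != null) {
        // Assumption: every further token is an argument of the current option.
        current.add(token);
      } else {
        // Input before the first option would be positional; not modelled here.
        throw new IllegalArgumentException("positional input not modelled: " + token);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, List<String>> e : split(args)) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

For the input tokens --path a b -m text2xml this sketch yields two groups, "path" with the arguments a and b, and "m" with the argument text2xml. The real parser additionally consults the declared argument types, which is why it can accept tokens starting with a minus-sign as data in some positions.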

This description shows clearly that the "positional options" make the parsing process much more complicated. They are only supported for historic reasons, and should be abandoned when the calls for applications are generated e.g. by make scripts. The difference is small, like

    d2d --path $(HOME)/lib/documents $(PWD) --mode text2xml  source.d2d tmp.xml
... vs ...
    d2d --path $(HOME)/lib/documents $(PWD) --mode text2xml  -0 source.d2d -1 tmp.xml

^ToC 3.3.1 String Arguments containing Whitespace

The parsing algorithm recognizes only the "gaps" between the "argument strings" as separating whitespace, i.e. the gaps between the string objects which make up the String[] passed as an argument to static void main(String[]). All whitespace contained in one such string is masked and treated as character data. Therefore it will end up in the value of an option argument whenever it can be parsed as such, e.g. whenever the type of this argument is "string" (or "character" or "URI"; these special cases can be ignored in the context of this discussion).

The string array which arrives at the start of main() is constructed by the co-operative efforts of some shell program and a call to the system service "exec", or similar. Therefore it depends on their correct handling whether some whitespace can enter the value of an option's argument. Consequently, this is out of scope w.r.t. the option module! Phew!

E.g. normally any input in "double quotes", like

   $(JAVA) eu.bandm.Application  --title "this is a headline with whitespace"

will cause bash, or any other shell-like program, to pass the contents (sic!) of these quotes as one single argument string to the system call named "exec", or similar. Therefore it will arrive as one (1) single token at the option parser. But the concrete details of these rules may be very complicated and OS-specific.
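
Whether a given shell really delivers the quoted text as one argument string can be checked with a small probe program. The following is a sketch that is independent of the option module; the class name ArgProbe and the method bracketed are made up for this illustration.

```java
// Hypothetical probe, NOT part of the option runtime.
final class ArgProbe {

  /** Renders each argument string on its own line, enclosed in brackets. */
  static String bracketed(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < args.length; i++) {
      // The brackets make whitespace inside one argument string visible.
      sb.append(i).append(": [").append(args[i]).append("]\n");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(bracketed(args));
  }
}
```

Called from bash with the arguments --title "this is a headline with whitespace", the probe should print two lines, the second one containing the complete headline between the brackets. Other shells may behave differently.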

^ToC 3.3.2 Representing Boolean values

The following denotations are recognized as Boolean values (ignoring case):

   t true 1 +   --> true
   f false 0 -  --> false
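
A minimal sketch of this mapping in Java, for illustration only. The class BooleanTokenSketch and its method parse are hypothetical and do not belong to the option runtime; returning an empty Optional for unknown tokens is a choice made for this sketch, whereas the real parser reports an error message.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, NOT part of the option runtime.
final class BooleanTokenSketch {

  /** Maps a token to a Boolean value, ignoring case; empty if not recognized. */
  static Optional<Boolean> parse(String token) {
    switch (token.toLowerCase(Locale.ROOT)) {
      case "t": case "true": case "1": case "+":
        return Optional.of(Boolean.TRUE);
      case "f": case "false": case "0": case "-":
        return Optional.of(Boolean.FALSE);
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    for (String a : args) {
      System.out.println(a + " --> " + parse(a));
    }
  }
}
```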

^ToC 3.3.3 Multiple File Names, e.g. for "class path"

It is an old habit of UNIX programs to concatenate file names using a colon ":" in the value of arguments which represent more than one position in the file system.

We recommend using multiple arguments instead. This is OS-independent, whereas some shells will try to interpret the colon as a separator for "drive letters".

 Instead of
   $(JAVA) eu.bandm.Application --libraries a.b:c.d:e.f 
 better write
   $(JAVA) eu.bandm.Application --libraries a.b c.d e.f 

^ToC 3.3.4 Argument values starting with a "minus-sign" Character

Please note that whether a token starts with a minus-sign character is evaluated only when a new option may start. As long as the type specification of the arguments of the current option still requires character data, a minus-sign is never interpreted as the lead-in of a new option keyword.

As soon as a parsing error occurs, input is skipped up to the next minus-sign character, and a new option name is assumed to follow. Of course, this does not happen once the parsing of the positional options has been entered!

^ToC 3.3.5 Fragmented Lists

When constructing command lines automatically, parts of an option's repeating group may come from different sources. In this case it may be helpful to allow multiple occurrences of an option, the arguments being concatenated. This is activated by the declaration

   <optionlist fragmentedLists="yes"  >...

This will allow:

   declaration:
     <option foo><type><int/><string/><rep kind="plus"><int/><bool/></rep></type> </option>

   command line
     --foo   3 "stringvalue"  3 true    --otheroption ... --foo 7 false 

This fragmentation is only allowed at group borders. Each group must be defined completely in one particular appearance of the option. In particular, the prefix of arguments before the first group must be given as a whole with the first appearance.
(Fragmentation never applies to values of type "EnumSet" etc. In spite of being collections in themselves, these are treated as one single value which must be specified at once.)
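
The group-border rule can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation of the option module: the class FragmentMergeSketch and its method merge are hypothetical, arguments are handled as plain strings without type checking, and the prefix length and group length are passed in explicitly instead of being derived from the declaration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT part of the option runtime.
final class FragmentMergeSketch {

  /**
   * Concatenates the argument lists of all occurrences of one option.
   * The first occurrence must carry the complete prefix; every occurrence
   * must end at a border of the repeating group.
   */
  static List<String> merge(List<List<String>> occurrences,
                            int prefixLength, int groupLength) {
    List<String> all = new ArrayList<>();
    for (int i = 0; i < occurrences.size(); i++) {
      List<String> fragment = occurrences.get(i);
      // Only the first occurrence contains the non-repeating prefix.
      int inGroups = (i == 0) ? fragment.size() - prefixLength : fragment.size();
      if (inGroups < 0 || inGroups % groupLength != 0) {
        throw new IllegalArgumentException(
            "occurrence " + i + " does not end at a group border");
      }
      all.addAll(fragment);
    }
    return all;
  }

  public static void main(String[] args) {
    // The two appearances of --foo from the example above:
    // prefix <int/><string/> has length 2, group <int/><bool/> has length 2.
    List<List<String>> foo = Arrays.asList(
        Arrays.asList("3", "stringvalue", "3", "true"),
        Arrays.asList("7", "false"));
    System.out.println(merge(foo, 2, 2));
  }
}
```

A second appearance like --foo 7 would be rejected by this sketch, because it breaks the group of one int and one bool in the middle.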

^ToC 3.4 Help Function "usage()"

As usual with unix programs, the compiler generates help functions which list the options, the types of their arguments and the description texts in various languages. Assume that the source file contains descriptions for the languages en, de and sv; then the following functions will exist:

public class MyOpt extends <mt>/options/runtime/Model {
   ...
   public void usage_en (PrintStream p){...}
   public void usage_de (PrintStream p){...}
   public void usage_sv (PrintStream p){...}
   public void usage (String lang, PrintStream p){...}
   public void usage (PrintStream p){...}
    // defaults to a language chosen at random, only supplied for
    // historic reasons!
}

^ToC 3.5 Unparsing

The function

public class MyOpt extends <mt>/options/runtime/Model {
   ...
   public String serialize (){...}
}

delivers a textual representation of the state of the model which can be used for persistent storage and which will reconstruct this state when submitted to command line parsing.

In most cases it is also wise to insert it into the produced output, together with the name of the program, its version, the date, etc., as supported by <mt>.format.java.CommentFormats.

^ToC 3.6 Usage of the GUI

The generated GUI class is simply a specialization of a swing JPanel. It contains on the left side (in a "grid bag layout") the short and long names of the options, and on the right side a sequence of input widgets, corresponding to the option's argument types.

When pointing to the names, a tool tip will appear with the description of the option in the current language.

Before the GUI can be operated, i.e. edited by the user, it must be initialized by the values taken from some well-defined model.
A command line parsing process may precede a graphic editing phase, but thanks to the default parameters, this is not required.
Vice versa, after editing is complete, the values have to be transferred back to a model instance, to be retrievable as described above.
For this the gui class offers the following methods:

public class MyGui extends <mt>/options/runtime/Gui {
  ...
  public static MyGui makeInstance(MyOpts model)
  public void model2view(MyOpts model)
  public void view2model(MyOpts model)
  ...
}

The JPanel can be integrated into arbitrary swing containers and programming contexts.

FIXME see for example DocumentedDistribution2

FIXME error handling of the Gui

FIXME non-gui parameters / ABOVE: meta-parameters

  1. --geometry (initial window size)

For convenience, it offers (currently just one) default method which runs a dialog, detects input errors and adds some "meta-"buttons to the option panel. There is more than one call pattern for this method, supplying different default arguments. The user language must always be given explicitly.
The result indicates whether the user ended the dialog with the "ok" or the "cancel" button.

public class MyGui extends <mt>/options/runtime/Gui {
  ...
  public boolean editGraphically(String userlanguage)
  public boolean editGraphically(String userlanguage, int width, int height)
  public boolean editGraphically(String userlanguage, int width, int height,
                                 boolean languageSwitchable)
  ...
}

The example from above can thus be continued:

     final MyGui myGui = MyGui.makeInstance(myOpts);
     final boolean userPressedOk 
       =  myGui.editGraphically("en", 400, 500);
     if (!userPressedOk){
       System.err.println("Leaving program due to user's cancellation.");
       System.exit(99);
     }
     final MyOpts editedOpts = new MyOpts();
     myGui.view2model(editedOpts);
     // perform task according to the settings in "editedOpts"
     // of course, the "old" model could be re-used to store 
     // the new parametrization, by performing
     //        myGui.view2model(myOpts);

^ToC 3.7 Automated Integration of Options' Descriptions into HTML Based Manuals

The d2d/xml/xhtml-based documentation system, as employed in the doc of meta_tools itself, can integrate a user-readable form of the options specification as a table into the html text. This is part of the "d2d_gp" application architecture, and described in more detail with the module technicalDoc.commandLineDoc.

The d2d-frontend representation works like

#cmdline_option_documentation ../../src/eu/bandm/tools/umod/umodOptions.xml 
    #lang en 

This corresponds to the xml element

      <cmdline_option_documentation>
        <url>../../src/eu/bandm/tools/umod/umodOptions.xml</url>
        <lang>en</lang>
      </cmdline_option_documentation>

The rendering process as defined by "docpage_xml2xhtml.xsl" will create a nice table with names, descriptions, type patterns and default values.

The result can be seen e.g. in the umod documentation.

The sequential order of the print-out can be defined independently of the source text by

<optionlist defaultSorting="0AaB"> ....

with 

<!ENTITY % sorting '( source | 0AaB | 0ABa | AaB0 | ABa0 | AaB | ABa )'>

or in the definition of the option list by the attribute "defaultSorting" with the same values. If neither of these two places defines a sorting, then "source order" is the default.

The meaning of these "sort strategies" is:
AaB -> treat lower case and upper case equivalently
ABa -> first upper case, then lower case (like the old-fashioned ASCII table !-)
Sort strategy contains "0" -> sort according to abbreviations (=one character names); afterwards all those which do not have an abbreviation. The position of the "0" is the position of the numeric abbreviations = positional parameters.
Sort strategy does not contain "0" -> sort according to long names. Afterwards all those which do not have a long name.

Only when printing in "source" order do the interspersed "#text" fragments make sense. Therefore they are omitted when re-sorting.
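
The difference between "AaB" and "ABa" can be illustrated with two standard Java comparators. This is a sketch of one possible reading of the two strategies, not the code used by the rendering process: the class SortStrategySketch and its method forStrategy are hypothetical, only the ordering of names is covered, and the variants containing "0" as well as the handling of options without a long name are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, NOT part of the documentation tool chain.
final class SortStrategySketch {

  /** Returns a comparator for the strategies "AaB" and "ABa". */
  static Comparator<String> forStrategy(String strategy) {
    switch (strategy) {
      case "AaB":
        // Upper and lower case are equivalent; ties are broken by code
        // unit order only to make the result deterministic.
        return String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());
      case "ABa":
        // Plain code unit order: all upper case letters precede lower case.
        return Comparator.naturalOrder();
      default:
        throw new IllegalArgumentException("not covered by this sketch: " + strategy);
    }
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>(Arrays.asList("b", "A", "a", "B"));
    names.sort(forStrategy("AaB"));
    System.out.println(names); // [A, a, B, b]
    names.sort(forStrategy("ABa"));
    System.out.println(names); // [A, B, a, b]
  }
}
```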





made    2024-08-30_17h51   by    lepper   on    happy-ubuntu        Valid XHTML 1.0 Transitional Valid CSS 2.1

produced with eu.bandm.metatools.d2d    and    XSLT    FYI view page d2d source text