#d2d 2.0 text using mtdocpage : webpage
// ================================================================================
#title #src!option! --- Standardized Command Line Parsing and GUI Editing
#htmlTitle bandm metatools "option" --- Standardized Command Line Parsing
#lang en
// ================================================================================
#tableOfContents
// ================================================================================
#h1 #title Purpose and Way of Operation
// ================================================================================

#p The purpose of the #src!option! package is the parametrization of applications,
by command line parameters and/or by a graphical user interface (GUI).

#p Historically, there have been several attempts at standardization in both areas,
but none has clearly prevailed.
The #src!option! package is our approach to automate both by source code generation.

#p The #src!option! compiler is implemented by the class
#link ../api/eu/bandm/tools/option/Compiler.html #text .option.Compiler#/.
The required run-time code is in
#link ../api/eu/bandm/tools/option/runtime/package-summary.html #text .option.runtime#/.

#p The compiler takes a specification (including documentation) of all options
of an application, encoded in XML, and generates #ldots
#list
#i code for command line parsing,
#i code for the user help function "#src!usage()!" in different languages,
#i a "swing"-based graphic user interface for configuring these options,
#i integration into user documentation html texts,
#i TO COME: an html- and ecma-script based gui.
#/list

#p The central aim of this package is to treat different operating systems,
different ways of starting an application, different styles of use,
different languages, and differently socialized users
#emph!transparently! and in a combinable way.
#nl
At first glance, one might assume the task to be rather trivial, but it is not.
// ================================================================================
#h1 #title Data Model and Description Format
// ================================================================================
// ================================================================================
#h2 #title Historic Predecessors and Paradigms
#label txt_hist

#p Historically there exist several standardizations for the parametrization
of an application by #emph!command line arguments!:
#list
#i The old X-11 way, using long names with only one dash as a prefix,
like "#src!-option...!".
#nl
This style has recently been revived by #src!java/bin/...!.
No one knows what this is good for.
#i IEEE "posix #src!getopt()!", as implemented in nearly every programming language
as a standard library.
It is documented in the IEEE posix standard
#link 0/www.opengroup.org/onlinepubs/9699919799/ #/link,
therein esp.
#link 0/www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html #loc tag_12_01 #text Utility Argument Syntax #/link
and
#link 0/www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html #loc tag_12_02 #text Utility Syntax Guidelines #/link.
#nl
Since the posix specification is full of "shoulds and coulds", we dare say that
the implementations differ in many details, although we have not looked at all of them.
#i The GNU extension "#src!getopt_long()!" allowing long option names with
#emph!two(2)! minus signs as a lead-in, like "#src!--option...!"
#/list

#p See also the valuable discussions at
// https://web.archive.org/web/20090503075914/https://stackoverflow.com/questions/367309/which-command-line-commands-style-do-you-prefer (2014)
#link 0/stackoverflow.com/questions/367309/which-command-line-commands-style-do-you-prefe#/link (2014),
// 0/web.archive.org/web/20090503075914/http://catb.org/~esr/writings/taoup/html/ch10s05.html
#link http://catb.org/~esr/writings/taoup/html/ch10s05.html #/link and
// https://web.archive.org/web/20090503075914/http://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html
#link www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html#/link

#p Our approach basically follows #src!getopt_long()!, but goes beyond it
w.r.t. typing and type checking of the options' parameters.

// ================================================================================
#h2 #title General Data Model
#label txt_generaldatamodel

#p The terminology in this area has grown historically, and many words
are heavily overloaded. In the following we use:
#list
#i "Parametrization" for the total configuration information for one
particular instantiation of an application.
#i "Option" for the different syntactical parts of such a parametrization,
as identified by a name (long and/or short).
On a command line, these options can be permuted and are recognized by
the lead-in minus sign character.
#i "Argument" for the concrete data related to a certain option in a
certain parametrization. In case of the command line interface, this
can be e.g. a list of integers following an option name.
#/list

#p The main issue with our data model is that it is more abstract than a
merely command-line oriented one. It shall serve different representations,
textual as well as gui, which makes the task non-trivial.
From this, the most important consequences in the following model are #ldots
#list
#i the choice between presence and absence of a certain option exists by the
most trivial means in every command-line version (simply leave it out!)
but must be realized explicitly in a gui version.
#i a sequential order of the options' value definitions exists naturally
when using command-line input, but is hard to express in a gui version.
#/list

#p The resulting underlying data model can be more easily described
following the command-line version:
#list
#i any application instantiation is determined by
#list
#i the identity of the application,
#i and a concrete parametrization
#/list
#i the concrete parametrization has to follow the parametrization specification
for this application.
#i the parametrization specification defines a collection of #emph!options!,
together with the types of their #emph!arguments!.
// FIXME #ii --> crashes in "liftAllDirectors()"
//#p
#i for identification purposes, each option has
#list
#i a short name only, made up from one(1) character,
#i or a long name only, made up from at least two(2) characters, the first of
which must be a letter, followed by letters or digits,
#i or both of them.
#/list
The short name character must not be a whitespace character or a minus sign.
The long name may contain minus signs, but not at the first position.
#i both names must be unique among all options of a certain application.
#i each option is assigned a sequence of data types for its arguments.
#i in a concrete parametrization, the option is represented in the
command line text
#list
#i either by one(1) single "minus-sign" character "-" = 0x2D, followed
(directly, without intervening whitespace) by the option's short name,
#i or by two(2) of these "minus-sign" characters, followed
(directly, without intervening whitespace) by the option's long name.
#/list
(This corresponds to the "#src!getopt_long(...)!" function in GNU.)
#i Both representations are followed by a sequence of the representations of
the option's arguments, which may be more than one, according to the
option's specification.
An #emph!empty! list of arguments is only possible as an exception,
see #ref txt_single_boolean_argument.
#i The representations of the single arguments are separated by whitespace.
#i Each option can appear (conceptually/semantically) #emph!at most once!
in each concrete parametrization.
This is in contrast to the other models mentioned in #ref txt_hist,
but is compensated by the possibility to give multiple #emph!arguments! instead.
(But see #ref txt_fragmented for continuing the list of a "repeating group"
by a second appearance of the option name.)
#i The sequential order of options as they appear on a command line is
never significant.
#nl
(This is a significant difference from many other command line systems, but is
unavoidable, since in a gui there is no corresponding notion of sequential order.
At least not a natural one !-)
#/list

// ================================================================================
#h3 #title Data Types for Arguments
#label txt_datatypes

#commentchar\
#p The data type assigned to each option is defined by the following grammar:
#source
simpletypes = int | float | rat | string | bool | enum | enumset | specialtypes
specialtypes = uri // more to come: | file | action | ...
type = simpletypes*, rep?
rep = RepKind, simpletypes+
RepKind = star | plus
#/source
#commentchar/

#p The sequence of arguments of a certain option must comply with the sequence
of types as specified.

#p The types and corresponding arguments contained in the "#src!rep!" construct
are called the "repeating group" in the following. This group may be repeated
arbitrarily often. It needs to appear at least once in the "#src!plus!" case,
and can have the length zero in the "#src!star!" case.
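#p As an illustration only, the following Python sketch shows how the number of
arguments can be checked against a type of the form given by the grammar above,
and which type each argument position receives. It is not part of the
#src!option! package: the function name #src!assign_types! and the encoding of
a type as two lists plus a repetition kind are assumptions made for this example.

```python
def assign_types(fixed, rep_kind, group, n_args):
    """Return the type name for each of n_args arguments, or raise ValueError.

    fixed    -- type names of the leading, non-repeated arguments
    rep_kind -- None (no repeating group), "star" or "plus"
    group    -- type names of the repeating group (unused if rep_kind is None)
    """
    if rep_kind not in (None, "star", "plus"):
        raise ValueError("unknown repetition kind: %r" % (rep_kind,))
    if n_args < len(fixed):
        raise ValueError("too few arguments")
    rest = n_args - len(fixed)
    if rep_kind is None:
        # Without a repeating group the argument count must match exactly.
        if rest != 0:
            raise ValueError("too many arguments")
        return list(fixed)
    if not group:
        raise ValueError("repeating group must not be empty")
    # The remaining arguments must form complete repetitions of the group.
    repetitions, leftover = divmod(rest, len(group))
    if leftover != 0:
        raise ValueError("incomplete repeating group")
    if rep_kind == "plus" and repetitions == 0:
        raise ValueError("repeating group must appear at least once")
    return list(fixed) + list(group) * repetitions
```

#p With one leading #src!string! and a "#src!star!" group of two #src!int!
arguments, a call such as #src!assign_types(["string"], "star", ["int", "int"], n)!
succeeds for n = 1, 3, 5 and so on, and raises an error for every even n.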
/* ====
#p The data type "char" is #xemph!deprecated!: It is complicated enough to get
the intended value for a #src!string! type argument without too much harm
through the command-line/shell/system-exec-call pipeline.
This is nearly impossible for many special characters (backslash, single quote, etc.).
So taking a character from the input string #emph!verbatim! as the representation
of an argument of character type is not a good idea, -- better encode it as
integer value, which may be even input in hexadecimal format.
FIXME MORE
== */

// ================================================================================
#h3 #title Positional Options
#label txt_positional_options

#p Concerning the command line parsing, there are some further rules for
modeling the historic behaviour:
#nl
The posix standard and other publications frequently describe the syntax of a
certain application instantiation (e.g.
#link 0/www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html #loc tag_12_01 #top#text here #/link)
by examples like
#source
utility_name[-a][-b][-c option_argument]
    [-d|-e][-f[option_argument]][operand...]
#/source

#p Therein "#src!utility_name!" corresponds to our notion of "application",
and "#src!option_argument!" is nearly the same as with us.
#nl
The "#src!operand!" tokens are all those tokens on the command line following
the last option, if any, which (1) are not arguments to the preceding (last)
option, and (2) are not options themselves.
#nl
For us, (2) means that their text does not start with a minus-sign character.
(In posix they could also be explicitly declared not to be options by the
special separator "#src!--!" which ends the option list. We ignore this case.)

#p These "operands" are parameters to the application, but are not options in
the sense defined so far.
But of course we want to treat all command line components in a unified way,
and we have to map them to some gui.
Therefore we define some further rules:
#list
#i An option can be identified by a short name which is one(1) single
#emph!numeric ASCII digit!, i.e. taken from "#src!0!, #src!1!, #ldots, #src!9!".
In this case it represents a "#emph!positional option!".
(This corresponds to "operand" in the posix nomenclature from above.)
#i When after parsing all explicitly mentioned options there is still character
data in the contents of the command line, then this data is considered to
represent arguments for positional options.
#i All those positional options which have not yet appeared in the command line
(by explicitly using their long or short name) are mapped to this rest input data
in ascending order of their short names.
#i As long as there is any text left to be parsed as an option argument,
the next positional option is recognized implicitly.
#i From the last point it follows that a positional option
#emph!may not have the empty type! for its arguments, #ldots
#i and that none but the last positional option may have a repeating pattern
type for its arguments. (Any non-last positional option with a repeating type
would swallow all the rest of the command line, because there is no further
"minus-sign" to stop it! See #ref txt_parsingprocess below.)
#/list

#p A standard way to support this historic way of parametrizing is to give
those positional parameters "speaking" long names, as in
#source
-m / --mode : what to do with the files
-C / --classpath : where the libraries are searched for
-0 / --inputfile
-1 / --outputfile
-2 / --logfile
#/source

// ================================================================================
#h2 #title Input Data Format

#p All option specifications must be contained in one single XML document.
This must be declared as
#commentchar \
#source #/source
#commentchar /

#p The syntax of the input is defined by the dtd, see here
#link ../api/eu/bandm/tools/option/absy/doc-files/option.dtd-rendered.html
#text a linked rendering.#/text
// PRE20221011 file contained in the #link file_option_dtd.html
// #text APPENDIX: option dtd file#/link

#p#kind src The original of this dtd can be found at
#link 3/eu/bandm/tools/option/absy/option.dtd #text /option/absy/option.dtd #/link

#p The toplevel definition is
#source #/source

// ================================================================================
#h3 #title Multilingual Text for Documentation

#p A text element type is employed ubiquitously for multi-lingual documentation:
#suppressVerbatimCommandCharWarning 2
#source #/source

#p#kind missing The "#src!lang!" attribute means the same as "#src!xml:lang!",
and should be replaced by it (as soon as all our tools support namespaces !-)

#p The #src!lang! attribute should be used following the same rules as common
practice with #src!xml:lang!, i.e. following RFC 4646 / RFC 4647 /
"[IETF BCP 47], Tags for the Identification of Languages".

#p (( Currently, we refer to ISO 639-2, as listed e.g. in #cite isolanguage,
instead, because the URL for IETF BCP 47 as referred to at the end of
#cite xml is BROKEN !!))

#p The set of supported languages may be chosen arbitrarily (including the
empty set), but it should be the same in all lists of #src!text! elements.

// ================================================================================
#h3 #title Enumeration Types

#p Enumeration types which shall serve as data types for option arguments
must be declared according to #ldots
#suppressVerbatimCommandCharWarning 2
#source #/source

#p Multilingual documentation text may be added to enumeration types as a whole,
and/or to every item separately. (As mentioned above, the support of the
different languages should be consistent over all objects carrying doc texts.)
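#p The consistency rule for languages can be checked mechanically.
The following Python sketch is illustrative only and not part of the
#src!option! package: it assumes that the #src!lang! values of every list of
#src!text! elements have already been collected into a dictionary, and it
reports the lists which lack a language used elsewhere. The function name
#src!inconsistent_text_lists! and the owner labels are hypothetical.

```python
def inconsistent_text_lists(langs_by_owner):
    """Map each owner with an incomplete language set to its missing languages.

    langs_by_owner -- dict: owner label -> collection of lang values
                      found on the text elements of that owner
    """
    # Normalize every collection to a set so it can be compared repeatedly.
    normalized = {owner: set(langs) for owner, langs in langs_by_owner.items()}
    # The reference set is the union of all languages used anywhere.
    all_langs = set()
    for langs in normalized.values():
        all_langs |= langs
    missing = {}
    for owner, langs in normalized.items():
        gap = all_langs - langs
        if gap:
            missing[owner] = sorted(gap)
    return missing
```

#p An empty result means that all lists of #src!text! elements support the
same set of languages, which includes the case that no languages are used at all.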
#p The front-end appearance of an enumeration item (on the command line or shown
in a GUI) may be an arbitrary string, containing arbitrary character data,
like "#src!<->!" or "#src!0abc!".
#nl
In case it does not make up a valid Java identifier, the attribute
#src!compilable! must be given, which is used in the generated
#src!enum{..}! code instead.

#p#kind missing Currently only enumeration types may be declared.
We intend to add "type" declarations for general data type reuse.

// ================================================================================
#h3 #title Grouping Documentation Comments

#p #src!comment! elements can be interspersed into the sequence of option
declarations. Currently, they only lead to additional text labels in the
generated GUI.
#suppressVerbatimCommandCharWarning 1
#source #/source

#p (The "name" attribute shall be used later to switch on/off a whole group
of options in the GUI.)

#p#kind missing NOT YET OK ! (kills gui !-) // FIXME

// ================================================================================
#h3 #title Option Declarations

#p In the following definitions it holds that #ldots
#list
#i #src!desc! is a multi-lingual description. It should cover the meaning and
the correct usage of the option. As mentioned above, all text lists should
support the same set of languages.
#i The short name of an option (cf. above #ref txt_generaldatamodel) is given
by the attribute #src!abbrev!, the long name by the attribute #src!name!.
At least one of them must be specified.
#i If #src!type! is omitted, the type of the option's arguments is the empty
type, i.e. there are no arguments.
#i #src!condition! describes logic conditions on other options and their
arguments which must be fulfilled to #emph!enable! this option,
cf. #ref txt_enablingconditions below.
#/list
#suppressVerbatimCommandCharWarning 2
#source #/source

// ================================================================================
#h3 #title Special Categories of Options

#p Currently three special categories can be assigned to an option, as in
#source