Declarative Command-line Interfaces
(Extended Abstract)

Damian Conway

School of Computer Science and Software Engineering
Monash University
Clayton 3168, Australia

mailto:damian@csse.monash.edu.au
http://www.csse.monash.edu.au/~damian

Abstract

This paper describes a different approach to generating command-line argument parsers in Perl [1]. The system presented takes a standard "usage" description and reverse-engineers a parser which satisfies that description. This ability to declaratively specify complex parsers also proves useful in other contexts, such as comma-separated-value processing, simple input parsing, and string interpolation. The full paper is available on-line from: http://www.csse.monash.edu.au/~damian/papers/HTML/Getopt.html

Introduction

Non sunt multiplicanda entia praeter necessitatem.
- William of Occam
In apparent defiance of Occam's Razor, command-line argument parsing libraries multiply beyond all reasonable necessity. A 1994 survey [2] compares a dozen libraries for C/C++ alone, whilst the Comprehensive Perl Archive Network catalogues nine distinct Perl packages for the same purpose. Worse still, this paper describes Getopt::Declare - yet another command-line argument parser for Perl.

Command-line processing packages multiply because, despite its apparent simplicity, unrestricted command-line processing is a complex and specialized parsing task. Solutions may be optimized for execution speed, library size, flexibility, expressive power, level of automation, ease of use, or conciseness of specification, but not for all of these at once.

Getopt::Declare is an attempt to optimize first for ease-of-use, then for power, flexibility, expressiveness, and automation. More significantly, Getopt::Declare represents a quite different approach to specifying the nature and meaning of command-line parameters. Most Getopt:: packages take a list of the allowed parameters in some form, possibly annotated with corresponding parameter descriptions, or lists of subarguments, or other flags which control the command-line processing. In contrast, to use Getopt::Declare, the programmer simply specifies the complete "usage" string they wish to have implemented. Getopt::Declare then parses this specification and builds a command-line processor to match.

Thus, when using the standard Getopt::Long, one might write:

        use Getopt::Long;

        GetOptions('foo|f=s', \$foo, 'bar=i', \&proc, 'ar=s@', \@ar)
                or die;
        print "foo = $foo, ar = ", @ar;
whereas, using Getopt::Declare, one would write:
        use Getopt::Declare;

        $args = new Getopt::Declare q{

                -foo <str>      Peeking option
                -f <str>        [ditto]
                -bar <num:i>    Drinking option
                                        { proc($_PARAM_,$num) }
                -ar <str>...    Pirate option [repeatable]
        };
        print "foo = $args->{-foo}, ar = ", @{$args->{-ar}};
which is considerably more verbose, but also much clearer and easier to get right. Note that Getopt::Declare also provides automatic usage and version enquiry parameters (-h and -v, respectively), as well as detailed error messages.

As the above example indicates, to parse the command-line in @ARGV one simply creates a Getopt::Declare object, by passing Getopt::Declare::new() a specification of the various parameters that may be encountered. The specification is a single string in which the syntax of each parameter is declared, along with a description and (optionally) one or more actions to be performed when the parameter is encountered.

Calling Getopt::Declare::new() parses the contents of the array @ARGV, extracting any arguments which match the parameters defined in the specification string, and storing the parsed values as hash elements within the new Getopt::Declare object being created. The command-line is parsed sequentially, by attempting to match each parameter in the object's specification string against the current elements in the @ARGV array. The order in which parameters are tried against @ARGV is determined by three rules:

  1. Parameters with longer flags are tried first. Hence the command-line argument "-quiet" would be parsed as matching the parameter -quiet rather than the parameter -q <string>, even if the -q parameter was defined first.

  2. Parameter variants with the most components are matched first. Hence the argument "-rand 12345" would be parsed as matching the parameter variant -rand <seed>, rather than the variant -rand, even if the "shorter" -rand variant was defined first.

  3. Otherwise, parameters are matched in the order they are defined in the specification string.
Elements of @ARGV which do not match any defined parameter are collected during the parse and are eventually put back into @ARGV.
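The three ordering rules amount to a sort with two tie-breakers. The following Perl sketch expresses them as a single comparator. The record layout (flag, components, order) and the name match_order are hypothetical, chosen for illustration; they are not Getopt::Declare's internal data structures.

```perl
use strict;
use warnings;

# Hypothetical records for four parameter variants: the leading flag, the
# number of components in the variant, and its position in the spec.
my @params = (
    { flag => '-q',     components => 2, order => 0 },   # -q <string>
    { flag => '-rand',  components => 1, order => 1 },   # -rand
    { flag => '-rand',  components => 2, order => 2 },   # -rand <seed>
    { flag => '-quiet', components => 1, order => 3 },   # -quiet
);

# Rule 1: longer flags first. Rule 2: more components first.
# Rule 3: otherwise, specification order.
sub match_order {
    return sort {
           length($b->{flag}) <=> length($a->{flag})
        || $b->{components}   <=> $a->{components}
        || $a->{order}        <=> $b->{order}
    } @_;
}

my @tried = map { "$_->{flag}/$_->{components}" } match_order(@params);
print "@tried\n";    # prints: -quiet/1 -rand/2 -rand/1 -q/2
```

Chaining the comparisons with || means each later rule is consulted only when all earlier rules consider two parameters equal.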

Several features of the Getopt::Declare package are summarized in the remainder of this abstract; all of them are discussed in the complete paper.
 

Specifying command-line parameters

In a Getopt::Declare specification, each parameter consists of three parts: the parameter definition, a textual description, and any actions to be performed when the parameter is matched.

The parameter definition consists of a leading flag or parameter variable, followed by any number of value place-holders (parameter variables) or literal characters (punctuators), optionally separated by spaces. The parameter definition is terminated by one or more tabs (at least one trailing tab must be present). Each parameter definition would also normally include a textual description (after the mandatory tab(s)), which forms the basis of the automatically-generated usage information provided by Getopt::Declare. For example:

        -v                        Verbose mode
        in=<infile>               Specify input file
                                   (will fail if file does not exist)
        +range <from>..<to>       Specify range of columns to consider
        --line <start> - <stop>   Specify range of lines to process
        ignore bad lines          Ignore bad lines :-)
        <outfile>                 Specify an output file
The parameter description may also contain special directives which alter the way in which the parameter is parsed. Some of these are described in later sections of this abstract (and the rest in the full paper).

By default, a parameter variable in a parameter will match a single blank-terminated or quote-delimited string. For example, the parameter:

        -val <str>
would match any of the following command-line arguments:
        -value                  # <str> <- "ue"
        -val abcd               # <str> <- "abcd"
        -val "a value"          # <str> <- "a value"
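The default matching rule can be approximated with one regular expression. The Perl sketch below handles only the single parameter -val <str>. The name match_val is hypothetical, and the sketch assumes quoted strings contain no escaped quotes. It is an illustration of the rule, not Getopt::Declare's own matcher.

```perl
use strict;
use warnings;

# Hypothetical matcher for a parameter declared as:  -val <str>
# Following the default rule, <str> is either a quote-delimited string
# or a run of non-blank characters, optionally preceded by whitespace.
sub match_val {
    my ($cmdline) = @_;
    return unless $cmdline =~ /^-val\s*(?:"([^"]*)"|'([^']*)'|(\S+))/;
    return $1 // $2 // $3;    # whichever alternative matched
}

for my $arg ('-value', '-val abcd', '-val "a value"') {
    print "$arg => ", match_val($arg), "\n";
}
```

Because the whitespace after the flag is optional, "-value" is treated as the flag -val immediately followed by the value "ue", as in the example above.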
It is also possible to restrict the types of values which may be matched by a given parameter variable (the complete mechanism for specifying new matching types is described in the full paper). For example:
        -limit <threshold:n>    Set threshold to some (real) value
        -count <N:i>            Set count to <N> (must be integer)
        -name <name:qs>         Set name to <name> (may be quote-delimited)
Parameter variables are treated as scalars, unless they are immediately followed by an ellipsis (...), in which case they act like an array, and match the specified type sequentially as many times as possible.
Note that both scalar and list parameter variables "respect" the flags of other parameters, as well as their own trailing punctuators. For example:
        -cp <b_list>... <dir>          # <b_list> gets all but the last trailing string
        -rm <c_list>... ;              # <c_list> gets all trailing strings until ;
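The way list variables defer to other flags and to trailing punctuators can be sketched over an array of arguments. In this Perl sketch the flag table %flags and the names match_cp and match_rm are hypothetical. The sketch also assumes that an ellipsis variable must match at least one argument.

```perl
use strict;
use warnings;

# Hypothetical set of flags defined elsewhere in the specification.
my %flags = map { $_ => 1 } qw(-cp -rm -v);

# -cp <b_list>... <dir>
# Take the run of arguments up to the next known flag (or the end),
# then give the last one to <dir> and the rest to <b_list>.
sub match_cp {
    my @args = @_;
    return unless @args && $args[0] eq '-cp';
    shift @args;
    my @run;
    push @run, shift @args while @args && !$flags{ $args[0] };
    return unless @run >= 2;          # at least one list item plus <dir>
    my $dir = pop @run;
    return { b_list => \@run, dir => $dir, rest => \@args };
}

# -rm <c_list>... ;
# Take arguments up to the ; punctuator (stopping early at a known flag).
sub match_rm {
    my @args = @_;
    return unless @args && $args[0] eq '-rm';
    shift @args;
    my @list;
    push @list, shift @args
        while @args && $args[0] ne ';' && !$flags{ $args[0] };
    return unless @args && $args[0] eq ';';    # the ; must be present
    shift @args;
    return { c_list => \@list, rest => \@args };
}

my $cp = match_cp(qw(-cp a b c /tmp -v));
print "b_list: @{ $cp->{b_list} }; dir: $cp->{dir}\n";
```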
Except for the leading flag, any part of a parameter definition may be made optional by placing it in square brackets. For example:
        +range <from> [..] [<to>]
        -list [<page>...]
        -q[uiet]
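For purely literal text, square-bracketed optional parts map naturally onto optional regex groups. The following Perl sketch (flag_regex is a hypothetical name) compiles a specification such as -q[uiet] into an anchored pattern. It deliberately ignores parameter variables such as <to>, which the real package also allows inside brackets.

```perl
use strict;
use warnings;

# Hypothetical: compile a literal flag specification in which square
# brackets mark optional text (e.g. -q[uiet]) into an anchored regex.
# Parameter variables such as <to> are not handled in this sketch.
sub flag_regex {
    my ($spec) = @_;
    my $re = '';
    for my $piece ($spec =~ /(\[[^\]]*\]|[^\[]+)/g) {
        if ($piece =~ /^\[(.*)\]$/) {
            $re .= '(?:' . quotemeta($1) . ')?';    # optional part
        } else {
            $re .= quotemeta($piece);               # required literal text
        }
    }
    return qr/^$re$/;
}

my $quiet = flag_regex('-q[uiet]');
for my $arg ('-q', '-quiet', '-qu') {
    print "$arg: ", ($arg =~ $quiet ? 'match' : 'no match'), "\n";
}
```

Note that an optional part is all-or-nothing: -q[uiet] accepts -q and -quiet, but not a partial spelling such as -qu.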
Each parameter specification may also include one or more blocks of Perl code, specified in a pair of curly brackets (which must start on a new line). For example:
        -v      Verbose mode
                        { $::verbose = 1; }
        -q      Quiet mode
                        { $::verbose = 0; }
Each action is executed as soon as the corresponding parameter is successfully matched (as "strict" do blocks in the package in which the Getopt::Declare object containing them was created). In addition, each parameter variable belonging to the corresponding parameter is made available within actions as a (block-scoped) Perl variable with the same name. For example:
        +range <from>..<to>   Set range
                                  { setrange($from, $to); }
        -list <page:i>...     Specify pages to list
                                  { foreach (@page) { list($_) if $_ > 0 } }
Note that scalar parameter variables become scalar Perl variables, and list parameter variables become Perl arrays.
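One way to make parameter variables visible inside an action is to compile the action text into a closure that declares a lexical for each variable. The Perl sketch below does this with a string eval. This is an assumed technique chosen for illustration, not necessarily how Getopt::Declare builds its "strict" do blocks. The names compile_action, setrange, and list are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical: compile an action's source text into a closure in which
# each parameter variable is declared as a lexical of the same name.
sub compile_action {
    my ($vars, $code) = @_;              # e.g. (['$from', '$to'], '{...}')
    my $decl = join ', ', @$vars;
    my $action = eval "sub { my ($decl) = \@_; $code }";
    die $@ unless $action;
    return $action;
}

our @log;
sub setrange { push @log, "range $_[0]..$_[1]" }
sub list     { push @log, "page $_[0]" }

# +range <from>..<to>
compile_action(['$from', '$to'], '{ setrange($from, $to); }')->(3, 7);

# -list <page:i>...   (a list variable becomes a Perl array)
compile_action(['@page'], '{ foreach (@page) { list($_) if $_ > 0 } }')
    ->(2, 0, 5);

print "$_\n" for @log;
```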

Parsing from other sources

Getopt::Declare normally parses the contents of @ARGV, but it can also parse other text sources. Getopt::Declare::new() therefore takes an optional second parameter, which specifies the source to be parsed. The source may be a reference to a filehandle, a reference to an array of file names (which are opened and read in), a standard configuration file (".{progname}rc"), a subroutine reference (which is called repeatedly to return a string to be parsed), or an actual string.

If any specified source corresponds to an interactive TTY (for example: \*STDIN or ['-'] or new IO::File('<-'), etc.), then data from that source is read in and parsed line-by-line, after the processing of any other source files (the full paper's discussion of other applications includes an example).
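The different kinds of source can be told apart with Perl's ref operator. The Perl sketch below (source_text is a hypothetical name) gathers the text to be parsed from a string, a subroutine reference, an array of file names, or a filehandle reference. It does not reproduce the package's handling of ".{progname}rc" files or its line-by-line treatment of interactive TTY sources.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the text to be parsed from one of the source
# kinds listed above. Configuration files and line-by-line TTY handling
# are omitted from this sketch.
sub source_text {
    my ($src) = @_;
    my $kind = ref $src;
    return $src if $kind eq '';                    # an actual string

    if ($kind eq 'CODE') {                         # call until it returns undef
        my $text = '';
        while (defined(my $chunk = $src->())) { $text .= $chunk }
        return $text;
    }
    if ($kind eq 'ARRAY') {                        # a list of file names
        my $text = '';
        for my $name (@$src) {
            open my $fh, '<', $name or die "Can't open $name: $!";
            local $/;                              # slurp the whole file
            $text .= <$fh> // '';
            close $fh;
        }
        return $text;
    }
    if ($kind eq 'GLOB' || $kind eq 'IO::File') {  # a filehandle reference
        local $/;
        return readline($src) // '';
    }
    die "Unsupported source type: $kind\n";
}

print source_text("-v -o out.txt"), "\n";
```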

Using Getopt::Declare objects after command-line processing

For each successfully matched parameter, the Getopt::Declare object will contain a hash element. The key of that element will be the leading flag or parameter variable name of the parameter. The value of the element will be a reference to another hash which contains the names and values of each distinct parameter variable and/or punctuator which was matched by the parameter. As a special case, if a parameter consists of a single parameter variable (optionally preceded by a flag), then the value for the corresponding hash key is not a hash reference, but the actual value matched.

For example, given the following specification:

        $args = new Getopt::Declare q{
                -v <value> [exact]      Specify search value
                <infile>                Input file
                -o <outfiles>...        Output files
        };
the object $args would have the following members (assuming that all parameters were matched):
 
        $args->{'-v'}{'<value>'}
                The argument matched by the <value> parameter variable of the -v parameter.
        $args->{'-v'}{'exact'}
                The argument (if any) matched by the optional [exact] punctuator of the -v parameter.
        $args->{'<infile>'}
                The argument matched by the <infile> parameter.
        @{$args->{'-o'}}
                The list of arguments matched by the <outfiles> parameter variable of the -o parameter.
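The storage rule, including its special case for a lone parameter variable, can be sketched with a small helper. In this Perl sketch, store_match and its argument layout are hypothetical. Getopt::Declare builds this structure itself; the sketch only reproduces the shapes listed above for the example specification.

```perl
use strict;
use warnings;

# Hypothetical: record one matched parameter in the result hash, following
# the rule described above. $components lists the parameter's variables and
# punctuators; $values maps each matched component to its argument.
sub store_match {
    my ($result, $key, $components, $values) = @_;
    if (@$components == 1 && $components->[0] =~ /^<.*>$/) {
        # A lone parameter variable: store the matched value itself.
        $result->{$key} = $values->{ $components->[0] };
    } else {
        # Otherwise: a hash of component names to matched values.
        $result->{$key} = { %$values };
    }
    return;
}

my %args;
store_match(\%args, '-v', ['<value>', 'exact'],
            { '<value>' => 'foo', 'exact' => 'exact' });
store_match(\%args, '<infile>', ['<infile>'], { '<infile>' => 'data.txt' });
store_match(\%args, '-o', ['<outfiles>'],
            { '<outfiles>' => ['a.out', 'b.out'] });

print "value:   $args{'-v'}{'<value>'}\n";
print "infile:  $args{'<infile>'}\n";
print "outputs: @{ $args{'-o'} }\n";
```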


Flag clustering

Like some other Getopt:: packages, Getopt::Declare allows parameter flags to be "clustered" or "bundled". That is, if two or more flags have the same flag prefix (one or more leading non-whitespace and non-alphanumeric characters), those flags may be concatenated behind a single copy of that prefix.

Getopt::Declare allows flag clustering at any point where the remainder of the command-line being processed starts with a non-whitespace character and where the remaining substring would not otherwise immediately match a parameter flag. This means that multiple-character flags can be clustered, as can flags with parameter variables and punctuators.

If the idea of such unconstrained flag clustering is too libertarian for a particular application, the feature may be restricted (or removed entirely), by including a [cluster: <option>] directive anywhere in the specification string.
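Clustering of multiple-character flags can be sketched as a greedy decomposition that prefers the longest declared flag at each step. The Perl sketch below (unbundle is a hypothetical name) handles only flags without parameter variables. It leaves an argument unchanged if that argument is itself a declared flag or cannot be fully decomposed.

```perl
use strict;
use warnings;

# Hypothetical: expand a clustered argument such as -vqin into separate
# flags, given the flags declared in a specification. Flags that take
# parameter variables are not handled in this sketch.
sub unbundle {
    my ($arg, @flags) = @_;
    return ($arg) if grep { $_ eq $arg } @flags;   # already a declared flag
    my ($prefix, $rest) = $arg =~ /^([^\w\s]+)(\S+)$/ or return ($arg);

    # Bodies of the flags sharing this prefix, longest first, so that
    # multiple-character flags are preferred.
    my @bodies = sort { length($b) <=> length($a) }
                 grep { length($_) > 0 }
                 map  { substr($_, length $prefix) }
                 grep { /^([^\w\s]+)/ && $1 eq $prefix } @flags;

    my @out;
    CHUNK: while (length $rest) {
        for my $body (@bodies) {
            next unless index($rest, $body) == 0;
            push @out, $prefix . $body;
            substr($rest, 0, length $body) = '';
            next CHUNK;
        }
        return ($arg);    # cannot be decomposed: leave it unchanged
    }
    return @out;
}

print join(' ', unbundle('-vqin', qw(-v -q -in --help))), "\n";
```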
 

Parameter dependencies

Getopt::Declare provides several other directives which modify the behaviour of the command-line parser in some way:

[required]

The [required] directive specifies that an argument matching the corresponding parameter must appear somewhere in the command-line. If no such argument is found, Getopt::Declare::new() calls die with an appropriate error message.

[repeatable]

By default, Getopt::Declare objects allow each of their parameters to be matched only once per parse. However, it is often useful to allow a particular parameter to match more than once. Any parameter whose description includes the directive [repeatable] is never excluded as a potential argument match, no matter how many times it has matched previously. For example:
        -nice      Increment nice value [repeatable]
                       { $::nice++; }
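The effect of [required] and [repeatable] on matching can be sketched as simple bookkeeping. In this Perl sketch, the tables %known, %repeatable, and %required and the name apply_directives are hypothetical. Where the real package calls die for a missing required parameter, the sketch merely returns the missing names.

```perl
use strict;
use warnings;

# Hypothetical directive tables for a small specification.
my %known      = map { $_ => 1 } qw(-nice -in -v);
my %repeatable = ('-nice' => 1);      # as if marked [repeatable]
my %required   = ('-in'   => 1);      # as if marked [required]

# Match each argument against the declared flags. A non-repeatable flag
# stops being a candidate once it has matched, so a second occurrence is
# left unmatched. Returns match counts, unmatched arguments, and any
# required flags that never appeared.
sub apply_directives {
    my (%count, @unmatched);
    for my $arg (@_) {
        if ($known{$arg} && (!$count{$arg} || $repeatable{$arg})) {
            $count{$arg}++;
        } else {
            push @unmatched, $arg;
        }
    }
    my @missing = grep { !$count{$_} } sort keys %required;
    return (\%count, \@unmatched, \@missing);
}

my ($count, $unmatched, $missing) = apply_directives(qw(-nice -nice -v -v));
print "-nice matched $count->{'-nice'} times\n";
print "unmatched: @$unmatched\n";
print "missing: @$missing\n";
```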

[mutex:<flaglist>]

The [mutex:...] directive specifies a set of parameters which are to be treated as mutually exclusive. That is, no two of them may appear in the same command-line. For example:
        -case       set to all lower case
        -CASE       SET TO ALL UPPER CASE
        [mutex: -case -CASE]
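A mutual-exclusion check reduces to counting how many members of each declared group were matched. The following Perl sketch assumes a hypothetical @mutex_groups table, as if built from [mutex: -case -CASE]. The error text is invented for illustration and is not the package's actual message.

```perl
use strict;
use warnings;

# Hypothetical mutex groups, as if declared with [mutex: -case -CASE].
my @mutex_groups = ([qw(-case -CASE)]);

# Returns one error message per group with more than one member present.
sub mutex_violations {
    my %present = map { $_ => 1 } @_;
    my @errors;
    for my $group (@mutex_groups) {
        my @found = grep { $present{$_} } @$group;
        push @errors, "parameters @found are mutually exclusive" if @found > 1;
    }
    return @errors;
}

print "$_\n" for mutex_violations(qw(-case -CASE));
```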

Usage and version information

By default, Getopt::Declare automatically defines six case-insensitive parameters: three "help" parameters (-h, -help, and --help) and three "version" parameters (-v, -version, and --version).  Hence, most attempts by the user to request information at the command-line will be successful.

The specification passed to Getopt::Declare::new() is used (almost verbatim) as a "usage" display whenever usage information is requested. In addition to this information, Getopt::Declare displays three sample command-lines: one indicating the normal usage (including any required parameter variables), one indicating how to invoke help, and one indicating how to determine the current version of the program.
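The six predefined enquiry parameters can be recognized with two case-insensitive patterns. The following Perl sketch (enquiry_type is a hypothetical name) only classifies an argument. It does not generate the usage display or the sample command-lines described above.

```perl
use strict;
use warnings;

# Hypothetical classifier for the six predefined enquiry parameters,
# matched case-insensitively as described above.
sub enquiry_type {
    my ($arg) = @_;
    return 'help'    if $arg =~ /^(?:-h|-help|--help)$/i;
    return 'version' if $arg =~ /^(?:-v|-version|--version)$/i;
    return;                                  # not an enquiry parameter
}

for my $arg (qw(-H --Help -VERSION -x)) {
    print "$arg: ", enquiry_type($arg) // 'none', "\n";
}
```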
 

Other applications

The wide range of features and ease-of-specification of Getopt::Declare parsers, combined with their ability to parse from sources other than @ARGV, make them adaptable to a number of other common parsing applications. The full paper describes three such applications: parsing comma-separated values, processing text templates, and implementing simple command languages. Only the first is presented here.

Alan Citterman's excellent Text::CSV package provides a simple mechanism for parsing comma-separated values:

        use Text::CSV;

        my $csv = Text::CSV->new;
        open my $csv_file, '<', $datafile or die "Can't open $datafile: $!";
        while (defined(my $line = <$csv_file>))
        {
                chomp $line;
                if ($csv->parse($line))
                {
                        my ($ID, $name, $score) = $csv->fields();
                        process_marks($ID, $name, $score) && next
                                if $ID =~ /^[A-Z]\d{7}$/ && $score eq 0+$score;
                }
                print STDERR "Invalid data: $line\n";
        }
Getopt::Declare can mimic this behaviour, somewhat more compactly:
        my $format =
        q{      [repeatable]
                <ID:/[A-Z]\d{7}/> , <name:qs> , <score:n>       VALID FORMAT
                                { process_marks($ID, $name, $score); }
                <line:/.*/>                                     ELSE ERROR
                                { print STDERR "Invalid data: $line\n"; }
        };
        new Getopt::Declare ($format, [$datafile]) or die;
More importantly, Getopt::Declare makes it simple to handle variant formats of comma-separated values in the same input stream:
        my $format =
        q{      [repeatable]
                <ID:/[A-Z]\d{7}/> , <name:qs> , <score:n>       FORMAT 1
                                { process_marks($ID, $name, $score); }
                <name:qs> , <ID:/[A-Z]\d{7}/> , <score:n>       FORMAT 2
                                { process_marks($ID, $name, $score); }
                <ID:/[A-Z]\d{7}/> , <score:n>                   FORMAT 3
                                { process_marks($ID, '???', $score); }
                <line:/.*/>                                     ELSE ERROR
                                { print STDERR "Invalid data: $line\n"; }
        };
        new Getopt::Declare ($format, [$datafile]) or die;

Conclusion

The Getopt::Declare package fills yet another niche in the multi-dimensional Getopt:: solution space. Its approach of constructing command-line recognizers by reverse-engineering a "usage" specification proves to be a simple and powerful means of argument parsing, and one which encourages better documentation of a program's code and interface.

Moreover, the approach is easily generalized to provide declarative solutions to a range of similar parsing tasks, where the full power of a recursive parser is not required.

Getopt::Declare is freely available from the author at:

http://www.csse.monash.edu.au/~damian/CPAN/Getopt-Declare.tar.gz

References

[1] Wall, L., Christiansen, T., & Schwartz, R.L., Programming Perl, 2nd Edition, O'Reilly & Associates, 1996.
[2] Conway, D.M., Autogenerating Documented Command Line Interfaces, in Human-Computer Interaction, ed. Blumenthal, Gornostaev & Unger, Lecture Notes in Computer Science, vol. 876, pp. 77-94, Springer-Verlag, Berlin, 1994.