- William of Occam
Command-line processing packages multiply because, despite its apparent simplicity, unrestricted command-line processing is a complex and specialized parsing task. Solutions may be optimized for execution speed, library size, flexibility, expressive power, level of automation, ease of use, or conciseness of specification, but not for all of these at once.
Getopt::Declare is an attempt to optimize first for ease of use, then for power, flexibility, expressiveness, and automation. More significantly, Getopt::Declare represents a quite different approach to specifying the nature and meaning of command-line parameters. Most Getopt:: packages take a list of the allowed parameters in some form, possibly annotated with corresponding parameter descriptions, lists of subarguments, or other flags that control the command-line processing. In contrast, to use Getopt::Declare, the programmer simply specifies the complete "usage" string they wish to have implemented. Getopt::Declare then parses this specification and builds a command-line processor to match.
Thus, when using the standard Getopt::Long, one might write:
    GetOptions('foo|f=s', \$foo, 'bar=i', \&proc, 'ar=s@', \@ar) or die;
    print "foo = $foo, ar = ", @ar;

whereas, using Getopt::Declare, one would write:
    $args = new Getopt::Declare q{
        -foo <str>	Peeking option
        -f <str>	[ditto]
        -bar <num:i>	Drinking option
        		{ proc($_PARAM_,$num) }
        -ar <str>...	Pirate option [repeatable]
    };
    print "foo = $args->{-foo}, ar = ", @{$args->{-ar}};

which is considerably more verbose, but also much clearer and easier to get right. Note that Getopt::Declare also provides full automatic usage and version enquiry parameters (-h and -v, respectively) and detailed error messages.
As the above example indicates, to parse the command-line in @ARGV one simply creates a Getopt::Declare object, by passing Getopt::Declare::new() a specification of the various parameters that may be encountered. The specification is a single string in which the syntax of each parameter is declared, along with a description and (optionally) one or more actions to be performed when the parameter is encountered.
Calling Getopt::Declare::new() parses the contents of the array @ARGV, extracting any arguments which match the parameters defined in the specification string, and storing the parsed values as hash elements within the new Getopt::Declare object being created. The command-line is parsed sequentially, by attempting to match each parameter in the object's specification string against the current elements in the @ARGV array. The order in which parameters are tried against @ARGV is determined by three rules:
Features of the Getopt::Declare package include:
The parameter definition consists of a leading flag or parameter variable, followed by any number of value placeholders (parameter variables) or literal characters (punctuators), optionally separated by spaces. The parameter definition is terminated by one or more tabs (at least one trailing tab must be present). Each parameter definition normally also includes a textual description (after the mandatory tab(s)), which forms the basis of the automatically generated usage information provided by Getopt::Declare. For example:
    -v	Verbose mode
    in=<infile>	Specify input file (will fail if file does not exist)
    +range <from>..<to>	Specify range of columns to consider
    --line <start> - <stop>	Specify range of lines to process
    ignore bad lines	Ignore bad lines :-)
    <outfile>	Specify an output file

The parameter description may also contain special directives which alter the way in which the parameter is parsed. Some of these are described in later sections of this abstract (and the rest in the full paper).
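To make the tab rule concrete, here is a toy Perl routine that separates a specification line into its definition and its description at the first run of tabs. The name `split_spec_line` is hypothetical and this is not Getopt::Declare's own parser; it only shows why spaces may appear freely inside a definition such as `ignore bad lines`, since only tabs end it.

```perl
use strict;
use warnings;

# Toy sketch, not Getopt::Declare's own parser: split a specification line
# into its parameter definition and its description. The definition ends at
# the first run of one or more tabs, so spaces inside it are kept.
sub split_spec_line {
    my ($line) = @_;
    my ($definition, $description) = split /\t+/, $line, 2;
    return ($definition, defined $description ? $description : '');
}

my @spec = (
    "-v\tVerbose mode",
    "+range <from>..<to>\t\tSpecify range of columns to consider",
    "ignore bad lines\tIgnore bad lines :-)",
);

for my $line (@spec) {
    my ($def, $desc) = split_spec_line($line);
    print "definition=[$def]  description=[$desc]\n";
}
```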
By default, a parameter variable in a parameter will match a single blank-terminated or quote-delimited string. For example, the parameter:
    -val <str>

would match any of the following command-line arguments:

    -value		# <str> <- "ue"
    -val abcd		# <str> <- "abcd"
    -val "a value"	# <str> <- "a value"

It is also possible to restrict the types of values which may be matched by a given parameter variable (the complete mechanism for specifying new matching types is described in the full paper). For example:
    -limit <threshold:n>	Set threshold to some (real) value
    -count <N:i>	Set count to <N> (must be integer)
    -name <name:qs>	Set name to <name> (may be quote-delimited)

Parameter variables are treated as scalars, unless they are immediately followed by an ellipsis (...), in which case they act like an array, and match the specified type sequentially as many times as possible.
    -cp <b_list>... <dir>	# <b_list> gets all but last trailing string
    -rm <c_list>... ;	# <c_list> gets all trailing strings until ;

Except for the leading flag, any part of a parameter definition may be made optional by placing it in square brackets. For example:
    +range <from> [..] [<to>]
    -list [<page>...]
    -q[uiet]

Each parameter specification may also include one or more blocks of Perl code, specified in a pair of curly brackets (which must start on a new line). For example:
    -v	Verbose mode
    	{ $::verbose = 1; }
    -q	Quiet mode
    	{ $::verbose = 0; }

Each action is executed as soon as the corresponding parameter is successfully matched (as "strict" do blocks in the package in which the Getopt::Declare object containing them was created). In addition, each parameter variable belonging to the corresponding parameter is made available within actions as a (block-scoped) Perl variable with the same name. For example:
    +range <from>..<to>	Set range
    	{ setrange($from, $to); }
    -list <page:i>...	Specify pages to list
    	{ foreach (@page) { list($_) if $_ > 0 } }

Note that scalar parameter variables become scalar Perl variables, and list parameter variables become Perl arrays.
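The way parameter variables reach an action can be imitated by hand. The sketch below is a toy in core Perl, not the code Getopt::Declare generates: `try_range`, `try_list`, and the stub versions of `setrange` and `list` are hypothetical, and the matching uses simple hand-written patterns. It shows a scalar pair (`$from`, `$to`) and an array (`@page`) being handed to the action as ordinary lexical variables.

```perl
use strict;
use warnings;

# Toy sketch, not Getopt::Declare's generated code: each "try_" routine
# matches one parameter by hand and then runs its action, with the
# parameter variables held in ordinary lexical (block-scoped) variables.
my (@ranges, @listed);
sub setrange { push @ranges, [@_] }    # hypothetical stand-in for setrange()
sub list     { push @listed, $_[0] }   # hypothetical stand-in for list()

# +range <from>..<to>   (scalar parameter variables)
sub try_range {
    my ($arg) = @_;
    return 0 unless $arg =~ /^\+range\s+(\S+?)\.\.(\S+)$/;
    {
        my ($from, $to) = ($1, $2);    # exist only inside this block
        setrange($from, $to);          # the action
    }
    return 1;
}

# -list <page:i>...     (list parameter variable)
sub try_list {
    my @words = @_;                    # already-split arguments
    return 0 unless @words && shift(@words) eq '-list';
    my @page;                          # collects every integer that follows
    push @page, shift @words while @words && $words[0] =~ /^[+-]?\d+$/;
    return 0 unless @page;
    foreach (@page) { list($_) if $_ > 0 }    # the action
    return 1;
}

try_range('+range 3..7');
try_list('-list', '3', '0', '8');
print 'ranges: ', join(', ', map { $_->[0] . '-' . $_->[1] } @ranges),
      "; listed: @listed\n";
```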
If any specified source corresponds to an interactive TTY (for example: \*STDIN or ['-'] or new IO::File('<-'), etc.), then data from that source is read in and parsed line-by-line, after the processing of any other source files (see "Other applications" for an example).
For example, given the following specification:
    $args = new Getopt::Declare q{
        -v <value> [exact]	Specify search value
        <infile>	Input file
        -o <outfiles>...	Output files
    };

the object $args would have the following members (assuming that all parameters were matched):
| Member | Contents |
|---|---|
| $args->{'-v'}{'<value>'} | The argument matched by the <value> parameter variable of the -v parameter. |
| $args->{'-v'}{'exact'} | The argument (if any) matched by the optional [exact] punctuator of the -v parameter. |
| $args->{'<infile>'} | The argument matched by the <infile> parameter. |
| @{$args->{'-o'}} | The list of arguments matched by the <outfiles> parameter variable of the -o parameter. |
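The access patterns in the table can be tried out on a hand-built hash. The values below (`needle`, `data.txt`, and so on) are invented for illustration, and the hash is written out literally rather than produced by Getopt::Declare; it assumes only the member layout given in the table.

```perl
use strict;
use warnings;

# Hand-built hash with hypothetical values, laid out as the table describes.
# It is NOT produced by Getopt::Declare here; it only shows how the nested
# members would be read.
my $args = {
    '-v'       => { '<value>' => 'needle', 'exact' => 'exact' },
    '<infile>' => 'data.txt',
    '-o'       => [ 'out1.txt', 'out2.txt' ],
};

my $value   = $args->{'-v'}{'<value>'};                  # nested member
my $exact   = exists $args->{'-v'}{'exact'} ? 'yes' : 'no';
my $infile  = $args->{'<infile>'};                       # plain scalar
my @outputs = @{ $args->{'-o'} };                        # array reference

print "value=$value exact=$exact infile=$infile outputs=@outputs\n";
```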
Getopt::Declare allows flag clustering at any point where the remainder of the command-line being processed starts with a non-whitespace character and where the remaining substring would not otherwise immediately match a parameter flag. This means that multiple-character flags can be clustered, as can flags with parameter variables and punctuators.
If the idea of such unconstrained flag clustering is too libertarian for a particular application, the feature may be restricted (or removed entirely) by including a [cluster: <option>] directive anywhere in the specification string.
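The clustering rule can be pictured with a much simplified unbundler. This is a toy built on stated assumptions, not Getopt::Declare's matcher: it knows only a fixed list of hypothetical single-dash flags, always prefers the longest flag that matches at the front of the remaining text, and re-attaches a dash to whatever is left over. The `unbundle` name is likewise hypothetical.

```perl
use strict;
use warnings;

# Simplified illustration of flag clustering, NOT Getopt::Declare's matcher:
# peel the longest known flag off the front of a clustered argument such as
# "-vqx", then treat the remainder as if it had its own leading dash.
my @known = qw(-v -q -x -vx);    # hypothetical flag set

sub unbundle {
    my ($arg) = @_;
    my @flags;
    my $rest = $arg;
    while (length $rest) {
        # Longest known flag that the remaining text starts with.
        my ($match) = sort { length($b) <=> length($a) }
                      grep { index($rest, $_) == 0 } @known;
        return unless defined $match;          # unknown text: give up
        push @flags, $match;
        $rest = substr($rest, length $match);
        $rest = "-$rest" if length $rest;      # re-attach the dash
    }
    return @flags;
}

print join(' ', unbundle('-vqx')), "\n";
```

Preferring the longest match means `-vx` is read as the single flag `-vx` rather than as `-v` followed by `-x`, which loosely mirrors the condition that clustering applies only where the remaining text would not otherwise match a flag.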
    -nice	Increment nice value [repeatable]
    	{ $::nice++; }
    -case	set to all lower case
    -CASE	SET TO ALL UPPER CASE
    [mutex: -case -CASE]
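A [mutex: ...] directive says that at most one flag from the listed group may appear. Written out as an explicit check, the idea looks roughly like this. The `mutex_violation` function is a hypothetical, hand-written illustration, not the code Getopt::Declare runs.

```perl
use strict;
use warnings;

# Hand-written illustration of a [mutex: -case -CASE] constraint:
# at most one flag from the group may appear on the command line.
sub mutex_violation {
    my ($seen, @group) = @_;                 # $seen: hashref of flags found
    my @present = grep { $seen->{$_} } @group;
    return @present > 1 ? "@present" : '';   # names of clashing flags, or ''
}

my %seen  = ('-case' => 1, '-CASE' => 1);    # both flags were supplied
my $clash = mutex_violation(\%seen, qw(-case -CASE));
my $msg   = $clash ? "Error: mutually exclusive flags used together: $clash\n"
                   : "OK\n";
print $msg;
```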
The specification passed to Getopt::Declare::new() is
used (almost verbatim) as a "usage" display whenever usage information
is requested. In addition to this information, Getopt::Declare
displays three sample command-lines: one indicating the normal usage (including
any required parameter variables), one indicating how to invoke help, and
one indicating how to determine the current version of the program.
Alan Citterman's excellent Text::CSV package provides a simple mechanism for parsing comma-separated values:
    my $csv = new Text::CSV;
    open CSV_FILE, $datafile or die;
    while (defined(my $line = <CSV_FILE>)) {
        chomp $line;    # drop the trailing newline before parsing
        if ($csv->parse($line)) {
            my ($ID, $name, $score) = $csv->fields();
            process_marks($ID, $name, $score) && next
                if $ID =~ /^[A-Z]\d{7}$/ && $score eq 0+$score;
        }
        print STDERR "Invalid data: $line\n";
    }

Getopt::Declare can mimic this behaviour, somewhat more compactly:
    my $format = q{
        [repeatable]
        <ID:/[A-Z]\d{7}/> , <name:qs> , <score:n>	VALID FORMAT
        		{ process_marks($ID, $name, $score); }
        <line:/.*/>	ELSE ERROR
        		{ print STDERR "Invalid data: $line\n"; }
    };
    new Getopt::Declare ($format, [$datafile]) or die;

More importantly, Getopt::Declare makes it simple to handle variant formats of comma-separated values in the same input stream:
    my $format = q{
        [repeatable]
        <ID:/[A-Z]\d{7}/> , <name:qs> , <score:n>	FORMAT 1
        		{ process_marks($ID, $name, $score); }
        <name:qs> , <ID:/[A-Z]\d{7}/> , <score:n>	FORMAT 2
        		{ process_marks($ID, $name, $score); }
        <ID:/[A-Z]\d{7}/> , <score:n>	FORMAT 3
        		{ process_marks($ID, '???', $score); }
        <line:/.*/>	ELSE ERROR
        		{ print STDERR "Invalid data: $line\n"; }
    };
    new Getopt::Declare ($format, [$datafile]) or die;
Moreover, the approach is easily generalized to provide declarative solutions to a range of similar parsing tasks, where the full power of a recursive parser is not required.
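For comparison, the variant-format example can be hand-coded in core Perl with one regular expression per format, tried in order with an error fallback. This is a sketch under simplifying assumptions (names are bare words or double-quoted strings whose quotes are kept, scores are plain decimals), and `process_marks` is a stub standing in for the hypothetical routine used in the examples; it is meant to show what the declarative specification saves, not how Getopt::Declare works internally.

```perl
use strict;
use warnings;

# Core-Perl sketch (no Getopt::Declare) of "try each format in turn, fall
# back to an error". The patterns are hand-written approximations.
my $ID    = qr/[A-Z]\d{7}/;
my $NAME  = qr/"[^"]*"|[^,"]+?/;         # quoted string or bare word
my $SCORE = qr/-?\d+(?:\.\d+)?/;          # plain decimal number

my @processed;
sub process_marks { push @processed, [@_] }   # hypothetical stand-in

sub handle_line {
    my ($line) = @_;
    if ($line =~ /^\s*($ID)\s*,\s*($NAME)\s*,\s*($SCORE)\s*$/) {       # format 1
        process_marks($1, $2, $3);
    }
    elsif ($line =~ /^\s*($NAME)\s*,\s*($ID)\s*,\s*($SCORE)\s*$/) {    # format 2
        process_marks($2, $1, $3);
    }
    elsif ($line =~ /^\s*($ID)\s*,\s*($SCORE)\s*$/) {                  # format 3
        process_marks($1, '???', $2);
    }
    else {                                                             # error
        print STDERR "Invalid data: $line\n";
    }
}

handle_line($_) for ('A1234567, Smith, 85', '"Jones, Al", B7654321, 92.5',
                     'C1112223, 70', 'garbage');
```

Every new format means another hand-maintained pattern and branch here, whereas the specification above adds one declarative line per format.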
Getopt::Declare is freely available from the author at: