VMS-to-Unix Phrase Book

3.3  Handling Command Line Arguments

Problem

You want your scripts to handle command line arguments like existing Unix commands and programs.

Solution

Use the shell's argument handling variables and related commands to parse the command line.

Discussion

In VMS it is fairly difficult to write DCL programs that can handle command line options just like a built-in command or well-written application. This is because the system library's command line parsing tools, easily available to COBOL, FORTRAN, Pascal, and C programs, are not directly available to a DCL script. And since the DCL command syntax is so rich, it is more difficult to anticipate and manually parse all of the syntactically correct permutations possible.

In contrast, the traditional Unix command line syntax is very simple. This is probably because originally every Unix program had to rely on itself to parse its own command arguments without help from library routines. Because of this simple syntax it is relatively easy to write shell scripts that behave much like a native Unix program or built-in command.

In today's Unix systems there are now tools to aid in command line parsing, including at the shell level. The example presented here, however, does not use these tools because doing it 'the hard way' in this simple program better illustrates general scripting constructs which are also of interest to the new Unix script programmer.

Here is a simple example script that echoes all of the arguments specified on the command line used to invoke it:

    #!/bin/sh

    while [ $# != 0 ]
    do
       echo "<$1>";
       shift;
    done;
    exit 0;
The command line arguments for a script are stored in the special shell variables $1 through $9. The special shell variable $# contains the count of the number of initialized arguments the script was invoked with. Finally, the shell command shift 'pops' the first parameter off the argument stack, making the next argument the new $1 argument. Thus this program marches through each argument, printing it to STDOUT. When $# reaches zero, there are no more arguments.
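
The effect of shift on $# and $1 can be watched directly. The following is a minimal sketch rather than part of the original example; the function name demo_shift is made up for illustration, and it relies on the fact that inside a shell function $#, $1 and shift refer to the function's own arguments.

```shell
#!/bin/sh

# demo_shift is a hypothetical helper used only for this illustration.
demo_shift ()
{
    echo "before: count=$# first=$1"
    shift                       # what was $2 now becomes $1
    echo "after: count=$# first=$1"
}

demo_shift alpha beta gamma
```

Called with three arguments, the count drops from 3 to 2 and the first argument changes from alpha to beta.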

Just like DCL, a parameter is typically a space-delimited word. But quoting (1.5) can be used to group words together into a single parameter. Consider our example program given the following arguments:

unix> ./get-args.sh this that "and these" too
<this>
<that>
<and these>
<too>
unix>
We could also have used the 'strong' (single) quotes to achieve the same effect.
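
As a hedged illustration of the two quoting styles, the sketch below passes the same words through both. The function show_args and the variable word are invented for this example. It also shows the one difference worth remembering: a variable is expanded inside double quotes but not inside single quotes.

```shell
#!/bin/sh

# show_args is a hypothetical stand-in for get-args.sh.
show_args ()
{
    for arg in "$@"             # "$@" keeps each argument as one word
    do
        echo "<$arg>"
    done
}

word=these
show_args this "and $word" 'and these' 'and $word'
```

The second and third arguments both arrive as the single parameter "and these", while the last one arrives with the literal text $word in it.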

vpurge.sh

Before we look at our next example, we need to mention that the Emacs editor, unlike vi, automatically creates a backup version of any file edited. The backup file has the same name as the original with a tilde (~) character appended to the end. In 2.5 we also illustrate that Emacs can be made to create numbered backup files which are similar to VMS's own file version mechanism. As a motivation, the program we will use to illustrate more advanced command line parsing is a utility similar in function to the VMS purge command. You can view the complete program in your current browser window or, if your browser supports it, in a new separate window, which may be more convenient as you read about how the program works.

Let's look at the example program vpurge, starting from the top.

    help () 
    { 
      code 
    }
As briefly mentioned in 1.3, the Bourne shell provides a mechanism for defining shell functions. Once defined, these functions can be invoked as if calling a built-in shell command. Here we are defining a built-in help page for how to use this utility.
    cat <<EOF
    some text here
    EOF
This is an example of a shell 'here' document. The << redirection defines a special label which will be used to delimit text to be captured and sent to the program's STDIN. Whatever text immediately follows starting with the next line, until that label is seen as the only text on a line by itself, is captured. Care must be taken not to accidentally add one or more spaces to the line that closes the here document, or inside the label itself. When this happens, much head scratching results. To illustrate, let's represent the space character in the next example as [sp]. The command
    cat <<EOF[sp]
    some text here
    EOF
still works, because the shell discards white space that trails the label on the command line. But
    cat <<EOF
    some text here
    EOF[sp]
will not work as expected: because of that stray space the closing line is no longer identical to the label, so the shell does not recognize it as the end of the text.
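
Putting the function definition and the here document together, a help function might look roughly like the following. This is only a sketch: the usage text and the variable prog are invented for illustration and are not taken from vpurge. Note that the closing label sits alone at the start of its line with nothing after it.

```shell
#!/bin/sh

prog=vpurge                     # hypothetical name used only in this sketch

help ()
{
    # Variables are expanded inside an unquoted here document.
    cat <<EOF
usage: $prog [-h] [file ...]
  -h   display this help text
EOF
}

help
```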

Our definition of the die shell function gives us a convenient way to terminate our shell program with an error message and an appropriate error exit code. The command:

    /bin/echo "! `basename $0` ($1): $2" >&2
invokes the external echo command rather than the shell's built-in echo. This is not strictly necessary and in fact will slow down the script slightly. But this example illustrates that on most Unix systems two versions of echo exist; one is a built-in for whatever shell you are using, and the other is a utility program by the same name. It is left as a homework assignment to investigate the differences between these two commands and when you'd want to use one over the other. In this particular example either would work as desired.

What we are doing in this command is taking the script's own name, as invoked on the command line, from the special shell variable $0 and using the basename command to return only the script's filename (i.e. minus any path information). The $0 variable always contains the filespec used to actually invoke the script. Contrast this with the first and second parameters ($1 and $2), which contain the argument values being passed to this shell function. Also note that the message is being written to STDERR (>&2) rather than the default of STDOUT.

    shift 2
This shell command 'pops' the first two parameters off our shell function's argument list. What used to be argument $3 (if it exists) is now $1, etc.
    while [ $# -gt 0 ]
    do
       commands
    done
We've seen the special shell variable $# before. It contains the number of initialized arguments, but in this context it counts the arguments to our shell function. As before, this value changes as we use shift to pop off function arguments. So this loop keeps printing argument $1 to STDERR and popping off that argument, making the next argument $1, until there are no more argument values.
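
Assembled from the fragments discussed above, a die function could look something like this. It is a reconstruction under stated assumptions, not the actual vpurge source: it assumes the call form die code message [more lines ...], and the variable names me and code and the exact layout of the extra lines are guesses. It uses the built-in echo for brevity.

```shell
#!/bin/sh

# Reconstruction for illustration only; the real function may differ.
die ()
{
    code=$1                          # assumed: first argument is the exit code
    me=`basename "$0"`               # script name without any path
    echo "! $me ($1): $2" >&2
    shift 2                          # drop the code and the first message
    while [ $# -gt 0 ]
    do
        echo "!   $1" >&2            # each remaining argument on its own line
        shift
    done
    exit $code
}

# Run in a subshell so that the exit does not end the calling shell.
( die 3 "invalid file specified: junk~" "try -h for help" ) ||
    echo "die exited with status $?"
```

The two message lines go to STDERR, and the final echo reports status 3.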

In the 'main line' of the program:

    verbose=off
    interactive=off
    test=off
    force=off
These are the shell variables that will be used to implement the options described in the help text.
    while [ $# -gt 0 ]
    do
       commands
    done
This is the same construct used in our shell function, only now the context is for the script itself, so this will be testing for command line arguments.
    case "$1" in
       string) command; command;;
       string) command; command;;
         :
            *) command; command;;
    esac
In a shell case statement string is a literal character string to be matched in the referenced shell variable; the right parenthesis is used as a delimiter between the matching string and the first statement for that case. The same wildcard characters used for file matching may be used here. Remember that these wildcard characters do not have the same semantics as regular expressions used by grep and other Unix utilities. It's usually a good idea to have a last case of *) to catch any unexpected values.
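
To make the wildcard matching concrete, here is a small sketch. The function classify and its categories are invented for illustration. The patterns are file-matching wildcards: * matches any run of characters, ? matches exactly one character, and [ch] matches one character from the set.

```shell
#!/bin/sh

# classify is a hypothetical helper, not part of vpurge.
classify ()
{
    case "$1" in
        *~)      echo "backup file";;
        *.[ch])  echo "C source or header";;
        -?)      echo "single letter option";;
        *)       echo "something else";;      # catch-all, always last
    esac
}

classify notes.txt~
classify main.c
classify -v
classify README
```

The cases are tried from top to bottom and only the first match runs, which is why the catch-all *) must come last.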

In the while loop surrounding the case statement, we are matching expected command line options which follow the original Unix convention of being a minus followed by a single letter. Notice that we do not handle the newer but common convention of allowing single letter options to be concatenated together behind a single hyphen (for example -abc to invoke the -a, -b, and -c options). The shift command is used as before to pop off arguments. The break command causes an exit from both the case statement and the while loop. The case --) is used to implement the common Unix convention of using double hyphens to signal the end of options and the beginning of parameters. This is necessary should the user need to enter a filename that happens to start with a hyphen (a deprecated practice).
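
The loop described above might be sketched as follows. The option letters -v, -i, -t and -f are assumptions chosen to match the variable names, and the set -- line merely simulates a command line so that the fragment can be run on its own; neither is taken from the actual vpurge source.

```shell
#!/bin/sh

verbose=off
interactive=off
test=off
force=off

# Simulate: vpurge -v -f -- -odd-name~ plain.txt
set -- -v -f -- -odd-name~ plain.txt

while [ $# -gt 0 ]
do
    case "$1" in
        -v) verbose=on;     shift;;
        -i) interactive=on; shift;;
        -t) test=on;        shift;;
        -f) force=on;       shift;;
        --) shift; break;;                     # end of options
        -*) echo "unknown option: $1" >&2; shift;;
         *) break;;                            # first non-option argument
    esac
done

nparams=$#                                     # what is left are parameters
echo "verbose=$verbose interactive=$interactive test=$test force=$force"
echo "parameters left: $nparams"
```

Because of the -- argument, the name -odd-name~ is left alone as a parameter instead of being rejected as an unknown option.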

    if [ $# -gt 0 ]
    then allFiles="$@"
    else allFiles="`(ls -1 | grep -v '~$') 2>/dev/null`"
    fi;
After the while loop has processed all the options, any remaining arguments must be parameters for the script to process, in this case, filenames. If there are no more parameters, then we default to the special case of matching all files, just like the VMS purge command. Here we use the ls command to generate a list of all the files in the current directory and filter out any that end with a tilde (~). Any warning messages generated are thrown away. Is this necessary? Is it a good idea?
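
To see what the default list contains, the same pipeline can be tried in a scratch directory. This is only a sketch: the directory name and the file names are made up, and the pipeline is run inside the backticks after a cd so that the current directory of the calling shell is left alone.

```shell
#!/bin/sh

demo="${TMPDIR:-/tmp}/vpurge-demo.$$"      # hypothetical scratch directory
mkdir "$demo" || exit 1
touch "$demo/report.txt" "$demo/report.txt~" "$demo/notes" "$demo/notes~"

# Same ls | grep filter as above: drop every name that ends in a tilde.
allFiles=`(cd "$demo" && ls -1 | grep -v '~$') 2>/dev/null`
echo "$allFiles"

rm -rf "$demo"
```

Only notes and report.txt survive the filter; the two backup files are excluded from the list of files to purge.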

    for nxtFile in $allFiles
    do
      case "$nxtFile" in
        *~) die 1 "invalid file specified: $nxtFile" "try -h for help";;
      esac
    done
The for loop can be used to iterate over a list of values. The variable allFiles should contain a list of zero or more filenames delimited by white space (spaces or newlines). The variable nxtFile will be assigned each of these values in turn until the end of the list is reached or the break command is used.
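
A minimal sketch of this iteration, with made-up values and a hypothetical variable seen that collects what the loop has visited:

```shell
#!/bin/sh

allFiles="alpha beta gamma delta"
seen=""

for nxtFile in $allFiles          # unquoted on purpose: split into words
do
    case "$nxtFile" in
        gamma) break;;            # leave the loop early
    esac
    seen="$seen<$nxtFile>"
done

echo "$seen"
```

The loop stops at gamma, so only alpha and beta are collected. Because the list is split on white space, a filename that itself contains a space would be treated as two separate values.
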
    hit="`ls $next~ 2>/dev/null`"
    hit="$hit `(ls $next.~*~ 2>/dev/null | egrep '\.~[0-9]+~$')`"
Backticks are used to capture the output from the ls command. We are looking for files with the naming patterns of
    foobar~
    foobar.~23~
To catch the latter we filter the more general .~*~ wildcard pattern using egrep, looking specifically for a period followed by a tilde, followed by one or more digits, followed by a tilde, followed by the end of the string. Because of the way we combine the previous value of hit with the new output, we are always introducing a space (can you see where?). So we test for a single space to detect an empty result.
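
The two lookups can be tried against a scratch directory, as in the sketch below. The directory and file names are invented, the intermediate variable numbered is introduced only to keep the quoting simple, and grep -E is used as the POSIX spelling of egrep.

```shell
#!/bin/sh

demo="${TMPDIR:-/tmp}/vpurge-hit.$$"       # hypothetical scratch directory
mkdir "$demo" || exit 1
touch "$demo/foobar" "$demo/foobar~" "$demo/foobar.~23~" "$demo/foobar.~old~"

next=foobar

# Simple backup: foobar~
hit=`cd "$demo" && ls $next~ 2>/dev/null`

# Numbered backups: the wildcard also matches foobar.~old~, so the
# regular expression narrows it to digits only.
numbered=`cd "$demo" && ls $next.~*~ 2>/dev/null | grep -E '\.~[0-9]+~$'`

hit="$hit $numbered"                       # this join always adds one space
echo "<$hit>"

rm -rf "$demo"
```

Here hit ends up holding foobar~ and foobar.~23~, while foobar.~old~ is rejected by the filter. Had neither lookup found anything, hit would consist of just the joining space.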

Finally, we remove the files we have matched, if any, applying our own command line options as appropriate.
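
How the options might be applied at removal time is sketched below. Everything here is an assumption made for illustration, since the text does not show this part of vpurge: the helper name purge_one is invented, test=on is taken to mean "report only", force and interactive are assumed to map onto rm's -f and -i flags, and verbose is assumed to narrate each removal.

```shell
#!/bin/sh

verbose=on
interactive=off
test=on
force=off

# purge_one is a hypothetical helper: purge_one backup-file
purge_one ()
{
    if [ "$test" = on ]
    then
        echo "would remove $1"             # assumed meaning of test mode
        return 0
    fi
    opts=""
    if [ "$force" = on ];       then opts="$opts -f"; fi
    if [ "$interactive" = on ]; then opts="$opts -i"; fi
    if [ "$verbose" = on ];     then echo "removing $1"; fi
    rm $opts -- "$1"                       # -- protects names starting with -
}

demo="${TMPDIR:-/tmp}/vpurge-rm.$$"        # scratch directory for the sketch
mkdir "$demo" || exit 1
touch "$demo/foobar~"

purge_one "$demo/foobar~"                  # test=on: nothing is deleted
test=off
purge_one "$demo/foobar~"                  # now the file really is removed

rmdir "$demo"
```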

See Also

1.5 - Quoting;
2.3 - Matching Multiple Files Using Wildcards;
Chapter 10 of Unix for OpenVMS Users;
Chapter 46 of Unix Power Tools.



Colophon:
This page maintained by:
    Bill.Costa@unh.edu
    of the Enterprise Computing Group
    in the dept of Computing & Information Services
    at the University of New Hampshire

Created:  31-Jan-2001 BC
Revised:  27-Mar-2001 BC