Wednesday, October 17, 2018

Code review: parsing the command line

This is a continuation of the Code Review article series.

At its core, DOS is a command line operating system. Sure, many users might only use the command line long enough to launch their favorite DOS application or game. But for many FreeDOS users and developers, the command line is where it's at.

When I first started working on FreeDOS, I wanted to make sure FreeDOS programs could parse the command line easily and consistently, so that every FreeDOS program used the same syntax, or as close to it as possible.

Let's do a quick review: on DOS systems, most command line programs and utilities use the slash character (/) to start a command line option. Options can be single-letter or single-character options, or they can be entire words. For example, a standard option to tell the program to display a "Help" page is /?. Depending on the program, you might have other command line options, such as /A (perhaps to indicate "all") or /FORCE (to force an action). And on DOS, the options are usually case-insensitive, so /A and /a would usually be treated the same.

Under Unix and Linux systems, the standard way to parse command line arguments is the getopt() function from the C library. This function lets the program parse the command line for options that begin with a hyphen (-) such as -a or -o. An option that takes an argument can have it attached directly, such as -ffile.txt, or given as the following argument, such as -f file.txt. GNU extended getopt() to provide a getopt_long() function, which makes command line options more readable. Long options start with a double dash (--) such as --print-all, can take an argument after an equal sign (=) such as --file=file.txt, and can include a short option alternative such as -a (same as --print-all).

For FreeDOS, I didn't want to re-invent the wheel. Why write a completely new library when I can modify something that already does the job? I modified the GNU getopt library to provide a DOS version of getopt() and getopt_long(). I also did some code cleanup to remove some Unix-y things and make the library more suitable for DOS. You can find version 1.2 of the FreeDOS getopt library at ibiblio under files/devel/libs/getopt.

The Readme provides a quick overview and lists the changes from the GNU version, including:

The getopt_long() function works like getopt() except that it also
accepts long options, started out by a slash char.  Both long and
short options may take arguments, which are set off by equals ('=').

Short options are case sensitive.  Long options are not.  This is a
compromise from the UNIX getopt().

The getopt_long() function returns the option character if the option
was found successfully, ':' if there was a missing argument to one of
the options, '?' for an unknown option character, or EOF for the end
of the option list.

getopt_long_only() returns the option character when a short option
is recognized.  For a long option, they return val if flag is NULL,
and 0 otherwise.  Error and EOF returns are the same as for getopt(),
plus '?' for an ambiguous match or an extraneous parameter.

See the foo.c sample program to see how to use getopt_long.

----------------------------------------------------------------------
CHANGES FROM THE GNU getopt_long() FUNCTION:
----------------------------------------------------------------------

I have not yet implemented all features from GNU getopt_long:

 - flag is not used in longopts.

 - longindex is not yet used.

These should be implemented in a future version of getopt_long.

----------------------------------------------------------------------
OTHER ISSUES FOR THE getopt_long() FUNCTION:
----------------------------------------------------------------------

Options must be separated on the command line.  Combining options is
not allowed.  You must write: "foo /a /v" and not "foo /av".  The
second version would try to match a long option called "/av".

Also, you must write: "foo /a /v" and not "foo /a/v".  The second
version would try to match a long option called "/a/v".

I should note this is not a perfect replacement for standard MS-DOS command line parsing. In classic MS-DOS, command line options always start with a slash character, whether they are long or short options, and you don't need to separate them with spaces. So an MS-DOS program that used both the /A and /V options would interpret /A/V and /A /V as the same. But for the purpose of providing a standard command line experience, I figured this was a good trade-off.

You use the FreeDOS getopt just like the GNU getopt. Here's a sample program:

/*
  The following example program, adapted from the GNU getopt_long
  manual, illustrates the use of getopt_long() with most of its
  features.
*/

/* This program should compile under Linux and DOS equally well. */

#if defined(__unix__) || defined(unix)
#define _GNU_SOURCE             /* must come before any system header */
#endif

#include <stdio.h>
#include <stdlib.h>             /* exit() is used on every platform */

#if defined(__unix__) || defined(unix)
#include <getopt.h>             /* GNU getopt_long() */
#else /* assumes DOS */
#include "getopt_l.h"           /* FreeDOS getopt_long() */
#endif

int
main (int argc, char **argv)
{
  int c;
  int option_index = 0;

  /* name, has_arg (0 = no, 1 = yes), flag, val (the short option) */
  static struct option long_options[] =
  {
    {"help", 0, 0, 'h'},
    {"verbose", 0, 0, 'v'},
    {"extra", 1, 0, 'x'},
    {0, 0, 0, 0}                /* all zeroes ends the list */
  };

  while ((c = getopt_long (argc, argv, "x:hv",
                           long_options, &option_index)) != EOF)
    {
      switch (c)
        {
        case 'x':
          printf ("x -> option = %s\n", optarg);
          break;
        case 'h':
          printf ("print help\n");
          break;
        case 'v':
          printf ("verbose mode = on\n");
          break;
        default:
          printf ("?? getopt returned character %c (optopt=%c)\n", c, optopt);
        }
    }

  /* anything left over is a regular (non-option) argument */
  if (optind < argc)
    {
      printf ("non-option ARGV-elements: ");
      while (optind < argc)
        printf ("%s ", argv[optind++]);
      printf ("\n");
    }

  exit (0);
}

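The key is that you define the long options in a structured list. Each item gives the long option name, whether the option takes an argument (0 for no, 1 for yes), a flag pointer (not used by the FreeDOS version, so set it to 0), and the short option character that getopt_long() returns for it. End the list with an entry of all zeroes.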

Each time you call getopt_long(), it returns the next option on the command line, or EOF when there are no more options. It also does some work behind the scenes to re-order the command line, in case the user put some command line options after regular arguments. For each option returned, use a switch statement to trigger an action, such as setting a flag.

After all command line options are processed, the getopt library sets a variable (optind) to the index in the command line vector (in this case, argv) of the first regular argument, that is, the first one that is not an option. In many programs, the regular arguments are files that the program will act against.

1 comment:

  1. What size does a simple, compiled example program have?

    For small utilities that only need one parameter, it is quicker, smaller and easier to use strncmp(), like this:

    ...
    if(argc == 2) {
    if(!strncmp("/force", argv[1], 6)) { do something... }
    ...

    This only works if only one argument is used. For parsing a few more, this could be sufficient:

    ...
    while(argc > 1) {
    argc--;
    if(!strncmp("/bla", argv[argc], ...
    ...
    }
