Wednesday, July 5, 2017

How to support different spoken languages in FreeDOS programs

All summer, we are doing the FreeDOS Summer Coding Blog Challenge, where we ask you to write and share some tips that new developers can use to get started in FreeDOS. I thought I'd start by writing about the Cats and Kitten libraries to support different spoken languages in your programs.
When you write a new program, you probably don't think about spoken languages other than your own. I am a native English speaker, so when I write a new program, all of my error messages and outputted text is in English. And that works well for the many people who have English as their native language, or who know enough English as a second language to get by. But what about others who don't speak English, or who only know a little English? They can't understand what my programs are saying.

So in the early 2000s, I decided FreeDOS should have a system to support different spoken languages. This type of multi-language support is called Internationalization (or "I18N," because there are eighteen letters between the "I" and "N").

I looked around at how other popular systems had implemented multiple spoken languages. The standard Unix method is with a set of C library functions built around language "catalogs." A catalog is just a file that contains all the error messages and other printed text from a program. In the Unix method, you have a different catalog for every language: English, German, Italian, Spanish, French, and so on.

My first FreeDOS I18N library was a stripped-down implementation of the Unix method. This is a very simple method. Every time you want to print some text in the user's preferred language, you first look up the message string from the catalog using the catgets() function—so named because it will get a string from a message catalog.

In Unix, you use catgets() this way:

  string = catgets(cat, set, num, "Hello world");

This fetches message string number num from message set set, from language catalog cat. The organization of messages into sets allows developers to group status messages into one set (say, set 1), error messages into another set (such as set 7), and so on.

Before calling catgets(), you need to open the appropriate language catalog with a previous call to catopen(). Typically, you have one catalog per language, so you have a different language file for English, another for Spanish, etc. Before your program exits, you close any message catalogs with calls to catclose().

If the string doesn't exist in the message catalog, catgets() returns a default string that you passed to it; in this case, the default string was "Hello world."

I implemented a simplified version of these functions in a FreeDOS Internationalization library called Cats. To save on memory, Cats supported only one open catalog at a time. If you tried to open a second message catalog, the call to catopen() would return an error (-1).

Message catalogs were very simple under Cats. Implemented as plain text files, Cats loaded the entire message catalog into memory at run-time. In this way, you didn't need to recompile the program just to support other languages; you just added another message catalog file for the new language. An English message catalog for a simple program might look like this:

  1.1:Hello world
  7.4:Failure writing to drive A:

The same message catalog in Spanish might look like this:

  1.1:Hola mundo
  7.4:Fallo al escribir en la unidad A:

For example, the string "Failure writing to drive A:" is message number 4 in set 7.

The Cats library was a simple way for developers to add support for different languages in their programs, written in C. And because Cats implemented a Unix standard, it made porting Unix tools to FreeDOS much easier. Once you added the calls to catgets(), all you needed to support other languages was a message catalog that someone translated to a different language. And I kept the Cats message catalogs very simple; they were plain text files.

Cats was a neat innovation, but loading the messages into memory was cumbersome because it used streams. I admit that I just did it the quick and easy way. Fortunately, other FreeDOS developers improved on Cats to optimize the loading of catalogs, reduce memory footprint, and add other enhancements. The new library was noticeably smaller, so we renamed it Kitten, also written in C.

Because of the optimizations, Kitten used a slightly different way to retrieve messages. Since Cats only supported one message catalog at a time anyway, Kitten removed the cat catalog identifier. Once you open a message catalog with kittenopen(), all calls to kittengets() assume that message catalog. You only need to make a single call to kittenclose() before you end the program.

Using Kitten made it much easier to support different spoken languages in FreeDOS programs. Here's a trivial example to put it all together:

  /* test.c */

  #include <stdio.h>
  #include <stdlib.h>
  #include "kitten.h"

  int
  main(void)
  {
    char *s;
 
    kittenopen("test");
 
    s = kittengets(7, 4, "Failure writing to drive A:");
    puts(s);
 
    kittenclose();
    exit(0);
  }

This loads a message catalog "test" into memory, then retrieves message 4 from set 7 into a string pointer s. If message 4 in set 7 isn't found, kittengets() returns the default string "Failure writing to drive A:" into s. The program prints the message to the user, then closes the message catalog before exiting.

Typically, you name the message catalog after the program's name. So the message catalog for the FreeDOS CHOICE program is "choice". Kitten searches for the language file in a few locations on disk, and always appends the value of the %LANG% environment variable, which is typically set to the two-letter language abbreviation: "en" for English or "es" for Spanish. The DOS filename for the English version of the "choice" language catalog is CHOICE.EN, and the Spanish language version is CHOICE.ES.

A limitation to Cats and Kitten is that it can only support single-byte character sets supported by DOS. So Cats and Kitten cannot support Chinese or Japanese, but should do fine with most other languages.
You can find Cats and Kitten at the FreeDOS files archive on ibiblio, under devel/libs/cats/

We made three revisions to Kitten, so the latest version is Kitten revision C, which you can download directly as kitten-c.zip.

FreeDOS contributor Mateusz "Fox" Viste wrote a similar implementation for Pascal programs, called Cubs. You can also find it on ibiblio, under devel/libs/cats/cubs/

FreeDOS developer Eric Auer created a command-line version of Kitten, named Localize, so you can provide internationalization support for DOS Batch (BAT) files. Find it on ibiblio, under devel/libs/cats/localize/

No comments:

Post a Comment