String Resources

Many modules use human-readable strings that are normally translated into other languages as part of the localization process. However, the practice of "hardcoding" these strings inside your program code makes localization quite difficult, if not impossible. Therefore, you should store your localizable text resources in separate files. This document describes the APIs used to obtain such strings from resource files. It also describes the format of the resource files.

This document does not discuss UI resource files. (See XUI Window Language.)

API

First, you create an instance of the string bundle factory. See mozilla/intl/strres/tests for some sample code.

Next, you get a bundle of strings with the following method (open question: should this be rewritten in XPIDL?):

    class nsIStringBundleFactory : public nsIFactory
    {
    public:
      NS_IMETHOD CreateBundle(nsIURL* aURL, const nsString& aLocale,
                              nsIStringBundle** aResult) = 0;
    };

The URL argument can be local or remote (e.g. resource: or http:).

Currently, the locale argument is a simple string. In the future, we will use nsILocale. See the locale spec for details about nsILocale. We will probably obtain the Accept-Language property from the locale object and use it to look up string resource files. The precise fallback mechanism (for choosing among the languages in that list) still needs to be designed.
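Since the fallback mechanism is still undecided, here is one possible policy, sketched purely for illustration and not a proposal from this spec. A comma-separated Accept-Language value is expanded into an ordered list of locales to try: each tag as written, then its primary language subtag, and finally a locale-neutral default. The function name BuildLocaleFallbackList is hypothetical, and q-values are simply discarded (the written order is trusted) to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper illustrating ONE possible fallback policy.
// Input: an Accept-Language-style list such as "ja, en-US;q=0.5".
// Output: locales to try in order: each tag, then its primary subtag,
// then "" meaning the locale-neutral default file. Duplicates are skipped.
// Assumption: ";q=..." parameters are dropped and list order is trusted.
std::vector<std::string> BuildLocaleFallbackList(const std::string& acceptLanguage) {
  std::vector<std::string> result;
  auto addUnique = [&result](const std::string& tag) {
    for (const auto& existing : result)
      if (existing == tag) return;
    result.push_back(tag);
  };

  std::size_t start = 0;
  while (start <= acceptLanguage.size()) {
    std::size_t comma = acceptLanguage.find(',', start);
    if (comma == std::string::npos) comma = acceptLanguage.size();
    std::string item = acceptLanguage.substr(start, comma - start);

    // Drop any ";q=..." parameter, then trim surrounding spaces.
    std::size_t semi = item.find(';');
    if (semi != std::string::npos) item.erase(semi);
    std::size_t first = item.find_first_not_of(' ');
    if (first != std::string::npos) {
      std::size_t last = item.find_last_not_of(' ');
      std::string tag = item.substr(first, last - first + 1);
      addUnique(tag);
      // "en-US" also falls back to plain "en".
      std::size_t dash = tag.find('-');
      if (dash != std::string::npos) addUnique(tag.substr(0, dash));
    }
    start = comma + 1;
  }
  addUnique("");  // Last resort: the default (untranslated) file.
  return result;
}
```

With this policy, "ja, en-US;q=0.5" yields ja, en-US, en, and then the default file.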

This is how you get individual strings from the bundle:

    class nsIStringBundle : public nsISupports
    {
    public:
      NS_IMETHOD GetStringFromID(PRInt32 aID, nsString& aResult) = 0;
      NS_IMETHOD GetStringFromName(const nsString& aName,
                                   nsString& aResult) = 0;
    };
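To make the two lookup paths concrete, here is a simplified, hypothetical in-memory stand-in for nsIStringBundle. It is not the real implementation: std::string replaces nsString, a bool replaces the nsresult status, and the class name SimpleStringBundle and its Add* methods are invented for illustration. It also shows the trade-off discussed later in this document: integer IDs and string names are looked up in separate tables.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for nsIStringBundle (illustration only).
class SimpleStringBundle {
 public:
  void AddById(int id, const std::string& value) { byId_[id] = value; }
  void AddByName(const std::string& name, const std::string& value) { byName_[name] = value; }

  // Mirrors GetStringFromID: returns false if the ID is unknown.
  bool GetStringFromID(int id, std::string& result) const {
    auto it = byId_.find(id);
    if (it == byId_.end()) return false;
    result = it->second;
    return true;
  }

  // Mirrors GetStringFromName: returns false if the name is unknown.
  bool GetStringFromName(const std::string& name, std::string& result) const {
    auto it = byName_.find(name);
    if (it == byName_.end()) return false;
    result = it->second;
    return true;
  }

 private:
  std::unordered_map<int, std::string> byId_;
  std::unordered_map<std::string, std::string> byName_;
};
```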

String Resource File Format

We will use the Java property file format. Here is an example using string IDs:

    # arbitrary comment
    ## @loc a comment for the localizer (translator)
    ## @doc a comment by the documentation group
    cannotFindFile = Netscape is unable to find the file or directory named %s.

The ## comments are similar to Java's /** doc comments (JavaDoc). A tool removes all lines beginning with # to produce the final, compact deliverable.
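The comment-stripping step could look roughly like the following sketch. The function name StripComments is hypothetical. It assumes a comment is any line whose first non-blank character is '#' (the same rule Java property files use), which covers both # and ## lines; blank lines are dropped too.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical comment-stripping tool: keeps only key/value lines.
// A line is a comment if its first non-blank character is '#'.
std::string StripComments(const std::string& source) {
  std::istringstream in(source);
  std::ostringstream out;
  std::string line;
  while (std::getline(in, line)) {
    std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos || line[first] == '#') continue;  // blank or comment
    out << line << '\n';
  }
  return out.str();
}
```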

And here is the same example using integer IDs:

    # arbitrary comment
    ## @name NAV_CANNOT_FIND_FILE
    ## @loc a comment for the localizer (translator)
    ## @doc a comment by the documentation group
    1234 = Netscape is unable to find the file or directory named %s.

The @name attribute can be used to generate #defines that C/C++ programmers can use in their source code for readability.
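A header generator for @name could be sketched as follows. GenerateDefines is a hypothetical name, and the sketch assumes the layout of the integer-ID example above: a "## @name NAME" comment applies to the next "ID = text" line, with other # lines (such as @loc and @doc) allowed in between. The trailing L suffix matches the header example later in this document.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical tool: turns "## @name NAME" + "1234 = ..." into
// "#define NAME 1234L". Entries without @name produce no #define.
std::string GenerateDefines(const std::string& source) {
  const std::string nameTag = "## @name ";
  std::istringstream in(source);
  std::ostringstream out;
  std::string line;
  std::string pendingName;
  while (std::getline(in, line)) {
    if (line.compare(0, nameTag.size(), nameTag) == 0) {
      pendingName = line.substr(nameTag.size());
      pendingName.erase(pendingName.find_last_not_of(" \t\r") + 1);  // trim right
      continue;
    }
    if (line.empty() || line[0] == '#') continue;  // other comments keep pendingName
    std::size_t eq = line.find('=');
    if (eq != std::string::npos && !pendingName.empty()) {
      std::string id = line.substr(0, eq);
      id.erase(id.find_last_not_of(" \t") + 1);  // "1234 " -> "1234"
      out << "#define " << pendingName << ' ' << id << "L\n";
    }
    pendingName.clear();  // @name applies to one entry only
  }
  return out.str();
}
```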

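Note that the order of subject, verb and object in a sentence depends on the language, so a translator may need to reorder the arguments. If you use printf-style formatting, it is better to use numbered arguments (e.g. %1$s) so that the translated string can change their order.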
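As an illustration of numbered arguments, the following sketch relies on the "%n$s" conversion, which is a POSIX extension to printf and not part of ISO C or C++; it therefore assumes a POSIX C library. The helper FormatTwo and the reordered English template are hypothetical, standing in for a translated string that needs a different word order. POSIX requires that, once numbered arguments are used, every argument from the first up to the highest referenced one appears in the format.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a two-argument template that uses POSIX
// numbered arguments (%1$s, %2$s). Output longer than 255 chars is truncated.
std::string FormatTwo(const char* pattern, const char* first, const char* second) {
  char buffer[256];
  std::snprintf(buffer, sizeof buffer, pattern, first, second);
  return buffer;
}
```

The calling code passes the arguments in the same order no matter which template the localizer supplies: FormatTwo("The file %1$s was not found in %2$s.", "a.txt", "docs") and FormatTwo("In %2$s, the file %1$s was not found.", "a.txt", "docs") both work.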

The resource file must be in US-ASCII (all bytes less than 128). Non-ASCII characters are represented as \uXXXX, where XXXX is a 4-digit hexadecimal UTF-16 code unit. Non-ASCII characters are only permitted on the right-hand side of the equals (=) sign.
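A converter that produces this escaped form from a UTF-16 value might look like the sketch below (the name EscapeToAscii is hypothetical). Characters outside the Basic Multilingual Plane become two escapes, one per surrogate. As in Java property files, a literal backslash is doubled so that it is not mistaken for the start of an escape; that detail is an assumption, since this document does not specify it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical converter: UTF-16 value -> US-ASCII with \uXXXX escapes.
// Code units below 0x80 are copied; every other code unit (including each
// half of a surrogate pair) becomes \u followed by 4 uppercase hex digits.
std::string EscapeToAscii(const std::u16string& value) {
  std::string out;
  for (char16_t unit : value) {
    if (unit == u'\\') {
      out += "\\\\";  // literal backslash, doubled as in Java properties
    } else if (unit < 0x80) {
      out += static_cast<char>(unit);
    } else {
      char buf[7];  // "\u" + 4 hex digits + NUL
      std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(unit));
      out += buf;
    }
  }
  return out;
}
```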

File Naming Convention

Java uses file names like awtLocalization_ja.properties. The extension (.properties) is left unchanged, since some tools may depend on it, so the language code is inserted before the extension, after an underscore (_).
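If we follow Java's lead, the naming rule reduces to a small string transformation. In this sketch, LocalizedFileName is a hypothetical name, the base name is assumed to be a bare file name (not a path), and hyphens in the locale are turned into underscores to match Java names such as MyResources_en_US.properties. An empty locale returns the default file name unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: "awtLocalization.properties" + "ja"
//   -> "awtLocalization_ja.properties".
// Assumes baseName is a bare file name, not a path.
std::string LocalizedFileName(const std::string& baseName, std::string locale) {
  if (locale.empty()) return baseName;  // default (untranslated) file
  for (char& c : locale)
    if (c == '-') c = '_';  // "en-US" -> "en_US", as in Java file names
  std::size_t dot = baseName.rfind('.');
  if (dot == std::string::npos) return baseName + "_" + locale;
  return baseName.substr(0, dot) + "_" + locale + baseName.substr(dot);
}
```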

Should we follow Java's lead? Or create subdirectories for languages? Should there be any difference between http: and file: (or resource:) URLs?

Leveraging Old Translations

We are considering writing a tool to migrate some of the strings from old versions of the product. This tool would also generate information that a leveraging tool could use to reuse the translations from the old versions. For example:

    ## @oldid 1234
    cannotFindFile = Netscape is unable to find the file or directory named %s.

This would certainly work for all the strings in the old allxpstr.h. It may even be possible to do this for the WINFE-specific XP_GetString strings and the WINFE dialog strings. This needs to be investigated.

This would also allow some modules to continue to use the old XP_GetString in the short term, migrating to the new API in the long term.

String vs Integer IDs

The benefits of integer IDs:

speed: array indexing is faster than hash tables
size: integers are smaller than readable strings

The benefits of string IDs:

groupability: e.g. connect.refused, connect.timeout
insertability: can insert new string without changing IDs

Can We Get the Best of Both Worlds?

It should be possible to write a tool that works with a file format that gives us groupability and insertability, while retaining speed and compactness in the code. For example:

    connect.refused = Connection was refused
    connect.refused.id = 1234
    connect.timeout = Connection timed out
    connect.timeout.id = 5678

A person inserts new strings at the desired location; a tool later finds the next available integer ID and adds it as a *.id entry. Then another tool generates the integer-ID-based file for product delivery:

    ...
    1234=Connection was refused
    ...
    5678=Connection timed out
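The two tool steps above can be sketched as follows. Everything here is hypothetical: the StringEntry record, the use of 0 to mean "no .id line yet", and the assumption that "next available" means one greater than the largest ID already in use. Note that this simple rule is exactly what causes the collision problem discussed below when several people run it on copies of the same file.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one string in the human-edited source file.
struct StringEntry {
  std::string name;   // e.g. "connect.refused"
  std::string value;  // e.g. "Connection was refused"
  int id;             // 0 means "no .id line yet"
};

// Step 1: give every entry without an ID the next available one,
// assuming "next available" = largest ID in use + 1.
void AssignMissingIds(std::vector<StringEntry>& entries) {
  int maxId = 0;
  for (const auto& e : entries)
    if (e.id > maxId) maxId = e.id;
  for (auto& e : entries)
    if (e.id == 0) e.id = ++maxId;
}

// Step 2: emit the integer-ID delivery file ("ID=text" per line),
// in source order.
std::string GenerateDeliveryFile(const std::vector<StringEntry>& entries) {
  std::string out;
  for (const auto& e : entries)
    out += std::to_string(e.id) + "=" + e.value + "\n";
  return out;
}
```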

The tool could also generate a header file for C/C++ programmers:

    ...
    #define NET_CONNECT_REFUSED 1234L
    ...
    #define NET_CONNECT_TIMEOUT 5678L
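The macro names in this header can be derived mechanically from the dotted string IDs. In this sketch, MacroName and DefineLine are hypothetical names, and the module prefix ("NET" in the example above) is assumed to be supplied by whoever runs the tool, since the string IDs themselves do not contain it.

```cpp
#include <cctype>
#include <string>

// Hypothetical: "connect.refused" with prefix "NET" -> "NET_CONNECT_REFUSED".
std::string MacroName(const std::string& prefix, const std::string& stringId) {
  std::string name = prefix + "_";
  for (char c : stringId)
    name += (c == '.') ? '_' : static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return name;
}

// Hypothetical: builds one header line, e.g. "#define NET_CONNECT_REFUSED 1234L".
std::string DefineLine(const std::string& prefix, const std::string& stringId, int id) {
  return "#define " + MacroName(prefix, stringId) + " " + std::to_string(id) + "L";
}
```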

The tools could be written in a ubiquitous language such as Perl and run locally, or they could be provided as Web-based tools (e.g. CGI scripts) that run remotely.

People who don't want to use or wait for such tools can assign the IDs manually.

There are also some logistical problems with such tools. What if several programmers are working on one file at the same time? Their tools might generate the same integer ID for different strings. I am sure we could come up with some system to manage such a process and avoid collisions, but do we want to go there?

Tools also add complexity. We may want to stick to simple text editors.

Questions

Do we really want to pass a URL as an argument? How does this mesh with DCOM?

Do we really want to return the bundle as an interface? If we ever decide to use DCOM, that means that we need to go across the Net for individual strings.

What is the philosophy behind resource: URLs? Are they always equivalent to file: URLs? Or are they sometimes remote resources?

Do we need the concept of a "path"? Like Java's CLASSPATH, X Windows' file search path, etc?

What are the XUI folks planning to do? Will they use resource: URLs, etc?

Should we use XPCOM for the GetBundle API, but a non-XPCOM interface for the GetString API, since XPCOM is heavyweight?