XUL Localizability issues

by Tao Cheng <tao@netscape.com>

Document History

Goals

This document serves the following purposes

Principles

Here is a list of principles the author intends to follow in seeking a solution to this issue.

XUL Localizability dependency

XUL localizability depends on the following items.
  • XUL Coding Style guidelines: submitted for internal review.
  • XUL/XML parser dependency

    The proposed XUL localizability solution (see http://www.mozilla.org/projects/intl/xul-l10n.html for more information) requires two features from the XML parser:

    1. General entity substitution. The proposal requires all localizable resources to be declared as general entities in a language-specific external DTD. References to these entities will be substituted inline during parsing.
    2. Locale-sensitive DTD lookup. The location of this DTD will be determined by the combination of the system locale and the URI referenced by the SYSTEM identifier in the doctype declaration or entity declaration. One potential solution is for the parser, on encountering the SYSTEM keyword, to notify the application through a callback hook so that locale information can be used to fetch the localized version of the DTD. For example, assuming the doctype declaration is:

        <!DOCTYPE xui SYSTEM "http://www.mozilla.org/xul/toolbar.dtd">

    and the current system locale is "fr", then the parser shall fetch

        "http://www.mozilla.org/xul/fr/toolbar.dtd"

    for the entity declaration.

    The earlier these features, especially (1), general entity substitution, get into the tree, the less work is needed to convert existing XUL files/streams into a localizable format.

    To sum up, XUL localizability depends on the following XML parser features:

  • Internal and external general entity support.
  • External DTD support.
  • SYSTEM identifier callback hook.
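The SYSTEM identifier callback hook can be illustrated with an existing parser. The following is a minimal sketch, assuming Python's `xml.parsers.expat` stands in for the XUL parser and a dictionary stands in for the localized DTD files. The names `LOCALIZED_DTDS` and `parse_with_locale`, and the "locale/file" lookup scheme, are hypothetical and not part of the proposal.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for localized DTD files on disk or on a server.
LOCALIZED_DTDS = {
    "en/toolbar.dtd": '<!ENTITY txtBack "Back">',
    "fr/toolbar.dtd": '<!ENTITY txtBack "Retour">',
}


def parse_with_locale(xul_text, locale):
    """Return the character data of xul_text, with entities read from the locale's DTD."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to report the external DTD subset instead of skipping it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_external_entity(context, base, system_id, public_id):
        # The callback hook: map the SYSTEM identifier to a locale-specific DTD.
        localized_id = f"{locale}/{system_id}"
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(LOCALIZED_DTDS[localized_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = on_external_entity
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xul_text, True)
    return "".join(chunks).strip()


XUL = '<!DOCTYPE toolbar SYSTEM "toolbar.dtd"><toolbar><button>&txtBack;</button></toolbar>'
print(parse_with_locale(XUL, "fr"))
```

The XUL text is identical for every locale; only the DTD chosen inside the hook changes, which is the separation the dependency list above asks of the parser.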
Candidates of the final solution

    1. XUL + language-specific DTD. (adopted)

      Description:

      • Put all localizable resources in a language DTD file. Examples of such resources are text strings, customizable icons, and URLs. Most of them can be described by text/parsed entities.
      • Non-locale-sensitive resources shall not be in this DTD file.
      • Use the SYSTEM identifier to reference this DTD file.
      • Need to implement locale-sensitive file lookup for the language-specific DTD file.
      • Put format strings, such as "Item %d of %d", in text entities and compute the value in the application code such as MailCore or BrowserCore.
      • To dynamically switch languages, we need to reload the XUL and its DTD (probably from a remote host). This is because once the DOM tree is created, the entities and DTDs have already been processed.

      Sample XUL: toolbar.xul


        <!DOCTYPE xui SYSTEM "toolbar.dtd">

        <xul:toolbar>

          &txtContentData;
          <button cmd="nsCmd:BrowserBack" style="background-color:rgb(192,192,192);">
            <img src="resource:/res/toolbar/TB_Back.gif"/>
            &txtBack;
          </button>

          <button cmd="nsCmd:BrowserForward" style="background-color:rgb(192,192,192);">
            <img src="resource:/res/toolbar/TB_Forward.gif"/>
            &txtForward;
          </button>

          <button cmd="nsCmd:BrowserWizard" style="background-color:rgb(192,192,192);">
            <img src="&iconWizard;"/>
            &txtWizard;
          </button>

        </xul:toolbar>

      Sample DTD: toolbar.dtd


        <!ENTITY txtContentData "Random content data">
        <!ENTITY txtBack "Back to &#37;s">
        <!ENTITY txtForward "Forward">
        <!ENTITY iconWizard "resource:/res/toolbar/TB_Wizard.gif">
        <!ENTITY txtWizard "Wizard">

      Pros:

      • Already standards-compliant; no new syntax or tags need to be introduced.
      • Only one minor tweak is needed: a "%" used in a format string for dynamic string binding, such as "%d out of %d", must be escaped, because a literal "%" is not allowed in an entity value. For example, use the numeric character reference (NCR) '&#37;' in place of '%'.
      • Text replacement can be in either content or attribute values (but not in the attribute names).
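The points above can be checked with any XML parser that expands general entities. The following is a minimal sketch, assuming Python's `xml.etree.ElementTree` and assuming the entities are declared in an internal DTD subset so that the example is self-contained (the proposal itself keeps them in an external DTD). The entity name `txtProgress` and the `label` element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Entities are inlined in the internal subset here purely for illustration.
DOC = """<!DOCTYPE toolbar [
  <!ENTITY txtProgress "Item &#37;d of &#37;d">
  <!ENTITY iconWizard "resource:/res/toolbar/TB_Wizard.gif">
]>
<toolbar>
  <img src="&iconWizard;"/>
  <label>&txtProgress;</label>
</toolbar>"""

root = ET.fromstring(DOC)

# Entity used in an attribute value.
icon = root.find("img").get("src")

# Entity used in content: the NCR has become a plain '%' after parsing.
template = root.find("label").text

# The application code computes the final string from the format string.
message = template % (5, 10)
print(icon)
print(message)
```

After parsing, the application sees an ordinary format string, so the escaping affects only the DTD file and not the code that fills in the values.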

      Cons:

      • The language-specific DTD file is not a flat file. A DTD parser is needed to extract localizable resources into a flat file for localizers.
      • Two file formats to deal with: the property file and the DTD file.
      • Hard to group text entities by UI component.
      • Information about the text entities, such as their names, is lost after parsing.
      • When switching languages, we need to reload the XUL and its DTD (probably from a remote host) and reconstruct the DOM tree.

        In the example of a dialog UI, if we used entities and DTDs, we would have to tear down the whole DOM tree and the dialog that sits on top of that, and then rebuild a new DOM tree and dialog. This would be wasteful, since our layout manager is able to resize elements dynamically, so we can "edit" the DOM tree and have the dialogs redraw themselves automatically.

        However, we can live with this performance drag, since users are unlikely to switch languages at run time very often.
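The first drawback, extracting resources from the DTD into a flat file for localizers, might be handled for simple cases along the following lines. This is only a sketch: it assumes the DTD contains nothing but plain `<!ENTITY name "value">` declarations, it uses a regular expression rather than a real DTD parser, and the function name `dtd_to_flat` and the `name=value` output format are hypothetical.

```python
import re

# Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
# Parameter entities (<!ENTITY % ...>) and SYSTEM entities are deliberately not matched.
ENTITY_RE = re.compile(r'<!ENTITY\s+([\w.:-]+)\s+(["\'])(.*?)\2\s*>', re.DOTALL)


def dtd_to_flat(dtd_text):
    """Return one 'name=value' line per internal general entity declaration."""
    return [f"{name}={value}" for name, _quote, value in ENTITY_RE.findall(dtd_text)]


SAMPLE_DTD = """
<!ENTITY txtBack "Back to &#37;s">
<!ENTITY txtForward "Forward">
<!ENTITY iconWizard "resource:/res/toolbar/TB_Wizard.gif">
"""

for line in dtd_to_flat(SAMPLE_DTD):
    print(line)
```

A production tool would need a real DTD parser, since this pattern would also match declarations that appear inside comments.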

    2. Single XUL file with Java-like property file. (ruled out due to technical difficulty)

      Description:

      • Assign a widgetID to each widget in the XUL file, and a resTag to each localizable resource/attribute of a widget in the widget code. Then, call gettext(widgetID, resTag, default_string) to retrieve the resources from a Java-like property file at run time. For example, a label widget can be described as <label widgetID="345" text="label string"/> in a XUL file. Then, the function call to retrieve the localized text will be gettext(345, RES_TEXT, "label string");
      • If the property file does not exist or the combination of widgetID and resTag does not resolve to a resource string, the default_string will be returned instead.
      • All localizable resources must be stored in a Java-like property file.
      • Resource replacement may happen as early as parsing or as late as widget initialization.
      • Reference to the property file will be declared as an external unparsed entity and stashed in the DOM tree for later use. See sample XUL declaration below.

      Sample XUL: toolbar.xul


        <!DOCTYPE xui SYSTEM "toolbar.dtd">

        <!-- L10N-PTY type of data: file format can be found at http://www.netscape.com/PropertyFile -->
        <!NOTATION L10N-PTY SYSTEM "http://www.netscape.com/PropertyFile">
        <!ENTITY JFile SYSTEM "http://www.home.org/l10n.property" NDATA L10N-PTY>

        <xul:toolbar>


          <label widgetID="8000">Random content data</label>
          <button widgetID="8001"
            cmd="nsCmd:BrowserBack"
            style="background-color:rgb(192,192,192);"
            img="resource:/res/toolbar/TB_Back.gif">Back to &#37;s
          </button>

          <button widgetID="8002"
            cmd="nsCmd:BrowserForward"
            style="background-color:rgb(192,192,192);"
            img="resource:/res/toolbar/TB_Forward.gif">Forward
          </button>

          <button widgetID="8003"
            cmd="nsCmd:BrowserWizard"
            style="background-color:rgb(192,192,192);"
            img="resource:/res/toolbar/TB_Wizard.gif">Wizard
          </button>

        </xul:toolbar>

      Sample property file: property.toolbar

        8000: Random content data
        8001.img: resource:/res/toolbar/TB_Back.gif
        8001: Back to &#37;s
        8002.img: resource:/res/toolbar/TB_Forward.gif
        8002: Forward
        8003.img: resource:/res/toolbar/TB_Wizard.gif
        8003: Wizard

      Sample resource tags definition

        #define RES_TEXT   0x1234
        #define RES_IMG     0x1235

      To get the text string for a "Back" button's label, we call

        gettext(8001, RES_TEXT, "Back to &#37;s")
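The lookup and its fallback can be sketched as follows. This is an illustration only: it assumes the "key: value" layout of the sample property file above, assumes that the image resource is keyed as "widgetID.img", and passes the parsed table explicitly. The names `load_properties` and `get_resource` are hypothetical stand-ins for the proposed gettext().

```python
RES_TEXT = 0x1234
RES_IMG = 0x1235

# Assumed mapping from resource tag to the key suffix used in the property file.
KEY_SUFFIX = {RES_TEXT: "", RES_IMG: ".img"}

SAMPLE_PROPERTIES = """\
8001.img: resource:/res/toolbar/TB_Back.gif
8001: Back to &#37;s
8003.img: resource:/res/toolbar/TB_Wizard.gif
8003: Wizard
"""


def load_properties(text):
    """Parse 'key: value' lines into a dictionary, skipping blank lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ':' only, since values such as URLs contain ':'.
        key, _sep, value = line.partition(":")
        table[key.strip()] = value.strip()
    return table


def get_resource(table, widget_id, res_tag, default_string):
    """Return the localized resource, or default_string if it cannot be resolved."""
    suffix = KEY_SUFFIX.get(res_tag)
    if suffix is None:
        return default_string
    return table.get(f"{widget_id}{suffix}", default_string)


table = load_properties(SAMPLE_PROPERTIES)
print(get_resource(table, 8001, RES_TEXT, "Back to &#37;s"))
print(get_resource(table, 9999, RES_TEXT, "default label"))
```

Passing an empty table models the case where no property file exists, in which every call simply returns its default string.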

      Pros:

      • All localizable resources are uniquely identified by the combination of widgetID and the resource tags. The application/front end developers can easily update a UI element's attribute/resource.
      • Core development work will not be blocked by the gettext() implementation. However, we shall request the UI developers to put English strings, localization notes, and comments in the property file.
      • The fallback mechanism allows the developers to work without the presence of property files.
      • The English version of the property file can be automatically generated during XUL to DOM conversion.
      • Provides a fallback mechanism to default strings.
      • The property file is flat and in clear text; easy to localize and leverage.
      • The implementation of the nsStringBundle interface is nearly finished. The basic facilities for parsing the property file and retrieving text are ready to check in.
      • Consistent with the scheme in "String Resources"; only one file format to deal with.
      • Resources are grouped by widgets. This also makes the property file more readable.
      • Easy to leverage the property file. All resources are IDed and ready for comparison.

      Cons:

      • Need to treat content data as the text resource of a label widget. (So it can be identified and edited by application code.)
      • Need to implement a mechanism to automatically bind localizable resources to widgets. However, the amount of work can be reduced by performing the localized resource binding at widget initialization time, since we need to bind the UI attributes in the DOM to the underlying widgets anyway.
      • Need to ensure the uniqueness of the widgetID. However, the appCore developers need to have a way to uniquely identify a widget anyway.
      • Localizable resource strings are duplicated: once in the XUL file and once in the property file.
      • Need to extend the Java-like property file to support structured resources.
      • Technical difficulty: once XUL has been converted to a DOM tree, the content can't be changed anymore.

    3. Use text entities for content data and IDs for widget resources. (ruled out due to technical difficulty)

      Description: By combining #1 and #2, we can get the best of both worlds. The idea is to use text entities for content data to remedy the awkwardness of the #2 approach in dealing with content data.

      Pros:

      • Reference to content data is XML standard compliant (general entity).
      • All localizable resources are uniquely identified.
      • UI developers will be able to specify widget resources directly in XUL. Extraction of localizable resources can be performed in client's build process. Localization is invisible to the UI developers.
      • For UI that does not contain HTML data, we have only one file, the property file, to deal with.
      • For UI that contains HTML data, we deal with that data outside of XUL. This also helps keep the XUL file clean.

      Cons:

      • Why not simply use the DTD approach?
      • Technical difficulty: once XUL has been converted to a DOM tree, the content can't be changed anymore.

    4. "@.*;" + property file

      Description: Assuming the "timely access" problem can be overcome, we could get around the "syntax constraint" problem by using an entity-like syntax of our own. That is, we invent something; say we use the "@" symbol the way entities use the "&". These references are then used throughout the content just as entities would have been used to do localization. This still assumes we have some way to get at the language-specific substitution text after parsing (so it can't be a parser directive; it may have to be some sort of special element that XUL will recognize and not display). If all this worked, we'd be free to stick in localizable text anywhere without constraining the element and attribute structure. For example,

      <element l10nID="100" text="english version"/>

      becomes

      <element text="@100;"/> (or <element>@100;</element>, if that's more appropriate for the widget).

      There just needs to be a single routine somewhere central that knows where to find the table of localized text strings. It finds "@.*;" sequences and substitutes them. We have to walk the content model after parsing and hand every string to this routine, and widgets have to pass all their text strings through it before they do anything with them.
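The central substitution routine described above might look like the following minimal sketch. It assumes the table of localized strings is already loaded into a dictionary keyed by the ID between "@" and ";", and that unknown references are left untouched. The names `TOKEN_RE` and `substitute` are hypothetical.

```python
import re

# An "@id;" reference: '@', then one or more characters other than '@', ';' or whitespace, then ';'.
TOKEN_RE = re.compile(r"@([^@;\s]+);")


def substitute(text, table):
    """Replace every '@id;' in text with table[id]; leave unknown references as they are."""
    return TOKEN_RE.sub(lambda match: table.get(match.group(1), match.group(0)), text)


french = {"100": "version française"}
print(substitute("@100;", french))
print(substitute("Label: @100; (@999;)", french))
```

Every string in the content model, and every string a widget receives, would have to be passed through such a routine, which is the extra work the description above points out.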

      Cons:

      • The entity solution is more XML-compliant and takes less work to implement.

    5. Using XLinks and XPointers for XUL Localisation (by Daniel McGowan) (ruled out due to technical difficulty)

      Abstract:
      Use XLink and XPointer to reference specific text in a file that is separate from the base XUL file, so that this text can be easily localised and displayed to the end user in a manner consistent with XPFE requirements.

      Pros:
      Since it is all written in vanilla XML, there is no need to create custom file types, and this system can accept anything the parser can handle. It maintains the name/value pairing essential for localisation. It allows us to add localisation and developer notes to the object (e.g. a button) and to the localised text separately while maintaining a direct link between the two. The text is pulled into the UI elements when the XUL file is parsed. This also addresses the goal of separating markup, style and content.
      Cons:
      This does not leave us with a flat file solution. However, the file containing the text to be localised is of such a simple format that writing a tool to parse it is a trivial exercise. We are going to need some form of tool to convert native encodings to Unicode character references.
      There are 4 files to track! Actually, the language-specific DTD is complete and valid as is, so it could easily be declared inline in the language-specific XML file. The link-attributes entity has been entitised and could conceivably be inherited from a higher-level DTD.
      In reloading downloadable chrome, not all related files can be blown away by the client.

      Here is an example syntax needed for a button UI element.
      UI.XUL

      <button
         href="&locale/uilang.xml|
         id(1234).child(text)"
      >

         <content-info>
         Put comments on button
         functionality here
         </content-info>

        other xul markup
      </button>

      UI.DTD

      <!ENTITY  % link-attributes
        "xlink:form     CDATA    #FIXED 'simple'
         href           CDATA    #REQUIRED
         content-info   CDATA    #IMPLIED
         show           CDATA    #FIXED 'embed'
         actuate        CDATA    #FIXED 'auto'"
      >

      <!ELEMENT button (#PCDATA)>
      <!ATTLIST button
          %link-attributes;
          other button specific attributes
      >


      UILANG.XML

      <loctext id = "1234">
        <text>
         Gallia est omnis divisa in partes tres,
         quarum unam incolunt Belgae,
         aliam Aquitani, tertiam qui ipsorum lingua
         Celtae, nostra Galli appellantur.
        </text>
        <note>These are Caesar's first words on Gaul.
               This button should be centered on
               column 1 of the dialog
        </note>
      </loctext>

      UILANG.DTD

      <!ELEMENT loctext (text, note?)>
      <!ATTLIST loctext id ID #REQUIRED>
      <!ELEMENT text (#PCDATA)>
      <!ELEMENT note (#PCDATA)>

      So, when the <button> tag is parsed, the "simple" xlink href (which is #REQUIRED) is automatically (actuate = 'auto') embedded (show = 'embed') with the text from the <text> child element of the element with id = 1234, in the file whose URI is the value of &locale (some more globally set value) followed by /UILANG.XML.
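The lookup that the link performs, "the <text> child of the element with a given id", can be approximated as follows. This is a rough sketch and not an XLink/XPointer implementation: it assumes Python's `xml.etree.ElementTree`, a hypothetical `<strings>` wrapper element around the loctext entries, and a hypothetical helper named `resolve_text`.

```python
import xml.etree.ElementTree as ET

# Hypothetical language file; the <strings> wrapper is an assumption of this sketch.
UILANG = """<strings>
  <loctext id="1234">
    <text>Gallia est omnis divisa in partes tres</text>
    <note>This button should be centered on column 1 of the dialog</note>
  </loctext>
</strings>"""


def resolve_text(lang_root, loc_id):
    """Return the text of the <text> child of the loctext element with the given id, or None."""
    node = lang_root.find(f".//loctext[@id='{loc_id}']/text")
    if node is None:
        return None
    return (node.text or "").strip()


lang_root = ET.fromstring(UILANG)
print(resolve_text(lang_root, "1234"))
```

The <note> element stays in the language file next to the text it describes, so the localisation notes never reach the UI.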

      OK, so that is probably a bad explanation, but I hope the code is clear enough. If you have any questions, don't hesitate to flame away.

      Daniel.

    Comparison (*****: excellent, *: show stopper)

    Simple
      XUL + language-specific DTD: *****
      XUL + language-specific property file: **** Need to define resource tags in widget code and widgetID in the XUL file.
      Description: Both core development and localization work shall be made easy and less error prone.

    Leverageable
      XUL + language-specific DTD: *** Need a parser to list, identify, and compare resources.
      XUL + language-specific property file: ***** All resources are in property files, which are flat and easier to leverage.
      Description: Localization results shall be leverageable from release to release.

    Consistent
      XUL + language-specific DTD: *** Two file formats, DTD and property file, to deal with.
      XUL + language-specific property file: ***** Only one file format, the property file.
      Description: The scheme shall work across modules instead of within the XUL component only.

    Standard compliant
      XUL + language-specific DTD: *****
      XUL + language-specific property file: **** (We can extend the property file format to have a syntax similar to X/MOTIF's application default file.)
      Description: Achievable on all platforms including Unix, Windows, Mac, and others.

    Portable
      XUL + language-specific DTD: *****
      XUL + language-specific property file: *****
      Description: Achievable on all platforms including Unix, Windows, Mac, and others.

    Extensible
      XUL + language-specific DTD: ***** In the same direction as XML.
      XUL + language-specific property file: **** Need to treat content data as the text resource of a label widget.
      Description: The adopted solution will be flexible for customization and future extension.

    Dynamic binding
      XUL + language-specific DTD: *** Resource binding mostly happens in the XML parser.
      XUL + language-specific property file: ***** Resource binding occurs at the last minute.
      Description: Some of the items requiring translation may be dynamic, usually because they require string composition ("Installing item 5 of 10").

    Validatable
      XUL + language-specific DTD: *** (need a DTD parser)
      XUL + language-specific property file: ***** (right on the scene)
      Description: Localizers/translators will be able to validate the localization results.

    Parsable
      XUL + language-specific DTD: *** (DTD file contains XML tags, keywords, and others)
      XUL + language-specific property file: ***** (localizable resources are easily identified)
      Description: It should be possible to unambiguously and automatically determine which embedded items contain localizable text, and which items need to be locked.

    Invisible (Internationalization)
      XUL + language-specific DTD: *** (entity defined in external DTD)
      XUL + language-specific property file: **** (developers need to assign an id to each resource; but the generation of the US/EN property file could be done by the XUL parser.)
      Description: As much as possible, the standard tools that create the US UI should emit files that are already localizable, without requiring additional processing.

    Identifiable
      XUL + language-specific DTD: **** (entity names are unique; but we lose them after parsing.)
      XUL + language-specific property file: **** (all resources are identified by the combination of widgetID and resTag; but we must treat content data as the text resource of a label widget)
      Description: All resources shall be uniquely identifiable.

    Dynamic Language Switching
      XUL + language-specific DTD: *** Need to reload the XUL.
      XUL + language-specific property file: **** We can design it to modify localizable attributes only.
      Description: Dynamically switch to a different language and reflect it in the UI. (This does not happen very often.)

    How to locate the language-specific file

    In general, we need two pieces of information to locate the language-specific file:
    1. The reference to the location of the Java property file, declared in an unparsed entity as described in solution #2.
    2. The locale information in the client.

    For example, if we declare the entity as

      <!ENTITY JFile SYSTEM "http://www.home.org/l10n-property.xxx" NDATA L10N-PTY>

    and the current locale is "ja", then the real URI is either

      "http://www.home.org/l10n-property_ja.xxx"
      or
      "http://www.home.org/ja/l10n-property.xxx"

    The real location of the language-specific DTD file can be determined in a similar fashion.
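The two naming schemes shown above can be derived mechanically from the declared URI and the locale. The following is a minimal sketch, assuming the locale is a simple tag such as "ja"; the function name `localized_candidates` is hypothetical, and the order of the two candidates is an arbitrary choice.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def localized_candidates(uri, locale):
    """Return the two candidate URIs: locale as a file name suffix, then as a directory."""
    parts = urlsplit(uri)
    directory, filename = posixpath.split(parts.path)
    stem, extension = posixpath.splitext(filename)
    suffixed = posixpath.join(directory, f"{stem}_{locale}{extension}")
    nested = posixpath.join(directory, locale, filename)
    return [urlunsplit(parts._replace(path=path)) for path in (suffixed, nested)]


for candidate in localized_candidates("http://www.home.org/l10n-property.xxx", "ja"):
    print(candidate)
```

The directory form also reproduces the earlier DTD example: applied to "http://www.mozilla.org/xul/toolbar.dtd" with the locale "fr", the second candidate is "http://www.mozilla.org/xul/fr/toolbar.dtd".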

    References