

XUL Localizability issues

by Tao Cheng <tao@netscape.com>

Document History

  • 02/26/99: Correct entity reference syntax.
  • 02/19/99: Insert two XUL L12y solutions: ""@.*;" + property file", proposed by Daniel Matejka, and "Using XLinks and XPointers for XUL Localisation", by Daniel McGowan.
  • 02/17/99: At last, a consensus on the solution for XUL localizability has been reached. The "XUL + language-specific DTD" solution is adopted.
  • 02/10/99: In solution #2, relax the rule that the content data must be a text attribute value; instead, treat content data as the element content. See the samples in solution #2.
  • 02/09/99: Add explanation of the "default string" to solution #2.
  • 02/08/99: Add "How to locate the language-specific file" section.
  • 02/08/99: Take out all out-dated sections.
  • 02/08/99: Re-evaluate solution #1 and #2 in the comparison table.
  • 02/08/99: Revised solution #2 to reduce the number of IDs needed for resource identification. In the revised version, resources are uniquely identified by the combination of widget ID and resource tag. The Java-property file format is also extended to support structured resources.
  • 02/05/99: Revamp the "Candidates of the final solution" section and add a table of comparison.
  • 02/05/99: Compile for final review; record more discussion.
  • 01/29/99: Record more discussion.
  • 01/28/99: Added two sections: "Ideas" and "Candidates of the final solution".
  • 01/28/99: Compiled feedback from Daniel Matejka.
  • 01/25/99: Feedback from Daniel Matejka in red.
  • 01/25/99: Post this document to newsgroup, news://news.mozilla.org/netscape.public.mozilla.xpfe, for broader audience and discussion.
  • 01/25/99: Per Scott Collins, http://www.meer.net/ScottCollins/, not all UI widgets can be described in a XUL file. Q: does this mean we need another mechanism to handle the non-XUL UI components?
  • 01/20/99: Add a new section, "How does the XUL concept work?"
  • 01/20/99: In the "XUL Localizability issues" meeting held today, it was proposed that we:
    • Get more up-to-date documents on XUL spec architecture so that we can come up with a solution.
    • Get a workable sample XUL application so that we can identify the problem better.
    • Choose between options #1 and #2 to embed localization information in XUL. Then combine the strengths of option #3, String Resources, and option #4, gettext(), as the underlying mechanism to retrieve text strings. We may load the string resources from a locale-suffixed property file and fall back to the default strings, as described in gettext(), when needed.
    • 1 and 2 are mostly headaches for localization and build people, and I can't really speak for them. But both numbers 3 and 4 demand extra implementation work.

  • 01/20/99: Incorporate Rob Thorne's comments and suggestions (in blue).
  • 01/19/99: Add reference to Erik's String Resources. We might be able to consolidate the idea presented there with the gettext() scheme.
  • 01/16/99: First draft.

Goals

This document serves the following purposes:
  • Identify the Internationalization and Localization requirements in the Seamonkey project.
  • Discuss the XUL Localizability issues in Seamonkey 5.0.
  • Record the proposed solutions, and their pros and cons.
  • Keep track of the status of the related issues.
  • Feature complete by first beta so it can be tested.

Principles

Here is a list of principles the author intends to follow in seeking a solution for this issue.
  • Simple. Both core development and localization work will be easy and less error prone. Win-win situation ;)
  • Leverageable. Localization results shall be leverageable from release to release. Localization costs money.
  • Consistent. If possible, we shall seek a scheme that will work across modules instead of within the XUL component only.
  • Portable. The final solution will be achievable on all platforms including Unix, Windows, Mac, and others.
  • Extensible. The adopted solution will be flexible for customization and future extension.
  • Dynamic binding. Some of the items requiring translation may be dynamic, usually because they require string composition ("Installing item 5 of 10").
  • Validatable. Localizers/translators will be able to validate the localization results.
  • Parseable. It should be possible to unambiguously and automatically determine which embedded items contain localizable text, and which items need to be locked.
  • Invisible (Internationalization). As much as possible, the standard tools that create the US UI should emit files that are already localizable, without requiring additional processing.

Candidates of the final solution

  1. XUL + language-specific DTD. (adopted)

    Description:

    • Put all localizable resources in a language DTD file. Examples of such resources are text strings, customizable icons, and URLs. Most of them can be described by text/parsed entities.
    • Non-locale-sensitive resources shall not be in this DTD file.
    • Use a SYSTEM identifier to reference this DTD file.
    • Need to implement locale-sensitive file lookup for the language-specific DTD file.
    • Put format strings, such as "Item %d of %d", in text entities and compute the value in the application code such as MailCore or BrowserCore.
    • To dynamically switch languages, we need to reload the XUL and its DTD (probably from a remote host). This is because once the DOM tree is created, the entities and DTDs have already been processed.

    Sample XUL: toolbar.xul


      <!DOCTYPE xui SYSTEM "toolbar.dtd">

      <xul:toolbar>

        &txtContentData;
        <button cmd="nsCmd:BrowserBack" style="background-color:rgb(192,192,192);">
          <img src="resource:/res/toolbar/TB_Back.gif"/>
          &txtBack;
        </button>

        <button cmd="nsCmd:BrowserForward" style="background-color:rgb(192,192,192);">
          <img src="resource:/res/toolbar/TB_Forward.gif"/>
          &txtForward;
        </button>

        <button cmd="nsCmd:BrowserWizard" style="background-color:rgb(192,192,192);">
          <img src="&iconWizard;"/>
          &txtWizard;
        </button>

      </xul:toolbar>

    Sample DTD: toolbar.dtd


      <!ENTITY txtContentData "Random content data">
      <!ENTITY txtBack "Back to &#37;s">
      <!ENTITY txtForward "Forward">
      <!ENTITY iconWizard "resource:/res/toolbar/TB_Wizard.gif">
      <!ENTITY txtWizard "Wizard">
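As a rough illustration of how the two samples fit together, the sketch below uses Python's standard `xml.etree.ElementTree` and inlines the entity declarations as an internal DTD subset, since that parser does not fetch external DTD files. The simplified markup (no `xul:` prefix), the `localized_toolbar` helper, and the page title "Home" are assumptions made for the example; this is not a description of how the XUL parser itself works.

```python
import xml.etree.ElementTree as ET

# Entity declarations as they might appear in a language-specific DTD.
# '%' is written as the character reference &#37; so that it is not read
# as the start of a parameter-entity reference.
ENTITIES = """
<!ENTITY txtBack "Back to &#37;s">
<!ENTITY txtForward "Forward">
"""

# Hypothetical, simplified toolbar markup.
TOOLBAR = """
<toolbar>
  <button cmd="nsCmd:BrowserBack">&txtBack;</button>
  <button cmd="nsCmd:BrowserForward">&txtForward;</button>
</toolbar>
"""

def localized_toolbar(entities, markup):
    """Parse the markup with the entities inlined as an internal subset."""
    document = "<!DOCTYPE toolbar [%s]>%s" % (entities, markup.strip())
    return ET.fromstring(document)

root = localized_toolbar(ENTITIES, TOOLBAR)
labels = [button.text for button in root.findall("button")]

# The application code fills in the format string after parsing.
back_label = labels[0] % "Home"
print(labels)
print(back_label)
```

Swapping in a different set of entity declarations and parsing again yields the same tree with different strings, which is the whole of the localization step in this scheme.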

    Pros:

    • Already standard compliant; no new syntax names or tags need to be introduced.
    • Only one minor tweak needed: escape "%" used in formatting string, such as "%d out of %d" for dynamic strings binding. For example, use a numeric character reference (NCR), '&#37;' to escape '%'.
    • Text replacement can be in either content or attribute values (but not in the attribute names).

    Cons:

    • The language-specific DTD file is not a flat file. Need a DTD parser to extract localizable resources into a flat file for localizers.
    • Two file formats to deal with: the property file and the DTD file.
    • Hard to group text entities by UI component.
    • We lose the information of text entities after parsing.
    • In switching languages, we need to reload the XUL and its DTD (probably from a remote host) and reconstruct the DOM tree.

      In the example of a dialog UI, if we used entities and DTDs, we would have to tear down the whole DOM tree and the dialog that sits on top of that, and then rebuild a new DOM tree and dialog. This would be wasteful, since our layout manager is able to resize elements dynamically, so we can "edit" the DOM tree and have the dialogs redraw themselves automatically.

      However, we can live with this performance drag, since users are unlikely to switch languages at runtime very often.
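The first drawback, extracting localizable resources into a flat file, may be less work than it sounds for simple DTDs. The following sketch, under the assumption that the language DTD contains only double-quoted internal text entities, uses a regular expression rather than a real DTD parser; the `dtd_to_flat` name and the `name=value` output format are made up for the illustration.

```python
import re
from typing import List

# Matches simple internal entity declarations such as
#   <!ENTITY txtForward "Forward">
# Parameter entities and SYSTEM/PUBLIC entities are deliberately ignored.
ENTITY_DECL = re.compile(r'<!ENTITY\s+([A-Za-z_][\w.-]*)\s+"([^"]*)"\s*>')

def dtd_to_flat(dtd_text: str) -> List[str]:
    """Return 'name=value' lines, one per text entity, in file order."""
    return ["%s=%s" % (name, value)
            for name, value in ENTITY_DECL.findall(dtd_text)]

sample_dtd = """
<!ENTITY txtForward "Forward">
<!ENTITY iconWizard "resource:/res/toolbar/TB_Wizard.gif">
<!ENTITY txtWizard "Wizard">
"""

for line in dtd_to_flat(sample_dtd):
    print(line)
```

A production tool would need to handle single-quoted values, comments, and character references, which this sketch does not attempt.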

  2. Single XUL file with Java-like property file. (ruled out due to technical difficulty)

    Descriptions:

    • Assign a widgetID to each widget in the XUL file, and a resTag to each localizable resource/attribute of a widget in the widget code. Then, call gettext(widgetID, resTag, default_string) to retrieve the resources from a Java-like property file at runtime. For example, a label widget can be described as <label widgetID="345" text="label string"/> in a XUL file. Then, the function call to retrieve the localized text will be gettext(345, RES_TEXT, "label string");
    • If the property file does not exist, or the combination of widgetID and resTag does not resolve to a resource string, the default_string will be returned instead.
    • All localizable resources must be stored in a Java-like property file.
    • The resources replacement may happen as early as in parsing or as late as in widget initialization.
    • Reference to the property file will be declared as an external unparsed entity and stashed in the DOM tree for later use. See sample XUL declaration below.

    Sample XUL: toolbar.xul


      <!DOCTYPE xui SYSTEM "toolbar.dtd">

      <!-- L10N-PTY type of data: file format can be found at http://www.netscape.com/PropertyFile -->
      <!NOTATION L10N-PTY SYSTEM "http://www.netscape.com/PropertyFile">
      <!ENTITY JFile SYSTEM "http://www.home.org/l10n.property" NDATA L10N-PTY>

      <xul:toolbar>


        <label widgetID="8000">Random content data </label>
        <button widgetID="8001"
          cmd="nsCmd:BrowserBack"
          style="background-color:rgb(192,192,192);"
          img="resource:/res/toolbar/TB_Back.gif">Back to &#37;s
        </button>

        <button widgetID="8002"
          cmd="nsCmd:BrowserForward"
          style="background-color:rgb(192,192,192);"
          img="resource:/res/toolbar/TB_Forward.gif">Forward
        </button>

        <button widgetID="8003"
          cmd="nsCmd:BrowserWizard"
          style="background-color:rgb(192,192,192);"
          img="resource:/res/toolbar/TB_Wizard.gif">Wizard
        </button>

      </xul:toolbar>

    Sample property file: property.toolbar

      8000: Random content data
      8001.img: resource:/res/toolbar/TB_Back.gif
      8001: Back to &#37;s
      8002.img: resource:/res/toolbar/TB_Forward.gif
      8002: Forward
      8003.img: resource:/res/toolbar/TB_Wizard.gif
      8003: Wizard

    Sample resource tags definition

      #define RES_TEXT   0x1234
      #define RES_IMG     0x1235

    To get the text string for a "Back" button's label, we call

      gettext(8001, RES_TEXT, "Back to &#37;s")
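The lookup-with-fallback behaviour described above could look roughly like this. It is a minimal sketch, not the proposed implementation: the explicit `table` argument, the `load_properties` helper, and the mapping from resource tags to key suffixes are all assumptions introduced for the example.

```python
from typing import Dict

RES_TEXT = 0x1234
RES_IMG = 0x1235

# Hypothetical mapping from resource tags to property-key suffixes.
SUFFIX = {RES_TEXT: "", RES_IMG: ".img"}

def load_properties(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines; blank lines and '#' comments are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(": ")
        table[key] = value
    return table

def gettext(table: Dict[str, str], widget_id: int, res_tag: int,
            default: str) -> str:
    """Look up a resource; fall back to the default string when missing."""
    if res_tag not in SUFFIX:
        return default
    key = "%d%s" % (widget_id, SUFFIX[res_tag])
    return table.get(key, default)

properties = load_properties("""
8001.img: resource:/res/toolbar/TB_Back.gif
8002: Forward
""")

print(gettext(properties, 8002, RES_TEXT, "Forward (default)"))
print(gettext(properties, 9999, RES_TEXT, "Fallback label"))
```

Passing an empty table models the case where no property file is present: every call then returns its default string, which is what lets developers work before any property file exists.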

    Pros

    • All localizable resources are uniquely identified by the combination of widgetID and the resource tags. The application/front end developers can easily update a UI element's attribute/resource.
    • Core development work will not be blocked by the gettext() implementation. However, we shall request the UI developers to put English strings, localization notes, and comments in the property file.
    • The fallback mechanism allows the developers to work without the presence of property files.
    • The English version of the property file can be automatically generated during XUL to DOM conversion.
    • Provide fallback mechanism to default strings.
    • The property file is flat and in clear text; easy to localize and leverage.
    • The implementation of the nsStringBundle interface is about to finish. The basic facilities of parsing the property file and retrieving text are ready to check in.
    • Consistent with the scheme in "String Resources"; only one file format to deal with.
    • Resources are grouped by widgets. This also makes the property file more readable.
    • Easy to leverage the property file. All resources are IDed and ready for comparison.

    Cons

    • Need to treat content data as the text resource of a label widget. (So it can be identified and edited by application code.)
    • Need to implement a mechanism to automatically bind localizable resources to widgets. However, the amount of work can be reduced by performing the localized resource binding at widget initialization time, since we need to bind the UI attributes in the DOM to the underlying widgets anyway.
    • Need to ensure the uniqueness of the widgetID. However, the appCore developers need to have a way to uniquely identify a widget anyway.
    • Localizable resource strings are duplicated: once in the XUL file and once in the property file.
    • Need to extend the Java-like property file to support structured resources.
    • Technical difficulty: once XUL has been converted to a DOM tree, the content can't be changed anymore.
  3. Use text entities for content data and IDs for widget resources. (ruled out due to technical difficulty)

    Description: By marrying #1 and #2, we can get the best of both worlds. The idea is to use text entities for content data to remedy the awkwardness of approach #2 in dealing with content data.

    Pros:

    • Reference to content data is XML standard compliant (general entity).
    • All localizable resources are uniquely identified.
    • UI developers will be able to specify widget resources directly in XUL. Extraction of localizable resources can be performed in client's build process. Localization is invisible to the UI developers.
    • For UI that does not contain HTML data, we have only one file, the property file, to deal with.
    • For those that contain HTML data, we deal with them outside of XUL. This also helps us keep the XUL file clean.

    Cons:

    • Why not simply use the DTD approach?
    • Technical difficulty: once XUL has been converted to a DOM tree, the content can't be changed anymore.
  4. "@.*;" + property file

    Description: Assuming the "timely access" problem can be overcome, we could get around the "syntax constraint" problem by using an entity-like syntax of our own. That is, we invent something, say we use the "@" symbol like entities use the "&". Then these things are used throughout the content just like entities would have been used to do localization. This still assumes we have some way to get at the language-specific-substitution text after parsing (so it can't be a parser directive; it may have to be some sort of special element that XUL will recognize and not display). If all this worked, we'd be free to stick in localizable text anywhere without constraining the element and attribute structure. The above example

    <element l10nID="100" text="english version"/>

    becomes

    <element text="@100;"/> ( or <element>@100;</element>, if that's more appropriate for the widget).

    There just needs to be a single routine somewhere central that knows where to find the table of localized text strings. It finds "@.*;" sequences and substitutes them. We have to walk the content model after parsing and hand every string to this routine, and widgets have to pass all their text strings through it before they do anything with them.
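The "single routine somewhere central" might be sketched as follows. The regular expression is an assumption: it is stricter than the literal "@.*;" pattern so that one reference cannot swallow the text up to a later semicolon, and leaving unknown IDs untouched is likewise a choice made for the example, not something the proposal specifies.

```python
import re

# "@name;" references; the name may not contain ';', '@', or whitespace.
REFERENCE = re.compile(r"@([^;\s@]+);")

def substitute(text, table):
    """Replace each @id; with its localized string.

    References whose id is not in the table are left unchanged.
    """
    def lookup(match):
        return table.get(match.group(1), match.group(0))
    return REFERENCE.sub(lookup, text)

# Hypothetical table of localized strings keyed by ID.
strings = {"100": "french version"}

print(substitute("@100;", strings))
print(substitute("label @100; and @200;", strings))
```

Every attribute value and content string would be handed to `substitute` after parsing, which is the tree walk the description refers to.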

    Cons:

    • The entity solution is more XML compliant and less work to implement.
  5. Using XLinks and XPointers for XUL Localisation (by Daniel McGowan) (ruled out due to technical difficulty)

    Abstract:
    Use XLink and XPointer to reference text in a file that is separate from the base XUL file, so that this text can be easily localised, and display this text to the end user in a manner consistent with XPFE requirements.

    Pros:
    Since it is all written in vanilla XML, there is no need to create custom file types, and this system can accept anything the parser can handle. It maintains the name-value pairing essential for localisation. It allows us to add localisation and developer notes to the object (e.g. button) and the localised text separately but maintain a direct link between the two. The text is pulled into the UI elements when the XUL file is parsed. This also addresses the goal of separating markup, style, and content.
    Cons:
    This does not leave us with a flat file solution. However, the file containing the text to be localised is of such a simple format that writing a tool to parse it is a trivial exercise. We are going to need some form of tool to convert native encoding to Unicode character references.
    There are 4 files to track! Actually, the language-specific DTD is complete and valid as is, so it could easily be declared inline in the language-specific XML file. The link-attributes declaration has been entitised and could conceivably be inherited from a higher-level DTD.
    In reloading downloadable chrome, not all related files can be blown away by the client.

    Here is an example syntax needed for a button UI element.
    UI.XUL

    <button href="&locale/uilang.xml|id(1234).child(text)">
       <content-info>Put comments on button
                     functionality here
       </content-info>

      other xul markup
    </button>

    UI.DTD

    <!ENTITY  % link-attributes
      "xlink:form     CDATA    #FIXED 'simple'
       href           CDATA    #REQUIRED
       content-info   CDATA    #IMPLIED
       show           CDATA    #FIXED 'embed'
       actuate        CDATA    #FIXED 'auto'"
    >

    <!ELEMENT button (#PCDATA)>
    <!ATTLIST button 
        %link-attributes;
        other button specific attributes
    >
     

    UILANG.XML

    <loctext id = "1234">
      <text>Gallia est omnis divisa in partes tres,
            quarum unam incolunt Belgae, aliam Aquitani,
            tertiam qui ipsorum lingua Celtae,
            nostra Galli appellantur.
      </text>
      <note>These are Caesar's first words on Gaul.
             This button should be centered on
             column 1 of the dialog
      </note>
    </loctext>

    UILANG.DTD

    <!ELEMENT loctext (text, note?)>
    <!ATTLIST loctext id ID #REQUIRED>
    <!ELEMENT text (#PCDATA)>
    <!ELEMENT note (#PCDATA)>

    So, when the <button> tag is parsed, the "simple" xlink href (which is #REQUIRED) is automatically (actuate = 'auto') resolved, and the text of the <text> child of the element with id = 1234 is embedded (show = 'embed'). That element lives in the file at the URI &locale(some more globally set value)/UILANG.XML.
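To make the resolution step concrete, here is a minimal sketch that resolves a pointer of the form `file|id(N).child(tag)` against an in-memory language file. It does not implement XLink or XPointer; the `resolve` function, the `<uilang>` root element, the "en/uilang.xml" location, and the dictionary standing in for file retrieval are all assumptions made for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical language file; the <uilang> root element is an assumption.
UILANG = """
<uilang>
  <loctext id="1234">
    <text>Gallia est omnis divisa in partes tres</text>
    <note>Centered on column 1 of the dialog</note>
  </loctext>
</uilang>
"""

POINTER = re.compile(r"id\((\w+)\)\.child\((\w+)\)")

def resolve(href, documents):
    """Resolve 'file|id(N).child(tag)' to the text of the addressed element."""
    location, _, pointer = href.partition("|")
    match = POINTER.fullmatch(pointer)
    if match is None:
        raise ValueError("unsupported pointer: %r" % pointer)
    element_id, child_tag = match.groups()
    root = ET.fromstring(documents[location])
    for element in root.iter():
        if element.get("id") == element_id:
            child = element.find(child_tag)
            if child is not None:
                return (child.text or "").strip()
    raise KeyError(href)

# The dictionary stands in for fetching the file named by the locale prefix.
documents = {"en/uilang.xml": UILANG}
print(resolve("en/uilang.xml|id(1234).child(text)", documents))
```

Pointing the same href at a different locale prefix would select a different language file while leaving the XUL markup untouched.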

    OK, so that is probably a bad explanation, but I hope the code is clear enough. If you have any questions, don't hesitate to flame away.

    Daniel.

Comparison (*****: excellent, *: show stopper)

| Criteria to examine | XUL + Language-specific DTD | XUL + Language-specific property file | Description |
|---|---|---|---|
| Simple | ***** | **** Need to define resource tags in widget code and widgetID in XUL file. | Both core development and localization work shall be made easy and less error prone. |
| Leverageable | *** Need a parser to list, identify, and compare resources. | ***** All resources are in property files, which are flat and easier to leverage. | Localization results shall be leverageable from release to release. |
| Consistent | *** Two file formats, DTD and property file, to deal with. | ***** Only one file format, property file. | The scheme shall work across modules instead of within the XUL component only. |
| Standard compliant | ***** | **** (We can extend property file format to have similar syntax to X/MOTIF's application default file.) | Achievable on all platforms including Unix, Windows, Mac, and others. |
| Portable | ***** | ***** | Achievable on all platforms including Unix, Windows, Mac, and others. |
| Extensible | ***** In the same direction as XML. | **** Need to treat content data as the text resource of a label widget. | The adopted solution will be flexible for customization and future extension. |
| Dynamic binding | *** Resource binding mostly happens in the XML parser. | ***** Resource binding occurs at the last minute. | Some of the items requiring translation may be dynamic, usually because they require string composition ("Installing item 5 of 10"). |
| Validatable | *** (need a DTD parser) | ***** (right on the scene) | Localizers/translators will be able to validate the localization results. |
| Parseable | *** (DTD file contains XML tags, keywords, and others) | ***** (localizable resources are easily identified) | It should be possible to unambiguously and automatically determine which embedded items contain localizable text, and which items need to be locked. |
| Invisible (Internationalization) | *** (entity defined in external DTD) | **** (developers need to assign an id to each resource; but the generation of the US/EN property file could be done by the XUL parser.) | As much as possible, the standard tools that create the US UI should emit files that are already localizable, without requiring additional processing. |
| Identifiable | **** (entity names are unique; but we lose them after parsing.) | **** (all resources are identified by the combination of widgetID and resTag; but we must treat content data as the text resource of a label widget) | All resources shall be uniquely identifiable. |
| Dynamic Language Switching | *** Need to reload the XUL. | **** We can design it to modify localizable attributes only. | Dynamically switch to a different language and reflect it in the UI. (This does not happen very often.) |

How to locate the language-specific file

In general, we need two pieces of information to locate the language-specific file:
  1. The reference to the location of the Java property files, declared in the unparsed entities as described in solution #2.
  2. The locale information in the client.

For example, if we declare the entity as

    <!ENTITY JFile SYSTEM "http://www.home.org/l10n-property.xxx" NDATA L10N-PTY>

and the current locale is "ja", then the real URI is

    "http://www.home.org/l10n-property_ja.xxx"
    or
    "http://www.home.org/ja/l10n-property.xxx"

The real location of the language-specific DTD file can be determined in a similar fashion.
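The two URI forms shown above can be derived mechanically from the declared URI and the client locale. The sketch below is one way to do that with Python's standard library; the `locale_candidates` name is made up for the illustration, and the document does not say which of the two forms should be tried first.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def locale_candidates(uri, locale):
    """Return the locale-suffixed and locale-directory forms of a URI."""
    parts = urlsplit(uri)
    directory, filename = posixpath.split(parts.path)
    stem, extension = posixpath.splitext(filename)
    # Form 1: append "_<locale>" to the file name, before the extension.
    suffixed = posixpath.join(directory, "%s_%s%s" % (stem, locale, extension))
    # Form 2: insert a "<locale>" directory in front of the file name.
    nested = posixpath.join(directory, locale, filename)
    return [urlunsplit(parts._replace(path=suffixed)),
            urlunsplit(parts._replace(path=nested))]

for candidate in locale_candidates("http://www.home.org/l10n-property.xxx", "ja"):
    print(candidate)
```

A client could try each candidate in turn and fall back to the undecorated URI when neither exists, although that fallback order is an assumption rather than part of the proposal.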


Copyright © 1998 The Mozilla Organization.