Seamonkey Localization and Leveraging Tools
by Tao Cheng <tao@netscape.com>
Document History
- 03/29/99: Second pass revision. Post to the groups to collect comments
and feedback.
- 03/25/99: Move "XUL Localizability dependency" to XUL
Localizability issues; start to work on the full spec of L10n Tools.
- 03/18/99: Finishing up the first draft.
Introduction
The quality of localization of a software product relies on collective
efforts from various sources including good internationalization planning and
support, localizability consideration in the software design and
implementation phases, and a set of state-of-the-art localization tools.
A set of good localization tools not only increases the quality of the
localization but also reduces the related cost. The localization and
leveraging tools we created for earlier generations of Communicator clients
set a good example of this concept.
Seamonkey is our next generation browsing and messaging client whose
internal architecture is undergoing drastic restructuring. Most of the
components, including user interface, will be rewritten using new
technologies such as AOM, XUL, and XPCOM style modularization.
Consequently, the tools built for earlier client releases become practically
insufficient and potentially inefficient. To solve this problem, we need to
review our existing tools and revamp them to fulfill the requirements of the
new generation client products.
In this document, the author will analyze the requirements of the new
tools and examine our existing tools for their capacity, then come up with an
outline of how to develop localization and leveraging tools
for our next generation client product, Seamonkey.
Finally, like many other documents, this document will be updated as we
collect more information and the client modules evolve.
Design Principles
Before jumping to the design and implementation of the tools, it is a good
idea to list the principles we intend to follow throughout the design and
implementation process:
- Portability. Seamonkey is designed to be platform independent. There is
no reason to develop platform-dependent tools for a platform-independent
product.
- Consistency. It would be a localization nightmare if we needed a
different set of tools for each individual component of the client product.
Needing different sets of tools for different components would indicate
that our client localizability solutions are neither consistent nor
developer friendly. For localizers, multiple file formats reduce
productivity, too. Therefore, we shall advocate a single localizability
solution across all client modules and components wherever feasible.
- Validability. Normally, localization work is performed at a remote
site. The translators need to be able to validate the results in their local
environment. Sending translations back and forth between us and the
localizers is time consuming, and data is easily misplaced or lost.
- Leveragibility. Localization is costly. The ability to leverage existing
translations is essential to bring down the cost. In addition, it keeps
translations consistent across releases.
- Flat file. Ideally, we want developers to put localizable
resources in flat files; where that is not feasible, we shall convert them
to flat files.
Strategies
With the principles in mind, here are our design and implementation strategies:
- Centralized. We shall reduce the number of different file formats we
need to process and localize. In client development, we want to adopt the
same localizability solution wherever applicable. For client localization,
we want to reduce the variety of file formats the translators need to
deal with. Therefore, we need to adopt a single file format as the center
of localization. Other file formats need to be converted to this format
before being sent to localization vendors.
- Incremental. Given the time constraints, we want to develop
the tools incrementally: begin with basic features, then gradually migrate
to the full-blown version. This strategy also fits the nature of the software
development cycle: incremental evolution and iterative refinement.
- Componentized. A good software system shall be highly
componentized. The concepts of data abstraction and information hiding
shall be applied to identify our problem domain and system functionality.
The properties and behaviors of each component will be analyzed and
identified so that each component is highly cohesive. Relationships and
interfaces between components will be clearly defined so that they are
loosely coupled.
- Pluggable. If our system is highly componentized, individual modules
shall be pluggable. This gives us the advantage of leveraging
existing tool components while keeping the flexibility to adapt to new
technology.
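One way to picture the componentized and pluggable strategies is a small registry in which each file format contributes its own extractor. The following is only a minimal sketch: the names `register`, `extract`, and `extract_properties` are hypothetical and do not correspond to any existing Seamonkey interface, and the property reader is deliberately naive.

```python
from typing import Callable, Dict

# Hypothetical names throughout: nothing here is an existing Seamonkey API.
# An extractor turns the text of one source file into {resource ID: string}.
Extractor = Callable[[str], Dict[str, str]]

_EXTRACTORS: Dict[str, Extractor] = {}


def register(extension: str) -> Callable[[Extractor], Extractor]:
    """Plug an extractor in for one file extension, e.g. ".dtd"."""
    def decorator(func: Extractor) -> Extractor:
        _EXTRACTORS[extension.lower()] = func
        return func
    return decorator


def extract(filename: str, text: str) -> Dict[str, str]:
    """Dispatch to whichever extractor is registered for the file type."""
    for extension, func in _EXTRACTORS.items():
        if filename.lower().endswith(extension):
            return func(text)
    raise ValueError(f"no extractor registered for {filename}")


@register(".properties")
def extract_properties(text: str) -> Dict[str, str]:
    """Naive key=value reader, just enough to show the plumbing."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs
```

Under this arrangement, supporting another format would mean registering one more function, and both a command line front end and a GUI could call `extract` without knowing which formats exist.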
Localization work in Seamonkey
In Mozilla 5.0, L10n work can be divided into three categories:
- XUL-based user interfaces. The adopted localizability
solution for XUL-based UI is the language-specific
DTD approach. All localizable resources needed in XUL will be declared
as general entities in the external DTD and substituted by the XUL parser
before the XUL is converted into a DOM tree. Resource strings in this type
of DTD file need to be extracted into an intermediate file format, say an
L10n file, that the localization vendors can easily deal with.
- Base component modules and UI that cannot be described in XUL, such as
native widgets. The "nsIStringBundle" is our 5.0 equivalent
of XP_GetString(). It provides a COMified interface to retrieve resource
strings from Java-property-like files. Resource strings in such files also
need to be extracted into the so-called L10n file for the vendors.
- Legacy code. Seamonkey contains some 4.x modules
which use XP_GetString() to get resource strings from resource files. To
localize the legacy code, we can either create a set of C wrappers around
nsStringBundle() or use the 4.x XP_GetString() approach directly. The former
is preferred.
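As a rough illustration of the first category, the sketch below pulls general entity declarations out of a language-specific DTD and writes them as name=value lines. The entity names, the sample DTD text, and the name=value layout are assumptions made for this example only; the actual L10n format is still to be defined, and a production tool would use a real DTD parser rather than a regular expression.

```python
import re
from typing import List, Tuple

# Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
# Simplifications: parameter entities (<!ENTITY % ...>) and external
# entities (SYSTEM/PUBLIC) are not matched, and declarations inside
# comments are not filtered out.
ENTITY_RE = re.compile(
    r"<!ENTITY\s+([A-Za-z_:][\w.:-]*)\s+"    # entity name
    r"(?:\"([^\"]*)\"|'([^']*)')\s*>"         # double- or single-quoted value
)


def extract_entities(dtd_text: str) -> List[Tuple[str, str]]:
    """Return (entity name, value) pairs in document order."""
    pairs = []
    for match in ENTITY_RE.finditer(dtd_text):
        name = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((name, value))
    return pairs


def to_flat_file(pairs: List[Tuple[str, str]]) -> str:
    """Write one 'name=value' line per entity (assumed L10n layout)."""
    return "".join(f"{name}={value}\n" for name, value in pairs)


# Hypothetical DTD content, not taken from the Seamonkey tree.
SAMPLE_DTD = """<!-- File menu -->
<!ENTITY fileMenu.label "File">
<!ENTITY openCmd.label 'Open File...'>
"""

if __name__ == "__main__":
    print(to_flat_file(extract_entities(SAMPLE_DTD)), end="")
```

Keeping the pairs in document order means the flat file can be read side by side with the DTD it came from.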
Plan
Based on what we have established so far, the author would like to lay out
the plan, by priority, for localization and leveraging tools as below.
- Define the intermediate file format, say L10n, as the center of the
localization and leveraging work.
- Write tools to convert files between the L10n format and all existing file
formats, such as the property file and DTD file in Seamonkey or the DOG
file in legacy code. These tools shall be highly modularized and pluggable
so that they can be built into either command line tools or graphical user
interface based tools.
- Write tools to collect resource strings from all L10n files and organize them
into a sortable data structure. The purpose is to compare translations
between modules or UI components so that we can leverage them among different
components, different releases, and even different products. The ability
to collect and compare also allows us to increase the consistency
and quality of translations.
- Build graphical user interface based tools on top of the command line tools.
Since modularization and pluggability are our design and implementation
strategies, we shall be able to reuse the functions developed for the command
line tools and construct the user interface on top of them.
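The "sortable data structure" in the third item could be as simple as a list of records. The sketch below is one possible shape, not a settled design: the `Entry` fields and both function names are assumptions, and any sample data used with it would be made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Entry:
    """One translated resource; this field set is an assumption."""
    component: str     # e.g. "navigator" or "mailnews"
    resource_id: str
    english: str
    translation: str


def sort_by_english(entries: Iterable[Entry]) -> List[Entry]:
    """Order entries so identical English strings sit next to each other."""
    return sorted(
        entries,
        key=lambda e: (e.english.lower(), e.component, e.resource_id),
    )


def find_inconsistencies(entries: Iterable[Entry]) -> Dict[str, Set[str]]:
    """Map each English string to its translations when it has more than one."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for entry in entries:
        seen[entry.english].add(entry.translation)
    return {english: found for english, found in seen.items() if len(found) > 1}
```

Sorting by English text puts candidates for leveraging side by side, and `find_inconsistencies` lists the strings whose translations differ between components, which is where a consistency review would start.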
Task breakdown and estimated resource allocation
The plan laid out above will be executed in an incremental fashion, phase
by phase, as listed below.
- Prerequisite. Define a flat file format which will be used to store
translations. So far, two file formats are under discussion: the Java
property file format and the 4.x DOG file format. The final decision
is pending input from our localization vendors and OEM customers.
Java property file
Pros
- Industry standard. The Java property file format is the central file
format of localization for Java applications. Localization vendors are
likely to be more comfortable with this file format.
- It's the file format used by the nsStringBundle interface in
Seamonkey. Another standalone module also uses a similar file format.
Cons
- Some feedback suggests that translators might accidentally damage the
keys in the property file.
DOG file
Pros
- It's the central file format for 4.x localization. All 4.x localization
and leveraging tools work with DOG files. We can save a lot of implementation
effort by adopting the DOG format.
Cons
- It's a proprietary file format; we might be the only company that uses
this format.
- The property file used by nsStringBundle and other modules needs to be
converted into DOG format, although this should not be a significant effort.
- Phase 1 (entry level). Command line localization
tools needed for each of the first two categories:
  - DTD files (for XUL)
    - DTD files -> L10n files. A DTD parser to extract entity name and value
    pairs into L10n files, which will be sent to vendors for localization.
    Entity values containing markup or URLs can be dealt with separately.
    - L10n files -> DTD files. A flat file parser to extract the resulting
    strings and replace the English strings in the DTD with localized ones.
  - Property files (for nsStringBundle)
    - Property files -> L10n files.
    - L10n files -> property files.
  - Legacy code. Create a set of C wrappers around nsStringBundle() or use
  the 4.x XP_GetString() approach. The former is preferred.
- Phase 2 (intermediate level).
  - A localization tool (and a potential leveraging tool) to
    - Collect all localization results from the L10n files generated in phase 1.
    - Sort localization results by different attributes, such as
      - by English text
      - by UI components (menu item...)
      - by application component (navigator window, mail/news windows)
    - Dump collected translations, sorted by attribute (mostly ID), to a single
    L10n file.
  - Leveraging tool
    - Match/find a localized resource by a given "English string".
    - Automated batch job to leverage existing translations.
- Phase 3 (advanced level)
  - GUI-based cross-platform tool (might be browser-based) to give a WYSIWYG
  effect. The translators can verify the results as they progress.
  - The translators can leverage/import existing results from other components.
  - Leveraging 4.x results.
    - Dump 4.x results to L10n files for 5.0 use.
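The phase 2 leveraging steps (match a localized resource by its English string, then run that as a batch job) might look roughly like the following. This is a sketch under stated assumptions: resources are held as plain {resource ID: string} dictionaries, matching is exact, and the function names `build_memory` and `leverage` are hypothetical.

```python
from typing import Dict, Tuple


def build_memory(old_english: Dict[str, str],
                 old_localized: Dict[str, str]) -> Dict[str, str]:
    """Pair English text with its translation through the shared resource ID."""
    memory: Dict[str, str] = {}
    for resource_id, english in old_english.items():
        translation = old_localized.get(resource_id)
        if translation is not None:
            # First translation wins; a real tool would flag conflicts.
            memory.setdefault(english, translation)
    return memory


def leverage(new_english: Dict[str, str],
             memory: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split new resources into (reused translations, strings still to translate)."""
    leveraged: Dict[str, str] = {}
    pending: Dict[str, str] = {}
    for resource_id, english in new_english.items():
        if english in memory:
            leveraged[resource_id] = memory[english]
        else:
            pending[resource_id] = english
    return leveraged, pending
```

Only the `pending` strings would then need to go to the vendors. Exact matching is the most conservative choice; fuzzy matching would reuse more but would need a translator to confirm each hit.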
Tasks breakdown & Estimated Time Table *
| Task | ID | Predecessors | Start | Finish | Status | Ownership |
| Seamonkey L10n/leveraging Tools spec. - first draft | | | 03/17/99 | 03/19/99 | Collected feedback from bobj: 1) need more discussion on how to locate the DTD file; 2) our long-term plan for the DTD parser shall be based on the xpat library. | tao |
| Seamonkey L10n/leveraging Tools spec. - second draft | | | 03/26/99 | 03/30/99 | | tao |
| L10n and Leveraging Tools | | | ----- | ----- | | |
| Define a flat file format, *.l10n, which will be used to store translation. | XL13 | | 03/19/99 | 03/23/99 | 3 days; need to decide what type of files localization vendors will work on; pending input from l10n vendors, Dublin, and IBM. | tao |
| L10n Tools - phase 1 (XUL) - use xpat to extract entity names/values from DTD | | | ----- | ----- | = | |
| L10n Tools - phase 1 (XUL) 1). Learn xpat | XL6 | | | | 3 days | |
| L10n Tools - phase 1 (XUL) 2). Write a standalone parser that lives in the client | XL6 | | | | 5 days | |
| L10n Tools - phase 1 (XUL) 3). Dump entity names/values into the *.l10n file | XL10 | XL13 | | | 2 days | |
| L10n Tools - phase 1 (XUL) 4). XP testing on parser | XL6 | XL6, XL10 | | | 2 days | |
| L10n Tools - phase 1 (XUL) 5). Extract L10n results from *.l10n file | XL11 | XL13 | | | 2 days (might be able to use existing L10n/Leveraging tools) | |
| L10n Tools - phase 1 (XUL) 6). Replace the associated entity names/values in DTD | XL11 | | | | 3 days (need to preserve the original position and context of the DTD) | |
| L10n Tools - phase 1 (property file) 1) Extract resource ID and value pairs from property file | | | | | 1.5 days | |
| L10n Tools - phase 1 (property file) 2) Dump resource ID and US strings to *.l10n file | | | | | 1 day | |
| L10n Tools - phase 1 (XUL) 5). Extract L10n results from *.l10n file and replace the associated entity names/values in property file | | | | | 3 days (need to preserve the original position and context of the property file) | |
| L10n Tools - phase 2 | | | ----- | ----- | = | |
| L10n Tools - phase 2. 1). Tool to collect localization results from *.l10n file | | | | | 3 days | |
| L10n Tools - phase 2. 2). Make localization results sortable. | | | | | 4 days | |
| L10n Tools - phase 2. 3). Sort localization results by unique resource ID | XL9 | | | | 1 day | |
| L10n Tools - phase 2. 4). Sort localization results by English text. | XL11 | | | | 1 day | |
| L10n Tools - phase 2. 5). Sort localization results by UI components (menu item...). | XL7 | | | | 1 day | |
| L10n Tools - phase 2. 6). (optional) Sort localization results by application component (navigator window, mail/news windows) | | | | | 1 day | |
| Leveraging tool - 1). Match/find a localized resource by a given "English string". | | | | | 1 day | |
| Leveraging tool - 2). Automated batch job to leverage existing translations. | | | | | 3 days | |
| L10n Tools - phase 3 | | | | | | |
| L10n Tools - phase 1 (legacy code) | | | ----- | ----- | | |
| Need a set of C wrappers around nsIStringBundle so that non-C++ components can use property files. | | | | | 10 days | |
| Investigate if we need to leverage 4.x translation results. | XL17 | | | | 3 days | |
| Leveraging tool - collect all translations from existing DOG files and build a table for matching. | XL17 | | | | 5 days | |
| Leveraging tool - build a bridge to import translations from 4.x to 5.0. | XL17 | | | | 5 days | |
| JavaScript L12y | XL20 | | | | 20 days | tao |
| L10n validator | XL21 | | | | 15 days | |
| Pseudo localization | XL18 | | | | 5 days | mantse |
* finish date is estimated.
+ M4: 04/06/99; M5: 04/27/99; M6: 05/18/99.
! XL18, XL20, XL21 might need to be reassigned or swapped
with some lower priority items.
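The status notes in the table ask that replacing entity values preserve the original position and context of the DTD. One way to approach that, sketched here under assumptions, is to rewrite only the quoted value and leave every other character untouched. The function name `merge_translations` and the sample entity names are hypothetical, only double-quoted values are handled, and escaping a literal double quote as `&quot;` is a choice made for this example.

```python
import re
from typing import Dict

# Capture everything around the value so it can be written back unchanged.
ENTITY_RE = re.compile(r'(<!ENTITY\s+)([A-Za-z_:][\w.:-]*)(\s+)"([^"]*)"(\s*>)')


def merge_translations(dtd_text: str, translations: Dict[str, str]) -> str:
    """Swap in translated values; comments, order, and spacing stay as they were."""
    def swap(match):
        name = match.group(2)
        if name not in translations:
            return match.group(0)  # untranslated entity: keep it as-is
        value = translations[name].replace('"', "&quot;")
        return f'{match.group(1)}{name}{match.group(3)}"{value}"{match.group(5)}'
    return ENTITY_RE.sub(swap, dtd_text)


# Hypothetical DTD content and translation, for illustration only.
SOURCE_DTD = (
    "<!-- File menu -->\n"
    '<!ENTITY fileMenu.label   "File">\n'
    '<!ENTITY editMenu.label "Edit">\n'
)

if __name__ == "__main__":
    print(merge_translations(SOURCE_DTD, {"fileMenu.label": "Fichier"}), end="")
```

Because untranslated entities are returned unchanged, a partially translated L10n file still yields a loadable DTD, which helps translators validate results as they go.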