libi18n Module Description
Discussion: netscape.public.mozilla.i18n or
mozilla-i18n@mozilla.org
Last Update: March 31, 1998
Contact: Bob Jung <bobj@netscape.com>
Introduction
The Mozilla family (Navigator and Communicator) is globally enabled.
Globally enabled software shares common source code from which we build
a single binary executable (per platform) that supports a wide variety
of languages. The initial Mozilla source release supports Western,
Central European, Chinese, Japanese, Korean, Greek, Turkish and Cyrillic
languages. (For an overview of Mozilla Internationalization (I18N) and
Localization (L10N), check out the Mozilla Internationalization
& Localization Guidelines.)
Libi18n provides the underlying internationalization utility
functions used in Mozilla to support international Web browsing and Internet
Mail/News functionality. The emphasis is on underlying because
there is a lot of other code that must be written in order to internationalize
features.
Mozilla programmers should call the libi18n APIs wherever possible,
but should also expect to write module and feature specific I18N aware
code. Check out the other Mozilla modules to see how this has been
done. In addition to calling libi18n, a significant amount of programming
has been required to internationalize the HTML layout engine, the
front end (UI and text rendering) code, and mail/news.
This document only provides an overview of the libi18n module.
For information on general I18N issues and the I18N of other Mozilla modules
see I18N Guidelines.
The functions that libi18n provides to other Mozilla modules include:
Character Code Conversion
Finding Character Boundaries
Handling I18N related HTTP Headers
Line/Word Breaking (for text layout support)
Locale Sensitive Operations (collation, date/time formatting)
Mail/News Header Processing
Platform Independent String Resources
String Comparison
Unicode String Functions
The corresponding libi18n public API specifications are documented in the
International Library Reference.
History
With a very small I18N team and tight product release schedules, our strategy
over the past 3 years has been to incrementally add features -- prioritized
by Netscape's international market needs.
Our initial work for Netscape Navigator (NN) 1.1 focused on adding Japanese
Web browsing capability. We invented the notion of a document character
set and a window (or font encoding) character set and provided a stream
module to convert incoming text documents from the document charset to
the window charset. This streams module and various Japanese charset
converters were the first libi18n functions. After the first Beta, we added
the ability in libi18n to auto-detect between the 3 common Japanese charset
encodings: Shift_JIS, JIS and EUC-JP.
NN1.1 was a significant advancement for Japanese Web browsing and was
well received. However, all of its UI was still in English.
In order to localize NN, we created a special "i-build" (NN1.1i) because
NN1.1 was full of hard-coded strings and other localization unfriendly
coding practices. We added libi18n APIs to make it easier to resource
user visible strings. NN1.1i was then localized into Japanese, German
and French -- Netscape's first localized releases! The localizability
infrastructure created for NN1.1i was then merged back into the mainstream
source code for NN2.x and later releases.
NN2.x extended our charset support beyond Western and Japanese.
Our NN1.1 stream module and charset converter architecture were designed
to be extensible (not Japanese centric) which made it straightforward to
add Chinese, Korean and Central European charset encodings support in the
NN2.0 libi18n.
Other NN2.x libi18n additions included:
- Enhancing the charset concept to be on a per window/context basis instead
  of globally affecting all windows/contexts
- RFC1522 support to handle MIME headers. (Really these functions should
  migrate from libi18n to the libmime library.)
- XP locale support (e.g., sorting, time & date)
- HTTP Accept-Language header support
NN3.x libi18n added:
- Additional charset converters for Cyrillic, Greek and Turkish
- Enhanced line wrapping for Asian languages (kinsoku shori)
NN4.x libi18n added:
- Unicode 2.0 converters
- Korean charset auto-detection
- HTTP Accept-Charset header support
The overall (not just libi18n) evolution of the Netscape client I18N and
L10N support is summarized in the table Netscape I18N/L10N Client History.
How It Works
Libi18n is a collection of fundamental internationalization functions,
so it is difficult to write a single "How It Works" because there really are
several "it"s. In this document, we mention a few of the bigger "it"s
and include links to others.
Document Charset Conversion
One of the most important functions provided by libi18n is character set
conversion of the incoming text data. As each block of text data
is received from the net (or cache), the libi18n stream module heuristically
determines (to the best of its ability) the character set encoding of the
incoming document, then it converts the data block from the "document"
character encoding to the "window" character encoding (usually equivalent
to the font encoding) before passing the data downstream to the HTML parser
and layout engine.
Currently the HTML parser and layout engine assume that HTML special characters
(e.g., '<', '>') in text data passed downstream to them are encoded
as ASCII values. Therefore ISO-2022-xx and other 7-bit encodings
such as UTF-7 and HZ are converted to an ASCII "superset" encoding, and
UCS-2 is converted to UTF-8 by the libi18n conversion module before being
sent downstream to the HTML parser.
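As an illustration of why UTF-8 suits a parser that expects ASCII markup, here is a minimal sketch of a UCS-2 to UTF-8 encoder. The function names are hypothetical and are not taken from the libi18n API; the sketch only shows the standard UTF-8 bit layout for 16-bit code units.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one UCS-2 code unit as UTF-8.
 * Writes 1 to 3 bytes into out (which must hold at least 3 bytes)
 * and returns the number of bytes written.
 * Hypothetical helper; not part of the libi18n API. */
size_t ucs2_to_utf8_char(unsigned short c, unsigned char *out)
{
    if (c < 0x80) {
        /* ASCII values (including '<' and '>') stay single, identical bytes. */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | ((c >> 12) & 0x0F));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Convert n UCS-2 code units. out must hold at least 3 * n bytes.
 * Returns the number of UTF-8 bytes written. */
size_t ucs2_to_utf8(const unsigned short *in, size_t n, unsigned char *out)
{
    size_t written = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        written += ucs2_to_utf8_char(in[i], out + written);
    return written;
}
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte equal to '<' in the output can only ever be a real '<'. That is the property the downstream parser relies on.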
The character set converters called by the libi18n stream module must
maintain state because (1) the text data may be stateful or contain multibyte
characters and (2) state is needed in some cases in which libi18n auto-selects
from a few character encodings (e.g., between the 3 common Japanese encodings).
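To make point (1) concrete, the following sketch converts Shift_JIS to EUC-JP one block at a time and remembers a lead byte when a two-byte character is split across blocks. It is an assumption-laden illustration, not the libi18n converter: the type and function names are invented, only JIS X 0208 lead bytes (0x81-0x9F, 0xE0-0xEF) are recognized, and an invalid trail byte is simply replaced with '?'.

```c
#include <assert.h>
#include <stddef.h>

/* State carried from one block to the next (hypothetical type). */
typedef struct {
    unsigned char pending_lead;  /* 0 when no lead byte is waiting */
} SjisToEucState;

static int is_sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF);
}

static int is_sjis_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Convert one block of Shift_JIS to EUC-JP.
 * out must hold at least 2 * in_len bytes.
 * Returns the number of bytes written to out. */
size_t sjis_to_euc_block(SjisToEucState *st,
                         const unsigned char *in, size_t in_len,
                         unsigned char *out)
{
    size_t i;
    size_t n = 0;

    assert(st != NULL);
    for (i = 0; i < in_len; i++) {
        unsigned char b = in[i];

        if (st->pending_lead) {
            unsigned char lead = st->pending_lead;
            int adjust, row_offset, cell_offset;

            st->pending_lead = 0;
            if (!is_sjis_trail(b)) {
                out[n++] = '?';          /* malformed pair: drop both bytes */
                continue;
            }
            /* Shift_JIS -> JIS X 0208 row/cell, then set high bits for EUC-JP. */
            adjust = b < 0x9F;
            row_offset = lead < 0xA0 ? 0x70 : 0xB0;
            cell_offset = adjust ? (b > 0x7F ? 0x20 : 0x1F) : 0x7E;
            out[n++] = (unsigned char)((((lead - row_offset) << 1) - adjust) | 0x80);
            out[n++] = (unsigned char)((b - cell_offset) | 0x80);
        } else if (is_sjis_lead(b)) {
            st->pending_lead = b;        /* wait for the trail byte */
        } else if (b >= 0xA1 && b <= 0xDF) {
            out[n++] = 0x8E;             /* half-width katakana prefix in EUC-JP */
            out[n++] = b;
        } else {
            out[n++] = b;                /* ASCII; other bytes pass through unchecked */
        }
    }
    return n;
}
```

A caller zero-initializes one `SjisToEucState` per document and passes it to every call, so a character whose lead byte ends one network block is completed when the next block arrives.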
The actual character set conversion functions can be categorized in
three types:
- Algorithmic conversions for Chinese, Japanese and Korean (e.g., Shift_JIS
  <-> EUC-JP)
- Table driven for 1-byte to 1-byte encodings (e.g., CP1250 <-> ISO8859-2)
- Table driven for Unicode conversions
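The table-driven approach for 1-byte to 1-byte encodings can be sketched as a 256-entry lookup. The example below is illustrative only: the function names are hypothetical, and the table is deliberately incomplete. It remaps just four CP1250 code points to their ISO-8859-2 positions and leaves every other byte as identity, which is not correct for all characters; a real converter would ship a full 256-entry table.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a lookup table mapping CP1250 bytes to ISO-8859-2 bytes.
 * INCOMPLETE ON PURPOSE: only four entries are remapped here; all other
 * positions fall back to identity, which is wrong for some characters. */
void init_cp1250_to_latin2(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;

    table[0x8A] = 0xA9;  /* LATIN CAPITAL LETTER S WITH CARON */
    table[0x9A] = 0xB9;  /* LATIN SMALL LETTER S WITH CARON   */
    table[0x8E] = 0xAE;  /* LATIN CAPITAL LETTER Z WITH CARON */
    table[0x9E] = 0xBE;  /* LATIN SMALL LETTER Z WITH CARON   */
}

/* Convert a buffer in place: one table lookup per byte, no state needed. */
void convert_single_byte(const unsigned char table[256],
                         unsigned char *buf, size_t len)
{
    size_t i;

    assert(table != NULL && buf != NULL);
    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

Because each input byte maps to exactly one output byte, this kind of converter can work in place and needs no state between blocks, unlike the multibyte converters.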
The document character set encodings currently supported by Communicator
are listed in the Netscape More
Tips and Technical Information for International Users.
See the documentation on the Mozilla network library in the mozilla.org
list of technical papers for more information on the Mozilla streams
architecture.
Managing Charset Encodings
In addition to doing the initial charset conversion of the text document
data, Mozilla needs to track and manage the charset information so that
any text input, display or manipulation is performed correctly. The
charset has a significant effect on layout and editing, including the
behavior of line wrapping, selection, copy and paste. The behavior of the
front ends (MacFE, WinFE and XFE) is also greatly affected by the charset
information (e.g., how they measure and draw text).
There are several types of Mozilla contexts (e.g., Web browsing, HTML
composing, mail reading, mail composing) that need to track and use the
charset information. Libi18n provides the APIs for getting and setting
the information in the charset object.
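The idea of a per-context charset object with get/set accessors might look roughly like the sketch below. All type and function names here are hypothetical stand-ins chosen for illustration; the actual libi18n declarations are documented in the International Library Reference and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charset identifiers, for illustration only. */
typedef enum {
    CS_UNKNOWN = 0,
    CS_LATIN1,
    CS_SJIS,
    CS_EUCJP
} CharsetId;

/* Hypothetical charset object: one per context. */
typedef struct {
    CharsetId doc_csid;  /* encoding of the document as received */
    CharsetId win_csid;  /* encoding used for display (font encoding) */
} CharsetInfo;

/* Hypothetical context (browser window, mail composer, ...). */
typedef struct {
    const char *kind;
    CharsetInfo csi;
} Context;

CharsetInfo *context_charset_info(Context *ctx)
{
    assert(ctx != NULL);
    return &ctx->csi;
}

CharsetId csi_get_doc_csid(const CharsetInfo *csi) { return csi->doc_csid; }
CharsetId csi_get_win_csid(const CharsetInfo *csi) { return csi->win_csid; }

void csi_set_doc_csid(CharsetInfo *csi, CharsetId id) { csi->doc_csid = id; }
void csi_set_win_csid(CharsetInfo *csi, CharsetId id) { csi->win_csid = id; }
```

Keeping the charset object inside each context means that changing the encoding of one window (say, a Japanese page in a browser window) leaves a mail composition window untouched.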
XP Locale Functions
The Cross Platform (XP) locale functions provide platform independent APIs
for string collation and date/time formatting. Because these are
wrappers around the existing locale functions provided by the operating system,
their behavior may not be totally consistent across platforms.
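A thin wrapper of this kind can be sketched with the standard C locale functions. The `xp_` names below are hypothetical and are not the libi18n entry points; the sketch assumes only `setlocale`, `strcoll` and `strftime` from the C library, and it shows why results vary: the ordering and date layout come from whatever locale data the platform supplies.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Adopt the user's locale from the environment. Call once at startup.
 * Returns nonzero on success. (Hypothetical wrapper name.) */
int xp_locale_init(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Locale-sensitive comparison: negative, zero or positive, like strcmp,
 * but ordered by the current locale's collation rules. */
int xp_str_collate(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b);
}

/* Format a date in the current locale's preferred short form.
 * Returns the number of characters written, or 0 if buf is too small. */
size_t xp_format_date(char *buf, size_t buf_len, const struct tm *when)
{
    assert(buf != NULL && when != NULL);
    return strftime(buf, buf_len, "%x", when);
}
```

If `xp_locale_init` is never called, the program stays in the default "C" locale, where `strcoll` orders strings like `strcmp` and `%x` formats dates as mm/dd/yy. After initialization, both results depend on the operating system's locale data.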
Other libi18n Functions
Libi18n provides more functionality than is described here, but this document
is intended only as a brief overview. For more information on how to write code
using the libi18n functions, see the description of the libi18n public APIs in
the International Library Reference.
Where It's Headed
- Modularity
  The most important next step for the libi18n module is to modularize.
  Currently the interfaces are not cleanly separated from the rest of the
  client. This is the highest priority because achieving modularity
  will make further development easier and faster.
- Extensibility
  The second most important step is to make underlying support (e.g., encoding
  conversions) easily extensible without modifying the library itself.
  Adding a simple new language or character encoding should be just a matter
  of dropping in a new binary module.
- Resource Handling
  The localizable resources need to be modularized. Each component/module
  should maintain its own set of localized resources rather than drawing on
  a single pot of resources (e.g., allxpstr.h) shared by all modules. This
  should go hand-in-hand with the general Mozilla push towards modularization.
- More utility functionality
  - More flexible and powerful message formatting
  - Enhancements to current string and character processing
  - String creation/destruction
  - String functions (extract, replace, concatenate...)
  - Character attributes
  - Platform Independent Locale management
  - Platform Independent Collation
  - Enhanced code set detection
  - Date/Time/Number formatting
  - Text boundary detection
Please contact us at netscape.public.mozilla.i18n
and let us know if you would like to help work on enhancements to libi18n.
Known Issues
- Extending charsets will become easier:
  We have been working on making it easier to add additional charset
  support, but could not complete this in time for the initial Mozilla source
  release on 3/31/98. Now that the source clean-up is complete,
  we will resume working on this.
- Traditional Chinese charset encoding converters (Big5 to/from CNS 11643)
  are missing:
  In the process of cleaning up the Mozilla sources for release to the
  net by 3/31/98, we had to remove this code because it was not freely
  distributable. This will be fixed soon by implementing an NPL version of
  this functionality. If anyone on the net wants to help us, please let us know!
See Also
Copyright © 1998 Netscape Communications Corporation