libi18n Module Description


Discussion: netscape.public.mozilla.i18n or mozilla-i18n@mozilla.org
Last Update: March 31, 1998
Contact: Bob Jung <bobj@netscape.com>

Contents

Introduction
History
How It Works
Where It's Headed
Known Issues
See Also

Introduction

The Mozilla family (Navigator and Communicator) is globally enabled. Globally enabled software shares common source code from which we build a single binary executable (per platform) that supports a wide variety of languages. The initial Mozilla source release supports Western, Central European, Chinese, Japanese, Korean, Greek, Turkish and Cyrillic languages. (For an overview of Mozilla Internationalization (I18N) and Localization (L10N), check out the Mozilla Internationalization & Localization Guidelines.)

Libi18n provides the underlying internationalization utility functions used in Mozilla to support international Web browsing and Internet Mail/News functionality. The emphasis is on underlying because there is a lot of other code that must be written in order to internationalize features.

Mozilla programmers should call the libi18n APIs wherever possible, but should also expect to write module- and feature-specific I18N-aware code. Check out the other Mozilla modules to see how this has been done. Beyond calls to libi18n, a significant amount of programming has been required to internationalize the HTML layout engine, the front end (UI and text rendering) code, and mail/news.

This document provides only an overview of the libi18n module. For information on general I18N issues and the I18N of other Mozilla modules, see the I18N Guidelines.

The functions that libi18n provides to other Mozilla modules include:

Character Code Conversion
Finding Character Boundaries
Handling I18N related HTTP Headers
Line/Word Breaking (for text layout support)
Locale Sensitive Operations (collation, date/time formatting)
Mail/News Header Processing
Platform Independent String Resources
String Comparison
Unicode String Functions

The corresponding libi18n public API specifications are documented in the International Library Reference.

History

With a very small I18N team and tight product release schedules, our strategy over the past 3 years has been to incrementally add features -- prioritized by Netscape's international market needs.

Our initial work for Netscape Navigator (NN) 1.1 focused on adding Japanese Web browsing capability. We invented the notion of a document character set and a window (or font encoding) character set and provided a stream module to convert incoming text documents from the document charset to the window charset. This stream module and various Japanese charset converters were the first libi18n functions. After the first Beta, we added the ability in libi18n to auto-detect among the 3 common Japanese charset encodings: Shift_JIS, JIS and EUC-JP.

NN1.1 was a significant advancement for Japanese Web browsing and was well received. However, all of its UI was still in English. In order to localize NN, we created a special "i-build" (NN1.1i) because NN1.1 was full of hard-coded strings and other localization unfriendly coding practices. We added libi18n APIs to make it easier to resource user visible strings. NN1.1i was then localized into Japanese, German and French -- Netscape's first localized releases! The localizability infrastructure created for NN1.1i was then merged back into the mainstream source code for NN2.x and later releases.

NN2.x extended our charset support beyond Western and Japanese. Our NN1.1 stream module and charset converter architecture were designed to be extensible (not Japanese centric) which made it straightforward to add Chinese, Korean and Central European charset encodings support in the NN2.0 libi18n.

Other NN2.x libi18n additions included:

Enhancing the charset concept to be on a per window/context basis instead of globally affecting all windows/contexts
RFC1522 support to handle MIME headers. (Really these functions should migrate from libi18n to the libmime library.)
XP locale support (e.g., sorting, time & date)
HTTP Accept-Language header support

NN3.x libi18n added:

Additional charset converters for Cyrillic, Greek and Turkish
Enhanced line wrapping for Asian languages (kinsoku shori)

NN4.x libi18n added:

Unicode 2.0 converters
Korean charset auto-detection
HTTP Accept-Charset header support

The overall (not just libi18n) evolution of the Netscape client I18N and L10N support is highlighted by a table of the Netscape I18N/L10N Client History.

How It Works

Libi18n is a collection of fundamental internationalization functions, so it is difficult to write a single "How It Works" because there really are several "it"s. In this document, we mention a few of the bigger "it"s and include links to others.

Document Charset Conversion

One of the most important functions provided by libi18n is character set conversion of the incoming text data. As each block of text data is received from the net (or cache), the libi18n stream module heuristically determines (to the best of its ability) the character set encoding of the incoming document, then it converts the data block from the "document" character encoding to the "window" character encoding (usually equivalent to the font encoding) before passing the data downstream to the HTML parser and layout engine.
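As a rough illustration of the heuristic step, the C sketch below guesses among the three common Japanese encodings by looking for bytes that can occur in only one of them. It is not the libi18n detector: the enum, the function name and the rules are assumptions made for this example, and a real detector would weigh far more evidence.

```c
#include <stddef.h>

/* Hypothetical result type; not a libi18n definition. */
typedef enum {
    GUESS_UNKNOWN,      /* nothing decisive seen yet; keep reading */
    GUESS_ISO_2022_JP,
    GUESS_SHIFT_JIS,
    GUESS_EUC_JP
} JaGuess;

/* Scan one block of bytes and return a guess, or GUESS_UNKNOWN if the
 * block contains no byte that rules encodings in or out. */
JaGuess guess_japanese_charset(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        /* ISO-2022-JP is 7-bit and switches character sets with
         * escape sequences that start ESC '$' or ESC '('. */
        if (c == 0x1B && i + 1 < len &&
            (buf[i + 1] == '$' || buf[i + 1] == '('))
            return GUESS_ISO_2022_JP;

        /* EUC-JP never uses 0x80-0xA0 except the single shifts
         * 0x8E and 0x8F, but Shift_JIS does use that range. */
        if (c >= 0x80 && c <= 0xA0 && c != 0x8E && c != 0x8F)
            return GUESS_SHIFT_JIS;

        /* Shift_JIS bytes stop at 0xFC; EUC-JP goes up to 0xFE. */
        if (c == 0xFD || c == 0xFE)
            return GUESS_EUC_JP;
    }
    return GUESS_UNKNOWN;
}
```

Many blocks are genuinely ambiguous (the bytes 0xC6 0xFC are valid in both Shift_JIS and EUC-JP), so a caller would keep feeding later blocks until a decisive byte appears or fall back to a default.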

Currently the HTML parser and layout engine assume that HTML special characters (e.g., '<', '>') in text data passed downstream to them are encoded as ASCII values. Therefore ISO-2022-xx and other 7-bit encodings such as UTF-7 and HZ are converted to an ASCII "superset" encoding, and UCS-2 is converted to UTF-8 by the libi18n conversion module before being sent downstream to the HTML parser.
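To see why UTF-8 satisfies the parser's assumption, here is a minimal C sketch of a UCS-2 to UTF-8 conversion. The function names are hypothetical and are not libi18n APIs; the sketch assumes the input is already in host byte order and contains no surrogate code units.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-2 code unit (U+0000..U+FFFF) as UTF-8.
 * Writes 1 to 3 bytes to out and returns the number written. */
size_t ucs2_unit_to_utf8(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {                 /* ASCII stays a single byte */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Convert n code units. out must have room for 3 * n bytes.
 * Returns the number of bytes written. */
size_t ucs2_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    size_t written = 0;
    size_t i;
    for (i = 0; i < n; i++)
        written += ucs2_unit_to_utf8(in[i], out + written);
    return written;
}
```

In the output, '<' (U+003C) is still the single byte 0x3C, and every byte of a multibyte sequence has its high bit set, so a parser scanning for ASCII markup characters cannot mistake part of another character for one.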

The character set converters called by the libi18n stream module must maintain state because (1) the text data may be stateful or contain multibyte characters and (2) state is needed in some cases in which libi18n auto-selects from a few character encodings (e.g., between the 3 common Japanese encodings).
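The first reason can be sketched in C as follows: a network block may end in the middle of a two-byte character, so the converter has to remember the lead byte until the next block arrives. This is an illustrative Shift_JIS to EUC-JP converter covering JIS X 0208 and half-width katakana only; the state structure and function names are hypothetical and do not reflect the actual libi18n converters.

```c
#include <stddef.h>

/* Hypothetical converter state carried from one block to the next. */
typedef struct {
    unsigned char pending_lead;  /* lead byte awaiting its trail byte, or 0 */
} SjisToEucState;

static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF);
}

static int is_sjis_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0x80 && c <= 0xFC);
}

/* Convert one block. out must have room for 2 * in_len bytes.
 * Returns the number of bytes written to out. */
size_t sjis_to_euc_block(SjisToEucState *st, const unsigned char *in,
                         size_t in_len, unsigned char *out)
{
    size_t o = 0;
    size_t i;
    for (i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        if (st->pending_lead) {
            /* Finish a character whose lead byte may have arrived
             * in an earlier block. */
            unsigned char lead = st->pending_lead;
            int adjust, row_offset, cell_offset;
            st->pending_lead = 0;
            if (!is_sjis_trail(c)) {
                out[o++] = '?';          /* malformed pair */
                continue;
            }
            adjust = c < 0x9F;
            row_offset = lead < 0xA0 ? 0x70 : 0xB0;
            cell_offset = adjust ? (c > 0x7F ? 0x20 : 0x1F) : 0x7E;
            /* Compute the JIS X 0208 row and cell, then set the
             * high bits to get the EUC-JP bytes. */
            out[o++] = (unsigned char)((((lead - row_offset) << 1) - adjust) | 0x80);
            out[o++] = (unsigned char)((c - cell_offset) | 0x80);
        } else if (is_sjis_lead(c)) {
            st->pending_lead = c;        /* wait for the trail byte */
        } else if (c >= 0xA1 && c <= 0xDF) {
            out[o++] = 0x8E;             /* half-width katakana via SS2 */
            out[o++] = c;
        } else {
            out[o++] = c;                /* ASCII passes through */
        }
    }
    return o;
}
```

With a zero-initialized state, feeding the block {0x93} produces no output and leaves 0x93 pending; feeding {0xFA, 0x96, 0x7B} next produces C6 FC CB DC. A stateless converter would have had to drop or corrupt the first character.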

The actual character set conversion functions can be categorized in three types:

Algorithmic conversions for Chinese, Japanese and Korean (e.g., Shift_JIS <-> EUC-JP)
Table driven for 1-byte to 1-byte encodings (e.g., CP1250 <-> ISO8859-2)
Table driven for Unicode conversions
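The table-driven approach for 1-byte encodings amounts to one array lookup per byte. The C sketch below shows the idea with a deliberately partial CP1250 to ISO8859-2 table. The type and function names are hypothetical, and only four entries are filled in; a real table would define all 256 mappings, and leaving the rest as identity (as done here) is a placeholder, not a correct conversion.

```c
#include <stddef.h>

/* Hypothetical mapping table: index = source byte, value = target byte. */
typedef struct {
    unsigned char map[256];
} ByteTable;

/* Start from a table that maps every byte to itself. */
void byte_table_init_identity(ByteTable *t)
{
    int i;
    for (i = 0; i < 256; i++)
        t->map[i] = (unsigned char)i;
}

/* PARTIAL table for illustration only: four letters that sit at
 * different positions in CP1250 and ISO8859-2. */
void byte_table_init_cp1250_to_latin2_partial(ByteTable *t)
{
    byte_table_init_identity(t);
    t->map[0x8A] = 0xA9;  /* S with caron, capital */
    t->map[0x9A] = 0xB9;  /* s with caron, small   */
    t->map[0x8E] = 0xAE;  /* Z with caron, capital */
    t->map[0x9E] = 0xBE;  /* z with caron, small   */
}

/* Convert a buffer in place, one lookup per byte. */
void byte_table_convert(const ByteTable *t, unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = t->map[buf[i]];
}
```

Because each input byte maps to exactly one output byte, this kind of converter needs no state between blocks and can work in place, unlike the multibyte converters.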

The document character set encodings currently supported by Communicator are listed in the Netscape More Tips and Technical Information for International Users.

See the documentation on the Mozilla network library in the mozilla.org list of technical papers for more information on the Mozilla streams architecture.

Managing Charset Encodings

In addition to doing the initial charset conversion of the text document data, Mozilla needs to track and manage the charset information so that any text input, display or manipulation is performed correctly. The charset has a significant effect on layout and editing, including the behavior of line wrapping, selection, copy and paste. The behavior of the front ends (MacFE, WinFE and XFE) is also greatly affected by the charset information (e.g., how they measure and draw).

There are several types of Mozilla contexts (e.g., Web browsing, HTML composing, mail reading, mail composing) that need to track and use the charset information. Libi18n provides the APIs for getting and setting the information in the charset object.

XP Locale Functions

The Cross Platform (XP) locale functions provide platform independent APIs for string collation and date/time formatting. Because these are wrappers around the existing locale functions provided by the operating system, the behavior may not be totally consistent across platforms.
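A thin wrapper of this kind might look like the following C sketch, which uses only the standard C library. The xp_ function names are made up for this example and are not the libi18n API; the real XP functions wrap each platform's own locale services.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Switch the process to the user's default locale.
 * Returns nonzero on success. */
int xp_use_user_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Compare two strings using the current locale's collation rules.
 * Returns <0, 0 or >0, like strcmp. */
int xp_collate(const char *a, const char *b)
{
    return strcoll(a, b);
}

/* Format the date in when using the current locale's preferred date
 * representation. Returns the number of characters written, or 0 if
 * buf is too small. */
size_t xp_format_date(const struct tm *when, char *buf, size_t size)
{
    return strftime(buf, size, "%x", when);
}
```

The wrapper adds no logic of its own, which is exactly why results can differ: the same call may sort or format differently depending on the locale data each operating system ships.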

Other libi18n Functions

There's more functionality provided by libi18n, but this document is intended to provide only a brief overview. For more information on how to write code using the libi18n functions, see the description of the libi18n public APIs in the International Library Reference.

Where It's Headed

Modularity: The most important next step for the libi18n module is to modularize. Currently the interfaces are not cleanly separated from the rest of the client. This is the highest priority because when we achieve modularity it will make further development easier and faster.
Extensibility: The second most important step is to make underlying support (e.g., encoding conversions) easily extensible without modifying the library itself. Adding a new simple language or character encoding should be simply a matter of dropping in a new binary module.
Resource Handling: The localizable resources need to be modularized. Each component/module should maintain its own set of localized resources rather than the one pot (e.g., allxpstr.h) of resources for all modules. This should go hand-in-hand with the general Mozilla push towards modularization.

Known Issues

Extending Charsets will become easier:

Traditional Chinese charset encoding converters (Big5 to/from CNS 11643) are missing: