How To Add Additional Charset Support

Tasks Outline:

In Cross-platform code:
In Windows Code:
In Macintosh Code:
In UNIX Code:
Details

In Cross-Platform Code

1. Add CharSetID with the properties of the charset.

A. Decide the properties of the charset you need to add:

B. Define your CharSetID in the file include/csid.h.

2. Associate CharSetID with MIME charset name and Java encoding name.

You need to do this for every CharSetID you add.

Procedure:

A. Before you add any name, please look at these two documents first:
{"Shift_JIS", "SJIS", CS_SJIS},
...
/* aliases for Shift_JIS: */
{"x-sjis", "", CS_SJIS},
{"ms_Kanji", "", CS_SJIS},
{"csShiftJIS", "", CS_SJIS},
{"Windows-31J", "", CS_SJIS},
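In the above example, the first field of each entry is the MIME charset name, the second is the Java encoding name (left empty for aliases), and the third is the CharSetID.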
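As an illustration of how such a name table can be used, here is a minimal sketch in C. It is not the lookup code from the Mozilla tree: the struct, the function names and the numeric CharSetID values are all hypothetical, and only the five Shift_JIS entries come from the example above. It assumes names are compared without regard to case, since MIME charset names are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical CharSetID values; the real ones live in include/csid.h. */
enum { EXAMPLE_CS_UNKNOWN = 0, EXAMPLE_CS_SJIS = 1 };

typedef struct {
    const char *mime_name;  /* MIME charset name or alias */
    const char *java_name;  /* Java encoding name, "" for aliases */
    int         csid;       /* CharSetID */
} ExampleCharsetName;

static const ExampleCharsetName example_names[] = {
    {"Shift_JIS",   "SJIS", EXAMPLE_CS_SJIS},
    {"x-sjis",      "",     EXAMPLE_CS_SJIS},
    {"ms_Kanji",    "",     EXAMPLE_CS_SJIS},
    {"csShiftJIS",  "",     EXAMPLE_CS_SJIS},
    {"Windows-31J", "",     EXAMPLE_CS_SJIS},
};
#define EXAMPLE_NAME_COUNT (sizeof example_names / sizeof example_names[0])

/* Compare two names, ignoring ASCII case. */
static int example_equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Map a MIME charset name or alias to its CharSetID. */
int example_name_to_csid(const char *name)
{
    size_t i;
    for (i = 0; i < EXAMPLE_NAME_COUNT; i++)
        if (example_equal_ignore_case(name, example_names[i].mime_name))
            return example_names[i].csid;
    return EXAMPLE_CS_UNKNOWN;
}

/* Return the Java encoding name for a CharSetID, skipping alias rows. */
const char *example_csid_to_java_name(int csid)
{
    size_t i;
    for (i = 0; i < EXAMPLE_NAME_COUNT; i++)
        if (example_names[i].csid == csid &&
            example_names[i].java_name[0] != '\0')
            return example_names[i].java_name;
    return "";
}
```

With this layout, every alias resolves to the same CharSetID, while the Java encoding name is stored once on the primary entry.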
3. Add Unicode conversion tables.

A. Create the conversion table:

Before creating a conversion table for your charset, look in the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. You may find that the table for your charset is already there, even if it is not built into the mozilla binary. If you cannot find one there, follow the instructions below. (Thanks to Hovik Melikyan <hovik@undp.am> for writing the following section.)

A.1. Create an 8-bit to 16-bit conversion table for the new encoding. The file should contain two columns of hexadecimal numbers, where the left column holds the 8-bit codes and the right column holds the corresponding 16-bit codes. If a code point is undefined or has no mapping to Unicode, do not list it in the file. Example:

0x20 0x0020
...
0xa0 0x00a0
0xa2 0x00a7
0xa3 0x0589
0xa4 0x0029
...

This file will be used for generating both the "FROM" and "TO" conversion tables.

A.2. Compile the utilities in lib/libi18n/unicode/tbltool. Example for Visual C++:
cl -I../../ fromu.c utblutil.c
cl -I../../ tou.c utblutil.c
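For readers who want to sanity-check a mapping file before running it through the tools, the two-column format described in A.1 can be read with a few lines of C. This is an independent sketch, not the tbltool source: the function names and the EXAMPLE_UNMAPPED marker are assumptions made for illustration.

```c
#include <stdio.h>

/* Assumption: marker stored for code points the file does not list. */
#define EXAMPLE_UNMAPPED 0xFFFD

/* Parse one line such as "0xa2 0x00a7".
 * Returns 1 on success, 0 if the line is malformed or out of range. */
int example_parse_mapping_line(const char *line,
                               unsigned *code8, unsigned *code16)
{
    unsigned a, b;
    /* %x accepts an optional 0x prefix */
    if (sscanf(line, "%x %x", &a, &b) != 2)
        return 0;
    if (a > 0xFF || b > 0xFFFF)
        return 0;
    *code8 = a;
    *code16 = b;
    return 1;
}

/* Read a whole mapping file into a 256-entry table.
 * Returns the number of mappings that were loaded. */
int example_load_mapping(FILE *in, unsigned short to_unicode[256])
{
    char line[128];
    int i, count = 0;

    for (i = 0; i < 256; i++)
        to_unicode[i] = EXAMPLE_UNMAPPED;

    while (fgets(line, sizeof line, in) != NULL) {
        unsigned c8, c16;
        if (example_parse_mapping_line(line, &c8, &c16)) {
            to_unicode[c8] = (unsigned short)c16;
            count++;
        }
    }
    return count;
}
```

Calling example_load_mapping(stdin, table) on a mapping file gives a quick count of how many code points it defines; lines that do not contain two hexadecimal numbers are skipped.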
These programs accept conversion tables as described in A.1 from standard input and write resources to standard output in a form suitable for inclusion in Windows resource files and in UNIX C sources. Example:

fromu < xscii.txt > xscii.uf
tou < xscii.txt > xscii.ut

A.3. Copy the generated *.uf and *.ut files to lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl respectively.

B. Include the Unicode conversion table in the binary by doing the following:

C. You might also need to make additions to the shift tables in lib/libi18n/ugendata.c.

4. Add additional codeset conversion support.

[To Be Written] How to write a new conversion routine.

5. Add codeset conversion rules.

[To Be Written, IDEA and OUTLINE only for now] How to change lib/libi18n/fe_ccc.c

Outline:
6. Add ASCII fallback for Named Entity or Numeric Character Reference Not Supported in this Charset

If you want to add an ASCII fallback for the named entities or numeric character references for the newly added charset, do this step.

Procedure:

Open the file lib/libparse/pa_amp.c, locate the function pa_map_escape, and go to the end of this function. There should be an #ifdef PLATFORM section containing a big switch statement; add your code there to provide an ASCII fallback for each character you cannot display in the font CharSetID.

7. Add character length/column width information for multibyte charset.

If the font or document CharSetID you add is a multibyte charset, you should provide character length and column width information as below.

Procedure:

A. Open lib/libi18n/csstrlen.c and locate csinfo_tbl at the beginning of the file. Look at the current entries, and add an additional entry if no entry matches the characteristics of your charset. Here is the entry for the Japanese EUC_JP charset.
{{{2,2,{0xa1,0xfe}}, {2,1,{0x8e,0x8e}}, {3,2,{0x8f,0x8f}}}}, /* For EUC_JP */
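This entry specifies that if the first byte of a character is in the range 0xa1-0xfe, the character is 2 bytes long and 2 columns wide; if it is 0x8e, the character is 2 bytes long and 1 column wide; and if it is 0x8f, the character is 3 bytes long and 2 columns wide.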
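To make the entry concrete, here is a small C sketch that applies the same three EUC_JP ranges to a string. The struct and function names are hypothetical and do not match the declarations in csstrlen.c. The sketch also assumes that any byte outside the listed ranges is a single-byte, single-column character.

```c
#include <stddef.h>

/* One range of lead bytes: character length, column width, lead byte range. */
typedef struct {
    unsigned char bytes;
    unsigned char columns;
    unsigned char lo;
    unsigned char hi;
} ExampleRange;

/* The three ranges from the EUC_JP entry shown above. */
static const ExampleRange example_euc_jp[3] = {
    {2, 2, 0xa1, 0xfe},
    {2, 1, 0x8e, 0x8e},
    {3, 2, 0x8f, 0x8f},
};

/* Return the byte length of the character starting with first_byte
 * and store its column width in *columns. */
int example_char_info(unsigned char first_byte, int *columns)
{
    size_t i;
    for (i = 0; i < 3; i++) {
        if (first_byte >= example_euc_jp[i].lo &&
            first_byte <= example_euc_jp[i].hi) {
            *columns = example_euc_jp[i].columns;
            return example_euc_jp[i].bytes;
        }
    }
    /* Assumption: everything else is one byte and one column wide. */
    *columns = 1;
    return 1;
}

/* Sum the column widths of a NUL-terminated EUC_JP string. */
size_t example_string_columns(const unsigned char *s)
{
    size_t total = 0;
    while (*s != '\0') {
        int columns, i;
        int bytes = example_char_info(*s, &columns);
        total += (size_t)columns;
        /* Skip the whole character, but never run past the terminator. */
        for (i = 0; i < bytes && *s != '\0'; i++)
            s++;
    }
    return total;
}
```

A table-driven layout like this keeps the per-charset knowledge in data, so supporting another multibyte charset only means adding another set of ranges.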
To Be Improved
8. Add (pseudo) toLower table for case-insensitive folding.

You need to add tables to one of the following files to perform correct case-insensitive search. This only needs to be done for the font CharSetID. A document CharSetID which is not a font CharSetID on that platform does not need this.

Procedure:

A. If the charset you add is a single byte charset:

A.1 Open the file lib/libi18n/sblower.c.
A.2 Add a (pseudo) to-lower-case table for the code points 0x80-0xFF. To reduce size on platforms which do not use the charset as a font CharSetID, put #ifdef PLATFORM around the new table.
A.3 Change the function INTL_GetSingleByteToLowerMap to return the new (pseudo) to-lower-case table.

B. If the charset you add is a multibyte charset:
typedef struct {
unsigned char src_b1;
unsigned char src_b2_start;
unsigned char src_b2_end;
unsigned char dest_b1;
unsigned char dest_b2_start;
} DoubleByteToLowerMap;
B.3 Change the function INTL_GetDoubleByteToLowerMap to return the new (pseudo) to-lower-case table.
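The field names of DoubleByteToLowerMap suggest how a table entry is meant to be applied; the sketch below spells out that reading in C. The semantics are an assumption inferred from the field names, not taken from the library source: a character whose first byte equals src_b1 and whose second byte lies in src_b2_start..src_b2_end is mapped to dest_b1 followed by dest_b2_start plus the offset within the range. The function name and the sample Shift_JIS entry (full-width A-Z to full-width a-z) are illustrative.

```c
#include <stddef.h>

typedef struct {
    unsigned char src_b1;
    unsigned char src_b2_start;
    unsigned char src_b2_end;
    unsigned char dest_b1;
    unsigned char dest_b2_start;
} DoubleByteToLowerMap;

/* Illustrative entry: Shift_JIS full-width A-Z (0x8260-0x8279)
 * folds to full-width a-z (0x8281-0x829A). */
static const DoubleByteToLowerMap example_sjis_lower[] = {
    {0x82, 0x60, 0x79, 0x82, 0x81},
};

/* Fold one double-byte character in place.
 * Returns 1 if an entry matched, 0 if the character was left unchanged. */
int example_double_byte_to_lower(const DoubleByteToLowerMap *map, size_t count,
                                 unsigned char *b1, unsigned char *b2)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (*b1 == map[i].src_b1 &&
            *b2 >= map[i].src_b2_start && *b2 <= map[i].src_b2_end) {
            unsigned char offset = (unsigned char)(*b2 - map[i].src_b2_start);
            *b1 = map[i].dest_b1;
            *b2 = (unsigned char)(map[i].dest_b2_start + offset);
            return 1;
        }
    }
    return 0;
}
```

Describing each run of upper-case characters as one range keeps the table small compared with a full 16-bit lookup table.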
9. Add line breaking prohibition information for multibyte charset.

Line wrapping behavior is affected by the following properties of the CharSetID:

Procedure:

To change INTL_KinsokuClass, change the files lib/libi18n/kinsokuf.c and lib/libi18n/kinsokud.c.

References:
To Be Improved:
10. Add word breaking information for multibyte charset.

This is for the word selection (by double click) feature.

Procedure:

Open the file and change the function INTL_CharClass. It is hard to explain how to change it; look at the existing code yourself.

To be improved:
In Windows Code

1. Add CodePage to CharSetID mapping

You need to do the following if you add an additional font CharSetID. This has to be done so that a Windows code page number can be mapped to the new CharSetID. You don't need to do this if you only add an additional CharSetID as a document CharSetID and use an existing CharSetID as the font CharSetID.

Procedure:

Open the file cmd/winfe/intlwin.cpp, locate the function CIntlWin::CodePageToCsid, and add code there if necessary. That function maps a Windows code page to a CharSetID you defined in ns/include/csid.h.

To Be Improved:

The current implementation should be improved in the near future by moving this mapping into a resource so that no C code needs to be changed.

2. Add Single Byte Conversion Table

If the CharSetID you add is a single byte charset and you want to use the One2One conversion procedure to convert between the font CharSetID and the document CharSetID on Windows, you need to do this step.

Procedure:

Open the file cmd/winfe/res/convtbls.rc and add lines:
MIMECharsetFrom_TO_MIMECharsetTo RCDATA
BEGIN
/*8x*/ 0x8180, 0x8382, 0x8584, 0x8786, 0x8988, 0x8B8A, 0x8D8C, 0x8F8E,
/*9x*/ 0x9190, 0x9392, 0x9594, 0x9796, 0x9998, 0x9B9A, 0x9D9C, 0x9F9E,
/*Ax*/ ..............................................................
/*Bx*/ ..............................................................
/*Cx*/ ..............................................................
/*Dx*/ ..............................................................
/*Ex*/ ..............................................................
/*Fx*/ ..............................................................
END /* End of MIMECharsetFrom_TO_MIMECharsetTo */
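where MIMECharsetFrom and MIMECharsetTo are the MIME charset names of the source and destination charsets. Each 16-bit value holds two table bytes, low byte first: the first value in the 8x row, 0x8180, maps 0x80 to 0x80 and 0x81 to 0x81.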
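Writing the 16-bit values by hand is error-prone, so it can help to generate the rows from a plain 128-byte table. The C sketch below does that. It assumes the packing inferred from the identity rows in the example above (first byte of each pair in the low half of the word); the function names are hypothetical and no such helper exists in the tree.

```c
#include <stdio.h>

/* Pack two consecutive table bytes into one RCDATA word,
 * with the first byte in the low half (assumed layout). */
unsigned example_pack_pair(unsigned char first, unsigned char second)
{
    return (unsigned)first | ((unsigned)second << 8);
}

/* Print the eight rows (8x..Fx) for a table that gives the
 * replacement for each byte from 0x80 to 0xFF. */
void example_print_rcdata_rows(const unsigned char table[128], FILE *out)
{
    int row, col;
    for (row = 0; row < 8; row++) {
        fprintf(out, "/*%Xx*/", 8 + row);
        for (col = 0; col < 8; col++) {
            const unsigned char *pair = &table[row * 16 + col * 2];
            fprintf(out, " 0x%04X", example_pack_pair(pair[0], pair[1]));
            /* No comma after the very last value of the table. */
            if (!(row == 7 && col == 7))
                fputc(',', out);
        }
        fputc('\n', out);
    }
}
```

The printed rows can be pasted between the BEGIN and END lines of the resource definition.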
3. Add Unicode Conversion Tables

You should add Unicode conversion tables for the new CharSetID you add. The conversion tables for many charsets have already been generated in the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. In most cases you simply need to include them in the resource file. If the charset you add does not have a conversion table in those two directories, you need to generate one. Look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect input files in the format of the tables found on the Unicode FTP site (ftp://ftp.unicode.org/Public/MAPPINGS/).

Procedure:

A. Open the file lib/libi18n/unicode/unitable.rc and add lines
XXX.UF RCDATA
BEGIN
#include "ufrmtbl\\xxx.uf"
END
XXX.UT RCDATA
BEGIN
#include "utotbl\\xxx.ut"
END
where
{CS_XXX, {"XXX.UF", 0, NULL}, {"XXX.UT", 0, NULL}},
to the table utablenametbl before the last entry
{CS_DEFAULT, {"", 0, NULL}, {"", 0, NULL}}
where
4. Add menu items to "View:Encoding" menu

If you want to add additional menu items to the "View:Encoding" menu, you need to do this step.

Procedure:

A. Open the file cmd/winfe/genfram2.cpp, locate the function nIDToCsid(), and add an entry to the end of the table nid_to_csid with the CharSetID you want to appear on the "View:Encoding" menu. The position of the CharSetID in the table does not affect its position in the "View:Encoding" menu; it simply decides the nID for that CharSetID.

B.1 Open the file cmd/winfe/res/mozilla.rc2 and locate POPUP "&Encoding". You will find 8 of them (two are commented out):
MENUITEM "LanguageGroup (Charset)", ID_OPTIONS_ENCODING_XX
where
To Be Improved:

Currently all the menus are defined in the front end. We may move them into cross-platform code and make the menu items easily changeable in the future.

5. Add menu item for "Font" preference "For the Encoding" menu

If you add an additional font CharSetID, you need to do the following to make a new menu item appear in the font preference so people can associate fonts with that font CharSetID.

Procedure:

A. Open the file cmd/winfe/mozilla.rc using Developer Studio, not a text editor, and add an additional string for the menu item in the "For the Encoding" menu. You should name the resource ID IDS_LANGUAGE_XX so we can easily find it later. When you close the file, Developer Studio should change cmd/winfe/mozilla.rc as well as cmd/winfe/resource.h.

B. Open the file cmd/winfe/intlwin.cpp and

B.1 Locate lang_table and add new lines
IDS_LANGUAGE_XX, CS_XXX, CS_XXX, CS_YYY, 0,
CS_XXX, "PropoFont", 12, "FixedFonte", 10, CHARSET, CHARSET,
where
[Note: Need to talk about modules/libpref/src/win/winpref.js]

In Macintosh Code

1. Add Script to CharSetID mapping

You need to do the following if you add an additional font CharSetID. This has to be done so that a Macintosh Script code can be mapped to the new CharSetID. You don't need to do this if you only add an additional CharSetID as a document CharSetID and use an existing CharSetID as the font CharSetID.

Procedure:

Open the file cmd/macfe/utility/uintl.cp, look at the function ScriptToEncoding, uncomment the Script code that you want to map, and add a return statement to return the CharSetID.

To Be Improved:

The current implementation should be improved in the near future by moving this mapping into a resource so that no C code needs to be changed.

2. Add Single Byte Conversion Table

If the CharSetID you add is a single byte charset and you want to use the One2One conversion procedure to convert between the font CharSetID and the document CharSetID on Macintosh, you need to do this step.

Procedure:

A. [Optional] Open the file cmd/macfe/include/resgui.h, go to the end, and add a macro definition for the conversion table resource id. This macro value is used in the file lib/libi18n/fe_ccc.c as described in section "XXXX", and in the resource file you will create in the next step. We suggest you define the resource id as
#define xlat_FromCharSetID_TO_ToCharSetID (((FromCharSetID & 0xff) << 8 ) | (ToCharSetID & 0xff))
where
#include "csid.h"
#include "resgui.h"
data 'xlat' ( xlat_FromCharSetID_TO_ToCharSetID, "Resource Name", purgeable){
/* x0x1 x2x3 x4x5 x6x7 x8x9 xAxB xCxD xExF */
/*8x*/ $"C481 82C9 A5D6 DCE1 B9C8 E4E8 C6E6 E98F"
/*9x*/ $"9FCF EDEF 9495 96F3 98F4 F69B FACC ECFC"
/*Ax*/ $"86B0 CAA3 A795 B6DF AEA9 99EA A8AD AEAF"
/*Bx*/ $"B0B1 B2B3 B4B5 B6B7 B3B9 BABC BEC5 E5BF"
/*Cx*/ $"C0D1 ACC3 F1D2 C6AB BB85 A0F2 D5CD F5CF"
/*Dx*/ $"9697 9394 9192 F7D7 D8C0 E0D8 8B9B F8DF"
/*Ex*/ $"E08A 8284 9A8C 9CC1 8D9D CD8E 9EED D3D4"
/*Fx*/ $"F0D9 DAF9 DBFB F6F7 DDFD FAAF A3BF FEA1"
};
where
To Be Improved:

We should probably make the definition of the resource id a macro in csid.h to remove the requirement of changing resgui.h.

3. Add Unicode Conversion Table

You should add Unicode conversion tables for the new CharSetID you add. The conversion tables for many charsets have already been generated in the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. In most cases you simply need to include them in the resource file. If the charset you add does not have a conversion table in those two directories, you need to generate one. Look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect input files in the format of the tables found on the Unicode FTP site (ftp://ftp.unicode.org/Public/MAPPINGS/).

Procedure:

Open the file cmd/macfe/restext/ufrm.r and add lines:
resource 'UFRM' ( CharSetID, "Resource Name", purgeable) {{
#include "xxx.uf"
}};
resource 'UTO ' ( CharSetID, "Resource Name", purgeable) {{
#include "xxx.ut"
}};
where xxx.uf and xxx.ut are the Unicode conversion table files, which can be found in the lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl directories.
4. Add menu item to "View:Encoding" menu

If you want to add additional menu items to the "View:Encoding" menu, you need to do this step.

Procedure:

A. Open the file cmd/macfe/include/resgui.h, locate the word ENCODING_CEILING, and add a line before that line:
#define cmd_XXXX 14nn
where
fontCharSetID, documentCharSetID, cmd_XXX,
where the first field is the font CharSetID, the second field is the document
CharSetID, and the third one is the Command ID you defined in cmd/macfe/include/resgui.h.
C. Open the files cmd/macfe/rsrc/navigator/MenusRat.cnst and cmd/macfe/rsrc/communicator/Menus.cnst using Metrowerks Constructor. Open "> Common View Encoding" (Res ID = 8) in the "Menus" section and add additional menu items to it. The "Command ID" should be the one you just defined in the file cmd/macfe/include/resgui.h.

To Be Improved:

5. Add menu item for "Font" Preference "For the Encoding" menu

If you add an additional font CharSetID, you need to do the following to make a new menu item appear in the font preference so people can associate fonts with that font CharSetID.

Procedure:

A. Open the file cmd/macfe/rsrc/communicator/TextTraits.cnst using Metrowerks Constructor. Add two new TextTraits by copying TextTraits 4001 and 4002. Name your TextTraits after the font CharSetID so we can figure out what the TextTraits are for later on. The content of these two TextTraits is not important since it will be changed dynamically at application initialization time. The TextTraits ids should follow those currently defined for the same purpose (from 4001 to 4024).

B. Open the file cmd/macfe/restext/macfe.r, locate "resource 'Fnec'", and add an additional line

"LanguageGroup", "PropFont", "FixedFont", 12, 10, CharSetID, ScriptCode, TextTraitsID1, TextTraitsID2;

where
In UNIX Code

1. Add Font Handling Code

Procedure:

A. Open the file cmd/xfe/resources, locate the "This table maps X11 font charsets to MIME charsets" section, and add or change lines

*documentFonts.charset*XLDFCharset: MIMECharset

where
*documentFonts.charsetlang*MIMECharset: LanguageGroup
where
[To Be Written]

D. In the same file, locate the "! Unicode Pseudo Font" section,
2. Add Single Byte Conversion Table

If the CharSetID you add is a single byte charset and you want to use the One2One conversion procedure to convert between the font CharSetID and the document CharSetID on Unix, you need to do this step.

Procedure:

A. Open the file lib/libi18n/sbconvtb.c, locate the "#ifdef XP_UNIX" section, and add a new table as follows
PRIVATE unsigned char MIMECharsetFrom_to_MIMECharsetTo[] = {
/*8x*/ '?', '?', ',', 'f', '?', '?', '?', '?', '^', '?', 'S', '<', '?', '?', '?', '?',
/*9x*/ '?', '?', '?', '?', '?', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
/*Ax*/ 0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
/*Bx*/ 0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
/*Cx*/ 0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,
/*Dx*/ 0xD0,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,0xD7,0xD8,0xD9,0xDA,0xDB,0xDC,0xDD,0xDE,0xDF,
/*Ex*/ 0xE0,0xE1,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0xEA,0xEB,0xEC,0xED,0xEE,0xEF,
/*Fx*/ 0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0xFA,0xFB,0xFC,0xFD,0xFE,0xFF
};
PRIVATE char *MIMECharsetFrom_to_MIMECharsetTo_p = (char*)MIMECharsetFrom_to_MIMECharsetTo;
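where MIMECharsetFrom and MIMECharsetTo are the MIME charset names of the source and destination charsets, and the table gives the replacement for each byte from 0x80 to 0xFF.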
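To show what a 128-entry table of this kind is for, here is a minimal C sketch of a one-to-one byte conversion. It is not the One2One routine from lib/libi18n: the function name is hypothetical, and the sketch assumes that bytes below 0x80 pass through unchanged while every other byte is replaced by the table entry at index (byte - 0x80).

```c
#include <stddef.h>

/* Convert a buffer in place using a 128-entry table that holds the
 * replacement for each byte from 0x80 to 0xFF. */
void example_one2one_convert(const unsigned char table[128],
                             unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        /* Assumption: the ASCII half is identical in both charsets. */
        if (buf[i] >= 0x80)
            buf[i] = table[buf[i] - 0x80];
    }
}
```

Because the conversion is a single table lookup per byte, it never changes the length of the text, which is why it only suits pairs of single byte charsets.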
...
else if ((from_csid == CharSetIDFrom) && (to_csid == CharSetIDTo)) {
return &MIMECharsetFrom_to_MIMECharsetTo_p;
}
where
3. Add Unicode Conversion Table

You should add Unicode conversion tables for the new CharSetID you add. The conversion tables for many charsets have already been generated in the directories lib/libi18n/unicode/ufrmtbl and lib/libi18n/unicode/utotbl. In most cases you simply need to include them in the source file. If the charset you add does not have a conversion table in those two directories, you need to generate one. Look at the tools in the lib/libi18n/unicode/tbltool directory. Those tools expect input files in the format of the tables found on the Unicode FTP site (ftp://ftp.unicode.org/Public/MAPPINGS/).

Procedure:

A. Open the file lib/libi18n/ucs2.c, locate the "#ifdef XP_UNIX" section, and add lines
PRIVATE uint16 XXFromTbl[] = {
#include "xx.uf"
};
PRIVATE uint16 XXToTbl[] = {
#include "xx.ut"
};
where
C. In the same file, locate the "LoadFromUCS2Table" function in the "#ifdef XP_UNIX" section and add code to that function to return XXToTbl.

4. Add menu item to "View:Encoding" menu

If you want to add additional menu items to the "View:Encoding" menu, you need to do this step.

Procedure:

A. Open the file cmd/xfe/resources, locate "! View/Encoding Submenu", and add a line to that section:
*languageGroupEncCmdString: LanguageGroup (Charset)
where
{ xfeCmdChangeDocumentEncoding, TOGGLEBUTTON, NULL, "EncodingRadioGroup", False, (void*)CharSetID },
where CharSetID is the document CharSetID you defined in the file include/csid.h.

C. Open the file cmd/xfe/src/HTMLView.cpp, locate "XFE_HTMLView::commandToString", and add code like the following:
else if (IS_CMD(xfeCmdChangeDocumentEncoding))
{
char *res = NULL;
int doc_csid = (int)calldata;
switch (doc_csid)
{
...
case CharSetID:
res = "languageGroupEncCmdString";
break;
...
}
}
where
Appendix A: Files You Need To Change:
Copyright © 1998-2000 The Mozilla Organization.
Last modified May 1, 1998.