Rockbox mail archive

Subject: Re: [ rockbox-Patches-920256 ] Traditional/Simplified Chinese patch

From: Tat Tang <>
Date: Tue, 23 Mar 2004 15:26:21 -0800 (PST)

> Is support for both big5, gb and unicode necessary?

Well, GB2312 is used for Simplified Chinese
characters, as written in China. Big5 is used for
Traditional Chinese characters as written in Taiwan
and Hong Kong.

These are separate and distinct character sets. For
example, the character for "electricity" is written
differently between the two scripts. The simplified
version has no mapping in Big5 and the traditional
version has no mapping in GB2312.

Even where characters appear in both scripts, they map
to different code points, for example the character
for "you" maps to 0xC4E3 in GB2312 and 0xA741 in Big5.
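Both examples above can be checked with any codec library. Here is a minimal Python sketch, assuming CPython's built-in "gb2312" and "big5" codecs match the tables under discussion. The helper name encode_hex is made up for illustration.

```python
# Sketch only: relies on CPython's built-in "gb2312" and "big5" codecs,
# which are assumed to match the tables discussed here.

def encode_hex(ch, codec):
    """Return the code of ch in codec as upper-case hex, or None if unmapped."""
    try:
        return ch.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return None

YOU = "\u4f60"               # "you", present in both character sets
ELEC_SIMPLIFIED = "\u7535"   # "electricity", simplified form
ELEC_TRADITIONAL = "\u96fb"  # "electricity", traditional form

for name, ch in [("you", YOU),
                 ("electricity (simplified)", ELEC_SIMPLIFIED),
                 ("electricity (traditional)", ELEC_TRADITIONAL)]:
    print(name, "GB2312:", encode_hex(ch, "gb2312"),
          "Big5:", encode_hex(ch, "big5"))
```

A result of None means the character has no mapping in that character set.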

Yes, both Big5 and GB2312 need to be supported.

> Those lookup tables use 15-30 KB code space. Are all
> three code tables regularly used for file names and
> ID3 tags?

I'm finding that filenames are getting stored as
Unicode and id3 tags (courtesy of freedb) are coming
through as Big5 or GB2312.

From the documentation that I have, GB2312 defines
3755 frequently used Hanzi and Big5 defines 5401
frequently used Hanzi. So there is scope to reduce the
size of these tables. However, some users are keen to
read lyrics/books while listening to music. It may be
possible to provide both full and reduced mappings, so
that table size can be traded off against mp3 buffering.
Is there a rule of thumb for calculating minimum buffer
space?
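As a rough illustration of the scope for reduction, here is a back-of-envelope sketch in Python. It assumes a flat table holding one 2-byte Unicode value and one 2-byte Big5 or GB2312 value per character. That 4-byte entry layout is an assumption for illustration, not the layout used by the patch.

```python
# Back-of-envelope estimate. ENTRY_BYTES is an assumed layout
# (2-byte Unicode value + 2-byte MBCS value), not the patch's format.
ENTRY_BYTES = 4

FREQUENT_HANZI = {"GB2312": 3755, "Big5": 5401}  # counts quoted above

def table_bytes(entries, entry_bytes=ENTRY_BYTES):
    """Size in bytes of a flat lookup table with the given entry count."""
    return entries * entry_bytes

for charset, count in FREQUENT_HANZI.items():
    size = table_bytes(count)
    print(f"{charset}: {count} entries, {size} bytes ({size / 1024:.1f} KB)")
```

A different entry layout (for example, an index-ordered table that stores only the target value) would change these numbers, so treat them as an order-of-magnitude guide.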

> If we need all three, can these tables be moved
> outside the main code somehow? Maybe loaded at boot?

The tables come in pairs: there is a Unicode->Big5
lookup and a Unicode->GB2312 lookup.

It's only necessary to maintain a single lookup. A
particular "document" will be Big5 or GB2312 in the
same way that a document will be Greek or Russian.

Loading the required table at boot makes a lot of
sense. It also allows the user to switch on the fly
between Traditional, Simplified and possibly other
MBCS languages.

This would be combined with your previous suggestion
of having a switch between single and multi-byte mode.

So, the proposed boot sequence is something like this:

1. Boot into single byte mode.
2. If configured for single byte mode, then finished.
3. Otherwise, malloc a suitable buffer and load the
required conversion table.
4. Switch into multi-byte mode.
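
The four steps above can be sketched as follows. This is Python for readability only; the firmware itself is C and would allocate a real buffer. All names (boot, load_table, the mode constants) are hypothetical.

```python
# Sketch of the proposed boot sequence. All names here are hypothetical;
# the real firmware is C and would allocate a buffer instead of using bytes.

SINGLE_BYTE = "single-byte"
MULTI_BYTE = "multi-byte"

def boot(configured_mode, charset, load_table):
    """Run the four boot steps and return (mode, table).

    load_table is a caller-supplied function that returns the
    conversion table for a charset name, e.g. read from disk.
    """
    mode, table = SINGLE_BYTE, None       # 1. boot into single byte mode
    if configured_mode == SINGLE_BYTE:    # 2. nothing more to do
        return mode, table
    table = load_table(charset)           # 3. allocate and load the table
    mode = MULTI_BYTE                     # 4. switch into multi-byte mode
    return mode, table

# Example with a stand-in loader that returns an empty 16-byte table.
fake_loader = lambda charset: bytes(16)
print(boot(SINGLE_BYTE, None, fake_loader))
print(boot(MULTI_BYTE, "big5", fake_loader)[0])
```

Note that the loader is never called on the single-byte path, which is the point of the design: no table memory is used unless multi-byte mode is configured.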

This level of switching between single and multi-byte
mode means that single-byte mode users don't incur the
extra memory overhead. It will require rebooting the

When in multi-byte mode, there will be an ability to
switch into Western languages (i.e. soft switch into
single byte mode).

There would also be the ability to switch between
multi-byte charsets. The initial thought was to
provide a browse charsets menu, however it seems that
a change in charset would always result in a font

This seems clumsy, and it would be nicer if the user
selects a font and the required charset is
automatically loaded. This requires changing the font
format to include a charset. Does this sound like a
reasonable change?
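
To make the idea concrete, here is a minimal sketch of a font that names its charset, so that selecting the font loads the matching table. It is Python for readability only, and the charset field, the Font and Display classes, and select_font are all hypothetical names, not part of the existing font format.

```python
# Sketch: a font that names its charset, so selecting the font
# loads the matching table. Field and function names are hypothetical.
from dataclasses import dataclass

@dataclass
class Font:
    name: str
    charset: str   # proposed new field, e.g. "big5" or "gb2312"

class Display:
    def __init__(self, load_table):
        self.load_table = load_table   # caller-supplied table loader
        self.font = None
        self.charset = None
        self.table = None

    def select_font(self, font):
        """Switch font, reloading the table only if the charset changes."""
        if font.charset != self.charset:
            self.table = self.load_table(font.charset)
            self.charset = font.charset
        self.font = font
```

With this shape, switching between two fonts of the same charset costs nothing extra, and the user never has to pick a charset separately.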

> gives a total of nine recorder builds!
> Those files cannot be loaded by the Archos firmware.
> Did you test the code with Rockbox burned in flash?

I hear you. And I understand the flash rom has a size
limit. I guess loading the conversion table at boot
solves everything.

Thanks for the feedback. And let me know how you feel
about the proposed changes.


Received on 2004-03-24
