FS#2649 - Unicode patch

Attached to Project: Rockbox
Opened by Frank Dischner (phaedrus961) - Wednesday, 24 August 2005, 12:16 GMT
Last edited by Marcoen Hirschberg (marcoen) - Tuesday, 06 December 2005, 15:30 GMT
Task Type Patches
Category Language
Status Closed
Assigned To No-one
Operating System
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No


This patch is by no means complete, but I wanted to
upload it so that others can test/fix/improve etc. It
was originally written by Marcoen Hirschberg and has
been updated by me. There are still many things which
need implementing/fixing, but for me it is quite
usable. The utf8gen script is used to convert the lang
files to utf8.

Things that currently work:
1) Display of unicode filenames, id3 and vorbis tags.
2) Writing of unicode filenames.
3) Font caching.*
4) Basic BiDi support.
5) Selectable codepage conversions (no CJK yet).**

Things that don't work:
1) Player doesn't display unicode strings properly.
2) Doesn't compile for V2 or FM Recorders due to 200k
size limit (hopefully this will change when the code
cleanup is complete).
3) Virtual keyboard does funny things with chars above
4) Text viewer only supports utf8.

There are probably some more things I forgot.

To use this, you will need an ISO-10646 font like those
ISO-8859-1 fonts will also work, but you will only get

*Font caching was taken from the Chinese patch, which
was written by Tat Tang and updated by Tenry Fu and
myself. Although it works fine, at least one font
crashes my Archos and I don't yet know why. Strangely,
it works fine in the sim.

**Jens suggested loadable codepage conversion tables
and I think this is a good idea, or rather necessary if
we want conversions for CJK. I have some ideas how to
do this, but currently no time to do it.

Closed by  Marcoen Hirschberg (marcoen)
Tuesday, 06 December 2005, 15:30 GMT
Reason for closing:  
Comment by Jonas Häggqvist (rasher) - Wednesday, 24 August 2005, 12:33 GMT

Very cool, looking forward to this getting more work (Might
unicode be an additional goal for 2.(5+1) besides iriver
support?). I'll gladly test this, but unassigning myself
from it since I have absolutely no idea about the
implementation/code/rockbox details of it.
Comment by Frank Dischner (phaedrus961) - Thursday, 25 August 2005, 03:17 GMT

Updated patch to work with utf8 id3 tags.
Comment by Frank Dischner (phaedrus961) - Saturday, 15 October 2005, 03:20 GMT

Updated to latest CVS and added CJK codepage conversions.
Codepage tables are now loaded from disk to reduce binary size.
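
As a rough illustration of table-driven conversion (not the patch's actual code), the sketch below maps a single-byte codepage string to UTF-8 through a 128-entry table for bytes 0x80-0xFF. The names `cp_to_utf8` and `put_utf8` and the table layout are assumptions made for this example. How the table is read from disk, and how multi-byte CJK codepages are handled, is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write one BMP code point as UTF-8 (1 to 3 bytes).
 * Returns a pointer just past the last byte written. */
static unsigned char *put_utf8(unsigned short ucs, unsigned char *out)
{
    if (ucs < 0x80) {
        *out++ = (unsigned char)ucs;
    } else if (ucs < 0x800) {
        *out++ = (unsigned char)(0xC0 | (ucs >> 6));
        *out++ = (unsigned char)(0x80 | (ucs & 0x3F));
    } else {
        *out++ = (unsigned char)(0xE0 | (ucs >> 12));
        *out++ = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        *out++ = (unsigned char)(0x80 | (ucs & 0x3F));
    }
    return out;
}

/* Hypothetical converter: `table` holds the code points for bytes
 * 0x80..0xFF of the selected codepage; bytes below 0x80 are taken as
 * ASCII. `out` must have room for 3 bytes per input byte plus a NUL.
 * Returns the number of UTF-8 bytes written, not counting the NUL. */
static size_t cp_to_utf8(const unsigned char *in,
                         const unsigned short *table,
                         unsigned char *out)
{
    unsigned char *start = out;

    assert(in && table && out);
    for (; *in; in++)
        out = put_utf8(*in < 0x80 ? *in : table[*in - 0x80], out);
    *out = '\0';
    return (size_t)(out - start);
}
```

With a table whose entry for byte 0xE1 is 0x03B1 (Greek small alpha, as in ISO-8859-7), the input bytes 41 E1 come out as 41 CE B1.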
Comment by Frank Dischner (phaedrus961) - Friday, 21 October 2005, 10:29 GMT

Current patch is now close to completion, I hope. Most
everything is now working. The Player and the virtual
keyboard now handle utf8 strings correctly. I've added a
feature to save a list of loaded glyphs at shutdown and
reload them at boot. I've also added a perl script to
convert any bdf font to iso10646 encoding using the mapping
files from
Comment by Frank Dischner (phaedrus961) - Monday, 24 October 2005, 08:48 GMT

Changed code so that the glyph list is saved in LRU order.
Comment by Frank Dischner (phaedrus961) - Monday, 24 October 2005, 20:13 GMT

Finally fixed the crash on Archos :) The patch description
is now way out of date. Only thing remaining to be done is
conversions for the text viewer. Everything else is working
flawlessly for me.
Comment by Frank Dischner (phaedrus961) - Wednesday, 26 October 2005, 07:37 GMT

Code cleanup and optimization.
Comment by Frank Dischner (phaedrus961) - Saturday, 29 October 2005, 22:56 GMT

A few more optimizations. Also (somewhat) fixed the half
screen problem in the text viewer.
Comment by Frank Dischner (phaedrus961) - Wednesday, 02 November 2005, 07:19 GMT

Fixed a bug with fonts that have very large glyphs and began
adapting the text viewer to utf8.
Comment by Frank Dischner (phaedrus961) - Tuesday, 08 November 2005, 06:08 GMT

Updated to latest CVS.
Comment by Frank Dischner (phaedrus961) - Friday, 18 November 2005, 07:50 GMT

Added Arabic joining and updated to latest CVS.
Comment by Anonymous Submitter - Saturday, 19 November 2005, 02:49 GMT

Why are you using this formula to count the number of UTF-8
bytes for a UCS character?

-------- firmware/common/unicode.c, Line 93 --------
if (ucs > 0x7F)
while (ucs >> (6*tail + 2))
-------- end of code --------

It makes U-100 - U-7FF get stored in 3 bytes instead of 2
and U-4000 - U-FFFF in 4 bytes instead of 3.

Maybe this will be better? :)

-------- start of code --------
while (ucs >> (5*tail + 6))
-------- end of code --------
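
The reasoning behind the 5*tail + 6 shift: a sequence with `tail` continuation bytes carries 6 bits per continuation byte plus (6 - tail) bits in the lead byte, which is 5*tail + 6 bits in total. Below is a minimal, self-contained sketch of an encoder built on that count. The function names `utf8_tail_bytes` and `utf8_encode` are made up for this example and are not taken from firmware/common/unicode.c, and the `tail < 5` guard is an addition to keep the shift in range.

```c
#include <assert.h>

/* Number of continuation bytes needed for code point `ucs`.
 * A sequence with `tail` continuation bytes holds 5*tail + 6 payload
 * bits, so keep adding bytes while bits remain above that position.
 * The tail < 5 guard caps sequences at 6 bytes (31 bits). */
static int utf8_tail_bytes(unsigned long ucs)
{
    int tail = 0;

    if (ucs > 0x7F)
        while (tail < 5 && (ucs >> (5 * tail + 6)))
            tail++;
    return tail;
}

/* Encode `ucs` at `out`; returns a pointer just past the last byte. */
static unsigned char *utf8_encode(unsigned long ucs, unsigned char *out)
{
    int tail = utf8_tail_bytes(ucs);

    assert(out && ucs <= 0x7FFFFFFFUL);
    if (tail == 0) {
        *out++ = (unsigned char)ucs;            /* plain ASCII */
    } else {
        /* Lead byte: tail+1 one-bits, a zero bit, then the top payload bits. */
        *out++ = (unsigned char)(((0xFF << (7 - tail)) & 0xFF)
                                 | (ucs >> (6 * tail)));
        /* Continuation bytes: 10xxxxxx, most significant group first. */
        while (tail--)
            *out++ = (unsigned char)(0x80 | ((ucs >> (6 * tail)) & 0x3F));
    }
    return out;
}
```

With this count, 0x7FF still fits in 2 bytes and 0xFFFF in 3; for example, 0x20AC encodes as E2 82 AC.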
Comment by takka (tfact) - Monday, 21 November 2005, 02:07 GMT

Updated to latest CVS.
Comment by takka (tfact) - Monday, 21 November 2005, 05:20 GMT

Updated to latest CVS
and merged remote keyboard at