
Rockbox Unicode Guide

Character Encoding

All text we type has a "character encoding" (sometimes referred to as a code page, or "page code" in this guide). For English and other Western European languages, text is usually encoded in the ISO-8859-1 page code. Here are some examples of other page codes:

  • Simplified Chinese: GB2312
  • Traditional Chinese: Big5
  • Korean: KSX1001
  • Japanese: Shift_JIS
  • Hebrew: ISO-8859-8
  • Arabic: CP1256 (Windows-1256)
  • Cyrillic: CP1251 (Windows-1251)
  • Greek: ISO-8859-7
  • Unicode: UTF-8

Note: A language does not necessarily have only one page code. Also, Rockbox may support one of a language's page codes but not the others.
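To make the idea of a page code concrete, here is a minimal Python sketch that encodes one character under a few of the page codes listed above and under UTF-8. The codec names are Python's own spellings (an assumption of this example), not the names used in Rockbox's settings, and `encode_both` is a helper invented for illustration.

```python
# One sample character per legacy page code (Python codec names).
samples = {
    "cp1251": "Ж",      # Cyrillic
    "iso8859_7": "Ω",   # Greek
    "cp1256": "ب",      # Arabic
    "gb2312": "中",     # Simplified Chinese
    "shift_jis": "あ",  # Japanese
}


def encode_both(char, legacy_codec):
    """Return the bytes for char in a legacy page code and in UTF-8."""
    return char.encode(legacy_codec), char.encode("utf-8")


for codec, char in samples.items():
    legacy, utf8 = encode_both(char, codec)
    # The same character is stored as different bytes in each encoding.
    print(f"{char}  {codec:<10} {legacy.hex()}  utf-8 {utf8.hex()}")
```

The bytes alone do not say which page code produced them, which is why a player has to be told (or has to guess) the page code before it can show the text.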

What is Unicode?

Unicode, as defined by Wikipedia, is "an industry standard whose goal is to provide the means by which text of all forms and languages can be encoded for use by computers." Put simply, Unicode covers many languages at once, which eliminates the need for a different page code for each language. More about using Unicode is explained below.

ID3 Situations

I have non-Unicode ID3s in "insert 1 foreign language"

Tags are usually encoded in the default page code of your PC's OS. This is the case for most tags, except those created recently by tagging programs such as Mp3tag, which encode tags in Unicode by default. When a tag is encoded in a page code rather than in Unicode, you can only display it on Rockbox by selecting the appropriate "default page code" and "font".

Note: In Windows XP, the default page code setting can be found at "Control Panel/Regional and Language Options/Advanced tab/Language for non-Unicode programs".

To explain further, say you have an ID3 tag with Chinese info, encoded in the OS's default page code (in this case Chinese). This tag would show as garbage on Rockbox unless you choose Chinese as your default page code and Unifont as your font. With those settings, both Chinese songs and English songs (ISO-8859-1) will display properly.
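The effect described above can be reproduced in a few lines of Python. This is only a sketch of the decoding step, not Rockbox's actual code; `display` is a hypothetical helper and the codec names are Python's.

```python
def display(tag_bytes, default_page_code):
    """Decode raw tag bytes the way a player with one default page code would."""
    return tag_bytes.decode(default_page_code, errors="replace")


chinese_tag = "中文".encode("gb2312")    # tag written on a Chinese-locale PC
english_tag = "Hello".encode("latin-1")  # plain English text

print(display(chinese_tag, "latin-1"))  # wrong page code: garbage
print(display(chinese_tag, "gb2312"))   # right page code: readable
print(display(english_tag, "gb2312"))   # plain English still reads correctly
```

English text survives because its letters fall in the ASCII range, which GB2312 (as Python's `gb2312` codec implements it) shares with ISO-8859-1. Accented ISO-8859-1 characters such as "é" lie outside that range and would not be expected to survive the switch.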

Drawbacks of using Unifont

The problem with Unifont is that it might not suit the WPS of your choice. The 6+12x13 font, added on Feb. 12, 2006, supports Japanese and Korean characters and tends to be compatible with a larger number of WPSs. It is also smaller than Unifont, so more characters fit on the screen. You may wish to use 6+12x13 instead of Unifont if you only need support for Japanese and Korean characters.

I have non-Unicode ID3s in "insert 2 or more foreign language(s)"

If your ID3 tags use more than one foreign language, the above solution won't be perfect for you. Say that in addition to Chinese ID3s, you also have Arabic, Greek, Hebrew, Korean or Japanese ID3s (any combination of two or more of these languages raises the same problem). For my example, I'll choose Arabic as the second foreign language. If you're using Windows, having ID3s in two of these languages means you need to switch your default page code depending on which language you want to display. To display an Arabic-encoded ID3, I'd have to choose Arabic as my default page code; to display a Chinese-encoded ID3, I'd have to choose Chinese. With Arabic set as the default page code, Arabic songs display fine, but Chinese songs show as garbage. The same is true for Rockbox: if your default page code on Rockbox is Chinese, Chinese ID3s will show fine, but Arabic ID3s will show as garbage.
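The Arabic and Chinese example can be checked with a small Python sketch. The sample titles, the `readable_with` helper and the Python codec names are all assumptions made for illustration.

```python
# Sample titles and the legacy page code each one was saved in.
originals = {"arabic": "موسيقى", "chinese": "音乐"}
native = {"arabic": "cp1256", "chinese": "gb2312"}

# The raw bytes as they would sit in the tags.
raw = {name: text.encode(native[name]) for name, text in originals.items()}


def readable_with(default_page_code):
    """Return the names of the tags that decode back to their original text."""
    ok = set()
    for name, data in raw.items():
        shown = data.decode(default_page_code, errors="replace")
        if shown == originals[name]:
            ok.add(name)
    return ok


for page_code in ("cp1256", "gb2312"):
    print(page_code, "->", sorted(readable_with(page_code)))
```

Whichever single default page code is chosen, only the tags written in that page code come back intact.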

There is a solution: Unicode. Rather than encoding each tag in its own language's page code, such as an Arabic tag in the Arabic page code or a Chinese tag in the Chinese page code, we encode ALL tags in Unicode regardless of language. This way, you do not need to tell Rockbox (or Windows, or any other OS) which page code to use. Simply choose the font "Unifont" in Rockbox, and all the tags will display properly. You can then play an Arabic song, followed by a Chinese one, then Greek, then Korean, and so on, without even changing Rockbox's default page code.
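As a rough illustration of why Unicode removes the need to switch page codes, the following Python sketch stores titles in five languages as UTF-8 and reads them all back with one decoder. The playlist contents and the `can_encode_all` helper are made up for the example.

```python
playlist = ["موسيقى", "音乐", "Μουσική", "음악", "音楽"]

# Every title is stored with the same encoding...
utf8_tags = [title.encode("utf-8") for title in playlist]

# ...so one decoder reads them all, with no per-language setting.
decoded = [data.decode("utf-8") for data in utf8_tags]


def can_encode_all(titles, codec):
    """Return True if every title can be represented in the given codec."""
    try:
        for title in titles:
            title.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("cp1256", "gb2312", "iso8859_7", "euc_kr", "shift_jis", "utf-8"):
    print(f"{codec:<10} holds the whole playlist: {can_encode_all(playlist, codec)}")
```

None of the legacy page codes in the loop can represent the whole playlist, while UTF-8 can.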


Unicode.JPG

Drawbacks of using Unicode

The drawback of using Unicode-encoded ID3s is that not all PC MP3 players and DAPs support Unicode.

Solution for people who don't like Unifont

The problem with displaying tags in foreign languages is that you have to use Unifont, which may not appeal to everyone. One solution is to use 6+12x13, which, as mentioned above, works well if you only need Japanese and Korean characters. A fairly large number of WPSs are compatible with the 6+12x13 font. Alternatively, you can create two different .cfg files on Rockbox. Make one for English music (or anything using ISO-8859-1) with your favorite font (Snap, Chicago, etc.) and your favorite WPS. Then make a second .cfg for "international" or "world" music, using Unifont and an appropriate WPS of your choice. See also the list of Rockbox Unicode fonts.

Tip: stevenyu from Mistic River brought to my attention an H100 WPS optimized for use with Unifont. Go to the WpsGallery and scroll down to FrejBon's Uniskin. Hopefully we'll have more WPSs optimized for Unifont in the future.


uniskin_screenshot1.png uniskin_screenshot2.png

Notice that the current song in the screenshots shows in Japanese, and the next song that shows at the bottom is in Korean.

Solution for playing MP3s with Unicode tags on PC

It seems that on Windows, RealPlayer does not support Unicode. However, foobar2000, Windows Media Player, iTunes and Winamp all do, and Explorer can display Unicode ID3 tags as well. On Linux, I have read that XMMS doesn't handle Unicode very well (I haven't tried it myself), while Rhythmbox and Amarok reportedly display Unicode fine.

Here are a bunch of useful links for you:

MP3 players supporting Unicode

MP3 tagging software supporting Unicode

  • Mp3tag - Probably the best at the moment. Allows you to choose between Unicode and ISO-8859-1. To convert existing tags to Unicode, make sure tag writing is set to Unicode, select all files, and save the tags again.
  • Unitagger - A simple Unicode tagging program.
  • ID3-TagIT 3 - My personal favorite ID3 tagger. Might have fewer options than Mp3tag when it comes to Unicode.
  • ID3iconv - A Java command-line tool. May be useful to some.
  • foobar2000 - This player also allows tag editing, and can convert ISO-8859-1 coded tags to Unicode.
  • Music Tag Editor 1.2 and Mp3 Tag Assistant Professional 2.6 - Two more programs that you may or may not like. Try them and see.
  • EasyTag (≥ v1.99.9) - Supports Unicode tagging using id3lib. Linux/GTK+

Note: It won't hurt to use multiple tagging programs if none of them has all the features you want. Most likely I'll use Mp3tag for converting to Unicode, and continue to use ID3-TagIT 3 for tagging.
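Converting a tag to Unicode, as the tools above do, comes down to decoding the bytes with the page code they were written in and re-encoding the text as Unicode. The Python sketch below shows only that step on bare bytes. It does not read or write real MP3 files, and `to_unicode_tag` is a hypothetical helper, not the API of any program listed here.

```python
def to_unicode_tag(raw_tag, source_page_code):
    """Re-encode tag bytes from a legacy page code to UTF-8.

    Decoding is strict, so bytes that are invalid in source_page_code raise
    UnicodeDecodeError, a hint that the wrong page code was chosen.
    """
    text = raw_tag.decode(source_page_code)
    return text.encode("utf-8")


legacy = "Привет".encode("cp1251")       # tag saved on a Cyrillic-locale PC
converted = to_unicode_tag(legacy, "cp1251")
print(converted.decode("utf-8"))
```

The source page code still has to be known for the conversion itself. Many single-byte page codes accept almost any byte sequence, so a wrong choice may produce garbage silently rather than raise an error.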

You can help! If you know of other MP3 players or tagging tools (for Windows or Linux) that support Unicode, let us know!

Using Unicode in WPS

Besides using Unicode for ID3 tags, you can also use it to customize your WPS. For example, you can have the words "Battery", "Next Song" or "Playing Now" written in Korean, Greek, Russian, and so on. This makes it easier to localize the way your player looks. All you have to do is make sure your .wps file is saved in a Unicode encoding (UTF-8 or UTF-16) and that you use Unifont in Rockbox.
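If you generate or check .wps files with a script rather than a text editor, the encoding can be set explicitly. The Python sketch below writes a file as UTF-8 and verifies that it decodes as UTF-8. The file name `example.wps` and the label lines are placeholders, not real WPS syntax, and both helpers are invented for illustration.

```python
import os
import tempfile


def save_wps(path, lines):
    """Write the given lines to path as UTF-8 text with LF line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


def is_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


with tempfile.TemporaryDirectory() as folder:
    wps_path = os.path.join(folder, "example.wps")  # placeholder file name
    save_wps(wps_path, ["배터리", "Сейчас играет"])   # placeholder label text
    print("valid UTF-8:", is_utf8(wps_path))
```

A file saved in a legacy page code will usually fail the `is_utf8` check as soon as it contains non-English characters, which makes this a quick way to spot a WPS that was saved with the wrong encoding.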

Text editors supporting Unicode

  • Notepad - Maybe the simplest tool to use. Notepad in Windows 2000 and Windows XP supports encoding in UTF-8. After you click "Save As", select "UTF-8" as the encoding at the bottom of the Save As window. Note: Notepad in Windows 95, 98 and ME does not support Unicode.
  • Unicode and Multilingual Editors and Word Processors for Windows - A good site with links to various Editors supporting Unicode.
  • Notepad2 - A simple Notepad alternative with support for syntax highlighting.
  • Programmer's Notepad - A more advanced text editor.
  • Notepad++ - Another advanced text editor.
  • SC UniPad - A Unicode text editor featuring an on-screen soft keyboard.
  • Vim & gVim - Another advanced text editor.
  • PSPad - Yet another text editor that does a whole lot of things.
  • jEdit - Highly configurable and customizable free Java text editor.
  • SciTE - Small-footprint, cross-platform, syntax-highlighting text editor based on Scintilla.

r28 - 19 Aug 2010 - 09:22:17 - SeanInglis

Revision r27 - 07 Jun 2010 - 19:22 - BjornStenberg
Copyright by the contributing authors.