Rockbox mail archive

Subject: XML for language files

From: Jonas Häggqvist <rasher_at_rasher.dk>
Date: Thu, 20 Sep 2007 02:46:25 +0200

After looking at genlang (and running away in terror), handling the Danish
translation for a while and committing quite a few language patches, I
started wondering about the source language file format that Rockbox uses
(that is, not the binary format used on the player).

The current format was introduced/discussed [1] in June 2006, when Daniel
Stenberg proposed it in relation to the langv2 rework, and a small-scale
flamewar about using an XML format erupted. In the end, I think
Daniel got tired of arguing without getting anywhere and implemented the
current format. There are no major problems with it, but I still think
moving it gently to XML would have some benefits:

- genlang should become a lot simpler to modify since all the parsing would
  be done externally.
- Similarly, the actual code should also be simpler, since it just consists
  of walking the tree and comparing node/attribute values.
- The parser would be more robust to subtle syntax differences between
  english.lang and the translations. It'd either work as expected, or break
  loudly rather than producing unexpected results, possibly silently. At
  least it'd be harder to write a valid XML file that still broke genlang.
- It'd be somewhat easier to produce one-off scripts in whatever language to
  edit language files since you don't have to write a parser first.
- It should be fairly easy to write out a schema file and create an SVN
  pre-commit hook that validates it and rejects the commit if it's broken.
- My own pet project, the online translator [2], would also benefit for many
  of the above reasons :)
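
To make the "walk the tree" and "break loudly" points concrete, here is a
minimal sketch in Python using only the standard library. The element and
attribute names (language, phrase, desc, source, dest, string, target) are
assumptions made up for illustration; they only loosely mirror the current
phrase blocks and are not the syntax of the proposed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; every element and attribute name is an assumption.
SAMPLE = """\
<language name="dansk">
  <phrase id="LANG_SET_BOOL_YES" user="core">
    <desc>bool true representation</desc>
    <source><string target="*">Yes</string></source>
    <dest><string target="*">Ja</string></dest>
  </phrase>
  <phrase id="LANG_SET_BOOL_NO" user="core">
    <desc>bool false representation</desc>
    <source><string target="*">No</string></source>
    <dest><string target="*">Nej</string></dest>
  </phrase>
</language>
"""


def load_phrases(xml_text):
    """Return {phrase id: {target: translated string}} by walking the tree."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed input
    phrases = {}
    for phrase in root.iter("phrase"):
        # Collect the translated string for each target of this phrase.
        dest = {s.get("target"): (s.text or "")
                for s in phrase.findall("dest/string")}
        phrases[phrase.get("id")] = dest
    return phrases


def is_well_formed(xml_text):
    """True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


phrases = load_phrases(SAMPLE)
```

All of the parsing happens inside ET.fromstring; the rest is dictionary
building. A file with a missing closing tag is rejected by the parser
outright instead of being half-read.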

I've already written a langv2toxml script (in Perl, using XML::LibXML, which
is rather nice) that produces output that looks reasonable to me [3]. The
actual syntax is open for debate of course (and please do comment on it if
you think it could be better), but I've tried modelling it very closely on
the current syntax, which does work pretty well. Of course, there are
drawbacks as well:

- The file is somewhat harder to read and modify. I won't dispute that this
  is true, but I don't think it's really a huge difference. This is probably
  the most important problem.
- The files are about 30% larger. I don't think this is really a problem.
- Other things that I want you to tell me, because I can't think of more.

I admit I'm probably biased from having already written one parser (used on
my website) and wondering why I couldn't just use an XML parser. It wasn't
*that* hard, but I know the parser is making some assumptions about the
file that might not hold true after a translator has had his dirty hands on
it. Maybe I just need to take a step back and breathe deeply, but if no one
objects too much, I'll create a new genlang to work with whatever XML schema
is worked out (and also have a go at creating an XML schema file to validate
against).
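
As a rough idea of what a pre-commit check could do, here is a small Python
sketch. It is not schema validation: it is a hand-written stand-in that
rejects files that are not well-formed and flags phrase ids that do not
exist in the English file. The phrase element and id attribute are
hypothetical names, and check_translation is an invented helper, not part
of genlang.

```python
import xml.etree.ElementTree as ET


def phrase_ids(xml_text):
    """Collect the id of every <phrase> element (hypothetical layout)."""
    ids = set()
    for phrase in ET.fromstring(xml_text).iter("phrase"):
        pid = phrase.get("id")
        if pid is None:
            raise ValueError("phrase without an id attribute")
        ids.add(pid)
    return ids


def check_translation(english_xml, translation_xml):
    """Return a list of problems; an empty list means the file passes."""
    try:
        english = phrase_ids(english_xml)
        translated = phrase_ids(translation_xml)
    except (ET.ParseError, ValueError) as err:
        # Malformed XML or a phrase without an id: reject the whole file.
        return [f"rejected: {err}"]
    # Ids that the English file does not define are most likely typos.
    return [f"unknown phrase id: {pid}"
            for pid in sorted(translated - english)]


# Tiny made-up inputs, just to show the three possible outcomes.
ENGLISH = '<language><phrase id="LANG_A"/><phrase id="LANG_B"/></language>'
GOOD = '<language><phrase id="LANG_A"/></language>'
TYPO = '<language><phrase id="LANG_C"/></language>'
BROKEN = '<language><phrase id="LANG_A"></language>'

results = [check_translation(ENGLISH, text) for text in (GOOD, TYPO, BROKEN)]
```

A hook would run something like this on each changed language file and
refuse the commit whenever the returned list is non-empty. A translation
that merely lacks some phrases passes here, on the assumption that
incomplete translations are acceptable.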

Remember, I'm not talking about the format used on the player, but the
source format - I'm not *that* insane.

[1] http://www.rockbox.org/mail/archive/rockbox-archive-2005-06/0363.shtml
[2] http://rasher.dk/rockbox/translate/
[3] http://rasher.dk/rockbox/wallisertitsch.xml

-- 
Jonas Häggqvist
rasher(at)rasher(dot)dk
Received on 2007-09-20
