Rockbox mail archive
Subject: XML for language files
From: Jonas Häggqvist <rasher_at_rasher.dk>
Date: Thu, 20 Sep 2007 02:46:25 +0200
After looking at genlang (and running away in terror), handling the Danish
translation for a while and committing quite a few language patches, I
started wondering about the source language file format that Rockbox uses
(that is, not the binary format used on the player).
The current format dates from June 2006, when Daniel Stenberg proposed it
as part of the langv2 rework and a small-scale flamewar about using an XML
format erupted. In the end, I think Daniel got tired of arguing without
getting anywhere and implemented the current format. There are no major
problems with it, but I still think moving it gently to XML would have some
benefits:
- genlang should become a lot simpler to modify since all the parsing would
be done externally.
- Similarly, the actual code should also be simpler, since it just consists
of walking the tree and comparing node/attribute values.
- The parser would be more robust to subtle syntax differences between
english.lang and the translations. It'd either work as expected, or break
loudly rather than producing unexpected results, possibly silently. At
least it'd be harder to write a valid XML file that still broke genlang.
- It'd be somewhat easier to produce one-off scripts in whatever language to
edit language files since you don't have to write a parser first.
- It should be fairly easy to write out a schema file and create an SVN
pre-commit hook that validates it and rejects the commit if it's broken.
- My own pet-project, the online translator, would also benefit for many
of the above reasons :)
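
To make the "walk the tree" and "break loudly" points concrete, here is a
minimal sketch in Python using only the standard library. The XML layout
(the phrases/phrase/string element names and the id/target attributes) is
made up for illustration and is not a proposed syntax; the phrase ID and
strings are just sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout, modelled loosely on the current phrase blocks.
SAMPLE = """\
<phrases>
  <phrase id="LANG_SET_BOOL_YES">
    <desc>bool true representation</desc>
    <source><string target="*">Yes</string></source>
    <dest><string target="*">Ja</string></dest>
    <voice><string target="*">Ja</string></voice>
  </phrase>
</phrases>
"""

def load_phrases(xml_text):
    """Return {phrase id: {section: {target: text}}} by walking the tree."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed input
    phrases = {}
    for phrase in root.findall("phrase"):
        sections = {}
        for section in ("source", "dest", "voice"):
            node = phrase.find(section)
            strings = node.findall("string") if node is not None else []
            sections[section] = {s.get("target"): (s.text or "") for s in strings}
        phrases[phrase.get("id")] = sections
    return phrases

phrases = load_phrases(SAMPLE)
print(phrases["LANG_SET_BOOL_YES"]["dest"]["*"])

# A file with a missing closing tag is rejected outright instead of being
# half-read.
try:
    load_phrases(SAMPLE.replace("</dest>", ""))
except ET.ParseError as err:
    print("rejected:", err)
```

All the parsing happens inside the library call; the rest is just node and
attribute lookups, which is the simplification argued for above.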
I've already written a langv2toxml script (in Perl, using XML::LibXML, which
is rather nice) that produces output that looks reasonable to me. The
actual syntax is open for debate of course (and please do comment on it if
you think it could be better), but I've tried modelling it very closely on
the current syntax, which does work pretty well. Of course, there are
drawbacks as well:
- The file is somewhat harder to read and modify. I won't dispute that this
is true, but I don't think the difference is really huge. This is probably
the most important problem.
- The files are about 30% larger. I don't think this is really a problem.
- Other things that I want you to tell me, because I can't think of more.
I admit I'm probably biased from having already written one parser (used on
my website) and wondering why I couldn't just use an XML parser. It wasn't
*that* hard, but I know the parser makes some assumptions about the file
that might not hold true after a translator has had his dirty hands on it.
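
As an illustration of the kind of assumptions meant here, the following is a
rough Python sketch of a line-based reader for the current format that
emits XML. It is not the langv2toxml script and not the parser used on the
website; the sample phrase block, the output element names and the
lang_to_xml helper are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed shape of one phrase in the current format (illustrative only).
LANG_TEXT = '''\
<phrase>
  id: LANG_SET_BOOL_YES
  desc: bool true representation
  user:
  <source>
    *: "Yes"
  </source>
  <dest>
    *: "Ja"
  </dest>
  <voice>
    *: "Ja"
  </voice>
</phrase>
'''

def lang_to_xml(text):
    """Convert phrase blocks to an XML tree (hypothetical element names)."""
    root = ET.Element("phrases")
    phrase = section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments carry no data
        if line == "<phrase>":
            phrase = ET.SubElement(root, "phrase")
        elif line == "</phrase>":
            phrase = None
        elif line in ("<source>", "<dest>", "<voice>"):
            # assumption: section tags sit alone on a line, inside a phrase
            section = ET.SubElement(phrase, line[1:-1])
        elif line in ("</source>", "</dest>", "</voice>"):
            section = None
        else:
            # assumption: every other line is "key: value" on a single line
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip().strip('"')
            if section is not None:
                ET.SubElement(section, "string", target=key).text = value
            elif key == "id":
                phrase.set("id", value)
            else:
                ET.SubElement(phrase, key).text = value
    return root

root = lang_to_xml(LANG_TEXT)
print(ET.tostring(root, encoding="unicode"))
```

Each commented assumption is a place where a slightly unusual translation
file could be misread without any error being raised, which is exactly the
worry with a hand-written parser.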
Maybe I just need to take a step back and breathe deeply, but if no one
objects too much, I'll create a new genlang to work with whatever XML schema
is worked out (and also have a go at creating an XML schema file to validate
the language files).
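
Until a real schema exists, the validation idea can be approximated with a
plain structural check. The sketch below is Python with the standard
library only, so it does not do schema validation proper; the element
names, the required sections and the check_language_file helper are
assumptions for the example. Wiring it into an SVN pre-commit hook is left
out.

```python
import xml.etree.ElementTree as ET

REQUIRED_SECTIONS = ("source", "dest", "voice")  # assumed element names

def check_language_file(xml_text):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    seen = set()
    for phrase in root.iter("phrase"):
        pid = phrase.get("id")
        if not pid:
            problems.append("phrase without an id attribute")
            continue
        if pid in seen:
            problems.append(f"{pid}: duplicate id")
        seen.add(pid)
        for name in REQUIRED_SECTIONS:
            if phrase.find(name) is None:
                problems.append(f"{pid}: missing <{name}>")
    return problems

# A phrase that lacks its voice section is reported by name.
broken = '<phrases><phrase id="A"><source/><dest/></phrase></phrases>'
for problem in check_language_file(broken):
    print(problem)
```

A hook would run this over each changed language file and reject the commit
if any list of problems comes back non-empty.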
Remember, I'm not talking about the format used on the player, but the
source format - I'm not *that* insane.
-- 
Jonas Häggqvist
rasher(at)rasher(dot)dk
Received on 2007-09-20