
Dictionary


Creating a dictionary file

1. Download the Prolog version of the WordNet dictionary: WNprolog-2.0.tar.gz

2. Extract wn_g.pl and wn_s.pl from it.

3. Put wn2rdf.pl, wn_g.pl, wn_s.pl and the rdf2binary tool in one directory, and run wn2rdf.pl.

4. Run the rdf2binary tool; it will output dict.desc and dict.index.

5. Copy dict.desc and dict.index to /.rockbox/rocks/apps on the player.


Sources for dictionary files

Nearly everything needed for a German<->English single-word translator is available at http://dict.tu-chemnitz.de
The word list is provided under the GNU GPL version 2. See http://ftp.tu-chemnitz.de/pub/Local/urz/ding/de-en/ to download a 6 MB text file of words.

For those who don't want to compile their own dictionary files, a prebuilt set can be downloaded from http://www.rockbox.dreamhosters.com/dict.zip (6.6MB). To download the two parts separately, get http://www.rockbox.dreamhosters.com/dict.desc (17MB) and http://www.rockbox.dreamhosters.com/dict.index (5.0MB).


The rockbox dictionary format

The input format for rdf2binary is currently very simple: one line per word, consisting of the word, a tab, and then the description. When creating these files, make sure the entries are in alphabetical order and all words are in lowercase.
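To make the rules above concrete, here is a minimal sketch of how one input line could be split and checked before it is fed to rdf2binary. The function names are hypothetical and are not taken from the tool. "Alphabetical order" is assumed to mean plain strcmp byte order, and repeated words are assumed to be allowed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split "word<TAB>description" in place.
 * Returns 0 on success, -1 if the line contains no tab. */
static int split_entry(char *line, char **word, char **desc)
{
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return -1;
    *tab = '\0';
    *word = line;
    *desc = tab + 1;
    /* Cut the description at the first line terminator, if any. */
    (*desc)[strcspn(*desc, "\r\n")] = '\0';
    return 0;
}

/* Returns 1 if the word contains no uppercase ASCII letters. */
static int is_lowercase(const char *word)
{
    for (; *word != '\0'; word++)
        if (isupper((unsigned char)*word))
            return 0;
    return 1;
}

/* Returns 1 if `word` is lowercase and does not sort before `prev`.
 * Pass prev == NULL for the first entry of the file.
 * Assumption: ordering is strcmp (byte) order. */
static int entry_is_valid(const char *prev, const char *word)
{
    if (!is_lowercase(word))
        return 0;
    return prev == NULL || strcmp(prev, word) <= 0;
}
```

A caller would read the file line by line, call split_entry, and pass the previous word to entry_is_valid to catch ordering mistakes early.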


The binary format

The binary format used for the index is pretty simple; each entry is a struct like this one:

struct {
    char word[WORDLEN];
    long offset;
};

WORDLEN is a define in both the rdf2binary tool and the plugin. The offset is the position in dict.desc where the description is stored.
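Because the input words are sorted, an index of such entries can be searched with a binary search. The sketch below works on entries already loaded into memory. The WORDLEN value of 32, the struct name index_entry, and the function find_offset are assumptions for illustration; whether the plugin searches this way is not stated here.

```c
#include <string.h>

#define WORDLEN 32  /* assumed value; the real define lives in rdf2binary and the plugin */

struct index_entry {
    char word[WORDLEN];
    long offset;      /* position of the description in dict.desc */
};

/* Binary search over `count` entries sorted by word.
 * Returns the offset into dict.desc, or -1 if the word is absent. */
static long find_offset(const struct index_entry *idx, long count,
                        const char *word)
{
    long lo = 0;
    long hi = count - 1;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int cmp = strncmp(word, idx[mid].word, WORDLEN);

        if (cmp == 0)
            return idx[mid].offset;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Note that the size of long and any struct padding depend on the platform, so an index written on one machine is only readable where the layout matches.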


The improved binary format

This is still an idea under construction, but the new format would be a single file starting with this header:

| Field | Size | Description |
| magic number | 4 bytes | A simple identifier to know if this is a valid file. |
| version | 4 bytes | A format version number; this way we can detect old files and return an error. |
| max_wordlen | 4 bytes | The maximum word length used in this file. |
| wordcount | 4 bytes | The word count. |

After that there should be the index data:

| Field | Size | Description |
| offset | 4 bytes | The offset from the beginning of the file to the description. |
| word | max_wordlen bytes | The word; its length comes from the header. |

And then just plain text description data, one description per line.
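The layout above implies some simple arithmetic: the header is 16 bytes, each index record is 4 + max_wordlen bytes, and the descriptions start right after the last record. The following sketch parses the header from a byte buffer and computes those positions. The byte order is not specified in the proposal, so little-endian is an assumption here, and all names are hypothetical.

```c
#include <stdint.h>

#define HEADER_SIZE 16u  /* four 4-byte fields */

struct dict_header {
    uint32_t magic;
    uint32_t version;
    uint32_t max_wordlen;
    uint32_t wordcount;
};

/* Assumption: fields are stored little-endian. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the four header fields from the first 16 bytes of the file. */
static void parse_header(const unsigned char *buf, struct dict_header *h)
{
    h->magic       = read_u32_le(buf);
    h->version     = read_u32_le(buf + 4);
    h->max_wordlen = read_u32_le(buf + 8);
    h->wordcount   = read_u32_le(buf + 12);
}

/* File position of index record i: each record is a 4-byte offset
 * followed by max_wordlen bytes holding the word. */
static uint32_t index_record_pos(const struct dict_header *h, uint32_t i)
{
    return HEADER_SIZE + i * (4u + h->max_wordlen);
}

/* File position of the first description line. */
static uint32_t descriptions_start(const struct dict_header *h)
{
    return index_record_pos(h, h->wordcount);
}
```

For example, with max_wordlen = 32 and wordcount = 3, the records sit at 16, 52 and 88, and the descriptions begin at byte 124.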


The hash binary format

Header:

| Field | Size | Description |
| magic number | 4 bytes | A simple identifier to know if this is a valid file. |
| version | 4 bytes | A format version number; this way we can detect old files and return an error. |
| wordcount | 4 bytes | The word count. |

Offset table:

| Field | Size | Description |
| offset | 4 bytes | The offset from the beginning of the file to a value in the hash table. |

Hash table:

| Field | Size | Description |
| name | variable | The word. |
| offset | 4 bytes | The offset from the beginning of the file to the description. |

When searching for a word with hash X, the plugin looks up the offsets for X and X+1 in the offset table. It reads the data between those offsets and looks for the word being searched for. It's just a hash table with chaining.
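The lookup described above can be sketched on a file image held in memory. Several details are not specified in this format description and are assumptions here: values are little-endian, each name is NUL-terminated, the offset table directly follows the 12-byte header, and it has one more entry than there are hash values so that X+1 is always valid. The hash function itself is left to the caller, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define HASH_HEADER_SIZE 12u  /* magic, version, wordcount */

/* Assumption: values are stored little-endian. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan the chain stored in file[begin..end) for `word`.
 * Assumed entry layout: NUL-terminated name, then a 4-byte offset.
 * Returns the description offset, or 0 if the word is not in the chain
 * (0 can never be a description, since the header lives there). */
static uint32_t lookup_in_bucket(const unsigned char *file, uint32_t begin,
                                 uint32_t end, const char *word)
{
    uint32_t pos = begin;

    while (pos < end) {
        const unsigned char *name = file + pos;
        const unsigned char *nul = memchr(name, 0, end - pos);
        uint32_t len;

        if (nul == NULL)
            break;                      /* malformed entry: no terminator */
        len = (uint32_t)(nul - name);
        if (pos + len + 1u + 4u > end)
            break;                      /* malformed entry: offset cut off */
        if (strcmp((const char *)name, word) == 0)
            return read_u32_le(nul + 1);
        pos += len + 1u + 4u;           /* skip name, NUL and offset */
    }
    return 0;
}

/* Look up `word` whose hash is `hash`: read offsets X and X+1 from the
 * offset table, then search the data between them. */
static uint32_t lookup(const unsigned char *file, uint32_t hash,
                       const char *word)
{
    const unsigned char *table = file + HASH_HEADER_SIZE;
    uint32_t begin = read_u32_le(table + 4u * hash);
    uint32_t end = read_u32_le(table + 4u * (hash + 1u));

    return lookup_in_bucket(file, begin, end, word);
}
```

Words that share a hash value simply sit next to each other between the two offsets, which is what makes this a chained hash table.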


CategoryPlugin: Dictionary plugin (ALERT! documentation) [Player: DONE, Recorder: DONE, Ondio: DONE, H100: DONE, H300: DONE, H10: DONE, iPod 1G2G: DONE, iPod 4G: DONE, iPod Color: DONE, iPod Mini: DONE, iPod Nano: DONE, iPod Video: DONE, iAudio M5: DONE, iAudio X5: DONE, Gigabeat: DONE, Sansa e200: DONE, Sansa c200: DONE, Sansa Fuze: DONE, MR-100: DONE]

r17 - 22 Jan 2011 - 11:24:27 - GabrielMaia

Revision r15 - 14 Sep 2010 - 10:47 - MichaelStummvoll
Revision r14 - 16 May 2010 - 20:48 - JustinHannigan
Copyright by the contributing authors.