Rockbox mail archiveSubject: ID3 Database file format
ID3 Database file format
From: Peter van Hardenberg <pvh_at_uvic.ca>
Date: Fri, 22 Oct 2004 00:34:52 -0700
I've been following this discussion on the list for a while now, and
since I have some experience with binary file formats and databases, I
have a fairly detailed proposal. Non-technical types will probably want
to skip this one.
To summarize the more detailed writing below, I think the file should
consist of a filenames segment, and then a segment for each searchable
(browsable?) field. The segments for each field would be a binary tree
of all the represented values, and then a seperate segment indexed by
the binary tree listing the songIDs at each leaf of the tree. The
motivation behind this design was to maximize performance without
sacrificing flexibility. I am a undecided about the best method of
implementing fields that would be updated on the 'box. These may have to
be stored in an unindexed segment of the file and searched the slow way.
All the best,
ROCKBOX ID3 DATABASE FORMAT:
- header indication locations of other parts of file
- song entries
- searchable field (genre)
- searchable field (year)
- searchable field (artist)
Basically states the version of the file, the names of the available
fields, and the offsets for each segment of the file.
The song entries table should have this structure:
<songID, songFilename, (other handy stored attributes like precomputed
Each file should appear once, with a fixed record size, in an arbitrary
order but with a sequential numbering. This allows random access to
segments of the file. SongID could be omitted and considered implicit in
the structure of the file.
No other information needs to be stored here. The other information is
all searched for, and comes in varying sizes and densities.
Each searchable field consists of two segments.
1) The binary tree.
2) The song-list.
The searchable fields are a binary tree with this structure:
<value, pointer, numRecords>
Where pointer is a reference to the list of files with that value.
Because the pointer/numRecords are offsets, the binary tree for each
field consists of fixed-length records.
The song list would simply be an increasing list of songIDs indexed into
by the binary tree, with each value receiving its own block of IDs. A
songID may appear in more than one block. (For example, a song could
have two genres, or two collaborating artists.)
For example, under "genre", "rock" may occupy the block of songs ranging
from 1-> 298.
PERFORMING COMPLEX QUERIES:
Because the data is sorted and stored using fixed length records
wherever possible, computing a query involving multiple fields is quite
simple. Here is a breakdown of the procedure, using a search for "80s rock".
1) Load the song database header. Find the genre segment.
2) Search the binary tree for the offset/size of rock.
3) Load the "rock" song IDs into memory.
4) Search the binary tree for the offset/size of 1980s.
5) Load the "1980s" song IDs into memory.
6) Create an intersection list from the data. (Because the ids are
already sorted, this is very fast.)
7) Create a playlist by loading the song filenames based on the songIDs.
Received on 2004-10-22