|
Rockbox mail archiveSubject: ID3 Database file formatID3 Database file format
From: Peter van Hardenberg <pvh_at_uvic.ca>
Date: Fri, 22 Oct 2004 00:34:52 -0700 I've been following this discussion on the list for a while now, and since I have some experience with binary file formats and databases, I have a fairly detailed proposal. Non-technical types will probably want to skip this one. To summarize the more detailed writing below, I think the file should consist of a filenames segment, and then a segment for each searchable (browsable?) field. The segments for each field would be a binary tree of all the represented values, and then a seperate segment indexed by the binary tree listing the songIDs at each leaf of the tree. The motivation behind this design was to maximize performance without sacrificing flexibility. I am a undecided about the best method of implementing fields that would be updated on the 'box. These may have to be stored in an unindexed segment of the file and searched the slow way. All the best, Peter (details follow) ROCKBOX ID3 DATABASE FORMAT: OVERALL LAYOUT: -------- - header indication locations of other parts of file - song entries - searchable field (genre) - searchable field (year) - searchable field (artist) (etc) HEADER: Basically states the version of the file, the names of the available fields, and the offsets for each segment of the file. SONG ENTRIES: The song entries table should have this structure: <songID, songFilename, (other handy stored attributes like precomputed song title)> Each file should appear once, with a fixed record size, in an arbitrary order but with a sequential numbering. This allows random access to segments of the file. SongID could be omitted and considered implicit in the structure of the file. No other information needs to be stored here. The other information is all searched for, and comes in varying sizes and densities. SEARCHABLE FIELDS: Each searchable field consists of two segments. 1) The binary tree. 2) The song-list. The searchable fields are a binary tree with this structure: <value, pointer, numRecords> Where pointer is a reference to the list of files with that value. Because the pointer/numRecords are offsets, the binary tree for each field consists of fixed-length records. The song list would simply be an increasing list of songIDs indexed into by the binary tree, with each value receiving its own block of IDs. A songID may appear in more than one block. (For example, a song could have two genres, or two collaborating artists.) For example, under "genre", "rock" may occupy the block of songs ranging from 1-> 298. PERFORMING COMPLEX QUERIES: Because the data is sorted and stored using fixed length records wherever possible, computing a query involving multiple fields is quite simple. Here is a breakdown of the procedure, using a search for "80s rock". 1) Load the song database header. Find the genre segment. 2) Search the binary tree for the offset/size of rock. 3) Load the "rock" song IDs into memory. 4) Search the binary tree for the offset/size of 1980s. 5) Load the "1980s" song IDs into memory. 6) Create an intersection list from the data. (Because the ids are already sorted, this is very fast.) 7) Create a playlist by loading the song filenames based on the songIDs. _______________________________________________ http://cool.haxx.se/mailman/listinfo/rockbox Received on 2004-10-22 Page template was last modified "Tue Sep 7 00:00:02 2021" The Rockbox Crew -- Privacy Policy |