Rockbox mail archiveSubject: Re: ID3 database browsing
Re: ID3 database browsing
From: BlueChip <cs_bluechip_at_webtribe.net>
Date: Mon, 18 Oct 2004 00:51:50 +0100
The xbase approach is sound and still used in various guises to this very day.
This link: http://www.e-bachmann.dk/docs/xbase.htm
is a spec of many of them (dbase ii definitely included)
The internal record structure is irrelevant to us, as it would be infintely
simpler for us to use the record structure defined by id3.org for the
As with the header format, things like "encrypted" and "language driver"
have no relevance to us.
You can boundry pad it and use fat hacks for super-fast record access
...but this kind of hacking does demand contiguous filespace to store the
data/index in. If the file becomes fragmented, fat hacking ceases to be a
It is classic to reserve record 0 as a file header (number of records,
number flagged for deletion, etc) and there are many good reasons for this,
such as fast retrieval of answers to common questions like "how many file
Indexing can be a simple binary tree:
...again the xbase solution is ovely complex for this purpose
...it is about here I should say that I am not aiming to match an existing
standard, rather, improve on an obselete one.
The heavy part is SORTING the index files after new records have been added
- this COULD be very expensive on the jukebox, and happen in a milli-second
on your p7-40GHz ...hence the solution there is to =periodically= sort your
indexes with a pc.
The next problem is the fact that id3v2 uses variable length strings - so
if indexing on 'title' your index would need to be variable length also,
because if you limit it to, say, string32 and your new album is titled
string33 then the last char will be lost fom the index and hence be
unsearchable (well, searchable, but with the slow-backup search algorithm.)
However, this said, I would hallucinate that 32 characters would be enough
to uniquely identify any song or album title, and if not, surely it will
round it down to a list of no more than three or four which you could then
choose from - so it is not perhaps a problem which is relevant to us.
We have, say, 1.8MB of memory to play with ...each key (in our example
index file) is a fixed 32bytes ...allowing you to read the index for a
database of just under 60,000 files. So if you choose this method, sorting
can be done entirely in RAM ...it may not be life-changingly fast, but by
removing the disk accessing everything starts to take on a level of
plausibility for jukebox'ness.
In a sorted database of 16,000 files it would take a maximum of 15 1-sector
disk reads to locate the record data (id3 tag) and read it into memory,
with an memory-cost of "max_id3_record_size" <- for which I suggest 4K and
strip out any image data if the id3v2 is >4K ...for surely it will be image
data that has made the tag go over 4K.
Again, with some fat hacking, the indexing can be mind-meltingly fast. But
don't forget this depends on fixed length records, or a big chunk of
perverted thinking about record markers and such.
The code for this implementation is well known, documented to hell and
back, open sourced in a million database projects by every man and his dog
under various guises. The magic words are: binary tree; binary chop;
random file access; divide and conquer; xbase; database indexing; and
probably a whole host of other things dreamed up by Boyce, Codd and the
rest of the weirdos in the database world ;)
>I don't agree to this hypothesis. Instead I would like to recall the
>dbase argument: Back in the good old days (TM) it was possible to index
>this amount of data (say 12000 records) on slower CPUs with much less
>RAM and much slower mass storage in reasonable time. To achieve this
>they've probably put a lot of thoughts into their file formats.
>Unfortunately I'm not a database developer, but I suspect at least the
> - Having fixed-length records makes it much more efficient to do
> inplace updates or re-sorting of index files that dont fit
> completely into RAM.
> - Having multiple files allows it to add records (either by appending
> or by replacing) without inserting data into a file, which is a
> costly operation, especially if you do it at the beginning of a 20M
> database file (ok, this can be circumvented (1) by FAT-level hacks and
> stuffing bytes to cluster borders or (2) by having a huge file with
> a fs like structure and internal fragmentation).
> - Compared to the CPU/RAM limitiations, there is a lot of mass storage
> accessible, so we don't have to optimize the file format for size,
> but we can focus on fast searches and reasonably fast re-indexes or
> - BTW, can't we use the full MPEG buffer at least for the
> re-indexing process?
> > What separates the archos from the neuros and iPod is that they
> _require_ an
> > ID3 database. With rockbox, if you don't want one, you don't have to
> have one.
> > If you DO want one, you're probably willing to go to the trouble of
> > the DB on your computer (and as another post said, you need to connect
> to your
> > PC to add music anyway).
>Hmm, I do a lot of recordings without having access to a host meanwhile,
>so an external application (even if it were coded in python or perl and stored
>on the JBR) wouldn't be a help here.
> > -scott
>Having an efficient file format would make it possible for all to use
>this feature, starting with an adhoc one would probably restrict it to
>the host oriented ones for some time (until it gets replaced).
>Just my 5c,
Received on 2004-10-18
Page was last modified "Sat May 23 08:12:40 2020" The Rockbox Crew