Tagcache Database Format
Intro
This is a quick attempt to document the Tagcache disk database format. I'm new to all of this, so corrections/additions are very welcome.
This document is currently incomplete.
Updates
| Date | Version | Changes |
| 4/17/2011 | Rockbox Version 3.8.1 | Various changes to better align with current version. Matches Tagcache magic version 0x5443480E |
General notes
- The source for the tagcache is in apps/tagcache.c and apps/tagcache.h
- The DB uses the native endianness of its host CPU, which is big endian for ColdFire and SH1, and little endian for ARM.
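Because the byte order depends on the target, a reader on a PC has to work out which one a given set of files uses. One possible approach, sketched here as an assumption rather than anything Rockbox itself does, is to read the 4-byte version field at the start of a file in both byte orders and keep the one whose upper three bytes spell "TCH". The function name `detect_byte_order` is hypothetical.

```python
import struct

# ASCII "TCH": the upper three bytes of the magic (e.g. 0x5443480E)
TCH_PREFIX = 0x544348


def detect_byte_order(raw):
    """Return '<' (little endian) or '>' (big endian), the byte-order
    prefixes used by the struct module, or None if neither fits."""
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", raw[:4])
        if magic >> 8 == TCH_PREFIX:
            return order
    return None
```

The returned prefix can be passed straight to `struct.unpack` when reading the rest of the file.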
Files
The database consists of ten files under .rockbox, namely:
- database_0.tcd (artist)
- database_1.tcd (album)
- database_2.tcd (genre)
- database_3.tcd (title)
- database_4.tcd (filename)
- database_5.tcd (composer)
- database_6.tcd (comment)
- database_7.tcd (albumartist)
- database_8.tcd (grouping)
- database_idx.tcd
Files 0-8 are each associated with a specific tag (noted in parentheses after them), and idx is the master index. When a file lacks one of the following tags, another tag appears to be used in its place:
- albumartist -> artist
- grouping -> title
Tag file format
The tag files share the following common format:
Header:
| Bytes | Content |
| 4 | Database version |
| 4 | Size (in bytes) of the non-header part of the file |
| 4 | Number of entries in the file |
Entry:
| Bytes | Content |
| 4 | Length (in bytes) of the data portion of the entry |
| 4 | Index of the item in the master index |
| variable | The entry's data |
- The "Index of the item in the master index" field only applies to database_3.tcd and database_4.tcd. In the other files, these bytes are always 0xFFFFFFFF.
- The tag entry's data is encoded in Unicode.
- The tag entry's data ends with a null byte.
- The tag entry's data is always padded with 'X' characters (after the null byte) so that the data length is 4+8*n (where n is an integer). This is needed to lower memory requirements while building a lookup table in the tagcache commit stage.
- As of 3.8.1, database_4.tcd (filenames) does not add this padding. All other files do have this padding.
- If a file does not have data for a tag, the string "<Untagged>" is used instead.
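The header and entry layout above can be read with Python's `struct` module. The following is a minimal sketch, not the code Rockbox or rbdb.py uses. It assumes the whole file is already in memory as `raw`, that the byte order is known (`"<"` or `">"`), and that the Unicode encoding is UTF-8. The names `parse_tag_file`, `pad_tag_data` and the dictionary keys are hypothetical.

```python
import struct


def parse_tag_file(raw, order="<"):
    """Parse the bytes of a database_N.tcd file into a dict."""
    # Header: database version, size of the non-header part, entry count
    version, data_size, count = struct.unpack_from(order + "III", raw, 0)
    pos = struct.calcsize(order + "III")
    entries = []
    for _ in range(count):
        file_pos = pos  # position of this entry, counted from the file start
        # Entry header: data length, index of the item in the master index
        length, idx_id = struct.unpack_from(order + "II", raw, pos)
        pos += 8
        data = raw[pos:pos + length]
        pos += length
        # Keep everything before the null byte; drop the 'X' padding.
        # Assumption: the text is UTF-8.
        text = data.split(b"\x00", 1)[0].decode("utf-8", errors="replace")
        entries.append({"file_pos": file_pos, "idx_id": idx_id, "text": text})
    return {"version": version, "data_size": data_size, "entries": entries}


def pad_tag_data(text):
    """Encode text, add the null byte, then pad with 'X' to 4+8*n bytes."""
    data = text.encode("utf-8") + b"\x00"
    while (len(data) - 4) % 8 != 0:
        data += b"X"
    return data
```

Because the parser relies on the stored length rather than on the padding, the same function should also cope with database_4.tcd, where the padding is absent.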
Index file format
The index file uses the following format:
Header:
| Bytes | Content |
| 4 | Database version (first three bytes are TCH, last byte is the version) |
| 4 | Size (in bytes) of the non-header part of the file |
| 4 | Number of entries in the file |
| 4 | Serial (used for last played, see note) |
| 4 | Commit id (increments by 1 each commit) |
| 4 | Dirty (if true, db commit has failed and the db is broken) |
Entry:
| Bytes | Content | Notes |
| 4 | artist | byte offset for tag file |
| 4 | album | byte offset for tag file |
| 4 | genre | byte offset for tag file |
| 4 | title | byte offset for tag file |
| 4 | filename | byte offset for tag file |
| 4 | composer | byte offset for tag file |
| 4 | comment | byte offset for tag file |
| 4 | albumartist | byte offset for tag file |
| 4 | grouping | byte offset for tag file |
| 4 | year | |
| 4 | discnumber | |
| 4 | tracknumber | |
| 4 | bitrate | |
| 4 | length | In milliseconds |
| 4 | playcount | |
| 4 | rating | |
| 4 | playtime | |
| 4 | lastplayed | |
| 4 | commitid | |
| 4 | mtime | see below |
| 4 | lastoffset | Last offset into the file for automatic resume |
| 4 | flags | see below |
The first 9 items in the entry contain byte offsets for the data in their respective tag file. The remaining values are embedded into the index. Numerical values are 0 when not defined. On tracks flagged as DELETED, the offsets are a CRC32 of the original string data, to allow for statistics resurrection.
The serial increments by one every time a track is played. The currently playing song's lastplayed entry is set to the current value of serial.
For simplicity, Rockbox creates a new entry in this DB whenever the file's mtime changes. The old entry is flagged DELETED, and the new one (assuming the tags match sufficiently) will have its statistical data restored from the old entry and the RESURRECTED flag set. This RESURRECTED flag should remain in place until all matching DELETED entries are removed (which will never happen inside Rockbox itself, but external software could do this).
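The index layout described above is a fixed-size header of six 4-byte values followed by fixed-size entries of 22 4-byte values, so it can be unpacked in a few lines. This is a rough sketch under the same assumptions as any external reader: the file contents are in `raw`, the byte order is known, and all fields are read as unsigned integers. The function name `parse_index` and the header key names are hypothetical. The entry field names follow the entry table.

```python
import struct

# Header fields in file order (key names are illustrative)
HEADER_FIELDS = ("version", "data_size", "entry_count",
                 "serial", "commitid", "dirty")

# Entry fields in file order, as listed in the entry table
ENTRY_FIELDS = ("artist", "album", "genre", "title", "filename",
                "composer", "comment", "albumartist", "grouping",
                "year", "discnumber", "tracknumber", "bitrate", "length",
                "playcount", "rating", "playtime", "lastplayed",
                "commitid", "mtime", "lastoffset", "flags")


def parse_index(raw, order="<"):
    """Parse the bytes of database_idx.tcd into (header, entries)."""
    header_fmt = order + "%dI" % len(HEADER_FIELDS)
    entry_fmt = order + "%dI" % len(ENTRY_FIELDS)
    header = dict(zip(HEADER_FIELDS, struct.unpack_from(header_fmt, raw, 0)))
    pos = struct.calcsize(header_fmt)
    entry_size = struct.calcsize(entry_fmt)
    entries = []
    for _ in range(header["entry_count"]):
        values = struct.unpack_from(entry_fmt, raw, pos)
        entries.append(dict(zip(ENTRY_FIELDS, values)))
        pos += entry_size
    return header, entries
```

For entries without the DELETED flag, the first nine values can then be used to look up strings in the matching tag files.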
Mtime
The mtime uses the FAT32 mtime schema, which can be found in the FAT32 documentation on DataSheets.
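A FAT timestamp consists of a 16-bit date (7 bits of years since 1980, 4 bits of month, 5 bits of day) and a 16-bit time (5 bits of hours, 6 bits of minutes, 5 bits of seconds divided by two). The sketch below assumes that the 32-bit mtime value holds the date in its upper 16 bits and the time in its lower 16 bits. That packing is an assumption and should be checked against apps/tagcache.c. The function name `decode_fat_mtime` is hypothetical.

```python
import datetime


def decode_fat_mtime(value):
    """Decode a 32-bit FAT-style timestamp into a datetime.

    Assumes date in the upper 16 bits and time in the lower 16 bits.
    Returns None if the fields do not form a valid date (e.g. value 0).
    """
    date, time = value >> 16, value & 0xFFFF
    try:
        return datetime.datetime(
            1980 + (date >> 9),    # years since 1980
            (date >> 5) & 0x0F,    # month, 1-12
            date & 0x1F,           # day of month, 1-31
            time >> 11,            # hours
            (time >> 5) & 0x3F,    # minutes
            (time & 0x1F) * 2,     # seconds, stored with 2-second resolution
        )
    except ValueError:
        return None
```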
Flags
The flags value in the DB is a 32-bit int. When FLAG_DIRCACHE is set, the upper 16 bits of the flags value contain an index of the entry in the master DB index.
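A flags value can be split into its named bits and, when FLAG_DIRCACHE is set, the index stored in the upper 16 bits. The following is a minimal sketch using the bit values from the flags table in this section. The function name `decode_flags` is hypothetical.

```python
# (bit value, name) pairs, taken from the flags table in this section
FLAG_BITS = (
    (1, "FLAG_DELETED"),
    (2, "FLAG_DIRCACHE"),
    (4, "FLAG_DIRTY"),
    (8, "FLAG_TRKNUMGEN"),
    (16, "FLAG_RESURRECTED"),
)


def decode_flags(value):
    """Return (list of set flag names, index from the upper 16 bits or None)."""
    names = [name for bit, name in FLAG_BITS if value & bit]
    # The upper 16 bits are only meaningful when FLAG_DIRCACHE is set
    index = (value >> 16) & 0xFFFF if value & 2 else None
    return names, index
```

Since FLAG_DIRCACHE only affects the in-RAM DB, the index part would normally be None for values read from disk.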
| Value | Name | Meaning |
| 1 | FLAG_DELETED | The file's mtime changed since Rockbox last saw it. For simplicity, Rockbox creates a new entry in the index when this happens and flags the old one DELETED |
| 2 | FLAG_DIRCACHE | The filename is a dircache pointer. Only affects the in-RAM DB, not the one on disk |
| 4 | FLAG_DIRTY | Numeric data for the track was modified |
| 8 | FLAG_TRKNUMGEN | The track number was generated from the filename because the tag was missing |
| 16 | FLAG_RESURRECTED | "Statistics data has been resurrected". An entry flagged as DELETED was found to have data matching this entry's, so statistical data was restored from that DELETED entry. |
Matching files for resurrection
The tagcache source lays out the following conditions for statistical data resurrection:
- tag_length must match exactly
- if tag_filename matches as well, no further checking is required
- at least two of the tag_artist, tag_album, and tag_title hashes must match
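The three conditions above can be expressed as a small predicate. This is an illustrative sketch, not a transcription of the Rockbox source. It assumes each entry is a dictionary (a hypothetical representation) holding the track length and, for the string tags, hash values that can be compared directly.

```python
def can_resurrect(new_entry, deleted_entry):
    """Decide whether statistics of a DELETED entry may be restored.

    Both arguments are dicts with the keys "length", "filename",
    "artist", "album" and "title" (string tags given as hashes).
    """
    # tag_length must match exactly
    if new_entry["length"] != deleted_entry["length"]:
        return False
    # A matching filename needs no further checking
    if new_entry["filename"] == deleted_entry["filename"]:
        return True
    # Otherwise at least two of artist, album and title must match
    matches = sum(new_entry[tag] == deleted_entry[tag]
                  for tag in ("artist", "album", "title"))
    return matches >= 2
```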
Tools
- rbdb.py: Simple python script to parse the DBs
- Usage: run the script in the directory containing the DB files. The script takes one argument. If the argument is a number from 0 through 8, it parses that DB. Otherwise, it will parse the main index file.
- rblib.py: A python library for interfacing with rockbox. Currently only supports reading and writing existing databases.
- Rockbox Database Manager: A Python-based tool to manipulate and examine the rockbox database
- Java Tagcache Database Generator: A Java-based tool for generating a new set of tagcache database files from a PC that can be loaded on to a Rockbox device.