
Tagcache Database Format

Intro

This is a quick attempt to document the Tagcache disk database format. I'm new to all of this, so corrections/additions are very welcome.

This document is currently incomplete.

Updates

Date | Version | Changes
4/17/2011 | Rockbox 3.8.1 | Various changes to better align with the current version. Matches tagcache magic version 0x5443480E.

General notes

  • The source for the tagcache is in apps/tagcache.c and apps/tagcache.h.
  • The DB uses the native endianness of its host CPU: big endian for ColdFire and SH1, little endian for ARM.
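Because the byte order depends on the CPU that wrote the database, a reader running on a PC has to work it out from the file itself. A minimal Python sketch, assuming the magic quoted in the Updates table (0x5443480E, whose three most significant bytes spell "TCH"), decodes the first 32-bit word both ways and keeps whichever yields that prefix. The function names are illustrative, not part of Rockbox or any existing tool.

```python
import struct

TCH_PREFIX = 0x544348  # "TCH" as the top three bytes of the 32-bit magic


def detect_byte_order(header: bytes) -> str:
    """Return the struct byte-order prefix ('<' or '>') for a tagcache file.

    Tries little endian (ARM) first, then big endian (ColdFire, SH1), and
    accepts whichever makes the top three bytes of the magic read "TCH".
    """
    if len(header) < 4:
        raise ValueError("need at least the 4-byte magic")
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", header[:4])
        if magic >> 8 == TCH_PREFIX:
            return order
    raise ValueError("not a tagcache file (bad magic)")


def magic_version(header: bytes) -> int:
    """Return the low byte of the magic, i.e. the database format version."""
    order = detect_byte_order(header)
    (magic,) = struct.unpack(order + "I", header[:4])
    return magic & 0xFF
```

For example, passing the first four bytes of .rockbox/database_idx.tcd from an ARM player should return '<', which can then be prefixed to every other struct format string.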

Files

The database consists of ten files under .rockbox:

  • database_0.tcd (artist)
  • database_1.tcd (album)
  • database_2.tcd (genre)
  • database_3.tcd (title)
  • database_4.tcd (filename)
  • database_5.tcd (composer)
  • database_6.tcd (comment)
  • database_7.tcd (albumartist)
  • database_8.tcd (grouping)
  • database_idx.tcd

Files 0-8 each hold the values of one tag (noted in parentheses after them), and database_idx.tcd is the master index. Some tags appear to fall back to another tag's value when a file does not contain them:

  • albumartist -> artist
  • grouping -> title
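The file numbering and the fallbacks can be captured as plain data. The following Python sketch only models the behaviour described above; it is not a transcription of apps/tagcache.c, and TAG_NAMES, TAG_FALLBACKS, and the helper functions are hypothetical names.

```python
# Tag name for each database_N.tcd file, in file-number order.
TAG_NAMES = ["artist", "album", "genre", "title", "filename",
             "composer", "comment", "albumartist", "grouping"]

# Tag that is missing from a file -> tag whose value is used instead.
TAG_FALLBACKS = {"albumartist": "artist", "grouping": "title"}


def tag_file_name(tag: str) -> str:
    """Return the tag file that stores values for `tag`."""
    return f"database_{TAG_NAMES.index(tag)}.tcd"


def apply_fallbacks(tags: dict) -> dict:
    """Return a copy of `tags` with missing (or empty) fallback tags filled in."""
    result = dict(tags)
    for missing, source in TAG_FALLBACKS.items():
        if not result.get(missing) and result.get(source):
            result[missing] = result[source]
    return result
```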

Tag file format

The tag files share the following common format:

Header:
Bytes Content
4 Database version
4 Size (in bytes) of the non-header part of the file
4 Number of entries in the file

Entry:
Bytes Content
4 Length (in bytes) of the data portion of the entry, including the null byte and any padding
4 Index of the item in the master index
variable The entry's data

  • The "Index of the item in the master index" only applies to database_3.tcd (title) and database_4.tcd (filename). In the other files, these four bytes are always 0xFFFFFFFF.
  • The tag entry's data is Unicode text, encoded as UTF-8.
  • The tag entry's data ends with a null byte.
  • The tag entry's data is always padded with 'X' characters (after the null byte) so that the data length is 4+8*n, where n is an integer. This lowers memory requirements while building a lookup table in the tagcache commit stage.
    • As of 3.8.1, database_4.tcd (filenames) does not use this padding. All other tag files do.
  • If a file has no data for a tag, the string "<Untagged>" is stored instead.
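A tag file can be read sequentially, since each entry's length field says how far to skip to reach the next one. The Python sketch below builds and parses entries following the layout above. It assumes the data is UTF-8, that the length field covers the null byte plus any padding, and that the caller already knows the byte order ('<' for ARM, '>' for ColdFire/SH1). pack_tag_entry, parse_tag_file, and padded_length are illustrative names.

```python
import struct

HEADER_FMT = "iii"  # magic, size of non-header data, entry count
ENTRY_FMT = "ii"    # data length, index into the master index (-1 if unused)


def padded_length(data: bytes) -> int:
    """Smallest length >= len(data) + 1 (room for the NUL) of the form 4 + 8*n."""
    length = len(data) + 1
    while length % 8 != 4:
        length += 1
    return length


def pack_tag_entry(text: str, idx_id: int = -1, order: str = "<",
                   pad: bool = True) -> bytes:
    """Encode one entry; pass pad=False for database_4.tcd (filenames)."""
    data = text.encode("utf-8")
    length = padded_length(data) if pad else len(data) + 1
    body = data + b"\x00" + b"X" * (length - len(data) - 1)
    # idx_id = -1 is written as 0xFFFFFFFF by the signed "i" format.
    return struct.pack(order + ENTRY_FMT, length, idx_id) + body


def parse_tag_file(blob: bytes, order: str = "<"):
    """Return (magic, [(file_offset, idx_id, text), ...]) for a whole tag file."""
    magic, _size, count = struct.unpack_from(order + HEADER_FMT, blob, 0)
    offset = struct.calcsize(order + HEADER_FMT)  # 12 bytes
    entries = []
    for _ in range(count):
        length, idx_id = struct.unpack_from(order + ENTRY_FMT, blob, offset)
        start = offset + struct.calcsize(order + ENTRY_FMT)
        raw = blob[start:start + length]
        # Everything after the first NUL is padding.
        text = raw.split(b"\x00", 1)[0].decode("utf-8")
        entries.append((offset, idx_id, text))
        offset = start + length
    return magic, entries
```

Each parsed entry keeps its file offset because, assuming the master index stores absolute positions, those offsets are what an index entry points at.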

Index file format

The index file uses the following format:

Header:
Bytes Content
4 Database version (magic number: the three most significant bytes spell "TCH" and the least significant byte is the version; on disk, the byte order follows the host CPU)
4 Size (in bytes) of the non-header part of the file
4 Number of entries in the file
4 Serial (used for last played, see note)
4 Commit id (increments by 1 each commit)
4 Dirty (if nonzero, a database commit failed and the database is broken)
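Reading the master index header is a single unpack over six 32-bit integers. This Python sketch also refuses a dirty database, following the Dirty description above (a tool could equally just warn); parse_master_header and the field names are illustrative, and the caller is assumed to pass the correct byte order.

```python
import struct

MASTER_FIELDS = ("magic", "datasize", "entry_count", "serial", "commitid", "dirty")


def parse_master_header(blob: bytes, order: str = "<") -> dict:
    """Decode the 24-byte header of database_idx.tcd into a dict."""
    values = struct.unpack_from(order + "6i", blob, 0)
    header = dict(zip(MASTER_FIELDS, values))
    if header["magic"] >> 8 != 0x544348:  # top three bytes must spell "TCH"
        raise ValueError("bad magic; wrong byte order or not an index file")
    if header["dirty"]:
        raise ValueError("database is marked dirty (a commit failed)")
    header["version"] = header["magic"] & 0xFF
    return header
```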

Entry:
Bytes | Content | Notes
4 | artist | byte offset into database_0.tcd
4 | album | byte offset into database_1.tcd
4 | genre | byte offset into database_2.tcd
4 | title | byte offset into database_3.tcd
4 | filename | byte offset into database_4.tcd
4 | composer | byte offset into database_5.tcd
4 | comment | byte offset into database_6.tcd
4 | albumartist | byte offset into database_7.tcd
4 | grouping | byte offset into database_8.tcd
4 | year |
4 | discnumber |
4 | tracknumber |
4 | bitrate |
4 | length | in milliseconds
4 | playcount |
4 | rating |
4 | playtime |
4 | lastplayed |
4 | commitid |
4 | mtime | see below
4 | lastoffset | last offset into the file, for automatic resume
4 | flags | see below

The first 9 items in the entry contain byte offsets for the data in their respective tag files. The remaining values are stored directly in the index. Numerical values are 0 when not defined. On tracks flagged as DELETED, the offsets are instead a CRC32 of the original string data, to allow for statistics resurrection.
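The entry layout maps onto 22 consecutive 32-bit integers (88 bytes) following the 24-byte header. The Python sketch below decodes entries into dicts keyed by the names in the table and looks strings up in the tag files. It assumes the stored offsets are absolute positions in the tag file (counting its 12-byte header) and reads every field as a signed int32; all function names are hypothetical. DELETED entries are refused because their "offsets" are hashes.

```python
import struct

# Field order of one master-index entry, as listed in the table above.
INDEX_FIELDS = [
    "artist", "album", "genre", "title", "filename", "composer", "comment",
    "albumartist", "grouping", "year", "discnumber", "tracknumber", "bitrate",
    "length", "playcount", "rating", "playtime", "lastplayed", "commitid",
    "mtime", "lastoffset", "flags",
]
STRING_FIELDS = INDEX_FIELDS[:9]      # offsets into database_0..8.tcd
MASTER_HEADER_SIZE = 24               # six 32-bit header fields
ENTRY_SIZE = 4 * len(INDEX_FIELDS)    # 88 bytes


def parse_index_entries(blob: bytes, order: str = "<") -> list:
    """Decode every entry of database_idx.tcd into a dict of signed ints."""
    (count,) = struct.unpack_from(order + "i", blob, 8)  # entry_count field
    fmt = order + "%di" % len(INDEX_FIELDS)
    entries = []
    for i in range(count):
        values = struct.unpack_from(fmt, blob, MASTER_HEADER_SIZE + i * ENTRY_SIZE)
        entries.append(dict(zip(INDEX_FIELDS, values)))
    return entries


def read_tag_string(tag_blob: bytes, offset: int, order: str = "<") -> str:
    """Return the string stored at `offset` in a tag file (offset assumed absolute)."""
    length, _idx_id = struct.unpack_from(order + "ii", tag_blob, offset)
    raw = tag_blob[offset + 8:offset + 8 + length]
    return raw.split(b"\x00", 1)[0].decode("utf-8")


def resolve_strings(entry: dict, tag_blobs: list, order: str = "<") -> dict:
    """Map each string field to its text; tag_blobs[i] holds database_i.tcd."""
    if entry["flags"] & 1:  # FLAG_DELETED: values are CRC32s, not offsets
        raise ValueError("deleted entries store hashes, not offsets")
    return {name: read_tag_string(tag_blobs[i], entry[name], order)
            for i, name in enumerate(STRING_FIELDS)}
```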

Serial increments by one every time a track is played. The currently playing song's lastplayed entry is set to the current value of serial.
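Since lastplayed stores a serial value rather than a clock time, sorting by it gives play order. A small Python sketch, assuming entries are dicts with a lastplayed key and that 0 means never played (undefined numeric values are 0). The "plays since" difference may be off by one depending on whether serial is incremented before or after it is copied, which is not documented here. recently_played is a hypothetical name.

```python
def recently_played(entries, serial, limit=10):
    """Return (entry, plays_since) pairs, most recently played first."""
    played = [e for e in entries if e["lastplayed"] > 0]  # 0 = never played
    played.sort(key=lambda e: e["lastplayed"], reverse=True)
    # Tracks played since this one, assuming serial and lastplayed share a counter.
    return [(e, serial - e["lastplayed"]) for e in played[:limit]]
```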

Rockbox creates a new entry in this DB whenever a file's mtime changes, for simplicity. The old entry is flagged DELETED, and the new one (assuming the tags match sufficiently) has its statistical data restored from the old entry and the RESURRECTED flag set. The RESURRECTED flag should remain in place until all matching DELETED entries are removed, which never happens inside Rockbox itself but could be done by external software.

Mtime

The mtime uses the FAT32 date/time format, which is described in the FAT32 documentation on the DataSheets page.
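FAT stores a 16-bit date word (bits 15-9: years since 1980, bits 8-5: month, bits 4-0: day) and a 16-bit time word (bits 15-11: hours, bits 10-5: minutes, bits 4-0: seconds divided by 2). How the two words are packed into the 32-bit mtime field is an assumption in the Python sketch below: date in the upper 16 bits, time in the lower 16. It is worth checking against a real database before relying on it.

```python
from datetime import datetime


def decode_fat_mtime(mtime: int) -> datetime:
    """Decode a value assumed to be (FAT date << 16) | FAT time."""
    date, time = (mtime >> 16) & 0xFFFF, mtime & 0xFFFF  # also works if read signed
    return datetime(
        1980 + (date >> 9),   # year
        (date >> 5) & 0x0F,   # month
        date & 0x1F,          # day
        time >> 11,           # hour
        (time >> 5) & 0x3F,   # minute
        (time & 0x1F) * 2,    # second (2-second resolution)
    )


def encode_fat_mtime(dt: datetime) -> int:
    """Inverse of decode_fat_mtime; odd seconds are rounded down."""
    date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return (date << 16) | time
```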

Flags

The flags value in the DB is a 32-bit int. When FLAG_DIRCACHE is set, the upper 16 bits of the flags value contain the index of the entry in the master DB index.

Value | Name | Meaning
1 | FLAG_DELETED | The file's mtime changed since Rockbox last saw it. For simplicity, Rockbox creates a new entry in the index when this happens and flags the old one DELETED.
2 | FLAG_DIRCACHE | The filename is a dircache pointer. Only affects the in-RAM DB, not the one on disk.
4 | FLAG_DIRTY | Numeric data for the track was modified.
8 | FLAG_TRKNUMGEN | The track number was generated from the filename because the tag was missing.
16 | FLAG_RESURRECTED | "Statistics data has been resurrected": an entry flagged DELETED was found to have data matching this entry's, so statistical data was restored from that DELETED entry.
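Decoding the flags is a matter of testing bits. This Python sketch uses the values from the table; the upper 16 bits are returned separately because they carry an index only when FLAG_DIRCACHE is set, which should not occur in the on-disk file. decode_flags and live_entries are hypothetical helper names.

```python
FLAG_DELETED = 1
FLAG_DIRCACHE = 2
FLAG_DIRTY = 4
FLAG_TRKNUMGEN = 8
FLAG_RESURRECTED = 16

FLAG_NAMES = {
    FLAG_DELETED: "DELETED",
    FLAG_DIRCACHE: "DIRCACHE",
    FLAG_DIRTY: "DIRTY",
    FLAG_TRKNUMGEN: "TRKNUMGEN",
    FLAG_RESURRECTED: "RESURRECTED",
}


def decode_flags(flags: int):
    """Return (names of set flag bits, value of the upper 16 bits)."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    upper = (flags >> 16) & 0xFFFF
    return names, upper


def live_entries(entries):
    """Drop index entries that were superseded by a newer entry (DELETED)."""
    return [e for e in entries if not e["flags"] & FLAG_DELETED]
```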

Matching files for resurrection

The tagcache source lays out the following conditions for statistical data resurrection:
  • tag_length must match exactly.
  • If tag_filename also matches, no further checking is required.
  • Otherwise, at least two of the tag_artist, tag_album, and tag_title hashes must match.
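The three rules read naturally as a predicate over a DELETED entry and a new candidate. In the Python sketch below, the DELETED entry's string fields hold the stored hashes (per the note on CRC32 offsets above) while the candidate carries the actual strings. zlib.crc32 is a stand-in: whether it produces the same values as Rockbox's own CRC routine has not been verified. can_resurrect and tag_hash are hypothetical names.

```python
import zlib


def tag_hash(text: str) -> int:
    # Stand-in hash: zlib's CRC-32 over the UTF-8 bytes (unverified against Rockbox).
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def _same(stored: int, text: str) -> bool:
    # Stored values may have been read as signed int32, so compare unsigned.
    return (stored & 0xFFFFFFFF) == tag_hash(text)


def can_resurrect(deleted: dict, candidate: dict) -> bool:
    """Apply the matching rules to a DELETED entry and a new track."""
    if deleted["length"] != candidate["length"]:
        return False  # length must match exactly
    if _same(deleted["filename"], candidate["filename"]):
        return True   # same filename: no further checks
    hits = sum(_same(deleted[t], candidate[t]) for t in ("artist", "album", "title"))
    return hits >= 2
```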

Tools

  • rbdb.py: A simple Python script to parse the DBs.
    • Usage: run the script in the directory containing the DB files. The script takes one argument: if it is a number from 0 through 8, the script parses that tag file; otherwise, it parses the master index file.
  • rblib.py: A Python library for interfacing with Rockbox. It currently only supports reading and writing existing databases.
  • Rockbox Database Manager: A Python-based tool to manipulate and examine the Rockbox database.
  • Java Tagcache Database Generator: A Java-based tool for generating a new set of tagcache database files on a PC that can be loaded onto a Rockbox device.

r28 - 02 Apr 2021 - 20:46:07 - UnknownUser


Parents: ArenOlson
Copyright © by the contributing authors.