Wiki discussion about a proposed new internal audio API
The in-progress port of Rockbox to the iRiver and other devices requires both software audio decoding and an abstraction of the audio hardware and playback features of the different target devices, neither of which is present in the current Archos-oriented code. The aim of this document is to:
- Provide the Rockbox application (i.e. the code in apps/) with an abstract Audio API capable of playing and recording multiple audio formats.
- Provide a CODEC API to support the Audio API in that task.
- Provide a low-level Audio Device Driver layer inside the firmware to abstract the details of the ever-changing hardware supported by Rockbox and to enable the implementation of emulation of the audio hardware within the Rockbox UI simulator.
Alain Berguerand has thought about a design proposal of his own
Architecture needed for software codecs
We need a dual-buffer system with a filter in between:
- One buffer for compressed data, which is fed to the codec. ("buf1")
- A transport/filter function, which receives uncompressed data from the codec and writes it to the uncompressed buffer. This filter is responsible for optional manipulation of the data, such as gap removal, crossfading, equalizing etc.
- One buffer for uncompressed data, which is fed to the DAC. ("buf2")
Basic data flow:
- loader (disk->buf1): makes sure buf1 never runs out of compressed data
- codec (buf1->codec): uncompresses data from buf1 and feeds the filter
- filter (codec->buf2): receives uncompressed data from the codec and decides where in buf2 it goes, and if something should be done to it first
- feeder (buf2->dac): reads uncompressed data from buf2 and feeds it to the dac
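As a rough illustration, the four stages above can be sketched in C. Everything here is hypothetical - the buffer sizes, the trivial byte-doubling "codec" and the counting "dac" are stand-ins - the point is only to show how data moves between the two buffers:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: sizes and names are made up for this example. */
#define BUF1_SIZE 64   /* compressed data  */
#define BUF2_SIZE 128  /* uncompressed PCM */

static unsigned char buf1[BUF1_SIZE]; static size_t buf1_len;
static unsigned char buf2[BUF2_SIZE]; static size_t buf2_len;

/* loader: disk -> buf1 (here "disk" is just a byte array) */
static void loader(const unsigned char *disk, size_t len)
{
    buf1_len = len < BUF1_SIZE ? len : BUF1_SIZE;
    memcpy(buf1, disk, buf1_len);
}

/* codec + filter: buf1 -> buf2. The toy "codec" expands each input byte
 * into two output bytes; a real filter could apply gap removal,
 * crossfading, equalising etc. before the data lands in buf2. */
static void decode_and_filter(void)
{
    size_t i;
    buf2_len = 0;
    for (i = 0; i < buf1_len && buf2_len + 2 <= BUF2_SIZE; i++) {
        buf2[buf2_len++] = buf1[i];   /* e.g. left channel  */
        buf2[buf2_len++] = buf1[i];   /* e.g. right channel */
    }
}

/* feeder: buf2 -> dac (here the "dac" just counts the bytes it was fed) */
static size_t feeder(void)
{
    size_t fed = buf2_len;
    buf2_len = 0;
    return fed;
}
```

In a real implementation each stage would run as its own thread (or be driven by interrupts) against ring buffers rather than being called in sequence.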
Questions: How does the above architecture deal with different sampling frequencies, mono/stereo and (possibly) sample sizes? Shouldn't be a problem in buf1 (the compressed data will contain the relevant meta-data), but is an issue for buf2. Do we want to attempt cross-fading between a 44.1kHz file and a 48kHz file? Is a "gapless" change in playback frequency possible on the iRiver?
Keep in mind we need to make this work in reverse too, for recording:
For devices with hardware codecs, the chain is shortcut between the loader and the feeder:
- loader (disk->buf1)
- feeder (buf1->hwcodec)
NOTE: the possibility of implementing a dual-buffer approach for devices with hardware codecs was discussed on IRC (2005-02-16 - very start of the day) - for the MAS devices, buf1 would contain the MP3 data as read from the disk, and the "codec" would be a "swap-copy" routine to bitswap the data in preparation for sending to the hwcodec. The existing architecture bitswaps the data in-place, right after reading it from the disk. Implementing a dual-buffer scheme here will sacrifice some RAM.
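For illustration, the "swap-copy" stage could look like the sketch below. The bit-reversal matches what bitswapping for the MAS involves (reversing the bit order within each byte), but the function names are invented for this example, and a real implementation would likely use a 256-entry lookup table:

```c
#include <stddef.h>

/* Reverse the bit order of one byte, e.g. 0x01 -> 0x80.
 * Classic bit-twiddling; a lookup table would be faster in practice. */
static unsigned char bitrev(unsigned char b)
{
    b = (unsigned char)((b >> 4) | (b << 4));
    b = (unsigned char)(((b & 0xcc) >> 2) | ((b & 0x33) << 2));
    b = (unsigned char)(((b & 0xaa) >> 1) | ((b & 0x55) << 1));
    return b;
}

/* The hypothetical "codec" stage for hardware-codec targets:
 * copy MP3 data out of buf1 while bitswapping it for the MAS. */
static void swap_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dst[i] = bitrev(src[i]);
}
```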
In addition, the audio API needs to support instant playback of short audio clips from memory or files - for Talkbox support, key beeps etc.
Overview of existing APIs
[Can someone who is familiar with the current playback and recording systems write a high-level description here?]
This section describes the highest level of the API - namely that between the rockbox application and the rockbox firmware.
The first problem when playing an audio file is to determine its format. A simple approach, probably good enough, is to use the file extension in the first instance to decide whether the file is supported. This guess needs to be confirmed by the actual codec code - for example, a ".WAV" file could theoretically contain one of many types of data, not just uncompressed PCM (e.g. GSM 6.10). So the codec code itself needs the ability to double-check that the file is supported.
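A sketch of the extension-based first guess follows; the table contents and helper name are illustrative, and the codec's own confirmation step is not shown:

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* A few AFMT_* values from the proposal below, plus a sentinel. */
#define AFMT_UNKNOWN    0
#define AFMT_MPA_L3     0x0004
#define AFMT_PCM_WAV    0x0008
#define AFMT_OGG_VORBIS 0x0010

struct ext_map { const char *ext; int afmt; };

/* First-instance guess from the extension only; the codec must still
 * verify the content (e.g. a .wav might not contain plain PCM). */
static int guess_format(const char *filename)
{
    static const struct ext_map table[] = {
        { ".mp3", AFMT_MPA_L3 },
        { ".wav", AFMT_PCM_WAV },
        { ".ogg", AFMT_OGG_VORBIS },
    };
    const char *dot = strrchr(filename, '.');
    size_t i;
    if (!dot)
        return AFMT_UNKNOWN;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcasecmp(dot, table[i].ext) == 0)
            return table[i].afmt;
    return AFMT_UNKNOWN;
}
```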
The following is a list of proposed file formats to be recognised (but maybe not playable - that depends on the hardware) by Rockbox. I propose that the definition of a particular hardware device includes a HARDWARE_PLAYBACK_FORMATS definition which is a bitwise combination of the following values:
/* ROCKBOX Audio Formats */
#define AFMT_MPA_L1 0x0001 // MPEG Audio layer 1
#define AFMT_MPA_L2 0x0002 // MPEG Audio layer 2
#define AFMT_MPA_L3 0x0004 // MPEG Audio layer 3
// (MPEG-1, 2 and 2.5, layers 1, 2 and 3)
#define AFMT_PCM_WAV 0x0008 // Uncompressed PCM in a WAV file
#define AFMT_OGG_VORBIS 0x0010 // Ogg Vorbis
#define AFMT_FLAC 0x0020 // FLAC
#define AFMT_MPC 0x0040 // Musepack
#define AFMT_AAC 0x0080 // AAC
#define AFMT_APE 0x0100 // Monkey's Audio
#define AFMT_WMA 0x0200 // Windows Media Audio
#define AFMT_A52 0x0400 // A/52 (aka AC3) audio
#define AFMT_REAL 0x0800 // RealAudio
So the hardware definition for the Archos Jukeboxes would be:
#define HARDWARE_PLAYBACK_FORMATS ( AFMT_MPA_L2 | AFMT_MPA_L3 )
and the iRiver H120/H140 might be defined as:
#define HARDWARE_PLAYBACK_FORMATS ( AFMT_MPA_L1 | AFMT_MPA_L2 | AFMT_MPA_L3 | AFMT_PCM_WAV | AFMT_OGG_VORBIS | AFMT_FLAC )
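Because each format is a single bit, testing whether the hardware can play a given format is one bitwise AND. A minimal sketch, reusing the Archos definition above (format_supported is a hypothetical helper name):

```c
/* Format bits copied from the proposal above. */
#define AFMT_MPA_L2     0x0002
#define AFMT_MPA_L3     0x0004
#define AFMT_OGG_VORBIS 0x0010

/* e.g. the Archos Jukebox definition: */
#define HARDWARE_PLAYBACK_FORMATS ( AFMT_MPA_L2 | AFMT_MPA_L3 )

/* Hypothetical helper: non-zero if this device can play the format. */
static int format_supported(int afmt)
{
    return (HARDWARE_PLAYBACK_FORMATS & afmt) != 0;
}
```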
[Please propose and discuss.... ]
There are three tasks to be performed during the playback of a track which require knowledge of the codec's file format:
- Reading metadata from the file (including ID3-type tags and technical data such as sampling rate and total running time)
- Loading compressed data from the file into the compressed data buffer
- The actual decoding of the data to PCM suitable for the audio driver
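One possible shape for a per-format handler covering these three tasks is sketched below; all names and signatures here are illustrative, not a settled API:

```c
/* Hypothetical technical data for a track (cf. the file_info idea below). */
struct file_info {
    int samplerate;      /* e.g. 44100 */
    int channels;        /* e.g. 2 */
    int bitspersample;   /* e.g. 16 */
    long total_samples;  /* for computing the total running time */
};

/* Hypothetical per-format handler; one function pointer per task. */
struct format_handler {
    /* 1. read metadata (tags plus technical data) */
    int (*get_metadata)(const char *path, struct file_info *info);
    /* 2. load compressed data into the compressed data buffer */
    int (*load_data)(const char *path, unsigned char *buf, int len);
    /* 3. decode to PCM suitable for the audio driver */
    int (*decode)(unsigned char *pcmbuf, int *size);
};
```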
Audio Driver API
This section describes how the audio hardware of the various devices can be abstracted.
[Please propose and discuss.... ]
Codec API
This section describes the API for providing decoding and encoding of the audio codecs to be supported in Rockbox. Metadata (e.g. ID3 tags) is also a feature of the codecs, so the codec API needs to include the appropriate functions to read (and write?) the metadata in a file.
We need to remember to give credit to the codec authors and information about decoder versions in the "Info" menu screen and other relevant places.
A main design goal of Rockbox is to minimise battery usage by keeping the hard-disk powered down as much as possible, and performing as few power-hungry spin-ups as possible.
It is proposed that codecs be dynamically loadable - using a specialised version of the existing general-purpose plugin architecture already in Rockbox. This will remove any limitations on the number of codecs Rockbox can support. However, both the number of "codec slots" (the destinations for loadable codecs) and the number of codecs compiled into Rockbox should be configurable.
In order to allow Rockbox to support many different types of codecs (such as "non-streaming" codecs like SID/MOD, or codecs that offer "hybrid" compression like wavpack where two input files are needed to produce one output stream), it is proposed that the codecs themselves manage the memory buffer for the tracks that they are playing.
The lifetime of a codec
When Rockbox is initially started, no codecs will be activated. The user will add some songs to the playlist and the first codec will need to be loaded into a codec slot and initialised.
In order to allow the codec to make full use of the disk spin-up, it should start loading the data from disk (in a separate thread). This should not be a CPU-intensive task, but it is desirable for the codec to remove any redundant data during this loading process in order to make the best use of memory. The codec should be able to peek into the playlist in order to load multiple files during the same load operation - subject to the available memory.
As soon as possible after the codec has loaded a small amount of the first file into memory, the codec should start decoding that data into either the cross-fade buffer or directly into the low-level audio buffer. Codec implementations should aim to minimise the amount of copying of data between buffers.
When a change of codecs is necessary, the audio system will need to load the second codec and initialise the decoding of the next file before the first one has finished.
- The basic "decoding loop" in the audio system will ask the codec to provide X bytes (e.g. 4096 bytes) of uncompressed audio from the stream.
[this section is now out-dated by the above changes to the API overview]
This is the general initialisation call to the codec - so the codec can allocate memory and perform any other housekeeping tasks before it is ready to actually load and decode a file. Return codes would include:
int codec_open_file(???, file_info_struct* file_info)
This function is responsible for initialising the codec for the decoding of a specific file.
The file_info parameter is used to return the technical information about the file, such as the bitrate of the compressed data, the PCM samplerate (e.g. 44.1kHz), sample word size (e.g. 16-bit), number of channels (e.g. 2) and the total number of samples in the stream.
Return codes would include:
int codec_get_metadata(???, metadata_struct* metadata)
This function returns ID3-tag type information from the file. We may want to call it either before or after a file is opened, i.e. to read the metadata from a track we will be playing in the future, without initialising a full decoder instance.
int codec_decode_data(char* pcmbuf, int* size)
This function would, with the help of the "read" callback, decode "size" bytes of PCM data from the input stream (in the format specified in the file_info structure returned from codec_open_file()). The "size" variable would be modified to return the actual number of bytes read. This may be less than the number requested in the case of a failure in the read() callback or an end-of-file condition. Return values would include:
CODEC_RECOVERED_FROM_ERROR // e.g. sync was lost, but decoding continued. The audio system could feed this back to the user.
CODEC_INTERNAL_ERROR // An unexpected internal error from the codec
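To show how the audio system's basic decoding loop (requesting e.g. 4096 bytes at a time) would drive this function, here is a sketch against a stub codec. The CODEC_EOF code and the stub itself are invented for this example, not part of the proposal:

```c
#include <stddef.h>

/* Hypothetical return codes; only the last two appear in the proposal. */
#define CODEC_OK                    0
#define CODEC_EOF                   1
#define CODEC_RECOVERED_FROM_ERROR  2
#define CODEC_INTERNAL_ERROR      (-1)

#define CHUNK 4096

/* Stub standing in for a real codec: "decodes" from a fixed-size
 * stream, returning CODEC_EOF when the stream runs dry. */
static int stream_left = 10000;

static int codec_decode_data(char *pcmbuf, int *size)
{
    (void)pcmbuf;                     /* a real codec would fill this */
    if (stream_left == 0) { *size = 0; return CODEC_EOF; }
    if (*size > stream_left) *size = stream_left;
    stream_left -= *size;
    return CODEC_OK;
}

/* The audio system's basic decoding loop: keep asking for CHUNK bytes
 * until the codec reports end of stream or a fatal error. */
static long decode_all(void)
{
    char pcmbuf[CHUNK];
    long total = 0;
    for (;;) {
        int size = CHUNK;
        int rc = codec_decode_data(pcmbuf, &size);
        total += size;
        if (rc == CODEC_EOF || rc == CODEC_INTERNAL_ERROR)
            break;
    }
    return total;
}
```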
int codec_seek(long offset, int whence)
This function would seek on a sample-accurate basis in the file. For some codecs this could be an expensive operation, in which case we may want to allow the codec to "guess" at the appropriate seek point.
NOTE: Seeking is a complicated issue and possible seeking strategies for each supported codec need to be discussed before deciding on the semantics of this function. But some codecs (e.g. WAV, FLAC) are designed to allow sample-accurate seeking, so this should be the benchmark.
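Sample-accurate seeking implies converting a time position into a sample index. A small sketch of that conversion (the function name is illustrative, and overflow handling is left aside):

```c
/* Convert a position in milliseconds to a sample index at the given
 * samplerate, splitting the calculation to avoid intermediate overflow
 * on long tracks. */
static long ms_to_sample(long ms, long samplerate)
{
    return (ms / 1000) * samplerate + ((ms % 1000) * samplerate) / 1000;
}
```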
This function is called when the audio system is finished with a file - either when the end of the file has been reached, or when the user has cancelled playback. It cannot fail. The codec is returned to the same state as that just following a call to
This function causes the codec to release any memory. It then cannot be used again until a call to
[Please propose and discuss.... ]
Now that code can be tested on the iRiver itself, it would be useful to see example implementations of simple "viewer" plugins which decode a track from a compressed format and write to a WAV file. These can be used for testing decoding speed and optimisation work can begin before the full audio API is developed and implemented.
Library source code and "codec2wav" test plugins are now in CVS for MPEG Audio (libmad), FLAC (libFLAC), AC-3/A-52 (liba52) and OGG Vorbis (Tremor). If you are actively working on such an implementation for a different codec, add the name of the codec (and your name) to the following list. We are especially in need of someone to investigate the implementation of "non-streaming" codecs such as the various sequencer formats.
These decoders will be used as the basis for the Rockbox implementations of the decoders.
None of the initial codec implementations are running fast enough for real-time decoding. Therefore effort is needed to optimise the libraries for the iRiver environment.
Some profiling information on libmad and libFLAC can be found here: http://ipodlinux.org/forums/viewtopic.php?t=850
Copyright © by the contributing authors.