• Status New
  • Percent Complete
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Release 3.6
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by saratoga - 2010-11-15

FS#11759 - Rearrange libmad synthesis memory accesses for ARM

Work in progress patch. Currently decodes audio but with some glitches. Has a small mountain of debug code included.

The basic idea is to rearrange the D filter coefficients in the synthesis filter so that pairs of them are used sequentially. This is not easy because the taps need to be loaded in the seemingly random order needed by the audio samples. However, this rearrangement seems to be possible:

0 1 2 3 4 5 6 7 (original sequence)
0 2 1 3 4 6 5 7 (new sequence)
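
For illustration only, the reordering above can be read as swapping the middle two entries in every group of four. This is a minimal sketch of that reading, not code from the patch; the helper name `reorder_pairs` and the use of plain `int` coefficients are assumptions.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): swap the middle two entries
 * of every complete group of four, so that index order 0 1 2 3 4 5 6 7
 * becomes 0 2 1 3 4 6 5 7. Trailing entries that do not fill a group of
 * four are left untouched. */
void reorder_pairs(int *coef, size_t n)
{
    for (size_t i = 0; i + 3 < n; i += 4) {
        int tmp = coef[i + 1];
        coef[i + 1] = coef[i + 2];
        coef[i + 2] = tmp;
    }
}
```

Applying the helper twice restores the original order, which makes it easy to check a reordered table against the original one.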

The complication is that the code assumes that it can start a new filter at any offset, even odd ones, which means each and every filter needs to be rewritten 4 times, one for each of the 4 possible alignments. This patch does that.

Once I'm certain that it works, I intend to convert the D coefficients to packed 16 bit values, then use packed 16 bit multiply instructions on ARMv5E+. This should lead to a small speedup on ARMv4 (just because ldm instructions can be used instead of ldr) and a very large speedup on ARM9E and ARM11 (because packed multiplies are tremendously faster and much easier to pipeline).
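
As a rough portable model of the idea, the sketch below packs two 16 bit coefficients into one 32 bit word and multiply-accumulates either the bottom or the top halves, which is roughly what the ARMv5E halfword multiply-accumulate instructions do. The function names are hypothetical, and which instruction variant the final code would use is an assumption, not something stated here.

```c
#include <stdint.h>

/* Hypothetical helper: pack two signed 16 bit values into one 32 bit
 * word, 'lo' in bits 0-15 and 'hi' in bits 16-31. */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* C model of a bottom x bottom halfword multiply-accumulate:
 * acc + (low half of x) * (low half of y), both treated as signed. */
static int32_t mla_bb(int32_t acc, uint32_t x, uint32_t y)
{
    return acc + (int32_t)(int16_t)(uint16_t)(x & 0xffffu)
               * (int32_t)(int16_t)(uint16_t)(y & 0xffffu);
}

/* C model of a top x top halfword multiply-accumulate:
 * acc + (high half of x) * (high half of y), both treated as signed. */
static int32_t mla_tt(int32_t acc, uint32_t x, uint32_t y)
{
    return acc + (int32_t)(int16_t)(uint16_t)(x >> 16)
               * (int32_t)(int16_t)(uint16_t)(y >> 16);
}
```

The point of the packing is that one 32 bit load fetches two coefficients, so a single ldm can fill several registers with coefficient pairs.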

Corrected a bug in one of the filters.

Above patch is confirmed to produce bit-for-bit identical output to SVN using lame_128k.mp3.

Above patch without debug code.

Converted to use 16 bit D coefficients. The C code has an RMS error of 1.3 PCM levels and a peak error of 8 levels for lame_128k.mp3. This seems more than acceptable.
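
For reference, error figures like these can be obtained by comparing two decoded PCM buffers sample by sample. The sketch below is an assumption about how such a comparison might look, not the tool actually used; the names `pcm_error` and `compare_pcm` are hypothetical. It returns the mean squared error, and the RMS error is the square root of that value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: mean squared error and peak absolute error,
 * both measured in 16 bit PCM levels. */
typedef struct {
    double mean_sq;
    int peak;
} pcm_error;

/* Compare a reference buffer against a test buffer of the same length.
 * RMS error = sqrt(mean_sq); the square root is left to the caller. */
pcm_error compare_pcm(const int16_t *ref, const int16_t *test, size_t n)
{
    pcm_error e = { 0.0, 0 };
    double sum_sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        int diff = (int)ref[i] - (int)test[i];
        int mag = diff < 0 ? -diff : diff;
        if (mag > e.peak)
            e.peak = mag;
        sum_sq += (double)diff * (double)diff;
    }
    if (n > 0)
        e.mean_sq = sum_sq / (double)n;
    return e;
}
```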

Edit: Note that volume is off in that patch. I'll correct this later.

Thought of a better way to rearrange the D coefficients. This one is both much simpler and should give significantly better performance on ARM11. In this version the D coefficients are split into two tables, D_even and D_odd, which unsurprisingly contain the even and odd coefficients from the old table. The dewindowing code is then unrolled and rearranged to accommodate the new even and odd tables.

As a result, all memory accesses are now fully sequential, each D coefficient can be packed into a 32 bit pair, and all windowed sample data are used to generate 2 samples for each time they are loaded.
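
A minimal sketch of the table split, assuming a flat array of 32 bit coefficients with an even number of entries. The real D table layout in libmad is different, and the function name `split_even_odd` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy even-indexed entries of d[] into d_even[]
 * and odd-indexed entries into d_odd[]. Assumes n is even and that each
 * output array has room for n / 2 entries. */
void split_even_odd(const int32_t *d, size_t n,
                    int32_t *d_even, int32_t *d_odd)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        d_even[i / 2] = d[i];
        d_odd[i / 2]  = d[i + 1];
    }
}
```

With the coefficients stored this way, a loop that only needs the even (or only the odd) taps reads its table front to back with no stride.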

Remove about 50KB of debug code from that patch.
Write ASM version.

Above but with a lot of bugs fixed. Output should be correct now.

Overlooked some code in the above patch. Now fixed.

Finally converted all filters to use the new even/odd coefficients. Removed old 'sorted' coefficients introduced in the original patch. Output is identical to SVN.

* Delete a lot of debug code
* Reintroduce macros for code that won't be moved into the .S file

* Introduce ASM code for the 4 macro functions that won't be included in the .S file.

* Clean up most of the debug and dead code
* Finish reordering the body of the for loop

Pretty much all that's left is actually converting the core of each loop to ASM.

* Rearranged arrays in memory to consolidate pointers and save 2 registers
* Wrote the first half of the first sb_sample function in assembly

Added a simple test file to try debugging the asm code. Not sure why it currently crashes on decode, probably a dumb mistake somewhere.

