• Status Closed
  • Percent Complete
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System SW-codec
  • Severity Low
  • Priority Very Low
  • Reported Version Release 3.4
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by Buschel - 2010-06-06
Last edited by Buschel - 2010-06-10

FS#11365 - eabi: libmad experiments on arm

When using the new eabi toolchain it is possible to disable the asm version of dct32 within libmad and to use the C version instead. The non-eabi compiled libmad hard-crashed when doing so.
From my experience with mpc the asm version of dct32 is _slower_ than the C version compiled with -O1. I have measured decoding speed for a specific mp3 file with eabi and different libmad configurations:

1) -O2 and asm-dct32 ⇒ 21.1 MHz (svn)
2) -O2 and C-dct32 ⇒ 24.84 MHz
3) -O1 and asm-dct32 ⇒ 21.1 MHz
4) -O1 and C-dct32 ⇒ 20.5 MHz

So, on ARM (arm7tdmi) there is no measurable difference in speed between -O1 and -O2 when using the asm'ed dct32. Combining C-dct32 with -O1 results in the fastest decoding.

The following patch can be used for evaluation on other targets as well. It is not taking care of different CPU types or arm architectures.

Closed by  Buschel
2010-06-10 19:07
Reason for closing:  Accepted
Additional comments about closing:  

Submitted with r26746.

You should disable running the synth filter on COP if you do these tests on PP, otherwise it's hard to measure the true speedup due to waiting on the COP thread.

You are right, I will do so for clear results. But in the end the dual-core measurement is what counts. This is what happens in the real world, and what will affect the boost/unboost behaviour of the player. Or did I make a logical mistake?

Dual core is what counts for PP. But if you find a way to make the dct faster in general, it would be nice to have it, even if on PP the improvement is hidden by waiting on the synth filter for dual core.

Here are the results on arm7tdmi without COP:

1) -O2 and asm-dct32 ⇒ 37.9 MHz (svn)
2) -O2 and C-dct32 ⇒ 41.0 MHz
3) -O1 and asm-dct32 ⇒ 37.9 MHz
4) -O1 and C-dct32 ⇒ 37.0 MHz

So, savings are ~0.9 MHz in total. Quite a lot when taking into account the latest efforts and results.
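For anyone who wants to redo the arithmetic with their own numbers, here is a minimal sketch in C. It only uses the figures from the table above (arm7tdmi without COP); the struct and function names are made up for this example and are not part of Rockbox or libmad.

```c
#include <assert.h>
#include <stdio.h>

/* MHz needed for realtime decoding, copied from the no-COP results above. */
struct result {
    const char *config;
    double mhz;
};

static const struct result no_cop[] = {
    { "-O2, asm-dct32 (svn)", 37.9 },
    { "-O2, C-dct32",         41.0 },
    { "-O1, asm-dct32",       37.9 },
    { "-O1, C-dct32",         37.0 },
};

/* Saving versus the baseline in MHz; positive means fewer MHz are needed. */
double saving_mhz(double baseline, double mhz)
{
    return baseline - mhz;
}

/* The same saving expressed as a percentage of the baseline. */
double saving_percent(double baseline, double mhz)
{
    assert(baseline > 0.0);
    return 100.0 * (baseline - mhz) / baseline;
}

/* Print every configuration relative to the first entry (svn). */
void print_savings(void)
{
    size_t i;
    double baseline = no_cop[0].mhz;

    for (i = 0; i < sizeof(no_cop) / sizeof(no_cop[0]); i++) {
        printf("%-22s %5.1f MHz  saving %+5.2f MHz (%+5.2f %%)\n",
               no_cop[i].config, no_cop[i].mhz,
               saving_mhz(baseline, no_cop[i].mhz),
               saving_percent(baseline, no_cop[i].mhz));
    }
}
```

Calling `print_savings()` from a small `main` lists each configuration against the svn baseline; a negative saving means that configuration needs more MHz than svn.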

The original Flyspray entry that introduced the asm'ed dct32 described the solution as optimized for size rather than for cycles (to better fit into cache, I guess). Seems like current CPUs do not benefit from smaller code size that much anymore.

Testing on AS3525v2:

34.94 MHz stock for 192k.
34.11 MHz without ASM + w/ EABI

Also, without synth_full:


Will try some other tests when I get a chance.

This patch version will use -O1 for ARM and -O2 for other CPUs. When using ARM it will also disable the dct32-asm implementation.
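The selection rule described above can be written down as a small decision table. This is only an illustrative sketch in C: the real patch makes this choice in the build system and via preprocessor defines, and every name here (`demo_cpu`, `demo_build`, `demo_choose_build`) is hypothetical and does not appear in Rockbox or libmad.

```c
#include <assert.h>

/* Hypothetical CPU classes; the patch only distinguishes ARM from the rest. */
enum demo_cpu {
    DEMO_CPU_ARM,
    DEMO_CPU_OTHER
};

/* Hypothetical summary of what the patch selects for libmad. */
struct demo_build {
    const char *opt_flag;   /* compiler optimization flag for libmad */
    int dct32_asm_disabled; /* 1 if the asm dct32 is switched off */
};

/* ARM: -O1 and C dct32. Everything else: -O2, dct32 choice left untouched. */
struct demo_build demo_choose_build(enum demo_cpu cpu)
{
    struct demo_build b;

    assert(cpu == DEMO_CPU_ARM || cpu == DEMO_CPU_OTHER);

    if (cpu == DEMO_CPU_ARM) {
        b.opt_flag = "-O1";
        b.dct32_asm_disabled = 1;
    } else {
        b.opt_flag = "-O2";
        b.dct32_asm_disabled = 0;
    }
    return b;
}
```

As noted in the task description, this does not distinguish between different ARM architectures; all ARM targets get the same settings.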

Interesting: On PP502x (arm7tdmi, iPod Video) this patch causes a hard crash when not using the eabi toolchain ('*Panic* Stkov mp3dec (1)'). The crash vanishes when disabling the multicore option for this CPU (not defining MPA_SYNTH_ON_COP in codecs/mpa.c). Any ideas?

The reason for the crash with non-eabi in combination with COP was that the COP thread's stack was too small. The following patch simply doubles the COP thread's stack size. With it, mp3 decoding on arm7tdmi now needs 20.7 MHz (svn: 21.1 MHz). On other (non-multicore) ARM targets the savings are in the area of ~0.9 MHz.

