FS#11365 - eabi: libmad experiments on arm
Attached to Project:
Rockbox
Opened by Andree Buschmann (Buschel) - Sunday, 06 June 2010, 23:08 GMT+2
Last edited by Andree Buschmann (Buschel) - Thursday, 10 June 2010, 21:07 GMT+2
Details
When using the new eabi toolchain it is possible to disable the asm version of dct32 within libmad and use the C version instead. The non-eabi compiled libmad hard-crashed when doing so.
From my experience with mpc, the asm version of dct32 is _slower_ than the C version compiled with -O1. I have measured the decoding speed for a specific mp3 file with eabi and different libmad configurations:
1) -O2 and asm-dct32 => 21.1 MHz (svn)
2) -O2 and C-dct32 => 24.84 MHz
3) -O1 and asm-dct32 => 21.1 MHz
4) -O1 and C-dct32 => 20.5 MHz
So, on arm (arm7tdmi) there is no measurable difference in speed between -O1 and -O2 when using the asm'ed dct32. Combining C-dct32 with -O1 results in the fastest decoding.
The following patch can be used for evaluation on other targets as well. It does not take different CPU types or arm architectures into account.
Closed by Andree Buschmann (Buschel)
Thursday, 10 June 2010, 21:07 GMT+2
Reason for closing: Accepted
Additional comments about closing: Submitted with r26746.
1) -O2 and asm-dct32 => 37.9 MHz (svn)
2) -O2 and C-dct32 => 41.0 MHz
3) -O1 and asm-dct32 => 37.9 MHz
4) -O1 and C-dct32 => 37.0 MHz
So, the saving is ~0.9 MHz in total (37.9 MHz for svn vs. 37.0 MHz for -O1 with C-dct32). That is quite a lot considering the latest efforts and results.
The original Flyspray entry that introduced the asm'ed dct32 describes this solution as optimized for size rather than for cycles (to fit better into cache, I guess). It seems that current CPUs do not benefit from smaller code size that much anymore.
34.94 MHz stock for 192k.
34.11 MHz without ASM + w/ EABI
Also, without synth_full:
16.87 MHz
Will try some other tests when I get a chance.
Interesting: On PP502x (arm7tdmi, iPod Video) this patch causes a hard crash when not using the eabi toolchain ('*Panic* Stkov mp3dec (1)'). The crash vanishes when the multicore option for this CPU is disabled (by not defining MPA_SYNTH_ON_COP in codecs/mpa.c). Any ideas?