FS#11498 - Speed optimization to libwmapro
Attached to Project:
Rockbox
Opened by Andree Buschmann (Buschel) - Monday, 26 July 2010, 08:26 GMT+1
Last edited by MohamedTarek (mtarek16) - Thursday, 05 August 2010, 20:56 GMT+1
Details
This Flyspray task contains speed optimizations for libwmapro.
v01 introduces asm routines for the multiplications. It also adds fixmul31 and fixmul16, since faster routines can be used for those. Decoding speed for wmapro_173k.wma on PP5022: 43.1 MHz (svn: 49.6 MHz). A test on Coldfire is needed.
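For readers unfamiliar with the naming, here is a minimal portable C sketch of what routines called fixmul31 and fixmul16 conventionally compute: a 32x32-bit multiply with a 64-bit intermediate, shifted right by the number of fractional bits. This is an assumption based on the names, not the code from the patch, whose versions are written in target-specific asm.

```c
#include <stdint.h>

/* Portable reference sketch, NOT the asm from the patch.
 * Assumes >> on a negative int64_t is an arithmetic shift, which holds
 * for the compilers normally used for these targets. */

/* Multiply two values that carry 31 fractional bits. */
static inline int32_t fixmul31(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 31);
}

/* Multiply two values that carry 16 fractional bits (16.16 format). */
static inline int32_t fixmul16(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> 16);
}
```

For example, 0.5 * 0.5 with 31 fractional bits is fixmul31(0x40000000, 0x40000000), which yields 0x20000000 (0.25).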
Closed by MohamedTarek (mtarek16)
Thursday, 05 August 2010, 20:56 GMT+1
Reason for closing: Accepted
Additional comments about closing: Committed in r27703.
edit: the Coldfire fixmul32 function in libwma/wmafixed.h does the same as the fixmul16 in this patch, but is 2 cycles faster and uses one register less.
I will change Coldfire's fixmul16() to your proposed implementation. Is this also valid for other codec implementations (e.g. atrac, mpc)? Or does this faster implementation rely on any knowledge about the codec's fixed-point representation (e.g. 14 bits fractional part)...
Decoding speed for wmapro_173k.wma on PP5022: 37.5 MHz (svn: 49.6 MHz)
a) models with large IRAM -> put <WMAProDecodeCtx.tmp> to IRAM.
b) models with normal IRAM -> cannot put <WMAProDecodeCtx.tmp> to IRAM, but move several window tables to IRAM as second best option.
Edit: Histogram of window length for all rockbox wmapro samples is:
len 128 = 1600 calls
len 256 = 370 calls
len 512 = 370 calls
len 1024 = 960 calls
len 2048 = 5700 calls
len 4096 = 0 calls
Decoding speed for wmapro_173k.wma on PP5022: 37.3 MHz (svn: 49.6 MHz)
Edit 2:
mcf5249 wmapro_173k.wma; svn: 179.40MHz 69.22% realtime; wmapro_v04.patch: 117.63MHz 105.57% realtime
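The "% realtime" figures above are consistent with dividing the CPU clock by the MHz needed for decoding. A small sketch of that relation, assuming a maximum mcf5249 clock of 124.1856 MHz (this value is not stated in the task and is an assumption; the function name is hypothetical):

```c
/* Assumed maximum CPU clock of the mcf5249 target, in MHz (not from the task). */
#define ASSUMED_CPU_MHZ 124.1856

/* Percentage of realtime reached when decoding needs `needed_mhz`. */
static double realtime_percent(double cpu_mhz, double needed_mhz)
{
    return 100.0 * cpu_mhz / needed_mhz;
}
```

Under that assumption, realtime_percent(ASSUMED_CPU_MHZ, 179.40) is about 69.2 and realtime_percent(ASSUMED_CPU_MHZ, 117.63) is about 105.6, matching the numbers reported for svn and wmapro_v04.patch.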
Edit:
Submitted non-interleaving patch with r27583.
Decoding speed for wmapro_173k.wma on PP5022: 35.6 MHz (svn r27589: 35.9 MHz)
ToDo: Somebody needs to change the Coldfire asm, and this change needs some cleanup.
1) Requantized values were seen to be <600 * (1<<shift), shift's worst case is for 128-size window -> shift = 17 - (8-3) = 12. => 600*(1<<12) = ~(1<<22)
2) quant (the scaling) was seen to go up to idx 106 => 10^(156/20) = ~(1<<26)
3) This results in (1<<22)*(1<<26) = (1<<48)
quite tight....
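The bit-width estimate in 1) to 3) can be checked with integer arithmetic. This is only a sketch of the arithmetic: the helper names are hypothetical, the quant bound is hardcoded as floor(10^(156/20)) using the exponent from the formula in 2), and the 600 and shift = 12 bounds are taken from 1) as observed values, not guarantees.

```c
#include <stdint.h>

/* Number of bits needed to represent the magnitude v (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Worst-case requantized value from 1): 600 * (1 << shift). */
static uint64_t worst_case_coeff(int shift)
{
    return (uint64_t)600 << shift;
}

/* floor(10^(156/20)) = floor(10^7.8), the quant bound from 2), precomputed. */
#define WORST_CASE_QUANT UINT64_C(63095734)

/* Bits needed for the magnitude of the worst-case product from 3). */
static int worst_case_product_bits(void)
{
    return bit_length(worst_case_coeff(12) * WORST_CASE_QUANT);
}
```

With these inputs the coefficient needs 22 bits, the scaling needs 26 bits and the product magnitude needs 48 bits, before counting a sign bit.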
Task left open till coldfire vector_fixmul_scalar is done.
edit: my idea above about using the extension word of the EMAC to get the full result with one multiplication cannot work.