Rockbox

This is the bug/patch tracker for Rockbox. Click here for more information.

Quick links: Bugs · Patches · Rockbox frontpage

Tasklist

FS#11709 - ARMv5E iQMF for libatrac

Attached to Project: Rockbox
Opened by MichaelGiacomelli (saratoga) - Monday, 01 November 2010, 02:16 GMT+2
Last edited by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 19:30 GMT+2
Task Type Patches
Category Codecs
Status Closed
Assigned To No-one
Player Type All players
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Private No

Details

Work in progress iQMF using the ARMv5E 32x16 multiply instructions. So far this converts libatrac to using 16 bit iQMF coefficients, and then prescales everything going into the iQMF by 8 bits, then the output another 7 bits. Unfortunately, the sim audio is somewhat garbled because the prescaling isn't performed correctly (scaling in the fixmul15 function works, but not prescaling the output of the imdct for some reason). I'm obviously missing something about how data is used in the codec, and either not scaling all the samples, or double scaling some of them.

I haven't tested the asm code, but its fairly simple so it shouldn't be too hard to fix once the sim is working. The ASM code is also not properly scheduled. Once its working it can be arranged to avoid pipeline stalls.
   atrac_armv5e.patch (17.6 KiB)
 apps/codecs/libatrac/fixp_math.h  |   12 ++
 apps/codecs/libatrac/atrac3_arm.S |  170 +++++++++++++++++++-------------------
 apps/codecs/libatrac/atrac3.c     |   48 ++++++----
 3 files changed, 124 insertions(+), 106 deletions(-)

This task depends upon

Closed by  MichaelGiacomelli (saratoga)
Wednesday, 10 November 2010, 19:30 GMT+2
Reason for closing:  Accepted
Additional comments about closing:  Accepted in r28549.
Comment by Andree Buschmann (Buschel) - Monday, 01 November 2010, 10:18 GMT+2
The updated patch corrects scaling for simulation and preserves higher precision of the windowing coefficients.

I have defined a fixmul32x16(int16, int) which needs to do >>=15 to receive a 32bit result. THis was the main fault in your scaling. All the other pre-/post-scaling could be removed -- except the windowing coefficients themselves, of course.

Difference to svn decoding is max +/-4 of +/-32768 (~ -78 dB), and average ~ -110 dB. This is quite few.
   atrac_armv5e_v02.patch (16.7 KiB)
 apps/codecs/libatrac/fixp_math.h  |   12 ++
 apps/codecs/libatrac/atrac3_arm.S |  170 +++++++++++++++++++-------------------
 apps/codecs/libatrac/atrac3.c     |   38 ++++----
 3 files changed, 116 insertions(+), 104 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Monday, 01 November 2010, 14:56 GMT+2
>I have defined a fixmul32x16(int16, int) which needs to do >>=15 to receive a 32bit result. THis was the main fault in your scaling.

thanks, but I had that working, and am now trying to get it to work without a shift. I did some tests yesterday with the c code, and prescaling everything so that each multiply is a single smlaw<x> instruction gives acceptable results. The problem is I just can't seem to get it work once I put it all together :)
Comment by MichaelGiacomelli (saratoga) - Tuesday, 09 November 2010, 23:22 GMT+2
Updated patch to fix various mistakes in the ARM asm pointed out by buschel. Also cleaned up comments. Still doesn't work.
   atrac_armv5e_v03.patch (15.7 KiB)
 apps/codecs/libatrac/fixp_math.h  |   12 ++
 apps/codecs/libatrac/atrac3_arm.S |  173 +++++++++++++++++++-------------------
 apps/codecs/libatrac/atrac3.c     |   44 +++++----
 3 files changed, 127 insertions(+), 102 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 00:02 GMT+2
Thanks again buschel for spotting a stupid bug in the ASM. This version works. Further clean up and scheduling is still needed before this can be committed.

Speed up is quite large, 26% on my test file.
   atrac_armv5e_v04.patch (15.9 KiB)
 apps/codecs/libatrac/fixp_math.h  |   12 ++
 apps/codecs/libatrac/atrac3_arm.S |  175 +++++++++++++++++++-------------------
 apps/codecs/libatrac/atrac3.c     |   44 +++++----
 3 files changed, 128 insertions(+), 103 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 00:21 GMT+2
Restoring buschel's rounding when truncating to 16 bit.
   atrac_armv5e_v05.patch (16.1 KiB)
 apps/codecs/libatrac/fixp_math.h  |   12 ++
 apps/codecs/libatrac/atrac3_arm.S |  176 +++++++++++++++++++-------------------
 apps/codecs/libatrac/atrac3.c     |   42 ++++-----
 3 files changed, 125 insertions(+), 105 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 01:09 GMT+2
* Fix building on armv4, coldfire and sim
* Disable 16 bit optimizations on these targets because theres no performance benefit

TODO:

* Improve scheduling
* Port this to mp3 filterbank ...
   atrac_armv5e_v07.patch (12.8 KiB)
 apps/codecs/libatrac/SOURCES         |    3 
 apps/codecs/libatrac/fixp_math.h     |    2 
 apps/codecs/libatrac/atrac3_armv5e.S |  186 +++++++++++++++++++++++++++++++++++
 apps/codecs/libatrac/atrac3.c        |   50 +++++++--
 4 files changed, 231 insertions(+), 10 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 01:45 GMT+2
* Cleaned up useless changes
* Scheduled for arm11 by overlapping memory accesses with multiply. 2 out of every 3 arm11 loads should incur no stall at all.

Interestingly the improved scheduling speeds up arm9e by 0.5MHz, presumably because some of the memory latency on a cache miss is now hidden.

If anyone has an arm11 target handy it would be interesting to see v7 verses v8 comparisons. Test file here: http://duke.edu/~mgg6/rockbox/girls-just-wanna-atrc.rm

I think this is ready to commit.
   atrac_armv5e_v08.patch (11.9 KiB)
 apps/codecs/libatrac/SOURCES         |    3 
 apps/codecs/libatrac/atrac3_armv5e.S |  186 +++++++++++++++++++++++++++++++++++
 apps/codecs/libatrac/atrac3.c        |   42 ++++++-
 3 files changed, 225 insertions(+), 6 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 16:42 GMT+2
* Add cache alignment for the Beast.

   atrac_armv5e_v09.patch (12 KiB)
 apps/codecs/libatrac/SOURCES         |    3 
 apps/codecs/libatrac/atrac3_armv5e.S |  186 +++++++++++++++++++++++++++++++++++
 apps/codecs/libatrac/atrac3.c        |   42 ++++++-
 3 files changed, 225 insertions(+), 6 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Wednesday, 10 November 2010, 18:03 GMT+2
* Improve scheduling on arm11

arm11 iQMF goes from 15 -> 12MHz runtime. No speed up on arm9e.

On arm11 this version uses 2.8 cycles/tap, verses the ideal (zero wait state memory) 1.75 cycles/tap. However I don't see anyway to improve memory performance. Forcing 32 byte alignment of all buffers does not improve cache performance measurably. It may just be that the Beast ram is fairly slow.
   atrac_armv5e_v10.patch (11.2 KiB)
 apps/codecs/libatrac/SOURCES         |    3 
 apps/codecs/libatrac/atrac3_armv5e.S |  162 +++++++++++++++++++++++++++++++++++
 apps/codecs/libatrac/atrac3.c        |   42 +++++++--
 3 files changed, 201 insertions(+), 6 deletions(-)

Loading...