Rockbox

FS#11461 - libfaad aac-he speed optimization

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Tuesday, 06 July 2010, 22:05 GMT+2
Last edited by Andree Buschmann (Buschel) - Friday, 09 July 2010, 20:50 GMT+2
Task Type Patches
Category Codecs
Status Closed
Assigned To No-one
Player Type All players
Severity Low
Priority Normal
Reported Version Release 3.6
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Private No

Details

This patch optimizes the SBR filter used by aac-he files. Several loops in the analysis and synthesis filters were unrolled, and several variables and tables were moved to IRAM.
Speed on PP5022 with 64kaache.m4a: 115.5 MHz (svn 130 MHz)
   aac-he-v04.patch (23.7 KiB)
 apps/codecs/libfaad/sbr_qmf.c   |  307 +++++++++++++++-------------------------
 apps/codecs/libfaad/sbr_qmf_c.h |    2 
 apps/codecs/libfaad/sbr_dct.c   |   56 +++----
 3 files changed, 151 insertions(+), 214 deletions(-)
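
For readers unfamiliar with the two techniques named above, here is a minimal sketch of loop unrolling combined with an attribute macro on a table. It is not taken from the patch: SKETCH_IRAM_ATTR, sketch_window, mac_rolled() and mac_unrolled4() are made-up names, the table values are arbitrary, and the macro is defined empty so the code builds on a host (on a real target such a macro would select a fast-memory section).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for a target-specific "place this in IRAM" attribute.
   Defined empty so the sketch compiles anywhere. */
#define SKETCH_IRAM_ATTR

/* Hypothetical 8-entry window table, not the real QMF coefficients. */
static const int32_t sketch_window[8] SKETCH_IRAM_ATTR =
    { 1, -2, 3, -4, 5, -6, 7, -8 };

/* Plain loop: one multiply-accumulate per iteration. */
int64_t mac_rolled(const int32_t *x, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * sketch_window[i & 7];
    return acc;
}

/* Same computation unrolled by four: fewer loop-counter updates and
   branches per multiply-accumulate. Requires n to be a multiple of 4. */
int64_t mac_unrolled4(const int32_t *x, int n)
{
    assert(n % 4 == 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        const int32_t *w = &sketch_window[i & 7]; /* i & 7 is 0 or 4 */
        acc += (int64_t)x[i]     * w[0];
        acc += (int64_t)x[i + 1] * w[1];
        acc += (int64_t)x[i + 2] * w[2];
        acc += (int64_t)x[i + 3] * w[3];
    }
    return acc;
}
```

Both functions return the same value for any input whose length is a multiple of four; whether the unrolled form is actually faster depends on the target and compiler, which is why the thread reports measured MHz figures.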

Closed by  Andree Buschmann (Buschel)
Friday, 09 July 2010, 20:50 GMT+2
Reason for closing:  Accepted
Additional comments about closing:  Submitted v15 with r27358/r27359
Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 08:31 GMT+2
Thanks to saratoga and stripwax for the hints! Now the codeclib's fft is used instead of the libfaad-fft.
Speed on PP5022 with 64kaache.m4a: 107.4 MHz (svn 115.5 MHz)
   aac-he-v05.patch (25 KiB)
 apps/codecs/libfaad/sbr_qmf.c   |  307 +++++++++++++++-------------------------
 apps/codecs/libfaad/sbr_qmf_c.h |    2 
 apps/codecs/libfaad/sbr_dct.c   |  105 +++++++++----
 3 files changed, 197 insertions(+), 217 deletions(-)

Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 20:23 GMT+2
Clean up latest fft-change and remove unneeded bit_rev_tab2.
Speed on PP5022 with 64kaache.m4a: 106.5 MHz (svn 130 MHz)
   aac-he-v08.patch (31.3 KiB)
 apps/codecs/libfaad/sbr_qmf.c   |  307 ++++++++++++++----------------------
 apps/codecs/libfaad/sbr_qmf_c.h |    2 
 apps/codecs/libfaad/sbr_dct.c   |  334 +++++-----------------------------------
 3 files changed, 167 insertions(+), 476 deletions(-)

Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 22:25 GMT+2
v08: MCF5249 64kaache.m4a 289.47MHz (svn 367.95MHz)
Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 23:58 GMT+2
v10:
- changes dct to "in-place".
- several arrays could be removed based on the "in-place" dct
- shifts were moved to the final output to reduce the number of shifts

Speed on PP5022 with 64kaache.m4a: 106.1 MHz (svn 130 MHz)
   aac-he-v10.patch (33.3 KiB)
 apps/codecs/libfaad/sbr_qmf.c   |  352 +++++++++++++++-------------------------
 apps/codecs/libfaad/sbr_qmf_c.h |    2 
 apps/codecs/libfaad/sbr_dct.c   |  342 +++++---------------------------------
 apps/codecs/libfaad/sbr_dct.h   |    2 
 4 files changed, 184 insertions(+), 514 deletions(-)
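
As a rough illustration of the "move shifts to the final output" idea from the v10 notes, the sketch below compares shifting every intermediate product with shifting once at the end. This is an assumption about the general technique, not the code from the patch: SKETCH_SHIFT, sum_shift_each() and sum_shift_once() are hypothetical names, and the inputs are kept non-negative so the right shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SHIFT 4 /* hypothetical scaling factor, not libfaad's */

/* Shift every product before accumulating: n shifts. */
int32_t sum_shift_each(const int32_t *a, const int32_t *b, int n)
{
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)(((int64_t)a[i] * b[i]) >> SKETCH_SHIFT);
    return acc;
}

/* Accumulate at full precision and shift once at the output: 1 shift. */
int32_t sum_shift_once(const int32_t *a, const int32_t *b, int n)
{
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return (int32_t)(acc >> SKETCH_SHIFT);
}
```

The two variants agree when every product is a multiple of 2^SKETCH_SHIFT and can differ in the low bits otherwise, because the per-term version truncates each product separately. Deferring the shift also needs enough headroom in the accumulator, which is why a 64-bit accumulator is used here.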

Comment by Andree Buschmann (Buschel) - Thursday, 08 July 2010, 08:26 GMT+2
Refactored the loop in sbr_qmf_synthesis_64() before the call to dct4_kernel(). This loop can now be optimized further via ldm instructions.

Speed on PP5022 with 64kaache.m4a: 106.0 MHz (svn 130 MHz)
   aac-he-v11.patch (33 KiB)
 apps/codecs/libfaad/sbr_qmf.c   |  347 ++++++++++++++--------------------------
 apps/codecs/libfaad/sbr_qmf_c.h |    2 
 apps/codecs/libfaad/sbr_dct.c   |  342 +++++----------------------------------
 apps/codecs/libfaad/sbr_dct.h   |    2 
 4 files changed, 176 insertions(+), 517 deletions(-)

Comment by MichaelGiacomelli (saratoga) - Thursday, 08 July 2010, 17:30 GMT+2
Tested on my Clipv2 w/ 40MHz PCLK (basically then the same as a Fuzev2):

120.07 MHz (SVN) -> 115.7 MHz (v11).

It's surprising that PP is faster. The improved multiplier, fast memory, and 64-bit-wide L1 bus usually mean that filterbanks are much faster on AMSv2. MP3, for instance, is something like 25% faster than on PP.
Comment by MichaelGiacomelli (saratoga) - Thursday, 08 July 2010, 19:47 GMT+2
Some more benchmarks on the above Clipv2 to see how much the different functions use:

hf_generation -> 20.0 MHz
hf_adjustment -> 14.35 MHz

Also, I made a quick sketch of how the SBR stuff fits into LC because the libfaad code is so confusing:


reconstruct_channel_pair ->
    ifilter_bank ->
        ff_imdct_calc
    sbrDecodeInit

    sbrDecodeCoupleFrame ->
        sbr_process_channel ->
            sbr_qmf_analysis_32
            hf_generation (20.0 MHz) ->
                calc_prediction_coef
            hf_adjustment (14.35 MHz) ->
                estimate_current_envelope
                calculate_gain
                hf_assembly

        sbr_qmf_synthesis_32
Comment by Andree Buschmann (Buschel) - Thursday, 08 July 2010, 22:06 GMT+2
Results on h100: SVN = 33.28% realtime, 373 MHz needed -- with patch v11 = 41.14% realtime, 263 MHz needed
Comment by Andree Buschmann (Buschel) - Friday, 09 July 2010, 00:31 GMT+2
Defined a local dct4_revtab[] (derived from codeclib) so that the array can be moved to IRAM, refactored parts of sbr_qmf_analysis_32(), used LIKELY macros in iquant(), and tweaked the IRAM configuration.

Speed on PP5022 with 64kaache.m4a: 104.8 MHz (svn 130 MHz)
Speed on MCF5249 with 64kaache.m4a: 242 MHz (svn 373 MHz)

Retest on MCF5250 still missing.
   aac-he-v15.patch (36.7 KiB)
 apps/codecs/libfaad/sbr_qmf.c   |  369 ++++++++++++++--------------------------
 apps/codecs/libfaad/sbr_qmf_c.h |    2 
 apps/codecs/libfaad/sbr_dec.c   |    6 
 apps/codecs/libfaad/sbr_dct.c   |  345 +++++--------------------------------
 apps/codecs/libfaad/specrec.c   |   18 -
 apps/codecs/libfaad/sbr_dct.h   |    2 
 apps/codecs/libfaad/common.c    |    2 
 7 files changed, 201 insertions(+), 543 deletions(-)
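
For context on the LIKELY-macro change, here is a minimal sketch of a branch hint on the common path of a table lookup. It is not the real iquant(): SKETCH_LIKELY, sketch_pow43_tab and sketch_iquant() are hypothetical names, the table is a tiny illustrative one (rounded n^(4/3) for n = 0..7), and the error handling is invented for the example. The macro is built on GCC's __builtin_expect and falls back to a plain expression on other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint: tells GCC-compatible compilers which outcome is expected. */
#if defined(__GNUC__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SKETCH_LIKELY(x) (x)
#endif

#define SKETCH_TAB_SIZE 8

/* Illustrative table only: round(n^(4/3)) for n = 0..7. */
static const int32_t sketch_pow43_tab[SKETCH_TAB_SIZE] =
    { 0, 1, 3, 4, 6, 9, 11, 13 };

/* Returns sign(q) * table[|q|]; flags out-of-range input via *error. */
int32_t sketch_iquant(int16_t q, int *error)
{
    assert(error != NULL);
    int32_t sign = (q < 0) ? -1 : 1;
    int32_t mag = (q < 0) ? -(int32_t)q : (int32_t)q;

    /* Almost every value is in range, so hint that this branch is taken. */
    if (SKETCH_LIKELY(mag < SKETCH_TAB_SIZE))
        return sign * sketch_pow43_tab[mag];

    *error = 1; /* rare path */
    return 0;
}
```

The hint does not change the result, only how the compiler lays out the two paths, so any gain has to be confirmed by measurement on the target.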

Comment by Jens Arnold (amiconn) - Friday, 09 July 2010, 20:08 GMT+2
Measurements

Coldfire MCF5250 (iAudio X5): SVN 33.7 %rt (368.5 MHz) +v15 60.66 %rt (204.72 MHz)
PP5002 (iPod 2nd Gen): SVN 43.44 %rt (184.16 MHz) +v15 52.16 %rt (153.37 MHz)
PP5020 (iRiver H10 6GB): SVN 61.27 %rt (130.56 MHz) +v15 70.28 %rt (113.83 MHz)

Relative speedups

MCF5250 80%
PP5002 20%
PP5020 15%
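
The relative speedups above follow from the "MHz needed" figures as (MHz before / MHz after - 1) * 100. A small sketch of that arithmetic, with relative_speedup_percent() being a helper name made up for this example:

```c
#include <assert.h>

/* Relative speedup in percent from the clock needed for realtime decoding
   before and after a change (a lower "after" value means faster code). */
double relative_speedup_percent(double mhz_before, double mhz_after)
{
    assert(mhz_before > 0.0 && mhz_after > 0.0);
    return (mhz_before / mhz_after - 1.0) * 100.0;
}
```

Fed with the measurements above (368.5 vs 204.72, 184.16 vs 153.37, 130.56 vs 113.83), this gives roughly 80%, 20% and 15%, matching the listed values after rounding.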
Comment by Andree Buschmann (Buschel) - Friday, 09 July 2010, 20:49 GMT+2
Submitted v15 with r27358/r27359.
