Rockbox

FS#11461 - libfaad aac-he speed optimization

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Tuesday, 06 July 2010, 20:05 GMT
Last edited by Andree Buschmann (Buschel) - Friday, 09 July 2010, 18:50 GMT
Task Type Patches
Category Codecs
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version Release 3.6
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

This patch adds some optimizations to the SBR filter that is used by aac-he files. Several loops in the analysis and synthesis filters were unrolled, and several variables and tables were moved to IRAM.
Speed on PP5022 with 64kaache.m4a: 115.5 MHz (svn 130 MHz)
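
As a rough illustration of the loop unrolling mentioned above, here is a minimal sketch in C. It is not taken from the patch: the function names and the multiply-accumulate loop are hypothetical stand-ins for the kind of inner loop found in a QMF filter, and the unrolled version assumes the length is a multiple of 4.

```c
#include <stdint.h>

/* Reference version: one multiply-accumulate per loop iteration. */
static int64_t window_sum_rolled(const int32_t *x, const int32_t *w, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * w[i];
    return acc;
}

/* Unrolled by four: fewer loop-counter updates and branches per sample.
 * Assumption: n is a multiple of 4 (hypothetical, for illustration only). */
static int64_t window_sum_unrolled4(const int32_t *x, const int32_t *w, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        acc += (int64_t)x[i]     * w[i];
        acc += (int64_t)x[i + 1] * w[i + 1];
        acc += (int64_t)x[i + 2] * w[i + 2];
        acc += (int64_t)x[i + 3] * w[i + 3];
    }
    return acc;
}
```

Both functions return the same result; whether the unrolled form is actually faster depends on the target CPU and compiler, which is why the thread reports measured MHz figures per platform.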
Closed by  Andree Buschmann (Buschel)
Friday, 09 July 2010, 18:50 GMT
Reason for closing:  Accepted
Additional comments about closing:  Submitted v15 with r27358/r27359
Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 06:31 GMT
Thanks to saratoga and stripwax for the hints! Now the codeclib's fft is used instead of the libfaad-fft.
Speed on PP5022 with 64kaache.m4a: 107.4 MHz (previous patch version 115.5 MHz, svn 130 MHz)
Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 18:23 GMT
Clean up latest fft-change and remove unneeded bit_rev_tab2.
Speed on PP5022 with 64kaache.m4a: 106.5 MHz (svn 130 MHz)
Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 20:25 GMT
v08: MCF5249 64kaache.m4a 289.47MHz (svn 367.95MHz)
Comment by Andree Buschmann (Buschel) - Wednesday, 07 July 2010, 21:58 GMT
v10:
- changed the dct to work "in-place"
- several arrays could be removed thanks to the "in-place" dct
- shifts were moved to the final output to reduce the number of shifts

Speed on PP5022 with 64kaache.m4a: 106.1 MHz (svn 130 MHz)
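
The idea of moving shifts to the final output can be sketched as follows. This is a generic fixed-point illustration, not the code from v10: the function names and the shift amount OUT_SHIFT are hypothetical, and the inputs are assumed to be non-negative so that the right shift is well defined in portable C.

```c
#include <stdint.h>

#define OUT_SHIFT 4 /* hypothetical scaling factor, for illustration only */

/* Shift every term before accumulating: n shifts in total. */
static int32_t sum_shift_each(const int32_t *v, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += v[i] >> OUT_SHIFT;
    return acc;
}

/* Accumulate unshifted values in a wider type and shift once at the end.
 * This needs enough headroom in the accumulator to hold the unscaled sum. */
static int32_t sum_shift_once(const int32_t *v, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += v[i];
    return (int32_t)(acc >> OUT_SHIFT);
}
```

The single shift saves instructions and also keeps low-order bits that the per-term shift discards, so the two results can differ slightly: for the inputs {15, 15} the first function returns 0 and the second returns 1.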
Comment by Andree Buschmann (Buschel) - Thursday, 08 July 2010, 06:26 GMT
Refactored the loop in sbr_qmf_synthesis_64() that runs before the call to dct4_kernel(). This loop can now be optimized further via ldm instructions.

Speed on PP5022 with 64kaache.m4a: 106.0 MHz (svn 130 MHz)
Comment by MichaelGiacomelli (saratoga) - Thursday, 08 July 2010, 15:30 GMT
Tested on my Clipv2 w/ 40MHz PCLK (basically then the same as a Fuzev2):

120.07 (SVN) -> 115.7 MHz (v11).

It's surprising that PP is faster. The improved multiplier, fast memory, and 64 bit wide L1 bus usually mean that filterbanks are much faster on AMSv2. Mp3, for instance, is something like 25% faster vs. PP.
Comment by MichaelGiacomelli (saratoga) - Thursday, 08 July 2010, 17:47 GMT
Some more benchmarks on the above Clipv2 to see how much the different functions use:

hf_generation -> 20.0 MHz
hf_adjustment -> 14.35 MHz

Also, I made a quick sketch of how the SBR stuff fits into LC because the libfaad code is so confusing:


reconstruct_channel_pair ->
    ifilter_bank ->
        ff_imdct_calc
    sbrDecodeInit

    sbrDecodeCoupleFrame ->
        sbr_process_channel ->
            sbr_qmf_analysis_32
            hf_generation (20.0 MHz) ->
                calc_prediction_coef
            hf_adjustment (14.35 MHz) ->
                estimate_current_envelope
                calculate_gain
                hf_assembly

        sbr_qmf_synthesis_32
Comment by Andree Buschmann (Buschel) - Thursday, 08 July 2010, 20:06 GMT
Results on h100: SVN = 33.28% realtime, 373 MHz needed -- with patch v11 = 41.14% realtime, 263 MHz needed
Comment by Andree Buschmann (Buschel) - Thursday, 08 July 2010, 22:31 GMT
Defined an own dct4_revtab[] (derived from codeclib) so that this array can be moved to IRAM, refactored parts of sbr_qmf_analysis_32(), used LIKELY macros in iquant(), and finally tweaked the IRAM configuration.

Speed on PP5022 with 64kaache.m4a: 104.8 MHz (svn 130 MHz)
Speed on MCF5249 with 64kaache.m4a: 242 MHz (svn 373 MHz)

Retest on MCF5250 still missing.
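
For readers unfamiliar with branch hints, the use of LIKELY macros in a function like iquant() can be sketched as below. This is a simplified illustration under stated assumptions, not the patched libfaad code: the macro definitions follow the common GCC `__builtin_expect` idiom, and `iquant_sketch`, `IQ_SKETCH_SIZE`, the table contents (q^(4/3) rounded to integers) and the error value 1 are all hypothetical.

```c
#include <stdint.h>

/* Branch hints in the common GCC style; other compilers just see the condition. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

#define IQ_SKETCH_SIZE 16 /* hypothetical table size */

/* Hypothetical lookup table: q^(4/3) rounded to the nearest integer. */
static const int32_t iq_sketch_tab[IQ_SKETCH_SIZE] = {
    0, 1, 3, 4, 6, 9, 11, 13, 16, 19, 22, 24, 27, 31, 34, 37
};

/* Inverse quantization by table lookup. The in-range case is marked as the
 * expected path; out-of-range input sets *error and returns 0. */
static int32_t iquant_sketch(int16_t q, uint8_t *error)
{
    int32_t mag = (q < 0) ? -(int32_t)q : q;

    if (LIKELY(mag < IQ_SKETCH_SIZE))
        return (q < 0) ? -iq_sketch_tab[mag] : iq_sketch_tab[mag];

    *error = 1; /* hypothetical error code */
    return 0;
}
```

The hint does not change the result; it only tells the compiler which branch to lay out as the fall-through path.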
Comment by Jens Arnold (amiconn) - Friday, 09 July 2010, 18:08 GMT
Measurements

Coldfire MCF5250 (iAudio X5): SVN 33.7 %rt (368.5 MHz) +v15 60.66 %rt (204.72 MHz)
PP5002 (iPod 2nd Gen): SVN 43.44 %rt (184.16 MHz) +v15 52.16 %rt (153.37 MHz)
PP5020 (iRiver H10 6GB): SVN 61.27 %rt (130.56 MHz) +v15 70.28 %rt (113.83 MHz)

Relative speedups

MCF5250 80%
PP5002 20%
PP5020 15%
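
The relative speedups above follow from the ratio of the "MHz needed" figures. A minimal sketch of the arithmetic in C is given below. The helper names are hypothetical, and the relation between %rt and MHz is an assumption inferred from the reported pairs, which are consistent with a fixed CPU clock (about 80 MHz for the two PP targets, about 124 MHz for the MCF5250).

```c
/* MHz needed for realtime decoding, given the CPU clock the test ran at
 * and the measured decode speed in percent of realtime.
 * Assumption: needed = clock * 100 / %rt, as the reported pairs suggest. */
static double mhz_needed(double cpu_mhz, double percent_realtime)
{
    return cpu_mhz * 100.0 / percent_realtime;
}

/* Relative speedup in percent when the required clock drops
 * from mhz_before to mhz_after. */
static double speedup_percent(double mhz_before, double mhz_after)
{
    return (mhz_before / mhz_after - 1.0) * 100.0;
}
```

With the figures above, speedup_percent(368.5, 204.72) is about 80, speedup_percent(184.16, 153.37) is about 20, and speedup_percent(130.56, 113.83) is about 15, matching the list.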
Comment by Andree Buschmann (Buschel) - Friday, 09 July 2010, 18:49 GMT
Submitted v15 with r27358/r27359.
