Rockbox

  • Status Closed
  • Percent Complete 100%
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Release 3.6
  • Due in Version Undecided
  • Due Date Undecided
Attached to Project: Rockbox
Opened by Buschel - 2010-07-06
Last edited by Buschel - 2010-07-09

FS#11461 - libfaad aac-he speed optimization

This patch optimizes the SBR filter that is used by aac-he files. Several loops in the analysis and synthesis filters were unrolled, and several variables and tables were moved to IRAM.
Speed on PP5022 with 64kaache.m4a: 115.5 MHz (svn 130 MHz)
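
To illustrate the kind of change described above, here is a minimal sketch of a multiply-accumulate loop in rolled and four-way unrolled form. It is not code from the patch: the function names, the table contents, and the `IRAM_TABLE_ATTR` macro are hypothetical stand-ins (in a real build, an attribute macro would place the table in fast on-chip RAM; here it expands to nothing so the example compiles anywhere).

```c
#include <stdint.h>

/* Hypothetical stand-in for a target-specific "place in IRAM" attribute.
 * It is empty here so the sketch builds on any host compiler. */
#define IRAM_TABLE_ATTR

#define TAPS 8 /* illustrative length, must be a multiple of 4 */

/* Illustrative coefficient table, not taken from libfaad. */
static const int32_t window[TAPS] IRAM_TABLE_ATTR = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Straightforward loop: one multiply-accumulate per iteration. */
int64_t mac_rolled(const int32_t *x)
{
    int64_t acc = 0;
    for (int i = 0; i < TAPS; i++)
        acc += (int64_t)x[i] * window[i];
    return acc;
}

/* Unrolled by four: fewer loop-counter updates and branches, and the
 * compiler gets four independent loads to schedule per iteration. */
int64_t mac_unrolled(const int32_t *x)
{
    int64_t acc = 0;
    for (int i = 0; i < TAPS; i += 4) {
        acc += (int64_t)x[i]     * window[i];
        acc += (int64_t)x[i + 1] * window[i + 1];
        acc += (int64_t)x[i + 2] * window[i + 2];
        acc += (int64_t)x[i + 3] * window[i + 3];
    }
    return acc;
}
```

Both functions return the same value for any input; only the loop overhead differs. Whether unrolling pays off depends on the target and on whether the larger code still fits in fast memory.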

Closed by  Buschel
2010-07-09 18:50
Reason for closing:  Accepted
Additional comments about closing:

Submitted v15 with r27358/r27359

Thanks to saratoga and stripwax for the hints! Now the codeclib’s fft is used instead of the libfaad-fft.
Speed on PP5022 with 64kaache.m4a: 107.4 MHz (svn 115.5 MHz)

Clean up latest fft-change and remove unneeded bit_rev_tab2.
Speed on PP5022 with 64kaache.m4a: 106.5 MHz (svn 130 MHz)

v08: MCF5249 64kaache.m4a 289.47MHz (svn 367.95MHz)

v10:
- changed the dct to work “in-place”.
- several arrays could be removed thanks to the “in-place” dct.
- shifts were moved to the final output to reduce the number of shifts.
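
The last point can be sketched as follows. This is only an illustration of deferring a scaling shift to the output stage, assuming the accumulator has enough headroom for the unshifted sum; `N`, `SCALE_SHIFT`, and both function names are made up for the example and do not come from the patch.

```c
#include <stdint.h>

#define N 4           /* illustrative number of terms */
#define SCALE_SHIFT 3 /* illustrative scaling shift */

/* Shift every term before accumulating: N shifts, and the low bits of
 * each term are discarded before the sum is formed. */
int32_t sum_shift_each(const int32_t *x)
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++)
        acc += x[i] >> SCALE_SHIFT;
    return acc;
}

/* Accumulate at full scale and shift once at the output: a single shift.
 * The wider accumulator provides the headroom this needs. Inputs are
 * assumed non-negative here, because right-shifting a negative signed
 * value is implementation-defined in C. */
int32_t sum_shift_once(const int32_t *x)
{
    int64_t acc = 0;
    for (int i = 0; i < N; i++)
        acc += x[i];
    return (int32_t)(acc >> SCALE_SHIFT);
}
```

The two variants agree when every term is a multiple of `1 << SCALE_SHIFT`. Otherwise the single-shift version keeps low-order bits that the per-term version throws away, so the outputs can differ slightly.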

Speed on PP5022 with 64kaache.m4a: 106.1 MHz (svn 130 MHz)

Refactored the loop in sbr_qmf_synthesis_64() that precedes the call to dct4_kernel(). This loop can now be optimized further via ldm commands.

Speed on PP5022 with 64kaache.m4a: 106.0 MHz (svn 130 MHz)

Tested on my Clipv2 w/ 40MHz PCLK (basically then the same as a Fuzev2):

120.07 (SVN) → 115.7 MHz (v11).

It's surprising that PP is faster. The improved multiplier, fast memory, and 64-bit-wide L1 bus usually mean that filterbanks are much faster on AMSv2. MP3, for instance, is something like 25% faster than on PP.

Some more benchmarks on the above Clipv2 to see how much the different functions use:

hf_generation → 20.0 MHz
hf_adjustment → 14.35 MHz

Also, I made a quick sketch of how the SBR stuff fits into LC because the libfaad code is so confusing:

reconstruct_channel_pair ->
    ifilter_bank ->
        ff_imdct_calc
    sbrDecodeInit
    sbrDecodeCoupleFrame ->
        sbr_process_channel ->
            sbr_qmf_analysis_32
            hf_generation (20.0 MHz) ->
                calc_prediction_coef
            hf_adjustment (14.35 MHz) ->
                estimate_current_envelope
                calculate_gain
                hf_assembly
        sbr_qmf_synthesis_32

Results on h100: SVN = 33.28% realtime, 373 MHz needed – with patch v11 = 41.14% realtime, 263 MHz needed

Defined a separate dct4_revtab[] (derived from codeclib) so that this array can be moved to IRAM, refactored parts of sbr_qmf_analysis_32(), used LIKELY-macros in iquant(), and finally did some tweaking of the IRAM configuration.
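
For readers unfamiliar with LIKELY-macros, here is a minimal sketch of the idea, assuming a GCC-compatible compiler that provides `__builtin_expect`. The macro definitions are a common idiom; `toy_iquant()` and its small table (i^(4/3) rounded to the nearest integer) are hypothetical and far simpler than the real iquant() in libfaad.

```c
#include <stdint.h>

/* Branch hints: tell the compiler which outcome is expected so it can
 * lay out the common path as straight-line code. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

#define TABLE_SIZE 8

/* Toy lookup table: i^(4/3) rounded to the nearest integer, i = 0..7. */
static const int32_t pow43_tab[TABLE_SIZE] = { 0, 1, 3, 4, 6, 9, 11, 13 };

/* Hypothetical inverse quantizer: sign(q) * |q|^(4/3) via table lookup.
 * Small magnitudes are assumed to be by far the most common case. */
int32_t toy_iquant(int q)
{
    int mag = q < 0 ? -q : q;

    if (LIKELY(mag < TABLE_SIZE)) {
        int32_t v = pow43_tab[mag];
        return q < 0 ? -v : v;
    }

    /* Rare path: out of table range. Returning 0 is a placeholder for
     * whatever error handling a real decoder would do. */
    return 0;
}
```

The hint does not change the result of the function. It only influences code layout and, on some targets, static branch prediction, so any gain has to be confirmed by measurement.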

Speed on PP5022 with 64kaache.m4a: 104.8 MHz (svn 130 MHz)
Speed on MCF5249 with 64kaache.m4a: 242 MHz (svn 373 MHz)

Retest on MCF5250 still missing.

Measurements

Coldfire MCF5250 (iAudio X5): SVN 33.7 %rt (368.5 MHz) +v15 60.66 %rt (204.72 MHz)
PP5002 (iPod 2nd Gen): SVN 43.44 %rt (184.16 MHz) +v15 52.16 %rt (153.37 MHz)
PP5020 (iRiver H10 6GB): SVN 61.27 %rt (130.56 MHz) +v15 70.28 %rt (113.83 MHz)

Relative speedups

MCF5250 80%
PP5002 20%
PP5020 15%
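
The relative speedups above follow from the measurements. A minimal sketch of the arithmetic, with illustrative function names:

```c
/* Relative speedup in percent from two "% realtime" readings
 * (higher is better). */
double speedup_percent(double rt_before, double rt_after)
{
    return (rt_after / rt_before - 1.0) * 100.0;
}

/* The same speedup from two "MHz needed" readings (lower is better),
 * so the ratio is inverted. */
double speedup_percent_from_mhz(double mhz_before, double mhz_after)
{
    return (mhz_before / mhz_after - 1.0) * 100.0;
}
```

For example, `speedup_percent(33.7, 60.66)` gives about 80 for the MCF5250, `speedup_percent(43.44, 52.16)` about 20 for the PP5002, and `speedup_percent(61.27, 70.28)` about 15 for the PP5020. Using the MHz figures instead gives the same results up to rounding.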

Submitted v15 with r27358/r27359.
