Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Release 3.6
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by Buschel - 2010-06-28
Last edited by Buschel - 2010-07-01

FS#11445 - Speed-up of faad decoder

This patch speeds up the faad decoder. Tested on PP5022 (iPod Video); it still needs testing on CF targets (MCF5250 and other CF CPUs).

Changes:
1) bits.c: move buffer to iram.
2) bits.c: check requested buffer_size to avoid memory overwriting
3) iq_table.h: use large buffer to have full table lookup
4) iq_table.h: move array to iram
5) common.h: introduce CPU dependent iram configuration
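
Change 2 can be illustrated with a minimal sketch, assuming a fixed-size static buffer that receives a copy of the input. The names load_bitstream, static_buf and STATIC_BUF_SIZE are hypothetical and are not taken from bits.c; on a real target the buffer would additionally carry the IRAM placement attribute from change 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity of the static (IRAM) buffer; not the value used in bits.c. */
#define STATIC_BUF_SIZE 1024

static uint8_t static_buf[STATIC_BUF_SIZE];

/*
 * Copy buffer_size bytes into the static buffer.
 * Returns a pointer to the copy, or NULL if the request is invalid or
 * would write past the end of the static buffer.
 */
static const uint8_t *load_bitstream(const uint8_t *src, size_t buffer_size)
{
    if (src == NULL || buffer_size > STATIC_BUF_SIZE)
        return NULL;                /* reject instead of overwriting memory */

    memcpy(static_buf, src, buffer_size);
    return static_buf;
}
```

A caller would treat a NULL return as "fall back to a heap buffer or fail", which is a design choice of this sketch rather than something stated in the patch.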

Measurements on iPod Video (nero_192.m4a):
svn = 36.12 MHz
patched = 35.49 MHz

Closed by  Buschel
2010-07-01 21:46
Reason for closing:  Accepted
Additional comments about closing:

Submitted with r27225 and r27226.

MPIO HD200 (MCF5249), the same file (nero_192.m4a)
svn = 60.22 MHz (206.21% realtime)
patched = 59.31 MHz (209.37% realtime)

~1.5% speedup
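
The percentages quoted in these comments follow from the MHz figures. A minimal sketch of the arithmetic, assuming that "% realtime" is the CPU clock divided by the clock needed for realtime decoding; the function names are hypothetical, and the 124.18 MHz CPU clock used in the usage note is inferred from the reported svn pair (60.22 MHz at 206.21%), not stated in this task.

```c
/* Relative reduction of the clock needed for realtime decoding, in percent. */
static double speedup_percent(double mhz_before, double mhz_after)
{
    return 100.0 * (mhz_before - mhz_after) / mhz_before;
}

/* Decoding speed relative to realtime, in percent, at a given CPU clock. */
static double realtime_percent(double cpu_mhz, double needed_mhz)
{
    return 100.0 * cpu_mhz / needed_mhz;
}
```

With the MPIO HD200 numbers, speedup_percent(60.22, 59.31) gives about 1.5, and realtime_percent(124.18, 59.31) gives about 209.37, matching the reported values.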

Next version, with a larger speedup on ARM. This version should also give a reasonable speedup on CF targets with large IRAM (MCF5250).

Changes:
1) bits.c: move buffer to iram.
2) bits.c: check requested buffer_size to avoid memory overwriting
3) iq_table.h: use large buffer to have full table lookup
4) common.h: introduce CPU dependent iram configuration
5) aac.c: define 2 global arrays in iram that are linked to the decoder struct → speed up in the mdct synthesis.

Measurements on iPod Video (nero_192.m4a):
svn = 36.12 MHz
patched = 34.23 MHz

Results on iAudio X5 (coldfire MCF5250):
svn (r27168): 60.29MHz (205.96% realtime)
with patch v2: 58.24MHz (213.20% realtime)

Some results from tests on PP5022 (iPod Video) using patch v02 and nero_192.m4a:
- full decode: 34.23 MHz
- ifilter_bank() for 2 channels: 14.95 MHz
- ms_decode(): 1.15 MHz
- quant_to_spec() for 2 channels: 4.87 MHz
- tns/is/pns: ~0.26 MHz in total
- bitstream decoding: the remaining ~13 MHz, which seems to be quite a lot…
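
The ~13 MHz attributed to bitstream decoding is what remains after subtracting the measured stages from the full decode. A small sketch of that bookkeeping; the struct and function names are hypothetical, the numbers are the ones reported above.

```c
#include <stddef.h>

struct stage_cost {
    const char *name;
    double mhz;
};

/* Per-stage costs as reported for PP5022 with patch v02 (nero_192.m4a). */
static const struct stage_cost measured[] = {
    { "ifilter_bank() for 2 channels",  14.95 },
    { "ms_decode()",                     1.15 },
    { "quant_to_spec() for 2 channels",  4.87 },
    { "tns/is/pns",                      0.26 },
};

/* Clock not accounted for by the listed stages. */
static double unaccounted_mhz(double total_mhz,
                              const struct stage_cost *stages, size_t n)
{
    double rest = total_mhz;
    for (size_t i = 0; i < n; i++)
        rest -= stages[i].mhz;
    return rest;
}
```

Calling unaccounted_mhz(34.23, measured, 4) leaves roughly 13 MHz, i.e. well over a third of the full decode.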

The v03-patch additionally uses IRAM for several huffman tables. Speed on PP5022 is 34.05 MHz.

The v04-patch uses the correct source code for iquant() when using FIXED_POINT and BIG_IQ_TABLE. It also refactors the requantization of the spectrum in quant_to_spec(). Speed on PP5022 is 33.34 MHz (svn 36.12 MHz).
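
For readers unfamiliar with the "full table lookup": AAC inverse quantization raises the magnitude of each quantized value to the power 4/3, and a table covering every possible magnitude (0..8191) avoids interpolation at decode time. The following is a floating-point sketch of that idea only. It is not the FIXED_POINT code from the patch; all names are hypothetical, the table is filled at startup instead of being a precomputed constant array, and the cube root is done with a simple Newton iteration to keep the example free of libm.

```c
/* Hypothetical table size: one entry per possible quantized magnitude. */
#define IQ_SKETCH_SIZE 8192

static float iq_sketch[IQ_SKETCH_SIZE];

/* Cube root by Newton iteration; adequate for 0 <= x < 8192. */
static double cbrt_newton(double x)
{
    if (x == 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int k = 0; k < 60; k++)
        r = (2.0 * r + x / (r * r)) / 3.0;
    return r;
}

/* Fill the table with i^(4/3) = i * cbrt(i). */
static void iq_sketch_init(void)
{
    for (int i = 0; i < IQ_SKETCH_SIZE; i++)
        iq_sketch[i] = (float)(i * cbrt_newton((double)i));
}

/*
 * Inverse quantization by plain lookup: sign(q) * |q|^(4/3).
 * Returns 0 on success, -1 if |q| is outside the table.
 */
static int iquant_sketch(int q, float *out)
{
    int mag = q < 0 ? -q : q;
    if (mag >= IQ_SKETCH_SIZE)
        return -1;
    *out = q < 0 ? -iq_sketch[mag] : iq_sketch[mag];
    return 0;
}
```

After iq_sketch_init(), iquant_sketch(8, &v) stores 16 in v and iquant_sketch(-27, &v) stores -81. The scalefactor gain that quant_to_spec() also applies is left out here.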

v08 includes several additional changes:
- refactor some huffman code
- put some more arrays (mostly huffman tables) into IRAM
- inline the most frequently called huffman functions
- inline some bits.c helper functions

Speed on PP5022 is 32.51 MHz (svn 36.12 MHz).
Speed on MCF5249 is 55.00 MHz (svn 60.29 MHz).

Speed on MCF5250 (iAudio X5) with v8: 267.45% realtime (46.43 MHz)
For SVN, see the earlier comment.

This patch uses the asm-optimized swap32() instead of faad's own implementation. This speeds up PP5022 to 32.42 MHz (svn 36.12 MHz). It should have no effect on CF targets, but this should be verified briefly.
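
For reference, a portable C sketch of what a 32-bit byte swap computes. This is not the asm-optimized Rockbox routine, and the name swap32_sketch is hypothetical. The expectation that CF targets are unaffected is consistent with ColdFire being big-endian, so a big-endian bitstream word presumably needs no swap there.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static inline uint32_t swap32_sketch(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}
```

For example, swap32_sketch(0x11223344) yields 0x44332211. The bitstream reader performs this once per 32-bit word loaded on a little-endian target, which is why a faster version shows up in the overall decode time.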

Speed on PP5002 (iPod 2nd Gen)
SVN: 162.45% realtime (49.24 MHz)
Patch v8: 177.23% realtime (45.13 MHz)

Tiny change on PP5002 with v9: 177.48% realtime (45.07 MHz)

The most used code tables are 6, 8, 10 and 11 (6, 8 and 10 could be moved to IRAM), measured as overall usage across nero_192.m4a, nero_256.m4a, nero_320.m4a, nero_400.m4a and 2 more 140 kbps samples I have for testing here.
Moved as many huffman code tables as possible to IRAM on targets with standard IRAM. This should speed up decoding on MCF5249.
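
A usage ranking like the one above can be gathered with a simple histogram. The sketch below assumes the decoder can hand over the code table index used for each section; the function name, the input array and the table count of 16 are assumptions for illustration, not faad's actual instrumentation.

```c
#include <stddef.h>

/* Assumed number of distinct code table indices. */
#define NUM_CODEBOOKS 16

/* Count how often each code table index occurs in cb[0..n-1]. */
static void count_codebooks(const unsigned char *cb, size_t n,
                            unsigned long hist[NUM_CODEBOOKS])
{
    for (size_t i = 0; i < NUM_CODEBOOKS; i++)
        hist[i] = 0;

    for (size_t i = 0; i < n; i++) {
        if (cb[i] < NUM_CODEBOOKS)   /* ignore out-of-range indices */
            hist[cb[i]]++;
    }
}
```

Sorting the resulting counts then shows which tables are worth the limited IRAM space.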

Edit: Updated to v11, because IRAM was full on PP5002.

MCF5249 v11: 53.25 MHz (svn 60.22 MHz)
