FS#11445 - Speed-up of faad decoder

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Monday, 28 June 2010, 21:04 GMT
Last edited by Andree Buschmann (Buschel) - Thursday, 01 July 2010, 21:46 GMT
Task Type Patches
Category Codecs
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version Release 3.6
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

This patch speeds up the faad decoder. Tested on PP5022 (iPod Video); it still needs testing on ColdFire (CF) targets (MCF5250 and other CF CPUs).

Changes:
1) bits.c: move buffer to iram.
2) bits.c: check requested buffer_size to avoid memory overwriting
3) iq_table.h: use large buffer to have full table lookup
4) iq_table.h: move array to iram
5) common.h: introduce CPU dependent iram configuration
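
As a rough illustration of items 1, 2 and 5, here is a minimal sketch of a CPU dependent IRAM configuration combined with a size check on a static buffer. All `DEMO_` names, the buffer sizes and the `.ibss` section name are assumptions made for this example and are not taken from the patch; on a host build both macros expand to nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical configuration switches, not the names used in the patch. */
#if defined(DEMO_CPU_LARGE_IRAM)
/* Enough IRAM for both the small buffers and the large tables. */
#define DEMO_IBSS        __attribute__((section(".ibss")))
#define DEMO_IBSS_LARGE  __attribute__((section(".ibss")))
#elif defined(DEMO_CPU_SMALL_IRAM)
/* Only the small buffers go to IRAM, large tables stay in normal RAM. */
#define DEMO_IBSS        __attribute__((section(".ibss")))
#define DEMO_IBSS_LARGE
#else
/* Host build or target without IRAM: no special placement. */
#define DEMO_IBSS
#define DEMO_IBSS_LARGE
#endif

#define DEMO_BITBUF_SIZE    1024  /* assumed capacity in bytes */
#define DEMO_BIG_TABLE_SIZE 8192  /* assumed number of table entries */

static uint8_t demo_bitbuf[DEMO_BITBUF_SIZE] DEMO_IBSS;
static int32_t demo_big_table[DEMO_BIG_TABLE_SIZE] DEMO_IBSS_LARGE;

/* Number of entries in the large table, wherever it was placed. */
static size_t demo_big_table_entries(void)
{
    return sizeof(demo_big_table) / sizeof(demo_big_table[0]);
}

/* Copy 'size' bytes into the static buffer.
 * Returns 0 on success, -1 if the request does not fit, so the
 * fixed-size buffer can never be overwritten past its end. */
static int demo_bits_init(const uint8_t *src, size_t size)
{
    if (src == NULL || size > sizeof(demo_bitbuf))
        return -1;
    memcpy(demo_bitbuf, src, size);
    return 0;
}
```

The size check matters once the buffer is static: its capacity is fixed at build time, so any larger request has to be rejected rather than copied.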

Measurements on iPod Video (nero_192.m4a):
svn = 36.12 MHz
patched = 35.49 MHz

Closed by Andree Buschmann (Buschel)
Thursday, 01 July 2010, 21:46 GMT
Reason for closing: Accepted
Additional comments about closing: Submitted with r27225 and r27226.
Comment by Marcin Bukat (MarcinBukat) - Monday, 28 June 2010, 21:29 GMT
MPIO HD200 (MCF5249), the same file (nero_192.m4a)
svn = 60.22 MHz (206.21% realtime)
patched = 59.31 MHz (209.37% realtime)

~1.5% speedup
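
For reference, the arithmetic behind these figures can be sketched as below. The helper names are made up for this example. The second helper rests on an assumption: that the realtime percentage is measured against a fixed CPU clock, so that MHz times realtime factor recovers that clock.

```c
/* Relative reduction of the required clock, in percent. */
static double demo_speedup_percent(double mhz_before, double mhz_after)
{
    return (mhz_before - mhz_after) / mhz_before * 100.0;
}

/* Clock implied by a "MHz needed" / "% realtime" pair.
 * Assumption: realtime% = clock / MHz needed * 100. */
static double demo_implied_clock_mhz(double mhz_needed, double realtime_percent)
{
    return mhz_needed * realtime_percent / 100.0;
}
```

With the numbers above, demo_speedup_percent(60.22, 59.31) gives about 1.51, and both lines imply a clock of roughly 124.18 MHz, which is consistent with the assumption.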
Comment by Andree Buschmann (Buschel) - Tuesday, 29 June 2010, 06:47 GMT
Next version with more speedup on ARM. This version should also give a reasonable speedup on CF targets with large IRAM (MCF5250).

Changes:
1) bits.c: move buffer to iram.
2) bits.c: check requested buffer_size to avoid memory overwriting
3) iq_table.h: use large buffer to have full table lookup
4) common.h: introduce CPU dependent iram configuration
5) aac.c: define 2 global arrays in iram that are linked to the decoder struct -> speeds up the mdct synthesis.
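
Item 5 can be pictured roughly as follows: the work buffers are static arrays, so the linker can place them in IRAM, and the decoder struct only holds pointers to them. The struct layout, array names, sizes and the `DEMO_IBSS` macro are assumptions for this sketch, not the code from the patch.

```c
#include <stdint.h>

#define DEMO_IBSS              /* would select the IRAM section on target */
#define DEMO_MAX_CHANNELS 2    /* assumed */
#define DEMO_FRAME_LEN    1024 /* assumed */

/* Two global work arrays, candidates for IRAM placement. */
static int32_t demo_buf_a[DEMO_MAX_CHANNELS][DEMO_FRAME_LEN] DEMO_IBSS;
static int32_t demo_buf_b[DEMO_MAX_CHANNELS][DEMO_FRAME_LEN] DEMO_IBSS;

/* Hypothetical decoder struct: it stores per-channel pointers only. */
typedef struct {
    int32_t *buf_a[DEMO_MAX_CHANNELS];
    int32_t *buf_b[DEMO_MAX_CHANNELS];
} demo_decoder;

/* Link the global arrays to the decoder struct instead of allocating
 * the buffers dynamically in normal RAM. */
static void demo_decoder_link_buffers(demo_decoder *dec)
{
    int ch;
    for (ch = 0; ch < DEMO_MAX_CHANNELS; ch++) {
        dec->buf_a[ch] = demo_buf_a[ch];
        dec->buf_b[ch] = demo_buf_b[ch];
    }
}
```

The rest of the decoder keeps accessing the buffers through the struct, so only the setup code needs to know where the memory actually lives.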

Measurements on iPod Video (nero_192.m4a):
svn = 36.12 MHz
patched = 34.23 MHz
Comment by Jens Arnold (amiconn) - Tuesday, 29 June 2010, 07:11 GMT
Results on iAudio X5 (coldfire MCF5250):
svn (r27168): 60.29MHz (205.96% realtime)
with patch v2: 58.24MHz (213.20% realtime)
Comment by Andree Buschmann (Buschel) - Tuesday, 29 June 2010, 20:34 GMT
Some results from tests on PP5022 (iPod Video) using patch v02 and nero_192.m4a:
- full decode: 34.23 MHz
- ifilter_bank() for 2 channels: 14.95 MHz
- ms_decode(): 1.15 MHz
- quant_to_spec() for 2 channels: 4.87 MHz
- tns/is/pns need ~0.26 MHz in total
- 13 MHz seem to be needed by bitstream decoding. <-- seems to be quite a lot...

The v03 patch additionally uses IRAM for several huffman tables. Speed on PP5022 is 34.05 MHz.
Comment by Andree Buschmann (Buschel) - Tuesday, 29 June 2010, 22:22 GMT
The v04-patch uses the correct source code for iquant() when using FIXED_POINT and BIG_IQ_TABLE. It also refactors the requantization of the spectrum in quant_to_spec(). Speed on PP5022 is 33.34 MHz (svn 36.12 MHz).
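
To illustrate what a full table lookup for the inverse quantization looks like, here is a minimal sketch. AAC requantizes a value q as sign(q) * |q|^(4/3). The sketch assumes magnitudes stay below 8192, uses floating point for readability (the patch itself concerns the FIXED_POINT build), and fills the table at startup with a hand-written cube root. All `demo_` names are made up for this example.

```c
#include <stdint.h>

#define DEMO_IQ_TABLE_SIZE 8192  /* one entry per magnitude 0..8191 */

static float demo_iq_table[DEMO_IQ_TABLE_SIZE];

/* Cube root via Newton iteration, to keep the sketch free of libm. */
static double demo_cbrt(double x)
{
    double y = x;
    int i;
    if (x <= 0.0)
        return 0.0;
    if (y < 1.0)
        y = 1.0;
    for (i = 0; i < 64; i++)
        y -= (y * y * y - x) / (3.0 * y * y);
    return y;
}

/* Fill the table once: entry i holds i^(4/3) = i * cbrt(i). */
static void demo_iq_table_init(void)
{
    int i;
    for (i = 0; i < DEMO_IQ_TABLE_SIZE; i++)
        demo_iq_table[i] = (float)(i * demo_cbrt((double)i));
}

/* Inverse quantization by direct lookup, no interpolation needed.
 * Sets *error and returns 0 if q is outside the table range. */
static float demo_iquant(int32_t q, int *error)
{
    if (q <= -DEMO_IQ_TABLE_SIZE || q >= DEMO_IQ_TABLE_SIZE) {
        *error = 1;
        return 0.0f;
    }
    return (q < 0) ? -demo_iq_table[-q] : demo_iq_table[q];
}
```

The trade-off is memory for speed: every magnitude has its own entry, so the per-sample work is one range check and one load.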
Comment by Andree Buschmann (Buschel) - Wednesday, 30 June 2010, 20:58 GMT
v08 includes several additional changes:
- refactor some huffman code
- put some more arrays (mostly huffman tables) into IRAM
- inline the most frequently called huffman functions
- inline some bits.c helper functions
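
The inlining idea can be sketched with a tiny MSB-first bit reader whose helpers are declared `static inline`, so the compiler can expand them at the call site instead of emitting a function call per bit field. This is not faad's bitstream code; the struct and function names are assumptions for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data; /* input bytes */
    size_t size;         /* number of bytes in data */
    size_t bitpos;       /* index of the next bit to read */
} demo_bitreader;

/* Read one bit, MSB first. Reads past the end return 0. */
static inline uint32_t demo_get1bit(demo_bitreader *br)
{
    uint32_t bit = 0;
    size_t byte = br->bitpos >> 3;
    if (byte < br->size)
        bit = (br->data[byte] >> (7 - (br->bitpos & 7))) & 1u;
    br->bitpos++;
    return bit;
}

/* Read n bits (n <= 32) and return them right-aligned. */
static inline uint32_t demo_getbits(demo_bitreader *br, unsigned n)
{
    uint32_t value = 0;
    while (n--)
        value = (value << 1) | demo_get1bit(br);
    return value;
}
```

Helpers this small are called extremely often during huffman decoding, which is why removing the call overhead can show up in the measurements.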

Speed on PP5022 is 32.51 MHz (svn 36.12 MHz).
Speed on MCF5249 is 55.00 MHz (svn 60.29 MHz).
Comment by Jens Arnold (amiconn) - Wednesday, 30 June 2010, 22:03 GMT
Speed on MCF5250 (iAudio X5) with v8: 267.45% realtime (46.43MHz)
SVN see earlier comment.
Comment by Andree Buschmann (Buschel) - Wednesday, 30 June 2010, 22:17 GMT
This patch uses the asm optimized swap32() instead of faad's own implementation. This speeds up PP5022 to 32.42 MHz (svn 36.12 MHz). It should have no effect on CF targets, but this should be verified briefly.
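
For readers unfamiliar with the operation: swap32() reverses the byte order of a 32-bit word. A portable C equivalent is sketched below under the hypothetical name demo_swap32(); the version used by the patch is assembler optimized and will look different.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t demo_swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}
```

Applying the function twice returns the original value, which makes for an easy sanity check when swapping in a different implementation.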
Comment by Jens Arnold (amiconn) - Wednesday, 30 June 2010, 22:20 GMT
Speed on PP5002 (iPod 2nd Gen)
SVN: 162.45% realtime (49.24 MHz)
Patch v8: 177.23% realtime (45.13 MHz)
Comment by Jens Arnold (amiconn) - Wednesday, 30 June 2010, 22:28 GMT
Tiny change on PP5002 with v9: 177.48% realtime (45.07 MHz)
Comment by Andree Buschmann (Buschel) - Thursday, 01 July 2010, 06:01 GMT
The most used codetables are 6, 8, 10 and 11 (6, 8 and 10 could be moved to IRAM). This was measured as overall usage across nero_192.m4a, nero_256.m4a, nero_320.m4a, nero_400.m4a and 2 more 140kbps samples I have for testing here.
Moved as many huffman code tables as possible to IRAM on targets with standard IRAM. This should speed up decoding on MCF5249.
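
A usage measurement of this kind can be done with a simple per-codebook counter, sketched below. The names and the counter layout are assumptions for this example; the actual instrumentation used for the numbers above is not part of this task.

```c
#define DEMO_NUM_CODEBOOKS 12  /* assumed: codebook indices 0..11 */

static unsigned long demo_cb_usage[DEMO_NUM_CODEBOOKS];

/* Call once for every section decoded with codebook 'cb'. */
static void demo_count_codebook(unsigned cb)
{
    if (cb < DEMO_NUM_CODEBOOKS)
        demo_cb_usage[cb]++;
}

/* Index of the codebook with the highest count (lowest index on a tie). */
static unsigned demo_most_used_codebook(void)
{
    unsigned best = 0;
    unsigned cb;
    for (cb = 1; cb < DEMO_NUM_CODEBOOKS; cb++) {
        if (demo_cb_usage[cb] > demo_cb_usage[best])
            best = cb;
    }
    return best;
}
```

Dumping demo_cb_usage after decoding a set of test files shows which tables are worth the limited IRAM space.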

Edit: Updated to v11, because IRAM was full on PP5002.
Comment by Andree Buschmann (Buschel) - Thursday, 01 July 2010, 20:44 GMT
MCF5249 v11: 53.25 MHz (svn 60.22 MHz)
