Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System Another
  • Severity Low
  • Priority Very Low
  • Reported Version Release 3.6
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by stripwax - 2010-10-11
Last edited by stripwax - 2010-11-12

FS#11666 - codeclib arm hand-tuned assembly in fft-ffmpeg doesn't work on Android/-PIC builds

See mailing list thread entitled “Segfault with FasterMDCT patch and -fPIC” for more details.

Background:
The hand-rolled asm in fft-ffmpeg_arm.h uses a combination of preprocessor macros and individual named register selection, in order to enable most use of stm/ldm (requiring register ordering constraints) without writing all of the asm manually - in theory enabling gcc to still make register allocations, local optimisations, etc.
However, it doesn’t seem to work very well. In particular, using -fPIC in a regular ARM environment means that gcc needs to use r10 for its own needs, and the current code doesn’t seem to enable gcc to realise that I am using r10 for my needs. As a result, gcc doesn’t preserve r10 when it ought to, resulting in data aborts.

This patch serves three purposes-
1. Remove a number of the preprocessor macros in fft-ffmpeg_arm.h , in favour of static inlines, hopefully resulting in increased readability
2. Propagate address increments outside the scope of the local function (by simply returning the updated address pointer) for a slight reduction in stacking/unstacking and pointer arith (observed thru disassembly)
3. Reduce number of manually-selected specific register allocations. In particular, the register used for “t1” in the TRANSFORM_xxx macros does not need to be manually allocated if we make some small rearrangements. This also means the code no longer needs to specifically reference r10, which hopefully means gcc will track the PIC register correctly now.

Closed by  stripwax
2010-11-12 10:19
Reason for closing:  Accepted
Additional comments about closing:   Warning: Undefined array key "typography" in /home/rockbox/flyspray/plugins/dokuwiki/inc/parserutils.php on line 371 Warning: Undefined array key "camelcase" in /home/rockbox/flyspray/plugins/dokuwiki/inc/parserutils.php on line 407

submittied in r28262

sim seems to work fine still, so I think just need confirmation that the -fPIC problem has been resolved

is there any change in decoding speed?

I did not test rigorously (i.e. didn’t run test-codec) but didn’t observe any obvious boost% differences from “debug buffering thread”. Ideally these changes should actually increase decoding speed (fractionally) as I’m hard-coding fewer manual register allocations.

There actually do appear to be some additional (’loosely-documented’…) constraints for use with gcc inline asm - some of which sound fully appropriate to the class of stm/ldm problems. I hadn’t realised this before and it doesn’t seem to be universal knowledge.
http://hardwarebug.org/2010/07/06/arm-inline-asm-secrets/ If these work, it might be worth playing with “=M”-type constraints to simplify the code somewhat and (ideally) fully remove the manual register allocation. However, the info in the above link seems to disagree with the info in the ‘standard’ docs (below). If anybody has any insight/first-hand experience on this, would be great to know..
http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Machine-Constraints.html#Machine-Constraints

(actually, rereading that hardwarebug link, seems that the way to use this isn’t “=M” in the input/output/clobbers, but rather %M0, %M1 etc in the asm itself. “undocumented feature” indeed…)

Well the above patch didn’t seem to work successfully for me on an alternative arm platform (NSLU2, if you were wondering).

Updated patch attached. Also test_codec results suggest this is faster by about 2% compared to July data on the wiki. (I didn’t actually run a comparison of svn vs patched but will do that shortly)

I suspect the wiki data is stale. http://www.rockbox.org/wiki/CodecPerformanceComparison#Portal_Player_ARM7TDMI Without above patch, I get almost identical test_codec results as the file attached earlier.

commited .. but i’ll keep the patch tracker open to discuss any issues

I think this can be closed, the bug is fixed.

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing