This is the bug/patch tracker for Rockbox. Click here for more information.
Quick links: Bugs · Patches · Rockbox frontpage
FS#11666 - codeclib arm hand-tuned assembly in fft-ffmpeg doesn't work on Android/-PIC builds
Attached to Project:
Rockbox
Opened by Dave Hooper (stripwax) - Tuesday, 12 October 2010, 01:53 GMT+2
Last edited by Dave Hooper (stripwax) - Friday, 12 November 2010, 11:19 GMT+2
Opened by Dave Hooper (stripwax) - Tuesday, 12 October 2010, 01:53 GMT+2
Last edited by Dave Hooper (stripwax) - Friday, 12 November 2010, 11:19 GMT+2
|
DetailsSee mailing list thread entitled "Segfault with FasterMDCT patch and -fPIC" for more details.
Background: The hand-rolled asm in fft-ffmpeg_arm.h uses a combination of preprocessor macros and individual named register selection, in order to enable most use of stm/ldm (requiring register ordering constraints) without writing all of the asm manually - in theory enabling gcc to still make register allocations, local optimisations, etc. However, it doesn't seem to work very well. In particular, using -fPIC in a regular ARM environment means that gcc needs to use r10 for its own needs, and the current code doesn't seem to enable gcc to realise that I am using r10 for my needs. As a result, gcc doesn't preserve r10 when it ought to, resulting in data aborts. This patch serves three purposes- 1. Remove a number of the preprocessor macros in fft-ffmpeg_arm.h , in favour of static inlines, hopefully resulting in increased readability 2. Propagate address increments outside the scope of the local function (by simply returning the updated address pointer) for a slight reduction in stacking/unstacking and pointer arith (observed thru disassembly) 3. Reduce number of manually-selected specific register allocations. In particular, the register used for "t1" in the TRANSFORM_xxx macros does not need to be manually allocated if we make some small rearrangements. This also means the code no longer needs to specifically reference r10, which hopefully means gcc will track the PIC register correctly now. |
This task depends upon
Closed by Dave Hooper (stripwax)
Friday, 12 November 2010, 11:19 GMT+2
Reason for closing: Accepted
Additional comments about closing: submittied in r28262
Friday, 12 November 2010, 11:19 GMT+2
Reason for closing: Accepted
Additional comments about closing: submittied in r28262
There actually do appear to be some additional ('loosely-documented'...) constraints for use with gcc inline asm - some of which sound fully appropriate to the class of stm/ldm problems. I hadn't realised this before and it doesn't seem to be universal knowledge.
http://hardwarebug.org/2010/07/06/arm-inline-asm-secrets/
If these work, it might be worth playing with "=M"-type constraints to simplify the code somewhat and (ideally) fully remove the manual register allocation. However, the info in the above link seems to disagree with the info in the 'standard' docs (below). If anybody has any insight/first-hand experience on this, would be great to know..
http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Machine-Constraints.html#Machine-Constraints
Updated patch attached. Also test_codec results suggest this is faster by about 2% compared to July data on the wiki. (I didn't actually run a comparison of svn vs patched but will do that shortly)
Without above patch, I get almost identical test_codec results as the file attached earlier.