FS#5995 - idct coldfire asm for H300 and X5
Attached to Project:
Rockbox
Opened by karim boucher (mirak) - Thursday, 14 September 2006, 23:55 GMT+2
Last edited by Jens Arnold (amiconn) - Wednesday, 17 October 2007, 00:57 GMT+2
Details: This works and improves speed a lot.
Closed by Jens Arnold (amiconn)
Wednesday, 17 October 2007, 00:57 GMT+2
Reason for closing: Accepted
Additional comments about closing: Committed an all-asm idct for coldfire based on this patch.
1) negligible and easy to fix --
idct.c: In function `mpeg2_idct_add_c':
idct.c:1032: warning: unused variable `i'
2) when compiling for X5 with logf support, you get this --
idct.c: In function `idct':
idct.c:317: error: can't find a register in class `ADDR_REGS' while reloading `asm'
I tested this on X5 and got ~22 FPS (!!)
P.S.
Problem #1 above is a result of the #ifdef CPU_COLDFIRE; adding "(void)i;" there gets rid of the compiler warning.
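A minimal sketch of the "(void)i;" fix, assuming the variable is only used by the C path. CPU_COLDFIRE is the macro from the report; sum_block() and its body are hypothetical stand-ins, not the real idct code.

```c
/* Sketch only: sum_block() is a made-up stand-in for a function that has
 * an asm path (CPU_COLDFIRE) and a plain C path sharing one local `i'. */
int sum_block(const short *block, int n)
{
    int i;
    int sum = 0;
#ifdef CPU_COLDFIRE
    /* This path never touches `i', so gcc would warn about it.
     * Casting to void marks the variable as deliberately unused. */
    (void)i;
    while (n-- > 0)          /* stand-in for the asm implementation */
        sum += *block++;
#else
    for (i = 0; i < n; i++)  /* C path: `i' is used, no warning */
        sum += block[i];
#endif
    return sum;
}
```

Both paths return the same value, so only the warning changes.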
I couldn't fix problem #2, maybe it's something related to IRAM?
Clobbered means gcc must take care not to keep live values in those registers across the asm() statement (saving them first if needed), because you modify them.
It's the list of registers after the last colon ":".
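To show where the clobber list sits, here is a small GCC extended-asm sketch. The template is left empty so it builds on any target; bump() is a hypothetical name. In the Coldfire code the list would instead name the data/address registers the asm overwrites (for example "d0" or "a0"), which is an assumption about that code, not a quote from the patch.

```c
/* Sketch of GCC extended asm layout:
 *   asm ( template : outputs : inputs : clobbers );
 * bump() is hypothetical; the empty template emits no instructions. */
int bump(int x)
{
    __asm__ volatile (
        ""                 /* instruction template (empty here)          */
        : "+r" (x)         /* outputs: x is read and written             */
        :                  /* inputs: none                               */
        : "memory", "cc"   /* clobbers: things the asm changes that are  */
                           /* not listed as outputs                      */
    );
    return x + 1;
}
```

Every register named in the clobber list is one gcc can no longer hand out for the operands of that asm statement.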
I'm not sure why this should change the asm function which obviously doesn't call logf(), but it needs to be fixed before committing.
BTW, how many FPS do you get on H300?
http://download.rockbox.org/mpeg/elephantsdream-q6-224x176-469kbps.m2v 17 fps on that one.
I have a 220*176 video running at 14 fps; the bitrate is 600 kbps. I am not sure whether it's a problem that the video width is not a multiple of 8.
My build has other arrays in IRAM, so it may be slightly faster.
I fixed tools/configure so that devel builds with logf support (but not debug builds) will use -fomit-frame-pointer as well, but the problem still arises with debug builds (incl. simulator).
The Right Thing here would be to use the C version when DEBUG is #defined, I'll try it later.
I tried to move the CF asm functions into a separate idct_cf.S file, but the resulting .rock segfaults.
(See attached patch)
Anyway, you have to consider sim builds as well (#defines CPU_COLDFIRE but runs on x86..)
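The two constraints raised in this thread (fall back to C when DEBUG is defined, and never use the asm in simulator builds) could be folded into one guard. This is only a sketch: CPU_COLDFIRE, SIMULATOR and DEBUG are the macros named above, while IDCT_USE_CF_ASM and idct_impl_name() are hypothetical names introduced for illustration.

```c
/* Sketch: pick the Coldfire asm only on real target, non-debug builds.
 * IDCT_USE_CF_ASM and idct_impl_name() are made-up names. */
#if defined(CPU_COLDFIRE) && !defined(SIMULATOR) && !defined(DEBUG)
#define IDCT_USE_CF_ASM 1
#else
#define IDCT_USE_CF_ASM 0
#endif

/* Reports which implementation the guard selected at compile time. */
const char *idct_impl_name(void)
{
#if IDCT_USE_CF_ASM
    return "coldfire-asm";
#else
    return "c";
#endif
}
```

Keeping the condition in one macro means the asm and C paths cannot drift apart when another exclusion has to be added later.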
I don't know how to create a .S file either, but it will be nice once it works.
In this patch I added the "shortcut" used in the C version. This improves fps.
I also tried always loading data with a movem from block and dest.
This led to some weird data permutations. I am sure that using movem improves performance for loading data; however, I am not yet sure whether the byte swapping needed to store the bytes to dest is worth it.
Since RAM is very slow, I figured it was better to do a few extra shifts and store one long word instead of 4 bytes.
So I left the "standard" version in the code; I will test later.
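The "shift a bit more, store one long word" idea can be sketched in portable C as follows. clip_u8() and pack4() are hypothetical helpers, not functions from the patch; the byte order (first pixel in the most significant byte) assumes a big-endian target such as Coldfire.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an idct output to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: combine four clamped pixels into one 32-bit word
 * so they can be written with a single long store instead of four byte
 * stores. p0 lands in the most significant byte, which is the lowest
 * address on a big-endian CPU. */
uint32_t pack4(int p0, int p1, int p2, int p3)
{
    return ((uint32_t)clip_u8(p0) << 24) |
           ((uint32_t)clip_u8(p1) << 16) |
           ((uint32_t)clip_u8(p2) << 8)  |
            (uint32_t)clip_u8(p3);
}
```

The trade is three shifts and three ORs per word against three fewer memory writes, which is the bet being made when RAM is the bottleneck.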
Do you mean at compile time, or the load problem with the ibss segment being full? If it is the load case, you can remove the IBSS_ATTR attribute and it should work, but not as fast:
--- ./apps/plugins/mpegplayer/decode.c.orig 2007-01-02 16:41:09.000000000 +0100
+++ ./apps/plugins/mpegplayer/decode.c 2007-01-02 16:41:10.000000000 +0100
@@ -417,7 +417,7 @@
}
#if defined(CPU_COLDFIRE) && !defined(SIMULATOR)
-static mpeg2dec_t static_mpeg2dec IBSS_ATTR;
+static mpeg2dec_t static_mpeg2dec;
#endif
mpeg2dec_t * mpeg2_init (void)
No, it is not the perfect solution, because it is slower (much? I didn't test it) than the original one. Last time I followed the #rockbox IRC channel there was some discussion about how to get some memory management working, but this will need some time. In the meantime I include the fix; better than nothing.
Bye Norbert
I'm a newbie here. How do I install the patch on my H340?
Do you have any plans to go back to this work? It would be nice if you (or someone else) could resync it with current mpegplayer and prepare it for committing.
There's much room for improvement in this patch, since Coldfire can 1) use emac multiplication to clamp outputs more efficiently than it can use shifting. Better yet, 2) scale the emac routines to saturate themselves with no clamping stage by making all output left-justified. The Coldfire core DSP uses 1); the SPC codec uses 1) and 2). Another word of advice: avoid msac (it's dog slow) and mac the negative product instead.
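The left-justified idea in 2) can be illustrated in plain C by emulating a saturating 32-bit accumulator. This is not EMAC code and not taken from the Rockbox sources; sat32() and clamp_s8_by_saturation() are hypothetical names, and a signed 8-bit output range is assumed to keep the sketch short.

```c
#include <stdint.h>

/* Emulates what a saturating 32-bit accumulator does on overflow. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Clamp v to [-128, 127] with no compare on the final result:
 * scale v so the 8 wanted bits occupy the top of the 32-bit word
 * (left-justified), let saturation clip it, then scale back down. */
int clamp_s8_by_saturation(int v)
{
    const int64_t scale = (int64_t)1 << 24;   /* 32 - 8 bits of headroom */
    int32_t acc = sat32((int64_t)v * scale);  /* clips if out of range   */
    return (int)(acc / scale);                /* back to 8-bit units     */
}
```

On hardware the scaling would be folded into the multiply constants, so the clip comes for free with the last accumulate; here the multiply and divide only stand in for that.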
thanks