Rockbox

Tasklist

FS#5995 - idct coldfire asm for H300 and X5

Attached to Project: Rockbox
Opened by karim boucher (mirak) - Thursday, 14 September 2006, 21:55 GMT
Last edited by Jens Arnold (amiconn) - Tuesday, 16 October 2007, 22:57 GMT
Task Type Patches
Category Plugins
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

this works and improve speed a lot.
This task depends upon

Closed by  Jens Arnold (amiconn)
Tuesday, 16 October 2007, 22:57 GMT
Reason for closing:  Accepted
Additional comments about closing:  Committed an all-asm idct for coldfire based on this patch.
Comment by Rani Hod (RaeNye) - Sunday, 17 September 2006, 14:10 GMT
Two problems:

1) negligible and easy to fix --
idct.c: In function `mpeg2_idct_add_c':
idct.c:1032: warning: unused variable `i'

2) when compiling for X5 with logf support, you get this --
idct.c: In function `idct':
idct.c:317: error: can't find a register in class `ADDR_REGS' while reloading `asm'
Comment by Rani Hod (RaeNye) - Sunday, 17 September 2006, 14:34 GMT
Well done!
I tested this on X5 and got ~22 FPS (!!)

P.S.
problem #1 above is a result of the #ifdef CPU_COLDFIRE; get rid of the compiler warning by having "(void)i;" there.
I couldn't fix problem #2, maybe it's something related to IRAM?

Comment by karim boucher (mirak) - Sunday, 17 September 2006, 22:25 GMT
I don't know what is logf support, but you get this message when gcc can't find a register to give to an input variable %[foo] because all registers are clobbered and none left.
Clobbered means gcc must take care to save them before the asm() function is called because you modifiy them.
It's the list of registers behind last dots ":"
Comment by Rani Hod (RaeNye) - Sunday, 17 September 2006, 22:43 GMT
when running the configure script, instead of (N)ormal build choose (D)eveloper and there enable logf support (=debug messages).
I'm not sure why this should change the asm function which obviously doesn't call logf(), but it needs to be fixed before committing.

BTW, how much FPS do you get on H300?
Comment by Will Robertson (aliask) - Monday, 18 September 2006, 02:27 GMT
I just tested this on my H300, and I'm getting 20-23 FPS in the 16:9 version of Elephant's dream. For the record I used to get 16-19 FPS, so this adds a good 4 FPS on to it.
Comment by karim boucher (mirak) - Monday, 18 September 2006, 08:16 GMT
http://download.rockbox.org/mpeg/elephantsdream-q6-176x96-229kbps.m2v on this one I have ~33 fps
http://download.rockbox.org/mpeg/elephantsdream-q6-224x176-469kbps.m2v 17fps on that one.
I have a 220*176 video running at 14 fps, the bitrate is 600kbs, I am not sure if it's a problem that the video width is not a multiple of 8.
My build have other arays in iram so it may be sligtly faster.
Comment by Rani Hod (RaeNye) - Wednesday, 20 September 2006, 13:32 GMT
amiconn suggested that problem #2 above happens because debug builds don't use -fomit-frame-pointer, so there's one less register available.

I fixed tools/configure such that devel build with logf support (but not debug build) will use -fomit-frame-pointer as well, but the problem still arises with debug builds (incl. simulator).

The Right Thing here would be to use the C version when DEBUG is #defined, I'll try it later.
Comment by karim boucher (mirak) - Wednesday, 20 September 2006, 17:05 GMT
idct_copy and idct add_should have a asm and a c version enabled by a ifdef.
Comment by Rani Hod (RaeNye) - Saturday, 23 September 2006, 01:01 GMT
Arg.
I tried to separate the CF asm functions to a separate idct_cf.S file, but the resulting .rock segfaults.
(See attached patch)

Anyway, you have to consider sim builds as well (#defines CPU_COLDFIRE but runs on x86..)
Comment by karim boucher (mirak) - Monday, 25 September 2006, 22:22 GMT
I added the ifdef and removed the ifdef inside the functions of copy and adds.

I don't know either how to create a .S file, but when it works it will be nice.

On this patch I added the "shortcut" used in the C version. This improve fps.
I also tried to always load datas with a movem from block and dest.
This led to some weird data permutations. I am sure that using movem improve perf for loading datas, however I am not sure yet of some byte swapping are worth to store the bytes to dest.

Since RAM is very slow, I told myself it was better to do some shifts in plus and store a long word instead of 4 bytes.
So I let the "standart" version in the code, I will test later.
Comment by Paul van der Heu (paulheu) - Tuesday, 09 January 2007, 15:52 GMT
patch breaks current cvs..
Comment by Norbert Preining (norbusan) - Tuesday, 09 January 2007, 15:59 GMT
Hi Paul!
Do you mean at compile time or the load problem with ibss segment full? If it is the load case you can remove the IBSS_ATTR attribute and it should work, but not as fast:
--- ./apps/plugins/mpegplayer/decode.c.orig 2007-01-02 16:41:09.000000000 +0100
+++ ./apps/plugins/mpegplayer/decode.c 2007-01-02 16:41:10.000000000 +0100
@@ -417,7 +417,7 @@
}

#if defined(CPU_COLDFIRE) && !defined(SIMULATOR)
-static mpeg2dec_t static_mpeg2dec IBSS_ATTR;
+static mpeg2dec_t static_mpeg2dec;
#endif

mpeg2dec_t * mpeg2_init (void)
Comment by Paul van der Heu (paulheu) - Tuesday, 09 January 2007, 20:06 GMT
I get an error when compiling that IRAM is full (or something like that). Applying the patch above did fix this, but if this is at a (speed) penalty is there no other solution?
Comment by Norbert Preining (norbusan) - Tuesday, 09 January 2007, 23:30 GMT
Hi Paul!
No it is not the perfect solution because it is (much? - I didn't test it) slower than the original one. Last time I followed the IRC #rockbox channel there was some discussion about how to make some memory management working, but this will need some time. In the meantime I include the fix, better than nothing.
Bye Norbert
Comment by Melvin Ambrosio (nivlem1001) - Sunday, 25 February 2007, 02:56 GMT
Hello!

Im a newbie here. How do I install the patch on my H340?
Comment by Dave Chapman (linuxstb) - Sunday, 25 March 2007, 20:34 GMT
karim,

Do you have any plans to go back to this work? It would be nice if you (or someone else) could resync it with current mpegplayer and prepare it for committing.
Comment by Michael Sevakis (MikeS) - Monday, 26 March 2007, 02:50 GMT
Looks like an interesting one to pick on when other items are out of the way.
Comment by Nick Brackley (darksaboteur) - Wednesday, 11 July 2007, 11:14 GMT
Is this patch maintained anymore? If so a sync would be much appreciated :D
Comment by Michael Sevakis (MikeS) - Tuesday, 18 September 2007, 00:04 GMT
Ok, somehow I've gotten into doing this for ARM. Will probably stick with the butterfly form there but initial results with changing the simpler routines looks promising.

There's much room for improvement in this patch since Coldfire can more efficiently 1) use emac multiplication to clamp outputs than it can use shifting. Better yet, 2) scale the emac routines to saturate themselves with no clamping stage by making all output left-justified. Coldfire core DSP uses 1), SPC codec uses 1) and 2). Another word of advice: avoid msac - it's dog slow - and mac the negative product.
Comment by karim boucher (mirak) - Tuesday, 18 September 2007, 13:37 GMT
hi, could you give exemples of the optimisation you are talking about ?

thanks

Loading...