• Status Closed
  • Percent Complete
  • Task Type Patches
  • Category Plugins
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by mirak - 2006-09-14
Last edited by amiconn - 2007-10-16

FS#5995 - idct coldfire asm for H300 and X5

this works and improve speed a lot.

Closed by  amiconn
2007-10-16 22:57
Reason for closing:  Accepted
Additional comments about closing:   Warning: Undefined array key "typography" in /home/rockbox/flyspray/plugins/dokuwiki/inc/parserutils.php on line 371 Warning: Undefined array key "camelcase" in /home/rockbox/flyspray/plugins/dokuwiki/inc/parserutils.php on line 407

Committed an all-asm idct for coldfire based on this patch.

Two problems:

1) negligible and easy to fix – idct.c: In function `mpeg2_idct_add_c’:
idct.c:1032: warning: unused variable `i’

2) when compiling for X5 with logf support, you get this – idct.c: In function `idct’:
idct.c:317: error: can’t find a register in class `ADDR_REGS’ while reloading `asm’

Well done!
I tested this on X5 and got ~22 FPS (!!)

problem #1 above is a result of the #ifdef CPU_COLDFIRE; get rid of the compiler warning by having “(void)i;” there.
I couldn’t fix problem #2, maybe it’s something related to IRAM?

mirak commented on 2006-09-17 22:25

I don’t know what is logf support, but you get this message when gcc can’t find a register to give to an input variable %[foo] because all registers are clobbered and none left.
Clobbered means gcc must take care to save them before the asm() function is called because you modifiy them.
It’s the list of registers behind last dots “:”

when running the configure script, instead of (N)ormal build choose (D)eveloper and there enable logf support (=debug messages).
I’m not sure why this should change the asm function which obviously doesn’t call logf(), but it needs to be fixed before committing.

BTW, how much FPS do you get on H300?

I just tested this on my H300, and I’m getting 20-23 FPS in the 16:9 version of Elephant’s dream. For the record I used to get 16-19 FPS, so this adds a good 4 FPS on to it.

mirak commented on 2006-09-18 08:16 on this one I have ~33 fps 17fps on that one.
I have a 220*176 video running at 14 fps, the bitrate is 600kbs, I am not sure if it’s a problem that the video width is not a multiple of 8.
My build have other arays in iram so it may be sligtly faster.

amiconn suggested that problem #2 above happens because debug builds don’t use -fomit-frame-pointer, so there’s one less register available.

I fixed tools/configure such that devel build with logf support (but not debug build) will use -fomit-frame-pointer as well, but the problem still arises with debug builds (incl. simulator).

The Right Thing here would be to use the C version when DEBUG is #defined, I’ll try it later.

mirak commented on 2006-09-20 17:05

idct_copy and idct add_should have a asm and a c version enabled by a ifdef.

I tried to separate the CF asm functions to a separate idct_cf.S file, but the resulting .rock segfaults.
(See attached patch)

Anyway, you have to consider sim builds as well (#defines CPU_COLDFIRE but runs on x86..)

mirak commented on 2006-09-25 22:22

I added the ifdef and removed the ifdef inside the functions of copy and adds.

I don’t know either how to create a .S file, but when it works it will be nice.

On this patch I added the “shortcut” used in the C version. This improve fps.
I also tried to always load datas with a movem from block and dest.
This led to some weird data permutations. I am sure that using movem improve perf for loading datas, however I am not sure yet of some byte swapping are worth to store the bytes to dest.

Since RAM is very slow, I told myself it was better to do some shifts in plus and store a long word instead of 4 bytes.
So I let the “standart” version in the code, I will test later.

patch breaks current cvs..

Hi Paul!
Do you mean at compile time or the load problem with ibss segment full? If it is the load case you can remove the IBSS_ATTR attribute and it should work, but not as fast:
— ./apps/plugins/mpegplayer/decode.c.orig 2007-01-02 16:41:09.000000000 +0100
+++ ./apps/plugins/mpegplayer/decode.c 2007-01-02 16:41:10.000000000 +0100
@@ -417,7 +417,7 @@

#if defined(CPU_COLDFIRE) && !defined(SIMULATOR)
-static mpeg2dec_t static_mpeg2dec IBSS_ATTR;
+static mpeg2dec_t static_mpeg2dec;

mpeg2dec_t * mpeg2_init (void)

I get an error when compiling that IRAM is full (or something like that). Applying the patch above did fix this, but if this is at a (speed) penalty is there no other solution?

Hi Paul!
No it is not the perfect solution because it is (much? - I didn’t test it) slower than the original one. Last time I followed the IRC #rockbox channel there was some discussion about how to make some memory management working, but this will need some time. In the meantime I include the fix, better than nothing.
Bye Norbert


Im a newbie here. How do I install the patch on my H340?


Do you have any plans to go back to this work? It would be nice if you (or someone else) could resync it with current mpegplayer and prepare it for committing.

MikeS commented on 2007-03-26 02:50

Looks like an interesting one to pick on when other items are out of the way.

Is this patch maintained anymore? If so a sync would be much appreciated :D

MikeS commented on 2007-09-18 00:04

Ok, somehow I’ve gotten into doing this for ARM. Will probably stick with the butterfly form there but initial results with changing the simpler routines looks promising.

There’s much room for improvement in this patch since Coldfire can more efficiently 1) use emac multiplication to clamp outputs than it can use shifting. Better yet, 2) scale the emac routines to saturate themselves with no clamping stage by making all output left-justified. Coldfire core DSP uses 1), SPC codec uses 1) and 2). Another word of advice: avoid msac - it’s dog slow - and mac the negative product.

mirak commented on 2007-09-18 13:37

hi, could you give exemples of the optimisation you are talking about ?



Available keyboard shortcuts


Task Details

Task Editing