Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category Plugins
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by karim boucher - 2006-09-14
Last edited by Jens Arnold - 2007-10-16

FS#5995 - idct coldfire asm for H300 and X5

this works and improve speed a lot.

Closed by  Jens Arnold
2007-10-16 22:57
Reason for closing:  Accepted
Additional comments about closing:  

Committed an all-asm idct for coldfire based on this patch.

Rani Hod commented on 2006-09-17 14:10

Two problems:

1) negligible and easy to fix – idct.c: In function `mpeg2_idct_add_c’:
idct.c:1032: warning: unused variable `i’

2) when compiling for X5 with logf support, you get this – idct.c: In function `idct’:
idct.c:317: error: can’t find a register in class `ADDR_REGS’ while reloading `asm’

Rani Hod commented on 2006-09-17 14:34

Well done!
I tested this on X5 and got ~22 FPS (!!)

P.S.
problem #1 above is a result of the #ifdef CPU_COLDFIRE; get rid of the compiler warning by having “(void)i;” there.
I couldn’t fix problem #2, maybe it’s something related to IRAM?

karim boucher commented on 2006-09-17 22:25

I don’t know what is logf support, but you get this message when gcc can’t find a register to give to an input variable %[foo] because all registers are clobbered and none left.
Clobbered means gcc must take care to save them before the asm() function is called because you modifiy them.
It’s the list of registers behind last dots “:”

Rani Hod commented on 2006-09-17 22:43

when running the configure script, instead of (N)ormal build choose (D)eveloper and there enable logf support (=debug messages).
I’m not sure why this should change the asm function which obviously doesn’t call logf(), but it needs to be fixed before committing.

BTW, how much FPS do you get on H300?

Will Robertson commented on 2006-09-18 02:27

I just tested this on my H300, and I’m getting 20-23 FPS in the 16:9 version of Elephant’s dream. For the record I used to get 16-19 FPS, so this adds a good 4 FPS on to it.

karim boucher commented on 2006-09-18 08:16

http://download.rockbox.org/mpeg/elephantsdream-q6-176x96-229kbps.m2v on this one I have ~33 fps
http://download.rockbox.org/mpeg/elephantsdream-q6-224x176-469kbps.m2v 17fps on that one.
I have a 220*176 video running at 14 fps, the bitrate is 600kbs, I am not sure if it’s a problem that the video width is not a multiple of 8.
My build have other arays in iram so it may be sligtly faster.

Rani Hod commented on 2006-09-20 13:32

amiconn suggested that problem #2 above happens because debug builds don’t use -fomit-frame-pointer, so there’s one less register available.

I fixed tools/configure such that devel build with logf support (but not debug build) will use -fomit-frame-pointer as well, but the problem still arises with debug builds (incl. simulator).

The Right Thing here would be to use the C version when DEBUG is #defined, I’ll try it later.

karim boucher commented on 2006-09-20 17:05

idct_copy and idct add_should have a asm and a c version enabled by a ifdef.

Rani Hod commented on 2006-09-23 01:01

Arg.
I tried to separate the CF asm functions to a separate idct_cf.S file, but the resulting .rock segfaults.
(See attached patch)

Anyway, you have to consider sim builds as well (#defines CPU_COLDFIRE but runs on x86..)

karim boucher commented on 2006-09-25 22:22

I added the ifdef and removed the ifdef inside the functions of copy and adds.

I don’t know either how to create a .S file, but when it works it will be nice.

On this patch I added the “shortcut” used in the C version. This improve fps.
I also tried to always load datas with a movem from block and dest.
This led to some weird data permutations. I am sure that using movem improve perf for loading datas, however I am not sure yet of some byte swapping are worth to store the bytes to dest.

Since RAM is very slow, I told myself it was better to do some shifts in plus and store a long word instead of 4 bytes.
So I let the “standart” version in the code, I will test later.

Paul van der Heu commented on 2007-01-09 15:52

patch breaks current cvs..

Norbert Preining commented on 2007-01-09 15:59

Hi Paul!
Do you mean at compile time or the load problem with ibss segment full? If it is the load case you can remove the IBSS_ATTR attribute and it should work, but not as fast:
— ./apps/plugins/mpegplayer/decode.c.orig 2007-01-02 16:41:09.000000000 +0100
+++ ./apps/plugins/mpegplayer/decode.c 2007-01-02 16:41:10.000000000 +0100
@@ -417,7 +417,7 @@
}

#if defined(CPU_COLDFIRE) && !defined(SIMULATOR)
-static mpeg2dec_t static_mpeg2dec IBSS_ATTR;
+static mpeg2dec_t static_mpeg2dec;
#endif

mpeg2dec_t * mpeg2_init (void)

Paul van der Heu commented on 2007-01-09 20:06

I get an error when compiling that IRAM is full (or something like that). Applying the patch above did fix this, but if this is at a (speed) penalty is there no other solution?

Norbert Preining commented on 2007-01-09 23:30

Hi Paul!
No it is not the perfect solution because it is (much? - I didn’t test it) slower than the original one. Last time I followed the IRC #rockbox channel there was some discussion about how to make some memory management working, but this will need some time. In the meantime I include the fix, better than nothing.
Bye Norbert

Melvin Ambrosio commented on 2007-02-25 02:56

Hello!

Im a newbie here. How do I install the patch on my H340?

Dave Chapman commented on 2007-03-25 20:34

karim,

Do you have any plans to go back to this work? It would be nice if you (or someone else) could resync it with current mpegplayer and prepare it for committing.

Michael Sevakis commented on 2007-03-26 02:50

Looks like an interesting one to pick on when other items are out of the way.

Nick Brackley commented on 2007-07-11 11:14

Is this patch maintained anymore? If so a sync would be much appreciated :D

Michael Sevakis commented on 2007-09-18 00:04

Ok, somehow I’ve gotten into doing this for ARM. Will probably stick with the butterfly form there but initial results with changing the simpler routines looks promising.

There’s much room for improvement in this patch since Coldfire can more efficiently 1) use emac multiplication to clamp outputs than it can use shifting. Better yet, 2) scale the emac routines to saturate themselves with no clamping stage by making all output left-justified. Coldfire core DSP uses 1), SPC codec uses 1) and 2). Another word of advice: avoid msac - it’s dog slow - and mac the negative product.

karim boucher commented on 2007-09-18 13:37

hi, could you give exemples of the optimisation you are talking about ?

thanks

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing