Rockbox mail archive
Subject: Re: Possible optimizations for coldfire
From: Jens Arnold <arnold-j_at_t-online.de>
Date: Thu, 20 Apr 2006 00:55:55 +0200
On 19.04.2006, RaeNye wrote:
> 1. Currently each PutPixel costs 2 mem reads, 4 mem writes and
> some shifts (all because of the 18-bit thingy). Would
What do you mean by PutPixel? There is no such function, and
drawing a pixel just requires one 16-bit read and one 16-bit write
on all 16-bit colour targets. The X5 framebuffer is 16-bit.
> double-buffering help? i.e., keep another LCD buffer (in DRAM)
> representing the state of the LCD /now/;
> whenever lcd_update() is called, we compare the 16-bit
> (32-bit?) pixel value and only update it and the hardware if
> necessary. Obvious con: memory.
> How can this be profiled, btw?
I would expect this to be slower, for several reasons. First,
you would have to read 2 values from RAM instead of one, and
since there's definitely not enough IRAM for 2 buffers at least
one would be in slooow SDRAM.
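To make the cost argument concrete, here is a minimal C sketch of the proposed double-buffering scheme. It is not Rockbox code, and everything in it is an assumption for illustration: the display size, the frame buffer `fb`, the hypothetical shadow copy `shadow` (what the LCD shows now), and the hypothetical `lcd_write_pixel()` standing in for the hardware transfer. Every pixel costs two reads even when nothing changed, so the scheme only pays off if a hardware write is much more expensive than the extra read from SDRAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical display size; not tied to any real target. */
#define LCD_W 160
#define LCD_H 128

static uint16_t fb[LCD_H * LCD_W];     /* frame buffer the GUI draws into */
static uint16_t shadow[LCD_H * LCD_W]; /* hypothetical copy of what the LCD shows */

static size_t hw_writes; /* counts simulated hardware transfers */

/* Hypothetical stand-in for pushing one pixel to the LCD controller. */
static void lcd_write_pixel(size_t index, uint16_t value)
{
    (void)index;
    (void)value;
    hw_writes++;
}

/* Diff-based update: two reads per pixel, hardware write only on change. */
static void lcd_update_diff(void)
{
    for (size_t i = 0; i < (size_t)LCD_W * LCD_H; i++) {
        uint16_t now = fb[i];   /* read 1: frame buffer */
        if (now != shadow[i]) { /* read 2: shadow buffer */
            shadow[i] = now;
            lcd_write_pixel(i, now);
        }
    }
}
```

After changing one pixel in `fb`, the first call performs a single hardware write and a second call performs none, yet both calls still read all 2 x 160 x 128 values.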
My idea was to add burst reading with movem, although that
probably won't help much since the framebuffer is currently in
IRAM. That might change later, if we want other stuff that's
even more time-critical to reside in IRAM, and then burst
reading really pays off; cf. the H300 LCD driver.
> 2. I might be terribly wrong, but can we shut down all GUI
> code when the backlight is off?
> Yes, this means WPS refreshes, scrolling, etc.
No, we can't do that in general. We could perhaps do it in
certain modules like the WPS, but these modules would then have
to be made aware of the LCD status: they would need to redraw
everything whenever the light goes on.
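One way a module could be made LCD-status aware is sketched below in C. All names (`backlight_on`, `wps_refresh()`, `wps_full_redraw()`, `wps_partial_update()`) are hypothetical and invented for this illustration; they are not Rockbox's actual API. The module skips drawing while the light is off, remembers that the screen went stale, and repaints everything on the first refresh after the light comes back on.

```c
#include <stdbool.h>

/* Hypothetical backlight state; a real module would query the driver. */
static bool backlight_on = true;

static bool needs_full_redraw = false; /* set while updates are skipped */
static int full_redraws = 0;           /* counters for illustration only */
static int partial_updates = 0;

/* Hypothetical full repaint of every screen element. */
static void wps_full_redraw(void)
{
    full_redraws++;
}

/* Hypothetical incremental update (progress bar, scrolling text, ...). */
static void wps_partial_update(void)
{
    partial_updates++;
}

/* Called periodically by the hypothetical WPS loop. */
static void wps_refresh(void)
{
    if (!backlight_on) {
        /* Light is off: skip the work, but remember the screen is stale. */
        needs_full_redraw = true;
        return;
    }
    if (needs_full_redraw) {
        /* Light just came back on: everything may be out of date. */
        wps_full_redraw();
        needs_full_redraw = false;
    } else {
        wps_partial_update();
    }
}
```

The extra state is small, but every module that wants this behaviour needs its own stale flag and a full-redraw path, which is why it cannot simply be switched on for all GUI code.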
> 3. GNU's memcpy() and memset() are not using all possible
> registers (i.e. movem.l does only 16 bytes writes instead of
> the possible 48). It also spends so much time on alignment
> (which is not necessary for movem.l, IIRC).
Rockbox's memcpy() and memset() for coldfire are not GNU; they
are our own, coded by me. There are numerous reasons why they
are designed the way they are.
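For readers unfamiliar with the pattern, here is a minimal portable C sketch of an align-then-bulk fill. It is not the routine described here; the name `fill_aligned` is hypothetical, and on a host compiler the stores are ordinary stores, not guaranteed line bursts. The sketch writes single bytes until the destination reaches a 16-byte boundary, then fills whole 16-byte lines with four 32-bit stores each, and finishes the tail byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical align-then-bulk fill, illustrating the structure only. */
static void *fill_aligned(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    unsigned char byte = (unsigned char)c;

    /* Head: byte stores until d sits on a 16-byte (line) boundary. */
    while (n > 0 && ((uintptr_t)d & 15u) != 0) {
        *d++ = byte;
        n--;
    }

    /* Bulk: whole 16-byte lines as four 32-bit stores each. memcpy avoids
       aliasing problems; compilers typically emit one store per call. */
    uint32_t word = byte * 0x01010101u;
    while (n >= 16) {
        memcpy(d, &word, 4);
        memcpy(d + 4, &word, 4);
        memcpy(d + 8, &word, 4);
        memcpy(d + 12, &word, 4);
        d += 16;
        n -= 16;
    }

    /* Tail: the remaining 0..15 bytes. */
    while (n > 0) {
        *d++ = byte;
        n--;
    }
    return dst;
}
```

The head and tail loops are the "wasted" alignment work the question refers to; they touch at most 15 bytes each, while everything in between uses aligned wide stores.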
The alignment isn't a waste at all; it's a huge speed boost.
Check the coldfire manual on the memory controller and burst
access. The thing is, while the coldfire doesn't strictly enforce
alignment, unaligned accesses carry severe performance
penalties. The memory controller can do 1 (byte), 2 (word),
4 (longword) and 16 byte (line) bursts, but an access must be
aligned to a <size> boundary to be a burst access. Let's start
with a line burst (16 bytes). Misalign that by 4 bytes, and
you're at longword burst level. Speed penalty: a *factor* of
2.5! Misalign by 2 bytes, and you're down to word accesses:
another factor of 2. Misalign by 1 byte, and you are at a level
where accesses go byte-word-byte: another factor of 1.5...
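The alignment rules and the approximate penalty figures above can be captured in a few lines of C. This models the numbers given in this message; it is not a measurement and not Rockbox code, and `best_access_size()` and `relative_cost()` are hypothetical helpers. Relative to an aligned line burst, the factors compound: 2.5 for longword, 2.5 × 2 = 5 for word, and 5 × 1.5 = 7.5 for the byte-word-byte case.

```c
#include <stdint.h>

/* Largest burst the alignment of addr permits: 16 (line), 4, 2 or 1 bytes. */
static unsigned best_access_size(uintptr_t addr)
{
    if ((addr & 15u) == 0) return 16;
    if ((addr & 3u) == 0)  return 4;
    if ((addr & 1u) == 0)  return 2;
    return 1; /* odd address: accesses split into byte-word-byte */
}

/* Approximate slowdown versus an aligned line burst, per the figures above. */
static double relative_cost(unsigned access_size)
{
    switch (access_size) {
    case 16: return 1.0;
    case 4:  return 2.5;             /* factor 2.5 */
    case 2:  return 2.5 * 2.0;       /* another factor 2 -> 5 */
    default: return 2.5 * 2.0 * 1.5; /* another factor 1.5 -> 7.5 */
    }
}
```

For example, an address 8 bytes past a line boundary is still longword-aligned, so it lands in the 2.5 bucket rather than a worse one.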
> The iAudio firmware contains smaller and faster versions,
> which *I* cannot contribute to Rockbox -- as that would be
> considered code theft -- but you can write as you only heard
> the general idea :)
Smaller - sure, our versions are quite large (at least memcpy
and memmove). Faster - I strongly doubt that, see above.
Received on 2006-04-20