Rockbox mail archive
Subject: Re: Various bug fixes and improvements - where to put?
From: Jens Arnold (arnold-j_at_t-online.de)
Date: 2004-03-02
Hello [IDC]Dragon
On 02.03.2004, you wrote:
>> as I have done various patches and improvements on different
>> low-level routines in Rockbox I would like to know where I
>> can put these in order to be revised and eventually
>> integrated into the main cvs.
> The patch tracker is the official way to post patches. I know
> it's difficult for the core developers to follow users'
> ...
> I'm very interested in your patches and like to see them
> there.
Ok, will file patches then (not now, has to wait until the
evening).
>> - corrected memory waste in "bitswap.s" and "descramble.s"
>> (simple .align issue)
> Wasting 2 bytes, or what? The alignment may be intentional,
> because the SH fetches two 16-bit instruction in one 32-bit
> bus cycle. The execution time may depend on this pairing.
Not 2 bytes, but more like 12 bytes (maximum) for each align.
I know well from the SH1 manual that memory load/store
instructions in IRAM should be long aligned in order to not
cause contention with instruction fetch. The problem is with the
syntax of the .align pseudo-instruction:
".align <n>" does not align to <n> bytes but to 2^<n> bytes,
so the ".align 4" which is used there aligns to 16 bytes and not
4. For longword alignment ".align 2" must be used.
This is _confirmed_ by assembler output of compiler (gcc -S
xxx.c), internet gcc tips and by comparing final Rockbox size.
>> - 20% faster bitswap
> How that? (OK, I'll wait for your patch posting)
You will see my new loop...
>> - new turbo-charged copy_read_sectors() in "ata.c", with
>> speed
>> relations as follows:
>> | word-aligned | unaligned
>> -------------+-------------------+--------------------
>> C original | 2.02 MB/s (100 %) | 1.71 MB/s (100 %)
>> [IDC]Dragon | 3.42 MB/s (169 %) | 2.22 MB/s (130 %)
>> new version | 4.76 MB/s (236 %) | 3.33 MB/s (195 %)
> Which "C original" is that? The current routines have already
> been optimized on C level by me (look into the cvs history of
> ata.c), so the gain could be even more drastic if you compare
> with the 2.1 code.
This is against _current_ "ata.c" code ("loop compiles to...)
> assembler code. Again. I'm interested in what you did. A bug
> (?) in the inline assembler prevented me from further
> unrolling the copy loop. I did not manage to do stores with an
> immediate offset, so I had to use r0 for that, and this is
> only good for one level of unrolling.
I _do_ actually use stores with immediate offsets! You have to
obey the constraints of these inctructions:
- the source register must always be r0
- the offset can _only_ be positive (no sign extension) and very
small.
- with the inline assembler syntax, you have to give the offset
already shifted according to data width, e.g. if you want to
store a word 2 bytes ahead, you have to write
"mov.w r0,@(2,r<n>)"
If you use "mov.w r0,@(1,r<n>)" you get an assemble error.
>> - the take-out of cli() / sti() and replacement of these by
>> set_irq_level() made disk loading in the video player
>> extremely
>> slow.
>> I don't understand what the intention in replacing cli() /
>> sti() by set_irq_level() really is. The comment says that
>> cli() / sti() are not safe.
> I didn't understand this either, and it seems I have not yet
> compiled this into the version I'm running, else I may have
> noticed the drag. They should be inline functions.
> For the LCD code, we can take out the interrupt disable (like
> my comment says), I only put this in to be safe for the
I know your comment, but I wanted to retain your safety measure.
Please let me know what you think about my idea about nesting
cli() / sti().
Regards, Jens
_______________________________________________
http://cool.haxx.se/mailman/listinfo/rockbox
Page was last modified "Mar 9 2008" The Rockbox Crew
|