Rockbox mail archive

Subject: Re: arrays & pointers

From: Nix <nix_at_esperi.org.uk>
Date: 2005-02-06

On Sun, 30 Jan 2005, Greg Haerr uttered the following:
>> DIR* pdir = opendirs;
>> for ( dd=0; dd<MAX_OPEN_DIRS; dd++, pdir++)
>> if ( !pdir->busy )
>> break;
>
> Since we're on the subject of generated code efficiency, I just
> had to jump in... A couple more comments:
>
> 1. With the pdir declaration above, since it's only
> used in the single for() statement, its initialization should be
> delayed until within the first expression of the for statement.

This is sometimes a benefit, but quite rarely.

> In this way, the compiler may be able to optimize its use
> to a register, and delay an unneeded store/load combo
> if there's code in between the declaration and for statement
> (which there is).

Alas, GCC's register allocator is far too stupid (and runs far too late)
to do things like that. (It's just about the last thing which runs,
while code motion runs far earlier and generally tries to move things
*up*, not down.)
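
For reference, here is a minimal sketch of the two forms under discussion. The `struct dir_slot` type and the value of `MAX_OPEN_DIRS` are assumptions made up for illustration; they are not the real Rockbox definitions. Both functions compute the same result, and per the point above GCC is unlikely to treat them differently.

```c
#define MAX_OPEN_DIRS 8            /* assumed value, for illustration only */

struct dir_slot {                  /* hypothetical stand-in for DIR */
    int busy;
};

static struct dir_slot opendirs[MAX_OPEN_DIRS];

/* Pointer initialised at its declaration, as in the quoted code. */
int find_free_early(void)
{
    struct dir_slot *pdir = opendirs;
    int dd;

    for (dd = 0; dd < MAX_OPEN_DIRS; dd++, pdir++)
        if (!pdir->busy)
            break;
    return dd;                     /* MAX_OPEN_DIRS if every slot is busy */
}

/* Pointer initialised in the first expression of the for statement. */
int find_free_late(void)
{
    struct dir_slot *pdir;
    int dd;

    for (dd = 0, pdir = opendirs; dd < MAX_OPEN_DIRS; dd++, pdir++)
        if (!pdir->busy)
            break;
    return dd;                     /* MAX_OPEN_DIRS if every slot is busy */
}
```

Comparing the output of `gcc -S` for the two functions is the simplest way to check whether the placement matters on a given target.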

> 2. Although some think my next comment a matter of style,
> too many compilers emit different code: dd++ vs ++dd.
> If ++dd is coded, and the processor has an increment
> memory opcode, this will always be emitted. If dd++
> is coded, and the compiler's optimization is enabled,
> this usually also happens, but otherwise a load/add 1/store
> is emitted, which is slower and longer.

This has not been true of any competent *C* compiler for at least a
decade. (C++ compilers are of course often required to emit different
code; if the user has overloaded both operators there is no choice!)

GCC in particular has been able to handle this since before I was
involved in it, probably since GCC 1.something.
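
A small sketch of why the two spellings need not differ in C: when the value of the increment expression is discarded, as in a for statement's third clause, `dd++` and `++dd` have the same observable effect, so the compiler is free to emit the same code. The function names below are made up for illustration.

```c
/* Sum the first n elements, stepping the index with post-increment. */
int sum_post(const int *a, int n)
{
    int dd, total = 0;

    for (dd = 0; dd < n; dd++)     /* value of dd++ is discarded */
        total += a[dd];
    return total;
}

/* The same loop with pre-increment; only the side effect is used. */
int sum_pre(const int *a, int n)
{
    int dd, total = 0;

    for (dd = 0; dd < n; ++dd)     /* value of ++dd is discarded */
        total += a[dd];
    return total;
}
```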

> 3. A lesser known evil: not declaring int as 'unsigned int'
> when the variable is ever used in a shift, modulo, divide,
> or pointer offset. For instance, there is a huge difference
> in the code generated by gcc for the following seemingly
> innocuous use of variable i as "int" vs "unsigned int":
>
> f(i>>3);
> a[i / 2]
> etc.
> If the var is not unsigned, gcc must sign extend and other
> madness before emitting more code.

On some architectures, GCC implements division by powers of two in terms
of shifting for speed reasons. But they're not the same operation in the
negative case: a plain arithmetic shift rounds towards negative infinity
where division truncates towards zero, so we have to cater for that; this
takes three shifts and an addition.
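
The difference can be sketched in C. This assumes a 32-bit `int` and that right-shifting a negative value is an arithmetic shift (implementation-defined in C, but it is what GCC does); the function names are hypothetical.

```c
/* A single arithmetic shift: rounds towards negative infinity,
   so it disagrees with i / 2 for odd negative i. */
int half_naive(int i)
{
    return i >> 1;
}

/* Three shifts and an addition: add 1 to negative values first,
   so that the final shift truncates towards zero like i / 2. */
int half_fixed(int i)
{
    int sign = i >> 31;                          /* 0, or -1 if i < 0  */
    int bias = (int)((unsigned int)sign >> 31);  /* 0, or 1 if i < 0   */

    return (i + bias) >> 1;
}

/* For unsigned values no fixup is needed: one logical shift is u / 2. */
unsigned int half_unsigned(unsigned int u)
{
    return u >> 1;
}
```

For example, `half_naive(-3)` yields -2 under these assumptions, whereas `half_fixed(-3)` and `-3 / 2` both yield -1.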

This transformation is done on the SH because the combination of
these operations is faster than a division would be. divs take 20 ticks
according to gcc-3.4.2/gcc/config/sh.c; three shifts and an addition
take much less time than that.

To suppress this transformation, use -Os, not -O1 or -O2, both of which
are designed for environments in which minimizing execution time is
preferred over minimizing size. With -Os, GCC makes the choice you're
after.

More details below.

With this simple testcase:

void f (int i);

void foo (int i)
{
  f(i / 2);
}

(and its analogue with `unsigned int')

We see this extra RTL being generated immediately before the shift with
GCC-3.4.3:

(insn 11 10 12 0 (parallel [
            (set (reg:SI 60)
                (ashiftrt:SI (reg/v:SI 58 [ i ])
                    (const_int 31 [0x1f])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 12 11 13 0 (parallel [
            (set (reg:SI 61)
                (lshiftrt:SI (reg:SI 60)
                    (const_int 31 [0x1f])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 13 12 14 0 (parallel [
            (set (reg:SI 62)
                (plus:SI (reg/v:SI 58 [ i ])
                    (reg:SI 61)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

[and then this, which is present in the unsigned case as well]

(insn 14 13 15 0 (parallel [
            (set (reg:SI 59)
                (ashiftrt:SI (reg:SI 62)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (expr_list:REG_EQUAL (div:SI (reg/v:SI 58 [ i ])
            (const_int 2 [0x2]))
        (nil)))

Compare to the code generated with -Os:

(insn 11 10 12 0 (set (reg:SI 62)
        (const_int 2 [0x2])) -1 (nil)
    (nil))

(insn 12 11 13 0 (parallel [
            (set (reg:SI 60)
                (div:SI (reg/v:SI 58 [ i ])
                    (reg:SI 62)))
            (set (reg:SI 61)
                (mod:SI (reg/v:SI 58 [ i ])
                    (reg:SI 62)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))
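
The -Os RTL above pairs a `div:SI` and a `mod:SI` in one parallel insn, that is, a single divide that yields both quotient and remainder. As a loose C-level analogy only (not a claim about what GCC emits for it), the standard library's `div()` exposes the same pairing; the wrapper name below is hypothetical.

```c
#include <stdlib.h>

/* Compute i / 2 and i % 2 together with a single div() call. */
void halve_with_remainder(int i, int *quot, int *rem)
{
    div_t r = div(i, 2);   /* quotient truncates towards zero */

    *quot = r.quot;
    *rem = r.rem;
}
```

For i = -7 this stores a quotient of -3 and a remainder of -1, matching `-7 / 2` and `-7 % 2` under C99 rules.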

> When I contributed
> the font code a few years back, I looked over all of it,
> but haven't taken another look lately for this big byte-saving
> fix.

The fix is to pass the right parameters to GCC. :)

-- 
`anybody who quotes Russ [Allbery] can be forgiven almost anything!'
                                                  -- Stephen J. Turnbull
_______________________________________________
http://cool.haxx.se/mailman/listinfo/rockbox
Received on Sun Feb 6 16:05:53 2005
