Rockbox

  • Status New
  • Percent Complete 0%
  • Task Type Patches
  • Category Infrastructure → Build environment
  • Assigned To No-one
  • Operating System HW-codec
  • Severity Low
  • Priority Very Low
  • Reported Version Daily build (which?)
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by Boris Gjenero - 2011-12-08

FS#12431 - SH gcc 4.6.2 with link-time optimization, for Archos targets

I'm now able to build a working copy of Rockbox r31177 for my Archos Recorder V2 using binutils 2.21.1 and gcc 4.6.2, with -Os -flto. The main advantages are a 7 KB decrease in binary size and memory use, and automatic discarding of unused code; the main disadvantage is much slower linking. I don't know if this is worth it.

The new binutils is needed because link-time optimization of object files stored in library archives, like libfirmware.a, requires a linker plugin. gcc detects linker plugin support automatically, so there's no need for -fuse-linker-plugin.

The attached gcc patch is based on the current gcc-4.0.3-rockbox-1.diff by Jens Arnold (amiconn). I still need to investigate whether the workaround in gcc/config/sh/sh.h is actually needed. Including it shouldn't cause any problems. You can find info about it in IRC logs around this date: http://www.rockbox.org/irc/rockbox-20060427.txt

The attached Rockbox patch changes rockboxdev.sh to build this toolchain, changes configure to add -flto for gcc 4.6.0 and above, and changes various other things so Rockbox builds properly. The gcc patch can't be downloaded automatically by rockboxdev.sh, so put it in the download directory, which is /tmp/rbdev-dl by default. Note that configure will only use -Os if it finds "rockbox" in the sh-elf-gcc version string, so if you want to try an unpatched gcc, you need to edit configure or the generated Makefile.
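
The two configure decisions described above can be modelled as a small C sketch. This is a hypothetical illustration, not the actual configure logic: the function names use_os() and use_flto() are invented here, and it assumes the compiler version is available as a dotted string (as printed by sh-elf-gcc -dumpversion), while the "rockbox" check is applied to the full version banner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the -Os check: only a toolchain whose version
   banner contains "rockbox" (the patched gcc) gets -Os. */
static bool use_os(const char *version_banner)
{
    return strstr(version_banner, "rockbox") != NULL;
}

/* Hypothetical model of the -flto check: enable it for gcc 4.6.0 and
   above. Accepts "4.6.2", "4.6" or a bare major version such as "7";
   if no minor number is present, minor stays 0. */
static bool use_flto(const char *dotted_version)
{
    int major = 0, minor = 0;

    if (sscanf(dotted_version, "%d.%d", &major, &minor) < 1)
        return false; /* unparseable: leave -flto off */
    return major > 4 || (major == 4 && minor >= 6);
}
```

Under this model an unpatched gcc 4.6.2 would get -flto but not -Os, which is why trying one means editing configure or the generated Makefile.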

Most of the code changes simply add __attribute__((used)) to stuff that gcc -flto would otherwise throw away. When C code is only referenced by assembler code, gcc will throw it away. This even happens for references from inline assembler in the same C file. Functions in apps/plugins/lib/gcc-support.c were also getting discarded, resulting in "defined in discarded section" errors.

Link-time optimization shuffles code around and then divides it into several large assembler files. (Note how rockbox.map shows a bunch of .ltrans.o files instead of the normal .o files.) Code from the same source file may end up in different assembler files. This is why the "bsr _UIE" couldn't reach UIE(), and why .global is needed for _start_thread and _UIE4.

Various little notes: I see no improvement with GLOBAL_LDFLAGS=-fwhole-program, so gcc must be detecting that properly. Adding -ffunction-sections -Wl,--gc-sections also doesn't help. The patch doesn't fix some new warnings from gcc 4.6.2, but there are only a few, and they should be easy to deal with. It also doesn't make the changes needed for -flto on other targets. Without -flto, gcc 4.6.2 generates a binary that's 3 KB bigger than the gcc 4.0.3 binary.

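
A minimal host-side sketch of the two kinds of fixes described above. The names demo_irq_handler, demo_irq_count and demo_asm_word are hypothetical, not Rockbox symbols. The first part marks a function that only assembler code would call with __attribute__((used)), which per the GCC manual makes gcc emit it even when it sees no reference. The second part defines a label in top-level assembler and marks it .global: after LTO partitioning, the C code that refers to the label may land in a different .ltrans assembler file, and a non-global label is only visible inside its own file. The asm label on the extern declaration pins the symbol name, sidestepping the leading underscore sh-elf adds to C names (as in _UIE).

```c
#include <assert.h>

/* Hypothetical counter so the handler has an observable effect. */
static int demo_irq_count;

/* In a real port this would only be called from assembler (for example
   from an interrupt vector). gcc does not look inside asm text for
   references, so without 'used' it may discard the function under -flto. */
__attribute__((used)) void demo_irq_handler(void)
{
    demo_irq_count++;
}

/* A 32-bit data word defined in top-level assembler. '.global' keeps the
   label visible to code that ends up in another object (LTO partition). */
__asm__(".pushsection .data\n"
        ".balign 4\n"
        ".global demo_asm_word\n"
        "demo_asm_word:\n"
        ".long 42\n"
        ".popsection\n");

/* The asm label fixes the symbol name, so no target-specific leading
   underscore is added (sh-elf would otherwise look for _demo_asm_word). */
extern const int demo_asm_word __asm__("demo_asm_word");
```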
Thomas Martitz commented on 2011-12-08 07:48

Should probably add USED_ATTR to gcc_extensions.h. It's used in a number of places now.

Nils Wallménius commented on 2011-12-08 16:22

How large is the difference in compile time? Also, have you tried -fno-fat-lto-objects?

Boris Gjenero commented on 2011-12-09 18:30

I committed USED_ATTR support in r31188. An updated patch using USED_ATTR is attached.
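
A sketch of what a USED_ATTR wrapper in gcc_extensions.h might look like. This is an assumption for illustration; the definition committed in r31188 may differ, for example in how it detects the compiler. The helper demo_asm_helper is hypothetical.

```c
#include <assert.h>

/* Hypothetical USED_ATTR: expands to gcc's 'used' attribute when built
   with a GNU C compiler, and to nothing otherwise. */
#if defined(__GNUC__)
#define USED_ATTR __attribute__((used))
#else
#define USED_ATTR
#endif

/* Hypothetical helper that, in a real port, only assembler code calls.
   USED_ATTR keeps gcc -flto from discarding it. */
USED_ATTR int demo_asm_helper(int x)
{
    return x + 1;
}
```

Wrapping the attribute in a macro keeps call sites short and leaves one place to adjust if another compiler needs different syntax.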

Here are my benchmarks for building an r31189 Archos Recorder V2 binary. Times are in seconds. I first performed each operation once without timing. Then I timed three repeated operations and divided the reported time by 3. Both columns use code patched by sh_flto-v2.patch. The first column uses the normal sh-elf-gcc 4.0.3 built with an unpatched rockboxdev.sh, and the second uses gcc 4.6.2 built with the patched rockboxdev.sh, with -flto. The computer has a 2.4 GHz Q6600 CPU, 2 GB RAM, a 1 TB WD Black hard drive, and Linux Mint Debian Edition with Update Pack 3 from 2011.08.30. I did not use ccache.

make clean, then make -j4
          gcc 4.0.3   gcc 4.6.2 -flto
  real         38.3              99.8
  user         95.8               293
  sys          10.9              16.6

touch main.c, then make -j4
          gcc 4.0.3   gcc 4.6.2 -flto
  real          1.1              33.1
  user          0.7              32.1
  sys           0.1               0.8
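
To put the table in proportion, the wall-clock slowdown is just the ratio of the two "real" columns. The helper name slowdown() is hypothetical; the inputs are the measured times above.

```c
#include <assert.h>

/* Ratio of the gcc 4.6.2 -flto time to the gcc 4.0.3 time;
   2.0 would mean the LTO build took twice as long. */
static double slowdown(double lto_seconds, double baseline_seconds)
{
    return lto_seconds / baseline_seconds;
}
```

With the real times, slowdown(99.8, 38.3) is about 2.6 for the clean build and slowdown(33.1, 1.1) is about 30 for the one-file rebuild, where the slow LTO link dominates.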

Yeah, it really is that bad. (At first I wasn't even sure if I should create this tracker entry.) I didn't try -fno-fat-lto-objects, because there hasn't been a gcc release with that feature yet. It shouldn't help the second case much anyway. One thing that ought to help is parallelizing with -flto=jobserver, but unfortunately it sometimes causes linking to fail with:
make[1]: *** read jobs pipe: No such file or directory.  Stop.
make[1]: *** Waiting for unfinished jobs....
lto-wrapper: make returned 2 exit status
/opt/sh-2211-462/lib/gcc/sh-elf/4.6.2/../../../../sh-elf/bin/ld: lto-wrapper failed
collect2: ld returned 1 exit status

A further reduction of 132 bytes in binary size and 104 bytes in memory use is possible with -flto-partition=none, which puts everything into one assembler file.

Nils Wallménius commented on 2011-12-10 08:47

Yes, the second case is rather bad.

Rafaël Carré commented on 2011-12-17 01:38

It's something I'll accept.

Building ARM targets in Thumb mode can already take twice as long.
