FS#7951 - Speed optimization for yuv-conversion

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Sunday, 14 October 2007, 17:45 GMT
Last edited by Jens Arnold (amiconn) - Monday, 22 October 2007, 00:39 GMT
Task Type Patches
Category LCD
Status Closed
Assigned To No-one
Operating System PortalPlayer-based
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

This patch speeds up the lcd_yuv_blit function for iPod Video a bit and should speed it up a lot for iPod nano, iPod color and iriver H10.

For the iPod Video two changes were made:
- use ICODE_ATTR for lcd_yuv_blit()-function
- simplify the boundary check for red/green/blue components a bit
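
For illustration only, since the patch itself is not quoted in this task: one common way to make the boundary check cheaper is to test all three components at once and clamp only when something is actually out of range. The names `clamp_u8` and `clamp_rgb` are hypothetical, and the patch may well do this differently.

```c
/* Clamp a single component to the 8-bit range 0..255. */
static int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Hypothetical helper: one combined test for R, G and B.
 * A negative value or a value above 255 has at least one bit set
 * outside the low 8 bits, so OR-ing the three components and masking
 * with ~0xff detects every out-of-range case with a single branch. */
static void clamp_rgb(int *r, int *g, int *b)
{
    if ((*r | *g | *b) & ~0xff) {
        *r = clamp_u8(*r);
        *g = clamp_u8(*g);
        *b = clamp_u8(*b);
    }
}
```

Most pixels are in range, so the common path is one OR chain plus one branch instead of up to six comparisons.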

Result for iPod Video via test_fps:
6.7 -> 7.2 fps (full screen, 30MHz)
20.7 -> 21.8 fps (quarter screen, 30MHz)

For all other players the optimized yuv-conversion was ported from iPod Video to the respective LCD-drivers. This still needs to be tested, though, as I do not have any of these players.

Please post your experiences/results here.

Thanks :)
Closed by  Jens Arnold (amiconn)
Monday, 22 October 2007, 00:39 GMT
Reason for closing:  Accepted
Additional comments about closing:  Adapted and committed to SVN.
Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 19:54 GMT
Just to mention one thing: The asm-routines for the yuv-conversion which were written for sansa/gigabeat could also be optimized further. As I do not have the chance to test on such a target I won't post any patches for this. But if you take a look at the patches above you will see that it's possible to sum up nearly all offsets, e.g. to go from

G = ( 74 * (Y' - 16) - 24 * (Cb - 128) - 51 * (Cr - 128) + 128 ) >> 8

to

G = ( 74 * Y' - 24 * Cb - 51 * Cr + ROUNDOFFG ) >> 8, where ROUNDOFFG = -74*16 + 24*128 + 51*128 + 128 = 8544 is a constant

The asm-routines currently apply separate offsets to Y', Cb and Cr, plus another one before shifting. This can be reduced to a single offset before shifting for each of the components R, G and B.
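
As a sanity check of the folding described above, here is a small C sketch that compares both forms of the G formula for every 8-bit Y', Cb and Cr. It is not taken from the patch; the function names `g_separate_offsets`, `g_folded_offset` and `count_g_mismatches` are made up for illustration, and clamping to the display range is left out.

```c
/* All offsets of the G formula folded into one constant (8544). */
#define ROUNDOFFG (-74 * 16 + 24 * 128 + 51 * 128 + 128)

/* Original form: offsets on Y', Cb and Cr plus the rounding offset.
 * Note: >> on a negative int is an arithmetic shift with GCC, which
 * is what the fixed-point maths here assumes. */
static int g_separate_offsets(int y, int cb, int cr)
{
    return (74 * (y - 16) - 24 * (cb - 128) - 51 * (cr - 128) + 128) >> 8;
}

/* Folded form: a single offset applied right before the shift. */
static int g_folded_offset(int y, int cb, int cr)
{
    return (74 * y - 24 * cb - 51 * cr + ROUNDOFFG) >> 8;
}

/* Count the inputs for which the two forms disagree. */
static long count_g_mismatches(void)
{
    long mismatches = 0;
    for (int y = 0; y < 256; y++)
        for (int cb = 0; cb < 256; cb++)
            for (int cr = 0; cr < 256; cr++)
                if (g_separate_offsets(y, cb, cr) != g_folded_offset(y, cb, cr))
                    mismatches++;
    return mismatches;
}
```

Both expressions expand to the same integer before the shift, so the mismatch count should come out as zero; the folded form simply saves the three subtractions per pixel.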
Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 20:39 GMT
In detail for lcd-as-c200.S

following lines are obsolete:

sub r7, r7, #16 @ r7 = Y = (Y' - 16)*74

sub r8, r8, #128 @ Cb -= 128
sub r9, r9, #128 @ Cr -= 128

and lines

add r8, r8, #2 @ r8 = bu = (Cb*128 + 128) >> 8 -> wrong comment, should be: r8 = bu = (Cb*128 + 256) >> 9
mov r8, r8, asr #2 @
add r9, r9, #256 @ r9 = rv = (r9 + 256) >> 9
mov r9, r9, asr #9 @
rsb r10, r10, #128 @ r10 = guv = (-r10 + 128) >> 8
mov r10, r10, asr #8 @

should be replaced with something like this

mov r8, r8, asl #2 @ r8 = bu = (Cb*128 + ROUNDOFFB) >> 9
sub r8, r8, #541 @
mov r8, r8, asr #4 @
sub r9, r9, #13856 @ r9 = rv = (r9 + ROUNDOFFR) >> 9
mov r9, r9, asr #9 @
rsb r10, r10, #8544 @ r10 = guv = (-r10 + ROUNDOFFG) >> 8
mov r10, r10, asr #8 @

Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 20:44 GMT
oops, double clicked.. sorry for any inconvenience :o)
Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 21:35 GMT
no guarantee, can't test it here...
Comment by Michael Sevakis (MikeS) - Monday, 15 October 2007, 22:07 GMT
Immediate constants on ARM can be no more than 8 bits wide, rotated by an even number of bits. A one's complement move (mvn) is possible as well.

mov r0, #0xff @ r0 = 255
or
mvn r0, #0xff @ r0 = -256 (identity: -x = ~x + 1)
mvn r0, #0 @ r0 = -1

mov r0, #0x1fe @ not legal (0xff rotated by an odd amount)
mov r0, #0x3fc @ ok (0xff shifted left by 2)
mov r0, #0xc000003f @ ok (0xff rotated right by 2)
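
To make the rule easier to try out, here is a minimal C sketch that tests whether a 32-bit value fits the classic ARM data-processing immediate form (an 8-bit value rotated right by an even amount). The names `rotl32` and `is_arm_immediate` are hypothetical helpers for this example, not part of Rockbox or any toolchain, and the mvn variant has to be checked by passing the inverted value.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Return 1 if v can be encoded as an 8-bit value rotated right by an
 * even number of bits, 0 otherwise. Undoing each possible even
 * rotation must leave a value that fits into 8 bits. */
static int is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(v, rot) <= 0xffu)
            return 1;
    return 0;
}
```

With this check, 0xff, 0x3fc and 0xc000003f pass while 0x1fe and 0x123456 do not. The constants 541, 13856 and 8544 from the proposal further up do not pass either, so they would have to be split into several instructions or loaded from memory.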

Comment by Michael Sevakis (MikeS) - Monday, 15 October 2007, 22:12 GMT
I forgot to mention that you _can_ load a value like 0x123456 but it's done like this:

ldr r0, =0x123456

Which is fine, but this requires loading the constant from a memory location, so to add 0x123456 to something you'd do:
ldr r0, =0x123456 @ this comes from a "constant pool" somewhere near the function
add r1, r1, r0
Comment by Andree Buschmann (Buschel) - Tuesday, 16 October 2007, 06:31 GMT
So, this one should be ok then.
Comment by Thom Johansen (preglow) - Tuesday, 16 October 2007, 11:17 GMT
Works for Nano, 2 FPS gain average for one file.
Comment by Thom Johansen (preglow) - Tuesday, 16 October 2007, 14:50 GMT
There are immediate plans (by amiconn) to port these YUV blit functions to assembler, so don't spend too much time on optimising the C versions, btw :)
Comment by Andree Buschmann (Buschel) - Tuesday, 16 October 2007, 16:32 GMT
preglow: heard about this in IRC yesterday -- after working on my patch :/
Comment by Thom Johansen (preglow) - Tuesday, 16 October 2007, 19:47 GMT
Committed the Nano and Video ones, can't test the rest. Will close this when the ASM versions are committed.
Comment by Andree Buschmann (Buschel) - Sunday, 21 October 2007, 00:52 GMT
New version of the patch with assembler optimization for iPod Video. All other targets were dropped as there seems to be an assembler optimization under development right now.

YUV-blit performance increases from 7.2 fps to 8.5 fps (30MHz) for fullscreen on the iPod Video.

I could not add the chroma_buf as I get an internal compiler error when I start working on this... Maybe someone can solve this?
Comment by Andree Buschmann (Buschel) - Sunday, 21 October 2007, 20:33 GMT
New version with chroma-buffer :o)

With this patch the following speed is reached:

fps YUV (full/quarter screen):
8.7 / 25.2 (30MHz), 19.1 / 42 (80MHz)

fps MPEGplayer (elephants dream):
resolution  fps
128x96      47
128x128     41
160x96      43
160x128     37
176x96      41
176x128     35
224x128     30
224x176     24
320x176     17
320x240     13
