Rockbox

FS#7951 - Speed optimization for yuv-conversion

Attached to Project: Rockbox
Opened by Andree Buschmann (Buschel) - Sunday, 14 October 2007, 19:45 GMT+2
Last edited by Jens Arnold (amiconn) - Monday, 22 October 2007, 02:39 GMT+2
Task Type Patches
Category LCD
Status Closed
Assigned To No-one
Player Type PortalPlayer-based
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Private No

Details

This patch speeds up the lcd_yuv_blit function for iPod Video a bit and should speed it up a lot for iPod nano, iPod color and iriver H10.

For the iPod Video two changes were made:
- use ICODE_ATTR for lcd_yuv_blit()-function
- simplify the boundary check for red/green/blue components a bit
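The patch itself is not reproduced in this thread, so the following is only a sketch of one common way to cheapen a per-component range check, not necessarily what the patch does. It assumes a component must end up in the range 0..max; the helper name clamp_component is hypothetical. Casting to unsigned lets a single comparison catch both "below zero" and "above max", so the common in-range case costs one compare.

```c
/* Hypothetical helper, not taken from the patch: clamp v to [0, max].
 * A negative v becomes a very large unsigned value, so one unsigned
 * comparison detects both underflow and overflow. */
int clamp_component(int v, int max)
{
    if ((unsigned)v > (unsigned)max)
        v = (v < 0) ? 0 : max;
    return v;
}
```

For a 5-bit red or blue value one would call clamp_component(r, 31), and for a 6-bit green value clamp_component(g, 63).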

Result for iPod Video via test_fps:
6.7 -> 7.2 fps (full screen, 30MHz)
20.7 -> 21.8 fps (quarter screen, 30MHz)

For all other players the optimized yuv-conversion was ported from iPod Video to the respective LCD drivers. This still needs to be tested, though, as I do not have any of these players.

Please post your experiences/results here.

Thanks :)
   lcd_yuv_blit_optimization_v1.patch (25.5 KiB)
 firmware/target/arm/ipod/video/lcd-video.c    |   54 ++++----
 firmware/target/arm/ipod/lcd-color_nano.c     |  156 ++++++++++---------------
 firmware/target/arm/iriver/h10/lcd-h10_20gb.c |  159 +++++++++++---------------
 firmware/target/arm/iriver/h10/lcd-h10_5gb.c  |  159 +++++++++++---------------
 4 files changed, 228 insertions(+), 300 deletions(-)

Closed by  Jens Arnold (amiconn)
Monday, 22 October 2007, 02:39 GMT+2
Reason for closing:  Accepted
Additional comments about closing:  Adapted and committed to SVN.
Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 21:54 GMT+2
Just to mention one thing: the asm routines for the yuv-conversion which were written for sansa/gigabeat could also be optimized further. As I have no way to test on those targets, I won't post any patches for them. But if you take a look at the patch above you will see that it is possible to sum up nearly all the offsets, e.g. to go from

G = ( 74 * (Y' - 16) - 24 * (Cb - 128) - 51 * (Cr - 128) + 128 ) >> 8

to

G = ( 74 * Y' - 24 * Cb - 51 * Cr + ROUNDOFFG ) >> 8, where ROUNDOFFG = -74*16 + 24*128 + 51*128 + 128 = 8544 is a constant

The asm routines currently apply separate offsets to Y', Cb and Cr, plus another one before shifting. This can be reduced to a single offset before shifting for each of the components R, G and B.
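The folding of the offsets can be checked by brute force. The sketch below is only an illustration in C, not code from any of the patches; the function names are made up for this example. It compares the green formula with explicit offsets against the folded form for every 8-bit combination of Y', Cb and Cr.

```c
/* All offsets of the green formula folded into one constant (8544). */
#define ROUNDOFFG (-74 * 16 + 24 * 128 + 51 * 128 + 128)

/* Green with the offsets applied to each input separately. */
int g_reference(int y, int cb, int cr)
{
    return (74 * (y - 16) - 24 * (cb - 128) - 51 * (cr - 128) + 128) >> 8;
}

/* Green with a single combined offset before the shift. */
int g_folded(int y, int cb, int cr)
{
    return (74 * y - 24 * cb - 51 * cr + ROUNDOFFG) >> 8;
}

/* Count the inputs for which the two forms disagree. Both forms shift the
 * same numerator, so they agree even where the value is negative. */
long count_mismatches(void)
{
    long bad = 0;
    for (int y = 0; y < 256; y++)
        for (int cb = 0; cb < 256; cb++)
            for (int cr = 0; cr < 256; cr++)
                if (g_reference(y, cb, cr) != g_folded(y, cb, cr))
                    bad++;
    return bad;
}
```

The results are not clamped here; a real blit routine still has to limit them to the valid range of the LCD's green channel.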
Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 22:39 GMT+2
In detail, for lcd-as-c200.S

the following lines become obsolete:

sub r7, r7, #16 @ r7 = Y = (Y' - 16)*74

sub r8, r8, #128 @ Cb -= 128
sub r9, r9, #128 @ Cr -= 128

and lines

add r8, r8, #2 @ r8 = bu = (Cb*128 + 128) >> 8 -> wrong comment, should be: r8 = bu = (Cb*128 + 256) >> 9
mov r8, r8, asr #2 @
add r9, r9, #256 @ r9 = rv = (r9 + 256) >> 9
mov r9, r9, asr #9 @
rsb r10, r10, #128 @ r10 = guv = (-r10 + 128) >> 8
mov r10, r10, asr #8 @

should be replaced with something like this

mov r8, r8, asl #2 @ r8 = bu = (Cb*128 + ROUNDOFFB) >> 9
sub r8, r8, #541 @
mov r8, r8, asr #4 @
sub r9, r9, #13856 @ r9 = rv = (r9 + ROUNDOFFR) >> 9
mov r9, r9, asr #9 @
rsb r10, r10, #8544 @ r10 = guv = (-r10 + ROUNDOFFG) >> 8
mov r10, r10, asr #8 @

Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 22:44 GMT+2
oops, double clicked.. sorry for any inconvenience :o)
Comment by Andree Buschmann (Buschel) - Monday, 15 October 2007, 23:35 GMT+2
no guarantee, can't test it here...
   sansa.patch (3.6 KiB)
 firmware/target/arm/sandisk/sansa-c200/lcd-as-c200.S |   31 ++++++++-----------
 1 file changed, 14 insertions(+), 17 deletions(-)

Comment by Michael Sevakis (MikeS) - Tuesday, 16 October 2007, 00:07 GMT+2
Immediate constants on ARM can be no more than 8 bits wide, rotated by an even number of bits. A one's complement move (mvn) is possible as well.

mov r0, #0xff : r0 = 255
or
mvn r0, #0xff : r0 = -256 (identity: -x = ~x + 1)
mvn r0, #0 : r0 = -1

mov r0, #0x1fe : not legal
mov r0, #0x3fc : ok
mov r0, #0xc000003f : ok
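The rule above can be turned into a small host-side check. This is a sketch in C rather than anything from the Rockbox tree, and the function names are hypothetical. It tests whether a 32-bit value is an 8-bit value rotated by an even number of bits, and whether its complement is (the mvn case).

```c
#include <stdint.h>

/* Rotate left by n bits; n == 0 is handled separately to avoid a 32-bit shift. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Returns 1 if v is an 8-bit value rotated by an even amount,
 * i.e. usable as an immediate operand of mov/add/sub. */
int arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(v, rot) <= 0xFFu)
            return 1;
    return 0;
}

/* Returns 1 if v can be produced by mvn with an encodable immediate. */
int arm_mvn_encodable(uint32_t v)
{
    return arm_imm_encodable(~v);
}
```

Applied to the constants proposed earlier in this thread, the check rejects 541, 13856 and 8544, while it accepts 0xff, 0x3fc and 0xff rotated right by two bits (0xc000003f).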

Comment by Michael Sevakis (MikeS) - Tuesday, 16 October 2007, 00:12 GMT+2
I forgot to mention that you _can_ load a value like 0x123456 but it's done like this:

ldr r0, =0x123456

That is fine, but it requires loading the constant from a memory location, so to add 0x123456 to something you'd do:
ldr r0, =0x123456 : this comes from a "constant pool" somewhere near the function
add r1, r1, r0
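A different approach, which is not proposed in this thread and is shown here only as an illustration, avoids the memory load by splitting the constant into several encodable pieces and issuing one add or sub per piece. The sketch below is in C and runs on the host; split_arm_imm is a hypothetical name. It uses a simple greedy scan from the low bits and does not try to exploit pieces that wrap around the top of the word.

```c
#include <stddef.h>
#include <stdint.h>

/* Split v into chunks that each fit an 8-bit field at an even bit offset.
 * Writes the chunks to out and returns how many there are (at most 4). */
size_t split_arm_imm(uint32_t v, uint32_t out[4])
{
    size_t n = 0;
    unsigned shift = 0;

    while (v != 0) {
        /* Skip even-aligned bit pairs that are entirely zero. */
        while ((v & (UINT32_C(3) << shift)) == 0)
            shift += 2;

        /* Take the 8 bits starting at this even offset as one chunk. */
        uint32_t chunk = v & (UINT32_C(0xFF) << shift);
        out[n++] = chunk;
        v &= ~chunk;
    }
    return n;
}
```

For 8544 this yields the two pieces 0x160 and 0x2000, so the offset could be applied with two instructions instead of an ldr plus one arithmetic instruction. Which variant is faster depends on the target and has not been measured here.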
Comment by Andree Buschmann (Buschel) - Tuesday, 16 October 2007, 08:31 GMT+2
So, this one should be ok then.
   sansa_v2.patch (4.2 KiB)
 firmware/target/arm/sandisk/sansa-c200/lcd-as-c200.S |   36 ++++++++-----------
 1 file changed, 16 insertions(+), 20 deletions(-)

Comment by Thom Johansen (preglow) - Tuesday, 16 October 2007, 13:17 GMT+2
Works for Nano, 2 FPS gain average for one file.
Comment by Thom Johansen (preglow) - Tuesday, 16 October 2007, 16:50 GMT+2
There are immediate plans (by amiconn) to port these YUV blit functions to assembler, so don't spend too much time on optimising the C versions, btw :)
Comment by Andree Buschmann (Buschel) - Tuesday, 16 October 2007, 18:32 GMT+2
preglow: heard about this in IRC yesterday -- after working on my patch :/
Comment by Thom Johansen (preglow) - Tuesday, 16 October 2007, 21:47 GMT+2
Committed the Nano and Video ones, can't test the rest. Will close this when the ASM versions are committed.
Comment by Andree Buschmann (Buschel) - Sunday, 21 October 2007, 02:52 GMT+2
New version of the patch with assembler optimization for iPod Video. All other targets were dropped, as there seems to be an assembler optimization under development for them right now.

YUV-blit performance increases from 7.2 fps to 8.5 fps (30MHz) for fullscreen on the iPod Video.

I could not add the chroma_buf, as I get an internal compiler error as soon as I start working on it... Maybe someone can solve this?
   lcd_yuv_blit_optimization_v2.patch (15.9 KiB)
 firmware/target/arm/ipod/video/lcd-video.c |  307 +++++++++++++++++++----------
 1 file changed, 210 insertions(+), 97 deletions(-)

Comment by Andree Buschmann (Buschel) - Sunday, 21 October 2007, 22:33 GMT+2
New version with chroma-buffer :o)

With this patch the following speed is reached:

fps YUV (full/quarter screen):
8.7 / 25.2 (30MHz), 19.1 / 42 (80MHz)

fps MPEGplayer (elephants dream):
Resolution  128x96  128x128  160x96  160x128  176x96  176x128  224x128  224x176  320x176  320x240
fps         47      41       43      37       41      35       30       24       17       13
   lcd_yuv_blit_optimization_v3.patch (15.6 KiB)
 firmware/target/arm/ipod/video/lcd-video.c |  300 +++++++++++++++++++----------
 1 file changed, 201 insertions(+), 99 deletions(-)
