Rockbox

Tasklist

FS#10805 - AMS Sansas: Vastly increase SD driver performance

Attached to Project: Rockbox
Opened by Thomas Martitz (kugel.) - Monday, 23 November 2009, 01:23 GMT
Last edited by Rafaël Carré (funman) - Wednesday, 23 June 2010, 04:34 GMT
Task Type Patches
Category Drivers
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version Release 3.4
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

This patch greatly improves SD driver performance, at the cost of memory.

The memory cost is zero for 8MB targets, because the sd buffer is used to IRAM, which is nearly unused as of now (this isn't visible in the RAM usage which rockbox-info.txt gives). It's about 10k for 2MB targets (additionally the buffer cannot be in IRAM on those).

I've made some analysis on different settings: increasing the aligned_buffer, moving it to iram, not using it for properly aligned transfers. The outcome of the analysis is visible here: http://www.alice-dsl.net/simonemartitz/rockbox/fuze_sd_performance.pdf

It turns out that a big ( aligned_buffer at IRAM gives a huge (~250%) transfer rate increase over SVN. A 32*SECTOR_SIZE buffer gives good performance too, but is noticable slower. Using the passed buffer (if word aligned) directly together with cache coherency functions is very fast too, but only increases performance on aligned transfers.

So, I decided to use the fastest setting on 8MB Samsas (64*SECTOR_SIZE buffer at IRAM).
on 2MB samsas I used a combo of 32*SECTOR_SIZE and using the passed buffer directly (this saves 15k RAM, at the cost of speed, code complexity and messing with the caches).

There's 2 problems: a) This separates the way the driver works for the Samsas (not so much code wise, but logic wise), I don't really like that.
b) the code path for 2MB targets gives data and prefetch aborts on my clip, but works fine on my fuze. I have no idea why!

I honestly think the 2MB targets, being flash devices, should also use the fastest setting. That would cost 25k over SVN (this patch adds 10k anyway, so it's effectively only 15k more RAM), but I think the unified code paths and better performance are worth it IMO.
This task depends upon

Closed by  Rafaël Carré (funman)
Wednesday, 23 June 2010, 04:34 GMT
Reason for closing:  Accepted
Additional comments about closing:  r27074
Comment by Paul Louden (Llorean) - Monday, 23 November 2009, 02:58 GMT
What do you actually gain from the better performance? As in, for a user just listening to music, what is this increased performance going to show up as?
Comment by MichaelGiacomelli (saratoga) - Monday, 23 November 2009, 04:20 GMT
Faster buffering is nice, both because theres less overhead and because the flash spends less time powered up.

And of course once we have USB it will make that a lot better.
Comment by Rafaël Carré (funman) - Tuesday, 24 November 2009, 13:08 GMT
Did you measure impact on battery life ?
Comment by Thomas Martitz (kugel.) - Tuesday, 24 November 2009, 13:44 GMT
No, but I doubt it's noticeable. On my fuze, rebuffering takes ~1s with it, instead of ~2. With 128k mp3 it would rebuffer every 4 (and a few seconds) minutes, so the time spend on rebuffering is negible in either case. So, battery life could be incresed by a few minutes if at all.
Comment by Dave Hooper (stripwax) - Saturday, 05 December 2009, 23:16 GMT
Apart from buffering, presumably this would improve user experience for loading large files, large playlists, pictureflow, etc ?
Comment by Thomas Martitz (kugel.) - Sunday, 06 December 2009, 00:30 GMT
Picture flow is definitely improved. I can't notice any "?" covers when scrolling very quickly anymore (those "?" appear for a) no album art for a particular album or b) didn't load the aa from the disk yet).
Comment by Boris Nazaroff (bor_ka) - Sunday, 13 December 2009, 10:50 GMT
Does this patch works with the latest changes in the AMS SD code?
Comment by Marc Aarts (topik) - Monday, 14 December 2009, 13:37 GMT
It doesn't work against r23989 (anymore). 5 of 8 hunks fail when patching. I'd like to try this patch so please make it work with the current state of ata_sd_as3525.c.
Comment by Jack Halpin (FlynDice) - Tuesday, 15 December 2009, 07:02 GMT
Sync to r24003.

This boots up just fine but panics when I try to play:

Dir entry 9 in sector 3 is not free! 62 00 69 00
Comment by Boris Nazaroff (bor_ka) - Tuesday, 15 December 2009, 13:45 GMT
I don't think that just syncing is a good idea, since there were serious changes in the SD code recently.
Comment by Rafaël Carré (funman) - Wednesday, 19 May 2010, 22:12 GMT
PL081 DMA controller doc says:
"You must align the source and destination addresses to the source and destination width"

We use 8 words width so buffer needs to be aligned on 32 bytes.

I just tried something like this on Clip+, and also defining STORAGE_WANTS_ALIGN, but got crashes in codecs, while mpegplayer worked fine.

Idea: perhaps the buffers are not up-aligned and last cache line gets used anyway?
Comment by Rafaël Carré (funman) - Thursday, 20 May 2010, 11:25 GMT
My try at aligning up and down the buffers on 8*4 = 32.

32 is the DMA requirement and the cacheline size, so both up & down cachelines aren't reused after we cleaned/dumped them.

usb_storage not complete.

calls to read()/write() aren't aligned because they use provided buffer directly.

test_disk still doesn't work, not sure why..
Comment by Rafaël Carré (funman) - Thursday, 20 May 2010, 12:54 GMT
I forgot the 127 sectors limit of MCI_DATA_LENGTH

test_disk still works when forcing copy to uncached buffer, else it doesn't
Comment by Rafaël Carré (funman) - Tuesday, 22 June 2010, 09:05 GMT
test_disk passes but playback is buggy especially for lossless

Loading...