This is the bug/patch tracker for Rockbox. Click here for more information.
Quick links: Bugs · Patches · Rockbox frontpage
FS#8663 - Data corruption on usb write on sansa
Attached to Project:
Rockbox
Opened by Frank Gevaerts (fg) - Friday, 29 February 2008, 00:00 GMT+2
Last edited by Paul Louden (Llorean) - Tuesday, 01 July 2008, 23:02 GMT+2
Opened by Frank Gevaerts (fg) - Friday, 29 February 2008, 00:00 GMT+2
Last edited by Paul Louden (Llorean) - Tuesday, 01 July 2008, 23:02 GMT+2
|
DetailsThe udelay() in ata-c200_e200.c is not needed on my c250, and actually seems to cause write errors during USB connections. This patch makes it conditionnal on SANSA_E200.
Not committed yet because of the potential problems if I'm wrong |
This task depends upon
If someone would like to experiment with DMA read/writes, I've attached the
disassembly from the c200 bootloader.
I've removed that udelay, build with USE_ROCKBOX_USB and USB_HIGH_SPEED.
Then I've connected sansa to computer, copied xubuntu-7.10pl-desktop-i386.iso to microSD card, during copying dmesg said many high speed device reset and two "usb 2-3: device descriptor read/all, error -110" between device resets.
I've put card in card reader then, diffed file against the original - it was exactly the same!
Then, I've started sansa, put microsd card into it, copied file from card to internet memory on sansa (in rockbox), rebooted sansa into OF, and diffed that file against the original one at my computer - it was eactly the same as well.
Further testing shows that on my c250 it doesn't seem to happen (or happens much less) on the internal flash. Only the micro-sd slot seems to be affected.
I would like to change the register definitions accordingly to the documentation of these chips to make further work a bit easier.
They would read like MMC_STRPCL, MMC_STAT, MMC_CLKRT and so on.
Does anybody mind?
Most of them obviously make sense, some unused registers have been identified by probing their default values with e200tool.
The rest is still guessing, especially the CMDAT and I_MASK bits.
It would be nice, if someone would consider to commit this or comment on what to change.
It shouldn't harm anything, as there are no functional changes yet.
The asm code will only work with an aligned buffer. With an unaligned buffer, the old slow routine is used along with an additional udelay to prevent the bug (that's odd, yes). But do we actually ever have an unaligned buffer with ata_write_sectors?
I did very intense testing and it turned out to be absolutely reliable with my e260 and a Sandisk 4GB SDHC card for several weeks now. Write speed is around 3.5 to 4 MB/s on the SD card, the internal flash is slower.
Though, I'm quite uncertain on how it will behave with different players or SD cards because of this fragile timing. Maybe even the speed of the SD card might have its impact (I only own one card). The speed of the internal flash might differ between devices as well. So it will still need a lot of testing. Backups!
1/ compile rockbox for clean svn trunk
2/ install on player using original firmware for transfer
3/ disconnect player
4/ connect using rockbox and copy 550 files (mostly music tracks)
5/ disconnect player
6/ connect using rockbox and run md5sum -c reference.md5 to check if files were ok or not.
Just to make sure that the microsd errors weren't due to a crappy partition, I reformated the disk (with udelay removed) and ran the test (with udelay included) again.
svn-trunk.17708.internal.check
550 OK
svn-trunk.17708.microsd.check
481 FAILED
27 FAILED open or read
42 OK
svn-trunk.17708.microsd.validation.check
465 FAILED
7 FAILED open or read
78 OK
svn-trunk.17708.no-udelay-in-sd-ata.internal.check
460 OK
svn-trunk.17708.no-udelay-in-sd-ata.microsd.check
550 OK
As you can no doubt see, removing the udelay fixes writes to the microsd card.
Antoine, I assume you are referring to Franks initial patch. For me, that udelay was definitely required on my e260. But I always used high speed USB for my tests.
(I already was able to copy files without the udelay over high speed, but I haven't had time to do the md5sums yet. What kind of issues were you getting? Are/were they easily reproducible?)
- increased the udelay from 2 to 3 (Apparently, I was too optimistic)
- cleaned up the asm code
It's quite a bit slower but should be more reliable.
From computer to internal flash I transferred 257 files totaling 1.2 GB. Only one file was corrupted.
From SD to internal flash I transferred 234 files totaling 1.6 GB. Only three were corrupted.
Almost there. Even with the additional delays, high speed transfer is easily 3 times faster than full speed.
I just tried to do the tests with the OF. Although I verified that there was no corruption, my computer lost connection several times (KDE automatically unmounted and remounted) which caused me to restart my verification several times. I did not have this problem with rockbox. I'm curious if other people can reproduce my problems with your latest patch.
Without the patch, running hi-speed I get corruption on 203 of 253 files. Unfortunately, I left my SD card at work, so I can't test it.
Would have been interesting, which kind of corruption your file had.
Jonathan, thanks again. Did you retry the database rebuild? And can you somehow confirm that the remaining corruptions are in fact from the 'two bytes inserted' bug? In my tests it either occurred quite frequently, or it was completely gone. It never happened that sporadically to me. I don't know how to check that with Linux though. Hmm, if the file sizes would be different, then it's definitely not this bug. If they're the same, it probably is.
As I don't have any corruptions at all anymore, I don't know how to improve this any further.
- #ifdef only for e200 as no other targets have been tested yet.
- Not using sd_poll_status() inside the loop because yielding may disturb the timing.
This will hopefully fix the last remaining file corruptions.
- The new code will be used only in USB mode (usb_exclusive_ata), the current code remains completely unchanged.
This way
- We don't have to care for the buffer alignment as the USB buffer will be aligned always.
- Testing will be much easier because only USB mode has to be tested.
- It will not break anything when not in USB mode.
Note that if there was any non-USB related corruption, this will still be there
(I didn't apply any of the patches from this task, to maximise my chances of triggering the corruption)
Seems to work pretty well. The write speeds seems slower but the read speeds are comparable. On write, I'm getting just over 2 MB/s.
Thanks Frank.
Jonathan, I wonder why our test results are so different. Just to make sure, did you remove my patch before testing r17997?
I just double checked with current SVN and a fresh formatted SD card, but the corruption is still there. Then I did a comparison without SERIALIZE_WRITES and found no significant difference. Within a 10MB test file I had the 2 bytes inserted up to 10 times with SERIALIZE_WRITES and up to 8 times without (5 tries on each).
- copied a directory of 415 MB (95 files) within rockbox from internal flash to sd in 2:24 min
- bitwise compared all files via USB - no error
- deleted the directory on sd
- raised the boost counter in the Debug/CPU frequency screen
- copied same files again in 2:21 min
- bitwise compared all files via USB - no error again
- copied a 12 MB file from PC to sd via USB and got 2 bytes inserted at 15 random positions
Since the measured times don't differ much, I believe the CPU is boosted during a file copy as well. So that's probably not the reason.
Another difference between internal copying and USB transfers might be the buffer size. I'll do some tests on it ASAP.
So that's probably the significant difference which makes the bug appear with USB only. But what to do about it? Buffer in IRAM?
Oops, did I say that? ;-)
SERIALIZE_WRITES enabled
SD card: 2.5 MB/s, no corruption
internal flash: 2.1 MB/s, no corruption
SERIALIZE_WRITES disabled
SD card: 2.8 MB/s, no corruption
internal flash: 2.3 MB/s, no corruption
So I think, SERIALIZE_WRITES can be reverted. If the existing code in ata_write_sectors() is considered as proven to work, this patch should be safe. For gaining more speed (up to 3.6 MB/s on SD), my above patch might be considered. But that would surely need some public testing phase, as it introduces new time critical code. Basically, both patches do the same thing: speeding up the FIFO feeding by either using cache or ASM. I wonder what if both patches would be combined...
SERIALIZE_WRITES disabled
SD card: 1.76 MB/s, 5/506 files corrupted
SERIALIZE_WRITES enabled
SD card: 1.57 MB/s, 17/506 files corrupted
What a weird bug...
Did you also check the filesystem for errors between tests, or did you reformat? I'd like to rule out pre-existing errors as a cause for the higher number of errors in the SERIALIZE_WRITES case.
Just to be sure : is this the same sd card that you used previously when you found no errors? Also, how big was your test then?
Steve, any tool for testing USB memory sticks will do. I often used this one: ftp://ftp.heise.de/pub/ct/ctsi/h2testw_1.4.zip
Start testing 10 MB, if that works 100 MB, then the whole card.
Panasonic has a utility for properly formatting SD cards here: http://panasonic.jp/support/global/cs/sd/download/sd_formatter.html See if this makes any difference.
when I have my usb printer connected then there's corruption when copying files to internal flash, but if I unplug the printer all data is ok.
(I've done 4 tests using 5.3gb of data - two with printer connected (different files were corrupted), two without (all data matched))
I used the program that Martin recommended for testing USB sticks and I get far fewer errors. Using SERIALIZE_WRITES and the caching patch, I twice filled my 4GB SDHC with no detected error. Once it detected about 800K of sectors that were wrong.
Before, I tested by copying my music library over to the card, and then comparing each file bit-by-bit. This is fairly easy under the linux command prompt. I just repeated this test for SERIALIZE_WRITES with caching just to be sure. 10 of 509 music files were corrupted. No such corruption occurs with just SERIALIZE_WRITES.
I have no idea why many files would cause a problem, but just a few wouldn't.
Regarding speed: my write rates are much lower than Martin's. This seems to be independent of whether I access the SD or built-in flash. It is much slower than the OF.
OF - 4.81 MB/s
SERIALIZE_WRITES - 1.52 MB/s
SERIALIZE_WRITES w / caching - 1.67 MB/s
caching w/o SERIALIZE_WRITES - 1.90 MB/s
Yesterday I had a corrupted file where a range of sectors hasn't been written at all. It contained the contents of the file which was stored previously at that position. The screenshot desowin did at http://www.rockbox.org/irc/log-20080822#22:25:13 looks pretty much the same. I wasn't able to reproduce this yet. It seems to happen extremely rarely, at least with my hardware.
Jonathan, do you have the ability to prove what kind of corruption you're still experiencing?
Steve, for high speed USB you have to define USE_HIGH_SPEED as well.
I wanted to chkdsk it, but when I connected sansa to my laptop, it just froze whole Vista. (it unfreezed after I unplugged sansa)
But anyway, as we see, it's hardly possible to find a workaround that will work for everyone. So we really have to find the underlying problem, instead of trying to cure the symptoms. Maybe I already found out something, but yet it raises another problem. Investigating further...
I already compared this controller to various SoC. The closest match is probably the PXA27x series plus a bit of PXA255 (search Google for PXA27x developer's manual). My patch at http://www.rockbox.org/tracker/task/8663#comment23382 was about renaming the register and bit definitions accordingly to this manual. It's out of sync now, but I'd resync it, if it would have a chance for committing. Actually, there is too much work in it to be dumped silently.