Rockbox

This is the bug/patch tracker for Rockbox. Click here for more information.

Quick links: Bugs · Patches · Rockbox frontpage

Tasklist

FS#8663 - Data corruption on usb write on sansa

Attached to Project: Rockbox
Opened by Frank Gevaerts (fg) - Friday, 29 February 2008, 00:00 GMT+2
Last edited by Paul Louden (Llorean) - Tuesday, 01 July 2008, 23:02 GMT+2
Task Type Patches
Category Drivers
Status Unconfirmed
Assigned To No-one
Player type Sansa c200
Severity Low
Priority Normal
Reported Version current build
Due in Version Version 3.0
Due Date Undecided
Percent Complete 0%
Private No

Details

The udelay() in ata-c200_e200.c is not needed on my c250, and actually seems to cause write errors during USB connections. This patch makes it conditionnal on SANSA_E200.

Not committed yet because of the potential problems if I'm wrong
This task depends upon

Comment by Michael Sevakis (MikeS) - Friday, 29 February 2008, 03:46 GMT+2
We really need to find out why this creepy problem occurs at all. :\
Comment by Mark Arigo (lowlight) - Friday, 29 February 2008, 17:01 GMT+2
The OF uses the DMA engine for transfers so it's hard to say if it's needed.
If someone would like to experiment with DMA read/writes, I've attached the
disassembly from the c200 bootloader.
Comment by Tomasz Mon (desowin) - Monday, 03 March 2008, 17:55 GMT+2
Is that udelay really needed on e200?

I've removed that udelay, build with USE_ROCKBOX_USB and USB_HIGH_SPEED.
Then I've connected sansa to computer, copied xubuntu-7.10pl-desktop-i386.iso to microSD card, during copying dmesg said many high speed device reset and two "usb 2-3: device descriptor read/all, error -110" between device resets.
I've put card in card reader then, diffed file against the original - it was exactly the same!
Then, I've started sansa, put microsd card into it, copied file from card to internet memory on sansa (in rockbox), rebooted sansa into OF, and diffed that file against the original one at my computer - it was eactly the same as well.
Comment by Frank Gevaerts (fg) - Monday, 10 March 2008, 19:47 GMT+2
What happens is that "something" inserts two bytes at an offset of 0x220 into the write-block. The bytes are copied from 32 bytes earlier. Note that they are _inserted_, not overwritten. It only happens if there is a USB transfer still active (so it might be a USB related ISR that causes this, or USB related DMA).

Further testing shows that on my c250 it doesn't seem to happen (or happens much less) on the internal flash. Only the micro-sd slot seems to be affected.
Comment by Toni (ahellmann) - Friday, 28 March 2008, 18:05 GMT+2
I implemented rudimentary DMA write access on the SD driver for e260 and observed exactly the same effect (2 Bytes inserted / copied from 32 Bytes ahead at random places) at 30MHz and 24MHz, while writing data to disk (without USB dma involved!). I don't know the consequence of this.
Comment by Toni (ahellmann) - Friday, 28 March 2008, 21:19 GMT+2
After changing PLL_CONTROL from 0x8a220501 to 0x8a120501 at 30MHz the dma file transfer seems to be much more stable now on the sansa. At 24MHz I still get sporadically this '2 bytes inserted' effect. I didn't check other negative effects after changing the PLL_CONTROL setting.
Comment by Frank Gevaerts (fg) - Sunday, 20 April 2008, 15:53 GMT+2
The problem seems to go away if the write data is in IRAM
Comment by Martin Ritter (MartinR) - Monday, 21 April 2008, 13:46 GMT+2
This controller is very similar to the one in PXA255 / PXA27x series.
I would like to change the register definitions accordingly to the documentation of these chips to make further work a bit easier.
They would read like MMC_STRPCL, MMC_STAT, MMC_CLKRT and so on.
Does anybody mind?
Comment by Martin Ritter (MartinR) - Tuesday, 22 April 2008, 15:53 GMT+2
Here is my work on changing the register and bit definitions accordingly to the PXA27x Developers Manual.
Most of them obviously make sense, some unused registers have been identified by probing their default values with e200tool.
The rest is still guessing, especially the CMDAT and I_MASK bits.
It would be nice, if someone would consider to commit this or comment on what to change.
It shouldn't harm anything, as there are no functional changes yet.
Comment by Martin Ritter (MartinR) - Thursday, 12 June 2008, 10:38 GMT+2
This might be kind of a workaround for this bug. Surprisingly, it can be circumvented by filling the FIFO of the controller very fast, using a chunk of asm. Though, there has to be a short delay between detecting FIFO_EMPTY and filling the FIFO. Without this I got lockups and corruption. But if it's too long, the 'two bytes inserted' bug reappears. Probably this udelay has to be adjusted for c200 and GoGear. It has to be as short as possible.

The asm code will only work with an aligned buffer. With an unaligned buffer, the old slow routine is used along with an additional udelay to prevent the bug (that's odd, yes). But do we actually ever have an unaligned buffer with ata_write_sectors?

I did very intense testing and it turned out to be absolutely reliable with my e260 and a Sandisk 4GB SDHC card for several weeks now. Write speed is around 3.5 to 4 MB/s on the SD card, the internal flash is slower.

Though, I'm quite uncertain on how it will behave with different players or SD cards because of this fragile timing. Maybe even the speed of the SD card might have its impact (I only own one card). The speed of the internal flash might differ between devices as well. So it will still need a lot of testing. Backups!
Comment by Jonathan Backer (jrbil) - Thursday, 12 June 2008, 18:45 GMT+2
I just tried Martin's patch on my e280 with 2GB microSD card and found no corruption. Not extensive (about 1 GB) but seems to work.
Comment by Antoine Cellerier (dionoea) - Thursday, 12 June 2008, 19:03 GMT+2
Here's the result of my tests with and without the patch (udelay removal) on a Sansa e260 with a 4GB microSD HC card. All of these tests used full speed USB (high speed is a bit unreliable). Test protocol was simple:
1/ compile rockbox for clean svn trunk
2/ install on player using original firmware for transfer
3/ disconnect player
4/ connect using rockbox and copy 550 files (mostly music tracks)
5/ disconnect player
6/ connect using rockbox and run md5sum -c reference.md5 to check if files were ok or not.

Just to make sure that the microsd errors weren't due to a crappy partition, I reformated the disk (with udelay removed) and ran the test (with udelay included) again.

svn-trunk.17708.internal.check
550 OK
svn-trunk.17708.microsd.check
481 FAILED
27 FAILED open or read
42 OK
svn-trunk.17708.microsd.validation.check
465 FAILED
7 FAILED open or read
78 OK
svn-trunk.17708.no-udelay-in-sd-ata.internal.check
460 OK
svn-trunk.17708.no-udelay-in-sd-ata.microsd.check
550 OK

As you can no doubt see, removing the udelay fixes writes to the microsd card.
Comment by Jonathan Backer (jrbil) - Thursday, 12 June 2008, 21:18 GMT+2
Just with internal flash (removed SD), I formatted, copied 280 files and 1.2 GB. Did a diff and fsck, which found no corruption. Built the database and rebooted. Now I have corruption (*PANIC* Dir entry 6 in sector 3 is not free!).
Comment by Jonathan Backer (jrbil) - Thursday, 12 June 2008, 21:48 GMT+2
Just with internal flash (removed SD), I formatted, copied 280 files and 1.2 GB. Did a diff and fsck, which found no corruption. Built the database and rebooted. Now I have corruption (*PANIC* Dir entry 6 in sector 3 is not free!).
Comment by Martin Ritter (MartinR) - Friday, 13 June 2008, 09:44 GMT+2
Jonathan, thanks for testing. Can you please increase the udelay from 2 to 3 and repeat the database rebuild? Also try copying from internal flash to SD and back again.
Antoine, I assume you are referring to Franks initial patch. For me, that udelay was definitely required on my e260. But I always used high speed USB for my tests.
Comment by Antoine Cellerier (dionoea) - Friday, 13 June 2008, 12:53 GMT+2
Yes, I'm referring to the original patch. It works fine with Full Speed. I'll give it a try with high speed (I get lots of connection resets with high speed but it's still kind of usable) and post the results here. Anyway, since this seems to be the only reliable way to copy files without corruption on my microSD HC card (no support from OF), I'm using the udelay-less build for everyday stuff so I'll be able to see if it breaks at some point in the future.

(I already was able to copy files without the udelay over high speed, but I haven't had time to do the md5sums yet. What kind of issues were you getting? Are/were they easily reproducible?)
Comment by Martin Ritter (MartinR) - Friday, 13 June 2008, 14:46 GMT+2
Antoine, I just checked that again. Removing the initial udelay causes corruption on both the SD card and the internal flash for me. Especially on internal writes like building the database or when copying files within Rockbox. Just now Rockbox completely locked up on a file copy. USB transfers to the internal flash do apparently working though.
Comment by Martin Ritter (MartinR) - Friday, 13 June 2008, 15:06 GMT+2
New version of my patch:
- increased the udelay from 2 to 3 (Apparently, I was too optimistic)
- cleaned up the asm code
It's quite a bit slower but should be more reliable.
Comment by Michael Sevakis (MikeS) - Friday, 13 June 2008, 17:46 GMT+2
This wasn't a controlled check but perhaps worth noting: I had copied some movies over to my e260 for mpegplayer development with the retailos USB mode and one of them had some corruption in it. Copying over it again from the same source file fixed it. I can't say whether or not rockbox had some role in it but the file never had writes made to it by rockbox itself. It could just be yet another PP hardware bug.
Comment by Jonathan Backer (jrbil) - Friday, 13 June 2008, 20:20 GMT+2
Just tried the latest patch:

From computer to internal flash I transferred 257 files totaling 1.2 GB. Only one file was corrupted.
From SD to internal flash I transferred 234 files totaling 1.6 GB. Only three were corrupted.

Almost there. Even with the additional delays, high speed transfer is easily 3 times faster than full speed.
Comment by Jonathan Backer (jrbil) - Friday, 13 June 2008, 23:25 GMT+2
Upped the udelay to 4 and still get corruption in 1 of the 257 files when I retried the transfer from the computer to the internal flash. Without your patch 196 of the 257 files get corrupted.

I just tried to do the tests with the OF. Although I verified that there was no corruption, my computer lost connection several times (KDE automatically unmounted and remounted) which caused me to restart my verification several times. I did not have this problem with rockbox. I'm curious if other people can reproduce my problems with your latest patch.
Comment by Jonathan Backer (jrbil) - Saturday, 14 June 2008, 02:40 GMT+2
OK. Testing on my linux box at home. With your patch, I've copied over 750 files and 3.5 GB without any corruption whatsoever.

Without the patch, running hi-speed I get corruption on 203 of 253 files. Unfortunately, I left my SD card at work, so I can't test it.
Comment by Jonathan Backer (jrbil) - Saturday, 14 June 2008, 04:26 GMT+2
Copied approx 500 files (3.5 GB) and got one error. So the problem seems less frequent on my home machine, but still persists.
Comment by Martin Ritter (MartinR) - Monday, 16 June 2008, 10:20 GMT+2
Michael, I too believe that this must be some kind of hardware bug. Or the controller runs at an inappropriate clock rate or something. Maybe it would help if we had the interrupts working. But since the bug also occurs with DMA, it will probably do with interrupts as well.
Would have been interesting, which kind of corruption your file had.

Jonathan, thanks again. Did you retry the database rebuild? And can you somehow confirm that the remaining corruptions are in fact from the 'two bytes inserted' bug? In my tests it either occurred quite frequently, or it was completely gone. It never happened that sporadically to me. I don't know how to check that with Linux though. Hmm, if the file sizes would be different, then it's definitely not this bug. If they're the same, it probably is.
Comment by Linus Nielsen Feltzing (linusnielsen) - Monday, 16 June 2008, 11:14 GMT+2
Maybe it would be better if we set the CPU core speed to match the USB rate?
Comment by Martin Ritter (MartinR) - Wednesday, 18 June 2008, 17:25 GMT+2
Linus, this bug has not much to do with USB actually. If that udelay is slightly changed, it occurs with non USB writes as well. The fact that it is triggered by USB is probably just because the timing is a bit different then. The USB buffer resides in uncached memory, so the FIFO filling loop runs a bit slower than usual. But that's just my guess, I may be completely wrong.
Comment by Martin Ritter (MartinR) - Monday, 23 June 2008, 13:33 GMT+2
Final version of this patch (probably):
As I don't have any corruptions at all anymore, I don't know how to improve this any further.
- #ifdef only for e200 as no other targets have been tested yet.
- Not using sd_poll_status() inside the loop because yielding may disturb the timing.
This will hopefully fix the last remaining file corruptions.
- The new code will be used only in USB mode (usb_exclusive_ata), the current code remains completely unchanged.
This way
- We don't have to care for the buffer alignment as the USB buffer will be aligned always.
- Testing will be much easier because only USB mode has to be tested.
- It will not break anything when not in USB mode.
Comment by Jonathan Backer (jrbil) - Saturday, 28 June 2008, 04:06 GMT+2
I just did some testing on my e280, and the latest patch looks good. With USB transfer, I sent over 1000 files (5 GB) with no errors. I did a bunch of copying around (w/ rockbox not USB) to exercise the other code (1.4 GB), and found no errors there either. Looks solid to me. I haven't tried it with my micro SD card yet because it is at the office.
Comment by Toni (ahellmann) - Saturday, 28 June 2008, 15:27 GMT+2
I tried the latest patch here on e250 with external SD-card: No errors!
Comment by Jonathan Backer (jrbil) - Monday, 30 June 2008, 23:24 GMT+2
I was finally able to try the latest patch with my 2GB microSD card. Good news: the internal flash works flawlessly transferring large amounts of data by USB and directly with Rockbox (no USB) when the SD card is present. Bad news: reading and writing to the SD card causes corruption. Doesn't matter if the transfer is through the USB or Rockbox. To test Rockbox, I copied from known good internal flash to SD and back again and got corruption. Oh, and it's definitely a read _and_ write issue to SD: comparing with the internal flash (copy to SD and then back to internal using Rockbox) showed fewer errors then comparing with the SD (just copy from internal to SD). Moreover, the difference in detected corruption (just one file) disappeared when I tried that particular file again, indicating that it was a one-off read error.
Comment by Jonathan Backer (jrbil) - Tuesday, 01 July 2008, 05:48 GMT+2
Ok. I remembered from previous corruption issues that it was important to reformat. Once I reformatted the SD card, all of my corruption problems vanished. With the latest patch on my e280 with 2GB microSD card, the high speed USB is flawless. Please put it in SVN. Thanks.

Loading...