FS#12857 - Data abort error on iPod Mini 2G

Attached to Project: Rockbox
Opened by Tiago Medeiros (madcat1990) - Sunday, 28 April 2013, 16:07 GMT
Last edited by Dominik Riebeling (bluebrother) - Monday, 17 June 2013, 19:21 GMT
Task Type Bugs
Category Bootloader
Status Closed
Assigned To No-one
Operating System iPod Mini
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Dev Version : 08199cd

Every time I boot into Rockbox I get a data abort error:

Data abort at 0005020C
PC : 0005020C sp: 4000B620
bt end

I restarted in emergency HDD mode and ran this command in Windows:

chkdsk /f /x /r d:

(D being the letter for the drive in Windows)

Returned no errors.

Rebooted the device, still the same thing.

Re-installed Rockbox (after formatting); still the same thing.

Installed 3.13 just to be safe, and it ran, no problem.

I'm really sorry to bother you guys with this! :(

Closed by  Dominik Riebeling (bluebrother)
Monday, 17 June 2013, 19:21 GMT
Reason for closing:  Fixed
Additional comments about closing:  Issue seems to be fixed. If it appears again please open a new task.
Comment by Frank Gevaerts (fg) - Sunday, 28 April 2013, 16:41 GMT
This looks like what's discussed at http://www.rockbox.org/irc/log-20130427#00:12:03
Comment by Dominik Riebeling (bluebrother) - Sunday, 28 April 2013, 17:49 GMT
Yes. I've tried reverting 95e23de on current HEAD (370ed6d) and the crash goes away, so this is related.
Comment by Dominik Riebeling (bluebrother) - Sunday, 28 April 2013, 18:05 GMT
Also note, to reproduce the crash you need to shut down the player -- rolo'ing after replacing rockbox.ipod works fine; shutting down and starting it again makes the crash show up.

../utils/analysis/find_addr.pl 0x0005020C
build-ipodmini2g/firmware/libfirmware.a(disk.o) -> disk_partinfo

4: e59f3004 ldr r3, [pc, #4] ; 10 <disk_partinfo+0x10>
8: e0830100 add r0, r3, r0, lsl #2
c: e12fff1e bx lr
10: 00000000 .word 0x00000000
Comment by Dominik Riebeling (bluebrother) - Sunday, 28 April 2013, 19:10 GMT
After looking closer at the linker calls I noticed that starting with 95e23de the final link includes -Wl,--gc-sections while the previous revision 8829e90 doesn't. Relinking manually without -Wl,--gc-sections produces a rockbox.ipod that boots fine for me.

I've pushed 850491a which should fix the issue. Please check and report back.
Comment by Michael Sevakis (MikeS) - Sunday, 28 April 2013, 22:52 GMT
I built 08199cd and disk_partinfo is located at 0x000501f4, not 0x0005020c. Even so, I don't see how this function could crash anything except when returning (bad lr). It only calculates an offset from a pointer to "part" in the constant pool (which is correct in the .elf).
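For readers following the disassembly quoted earlier in this task, the address arithmetic of the two visible instructions can be checked with a small sketch. It assumes ARM (not Thumb) state, where reading pc yields the address of the current instruction plus 8. The helper names and the base value passed to add_lsl2 are hypothetical and only illustrate the calculation; the instruction at offset 0 is not shown in the listing, so the real value of r0 at that point is not known from this task.

```python
# ARM-state addressing arithmetic for the disk_partinfo listing quoted in this task.
# Helper names are illustrative only; they are not part of Rockbox.

ARM_PC_READ_AHEAD = 8  # in ARM state, reading pc gives the instruction address + 8


def pc_relative_target(insn_offset: int, imm: int) -> int:
    """Offset that `ldr rX, [pc, #imm]` located at insn_offset loads from."""
    return insn_offset + ARM_PC_READ_AHEAD + imm


def add_lsl2(base: int, index: int) -> int:
    """Result of `add r0, r3, r0, lsl #2`: base + index * 4, wrapped to 32 bits."""
    return (base + (index << 2)) & 0xFFFFFFFF


# The ldr at offset 0x4 with immediate #4 reads the literal at offset 0x10,
# which matches the "; 10 <disk_partinfo+0x10>" annotation in the listing.
literal_offset = pc_relative_target(0x4, 4)
print(hex(literal_offset))

# Hypothetical base address, used only to show the scaling by 4.
example = add_lsl2(0x1000, 3)
print(hex(example))
```

Neither instruction touches memory other than the literal pool word, which is consistent with the observation that the function itself has little opportunity to fault.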
Comment by Michael Sevakis (MikeS) - Sunday, 28 April 2013, 23:01 GMT
In the main binary disk_partinfo is only called by the debug screen anyway.
Comment by Michael Sevakis (MikeS) - Sunday, 28 April 2013, 23:19 GMT
Need clarification: roloing 08199cd works fine?

ETA regarding an IRC comment: I must point out that the subdir makefiles don't add --gc-sections, they merely request it by setting CORE_GCSECTIONS := yes. root.make is what finally determines whether those flags are included in CORE_LDOPTS.
Comment by Dominik Riebeling (bluebrother) - Monday, 29 April 2013, 06:42 GMT
Yes, 08199cd works fine if rolo'ed, but once the player has been shut down it crashes on startup.
Comment by Michael Sevakis (MikeS) - Monday, 29 April 2013, 07:46 GMT
Wow, I think a new level of weird has been reached. My first thought is that it doesn't sound like something with the binary beyond the initialization. If things were out of place, it would crash no matter what.

BTW, how heavily was it checked? Will it run and play normally under those circumstances?
Comment by Tiago Medeiros (madcat1990) - Monday, 29 April 2013, 14:54 GMT
I think the newest commit fixes this.

I restarted without ROLOing, and it worked fine.

Should I revert to 3.13, re-install the newest commit, and ROLO it?
Comment by Tiago Medeiros (madcat1990) - Monday, 29 April 2013, 15:14 GMT
Scratch that. Just had it again while copying a file :S
Comment by Dominik Riebeling (bluebrother) - Monday, 29 April 2013, 19:02 GMT
Can you please be more specific with that "had it again"? The original problem described in this task is Rockbox crashing immediately on startup. 850491a reverts the (presumed) cause of this behaviour (at the cost of increasing the size of Rockbox). If you're copying a file and Rockbox crashes, this is likely to be something different. Furthermore, in the "had it again" case, which version of Rockbox are you running? How did you start that version?
Comment by Tiago Medeiros (madcat1990) - Monday, 29 April 2013, 20:32 GMT
Oh, sorry about that, let me go into detail:

I reverted to 3.13, which booted nicely, and installed 850491a.

After plugging it back in (USB mode), Windows reported a problem with the drive.

I ran this in CMD (as admin):

chkdsk /f /x /r d:

No problems reported.

So I decided to put my new music on.

While copying, the dialog stopped responding. I looked at the device and everything seemed fine.

So I restarted my computer and, before POST, unplugged the device, and it gave me that error again.

Restarting the device still does it.
Comment by Dominik Riebeling (bluebrother) - Monday, 29 April 2013, 21:08 GMT
I've tried a word-wise diff of the map files produced with and without --gc-sections. I'm not sure if I'm understanding the map format correctly but the following lines look suspicious to me. Shouldn't _vectorsend be the same?

.vectors 0x0000000000000000 0x40 /home/dom/projects/rockbox-gerrit/build-ipodmini2g/firmware/target/arm/pp/crt0-pp.o
0x0000000000000020 vectors
0x0000000000000040 _vectorsend = .
[-0x000000000008ff00-] {+0x000000000008fcd0+} _vectorscopy = LOADADDR (.vectors)
[-0x000000000008ff00-] {+0x000000000008fcd0+} _noloaddram = LOADADDR (.vectors)

.bss [-0x000000000008ff00-] {+0x000000000008fcd0+} 0x4c778
[-0x000000000008ff00-] {+0x000000000008fcd0+} _edata = .
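To make the excerpt easier to read: in word-diff output, [-old-] marks removed text and {+new+} marks inserted text. The _vectorsend line carries no markers, so it is 0x40 in both map files; the values that differ are the load address of .vectors and the start of .bss. A minimal sketch for extracting the shift from such a line follows. The function name and regular expression are illustrative and assume the marker style shown in the excerpt.

```python
import re

# Word-diff markers: [-old-] for removed text, {+new+} for inserted text.
CHANGE = re.compile(r"\[-(0x[0-9a-fA-F]+)-\]\s*\{\+(0x[0-9a-fA-F]+)\+\}")


def address_shift(line: str):
    """Return (old, new, new - old) for the first changed hex value, or None."""
    match = CHANGE.search(line)
    if match is None:
        return None
    old = int(match.group(1), 16)
    new = int(match.group(2), 16)
    return old, new, new - old


# Line copied from the excerpt in this comment.
changed = "[-0x000000000008ff00-] {+0x000000000008fcd0+} _vectorscopy = LOADADDR (.vectors)"
old, new, delta = address_shift(changed)
print(hex(old), hex(new), delta)

# A line without markers (such as the _vectorsend line) yields None.
unchanged = address_shift("0x0000000000000040 _vectorsend = .")
print(unchanged)
```

The two builds differ by 0x230 (560) bytes at this point, which on its own only says that the image contents before .bss changed in size.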
Comment by Dominik Riebeling (bluebrother) - Monday, 29 April 2013, 21:17 GMT
Ok, so I just got the crash again with 850491a immediately after a USB disconnect, and after resetting the iPod the same still occurred. So that crash still occurs but not on startup of the player. I was assuming that --gc-sections was the main culprit, but it now looks like we have a different root cause (and --gc-sections made that show up earlier).
Comment by Michael Sevakis (MikeS) - Monday, 29 April 2013, 22:35 GMT
Yeah, your description does sound like it could be --gc-sections at the root of it. Could the cache code be an issue? Did you try that cache patch (I can't recall atm which one it is)? Perhaps try that with 08199cd.

Vectors should be 64 bytes (8 ldr's and 8 addresses), so that looks ok.
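The 64-byte figure can be checked against the map excerpt, where .vectors has size 0x40 and _vectorsend sits at 0x40. A trivial check, assuming 4-byte ARM instructions and 4-byte addresses:

```python
WORD = 4  # bytes per ARM-state instruction and per 32-bit address

# 8 ldr instructions followed by 8 address words
vectors_size = 8 * WORD + 8 * WORD
print(vectors_size, hex(vectors_size))
```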
Comment by Michael Sevakis (MikeS) - Monday, 29 April 2013, 22:42 GMT
Never mind, I see that patch has been pushed. Hmm.
Comment by Michael Sevakis (MikeS) - Wednesday, 01 May 2013, 00:44 GMT
I'm not sure it's proper to remove --gc-sections for every build just because one player has more serious bugs that --gc-sections only makes more obvious. I have quite a variety of hardware and haven't encountered any issues, from Coldfire, to Portal Player, to AMS running thumb code. I'd be happy to add an exclusion for a particular model, with a note about the reason, until the issue is resolved.
Comment by Michael Sevakis (MikeS) - Wednesday, 01 May 2013, 00:45 GMT
And...I noticed that madcat1990 came by IRC stating there was a crash at the same address with what I assume is a recent build.
Comment by Tiago Medeiros (madcat1990) - Wednesday, 01 May 2013, 00:47 GMT
That I did. Need me to test a newer build? or a beta build?
Comment by Tiago Medeiros (madcat1990) - Wednesday, 01 May 2013, 00:51 GMT
I'm actually on IRC right now too!
Comment by Michael Sevakis (MikeS) - Wednesday, 01 May 2013, 01:04 GMT
Nothing really changed anywhere. To me it looks to be in the hands of the owner of said device. I can't replicate it on any PP device, but then I haven't tested all of them recently either, because there are too many, and all are non-iPod except the 3g, which is PP5002.
Comment by Dominik Riebeling (bluebrother) - Saturday, 04 May 2013, 20:44 GMT
After giving it a closer look I figured out that 95e23de changed the order of -T and -Wl,--gc-sections in the linker call. Changing that back fixed things for me (tested on a CF-modded and a HDD mini2g). I've pushed that change (736c378). Please check whether the problem still occurs with a current development version (I've also checked 736c378 against a nano2g and e200, both worked fine).

Interestingly I got a "No partition found" error when rolo-ing the new version. Rebooting Rockbox fixed this, not sure if there is still an issue hidden.
Comment by Tiago Medeiros (madcat1990) - Saturday, 04 May 2013, 22:57 GMT
I'll test this new commit of yours
Comment by Michael Sevakis (MikeS) - Sunday, 05 May 2013, 04:00 GMT
Hrm...well, good thing that was caught. :O
Comment by Tiago Medeiros (madcat1990) - Sunday, 05 May 2013, 05:27 GMT
Don't wave your flags yet gents, I got me a new one :

Data abort at 00050204
pc : 00050204 sp : 4000B5A8
bt end

with build 3fd25dc
Comment by Tiago Medeiros (madcat1990) - Sunday, 05 May 2013, 05:34 GMT
Previous comment was from a ROLO

Force resetting the device apparently fixes this issue.

Don't know what happened though =/
Comment by Dominik Riebeling (bluebrother) - Wednesday, 15 May 2013, 20:27 GMT
I've been trying this a couple of times and have an interesting behaviour:

- when updating on a mini2g with HDD everything works, and rolo-ing the new version works as expected.
- when updating on a mini2g with CF mod the new Rockbox version works fine but rolo-ing it crashes.

So there still is a problem left.
Comment by Boris Gjenero (dreamlayers) - Thursday, 16 May 2013, 13:50 GMT
Comparing the 91b850e rockbox.map files, I don't see anything discarded which could be causing a problem like this. I suspect that --gc-sections is merely a trigger for another bug, maybe reading from uninitialized memory.

(I did find one potential problem. There is a data_abort_handler in both firmware/target/arm/pp/crt0-pp.S and lib/unwarminder/safe_read.S. Currently, the one in crt0-pp.S is used, and I think the one in safe_read.S should probably be used instead. This can't cause a data abort during normal operation. It only matters during a backtrace. I've contacted Amaury Pouly, who created safe_read.S.)
Comment by Michael Sevakis (MikeS) - Thursday, 16 May 2013, 18:10 GMT
@Boris: I noticed that discard too and was wondering why any of those are still in C. Amaury has some 'splainin' to do. :)
Comment by Dominik Riebeling (bluebrother) - Thursday, 13 June 2013, 19:54 GMT
I cannot reproduce the crash on either mini2g device anymore. I haven't bisected this but it appears to me that f6e179b did fix it. From my point of view this issue can be closed as fixed. Can anyone (dis)approve this?
Comment by Tiago Medeiros (madcat1990) - Thursday, 13 June 2013, 20:10 GMT
@Bluebrother : Neither can I. AFAIK This bug has been sprayed.
Comment by Boris Gjenero (dreamlayers) - Thursday, 13 June 2013, 21:43 GMT
I cannot imagine how "f6e179b G#475: Remove data_abort_handler from ARM crt0 files" could have fixed this. It only makes a difference if a backtrace is already happening and the backtrace causes a data abort. After f6e179b, such a situation could lead to a different error, showing the error which triggered the backtrace instead of a data abort that happened during a backtrace. It couldn't have prevented such an error from happening in the first place.