Rockbox

  • Status Closed
  • Percent Complete 100%
  • Task Type Bugs
  • Category Bootloader
  • Assigned To No-one
  • Operating System iPod Mini
  • Severity Low
  • Priority Very Low
  • Reported Version Daily build (which?)
  • Due in Version Undecided
  • Due Date Undecided
Attached to Project: Rockbox
Opened by madcat1990 - 2013-04-28
Last edited by bluebrother - 2013-06-17

FS#12857 - Data abort error on iPod Mini 2G

Dev Version : 08199cd

Every time I boot into Rockbox I get a data abort error:

Data abort at 0005020C
PC : 0005020C sp: 4000B620
bt end

I restarted in emergency HDD mode and ran this command in Windows:

chkdsk /f /x /r d:

(D being the drive letter in Windows)

Returned no errors.

Rebooted the device, still the same thing.

Re-installed Rockbox (after formatting); still the same thing.

Installed 3.13 just to be safe, and it ran, no problem.

I’m really sorry to bother you guys with this! :(

Closed by  bluebrother
2013-06-17 19:21
Reason for closing:  Fixed
Additional comments about closing:  

Issue seems to be fixed. If it appears again please open a new task.

Admin
fg commented on 2013-04-28 16:41

This looks like what’s discussed at http://www.rockbox.org/irc/log-20130427#00:12:03

Yes. I’ve tried reverting 95e23de on current HEAD (370ed6d) and the crash goes away, so this is related.

Also note that to reproduce the crash you need to shut down the player: rolo’ing after replacing rockbox.ipod works fine, but shutting down and starting it again makes the crash show up.

../utils/analysis/find_addr.pl 0x0005020C
build-ipodmini2g/firmware/libfirmware.a(disk.o) → disk_partinfo

  4:	e59f3004 	ldr	r3, [pc, #4]	; 10 <disk_partinfo+0x10>
  8:	e0830100 	add	r0, r3, r0, lsl #2
  c:	e12fff1e 	bx	lr
 10:	00000000 	.word	0x00000000
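The lookup step above (crash address to containing symbol) can be sketched as follows. This is a minimal illustration and not how find_addr.pl is implemented: the symbol table, its addresses, and the neighbouring symbol names are hypothetical, and only the name disk_partinfo comes from this task.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
# The addresses and the example_* names are made up for illustration.
SYMBOLS = [
    (0x000501C0, "example_prev_func"),
    (0x00050208, "disk_partinfo"),
    (0x0005021C, "example_next_func"),
]


def find_symbol(addr, symbols=SYMBOLS):
    """Return (name, offset) for the last symbol starting at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[i]
    return name, addr - start


if __name__ == "__main__":
    hit = find_symbol(0x0005020C)
    if hit is not None:
        print(f"{hit[0]}+0x{hit[1]:x}")
```

A lookup like this only says which symbol's range contains the address; it does not show whether the faulting access happened there or in a caller.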

After looking closer at the linker calls I noticed that starting with 95e23de the final link includes -Wl,--gc-sections while the previous revision 8829e90 doesn’t. Relinking manually without -Wl,--gc-sections produces a rockbox.ipod that boots fine for me.

I’ve pushed 850491a which should fix the issue. Please check and report back.

MikeS commented on 2013-04-28 22:52

I built 08199cd and disk_partinfo is located at 0x000501f4, not 0x0005020c. Even so, I don’t see how this function could crash anything except when returning (bad lr). It only calculates an offset from a pointer to “part” in the constant pool (which is correct in the .elf).
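To make the "only calculates an offset" point concrete, here is a small model of the `add r0, r3, r0, lsl #2` instruction from the listing. It is a sketch under assumptions: the base address is a hypothetical placeholder, and the instruction at offset 0 is not shown in the listing, so the index may already have been scaled before this step.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def add_lsl(base, index, shift):
    """Model ARM 'add rd, rn, rm, lsl #shift' with 32-bit wraparound."""
    return (base + ((index << shift) & MASK32)) & MASK32


if __name__ == "__main__":
    part_base = 0x40001000  # hypothetical address loaded from the constant pool
    # index 3, shifted left by 2, adds 12 bytes to the base
    print(hex(add_lsl(part_base, 3, 2)))
```

The result is pure register arithmetic; nothing in the modelled instruction dereferences the computed address.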

MikeS commented on 2013-04-28 23:01

In the main binary disk_partinfo is only called by the debug screen anyway.

MikeS commented on 2013-04-28 23:19

Need clarification: roloing 08199cd works fine?

ETA regarding an IRC comment: I must point out that the subdir makefiles don’t add --gc-sections; they merely request it by setting CORE_GCSECTIONS := yes. root.make is the final determiner of whether those flags are included in CORE_LDOPTS.

Yes, 08199cd works fine if rolo’ed, but once the player has been shut down it crashes on startup.

MikeS commented on 2013-04-29 07:46

Wow, I think a new level of weird has been reached. My first thought is that it doesn’t sound like something with the binary beyond the initialization. If things were out of place, it would crash no matter what.

BTW, how heavily was it checked? Will it run and play normally under those circumstances?

I think the newest commit fixes this.

I restarted without ROLOing, and it worked fine.

Should I revert to 3.13, re-install the newest commit, and ROLO it?

Scratch that. Just had it again while copying a file :S

Can you please be more specific with that “had it again”? The original problem described in this task is Rockbox crashing immediately on startup. 850491a reverts the (presumable) cause for this behaviour (at the cost of increasing the size of Rockbox). If you’re copying a file and Rockbox crashes this is likely to be something different. Furthermore, in the “had it again” case, which version of Rockbox are you running? How did you start that version?

Oh, sorry about that, let me go into detail:

I reverted to 3.13, which booted nicely, and installed 850491a.

After plugging it back in, in USB mode, Windows reported a problem with the drive.

I ran this in CMD (as admin):

chkdsk /f /x /r d:

No problems reported.

So I decided to put my new music on.

While copying, the dialog stopped responding. I looked at the device and everything seemed fine.

So I restarted my computer and, before POST, unplugged the device, and it gave me that error.

Restarting the device still does it.

I’ve tried a word-wise diff of the map files produced with and without --gc-sections. I’m not sure if I’m understanding the map format correctly, but the following lines look suspicious to me. Shouldn’t _vectorsend be the same?

.vectors 0x0000000000000000 0x40 /home/dom/projects/rockbox-gerrit/build-ipodmini2g/firmware/target/arm/pp/crt0-pp.o

              0x0000000000000020                vectors
              0x0000000000000040                _vectorsend = .
              [-0x000000000008ff00-] {+0x000000000008fcd0+}                _vectorscopy = LOADADDR (.vectors)
              [-0x000000000008ff00-] {+0x000000000008fcd0+}                _noloaddram = LOADADDR (.vectors)

.bss [-0x000000000008ff00-] {+0x000000000008fcd0+} 0x4c778

              [-0x000000000008ff00-] {+0x000000000008fcd0+}                _edata = .
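The two addresses in the diff markers can be compared directly. The sketch below only does the arithmetic on the values quoted in the excerpt; it is one reading of the excerpt, not an authoritative interpretation of the map format, and the variable names are illustrative.

```python
# Values copied from the map excerpt: the [-...-] side and the {+...+} side.
removed_side = int("0x000000000008ff00", 16)
added_side = int("0x000000000008fcd0", 16)

# _vectorsend is identical on both sides of the diff (0x40).
vectors_end = 0x40

delta = removed_side - added_side

if __name__ == "__main__":
    print(f"_vectorscopy/_edata differ by {delta} bytes (0x{delta:x})")
    print(f"_vectorsend is 0x{vectors_end:x} in both builds")
```

In the excerpt only the symbols derived from LOADADDR(.vectors) and the start of .bss move between the two builds, while the end of the vector table stays at 0x40 in both.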

Ok, so I just got the crash again with 850491a immediately after a USB disconnect, and after resetting the iPod it still occurred. So the crash still occurs, but not on startup of the player. I was assuming that --gc-sections was the main culprit, but it now looks like we have a different root cause (and --gc-sections made it show up earlier).

MikeS commented on 2013-04-29 22:35

Yeah, your description does sound like it could be --gc-sections at the root of it. Could the cache code be an issue? Did you try that cache patch (I can’t recall atm which one it is)? Perhaps try that with 08199cd.

Vectors should be 64 bytes (8 ldr’s and 8 addresses), so that looks ok.
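The 64-byte figure follows from simple arithmetic, assuming 4-byte ARM-mode instructions and 4-byte address words. The constant names below are illustrative only.

```python
WORD_BYTES = 4    # one ARM-mode instruction or one 32-bit address
NUM_VECTORS = 8   # eight exception vector slots

ldr_bytes = NUM_VECTORS * WORD_BYTES      # 8 ldr instructions
address_bytes = NUM_VECTORS * WORD_BYTES  # 8 target addresses
vectors_size = ldr_bytes + address_bytes

if __name__ == "__main__":
    print(f"{vectors_size} bytes (0x{vectors_size:x})")
```

This agrees with the 0x40 size shown for .vectors in the map excerpt.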

MikeS commented on 2013-04-29 22:42

Nevermind, I see that patch has been pushed. Hmm.

MikeS commented on 2013-05-01 00:44

I’m not sure it’s proper to remove --gc-sections for every build just because one player has more serious bugs that --gc-sections only makes more obvious. I have quite a variety of hardware and haven’t encountered any issues, from Coldfire, to Portal Player, to AMS running thumb code. I’d be happy to add an exclusion for a particular model, with a note about the reason, until the issue is resolved.

MikeS commented on 2013-05-01 00:45

And…I noticed that madcat1990 came by IRC stating there was a crash at the same address with what I assume is a recent build.

That I did. Need me to test a newer build? or a beta build?

I’m actually on IRC right now too!

MikeS commented on 2013-05-01 01:04

Nothing really changed anywhere. To me it looks to be in the hands of the owner of said device. I can’t replicate it on any PP device, but then I haven’t tested all of them recently either, because there are too many, and all are non-iPod except the 3g, which is PP5002.

After giving it a closer look I figured out that 95e23de changed the order of -T and -Wl,--gc-sections in the linker call. Changing that back fixed things for me (tested on a CF-modded and an HDD mini2g). I’ve pushed that change (736c378). Please check with a current development version whether the problem still occurs (I’ve also checked 736c378 against a nano2g and e200; both worked fine).
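When comparing two builds for this kind of difference, the relative position of the two options in a logged link command can be checked mechanically. This is a minimal sketch: the command lines and file names are hypothetical placeholders, not the actual Rockbox link commands, and it does not say which order is the correct one.

```python
def flag_positions(argv):
    """Return (index of first -T option, index of -Wl,--gc-sections).

    Either element is None if the option is absent. Note that the prefix
    test also matches options such as -Ttext, so treat it as a rough check.
    """
    script = next((i for i, a in enumerate(argv) if a.startswith("-T")), None)
    gc = next((i for i, a in enumerate(argv) if a == "-Wl,--gc-sections"), None)
    return script, gc


# Hypothetical link commands; the script and output names are placeholders.
cmd_a = ["cc", "-Texample.link", "-Wl,--gc-sections", "-o", "example.elf"]
cmd_b = ["cc", "-Wl,--gc-sections", "-Texample.link", "-o", "example.elf"]

if __name__ == "__main__":
    for cmd in (cmd_a, cmd_b):
        script, gc = flag_positions(cmd)
        order = "-T first" if script < gc else "--gc-sections first"
        print(order)
```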

Interestingly I got a “No partition found” error when rolo-ing the new version. Rebooting Rockbox fixed this, not sure if there is still an issue hidden.

I’ll test this new commit of yours

MikeS commented on 2013-05-05 04:00

Hrm…well, good thing that was caught. :O

Don’t wave your flags yet gents, I got me a new one :

Data abort at 00050204
pc : 00050204 sp : 4000B5A8
bt end

with build 3fd25dc

Previous comment was from a ROLO

Force resetting the device apparently fixes this issue.

Don’t know what happened though =/

I’ve been trying this a couple of times and have an interesting behaviour:

- when updating on a mini2g with HDD everything works, and rolo-ing the new version works as expected.
- when updating on a mini2g with CF mod the new Rockbox version works fine but rolo-ing it crashes.

So there still is a problem left.

Comparing the 91b850e rockbox.map files, I don’t see anything discarded which could be causing a problem like this. I suspect that --gc-sections is merely a trigger for another bug, maybe reading from uninitialized memory.

(I did find one potential problem. There is a data_abort_handler in both firmware/target/arm/pp/crt0-pp.S and lib/unwarminder/safe_read.S. Currently, the one in crt0-pp.S is used, and I think the one in safe_read.S should probably be used instead. This can’t cause a data abort during normal operation. It only matters during a backtrace. I’ve contacted Amaury Pouly, who created safe_read.S.)

MikeS commented on 2013-05-16 18:10

@Boris: I noticed that discard too and was wondering why any of those are still in C. Amaury has some ‘splainin’ to do. :)

I cannot reproduce the crash on both mini2g devices anymore. I haven’t bisected this but it appears to me that f6e179b did fix it. From my point of view this issue can be closed as fixed. Can anyone (dis)approve this?

@Bluebrother : Neither can I. AFAIK This bug has been sprayed.

I cannot imagine how “f6e179b G#475: Remove data_abort_handler from ARM crt0 files” could have fixed this. It only makes a difference if a backtrace is already happening and the backtrace causes a data abort. After f6e179b, such a situation could lead to a different error, showing the error which triggered the backtrace instead of a data abort that happened during a backtrace. It couldn’t have prevented such an error from happening in the first place.
