FS#6109 - Recording misses lots of samples (1-2 seconds!)

Attached to Project: Rockbox
Opened by Peter D'Hoye (petur) - Tuesday, 03 October 2006, 09:08 GMT
Last edited by Peter D'Hoye (petur) - Monday, 04 December 2006, 09:12 GMT
Task Type Bugs
Category Recording
Status Closed
Assigned To Michael Sevakis (MikeS)
Operating System All players
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

current CVS recording code is not stable. Missing samples (1-2 seconds) around the time of diskwrite, but not always. Maybe a timing issue/glitch?
example of 80 minutes recording:

missing samples at:
10:23 (delta 10:23 = 4 x buffer time)
15:35 (delta 5:12 = 2 x buffer time)
36:25 (delta 20:50 = 8 x buffer time)
52:01 (delta 15:36 = 6 x buffer time)
72:49 (delta 20:48 = 8 x buffer time)
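
The pattern in these deltas can be checked numerically. The following is a minimal sketch (the helper names are illustrative and not taken from Rockbox): converting each gap to seconds and dividing by the stated multiple gives close to 156 seconds per buffer cycle every time, which is what ties the glitches to the disk-write interval.

```c
#include <stdio.h>

/* Convert a mm:ss timestamp to seconds. */
static int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

/* Seconds per buffer cycle implied by a gap spanning `multiple` cycles. */
static double cycle_seconds(int gap_seconds, int multiple)
{
    return (double)gap_seconds / multiple;
}

/* Print the implied buffer time for each glitch listed in the report. */
static void print_glitch_table(void)
{
    static const int glitch[][3] = {  /* minutes, seconds, multiple */
        {10, 23, 4}, {15, 35, 2}, {36, 25, 8}, {52, 1, 6}, {72, 49, 8},
    };
    int prev = 0;
    for (size_t i = 0; i < sizeof glitch / sizeof glitch[0]; i++) {
        int now = to_seconds(glitch[i][0], glitch[i][1]);
        printf("%2d:%02d  gap %4d s  -> %.2f s per buffer\n",
               glitch[i][0], glitch[i][1], now - prev,
               cycle_seconds(now - prev, glitch[i][2]));
        prev = now;
    }
}
```

Calling print_glitch_table() from a small main() prints one line per glitch.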

Closed by  Michael Sevakis (MikeS)
Friday, 16 February 2007, 10:43 GMT
Reason for closing:  Fixed
Additional comments about closing:  ostensibly, presumably and likely probably
Comment by Paul Louden (darkkone) - Wednesday, 04 October 2006, 18:46 GMT
Just had a thought: Could it be something relating to the new scheduling code?
Comment by Michael Sevakis (MikeS) - Friday, 27 October 2006, 10:30 GMT
This should no longer be a problem in the next recording update. Look for it Oct. 30th or 31st when I can do final validation on my yet-to-arrive H120.
Comment by Michael Sevakis (MikeS) - Tuesday, 07 November 2006, 19:07 GMT
I hope I can close this soon...hurry up! (:^B)



Comment by Peter D'Hoye (petur) - Monday, 04 December 2006, 09:13 GMT
bug re-opened...

Just encountered it again with a build based on CVS 2006-12-02 :(
Maybe my disk is fragmented too much causing saving to take more time here?
Comment by Peter D'Hoye (petur) - Monday, 04 December 2006, 10:15 GMT
just heard a new one: a 1 or 2 second part of the recording gets repeated and then it jumps on to where it should have been.
sample of the glitch: http://users.telenet.be/petur/glitch.wav (listen carefully at 0:13 and the following seconds)
Comment by Michael Sevakis (MikeS) - Monday, 04 December 2006, 18:54 GMT
Good god what's going on!?! I really wonder if it's the recording code especially after it being changed so much. I sat staring at it and cannot find an obvious problem though reports of glitches/crashes have been H3x0 related. I haven't heard anything from H1x0 or X5 users. :-\ The problem may very well lie with something else interfering (my guess atm but will get to the bottom). The wav may be helpful in seeing. I will also attempt letting recording run while I'm sleeping and see if it happens.
Comment by Peter D'Hoye (petur) - Monday, 04 December 2006, 19:39 GMT
in the wav file the first glitch is not visible in the waveform, the second glitch (where it resumes) is at 15s 910ms according to audacity

In total, I've had 3 glitches in 1:44 hour. The file had 2 fragments. Another (shorter) recording has no audible glitches on first hearing but 29 fragments. Go figure...
Comment by Michael Sevakis (MikeS) - Monday, 01 January 2007, 06:49 GMT
Happy new year!

I did some very important voice recording recently through line-in on the H120 and had no glitches in files > 1hr (MP3, mono, 160kbps). I know it's not as many trips around the ring buffer as a WAV will make but spinup and high watermark are still time based on encoded data, so hmmm. I'm not currently aware of any other person that's had a glitch problem. If you have any buffer overflows you'll get the warning flashing and it was tested (inadvertently) when recording was being added to iPods and H10 and it worked (caught a pcm buffer overflow when a line of code was removed accidentally). The buffer checks are part of a normal build.
Comment by Michael Sevakis (MikeS) - Monday, 01 January 2007, 07:00 GMT
I should add that the checks are a part of a normal build since adding PCMREC_PARANOID. Peter, if you could possibly test record a less critical event but under similar conditions that would be great and something will probably pop up for you regarding a codec buffer overflow I'm guessing. Using the PCMREC_PARANOID will give some more detailed info and checks for corruption of critical variables that would lead to skips as well but is a bit more CPU intensive.
Comment by Riley McNiff (rmcniff) - Tuesday, 30 January 2007, 03:01 GMT
I am having similar problems. I just recorded my first concert with my iRiver H120, and occasionally throughout the recording it simply "skips" samples that do not get recorded, sometimes enough to make it to the next word of a song, so maybe max of 1/2 second missing, but it happens enough throughout the recording that the audio is more or less ruined.

I was running in via optical S/PDIF. However, out of three sets that I've recorded I had no issues with the first, only the last two. I did change the level meters from linear to logarithmic between the first and second sets and changed the min/max values for them. That's the only change, which seems unlikely to cause such an issue. That also caused the level meters to show up as a solid bar, which eventually began functioning normally after a while.
Comment by Michael Sevakis (MikeS) - Tuesday, 30 January 2007, 06:17 GMT
Riley,

Did you get any warning flash on the screen on top of the time display? If you were using a build since PCMREC_PARANOID was added then you likely should have (see 3 msgs above). Your description is again what I would expect if something kept the drive from writing and the buffer fills up _and_ doing concert recording too...hmmm. At home on my H120 I've had no luck at duplicating any of this for recordings of any length or format. I know I'm sounding repetitive but have no other plausible theory or pointer in another direction. If you used a recent build and got no warning then I'll have to consider other possibilities.
Comment by Peter D'Hoye (petur) - Tuesday, 30 January 2007, 09:13 GMT
Riley, how full and fragmented is your disk?

jhMikeS, can't we just nudge the watermark levels a bit more to the safe side? I'd rather not wipe my disk every time I go taping just to be sure it'll be glitchless. Remember, I've never had these issues before the new encoder framework, and my disk didn't fill up much more since that. So I still blame badly chosen buffer limits.
Comment by Michael Sevakis (MikeS) - Tuesday, 30 January 2007, 12:21 GMT
Are you telling me that wiping the disk has been an effective defense? Odd that even a fragmented drive couldn't write > 176400 bytes/sec. My x5 disk was always a bit > 1/2 full; the H120 is rather empty. I suppose you can change the 5s minimum watermark in pcm_record.c to something larger than the maximum amount of missing time that is being experienced and try it out. Wasn't the old one like 10s?

Change Line 833: 5ull*HZ part could be 8ull*HZ to have a 10s minimum watermark. Usually the compensated watermark for me is around 7.5 seconds (5 + 2.5). The minimum drive compensation is 2s and max 10s. The thresholds are tighter on x5 with only 16MB ram (and 88.2kHz !!) and I don't know about others atm. I don't want to induce a perpetual flushing state on lower mem players.
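
The arithmetic behind these figures can be sketched as follows. This is only an illustration under stated assumptions: HZ is taken to be 100 ticks per second, the function names are hypothetical, and the rule "compensated watermark = base + clamped drive compensation" is inferred from the "5 + 2.5" and "2s minimum, 10s maximum" remarks rather than copied from pcm_record.c.

```c
#include <stdint.h>

#define HZ 100  /* assumed tick rate: ticks per second */

/* Bytes per second of uncompressed PCM. */
static uint32_t pcm_bytes_per_sec(uint32_t rate, uint32_t channels,
                                  uint32_t bytes_per_sample)
{
    return rate * channels * bytes_per_sample;
}

/* Clamp the measured spin-up time to the 2 s .. 10 s compensation range. */
static uint32_t drive_compensation_ticks(uint32_t spinup_ticks)
{
    const uint32_t min_ticks = 2 * HZ, max_ticks = 10 * HZ;
    if (spinup_ticks < min_ticks) return min_ticks;
    if (spinup_ticks > max_ticks) return max_ticks;
    return spinup_ticks;
}

/* Compensated watermark = fixed base + clamped drive compensation. */
static uint32_t watermark_ticks(uint32_t base_ticks, uint32_t spinup_ticks)
{
    return base_ticks + drive_compensation_ticks(spinup_ticks);
}
```

Under these assumptions, 16-bit stereo at 44.1kHz gives the 176400 bytes/sec quoted above, a 5s base with a 2.5s spin-up gives 7.5s, and an 8s base can never drop below 10s because the compensation is at least 2s.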

I want to know if a warning 00000002 came up because that would confirm my theory. I put that in there so tapers could report back and I'd know exactly what occurred. I'm quite sure it can't be pcm overflow (warning 00000001) since that would cause a skip forward to a section of the latest pcm data received and then back on track and then probably a repeat of that section later if another overflow didn't occur. Afaik it's all jumps forward in time by chunk-multiple sizes, hence dumping data since no room is left.

Are these players using (larger?) replacement drives? I'm curious.
Comment by Michael Sevakis (MikeS) - Tuesday, 30 January 2007, 12:49 GMT
Also I'm now considering that thread boosting should occur almost immediately upon growth of encoded data when flushing and not at a flat 1s remaining space since that indicates loss of progress in flushing against the encoder and those file apis yield _a lot_ whereas the codecs yield relatively little. Yielding regulates thread priority as well and the pcm thread giving up all that time causes its priority to be effectively lowered during flushing whereas the codec thread stays relatively steady since it yields explicitly. It seems that more cpu intense encoding actually causes the encoder to flood the output buffer (wavpack 88kHz) and thus the regular need for extra boost with that format.
Comment by Riley McNiff (rmcniff) - Tuesday, 30 January 2007, 19:48 GMT
Ok, I did not get any warning messages, but when I record like that I don't pay attention to it much unless the clip light comes on and I need to lower levels or something like that. Does the message go away or does it stick around permanently? FYI, I was running the Jan 23rd 2007 build.

Here's the relevant information about the setup: My drive was practically empty (maybe 2 Gb of music on it), not fragmented at all. I was recording from SPDIF optical in to WAV format at 16bit/44.1kHz. The iRiver is barely used and completely stock, no RTC chip mods, no large capacity drives or batteries. It's never been opened to expose the guts (even though I really want to).

I will mention that it was having some other funny issues at the time. It had locked up about an hour before, causing me to have to reset it with a paper clip. Then it was having some really strange issues with the peak meter (appearing as solid full bars for five minutes, etc). I have since wiped rockbox and done a clean install from the Jan 29 build. No more issues.

My only other idea about what could cause this involves losing the input signal. Could this possibly be an issue with the recorder temporarily losing the optical SPDIF signal? That could happen if it were shaken enough (possibly by loud sound?) Was everyone else using a digital source when the issues happened?

I'll be experimenting a lot in the near future. I'll be recording another show in two weeks, so I'll try recording from line, radio, etc and using different codecs. I'd really hate to have this happen with WavPack, where I'd lose potentially more than with just samples missing in a WAVE file. Is there anything else I can do to try and help solve this? I've been looking a lot at the code, but am not familiar with it enough yet to discover much, and I'd love to know that it has been resolved before I lose chunks of another show.
Comment by Michael Sevakis (MikeS) - Wednesday, 31 January 2007, 05:16 GMT
The warning message stays till you start a new recording after pressing stop or leave the screen. It will stay through file splits and pauses.

Yes, S/PDIF errors could also cause the same thing. It will just drop the data that the CPU said was bad and not advance the pcm buffer so sure that's another possibility. Peter was using a H300 series so no S/PDIF in his case. I don't put a warning for S/PDIF errors since they're common when a source device just begins playback and such and benign in those cases as well.

Dropped data either at the raw pcm end or even after encoding won't hurt a WavPack file at all (or any other format). The header for each WavPack chunk isn't finalized until writing it, each is independent of the others and the RIFF metadata will be based upon what has actually been written to disk just before closing the file. If writing to disk failed recording would have set an error.

The issues you were having could be connected to it as well. I can't see the peakmeters acting that way unless something is really goofed like settings not being reset when they should or a full reinstall not being done when it should have been.

That's the thing, every other problem Peter reported I was able to narrow down and fix rather quickly but this one doesn't seem to have a cause from a mistake that I can spot in the recording code itself. Could be buffer limits but why no trouble for me or others recording under calmer conditions? Loud sound/movement affecting the hd? Perhaps things require more time under those conditions and that's all. The older pcm record didn't have to share CPU time with the codec thread.

Unfortunately your description of what was going on doesn't help me narrow things to a single cause in your case at all.
Comment by Riley McNiff (rmcniff) - Wednesday, 31 January 2007, 17:52 GMT
Well, I have done a LOT of recording testing on a clean install. I recorded both from FM radio and S/PDIF to WAV, AIFF, and WavPack and cannot reproduce the issue. I even got rough with the recorder and the optical cable, rougher than it was at the concert, to try to create S/PDIF errors, but everything came out flawlessly during all tests, and no error messages appeared. That seems to rule out external S/PDIF errors. I was able to reproduce the peak meter issue once however, so I may open a bug for that.

At this point I really can't claim that it's a bug since I can't reproduce it. Like you said, even if there is a bug it could be related to something completely different. I had played with a LOT of settings prior to that recording session, and during my tests I tried to turn everything unnecessary off to keep things clean. I should do a lot more poking through the code to see what is going on.

As for the buffer limits, they do seem to be a little tight, not allowing a lot of room for error. Would it be hard to loosen that up a little? I guess from now on I'll just do a clean install from a tested build with everything turned off before I record something important, but it all leaves me a little unsettled.

What threads run in parallel with the recorder? Is it just the codec being used or are there others?
Comment by Michael Sevakis (MikeS) - Thursday, 01 February 2007, 04:41 GMT
Thanks for doing some checking and trying to reproduce things. It is elusive.

The buffer limits can be changed very easily and I may clean up the way they're defined so they can be different for different mem sizes if needed and I'll test out boosting upon losing progress to the codec thread rather than at 1s left in the buffer.

Recording involves chiefly: 1) codec thread encoding/copying audio data 2) main thread running gui/lcd updates/peakmeters/AGC 3) pcmrec thread (mostly sleeping till a disk flush happens but checking status every 200ms) 4) DMA transfer of pcm data (interrupt every ~46ms for 2048 samples @44.1k).
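
The ~46ms figure follows directly from the chunk size and sample rate. A minimal sketch of that arithmetic (function names are illustrative, not Rockbox identifiers):

```c
#include <stdint.h>

/* Microseconds of audio held by one DMA chunk (integer division truncates). */
static uint32_t chunk_period_us(uint32_t samples_per_chunk,
                                uint32_t sample_rate)
{
    return (uint32_t)((uint64_t)samples_per_chunk * 1000000u / sample_rate);
}

/* Whole DMA interrupts that fit into one status poll of `poll_ms`. */
static uint32_t chunks_per_poll(uint32_t poll_ms, uint32_t period_us)
{
    return poll_ms * 1000u / period_us;
}
```

With 2048 samples at 44.1kHz this gives a period of about 46.4ms, so roughly four DMA interrupts occur between two 200ms status checks of the pcmrec thread; at 88.2kHz the period halves.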
Comment by Michael Sevakis (MikeS) - Friday, 02 February 2007, 08:31 GMT
I'm wondering if sound vibration from a loud environment is worse than shaking the unit. ??

The limit for a typical HD spin-up on an iRiver should now be about the same margin as before (11s-12s). I can't atm test the WavPack at 88kHz on x5 to test the priority boosting - it's guaranteed to be needed there. Will also up priority if 1 second of ground is lost rather than 1s left. I added more yielding to that to try and allow other things to run but would like to tune it. Did some misc. stuff.
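
The difference between the two boost conditions ("1s left" versus "1 second of ground lost") can be illustrated with two predicates. This is a hedged sketch: the function names, parameters and byte-based bookkeeping are assumptions for illustration and not the actual pcm_record.c logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Old rule: boost only when less than one second of free space remains. */
static bool boost_when_nearly_full(size_t buf_size, size_t used,
                                   size_t bytes_per_sec)
{
    return buf_size - used < bytes_per_sec;
}

/* Proposed rule: boost once the fill level has grown by one second of
 * data since the flush started, i.e. the flush is losing ground. */
static bool boost_when_losing_ground(size_t used_at_flush_start, size_t used,
                                     size_t bytes_per_sec)
{
    return used > used_at_flush_start &&
           used - used_at_flush_start >= bytes_per_sec;
}
```

The second predicate can fire while the buffer is still far from full, which is the point of the change: it reacts to the trend rather than waiting for the last second of headroom.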

Not sure if I should commit yet without testing the WavPack thing on x5.

Also, I have an idea to add a buffer screen similar to the one for the audio thread. Maybe a graphical representation would help visualization and tuning. Would have to be accessible somehow via the recording screen rather than the debug menu so that entering/leaving it doesn't interrupt recording.
Comment by Riley McNiff (rmcniff) - Friday, 02 February 2007, 18:52 GMT
The answer to the vibration question is almost certainly no. At the concert it was fairly well padded, and to lose the S/PDIF signal it would be more an issue with the cable than the A/D or the iRiver itself. I did everything imaginable to that cable and never lost the signal.

I like your ideas about the buffer screen. How about using the A/B button while recording to get there? I don't think it is used otherwise. If you'd like help testing the changes before you commit them I would volunteer my unit. I should probably retest recording again with the settings that I had set previously to make absolutely sure that I can't reproduce this. Maybe I can narrow it down to a particular setting that is causing this.
Comment by Michael Sevakis (MikeS) - Friday, 02 February 2007, 19:19 GMT
Some devices are rather limited so something that also enables/disables access would probably be needed sometimes.

Eureka, I did have a breakthrough simulating a burdened flush and I didn't get an overflow in the codec buffer but rather the PCM buffer (even though the flush did complete properly and I might add, somewhat quickly). Indeed if the write pointer overshoots you will lose all 11 seconds of data in the pcm buffer and have a skip forward since the overshoot makes it logically empty. The priority boosting is not letting the codec thread run enough if a flush stays boosted for a long time. I cannot rule out sound vibration as the reason the drive writes may slow down since they pass the shaking test.
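
Why an overshoot makes the buffer "logically empty" can be seen in a toy ring buffer. The sketch below is hypothetical (an 8-chunk ring with wrapped indexes, names invented here) and is not the Rockbox PCM buffer, but it shows the same effect: a producer that cannot wait keeps writing, the write index catches up with the read index, and the consumer then sees nothing pending.

```c
#include <stdint.h>

#define RING_CHUNKS 8u              /* hypothetical size, a power of two */
#define RING_MASK   (RING_CHUNKS - 1u)

struct ring {
    uint32_t read;   /* next chunk the consumer will take */
    uint32_t write;  /* next chunk the producer will fill */
};

/* Chunks waiting to be consumed, as seen through wrapped indexes. */
static uint32_t ring_used(const struct ring *r)
{
    return (r->write - r->read) & RING_MASK;
}

/* Producer that never checks for free space, like an interrupt handler
 * that must put incoming data somewhere. */
static void ring_produce_unchecked(struct ring *r, uint32_t chunks)
{
    r->write = (r->write + chunks) & RING_MASK;
}
```

After seven unchecked chunks the ring reports seven pending; one more and it reports zero, so a full buffer's worth of audio is silently skipped and the consumer resumes with whatever arrives next.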

The results of this checking were exactly of the sort that has been experienced. I found an additional skip in Peter's recording that was NOT on a chunk boundary and that can't happen if the codec buffer overflows.

The older non-codec recording would not be susceptible to this problem so quickly since only one thread was involved and the output buffer and the pcm buffer were the same buffer and it was all available memory. More generous buffer limits in the codec buffer probably won't be as helpful as hoped since the pcm buffer seems the one inclined to overflow. In short: 1) Increase the PCM buffer size 2) Better coordinate the two threads during priority boost (which is a mandatory aspect of this but maybe not on all players).

This is the best result I've had, the most promising lead by far...though a bit counterintuitive.

You could try blasting music at it at concert levels while recording...or maybe even louder than that to try and enhance the effect.
Comment by Riley McNiff (rmcniff) - Friday, 02 February 2007, 21:58 GMT
Sounds like you might be onto something. I'm not missing 11 seconds of audio, though. Only split seconds here and there, and they seem fairly randomly placed. Sometimes it can go 10 minutes without one. Sometimes there is just one, but other times there may be something like 3 really close (within 3 or 4 seconds). Would it help if I sent some sound clips of where it is skipping?
Comment by Michael Sevakis (MikeS) - Saturday, 03 February 2007, 17:16 GMT
There could be other outputs depending upon whether or not the write position actually overtakes the read position. Some glitch at a non-2048 multiple can only come from the pcm buffer.

Yes, it would probably help. What would also help is for you to give me the exact sample index from sample 0 of your recording that each clip starts at. This tells me something since with wav and aiff encoder chunks are the same size as pcm chunks so always correspond 1:1. Milliseconds aren't close enough. I don't know if your audio software allows working at the sample level with sample indexes though. Peter just sent me his whole recording in mono flac so I could look.
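
The reason exact sample indexes matter can be sketched in a few lines. The chunk size of 2048 samples comes from the thread; the helper names and the 44.1kHz rate used in the example are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCM_CHUNK_SAMPLES 2048u  /* samples per PCM chunk, per the thread */

/* True if a glitch starting at this sample index sits on a chunk boundary. */
static bool on_chunk_boundary(uint64_t sample_index)
{
    return sample_index % PCM_CHUNK_SAMPLES == 0;
}

/* Convert a time in milliseconds to a sample index (truncating). */
static uint64_t ms_to_sample(uint64_t ms, uint32_t sample_rate)
{
    return ms * sample_rate / 1000u;
}
```

One millisecond at 44.1kHz spans about 44 samples, so a timestamp such as 15s 910ms only narrows the position to a 44-sample window. That is too coarse to tell whether a glitch falls exactly on a multiple of 2048, which is the test that separates a codec-buffer overflow from a PCM-buffer one.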
Comment by Michael Sevakis (MikeS) - Wednesday, 07 February 2007, 17:32 GMT
Hey, I'm looking for some help to really solve this problem once and for all and I don't really sense that anyone's dedicated to hashing it out hard core 'till it's resolved. I think unfortunately I'm the only one with eyes on the code in pcm_record.c as well and who knows, someone could spot something because I may be too close to it. If you would like me to post patches to vary the setup so it can be tested I'd be glad to but please actually try it out. :)

If it would be better, I can just revert it all back to a codec-less system but I think that's probably overkill and it probably really only needs some tweaking to resolve.
Comment by Michael Sevakis (MikeS) - Wednesday, 07 February 2007, 18:15 GMT
Here's two patches with buffer timing changes with the second having a double sized pcm buffer. Doubling the size might not be so good for 16MB devices, especially x5. For 32MB I think it can afford it.

EDIT: Delete invalid patches.
Comment by Michael Sevakis (MikeS) - Wednesday, 07 February 2007, 18:50 GMT
Use these instead since those two have a test loop used to strain flushing left by accident.
Comment by Michael Sevakis (MikeS) - Friday, 09 February 2007, 00:50 GMT
I have an idea which might solve this for good and that is to boost both the codec AND the pcmrec thread since priorities are relative. The codec thread is stable and cannot run faster than the incoming data rate so will yield a lot when it empties the pcm buffer but not be shut down by the high priority pcmrec thread. This should put the flushing in the position of only having to just get ahead of the incoming data so it can finish. Anything preventing flushing from getting ahead would then have to be catastrophically bad indeed.
Comment by Michael Sevakis (MikeS) - Friday, 09 February 2007, 02:37 GMT
Yes! I think I got it. I ran 1/2 the buffer space and a 1,000,000 loop counter in the flushing loop and no skips. Originally it wouldn't even withstand 100,000. With the counter it would spend almost 45 seconds flushing while recording and not a glitch. Conditions would have to be pretty bad for it to take that long without all the stress code included.

BTW: Those patches above contain test code that cuts memory in half. :P I took extra care to remove all that stuff in this one. I just got a bit distracted and forgot about it. Sorry.
Comment by Michael Sevakis (MikeS) - Friday, 09 February 2007, 03:05 GMT
And Riley, anything to say? You were gonna work with me on this and seemed to bail out and ignore me. If people do that I'll just not do any further development and forget about actually caring about fixing things for them. I know I have a lot of ideas and things to say (which are often wrong) and can maybe be a bit overwhelming but that just pisses me off to no end.
Comment by Riley McNiff (rmcniff) - Friday, 09 February 2007, 04:37 GMT
Sorry, I'm still following. I've just been really busy, and had a funeral, etc. to attend, so I haven't had a lot of time to devote to this. Next week should be a lot more free, and I'm recording another concert this weekend, so that should be a good test. I'll get back into it heavy on Monday.

By the way, thanks so much for all the time and effort you have put into this, and don't think that it isn't appreciated, because it certainly is.
Comment by Michael Sevakis (MikeS) - Friday, 09 February 2007, 07:22 GMT
Funeral? I'm sorry to hear about that and am very understanding esp. with recent life troubles of my own (which tend to shorten my fuse a lot). It's just been a long go mostly alone on this one with an extreme need for feedback from users "out in the field". So take care and hopefully the last patch can be the _last patch_ regarding this.

Also, I'm looking into the peakmeter issue and have thoughts about how to handle the settings to make any issue with db/% moot.
Comment by Peter D'Hoye (petur) - Friday, 09 February 2007, 08:28 GMT
jhMikeS, sorry to read about your frustration, I really appreciate the effort you're putting in. And I really hope RealLife(tm) gets a bit less in the way so I can resume working on RB. I'll test your latest patch tomorrow.
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 00:48 GMT
So, um...Petur...what's the verdict? Are we getting close to closing this issue? :)

I will probably commit the changes (along with some simplifications and misc tweaks) since it does make the code more robust against writing problems (10x +) but I know for sure I can max it out with a few more small ones.

Comment by Riley McNiff (rmcniff) - Tuesday, 13 February 2007, 18:49 GMT
First, how does one apply the patches and what would one apply them to? Not knowing, I just installed the latest build last weekend seeing changes that you had made. I assumed that the changes had already been committed.

Anyway, I recorded another concert this last weekend (stealth-taping through mic-in instead of SPDIF), and had no issues at all. The recording came out great. No matter what I do I cannot reproduce any sort of errors. At this point I would say why not just commit the changes? I can think of nothing else that I can do to help out any more and it sounds like you have made some good improvements. Let's just call it done and I'll continue watching for your error messages to pop up while recording and we can just reopen this if it ever happens again. What do you think?
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 19:07 GMT
No buffering changes in cvs yet.
There's no difference at all in the execution of code when using S/PDIF after initial configuration.
I'm waiting for word from Petur since he was the original reporter before closing anything but he'll probably take it upon himself if he's reasonably satisfied and open it again if anything comes up.
Comment by Peter D'Hoye (petur) - Tuesday, 13 February 2007, 21:40 GMT
Mike,
commit the changes and close the report. I get the feeling you found the possible issue I had so I'm ok with closing the issue.
If I ever have more troubles I can still reopen or submit a new report ;)
Or better, debug myself.

Sorry for leaving you on your own on this but I had no time to learn the changed code & debug it...
Comment by Michael Sevakis (MikeS) - Wednesday, 14 February 2007, 21:59 GMT
Saw that you were wondering when I'll commit in the IRC logs (didn't see you logged in). Shortly. I'm just going over some things (maybe add a couple seconds more margin) and testing the x5 a bit on important stuff too.

Had some personal crap going on so I was ready to tear anyone's head off the other day. Sorry for the outburst (and the graphic language). :)
Comment by Peter D'Hoye (petur) - Wednesday, 14 February 2007, 22:34 GMT
ok. Just did another show, will listen to it tomorrow and let you know but I expect everything will be ok :)
Comment by Riley McNiff (rmcniff) - Thursday, 15 February 2007, 00:04 GMT
I'll be doing another show on next Tuesday (Feb 20), and I'd love to try it out with the new changes. Michael, do you think the changes will be committed before then? I'm still not sure what to do with the patches you included, so I'm just planning to reinstall from a build with the changes included. This next recording definitely seems like a deal breaker to me. If I can't reproduce any errors there I'm going to just assume that all is well and we can all sleep better now knowing that this has been fixed (or maybe hibernating permanently)
Comment by Michael Sevakis (MikeS) - Thursday, 15 February 2007, 02:05 GMT
There is one point that was left over from the old system; it rears its head on x5 wavpack at 88kHz in flushing a fixed number of chunks rather than having a low watermark as a goal...you know, small memory and very high datarate. The output buffer can actually not be flushed fully and could actually slowly grow in size (this audio thread type debug screen I have going is very useful). I should have it worked out and tested tonight so maybe later tonight or tomorrow?

Incidentally the newer wavpack lib seems to suck. Bad compression ratios. Used to be ~870kbits, now I'm getting negative compression pretty often. :\
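
The contrast between flushing a fixed number of chunks and flushing down to a low watermark can be shown with a toy model. Everything here is an assumption made for illustration (chunk-granular counting, one new chunk arriving for every `writes_per_arrival` chunks written, the function names); it is not the Rockbox flush loop.

```c
#include <stdint.h>

/* Flush at most `max_chunks` chunks. One new chunk arrives from the
 * encoder for every `writes_per_arrival` chunks written.
 * Returns the buffer fill (in chunks) after the flush. */
static uint32_t flush_fixed(uint32_t used, uint32_t max_chunks,
                            uint32_t writes_per_arrival)
{
    for (uint32_t written = 1; written <= max_chunks && used > 0; written++) {
        used--;                                  /* one chunk goes to disk */
        if (written % writes_per_arrival == 0)
            used++;                              /* encoder adds a chunk */
    }
    return used;
}

/* Keep flushing until the fill drops to `low_mark`.
 * `writes_per_arrival` must be at least 2, otherwise the writer can
 * never get ahead of the encoder and the loop would not end. */
static uint32_t flush_to_low_mark(uint32_t used, uint32_t low_mark,
                                  uint32_t writes_per_arrival)
{
    for (uint32_t written = 1; used > low_mark; written++) {
        used--;
        if (written % writes_per_arrival == 0)
            used++;
    }
    return used;
}
```

In this model a fixed 32-chunk flush starting from 100 chunks ends at 84, so if more than 16 chunks arrive before the next flush the buffer creeps upward from cycle to cycle. Flushing toward a low watermark always ends at the target fill regardless of how much accumulated in between.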
