Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by AdamGashlin - 2007-01-15

FS#6542 - SPC Player

Here is a port of blargg’s Game_Music_Emu SPC player (based partly on OpenSPC). It plays fast enough on my F40 Gigabeat, it also runs on my iPod photo but not full speed.
Very much unfinished.

Closed by  xaviergr
2007-02-14 04:11
Reason for closing:  Accepted

Now with basic metadata reading (only within the codec), track limits, fading.

FIxes for track length reading, default length.

Basic seeking support added, so it won’t just hang when seeking.

It is rather unfortunate but this patch doesn’t run realtime on H100 or H300 too.
Any ideas to make it faster on the targets with less processing power?

Now with some defines for speed. SPC_NOINTERP disables interpolation and caches decoded samples. SPC_NOECHO disables the echo filter. SPC_HARDTIME fixes the tempo, supposedly to make some calculations faster, but it doesn’t seem to have any effect.
On my iPod photo a 1:55 track plays in 3:35 (53% speed) without any of these defined, and 2:24 (80% speed) with SPC_NOINTERP and SPC_NOECHO, and it still sounds reasonable. Still have to find a way to get a bit more speed…

Thanks to some great advice from the original author there has been a massive speed improvement, some tracks now run full speed on an iPod (with SPC_NOECHO and SPC_NOINTERP).
The code is an absolute disaster with all the rearrangements necessary, but hooray!

blargg has made some more improvements and cleaned up the code a lot. I haven’t measured the improvements yet but it is running great.

spc10.patch has the SPC_NOINTERP and SPC_NOECHO enabled by default. SPC_NOINTERP includes blargg’s fast linear interpolation, which improves audio quality a lot with almost no speed impact.

Faster fade out (subtraction and multiplication rather than divisions for each sample), and a little tweak for the BRR pre-decoder.

oops, left profiling enabled.

some changes to the BRR cache and linear interpolation, improves (for instance) DKC Mine Cart Madness.

Some more fixage, as suggested by blargg (the high note 1:16 into FF4 prelude no longer off pitch, more clickiness gone in DKC).

Major speed improvements by blargg!
Also now patches against the latest SVN, which now includes my NSF port (which could probably stand a lot of improvement, i’ll have to see about getting blargg to help me fix that up…)

Note that the spc17 patch disables frequency scaling on the ipod color, that is necessary for me to get rockbox runnign stably but I usually remove that from the patch when I put it up…

Dag nabbit, forgot to actually include the SPC stuff in spc17…

Let’s see how some code in IRAM (not on ARM) affects playback…

moved large arrays out of Spc_Emu so it can fit in IRAM

spc20 seems to be running great for coldfire, here’s an update without some of the crap I allowed to slip into it (iPod Color stability stuff, mangling of special characters in people’s names).

spc22 enables echo filtering and gaussian interpolation on gigabeat.
I’ve got some builds here:
http://hcs64.com/rockbox/spc.html

Using #21 I get a data abort at 01ed6b24 using r12132 as my build (so the .map will be accurate) on Gigabeat.

This happens if I try to play the second file from the Doom archive at snesmusic.org

Hm, ok, I see what is causing the crash (it starts a sample at the *very end* of memory), but a fix for that still doesn’t play any audio. When I can figure that out I’ll have something up.

Ok, now working. I special-cased the “end of RAM” thing, it’ll just end the wave there.

Here is an example of what happens when the current playtime exceeds the end play time. (In the case of repeat one.) 3:18:40 / 7:42 (–10:-57). As I have been told, this is a bug, that should not ever be happening.

What does this “end the wave” mean in terms of playback? Will it significantly affect the way a sond sounds, or is this a case of “People who are intimately familiar with it will notice, but the casual listener won’t?”

Specifically, which “wave” is ending?

The wave I speak of is a sample set to begin at the end of memory, which is unused. No one will notice because it is not audible anyway. Hopefully no SPC actually has samples wrapping around the end of memory… doesn’t make much sense, and the cache would have to be modified to allow that.
With repeat=1 current playtime is now fixed at 0.
I added some extra history to the BRR cache. Doom likes to fiddle around with the sample table (which is used to convert the sample index into a memory address) while running, which throws off the old cache. The new cache remembers the addresses of past samples independent of whether they’ve been purged from the sample table so the decoding now only needs to be done strictly once per sample per track. I don’t see any reason why this would be incorrect (short of songs modifying the sample data in realtime, which I ignore the possibility of by providing a cache) and it allows Doom tracks to play full speed on iPod.

I’ll have to retrace the “full speed” bit with regards to Doom, there are fewer skips but it is still not full speed.

Just as a recommendation, since these files are unlikely to have a Genre (in my opinion) would it perhaps be a good idea to hard code it to something like “Videogame” so that WPSes don’t just say ‘Blues’?

ok, when I work on metadata reading again I’ll consider dat genre hack

MikeS commented on 2007-02-07 03:30

Quick note-

This part:
while (!ci→pcmbuf_insert1)

  ci->yield();

Should now be:
ci→pcmbuf_insert(samples, NULL, WAV_CHUNK_SIZE/2)
ci→yield();

1) char *)samples, WAV_CHUNK_SIZE*2

Thanks much, spc26 includes this (and has the spc stuff tucked away in a subdirectory).
blargg has sent me some new code, but I haven’t thoroughly tested it yet.

Fails to compile now that Speex is in SVN.
Haven’t yet looked at what needs to be changed, but the failure’s in ID3.H, I think it just relates to the codec number.

Two new patches:
27 fixes the incompatibility with the Speex thing.
28 additionally renders in noninterleaved 24-bit, which I am told enables the DSP to process it more efficiently.

I was intending to run some speed tests to determine what kind of impact the 24-bit rendering (which just shifts over the old 16-bit samples) would have, but I didn’t get around to any of the tests I wanted to do today.

Bah, another change to id3.c necessitates another patch.

MikeS commented on 2007-02-10 17:18

Resync to my last update.

MikeS commented on 2007-02-10 17:47

Yes, would recommend using > 16 bit rendering as the shifts would be more efficient than what the dsp has to do for ⇐ 16 bits. Good thing using the noninterleaved samples now too.

MikeS commented on 2007-02-10 18:04

Haven’t listened till now and this is very cool. I love this kind of stuff. :)

MikeS commented on 2007-02-10 18:59

heh, I see you did that, wasn’t paying attention to the actual values :P

I’m finally up to the latest blargg sent me. I moved the keyon code into another function, as my testing indicates that jumping over rarely used code is more costly than the overhead of the function call when it is used (tests done on an iPod photo/color). I also doubled the size of the buffer I’m decoding into. The resulting several percent (2-3%) performance increase seems to have been enough for Uncharted Waters 2, which was almost-but-not-quite running full speed.
Still not quite fast enough with echo, though we may have that assembly-optimized for coldfire soon, and blargg and I will be looking into improvements on ARM.

MikeS commented on 2007-02-12 13:59

Increasing the output buffer size really seems to hurt the H120 and probably other cf targets. I did test the Uncharted Waters 2 on my iRiver with what I’ve done so far and I get about 85% boost. I think implementing echo will wait until there’s more than enough space in time for it since that should be an easy last step.

BTW: Have you looked at any of the FRACMUL stuff in dsp.c? That’s available optimized for ARM too and is very convenient for handling volume and envelope mul-divs. I’ve used some of that already since gcc seems to produce faster code using it in some places instead of a large asm block.

Are you sure it is the output buffer size that is causing the difficulty and not the anti-inlining? By simply making decode_brr or key_on inline I measure a significant speed hit, perhaps the opposite holds for coldfire? By all means condition things to work better on coldfire.
I have not had a looked at FRACMUL, I will make a point of doing so.

MikeS commented on 2007-02-13 00:42

I’m sure since I unfortunately haven’t actually merged your efforts with mine yet but I did test the buffer size on my fork (I’m worried that they’re not easily merged atm with the removal of “this” on CF and changing some vars to 32 bit along with the general mess now =
) and it really did take a hit but that’s easily set to something good on the target. One problem on ColdFire is that RAM is very slow (without burst reads and writes of contiguous blocks) and ideally everything would go in IRAM but obviously not possible. The best speedup was a good 15%-20% just from redoing the interpolator but I feel I’ll be fighting the huge sparse memory accesses without some other technique. On the X5 it really dogs out since that color display is slow too with the itty bitty 9 bit bus (max 65.5 fps).

I’m thinking hard about how to have some dynamic recompilation to ease the overhead of the CPU core. Might be possible to string assembly fragments together as it executes. And does all of the synth state have to be computed along with the output itself? The work is cut out for anyone doing anything on this. :)

This needs to be more modularized for sure…maybe put the code for each part into .c files?

BTW: What’s your boost ratio on the iPods with Uncharted Waters 2?

I do not have a boost ratio to report, as rockbox does not run stably with frequency scaling on my ipod.
I do have a lot of the state in IRAM already, the RAM and BRR cache won’t fit but I think anything else is in IRAM or registers.

MikeS commented on 2007-02-13 04:37

That’s the rub with it. The scf5250 on x5 has 64k IRAM and the others have 48k. Bummer for that.
I’m doing a double check on the this pointer vs. globals on ColdFire with spc35.patch and then contribute some stuff…probably the interpolator and why not throw in the FRACMUL usage if that’s ok. Things got changed around alot.

I also made a BRR cache that has a direct inverse lookup to eliminate looping to do cache lookups. Seemed to help too. Optimization in the decompression isn’t that important since all samples will eventually be decompressed and cached and all lookups will hit if I understand correctly how it works. The SPC700 can have up to 256 samples in memory to look up?

Sync’d again, genre now set to 36 (Game).

MikeS commented on 2007-02-13 09:32

Synced and easy to side-by-side test this or globals. Hope I didn’t forget any files in the patch.

Just change how the macros get defined at the top of spc_codec.h.

Assembly stuff will follow shortly after sleep.

Since we’ve seen that this is superior (or at least not much different on coldfire) to globals, I think we’ll be sticking with spc36 and friends.
So, yeah, here’s my proposed patch for commital. Gonna run all the builds locally before sending it up.

MikeS commented on 2007-02-13 19:13

I see I put the wrong number on that last one. :p Should have some asm and FRACMUL popped in there shortly.

MikeS commented on 2007-02-13 22:42

Ok, added faster interpolation plus a version you can try on ARM that uses a single multiply. There’s no advantage in the FRACMUL now in applying volume since there’s no post scaling or 64-bit intermediate results. I don’t want to hold this up any longer. :) I can optimize coldfire some more after svn.

MikeS commented on 2007-02-13 22:54

Oops, this bit:

          uint32_t f = voice->position;
          int32_t  output = (int16_t)this->noise;
          if ( (this->r.g.noise_enables & vbit) == 0 )
          {

… Can be:

          int32_t  output = (int16_t)this->noise;
          if ( (this->r.g.noise_enables & vbit) == 0 )
          {
              uint32_t f = voice->position;

Here is the resampler that blargg included, in case we want to do the resampling within the codec.

Crap,forgot the .h

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing