Rockbox

  • Status Closed
  • Percent Complete
    100%
  • Task Type Patches
  • Category Codecs
  • Assigned To No-one
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by Adam Gashlin - 2007-01-15

FS#6542 - SPC Player

Here is a port of blargg’s Game_Music_Emu SPC player (based partly on OpenSPC). It plays fast enough on my F40 Gigabeat, it also runs on my iPod photo but not full speed.
Very much unfinished.

Closed by  Alexander Spyridakis
2007-02-14 04:11
Reason for closing:  Accepted
Adam Gashlin commented on 2007-01-15 14:15

Now with basic metadata reading (only within the codec), track limits, fading.

Adam Gashlin commented on 2007-01-15 23:27

FIxes for track length reading, default length.

Adam Gashlin commented on 2007-01-16 08:39

Basic seeking support added, so it won’t just hang when seeking.

Alexander Spyridakis commented on 2007-01-18 15:44

It is rather unfortunate but this patch doesn’t run realtime on H100 or H300 too.
Any ideas to make it faster on the targets with less processing power?

Adam Gashlin commented on 2007-01-22 09:23

Now with some defines for speed. SPC_NOINTERP disables interpolation and caches decoded samples. SPC_NOECHO disables the echo filter. SPC_HARDTIME fixes the tempo, supposedly to make some calculations faster, but it doesn’t seem to have any effect.
On my iPod photo a 1:55 track plays in 3:35 (53% speed) without any of these defined, and 2:24 (80% speed) with SPC_NOINTERP and SPC_NOECHO, and it still sounds reasonable. Still have to find a way to get a bit more speed…

Adam Gashlin commented on 2007-01-23 05:15

Thanks to some great advice from the original author there has been a massive speed improvement, some tracks now run full speed on an iPod (with SPC_NOECHO and SPC_NOINTERP).
The code is an absolute disaster with all the rearrangements necessary, but hooray!

Adam Gashlin commented on 2007-01-24 02:31

blargg has made some more improvements and cleaned up the code a lot. I haven’t measured the improvements yet but it is running great.

Adam Gashlin commented on 2007-01-24 02:37

spc10.patch has the SPC_NOINTERP and SPC_NOECHO enabled by default. SPC_NOINTERP includes blargg’s fast linear interpolation, which improves audio quality a lot with almost no speed impact.

Adam Gashlin commented on 2007-01-24 07:08

Faster fade out (subtraction and multiplication rather than divisions for each sample), and a little tweak for the BRR pre-decoder.

Adam Gashlin commented on 2007-01-24 07:30

oops, left profiling enabled.

Adam Gashlin commented on 2007-01-24 11:24

some changes to the BRR cache and linear interpolation, improves (for instance) DKC Mine Cart Madness.

Adam Gashlin commented on 2007-01-25 03:25

Some more fixage, as suggested by blargg (the high note 1:16 into FF4 prelude no longer off pitch, more clickiness gone in DKC).

Adam Gashlin commented on 2007-01-27 00:12

Major speed improvements by blargg!
Also now patches against the latest SVN, which now includes my NSF port (which could probably stand a lot of improvement, i’ll have to see about getting blargg to help me fix that up…)

Adam Gashlin commented on 2007-01-27 00:38

Note that the spc17 patch disables frequency scaling on the ipod color, that is necessary for me to get rockbox runnign stably but I usually remove that from the patch when I put it up…

Adam Gashlin commented on 2007-01-27 08:20

Dag nabbit, forgot to actually include the SPC stuff in spc17…

Adam Gashlin commented on 2007-01-28 10:33

Let’s see how some code in IRAM (not on ARM) affects playback…

Adam Gashlin commented on 2007-01-28 10:59

moved large arrays out of Spc_Emu so it can fit in IRAM

Adam Gashlin commented on 2007-01-28 11:22

spc20 seems to be running great for coldfire, here’s an update without some of the crap I allowed to slip into it (iPod Color stability stuff, mangling of special characters in people’s names).

Adam Gashlin commented on 2007-01-28 13:54

spc22 enables echo filtering and gaussian interpolation on gigabeat.
I’ve got some builds here:
http://hcs64.com/rockbox/spc.html

Paul Louden commented on 2007-01-28 23:06

Using #21 I get a data abort at 01ed6b24 using r12132 as my build (so the .map will be accurate) on Gigabeat.

This happens if I try to play the second file from the Doom archive at snesmusic.org

Adam Gashlin commented on 2007-01-29 04:23

Hm, ok, I see what is causing the crash (it starts a sample at the *very end* of memory), but a fix for that still doesn’t play any audio. When I can figure that out I’ll have something up.

Adam Gashlin commented on 2007-01-29 05:10

Ok, now working. I special-cased the “end of RAM” thing, it’ll just end the wave there.

Damien Good commented on 2007-01-29 05:21

Here is an example of what happens when the current playtime exceeds the end play time. (In the case of repeat one.) 3:18:40 / 7:42 (–10:-57). As I have been told, this is a bug, that should not ever be happening.

Paul Louden commented on 2007-01-29 05:25

What does this “end the wave” mean in terms of playback? Will it significantly affect the way a sond sounds, or is this a case of “People who are intimately familiar with it will notice, but the casual listener won’t?”

Specifically, which “wave” is ending?

Adam Gashlin commented on 2007-01-29 06:26

The wave I speak of is a sample set to begin at the end of memory, which is unused. No one will notice because it is not audible anyway. Hopefully no SPC actually has samples wrapping around the end of memory… doesn’t make much sense, and the cache would have to be modified to allow that.
With repeat=1 current playtime is now fixed at 0.
I added some extra history to the BRR cache. Doom likes to fiddle around with the sample table (which is used to convert the sample index into a memory address) while running, which throws off the old cache. The new cache remembers the addresses of past samples independent of whether they’ve been purged from the sample table so the decoding now only needs to be done strictly once per sample per track. I don’t see any reason why this would be incorrect (short of songs modifying the sample data in realtime, which I ignore the possibility of by providing a cache) and it allows Doom tracks to play full speed on iPod.

Adam Gashlin commented on 2007-01-29 06:48

I’ll have to retrace the “full speed” bit with regards to Doom, there are fewer skips but it is still not full speed.

Paul Louden commented on 2007-01-29 07:40

Just as a recommendation, since these files are unlikely to have a Genre (in my opinion) would it perhaps be a good idea to hard code it to something like “Videogame” so that WPSes don’t just say ‘Blues’?

Adam Gashlin commented on 2007-01-29 13:36

ok, when I work on metadata reading again I’ll consider dat genre hack

Michael Sevakis commented on 2007-02-07 03:30

Quick note-

This part:
while (!ci→pcmbuf_insert1)

  ci->yield();

Should now be:
ci→pcmbuf_insert(samples, NULL, WAV_CHUNK_SIZE/2)
ci→yield();

1) char *)samples, WAV_CHUNK_SIZE*2
Adam Gashlin commented on 2007-02-07 06:28

Thanks much, spc26 includes this (and has the spc stuff tucked away in a subdirectory).
blargg has sent me some new code, but I haven’t thoroughly tested it yet.

Paul Louden commented on 2007-02-10 02:15

Fails to compile now that Speex is in SVN.
Haven’t yet looked at what needs to be changed, but the failure’s in ID3.H, I think it just relates to the codec number.

Adam Gashlin commented on 2007-02-10 09:23

Two new patches:
27 fixes the incompatibility with the Speex thing.
28 additionally renders in noninterleaved 24-bit, which I am told enables the DSP to process it more efficiently.

I was intending to run some speed tests to determine what kind of impact the 24-bit rendering (which just shifts over the old 16-bit samples) would have, but I didn’t get around to any of the tests I wanted to do today.

Adam Gashlin commented on 2007-02-10 10:19

Bah, another change to id3.c necessitates another patch.

Michael Sevakis commented on 2007-02-10 17:18

Resync to my last update.

Michael Sevakis commented on 2007-02-10 17:47

Yes, would recommend using > 16 bit rendering as the shifts would be more efficient than what the dsp has to do for ⇐ 16 bits. Good thing using the noninterleaved samples now too.

Michael Sevakis commented on 2007-02-10 18:04

Haven’t listened till now and this is very cool. I love this kind of stuff. :)

Michael Sevakis commented on 2007-02-10 18:59

heh, I see you did that, wasn’t paying attention to the actual values :P

Adam Gashlin commented on 2007-02-11 19:53

I’m finally up to the latest blargg sent me. I moved the keyon code into another function, as my testing indicates that jumping over rarely used code is more costly than the overhead of the function call when it is used (tests done on an iPod photo/color). I also doubled the size of the buffer I’m decoding into. The resulting several percent (2-3%) performance increase seems to have been enough for Uncharted Waters 2, which was almost-but-not-quite running full speed.
Still not quite fast enough with echo, though we may have that assembly-optimized for coldfire soon, and blargg and I will be looking into improvements on ARM.

Michael Sevakis commented on 2007-02-12 13:59

Increasing the output buffer size really seems to hurt the H120 and probably other cf targets. I did test the Uncharted Waters 2 on my iRiver with what I’ve done so far and I get about 85% boost. I think implementing echo will wait until there’s more than enough space in time for it since that should be an easy last step.

BTW: Have you looked at any of the FRACMUL stuff in dsp.c? That’s available optimized for ARM too and is very convenient for handling volume and envelope mul-divs. I’ve used some of that already since gcc seems to produce faster code using it in some places instead of a large asm block.

Adam Gashlin commented on 2007-02-12 23:39

Are you sure it is the output buffer size that is causing the difficulty and not the anti-inlining? By simply making decode_brr or key_on inline I measure a significant speed hit, perhaps the opposite holds for coldfire? By all means condition things to work better on coldfire.
I have not had a looked at FRACMUL, I will make a point of doing so.

Michael Sevakis commented on 2007-02-13 00:42

I’m sure since I unfortunately haven’t actually merged your efforts with mine yet but I did test the buffer size on my fork (I’m worried that they’re not easily merged atm with the removal of “this” on CF and changing some vars to 32 bit along with the general mess now =
) and it really did take a hit but that’s easily set to something good on the target. One problem on ColdFire is that RAM is very slow (without burst reads and writes of contiguous blocks) and ideally everything would go in IRAM but obviously not possible. The best speedup was a good 15%-20% just from redoing the interpolator but I feel I’ll be fighting the huge sparse memory accesses without some other technique. On the X5 it really dogs out since that color display is slow too with the itty bitty 9 bit bus (max 65.5 fps).

I’m thinking hard about how to have some dynamic recompilation to ease the overhead of the CPU core. Might be possible to string assembly fragments together as it executes. And does all of the synth state have to be computed along with the output itself? The work is cut out for anyone doing anything on this. :)

This needs to be more modularized for sure…maybe put the code for each part into .c files?

BTW: What’s your boost ratio on the iPods with Uncharted Waters 2?

Adam Gashlin commented on 2007-02-13 04:22

I do not have a boost ratio to report, as rockbox does not run stably with frequency scaling on my ipod.
I do have a lot of the state in IRAM already, the RAM and BRR cache won’t fit but I think anything else is in IRAM or registers.

Michael Sevakis commented on 2007-02-13 04:37

That’s the rub with it. The scf5250 on x5 has 64k IRAM and the others have 48k. Bummer for that.
I’m doing a double check on the this pointer vs. globals on ColdFire with spc35.patch and then contribute some stuff…probably the interpolator and why not throw in the FRACMUL usage if that’s ok. Things got changed around alot.

I also made a BRR cache that has a direct inverse lookup to eliminate looping to do cache lookups. Seemed to help too. Optimization in the decompression isn’t that important since all samples will eventually be decompressed and cached and all lookups will hit if I understand correctly how it works. The SPC700 can have up to 256 samples in memory to look up?

Adam Gashlin commented on 2007-02-13 06:47

Sync’d again, genre now set to 36 (Game).

Michael Sevakis commented on 2007-02-13 09:32

Synced and easy to side-by-side test this or globals. Hope I didn’t forget any files in the patch.

Just change how the macros get defined at the top of spc_codec.h.

Assembly stuff will follow shortly after sleep.

Adam Gashlin commented on 2007-02-13 12:28

Since we’ve seen that this is superior (or at least not much different on coldfire) to globals, I think we’ll be sticking with spc36 and friends.
So, yeah, here’s my proposed patch for commital. Gonna run all the builds locally before sending it up.

Michael Sevakis commented on 2007-02-13 19:13

I see I put the wrong number on that last one. :p Should have some asm and FRACMUL popped in there shortly.

Michael Sevakis commented on 2007-02-13 22:42

Ok, added faster interpolation plus a version you can try on ARM that uses a single multiply. There’s no advantage in the FRACMUL now in applying volume since there’s no post scaling or 64-bit intermediate results. I don’t want to hold this up any longer. :) I can optimize coldfire some more after svn.

Michael Sevakis commented on 2007-02-13 22:54

Oops, this bit:

          uint32_t f = voice->position;
          int32_t  output = (int16_t)this->noise;
          if ( (this->r.g.noise_enables & vbit) == 0 )
          {

… Can be:

          int32_t  output = (int16_t)this->noise;
          if ( (this->r.g.noise_enables & vbit) == 0 )
          {
              uint32_t f = voice->position;

Adam Gashlin commented on 2007-02-14 04:05

Here is the resampler that blargg included, in case we want to do the resampling within the codec.

Adam Gashlin commented on 2007-02-14 04:05

Crap,forgot the .h

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing