Rockbox

Tasklist

FS#6542 - SPC Player

Attached to Project: Rockbox
Opened by Adam Gashlin (AdamGashlin) - Monday, 15 January 2007, 10:57 GMT
Task Type Patches
Category Codecs
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Here is a port of blargg's Game_Music_Emu SPC player (based partly on OpenSPC). It plays fast enough on my F40 Gigabeat, it also runs on my iPod photo but not full speed.
Very much unfinished.
This task depends upon

Closed by  Alexander Spyridakis (xaviergr)
Wednesday, 14 February 2007, 04:11 GMT
Reason for closing:  Accepted
Comment by Adam Gashlin (AdamGashlin) - Monday, 15 January 2007, 14:15 GMT
Now with basic metadata reading (only within the codec), track limits, fading.
Comment by Adam Gashlin (AdamGashlin) - Monday, 15 January 2007, 23:27 GMT
FIxes for track length reading, default length.
Comment by Adam Gashlin (AdamGashlin) - Tuesday, 16 January 2007, 08:39 GMT
Basic seeking support added, so it won't just hang when seeking.
Comment by Alexander Spyridakis (xaviergr) - Thursday, 18 January 2007, 15:44 GMT
It is rather unfortunate but this patch doesn't run realtime on H100 or H300 too.
Any ideas to make it faster on the targets with less processing power?
Comment by Adam Gashlin (AdamGashlin) - Monday, 22 January 2007, 09:23 GMT
Now with some defines for speed. SPC_NOINTERP disables interpolation and caches decoded samples. SPC_NOECHO disables the echo filter. SPC_HARDTIME fixes the tempo, supposedly to make some calculations faster, but it doesn't seem to have any effect.
On my iPod photo a 1:55 track plays in 3:35 (53% speed) without any of these defined, and 2:24 (80% speed) with SPC_NOINTERP and SPC_NOECHO, and it still sounds reasonable. Still have to find a way to get a bit more speed...
Comment by Adam Gashlin (AdamGashlin) - Tuesday, 23 January 2007, 05:15 GMT
Thanks to some great advice from the original author there has been a massive speed improvement, some tracks now run full speed on an iPod (with SPC_NOECHO and SPC_NOINTERP).
The code is an absolute disaster with all the rearrangements necessary, but hooray!
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 24 January 2007, 02:31 GMT
blargg has made some more improvements and cleaned up the code a lot. I haven't measured the improvements yet but it is running great.
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 24 January 2007, 02:37 GMT
spc10.patch has the SPC_NOINTERP and SPC_NOECHO enabled by default. SPC_NOINTERP includes blargg's fast linear interpolation, which improves audio quality a lot with almost no speed impact.
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 24 January 2007, 07:08 GMT
Faster fade out (subtraction and multiplication rather than divisions for each sample), and a little tweak for the BRR pre-decoder.
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 24 January 2007, 07:30 GMT
oops, left profiling enabled.
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 24 January 2007, 11:24 GMT
some changes to the BRR cache and linear interpolation, improves (for instance) DKC Mine Cart Madness.
Comment by Adam Gashlin (AdamGashlin) - Thursday, 25 January 2007, 03:25 GMT
Some more fixage, as suggested by blargg (the high note 1:16 into FF4 prelude no longer off pitch, more clickiness gone in DKC).
Comment by Adam Gashlin (AdamGashlin) - Saturday, 27 January 2007, 00:12 GMT
Major speed improvements by blargg!
Also now patches against the latest SVN, which now includes my NSF port (which could probably stand a lot of improvement, i'll have to see about getting blargg to help me fix that up...)
Comment by Adam Gashlin (AdamGashlin) - Saturday, 27 January 2007, 00:38 GMT
Note that the spc17 patch disables frequency scaling on the ipod color, that is necessary for me to get rockbox runnign stably but I usually remove that from the patch when I put it up...
Comment by Adam Gashlin (AdamGashlin) - Saturday, 27 January 2007, 08:20 GMT
Dag nabbit, forgot to actually include the SPC stuff in spc17...
Comment by Adam Gashlin (AdamGashlin) - Sunday, 28 January 2007, 10:33 GMT
Let's see how some code in IRAM (not on ARM) affects playback...
Comment by Adam Gashlin (AdamGashlin) - Sunday, 28 January 2007, 10:59 GMT
moved large arrays out of Spc_Emu so it can fit in IRAM
Comment by Adam Gashlin (AdamGashlin) - Sunday, 28 January 2007, 11:22 GMT
spc20 seems to be running great for coldfire, here's an update without some of the crap I allowed to slip into it (iPod Color stability stuff, mangling of special characters in people's names).
Comment by Adam Gashlin (AdamGashlin) - Sunday, 28 January 2007, 13:54 GMT
spc22 enables echo filtering and gaussian interpolation on gigabeat.
I've got some builds here:
http://hcs64.com/rockbox/spc.html
Comment by Paul Louden (darkkone) - Sunday, 28 January 2007, 23:06 GMT
Using #21 I get a data abort at 01ed6b24 using r12132 as my build (so the .map will be accurate) on Gigabeat.

This happens if I try to play the second file from the Doom archive at snesmusic.org
Comment by Adam Gashlin (AdamGashlin) - Monday, 29 January 2007, 04:23 GMT
Hm, ok, I see what is causing the crash (it starts a sample at the *very end* of memory), but a fix for that still doesn't play any audio. When I can figure that out I'll have something up.
Comment by Adam Gashlin (AdamGashlin) - Monday, 29 January 2007, 05:10 GMT
Ok, now working. I special-cased the "end of RAM" thing, it'll just end the wave there.
Comment by Damien Good (caitsith2) - Monday, 29 January 2007, 05:21 GMT
Here is an example of what happens when the current playtime exceeds the end play time. (In the case of repeat one.) 3:18:40 / 7:42 (--10:-57). As I have been told, this is a bug, that should not ever be happening.
Comment by Paul Louden (darkkone) - Monday, 29 January 2007, 05:25 GMT
What does this "end the wave" mean in terms of playback? Will it significantly affect the way a sond sounds, or is this a case of "People who are intimately familiar with it will notice, but the casual listener won't?"

Specifically, which "wave" is ending?
Comment by Adam Gashlin (AdamGashlin) - Monday, 29 January 2007, 06:26 GMT
The wave I speak of is a sample set to begin at the end of memory, which is unused. No one will notice because it is not audible anyway. Hopefully no SPC actually has samples wrapping around the end of memory... doesn't make much sense, and the cache would have to be modified to allow that.
With repeat=1 current playtime is now fixed at 0.
I added some extra history to the BRR cache. Doom likes to fiddle around with the sample table (which is used to convert the sample index into a memory address) while running, which throws off the old cache. The new cache remembers the addresses of past samples independent of whether they've been purged from the sample table so the decoding now only needs to be done strictly once per sample per track. I don't see any reason why this would be incorrect (short of songs modifying the sample data in realtime, which I ignore the possibility of by providing a cache) and it allows Doom tracks to play full speed on iPod.
Comment by Adam Gashlin (AdamGashlin) - Monday, 29 January 2007, 06:48 GMT
I'll have to retrace the "full speed" bit with regards to Doom, there are fewer skips but it is still not full speed.
Comment by Paul Louden (darkkone) - Monday, 29 January 2007, 07:40 GMT
Just as a recommendation, since these files are unlikely to have a Genre (in my opinion) would it perhaps be a good idea to hard code it to something like "Videogame" so that WPSes don't just say 'Blues'?
Comment by Adam Gashlin (AdamGashlin) - Monday, 29 January 2007, 13:36 GMT
ok, when I work on metadata reading again I'll consider dat genre hack
Comment by Michael Sevakis (MikeS) - Wednesday, 07 February 2007, 03:30 GMT
Quick note-

This part:
while (!ci->pcmbuf_insert((char *)samples, WAV_CHUNK_SIZE*2))
ci->yield();

Should now be:
ci->pcmbuf_insert(samples, NULL, WAV_CHUNK_SIZE/2)
ci->yield();

Comment by Adam Gashlin (AdamGashlin) - Wednesday, 07 February 2007, 06:28 GMT
Thanks much, spc26 includes this (and has the spc stuff tucked away in a subdirectory).
blargg has sent me some new code, but I haven't thoroughly tested it yet.
Comment by Paul Louden (darkkone) - Saturday, 10 February 2007, 02:15 GMT
Fails to compile now that Speex is in SVN.
Haven't yet looked at what needs to be changed, but the failure's in ID3.H, I think it just relates to the codec number.
Comment by Adam Gashlin (AdamGashlin) - Saturday, 10 February 2007, 09:23 GMT
Two new patches:
27 fixes the incompatibility with the Speex thing.
28 additionally renders in noninterleaved 24-bit, which I am told enables the DSP to process it more efficiently.

I was intending to run some speed tests to determine what kind of impact the 24-bit rendering (which just shifts over the old 16-bit samples) would have, but I didn't get around to any of the tests I wanted to do today.
Comment by Adam Gashlin (AdamGashlin) - Saturday, 10 February 2007, 10:19 GMT
Bah, another change to id3.c necessitates another patch.
Comment by Michael Sevakis (MikeS) - Saturday, 10 February 2007, 17:18 GMT
Resync to my last update.
Comment by Michael Sevakis (MikeS) - Saturday, 10 February 2007, 17:47 GMT
Yes, would recommend using > 16 bit rendering as the shifts would be more efficient than what the dsp has to do for <= 16 bits. Good thing using the noninterleaved samples now too.
Comment by Michael Sevakis (MikeS) - Saturday, 10 February 2007, 18:04 GMT
Haven't listened till now and this is very cool. I love this kind of stuff. :)
Comment by Michael Sevakis (MikeS) - Saturday, 10 February 2007, 18:59 GMT
heh, I see you did that, wasn't paying attention to the actual values :P
Comment by Adam Gashlin (AdamGashlin) - Sunday, 11 February 2007, 19:53 GMT
I'm finally up to the latest blargg sent me. I moved the keyon code into another function, as my testing indicates that jumping over rarely used code is more costly than the overhead of the function call when it is used (tests done on an iPod photo/color). I also doubled the size of the buffer I'm decoding into. The resulting several percent (2-3%) performance increase seems to have been enough for Uncharted Waters 2, which was almost-but-not-quite running full speed.
Still not quite fast enough with echo, though we may have that assembly-optimized for coldfire soon, and blargg and I will be looking into improvements on ARM.
Comment by Michael Sevakis (MikeS) - Monday, 12 February 2007, 13:59 GMT
Increasing the output buffer size really seems to hurt the H120 and probably other cf targets. I did test the Uncharted Waters 2 on my iRiver with what I've done so far and I get about 85% boost. I think implementing echo will wait until there's more than enough space in time for it since that should be an easy last step.

BTW: Have you looked at any of the FRACMUL stuff in dsp.c? That's available optimized for ARM too and is very convenient for handling volume and envelope mul-divs. I've used some of that already since gcc seems to produce faster code using it in some places instead of a large asm block.
Comment by Adam Gashlin (AdamGashlin) - Monday, 12 February 2007, 23:39 GMT
Are you sure it is the output buffer size that is causing the difficulty and not the anti-inlining? By simply making decode_brr or key_on inline I measure a significant speed hit, perhaps the opposite holds for coldfire? By all means condition things to work better on coldfire.
I have not had a looked at FRACMUL, I will make a point of doing so.
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 00:42 GMT
I'm sure since I unfortunately haven't actually merged your efforts with mine yet but I did test the buffer size on my fork (I'm worried that they're not easily merged atm with the removal of "this" on CF and changing some vars to 32 bit along with the general mess now =\\ ) and it really did take a hit but that's easily set to something good on the target. One problem on ColdFire is that RAM is very slow (without burst reads and writes of contiguous blocks) and ideally everything would go in IRAM but obviously not possible. The best speedup was a good 15%-20% just from redoing the interpolator but I feel I'll be fighting the huge sparse memory accesses without some other technique. On the X5 it really dogs out since that color display is slow too with the itty bitty 9 bit bus (max 65.5 fps).

I'm thinking hard about how to have some dynamic recompilation to ease the overhead of the CPU core. Might be possible to string assembly fragments together as it executes. And does all of the synth state have to be computed along with the output itself? The work is cut out for anyone doing anything on this. :)

This needs to be more modularized for sure...maybe put the code for each part into .c files?

BTW: What's your boost ratio on the iPods with Uncharted Waters 2?
Comment by Adam Gashlin (AdamGashlin) - Tuesday, 13 February 2007, 04:22 GMT
I do not have a boost ratio to report, as rockbox does not run stably with frequency scaling on my ipod.
I do have a lot of the state in IRAM already, the RAM and BRR cache won't fit but I think anything else is in IRAM or registers.
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 04:37 GMT
That's the rub with it. The scf5250 on x5 has 64k IRAM and the others have 48k. Bummer for that.
I'm doing a double check on the this pointer vs. globals on ColdFire with spc35.patch and then contribute some stuff...probably the interpolator and why not throw in the FRACMUL usage if that's ok. Things got changed around alot.

I also made a BRR cache that has a direct inverse lookup to eliminate looping to do cache lookups. Seemed to help too. Optimization in the decompression isn't that important since all samples will eventually be decompressed and cached and all lookups will hit if I understand correctly how it works. The SPC700 can have up to 256 samples in memory to look up?
Comment by Adam Gashlin (AdamGashlin) - Tuesday, 13 February 2007, 06:47 GMT
Sync'd again, genre now set to 36 (Game).
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 09:32 GMT
Synced and easy to side-by-side test this or globals. Hope I didn't forget any files in the patch.

Just change how the macros get defined at the top of spc_codec.h.

Assembly stuff will follow shortly after sleep.
Comment by Adam Gashlin (AdamGashlin) - Tuesday, 13 February 2007, 12:28 GMT
Since we've seen that this is superior (or at least not much different on coldfire) to globals, I think we'll be sticking with spc36 and friends.
So, yeah, here's my proposed patch for commital. Gonna run all the builds locally before sending it up.
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 19:13 GMT
I see I put the wrong number on that last one. :p Should have some asm and FRACMUL popped in there shortly.
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 22:42 GMT
Ok, added faster interpolation plus a version you can try on ARM that uses a single multiply. There's no advantage in the FRACMUL now in applying volume since there's no post scaling or 64-bit intermediate results. I don't want to hold this up any longer. :) I can optimize coldfire some more after svn.
Comment by Michael Sevakis (MikeS) - Tuesday, 13 February 2007, 22:54 GMT
Oops, this bit:
uint32_t f = voice->position;
int32_t output = (int16_t)this->noise;

if ( (this->r.g.noise_enables & vbit) == 0 )
{
...
Can be:
int32_t output = (int16_t)this->noise;

if ( (this->r.g.noise_enables & vbit) == 0 )
{
uint32_t f = voice->position;
...
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 14 February 2007, 04:05 GMT
Here is the resampler that blargg included, in case we want to do the resampling within the codec.
Comment by Adam Gashlin (AdamGashlin) - Wednesday, 14 February 2007, 04:05 GMT
Crap,forgot the .h

Loading...