Rockbox

Tasklist

FS#10048 - AMSSansa MMU and Dcache patch

Attached to Project: Rockbox
Opened by Jack Halpin (FlynDice) - Monday, 23 March 2009, 18:17 GMT
Last edited by Rafaël Carré (funman) - Monday, 08 June 2009, 23:07 GMT
Task Type Patches
Category Operating System/Drivers
Status Closed
Assigned To No-one
Operating System Another
Severity Low
Priority Normal
Reported Version Version 3.1
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

I believe this patch gets the mmu operating on my e280v2 however in doing so it makes the main program fail when it tries to mount disks and cannot. I get the message No partition found. Insert USB cable and fix it(main.c line 493). I am not a programmer so laugh at the code all you want and I wont be offended, I just hope it can be helpful. There's also some code for the HW debug page in debug-as325.c included. I have just mimicked the OF for the most part and looked at the other ARM targets for placement.
This task depends upon

Closed by  Rafaël Carré (funman)
Monday, 08 June 2009, 23:07 GMT
Reason for closing:  Accepted
Additional comments about closing:  Even if further problems arise they'll be easier to identify and fix with a wider testing audience
Comment by Thomas Martitz (kugel.) - Monday, 23 March 2009, 18:29 GMT
Weeh, can you post a new patch with less unrelated changes. It's quite hard to follow what you did.

It seems you reinvent the wheel at some points. I definitely think it can all be done using existing code. It works on the gigabeats too, which have nearly the same cpu.
Comment by Jack Halpin (FlynDice) - Monday, 23 March 2009, 20:19 GMT
OK, here's a cleaner one I hope with just the mmu stuff. There's still a bunch of stuff in the diff that I didn't touch but oh well... I have tried all kinds of ways to do it with the present code in mmu-arm, especially after looking at the OF disassembly and finally "punted" and just simply mimicked the OF and it appeared to work... If you want to do it with existing code be my guest. I really don't care how we get it to work I would just like it to work. Domonoky mentioned that the SD access using DMA might have a problem with the cacheing. Im guessing there's no simple workaround to that?
Comment by Thomas Martitz (kugel.) - Monday, 23 March 2009, 20:21 GMT
I don't say your approach is bad. Not at all. But I'm saying that, once we have a working version, we should try to use already present code.

Getting it to work as more priority for now.
Comment by Jack Halpin (FlynDice) - Tuesday, 31 March 2009, 01:47 GMT
OK, I'm almost sure the mmu and dcache are operating with this patch although I do not know how to test for proof. There was a problem with dma transfers and cached memory that I seem to have alleviated with a "invalidate_dcache_range" function. I was able to use functions from the already existing mmu-arm.c and added a couple of bus control functions to that file also. I tried to modify the SOURCES file to include all the AMS sansas also. I'm not quite sure if I've placed the "invalidate" function in the optimum spot but it works where it is. The gigabeat fx is quite similar so there are probably some more clues there. Look at imx31/sdma-imx31 and s3c2440/gigabeat-fx/ ata-megfx.c. Playback is also broken with this but the ui response is very good.
Comment by Thomas Martitz (kugel.) - Tuesday, 31 March 2009, 01:49 GMT
Wow nice, going to test it asap!
Comment by Thomas Martitz (kugel.) - Tuesday, 31 March 2009, 02:18 GMT
Hmm, hard to tell if it's running yes. Wasn't there a register which can tell it reliably?

I think it's not running. Ogg performs worse than before, pictureflow is approx. 1/3 as fast (having 5fps now, >15 before).

The imx31 seems to clean the dcache rather than invalidating (the former one flushes it too, afaik, i.e. write content to ram).

On a side note: If/When we get this right, we should try to use it actually. Such as mapping the ram nearer to IRAM, so that we can save long-calls. And cache more ram too, 1MB seems a bit low (that doesn't even cover the whole Rockbox ram usage). I'm not sure if the audio buffer should be cached though.
That you're only caching 1MB out of 8MB might actually be a problem, at least I can imagine that.

Edit: Here's your patch without unrelated & whitespaces changes. Please avoid those (tell your editor to not fix trailing spaces, for example), as it makes it hard to see what the patch actually does.
Comment by Jack Halpin (FlynDice) - Tuesday, 31 March 2009, 03:21 GMT
Of course, I never thought of the trailing spaces, was wondering where all that crap was coming from, thanks. I think the mmu is working there's just a little more to figure out.... You can only check to see if it's enabled, not if it's actually running, from the cp15 register 1. I just cached the same 2 1MB sections that the OF did, not with any intelligent planning.... I tried the flush/dump combo from the imx31 but the results were basically the same and this way was simpler. My reason for believing it's working now is that the ui response is very good as opposed to the results we were getting before. I don't know how to test for it though. Hopefully someone who has a better understanding can make some tweaks.
Comment by Andreas Langmann (cyclon1978) - Tuesday, 31 March 2009, 11:13 GMT
i get error message building bootloader:

/boot.link:8 nonconstant expression for length

entering the value causes no boot (screen, buttons remains dark)
Comment by Andreas Langmann (cyclon1978) - Tuesday, 31 March 2009, 11:22 GMT
... on e280v2 i forgot to add
Comment by Rafaël Carré (funman) - Tuesday, 31 March 2009, 13:26 GMT
I think boot.lds need to #include "cpu.h"
Comment by Jack Halpin (FlynDice) - Tuesday, 31 March 2009, 20:21 GMT
Not much new here, I added cpu.h to boot.lds and included the routine that prints out the cp15 registers on the HW debug page(since this is an mmu patch...). It seems the gigabeats and mrobe use the mmu clean_cache routines in their pcm files and I'm hoping to try that in pcm-as3525 to get playback working again, no luck so far though. I have also played around with changing the cache replacement scheme to round-robin(cyclic) as opposed to the default random.
Comment by Michael Sevakis (MikeS) - Tuesday, 31 March 2009, 22:11 GMT
I'd recommend not mapping physical addresses onto themselves as cached so that uncached conherent memory for devices is available by a simple address translations (which is done for all ARM that have a data cache). This will be rather important for keyclicks to work properly with audio DMA. I'd also recommend a writeback+dump before disabling the MMU (if enabled) just in case something would get lost.

If a DMA transfer is memory->peripheral, clean_dcache_range is what is needed beforehand. If DMA transfer is peripheral->memory, then use dump_dcache_range. It writes back the dangling ends if any but not what's in between (though peripheral->memory really needs special cache-alignment handling to be completely safe).
Comment by Rafaël Carré (funman) - Wednesday, 01 April 2009, 11:28 GMT
Jack thanks a lot for your patch : I have some remarks:

* bic r0, r0, #0x41 /* disable mmu & dcache */ => it should be #5 (4 | 1)
* The code to enable mmu should not be enabled in the boot loader (sdram not initialized yet)

* The mmu definitely works (I mirrored SDRAM on another address & tested)

Michael thanks for giving us your expertize ;)
The arm922t datasheet is a bit unclear, I think it requires prior knowledge to understand how caching works.
Comment by Rafaël Carré (funman) - Friday, 03 April 2009, 15:20 GMT
Here is a patch which remaps SDRAM to 0x0 and IRAM to 0x10000000 (proving that MMU works fine)

For now I do not declare the RAM regions as cached or buffered, because else the loaded rockbox won't run.
* immediate black screen on Clip
* immediate white screen on Fuze

I also enable the MMU just before jumping to loaded rockbox

Tested on Clip & Fuze (perhaps the patch doesn't apply without  FS#10092 )

test_disk still fails, Fuze backdrop is still corrupted, md5sum still succeeds!
(To run md5sum you need to replace the quit variable by false, else it will get corrupted and the plugin will exit early)

Problem happen on the Clip : when playing music : "error on DMA channel 0" (channel 0 is dedicated to ATA, not PCM?)


To enable caching:
- comment i/d cache disabling in bootloader/sansa_as3525.c:97
- uncomment dcache cleaning/dumping in ata_sd_as3525.c:714
- s/CACHE_NONE/CACHE_ALL/ in system-as3525.c:268


Now a question : can the DMA peripheral use virtual addresses?
Section 2.4 of PL081 doc only says linked lists must be in flat mapped memory area and we don't use linked lists (LLI)

My tests had mixed results, so I used physical addresses (also valid as virtual addresses since they are flat mapped)


Note that the problems we see (test_disk failing, crashes, bitmap corruption) also happen with icache & dcache disabled.
I started verifying DMA code and found some (not important so far) mistakes, so it might be useful to take a deeper look.
Comment by Jack Halpin (FlynDice) - Saturday, 04 April 2009, 18:05 GMT
I haven't had too much time to play & experiment but 2 thoughts came to mind right away regarding the mapping. First is that you're mapping the area containing TTB_BASE_ADDR to another location . Can we use a virtual address for the actual table? The other is that if you cache the entire DRAM you're including all the buffer areas which I thought I read was not a good idea. So can we map the DRAM with CACHE_NONE first to move it and then a subset(1MB, 2MBs??) with CACHE_ALL to have a cacheable portion? Standard disclaimer. I may just not have a clue....
Comment by Thomas Martitz (kugel.) - Saturday, 04 April 2009, 20:12 GMT
The physical addresses still apply. So the MMU shouldn't have a problem to find the ttb table even if it's mapped. However, it might be a problem if it's cached.

However, the gigabeat F/X remap and cache the whole memory too:


ldr r0, =0x30000000 ;physical address
ldr r1, =0x0 ;virtual address
mov r2, #32 ;memory size
mov r3, #12 ; CACHE_ALL
bl map_section


Edit: BTW, I don't think it's a smart idea to map the RAM to the physical address of the IRAM section, especially not if we plan on using the IRAM.
Comment by Rafaël Carré (funman) - Tuesday, 07 April 2009, 12:46 GMT
Flyndice : the TTB is still at the end of SDRAM (with 0x30000000 base)
What is the buffer area you are talking about ? lcd framebuffer?

kugel : when the MMU is enabled, all addresses are virtual so this is not a problem : the gigabeat also remaps SDRAM to 0x0. This also brings the advantage to not have to copy the vectors.

By the way I found why playback failed on the Clip : the IRAM offset (in dma-pl081.c) must be 0x71000000, not 0x80000000.
Now playback works (proving that IRAM access works as well), but it stops every 2 seconds or so and the UI become very irresponsive.

You can test on Clip or Fuze (ogg playback works quite fine on my Fuze).

Note this patch is only for research uses.
Comment by Jack Halpin (FlynDice) - Tuesday, 07 April 2009, 15:48 GMT
RE Buffer areas: It's quite possible(actually likely...) that I don't understand correctly how the memory is structured in SDRAM. In looking at rockbox.map and the *.lds files I was under the impression that the actual code we're using to run rockbox ends around 0x3014_0000(depending on build of course) followed by the audiobuffer, codec buffer(for !lowmem), plugin buffer, and finally TLB now. I've actually got several more questions from my researching of cache & DMA in the past few days but I'll put that in the form of a forum post later.
Comment by Rafaël Carré (funman) - Wednesday, 08 April 2009, 12:56 GMT
Your impression is right, app.lds file show the memory layout for rockbox, and all the code is at the beginning, followed by audio/codec/plugin buffers & TTB

I'm not sure why buffers areas shouldn't be cached : once we have written/read from it using DMA, this shouldn't be a problem no ? Do you remember where you read it's not a good idea to buffer/cache the audio/codec buffers?
Comment by Michael Sevakis (MikeS) - Friday, 10 April 2009, 04:04 GMT
It's fine to cache memory used by DMA but then proper cache coherence must be employed and I'm guessing as usual DMA and other SoC peripherals must use phyical addresses. For IRAM, perhaps even caching that can help but you can remap the IRAM without any caching or buffering anyway. If you have in-memory framebuffers on any targets, then those should be mapped somewhere with only write buffering.

Make sure to call cpucache_invalidate before jumping from the bootloader to the firmware if you have the MMU and caching enabled.
Comment by Thomas Martitz (kugel.) - Friday, 10 April 2009, 21:09 GMT
I've taken funman's patch and mapped the RAM and IRAM so, that long-calls are no longer needed. Binsize is down by ~50k.
I've also changed the original iram addresses from 0x81000000 to 0x0 again (I don't know where 0x81000000 comes from, but 0x0 should be correct).

With this patch, rockbox is still noticeably and unbearable slow. I've not worked out what's causing the slowness.
Comment by Rafaël Carré (funman) - Thursday, 16 April 2009, 16:42 GMT
Thomas: 0x81000000 is an alias for the IRAM (look the memory map at the end of AS3525 datasheet, page 190 or so)

Michael: perhaps this is why we have problems with the LCD displays. (needs testing)
Comment by Rafaël Carré (funman) - Saturday, 18 April 2009, 13:34 GMT
The different sansa AMS don't use a framebuffer, except lcd_framebuffer defined in firmware/drivers/lcd*.c
I suppose the framebuffer you speak about is a framebuffer written to the device using DMA ?

Also we only use addresses with DMA peripheral (for SD & PCM transfers), that minimizes the area of code which needs attention.
Comment by Rafaël Carré (funman) - Monday, 20 April 2009, 14:41 GMT
kugel : you must put the code at 0x81000000 in boot.lds
Else when you enable the MMU, code at 0x0 will be replaced by SDRAM content.

Using the 0x81000000 (flat-mapped) IRAM alias permits to use the same location when mmu is disabled or enabled.

Perhaps the patch works on your Fuze if the loaded rockbox is too small to overwrite the part of main() executing after enable_mmu(), but it's just a luck !
Comment by Thomas Martitz (kugel.) - Monday, 20 April 2009, 14:50 GMT
Ok, I just wasn't aware that IRAM is at 0x81000000 too.
Comment by Thomas Martitz (kugel.) - Monday, 20 April 2009, 23:03 GMT
This is my patch sync'd and cleaned up (making the IRAM/DRAM addresses based on #defines in as3525.h for easier changing). It also changes the IRAM addresses to be directly after DRAM (on low mem too). It also includes the removal of long-calls (now the clip has over 400k audio buffer :) ). And it uses 0x81000000 for IRAM.

Still utterly slow.
Rafaël, it would be nice if you could apply your local changes to this patch if you made some progress, as this patch gets rid of several hardcoding, which should make further development easier.
Comment by Jack Halpin (FlynDice) - Tuesday, 21 April 2009, 13:25 GMT
I had to add this (#define DRAMORIG DRAM_ORIG) to plugin.lds line 114 to get it to link correctly:
#elif CONFIG_CPU==AS3525
#define DRAMORIG DRAM_ORIG
#ifdef AMS_LOWMEM

I spent most of last night trying to merge this with what I had sort of working but to no avail. I've been taking a slightly different direction, removing the mmu code from the bootloader and no rearranging of memory. At the risk of making things confusing I'll post what I've got here to get the info out and see if it can be of any value.
Comment by Jack Halpin (FlynDice) - Tuesday, 21 April 2009, 23:49 GMT
I got the mapping patch to work with both mapped memory and good ui response. Playback does not work at all though. I've left commented out parts in the code so you can see what I was playing around with. Dram is mapped to 0x0 and IRAM directly after it. In the bootloader I had to jump to 0x44 to start to miss the vectors that get copied to 0x0. I get no sound when I try to play music. The microsd will prevent loading the main program if it is inserted. If inserted after the program is running it makes it crash. See what you think.
Comment by Rafaël Carré (funman) - Thursday, 23 April 2009, 12:40 GMT
flyndice: I have some remarks for your last patch

please remove the diffs for test_disk & synchronous/asynchronous clocking from it (unrelated)

+ for(i=0; i< 0x44; i++) /* vectors */
+ ((unsigned long*)0x30000000)[i] = ((unsigned long*)0x81000000)[i];
You will copy 0x44*4 bytes while we only need 0x11*4 = 0x44 (0x40 + 0x4 padding bytes in the middle)

The vectors you have copied are overwritten when you load rockbox into SDRAM, so no need to jump them.
Comment by Jack Halpin (FlynDice) - Friday, 24 April 2009, 05:25 GMT
"The vectors you have copied are overwritten when you load rockbox into SDRAM, so no need to jump them."
I understand the vectors get overwritten ( I'm not sure I understand why they need to be copied beforehand) but don't they get overwritten with the same vectors? If I jump to 0x0 from the bootloader I go into a continuous reset loop so I figured I was jumping into the reset exception. When I jump to 0x44(end of vectors from rockbox.map/start of crt0.S) rockbox seems to then boot normally. I'll try to back some things out of the patch if I can.
Comment by Rafaël Carré (funman) - Friday, 24 April 2009, 16:40 GMT
After enabling the mmu in the bootloader, we need to copy the vectors because the vectors are located at virtual address 0x0.

When mmu is disabled, since the bootloader is built with entry point 0x81000000, the vectors are at the right place (since 0x81000000 is an alias for 0x0)
When mmu is enabled, the vectors are not at 0x0 (because 0x0 points to uninitialized SDRAM) and we need to copy them.

Once we load rockbox in SDRAM, the new vectors (for loaded rockbox, not the bootloader) are in place !

This is why in the current version of rockbox (without this patch) we reserve room at the very beginning of IRAM to copy the vectors from SDRAM

About your problem : 0x0 is the reset vector, and branch directly to "newstart" (which should be at offset 0x44), so please triple check (with a disassembler on rockbox.elf for example) !
Comment by Rafaël Carré (funman) - Saturday, 25 April 2009, 14:31 GMT
Jack : I saw your message on IRC log:
"well I think the empty space we reserve in iram in current svn to copy the vectors to is getting copied in front of tne main firmware we load at 0x0 with the mmu patch."

You are mixing things:
1/ We reserve space in iram in the current (svn) rockbox because the vectors need to be at 0x0 and the IRAM is mapped at 0x0 : we copy the correct vectors from rockbox (0x30000000) to IRAM (0x0)
2/ The routine which copies the vectors in crt0.S is commented out in the mmu patch, because rockbox is mapped at 0x0 so no need to copy.

If the vectors were incorrect we couldn't boot at all (interrupt handler address would be incorrect and sd, dma wouldn't work at all)
Comment by Jack Halpin (FlynDice) - Saturday, 25 April 2009, 20:34 GMT
Well, this certainly wouldn't be the first time I've drawn an incorrect conclusion..... I was comparing our stuff with the gigabeat fx . I thought perhaps I had found a plausible explanation why jumping to 0x0 results in continuous reboot and jumping to 0x44 gets rockbox to boot up. I guess that would mean I'm jumping directly to newstart: and that works but wouldn't the continuous reset loop for 0x0 also imply that the reset vector was not present at 0x0 or is perhaps not pointing at newstart for some reason? I spent a lot of time yesterday studying up on linker scripts and doing comparisons but I still don't have a good understanding.
Comment by Rafaël Carré (funman) - Saturday, 25 April 2009, 22:20 GMT
You're not the first O:-) By looking each other at our code and keeping trying we'll find the solution sooner or later !

I didn't have this problem of continuous reset and I can't test the exact same code than you since my tree is rather old (~3 weeks)

Perhaps once you have jumped to newstart, display on the lcd the content of the memory from 0x0 to 0x44 and compare with first 0x44 bytes of rockbox.bin
Comment by Rafaël Carré (funman) - Thursday, 30 April 2009, 12:20 GMT
Here is the patch I use to see test_codec shows 36.56MHz needed for realtime mp3 playback (I'm not sure of the bitrate of my test file)

It enables i & d caches, and caching & buffering for IRAM only !
Major unstability, rockbox boots 2 times out of 3, and crashes really fast when playing a file.

Note : sansa_as3525.c, crt0.S & *.lds will probably be out of sync (I have a Clipv2 patch applied) and you'll have to merge them manually.
EDIT: kugel, "&&" is a C AND, for shell it's "-a" ;) (I also moved the compiler declaration after manufacturer has been set in tools/configure)

Also I disabled mmu in the bootloader since it's simpler: else we have to take care when writing rockbox.sansa to 0x30000000 (aliased to 0x0 virtual address) since the vectors will be overwritten.

EDIT2: flyndice : you can't use cpucache_invalidate() in the bootloader (see arm/system-target.h for CPUCACHE_INVALIDATE) , instead use the alias invalidate_idcache()
Comment by Thomas Martitz (kugel.) - Thursday, 30 April 2009, 15:46 GMT
Hm, apparently I've looked at a tutorial for csh (I wasn't aware that it's for csh only) which lists && for comparisons. Thanks for the info.
Comment by Jack Halpin (FlynDice) - Thursday, 30 April 2009, 19:47 GMT
OK, lots of learning here but not a whole lot of success. I do think I've identified a problem with bus speed settings though. I of course would need more study on this but I'm hoping someone else may be able to take advantage of this quicker. This was a helpful typo but a bugger to figure out why.... system-as3525 line 232 or so sets the bus setting to asyncronous. With this setting and cached DRAM I get a white or mirror image lcd once past the bootloader on startup. If on the other hand I change this to fast setting (bic instead of orr) I can boot into the main with mmu operating and cached DRAM. However, playback seems to work but there's no sound, internal sd works but microsd makes it lockup, and I get an empty battery and charging indication which Bertrik mentioned may be a sign that the ascodec communication was screwed up.

As far as a patch goes I've left the remapping aspect alone and tried to concentrate on simply getting an operable mmu with caching set up to work. I have tried to keep the mmu code out of the bootloader , the bus issue threw a small wrench in that but I think the bootloader works either way. I tried to stick to basically svn code and only change what I was working on. I have moved the cache coherency into dma-pl081.c. I've put a #define USE_MMU in as3525.h just below the TTB entries to turn the mmu mode on/off. Time to mow the lawn.....
Comment by Rafaël Carré (funman) - Saturday, 02 May 2009, 13:20 GMT
Good finding !

I have worked a bit on your suggestion:

About the fastbus setting: it means that the CPU frequency is not fclk, but pclk (so, 62MHz in the current configuration = 248/4), this is why it's a bit slowzer.
I tried to use synchronous but there is too much disadvantages : fclk must be higher than, and an integer multiple of pclk (so, not less than 124MHz = 62*2)

Then I tried again asynchronous (no constraint) and got rockbox booting with DRAM & IRAM fully cachable & bufferable !
The difference now is that I changed the PLLA, PCLK & CPU frequencies in clock-target.h : different settings show different statuses (rockbox boots, or I got a black screen), but PCM playback equally fails.

I can have i2c working if I define a clock freq at 1600000HZ (4 times more than the maximum!)
If I set DMA_SYNC to 0 (activate synchronisation logic of DMA) PCM works for 4 or 5 seconds before crashing the whole firmware, and I notice corruption in the menu translation (note test_disk fails)

Perhaps this is why the OF uses dynamic frequencies for the 2 PLLs, CPU, PCLK, and peripherals clocks (i2c, ide, usb, dbop ...)

I think now we have to look in clock-target.h to resolve our problems!
Comment by Thomas Martitz (kugel.) - Saturday, 02 May 2009, 14:10 GMT
I'm wondering whether we can commit parts of this patches, as they're very unhandy to work with. It's out of question that they're going in anyway once we found a solution.
Comment by Jack Halpin (FlynDice) - Sunday, 03 May 2009, 01:06 GMT
funman:

Ha! I spent the past 2 days reading the same stuff as you did I bet.... I found the same thing about the fastbus setting. They describe it slightly differently in the as3525 datasheet and the 920t datasheet but I beleive the bclk referred to in the 920 datasheet is connected to PCLCK for the 3525(actually hclk which I'm assuming is the same freq as PCLK but I can't find a reference.). I was thinking of playing with the frequencies in clock-target.h but then thought I may want to ask a few more questions before turning my player into a handwarmer instead of a music player ;).

- Is there anything I should be especially careful with?
- Is the CLK_DIV macro the preferred method for setting a frequency?
- Is #define CPUFREQ_DEFAULT 24800000 correct or missing 1 0...
- Is there a reason we only use PLLA. Would there be a battery cost to have PLLB up and locked to switch to a lower freq or for that matter we've got clk_main available but is 24Mhz useful?
- I noticed for the gig fx their method of switching from MAX to NORMAL is to simply switch from Asyncronous bus to Fastbus in system-megfx.c

I got called in to fly today and left my cable at home so not much in the way of experimenting got done but I did get to read up a bit. I'll try some of the things that seem safer tonight and see what happens.
Comment by Jack Halpin (FlynDice) - Sunday, 03 May 2009, 01:11 GMT
kugel:

The mmu_bus_xxxxx functions simply change the bus modes. I called them mmu_.... simply because they are set with the same cp15 control register that the rest of the mmu control uses. I think they would apply to all the other similar ARM targets and get rid of some assembler in some .c files. See my comment above about the gig fx using them to change between MAX and NORMAL feq.
Comment by Thomas Martitz (kugel.) - Sunday, 03 May 2009, 02:50 GMT
I'm just asking which parts of this patch(es) are easing up things AND are likely to be in actual SVN in the end, so that we can operate on smaller patches which are more easy to be kept in sync.
Comment by Thomas Martitz (kugel.) - Sunday, 03 May 2009, 04:12 GMT
So, let's rename it to cp15-arm.S. I see nothing wrong with it.
Comment by Jack Halpin (FlynDice) - Sunday, 03 May 2009, 06:13 GMT
While traipsing through the code I came across this snippet in /firmware/export/config.h around line 684.
Any opinion on whether this might be a better/worse solution to our cache coherency issues than the clean/dump/invalidate functions?

/* Attributes to place data in uncached DRAM */
/* These are useful beyond dual-core and ultimately beyond PP since they may
* be used for DMA buffers and such without cache maintenence calls. */
#define NOCACHEBSS_ATTR __attribute__((section(".ncbss"),nocommon))
#define NOCACHEDATA_ATTR __attribute__((section(".ncdata"),nocommon))
Comment by Rafaël Carré (funman) - Sunday, 03 May 2009, 08:20 GMT
@flyndice :
- the only careful thing to check is not to go over the maximal frequencies (I did it several time without breaking my player though it's not recommended)
- CLK_DIV is made to not go over this maximal frequency, but it's not mandatory (I personally find it more convenient)
- CPUFREQ_DEFAULT is correct (24.8MHz) : it's the lowest freq possible (to save power) : just look at frequencies of other model.
- I suppose that using PLLB costs some battery, so I wanted to only use one PLL. Note that the OF only uses PLLB for USB clock (iirc). And no, 24MHz is too low (at least for DBOP on color screens)
- Does the gigabeat fx define PCLK as the NORMAL frequency then ? I think we will use less power if we use a fclk less than 64MHz.

(to your 2nd message)
They could be useful if we always do a DMA transfer on a buffer which has these attributes, but we use a temporary buffer and not the one provided only if the provided buffer is unaligned.
I think we should benchmark the different solutions when we got rockbox operating well.
Or always do the DMA transfer on this aligned, uncached buffer to see if it brings a difference (if it doesn't, it means we are right in our cache maintenance calls).

@kugel:
Perhaps these functions should be marked as inline and put in system-arm.c (no need for a whole new file!)
Comment by Jack Halpin (FlynDice) - Sunday, 03 May 2009, 18:11 GMT
@funman:

The gig fx uses NORMAL/DEFAULT = 99Mhz and MAX = 297 Mhz. I also figured out hclk after reading their datasheet i think. For as3525 HCLK = PCLK, we have no divider. HCLK clocks the AHB bus and PCLK clocks the APB bus, for us they are equal but the gig fx can use a 1/2 divider for APB=AHB/2. I believe they run at FCLK @ 297Mhz, HCLK @ 99Mhz, and PCLK @ 49.5Mhz. When they switch to fastbus they essentially use HCLK as FCLK.
Comment by Thomas Martitz (kugel.) - Sunday, 03 May 2009, 19:01 GMT
May I point you to http://www.rockbox.org/irc/log-20090503#17:00 . F/X apparently doesn't use cpu_set_frequency() at all.
Comment by Jack Halpin (FlynDice) - Sunday, 03 May 2009, 20:10 GMT
Yes I saw that and went looking. I now see they have commented out : /* #define HAVE_ADJUSTABLE_CPU_FREQ */ in config-gigabeat.h... Sorry I'm trying to read it all as fast as I can..... ;) That is what they were trying to do though I think.
Comment by Rafaël Carré (funman) - Wednesday, 06 May 2009, 20:49 GMT
I found in the AS3525 datasheet that in asynchronous clocking, fclk must be higher than pclk.

I tried this on my fuze (highering CPUFREQ_DEFAULT& NORMAL) without changing pclk or plla, and I got rockbox booting with caching/buffering enabled for IRAM & SDRAM.
mpegplayer is very fast (realtime) and codecs are fast as well; but I notice lcd corruption, language file corruption, and test_disk fails (perhaps the disk problem causes the other problems..)
Comment by Jack Halpin (FlynDice) - Thursday, 07 May 2009, 02:05 GMT
Do you still have to bump I2C up to 1600KHz to make that part work? I've been playing with a setup sort of like the way I thought the gigabeat fx was setup with ok results PLLA =248MHz. Divider = 1 for FCLK = 248MHz. I make the input to CGU_PERI = FCLK and divide by 4 for PCLK = 62MHZ.
For MAX I run synchronous bus FCLK=922CLK=248MHz, PCLK=62MHz and for NORMAL/DEFAULT I go to fastbus and make the divider for FCLK=2 which gets me down to 922CLK = PCLK = 31MHz. Without the mmu the ui response is a little sluggish with this setup but everything works normally. test_codec shows 119MHz to decode an mp3. When I enable the mmu I get the ascodec problem right away with the empty battery and charging display. If I change I2C frq to 1600Khz I get normal battery display but radio still doesn't work. test_codec with mmu enabled shows 38.1MHz to decode an mp3. If I also enable round robin caching this comes down to 36 Mhz. Pictureflow fies.... I cannot play music and get sound though yet.
Comment by Rafaël Carré (funman) - Sunday, 10 May 2009, 13:55 GMT
Yes I still have i2c clock bumped to 1600kHz
However radio works fine for me.

Some progress : I now have perfect disk transfers !
I used a buffer for DMA in uncached SDRAM instead of clean_dcache() / dump_dcache() (patch attached)

But I still see deadlocks, especially with PCM but not only.
And also, this patch only works on my Fuze, not on my Clip (rockbox locks at the logo screen, but button thread still works : hold shutdown LCD, and button press show light)

I ran performance tests on my with CPU clock at 124MHz (pclk * 2):

test_fps is 37% faster (28% faster for YUV)
decoding is 266% faster (33.3MHz needed for MP3 128kbps stereo 44.1kHz)
test_disk is faster for create/open/dirscan/delete, but disk transfers are much slower, especially for aligned accesses since we have to memcpy to the uncached DMA buffer.

Note that I changed IDE clock frequency from 90MHz to 66MHz because the Sansa OF uses 66MHz (perhaps this is the cause of slower unaligned transfers ? I forgot to test!)

Also I find it weird that UI becomes much slower when we divide pclk by 2 : the DBOP frequency should not change that much since it's lower than pclk anyway... do you have an idea why?
Comment by Dustin Skoracki (sko) - Sunday, 10 May 2009, 14:41 GMT
On my e250v2 I always get disk_init failed. I have a new clean svn with only this patch applied.
Comment by Bertrik Sikken (bertrik) - Sunday, 10 May 2009, 16:21 GMT
Regarding the weird required i2c frequency: as far as I can tell from the OF, the i2c clock divider for the ascodec needs to be set using CPSR0 (least significant byte) and CPSR1 (most significant byte) and was recently fixed to work like that (previously only the LSB was set). This can no longer be the problem I think.
Maybe we're now too quick now, e.g. the i2c part is not indicating that it is busy yet, making the code conclude that the i2c transfer is already finished? I wonder if we're running into the i2c_busy check at the _start_ of the function now (a panic there should give a quick answer).
Comment by Jack Halpin (FlynDice) - Monday, 11 May 2009, 04:09 GMT
I think I've got a good solution to the cache coherency problem(but not clocking issues unfortunately). I noticed funman mirrored(I hope that's the right term) SDRAM _uncached_ to another location in his last patch and wondered if that just might work for all dma transfers. So I mirrored SDRAM and IRAM and put some code in dma-pl081.c to have dma transfers use the uncached mirrored DRAM & IRAM and it seems to have worked. DO NOT TRY TO APPLY THIS AS A PATCH it is for reference only. I've just included the diff of the 2 files involved in getting this scheme to work. I've been enabling the mmu and doing the mapping in crt0.S but it shouldn't matter if it gets done in system-as3525.c either. I still cannot play music but I can run test_codec, album art gets loaded and displayed, and I can browse files on the disk, and test_disk runs and finishes just fine. I think the clocking issues are the real problem here.
Comment by Jack Halpin (FlynDice) - Monday, 11 May 2009, 13:42 GMT
Whoops, found an error in that last mirror.diff. I neglected to change a couple of dst's to src's, but it still worked imagine that..... Look at this one instead.
Comment by Jack Halpin (FlynDice) - Tuesday, 12 May 2009, 06:06 GMT
@bertrick

"I wonder if we're running into the i2c_busy check at the _start_ of the function now (a panic there should give a quick answer)."

It seems we are indeed running into the i2c_busy check at the _start_ of the function. I inserted a panicf in both the read and write functions and get the panic tripped in the i2c_busy check at the beginning of the write function. Suggestions?
Comment by Rafaël Carré (funman) - Tuesday, 12 May 2009, 08:28 GMT
Regarding i2c, perhaps we need to insert some additional delays (I suppose in i2c_busy) with "nop" or busy loops.
I needed to do the same thing for my Clip to fix button reading : attached patch uses synchronous clocking (a bit faster than async : +10% speed for lcd ops and +1% for decoding) and boots on Clip.

FlynDice : I don't understand how your patch can work:
- if (dst >= 0x30000000 && dst < MEM*0x100000) can not be true
- DMA module needs physical addresses
- If you use uncached addresses you must synchronise the cache first, because there still may be data in the cache for the (cached) memory region : *_dcache() functions do that, and the memcpy() in my patch also.

EDIT:
IMPORTANT : I booted with the instruction cache enabled & data cache disabled, and I noticed that disk was functionning perfectly (although slowly) : so I think now we "only" have trouble with the data cache. However I don't remember if PCM worked perfectly or not :( I will test again.

Keep up everyone, we'll beat it soon ! :)
Comment by Jack Halpin (FlynDice) - Tuesday, 12 May 2009, 13:17 GMT

"I don't understand how your patch can work:"

Ha Ha... me neither now that you point that out, but it does and what can we learn from that? Well, it works as well for me as the version using the clean/invalidate coherency scheme. I included this code in 2 different branches that I built starting at clean svn and they both run about the same, most everything works except pcm & i2c. I don't think I'm mistakenly running a version without this code but I'll doublecheck. That is the SDRAM part that can never be true I wonder if that means something.... Hopefully this is another fortunate mistake.
Comment by Rafaël Carré (funman) - Tuesday, 12 May 2009, 13:38 GMT
I understand that rockbox doesn't do DMA transfers into/from IRAM (plugins/codecs are loaded in SDRAM and then the iram part is memcpy'ed), and that there was no changes for SDRAM transfers => if you remove your diff to dma-pl081.c you should see the same effect.

Now, perhaps you have something else in your full diff, that would explain why test_disk passes : could you post the full diff ?

Oh and one important thing I forgot : I often see disk corruptions on my Fuze & Clip when working on this subject.
For now it's only FAT FS corruption, but who knows if it could lead to destruction of the OF (first blocks of internal storage) ? So .. be careful and do not try this at home!
Comment by Jack Halpin (FlynDice) - Tuesday, 12 May 2009, 19:05 GMT
This is the full .diff that I pulled those 2 sections out of. For grins I tried just removing the dma-pl081.c code I had added in my current working copy and rockbox magically boots just fine - go figure. MMU enabled without any sniff of clean, dump,invalidate dcache but with IRAM & DRAM mapped to another location as CACHE_NONE. Maybe the code I stuck in there isn't doing anything so that's why it runs even with the mistakes. As you will notice the first time I posted it I had even forgotten to change some dst's to src's but it still ran..... Should have been a clue. I'm going to run a few experiments now but I've got to set rockboxing aside for a few days to tend to the day job....
Comment by Jack Halpin (FlynDice) - Tuesday, 12 May 2009, 21:50 GMT
I'm not going to post a patch for this but I found that if I placed a couple of 100 count while()delays in ascodec_read/write just before the i2c_busy check I get i2c working at 400khz setting in clock-target.h. I get 2 secs of sound when I try to play music now. This doesn't get radio working though. I think that will need its own tweak. I'm off for a couple of days now, good luck!
Comment by Rafaël Carré (funman) - Wednesday, 13 May 2009, 08:46 GMT
Well I see no functional difference with my last patch : the code in dma-pl081.c has no effect : if(x > A_LOT && x < A_LITTLE). You forgot to add 0x30000000 to the memory size, but even then that code would be wrong because DMA needs physical addresses. The check is correct for IRAM but there seems to be no direct transfers to/from IRAM.

For i2c perhaps the delay needs to go in i2c_busy() ? Also we should take care of not making it too slow..

For radio, I would increase fm_delay() , sorry I can't help because radio works fine on the Fuze

In my previous post I mentioned data cache : rockbox equally crashes (especially with PCM / mpegplayer) with the data cache disabled.

Also I frequently experiment (on my Clip but not on my Fuze) problems with playback (current song stop but status stays "play", time goes to 0:0, If I stop & resume playbacks continue, else rockbox will crash [backlight / button thread still works because I can see lcd & buttonlight go on/off] ).

So I make the (not based on real data) hypothesis that current SVN code has bugs, and that these bugs just happen faster because of increased performance.

Thanks much FlynDice, and see you soon !
Comment by Thomas Martitz (kugel.) - Wednesday, 13 May 2009, 08:47 GMT
Maybe my work about using #defines for the addresses all the way should be committed, so that this is more unlikely to happen?
Comment by Rafaël Carré (funman) - Wednesday, 13 May 2009, 16:46 GMT
sorry, can you repost the patch again ?
Comment by Rafaël Carré (funman) - Thursday, 14 May 2009, 09:55 GMT
I forgot to mention that I found something interesting in the OF:

pcm (i2sout) code uses uncached addresses in IRAM (alias 0x81xxx), and I think that it's the pcm buffer.
Comment by Rafaël Carré (funman) - Wednesday, 20 May 2009, 11:44 GMT
I tried to use an uncached buffer for PCM dma, but there is no much difference, rockbox crashes fastly anyway. Some threads are still active though (hold still stops backlight, and any button starts it again)
Comment by Rafaël Carré (funman) - Thursday, 21 May 2009, 13:03 GMT
note : when using *_dcache_range() in the sd transfer, we should use it inside the loop because the buffer can change (either the aligned buffer, either the user provided buffer if it's already aligned.

however it still shows corruption (in the wps background). I didn't test md5sum since i disabled write support to not break my filesystem
Comment by Rafaël Carré (funman) - Tuesday, 26 May 2009, 23:17 GMT
synced patch : test_disk fails with using an uncached buffer for ata although i see no wps corruption

since using clean_dcache/dump_dcache shows different results than using an uncached buffer, i suspected a problem in sd driver.
i tested if there was an infinite loop (error set in interrupt service routine INT_NAND() ) but it was not the case
Comment by Rafaël Carré (funman) - Thursday, 28 May 2009, 11:14 GMT
It would be interesting to compare results of using different methods for keeping cache coherent:

- keep the ranges used in DMA transfers coherent (*_dcache_range() )
- invalidate the whole data cache (invalidate_dcache() )
- copying data transferred by DMA to/from a buffer in uncached memory.
Comment by Rafaël Carré (funman) - Friday, 29 May 2009, 23:10 GMT
Here is my last patch, with some parts commented out.

Here are my comparisons of diffs in ata_sd_as3525.c, running on the Fuze:

Using clean_dcache_range/dump_dcache_range
minor wps corruption
mpegplayer crashs at "loading"

Using clean_dcache_range/dump_dcache_range, aligned_buffer 32 bytes aligned
no wps corruption
mpegplayer crashs at "loading"

Using invalidate_dcache
minor wps corruption
mpegplayer crashs at "loading"

Using invalidate_dcache, aligned_buffer 32 bytes aligned
no wps corruption
mpegplayer crashs at "loading"

Do not use any of these functions, aligned_buffer 32 bytes aligned
no partition detected

Use a reduced clock speed (/4, divisor bits = 1)
no partition detected

Use a reduced clock speeed (/2, divisor bits = 0)
sloooooooooooooow boot (I forced it to power off after 30 seconds, was stuck on rockbox logo)

Transfer data to a uncached buffer, 32 bytes aligned
no wps corruption
mpegplayer starts playing, then crash always at the same position for a given video
test_disk (with size set to 1MB) fails 1 time in 4 tests
md5sum : crashed at 308/353 files : everything.md5sum was empty
doom : fluid

Same setup than before, plus
iram not cached : same results
dram not cached : crash very early
iram & dram not cached : slooooooooow, mpegplayer works once, but then rockbox can't open files anymore
data cache disabled, ata_sd not modified : mpegplayer crashes at "loading"

***** data cache disabled :
* a bit faster than SVN (114MHz needed for realtime, against 126.17MHz for SVN)
* test_disk passes (with size 300MB)
* decoding hangs quickly on Fuze, seems to work on Clip
* Sometimes I see a white screen and rockbox stops here.

The patch attached uses this last configuration.
Comment by Jack Halpin (FlynDice) - Saturday, 30 May 2009, 14:17 GMT
I had an idea last night but am not sure I have the capability to implement it but I will toss it out here. Perhaps we could use the cached/uncached mirror concept in the opposite way. What I mean is can we use the physically addressed memory as uncached and use the mirrored copy as cached? That way dma transfers are using physical addresses in uncached memory and the processor can access the cached mirror with virtual addresses for everything else. Does this make any sense at all or am I just ranting again.....?
Comment by Rafaël Carré (funman) - Saturday, 30 May 2009, 14:31 GMT
I think there would be no differences, because we have to map rockbox at the cached address to take benefit from the data cache.
Comment by Jack Halpin (FlynDice) - Monday, 01 June 2009, 04:12 GMT
Here's the patch I've been using with limited success. I've had music playing perfectly for up to 10 minutes at a time with results very similar to the screenshots I posted onthe forums. I was getting both lcd and sound corruption but after incorporating funman's last patch both have completely disappeared. I was using delay loops in ascodec.c but Bertrik's patch has solved that problem. It seems I have finally gotten dump_dcache to work for cache coherence and moved the cache coherence functions into dma-pl081.c. I also added in some "protection" bits that get set as part of DMAC_CH_CONTROL(channel) = control. funman says these are all set to 0 in the OF and I can't see that they make any difference. I have also used the FCLK postdivider to lower the PCLK/DBOP frquency down to 32MHz. I cannot claim that this works well only that I can get it to work for a short period of time on my e280V2. Seems I cannot get radio to work with this patch no matter what I use for fm_delay(). Other than that, as long as you don't attempt to play music , Rockbox seems to run just fine, games and functions like copy from msd to internal mem work great. Running test_codec works normally. test_disk does not pass the write & verify but finishes the first test just fine.

My experience with playing music has been spotty. I can get it to work a couple of ways, one better than the other.
Put current svn on your player and play some music and set any settings you would like to have(lcd, initialize database, etc).
Now put the patched version on your player.
Now don't quit the first time it doesn't work....
If I now try to play a file from the files menu, album art loads and I get about 2 secs of sound before it quits. Instead try to just resume playback from your last install.
Next, don't quit the first time it doesn't work....
If you get it to play for more than 2 seconds by resuming, go to the System Debug page and look at either the buffering thread or View HW info. I always find the frequency maxed out at 192(boosted). It won't play very much longer...
Now the tricky part, while standing on 1 foot place 1 hand in a warm bowl of water and.... er wrong recipe....
Reboot your player. Go into the games section and find a game that has playback control(ie xobox, goban etc.). I may just be superstitious but It seemed more successful if I actually played the game and quit very quickly(I mostly used xobox but goban worked too)). Find playback control, go to pause/play and press, you should getmusic playing here and if you quit and go look at the buffering thread or viewHW page you will see you are indeed running at 192 boosted and 32 unboosted. Watch the buffering screen and after the first or second disk access you will now be locked at 192 until the music stops. I made it to just under 11 minutes 1 time but more likely you'll get 4- 6 minutes worth.

I know it's stretching to call this progress but it helps get a feel for what might be going wrong when you can watch something functioning correctly and see what happens at the moment it breaks.

And let me leave you with the thought, don't quit when it doesn't work the first time!
Comment by Rafaël Carré (funman) - Monday, 01 June 2009, 10:44 GMT
+ CGU_PROC |= (1<< 5); /* FCLK /2 */
+

This will have no effect since we are using fastbus, and fclk is not used (pclk is used as input for the CPU frequency)

Also *_dcache() functions are useless in SD transfers: since we use an uncached buffer, the "uncached_buffer" memory region has no data in the cache.
In fact this can cause trouble if the "aligned_buffer" memory region has data in cache, clean_dcache_range() can write wrong data in the buffer prior to a write transfer.

Also we can't rely on playback working if test_disk doesn't pass : the codec used could be corrupted and wrong code executed.
Comment by Jack Halpin (FlynDice) - Monday, 01 June 2009, 13:34 GMT
"+ CGU_PROC |= (1<< 5); /* FCLK /2 */
+
This will have no effect since we are using fastbus, and fclk is not used (pclk is used as input for the CPU frequency)"

Yes this _does_ have the desired effect of dividing FCLK by 2. The input to the DRAM and PCLK dividers is FCLK except for asynchronous bus. So yes, PCLK is used for fastbus but I have divided the source for PCLK by 2. This also means that all the things that take their clock input from pclk are running at half speed. I did tell the software I did this though so it showed the correct frequencies on the buffering thread page. It works as designed- I have watched the frequencies changing on the viewHW page while it was operating. I tried simply using the DRAM divider to get down to 32MHz but for some reason I could not get playback to work for more than 2 seconds no matter which way I tried.



Comment by Rafaël Carré (funman) - Monday, 01 June 2009, 15:04 GMT
Oh sorry, stupid me, this is the second time you tell me that :/
Comment by Jack Halpin (FlynDice) - Monday, 01 June 2009, 15:33 GMT
You are wise to doubt me though and I encourage it! Please cherry pick whatever might work better and ignore those areas where I may have some " misconceptions". I take none of this personally. I find the acrobatics needed to make this patch function are rather silly but it's the only way I've been able to make playback work for more than 2 seconds with the mmu & dcache enabled.
Comment by Rafaël Carré (funman) - Monday, 01 June 2009, 20:42 GMT
Updates my last patch:
Actually uncomment the use of an uncached buffer inSD transfers.
Map all memory regions to themselves, uncached, spotted by bertrik (big up to him!)
Set CCU_SCON to 1 (dma has the highest priority)

test_disk passes on Fuze & Clipv1.
Playback seems just fine on Fuze & Clipv1 (to be tested extensively however)
Comment by Dustin Skoracki (sko) - Monday, 01 June 2009, 22:33 GMT
I get only a blank white screen after the bootloader on my e250v2 (r21164 with the last patch applied).
Comment by Jack Halpin (FlynDice) - Monday, 01 June 2009, 23:56 GMT
Yep, blank white screen here too e280v2
Comment by Thomas Martitz (kugel.) - Tuesday, 02 June 2009, 00:44 GMT
The latest patch works just awesome here on my fuze.

*No white screen across several reboots.
*Playback works just fine
*test_codec gives 650% real time for mp3 (38.16MHz needed)
*test_disk write test passes
*test_fps is faster for unboosted (but slower for boosted!? need to recheck)

*Pictureflow runs flawlessly with playback in background and decent fps (~30).
Comment by Michael Chicoine (mc2739) - Tuesday, 02 June 2009, 01:00 GMT
tested on r21151

I also get the white screen after the bootloader on my e260v2, but if I wait until after the backlight fades out, the display comes on like normal. There are still some problems with intermittent display corruption, but it will clear up at the next screen update. Also, the FM Radio works with this patch.
Comment by Jack Halpin (FlynDice) - Tuesday, 02 June 2009, 02:53 GMT
r21165 I get similar results to Michael but no rockbox time right now. Lots of rockbox time tomorrow though so If you have any special tests in particular you think need to be done speak up.
Comment by MichaelGiacomelli (saratoga) - Tuesday, 02 June 2009, 03:45 GMT
I tested the funman's latest patch:

Fuze: played for 30 minutes without issue and good performance in test_codec
Clip: plays but audio deadlocks (clip itself remains responsive however)
Comment by Jack Halpin (FlynDice) - Tuesday, 02 June 2009, 05:04 GMT
Changing PCLK to 31MHZ seems to clear up the LCD issues and actually some intermittant MSD issues I was having on my e280v2. The real testing begins in the morning ...
Comment by Rafaël Carré (funman) - Tuesday, 02 June 2009, 07:42 GMT
@saratoga : My Clip deadlocks (completely) when going into the database menu or resuming playback, perhaps it's the same bug present in SVN we are viewing (audio deadlock)

@FlynDice : cool ! If we have decent performances for LCD perhaps we can use a slower PCLK (battery_bench running on my Fuze to see if better performance == less boosting == more battery life)
Comment by Rafaël Carré (funman) - Tuesday, 02 June 2009, 22:13 GMT
setting CCU_SCON to 1 makes the big difference : I believe it gives DMA transfers higher priority and let them complete successfully (see AS3525 datasheet for description of this register)

I now use cache coherency functions in ata (faster than using an uncached buffer), but I have to keep aligned_buffer aligned on cache line, else I see wps corruption (I find that weird..)

Also add HAVE_PCM_DMA_ADDRESS for keyclick to work properly

FlynDice I wait for your patch update for e200v2 ;) Have a look at what kugel said : http://www.rockbox.org/irc/log-20090602#19:46:11

TODO: test on e200v2, find what's wrong with the Clip, test on m200v4 if possible (to see if the Clip's problem is also present on m200v4), enable MMU in the bootloader so we can map DRAM to 0 and IRAM next to it, and then remove -mlong-calls

EDIT: patch attached :)
EDIT2: also remove SHAREDBSS_ATTR since we don't have a coprocessor (not related to this task though ..)
EDIT3: my Clip seems to behave fine now .. to test in the long run
EDIT4: mpegplayer now deadlocks on "Loading" screen ..
Comment by Thomas Martitz (kugel.) - Tuesday, 02 June 2009, 23:11 GMT
What's the need for remapping dram in the bootloader?
Comment by Rafaël Carré (funman) - Tuesday, 02 June 2009, 23:22 GMT
We must link rockbox.sansa to use 0x0 base address, and then the MMU must be active when entering rockbox.

So, the MMU should be activated in the bootloader just prior to executing the loaded rockbox (it doesn't need to be active _before_, while the bootloader is executing)
Comment by Thomas Martitz (kugel.) - Tuesday, 02 June 2009, 23:29 GMT
The mapping could also happen in crt0.S of the main binary, couldn't it?

EDIT: Also, I don't think we need to map it to 0x0. We can also let it there and move the IRAM behind it. I tried that with success.
Comment by Rafaël Carré (funman) - Tuesday, 02 June 2009, 23:34 GMT
No it can't, because we need to call some functions, and they wouldn't be linked at the correct address.

We could map the IRAM behind it, and also map the DRAM to 0x0 to have the vectors available : that should work, thanks for that (I didn't think that we can map the same region at different virtual addresses)
Comment by Michael Chicoine (mc2739) - Wednesday, 03 June 2009, 01:28 GMT
Tested latest patch on e260v2 with r21171

No display problems.
Music playback has intermittent audible glitches.
FM Radio locks up before entering fm screen (same as clean svn).
Mpegplayer locks up on "Loading" splash.
Comment by MichaelGiacomelli (saratoga) - Wednesday, 03 June 2009, 01:51 GMT
My clip played for about 40 minutes with the latest patch before deadlocking as before.
Comment by Dustin Skoracki (sko) - Wednesday, 03 June 2009, 09:56 GMT
I tried this patch on my e250v2 (r21175) together with  FS#10270  to prevent the white screen:
+ nearly no boosting (0.9%) on playing mp3 files (320 kBit/s), boosts only on memory access
+ test_codec said 670.29 % realtime, 36.99 MHz needed (test file was 128 kBit/s mp3)
- mpegplayer and radio not working as Michael wrote already
Comment by Rafaël Carré (funman) - Wednesday, 03 June 2009, 11:15 GMT
Remove use of *_dcache_range() functions in sd driver and memcpy to/from uncached buffer

This lets mpegplayer run fine, but now I see backlight changing while playing movies.. do you have the same effect on e200v2?
Comment by Dustin Skoracki (sko) - Wednesday, 03 June 2009, 11:35 GMT
Tried it with clean svn r21176 and only this patch applied: mpegplayer is working, no backlight changes... well it's working pretty perfect for me. But radio is still not working.
Comment by Thomas Martitz (kugel.) - Wednesday, 03 June 2009, 11:42 GMT
Yes, that's because the delay gets too low in button_read_dbop() .the hold button gets read wrongly and is switched on hence the backlight wants to turn off.
Comment by Rafaël Carré (funman) - Wednesday, 03 June 2009, 15:06 GMT
I still get the problem on my fuze with r21182
Comment by Thomas Martitz (kugel.) - Wednesday, 03 June 2009, 16:07 GMT
Because r21182 only changed e200v2? :) Change button-fuze.c:button_delay() to count down from 50 or so and the problem should get better.
Comment by Jack Halpin (FlynDice) - Wednesday, 03 June 2009, 17:48 GMT
Same results for me as sko reported. I found an interesting thing though while changing frequencies. With PCLK/DBOP at 31 I cannot get to the main menu with the lcd dbop delay in place, however if I comment it out I can boot fine with radio and microsd working
Comment by Rafaël Carré (funman) - Wednesday, 03 June 2009, 18:51 GMT
kugel : The backlight doesn't change with this change, that fixes the problem.

Did you get this delay from the OF or is it a "hand-made" ?
Comment by Thomas Martitz (kugel.) - Wednesday, 03 June 2009, 18:55 GMT
The OF uses 64-- AFAIK, I tried to minimize it a bit. I always hat the feeling this delay has impact on the responsiveness but that might just be me.
Comment by Jack Halpin (FlynDice) - Thursday, 04 June 2009, 15:58 GMT
Funman:

I think your discovery of "Set CCU_SCON to 1 (dma has the highest priority)" has made the use of uncached buffers for dma unneccessary. I noticed that I was getting an average of about 70 MHz in the buffering thread while playing mp3 for just about all frequency combinations I tried. I know I was seeing numbers more down in the 45-46 range when I was working on my version last weekend so I decided to just drop this line into the 192_32_32_dcache.patch and give it a try. Sure enough this one change alone makes this patch work also. I am not seeing any of the disk error messages getting tossed at me and test_disk passes the write & verify test. I have yet to be able to crash it yet. Mpegplayer runs fine also. The buffering thread shows an average of 45-46 MHz while playing mp3 vs 70 MHz. I believe the difference is that the ams-caching.diff patch uses memcopy to the uncached buffer while the 192_32_32_dcache.patch uses dma with cached memory and cache coherence functions. I could of course be mistaken and not understand this correctly. The patch still applies to current svn, would someone else try this and see what results they get. I'm not trying to lobby for one patch over another here, this is more along the lines of which method of disk access will give us better results, dma with cache coherence functions or memcopy to an uncached buffer. At least that's what I think the choices are . Feel free to correct my misunderstandings!

Comment by Rafaël Carré (funman) - Thursday, 04 June 2009, 16:17 GMT
FlynDice, 192_32_32_dcache.patch uses an uncached buffer in ata_sd-as3525.c :
+ unsigned char* uncached_buffer = UNCACHED_ADDR(aligned_buffer);

The only change between the 2 last patches is writing the data to/from uncached memory instead of using cache coherency functions in SD DMA transfers.
The consequences (that I can't explain) : mpegplayer doesn't crash, and music playback is perfect.

Also you should not take the cpu frequency indication in buffering thread as precise mesure, but only as an indicative one

The algorithm will see if the CPU is boosted or not at each tick, so if the CPU is boosted in one tick, unboosted just after the tick has happened, boosted again just before the next tick happens, the algorithm will think the CPU was boosted during the whole tick.
test_codec is more precise : it calculates how much time we need to decode a given time of audio.
Comment by Jack Halpin (FlynDice) - Thursday, 04 June 2009, 17:30 GMT
Well with test_codec I get 36.25 Mhz needed for realtime. If I add the LCD fifo patch it gets even better at 36.16 and the buffering thread averages about 41-42. I'm the first to admit I don't completely understand the tests I'm running here but we're not talking 2-3 MHz (on the buffering thread) we're looking at an almost 35% reduction. I thought I understood why but maybe not given your response!
Comment by Rafaël Carré (funman) - Thursday, 04 June 2009, 17:58 GMT
Well the real improvement here is 36.25 MHz -> 36.16MHz, so an improvement of 0.2%
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 01:38 GMT
- higher delay in button-fuze.c to fix backlight problem while playing mpeg.
- map the IRAM just behind DRAM to remove mlong-calls option from gcc and reduce binsize (approx. 10% smaller binaries)
- keep DRAM virtual address the same than physical, so MMU setup can happen in loaded rockbox, and the bootloader can continue to function without MMU.
- map the 1st MB of DRAM at 0x0 to avoid to copy the vectors
- tested bootloaders on Fuze & Clip

TODO:
- Fix Clip down button stopping to respond sometime.
- Fix e200v2 radio (radio is fine on my Clip & Fuze)
- Understand why we can't use cache coherency functions in SD transfers, else mpegplayer locks up (and clean ata-sd_as3525.c)

Did I miss anything ?
Comment by Thomas Martitz (kugel.) - Friday, 05 June 2009, 05:57 GMT
Works good here (after a bootloader update).

658% real time for 128kbit MP3. Nice :)

EDIT: Booting seems a tad bit slower, or is that me?
Comment by Dustin Skoracki (sko) - Friday, 05 June 2009, 11:04 GMT
Works good on my e250v2 too.

596.71 % realtime for 320 kBit/s mp3 (found one on my player which is short enough to be not to large for test_codec)
629.31 % for 128 kBit/s mp3

@kugel: hmm... I can't notice longer boot time, seems to be as before.
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 12:14 GMT
Yes booting is a bit longer on my Fuze (well maybe 0.5s longer), maybe because of the increased button delay. I didn't bissect to find what was slowing down.

Also bootloader update is optional, will work fine with previous bootloaders (mapping is made in crt0.s - calling memory_init - and DRAM is mapped at its physical address, with IRAM just behind it)

Also I found that in the solitaire game, if i scroll the wheel quite fast, the cursor will not move. If I do the same fast scrolling in the rocks->games menu, the cursor will move.
I never had seen that behaviour in solitaire : is that something wanted and we can ignore it; or is there a problem with the wheel driver?
Comment by Thomas Martitz (kugel.) - Friday, 05 June 2009, 12:18 GMT
I meant to say: It didn't boot until I updated the bootloader.
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 12:27 GMT
That's weird .. what was the effect with the older bootloader ?
Comment by Thomas Martitz (kugel.) - Friday, 05 June 2009, 12:28 GMT
It didn't boot. I can't recall the version though. I think I updated the bootloader the last time when I received the Fuze.
Comment by Dustin Skoracki (sko) - Friday, 05 June 2009, 12:30 GMT
I can't reproduce the weird wheel behavior in solitaire, works fine on my e250v2, cursor moves even if I turn the wheel as fast as I can.
Comment by Jack Halpin (FlynDice) - Friday, 05 June 2009, 13:33 GMT
I also had to update my bootloader but after that it works fine. I'm still having problems with microsd at PCLK =62 or 64 but that happens with svn also. I'm trying to tweak the delays but haven't gotten anything reliable yet. Is this a problem with the fuze? Solitaire scrolling works fine.
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 14:15 GMT
What was the symptom when you hadn't updated yet your bootloader: white screen? stuck on loaded rockbox logo display? stuck on bootloader logo display ?

@flyndice : microsd works fine here on fuze. what is your problem exactly? have you tried several different cards?
Comment by Thomas Martitz (kugel.) - Friday, 05 June 2009, 14:18 GMT
stuck on the rockbox (not bootloader) logo.

The solitaire problem can be fixed, I already located it.
Comment by Dustin Skoracki (sko) - Friday, 05 June 2009, 17:37 GMT
MicroSD works fine on my e250v2, but I don't have a microSDHC card. I'll try to get one and test it.
Comment by Jack Halpin (FlynDice) - Friday, 05 June 2009, 18:00 GMT
@funman:

My bootloader experience was the same as kugel's.

As far as microsd goes, I only have 1 microsd and no access to others(besides newegg .....). It's an 8GB transcend with a circled 6 on it but I don't know if that means class 6 or something else. With PCLK > 32 MHZ (ie 62,64) I cannot boot rockbox with the microsd installed. If I wait until rockbox boots and then install it I am fine until I select the files menu then the player locks. If I start playing music and then install the card It works ok _sometimes_ . It seems to be ok as long as it gets some time with PCLK at 31 or 32. In some of the clocking schemes I have tried PCLK shifts between say 31 & 62 during boosting and the card works just fine. But when PCLK is a constant 62 or 64 I have problems. I am thinking that the microsd "tuning" was done when we were running at 31MHz most of the time and that tweaking the delays may help for use with the higher PCLK.



I have been testing clocking schemes and think I have a found the source of my confusion yesterday although I'm not sure what to believe now as far as tests go. It's not the sd access but instead the clocking scheme that makes the difference I think. Of course I welcome all arguments against this so hit me with your best shot and enlighten me ;P!

As far as tests go now I'm not sure what to believe. I ran test_codec on a file using a 32MHz/64MHz fastbus only clocking scheme. Music played fine as expected but the results from test_codec were a bit puzzling. It told me that I needed 97.27 MHz for realtime and that decoding happened at 197.38% realtime. So, I tried playing music and watching the buffering thread: 30% boost rate at 41MHz, which for lack of a more scientific measure, just felt about right watching the alternating boosted/unboosted states. I then went back and figured out that 97.27 * 1.9738 = 191.99. Close enough to my 192MHz FCLK freq to not be a coincidence I think. I'm trying to read the test_codec code right now to understand how it comes up with it's numbers and the buffering thread code is next. Unless someone who has already done this can explain it to me sooner.
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 18:44 GMT
For the microsd, you should add some - custom made or copied from bootloader/common.c - printf() in the sd driver because i have no clue on what's going on. Look if there is not an unrecoverable error (the code would indefinitely retry).

test_codec works this way:
1/ fully load a file into the audiobuffer
2/ get the length (in seconds) of this file via its metadata
3/ decode this file
4/ Calculate how much time was needed to decode it (in ticks)
5/ speed = (audio duration / needed time) * 100% realtime

And the %dMHz needed for realtime is calculated this way: (CPUFREQ_MAX / speed)MHz needed for realtime
Comment by Jack Halpin (FlynDice) - Friday, 05 June 2009, 19:16 GMT
I will look at the sd thing later tonight.

Thanks for the test_codec explanation, of course I had just finished plowing through it all when the email came..... I tried recompiling with same results. Then make clean & recompile and the world makes sense again, well at least this does. 33.67 MHz needed for realtime, now that I can believe.
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 20:22 GMT
I identified the mpegplayer problem : mpegplayer/disk_buf.c will request a SD transfer into an uncached buffer.
The DMA code will use this buffer address as the physical address, and then the AS3525 deadlocks.

Instead of adding code to handle this, i removed the define PROC_NEEDS_CACHEALIGN. I believe only dual-core targets need to keep shared data out of cache lines, and we only have one core.

In ata_sd-as3525.c i now use {clean,dump}_dcache_range since we can avoid the memcpy() on aligned buffers; this makes transfers *much* faster for big data transfers, and a very few little faster for unaligned and small data transfers.


Now we should see what prevent us from committing this patch:

1/ radio not working on e200v2 ( FS#10267 )
2/ clip down button going mad (can be fixed after commit IMO)
3/ testing to see if we didn't forget some bug

For 1/ It seems that radio can't be made to work with this patch (reading  FS#10267  comments).
For 2/ I hope clip owners agree (kugel & bertrik)
For 3/ Let's go :P
Comment by Matt M (Chesteta) - Friday, 05 June 2009, 22:30 GMT
using FS:10048, FS:10272, FS:10267, r21193 on an e280v2
1: radio does not work (lockup on a blank screen (background and battery, vol and clock still showing): button presses turn on backlight but scroll wheel does not (not sure if since there is nothing to scroll through this is 'normal' or not)

2: wheel light does not turn off (it is configured to never be on) for about 18 seconds after booting into rockbox (i do have the database initialized so it could be scanning for changes?); when pressing select button the light flashes however the flash does not occur in the music player screen, seems to only be in menus (pressing other buttons does not cause the flash). When changing songs the light flashes for 1-2 sec, and intermittently during playback; I think it happens when reading from the disk?

3: when booting rockbox there are blue lines visible, maybe 1 pix vertical by a row approx 10 px blue, then ~4-5 px 'normal' and then another ~2-3 blue, in a row: the lines TEND to be in the lower 1/3 of the screen but when changing songs they will quickly appear higher up on the screen *note: the lines or 'bars' flash seemingly randomly across the screen staying in one place for ~1/3 of a sec (then 'jumping'), maybe 1-3 of them at a time (seems to be when 'doing stuff' more cpu intensive, ex changing songs or initially buffering a recently changed song)... they

test_disk WRITE&VERIFY
CPU clock: 62 Mhz: Passed
CPU clock: 248 Mhz: Passed *Noticed blue lines 'evenly' on entire screen (not bottom 1/3) while doing test... it appears when accessing disk boosted they are all over, and when processing/buffering they are just on bottom 1/3

MP3 (I am using 320k mp3's) seems to work great, have tried parametric eq which also works
Comment by Hilton Shumway (HIllshum) - Friday, 05 June 2009, 22:35 GMT
Yes, the wheel light is on disk access, we know about it but don't know why (and it is a useful debug "feature" anyway...)
Comment by Rafaël Carré (funman) - Friday, 05 June 2009, 22:49 GMT
chesteta : please make sure that the problems you see don't come from another patch (like fs#10272 !)

These other patches can be refined once this one (data cache / mmu) has been committed.
Comment by Matt M (Chesteta) - Friday, 05 June 2009, 22:52 GMT
ok, I will try and test the other patches seperately too
Comment by Jack Halpin (FlynDice) - Saturday, 06 June 2009, 00:47 GMT
Disregard my microsd comments from earlier. I was getting very inconsistent results while troubleshooting so I tried reformatting the card and it works just fine now at the present settings.
Comment by Michael Chicoine (mc2739) - Saturday, 06 June 2009, 23:16 GMT
tested with r21200 and only  FS#10048 

While playing flac, I am seeing it frequently skip to the next track during playback.

Watching the "buffering" debug screen, the skip happens when the usefl buffer empties and is refilled. Sometimes, the usefl buffer fills and immediately empties and refills a second time. At this point, it skips to the next track.
Comment by Rafaël Carré (funman) - Saturday, 06 June 2009, 23:30 GMT
Can you reproduce the same without the patch?

Also I wonder if the sudden empty and refill means that there was an error reading the file (corrupted file/fs)

Does it always happen on the same track/location or randomly ?
Comment by Michael Chicoine (mc2739) - Saturday, 06 June 2009, 23:35 GMT
Plays fine without this patch

Tested fs - no errors found

Does not fail at same track/location - replaying the track may result in track playing completely.

The failure happens at random. It may play 2 tracks properly or it may fail on 2 tracks in a row.
Comment by Jack Halpin (FlynDice) - Sunday, 07 June 2009, 05:23 GMT
I can confirm this on my e280v2 also. You can watch usefl: empty and see the wheellight flash on the disk access to refill. Every once in awhile alloc, real, and usefl will all suddenly drop to empty and then refill to full. When they get to full the current song quits and play skips to the next song as if for some reason the player dumped the rest of the playing song and advanced to the next. I could not notice a consistent pattern of failure that I could point to, different songs, different points in the songs, various levels of usefl. MP3 and ogg did not exhibit this behavior, only flac.
Comment by Rafaël Carré (funman) - Sunday, 07 June 2009, 08:17 GMT
I can reproduce track skips on my Fuze, I'll try playing some flacs without this patch to see if the bug also happens but less frequently.

Did you have the same skips with the penultimate patch?

I can't think of a possible reason for this :/

Comment by Rafaël Carré (funman) - Sunday, 07 June 2009, 15:10 GMT
I have some infos: here is some logf output from flac decoder:
FLAC: Frame 24, error -41
FLAC: Frame 852, error -209

Looking the flac codec and thinking a bit this can only mean one thing : corrupted flac.
If the flac is correct, and file system correct as well => corruption very likely happens during transfer from storage

I'll try the penultimate patch which used an uncached buffer for storage transfers, and also run tons of test_disk with both patches.
Comment by Rene Peinthor (rp) - Sunday, 07 June 2009, 17:42 GMT
I just installed(bootloader & rockbox) the latest patch on an e200v2 8gb.

And it seems there is a bug with disk access, because i can't access anything, if look the the File menu there are only weird file/directory names...
no loading of plugins and no playing of music because the codecs can't be loaded.
Comment by Jack Halpin (FlynDice) - Sunday, 07 June 2009, 17:56 GMT
@ Chesteta: use patch -p1 for funman patches. I just applied it to r21201 just fine.
Comment by Bertrik Sikken (bertrik) - Sunday, 07 June 2009, 19:41 GMT
funman, the patch from friday 05 jun 2:38 work normally on my clip, the patch from friday 05 jun 21:22 gives me data aborts at address 0x3XXXXXXX (even after make veryclean, make reconf, deleted .rockbox on the player)
Comment by Rafaël Carré (funman) - Sunday, 07 June 2009, 22:44 GMT
What "broke" this last patch is r21195 and its fix r21196, while I had tested it with r21194 : the exact reason is still unknown to me (tagcache setup speedup?).

Using newer revisions : rockbox is stuck on the logo screen.

Something else : I didn't look at the exact meaning of the MCI_DATA_TIMER register, I'll read the pl180 datasheet again to calculate a reasonable delay for timeout, perhaps we will detect earlier invalid transfers and retry.

Something if you test this patch : regularly run scandisk / dosfsck to verify the internal storage and microsd, because rockbox could have corrupted the filesystem.
Comment by Rafaël Carré (funman) - Monday, 08 June 2009, 15:16 GMT
* Always use a buffer aligned on cache line size, uncached, for SD transfers : this avoids modifying unrelated memory in SD transfers
* MCI_DATA_TIMER is set to the max, since I don't know how to calculate efficient timeout delays for transfers (in MCICLK cycles)
- the fixed timeout delay for commands is 64 cycles
*  FS#10296  applied to help understanding potential data aborts

I suppose r21195 changed the code, and modified alignement of some critical parts. Using my (booting) clip and seeing the data aborts, i noticed memory corruption, so very likely something wrong with caches.

r21215 + this patch boots fine on Clip & Fuze, test_disk passes, flac play fine without any problems (so far).

Since the blocking issues (e200v2 radio / clip button) are being worked on, I wish this patch gets committed ASAP so we can all use the same working base.

If you can still see data aborts / or data corruption (or related, like the flac problem mc2739 mentioned) please tell.
Comment by Rene Peinthor (rp) - Monday, 08 June 2009, 16:43 GMT
tried the new patch again with e200v2 8gb.

same problem, but it slightly improved, it seems file access works for the first few seconds, and as I access the file menu and switch to a subfolder, the same cryptic file/directory names appear.
Comment by Thomas Martitz (kugel.) - Monday, 08 June 2009, 16:48 GMT
Check your filesystem, I don't have this problem here.
Comment by Rene Peinthor (rp) - Monday, 08 June 2009, 16:58 GMT
I don't think its a filesystem problem, because with the svn version of rockbox, everything works.
But I might be wrong.
Comment by Rafaël Carré (funman) - Monday, 08 June 2009, 17:02 GMT
Please check the filesystem anyway (scandisk/fsck.vfat/dosfsck), previous versions of this patch could very well have corrupted it.
Comment by Rene Peinthor (rp) - Monday, 08 June 2009, 19:03 GMT
sorry, you are right!

reformatted the player now everything works, thank you!
Comment by Rafaël Carré (funman) - Monday, 08 June 2009, 21:19 GMT
Now i'm lost:
with r21226 my fuze refuses to boot (stuck at logo screen)

with r21226 and diff to system-arm.c reverted it boots
with r21226 and r21225 reverted it boots

Since the modified code in system-arm.c is never called (it's the data abort handler) I suppose that the problem comes from changed alignement.

The exact problem remaining unknown, the patch can not be committed since it would break at the next random commit.

Suggestions welcome, and please also tell if you can see the same behaviour on another model (my Clip boots and runs fine currently)
Comment by Rafaël Carré (funman) - Monday, 08 June 2009, 21:40 GMT
update: after a reformat my fuze boots, but i am still unsure if my previous comment showed a real problem or not.

Loading...