FS#10048 - AMSSansa MMU and Dcache patch
Attached to Project:
Rockbox
Opened by Jack Halpin (FlynDice) - Monday, 23 March 2009, 19:17 GMT+2
Last edited by Rafaël Carré (funman) - Tuesday, 09 June 2009, 01:07 GMT+2
Details: I believe this patch gets the MMU operating on my e280v2; however, in doing so it makes the main program fail when it tries to mount the disks and cannot. I get the message "No partition found. Insert USB cable and fix it" (main.c line 493). I am not a programmer, so laugh at the code all you want and I won't be offended; I just hope it can be helpful. There's also some code for the HW debug page in debug-as325.c included. I have just mimicked the OF for the most part and looked at the other ARM targets for placement.
Closed by Rafaël Carré (funman)
Tuesday, 09 June 2009, 01:07 GMT+2
Reason for closing: Accepted
Additional comments about closing: Even if further problems arise they'll be easier to identify and fix with a wider testing audience
It seems you reinvent the wheel at some points. I definitely think it can all be done using existing code. It works on the gigabeats too, which have nearly the same cpu.
Getting it to work has more priority for now.
I think it's not running. Ogg performs worse than before, pictureflow is approx. 1/3 as fast (having 5fps now, >15 before).
The imx31 seems to clean the dcache rather than invalidating (the former one flushes it too, afaik, i.e. write content to ram).
On a side note: If/When we get this right, we should try to use it actually. Such as mapping the ram nearer to IRAM, so that we can save long-calls. And cache more ram too, 1MB seems a bit low (that doesn't even cover the whole Rockbox ram usage). I'm not sure if the audio buffer should be cached though.
That you're only caching 1MB out of 8MB might actually be a problem, at least I can imagine that.
Edit: Here's your patch without unrelated & whitespace changes. Please avoid those (tell your editor not to fix trailing spaces, for example), as they make it hard to see what the patch actually does.
/boot.link:8 nonconstant expression for length
entering the value causes no boot (screen, buttons remains dark)
If a DMA transfer is memory->peripheral, clean_dcache_range is what is needed beforehand. If DMA transfer is peripheral->memory, then use dump_dcache_range. It writes back the dangling ends if any but not what's in between (though peripheral->memory really needs special cache-alignment handling to be completely safe).
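To make the "dangling ends" point concrete, here is a minimal sketch. It is not Rockbox's dump_dcache_range; it only computes which cache lines at the edges of a buffer are shared with neighbouring data and would therefore need a write-back before being discarded. It assumes a 32-byte cache line, and `struct edge_lines` and `find_edge_lines` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u  /* assumed line size, in bytes */

struct edge_lines {
    bool head_shared;     /* first line also holds bytes before the buffer */
    bool tail_shared;     /* last line also holds bytes after the buffer */
    uintptr_t head_line;  /* address of the cache line holding the first byte */
    uintptr_t tail_line;  /* address of the cache line holding the last byte */
};

/* Hypothetical helper: size must be greater than 0. */
static struct edge_lines find_edge_lines(uintptr_t start, uintptr_t size)
{
    struct edge_lines e;
    uintptr_t end = start + size;  /* one past the last byte */
    uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1);

    e.head_line = start & mask;
    e.tail_line = (end - 1) & mask;
    e.head_shared = (start % CACHE_LINE_SIZE) != 0;
    e.tail_shared = (end % CACHE_LINE_SIZE) != 0;
    return e;
}
```

A buffer that starts and ends on a line boundary has no shared edge lines, which is why cache-line alignment makes peripheral->memory transfers simpler to handle.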
* bic r0, r0, #0x41 /* disable mmu & dcache */ => it should be #5 (4 | 1)
* The code to enable mmu should not be enabled in the boot loader (sdram not initialized yet)
* The mmu definitely works (I mirrored SDRAM on another address & tested)
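The mask correction in the first point can be checked in plain C. This is a small sketch using the standard ARM9 CP15 control register layout (M = bit 0, C = bit 2); the macro and function names are hypothetical and not taken from the patch.

```c
#include <stdint.h>

#define CP15_CTRL_MMU    (1u << 0)  /* M bit: MMU enable */
#define CP15_CTRL_DCACHE (1u << 2)  /* C bit: data cache enable */

/* Hypothetical helper: returns the control value with the MMU and
 * data cache disabled, i.e. the effect of "bic r0, r0, #5". */
static uint32_t ctrl_disable_mmu_dcache(uint32_t ctrl)
{
    return ctrl & ~(CP15_CTRL_MMU | CP15_CTRL_DCACHE);
}
```

With #0x41 the instruction clears bits 0 and 6, so the data cache bit (bit 2) stays set.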
Michael, thanks for giving us your expertise ;)
The arm922t datasheet is a bit unclear, I think it requires prior knowledge to understand how caching works.
For now I do not declare the RAM regions as cached or buffered, because otherwise the loaded rockbox won't run.
* immediate black screen on Clip
* immediate white screen on Fuze
I also enable the MMU just before jumping to loaded rockbox
Tested on Clip & Fuze (perhaps the patch doesn't apply without FS#10092).
test_disk still fails, Fuze backdrop is still corrupted, md5sum still succeeds!
(To run md5sum you need to replace the quit variable by false, else it will get corrupted and the plugin will exit early)
A problem happens on the Clip when playing music: "error on DMA channel 0" (channel 0 is dedicated to ATA, not PCM?)
To enable caching:
- comment i/d cache disabling in bootloader/sansa_as3525.c:97
- uncomment dcache cleaning/dumping in ata_sd_as3525.c:714
- s/CACHE_NONE/CACHE_ALL/ in system-as3525.c:268
Now a question : can the DMA peripheral use virtual addresses?
Section 2.4 of PL081 doc only says linked lists must be in flat mapped memory area and we don't use linked lists (LLI)
My tests had mixed results, so I used physical addresses (also valid as virtual addresses since they are flat mapped)
Note that the problems we see (test_disk failing, crashes, bitmap corruption) also happen with icache & dcache disabled.
I started verifying DMA code and found some (not important so far) mistakes, so it might be useful to take a deeper look.
However, the gigabeat F/X remap and cache the whole memory too:
ldr r0, =0x30000000 ;physical address
ldr r1, =0x0 ;virtual address
mov r2, #32 ;memory size
mov r3, #12 ; CACHE_ALL
bl map_section
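For readers who have not looked at the MMU code, here is a rough C sketch of what a call like the one above amounts to: filling one first-level "section" descriptor per megabyte. It is only an illustration under assumptions (domain 0, full read/write access, a caller-supplied table), not the actual Rockbox map_section; `map_section_sketch` is a hypothetical name.

```c
#include <stdint.h>

#define SECTION_SIZE (1u << 20)  /* one section maps 1MB */
#define CACHE_NONE   0x0u
#define CACHE_ALL    0xCu        /* C (bit 3) | B (bit 2), i.e. 12 */

/* Hypothetical sketch: map `mb` megabytes starting at physical address
 * `pa` to virtual address `va` in the first-level table `ttb`
 * (4096 entries, one per MB of virtual address space). */
static void map_section_sketch(uint32_t *ttb, uint32_t pa, uint32_t va,
                               unsigned mb, uint32_t flags)
{
    uint32_t first = va >> 20;  /* table index of the first section */

    for (unsigned i = 0; i < mb; i++) {
        ttb[first + i] = ((pa + i * SECTION_SIZE) & 0xFFF00000u)
                       | (3u << 10)  /* AP = 11: read/write access */
                       | (1u << 4)   /* must be 1 in section descriptors */
                       | flags       /* cacheable / bufferable bits */
                       | 2u;         /* descriptor type: section */
    }
}
```

With the arguments from the assembly above (0x30000000, 0x0, 32, 12), entry 0 of the table points at the first megabyte of SDRAM and entry 31 at the last.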
Edit: BTW, I don't think it's a smart idea to map the RAM to the physical address of the IRAM section, especially not if we plan on using the IRAM.
What is the buffer area you are talking about ? lcd framebuffer?
kugel : when the MMU is enabled, all addresses are virtual so this is not a problem : the gigabeat also remaps SDRAM to 0x0. This also brings the advantage to not have to copy the vectors.
By the way I found why playback failed on the Clip : the IRAM offset (in dma-pl081.c) must be 0x71000000, not 0x80000000.
Now playback works (proving that IRAM access works as well), but it stops every 2 seconds or so and the UI becomes very unresponsive.
You can test on Clip or Fuze (ogg playback works quite fine on my Fuze).
Note this patch is only for research uses.
I'm not sure why buffers areas shouldn't be cached : once we have written/read from it using DMA, this shouldn't be a problem no ? Do you remember where you read it's not a good idea to buffer/cache the audio/codec buffers?
Make sure to call cpucache_invalidate before jumping from the bootloader to the firmware if you have the MMU and caching enabled.
I've also changed the original iram addresses from 0x81000000 to 0x0 again (I don't know where 0x81000000 comes from, but 0x0 should be correct).
With this patch, rockbox is still noticeably and unbearably slow. I've not worked out what's causing the slowness.
Michael: perhaps this is why we have problems with the LCD displays. (needs testing)
I suppose the framebuffer you speak about is a framebuffer written to the device using DMA ?
Also we only use addresses with DMA peripheral (for SD & PCM transfers), that minimizes the area of code which needs attention.
Else when you enable the MMU, code at 0x0 will be replaced by SDRAM content.
Using the 0x81000000 (flat-mapped) IRAM alias lets us use the same location whether the mmu is disabled or enabled.
Perhaps the patch works on your Fuze if the loaded rockbox is too small to overwrite the part of main() executing after enable_mmu(), but that's just luck!
Still utterly slow.
Rafaël, it would be nice if you could apply your local changes to this patch if you made some progress, as this patch gets rid of several hardcoded values, which should make further development easier.
#elif CONFIG_CPU==AS3525
#define DRAMORIG DRAM_ORIG
#ifdef AMS_LOWMEM
I spent most of last night trying to merge this with what I had sort of working but to no avail. I've been taking a slightly different direction, removing the mmu code from the bootloader and no rearranging of memory. At the risk of making things confusing I'll post what I've got here to get the info out and see if it can be of any value.
please remove the diffs for test_disk & synchronous/asynchronous clocking from it (unrelated)
+ for(i=0; i< 0x44; i++) /* vectors */
+ ((unsigned long*)0x30000000)[i] = ((unsigned long*)0x81000000)[i];
You will copy 0x44*4 bytes while we only need 0x11*4 = 0x44 (0x40 + 0x4 padding bytes in the middle)
The vectors you have copied are overwritten when you load rockbox into SDRAM, so there is no need to jump over them.
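The size arithmetic can be sketched as follows. The loop in the patch counts words, not bytes, so a bound of 0x44 copies 0x110 bytes; 0x11 words is the 0x44 bytes actually needed. `VECTOR_WORDS` and `copy_vectors` are hypothetical names for illustration, and the addresses are passed in rather than hardcoded.

```c
#include <stdint.h>

#define VECTOR_WORDS 0x11u  /* 0x11 words * 4 bytes = 0x44 bytes */

/* Hypothetical helper: copy the exception vectors word by word. */
static void copy_vectors(volatile uint32_t *dst, const volatile uint32_t *src)
{
    for (unsigned i = 0; i < VECTOR_WORDS; i++)
        dst[i] = src[i];
}
```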
I understand the vectors get overwritten ( I'm not sure I understand why they need to be copied beforehand) but don't they get overwritten with the same vectors? If I jump to 0x0 from the bootloader I go into a continuous reset loop so I figured I was jumping into the reset exception. When I jump to 0x44(end of vectors from rockbox.map/start of crt0.S) rockbox seems to then boot normally. I'll try to back some things out of the patch if I can.
When mmu is disabled, since the bootloader is built with entry point 0x81000000, the vectors are at the right place (since 0x81000000 is an alias for 0x0)
When mmu is enabled, the vectors are not at 0x0 (because 0x0 points to uninitialized SDRAM) and we need to copy them.
Once we load rockbox in SDRAM, the new vectors (for loaded rockbox, not the bootloader) are in place !
This is why in the current version of rockbox (without this patch) we reserve room at the very beginning of IRAM to copy the vectors from SDRAM
About your problem : 0x0 is the reset vector, and branch directly to "newstart" (which should be at offset 0x44), so please triple check (with a disassembler on rockbox.elf for example) !
"well I think the empty space we reserve in iram in current svn to copy the vectors to is getting copied in front of the main firmware we load at 0x0 with the mmu patch."
You are mixing things:
1/ We reserve space in iram in the current (svn) rockbox because the vectors need to be at 0x0 and the IRAM is mapped at 0x0 : we copy the correct vectors from rockbox (0x30000000) to IRAM (0x0)
2/ The routine which copies the vectors in crt0.S is commented out in the mmu patch, because rockbox is mapped at 0x0 so no need to copy.
If the vectors were incorrect we couldn't boot at all (interrupt handler address would be incorrect and sd, dma wouldn't work at all)
I didn't have this problem of continuous reset and I can't test the exact same code as you since my tree is rather old (~3 weeks)
Perhaps once you have jumped to newstart, display on the lcd the content of the memory from 0x0 to 0x44 and compare with first 0x44 bytes of rockbox.bin
It enables i & d caches, and caching & buffering for IRAM only !
Major instability: rockbox boots 2 times out of 3, and crashes really fast when playing a file.
Note : sansa_as3525.c, crt0.S & *.lds will probably be out of sync (I have a Clipv2 patch applied) and you'll have to merge them manually.
EDIT: kugel, "&&" is a C AND, for shell it's "-a" ;) (I also moved the compiler declaration after manufacturer has been set in tools/configure)
Also I disabled mmu in the bootloader since it's simpler: else we have to take care when writing rockbox.sansa to 0x30000000 (aliased to 0x0 virtual address) since the vectors will be overwritten.
EDIT2: flyndice : you can't use cpucache_invalidate() in the bootloader (see arm/system-target.h for CPUCACHE_INVALIDATE) , instead use the alias invalidate_idcache()
As far as a patch goes I've left the remapping aspect alone and tried to concentrate on simply getting an operable mmu with caching set up to work. I have tried to keep the mmu code out of the bootloader , the bus issue threw a small wrench in that but I think the bootloader works either way. I tried to stick to basically svn code and only change what I was working on. I have moved the cache coherency into dma-pl081.c. I've put a #define USE_MMU in as3525.h just below the TTB entries to turn the mmu mode on/off. Time to mow the lawn.....
I have worked a bit on your suggestion:
About the fastbus setting: it means that the CPU frequency is not fclk, but pclk (so, 62MHz in the current configuration = 248/4), this is why it's a bit slower.
I tried to use synchronous but there are too many disadvantages : fclk must be higher than, and an integer multiple of pclk (so, not less than 124MHz = 62*2)
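The synchronous-mode constraint stated above is easy to express as a check. This is only a sketch of the rule as described in this comment (fclk strictly higher than pclk and an integer multiple of it); `sync_mode_ok` is a hypothetical helper, not something from clock-target.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if fclk/pclk (in Hz) satisfy the
 * synchronous bus mode rule described above. */
static bool sync_mode_ok(uint32_t fclk_hz, uint32_t pclk_hz)
{
    if (pclk_hz == 0)
        return false;
    return fclk_hz > pclk_hz && (fclk_hz % pclk_hz) == 0;
}
```

For pclk = 62MHz this accepts 124MHz and 248MHz but rejects 62MHz itself and anything in between, which is where the "not less than 124MHz" figure comes from.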
Then I tried again asynchronous (no constraint) and got rockbox booting with DRAM & IRAM fully cachable & bufferable !
The difference now is that I changed the PLLA, PCLK & CPU frequencies in clock-target.h : different settings show different statuses (rockbox boots, or I got a black screen), but PCM playback equally fails.
I can have i2c working if I define a clock freq at 1600000HZ (4 times more than the maximum!)
If I set DMA_SYNC to 0 (activate synchronisation logic of DMA) PCM works for 4 or 5 seconds before crashing the whole firmware, and I notice corruption in the menu translation (note test_disk fails)
Perhaps this is why the OF uses dynamic frequencies for the 2 PLLs, CPU, PCLK, and peripherals clocks (i2c, ide, usb, dbop ...)
I think now we have to look in clock-target.h to resolve our problems!
Ha! I spent the past 2 days reading the same stuff as you did I bet.... I found the same thing about the fastbus setting. They describe it slightly differently in the as3525 datasheet and the 920t datasheet but I believe the bclk referred to in the 920 datasheet is connected to PCLK for the 3525 (actually hclk, which I'm assuming is the same freq as PCLK but I can't find a reference). I was thinking of playing with the frequencies in clock-target.h but then thought I may want to ask a few more questions before turning my player into a handwarmer instead of a music player ;).
- Is there anything I should be especially careful with?
- Is the CLK_DIV macro the preferred method for setting a frequency?
- Is #define CPUFREQ_DEFAULT 24800000 correct or missing 1 0...
- Is there a reason we only use PLLA. Would there be a battery cost to have PLLB up and locked to switch to a lower freq or for that matter we've got clk_main available but is 24Mhz useful?
- I noticed for the gig fx their method of switching from MAX to NORMAL is to simply switch from Asynchronous bus to Fastbus in system-megfx.c
I got called in to fly today and left my cable at home so not much in the way of experimenting got done but I did get to read up a bit. I'll try some of the things that seem safer tonight and see what happens.
The mmu_bus_xxxxx functions simply change the bus modes. I called them mmu_.... simply because they are set with the same cp15 control register that the rest of the mmu control uses. I think they would apply to all the other similar ARM targets and get rid of some assembler in some .c files. See my comment above about the gig fx using them to change between MAX and NORMAL freq.
Any opinion on whether this might be a better/worse solution to our cache coherency issues than the clean/dump/invalidate functions?
/* Attributes to place data in uncached DRAM */
/* These are useful beyond dual-core and ultimately beyond PP since they may
* be used for DMA buffers and such without cache maintenence calls. */
#define NOCACHEBSS_ATTR __attribute__((section(".ncbss"),nocommon))
#define NOCACHEDATA_ATTR __attribute__((section(".ncdata"),nocommon))
- the only thing to be careful about is not to go over the maximal frequencies (I did it several times without breaking my player though it's not recommended)
- CLK_DIV is made to not go over this maximal frequency, but it's not mandatory (I personally find it more convenient)
- CPUFREQ_DEFAULT is correct (24.8MHz) : it's the lowest freq possible (to save power) : just look at the frequencies of other models.
- I suppose that using PLLB costs some battery, so I wanted to only use one PLL. Note that the OF only uses PLLB for USB clock (iirc). And no, 24MHz is too low (at least for DBOP on color screens)
- Does the gigabeat fx define PCLK as the NORMAL frequency then ? I think we will use less power if we use a fclk less than 64MHz.
(to your 2nd message)
They could be useful if we always did the DMA transfer on a buffer which has these attributes, but we use a temporary buffer, instead of the one provided, only when the provided buffer is unaligned.
I think we should benchmark the different solutions when we got rockbox operating well.
Or always do the DMA transfer on this aligned, uncached buffer to see if it brings a difference (if it doesn't, it means we are right in our cache maintenance calls).
@kugel:
Perhaps these functions should be marked as inline and put in system-arm.c (no need for a whole new file!)
The gig fx uses NORMAL/DEFAULT = 99Mhz and MAX = 297 Mhz. I also figured out hclk after reading their datasheet i think. For as3525 HCLK = PCLK, we have no divider. HCLK clocks the AHB bus and PCLK clocks the APB bus, for us they are equal but the gig fx can use a 1/2 divider for APB=AHB/2. I believe they run at FCLK @ 297Mhz, HCLK @ 99Mhz, and PCLK @ 49.5Mhz. When they switch to fastbus they essentially use HCLK as FCLK.
I tried this on my fuze (raising CPUFREQ_DEFAULT & NORMAL) without changing pclk or plla, and I got rockbox booting with caching/buffering enabled for IRAM & SDRAM.
mpegplayer is very fast (realtime) and codecs are fast as well; but I notice lcd corruption, language file corruption, and test_disk fails (perhaps the disk problem causes the other problems..)
For MAX I run synchronous bus FCLK=922CLK=248MHz, PCLK=62MHz and for NORMAL/DEFAULT I go to fastbus and make the divider for FCLK=2 which gets me down to 922CLK = PCLK = 31MHz. Without the mmu the ui response is a little sluggish with this setup but everything works normally. test_codec shows 119MHz to decode an mp3. When I enable the mmu I get the ascodec problem right away with the empty battery and charging display. If I change the I2C freq to 1600Khz I get normal battery display but radio still doesn't work. test_codec with mmu enabled shows 38.1MHz to decode an mp3. If I also enable round robin caching this comes down to 36 Mhz. Pictureflow flies.... I cannot play music and get sound yet, though.
However radio works fine for me.
Some progress : I now have perfect disk transfers !
I used a buffer for DMA in uncached SDRAM instead of clean_dcache() / dump_dcache() (patch attached)
But I still see deadlocks, especially with PCM but not only.
And also, this patch only works on my Fuze, not on my Clip (rockbox locks at the logo screen, but the button thread still works : hold shuts down the LCD, and button presses turn on the light)
I ran performance tests with the CPU clock at 124MHz (pclk * 2):
test_fps is 37% faster (28% faster for YUV)
decoding is 266% faster (33.3MHz needed for MP3 128kbps stereo 44.1kHz)
test_disk is faster for create/open/dirscan/delete, but disk transfers are much slower, especially for aligned accesses since we have to memcpy to the uncached DMA buffer.
Note that I changed IDE clock frequency from 90MHz to 66MHz because the Sansa OF uses 66MHz (perhaps this is the cause of slower unaligned transfers ? I forgot to test!)
Also I find it weird that UI becomes much slower when we divide pclk by 2 : the DBOP frequency should not change that much since it's lower than pclk anyway... do you have an idea why?
Maybe we're too quick now, e.g. the i2c part is not indicating that it is busy yet, making the code conclude that the i2c transfer is already finished? I wonder if we're running into the i2c_busy check at the _start_ of the function now (a panic there should give a quick answer).
"I wonder if we're running into the i2c_busy check at the _start_ of the function now (a panic there should give a quick answer)."
It seems we are indeed running into the i2c_busy check at the _start_ of the function. I inserted a panicf in both the read and write functions and get the panic tripped in the i2c_busy check at the beginning of the write function. Suggestions?
I needed to do the same thing for my Clip to fix button reading : attached patch uses synchronous clocking (a bit faster than async : +10% speed for lcd ops and +1% for decoding) and boots on Clip.
FlynDice : I don't understand how your patch can work:
- if (dst >= 0x30000000 && dst < MEM*0x100000) can not be true
- DMA module needs physical addresses
- If you use uncached addresses you must synchronise the cache first, because there still may be data in the cache for the (cached) memory region : *_dcache() functions do that, and the memcpy() in my patch also.
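The first point can be verified with a short sketch. It assumes MEM is the RAM size in megabytes (8 on the players discussed here, per the earlier comment about 8MB) and that DRAM sits at physical address 0x30000000; `in_dram` and `in_dram_as_posted` are hypothetical names used only for this comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define DRAM_BASE 0x30000000u  /* physical DRAM address used in this thread */

/* The test as written in the patch: the upper bound is only the size
 * (e.g. 8 * 0x100000 = 0x800000), which is below DRAM_BASE. */
static bool in_dram_as_posted(uint32_t addr, uint32_t mem_mb)
{
    return addr >= 0x30000000u && addr < mem_mb * 0x100000u;
}

/* A possible correction: compare the offset from DRAM_BASE to the size. */
static bool in_dram(uint32_t addr, uint32_t mem_mb)
{
    return addr >= DRAM_BASE && addr - DRAM_BASE < mem_mb * 0x100000u;
}
```

With 8MB, in_dram_as_posted() is false for every address, so whatever sits inside that branch never runs.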
EDIT:
IMPORTANT : I booted with the instruction cache enabled & data cache disabled, and I noticed that disk was functioning perfectly (although slowly) : so I think now we "only" have trouble with the data cache. However I don't remember if PCM worked perfectly or not :( I will test again.
Keep up everyone, we'll beat it soon ! :)
"I don't understand how your patch can work:"
Ha Ha... me neither now that you point that out, but it does and what can we learn from that? Well, it works as well for me as the version using the clean/invalidate coherency scheme. I included this code in 2 different branches that I built starting at clean svn and they both run about the same, most everything works except pcm & i2c. I don't think I'm mistakenly running a version without this code but I'll doublecheck. That is the SDRAM part that can never be true I wonder if that means something.... Hopefully this is another fortunate mistake.
Now, perhaps you have something else in your full diff, that would explain why test_disk passes : could you post the full diff ?
Oh and one important thing I forgot : I often see disk corruptions on my Fuze & Clip when working on this subject.
For now it's only FAT FS corruption, but who knows if it could lead to destruction of the OF (first blocks of internal storage) ? So .. be careful and do not try this at home!
For i2c perhaps the delay needs to go in i2c_busy() ? Also we should take care of not making it too slow..
For radio, I would increase fm_delay() , sorry I can't help because radio works fine on the Fuze
In my previous post I mentioned data cache : rockbox equally crashes (especially with PCM / mpegplayer) with the data cache disabled.
Also I frequently experience (on my Clip but not on my Fuze) problems with playback (the current song stops but the status stays "play" and the time goes to 0:0; if I stop & resume, playback continues, else rockbox will crash [the backlight / button thread still works because I can see the lcd & button light go on/off] ).
So I make the (not based on real data) hypothesis that current SVN code has bugs, and that these bugs just happen faster because of increased performance.
Thanks much FlynDice, and see you soon !
pcm (i2sout) code uses uncached addresses in IRAM (alias 0x81xxx), and I think that it's the pcm buffer.
however it still shows corruption (in the wps background). I didn't test md5sum since i disabled write support to not break my filesystem
since using clean_dcache/dump_dcache shows different results than using an uncached buffer, i suspected a problem in sd driver.
i tested if there was an infinite loop (error set in interrupt service routine INT_NAND() ) but it was not the case
- keep the ranges used in DMA transfers coherent (*_dcache_range() )
- invalidate the whole data cache (invalidate_dcache() )
- copying data transferred by DMA to/from a buffer in uncached memory.
Here are my comparisons of diffs in ata_sd_as3525.c, running on the Fuze:
Using clean_dcache_range/dump_dcache_range:
- minor wps corruption
- mpegplayer crashes at "loading"
Using clean_dcache_range/dump_dcache_range, aligned_buffer 32 bytes aligned:
- no wps corruption
- mpegplayer crashes at "loading"
Using invalidate_dcache:
- minor wps corruption
- mpegplayer crashes at "loading"
Using invalidate_dcache, aligned_buffer 32 bytes aligned:
- no wps corruption
- mpegplayer crashes at "loading"
Do not use any of these functions, aligned_buffer 32 bytes aligned:
- no partition detected
Use a reduced clock speed (/4, divisor bits = 1):
- no partition detected
Use a reduced clock speed (/2, divisor bits = 0):
- sloooooooooooooow boot (I forced it to power off after 30 seconds, was stuck on rockbox logo)
Transfer data to an uncached buffer, 32 bytes aligned:
- no wps corruption
- mpegplayer starts playing, then always crashes at the same position for a given video
- test_disk (with size set to 1MB) fails 1 time in 4 tests
- md5sum : crashed at 308/353 files : everything.md5sum was empty
- doom : fluid
Same setup as before, plus:
- iram not cached : same results
- dram not cached : crash very early
- iram & dram not cached : slooooooooow, mpegplayer works once, but then rockbox can't open files anymore
data cache disabled, ata_sd not modified : mpegplayer crashes at "loading"
***** data cache disabled :
* a bit faster than SVN (114MHz needed for realtime, against 126.17MHz for SVN)
* test_disk passes (with size 300MB)
* decoding hangs quickly on Fuze, seems to work on Clip
* Sometimes I see a white screen and rockbox stops here.
The patch attached uses this last configuration.
My experience with playing music has been spotty. I can get it to work a couple of ways, one better than the other.
Put current svn on your player and play some music and set any settings you would like to have(lcd, initialize database, etc).
Now put the patched version on your player.
Now don't quit the first time it doesn't work....
If I now try to play a file from the files menu, album art loads and I get about 2 secs of sound before it quits. Instead try to just resume playback from your last install.
Next, don't quit the first time it doesn't work....
If you get it to play for more than 2 seconds by resuming, go to the System Debug page and look at either the buffering thread or View HW info. I always find the frequency maxed out at 192(boosted). It won't play very much longer...
Now the tricky part, while standing on 1 foot place 1 hand in a warm bowl of water and.... er wrong recipe....
Reboot your player. Go into the games section and find a game that has playback control (i.e. xobox, goban etc.). I may just be superstitious but it seemed more successful if I actually played the game and quit very quickly (I mostly used xobox but goban worked too). Find playback control, go to pause/play and press; you should get music playing here, and if you quit and go look at the buffering thread or View HW page you will see you are indeed running at 192 boosted and 32 unboosted. Watch the buffering screen and after the first or second disk access you will now be locked at 192 until the music stops. I made it to just under 11 minutes 1 time but more likely you'll get 4-6 minutes worth.
I know it's stretching to call this progress but it helps get a feel for what might be going wrong when you can watch something functioning correctly and see what happens at the moment it breaks.
And let me leave you with the thought, don't quit when it doesn't work the first time!
+
This will have no effect since we are using fastbus, and fclk is not used (pclk is used as input for the CPU frequency)
Also *_dcache() functions are useless in SD transfers: since we use an uncached buffer, the "uncached_buffer" memory region has no data in the cache.
In fact this can cause trouble if the "aligned_buffer" memory region has data in cache, clean_dcache_range() can write wrong data in the buffer prior to a write transfer.
Also we can't rely on playback working if test_disk doesn't pass : the codec used could be corrupted and wrong code executed.
+
This will have no effect since we are using fastbus, and fclk is not used (pclk is used as input for the CPU frequency)"
Yes this _does_ have the desired effect of dividing FCLK by 2. The input to the DRAM and PCLK dividers is FCLK except for asynchronous bus. So yes, PCLK is used for fastbus but I have divided the source for PCLK by 2. This also means that all the things that take their clock input from pclk are running at half speed. I did tell the software I did this though so it showed the correct frequencies on the buffering thread page. It works as designed- I have watched the frequencies changing on the viewHW page while it was operating. I tried simply using the DRAM divider to get down to 32MHz but for some reason I could not get playback to work for more than 2 seconds no matter which way I tried.
Actually uncomment the use of an uncached buffer in SD transfers.
Map all memory regions to themselves, uncached, spotted by bertrik (big up to him!)
Set CCU_SCON to 1 (dma has the highest priority)
test_disk passes on Fuze & Clipv1.
Playback seems just fine on Fuze & Clipv1 (to be tested extensively however)
*No white screen across several reboots.
*Playback works just fine
*test_codec gives 650% real time for mp3 (38.16MHz needed)
*test_disk write test passes
*test_fps is faster for unboosted (but slower for boosted!? need to recheck)
*Pictureflow runs flawlessly with playback in background and decent fps (~30).
I also get the white screen after the bootloader on my e260v2, but if I wait until after the backlight fades out, the display comes on like normal. There are still some problems with intermittent display corruption, but it will clear up at the next screen update. Also, the FM Radio works with this patch.
Fuze: played for 30 minutes without issue and good performance in test_codec
Clip: plays but audio deadlocks (clip itself remains responsive however)
@FlynDice : cool ! If we have decent performances for LCD perhaps we can use a slower PCLK (battery_bench running on my Fuze to see if better performance == less boosting == more battery life)
I now use cache coherency functions in ata (faster than using an uncached buffer), but I have to keep aligned_buffer aligned on cache line, else I see wps corruption (I find that weird..)
Also add HAVE_PCM_DMA_ADDRESS for keyclick to work properly
FlynDice, I'm waiting for your patch update for e200v2 ;) Have a look at what kugel said : http://www.rockbox.org/irc/log-20090602#19:46:11
TODO: test on e200v2, find what's wrong with the Clip, test on m200v4 if possible (to see if the Clip's problem is also present on m200v4), enable MMU in the bootloader so we can map DRAM to 0 and IRAM next to it, and then remove -mlong-calls
EDIT: patch attached :)
EDIT2: also remove SHAREDBSS_ATTR since we don't have a coprocessor (not related to this task though ..)
EDIT3: my Clip seems to behave fine now .. to test in the long run
EDIT4: mpegplayer now deadlocks on "Loading" screen ..
So, the MMU should be activated in the bootloader just prior to executing the loaded rockbox (it doesn't need to be active _before_, while the bootloader is executing)
EDIT: Also, I don't think we need to map it to 0x0. We can also leave it there and move the IRAM behind it. I tried that with success.
We could map the IRAM behind it, and also map the DRAM to 0x0 to have the vectors available : that should work, thanks for that (I didn't think that we can map the same region at different virtual addresses)
No display problems.
Music playback has intermittent audible glitches.
FM Radio locks up before entering fm screen (same as clean svn).
Mpegplayer locks up on "Loading" splash.
FS#10270 to prevent the white screen:
+ nearly no boosting (0.9%) on playing mp3 files (320 kBit/s), boosts only on memory access
+ test_codec said 670.29 % realtime, 36.99 MHz needed (test file was 128 kBit/s mp3)
- mpegplayer and radio not working as Michael wrote already
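The two test_codec figures quoted in this thread look consistent with a simple relation: MHz needed = CPU clock / (realtime % / 100). The sketch below checks that arithmetic; the 248MHz clock is an assumption inferred from the reported numbers, and `mhz_needed` is a hypothetical helper, not the test_codec source.

```c
/* Hypothetical helper: clock (in MHz) needed for realtime decoding,
 * given the clock the test ran at and the measured realtime percentage. */
static double mhz_needed(double cpu_mhz, double realtime_percent)
{
    return cpu_mhz * 100.0 / realtime_percent;
}
```

Under that assumption, 670.29 % realtime gives roughly 37MHz, in line with the 36.99MHz reported above, and 650 % gives roughly 38.15MHz, close to the 38.16MHz reported earlier.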
This lets mpegplayer run fine, but now I see backlight changing while playing movies.. do you have the same effect on e200v2?
Did you get this delay from the OF or is it "hand-made"?
I think your discovery of "Set CCU_SCON to 1 (dma has the highest priority)" has made the use of uncached buffers for dma unnecessary. I noticed that I was getting an average of about 70 MHz in the buffering thread while playing mp3 for just about all frequency combinations I tried. I know I was seeing numbers more down in the 45-46 range when I was working on my version last weekend so I decided to just drop this line into the 192_32_32_dcache.patch and give it a try. Sure enough this one change alone makes this patch work also. I am not seeing any of the disk error messages getting tossed at me and test_disk passes the write & verify test. I have not been able to crash it yet. Mpegplayer runs fine also. The buffering thread shows an average of 45-46 MHz while playing mp3 vs 70 MHz. I believe the difference is that the ams-caching.diff patch uses memcpy to the uncached buffer while the 192_32_32_dcache.patch uses dma with cached memory and cache coherence functions. I could of course be mistaken and not understand this correctly. The patch still applies to current svn; would someone else try this and see what results they get? I'm not trying to lobby for one patch over another here, this is more along the lines of which method of disk access will give us better results: dma with cache coherence functions or memcpy to an uncached buffer. At least that's what I think the choices are. Feel free to correct my misunderstandings!
+ unsigned char* uncached_buffer = UNCACHED_ADDR(aligned_buffer);
The only change between the last 2 patches is writing the data to/from uncached memory instead of using cache coherency functions in SD DMA transfers.
The consequences (that I can't explain) : mpegplayer doesn't crash, and music playback is perfect.
Also, you should not take the CPU frequency indication in the buffering thread as a precise measure, only as an indicative one.
The algorithm will see if the CPU is boosted or not at each tick, so if the CPU is boosted in one tick, unboosted just after the tick has happened, boosted again just before the next tick happens, the algorithm will think the CPU was boosted during the whole tick.
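To make the sampling effect concrete, here is a minimal sketch that compares the real boost ratio with the ratio seen when the boost state is only read at tick boundaries. It is not the Rockbox buffering-thread code; the function names and the STEPS_PER_TICK resolution are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STEPS_PER_TICK 10 /* hypothetical sub-tick resolution of the trace */

/* Percentage of sub-steps during which the CPU was really boosted. */
int true_boost_percent(const int *boosted, size_t steps)
{
    size_t on = 0;
    assert(steps > 0);
    for (size_t i = 0; i < steps; i++)
        on += boosted[i] ? 1 : 0;
    return (int)(on * 100 / steps);
}

/* Percentage reported when the boost state is read once per tick and
 * that single reading is attributed to the whole tick. */
int sampled_boost_percent(const int *boosted, size_t steps)
{
    size_t ticks = 0, on = 0;
    assert(steps > 0);
    for (size_t i = 0; i < steps; i += STEPS_PER_TICK) {
        ticks++;
        on += boosted[i] ? 1 : 0;
    }
    return (int)(on * 100 / ticks);
}
```

With a trace that is boosted only right at and right before each tick boundary, the sampled figure reads as fully boosted even though the CPU was boosted for a small part of the time.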
test_codec is more precise : it calculates how much time we need to decode a given time of audio.
- map the IRAM just behind DRAM to remove the -mlong-calls option from gcc and reduce binsize (approx. 10% smaller binaries)
- keep the DRAM virtual address the same as the physical one, so MMU setup can happen in the loaded rockbox, and the bootloader can continue to function without MMU.
- map the 1st MB of DRAM at 0x0 to avoid copying the vectors
- tested bootloaders on Fuze & Clip
TODO:
- Fix Clip down button that sometimes stops responding.
- Fix e200v2 radio (radio is fine on my Clip & Fuze)
- Understand why we can't use cache coherency functions in SD transfers, else mpegplayer locks up (and clean ata-sd_as3525.c)
Did I miss anything ?
658% real time for 128kbit MP3. Nice :)
EDIT: Booting seems a tad bit slower, or is that me?
596.71 % realtime for 320 kBit/s mp3 (found one on my player which is short enough not to be too large for test_codec)
629.31 % for 128 kBit/s mp3
@kugel: hmm... I can't notice longer boot time, seems to be as before.
Also bootloader update is optional, will work fine with previous bootloaders (mapping is made in crt0.s - calling memory_init - and DRAM is mapped at its physical address, with IRAM just behind it)
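A rough illustration of this kind of mapping, assuming the classic ARM first-level "section" descriptor layout (1 MB sections, 4096-entry table). This is not the code from the patch: the function names, the access-permission choice (full read/write) and any addresses passed in are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20u          /* one section covers 1 MB */
#define DESC_SECTION  0x2u         /* bits [1:0] = 10: section descriptor */
#define DESC_BIT4     (1u << 4)    /* must be set on these cores */
#define DESC_B        (1u << 2)    /* bufferable */
#define DESC_C        (1u << 3)    /* cacheable */
#define DESC_AP_RW    (3u << 10)   /* assumption: read/write for everyone */

/* Build the descriptor for one 1 MB section at physical address phys. */
uint32_t section_descriptor(uint32_t phys, uint32_t flags)
{
    assert((phys & ((1u << SECTION_SHIFT) - 1u)) == 0); /* 1 MB aligned */
    return phys | DESC_AP_RW | DESC_BIT4 | flags | DESC_SECTION;
}

/* Map `mb` megabytes starting at virt onto phys in translation table ttb. */
void map_sections(uint32_t *ttb, uint32_t virt, uint32_t phys,
                  uint32_t mb, uint32_t flags)
{
    for (uint32_t i = 0; i < mb; i++)
        ttb[(virt >> SECTION_SHIFT) + i] =
            section_descriptor(phys + (i << SECTION_SHIFT), flags);
}
```

Calling map_sections() once with virt equal to phys gives the identity mapping of DRAM, and a second call with virt 0x0 and a size of 1 MB aliases the first MB of DRAM at address 0, which is what makes copying the vectors unnecessary.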
Also I found that in the solitaire game, if I scroll the wheel quite fast, the cursor does not move. If I do the same fast scrolling in the rocks->games menu, the cursor does move.
I had never seen that behaviour in solitaire: is it intended, so we can ignore it, or is there a problem with the wheel driver?
@flyndice : microsd works fine here on fuze. what is your problem exactly? have you tried several different cards?
The solitaire problem can be fixed, I already located it.
My bootloader experience was the same as kugel's.
As far as microsd goes, I only have 1 microsd and no access to others (besides newegg .....). It's an 8GB Transcend with a circled 6 on it, but I don't know if that means class 6 or something else. With PCLK > 32 MHz (i.e. 62 or 64) I cannot boot rockbox with the microsd installed. If I wait until rockbox boots and then install it, I am fine until I select the files menu; then the player locks. If I start playing music and then install the card, it works ok _sometimes_. It seems to be ok as long as it gets some time with PCLK at 31 or 32. In some of the clocking schemes I have tried, PCLK shifts between, say, 31 and 62 during boosting and the card works just fine. But when PCLK is a constant 62 or 64 I have problems. I am thinking that the microsd "tuning" was done when we were running at 31 MHz most of the time and that tweaking the delays may help for use with the higher PCLK.
I have been testing clocking schemes and think I have found the source of my confusion yesterday, although I'm not sure what to believe now as far as tests go. I think it's not the sd access but the clocking scheme that makes the difference. Of course I welcome all arguments against this, so hit me with your best shot and enlighten me ;P!
As far as tests go, I'm now not sure what to believe. I ran test_codec on a file using a 32MHz/64MHz fastbus-only clocking scheme. Music played fine as expected, but the results from test_codec were a bit puzzling. It told me that I needed 97.27 MHz for realtime and that decoding happened at 197.38% realtime. So I tried playing music and watching the buffering thread: 30% boost rate at 41MHz, which, for lack of a more scientific measure, just felt about right watching the alternating boosted/unboosted states. I then went back and figured out that 97.27 * 1.9738 = 191.99. Close enough to my 192MHz FCLK freq to not be a coincidence, I think. I'm trying to read the test_codec code right now to understand how it comes up with its numbers, and the buffering thread code is next. Unless someone who has already done this can explain it to me sooner.
test_codec works this way:
1/ fully load a file into the audiobuffer
2/ get the length (in seconds) of this file via its metadata
3/ decode this file
4/ Calculate how much time was needed to decode it (in ticks)
5/ speed = (audio duration / needed time) * 100% realtime
And the %dMHz needed for realtime is calculated this way: (CPUFREQ_MAX / speed)MHz needed for realtime
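The two formulas above can be sketched as follows. This is only a model of the arithmetic, not the test_codec source: the function names are hypothetical and floating point is used for readability.

```c
#include <assert.h>

/* Decoding speed as a percentage of realtime:
 * (audio duration / time needed to decode) * 100. */
double realtime_percent(double audio_seconds, double decode_seconds)
{
    assert(decode_seconds > 0.0);
    return audio_seconds / decode_seconds * 100.0;
}

/* Clock needed for realtime decoding: CPUFREQ_MAX divided by the speed,
 * with the speed expressed as a fraction (percent / 100). */
double mhz_needed(double cpufreq_max_mhz, double percent)
{
    assert(percent > 0.0);
    return cpufreq_max_mhz * 100.0 / percent;
}
```

For example, 60 s of audio decoded in 10 s is 600 % realtime, which at a 192 MHz maximum means 32 MHz needed. Plugging in 192 MHz and 197.38 % gives about 97.27 MHz, the pair of figures reported earlier in the thread. Since the MHz figure is derived from the CPUFREQ_MAX constant the binary was built with, a stale build would report a misleading value, which fits the observation that a clean rebuild gave believable numbers.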
Thanks for the test_codec explanation, of course I had just finished plowing through it all when the email came..... I tried recompiling with same results. Then make clean & recompile and the world makes sense again, well at least this does. 33.67 MHz needed for realtime, now that I can believe.
The DMA code will use this buffer address as the physical address, and then the AS3525 deadlocks.
Instead of adding code to handle this, I removed the define PROC_NEEDS_CACHEALIGN. I believe only dual-core targets need to keep shared data out of cache lines, and we only have one core.
In ata_sd-as3525.c I now use {clean,dump}_dcache_range since we can avoid the memcpy() on aligned buffers; this makes big data transfers *much* faster, and unaligned or small transfers slightly faster.
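A minimal sketch of the aligned/unaligned split described here, not the driver code itself. The names choose_dma_path, DMA_DIRECT and DMA_BOUNCE are hypothetical, and testing against a 32-byte cache line is an assumption; the patch may use a different alignment criterion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* assumed data cache line size, in bytes */

enum dma_path {
    DMA_DIRECT, /* DMA on the caller's buffer, with cache clean/invalidate */
    DMA_BOUNCE  /* DMA on a driver-owned aligned buffer, then memcpy() */
};

/* Pick the transfer path for one request. A buffer that starts or ends
 * in the middle of a cache line shares that line with unrelated data,
 * so invalidating it could discard that data; such requests go through
 * the bounce buffer instead. */
enum dma_path choose_dma_path(uintptr_t addr, size_t len)
{
    assert(len > 0);
    if (addr % CACHE_LINE_SIZE == 0 && len % CACHE_LINE_SIZE == 0)
        return DMA_DIRECT;
    return DMA_BOUNCE;
}
```

Under these assumptions a 512-byte sector read into a line-aligned buffer takes the direct path and skips the copy, which is where the speedup for big transfers would come from.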
Now we should see what prevents us from committing this patch:
1/ radio not working on e200v2 (FS#10267)
2/ clip down button going mad (can be fixed after commit IMO)
3/ testing to see if we didn't forget some bug
For 1/ It seems that radio can't be made to work with this patch (reading FS#10267 comments).
For 2/ I hope clip owners agree (kugel & bertrik)
For 3/ Let's go :P
1: radio does not work: lockup on a blank screen (background, battery, volume and clock still showing). Button presses turn on the backlight but the scroll wheel does not (not sure whether this is 'normal', since there is nothing to scroll through)
2: wheel light does not turn off (it is configured to never be on) for about 18 seconds after booting into rockbox (i do have the database initialized so it could be scanning for changes?); when pressing select button the light flashes however the flash does not occur in the music player screen, seems to only be in menus (pressing other buttons does not cause the flash). When changing songs the light flashes for 1-2 sec, and intermittently during playback; I think it happens when reading from the disk?
3: when booting rockbox there are blue lines visible, maybe 1 pix vertical by a row approx 10 px blue, then ~4-5 px 'normal' and then another ~2-3 blue, in a row: the lines TEND to be in the lower 1/3 of the screen but when changing songs they will quickly appear higher up on the screen *note: the lines or 'bars' flash seemingly randomly across the screen staying in one place for ~1/3 of a sec (then 'jumping'), maybe 1-3 of them at a time (seems to be when 'doing stuff' more cpu intensive, ex changing songs or initially buffering a recently changed song)... they
test_disk WRITE&VERIFY
CPU clock: 62 Mhz: Passed
CPU clock: 248 Mhz: Passed *Noticed blue lines 'evenly' on entire screen (not bottom 1/3) while doing test... it appears when accessing disk boosted they are all over, and when processing/buffering they are just on bottom 1/3
MP3 (I am using 320k mp3's) seems to work great, have tried parametric eq which also works
These other patches can be refined once this one (data cache / mmu) has been committed.
FS#10048
While playing flac, I am seeing it frequently skip to the next track during playback.
Watching the "buffering" debug screen, the skip happens when the usefl buffer empties and is refilled. Sometimes, the usefl buffer fills and immediately empties and refills a second time. At this point, it skips to the next track.
Also I wonder if the sudden empty and refill means that there was an error reading the file (corrupted file/fs)
Does it always happen on the same track/location or randomly ?
Tested fs - no errors found
Does not fail at same track/location - replaying the track may result in track playing completely.
The failure happens at random. It may play 2 tracks properly or it may fail on 2 tracks in a row.
Did you have the same skips with the penultimate patch?
I can't think of a possible reason for this :/
FLAC: Frame 24, error -41
FLAC: Frame 852, error -209
Looking at the flac codec and thinking a bit, this can only mean one thing: corrupted flac.
If the flac is correct, and file system correct as well => corruption very likely happens during transfer from storage
I'll try the penultimate patch which used an uncached buffer for storage transfers, and also run tons of test_disk with both patches.
And it seems there is a bug with disk access, because I can't access anything; if I look at the File menu there are only weird file/directory names...
no loading of plugins and no playing of music because the codecs can't be loaded.
Using newer revisions : rockbox is stuck on the logo screen.
Something else : I didn't look at the exact meaning of the MCI_DATA_TIMER register, I'll read the pl180 datasheet again to calculate a reasonable delay for timeout, perhaps we will detect earlier invalid transfers and retry.
One more thing if you test this patch: regularly run scandisk / dosfsck to verify the internal storage and microsd, because rockbox could have corrupted the filesystem.
* MCI_DATA_TIMER is set to the max, since I don't know how to calculate efficient timeout delays for transfers (in MCICLK cycles)
- the fixed timeout delay for commands is 64 cycles
* FS#10296 applied to help understand potential data aborts
I suppose r21195 changed the code and modified the alignment of some critical parts. Using my (booting) Clip and seeing the data aborts, I noticed memory corruption, so very likely something is wrong with the caches.
r21215 + this patch boots fine on Clip & Fuze, test_disk passes, flac plays fine without any problems (so far).
Since the blocking issues (e200v2 radio / clip button) are being worked on, I would like this patch to be committed ASAP so we can all use the same working base.
If you can still see data aborts / or data corruption (or related, like the flac problem mc2739 mentioned) please tell.
same problem, but it slightly improved, it seems file access works for the first few seconds, and as I access the file menu and switch to a subfolder, the same cryptic file/directory names appear.
But I might be wrong.
reformatted the player now everything works, thank you!
with r21226 my fuze refuses to boot (stuck at logo screen)
with r21226 and diff to system-arm.c reverted it boots
with r21226 and r21225 reverted it boots
Since the modified code in system-arm.c is never called (it's the data abort handler), I suppose that the problem comes from changed alignment.
The exact problem remaining unknown, the patch can not be committed since it would break at the next random commit.
Suggestions welcome, and please also tell if you can see the same behaviour on another model (my Clip boots and runs fine currently)