FS#6203 - unified way to open utf8 text files

Attached to Project: Rockbox
Opened by Dominik Riebeling (bluebrother) - Tuesday, 17 October 2006, 20:57 GMT
Last edited by Dominik Riebeling (bluebrother) - Saturday, 02 August 2008, 20:39 GMT
Task Type Patches
Category Operating System/Drivers
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


Since Rockbox gained UTF-8 support, text files containing a BOM (written by some editors) have been an issue. Rockbox needs to skip that BOM, which is currently done independently in various places (and is still missing in a couple of them). This patch addresses this by introducing a new open_utf8() function that behaves exactly like open() but positions the file pointer at the first text byte after opening. That way, code opening a text file only needs to call that function and doesn't need to bother with the BOM anymore.
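To illustrate the idea, here is a minimal sketch of such a function written against plain POSIX calls. It is not the attached patch: the BOM macro names are assumptions made for this example, and only the read case is handled.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* UTF-8 byte order mark; macro names are assumptions for this sketch */
#define BOM      "\xef\xbb\xbf"
#define BOM_SIZE 3

/* Open a text file for reading and leave the file position on the
 * first byte after an optional UTF-8 BOM. Returns the descriptor,
 * or a negative value on error, just like open(). */
int open_utf8(const char *pathname, int flags)
{
    unsigned char buf[BOM_SIZE];
    int fd = open(pathname, flags);

    if (fd < 0)
        return fd;

    /* If the first three bytes are not a BOM (or the file is shorter
     * than that), rewind so the caller starts at offset 0. */
    if (read(fd, buf, BOM_SIZE) != BOM_SIZE ||
        memcmp(buf, BOM, BOM_SIZE) != 0)
        lseek(fd, 0, SEEK_SET);

    return fd;
}
```

A caller would then replace open(path, O_RDONLY) with open_utf8(path, O_RDONLY) and read lines as before.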

The attached patch changes this in a couple of places (see also committed FS#5770 and pending FS#6071; I guess I still missed some places where it should be changed). Using lseek() may still break, but I'm not sure whether it is actually used on text files anywhere. If it is, a similar function for seeking could solve this easily.

Closed by  Dominik Riebeling (bluebrother)
Saturday, 02 August 2008, 20:39 GMT
Reason for closing:  Accepted
Additional comments about closing:  committed adapted version as r18185.
Comment by Dominik Riebeling (bluebrother) - Sunday, 08 April 2007, 18:37 GMT
Updated to current svn. Added a BOM check to cuesheet.c too (untested, as I don't have any cuesheets around).
Comment by Jonathan Gordon (jdgordon) - Thursday, 17 April 2008, 13:35 GMT
Any reason this hasn't been committed?
Comment by Dominik Riebeling (bluebrother) - Thursday, 17 April 2008, 19:59 GMT
I guess it simply wasn't committed because nobody picked it up -- and I somewhat forgot it. As the patch is quite out of date I resynced it (and replaced the BOM check with a somewhat nicer variant I found in playlist.c). It's barely tested right now (wps and playlists seem to work fine, haven't tried other affected functionality yet).

If there are no objections I'd be fine picking it up again and committing it after the missing testing has been done.
Comment by Bertrik Sikken (bertrik) - Friday, 01 August 2008, 17:40 GMT
The patch still applies, but the result does not compile anymore. learman added some code to add the BOM to m3u8 files in r17786 that breaks it (requires the BOM #define that was moved to apps/misc.c)
Maybe the open_utf8 function could also _write_ a BOM when it's called with specific flags set (e.g. O_CREAT or O_WRONLY)?
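The write-side suggestion above could be sketched as follows. This is only an illustration under assumptions: the helper name open_utf8_write() and the rule "emit a BOM only if the file is empty after opening" are not taken from the patch or from r17786.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical write-side helper: open a text file with the given
 * flags and, if it was opened writable and is still empty, start it
 * with a UTF-8 BOM. Existing content is never touched. */
int open_utf8_write(const char *pathname, int flags)
{
    struct stat st;
    int fd = open(pathname, flags, 0666);

    if (fd < 0)
        return fd;

    if ((flags & O_ACCMODE) != O_RDONLY &&
        fstat(fd, &st) == 0 && st.st_size == 0) {
        /* New or truncated file: write the three BOM bytes first. */
        if (write(fd, "\xef\xbb\xbf", 3) != 3) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

Checking the file size instead of only the flags keeps a BOM from being written over the start of an existing file that was opened without O_TRUNC.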
Comment by Dominik Riebeling (bluebrother) - Friday, 01 August 2008, 22:18 GMT
I still have that code around and properly synced. I planned on committing it sometime this weekend (provided I find the time, of course ;-). Making open_utf8 also write the BOM is a nice idea; I'll check it.