Rockbox

FS#9387 - display error for Asian text

Attached to Project: Rockbox
Opened by alex yuan (zephyr) - Tuesday, 09 September 2008, 07:36 GMT
Last edited by Teruaki Kawashima (teru) - Monday, 13 July 2009, 13:03 GMT
Task Type Bugs
Category Plugins
Status Closed
Assigned To No-one
Operating System Sansa e200
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Version 3.1
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

The viewer decodes text files incorrectly when they mix one-byte and two-byte characters. For example, when the encoding is set to GB2312, it always consumes two bytes to form one character. This is not always true: if the current character is an ASCII character, only one byte should be read.

viewer.c -> get_ucs() (current)

if ((prefs.encoding == SJIS && *str > 0xA0 && *str < 0xE0) || prefs.encoding < SJIS)
    return (unsigned char*)str+1;
else
    return (unsigned char*)str+2;

I just made a private build for my Sansa with the following change, which makes decoding correct for GB2312. It works well:

if ((prefs.encoding == SJIS && *str > 0xA0 && *str < 0xE0) ||
    (prefs.encoding < SJIS) ||
    (prefs.encoding == GB2312 && *str <= 0x7F))
    return (unsigned char*)str+1;
else
    return (unsigned char*)str+2;

Could someone work out a complete solution that also covers the other encoding schemes?
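
To make the difference concrete, here is a minimal standalone sketch that walks a buffer with the current rule and with the proposed rule. It is an illustration, not the actual viewer.c code: the enum values and the helper names step_current(), step_proposed() and count_chars() are hypothetical, and the only thing assumed about the real encoding IDs is that the single-byte encodings sort before SJIS, as the original test implies.

```c
#include <stddef.h>

/* Hypothetical encoding IDs for illustration only. The one property
 * carried over from the report is that single-byte encodings compare
 * less than SJIS. */
enum encoding { ISO_8859_1, SJIS, GB2312, KSX1001, BIG5 };

/* Number of bytes skipped by the current rule. */
static size_t step_current(enum encoding enc, const unsigned char *str)
{
    if ((enc == SJIS && *str > 0xA0 && *str < 0xE0) || enc < SJIS)
        return 1;
    return 2;
}

/* Number of bytes skipped with the proposed GB2312 check added. */
static size_t step_proposed(enum encoding enc, const unsigned char *str)
{
    if ((enc == SJIS && *str > 0xA0 && *str < 0xE0) ||
        (enc < SJIS) ||
        (enc == GB2312 && *str <= 0x7F))
        return 1;
    return 2;
}

/* Count characters by walking the buffer with the given step rule. */
static size_t count_chars(enum encoding enc,
                          const unsigned char *buf, size_t len,
                          size_t (*step)(enum encoding, const unsigned char *))
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        pos += step(enc, buf + pos);
        n++;
    }
    return n;
}
```

For the four bytes 0x41 0xD6 0xD0 0x42 (an ASCII "A", one two-byte character, then an ASCII "B"), the current rule pairs 0x41 with 0xD6 and reports two characters, while the proposed rule reports three.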

Closed by  Teruaki Kawashima (teru)
Monday, 13 July 2009, 13:03 GMT
Reason for closing:  Accepted
Additional comments about closing:  Committed in r21743. thanks!
Comment by Jonas Häggqvist (rasher) - Thursday, 11 December 2008, 18:23 GMT
GB2312 codepoints are always 2 bytes according to http://en.wikipedia.org/wiki/GB2312

What you describe seems to be EUC-CN as far as I can tell.
Comment by alex yuan (zephyr) - Monday, 22 December 2008, 01:47 GMT
I think so. Two bytes are used to represent every character NOT found in ASCII.
Comment by Yoshihisa Uchida (Uchida) - Thursday, 22 January 2009, 10:50 GMT
Yuan, I confirmed that, as you pointed out, get_ucs() returns the wrong position for the next character when the text contains ASCII.

Your fix does not work for KS X 1001 and Big-5, so I created a patch file.
Please confirm it.
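
For readers without the attachment, the general idea can be sketched as follows. This is not the attached patch and may differ from it. It relies on one fact: in the usual byte encodings of Shift JIS, GB2312 (EUC-CN), KS X 1001 (EUC-KR) and Big-5, a byte in the range 0x00 to 0x7F in lead position is always a one-byte character. The enum and the function name char_len() are hypothetical, and the function does not validate lead or trail bytes.

```c
#include <stddef.h>

/* Hypothetical IDs for the multi-byte encodings discussed in this task. */
enum mb_encoding { ENC_SJIS, ENC_GB2312, ENC_KSX1001, ENC_BIG5 };

/* Return how many bytes the character starting with `lead` occupies.
 * Invalid lead bytes are not detected; they are treated as two-byte. */
static size_t char_len(enum mb_encoding enc, unsigned char lead)
{
    /* The ASCII range is one byte in every encoding listed above. */
    if (lead <= 0x7F)
        return 1;

    /* Shift JIS half-width katakana (0xA1 to 0xDF) are also one byte. */
    if (enc == ENC_SJIS && lead > 0xA0 && lead < 0xE0)
        return 1;

    /* Everything else is taken as the lead byte of a two-byte character. */
    return 2;
}
```

A caller would advance its read pointer by char_len(enc, *str) instead of hard-coding 1 or 2 per encoding.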
Comment by Yoshihisa Uchida (Uchida) - Friday, 23 January 2009, 11:19 GMT
I corrected my patch file.
Please confirm it.
Comment by Yoshihisa Uchida (Uchida) - Wednesday, 28 January 2009, 11:08 GMT
I updated my patch file.
Comment by Yoshihisa Uchida (Uchida) - Saturday, 21 February 2009, 04:02 GMT
Please use this patch when you apply the patch files (FS#8445, FS#9546, FS#9853, FS#9855, FS#9892, FS#9893, FS#9898, FS#9902) for the Text viewer plugin.

Please apply the patches in this order: FS#9855, FS#9892, FS#9893, FS#9898, FS#9902, FS#9853, FS#9546, FS#8445, and then this task's patch.

If you do not apply those patch files, you do not need to apply this patch.
Comment by Yoshihisa Uchida (Uchida) - Wednesday, 17 June 2009, 09:18 GMT
Synced to r21316.
Comment by Gman (Thecoolgman) - Wednesday, 17 June 2009, 10:20 GMT
Where's the patch?
Comment by Yoshihisa Uchida (Uchida) - Wednesday, 17 June 2009, 13:15 GMT
Sorry, I forgot to upload my patch.
