FS#9273 - Changing sapi_voice.pl to make japanese.voice sound better with a specific SAPI voice

Attached to Project: Rockbox
Opened by Taktak (Whick) - Wednesday, 13 August 2008, 05:44 GMT
Task Type Patches
Category Language
Status Assigned
Assigned To Jonas Häggqvist (rasher)
Operating System All players
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 0
Private No

Details

I would like to change voice strings to make them sound better with a specific SAPI voice.

I don't intend to change the strings in *.lang directly.
It should be done dynamically in voice.pl while building the voice file, shouldn't it?

Comment by Taktak (Whick) - Wednesday, 13 August 2008, 06:05 GMT
Should we write the Japanese voice strings in japanese.lang in hiragana (or katakana)?
Some background knowledge is needed to answer this question.
Hiragana and katakana are phonetic in almost all cases.
But they are not phonetic for prolonged sounds (like a romanized vowel with a macron), and the rule for those is not perfectly regular.
I won't explain the rule here because it is very difficult for anyone but a native Japanese speaker.
Besides, the accent of many words may be broken by using hiragana (or katakana).

Which voice strings are misread by SAPI?
For now, I am talking only about the SAPI voices LH Kenji and LH Naoko, bundled with Office XP or 2003 localized for Japan.
(As an aside, Kenji is a common male name like Jack, and Naoko is a common female name like Jane.)

They misread the following voice strings.

A: the voice string in japanese.lang
B: its normal spelling in hiragana
C: how it should be pronounced (written in hiragana)
D: how they pronounce it
E: what should be given to SAPI so that LH Kenji (or Naoko) pronounces the word correctly
(i.e. if voice.pl replaces A with E before giving the voice string to sapi_voice.vbs,
sapi_voice.vbs sounds better.)

A : B : C : D : E
"大文字" : "おおもじ" : "おーもじ" : "だいもんじ" : "おー文字"
"背景色" : "はいけいしょく" : "はいけーしょく" : "はいけーいろ" : "背景蜀"
"文字色" : "もじしょく" : "もじしょく" : "もじしょく" : "文字蜀"
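
As a rough illustration of the A-to-E substitution described above, here is a minimal Python sketch. It is not the actual voice.pl code (which is Perl); the names `CORRECTIONS` and `correct_for_tts` are hypothetical, and the three entries are simply the rows listed above.

```python
import re

# Hypothetical correction table: the string as written in japanese.lang (A)
# mapped to the spelling handed to the TTS engine instead (E).
CORRECTIONS = {
    "大文字": "おー文字",
    "背景色": "背景蜀",
    "文字色": "文字蜀",
}


def correct_for_tts(text, corrections=CORRECTIONS):
    """Return text with every known misread string replaced by its workaround."""
    if not corrections:
        return text
    # Build one alternation, longest key first, so the text is rewritten in a
    # single pass and already-replaced output is never matched again.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: corrections[match.group(0)], text)
```

In a real build, a table like this would presumably be selected per engine, so that the workarounds are applied only for the voices that need them.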
Comment by Jonas Häggqvist (rasher) - Wednesday, 13 August 2008, 18:50 GMT
If I understood right, the attached patch should fix those three examples. Hopefully you get the idea.
Comment by Taktak (Whick) - Thursday, 14 August 2008, 17:48 GMT
I appreciate your continued and considerate help, and have read this patch. I'd like to ask whether it is all right to rely on the "unknown" vendor in sapi_voice?
To stretch the point a bit, distinguishing voices by an "unknown" vendor seems somewhat careless to me.
How about distinguishing by something other than the vendor attribute?
For example, could voice.pl distinguish by the presence of the option "/voice:LH Kenji" (or "/voice:LH Naoko")?

I'm not familiar with Perl or with the policies of the Rockbox source, so my opinion may be ill-considered.
Comment by Jonas Häggqvist (rasher) - Thursday, 14 August 2008, 19:05 GMT
I understand your concern, but according to Jens Arnold, this is the first SAPI voice he's seen that didn't specify a vendor, so it is probably not a problem. I don't know how the voice selection is done for SAPI - Jens Arnold is probably the best person to comment on this question.

For now, I think it's okay to use the "(unknown)" vendor.
Comment by Jens Arnold (amiconn) - Thursday, 14 August 2008, 20:31 GMT
It should be possible to work out the correct vendor from the voice name if the engine doesn't specify a 'Vendor' attribute. However, this requires that the engine properly specifies the 'Name' attribute. This needs to be checked on a machine with those Japanese L&H engines installed. I can't do this myself, as I don't have a Japanese version of Office 2003, and Microsoft only offers the SAPI4 version of those engines for separate download.

Please run this script: http://www.rockbox.org/twiki/pub/Main/VoiceBuilding/ListVoices.vbs in a console window (cscript ListVoices.vbs) and post the output somewhere (the lines mentioning the problematic L&H voices are sufficient).
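
The fallback described above, deriving the vendor from the voice name when the engine reports no 'Vendor' attribute, might look roughly like the following Python sketch. It is not the sapi_voice.vbs code; the function name `resolve_vendor` is hypothetical, and the rule that the affected voices have names starting with "LH " (as in "LH Kenji" and "LH Naoko") is an assumption that would need to be confirmed against the ListVoices.vbs output.

```python
def resolve_vendor(name, vendor=None):
    """Return the vendor reported by the engine, or guess it from the voice name."""
    if vendor:
        # The engine specified a Vendor attribute; trust it.
        return vendor
    # Assumption: L&H voices without a Vendor attribute are named "LH <name>".
    if name.startswith("LH "):
        return "L&H"
    # Nothing to go on, so keep the placeholder.
    return "(unknown)"
```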
Comment by Taktak (Whick) - Friday, 15 August 2008, 01:51 GMT
I appreciate your attentiveness to my requirements and quick responses.
I ran this script and redirected its output to the attached file (i.e. c:\>cscript ListVoices.vbs>result_from_ListVoices_vbs.txt).

Comment by Taktak (Whick) - Friday, 15 August 2008, 02:24 GMT
I altered ListVoices.vbs to display the Vendor attribute when a SAPI voice has one, so that users of other SAPI engines can check their own voices, and attached its output.
Comment by Taktak (Whick) - Friday, 15 August 2008, 07:34 GMT
Sorry for my duplicate posts.
Comment by Jens Arnold (amiconn) - Friday, 15 August 2008, 17:24 GMT
I've just committed a fix for sapi_voice.vbs, so that it now reports the vendor for those Japanese L&H engines as "L&H" (same as the other L&H engines actually report).
Comment by Jonas Häggqvist (rasher) - Saturday, 16 August 2008, 00:10 GMT
After this, the vendor in my patch should obviously be changed from "(unknown)" to "L&H".

Does this work as intended?
Comment by Kenjiro Arai (MoonWolf) - Sunday, 17 August 2008, 09:48 GMT
I have uploaded the output of ListVoices.vbs from my environment.
The vendor of "ドキュメントトーカ" is "Create System Development Co and Ltd."
Comment by Taktak (Whick) - Monday, 18 August 2008, 03:23 GMT
> Does this work as intended?
Jonas, thank you very much! I think it should work well, but I can't test it at the moment. I'll confirm that the current sapi_voice.vbs works within two days.

Thank you very much, MoonWolf. OK, so the Vendor attribute of the product "ドキュメントトーカ" is "Create System Development Co, Ltd.", isn't it?
Then, could I ask whether your voices (of ドキュメントトーカ) pronounce the voice strings of the current japanese.lang correctly? (That is, does that vendor's SAPI engine misread any words?)
BTW, I've converted your result.txt to UTF-8 and uploaded it again, just in case. (It was encoded in Shift-JIS because my explanation to you was insufficient.)

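For MoonWolf (originally written in Japanese):
Thank you for your cooperation. So the Vendor attribute of ドキュメントトーカ is "Create System Development Co, Ltd.", is that right?
Could you then tell me whether the ドキュメントトーカ voices pronounce the voice strings of the (latest) japanese.lang correctly? (In other words, does your SAPI engine misread anything?)
Also, just in case, I converted the result.txt that MoonWolf uploaded to UTF-8 and uploaded it again. (Because of my insufficient explanation, it contained Shift-JIS kanji.)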
Comment by Kenjiro Arai (MoonWolf) - Monday, 18 August 2008, 17:23 GMT
Sorry for the duplicate post.
Comment by Taktak (Whick) - Thursday, 21 August 2008, 01:05 GMT
I checked three Japanese SAPI engines with this altered ListVoices.vbs, and updated sapi_voice.vbs, voice.pl and japanese.lang.

1. Added "use utf8;" to voice.pl so that it can replace some multibyte voice strings.
2. Updated japanese.lang in line with english.lang ("r18308: Ensure every phrase has a "user:" line - currently they are all empty.").
3. Added code to sapi_voice.vbs that distinguishes the audio format, after identifying each engine's audio format with this altered ListVoices.vbs (the output of ListVoices.vbs is result.txt).
Comment by Taktak (Whick) - Thursday, 21 August 2008, 08:18 GMT
Sorry, that result.txt was Shift-JIS encoded. This one is UTF-8 encoded.
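
For anyone who needs to do the same re-encoding, a minimal Python sketch follows. The function names and file paths are hypothetical, and it assumes the file was written in Microsoft's Shift-JIS variant (Python's "cp932" codec), which is an assumption about what a Japanese Windows console produces when output is redirected.

```python
def sjis_to_utf8(data):
    """Re-encode Shift-JIS (cp932) bytes as UTF-8 bytes."""
    # Decode with the assumed source encoding, then encode as UTF-8.
    return data.decode("cp932").encode("utf-8")


def convert_file(src_path, dst_path):
    """Read a Shift-JIS text file and write a UTF-8 copy of it."""
    with open(src_path, "rb") as src:
        converted = sjis_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

For example, `convert_file("result.txt", "result_utf8.txt")` would write a UTF-8 copy next to the original.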
Comment by Taktak (Whick) - Friday, 22 August 2008, 07:33 GMT
Fixed a few mistakes and resynced.
Comment by harry tu (bookshare) - Tuesday, 23 September 2008, 02:32 GMT
Is there anything stopping this from being committed?
