Machine yomigana/furigana annotator?

Rasqual Twilight
2007-02-17, 14:09

I'm looking for a solution that can annotate Japanese text with furigana (http://en.wikipedia.org/wiki/Furigana), preferably using a template system so that it can be adapted to output XHTML, plain text or engine-specific text.
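
To make the template idea concrete, here is a minimal sketch. It assumes the hard part (splitting the text and finding readings) is already done, so the input is a list of (surface, reading-or-None) pairs. The format names and the TEMPLATES table are hypothetical. The XHTML template uses the XHTML 1.1 ruby elements (rb, rp, rt), so browsers without ruby support fall back to showing the reading in parentheses.

```python
import html
from string import Template

# Hypothetical output templates, one per target format.
# Only tokens that carry a reading are annotated; others are copied through.
TEMPLATES = {
    # XHTML 1.1 ruby markup; <rp> supplies parentheses for non-ruby browsers
    "xhtml": Template("<ruby><rb>$base</rb><rp>(</rp><rt>$reading</rt><rp>)</rp></ruby>"),
    # plain text with the reading in full-width parentheses
    "text": Template("$base（$reading）"),
}


def render(tokens, fmt):
    """Render (surface, reading-or-None) pairs using the template for fmt."""
    tmpl = TEMPLATES[fmt]
    # escape markup characters only when producing XHTML
    escape = html.escape if fmt == "xhtml" else (lambda s: s)
    out = []
    for surface, reading in tokens:
        if reading:
            out.append(tmpl.substitute(base=escape(surface), reading=escape(reading)))
        else:
            out.append(escape(surface))
    return "".join(out)


tokens = [("漢字", "かんじ"), ("を", None), ("読", "よ"), ("む", None)]
print(render(tokens, "text"))   # 漢字（かんじ）を読（よ）む
print(render(tokens, "xhtml"))
```

Adding another output format, such as an engine-specific markup, would only require a new entry in TEMPLATES.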

My best lead so far would be to look at the rikaichan Firefox extension (http://www.polarcloud.com/rikaichan/) (GPLv2, based on Jim Breen's EDICT) and port it to another language for use on the command line (e.g. Java). Does anyone know of any other options?
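
For reference, the core of a dictionary-based annotator can be sketched as greedy longest-match lookup against EDICT. This is not rikaichan's code, only a sketch of the general technique. It assumes EDICT lines of the form "KANJI [KANA] /gloss/.../" (or "KANA /gloss/" for kana-only words), already decoded from the file's EUC-JP encoding into strings. The sample entries are illustrative, and a real tool would also have to deinflect conjugated verbs and adjectives, which this sketch skips.

```python
import re

# EDICT entry: "KANJI [KANA] /gloss/.../" or, for kana-only words, "KANA /gloss/"
EDICT_LINE = re.compile(r"^(\S+)(?: \[([^\]]+)\])? /")


def load_edict(lines):
    """Map headword -> reading; the first entry wins, kana-only words map to themselves."""
    dictionary = {}
    for line in lines:
        m = EDICT_LINE.match(line)
        if m:
            headword, reading = m.group(1), m.group(2) or m.group(1)
            dictionary.setdefault(headword, reading)
    return dictionary


def longest_match(text, dictionary, max_len=12):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary headword; characters with no match become (char, None)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            word = text[i:j]
            if word in dictionary:
                tokens.append((word, dictionary[word]))
                i = j
                break
        else:  # no headword starts at position i
            tokens.append((text[i], None))
            i += 1
    return tokens


sample = [
    "日本語 [にほんご] /(n) Japanese language/",
    "日本 [にほん] /(n) Japan/",
    "勉強 [べんきょう] /(n,vs) study/",
    "を /(prt) object marker/",
]
d = load_edict(sample)
print(longest_match("日本語を勉強", d))
# [('日本語', 'にほんご'), ('を', 'を'), ('勉強', 'べんきょう')]
```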

jBrowse (http://www.jbrowse.com/products/jbrowse/) (IE ActiveX, free-as-in-beer shareware) is interesting, but it is not open source, is also based on EDICT, and still makes errors (some people claim it is inferior to rikaichan; I can't really tell).

Thank you for any suggestions.

2007-02-18, 17:15
The problem is that to recognize kanji compounds well enough to tokenize them, the program would essentially have to derive the semantics of a sentence. By the time you can do that correctly, you might as well go for full machine translation...
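
A small illustration of the ambiguity: with a toy word list (hypothetical, not taken from EDICT), the string 外国人参政権 ("voting rights for foreign residents") splits into dictionary words in two ways, 外国人/参政権 and 外国/人参/政権 ("foreign country / carrot / government"). The two splits even imply different readings for 人 (じん vs. にん), so the furigana depend on picking the right one. The sketch below simply enumerates every split; choosing among them is the part that needs context.

```python
from functools import lru_cache


def all_segmentations(text, words):
    """Return every way to split text into a sequence of words from the set words."""

    @lru_cache(maxsize=None)
    def split_from(i):
        # all segmentations of text[i:], as a tuple of tuples
        if i == len(text):
            return ((),)
        found = []
        for j in range(i + 1, len(text) + 1):
            head = text[i:j]
            if head in words:
                found.extend((head,) + rest for rest in split_from(j))
        return tuple(found)

    return list(split_from(0))


toy_words = {"外国", "外国人", "人参", "参政権", "政権"}
for seg in all_segmentations("外国人参政権", toy_words):
    print("/".join(seg))
# 外国/人参/政権
# 外国人/参政権
```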

2007-02-19, 11:56
Check out ChaSen (http://chasen.naist.jp/hiki/ChaSen/).
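
ChaSen does the segmentation and also outputs a reading for each morpheme, so turning its output into furigana is mostly post-processing. A rough sketch, assuming ChaSen's default output format: one morpheme per line with tab-separated fields starting with the surface form and its katakana reading, and EOS at the end of each sentence. The sample below is hand-written to mimic that format, not captured from a real run, and it is assumed to be decoded to a string already. The okurigana handling is a simple heuristic that only strips kana shared at the end of the surface form and the reading.

```python
def kata_to_hira(s):
    """Shift katakana ァ..ヶ (U+30A1..U+30F6) down to hiragana (U+3041..U+3096)."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in s)


def has_kanji(s):
    """True if s contains a basic-block CJK ideograph or the repeat mark 々."""
    return any("\u4e00" <= c <= "\u9fff" or c == "\u3005" for c in s)


def split_okurigana(surface, reading):
    """Move kana shared at the end of surface and reading out of the ruby,
    e.g. ("読む", "よむ") -> ("読", "よ", "む")."""
    n = 0
    while (n < len(surface) - 1 and n < len(reading) - 1
           and surface[-1 - n] == reading[-1 - n]):
        n += 1
    return surface[:len(surface) - n], reading[:len(reading) - n], surface[len(surface) - n:]


def furigana_from_chasen(output):
    """Turn ChaSen-style output into plain text with readings in parentheses."""
    parts = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        fields = line.split("\t")
        surface = fields[0]
        reading = kata_to_hira(fields[1]) if len(fields) > 1 else ""
        if has_kanji(surface) and reading and reading != surface:
            base, ruby, tail = split_okurigana(surface, reading)
            parts.append(f"{base}({ruby}){tail}")
        else:
            parts.append(surface)
    return "".join(parts)


sample = (
    "漢字\tカンジ\t漢字\t名詞-一般\t\t\n"
    "を\tヲ\tを\t助詞-格助詞-一般\t\t\n"
    "読む\tヨム\t読む\t動詞-自立\t五段・マ行\t基本形\n"
    "EOS\n"
)
print(furigana_from_chasen(sample))  # 漢字(かんじ)を読(よ)む
```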