Kanji Sieve v0.2


It has taken me a little longer than I thought to get to version 0.2 of Kanji Sieve, mainly because I wanted it to look better across platforms and to avoid problems for users that wouldn’t be an issue for me as the developer.
However, as someone actually downloaded, looked at and commented on my initial little solution, I looked at Kanji Sieve again. A little encouragement will always prompt me to continue projects. This time I’ve taken a bit more care over the look of the Windows file. On the suggestion of Tom Hodgers I used the Meiryo font and allowed the user to change the font size of the sample text.

I added non-Jyouyou kanji and katakana words to the sieve. This may be a bit indiscriminate. For katakana, I search for runs of katakana characters and hope these are words. They may not be. For non-Jyouyou kanji, I try to eliminate all roman characters, kana, Jyouyou kanji, and punctuation. What’s left over in a Japanese text should be non-Jyouyou kanji. Again, strange punctuation and foreign characters may appear here. I do have some plans to try to refine this panel though.
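For anyone curious what that kind of sieve looks like outside of Kanji Sieve, here is a rough Python sketch of the same two ideas. It is not the code in the solution itself. The function names are made up for illustration, and JOUYOU_SAMPLE is a tiny stand-in for the full Jyouyou list, which a real version would have to load from somewhere.

```python
import re
import unicodedata

# Tiny stand-in for the real Jyouyou list (an assumption for this sketch only).
JOUYOU_SAMPLE = set("私飲見")

# Katakana letters plus the long-vowel mark; the middle dot is left out on purpose.
KATAKANA_RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def katakana_runs(text):
    """Return every run of katakana, hoping each run is a word."""
    return KATAKANA_RUN.findall(text)


def non_jouyou_leftovers(text, jouyou):
    """Throw away roman characters, kana, Jyouyou kanji, punctuation and spaces."""
    leftovers = []
    for ch in text:
        if ch.isascii():
            continue  # roman letters, digits, ASCII punctuation
        if "\u3040" <= ch <= "\u30ff":
            continue  # hiragana and katakana blocks
        if ch in jouyou:
            continue  # known Jyouyou kanji
        if unicodedata.category(ch)[0] in "PZ":
            continue  # Unicode punctuation and separators
        leftovers.append(ch)  # whatever survives, kanji or not
    return leftovers


sample = "私はコーヒーを飲み、麒麟を見た。Hello"
print(katakana_runs(sample))
print(non_jouyou_leftovers(sample, JOUYOU_SAMPLE))
```

As in the sieve, anything that slips past the filters (symbols, characters from other scripts) ends up in the leftovers alongside the kanji.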

Trying it out, I was surprised at the amount of non-Jyouyou kanji a friend of mine used in her mixi diary. I would have expected a larger proportion of kana and Jyouyou kanji in a personal diary. I wonder if it is because she uses a word processor: it’s easier to generate those kanji, and presumably she can expect ordinary friends to read them easily. If she were writing by hand it might be different.

Lastly, I incorporated a little hack I put together to replace kanji with keywords. I did this to demonstrate how little meaning you get from just keywords, especially when the most popular English keywords, the ones that appear as the first entry in Kanjidict, are a bit dreadful at times. These panels may or may not survive into the next version. If my notebook ever sees the light of day, I’d generate the keywords from the user’s input, which may at least be more useful, and perhaps generate an XML file with the keyword furigana as pop-ups.
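To show the idea, here is a minimal Python sketch of keyword replacement. It is not how the panel is built. The KEYWORDS table is a hand-typed illustration with three entries, not data read from the dictionary file, and replace_with_keywords is a name invented for this example.

```python
# Hand-typed illustrative keywords (an assumption, not loaded from a dictionary file).
KEYWORDS = {"日": "day", "本": "book", "語": "word"}


def replace_with_keywords(text, keywords):
    """Swap each kanji that has a keyword for [keyword]; leave everything else alone."""
    return "".join(f"[{keywords[ch]}]" if ch in keywords else ch for ch in text)


print(replace_with_keywords("日本語を読む", KEYWORDS))
```

"Japanese language" comes out as day, book, word, which is about as much meaning as the keywords alone will give you.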

Further plans
I’d at least like to solve exporting. At the moment I have an issue with the flow of records of unknown length in printouts. It may just have to be an xml export.
I may make it into a multi-record solution.
I also found something very similar at the reading tutor web site at Tokyo International University, which has the added benefit of producing custom glossaries for articles. If I could understand how they parse for individual words I’d implement this myself.

Download from my new permanent Kanji Sieve page.

––update 11Apr10––
I’ve corrected my oversight in not filtering for half-width kana or full-width roman characters. The non-Jyouyou and katakana panels should work a bit better now.
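Outside of Kanji Sieve, one way to deal with the same width problem is to fold the variants into their ordinary forms before sieving. The Python sketch below uses Unicode NFKC normalization from the standard library. This is a different approach from the filtering I added, and fold_widths is just a name for this example.

```python
import unicodedata


def fold_widths(text):
    """Fold half-width kana to full-width and full-width roman to plain ASCII."""
    # NFKC applies Unicode compatibility mappings, which cover both width variants.
    return unicodedata.normalize("NFKC", text)


print(fold_widths("ｶﾀｶﾅ"))
print(fold_widths("ＡＢＣ"))
```

Note that NFKC also folds other compatibility characters, such as circled numbers, so it changes more than widths.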

10. April 2010 by ロバート
Categories: 02 reading • 読む事
