Kanji Sieve – Analysing Kanji Usage


This is a little FileMaker solution I’ve written.
It takes a piece of pasted Japanese text and analyses the kanji contained in it.

I wrote it as a quick and probably imprecise way of looking at kanji usage in texts. The figure usually quoted is that the 1000 most frequent kanji account for 95% of usage, probably because of the 1998 study of kanji usage in the Asahi Shinbun (Shinbun denshi media no kanji, Sanseido, 1998). I have also seen this stated as 1000 characters allowing you to read 95% of articles (a subtle difference), but I think this is a bit of an overstatement (the thread below suggests 1900 kanji are needed in order to read 95% of compounds). While doing a bit of research on this I came across several other frequency studies and an interesting thread where Jim Breen notes

…a discussion at a language teaching conference in Japan I attended in 1999, where there was general consensus that
the average Japanese adult could read 700-800 kanji…

I find this a bit hard to imagine, although for writing by hand, maybe… What interests me is the percentage of kyouiku kanji that are used in texts, and which of the remaining jyouyou kanji are used most frequently.

My hypothesis is that the kyouiku kanji are a better medium-term goal for JSL learners than the complete jyouyou set. The diminishing returns on effort for the 939 kanji beyond the kyouiku kanji might suggest approaching these on a need-to-know basis. The old canard (by Heisigists, I suspect) is that leaving out 10% of the alphabet isn’t a good idea. I don’t know. Firstly, a more accurate analogy would be with vocabulary, and it’s not so much that you completely ignore the unknown characters as that it is possible to work around them. And there’s a world of difference in effort between learning 3 characters and learning 939 characters. But I digress.
The Asahi Shinbun also probably isn’t the source most read by JSL learners. It might be good to have some statistics on Amazon reviews, mixi blogs, or manga.


Kanji Sieve filters for the six primary school grades and for the remaining jyouyou kanji. It then counts the occurrences of each character. This might allow you to see the most frequently occurring characters in the texts you are interested in.
Characters outside the jyouyou set are not considered.
To judge readability or difficulty, other considerations would need to be addressed, such as the vocabulary used, the length of compounds and the grammar.
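
For anyone without FileMaker, the filter-and-count idea can be sketched roughly in Python. This is not how Kanji Sieve itself is built, just a minimal illustration of the same approach. The names sieve, kyouiku_share, SAMPLE_GRADES and SAMPLE_SECONDARY are made up for this example, and the two lookup tables hold only a handful of sample characters; a real version would need the full grade lists.

```python
from collections import Counter

# Tiny illustrative sample only (an assumption for this sketch): a real table
# would map every kyouiku kanji to its school grade (1-6) and list all of the
# remaining jyouyou kanji separately.
SAMPLE_GRADES = {
    "日": 1, "本": 1, "字": 1,
    "語": 2, "読": 2, "新": 2, "聞": 2,
    "漢": 3,
}
SAMPLE_SECONDARY = {"頻", "違"}


def sieve(text, grades=SAMPLE_GRADES, secondary=SAMPLE_SECONDARY):
    """Return {bucket: Counter} where bucket is a grade 1-6 or 'jyouyou'."""
    buckets = {}
    for ch in text:
        if ch in grades:
            key = grades[ch]
        elif ch in secondary:
            key = "jyouyou"
        else:
            # Kana, punctuation and kanji outside the tables are not considered.
            continue
        buckets.setdefault(key, Counter())[ch] += 1
    return buckets


def kyouiku_share(buckets):
    """Fraction of the counted kanji occurrences that fall in grades 1-6."""
    total = sum(sum(c.values()) for c in buckets.values())
    kyouiku = sum(sum(c.values()) for k, c in buckets.items() if k != "jyouyou")
    return kyouiku / total if total else 0.0


if __name__ == "__main__":
    result = sieve("日本語の新聞を読む。漢字の頻度が違う。")
    for bucket, counts in result.items():
        print(bucket, counts.most_common())
    print(kyouiku_share(result))
```

Each bucket is a Counter, so something like sieve(text)[1].most_common(10) would give the ten most frequent first-grade characters in a text. Note that the share is worked out only over the characters found in the tables, so anything outside the jyouyou set is left out of the total as well.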

If I continue to play with this I would like to add an export option, and maybe allow you to collect a series of articles and see the aggregate statistics.
I would also like to incorporate it into my Kanji Notebook, to allow you to look up kanji or add them to a study list or set of flash cards.
I would also like to see if I can extract vocabulary in the same way. I suspect word boundaries would be an issue there, although Rikachan manages it….

most recent version only
Kanji Sieve Page

26. February 2010 by ロバート