Kanji Sieve

ksgeisha.jpg

Kanji Sieve is a “mash-up” application, or a database wrapped around custom web browsers. It is designed to help you learn to read Japanese by providing custom dictionaries and lists for the texts you input. The starting point is a Japanese text. Kanji Sieve breaks this down into its component kanji and gives you their frequency and level. The kanji can be looked up and annotated in Kanji Notebook. After this breakdown the text is submitted to chuta.jp to be split into words. A wordlist is generated, and you can look up the words in a variety of online dictionaries by clicking on them in the list. Finally, a tab-separated text is produced that can be used in flashcard applications.
Images can be added to kanji, and audio can be added to a record.
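To give a rough idea of what the tab-separated export for flashcard applications might look like, here is a minimal Python sketch. It is not how Kanji Sieve does it (that is FileMaker), and the three columns (word, reading, meaning) and the function name are assumptions made purely for illustration.

```python
import csv
import io


def wordlist_to_tsv(entries):
    """Render (word, reading, meaning) rows as tab-separated text.

    The three-column layout is an assumption for this sketch; a real
    flashcard deck may want different or additional fields.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for word, reading, meaning in entries:
        writer.writerow([word, reading, meaning])
    return buffer.getvalue()


# Hypothetical sample wordlist
sample = [
    ("漢字", "かんじ", "Chinese character"),
    ("辞書", "じしょ", "dictionary"),
]
print(wordlist_to_tsv(sample))
```

Most flashcard applications can import a file like this with one card per line and one field per tab-separated column.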

About
I originally wrote this as a quick and probably imprecise way of looking at kanji usage in texts. A figure of the 1000 most frequent kanji accounting for 95% of usage is often quoted, probably because of the 1998 study of kanji usage in the Asahi Shinbun (Shinbun denshi media no kanji, Sanseido, 1998). I have also seen this stated as 1000 characters allowing you to read 95% of articles (a subtle difference), but I think this is a bit of an overstatement.

What interests me is the percentage of kyouiku kanji that are used in texts, and which of the remaining jyouyou kanji are used most frequently. My hypothesis is that the kyouiku kanji are a better medium-term goal for JSL learners than the complete jyouyou set. The diminishing returns of the 939 kanji beyond the kyouiku kanji might suggest approaching these on a need-to-know basis.
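The two kinds of figure discussed above (how much of a text's kanji usage a known set covers, and how much the N most frequent kanji account for) can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and treating "kanji" as the basic CJK Unified Ideographs block is a simplifying assumption.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji
    return "\u4e00" <= ch <= "\u9fff"


def coverage(text, known):
    """Fraction of kanji occurrences in `text` that belong to `known`."""
    kanji = [ch for ch in text if is_kanji(ch)]
    if not kanji:
        return 0.0
    return sum(1 for ch in kanji if ch in known) / len(kanji)


def top_n_coverage(text, n):
    """Fraction of kanji occurrences accounted for by the n most frequent kanji."""
    counts = Counter(ch for ch in text if is_kanji(ch))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total


# Hypothetical sample text and known set
text = "日本語の本を読む日"
print(coverage(text, {"日", "本"}))
print(top_n_coverage(text, 2))
```

Note that both functions measure the share of kanji occurrences, which is the "95% of usage" reading of the figure, not the share of articles that can be read in full.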

Since v0.1 I have expanded Kanji Sieve to work with multiple documents and to generate wordlists for flashcard decks. It has gone from a quick hack for looking at kanji in a single document to something that may be of genuine use in learning to read Japanese.

zarujiten.png

This solution is written in FileMaker. It filters for the six primary school grades, then the remaining jyouyou kanji, then katakana groupings (which are usually words by themselves), and finally all other characters, which should hopefully be just the non-jyouyou kanji. (However, as I do this by removing all known characters, it may leave foreign characters and unanticipated punctuation.) It then counts the occurrences of each character and highlights the groups of characters in the text. This might allow you to see the most frequently occurring characters in the texts you are interested in. To judge readability or difficulty, other considerations would need to be addressed, such as the vocabulary used, the length of compounds and the grammar, although I’ve attempted to give each text a difficulty score based on kanji usage.
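The filtering described above can be sketched roughly as follows. This is a Python illustration, not the FileMaker implementation: the kanji sets are tiny placeholder samples rather than the real grade and jyouyou lists, and the names (`GRADE_KANJI`, `OTHER_JYOUYOU`, `sieve`) and the exact set of "known" non-kanji characters are assumptions.

```python
import re
from collections import Counter

# Placeholder samples only; real lists would hold every kanji for all six grades
GRADE_KANJI = {
    1: set("一二三日本人"),
    2: set("読書語"),
}
OTHER_JYOUYOU = set("彼違")  # sample of jyouyou kanji beyond the kyouiku set

# Runs of katakana (including the long-vowel mark) are treated as words
KATAKANA_RUN = re.compile(r"[ァ-ヶー]+")
# Characters assumed to be "known" non-kanji: whitespace, hiragana, some punctuation, ASCII
KNOWN_NON_KANJI = re.compile(r"[\sぁ-ゖ、。「」！？A-Za-z0-9]")


def sieve(text):
    """Count characters per group; whatever is left over lands in 'other'."""
    groups = {}
    for grade, chars in sorted(GRADE_KANJI.items()):
        groups[f"grade {grade}"] = Counter(ch for ch in text if ch in chars)
    groups["other jyouyou"] = Counter(ch for ch in text if ch in OTHER_JYOUYOU)
    groups["katakana"] = Counter(KATAKANA_RUN.findall(text))

    # Remove everything known; the remainder should mostly be non-jyouyou kanji,
    # but foreign characters or unexpected punctuation can slip through too
    known_kanji = set().union(*GRADE_KANJI.values(), OTHER_JYOUYOU)
    leftover = KNOWN_NON_KANJI.sub("", KATAKANA_RUN.sub("", text))
    groups["other"] = Counter(ch for ch in leftover if ch not in known_kanji)
    return groups


for name, counts in sieve("彼は日本語のテストで薔薇を書く。").items():
    print(name, dict(counts))
```

Because the last group is defined by subtraction, anything not listed as known ends up there, which is the same caveat as noted above for foreign characters and unanticipated punctuation.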

As I continue to develop this I would like to add more features. I hope to expand Kanji Notebook and add a sentences database as well.
…And I have a few other ideas I want to play with too. Watch this space.

Remember this is still in beta. Bug reports, comments and suggestions are always welcome.

How-to movie from v0.3
There will shortly be new screencasts for version 0.5.
Kanji Sieve help page
There are now also help pages in progress.

Posts
Kanji Sieve 0.1
Kanji Sieve 0.2
FileMaker Kanji Project – progress 2
FileMaker Kanji Project – progress 3
Kanji Sieve 0.3
Kanji Sieve for Windows
Kanji Sieve 0.4 progress
Kanji Sieve v0.4 released
Kanji Sieve v0.5 released

Downloads

v0.5 usr file (November 2010)
Kanji Sieve Data (12 MB zipped)
It needs either the Mac or Windows runtime from v0.3 or above (or a full version of FileMaker 10 or above). Replace the .usr file in the runtime solution with this updated file.
WARNING
Change the name of your old .usr file to .usr.old and import your data into the new file at the prompt. Always back up your data!

v0.5 uses v4 of the 360Works Scriptmaster plugin. It should work with the earlier version but you may want to update your Scriptmaster plugin.

v0.5 Mac (November 2010)
Kanji Sieve Mac runtime (43 MB zipped)
It needs Mac OS X v10.4.11 and above. It is a runtime version of FileMaker that only works with the bound file.

v0.5 Win (November 2010)
Kanji Sieve Windows runtime (37 MB zipped)
It needs Windows XP and above. It is a runtime version of FileMaker that only works with the bound file.

Made on Mac OS X v10.5.8 using FileMaker 10.

Version History
Kanji Sieve Help

Caveats

  • Needs regex plugin and Scriptmaster plugin (included in runtime)
  • Needs Java Virtual Machine 1.4.2 or higher
  • Needs QuickTime to play back audio
  • Needs Flash for image editor from pixlr.com
  • Non-jyouyou might also display and count non-kanji characters
  • Display on Windows requires the Meiryo font. Also, due to quirks in Windows and Internet Explorer, I’m not 100% happy with how it looks on Windows. (It may depend on which version you have.)
  • It needs Internet access to work fully. There is some offline use.
  • It’s still beta

Known Bugs
12 Nov fix due in 0.5.1
In preferences, the known kanji pane, the filter non-jyouyou pane and the user list pane do not accept line returns or paragraph returns in the input text. If you enter one, you will lose your list and will see a ? in the input area.
e.g.
時事時 is ok



時 isn’t

Further plans
I hope to solve the remaining issues with the Windows version.
I’d like to introduce sentence lists in a similar manner to word lists.
Language support.
Better feedback through dialogs.
Help files.

—-attribution—–
Heading image from Okinawa Soba’s Flickr Gallery.
original image
used in accordance with CC Licence
—-attribution—–
