
Notes on the POPFile MeCab installer





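● Kakasi: Kanji → kana (romaji) conversion program (recommended)

The program used by POPFile versions up to 0.22.5.
Its wakachi-gaki (word splitting) accuracy is lower than MeCab's (it has no information about words written in hiragana or katakana), but its dictionaries are small (about 2MB).

○ MeCab: Yet Another Part-of-Speech and Morphological Analyzer

MeCab splits words more accurately than Kakasi, but its dictionaries are larger (about 40MB).

○ Built-in parser: splitting by character type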



Installer image (in English)

Please choose the Japanese wakachi-gaki (word-splitting) parser program:

(Unlike English text, Japanese text has no spaces between words. To analyze e-mail with a Bayesian filter, the message body must therefore be split into words (wakachi-gaki).)

x Kakasi - KAnji KAna Simple Inverter (Recommended)

The program used by POPFile 0.22.5 and earlier.
Its wakachi-gaki accuracy is lower than MeCab's (because Kakasi has no information about words written in Hiragana or Katakana), but Kakasi uses smaller dictionaries (about 2MB).
The POPFile installer contains Kakasi and its dictionaries.

o MeCab - Yet Another Part-of-Speech and Morphological Analyzer

Its wakachi-gaki accuracy is better than Kakasi's, but MeCab uses larger dictionaries (about 40MB).
The POPFile installer does not contain MeCab. It will be downloaded from the Internet.

o The internal parser - splitting by the kinds of characters

Instead of using an external program, this parser splits text by the kinds of characters (e.g. Kanji, Hiragana, or Katakana).
Its wakachi-gaki accuracy is lower than that of the dictionary-based programs, but because it uses no dictionaries it is faster.
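
As a rough illustration of splitting by the kinds of characters, here is a minimal Python sketch. It is not POPFile's actual parser: the function name `split_by_char_kind` and the exact Unicode ranges are assumptions chosen for this example, and the real implementation may group characters differently.

```python
import re

# Hypothetical character classes for illustration only; POPFile's real
# internal parser may draw these boundaries differently.
_RUNS = re.compile(
    "[\u4e00-\u9fff\u3005]+"   # Kanji (CJK unified ideographs, plus the iteration mark)
    "|[\u3041-\u309f]+"        # Hiragana
    "|[\u30a0-\u30ff]+"        # Katakana (includes the long-vowel mark)
    "|[A-Za-z0-9]+"            # ASCII letters and digits
)


def split_by_char_kind(text):
    """Return maximal runs of same-kind characters; other characters are dropped."""
    return _RUNS.findall(text)


if __name__ == "__main__":
    # Prints one token per line: POPFile, は, メール, を, 分類, する
    for token in split_by_char_kind("POPFileはメールを分類する。"):
        print(token)
```

Because a run ends wherever the character kind changes, no dictionary lookup is needed. This is also why accuracy suffers: a word that mixes kinds, or two adjacent words of the same kind, cannot be separated correctly.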


Last edited September 9, 2007 1:21 by Amatubu

Copyright (c) 1996-2006 naoki iimura e-mail