unidic-mecab is a dictionary for MeCab (a Japanese morphological
analyzer), based on a corpus of Contemporary Written Japanese (upstream
publishes it as unidic-cwj).
.
* All entries follow the definition of the "SUW (short-unit word)"
specified by NINJAL (the National Institute for Japanese Language and
Linguistics), which segments text into words of uniform size, suited to
linguistic research.
* It has a three-layered structure consisting of
- lemma
- form
- spelling
This allows a clear distinction between two types of word variant:
spelling variants and form variants.
* It is useful for speech processing research, since it allows accent
and sound-shift information to be added.
.
This package is huge. You need more than 10 GB of free space to download
and install it.
Installed Size: 5.2 GB
Architectures: all