2004 ApplyingConditionalRandomFieldsToJapMorphAn


Subject Headings: Conditional Random Field Model, Japanese Language, Morphological Analysis Task, CRFpp System, CRF Length Bias, Hierarchical Tag Set.

Notes

Cited By

Quotes

Abstract

2 Japanese Morphological Analysis

2.1 Word Boundary Ambiguity

  • Word boundary ambiguity cannot be ignored when dealing with non-segmented languages. A simple approach would be to let each character be a token (i.e., character-based Begin/Inside tagging) so that boundary ambiguity never occurs (Peng et al., 2004). However, B/I tagging is not a standard method in the 20-year history of corpus-based Japanese morphological analysis. This is because B/I tagging cannot directly reflect lexicons, which contain prior knowledge about word segmentation. We cannot ignore a lexicon, since over 90% accuracy can be achieved even using longest prefix matching with the lexicon. Moreover, B/I tagging produces a number of redundant candidates, which slows down decoding.
  • Traditionally in Japanese morphological analysis, we assume that a lexicon, which lists pairs of a word and its corresponding part-of-speech, is available.
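
The two ideas quoted above, longest prefix matching against a lexicon and character-based B/I tagging, can be illustrated with a minimal Python sketch. This is not the CRF-based analyzer the paper proposes. The toy lexicon, its part-of-speech labels, the example sentence, and the function names `longest_prefix_segment` and `to_bi_tags` are all assumptions made for illustration.

```python
# Hypothetical toy lexicon: word -> part-of-speech (illustrative only).
LEXICON = {
    "東京": "noun",
    "京都": "noun",
    "東": "noun",
    "都": "suffix",
    "に": "particle",
    "住む": "verb",
}


def longest_prefix_segment(text, lexicon):
    """Greedy segmentation: at each position take the longest lexicon entry.

    Characters that start no multi-character lexicon entry are emitted
    as single-character tokens.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        # Try candidate substrings from longest to shortest (length >= 2).
        for j in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words


def to_bi_tags(words):
    """Convert a word segmentation into one Begin/Inside tag per character."""
    tags = []
    for word in words:
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


words = longest_prefix_segment("東京都に住む", LEXICON)
tags = to_bi_tags(words)
```

In this toy input the characters 東京都 could be read as 東京 + 都 or 東 + 京都. The greedy matcher commits to the first reading without comparing alternatives, and the B/I view encodes the same segmentation as one tag per character without consulting the lexicon directly.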

2.2.1 Hierarchical Tagset



Author: Taku Kudo, Kaoru Yamamoto, Yuji Matsumoto
Title: Applying Conditional Random Fields to Japanese Morphological Analysis
URL: http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Kudo.pdf
Year: 2004