2011 AStackedSubWordModelforJointChi

From GM-RKB

Subject Headings: Chinese Word Segmentation.

Notes

Cited By

Quotes

Abstract

The large combined search space of joint word segmentation and Part-of-Speech (POS) tagging makes efficient decoding very hard. As a result, effective high-order features representing rich contexts are inconvenient to use. In this work, we propose a novel stacked sub-word model for this task, designed for both efficiency and effectiveness. Our solution is a two-step process. First, a word-based segmenter, a character-based segmenter and a local character classifier are trained to produce coarse segmentation and POS information. Second, the outputs of the three predictors are merged into sub-word sequences, which are then bracketed and labeled with POS tags by a fine-grained sub-word tagger. The coarse-to-fine search scheme is efficient, and in the sub-word tagging step rich contextual features can be approximately derived. Evaluation on the Penn Chinese Treebank shows that our model yields improvements over the best system reported in the literature.
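The merge step can be illustrated with a minimal sketch. It assumes that each predictor's output is reduced to a plain word segmentation and that the merged sub-word sequence splits the sentence wherever any predictor places a word boundary (a union of boundaries). That merging rule, and the helper names `boundaries` and `merge_to_subwords`, are hypothetical illustrations and not taken from the paper. The sketch also ignores the POS information and does not cover the second-step sub-word tagger, which would bracket and label these pieces.

```python
from typing import List, Set


def boundaries(words: List[str]) -> Set[int]:
    """Return internal character offsets where a word ends (sentence end excluded)."""
    cuts: Set[int] = set()
    pos = 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    cuts.discard(pos)  # the end of the sentence is not an internal boundary
    return cuts


def merge_to_subwords(sentence: str, segmentations: List[List[str]]) -> List[str]:
    """Hypothetical merge: cut the sentence at every boundary proposed by any predictor."""
    for seg in segmentations:
        if "".join(seg) != sentence:
            raise ValueError("segmentation does not cover the sentence exactly")
    if not sentence:
        return []
    cuts: Set[int] = set()
    for seg in segmentations:
        cuts |= boundaries(seg)  # union of all predicted boundaries
    subwords: List[str] = []
    start = 0
    for end in sorted(cuts) + [len(sentence)]:
        subwords.append(sentence[start:end])
        start = end
    return subwords


# Three coarse outputs standing in for the word-based segmenter,
# the character-based segmenter and the local character classifier.
outputs = [
    ["他", "喜欢", "北京大学"],
    ["他", "喜欢", "北京", "大学"],
    ["他", "喜", "欢", "北京大学"],
]
print(merge_to_subwords("他喜欢北京大学", outputs))
# → ['他', '喜', '欢', '北京', '大学']
```

Because every sub-word lies inside a single word of each coarse output, a later tagger that groups adjacent sub-words can still recover any of the three original segmentations, while searching over far fewer units than individual characters.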

References

Weiwei Sun. (2011). "A Stacked Sub-word Model for Joint Chinese Word Segmentation and Part-of-speech Tagging."