2011 AStackedSubWordModelforJointChi

Subject Headings: Chinese Word Segmentation.

Notes

The large combined search space of joint word segmentation and Part-of-Speech (POS) tagging makes efficient decoding very hard. As a result, effective high-order features representing rich contexts are inconvenient to use. In this work, we propose a novel stacked sub-word model for this task that addresses both efficiency and effectiveness. Our solution is a two-step process. First, one word-based segmenter, one character-based segmenter and one local character classifier are trained to produce coarse segmentation and POS information. Second, the outputs of the three predictors are merged into sub-word sequences, which are further bracketed and labeled with POS tags by a fine-grained sub-word tagger. The coarse-to-fine search scheme is efficient, while in the sub-word tagging step rich contextual features can be approximately derived. Evaluation on the Penn Chinese Treebank shows that our model yields improvements over the best system reported in the literature.
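The merging step in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact merging rule, so the sketch assumes one plausible reading: a position becomes a sub-word boundary whenever any of the coarse predictors places a word boundary there, so that no sub-word crosses a boundary proposed by any predictor. The function names `boundaries` and `merge_into_subwords` and the example segmentations are hypothetical, and the POS information and the second-stage sub-word tagger are left out.

```python
from typing import List, Set


def boundaries(words: List[str]) -> Set[int]:
    """Return the character offsets at which a segmentation ends a word."""
    cuts: Set[int] = set()
    pos = 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def merge_into_subwords(segmentations: List[List[str]]) -> List[str]:
    """Merge several segmentations of the same text into sub-words.

    Assumption (not taken from the paper): the sub-word boundaries are
    the union of the boundaries proposed by all predictors.
    """
    text = "".join(segmentations[0])
    for seg in segmentations[1:]:
        if "".join(seg) != text:
            raise ValueError("all segmentations must cover the same text")

    # Collect every boundary proposed by any predictor.
    cuts: Set[int] = set()
    for seg in segmentations:
        cuts |= boundaries(seg)

    # Slice the text at the merged boundaries.
    subwords: List[str] = []
    start = 0
    for end in sorted(cuts):
        subwords.append(text[start:end])
        start = end
    return subwords


# Hypothetical outputs of the three coarse predictors for one sentence.
word_based = ["研究", "生命", "起源"]
char_based = ["研究生", "命", "起源"]
local_classifier = ["研究", "生命", "起源"]

print(merge_into_subwords([word_based, char_based, local_classifier]))
```

In this example the predictors disagree on where the second word starts, so the merged sequence is `['研究', '生', '命', '起源']`. The disputed characters are left as single-character sub-words for the fine-grained tagger to bracket, while the spans all predictors agree on stay intact.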

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2011 AStackedSubWordModelforJointChi	Weiwei Sun			A Stacked Sub-word Model for Joint Chinese Word Segmentation and Part-of-speech Tagging						2011