1999 TieredTaggingandCombinedLanguag

From GM-RKB

Subject Headings: Supervised Tagging Algorithm.

Notes

Cited By

Quotes

Abstract

We address the problem of morpho-syntactic disambiguation of arbitrary texts in a highly inflectional natural language. We use a large tagset (615 tags), compliant with EAGLES and MULTEXT [5]. The large tagset is internally mapped onto a reduced one (82 tags) that is used for statistical disambiguation. A text disambiguated with this reduced tagset then undergoes a recovery process that restores all the information the reduced tagset left out relative to the large one. This two-step process is called tiered tagging. To further improve tagging accuracy, we use a combined language models classifier, a procedure that interpolates the results of tagging the same text with several register-specific language models.
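The abstract says the combined classifier "interpolates" the outputs of several register-specific taggers but does not give the formula here. The sketch below assumes one simple reading: each register's tagger tags the whole text, and each token receives the tag with the largest summed register weight. The function name `combine_taggings`, the register names, the weights and the tags are all hypothetical; the paper's actual combination scheme may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def combine_taggings(
    taggings: Dict[str, Sequence[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Weighted per-token vote over several taggings of the same text.

    `taggings` maps a (hypothetical) register name to the tag sequence produced
    by the tagger trained on that register; `weights` maps each register name
    to a non-negative weight (registers missing from `weights` count as 0).
    """
    lengths = {len(tags) for tags in taggings.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one tagging, all of the same length")
    (n_tokens,) = lengths

    combined: List[str] = []
    for i in range(n_tokens):
        scores: Dict[str, float] = defaultdict(float)
        for register, tags in taggings.items():
            scores[tags[i]] += weights.get(register, 0.0)
        # Highest total weight wins; on a tie, the alphabetically first tag
        # wins because max() keeps the first maximum of the sorted keys.
        combined.append(max(sorted(scores), key=scores.__getitem__))
    return combined


# Hypothetical registers, tags and weights, for illustration only.
taggings = {
    "fiction": ["NN", "VB", "DT"],
    "news": ["NN", "NN", "DT"],
    "legal": ["JJ", "VB", "DT"],
}
weights = {"fiction": 0.5, "news": 0.3, "legal": 0.2}
print(combine_taggings(taggings, weights))  # ['NN', 'VB', 'DT']
```

A weighted vote like this needs only each tagger's final decisions. If the paper instead interpolates the models' probabilities, the combination would need access to each model's scores rather than just its chosen tags.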

1. Introduction

One recurrent issue in the tagging literature is the trade-off between tagset size and tagging accuracy. It is generally believed that the larger the tagset, the poorer the tagging accuracy, although some experiments [4] show that this does not always hold, provided enough training data is available and the tagset cardinality stays within reasonable limits (say, 100-200 tags). However, when the target tagset gets larger (t00-100 tags or even more), the problem becomes a serious challenge for current tagging technology. We describe tiered tagging, a two-step process, as a possible solution for reconciling tagging accuracy with the large number of tags in the target tagset (as many highly inflectional languages require). The two levels of tiered tagging have two different …
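The two steps of tiered tagging can be sketched as follows, assuming (as the abstract's "recovery process" suggests) that a word form plus its reduced tag is usually enough to look up the full tag in a lexicon. Everything concrete here is hypothetical: the toy word forms, the lexicon, the five MULTEXT-style strings standing in for the 615-tag set, the reduced tags standing in for the 82-tag set, and the helper names `ambiguity_class` and `recover_full_tags`. The statistical tagger of step 1 is not implemented; its output is simply assumed.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical projection from full (large-tagset) tags to reduced tags.
# In this toy mapping the reduced tag keeps category and number but drops
# gender, which the lexicon lookup in step 2 can restore.
FULL_TO_REDUCED: Dict[str, str] = {
    "Ncms": "NS",     # noun, common, masculine, singular
    "Ncfs": "NS",     # noun, common, feminine, singular
    "Ncmp": "NP",     # noun, common, masculine, plural
    "Ncfp": "NP",     # noun, common, feminine, plural
    "Vmip3s": "V3S",  # main verb, indicative, present, 3rd person singular
}

# Hypothetical lexicon: all full tags each toy word form can take.
LEXICON: Dict[str, Set[str]] = {
    "alpha": {"Ncfs", "Vmip3s"},  # noun/verb ambiguous form
    "beta": {"Ncms", "Ncmp"},     # number-ambiguous noun
}


def ambiguity_class(word: str) -> Set[str]:
    """Reduced tags the statistical tagger chooses among for `word` (step 1)."""
    return {FULL_TO_REDUCED[t] for t in LEXICON.get(word, set())}


def recover_full_tags(words: Sequence[str], reduced: Sequence[str]) -> List[Set[str]]:
    """Step 2: keep the lexicon's full tags that project onto the chosen reduced tag.

    A singleton set means the full tag was recovered unambiguously; a larger
    set would need some further decision step that is not modelled here.
    """
    if len(words) != len(reduced):
        raise ValueError("words and reduced tags must be aligned")
    return [
        {t for t in LEXICON.get(word, set()) if FULL_TO_REDUCED.get(t) == rtag}
        for word, rtag in zip(words, reduced)
    ]


print(sorted(ambiguity_class("alpha")))  # ['NS', 'V3S']
# Pretend a tagger trained on reduced tags produced this sequence.
print(recover_full_tags(["alpha", "beta"], ["V3S", "NP"]))  # [{'Vmip3s'}, {'Ncmp'}]
```

In this sketch the tagger in step 1 only ever sees the small reduced ambiguity classes, so its training data is spread over far fewer tags, while the lexicon lookup in step 2 restores the distinctions the reduced tagset dropped.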

References

Author: Dan Tufis
Title: Tiered Tagging and Combined Language Models Classifiers
Year: 1999