Morph Segmentation Task

From GM-RKB

A Morph Segmentation Task is a lexical task that requires segmenting a linguistic expression into its component morphs.
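
As a rough illustration of the task, the sketch below handles one narrow case, the lexicon-based compound splitting described in the Kraaij (2004) quote under References. It is a minimal example under stated assumptions, not the method of any cited work: the toy `LEXICON`, the function name `split_compound`, and the `min_len` parameter are all hypothetical, and linking elements (such as the Dutch "-s-") are ignored.

```python
from typing import List, Optional, Set

# Toy lexicon of base forms. Purely illustrative (an assumption);
# a real system would load a full lexicon or derive one from a corpus.
LEXICON: Set[str] = {"lucht", "vervuiling", "vlieg", "angst"}


def split_compound(
    word: str, lexicon: Set[str] = LEXICON, min_len: int = 3
) -> Optional[List[str]]:
    """Return lexicon entries that concatenate to `word`, or None if no split exists.

    A word that is itself in the lexicon is returned unsplit.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest candidate head first; both parts must have
    # at least `min_len` characters.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            if rest is not None:
                return [head] + rest
    return None


print(split_compound("luchtvervuiling"))
print(split_compound("vliegangst"))
```

With this toy lexicon, "luchtvervuiling" is segmented into "lucht" and "vervuiling", so a query for "vervuiling" could partially match the compound. Preferring the longest head is only one possible design choice; it keeps the sketch short but can pick the wrong split when several are possible.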



References

2004

  • (Kraaij, 2004) ⇒ Wessel Kraaij. (2004). “Variations on Language Modeling for Information Retrieval.” PhD Thesis, University of Twente, June 2004.
    • QUOTE: Compound analysis (also called decompounding or compound splitting) is an additional normalization technique for Germanic languages, since these have a productive compounding capacity. This means that new words can be formed by concatenating existing words. Decomposition of these compound words into their constituting morphological base forms is important for IR, since these compounds can usually be paraphrased by a noun-phrase construction, e.g., “vliegangst” and “angst om te vliegen” (fear of flying). Normalization of compounds will enable a match between both forms of the same composite concept and partial matches with related words after compound splitting, e.g., ’luchtvervuiling’ will match with ’vervuiling’. Several algorithms have been proposed for compound splitting. They either use a lexicon (e.g. Vosse, 1994) or a corpus (e.g. Hollink et al., 2003) as a resource for the identification of candidate base forms which can form compounds. We will discuss the results of several comparative studies concerning stemming algorithms in the rest of this section.

2003

  • (Hollink et al., 2003) ⇒ Hollink, V., Kamps, J., Monz, C., & de Rijke, M. (2003). “Monolingual Document Retrieval for European Languages.” Information Retrieval.

1994

  • (Vosse, 1994) ⇒ Vosse, T. G. (1994). “The Word Connection.” PhD Thesis, Rijksuniversiteit Leiden, Neslia Paniculata Uitgeverij, Enschede.