Morphological Analysis Task

A Morphological Analysis Task is a Natural Language Processing Task that converts a Surface Form into a Grammatical Form.

AKA: Morpheme Detection Task.
Context:
- Task Input: Surface Form of a Linguistic Expression or Word Sequence.
- Task Output: Grammatical Representation of the surface form (e.g. Morphemes, Grammatical Features, and other Morphological Tags).
- It can be solved by a Morphological Analysis System (that implements Morphological Analysis Algorithms).
- It can perform a Linguistic Analysis of the internal structures of Words and how they can be modified.
- It can range from being a Morpheme-based Morphological Analysis Task, to being a Lexeme-based Morphological Analysis Task, to being a Word-based Morphological Analysis Task.
- It can range from being a Computer-Assisted Morphological Analysis Task, to being a Supervised Morphological Analysis Task, to being an Unsupervised Morphological Analysis Task.
- It can range from being a Morphological Parsing Task, to being a Morphological Rule Analysis Task, to being a Non-concatenative Morphological Analysis Task.
- It can combine the following machine learning tasks:
  - Word Sense Disambiguation (WSD) Task,
  - Text Tagging Task such as POS Tagging Task,
  - Suffix Stripping Task.
Example(s):
Counter-Example(s):
See: Text Syntactic Analysis, Morphological Tag, Morphological Inflection, Morphological Derivation, Part-of-Speech Tagging System, Word Sense Disambiguation, Minimum Description Length, Zipfian Sparsity, Gibbs Sampling, Adaptor Grammar, Non-concatenative Morphology, Allomorphy, Morphophonology, Recurrent Neural Network Language Model, Bio-Morphological Analysis, General Morphological Analysis, Mathematical Morphology.

References

2018

(Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/morphological_analysis Retrieved:2018-5-17.
- Morphological analysis is the analysis of morphology in various fields:
  - Morphological analysis (problem-solving) or general morphological analysis, a method for exploring all possible solutions to a multi-dimensional, non-quantified problem
  - Analysis of morphology (linguistics), the internal structure of words
  - Analysis of morphology (biology), the form and structure of organisms and their specific features
  - Mathematical morphology, a theory and technique for analysis and processing of images and geometrical structures
  - Morphological dictionary, in computational linguistics, a linguistic resource that contains correspondences between surface form and lexical forms of words

2018b

(Malaviya et al., 2018) ⇒ Chaitanya Malaviya, Matthew R. Gormley, and Graham Neubig. (2018). “Neural Factor Graph Models for Cross-lingual Morphological Tagging.” In: arXiv:1805.04570.
- QUOTE: Morphological analysis involves predicting the syntactic traits of a word (e.g. {POS:Noun, Case:Acc, Gender:Fem}). Previous work in morphological tagging improves performance for low-resource languages (LRLs) through cross-lingual training with a high-resource language (HRL) from the same family, but is limited by the strict, often false, assumption that tag sets exactly overlap between the HRL and LRL. …
  … Morphological analysis (Hajič and Hladká (1998), Oflazer and Kuruöz (1994), inter alia) is the task of predicting fine-grained annotations about the syntactic properties of tokens in a language such as part-of-speech, case, or tense. For instance, in Figure 1, the given Portuguese sentence is labeled with the respective morphological tags such as Gender and its label value Masculine.

Figure 1: Morphological tags for a UD sentence in Portuguese and a translation in Spanish
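A morphological tag of the kind quoted above, such as {POS:Noun, Case:Acc, Gender:Fem}, can be treated as a set of attribute/value pairs rather than one opaque label. The following is a minimal sketch, assuming the tag is written in the `Name=Value|Name=Value` style used by the Universal Dependencies FEATS field; the function name `parse_feats` and the example strings are illustrative and not taken from the paper.

```python
def parse_feats(feats):
    """Parse a 'Name=Value|Name=Value' feature string into a dict.

    "_" (the CoNLL-U marker for an empty feature set) and "" give {}.
    """
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Illustrative input, not a tag copied from the cited figure.
tag = parse_feats("Case=Acc|Gender=Fem")
print(tag["Gender"])
```

Keeping each attribute separate is what allows two languages to share some attributes (for instance Gender) without requiring their complete tag sets to coincide.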

2017

(Goldsmith et al., 2017) ⇒ John A. Goldsmith, Jackson L. Lee, and Aris Xanthos. (2017). “Computational Learning of Morphology.” In: Annual Review of Linguistics Journal, 3. doi:10.1146/annurev-linguistics-011516-034017
- QUOTE: A natural way to evaluate morphological analysis is to treat each position between letters (phonemes) as a site of a possible morpheme break; if we have a gold standard created by a human with an indication of the true segmentation, then we can evaluate which of the predicted breaks are true and which false, and we can do the same for position for which breaks were not predicted. An alternative approach is to evaluate the quality of a morphological learner’s output on the basis of how much that analysis improves the results of a larger system in which it is included.
  (...)
  Most of the more successful work is based fundamentally on the metaphorical understanding that grammar learning consists of a search through grammar space, typically one small step at a time. That is, we can imagine the specification of a grammar as locating it as a point in a space of very high dimensionality, and the task of finding the correct grammar is conceived of as one of traveling through that space. Methods differ as to where in grammar space the search should start: some assume that we start in a random location, while other methods allow one to start at a grammar that is reasonably close to the final solution. In this section we will briefly describe three approaches that have been used in this literature, Minimum Description Length (MDL) analysis, Gibbs sampling, and adaptor grammars.
  All of these approaches have been developed in the context of probabilistic models, and involve different aspects of a search algorithm through the space of possible grammars (here, morphologies) to find one or more grammars that score high on a test based on probability. Probability assigned to training data is used as a way to quantify the notion of “goodness of fit”, in the sense that the higher the probability is that a grammar assigns to a set of data, the better the goodness of fit. The three approaches are not, strictly speaking, alternatives; one could adopt any subset of the three in implementing a system.
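The boundary-based evaluation described at the start of this quote can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: segmentations are written with "-" between morphemes, the comparison is summarised as precision and recall (one common choice, not named in the quote), and an empty set of breaks is scored as 1.0 by convention.

```python
def break_positions(segmented, sep="-"):
    """Return the letter offsets at which a segmentation places a break."""
    positions = set()
    offset = 0
    # Every morph except the last one is followed by a break.
    for morph in segmented.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_scores(gold, predicted, sep="-"):
    """Compare predicted breaks with gold breaks; return (precision, recall)."""
    gold_breaks = break_positions(gold, sep)
    pred_breaks = break_positions(predicted, sep)
    true_pos = len(gold_breaks & pred_breaks)
    # Assumed convention: no predicted (or no gold) breaks scores 1.0.
    precision = true_pos / len(pred_breaks) if pred_breaks else 1.0
    recall = true_pos / len(gold_breaks) if gold_breaks else 1.0
    return precision, recall


# Gold breaks fall after letters 2 and 7; the prediction has 2 and 6.
print(boundary_scores("un-break-able", "un-brea-kable"))  # (0.5, 0.5)
```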

2008a

(Gasser, 2008) ⇒ Michael Gasser. (2008). “Morphological Analysis and Generation in Computer-Assisted Teaching of Indigenous Languages.” School of Informatics, Indiana University.
- QUOTE: Morphological analysis:
  - Converts a surface form to a lexical/grammatical form;
  - A surface form is analyzed into its constituent morphemes:
    - kinawilo → k-in-aw-il-o
  - A surface form is analyzed into a representation of its grammatical features:
    - kinawilo →
      [root=‘il’,
      abs=[prs=1,num=sing],
      erg=[prs=2,num=sing,-form],
      tam=incmpl]
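The two output representations in this quote (a morpheme segmentation and a grammatical feature structure) can be held in ordinary nested data structures. The sketch below is a hypothetical one-entry lookup table, not Gasser's system; reading "-form" as a boolean `form: False` is an assumption made only for this illustration.

```python
# Hypothetical lexicon with a single stored analysis. A real analyzer would
# derive analyses from rules or a finite-state transducer, not store them.
ANALYSES = {
    "kinawilo": {
        "morphemes": ["k", "in", "aw", "il", "o"],
        "features": {
            "root": "il",
            "abs": {"prs": 1, "num": "sing"},
            # "-form" in the source is encoded here as False (assumption).
            "erg": {"prs": 2, "num": "sing", "form": False},
            "tam": "incmpl",
        },
    }
}


def analyze(surface):
    """Return the stored analysis for a surface form, or None if unknown."""
    return ANALYSES.get(surface)


def segment(surface):
    """Return the hyphen-separated morpheme segmentation, or None if unknown."""
    entry = analyze(surface)
    return "-".join(entry["morphemes"]) if entry else None


print(segment("kinawilo"))
print(analyze("kinawilo")["features"]["tam"])
```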

2008b

(Saranya, 2008) ⇒ S. K. Saranya. (2008). “Morphological Analyzer for Malayalam Verbs.” In: M. Tech Thesis, Amrita School of Engineering, Coimbatore.
- QUOTE: Morphological Analysis: Individual words are analyzed into their components and nonword tokens such as punctuation are separated from the words (...)
  Suppose we have an English interface to an operating system and the following sentence is typed: I want to print Bill’s .init file. Morphological analysis must do the following things:
  - Pull apart the word “Bill’s” into proper noun “Bill” and the possessive suffix “’s”.
  - Recognize the sequence “.init” as a file extension that is functioning as an adjective in the sentence.

  This process will usually assign syntactic categories to all the words in the sentence. Consider the word “prints”. This word is either a plural noun or a third person singular verb (he prints) (...)

  Morphological analyzer and morphological generator are two essential and basic tools for building any language processing application. Morphological Analysis is the process of providing grammatical information of a word given its suffix. Morphological analyzer is a computer program which takes a word as input and produces its grammatical structure as output. A morphological analyzer will return its root/stem word along with its grammatical information depending upon its word category. For nouns it will provide gender, number, and case information and for verbs, it will be tense, aspects, and modularity (...)

  Various NLP research groups have developed different methods and algorithm for morphological analysis. Some of the algorithms are language dependent and some of them are language independent. A brief survey of various methods involved in Morphological Analysis includes the following: (...)
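The “Bill’s” and “prints” examples in this quote can be illustrated with a toy suffix-based analyzer. This is a minimal sketch under assumptions, not the method of the thesis: the lexicon, the category names, and the function `analyze_word` are all hypothetical, and only the possessive suffix and the suffix "s" are handled.

```python
# Hypothetical toy lexicon mapping stems to their possible word categories.
LEXICON = {
    "Bill": {"proper noun"},
    "print": {"noun", "verb"},
}


def analyze_word(word):
    """Return every (stem, category, feature) analysis the toy rules allow."""
    analyses = []
    # Possessive suffix: accept the straight and the typographic apostrophe.
    for suffix in ("'s", "\u2019s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in LEXICON:
            for category in sorted(LEXICON[stem]):
                if category.endswith("noun"):
                    analyses.append((stem, category, "possessive"))
            return analyses
    # Uninflected form found directly in the lexicon.
    for category in sorted(LEXICON.get(word, ())):
        analyses.append((word, category, "base"))
    # Suffix "s": plural for nouns, third person singular for verbs.
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        if "noun" in LEXICON[stem]:
            analyses.append((stem, "noun", "plural"))
        if "verb" in LEXICON[stem]:
            analyses.append((stem, "verb", "third person singular"))
    return analyses


print(analyze_word("Bill\u2019s"))
print(analyze_word("prints"))
```

The second call returns two analyses, which is the ambiguity the quote points out: without sentence context the analyzer cannot choose between the noun and the verb reading.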

2004

(Diab et al., 2004) ⇒ Mona Diab, Kadri Hacioglu, and Daniel Jurafsky. (2004). “Automatic Tagging of Arabic Text: From Raw Text to Base Phrase Chunks.” In: Proceedings of HLT-NAACL 2004: Short Papers. ISBN:1-932432-24-8
- QUOTE: Morphological analysis may be characterized as the process of segmenting a surface word form into its component derivational and inflectional morphemes.

2001

(Goldsmith, 2001) ⇒ John Goldsmith. (2001). “Unsupervised Learning of the Morphology of a Natural Language.” In: Computational Linguistics Journal, 27(2). doi:10.1162/089120101750300490
- QUOTE: The central task of morphological analysis is the segmentation of words into the components that form the word by the operation of concatenation. While that view is not free of controversy, it remains the traditional conception of morphology, and the one that we shall employ here ^[1]. Issues of interface with phonology, traditionally known as morphophonology, and with syntax are not directly addressed. While some of the discussion is relevant to the unrestricted set of languages, some of the assumptions made in the implementation restrict the useful application of the algorithms to languages in which the average number of affixes per word is less than what is found in such languages as Finnish, Hungarian, and Swahili, and we restrict our testing in the present report to more widely studied European languages. Our general goal, however, is the treatment of unrestricted natural languages.

  [1] Sylvain Neuvel has recently produced an interesting computational implementation of a theory of morphology that does not have a place for morphemes, as described at http://www.neuvel.net. It is well established that nonconcatenative morphology is found in some scattered language families, notably Semitic and Penutian. African tone languages require simultaneous morphological analyses of the tonal and the segmental material.

1998

(Hajič & Hladká, 1998) ⇒ Jan Hajič, and Barbora Hladká. (1998). “Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset.” In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1. doi:10.3115/980845.980927
- QUOTE: Given the nature of inflectional languages, which can generate many (sometimes thousands of) forms for a given lemma (or “dictionary entry”), it is necessary to employ morphological analysis before the tagging proper. In Czech, there are as many as 5 different lemmas (not counting underlying derivations nor word senses) and up to 108 different tags for an input word form.

1994

(Oflazer & Kuruoz, 1994) ⇒ Kemal Oflazer, and Ilker Kuruoz. (1994). “Tagging and Morphological Disambiguation of Turkish Text.” In: Proceedings of the 4th Conference on Applied Natural Language (ANLC 1994). doi:10.3115/974358.974391
- QUOTE: Morphological analysis does not have access to syntactic context, so when the morphological structure of a lexical form has several distinct analyses, it is not possible to disambiguate such cases except maybe by using root usage frequencies. For disambiguation one may have to use information provided by sentential position and the local morphosyntactic context.
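The context-free fallback mentioned in this quote, choosing among competing analyses by root usage frequency, can be sketched as below. The function name, the English example, and the counts are hypothetical and are not drawn from the paper; as the quote notes, reliable disambiguation generally needs sentential position and local morphosyntactic context, which this sketch ignores.

```python
def pick_by_root_frequency(analyses, root_counts):
    """Return the analysis whose root has the highest count.

    Ties keep the earliest analysis; an empty list gives None.
    """
    if not analyses:
        return None
    return max(analyses, key=lambda a: root_counts.get(a["root"], 0))


# Hypothetical analyses of English "leaves" with made-up root counts.
analyses = [
    {"root": "leaf", "category": "noun", "features": "plural"},
    {"root": "leave", "category": "verb", "features": "third person singular"},
]
root_counts = {"leaf": 120, "leave": 900}

best = pick_by_root_frequency(analyses, root_counts)
print(best["root"], best["category"])
```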
