Tag Dictionary
A Tag Dictionary is a Dictionary that maps each word to the set of POS tags that may be assigned to it, i.e. a predefined assignment of all possible POS tags to the words in a text (typically the test data); a minimal sketch is given below.
- AKA: Tagger Dictionary, POS Tagger Dictionary.
- Context:
- It can be used by a Part-of-Speech Tagger.
- It can range from being a Statistical Parser Tag Dictionary to being a Context-based Parser Tag Dictionary.
- Example(s):
- Counter-Example(s):
- See: POS Tagging Task, NLP Tagging Task, Hidden Markov Model, Unsupervised POS Tagging.
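In its simplest form, a tag dictionary is a map from word forms to their admissible tag sets, plus one default tag set for words that are not listed (cf. Moore, 2015 below). The following minimal Python sketch is illustrative; the entries, the helper name, and the open-class default set are assumptions, using Penn Treebank tag names:

```python
# Minimal sketch of a tag dictionary: a map from word forms to their
# admissible POS tags, plus one default tag set for unlisted words.
# The entries below are illustrative, using Penn Treebank tag names.
TAG_DICT = {
    "the":  {"DT"},
    "can":  {"MD", "NN", "VB"},   # modal, noun ("a can"), verb ("to can")
    "runs": {"VBZ", "NNS"},
}

# Open-class tags hypothesised for any word the dictionary does not list.
DEFAULT_TAGS = {"NN", "NNS", "NNP", "VB", "VBD", "VBG", "JJ", "RB"}

def possible_tags(word):
    """Return the set of tags a tagger needs to consider for `word`."""
    return TAG_DICT.get(word.lower(), DEFAULT_TAGS)

print(possible_tags("can"))   # {'MD', 'NN', 'VB'}
print(possible_tags("zorb"))  # falls back to the open-class default set
```

A tagger consults such a lookup to prune its hypothesis space before inference, which is what makes tag dictionaries a speed-up device as well as a source of lexical knowledge.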
References
2016
- (LanguageTool Wiki, 2016) ⇒ http://wiki.languagetool.org/developing-a-tagger-dictionary Last edited: 25 Apr 2016. Retrieved: 2020-03-22.
- QUOTE: A tagger, or POS (part-of-speech) tagger is used to tag, or annotate, words with their respective part-of-speech information (see Wikipedia for more background information). Actually, POS tags usually convey more information, such as morphological information (plural / singular etc.).
Most taggers in LanguageTool are dictionary-based because statistical or context-oriented taggers are trained to ignore occasional grammar errors. For this reason, their output will be correct even if the input was in fact incorrect. While this is a desired behavior for most natural language processing applications, in grammar checking it is simply wrong.
We haven't however tried to train any statistical taggers on incorrect input to correct their output. This remains to be tested by someone who has enough time. However, we did test lexicon-based taggers/lemmatisers. For most languages, we use finite-state automata encoding for them. This means that the plain text files are prepared with a tool in the morfologik-stemming library. The resulting binary files are then used at runtime from Java code by morfologik-stemming library which is bundled with LanguageTool.
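As a hedged illustration of the plain-text source files mentioned above, the sketch below assumes a three-column tab-separated layout (inflected form, lemma, POS/morphology tag); the column order and file handling are assumptions for illustration, not the morfologik-stemming API, which operates on the compiled finite-state binary instead:

```python
# Hedged sketch: load a plain-text tagger dictionary of the kind the
# LanguageTool wiki describes, before it is compiled into a binary FSA.
# The three-column layout (inflected form, lemma, POS/morphology tag)
# is an assumption here.
from collections import defaultdict

def load_tagger_dictionary(path):
    """Map each surface form to its (lemma, tag) analyses."""
    analyses = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            form, lemma, tag = line.split("\t")
            analyses[form].append((lemma, tag))
    return analyses

# Example line: "running\trun\tVBG" would yield
# analyses["running"] == [("run", "VBG")]
```

The real pipeline encodes these entries as a finite-state automaton for compactness and fast lookup; the Python dict here only mirrors the lookup behaviour.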
2015
- (Moore, 2015) ⇒ Robert Moore (2015) "An Improved Tag Dictionary for Faster Part-of-Speech Tagging". In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015). DOI:10.18653/v1/D15-1151
- QUOTE: In this paper, we present a new method of constructing tag dictionaries for part-of-speech (POS) tagging. A tag dictionary is simply a list of words[1] along with a set of possible tags for each word listed, plus one additional set of possible tags for all words not listed. Tag dictionaries are commonly used to speed up POS-tag inference by restricting the tags considered for a particular word to those specified by the dictionary.
- ↑ According to the conventions of the field, POS tags are assigned to all tokens in a tokenized text, including punctuation marks and other non-word tokens. In this paper, all of these will be covered by the term word.
2014
- (Moore, 2014) ⇒ Robert C. Moore (2014). "Fast High-Accuracy Part-of-Speech Tagging by Independent Classifiers". In: Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers (COLING 2014).
- QUOTE: We construct our tag dictionary based on a “bigram” model of the probability $p(t|w)$ of a tag $t$ given a word $w$, estimated from the annotated training data. The probabilities for tags that have never been seen with a given word, as well as all the tag probabilities for unknown words, are estimated by interpolation with a “unigram” distribution over the tags.
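A minimal sketch of this construction, assuming simple linear interpolation between the word-conditioned relative frequencies ("bigram") and the overall tag distribution ("unigram"); the interpolation weight, the probability threshold for admitting a tag, and the corpus format are illustrative, not Moore's exact estimator:

```python
from collections import Counter, defaultdict

def build_tag_dictionary(tagged_corpus, lam=0.9, threshold=0.01):
    """Estimate p(t|w) by interpolating the relative frequency of t given w
    with the overall tag distribution, then keep the tags whose probability
    exceeds `threshold`. `lam` and `threshold` are illustrative values.
    `tagged_corpus` is assumed to be an iterable of (word, tag) pairs."""
    word_tag = defaultdict(Counter)
    tag_counts = Counter()
    for word, tag in tagged_corpus:
        word_tag[word][tag] += 1
        tag_counts[tag] += 1

    total = sum(tag_counts.values())
    unigram = {t: c / total for t, c in tag_counts.items()}

    tag_dict = {}
    for word, counts in word_tag.items():
        n = sum(counts.values())
        tag_dict[word] = {
            t for t in unigram
            if lam * counts.get(t, 0) / n + (1 - lam) * unigram[t] > threshold
        }
    # Unknown words fall back to the unigram distribution alone.
    default_tags = {t for t, p in unigram.items() if p > threshold}
    return tag_dict, default_tags
```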
2001
- (Sarkar, 2001) ⇒ Anoop Sarkar. (2001). “Applying Co-training Methods to Statistical Parsing.” In: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies (NAACL 2001). DOI:10.3115/1073336.1073359
- QUOTE: Early work in combining labeled and unlabeled data for NLP tasks was done in the area of unsupervised part of speech (POS) tagging. (Cutting et al., 1992) reported very high results (96% on the Brown corpus) for unsupervised POS tagging using Hidden Markov Models (HMMs) by exploiting hand-built tag dictionaries and equivalence classes. Tag dictionaries are predefined assignments of all possible POS tags to words in the test data. This impressive result triggered several follow-up studies in which the effect of hand tuning the tag dictionary was quantified as a combination of labeled and unlabeled data. The experiments in (Merialdo, 1994; Elworthy, 1994) showed that only in very specific cases HMMs were effective in combining labeled and unlabeled data. However, (Brill, 1997) showed that aggressively using tag dictionaries extracted from labeled data could be used to bootstrap an unsupervised POS tagger with high accuracy (approx 95% on WSJ data). We exploit this approach of using tag dictionaries in our method as well (see Section 3.2 for more details). It is important to point out that, before attacking the problem of parsing using similar machine learning techniques, we face a representational problem which makes it difficult to define the notion of tag dictionary for a statistical parser.
1996
- (Ratnaparkhi, 1996) ⇒ Adwait Ratnaparkhi (1996). "A Maximum Entropy Model for Part-Of-Speech Tagging". In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 1996).
- QUOTE: This paper presents a statistical model which trains from a corpus annotated with Part-Of-Speech tags and assigns them to previously unseen text with state-of-the-art accuracy (96.6%). The model can be classified as a Maximum Entropy model and simultaneously uses many contextual "features" to predict the POS tag.
1995
- (Brill, 1995) ⇒ Eric Brill. (1995). “Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging.” In: Proceedings of the Third Workshop on Very Large Corpora (VLC@ACL 1995).
- QUOTE: In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate tagger using only a small amount of manually tagged text.
1994a
- (Merialdo, 1994) ⇒ Bernard Merialdo (1994). "Tagging English Text with a Probabilistic Model". In: Computational Linguistics 20(2).
- QUOTE: In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: using text that has been tagged by hand and computing relative frequency counts, using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle. Experiments show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure that is routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.
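The supervised half of this comparison, relative-frequency estimation from hand-tagged text, reduces to counting and normalising; the sketch below uses a first-order (biclass) model rather than Merialdo's triclass model, for brevity:

```python
from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of HMM parameters from tagged text:
    lexical p(w|t) and first-order transition p(t_i|t_{i-1}). Merialdo's
    model is triclass (second-order); this first-order version is a
    simplification for illustration."""
    emit = defaultdict(Counter)    # tag -> word counts
    trans = defaultdict(Counter)   # previous tag -> tag counts
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag

    def normalize(table):
        out = {}
        for k, cnt in table.items():
            total = sum(cnt.values())
            out[k] = {x: c / total for x, c in cnt.items()}
        return out

    return normalize(emit), normalize(trans)
```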
1994b
- (Elworthy, 1994) ⇒ David Elworthy (1994). “Does Baum-Welch Re-estimation Help Taggers?". In: Proceedings of the Fourth Conference on Applied Natural Language Processing (ANLP 1994). DOI:10.3115/974358.974371
- QUOTE: Part-of-speech tagging is the process of assigning grammatical categories to individual words in a corpus. One widely used approach makes use of a statistical technique called a Hidden Markov Model (HMM). The model is defined by two collections of parameters: the transition probabilities, which express the probability that a tag follows the preceding one (or two for a second order model); and the lexical probabilities, giving the probability that a word has a given tag without regard to words on either side of it. To tag a text, the tags with non-zero probability are hypothesised for each word, and the most probable sequence of tags given the sequence of words is determined from the probabilities. Two algorithms are commonly used, known as the Forward-Backward (FB) and Viterbi algorithms. FB assigns a probability to every tag on every word, while Viterbi prunes tags which cannot be chosen because their probability is lower than the ones of competing hypotheses, with a corresponding gain in computational efficiency.
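A minimal Viterbi sketch over such a model, with a tag dictionary supplying the non-zero-probability tags hypothesised for each word; it assumes the `emit`/`trans` tables from the previous sketch, all names are illustrative, and unseen events simply get probability zero (no smoothing):

```python
def viterbi(words, possible_tags, emit, trans, start="<s>"):
    """Most probable tag sequence under an HMM with lexical probabilities
    `emit[tag][word]` and transition probabilities `trans[prev][tag]`.
    `possible_tags(word)` is the tag-dictionary lookup that prunes the
    hypothesis space; unseen events get probability 0 (no smoothing)."""
    # best[i][t] = (probability, backpointer) of the best path ending in t
    best = [{t: (trans.get(start, {}).get(t, 0.0) *
                 emit.get(t, {}).get(words[0], 0.0), None)
             for t in possible_tags(words[0])}]
    for i in range(1, len(words)):
        column = {}
        for t in possible_tags(words[i]):
            p_emit = emit.get(t, {}).get(words[i], 0.0)
            prob, prev = max(
                ((best[i - 1][s][0] * trans.get(s, {}).get(t, 0.0) * p_emit, s)
                 for s in best[i - 1]),
                key=lambda x: x[0])
            column[t] = (prob, prev)
        best.append(column)
    # Trace back from the highest-probability final state.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))
```

Restricting each column to `possible_tags(word)` is exactly the pruning role the tag dictionary plays: the Forward-Backward and Viterbi recursions then only score tag hypotheses the dictionary admits.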
1992
- (Cutting et al., 1992) ⇒ Douglas R. Cutting, Julian Kupiec, Jan O. Pedersen, and Penelope Sibun (1992). “A Practical Part-of-Speech Tagger.” In: Proceedings of the Third Conference on Applied Natural Language Processing (ANLP 1992). DOI:10.3115/974499.974523
- QUOTE: We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.