2006 BroadCoverageSenseDisambigAndIEwithSST

(Ciaramita & Altun, 2006) ⇒ Massimiliano Ciaramita, Yasemin Altun. (2006). “Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger.” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2006).

Subject Headings: SuperSenseTagger, Semcor Corpus, Hidden Markov Model.

Notes

Cited By

~37 http://scholar.google.com/scholar?cites=6151778341530463394

Quotes

Abstract

In this paper we approach word sense disambiguation and information extraction as a unified tagging problem. The task consists of annotating text with the tagset defined by the 41 Wordnet supersense classes for nouns and verbs. Since the tagset is directly related to Wordnet synsets, the tagger returns partial word sense disambiguation. Furthermore, since the noun tags include the standard named entity detection classes – person, location, organization, time, etc. – the tagger, as a by-product, returns extended named entity information. We cast the problem of supersense tagging as a sequential labeling task and investigate it empirically with a discriminatively-trained Hidden Markov Model. Experimental evaluation on the main sense-annotated datasets available, i.e., Semcor and Senseval, shows considerable improvements over the best known “first-sense” baseline.

1 Introduction

…
This paper presents a novel approach to broad-coverage information extraction and word sense disambiguation. Our goal is to simplify the disambiguation task, for both nouns and verbs, to a level at which it can be approached as any other tagging problem, and can be solved with state of the art methods. As a by-product, this task includes and extends NER. We define a tagset based on Wordnet’s lexicographers classes, or supersenses (Ciaramita and Johnson, 2003), cf. Table 1. The size of the supersense tagset allows us to adopt a structured learning approach, which takes local dependencies between labels into account. To this extent, we cast the supersense tagging problem as a sequence labeling task and train a discriminative Hidden Markov Model (HMM), based on that of Collins (2002), on the manually annotated Semcor corpus (Miller et al., 1993). In two experiments we evaluate the accuracy of the tagger on the Semcor corpus itself, and on the English “all words” Senseval 3 shared task data (Snyder and Palmer, 2004). The model outperforms remarkably the best known baseline, the first sense heuristic – to the best of our knowledge, for the first time on the most realistic “all words” evaluation setting.
Table 1. Nouns and verbs supersense labels, and short description (from the Wordnet documentation).
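
The discriminative training mentioned above can be illustrated with a small perceptron-style sketch in the spirit of Collins (2002). This is a minimal sketch under stated assumptions, not the authors' implementation: the feature templates (`emit`, `trans`), the brute-force search in `predict`, the label subset, and the toy data are all hypothetical, and refinements such as weight averaging are omitted.

```python
from collections import Counter
from itertools import product

def features(tokens, labels):
    """Count (token, label) and (previous label, label) features for one labeling."""
    feats = Counter()
    prev = "<s>"  # hypothetical start-of-sentence marker
    for tok, lab in zip(tokens, labels):
        feats[("emit", tok, lab)] += 1
        feats[("trans", prev, lab)] += 1
        prev = lab
    return feats

def predict(weights, tokens, label_set):
    """Brute-force argmax over all labelings (toy sizes only; a real tagger would use Viterbi)."""
    def score(labels):
        return sum(weights[f] * c for f, c in features(tokens, labels).items())
    return max(product(label_set, repeat=len(tokens)), key=score)

def train(data, label_set, epochs=5):
    """On each mistake, add the gold features and subtract the predicted ones."""
    weights = Counter()
    for _ in range(epochs):
        for tokens, gold in data:
            guess = predict(weights, tokens, label_set)
            if tuple(gold) != guess:
                weights.update(features(tokens, gold))
                weights.subtract(features(tokens, guess))
    return weights

# Toy data (assumption): a tiny label subset and two hand-made sentences.
LABELS = ("O", "n.person", "v.motion")
DATA = [
    (["guests", "stood", "up"], ["n.person", "v.motion", "v.motion"]),
    (["the", "guests"], ["O", "n.person"]),
]

weights = train(DATA, LABELS)
prediction = predict(weights, ["guests", "stood", "up"], LABELS)
```

Because the update compares whole label sequences, transition features such as `("trans", "n.person", "v.motion")` are learned jointly with the word-level features.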

2 Supersense tagset

Supersense tagging is inspired by similar considerations, but in a domain-independent setting; e.g., verb supersenses can label semantic interactions between nominal concepts. The following sentence (Example 1), extracted from the data – further described in Section 5.1 – shows the information captured by the supersense tagset:
- (1) Clara Harris_n.person, one of the guests_n.person in the box_n.artifact, stood up_v.motion and demanded_v.communication water_n.substance.
As Example 1 shows, there is more information that can be extracted from a sentence than just the names; e.g., the fact that “Clara Harris” and the following “guests” are both tagged as “person” might suggest some sort of co-referentiality, while the coordination of verbs of motion and communication, as in “stood up and demanded”, might be useful for language modeling purposes. In such a setting, structured learning methods, e.g., sequential, can help tagging by taking the senses of the neighboring words into account.
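
To make the annotation in Example 1 concrete, the sketch below turns its labeled phrases into one label per token. The spans are transcribed by hand from the example. The B-/I-/O prefix scheme is an assumption here: it is a common convention for multiword units in sequence labeling, and this excerpt does not state which encoding the authors used. The names `EXAMPLE_1` and `to_bio` are hypothetical.

```python
# Hand-transcribed spans from Example 1: (phrase, supersense or None).
EXAMPLE_1 = [
    ("Clara Harris", "n.person"),
    (",", None),
    ("one of the", None),
    ("guests", "n.person"),
    ("in the", None),
    ("box", "n.artifact"),
    (",", None),
    ("stood up", "v.motion"),
    ("and", None),
    ("demanded", "v.communication"),
    ("water", "n.substance"),
    (".", None),
]

def to_bio(spans):
    """Expand labeled phrases into per-token (token, label) pairs.

    Assumed encoding: B- marks the first token of a labeled phrase,
    I- marks its remaining tokens, and O marks unlabeled tokens.
    """
    pairs = []
    for phrase, label in spans:
        for i, token in enumerate(phrase.split()):
            if label is None:
                pairs.append((token, "O"))
            else:
                prefix = "B" if i == 0 else "I"
                pairs.append((token, f"{prefix}-{label}"))
    return pairs

tagged = to_bio(EXAMPLE_1)
for token, label in tagged:
    print(f"{token}\t{label}")
```

With this encoding, multiword units such as “Clara Harris” and “stood up” become two consecutive tokens sharing one supersense, which is the form a sequence tagger consumes.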

4 Sequence Tagging

We take a sequence labeling approach to learning a model for supersense tagging. Our goal is to learn a function from input vectors, the observations from labeled data, to response variables, the supersense labels. POS tagging, shallow parsing, NP-chunking and NER are all examples of sequence labeling tasks in which performance can be significantly improved by optimizing the choice of labeling over whole sequences of words, rather than individual words.
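
The difference between choosing labels over whole sequences and choosing them word by word can be sketched with a small Viterbi decoder. The scores below are invented for illustration and are not taken from the paper; `EMIT`, `TRANS`, and the three-label set are hypothetical.

```python
def viterbi(tokens, labels, emit, trans):
    """Return the label sequence with the highest summed emission and transition score."""
    # best[i][y]: best score of any labeling of tokens[:i + 1] that ends in label y
    best = [{y: emit(tokens[0], y) for y in labels}]
    back = []  # back[i][y]: best previous label when token i + 1 gets label y
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans(p, y))
            scores[y] = best[-1][prev] + trans(prev, y) + emit(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Recover the best path by following back-pointers from the best final label.
    y = max(labels, key=lambda l: best[-1][l])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]

# Toy scores (assumption): "up" alone slightly prefers O, but a
# v.motion -> v.motion transition is rewarded.
LABELS = ["O", "v.motion", "n.artifact"]
EMIT = {("stood", "v.motion"): 2.0, ("up", "O"): 1.0, ("up", "v.motion"): 0.8}
TRANS = {("v.motion", "v.motion"): 0.5}

def emit(token, label):
    return EMIT.get((token, label), 0.0)

def trans(prev_label, label):
    return TRANS.get((prev_label, label), 0.0)

tokens = ["stood", "up"]
local = [max(LABELS, key=lambda y: emit(t, y)) for t in tokens]
best_path = viterbi(tokens, LABELS, emit, trans)
print(local)      # per-word choice
print(best_path)  # whole-sequence choice
```

In this toy setting the per-word choice labels “up” as O, while the sequence-level choice keeps it inside the v.motion unit because the transition score outweighs the small local preference.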

6 Conclusions

In this paper we presented a novel approach to broad-coverage word sense disambiguation and information extraction. We defined a tagset based on Wordnet supersenses, a much simpler and general semantic model than Wordnet which, however, preserves significant polysemy information and includes standard named entity recognition categories. We showed that in this framework it is possible to perform accurate broad-coverage tagging with state of the art sequence learning methods. The tagger considerably outperformed the most competitive baseline on both Semcor and Senseval data. To the best of our knowledge the results on Senseval data provide the first convincing evidence of the possibility of improving by considerable amounts over the first sense baseline.
We believe both the tagset and the structured learning approach contribute to these results. The simplified representation obviously helps by reducing the number of possible senses for each word (cf. Table 3). Interestingly, the relative improvement in performance is not as large as the relative reduction in polysemy. This indicates that sense granularity is only one of the problems in WSD. More needs to be understood concerning sources of information, and processes, that affect word sense selection in context. As far as the tagger is concerned, we applied the simplest feature representation; more sophisticated features can be used, e.g., based on kernels, which might contribute significantly by allowing complex feature combinations. These results also suggest new directions of research within this model. In particular, the labels occurring in each sequence tend to coincide with predicates (verbs) and arguments (nouns and named entities). A sequential dependency model might not be the most accurate at capturing the grammatical dependencies between these elements. Other conditional models, e.g., designed on head to head, or similar, dependencies could prove more appropriate.
Another interesting issue is the granularity of the tagset. Supersenses seem more practical than synsets for investigating the impact of broad-coverage semantic tagging, but they define a very simplistic ontological model. A natural evolution of this kind of approach might be one which starts by defining a semantic model at an intermediate level of abstraction (cf. Ciaramita et al., 2005).


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2006 BroadCoverageSenseDisambigAndIEwithSST	Massimiliano Ciaramita Yasemin Altun			Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger		Proceedings of the Conference on Empirical Methods in Natural Language Processing	http://acl.ldc.upenn.edu/W/W06/W06-1670.pdf			2006