Lexical Chaining Task
A Lexical Chaining Task is a Lexical Analysis Task that requires the identification of the Lexical Chains in a Text.
- Context:
- It can be solved by a Lexical Chaining System that implements a Lexical Chaining Algorithm.
- It can range from being an Unsupervised Lexical Chaining Task to being a Supervised Lexical Chaining Task.
- Example(s):
- …
- Counter-Example(s):
- See: Lexical Chain, Natural Language Processing, Lexicon, Morphological Analysis, Clustering Algorithm, Lexical Cohesion, English Language Writing, Anaphora Resolution.
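A minimal sketch of an unsupervised lexical chaining algorithm is shown below. The greedy strategy (attach each token to the first chain containing a related word, else start a new chain) is one common formulation; the toy `related` table here is an assumption that stands in for a real lexical resource such as WordNet.

```python
# Toy relatedness table: a stand-in for a lexical resource such as WordNet,
# which would supply synonym/hypernym relations in a real system.
TOY_RELATED = {
    frozenset({"apple", "fruit"}),
    frozenset({"fruit", "banana"}),
    frozenset({"car", "vehicle"}),
}

def related(w1: str, w2: str) -> bool:
    """Toy relatedness test: identical words or a listed relation."""
    return w1 == w2 or frozenset({w1, w2}) in TOY_RELATED

def build_chains(tokens):
    """Greedily attach each token to the first chain holding a related word."""
    chains = []
    for tok in tokens:
        for chain in chains:
            if any(related(tok, w) for w in chain):
                chain.append(tok)
                break
        else:
            chains.append([tok])  # no related chain found: start a new one
    return chains

chains = build_chains(["apple", "car", "fruit", "banana", "vehicle"])
print(chains)  # [['apple', 'fruit', 'banana'], ['car', 'vehicle']]
```

Note that this greedy variant commits each word immediately; algorithms such as Galley & McKeown (2003) instead delay sense commitment to improve disambiguation.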
References
2015
- (Chakraverty et al., 2015) ⇒ S. Chakraverty, B. Juneja, U. Pandey, and A. Arora (2015, March). "Dual lexical chaining for context based text classification". In: 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA) (pp. 432-439). IEEE. DOI: 10.1109/ICACEA.2015.7164744
- ABSTRACT: Text Classification enhances the accessibility and systematic organization of the vast reserves of data populating the world-wide-web. Despite great strides in the field, the domain of context driven text classification provides fresh opportunities to develop more efficient context oriented techniques with refined metrics. In this paper, we propose a novel approach to categorize text documents using a dual lexical chaining technique. The algorithm first prepares a cohesive category-keyword matrix by feeding category names into the WordNet and Wikipedia ontology, extracting lexically and semantically related keywords from them and then adding to the keywords by employing a keyword enrichment process. Next, the WordNet is referred again to find the degree of lexical cohesiveness between the tokens of a document. Terms that are strongly related are woven together into two separate lexical chains; one for their noun senses and another for their verb senses, that represent the feature set for the document. This segregation enables a better expression of word cohesiveness as concept terms and action terms are treated distinctively. We propose a new metric to calculate the strength of a lexical chain. It includes a statistical part given by Term Frequency-Inverse Document Frequency-Relative Category Frequency (TF-IDF-RCF) which itself is an improvement upon the conventional TF-IDF measure. The chain's contextual strength is determined by the degree of its lexical matching with the category-keyword matrix as well as by the relative positions of its constituent terms. Results indicate the efficacy of our approach. We obtained an average accuracy of 90% on 6 categories derived from the 20 News Group and the Reuters corpora. Lexical chaining has been applied successfully to text summarization. Our results indicate a positive direction towards its usefulness for text classification.
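The abstract does not reproduce the exact TF-IDF-RCF formula, so the sketch below is only an illustrative assumption: the Relative Category Frequency factor is taken here to be the share of a term's corpus occurrences that fall inside the target category, multiplied onto a smoothed TF-IDF score.

```python
import math

# Illustrative assumption only -- not the paper's exact formula:
# RCF(term, category) = occurrences of term in category / occurrences overall.

def tf_idf_rcf(term, doc, docs, category_counts, category):
    tf = doc.count(term) / len(doc)                 # term frequency in doc
    df = sum(1 for d in docs if term in d)          # document frequency
    idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed IDF
    total = sum(counts.get(term, 0) for counts in category_counts.values())
    rcf = category_counts[category].get(term, 0) / total if total else 0.0
    return tf * idf * rcf

docs = [["chocolate", "cake", "chocolate"], ["car", "engine"]]
cats = {"food": {"chocolate": 3, "cake": 1}, "auto": {"car": 2, "engine": 2}}
print(tf_idf_rcf("chocolate", docs[0], docs, cats, "food"))
```

As intended by the RCF factor, a term scores highly only for categories in which it actually concentrates; for "chocolate" the "auto" category scores zero.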
2014
- (Somasundaran, Burstein, & Chodorow, 2014) ⇒ Swapna Somasundaran, Jill Burstein, and Martin Chodorow (2014). "Lexical chaining for measuring discourse coherence quality in test-taker essays". In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (pp. 950-961).
- QUOTE: In this paper we explore how lexical chains can be employed to measure coherence in essays. Specifically, our goal is to investigate how attributes of lexical chains can encode discourse coherence quality, such as adherence to the essay topic, elaboration, usage of varied vocabulary, and sound organization of thoughts and ideas. To do this, we build lexical chains and extract linguistically-motivated features from them. The number of chains and their properties, such as length, density and link strength, can potentially reveal discourse qualities related to focus and elaboration. In addition, features that capture the interactions between chains and explicit discourse cues, such as transition words, can show if the cohesive elements in text have been organized in a coherent fashion.
The main contributions of this paper are as follows: We use lexical chaining features to train a discourse coherence classifier on annotated essays from six different essay-writing tasks which differ in essay genre and/or test-taker population. We then perform experiments to measure the effect of the features when they are used alone and when they are combined with state-of-the-art features to classify the coherence quality of essays. Our results indicate that lexical chaining features yield better results than discourse features previously explored for this task and that the best performing feature combinations contain lexical chaining features. We also show that lexical chaining features can improve system performance across multiple genres of writing and populations. Our efforts result in the creation of a higher performing state-of-the-art feature set for measuring coherence in test-taker writing.
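The chain attributes mentioned above (length, density, link strength) can be extracted as simple features. The definitions below are assumed for illustration only (the quote does not give the paper's exact definitions): a chain is a list of (token, position) pairs from a document of known length.

```python
# Assumed illustrative definitions, not the paper's exact feature set:
# length   = number of chain members
# span     = distance covered in the document
# density  = members per token spanned
# coverage = share of the document's tokens that belong to the chain

def chain_features(chain, doc_len):
    positions = [pos for _, pos in chain]
    length = len(chain)
    span = max(positions) - min(positions) + 1
    return {
        "length": length,
        "span": span,
        "density": length / span,
        "coverage": length / doc_len,
    }

feats = chain_features([("mud_pie", 3), ("chocolate", 8), ("mud_pie", 10)], 20)
print(feats)
```

Features like these, computed per chain and aggregated per essay, are the kind of input a coherence classifier such as the one described above would consume.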
2011a
- (Carthy, 2011) ⇒ http://www.csi.ucd.ie/staff/jcarthy/home/Lex.html
- QUOTE: Any given document, and in particular a news story, will typically have a central theme or focus. Computing the lexical chains in a document is one technique that can be used to identify the central theme of a document. This in turn leads to the identification of the key section(s) of the document which can then be used for summarisation purposes. By developing the theory of lexical chaining we postulate that it will be possible to build more sophisticated summarization techniques than the simple keyword-based ones that dominate in current commercial systems (...)
The notion of lexical chaining derives from work in the area of textual cohesion in linguistics (Halliday and Hasan 1976). The linguistics term text is used to refer to any passage, spoken or written, that forms a unified whole. This unity or cohesion may be due, for example, to an anaphoric reference which provides cohesion between sentences. Cohesion is brought about by the referring item and the item it refers to. For example, in the sentences "John ate the apple. He thought it was delicious." the word it in the second sentence refers back to apple in the first sentence, and the word he refers back to John. There are a number of forms of cohesion such as reference, substitution, ellipsis, conjunction and lexical cohesion which is of primary interest in this research. Where the cohesive elements occur over a number of sentences a cohesive chain is formed. For example: John had mud pie for dessert. Mud pie is made of chocolate. John really enjoyed it. The word it in the third sentence refers back to dessert in the first sentence. In this example it can also be seen that repetition (mud pie in the first and second sentence) also contributes to the cohesion of the text(...)
A lexical chain is a sequence of related words in the text, spanning short (adjacent words or sentences) or long distances (entire text). A chain is independent of the grammatical structure of the text and in effect it is a list of words that captures a portion of the cohesive structure of the text. A lexical chain can provide a context for the resolution of an ambiguous term and enable identification of the concept that the term represents. WordNet is one lexical resource that may be used in the identification of lexical chains.
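The claim above that a chain "can provide a context for the resolution of an ambiguous term" can be sketched with a simplified Lesk-style heuristic: choose the sense whose related words overlap most with the chain. The toy sense inventory below is an assumption for illustration; a real system would draw senses from WordNet synsets.

```python
# Toy sense inventory (an assumption for illustration; a real system would
# query WordNet). Each sense lists words related to it.
SENSES = {
    "bank": {
        "bank/finance": {"money", "loan", "deposit"},
        "bank/river": {"river", "water", "shore"},
    },
}

def disambiguate(word, chain):
    """Pick the sense with the largest word overlap with the lexical chain."""
    chain_words = set(chain)
    return max(SENSES[word],
               key=lambda sense: len(SENSES[word][sense] & chain_words))

sense = disambiguate("bank", ["river", "water", "fishing"])
print(sense)  # bank/river
```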
2011b
- (Ghose, 2011) ⇒ Abhishek Ghose (2011). "Supervised Lexical Chaining". Master of Science Thesis. Department of Computer Science and Engineering, Indian Institute of Technology, Madras.
- QUOTE: Lexical chaining is a method of grouping semantically related words in a document. Groups thus obtained are known as lexical chains. Lexical chains provide a rich representation of text, and have been used in various tasks like discourse analysis, summarization, and correction of malapropisms, amongst many others, with reasonable success. However, despite the general applicability of the method, its use is limited by the fact that chaining algorithms often group weakly related or unrelated words together. The large amount of time required for mining chains from a document also makes it unsuitable for certain tasks. In our research, we look at applying supervised learning methods to address these drawbacks.
We propose two supervised algorithms as viable alternatives to classical algorithms. These algorithms rely on certain probabilistic properties of usage of words in text. We empirically establish the relevance of these properties to rapid construction of high quality lexical chains through experiments we have performed. Using lexical chains formed over sense tagged documents as training data, along with a knowledge of these properties, our algorithms are shown to be capable of reliably constructing chains on a test set of documents.
Although we believe that exploring supervised learning for chaining is a worthy investigation in its own right, we provide certain encouraging results in defence of the approach. We compare our algorithms to a classical chaining algorithm and report a 44% improvement in quality and a 55 times improvement in speed. Our experiments were performed on the SemCor 3.0 dataset.
2003
- (Galley and McKeown, 2003) ⇒ Michel Galley, and Kathleen R. McKeown. (2003). “Improving Word Sense Disambiguation in Lexical Chaining.” In: Proceedings of IJCAI (2003). Poster paper.
- QUOTE: Lexical chaining is the process of connecting semantically related words, creating a set of chains that represent different threads of cohesion through the text. This intermediate representation of text has been used in many natural language processing applications, including automatic summarization (...)
1991
- (Morris & Hirst, 1991) ⇒ Jane Morris, and Graeme Hirst (1991). "Lexical cohesion computed by thesaural relations as an indicator of the structure of text". Computational linguistics, 17(1), 21-48.
- ABSTRACT: In text, lexical cohesion is the result of chains of related words that contribute to the continuity of lexical meaning. These lexical chains are a direct result of units of text being "about the same thing," and finding text structure involves finding units of text that are about the same thing. Hence, computing the chains is useful, since they will have a correspondence to the structure of the text. Determining the structure of text is an essential step in determining the deep meaning of the text. In this paper, a thesaurus is used as the major knowledge base for computing lexical chains. Correspondences between lexical chains and structural elements are shown to exist. Since the lexical chains are computable, and exist in non-domain-specific text, they provide a valuable indicator of text structure. The lexical chains also provide a semantic context for interpreting words, concepts, and sentences.
1976
- (Halliday & Hasan, 1976) ⇒ Michael A. K. Halliday, and Ruqaiya Hasan. (1976). “Cohesion in English." London: Longman.