Keyphrase Extraction Task
A Keyphrase Extraction Task is an Information Extraction (IE) from Text Task that requires the extraction of keyphrases from a text corpus.
- Context:
- Input: a Text Item Set.
- optional: an output (Keyphrase List) size.
- optional: a Controlled Vocabulary.
- Output: a Keyphrase List.
- It can be solved by a Keyphrase Extraction System (that implements a keyphrase extraction algorithm).
- It can range from being a Manual Keyphrase Extraction Task to being an Automatic Keyphrase Extraction Task.
- …
- Example(s):
- Counter-Example(s):
- See: Subject Heading, Taxonomy.
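The input/output contract listed above can be sketched as a single function. This is a minimal illustration under stated assumptions, not a reference implementation: the name `extract_keyphrases`, its parameter names, and the frequency-based ranking of unigrams and bigrams are all hypothetical choices made for the example.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Set


def extract_keyphrases(
    texts: Iterable[str],                              # input: a Text Item Set
    max_size: Optional[int] = None,                    # optional: output size
    controlled_vocabulary: Optional[Set[str]] = None,  # optional: allowed phrases
) -> List[str]:                                        # output: a Keyphrase List
    """Toy baseline (assumption): rank word unigrams and bigrams by frequency."""
    counts: Counter = Counter()
    for text in texts:
        # Lowercase words, keeping internal hyphens ("flood-control").
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        counts.update(words)
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    if controlled_vocabulary is not None:
        # Keep only candidates that belong to the controlled vocabulary.
        allowed = {term.lower() for term in controlled_vocabulary}
        counts = Counter({p: c for p, c in counts.items() if p in allowed})
    ranked = [phrase for phrase, _ in counts.most_common()]
    return ranked if max_size is None else ranked[:max_size]
```

Calling it with a controlled vocabulary restricts the output to terms from that vocabulary, and `max_size` truncates the ranked list. A realistic system would replace the frequency ranking with a proper keyphrase extraction algorithm.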
References
2021
- (Wikipedia - Keyphrase Extraction) ⇒ https://en.wikipedia.org/wiki/Automatic_summarization#Keyphrase_extraction
- QUOTE: The task is the following. You are given a piece of text, such as a journal article, and you must produce a list of keywords or key[phrase]s that capture the primary topics discussed in the text.[1] In the case of research articles, many authors provide manually assigned keywords, but most text lacks pre-existing keyphrases. For example, news articles rarely have keyphrases attached, but it would be useful to be able to automatically do so for a number of applications discussed below.
Consider the example text from a news article: “The Army Corps of Engineers, rushing to meet President Bush's promise to protect New Orleans by the start of the 2006 hurricane season, installed defective flood-control pumps last year despite warnings from its own expert that the equipment would fail during a storm, according to documents obtained by The Associated Press”.
A keyphrase extractor might select "Army Corps of Engineers", "President Bush", "New Orleans", and "defective flood-control pumps" as keyphrases. These are pulled directly from the text. In contrast, an abstractive keyphrase system would somehow internalize the content and generate keyphrases that do not appear in the text, but more closely resemble what a human might produce, such as "political negligence" or "inadequate protection from floods". Abstraction requires a deep understanding of the text, which makes it difficult for a computer system.
Keyphrases have many applications. They can enable document browsing by providing a short summary, improve information retrieval (if documents have keyphrases assigned, a user could search by keyphrase to produce more reliable hits than a full-text search), and be employed in generating index entries for a large text corpus. Depending on the different literature and the definition of key terms, words or phrases, keyword extraction is a highly related theme.
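The extractive/abstractive distinction in the quoted passage can be checked mechanically: an extracted keyphrase occurs verbatim in the source text, while an abstractive one does not. The sketch below is only an illustration. The helper names `normalize` and `split_by_presence` are hypothetical, and case-insensitive whole-word matching is an assumed definition of "appears in the text".

```python
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and layout."""
    return " ".join(text.lower().split())


def split_by_presence(text: str, keyphrases: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (phrases found verbatim in the text, phrases not found)."""
    haystack = normalize(text)
    present: List[str] = []
    absent: List[str] = []
    for phrase in keyphrases:
        # Word boundaries prevent matches that start or end inside a longer word.
        pattern = r"\b" + re.escape(normalize(phrase)) + r"\b"
        (present if re.search(pattern, haystack) else absent).append(phrase)
    return present, absent


NEWS = (
    "The Army Corps of Engineers, rushing to meet President Bush's promise "
    "to protect New Orleans by the start of the 2006 hurricane season, "
    "installed defective flood-control pumps last year despite warnings from "
    "its own expert that the equipment would fail during a storm, according "
    "to documents obtained by The Associated Press"
)

present, absent = split_by_presence(
    NEWS,
    [
        "Army Corps of Engineers",
        "President Bush",
        "New Orleans",
        "defective flood-control pumps",
        "political negligence",
        "inadequate protection from floods",
    ],
)
```

With the phrases from the passage, the four extracted keyphrases land in `present` and the two abstractive ones land in `absent`.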
2014
- (Hasan & Ng, 2014) ⇒ Kazi Saidul Hasan, and Vincent Ng. (2014). “Automatic Keyphrase Extraction: A Survey of the State of the Art.” In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1262-1273.
2010
- (Kim et al., 2010) ⇒ Su Nam Kim, Olena Medelyan, Min-Yen Kan, and Timothy Baldwin. (2010). “Semeval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles.” In: Proceedings of the 5th International Workshop on Semantic Evaluation.
- (Li, Zhou et al., 2010) ⇒ Zhenhui Li, Ding Zhou, Yun-Fang Juan, and Jiawei Han. (2010). “Keyword Extraction for Social Snippets.” In: Proceedings of the 19th International Conference on World Wide Web, (WWW-2010).
2007
2000
- (Turney, 2000) ⇒ Peter D. Turney. (2000). “Learning Algorithms for Keyphrase Extraction.” In: Journal of Information Retrieval, 2(4). doi:10.1023/A:1009976227802
- … Many journals ask their authors to provide a list of keywords for their articles. We call these keyphrases, rather than keywords, because they are often phrases of two or more words, rather than single words. We define a keyphrase list as a short list of phrases (typically five to fifteen noun phrases) that capture the main topics discussed in a given document. This paper is concerned with the automatic extraction of keyphrases from text.
1999
- (Frank et al., 1999) ⇒ Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin, and Craig G. Nevill-Manning. (1999). “Domain-Specific Keyphrase Extraction.” In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI 1999)
- [1] Alrehamy, Hassan H.; Walker, Coral (2018). "SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation". Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing. 650. pp. 222–235. doi:10.1007/978-3-319-66939-7_19. ISBN 978-3-319-66938-0.