2015 MiningQualityPhrasesfromMassive
- (Liu, Shang et al., 2015) ⇒ Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han. (2015). “Mining Quality Phrases from Massive Text Corpora.” In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ISBN:978-1-4503-2758-9 doi:10.1145/2723372.2751523
Subject Headings: SegPhrase Segmenter, AutoPhrase, Text Segmentation, Phrasal Chunking.
Notes
Cited By
- http://scholar.google.com/scholar?q=%222015%22+Mining+Quality+Phrases+from+Massive+Text+Corpora
- http://dl.acm.org/citation.cfm?id=2723372.2751523&preflayout=flat#citedby
2018
- (Shang et al., 2018) ⇒ Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han. (2018). “Automated Phrase Mining from Massive Text Corpora.” In: IEEE Transactions on Knowledge and Data Engineering Journal, PP(99). doi:10.1109/TKDE.2018.2812203
Quotes
Abstract
Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training but the quality of phrases so generated is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method.
References
Page | Author | title | doi | year
---|---|---|---|---
2015 MiningQualityPhrasesfromMassive | Chi Wang, Jialu Liu, Xiang Ren, Jingbo Shang, Jiawei Han | Mining Quality Phrases from Massive Text Corpora | 10.1145/2723372.2751523 | 2015