2014 SemanticbasedMultilingualDocume
- (Romeo et al., 2014) ⇒ Salvatore Romeo, Andrea Tagarelli, and Dino Ienco. (2014). “Semantic-based Multilingual Document Clustering via Tensor Modeling.” In: EMNLP, Conference on Empirical Methods in Natural Language Processing.
Subject Headings: Multilingual Document Clustering.
Notes
Cited By
Quotes
Abstract
A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cope with this issue we propose a new document clustering approach for multilingual corpora that (i) exploits a large-scale multilingual knowledge base, (ii) takes advantage of the multi-topic nature of the text documents, and (iii) employs a tensor-based model to deal with high dimensionality and sparseness. Results have shown the significance of our approach and its better performance w.r.t. classic document clustering approaches, in both a balanced and an unbalanced corpus evaluation.
References
| | Author | title | year |
|---|---|---|---|
| 2014 SemanticbasedMultilingualDocume | Salvatore Romeo, Andrea Tagarelli, Dino Ienco | Semantic-based Multilingual Document Clustering via Tensor Modeling | 2014 |