2020 PatentDocumentClusteringwithDee
- (Kim, Yoon et al., 2020) ⇒ Jaeyoung Kim, Janghyeok Yoon, Eunjeong Park, and Sungchul Choi. (2020). “Patent Document Clustering with Deep Embeddings.” In: Scientometrics, 123. doi:10.1007/s11192-020-03396-7
Subject Headings: Text Embedding Clustering, Patent Clustering.
Notes
- It proposes a method for automatically clustering patent documents using deep learning techniques.
- It uses a neural embedding approach called Doc2Vec to convert the text of patent abstracts into embedding vectors.
- It then applies a modified deep embedded clustering (DEC) algorithm to cluster the patent embeddings.
- It compares performance to traditional clustering methods like k-means on tf-idf and bag-of-words features.
- It finds that the proposed Doc2Vec + DEC method achieves higher accuracy than the baselines.
- It visualizes the embeddings using t-SNE to show that the DEC optimization process increases within-cluster coupling.
- It attributes the improved performance to strengthened within-cluster similarity and optimized cluster boundaries.
- It highlights the efficiency gains of using negative sampling and KL divergence in DEC over methods like t-SNE.
- It concludes that the deep learning approach shows promise for patent analysis tasks such as clustering.
- It suggests future work on incorporating patent metadata, improving speed for full documents, and data visualization applications.
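
The clustering step described in the notes above can be illustrated with a minimal sketch of the standard DEC objective (Student's t soft assignment, sharpened target distribution, KL divergence loss). This is not the paper's modified DEC variant: the toy 2-D vectors stand in for Doc2Vec embeddings, the centroids are fixed by hand, and all function names and the `alpha` parameter are assumptions made for illustration.

```python
import math

def soft_assign(points, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of point i to centroid j."""
    q = []
    for z in points:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))  # squared distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalize over clusters
    return q

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )

# Hypothetical toy data: four 2-D "embeddings" and two hand-picked centroids.
embeddings = [[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [2.2, 1.9]]
centroids = [[0.0, 0.0], [2.0, 2.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)                      # quantity DEC minimizes
labels = [row.index(max(row)) for row in q]     # hard cluster per document
```

In a full DEC training loop, the embeddings and centroids would be updated by gradient descent on `loss`, with `p` recomputed periodically; the sketch only evaluates one step of that objective.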
Cited By
Quotes
Abstract
The analysis of scientific and technical documents is crucial in the process of establishing science and technology strategies. One popular method for such analysis is for field experts to manually classify each scientific or technical document into one of several predefined technical categories. However, not only is manual classification error-prone and expensive, but it also requires extended efforts to handle frequent data updates. In contrast, machine learning and text mining techniques enable cheaper and faster operations, and can alleviate the burden on human resources. In this paper, we propose a method for extracting embedded feature vectors by applying a neural embedding approach for text features in patent documents and automatically clustering the embedding features by utilizing a deep embedding clustering method.
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2020 PatentDocumentClusteringwithDee | Jaeyoung Kim, Janghyeok Yoon, Eunjeong Park, Sungchul Choi | Patent Document Clustering with Deep Embeddings | 10.1007/s11192-020-03396-7 | 2020 |