2015 RubikKnowledgeGuidedTensorFacto
- (Wang et al., 2015) ⇒ Yichen Wang, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel Kho, You Chen, Bradley A. Malin, and Jimeng Sun. (2015). “Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics.” In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783395
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222015%22+Rubik%3A+Knowledge+Guided+Tensor+Factorization+and+Completion+for+Health+Data+Analytics
- http://dl.acm.org/citation.cfm?id=2783258.2783395&preflayout=flat#citedby
Quotes
Author Keywords
- Computational phenotyping; constraint optimization; data mining; healthcare analytics; tensor analysis
Abstract
Computational phenotyping is the process of converting heterogeneous electronic health records (EHRs) into meaningful clinical concepts. Unsupervised phenotyping methods have the potential to leverage a vast amount of labeled EHR data for phenotype discovery. However, existing unsupervised phenotyping methods do not incorporate current medical knowledge and cannot directly handle missing or noisy data.
We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. Rubik incorporates 1) guidance constraints to align with existing medical knowledge, and 2) pairwise constraints for obtaining distinct, non-overlapping phenotypes. Rubik also has built-in tensor completion that can significantly alleviate the impact of noisy and missing data. We apply the Alternating Direction Method of Multipliers (ADMM) framework to tensor factorization and completion, which can be easily scaled through parallel computing.
We evaluate Rubik on two EHR datasets, one of which contains 647,118 records for 7,744 patients from an outpatient clinic, the other of which is a public dataset containing 1,018,614 CMS claims records for 472,645 patients. Our results show that Rubik can discover more meaningful and distinct phenotypes than the baselines. In particular, by using knowledge guidance constraints, Rubik can also discover sub-phenotypes for several major diseases. Rubik also runs around seven times faster than current state-of-the-art tensor methods. Finally, Rubik is scalable to large datasets containing millions of EHR records.
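As a rough illustration of what non-negative tensor factorization with built-in completion means, the sketch below fits a non-negative CP (rank-R) model to the observed entries of a three-mode tensor and then fills the missing entries from the fitted model. It is not the Rubik algorithm: it uses simple masked multiplicative updates instead of ADMM, and it omits the guidance and pairwise constraints entirely. The function name `masked_nonneg_cp`, the tensor shape, the rank, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def masked_nonneg_cp(X, mask, rank, n_iter=300, eps=1e-9, seed=0):
    """Hypothetical helper: non-negative CP fit on observed entries only.

    X    : 3-way array of non-negative values
    mask : array of the same shape, 1 where X is observed, 0 where missing
    Returns the factor matrices A, B, C and the per-iteration observed error.
    """
    rng = np.random.default_rng(seed)
    mask = mask.astype(float)
    I, J, K = X.shape
    # Strictly positive starting point so the multiplicative updates can move.
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    WX = mask * X  # observed data, zeros elsewhere
    history = []

    def reconstruct():
        return np.einsum('ir,jr,kr->ijk', A, B, C)

    for _ in range(n_iter):
        # Each update multiplies a factor by a non-negative ratio,
        # so the factors stay non-negative throughout.
        Xhat = reconstruct()
        A *= np.einsum('ijk,jr,kr->ir', WX, B, C) / (
            np.einsum('ijk,jr,kr->ir', mask * Xhat, B, C) + eps)
        Xhat = reconstruct()
        B *= np.einsum('ijk,ir,kr->jr', WX, A, C) / (
            np.einsum('ijk,ir,kr->jr', mask * Xhat, A, C) + eps)
        Xhat = reconstruct()
        C *= np.einsum('ijk,ir,jr->kr', WX, A, B) / (
            np.einsum('ijk,ir,jr->kr', mask * Xhat, A, B) + eps)
        Xhat = reconstruct()
        history.append(float(np.sum(mask * (X - Xhat) ** 2)))
    return A, B, C, history

# Synthetic rank-2 non-negative tensor with roughly 30% of entries hidden.
rng = np.random.default_rng(42)
true_A, true_B, true_C = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X = np.einsum('ir,jr,kr->ijk', true_A, true_B, true_C)
mask = (rng.random(X.shape) < 0.7).astype(float)

A, B, C, history = masked_nonneg_cp(X, mask, rank=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
# Completion: keep observed values, fill missing ones from the model.
completed = np.where(mask > 0, X, X_hat)
print("observed squared error, first vs last iteration:", history[0], history[-1])
```

In this sketch each column of `A`, `B`, and `C` plays the role of one candidate phenotype across the three modes. A knowledge-guided method in the spirit of the abstract would add constraint terms to this objective, which the sketch does not attempt.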
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015 RubikKnowledgeGuidedTensorFacto | Joydeep Ghosh, Jimeng Sun, Yichen Wang, Robert Chen, Joshua C. Denny, Abel Kho, You Chen, Bradley A. Malin | | | Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics | | | | 10.1145/2783258.2783395 | | 2015 |