2015 RubikKnowledgeGuidedTensorFacto

(Wang et al., 2015) ⇒ Yichen Wang, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel Kho, You Chen, Bradley A. Malin, and Jimeng Sun. (2015). “Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics.” In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783395

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Computational phenotyping; constraint optimization; data mining; healthcare analytics; tensor analysis

Abstract

Computational phenotyping is the process of converting heterogeneous electronic health records (EHRs) into meaningful clinical concepts. Unsupervised phenotyping methods have the potential to leverage a vast amount of labeled EHR data for phenotype discovery. However, existing unsupervised phenotyping methods do not incorporate current medical knowledge and cannot directly handle missing or noisy data.

We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. Rubik incorporates 1) guidance constraints to align with existing medical knowledge, and 2) pairwise constraints for obtaining distinct, non-overlapping phenotypes. Rubik also has built-in tensor completion that can significantly alleviate the impact of noisy and missing data. We apply the Alternating Direction Method of Multipliers (ADMM) framework to tensor factorization and completion, which can be easily scaled through parallel computing.

We evaluate Rubik on two EHR datasets, one of which contains 647,118 records for 7,744 patients from an outpatient clinic, the other of which is a public dataset containing 1,018,614 CMS claims records for 472,645 patients. Our results show that Rubik can discover more meaningful and distinct phenotypes than the baselines. In particular, by using knowledge guidance constraints, Rubik can also discover sub-phenotypes for several major diseases. Rubik also runs around seven times faster than current state-of-the-art tensor methods. Finally, Rubik is scalable to large datasets containing millions of EHR records.
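
The core idea in the abstract, non-negative tensor factorization that fits only the observed entries and uses the fitted model to fill in the missing ones, can be illustrated with a minimal sketch. This is not the Rubik algorithm: it swaps ADMM for simple multiplicative updates and omits the guidance and pairwise constraints. The function names (`reconstruct`, `masked_nonneg_cp`), the three-way tensor layout, and all parameters are assumptions made for illustration only.

```python
import numpy as np


def reconstruct(A, B, C):
    """Rebuild a 3-way tensor from rank-R CP factors A (I x R), B (J x R), C (K x R)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def masked_nonneg_cp(X, mask, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization fitted on observed entries only.

    Illustrative sketch (not Rubik): multiplicative updates for the
    masked squared error, with no guidance or pairwise constraints.
    X    : non-negative array of shape (I, J, K)
    mask : boolean array, True where an entry of X is observed
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Strictly positive starting point keeps the updates well defined.
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    Xo = np.where(mask, X, 0.0)  # unobserved entries contribute nothing
    history = []
    for _ in range(n_iter):
        Xh = np.where(mask, reconstruct(A, B, C), 0.0)
        A *= np.einsum("ijk,jr,kr->ir", Xo, B, C) / (
            np.einsum("ijk,jr,kr->ir", Xh, B, C) + eps)
        Xh = np.where(mask, reconstruct(A, B, C), 0.0)
        B *= np.einsum("ijk,ir,kr->jr", Xo, A, C) / (
            np.einsum("ijk,ir,kr->jr", Xh, A, C) + eps)
        Xh = np.where(mask, reconstruct(A, B, C), 0.0)
        C *= np.einsum("ijk,ir,jr->kr", Xo, A, B) / (
            np.einsum("ijk,ir,jr->kr", Xh, A, B) + eps)
        # Squared error on the observed entries after this sweep.
        residual = np.where(mask, X - reconstruct(A, B, C), 0.0)
        history.append(float(np.sum(residual ** 2)))
    return A, B, C, history
```

In this sketch, `reconstruct(A, B, C)` evaluated at the positions where `mask` is False gives the completed values, and each column of the factor matrices plays the role of one candidate phenotype. Rubik additionally constrains the factors with medical knowledge and a pairwise term that discourages overlapping phenotypes, which this example does not attempt.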

References


