2024 A Whole-Slide Foundation Model for Digital Pathology from Real-world Data
- (Xu, Usuyama et al., 2024) ⇒ Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, Yanbo Xu, Mu Wei, Wenhui Wang, Shuming Ma, Furu Wei, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Jaylen Rosemon, Tucker Bower, Soohee Lee, Roshanthi Weerasinghe, Bill J. Wright, Ari Robicsek, Brian Piening, Carlo Bifulco, Sheng Wang, and Hoifung Poon. (2024). “A Whole-slide Foundation Model for Digital Pathology from Real-world Data.” In: Nature. doi:10.1038/s41586-024-07441-w
Subject Headings: Image Data Encoding, Image Data Encoder, Image Transformer Model, LongNet.
Notes
- The paper presents Prov-GigaPath, a whole-slide foundation model for digital pathology, pretrained on 1.3 billion image tiles from Providence Health Network.
- The paper describes the GigaPath vision transformer architecture, which adapts the LongNet method to handle gigapixel pathology slides, enabling both local and global pattern recognition.
- The paper benchmarks Prov-GigaPath's performance, showing state-of-the-art results on 25 out of 26 digital pathology tasks, with significant improvements over existing models.
- The paper highlights the extensive pretraining effort using a dataset substantially larger and more diverse than TCGA (The Cancer Genome Atlas), supporting more robust model training.
- The paper demonstrates Prov-GigaPath's strong performance in predicting gene mutations, with notable improvements in AUROC and AUPRC across benchmarks.
- The paper shows Prov-GigaPath's superior performance in predicting cancer subtypes for nine major cancer types, illustrating the model's ability to extract meaningful features from pathology images.
- The paper explores vision-language pretraining by incorporating pathology reports, enabling Prov-GigaPath to perform zero-shot subtyping and mutation prediction.
- The paper makes Prov-GigaPath fully open-weight, providing source code and pretrained model weights to facilitate further research and application in clinical settings.
- The paper outlines future directions, including studying scaling laws, optimizing the pretraining process, and enhancing vision-language multimodal learning to improve diagnostics and clinical decision support.
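The notes above describe a two-stage design: a tile-level encoder embeds each 256 × 256 patch, and a slide-level encoder (LongNet dilated attention in the paper) aggregates the resulting tile sequence into one slide representation. A minimal sketch of that tile-then-aggregate pipeline follows; the random-projection "encoder" and mean pooling are hypothetical stand-ins for the paper's ViT tile encoder and LongNet slide encoder, used only to show the data flow.

```python
import numpy as np

def tile_slide(slide, tile_size=256):
    """Split a slide array (H, W, C) into non-overlapping tiles."""
    h, w, _ = slide.shape
    tiles = [slide[y:y + tile_size, x:x + tile_size]
             for y in range(0, h - tile_size + 1, tile_size)
             for x in range(0, w - tile_size + 1, tile_size)]
    return np.stack(tiles)

def embed_tiles(tiles, dim=8, seed=0):
    """Stand-in tile encoder: fixed random projection of flattened tiles
    (the paper uses a pretrained vision transformer here)."""
    rng = np.random.default_rng(seed)
    flat = tiles.reshape(len(tiles), -1).astype(np.float64)
    proj = rng.standard_normal((flat.shape[1], dim)) / np.sqrt(flat.shape[1])
    return flat @ proj

def slide_embedding(tile_embeddings):
    """Stand-in slide encoder: mean-pool the tile sequence
    (the paper uses LongNet-style dilated attention here)."""
    return tile_embeddings.mean(axis=0)

# Toy 512x512 "slide" yields a 2x2 grid of 256x256 tiles.
slide = np.zeros((512, 512, 3))
tiles = tile_slide(slide)                        # shape (4, 256, 256, 3)
emb = slide_embedding(embed_tiles(tiles))        # shape (8,)
```

A real gigapixel slide yields tens of thousands of tiles, which is why the aggregation step needs a long-context sequence model rather than plain self-attention.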
Cited By
Quotes
Abstract
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles[1,2,3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context[4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet[5] method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data[6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision-language pretraining for pathology[7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.
References