Google PaLM 1 Language Model
Latest revision as of 01:46, 28 January 2024
A Google PaLM 1 Language Model is a foundation LLM produced by Google Research.
References
2022
- (Singhal et al., 2022) ⇒ Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry Payne, Martin Seneviratne, Paul Gamble, Chris Kelly, Nathanael Schärli, Aakanksha Chowdhery, Philip Mansfield, Blaise Aguera y Arcas, Dale Webster, Greg S. Corrado, Yossi Matias, Katherine Chou, Juraj Gottweis, Nenad Tomasev, Yun Liu, and Alvin Rajkomar. (2022). “Large Language Models Encode Clinical Knowledge.” In: arXiv preprint arXiv:2212.13138.
- QUOTE: … In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians.
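The "instruction prompt tuning" mentioned in the quote builds on soft-prompt tuning: instead of updating the frozen base model, a small set of trainable prompt vectors is prepended to the input embeddings. The NumPy sketch below is only a minimal illustration of that idea; all sizes and names here are illustrative assumptions, not PaLM's actual dimensions or the cited paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only -- not PaLM's real vocabulary or model width.
vocab_size, d_model, n_soft = 100, 16, 4

# Frozen base-model token embedding table (never updated during tuning).
token_embeddings = rng.normal(size=(vocab_size, d_model))

# The only trainable parameters: a handful of "soft prompt" vectors.
soft_prompt = rng.normal(size=(n_soft, d_model)) * 0.01

def embed_with_soft_prompt(token_ids):
    """Prepend the trainable soft-prompt vectors to the frozen token embeddings."""
    return np.concatenate([soft_prompt, token_embeddings[token_ids]], axis=0)

# A 3-token input becomes a (n_soft + 3)-row embedded sequence.
seq = embed_with_soft_prompt([5, 17, 42])

# Parameter efficiency: only the soft prompt would receive gradient updates,
# a tiny fraction of the frozen embedding table (let alone the full model).
trainable, frozen = soft_prompt.size, token_embeddings.size
```

Because only `soft_prompt` is updated, adapting the model to a new domain costs `n_soft * d_model` parameters rather than retraining the full network, which is the sense in which the approach is parameter-efficient.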
- (Chowdhery et al., 2022) ⇒ Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham et al. (2022). “PaLM: Scaling Language Modeling with Pathways.” In: arXiv preprint arXiv:2204.02311.