2023 TheDawnofLMMsPreliminaryExplora
- (Yang, Li et al., 2023) ⇒ Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, and Lijuan Wang. (2023). “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).” In: arXiv preprint arXiv:2309.17421. doi:10.48550/arXiv.2309.17421
Subject Headings: Large Multimodal Model, Visual Understanding.
Notes
Cited By
Quotes
Abstract
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023 TheDawnofLMMsPreliminaryExplora | Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang | | | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | | | | 10.48550/arXiv.2309.17421 | | 2023 |