2023 TheDawnofLMMsPreliminaryExplora

From GM-RKB


(Yang, Li et al., 2023) ⇒ Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, and Lijuan Wang. (2023). “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).” In: arXiv preprint arXiv:2309.17421. doi:10.48550/arXiv.2309.17421

Subject Headings: Large Multimodal Model, Visual Understanding.

Notes

Cited By

http://scholar.google.com/scholar?q=%222023%22+The+Dawn+of+LMMs%3A+Preliminary+Explorations+with+GPT-4V+%28ision%29

Quotes

Abstract

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.

References


Page	Author	title	doi	year
2023 TheDawnofLMMsPreliminaryExplora	Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang	The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)	10.48550/arXiv.2309.17421	2023

