2023 AComprehensiveSurveyonVectorDat
- (Han et al., 2023) ⇒ Yikun Han, Chunjiang Liu, and Pengfei Wang. (2023). “A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge.” doi:10.48550/arXiv.2310.11703
Subject Headings: Vector Database, Vector DBMS, Approximate Nearest Neighbor Search (ANNS).
Notes
- It comprehensively explains the storage and management of high-dimensional data in vector databases, which is crucial for applications in fields like natural language processing and computer vision.
- It provides a detailed overview of Approximate Nearest Neighbor Search (ANNS) algorithms, which are essential for efficient similarity search over large and complex datasets.
- It discusses storage techniques in vector databases (sharding, partitioning, caching, and replication) and search methods (brute-force, tree-based, hash-based, and quantization-based).
- It addresses the key challenges faced in vector databases, such as index construction, heterogeneous data support, distributed parallel processing, and integration with machine learning frameworks.
- It highlights the potential and benefits of integrating vector databases with advanced Large Language Models (LLMs) like GPT-4, enhancing capabilities in data science and artificial intelligence.
- It describes an ideal workflow for combining vector databases with LLMs, emphasizing applications in semantic search and real-time knowledge retrieval.
- It provides insights into retrieval-based Large Language Models (LLMs) that utilize external datastores, discussing their advantages like memorization of long-tail knowledge, ease of updating, and improved interpretability.
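
The contrast between brute-force and hash-based search mentioned in the notes above can be sketched as follows. This is a minimal illustration, not code from the survey: the names `brute_force_search` and `HyperplaneLSH` are hypothetical, cosine similarity is an assumed choice of metric, and random-hyperplane hashing is used here as one representative hash-based method among several.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(vectors, query, k=1):
    """Exact search: score every stored vector, return the top-k indices."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(vectors[i], query),
        reverse=True,
    )
    return ranked[:k]


class HyperplaneLSH:
    """Approximate search (illustrative): bucket vectors by the sign of their
    projection onto random hyperplanes, then rank only one bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is defined by a random Gaussian normal vector.
        self.planes = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)
        ]
        self.buckets = {}
        self.vectors = []

    def _hash(self, v):
        # One bit per hyperplane: which side of the plane the vector is on.
        return tuple(
            sum(p_i * v_i for p_i, v_i in zip(plane, v)) >= 0
            for plane in self.planes
        )

    def add(self, v):
        idx = len(self.vectors)
        self.vectors.append(v)
        self.buckets.setdefault(self._hash(v), []).append(idx)
        return idx

    def search(self, query, k=1):
        # Only vectors sharing the query's bucket are scored, so true
        # neighbours that hashed elsewhere can be missed.
        candidates = self.buckets.get(self._hash(query), [])
        ranked = sorted(
            candidates,
            key=lambda i: cosine(self.vectors[i], query),
            reverse=True,
        )
        return ranked[:k]
```

Brute-force search compares the query against every stored vector and is always exact, while the hashed index scores only the vectors in the query's bucket, trading recall for speed. Practical systems usually query several hash tables or neighbouring buckets to recover recall; that refinement is omitted here.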
Cited By
Quotes
Abstract
A vector database is used to store high-dimensional data that cannot be characterized by traditional DBMS. Although there are not many articles describing existing or introducing new vector database architectures, the approximate nearest neighbor search problem behind vector databases has been studied for a long time, and considerable related algorithmic articles can be found in the literature. This article attempts to comprehensively review relevant algorithms to provide a general understanding of this booming research area. The basis of our framework categorises these studies by the approach of solving ANNS problem, respectively hash-based, tree-based, graph-based and quantization-based approaches. Then we present an overview of existing challenges for vector databases. Lastly, we sketch how vector databases can be combined with large language models and provide new possibilities.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023 AComprehensiveSurveyonVectorDat | Yikun Han, Chunjiang Liu, Pengfei Wang | | | A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge | | | | 10.48550/arXiv.2310.11703 | | 2023 |