Vector Database Framework

From GM-RKB

A Vector Database Framework is a database framework designed for creating and managing vector databases and their instances.



References

2024

  • GPT-4
    Name           | Open Source   | Key Features
    ---------------|---------------|-------------
    Elasticsearch  | Yes           | Clustering, High Availability, Automatic Node Recovery, Horizontal Scalability, Cross-Cluster Replication
    Vespa          | Yes           | Fast Data Writes, Configurable Data Redundancy, Structured Filters, Text Search Operators, Vector Search Operators
    Vald           | Yes           | Automatic Backups, Distributed Vector Indexes, Index Replication, Multi-Language Support
    ScaNN          | Yes           | Search Space Trimming, Quantization for Maximum Inner Product Search, Euclidean Distance Support
    Pgvector       | Yes           | Nearest Neighbor Search, L2 Distance, Inner Product, Cosine Distance, PostgreSQL Client Compatibility
    Chroma         | Yes           | Queries, Filtering, Density Estimates, LangChain Support, Scalable API
    Pinecone       | No            | Fully Managed Service, Scalability, Real-time Data Ingestion, Low-Latency Search, LangChain Integration
    Weaviate       | Yes           | Fast Search, Flexibility, Modules Integration with OpenAI, Cohere
    Faiss          | Yes           | Similarity Search, Clustering of Dense Vectors, Various Indexing and Search Algorithms, Large-Scale Dataset Optimization
    Annoy          | Yes           | Memory Efficiency, Tree-Based Search, Euclidean/Cosine Distance Metrics
    Milvus         | Yes           | Scalable Storage and Search, Metric Indexing, Multiple Programming Languages Support
    Hnswlib        | Yes           | Memory Efficiency, Small-World Graph Search, Euclidean/Cosine Distance Metrics
    FaunaDB        | Not Specified | Cloud-Native, Serverless, k-d Tree Algorithm, ACID Transactions
    Amazon Neptune | Not Specified | Fully Managed Graph Database, Gremlin and SPARQL Support, Scalable Infrastructure
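
Several of the entries above list Euclidean (L2) distance, inner product, and cosine distance among their key features. As a rough illustration of what those metrics compute, here is a minimal pure-Python sketch with an exact (brute-force) nearest-neighbor search on top. The function names and the toy `vectors` dictionary are assumptions made for this example; none of the listed systems is implied to work this way internally, and most rely on approximate indexes rather than a full scan.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner (dot) product; larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

def nearest_neighbors(query, vectors, k, distance=l2_distance):
    """Exact k-nearest-neighbor search by scanning every stored vector."""
    ranked = sorted(vectors, key=lambda item_id: distance(query, vectors[item_id]))
    return ranked[:k]

# Hypothetical toy collection: id -> 2-dimensional vector.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.9, 0.1],
}

result = nearest_neighbors([1.0, 0.0], vectors, k=2)
print(result)
```

A full scan like this costs time proportional to the collection size for every query, which is the cost that tree-based, graph-based, and quantization-based indexes aim to reduce.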

2023

  • (Pan, Wang et al., 2023) ⇒ James Jie Pan, Jianguo Wang, and Guoliang Li. (2023). “Survey of Vector Database Management Systems.” doi:10.48550/arXiv.2310.14021
    • NOTES:
      • It thoroughly evaluates over 20 commercial Vector Database Management Systems (VDBMSs) that have emerged in recent years, focusing on the obstacles in managing vector data.
      • It details the process of query processing in VDBMSs, discussing aspects like similarity scores, query types, and interfaces, along with the complexities of basic search query operators.
      • It outlines various storage and indexing strategies used in VDBMSs, including partitioning techniques (like randomization and learned partitioning) and different types of indexes such as tree-based, table-based, and graph-based.
      • It delves into the optimization and execution aspects of VDBMSs, explaining plan enumeration, selection, hybrid operators for predicated queries, and the utilization of hardware acceleration and distributed search techniques.
      • It classifies current VDBMSs into categories such as native, extended, and search engines/libraries, analyzing their design and runtime characteristics to highlight each type's strengths.
      • It acknowledges the importance of benchmarks in evaluating VDBMSs, but it doesn't provide an in-depth analysis of specific benchmarks, suggesting an area for future exploration.
      • It analyzes EuclidesDB VDBMS (2018), Vearch VDBMS (2018), Pinecone VDBMS (2019), Vald (2020), Chroma (2022), Weaviate (2019), Milvus (2021), NucliaDB (2021), Qdrant (2021), Manu (2022), Marqo (2022), Vespa (2020), Cosmos DB (2023), MongoDB DBMS (2023), Neo4j DBMS (2023), Redis (2023), AnalyticDB-V (2020), PASE+PG (2020), pgvector+PG (2021), SingleStoreDB (2022), ClickHouse (2023), MyScale (2023).
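
The notes above mention hybrid operators for predicated queries, that is, vector searches combined with an attribute filter. The following is a minimal sketch of two common ways to combine them, here called pre-filtering and post-filtering. The item layout, the function names, and the `lang` attribute are hypothetical, and the sketch uses an exact scan rather than any index described in the survey.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical collection: each item has a vector and one attribute.
items = [
    {"id": 1, "vec": [0.0, 0.0], "lang": "en"},
    {"id": 2, "vec": [0.1, 0.0], "lang": "de"},
    {"id": 3, "vec": [0.2, 0.0], "lang": "de"},
    {"id": 4, "vec": [5.0, 5.0], "lang": "en"},
]

def top_k(query, candidates, k):
    """Return the k candidates closest to the query (exact scan)."""
    return sorted(candidates, key=lambda it: l2(query, it["vec"]))[:k]

def pre_filter_search(query, k, predicate):
    """Apply the attribute predicate first, then search the survivors."""
    survivors = [it for it in items if predicate(it)]
    return [it["id"] for it in top_k(query, survivors, k)]

def post_filter_search(query, k, predicate):
    """Search all items first, then drop results that fail the predicate."""
    return [it["id"] for it in top_k(query, items, k) if predicate(it)]

def is_english(item):
    return item["lang"] == "en"

pre = pre_filter_search([0.0, 0.0], 2, is_english)
post = post_filter_search([0.0, 0.0], 2, is_english)
print(pre, post)
```

In this toy data the pre-filtered search returns two matching items, while the post-filtered search returns only one, because a non-matching neighbor occupied a top-k slot before the predicate was applied. That trade-off between result completeness and the amount of data scanned is one reason plan selection matters for predicated queries.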