OpenAI Text Embedding Model
An OpenAI Text Embedding Model is a text-item embedding model that is an OpenAI model.
- Example(s): text-embedding-ada-002, text-similarity-davinci-001.
- See: davinci-001 Model.
References
2023
- https://platform.openai.com/docs/guides/embeddings/limitations-risks
- NOTES:
- It introduces a new unified text embedding model called text-embedding-ada-002 that outperforms previous OpenAI text embedding models.
- It simplifies the OpenAI embeddings API by merging multiple embedding models into one.
- It provides a longer context length and a smaller embedding size, and is 99.8% cheaper than the previous davinci-001 embedding model.
- It shows how companies are using embeddings for search, recommendations, etc.
- It notes limitations like potential social bias and lack of recent event knowledge.
- It answers frequently asked questions about counting tokens, using vector databases, and legalities of sharing embeddings.
- It demonstrates how to use text-embedding-ada-002 to cluster a dataset into meaningful groups, applying k-means to the embedding vectors to identify clusters related to topics like dog food and positive/negative reviews (see the clustering sketch after these notes).
- It provides examples of using embeddings for applications like text search to find similar documents, code search to find related code snippets, and recommendations to match users to relevant content.
- It discusses best practices for working with embeddings, such as using cosine similarity to compare vectors, being aware of potential biases, and counting tokens to stay within the maximum context length (see the similarity-ranking sketch after these notes).
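The clustering sketch referenced above is a minimal illustration, assuming scikit-learn is available and using random vectors in place of real text-embedding-ada-002 outputs (the data, cluster count, and variable names are illustrative, not from the source):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a matrix of text-embedding-ada-002 vectors,
# one 1536-dimensional row per review (random data for illustration).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 1536))

# Group the reviews into a handful of clusters; in the source's example,
# clusters corresponded to topics such as dog food and positive/negative reviews.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

for cluster_id in range(4):
    print(f"cluster {cluster_id}: {np.sum(labels == cluster_id)} reviews")
```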
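The similarity-ranking sketch referenced above shows the cosine-similarity comparison used for text search; the short vectors and document names are toy stand-ins for real 1536-dimensional text-embedding-ada-002 outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings returned by text-embedding-ada-002
# (real vectors would have 1536 dimensions).
doc_embeddings = {
    "dog food review":  np.array([0.9, 0.1, 0.0]),
    "cat toy review":   np.array([0.7, 0.3, 0.1]),
    "phone charger":    np.array([0.0, 0.2, 0.9]),
}
query_embedding = np.array([0.8, 0.2, 0.0])  # embedding of the search query

# Rank documents by cosine similarity to the query (text-search use case).
ranked = sorted(
    doc_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
for name, _ in ranked:
    print(name)
```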
2022
- https://openai.com/blog/new-and-improved-embedding-model
- NOTES:
- It announces a new embedding model called text-embedding-ada-002 that replaces 5 previous models. The new model outperforms previous models on text search, code search, and sentence similarity tasks.
- The new model has a longer context length (8192 vs 2048 tokens), smaller embedding size (1536 vs 12288 dimensions), and is 99.8% cheaper than the previous davinci-001 model.
- It simplifies the /embeddings API by merging multiple models into one that handles text search, sentence similarity, and code search well (see the usage sketch after these notes).
- Examples are given of companies like Kalendar AI and Notion using embeddings to improve search and recommendations.
- A noted limitation is that the new model does not outperform the older text-similarity-davinci-001 model on text classification benchmarks.
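The usage sketch referenced above calls the merged /embeddings endpoint; it assumes the legacy (pre-1.0) openai Python SDK and the tiktoken tokenizer, and the input text and API-key handling are illustrative:

```python
import os
import openai
import tiktoken

openai.api_key = os.environ["OPENAI_API_KEY"]

text = "Canned dog food arrived quickly and my dog loves it."

# Count tokens first to stay within the model's 8192-token context length.
encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by text-embedding-ada-002
num_tokens = len(encoding.encode(text))
assert num_tokens <= 8192

# One model now covers text search, sentence similarity, and code search.
response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
vector = response["data"][0]["embedding"]
print(num_tokens, len(vector))  # len(vector) == 1536
```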