OpenAI Text Embedding Model
An OpenAI Text Embedding Model is a text-item embedding model that is an OpenAI model.
- Example(s): text-embedding-ada-002, text-similarity-davinci-001.
- See: davinci-001 Model.
References
2023
- https://platform.openai.com/docs/guides/embeddings/limitations-risks
- NOTES:
- It introduces a new unified text embedding model called text-embedding-ada-002 that outperforms previous OpenAI text embedding models.
- It simplifies the OpenAI embeddings API by merging multiple embedding models into one.
- It provides a longer context length and a smaller embedding size, and is 99.8% cheaper than the previous davinci-001 embedding model.
- It shows how companies are using embeddings for search, recommendations, etc.
- It notes limitations like potential social bias and lack of recent event knowledge.
- It answers frequently asked questions about counting tokens, using vector databases, and legalities of sharing embeddings.
- It demonstrates how to use text-embedding-ada-002 to cluster a dataset into meaningful groups, applying k-means to the embedding vectors to identify clusters related to topics like dog food and positive/negative reviews (see the clustering sketch after these notes).
- It provides examples of using embeddings for applications like text search to find similar documents, code search to find related code snippets, and recommendations to match users to relevant content.
- It discusses best practices for working with embeddings, such as using cosine similarity to compare vectors, being aware of potential biases, and counting tokens to stay within the maximum context length (see the similarity-ranking sketch after these notes).
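The clustering sketch referenced above is a minimal illustration, assuming scikit-learn is available and using random vectors in place of real text-embedding-ada-002 outputs (the data, cluster count, and variable names are illustrative, not from the source):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a matrix of text-embedding-ada-002 vectors,
# one 1536-dimensional row per review (random data for illustration).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 1536))

# Group the reviews into a handful of clusters; in the source's example,
# clusters corresponded to topics such as dog food and positive/negative reviews.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

for cluster_id in range(4):
    print(f"cluster {cluster_id}: {np.sum(labels == cluster_id)} reviews")
```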
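The similarity-ranking sketch referenced above shows the cosine-similarity comparison used for text search; the short vectors and document names are toy stand-ins for real 1536-dimensional text-embedding-ada-002 outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for embeddings returned by text-embedding-ada-002
# (real vectors would have 1536 dimensions).
doc_embeddings = {
    "dog food review":  np.array([0.9, 0.1, 0.0]),
    "cat toy review":   np.array([0.7, 0.3, 0.1]),
    "phone charger":    np.array([0.0, 0.2, 0.9]),
}
query_embedding = np.array([0.8, 0.2, 0.0])  # embedding of the search query

# Rank documents by cosine similarity to the query (text-search use case).
ranked = sorted(
    doc_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
for name, _ in ranked:
    print(name)
```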
2022
- https://openai.com/blog/new-and-improved-embedding-model
- NOTES:
- It announces a new embedding model called text-embedding-ada-002 that replaces 5 previous models. The new model outperforms previous models on text search, code search, and sentence similarity tasks.
- The new model has a longer context length (8192 vs 2048 tokens), smaller embedding size (1536 vs 12288 dimensions), and is 99.8% cheaper than the previous davinci-001 model.
- It simplifies the /embeddings API by merging multiple models into one that handles text search, sentence similarity, and code search well (see the usage sketch after these notes).
- Examples are given of companies like Kalendar AI and Notion using embeddings to improve search and recommendations.
- A noted limitation is that the new model does not outperform the older text-similarity-davinci-001 model on text classification benchmarks.
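The usage sketch referenced above calls the merged /embeddings endpoint; it assumes the legacy (pre-1.0) openai Python SDK and the tiktoken tokenizer, and the input text and API-key handling are illustrative:

```python
import os
import openai
import tiktoken

openai.api_key = os.environ["OPENAI_API_KEY"]

text = "Canned dog food arrived quickly and my dog loves it."

# Count tokens first to stay within the model's 8192-token context length.
encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by text-embedding-ada-002
num_tokens = len(encoding.encode(text))
assert num_tokens <= 8192

# One model now covers text search, sentence similarity, and code search.
response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
vector = response["data"][0]["embedding"]
print(num_tokens, len(vector))  # len(vector) == 1536
```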