SentenceTransformers Package
A SentenceTransformers Package is a Python library that implements a sentence embedding system (to encode sentence items into sentence vector embeddings).
- Context:
- It can apply Sentence BERT.
- It can apply Mean Pooling to the output embeddings of all tokens in a sentence (to create a fixed-size vector representation); see the mean-pooling sketch after this list.
- It can (typically) support Sentence-Transformer Pre-Trained Models, such as: all-MiniLM-L6-v2 (384-dims, 256-max-tokens)[1] and all-mpnet-base-v2 (768-dims, 384-max-tokens)[2].
- ...
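A minimal sketch of the Mean Pooling step referenced above, using the Hugging Face transformers API directly rather than the SentenceTransformers wrapper; the checkpoint name is one of the pre-trained models listed above, and the example sentence is illustrative.
```python
# Sketch: mean pooling over token embeddings, as applied by SentenceTransformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["This is an example sentence."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state   # (batch, tokens, 384)

# Mean pooling: average the token embeddings, ignoring padding via the attention mask.
mask = encoded["attention_mask"].unsqueeze(-1).float()       # (batch, tokens, 1)
sentence_embedding = (token_embeddings * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 384]) -- a fixed-size vector
```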
- Example(s):
- Counter-Example(s):
- Word Embeddings Models like Word2Vec or GloVe that generate embeddings per word without capturing the context of the sentence.
- BERT Embeddings Models without mean pooling, resulting in variable-sized embeddings that are less practical for certain comparison tasks.
- OpenAI Embeddings API.
- See: Cosine Similarity, Fine-Tuning Technique, Language Model Pre-training, PyTorch Framework, Semantic Similarity Search, Siamese Network Architecture, State-of-the-Art Model, Transfer Learning.
References
2024
- https://www.sbert.net/
- QUOTE SentenceTransformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.
The framework is based on PyTorch and Transformers and offers a large collection of pre-trained models tuned for various tasks. Further, it is easy to fine-tune your own models.
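The quoted passage notes that fine-tuning your own models is easy. Below is a minimal sketch using the library's classic InputExample / losses / fit() interface; the sentence pairs and similarity labels are made-up placeholders.
```python
# Sketch: fine-tuning a pre-trained model with a cosine-similarity regression loss.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder training pairs with similarity labels in [0, 1].
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "A plane is taking off."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)  # regression objective on cosine scores

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```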
- NOTES:
- It is a Python framework for creating NLP embeddings, based on PyTorch and Transformers.
- It enables embeddings for sentences, texts, and images in over 100 languages, useful for tasks like semantic textual similarity and semantic search.
- It offers a wide collection of pre-trained models and facilitates easy fine-tuning on custom datasets.
- It is installed via pip, with recommendations for Python 3.8+ and PyTorch 1.11.0+.
- It uses the model.encode() method to generate embeddings from a list of sentences (a usage sketch follows these notes).
- It achieves state-of-the-art performance on various NLP tasks, with extensive evaluations and optimizations for speed.
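A minimal usage sketch matching the notes above: install via pip, load a pre-trained model, and call model.encode() on a list of sentences; the library's util.cos_sim helper compares the resulting embeddings with cosine similarity. The example sentences are illustrative.
```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits on the mat.",
    "A feline rests on a rug.",
    "Stocks fell sharply today.",
]
embeddings = model.encode(sentences)       # array of shape (3, 384)

# Pairwise cosine similarities; semantically close sentences score higher.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```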
2023
- https://towardsdatascience.com/sbert-deb3d4aef8a4
- NOTES:
- It introduces a Siamese network architecture to process sentences independently through the same BERT model, enabling efficient sentence embedding generation.
- It applies a pooling layer to the BERT token outputs, aggregating the per-token vectors into a single fixed-size 768-dimensional sentence vector.
- It proposes three optimization objectives: classification, regression, and triplet, to cater to different NLP tasks and improve model performance across various applications.
- It achieves large improvements in processing speed by replacing the quadratic number of cross-encoder BERT inference passes over sentence pairs with a linear number of encodings, significantly speeding up tasks like similarity search among large sentence collections (a sketch follows these notes).
- It maintains high accuracy in embedding generation, demonstrating the effectiveness of its approach in capturing sentence semantics compared to traditional BERT embeddings.
- It is fine-tuned on SNLI and MultiNLI datasets with a softmax classifier for classification tasks, indicating its robustness in understanding natural language inference.
- It offers the SentenceTransformers library, providing a convenient interface for utilizing SBERT models for sentence embedding tasks, facilitating ease of use and accessibility for developers.
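To make the quadratic-to-linear speed-up above concrete: each sentence is encoded once (n forward passes), and comparisons become cheap vector operations rather than a cross-encoder pass per sentence pair. A sketch using the library's util.semantic_search helper; the corpus is illustrative.
```python
# Sketch: linear-time pattern -- encode each sentence once, then compare
# embeddings with fast cosine similarity, instead of running a cross-encoder
# over all n*(n-1)/2 sentence pairs.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "He plays the guitar.",
    "The weather is sunny.",
    "She strums a guitar solo.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)  # n encodings total

query_embedding = model.encode("Someone is playing music.", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}, {'corpus_id': 2, 'score': ...}]
```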