SentenceTransformers Package

From GM-RKB

A SentenceTransformers Package is a Python library that implements a sentence embedding system (to encode sentence items into sentence embedding vectors).

Context:
- It can apply Sentence BERT.
- It can apply Mean Pooling to the output embeddings of all tokens in a sentence (to create a fixed-sized vector representation).
- It can (typically) support Sentence-Transformer Pre-Trained Models, such as: all-MiniLM-L6-v2 (384-dims 256-max-tokens) [1] and all-mpnet-base-v2 (768-dims 384-max-tokens) [2].
- ...
Example(s):
- SentenceTransformers, v2.3.1 [3].
- ...
Counter-Example(s):
- Word Embedding Models, such as Word2Vec or GloVe, which generate one embedding per word without capturing the context of the sentence.
- BERT Embedding Models without mean pooling, which produce one vector per token, so the output size varies with sentence length and is less practical for direct sentence comparison.
- OpenAI Embeddings API.
See: Cosine Similarity, Fine-Tuning Technique, Language Model Pre-training, PyTorch Framework, Semantic Similarity Search, Siamese Network Architecture, State-of-the-Art Model, Transfer Learning.
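The mean pooling step listed under Context can be sketched as follows. This is a minimal pure-Python illustration, not the library's own implementation: the function name mean_pool, the list-of-lists input, and the toy 2-dimensional token vectors are assumptions made for the example (real models pool 384- or 768-dimensional vectors held in tensors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, skipping padded positions.

    token_embeddings: list of equal-length float lists, one per token.
    attention_mask:   list of 0/1 flags, 1 for real tokens, 0 for padding.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in total]


# Two "sentences" of different lengths (toy values, hypothetical).
three_tokens = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
four_tokens = mean_pool(
    [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0], [2.0, 2.0]], [1, 1, 1, 1]
)

print(three_tokens)  # [2.0, 3.0]  (the padded third token is ignored)
print(four_tokens)   # [2.0, 2.0]
```

Both results have the same length regardless of how many tokens went in, which is what makes the pooled vectors directly comparable.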

References

2024

https://www.sbert.net/
- QUOTE: SentenceTransformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
  You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.
  The framework is based on PyTorch and Transformers and offers a large collection of pre-trained models tuned for various tasks. Further, it is easy to fine-tune your own models.
- NOTES:
  - It is a Python framework for creating NLP embeddings, based on PyTorch and Transformers.
  - It enables embeddings for sentences, texts, and images in over 100 languages, useful for tasks like semantic textual similarity and semantic search.
  - It offers a wide collection of pre-trained models and facilitates easy fine-tuning on custom datasets.
  - It is installed via pip, with recommendations for Python 3.8+ and PyTorch 1.11.0+.
  - It uses the model.encode() method to generate embeddings from a list of sentences, demonstrating ease of use.
  - It achieves state-of-the-art performance on various NLP tasks, with extensive evaluations and optimizations for speed.
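The encode-then-compare workflow described in the notes above might look like the sketch below. SentenceTransformer and model.encode() are the library's interface and all-MiniLM-L6-v2 is one of its pre-trained models; the helper names cosine_similarity, rank_by_similarity and demo, and the example sentences, are hypothetical. The demo() function is defined but not called here, since it needs the package installed (pip install sentence-transformers) and downloads the model on first use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (score, corpus_index) pairs, most similar first."""
    scores = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(corpus_vecs)
    ]
    return sorted(scores, reverse=True)


def demo():
    # Imported lazily so the helpers above work without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    corpus = [
        "A man is playing a guitar.",
        "The stock market fell sharply today.",
        "Someone performs music on a string instrument.",
    ]
    corpus_vecs = model.encode(corpus)          # one vector per sentence
    query_vec = model.encode("A person plays an instrument.")
    for score, idx in rank_by_similarity(query_vec, corpus_vecs):
        print(f"{score:.3f}  {corpus[idx]}")
```

Calling demo() prints the corpus sentences ordered by similarity to the query. The library also provides its own similarity utilities; the pure-Python helpers are used here only to keep the comparison step visible.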

2023

https://towardsdatascience.com/sbert-deb3d4aef8a4
- NOTES:
  - It introduces a Siamese network architecture to process sentences independently through the same BERT model, enabling efficient sentence embedding generation.
  - It applies a pooling layer to BERT's per-token output vectors, collapsing them into a single fixed-size 768-dimensional sentence vector.
  - It proposes three optimization objectives: classification, regression, and triplet, to cater to different NLP tasks and improve model performance across various applications.
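  - It achieves large speedups for similarity search over large sentence collections: comparing every pair among n sentences with a BERT cross-encoder requires n(n-1)/2 forward passes (quadratic in n), whereas SBERT encodes each sentence once (n passes) and compares the resulting vectors with inexpensive cosine similarity.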
  - It maintains high accuracy in embedding generation, demonstrating the effectiveness of its approach in capturing sentence semantics compared to traditional BERT embeddings.
  - It is fine-tuned on the SNLI and MultiNLI natural language inference datasets, using a softmax classifier for the classification objective.
  - It offers the SentenceTransformers library, providing a convenient interface for utilizing SBERT models for sentence embedding tasks, facilitating ease of use and accessibility for developers.
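The quadratic-versus-linear point in the notes above can be made concrete by counting model forward passes. This is a back-of-the-envelope sketch: the function names are hypothetical, and it counts only transformer inferences, not the cosine comparisons that the bi-encoder approach still performs on the pooled vectors.

```python
def cross_encoder_passes(n):
    """Forward passes when every unordered sentence pair is fed to BERT jointly."""
    return n * (n - 1) // 2


def bi_encoder_passes(n):
    """Forward passes when each sentence is encoded once, independently."""
    return n


for n in (10, 1_000, 10_000):
    print(n, cross_encoder_passes(n), bi_encoder_passes(n))
# 10 45 10
# 1000 499500 1000
# 10000 49995000 10000
```

For 10,000 sentences the pairwise approach needs roughly 50 million inferences, while the siamese setup needs 10,000 encodings followed by vector comparisons.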
