Embedding Space
An Embedding Space is a low-dimensional vector space that is designed to capture relationships among high-dimensional data items through relationships among their low-dimensional vectors.
- Context:
- It can (typically) be created by a Distributional Text-Item Embedding Modeling System (that implements a distributional text-item embedding modeling algorithm).
- It can (typically) be a Lookup Table composed of Embedding Vectors.
- It can (typically) be associated with some Large Dataset.
- It can (typically) represent a Similarity Space.
- It can serve as a mathematical model that maps high-dimensional data into a simpler, low-dimensional space while preserving crucial relationships among the original data (see the illustrative sketch after this definition).
- …
- Example(s):
- A Generation Method-Specific Embedding Space, such as:
- a Neural Embedding Space.
- an LLM Embedding Space: uses large language models to create embeddings.
- a Latent Factor Embedding Space.
- ...
- A Data Type-Specific Embedding Space, such as:
- A Facial Recognition Embedding Space: Captures facial features as vectors.
- A Text-Item Embedding Space, such as:
- A Document Embedding Space: Represents textual documents as vectors for similarity analysis.
- A Word Embedding Space: Translates words into vectors that capture semantic meaning.
- A Protein Structure Embedding Space: Maps protein structures to vectors to study their functional similarities.
- A Social Network Embedding Space: Embeds users or interactions in a social network into vectors.
- A Time-Series Embedding Space: Represents time-series data as vectors to facilitate tasks like clustering or anomaly detection.
- A Graph Embedding Space: Represents vertices and edges in a graph as vectors.
- …
- Counter-Example(s):
- A Euclidean Space: Does not inherently capture high-dimensional relationships.
- See: Kernel Matrix, Embedding Function.
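The following is a minimal sketch of an embedding space realized as a lookup table of embedding vectors, with item similarity measured by cosine similarity. The item names and 4-dimensional vectors are illustrative toy values, not the output of any particular embedding modeling system.

```python
import numpy as np

# Hypothetical lookup table: each item maps to a low-dimensional embedding vector.
# The 4-dimensional values are toy numbers chosen for illustration, not learned embeddings.
embedding_space = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.7, 0.2, 0.1]),
    "apple": np.array([0.1, 0.0, 0.9, 0.8]),
    "pear":  np.array([0.2, 0.1, 0.8, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbors(query: str, k: int = 2):
    """Rank the other items in the embedding space by similarity to the query item."""
    q = embedding_space[query]
    scores = {item: cosine_similarity(q, v)
              for item, v in embedding_space.items() if item != query}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(nearest_neighbors("king"))  # "queen" ranks above "apple" and "pear"
```

Because the embedding space acts as a similarity space, nearness of vectors stands in for relatedness of the original items; the same retrieval pattern applies whether the items are words, documents, faces, or graph vertices.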
References
2020
- https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
- QUOTE: An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models. …
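As a rough illustration of the quoted idea (not code from the crash course itself), the sketch below translates a high-dimensional sparse one-hot word vector into a low-dimensional dense embedding via an embedding matrix; the matrix here is random rather than learned, and the vocabulary size and embedding dimension are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embedding_dim = 10_000, 16  # high-dimensional input, low-dimensional embedding

# An (untrained) embedding matrix standing in for one learned jointly with a model.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

# A sparse one-hot vector representing a single word out of 10,000.
word_index = 4242
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# Translating the high-dimensional vector into the embedding space is a matrix
# product, which for a one-hot input reduces to a simple row lookup.
dense_embedding = one_hot @ embedding_matrix
assert np.allclose(dense_embedding, embedding_matrix[word_index])
print(dense_embedding.shape)  # (16,)
```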
2017b
- (Karimi, 2017) ⇒ Amir-Hossein Karimi (2017). "A Summary Of The Kernel Matrix, And How To Learn It Effectively Using Semidefinite Programming". arXiv preprint arXiv:1709.06557.
- QUOTE: ... The information specifying the inner products between each pair of points in the embedding space is contained in the so-called kernel matrix, which is symmetric (due to the commutative property of distance between two points) and positive semidefinite (positive definite if all points are linearly independent). This matrix essentially describes the geometry of the embedding space. The importance of this lies in the fact that since kernel-based learning algorithms extract all information needed from inner products of training data points in [math]\displaystyle{ \mathcal{F} }[/math], there is no need to learn a kernel function [math]\displaystyle{ \phi }[/math] over the entire sample space to specify the embedding of a finite training dataset. Instead, the finite-dimensional kernel matrix (also known as a Gram matrix) that contains the inner products of training points in [math]\displaystyle{ \mathcal{F} }[/math] is sufficient.
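A small numerical sketch (not taken from the cited paper) of the quoted properties: with the embedded training points stacked as rows of a matrix, the Gram (kernel) matrix of their pairwise inner products is symmetric and positive semidefinite; the point count and dimensionality below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedded training points: 5 points in a 3-dimensional embedding space F.
Phi = rng.normal(size=(5, 3))

# Kernel (Gram) matrix of pairwise inner products in the embedding space.
K = Phi @ Phi.T

# Symmetric, since <x, y> = <y, x>.
assert np.allclose(K, K.T)

# Positive semidefinite: all eigenvalues are non-negative (up to numerical tolerance).
eigenvalues = np.linalg.eigvalsh(K)
assert np.all(eigenvalues >= -1e-10)

print(np.round(K, 2))
```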
2015
- (Rothe & Schütze, 2015) ⇒ Sascha Rothe, and Hinrich Schütze. (2015). “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes.” In: arXiv preprint arXiv:1507.01127.
- QUOTE: ... We are looking for a model that extends standard embeddings for words to embeddings for the other two data types in WordNet: synsets and lexemes. We want all three data types – words, lexemes, synsets – to live in the same embedding space. …
2015
- (Chang et al., 2015) ⇒ Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, and Thomas S. Huang. (2015). “Heterogeneous Network Embedding via Deep Architectures.” In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783296
- QUOTE: ... In particular, we demonstrate that the rich content and linkage information in a heterogeneous network can be captured by such an approach, so that similarities among cross-modal data can be measured directly in a common embedding space. …