DIStributional SEmantics Composition Toolkit (DISSECT) System
A DIStributional SEmantics Composition Toolkit (DISSECT) System is a Word Embedding System that can build distributional semantic representations from co-occurrence matrices and compose them into representations of phrases.
- Context:
- Source code available at https://github.com/composes-toolkit/dissect
- Example(s):
- …
- Counter-Example(s):
- See: Distributional Semantics, Semantic Space.
References
2014a
- (Cimec, 2014) ⇒ http://clic.cimec.unitn.it/composes/toolkit/
- QUOTE: DISSECT (DIStributional Semantics Composition Toolkit) is part of the COMPOSES (COMPositional Operations in SEmantic Space) project. It can be used to build semantic spaces from co-occurrence matrices, perform compositional operations on these semantic spaces and rely on them to measure semantic similarity between words or phrases.
To get acquainted with DISSECT, read about its main features and general philosophy in the introduction (includes licensing information), then download and set up the toolkit, and familiarize yourself with it through the hands-on tutorial.
For detailed technical information, you can use the Quick search functionality on the right (searching the codebase) and the module index links (in the navigation bars at the top and bottom right of each page).
2014b
- (Cimec, 2014) ⇒ http://clic.cimec.unitn.it/composes/toolkit/introduction.html
- QUOTE: You can use DISSECT to build and explore automated models of word, phrase and sentence meaning based on the principles of distributional semantics. The toolkit focuses in particular on compositional meaning, that is, it provides functions to derive the meaning of phrases and sentences from the meanings of their parts (e.g., derive a meaning representation for black vomit from the representations of black and vomit). However, we hope that DISSECT will also be useful to researchers and practitioners who need models of word meaning (without composition), as it supports various methods to construct semantic spaces, assessing similarity and even evaluating against benchmarks that are independent of the composition infrastructure.
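As a rough illustration of deriving a phrase representation from the representations of its parts, the sketch below composes two toy word vectors with a weighted additive function and with a component-wise multiplicative function, then compares vectors by cosine similarity. It is plain Python and does not use the DISSECT API; the function names, the weights `alpha` and `beta`, and the three-dimensional vectors are assumptions made up for this example.

```python
import math

def weighted_additive(u, v, alpha=1.0, beta=1.0):
    """Compose two word vectors as alpha*u + beta*v (hypothetical helper)."""
    return [alpha * a + beta * b for a, b in zip(u, v)]

def multiplicative(u, v):
    """Compose two word vectors by component-wise product (hypothetical helper)."""
    return [a * b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy semantic space: the values are invented, not taken from any corpus.
space = {
    "black": [2.0, 0.0, 1.0],
    "vomit": [0.0, 3.0, 1.0],
}

black_vomit_add = weighted_additive(space["black"], space["vomit"])
black_vomit_mult = multiplicative(space["black"], space["vomit"])

# Similarity between the composed phrase vector and one of its parts.
similarity_to_head = cosine(black_vomit_add, space["vomit"])
```

With real data, the word vectors would be rows of a weighted co-occurrence matrix, and the composed vector could be compared against an observed phrase vector or against other words in the same space.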
2014c
- (Baroni et al., 2014) ⇒ Marco Baroni, Georgiana Dinu, and Germán Kruszewski. (2014). “Don't Count, Predict! A Systematic Comparison of Context-counting Vs. Context-predicting Semantic Vectors.” In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014).
- QUOTE: We prepared the count models using the DISSECT toolkit. We extracted count vectors from symmetric context windows of two and five words to either side of target. We considered two weighting schemes: positive Pointwise Mutual Information and Local Mutual Information (akin to the widely used Log-Likelihood Ratio scheme) [17]. We used both full and compressed vectors. The latter were obtained by applying the Singular Value Decomposition [20] or Non-negative Matrix Factorization (Lin, 2007) [29] algorithm, with reduced sizes ranging from 200 to 500 in steps of 100. In total, 36 count models were evaluated.
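The count-model pipeline described in the quote can be sketched in a few lines: collect co-occurrence counts from a symmetric context window, then reweight them with positive Pointwise Mutual Information. The code below is a minimal stdlib-only illustration and not DISSECT's implementation; the function names and the toy sentence are assumptions, and the dimensionality reduction step (SVD or NMF) is left out.

```python
import math
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count (target, context) pairs within `window` words on either side."""
    counts = Counter()
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(target, tokens[j])] += 1
    return counts

def ppmi(counts):
    """Positive PMI: max(0, log(P(t, c) / (P(t) * P(c)))) for each observed pair."""
    total = sum(counts.values())
    row_totals = Counter()  # marginal count of each target word
    col_totals = Counter()  # marginal count of each context word
    for (target, context), n in counts.items():
        row_totals[target] += n
        col_totals[context] += n
    weighted = {}
    for (target, context), n in counts.items():
        pmi = math.log((n * total) / (row_totals[target] * col_totals[context]))
        weighted[(target, context)] = max(0.0, pmi)
    return weighted

# Toy corpus, invented for illustration only.
tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
weights = ppmi(counts)
```

Local Mutual Information, the other scheme mentioned in the quote, is commonly defined as the raw count multiplied by the PMI value, so it could be added by changing the single line that computes the weight.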
2013a
- (Lazaridou et al., 2013) ⇒ Angeliki Lazaridou, Marco Marelli, Roberto Zamparelli, and Marco Baroni. (2013). “Compositionally Derived Representations of Morphologically Complex Words in Distributional Semantics.” In: ACL (1).
2013b
- (Dinu et al., 2013) ⇒ Georgiana Dinu, Nghia The Pham, and Marco Baroni. (2013). “DISSECT - DIStributional SEmantics Composition Toolkit.” In: Proceedings of the Conference System Demonstrations - 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013).