Brown Word-Hierarchy Cluster
A Brown Word-Hierarchy Cluster is a Semantic Word Cluster created by the Brown Clustering Algorithm (Brown et al., 1992).
- Example(s):
- …
- Counter-Example(s):
- See: Word Cluster, Brown Clustering Algorithm, Clark Word-Cluster.
References
2011
- (Haffari et al., 2011) ⇒ Gholamreza Haffari, Marzieh Razavi, and Anoop Sarkar. (2011). “An Ensemble Model That Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing.” In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: We combine multiple word representations based on semantic clusters extracted from the (Brown et al., 1992) algorithm and syntactic clusters obtained from the Berkeley parser (Petrov et al., 2006)
2010
- (Turian et al., 2010) ⇒ Joseph Turian, Lev Ratinov, and Yoshua Bengio. (2010). “Word Representations: A Simple and General Method for Semi-supervised Learning.” In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking … The Brown algorithm is a hierarchical clustering algorithm which clusters words to maximize the mutual information of bigrams (Brown et al., 1992). So it is a class-based bigram language model. It runs in time [math]\displaystyle{ O(V·K^2) }[/math], where [math]\displaystyle{ V }[/math] is the size of the vocabulary and [math]\displaystyle{ K }[/math] is the number of clusters. The hierarchical nature of the clustering means that we can choose the word class at several levels in the hierarchy, which can compensate for poor clusters of a small number of words. One downside of Brown clustering is that it is based solely on bigram statistics, and does not consider word usage in a wider context.
Brown clusters have been used successfully in a variety of NLP applications: NER (Miller et al., 2004; Liang, 2005; Ratinov & Roth, 2009), PCFG parsing (Candito & Crabbé, 2009), dependency parsing (Koo et al., 2008; Suzuki et al., 2009), and semantic dependency parsing (Zhao et al., 2009).
Martin et al. (1998) presents algorithms for inducing hierarchical clusterings based upon word bigram and trigram statistics. Ushioda (1996) presents an extension to the Brown clustering algorithm, and learn hierarchical clusterings of words as well as phrases, which they apply to POS tagging.
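The greedy, mutual-information-driven merging described in the quote above can be sketched in code. The function below is a hypothetical toy implementation (not code from any of the cited papers): it starts with one class per word, repeatedly merges the pair of classes whose merge yields the highest average mutual information (AMI) of class bigrams, and records each merge as one bit, producing the bit-string hierarchy paths. Real implementations use the windowed O(V·K²) variant mentioned in the quote; this exhaustive version is only practical for tiny vocabularies.

```python
from collections import Counter
from itertools import combinations
import math

def brown_cluster_paths(tokens):
    """Toy Brown clustering sketch: greedily merge word classes to
    maximize average mutual information of class bigrams, recording
    merges as bits to build each word's hierarchy path."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    # Start with one singleton cluster per vocabulary word.
    clusters = [frozenset([w]) for w in sorted(set(tokens))]
    paths = {w: "" for w in set(tokens)}  # bit strings, built bottom-up

    def ami(cs):
        # Average mutual information of class bigrams under clustering cs.
        cluster_of = {w: c for c in cs for w in c}
        cb = Counter()
        for (w1, w2), n in bigrams.items():
            cb[(cluster_of[w1], cluster_of[w2])] += n
        left, right = Counter(), Counter()
        for (c1, c2), n in cb.items():
            left[c1] += n
            right[c2] += n
        return sum((n / total) * math.log2(n * total / (left[c1] * right[c2]))
                   for (c1, c2), n in cb.items())

    while len(clusters) > 1:
        # Exhaustively pick the merge with the best resulting AMI.
        _, a, b = max(
            ((ami([c for c in clusters if c not in (a, b)] + [a | b]), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0])
        for w in a:  # 0 = left branch of the new internal node
            paths[w] = "0" + paths[w]
        for w in b:  # 1 = right branch
            paths[w] = "1" + paths[w]
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return paths
```

Because merging continues to a single root, every word receives a distinct bit string encoding its path through the binary hierarchy.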
2008
- (Koo et al., 2008) ⇒ Terry Koo, Xavier Carreras, and Michael Collins. (2008). “Simple Semi-supervised Dependency Parsing.” In: Proceedings of ACL 2008.
- QUOTE: Figure 2: An example of a Brown word-cluster hierarchy. Each node in the tree is labeled with a bit-string indicating the path from the root node to that node, where 0 indicates a left branch and 1 indicates a right branch. …
… Brown clustering algorithm (Brown et al., 1992). We chose to work with the Brown algorithm due to its simplicity and prior success in other NLP applications (Miller et al., 2004; Liang, 2005).
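The bit-string paths described in the quote above are typically consumed as features by taking prefixes at several depths, so a parser can back off from fine-grained to coarse clusters. The helper below is a hypothetical sketch of this idea (the function name and depth values are illustrative, not from Koo et al.):

```python
def cluster_features(path, depths=(4, 6)):
    """Sketch of prefix features over a Brown bit-string path: short
    prefixes act as coarse cluster identities, the full string as the
    finest-grained cluster."""
    feats = [path[:d] for d in depths if len(path) >= d]
    feats.append(path)
    return feats

# e.g. cluster_features("0101110") -> ["0101", "010111", "0101110"]
```

Choosing prefixes at multiple depths is what lets downstream models exploit the hierarchy: a rare word with a poor fine cluster can still contribute through a reliable coarse prefix.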
1992
- (Brown et al., 1992) ⇒ Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. (1992). “Class-based n-gram Models of Natural Language.” In: Computational Linguistics, 18(4).