Brown et al. Clustering Algorithm
A Brown et al. Clustering Algorithm is a hierarchical word clustering algorithm proposed in (Brown et al., 1992) that greedily merges word classes so as to maximize the average mutual information of class bigrams.
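The greedy merge procedure can be sketched as follows. This is a minimal, deliberately naive illustration (one cluster per word at the start, exhaustive pairwise merge search, full MI recomputation per candidate), not the optimized windowed algorithm of the original paper; the function name `brown_clusters` and the toy corpus are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
import math

def brown_clusters(corpus, k):
    """Naive Brown clustering sketch: start with one cluster per word,
    then repeatedly merge the pair of clusters whose merge best preserves
    the average mutual information of the class-bigram model, stopping
    at k clusters. (Illustrative only: the original algorithm uses a far
    more efficient incremental update of the MI objective.)"""
    bigrams = Counter(zip(corpus, corpus[1:]))
    total = sum(bigrams.values())
    # each word begins in its own (frozen) cluster
    active = {frozenset([w]) for w in set(corpus)}

    def mutual_information(parts):
        # average mutual information of cluster-level bigrams
        cluster_bigrams, left, right = Counter(), Counter(), Counter()
        for (w1, w2), c in bigrams.items():
            c1 = next(p for p in parts if w1 in p)
            c2 = next(p for p in parts if w2 in p)
            cluster_bigrams[(c1, c2)] += c
            left[c1] += c
            right[c2] += c
        mi = 0.0
        for (c1, c2), c in cluster_bigrams.items():
            p = c / total
            mi += p * math.log(p * total * total / (left[c1] * right[c2]))
        return mi

    while len(active) > k:
        best = None
        for a, b in combinations(active, 2):
            candidate = (active - {a, b}) | {a | b}
            mi = mutual_information(candidate)
            if best is None or mi > best[0]:
                best = (mi, a, b)
        _, a, b = best
        active = (active - {a, b}) | {a | b}
    return active
```

Recording the sequence of merges (rather than only the final partition) yields the binary merge tree whose paths serve as the hierarchical cluster codes used in downstream applications.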
References
2011
- (Haffari et al., 2011) ⇒ Gholamreza Haffari, Marzieh Razavi, and Anoop Sarkar. (2011). “An Ensemble Model That Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing.” In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: We combine multiple word representations based on semantic clusters extracted from the (Brown et al., 1992) algorithm and syntactic clusters obtained from the Berkeley parser (Petrov et al., 2006)
2010
- (Turian et al., 2010) ⇒ Joseph Turian, Lev Ratinov, and Yoshua Bengio. (2010). “Word Representations: A Simple and General Method for Semi-supervised Learning.” In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
- QUOTE: The Brown algorithm is a hierarchical clustering algorithm which clusters words to maximize the mutual information of bigrams (Brown et al., 1992). So it is a class-based bigram language model. It runs in time [math]\displaystyle{ O(V·K^2) }[/math], where [math]\displaystyle{ V }[/math] is the size of the vocabulary and [math]\displaystyle{ K }[/math] is the number of clusters. The hierarchical nature of the clustering means that we can choose the word class at several levels in the hierarchy, which can compensate for poor clusters of a small number of words. One downside of Brown clustering is that it is based solely on bigram statistics, and does not consider word usage in a wider context.
Brown clusters have been used successfully in a variety of NLP applications: NER (Miller et al., 2004; Liang, 2005; Ratinov & Roth, 2009), PCFG parsing (Candito & Crabbé, 2009), dependency parsing (Koo et al., 2008; Suzuki et al., 2009), and semantic dependency parsing (Zhao et al., 2009).
Martin et al. (1998) presents algorithms for inducing hierarchical clusterings based upon word bigram and trigram statistics. Ushioda (1996) presents an extension to the Brown clustering algorithm, and learns hierarchical clusterings of words as well as phrases, which they apply to POS tagging.
1992
- (Brown et al., 1992) ⇒ Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. (1992). “Class-based n-gram Models of Natural Language.” In: Computational Linguistics, 18(4).