Collective Classification Algorithm
A Collective Classification Algorithm is a supervised classification algorithm that can be applied by a collective classification system (to solve a collective classification task).
- AKA: Link-based Object Classification.
- Context:
- It can range from being a Local Collective Classification Algorithm to being a Global Collective Classification Algorithm.
- …
- Counter-Example(s):
- See: Relational Learning Algorithm, Link Mining Algorithm, Link-based Edge Prediction Algorithm.
References
2009
- (Bilgic & Getoor, 2009) ⇒ Mustafa Bilgic, and Lise Getoor. (2009). “Reflect and Correct: A misclassification prediction approach to active inference.” In: ACM Transactions on Knowledge Discovery from Data (TKDD), 3(4). doi:10.1145/1631162.1631168
- QUOTE: There are many collective classification models proposed to date that make different modeling assumptions about these dependencies. They can be grouped into two broad categories. In the first category, local collective classification models, the collective models consist of a collection of local vector-based classifiers, such as logistic regression. … The second category of collective classification models are global collective classification models.
- Link-based Classification http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html
- QUOTE: Traditional machine learning classification algorithms aim to label entities on the basis of their attribute values. Many real-world datasets, however, contain interlinked entities and exhibit correlations among labels of the interlinked entities. Link-based classification aims to improve classification accuracy by exploiting such correlations in the link structure besides utilizing the attribute values of each entity.
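The idea in the quotes above, a local classifier that uses both an entity's own attributes and the labels of linked entities, can be sketched with one common local scheme usually called iterative classification. The source does not prescribe this scheme. Everything named here (`iterative_classification`, `attr_scores`, the `weight` mixing parameter, the toy paper graph) is a hypothetical illustration, and the attribute scores are assumed to come from some attribute-only classifier that is not shown.

```python
from collections import Counter

def iterative_classification(graph, attr_scores, known, classes,
                             weight=0.5, max_iters=10):
    """Label unknown nodes from attributes plus current neighbor labels.

    graph:       node -> list of neighbor nodes
    attr_scores: node -> {class: score in [0, 1]} for unknown nodes
                 (assumed output of an attribute-only classifier)
    known:       node -> observed label; these are never changed
    weight:      share of the score taken from neighbor labels (assumption)
    """
    # Bootstrap: label every unknown node from its attributes alone.
    current = dict(known)
    for node in graph:
        if node not in current:
            current[node] = max(classes, key=lambda c: attr_scores[node][c])

    # Re-label unknown nodes until no label changes or the budget runs out.
    for _ in range(max_iters):
        changed = False
        for node in graph:
            if node in known:
                continue
            counts = Counter(current[n] for n in graph[node])
            total = sum(counts.values()) or 1  # avoid dividing by zero

            def score(c):
                link_part = counts[c] / total
                return (1 - weight) * attr_scores[node][c] + weight * link_part

            best = max(classes, key=score)
            if best != current[node]:
                current[node] = best
                changed = True
        if not changed:
            break
    return current

# Toy example: p4's attributes lean toward "db", but all its links are "ml".
graph = {"p1": ["p4"], "p2": ["p4"], "p3": ["p4"], "p4": ["p1", "p2", "p3"]}
known = {"p1": "ml", "p2": "ml", "p3": "ml"}
attr_scores = {"p4": {"ml": 0.4, "db": 0.6}}
classes = ["ml", "db"]

with_links = iterative_classification(graph, attr_scores, known, classes)
attrs_only = iterative_classification(graph, attr_scores, known, classes, weight=0.0)
```

In this toy graph the attribute-only run assigns `p4` the label its attributes favor, while the run that also weighs neighbor labels follows the linked papers. The linear mix of the two scores is a simplification chosen for brevity; a real local model would typically learn how to combine them.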
2005
- (Getoor & Diehl, 2005) ⇒ Lise Getoor, and Christopher P. Diehl. (2005). “Link Mining: A survey.” In: SIGKDD Explorations, 7(2). doi:10.1145/1117454.1117456
- QUOTE: LBC has received considerable attention recently. Chakrabarti et al. [18] consider the problem of classifying related news items in the Reuters dataset. They were among the first to notice that exploiting class labels of related objects aids classification, whereas exploiting features of related objects can actually harm classification accuracy. Oh et al. [87] report similar results on a collection of encyclopedia articles: simply incorporating words from neighboring documents was not helpful, while making use of the predicted class of neighboring documents was helpful. Lafferty et al. [71] introduce conditional random fields (CRF), which extend traditional maximum entropy models for LBC in the restricted case where the data graphs are chains. Taskar et al. [107] extend Lafferty et al.'s approach [71] to the case where the data graphs are arbitrary graphs. Neville and Jensen [80] propose simple LBC algorithms to classify corporate datasets with rich schemas that produce graphs with heterogeneous objects, each with its own distinct set of features. Lu and Getoor [76] extend simple machine learning classifiers to perform LBC by introducing new features that measure the distribution of class labels in the Markov blanket of the object to be classified. In addition to the machine learning community, the computer vision and natural language communities have also studied the LBC problem. Rosenfeld et al. [99] proposed relaxation labeling, an inference algorithm later used by Chakrabarti et al. [18] to perform link-based classification. Hummel and Zucker [53] present one of many approaches for exploring relaxation labeling theoretically. Lafferty et al. [71] proposed CRFs for use in part-of-speech tagging, a task in natural language processing.
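The quote mentions features that measure the distribution of class labels around the object being classified. A minimal sketch of such features follows, assuming an undirected graph in which the Markov blanket of a node is taken to be its direct neighbors. The function name `neighbor_label_features` and the three aggregates (per-class counts, per-class presence flags, and the most common label) are illustrative choices, not a reproduction of any cited paper's exact feature set.

```python
from collections import Counter

def neighbor_label_features(graph, labels, classes):
    """Summarize the labels of each node's labeled neighbors.

    graph:   node -> list of neighbor nodes
    labels:  node -> class label, for nodes whose label is known or predicted
    classes: fixed class order used for the count and binary vectors
    """
    features = {}
    for node, neighbors in graph.items():
        # Unlabeled neighbors contribute nothing to the summary.
        counts = Counter(labels[n] for n in neighbors if n in labels)
        features[node] = {
            "count": [counts[c] for c in classes],
            "binary": [int(counts[c] > 0) for c in classes],
            "mode": counts.most_common(1)[0][0] if counts else None,
        }
    return features

# Toy example: node "a" links to two "ml" nodes and one "db" node.
graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
labels = {"b": "ml", "c": "ml", "d": "db"}
features = neighbor_label_features(graph, labels, ["ml", "db"])
```

Vectors like these can be appended to a node's attribute vector so that an ordinary classifier sees both kinds of evidence. Because they depend on neighbor labels that may themselves be predictions, they are usually recomputed as those predictions change.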