Correlation Clustering Task

A Correlation Clustering Task is a clustering task that requires the use of a cluster correlation measure.



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/correlation_clustering Retrieved:2015-11-8.
    • Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance. [1]
  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/correlation_clustering#Correlation_clustering Retrieved:2015-11-8.
    • Correlation clustering also refers to a different task, in which correlations among attributes of feature vectors in a high-dimensional space are assumed to exist and to guide the clustering process. These correlations may differ between clusters, so a global decorrelation cannot reduce this to traditional (uncorrelated) clustering.

      Correlations among subsets of attributes result in different spatial shapes of clusters. Hence, the similarity between cluster objects is defined by taking the local correlation patterns into account. With this notion, the term was introduced simultaneously with the notion discussed above. Different methods for correlation clustering of this type have been proposed in the literature, as has its relationship to other types of clustering; see also Clustering high-dimensional data.

      Correlation clustering (according to this definition) can be shown to be closely related to biclustering. As in biclustering, the goal is to identify groups of objects that share a correlation in some of their attributes, where the correlation pattern is typically specific to the individual clusters.
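
To make the first definition concrete, here is a minimal brute-force sketch (toy data and helper names of my own) showing that the number of clusters is not an input: for a small +/− labeled complete graph, enumerate every partition of the vertices and keep the one with the fewest disagreements; the optimal cluster count falls out of the labels.

    # Brute-force correlation clustering on a toy signed graph (illustrative
    # only; enumerating set partitions is feasible just for very small n).
    from itertools import combinations

    def partitions(items):
        """Yield every set partition of `items`."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for part in partitions(rest):
            for i in range(len(part)):          # put `first` into an existing block ...
                yield part[:i] + [[first] + part[i]] + part[i + 1:]
            yield [[first]] + part              # ... or into a block of its own

    def disagreements(blocks, plus, minus):
        """Count the - edges inside blocks plus the + edges across blocks."""
        label = {v: i for i, block in enumerate(blocks) for v in block}
        return (sum(1 for u, v in minus if label[u] == label[v])
                + sum(1 for u, v in plus if label[u] != label[v]))

    # Toy complete graph on 4 vertices: listed pairs are "+", all others "-".
    nodes = ["a", "b", "c", "d"]
    plus = {("a", "b"), ("a", "c"), ("b", "c")}
    minus = {p for p in combinations(nodes, 2) if p not in plus}

    best = min(partitions(nodes), key=lambda p: disagreements(p, plus, minus))
    print(best, disagreements(best, plus, minus))  # e.g. [['a', 'b', 'c'], ['d']] 0

Here the optimum happens to have two clusters; with different edge labels it could be anywhere from 1 to n clusters, which is exactly the point made above.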

2011

  • (Wirth, 2011) ⇒ Anthony Wirth. (2011). “Correlation Clustering.” In: (Sammut & Webb, 2011) p.227
    • QUOTE: In its rawest form, correlation clustering is a graph optimization problem. Consider a clustering C to be a mapping from the elements to be clustered, V, to the set {1, …, |V|}, so that u and v are in the same cluster if and only if C[u] = C[v]. Given a collection of items in which each pair (u, v) has two weights <math>w^{+}_{uv}</math> and <math>w^{-}_{uv}</math>, we must find a clustering C that minimizes :[math]\displaystyle{ \sum_{C[u]=C[v]} w^{-}_{uv} + \sum_{C[u]\neq C[v]} w^{+}_{uv}, \ (1) }[/math] or, equivalently, maximizes :[math]\displaystyle{ \sum_{C[u]=C[v]} w^{+}_{uv} + \sum_{C[u]\neq C[v]} w^{-}_{uv}. \ (2) }[/math] Note that although <math>w^{+}_{uv}</math> and <math>w^{-}_{uv}</math> may be thought of as positive and negative evidence towards coassociation, the actual weights are nonnegative.
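
As a hedged illustration of objectives (1) and (2) above (the function and variable names are illustrative, not Wirth's), the following sketch evaluates both totals for a clustering C represented as a vertex → cluster-id mapping, with weights stored in pair-keyed dictionaries:

    # Evaluate the two correlation-clustering objectives for a clustering C.
    # C maps each vertex to a cluster id; w_plus / w_minus map vertex pairs
    # (u, v) to nonnegative weights, as in Wirth's formulation.

    def disagreement(C, w_plus, w_minus):
        """Objective (1): w- weight inside clusters plus w+ weight across clusters."""
        return (sum(w for (u, v), w in w_minus.items() if C[u] == C[v])
                + sum(w for (u, v), w in w_plus.items() if C[u] != C[v]))

    def agreement(C, w_plus, w_minus):
        """Objective (2): w+ weight inside clusters plus w- weight across clusters."""
        return (sum(w for (u, v), w in w_plus.items() if C[u] == C[v])
                + sum(w for (u, v), w in w_minus.items() if C[u] != C[v]))

    C = {"a": 0, "b": 0, "c": 1}
    w_plus = {("a", "b"): 2.0, ("a", "c"): 0.5, ("b", "c"): 0.0}
    w_minus = {("a", "b"): 0.0, ("a", "c"): 1.0, ("b", "c"): 1.5}
    print(disagreement(C, w_plus, w_minus), agreement(C, w_plus, w_minus))  # 0.5 4.5

The two objectives always sum to the constant total weight, which is why minimizing (1) and maximizing (2) pick out the same optimal clustering even though their approximation behavior differs (see the 2004 entry below).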

2006

2004

  • (Bansal et al., 2004) ⇒ Nikhil Bansal, Avrim Blum, and Shuchi Chawla. (2004). “Correlation Clustering.” In: Machine Learning, 56(1-3).
    • ABSTRACT: We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.

      An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms for both minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant factor approximation. For maximizing agreements we give a PTAS, building on ideas of Goldreich, Goldwasser, and Ron (1998) and de la Vega (1996). We also show how to extend some of these results to graphs with edge labels in [−1, +1], and give some results for the case of random noise.
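
For intuition about this +/− complete-graph setting, here is a hedged sketch of a simple randomized pivot heuristic in the style of the later KwikCluster algorithm (Ailon, Charikar & Newman, 2008); it is not the constant-factor algorithm of Bansal et al. The idea: pick a random pivot, cluster it with all of its remaining + neighbors, remove them, and repeat.

    # Randomized pivot heuristic for a +/- labeled complete graph (a sketch in
    # the style of KwikCluster, not Bansal et al.'s algorithm). `plus` lists
    # the "+" pairs; every other pair is implicitly "-".
    import random

    def pivot_cluster(nodes, plus):
        plus = {frozenset(p) for p in plus}            # order-free edge lookup
        remaining = set(nodes)
        clusters = []
        while remaining:
            pivot = random.choice(sorted(remaining))   # sorted so seeding reproduces runs
            cluster = {pivot} | {v for v in remaining
                                 if frozenset((pivot, v)) in plus}
            clusters.append(sorted(cluster))
            remaining -= cluster
        return clusters

    random.seed(0)
    print(pivot_cluster(["a", "b", "c", "d"], {("a", "b"), ("a", "c"), ("b", "c")}))
    # e.g. [['a', 'b', 'c'], ['d']]

On complete +/− instances this pivoting style is known to give a constant-factor approximation to the minimum number of disagreements in expectation, matching the minimizing-disagreements regime the abstract describes.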