Partitioning Around Medoids (PAM) Algorithm
A Partitioning Around Medoids (PAM) Algorithm is a distance-based k-Medoids clustering algorithm that aims to partition a dataset into clusters by minimizing the sum of the dissimilarities between objects labeled to be in a cluster and a point designated as the center of that cluster, known as a medoid.
- Context:
- It can be seen as a robust alternative to k-Means clustering, which minimizes the sum of squared distances to the cluster centers, making PAM more resistant to noise and outliers.
- It can (typically) use a greedy algorithm for the initial selection of medoids and subsequent iterative improvement of the clustering by minimizing the total dissimilarity.
- It can (often) involve two key phases: the "BUILD" phase for initial medoid selection and the "SWAP" phase for iteratively improving the cluster quality.
- It can be computationally expensive for large datasets due to its initial runtime complexity of \(O(k(n-k)^2)\), although optimized versions such as FastPAM and FasterPAM significantly reduce this complexity.
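The robustness contrast with k-Means noted above can be illustrated with a minimal sketch (toy data chosen for illustration): a mean-based center is dragged toward an outlier, while the medoid remains an actual, central data point.

```python
# Minimal sketch (toy 1-D data, assumed for illustration) contrasting a
# mean-based center (as in k-Means) with a medoid (as in PAM/k-Medoids).
import numpy as np

points = np.array([0.0, 1.0, 2.0, 100.0])  # 100.0 is an outlier

# k-Means-style center: the mean, pulled toward the outlier.
mean_center = points.mean()  # 25.75

# Medoid: the data point minimizing total dissimilarity to all others.
D = np.abs(points[:, None] - points[None, :])  # pairwise dissimilarities
medoid = points[np.argmin(D.sum(axis=1))]      # stays at 1.0
```

Because the medoid must be one of the observed points and is chosen by (unsquared) dissimilarity, a single extreme value cannot drag it away from the bulk of the data.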
- Example(s):
- ...
- Counter-Example(s):
- See: Greedy Algorithm, k-Medoids, Clustering Algorithm, Outlier.
References
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/k-medoids#Partitioning_Around_Medoids Retrieved:2024-3-22.
- PAM uses a greedy search which may not find the optimum solution, but it is faster than exhaustive search. It works as follows:
- (BUILD) Initialize: greedily select [math]\displaystyle{ k }[/math] of the [math]\displaystyle{ n }[/math] data points as the medoids to minimize the cost
- Associate each data point to the closest medoid.
- (SWAP) While the cost of the configuration decreases:
- For each medoid [math]\displaystyle{ m }[/math], and for each non-medoid data point [math]\displaystyle{ o }[/math]:
- Consider the swap of [math]\displaystyle{ m }[/math] and [math]\displaystyle{ o }[/math], and compute the cost change
- If the cost change is the current best, remember this [math]\displaystyle{ m }[/math] and [math]\displaystyle{ o }[/math] combination
- Perform the best swap of [math]\displaystyle{ m_{\text{best}} }[/math] and [math]\displaystyle{ o_{\text{best}} }[/math], if it decreases the cost function. Otherwise, the algorithm terminates.
- The runtime complexity of the original PAM algorithm per iteration of the SWAP phase is [math]\displaystyle{ O(k (n-k)^2) }[/math], achieved by computing only the change in cost for each candidate swap. A naive implementation that recomputes the entire cost function every time costs [math]\displaystyle{ O(n^2 k^2) }[/math]. This runtime can be further reduced to [math]\displaystyle{ O(n^2) }[/math] by splitting the cost change into three parts such that computations can be shared or avoided (FastPAM). The runtime can be reduced further by eagerly performing swaps (FasterPAM), at which point a random initialization becomes a viable alternative to BUILD.
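The BUILD and SWAP phases described above can be sketched in Python. This is a hedged, illustrative implementation of the naive variant that recomputes the full cost for each candidate swap (the [math]\displaystyle{ O(n^2 k^2) }[/math] version discussed above, not FastPAM/FasterPAM); function and variable names are the author's own, and the input is assumed to be a precomputed pairwise dissimilarity matrix.

```python
# Illustrative PAM sketch (naive cost recomputation), assuming a precomputed
# n x n pairwise dissimilarity matrix D and a desired number of clusters k.
import numpy as np

def pam(D, k, max_iter=100):
    """Cluster via PAM: greedy BUILD of k medoids, then iterative SWAP."""
    n = D.shape[0]
    # --- BUILD: greedily select k medoids minimizing total dissimilarity ---
    medoids = [int(np.argmin(D.sum(axis=1)))]   # first medoid: most central point
    while len(medoids) < k:
        current = D[:, medoids].min(axis=1)     # distance to nearest medoid
        best_gain, best_point = -np.inf, None
        for o in range(n):
            if o in medoids:
                continue
            # cost reduction if o were added as a medoid
            gain = np.maximum(current - D[:, o], 0).sum()
            if gain > best_gain:
                best_gain, best_point = gain, o
        medoids.append(best_point)
    # --- SWAP: replace medoid m with non-medoid o while cost decreases ---
    for _ in range(max_iter):
        cost = D[:, medoids].min(axis=1).sum()
        best_delta, best_swap = 0.0, None
        for i, m in enumerate(medoids):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:i] + [o] + medoids[i + 1:]
                delta = D[:, trial].min(axis=1).sum() - cost  # naive recompute
                if delta < best_delta:
                    best_delta, best_swap = delta, (i, o)
        if best_swap is None:   # no improving swap: terminate
            break
        medoids[best_swap[0]] = best_swap[1]
    labels = np.argmin(D[:, medoids], axis=1)   # assign points to nearest medoid
    return medoids, labels
```

For example, on two well-separated 1-D groups with `k=2`, the returned medoids are central members of each group; FastPAM achieves the same result while sharing the cost-change computation across the k candidate medoids.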