K-Medoids Clustering – Algorithm
The K-medoids clustering algorithm can be summarized as follows −
- Initialize k medoids − Select k random data points from the dataset as the initial medoids.
- Assign data points to medoids − Assign each data point to the nearest medoid.
- Update medoids − For each cluster, select the data point that minimizes the sum of distances to all the other data points in the cluster, and set it as the new medoid.
- Repeat steps 2 and 3 until the medoids no longer change (convergence) or a maximum number of iterations is reached. A from-scratch sketch of these steps is shown below.
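To make the steps concrete, here is a minimal from-scratch sketch in plain NumPy, assuming Euclidean distance; the function name k_medoids and its arguments are illustrative, not a library API −

import numpy as np

def k_medoids(X, k, max_iter=100, seed=None):
    # A plain-NumPy sketch of the four steps above (Euclidean distance).
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: pick k distinct random points as the initial medoids
    medoids = rng.choice(n, size=k, replace=False)
    # Distances are needed repeatedly, so compute them all once
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest medoid
        labels = np.argmin(dist[:, medoids], axis=1)
        # Step 3: in each cluster, pick the member that minimizes the
        # sum of distances to all other members as the new medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        # Step 4: stop when the medoids no longer change
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)  # final assignment
    return medoids, labels

# Illustrative usage on random data
X_demo = np.random.default_rng(0).normal(size=(60, 2))
medoid_idx, labels = k_medoids(X_demo, k=3, seed=0)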
Implementation in Python
To implement K-medoids clustering in Python, we can use the scikit-learn-extra library, an extension of scikit-learn. It provides the KMedoids class, which can be used to perform K-medoids clustering on a dataset.
First, we need to import the required libraries −
from sklearn_extra.cluster import KMedoids
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
Next, we generate a sample dataset using the make_blobs() function from scikit-learn −
X, y = make_blobs(n_samples=500, centers=3, random_state=42)
Here, we generate a dataset with 500 data points and 3 clusters.
Next, we initialize the KMedoids class and fit the data −
kmedoids = KMedoids(n_clusters=3, random_state=42)
kmedoids.fit(X)
Here, we set the number of clusters to 3 and use the random_state parameter to ensure reproducibility.
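After fitting, we can inspect what the model learned. Here is a brief sketch, assuming the scikit-learn-extra API, where cluster_centers_ holds the medoid coordinates, medoid_indices_ their row indices in X, and inertia_ the total distance of points to their nearest medoid −

print(kmedoids.cluster_centers_)   # coordinates of the 3 medoids
print(kmedoids.medoid_indices_)    # row indices of the medoids in X
print(kmedoids.labels_[:10])       # cluster labels of the first 10 points
print(kmedoids.inertia_)           # sum of distances to the nearest medoid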
Finally, we can visualize the clustering results using a scatter plot −
plt.figure(figsize=(7.5, 3.5))
plt.scatter(X[:, 0], X[:, 1], c=kmedoids.labels_, cmap='viridis')
plt.scatter(kmedoids.cluster_centers_[:, 0], kmedoids.cluster_centers_[:, 1], marker='x', color='red')
plt.show()
Example
Here is the complete implementation in Python −
from sklearn_extra.cluster import KMedoids
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data
X, y = make_blobs(n_samples=500, centers=3, random_state=42)

# Cluster the data using KMedoids
kmedoids = KMedoids(n_clusters=3, random_state=42)
kmedoids.fit(X)

# Plot the results
plt.figure(figsize=(7.5, 3.5))
plt.scatter(X[:, 0], X[:, 1], c=kmedoids.labels_, cmap='viridis')
plt.scatter(kmedoids.cluster_centers_[:, 0], kmedoids.cluster_centers_[:, 1], marker='x', color='red')
plt.show()
Output
Here, we plot the data points as a scatter plot and color them based on their cluster labels. We also plot the medoids as red crosses.

K-Medoids Clustering – Advantages
Here are the advantages of using K-medoids clustering −
- Robust to outliers and noise − K-medoids clustering is more robust to outliers and noise than K-means clustering because each cluster center is an actual data point (a medoid) and cluster cost is measured with plain rather than squared distances, so a single extreme point pulls the center far less than it pulls a mean.
- Can handle non-Euclidean distance metrics − K-medoids clustering can be used with any distance metric, including non-Euclidean ones such as Manhattan distance and cosine distance (see the snippet after this list).
- Interpretable cluster centers − because each medoid is an actual data point from the dataset, the cluster centers always correspond to real observations, unlike K-means centroids, which are averages that may not match any data point.
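For instance, the KMedoids estimator accepts a metric argument, and metric='precomputed' lets us pass a distance matrix directly; here is a short sketch, assuming the scikit-learn-extra API −

import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics import pairwise_distances

X = np.random.default_rng(42).normal(size=(100, 2))

# Built-in support for a non-Euclidean metric
km_manhattan = KMedoids(n_clusters=3, metric='manhattan', random_state=42).fit(X)

# Precompute a cosine distance matrix once and cluster on it directly
D = pairwise_distances(X, metric='cosine')
km_cosine = KMedoids(n_clusters=3, metric='precomputed', random_state=42).fit(D)

print(km_manhattan.medoid_indices_)
print(km_cosine.medoid_indices_)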
K-Medoids Clustering – Disadvantages
The disadvantages of using K-medoids clustering are as follows −
- Sensitive to the choice of k − the quality of K-medoids clustering depends strongly on k, the number of clusters, which usually has to be chosen by the user (a silhouette-based sketch for choosing k follows this list).
- Computationally expensive for large datasets − each iteration of the classic PAM algorithm evaluates on the order of k(n−k)² distances, so medoid selection scales poorly as the number of data points grows, and K-means is typically faster.
- Less effective in high dimensions − in very high-dimensional spaces, distances between points become less discriminative, which can degrade cluster quality.
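To address the sensitivity to k, one common heuristic is to fit the model for several candidate values and compare silhouette scores (higher is better); here is a brief sketch using scikit-learn's silhouette_score −

from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

# Fit K-medoids for a range of k values and report the silhouette score
for k in range(2, 7):
    labels = KMedoids(n_clusters=k, random_state=42).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")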