  • Machine Learning – High Correlation Filter

    High Correlation Filter is a feature selection technique used in machine learning to identify and remove highly correlated features from a dataset. It improves model performance by reducing the number of features used for training, and it helps avoid multicollinearity, the problem that occurs when two or more predictor variables are highly correlated with each other.

    The High Correlation Filter works by computing the correlation between each pair of features in the dataset and removing one of the two features that are highly correlated with each other. This is done by setting a threshold for the correlation coefficient between the features, and removing one of the features if the absolute value of the correlation coefficient is greater than the threshold.

    The steps involved in implementing High Correlation Filter are as follows −

    • Compute the correlation matrix for the dataset.
    • Set a threshold for the correlation coefficient between the features.
    • Find the pairs of features that have a correlation coefficient greater than the threshold.
    • Remove one of the two features from each pair of highly correlated features.
    • Use the remaining features for training the machine learning model.
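    These steps can also be written compactly with pandas. The following is a minimal sketch, assuming the features live in a pandas DataFrame named df (the DataFrame name and the 0.8 threshold are illustrative choices); scanning only the upper triangle of the correlation matrix ensures each pair is considered once −

    import numpy as np
    import pandas as pd

    def high_correlation_filter(df, threshold=0.8):
       # Step 1: absolute correlation matrix
       corr = df.corr().abs()

       # Keep only the upper triangle so each feature pair appears once
       upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

       # Steps 2-4: drop one feature from every pair above the threshold
       to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
       return df.drop(columns=to_drop)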

    The advantage of using High Correlation Filter is that it reduces the number of features used for training the model, which in turn reduces the complexity of the model and makes it easier to interpret. Moreover, it helps to avoid the problem of multicollinearity, which can lead to unstable and unreliable estimates of the model parameters.

    However, there are some limitations to High Correlation Filter. For example, it may not always select the best set of features for the model, especially if there are non-linear relationships between the features and the target variable. Also, if two features are highly correlated, removing one of them may result in the loss of some important information that was present in the removed feature.

    Example

    Here is an example to implement High Correlation Filter in Python −

    # Importing the necessary libraries
    import pandas as pd
    import numpy as np

    # Load the diabetes dataset
    diabetes = pd.read_csv(r'C:\Users\Leekha\Desktop\diabetes.csv')

    # Define the predictor variables (X) and the target variable (y)
    X = diabetes.iloc[:,:-1].values
    y = diabetes.iloc[:,-1].values

    # Compute the correlation matrix
    corr_matrix = np.corrcoef(X, rowvar=False)

    # Set the threshold for high correlation
    threshold = 0.8

    # Find the indices of the highly correlated features
    high_corr_indices = np.where(np.abs(corr_matrix) > threshold)

    # Create a set of feature pairs to be removed
    features_to_remove = set()

    # Iterate over the indices of the highly correlated features and
    # add them to the set of features to be removed
    for i, j in zip(*high_corr_indices):
       if i != j and (j, i) not in features_to_remove:
          features_to_remove.add((i, j))

    # Convert the set of feature pairs to a list
    features_to_remove = list(features_to_remove)

    # Remove one of the two features from each pair of highly correlated features
    X_filtered = np.delete(X, [j for i, j in features_to_remove], axis=1)

    # Print the shape of the filtered dataset
    print('Shape of the filtered dataset:', X_filtered.shape)

    Output

    When you execute this code, it will produce the following output. In this dataset, no feature pair exceeds the 0.8 threshold, so all eight predictors are retained −

    Shape of the filtered dataset: (768, 8)
    

    Advantages of High Correlation Filter

    Following are the advantages of using High Correlation Filter −

    • Reduces multicollinearity − The High Correlation Filter can reduce multicollinearity, which occurs when two or more features are highly correlated with each other. Multicollinearity can negatively impact the performance of machine learning models.
    • Improves model performance − By removing highly correlated features, the High Correlation Filter can improve the performance of machine learning models.
    • Simplifies the model − With fewer features, the model can be easier to interpret and understand.
    • Saves computational resources − With fewer features, the computational resources required to train machine learning models are reduced.

    Disadvantages of High Correlation Filter

    Following are the disadvantages of using High Correlation Filter −

    • Information loss − The High Correlation Filter can lead to information loss because it removes features that may contain important information.
    • Affects non-linear relationships − The High Correlation Filter assumes that the relationships between the features are linear. It may not work well for datasets where the relationships between the features are non-linear.
    • Impact on the dependent variable − Removing highly correlated features can sometimes have a negative impact on the dependent variable, particularly if the features are strongly correlated with the dependent variable.
    • Selection bias − The High Correlation Filter may introduce selection bias if it removes features that are important for predicting the dependent variable.
  • Machine Learning – Forward Feature Construction

    Forward Feature Construction is a feature selection method in machine learning where we start with an empty set of features and iteratively add the best performing feature at each step until the desired number of features is reached.

    The goal of feature selection is to identify the most important features that are relevant for predicting the target variable, while ignoring the less important features that add noise to the model and may lead to overfitting.

    The steps involved in Forward Feature Construction are as follows −

    • Initialize an empty set of features.
    • Set the maximum number of features to be selected.
    • Iterate until the desired number of features is reached −
      • For each remaining feature that is not already in the set of selected features, fit a model with the selected features and the current feature, and evaluate its performance using a validation set.
      • Select the feature that leads to the best performance and add it to the set of selected features.
    • Return the set of selected features as the optimal set for the model.

    The key advantage of Forward Feature Construction is that it is computationally efficient and can be used for high-dimensional datasets. However, it may not always lead to the optimal set of features, especially if there are highly correlated features or non-linear relationships between the features and the target variable.
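    Note that scikit-learn also ships a ready-made implementation of this strategy, SequentialFeatureSelector. Here is a minimal sketch of using it in forward mode; it uses the built-in scikit-learn diabetes dataset for self-containedness, and the parameter values are illustrative −

    from sklearn.datasets import load_diabetes
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True)

    # Greedily add features until 5 are selected, scoring each candidate by 5-fold CV
    sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                    direction='forward', cv=5)
    sfs.fit(X, y)
    print('Selected feature mask:', sfs.get_support())

    The example below implements the same idea by hand.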

    Example

    Here is an example to implement Forward Feature Construction in Python. Note that, for simplicity, this example scores each candidate feature on the test set; in practice, a separate validation set or cross-validation should be used −

    # Importing the necessary libraries
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    
    # Load the diabetes dataset
    diabetes = pd.read_csv(r'C:\Users\Leekha\Desktop\diabetes.csv')

    # Define the predictor variables (X) and the target variable (y)
    X = diabetes.iloc[:,:-1].values
    y = diabetes.iloc[:,-1].values
    
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Create an empty set of features
    selected_features = set()

    # Set the maximum number of features to be selected
    max_features = 8

    # Iterate until the desired number of features is reached
    while len(selected_features) < max_features:
       # Initialize the best feature and the best score
       best_feature = None
       best_score = 0

       # Iterate over all the remaining features
       for i in range(X_train.shape[1]):
          # Skip the feature if it's already selected
          if i in selected_features:
             continue

          # Select the current feature and fit a linear regression model
          X_train_selected = X_train[:, list(selected_features) + [i]]
          regressor = LinearRegression()
          regressor.fit(X_train_selected, y_train)

          # Compute the score on the testing set
          X_test_selected = X_test[:, list(selected_features) + [i]]
          score = regressor.score(X_test_selected, y_test)

          # Update the best feature and score if the current feature performs better
          if score > best_score:
             best_feature = i
             best_score = score

       # Add the best feature to the set of selected features
       selected_features.add(best_feature)

       # Print the selected features and the score
       print('Selected Features:', list(selected_features))
       print('Score:', best_score)

    Output

    On execution, it will produce the following output −

    Selected Features: [1]
    Score: 0.23530716168783583
    Selected Features: [0, 1]
    Score: 0.2923143573608237
    Selected Features: [0, 1, 5]
    Score: 0.3164103491569179
    Selected Features: [0, 1, 5, 6]
    Score: 0.3287368302427327
    Selected Features: [0, 1, 2, 5, 6]
    Score: 0.334586804842275
    Selected Features: [0, 1, 2, 3, 5, 6]
    Score: 0.3356264736550455
    Selected Features: [0, 1, 2, 3, 4, 5, 6]
    Score: 0.3313166516703744
    Selected Features: [0, 1, 2, 3, 4, 5, 6, 7]
    Score: 0.32230203252064216
  • Machine Learning – Backward Elimination

    Backward Elimination is a feature selection technique used in machine learning to select the most significant features for a predictive model. In this technique, we start by considering all the features initially, and then we iteratively remove the least significant features until we get the best subset of features that gives the best performance.

    Implementation in Python

    To implement Backward Elimination in Python, you can follow these steps −

    Import the necessary libraries: pandas, numpy, and statsmodels.api.

    import pandas as pd
    import numpy as np
    import statsmodels.api as sm
    

    Load your dataset into a Pandas DataFrame. We will be using the Pima Indians Diabetes dataset −

    diabetes = pd.read_csv(r'C:\Users\Leekha\Desktop\diabetes.csv')

    Define the predictor variables (X) and the target variable (y).

    X = diabetes.iloc[:,:-1].values
    y = diabetes.iloc[:,-1].values
    

    Add a column of ones to the predictor variables to represent the intercept.

    X = np.append(arr = np.ones((len(X),1)).astype(int), values = X, axis =1)

    Use the Ordinary Least Squares (OLS) method from the statsmodels library to fit the multiple linear regression model with all the predictor variables.

    X_opt = X[:,[0,1,2,3,4,5,6,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()

    Check the p-values of each predictor variable and remove the one with the highest p-value (i.e., the least significant).

    regressor_OLS.summary()

    Repeat the previous two steps (refit the model and inspect the summary) until all the remaining predictor variables have a p-value below the significance level (e.g., 0.05).

    X_opt = X[:,[0,1,2,3,5,6,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()

    X_opt = X[:,[0,1,3,5,6,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()

    X_opt = X[:,[0,1,3,5,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()

    X_opt = X[:,[0,1,3,5,7]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()

    The final subset of predictor variables with p-values below the significance level is the optimal set of features for the model.
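    The drop-and-refit cycle above can also be automated. Here is a minimal sketch, assuming the same X and y as above and a 0.05 significance level, that repeatedly removes the predictor with the highest p-value −

    import numpy as np
    import statsmodels.api as sm

    def backward_elimination(X, y, significance_level=0.05):
       # Start with all columns and drop the worst one per iteration
       cols = list(range(X.shape[1]))
       while cols:
          model = sm.OLS(endog=y, exog=X[:, cols]).fit()
          worst = np.argmax(model.pvalues)
          if model.pvalues[worst] > significance_level:
             cols.pop(worst)   # remove the least significant predictor
          else:
             break   # all remaining p-values are acceptable
       return cols, model

    selected_cols, final_model = backward_elimination(X, y)
    print('Selected columns:', selected_cols)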

    Example

    Here is the complete implementation of Backward Elimination in Python −

    # Importing the necessary libraries
    import pandas as pd
    import numpy as np
    import statsmodels.api as sm
    
    # Load the diabetes dataset
    diabetes = pd.read_csv(r'C:\Users\Leekha\Desktop\diabetes.csv')

    # Define the predictor variables (X) and the target variable (y)
    X = diabetes.iloc[:,:-1].values
    y = diabetes.iloc[:,-1].values
    
    # Add a column of ones to the predictor variables to represent the intercept
    X = np.append(arr = np.ones((len(X),1)).astype(int), values = X, axis = 1)

    # Fit the multiple linear regression model with all the predictor variables
    X_opt = X[:,[0,1,2,3,4,5,6,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()

    # Check the p-values of each predictor variable and remove the one
    # with the highest p-value (i.e., the least significant)
    regressor_OLS.summary()

    # Repeat the above step until all the remaining predictor variables
    # have a p-value below the significance level (e.g., 0.05)
    X_opt = X[:,[0,1,2,3,5,6,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()
    
    X_opt = X[:,[0,1,3,5,6,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()
    
    X_opt = X[:,[0,1,3,5,7,8]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()
    
    X_opt = X[:,[0,1,3,5,7]]
    regressor_OLS = sm.OLS(endog = y, exog = X_opt).fit()
    regressor_OLS.summary()

    Output

    When you execute this program, it will produce the following output −

    [Figure: OLS regression results summary tables produced by regressor_OLS.summary()]
  • Machine Learning – Feature Extraction

    Feature extraction is the process of transforming raw data into a new set of more informative features. It is often used in image processing, speech recognition, natural language processing, and other applications where the raw data is high-dimensional and difficult to work with.

    Example

    Here is an example of how to perform feature extraction using Principal Component Analysis (PCA) on the Iris Dataset using Python −

    # Import necessary libraries and dataset
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt
    
    # Load the dataset
    iris = load_iris()

    # Perform feature extraction using PCA
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(iris.data)

    # Visualize the transformed data
    plt.figure(figsize=(7.5,3.5))
    plt.scatter(X_pca[:,0], X_pca[:,1], c=iris.target)
    plt.xlabel('PC1')
    plt.ylabel('PC2')
    plt.show()

    In this code, we first import the necessary libraries, including sklearn for performing feature extraction using PCA and matplotlib for visualizing the transformed data.

    Next, we load the Iris Dataset using load_iris(). We then perform feature extraction using PCA with PCA() and set the number of components to 2 (n_components=2). This reduces the dimensionality of the input data from 4 features to 2 principal components.

    We then transform the input data using fit_transform() and store the transformed data in X_pca. Finally, we visualize the transformed data using plt.scatter() and color the data points based on their target value. We label the axes as PC1 and PC2, which are the first and second principal components, respectively, and show the plot using plt.show().

    Output

    When you execute the given program, it will produce the following plot as the output −

    [Figure: Scatter plot of the Iris data projected onto the first two principal components, PC1 vs PC2]
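    As an optional extension, you can check how much of the original variance the two components retain. This assumes the pca object from the example above is still in scope −

    # Fraction of the total variance captured by each principal component
    print(pca.explained_variance_ratio_)

    # Combined, the two components retain most of the variance
    # (roughly 98% for the Iris data)
    print(pca.explained_variance_ratio_.sum())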

    Advantages of Feature Extraction

    Following are the advantages of using Feature Extraction −

    • Reduced Dimensionality − Feature extraction reduces the dimensionality of the input data by transforming it into a new set of features. This makes the data easier to visualize, process and analyze.
    • Improved Performance − Feature extraction can improve the performance of machine learning algorithms by creating a set of more meaningful features that capture the essential information from the input data.
    • Feature Selection − Feature extraction can be used to perform feature selection by selecting a subset of the most relevant features that are most informative for the machine learning model.
    • Noise Reduction − Feature extraction can also help reduce noise in the data by filtering out irrelevant features or combining related features.

    Disadvantages of Feature Extraction

    Following are the disadvantages of using Feature Extraction −

    • Loss of Information − Feature extraction can result in a loss of information as it involves reducing the dimensionality of the input data. The transformed data may not contain all the information from the original data, and some information may be lost in the process.
    • Overfitting − Feature extraction can also lead to overfitting if the transformed features are too complex or if the number of features selected is too high.
    • Complexity − Feature extraction can be computationally expensive and time-consuming, especially when dealing with large datasets or complex feature extraction techniques such as deep learning.
    • Domain Expertise − Feature extraction requires domain expertise to select and transform the features effectively. It requires knowledge of the data and the problem at hand to choose the right features that are most informative for the machine learning model.
  • Machine Learning – Feature Selection

    Feature selection is an important step in machine learning that involves selecting a subset of the available features to improve the performance of the model. The following are some commonly used feature selection techniques −

    Filter Methods

    This method involves evaluating the relevance of each feature by calculating a statistical measure (e.g., correlation, mutual information, chi-square, etc.) and ranking the features based on their scores. Features that have low scores are then removed from the model.

    To implement filter methods in Python, you can use the SelectKBest or SelectPercentile classes from the sklearn.feature_selection module. Below is a small code snippet to implement feature selection.

    from sklearn.feature_selection import SelectPercentile, chi2
    selector = SelectPercentile(chi2, percentile=10)
    X_new = selector.fit_transform(X, y)
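    After fitting, the ranking step is easy to inspect; a small optional addition, assuming the selector fitted above −

    # Per-feature chi-square scores used for the ranking
    print(selector.scores_)

    # Boolean mask of the features that were kept
    print(selector.get_support())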

    Wrapper Methods

    This method involves evaluating the model’s performance by adding or removing features and selecting the subset of features that yields the best performance. This approach is computationally expensive, but it is more accurate than filter methods.

    To implement wrapper methods in Python, you can use the RFE (Recursive Feature Elimination) class from the sklearn.feature_selection module. Below is a small code snippet to implement the wrapper method.

    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression
    
    estimator = LogisticRegression()
    selector = RFE(estimator, n_features_to_select=5)
    selector = selector.fit(X, y)
    X_new = selector.transform(X)

    Embedded Methods

    This method involves incorporating feature selection into the model building process itself. This can be done using techniques such as Lasso regression, Ridge regression, or Decision Trees. These methods assign weights to each feature and features with low weights are removed from the model.

    To implement embedded methods in Python, you can use the Lasso or Ridge regression classes from the sklearn.linear_model module. Below is a small code snippet for implementing embedded methods −

    from sklearn.linear_model import Lasso
    
    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)
    coef = pd.Series(lasso.coef_, index = X.columns)
    important_features = coef[coef !=0]

    Principal Component Analysis (PCA)

    This is a type of unsupervised learning method that involves transforming the original features into a set of uncorrelated principal components that explain the maximum variance in the data. The number of principal components can be selected based on a threshold value, which can reduce the dimensionality of the dataset.

    To implement PCA in Python, you can use the PCA class from the sklearn.decomposition module. For example, to reduce the number of features, you can use PCA as shown in the following code −

    from sklearn.decomposition import PCA
    pca = PCA(n_components=3)
    X_new = pca.fit_transform(X)
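    To pick the number of components by a variance threshold instead of fixing it, you can pass a fraction between 0 and 1 as n_components; a small sketch, where the 0.95 value is an illustrative choice −

    from sklearn.decomposition import PCA

    # Keep as many components as needed to explain 95% of the variance
    pca = PCA(n_components=0.95)
    X_new = pca.fit_transform(X)
    print(pca.n_components_)   # number of components actually retained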

    Recursive Feature Elimination (RFE)

    This method involves recursively eliminating the least significant features until a subset of the most important features is identified. It uses a model-based approach and can be computationally expensive, but it can yield good results in high-dimensional datasets.

    To implement RFE in Python, you can use the RFECV (Recursive Feature Elimination with Cross-Validation) class from the sklearn.feature_selection module. For example, the following small code snippet implements Recursive Feature Elimination −

    from sklearn.feature_selection import RFECV
    from sklearn.tree import DecisionTreeClassifier
    estimator = DecisionTreeClassifier()
    selector = RFECV(estimator, step=1, cv=5)
    selector = selector.fit(X, y)
    X_new = selector.transform(X)

    These feature selection techniques can be used alone or in combination to improve the performance of machine learning models. It is important to choose the appropriate technique based on the size of the dataset, the nature of the features, and the type of model being used.

    Example

    In the below example, we will implement three feature selection methods − univariate feature selection using the chi-square test, recursive feature elimination with cross-validation (RFECV), and principal component analysis (PCA).

    We will use the Pima Indians Diabetes dataset. This dataset contains 768 samples with 8 features, and the task is to classify whether a patient has diabetes based on these features.

    Here is the Python code to implement these feature selection methods on this dataset −

    # Import necessary libraries and dataset
    import pandas as pd
    from sklearn.feature_selection import SelectKBest, chi2, RFECV
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.decomposition import PCA

    # Load the dataset
    diabetes = pd.read_csv(r'C:\Users\Leekha\Desktop\diabetes.csv')

    # Split the dataset into features and target variable
    X = diabetes.drop('Outcome', axis=1)
    y = diabetes['Outcome']

    # Apply univariate feature selection using the chi-square test
    selector = SelectKBest(chi2, k=4)
    X_new = selector.fit_transform(X, y)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.3, random_state=42)

    # Fit a logistic regression model on the selected features
    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    # Evaluate the model on the test set
    accuracy = clf.score(X_test, y_test)
    print("Accuracy using univariate feature selection: {:.2f}".format(accuracy))

    # Recursive feature elimination with cross-validation (RFECV)
    estimator = LogisticRegression()
    selector = RFECV(estimator, step=1, cv=5)
    selector.fit(X, y)
    X_new = selector.transform(X)
    scores = cross_val_score(LogisticRegression(), X_new, y, cv=5)
    print("Accuracy using RFECV feature selection: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

    # PCA implementation
    pca = PCA(n_components=5)
    X_new = pca.fit_transform(X)
    scores = cross_val_score(LogisticRegression(), X_new, y, cv=5)
    print("Accuracy using PCA feature selection: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

    Output

    When you execute this code, it will produce the following output on the terminal −

    Accuracy using univariate feature selection: 0.74
    Accuracy using RFECV feature selection: 0.77 (+/- 0.03)
    Accuracy using PCA feature selection: 0.75 (+/- 0.07)
  • Machine Learning – Dimensionality Reduction

    Dimensionality reduction in machine learning is the process of reducing the number of features or variables in a dataset while retaining as much of the original information as possible. In other words, it is a way of simplifying the data by reducing its complexity.

    The need for dimensionality reduction arises when a dataset has a large number of features or variables. Having too many features can lead to overfitting and increase the complexity of the model. It can also make it difficult to visualize the data and can slow down the training process.

    There are two main approaches to dimensionality reduction −

    Feature Selection

    This involves selecting a subset of the original features based on certain criteria, such as their importance or relevance to the target variable.

    The following are some commonly used feature selection techniques −

    • Filter Methods
    • Wrapper Methods
    • Embedded Methods

    Feature Extraction

    Feature extraction is a process of transforming raw data into a set of meaningful features that can be used for machine learning models. It involves reducing the dimensionality of the input data by selecting, combining or transforming features to create a new set of features that are more useful for the machine learning model.

    Dimensionality reduction can improve the accuracy and speed of machine learning models, reduce overfitting, and simplify data visualization.
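    To make the distinction between the two approaches concrete, here is a minimal sketch contrasting them on the same data; the Iris dataset and the choice of two output features are illustrative assumptions −

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)

    # Feature selection: keep 2 of the 4 original features (ANOVA F-test)
    X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

    # Feature extraction: build 2 new features as combinations of all 4 (PCA)
    X_extracted = PCA(n_components=2).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)   # both are (150, 2)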

  • Agglomerative Clustering in Machine Learning

    Agglomerative clustering is a hierarchical clustering algorithm that starts with each data point as its own cluster and iteratively merges the closest clusters until a stopping criterion is reached. It is a bottom-up approach that produces a dendrogram, which is a tree-like diagram that shows the hierarchical relationship between the clusters. The algorithm can be implemented using the scikit-learn library in Python.

    Agglomerative Clustering Algorithm

    Agglomerative Clustering is a hierarchical algorithm that creates a nested hierarchy of clusters by merging clusters in a bottom-up approach. This algorithm includes the following steps −

    • Treat each data point as a single cluster
    • Compute the proximity matrix using a distance metric
    • Merge clusters based on a linkage criterion
    • Update the proximity matrix
    • Repeat the merge and update steps until a single cluster remains

    Why use Agglomerative Clustering?

    Agglomerative clustering makes the relationships between data points easy to interpret through its dendrogram. Unlike k-means clustering, we do not need to specify the number of clusters in advance, and the algorithm can identify small clusters.

    Implementation of Agglomerative Clustering in Python

    We will use the iris dataset for demonstration. The first step is to import the necessary libraries and load the dataset.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.cluster import AgglomerativeClustering
    from scipy.cluster.hierarchy import dendrogram, linkage
    
    iris = load_iris()
    X = iris.data
    y = iris.target
    

    The next step is to create a linkage matrix that contains the distances between each pair of clusters. We can use the linkage function from the scipy.cluster.hierarchy module to create the linkage matrix.

    Z = linkage(X,'ward')

    The ‘ward’ method is used to calculate the distances between the clusters. At each step, it merges the pair of clusters whose merger leads to the smallest increase in total within-cluster variance.

    We can visualize the dendrogram using the dendrogram function from the same module.

    plt.figure(figsize=(7.5,3.5))
    plt.title("Iris Dendrogram")
    dendrogram(Z)
    plt.show()

    The resulting dendrogram (see the following plot) shows the hierarchical relationship between the clusters. We can see that the algorithm has merged the closest clusters first, and the distance between the clusters increases as we move up the tree.

    [Figure: Iris dendrogram]
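    If you want flat cluster labels directly from the linkage matrix, without fitting a separate model, SciPy's fcluster function can cut the dendrogram. A small optional sketch, assuming the Z computed above −

    from scipy.cluster.hierarchy import fcluster

    # Cut the dendrogram so that at most three flat clusters remain
    labels_from_dendrogram = fcluster(Z, t=3, criterion='maxclust')
    print(labels_from_dendrogram[:10])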

    The final step is to apply the clustering algorithm and extract the cluster labels. We can use the AgglomerativeClustering class from the sklearn.cluster module to apply the algorithm.

    model = AgglomerativeClustering(n_clusters=3)
    model.fit(X)
    labels = model.labels_
    

    The n_clusters parameter specifies the number of clusters to be extracted from the data. In this case, we have specified n_clusters=3 because we know that the iris dataset has three classes.

    We can visualize the resulting clusters using a scatter plot.

    plt.figure(figsize=(7.5,3.5))
    plt.scatter(X[:,0], X[:,1], c=labels)
    plt.xlabel("Sepal length")
    plt.ylabel("Sepal width")
    plt.title("Agglomerative Clustering Results")
    plt.show()

    The resulting plot shows the three clusters identified by the algorithm. We can see that the algorithm has largely separated the data points into groups corresponding to the three species.

    [Figure: Scatter plot of the three agglomerative clusters, sepal length vs sepal width]

    Example

    Here is the complete implementation of Agglomerative Clustering in Python −

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.cluster import AgglomerativeClustering
    from scipy.cluster.hierarchy import dendrogram, linkage
    
    # Load the Iris dataset
    iris = load_iris()
    X = iris.data
    y = iris.target
    Z = linkage(X,'ward')

    # Plot the dendrogram
    plt.figure(figsize=(7.5,3.5))
    plt.title("Iris Dendrogram")
    dendrogram(Z)
    plt.show()

    # create an instance of the AgglomerativeClustering class
    model = AgglomerativeClustering(n_clusters=3)

    # fit the model to the dataset
    model.fit(X)
    labels = model.labels_
    
    # Plot the results
    plt.figure(figsize=(7.5,3.5))
    plt.scatter(X[:,0], X[:,1], c=labels)
    plt.xlabel("Sepal length")
    plt.ylabel("Sepal width")
    plt.title("Agglomerative Clustering Results")
    plt.show()

    Advantages of Agglomerative Clustering

    Following are the advantages of using Agglomerative Clustering −

    • Produces a dendrogram that shows the hierarchical relationship between the clusters.
    • Can handle different types of distance metrics and linkage methods.
    • Allows for a flexible number of clusters to be extracted from the data.
    • Can handle large datasets with efficient implementations.

    Disadvantages of Agglomerative Clustering

    Following are some of the disadvantages of using Agglomerative Clustering −

    • Can be computationally expensive for large datasets.
    • Can produce imbalanced clusters if the distance metric or linkage method is not appropriate for the data.
    • The final result may be sensitive to the choice of distance metric and linkage method used.
    • The dendrogram may be difficult to interpret for large datasets with many clusters.

    Applications of Agglomerative Clustering

    You can find applications of Agglomerative Clustering in many areas of unsupervised machine learning. The following are some important areas of its application in machine learning −

    • Image Segmentation
    • Document Clustering
    • Customer Behaviour Analysis (Customer Segmentation)
    • Market Segmentation
    • Social Network Analysis
  • Machine Learning – Distribution-Based Clustering

    Distribution-based clustering algorithms, also known as probabilistic clustering algorithms, are a class of machine learning algorithms that assume that the data points are generated from a mixture of probability distributions. These algorithms aim to identify the underlying probability distributions that generate the data, and use this information to cluster the data into groups with similar properties.

    One common distribution-based clustering algorithm is the Gaussian Mixture Model (GMM). GMM assumes that the data points are generated from a mixture of Gaussian distributions, and aims to estimate the parameters of these distributions, including the means and covariances of each distribution. Let’s see below what GMM is and how it can be implemented in Python.

    Gaussian Mixture Model

    Gaussian Mixture Models (GMM) is a popular clustering algorithm used in machine learning that assumes that the data is generated from a mixture of Gaussian distributions. In other words, GMM tries to fit a set of Gaussian distributions to the data, where each Gaussian distribution represents a cluster in the data.

    GMM has several advantages over other clustering algorithms, such as the ability to handle overlapping clusters, model the covariance structure of the data, and provide probabilistic cluster assignments for each data point. This makes GMM a popular choice in many applications, such as image segmentation, pattern recognition, and anomaly detection.

    Implementation in Python

    In Python, the Scikit-learn library provides the GaussianMixture class for implementing the GMM algorithm. The class takes several parameters, including the number of components (i.e., the number of clusters to identify), the covariance type, and the initialization method.

    Here is an example of how to implement GMM using the Scikit-learn library in Python −

    Example

    from sklearn.mixture import GaussianMixture
    from sklearn.datasets import make_blobs
    import matplotlib.pyplot as plt
    
    # generate a dataset
    X, _ = make_blobs(n_samples=200, centers=4, random_state=0)

    # create an instance of the GaussianMixture class
    gmm = GaussianMixture(n_components=4)

    # fit the model to the dataset
    gmm.fit(X)

    # predict the cluster labels for the data points
    labels = gmm.predict(X)

    # print the cluster labels
    print("Cluster labels:", labels)
    plt.figure(figsize=(7.5,3.5))
    plt.scatter(X[:,0], X[:,1], c=labels, cmap='viridis')
    plt.show()

    In this example, we first generate a synthetic dataset using the make_blobs() function from Scikit-learn. We then create an instance of the GaussianMixture class with 4 components and fit the model to the dataset using the fit() method. Finally, we predict the cluster labels for the data points using the predict() method and print the resulting labels.

    Output

    When you execute this program, it will produce the following plot as the output −

    [Figure: Scatter plot of the four GMM clusters]

    In addition, you will get the following output on the terminal −

    Cluster labels: [2 0 1 3 2 1 0 1 1 1 1 2 0 0 2 1 3 3 3 1 3 1 2 0 2 2 3 2 2 1 3 1 0 2 0 1 0
       1 1 3 3 3 3 1 2 0 1 3 3 1 3 0 0 3 2 3 0 2 3 2 3 1 2 1 3 1 2 3 0 0 2 2 1 1
       0 3 0 0 2 2 3 1 2 2 0 1 1 2 0 0 3 3 3 1 1 2 0 3 2 1 3 2 2 3 3 0 1 2 2 1 3
       0 0 2 2 1 2 0 3 1 3 0 1 2 1 0 1 0 2 1 0 2 1 3 3 0 3 3 2 3 2 0 2 2 2 2 1 2
       0 3 3 3 1 0 2 1 3 0 3 2 3 2 2 0 0 3 1 2 2 0 1 1 0 3 3 3 1 3 0 0 1 2 1 2 1
       0 0 3 1 3 2 2 1 3 0 0 0 1 3 1]
    

    The covariance type parameter in GMM controls the type of covariance matrix to use for the Gaussian distributions. The available options include “full” (full covariance matrix), “tied” (tied covariance matrix for all clusters), “diag” (diagonal covariance matrix), and “spherical” (a single variance parameter for all dimensions). The initialization method parameter controls the method used to initialize the parameters of the Gaussian distributions.
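    As an illustrative sketch (the parameter values are arbitrary choices, not recommendations), these options can be passed to the constructor, and the probabilistic assignments mentioned earlier are available through the predict_proba() method. This assumes the X from the example above −

    from sklearn.mixture import GaussianMixture

    # Diagonal covariance matrices, k-means-based initialization
    gmm = GaussianMixture(n_components=4, covariance_type='diag',
                          init_params='kmeans', random_state=0)
    gmm.fit(X)

    # Soft assignments: one row per sample, one column per component,
    # each row summing to 1
    probs = gmm.predict_proba(X)
    print(probs[:5].round(3))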

    Advantages of Gaussian Mixture Models

    Following are the advantages of using Gaussian Mixture Models −

    • Gaussian Mixture Models (GMM) can model arbitrary distributions of data, making it a flexible clustering algorithm.
    • It can handle datasets with missing or incomplete data.
    • It provides a probabilistic framework for clustering, which can provide more information about the uncertainty of the clustering results.
    • It can be used for density estimation and generation of new data points that follow the same distribution as the original data.
    • It can be used for semi-supervised learning, where some data points have known labels and are used to train the model.

    Disadvantages of Gaussian Mixture Models

    Following are some of the disadvantages of using Gaussian Mixture Models −

    • GMM can be sensitive to the choice of initial parameters, such as the number of clusters and the initial values for the means and covariances of the clusters.
    • It can be computationally expensive for high-dimensional datasets, as it involves computing the inverse of the covariance matrix, which can be expensive for large matrices.
    • It assumes that the data is generated from a mixture of Gaussian distributions, which may not be true for all datasets.
    • It may be prone to overfitting, especially when the number of parameters is large or the dataset is small.
    • It can be difficult to interpret the resulting clusters, especially when the covariance matrices are complex.
  • Machine Learning – Affinity Propagation

    Affinity Propagation is a clustering algorithm that identifies “exemplars” in a dataset and assigns each data point to one of these exemplars. It is a type of clustering algorithm that does not require a pre-specified number of clusters, making it a useful tool for exploratory data analysis. Affinity Propagation was introduced by Frey and Dueck in 2007 and has since been widely used in many fields such as biology, computer vision, and social network analysis.

    The idea behind Affinity Propagation is to iteratively update two matrices: the responsibility matrix and the availability matrix. The responsibility matrix contains evidence for how well-suited each data point is to serve as an exemplar for another data point, while the availability matrix contains accumulated evidence for how appropriate it would be for each data point to choose another as its exemplar. The algorithm alternates between updating these two matrices until convergence is achieved. The final exemplars are the points for which the combined responsibility and availability values are maximal.

    Implementation in Python

    In Python, the Scikit-learn library provides the AffinityPropagation class for implementing the Affinity Propagation algorithm. The class takes several parameters, including the preference parameter, which controls how many exemplars are chosen, and the damping factor, which controls the convergence speed of the algorithm.

    Here is an example of how to implement Affinity Propagation using the Scikit-learn library in Python −

    Example

    from sklearn.cluster import AffinityPropagation
    from sklearn.datasets import make_blobs
    import matplotlib.pyplot as plt
    
    # generate a dataset
    X, _ = make_blobs(n_samples=100, centers=4, random_state=0)

    # create an instance of the AffinityPropagation class
    af = AffinityPropagation(preference=-50)

    # fit the model to the dataset
    af.fit(X)

    # print the cluster labels and the exemplars
    print("Cluster labels:", af.labels_)
    print("Exemplars:", af.cluster_centers_indices_)

    # Plot the result
    plt.figure(figsize=(7.5,3.5))
    plt.scatter(X[:,0], X[:,1], c=af.labels_, cmap='viridis')
    plt.scatter(af.cluster_centers_[:,0], af.cluster_centers_[:,1], marker='x', color='red')
    plt.show()

    In this example, we first generate a synthetic dataset using the make_blobs() function from Scikit-learn. We then create an instance of the AffinityPropagation class with a preference value of -50 and fit the model to the dataset using the fit() method. Finally, we print the cluster labels and the exemplars identified by the algorithm.

    Output

    When you execute this code, it will produce the following plot as the output −

    [Figure: Scatter plot of the Affinity Propagation clusters with exemplars marked by red crosses]

    In addition, it will print the following output on the terminal −

    Cluster labels: [3 0 3 3 3 3 1 0 0 0 0 0 0 0 0 2 3 3 1 2 2 0 1 2 3 1 3 3 2 2 2 0 2 2 1 3 0 2 0 1 3 1 0 1 1 0 2 1 3 1 3 2 1 1 1 0 0 2 2 0 0 2 2 3 2 0 1 1 2 3 0 2 3 0 3 3 3 1 2 2 2 0 1 1 2 1 2 2 3 3 3 1 1 1 1 0 0 1 0 1]
    Exemplars: [9 41 51 74]
    

    The preference parameter in Affinity Propagation controls the number of exemplars that are chosen. A higher preference value leads to more exemplars, while a lower preference value leads to fewer exemplars. The damping factor controls the convergence speed of the algorithm, with larger damping factors leading to slower convergence.
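    Both knobs are constructor parameters; here is a minimal sketch with illustrative, untuned values −

    from sklearn.cluster import AffinityPropagation

    # A more negative preference yields fewer exemplars; a higher
    # damping factor slows convergence but makes it more stable
    af = AffinityPropagation(preference=-100, damping=0.9, random_state=0)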

    Overall, Affinity Propagation is a powerful clustering algorithm that can identify the number of clusters automatically and does not require a pre-specified number of clusters. However, it can be computationally expensive and may not work well with very large datasets.

    Advantages of Affinity Propagation

    Following are the advantages of using Affinity Propagation −

    • Affinity Propagation can identify the number of clusters automatically without specifying the number of clusters in advance.
    • It can handle clusters of arbitrary shapes and sizes.
    • It can handle datasets with noisy or incomplete data.
    • It is relatively insensitive to the choice of initial parameters.
    • It has been shown to outperform other clustering algorithms on certain types of datasets.

    Disadvantages of Affinity Propagation

    Following are some of the disadvantages of using Affinity Propagation −

    • It can be computationally expensive for large datasets or datasets with many features.
    • It may converge to suboptimal solutions, especially when the data has a high degree of variability or noise.
    • It can be sensitive to the choice of the damping factor, which controls the rate of convergence.
    • It may produce many small clusters or clusters with only one or a few members, which may not be meaningful.
    • It can be difficult to interpret the resulting clusters, as the algorithm does not provide explicit information about the meaning or characteristics of the clusters.
  • Machine Learning – BIRCH Clustering

    BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a hierarchical clustering algorithm that is designed to handle large datasets efficiently. The algorithm builds a tree-like structure of clusters by recursively partitioning the data into subclusters until a stopping criterion is met.

    BIRCH uses two main data structures to represent the clusters: Clustering Feature (CF) and Sub-Cluster Feature (SCF). CF is used to summarize the statistical properties of a set of data points, while SCF is used to represent the structure of subclusters.

    BIRCH clustering has three main steps −

    • Initialization − BIRCH constructs an empty tree structure and sets the maximum number of CFs that can be stored in a node.
    • Clustering − BIRCH reads the data points one by one and adds them to the tree structure. If a CF is already present in a node, BIRCH updates the CF with the new data point. If there is no CF in the node, BIRCH creates a new CF for the data point. BIRCH then checks if the number of CFs in the node exceeds the maximum threshold. If the threshold is exceeded, BIRCH creates a new subcluster by recursively partitioning the CFs in the node.
    • Refinement − BIRCH refines the tree structure by merging the subclusters that are similar based on a distance metric.

    Implementation of BIRCH Clustering in Python

    To implement BIRCH clustering in Python, we can use the scikit-learn library. The scikit-learn library provides a Birch class that implements the BIRCH algorithm.

    Here is an example of how to use the BIRCH class to cluster a dataset −

    Example

    from sklearn.datasets import make_blobs
    from sklearn.cluster import Birch
    import matplotlib.pyplot as plt
    
    # Generate sample data
    X, y = make_blobs(n_samples=1000, centers=10, cluster_std=0.50,
                      random_state=0)

    # Cluster the data using BIRCH
    birch = Birch(threshold=1.5, n_clusters=4)
    birch.fit(X)
    labels = birch.predict(X)

    # Plot the results
    plt.figure(figsize=(7.5,3.5))
    plt.scatter(X[:,0], X[:,1], c=labels, cmap='winter')
    plt.show()

    In this example, we first generate a sample dataset using the make_blobs function from scikit-learn. We then cluster the dataset using the BIRCH algorithm. For the BIRCH algorithm, we instantiate a Birch object with the threshold parameter set to 1.5 and the n_clusters parameter set to 4. We then fit the Birch object to the dataset using the fit method and predict the cluster labels using the predict method. Finally, we plot the results using a scatter plot.

    Output

    When you execute the given program, it will produce the following plot as the output −

    [Figure: Scatter plot of the four BIRCH clusters]

    Advantages of BIRCH Clustering

    BIRCH clustering has several advantages over other clustering algorithms, including −

    • Scalability − BIRCH is designed to handle large datasets efficiently by using a tree-like structure to represent the clusters.
    • Memory efficiency − BIRCH uses CF and SCF data structures to summarize the statistical properties of the data points, which reduces the memory required to store the clusters.
    • Fast clustering − BIRCH can cluster the data points quickly because it uses an incremental clustering approach; see the sketch below.
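    Here is a hedged sketch of that incremental behaviour using scikit-learn's Birch class, which supports partial_fit(); the chunking scheme below is an arbitrary illustration −

    import numpy as np
    from sklearn.cluster import Birch
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=1000, centers=10, cluster_std=0.50, random_state=0)

    # Stream the data into BIRCH in ten chunks instead of all at once
    birch = Birch(threshold=1.5, n_clusters=4)
    for chunk in np.array_split(X, 10):
       birch.partial_fit(chunk)

    labels = birch.predict(X)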

    Disadvantages of BIRCH Clustering

    BIRCH clustering also has some disadvantages, including −

    • Sensitivity to parameter settings − The performance of BIRCH clustering can be sensitive to the choice of parameters, such as the maximum number of CFs that can be stored in a node and the threshold value used to create subclusters.
    • Limited ability to handle non-spherical clusters − BIRCH assumes that the clusters are spherical, which means it may not perform well on datasets with non-spherical clusters.
    • Limited flexibility in the choice of distance metric − BIRCH uses the Euclidean distance metric by default, which may not be appropriate for all datasets.