Today’s Artificial Intelligence (AI) has far surpassed the hype of blockchain and quantum computing. Developers now take advantage of this to create new Machine Learning models and to retrain existing models for better performance and results. This tutorial gives an introduction to machine learning and its implementation in Artificial Intelligence.
Blog
-
Machine Learning (ML) Interview Questions and Answers
If you are preparing for a machine learning (ML) interview, this guide provides the top 50+ machine learning interview questions and answers, with detailed explanations covering basic to advanced ML concepts.
These ML interview questions and answers are helpful for both freshers and experienced professionals. We have divided these questions into the following categories:
- Basic ML Concepts Interview Questions
- Intermediate ML Interview Questions
- Advanced ML Interview Questions
- Problem-Solving & Application-Oriented ML Interview Questions
Basic Machine Learning Interview Questions and Answers
1. Define machine learning.
Machine learning (ML) is a branch of AI that uses data and algorithms to find patterns and make predictions or decisions without being explicitly programmed, enabling machines to learn and respond in a human-like way.
2. What is supervised learning?
In supervised learning, a model is trained on a labelled dataset, in which each input is paired with its correct output. It is used for both classification and regression tasks. Some of the key supervised learning algorithms are Linear Regression, Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), and k-Nearest Neighbors (KNN).
3. What is unsupervised learning?
Unsupervised learning trains a machine learning model on an unlabelled dataset. The algorithm identifies patterns, structures, or relationships within the data without pre-defined categories or labels. Common techniques include clustering, dimensionality reduction, and anomaly detection.
4. What is overfitting?
Overfitting occurs when a model learns noise from the training data, resulting in poor generalization to unseen data. In other words, the model performs well on training data but poorly on test data or new data. Regularization, cross-validation, and pruning are some possible ways to avoid overfitting.
5. What is underfitting?
Underfitting happens when a model is too simple to capture the patterns in the data. It fails to find the relationship between the input and output variables, resulting in poor performance on both the training and test sets.
6. How do you prevent overfitting?
Cross-validation, regularization, early stopping, and adding more training data are the most prominent methods to prevent overfitting.
7. Explain the different methods to overcome overfitting in an AI model.
Some of the most commonly used techniques to prevent overfitting are cross-validation, regularization, and early stopping. A brief description of each follows −
- Cross-validation − Cross-validation helps to detect and prevent overfitting by dividing the data into multiple subsets (folds), training the model on all but one fold, and validating it on the held-out fold, repeating until each fold has served as the validation set. This checks that the model generalizes well to new data.
- Regularization − Regularization trades a slight reduction in training accuracy for a gain in generalizability. It does so by penalizing model complexity, for example by adding a penalty term to the loss function.
- Early stopping − Early stopping prevents overfitting by halting training once the model’s performance on a validation set starts to degrade, ensuring it doesn’t learn noise from the training data.
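The early stopping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a framework API: the function name `early_stopping_epoch`, the `patience` parameter, and the validation losses are all made up for the example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. `patience` is an illustrative parameter name.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.60, 0.65, 0.70, 0.80]
print(early_stopping_epoch(losses, patience=2))  # (4, 2)
```

In practice the model weights from `best_epoch` would be the ones kept, since later epochs have started to fit noise.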
8. What is bias-variance tradeoff?
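It is the balance between two sources of error as model complexity changes: high bias (an overly simple model) leads to underfitting, and high variance (an overly complex model that is sensitive to the training data) leads to overfitting.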
9. What is regularization?
Regularization trades a slight reduction in training accuracy for a gain in generalizability. It adds a penalty to the loss function to reduce model complexity, helping to prevent overfitting (e.g., L1 and L2 regularization).
10. What is the difference between L1 and L2 regularization?
L1 regularization, also known as Lasso regularization, adds a penalty proportional to the sum of the absolute values of the model’s coefficients to the loss function. It promotes sparsity by driving some coefficients to exactly zero. L2 regularization, also known as Ridge regularization, adds a penalty proportional to the sum of the squared coefficients to the loss function. It shrinks large weights smoothly without setting them to zero.
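The difference between the two penalties can be made concrete with a short Python sketch. The function names, the weights, and the strength `lam` are assumptions chosen for illustration. The two "shrink" helpers show the standard one-step effect of each penalty on a single weight: the L1 step can set a small weight to exactly zero, while the L2 step only scales it down.

```python
def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l1_shrink(w, t):
    """Soft-thresholding: move w toward zero by t, clipping at exactly zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def l2_shrink(w, t):
    """Minimizer of 0.5*(x - w)**2 + 0.5*t*x**2: scales w, never reaches zero."""
    return w / (1.0 + t)


weights = [0.05, -2.0, 3.0]  # hypothetical model coefficients
print([l1_shrink(w, 0.1) for w in weights])  # the small weight becomes 0.0
print([l2_shrink(w, 0.1) for w in weights])  # every weight shrinks but stays non-zero
```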
11. What is the curse of dimensionality in Machine Learning?
The curse of dimensionality states that as the number of dimensions or features in a dataset rises, the data space expands exponentially. This expansion causes data to become sparse, making effective analysis harder.
12. Why is feature scaling important in machine learning?
Feature scaling is an important pre-processing step in machine learning that converts numerical features to a common scale. It contributes significantly to accurate and efficient model training and performance. Scaling strategies normalize the range, distribution, and size of features, reducing the biases and inconsistencies caused by differences in their magnitudes. Overall, feature scaling standardizes data, improving convergence in gradient-based models and the behaviour of distance-based algorithms.
13. What is Normalization?
Normalization, a key component of feature scaling, is a data preparation technique that rescales the values of features in a dataset to a common range, typically [0, 1]. This improves data analysis and modeling accuracy by reducing the impact of differing feature magnitudes on machine learning models. It can be computed using the following formula −
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
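As a minimal sketch, min-max normalization of a single feature can be written in plain Python as follows. The function name is illustrative, and mapping a constant feature to zeros is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```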
14. What is Standardization?
Standardization is a feature scaling method in which values are centred around the mean with a unit standard deviation. This means that the attribute’s mean becomes zero and the resulting distribution has a standard deviation of one. It can be computed using the following formula −
$$X' = \frac{X - \mu}{\sigma}$$
Here, μ is the mean of the feature values and σ is their standard deviation.
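A small Python sketch of standardization, using only the standard library, is shown below. It assumes the population standard deviation (`statistics.pstdev`); the function name and the sample values are illustrative.

```python
import statistics


def standardize(values):
    """Centre values on the mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Assumption: a constant feature is mapped to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Sample data with mean 5 and population standard deviation 2.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```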
15. What’s the difference between normalization and standardization?
Normalization rescales data to a specified range, often [0, 1], using each feature’s minimum and maximum values. It is beneficial when features have different scales and distance-based techniques are used. Standardization, in contrast, transforms data to have a mean of zero and a standard deviation of one. It preserves the shape of the original distribution and is typically employed when features have different units or the data follows a Gaussian (normal) distribution.
16. What is feature selection?
Feature selection is the process of selecting the most relevant features from a dataset to improve model performance, reduce overfitting, and reduce computing cost. It allows models to focus on relevant input variables, reducing model complexity and improving accuracy and efficiency in machine learning tasks.
17. What is PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms data into components capturing maximum variance. PCA not only reduces dimensions but also retains the majority of the data’s variance. It is frequently used to simplify complex datasets, reduce noise, and enhance computational efficiency in machine learning applications.
18. What is cross-validation?
Cross-validation is a strategy for evaluating the performance of a machine learning model that involves splitting the dataset into several subsets, training the model on some of them, and testing it on the others. This gives a more reliable estimate of how well the model generalizes across multiple data splits, which helps to detect overfitting.
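The splitting step can be illustrated with a short Python sketch of k-fold cross-validation. It only produces index splits; training and scoring a model on each split is left out. The function name and the contiguous (unshuffled) folds are assumptions made for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


for train_idx, val_idx in k_fold_indices(6, 3):
    print("train:", train_idx, "validate:", val_idx)
```

Each sample appears in exactly one validation fold, so every data point is used for both training and evaluation across the k rounds.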
19. What is imputation?
Imputation in machine learning is the process of replacing missing or incomplete values in a dataset with substitute values such as the mean, median, mode, or predictions based on other attributes. This helps to maintain dataset integrity, allowing models to learn from the complete dataset without being biased by missing elements.
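Mean imputation, the simplest of the strategies mentioned above, might look like this in plain Python. Representing missing values as `None` and the function name are assumptions for the example.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    # statistics.fmean raises StatisticsError if every value is missing.
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


print(impute_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```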
20. How do you handle imbalanced data?
To deal with imbalanced data in machine learning, you can use techniques like resampling, synthetic data generation (SMOTE), or cost-sensitive learning. It also helps to use performance metrics that are well suited to imbalanced data, such as F1-score, precision-recall, or AUC-ROC.
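To see why these metrics matter, the sketch below computes precision, recall, and F1-score for a made-up imbalanced example in plain Python. The function name and the labels are illustrative. The classifier is right on 9 of 10 samples, yet it finds only half of the positive class.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 2 positives out of 10, one of them missed by the model.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```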
21. What is data augmentation?
Data augmentation is a machine learning technique that adds variation to training data by introducing modifications like rotations, flips, or noise to existing samples. This improves model generalization, particularly in image and natural language processing applications, by allowing the model to learn robust features from a variety of data.
22. Define multicollinearity.
Multicollinearity occurs in a regression model when two or more independent variables are strongly correlated with one another, making it difficult to evaluate each independent variable’s effect on the dependent variable.
23. What is one-hot encoding?
One-hot encoding is a method of representing categorical data as numerical vectors in which each distinct category gets its own binary position: 1 indicates the presence of that category and 0 indicates its absence. It is a common approach to deal with categorical data in machine learning.
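A minimal Python sketch of one-hot encoding follows. The function name is illustrative, and ordering the categories alphabetically is an assumption; any fixed order works.

```python
def one_hot_encode(labels):
    """Return (categories, vectors) with one binary column per category."""
    categories = sorted(set(labels))
    position = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        row = [0] * len(categories)
        row[position[label]] = 1  # mark the category that is present
        vectors.append(row)
    return categories, vectors


categories, vectors = one_hot_encode(["red", "green", "red"])
print(categories)  # ['green', 'red']
print(vectors)     # [[0, 1], [1, 0], [0, 1]]
```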
24. Why is data cleaning crucial for machine learning models?
Data cleaning is the process of correcting or deleting inaccurate, corrupted, poorly formatted, duplicate, or incomplete data from a dataset. If the data is inaccurate, the outcomes and algorithms are untrustworthy, even if they appear correct. Data cleaning is crucial because it provides consistency in a dataset and allows you to get trustworthy findings from the analysis you perform on it.
25. What is the difference between data cleaning and data transformation?
Data cleaning is a process of finding and fixing or deleting flaws, inconsistencies, and inaccuracies in raw data to ensure its accuracy and completeness. Data transformation, on the other hand, is changing data from one format or structure to another, usually in order to prepare it for analysis or make it compatible with multiple systems.
Intermediate Machine Learning Interview Questions and Answers
26. What is linear regression?
Linear regression is a statistical method used to find the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.
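For a single independent variable, the fitted line can be computed in closed form with ordinary least squares. The sketch below is a plain-Python illustration; the function name and the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```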
27. What is logistic regression?
Logistic regression is a classification algorithm that predicts probabilities using a logistic function. It estimates the probability of an event occurring, such as success or failure, based on a given set of independent variables.
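The prediction step of logistic regression can be sketched as below: a weighted sum of the inputs is passed through the logistic (sigmoid) function to get a probability. The weights and inputs are hypothetical, and fitting the weights is not shown.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical model with two features.
p = predict_proba([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
print(p, "-> class", 1 if p >= 0.5 else 0)
```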
28. What is the difference between classification and regression?
Classification is the process of predicting discrete labels or classes, such as detecting whether an email is spam or not, and produces categorical results. Regression, on the other hand, predicts continuous values, such as house or stock prices, and produces numerical outputs. Overall, classification is about assigning labels, while regression is about predicting values.
29. Define decision trees.
A decision tree is a non-parametric supervised learning technique used for classification and regression. It divides data into branches based on feature values and makes predictions or classifications. It has a hierarchical tree structure that includes a root node, branches, internal nodes, and leaf nodes. Each internal node represents a decision point that splits the data on the best feature, and each branch leads to further splits until it reaches a leaf node, which produces the prediction.
30. What is a random forest?
Random forest is a machine learning algorithm that builds multiple decision trees during training and combines their outputs to improve accuracy and reduce overfitting. Each tree in the forest is trained on a random subset of data, with random features chosen at each split, allowing the ensemble to capture diverse patterns. The final prediction is made by averaging (for regression) or voting (for classification) across all trees.
31. What is gradient boosting?
Gradient boosting is an ensemble machine learning technique that combines the predictions from multiple weak learners, typically decision trees, to form a robust predictive model. It builds models sequentially, with each new model fitted to the negative gradient of the loss function (the residual errors) of the current ensemble, so that it corrects the errors of the models before it.
32. What is k-means clustering?
K-means clustering is an unsupervised machine learning approach that divides data into k different groups or clusters based on feature similarity. It iteratively assigns data points to clusters by reducing the distance between each point and the cluster center, and then updates the centers until the clusters are stable.
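The assign-then-update loop described above can be sketched in plain Python. This is a bare-bones illustration: the initial centers are passed in by hand (real implementations usually choose them randomly), and the function names are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points]


def k_means(points, centers, max_iters=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        labels = assign(points, centers)
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New center is the coordinate-wise mean of its members.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(k_means(points, centers=[(0.0, 0.0), (10.0, 0.0)]))
```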
33. What is K-Nearest Neighbors (KNN)?
K-Nearest Neighbors (KNN) is a supervised machine learning technique used for classification and regression. It classifies data points based on the majority label of the “k” nearest data points in the feature space, then makes predictions by comparing new occurrences to previously known ones. The choice of “k” and distance metric affects its accuracy.
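A minimal KNN classifier fits in a few lines of Python. The sketch assumes Euclidean distance and simple majority voting; the function names and the training points are made up for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance (the ordering is the same as for true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`.

    `train` is a list of (point, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.2, 0.1), k=3))  # a
print(knn_predict(train, (5.0, 5.2), k=3))  # b
```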
34. What is Naive Bayes?
Naive Bayes is a probabilistic machine learning technique based on Bayes’ theorem. It assumes that features are independent of one another and is widely used for classification tasks such as spam detection and sentiment analysis due to its efficiency and performance on large datasets.
35. What is SVM (Support Vector Machine)?
Support Vector Machine (SVM) is a supervised machine learning technique used for classification and regression. It works by determining the best hyperplane that separates data points from distinct classes with maximum margin. SVMs are extremely effective in high-dimensional spaces and when a clear margin of separation exists between classes.
Advanced Machine Learning Interview Questions and Answers
36. What is a neural network?
A neural network is a model used in deep learning that mimics the way the human brain and nervous system work. It consists mainly of nodes, or artificial neurons, arranged in three kinds of layers: an input layer, one or more hidden layers, and an output layer.
37. Define a deep neural network.
A deep neural network (DNN) is an artificial neural network that includes multiple layers of interconnected nodes (neurons), each of which learns to extract progressively more complex features from the input data. It is an important architecture in deep learning since it enables models to automatically learn patterns and make predictions from large datasets.
38. What is an activation function?
An activation function determines which neurons are activated as information flows through the network’s layers. It is an essential component of neural networks, introducing the non-linearity that allows them to learn complex patterns in data. Some of the most popular and commonly used activation functions in neural networks are ReLU, Leaky ReLU, Sigmoid, Tanh, and Softmax.
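The activation functions named above are short enough to write out in plain Python. These are textbook definitions applied to single numbers (or a list, for softmax); the default `alpha=0.01` for Leaky ReLU is a common choice assumed here.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # Small slope for negative inputs instead of a hard zero.
    return x if x > 0 else alpha * x


def sigmoid(x):
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    return math.tanh(x)


def softmax(xs):
    # Subtracting the maximum keeps the exponentials numerically stable.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), leaky_relu(-2.0), sigmoid(0.0), tanh(0.0))
print(softmax([1.0, 2.0, 3.0]))  # probabilities that sum to 1
```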
39. Define backpropagation.
Backpropagation is a deep learning technique that optimizes neural networks. The gradient of the loss function with respect to each weight is calculated using the chain rule, and the weights are then adjusted in the direction that minimizes the loss. This procedure is repeated iteratively throughout training to increase the model’s accuracy.
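The chain-rule step can be illustrated on the smallest possible network: a single sigmoid neuron with a squared-error loss. This is a sketch under stated assumptions, not a general implementation; the names, the starting weights, and the learning rate are all illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    """Return (prediction, loss) for one sample with loss = (y_hat - y)^2."""
    y_hat = sigmoid(w * x + b)
    return y_hat, (y_hat - y) ** 2


def backward(w, b, x, y):
    """Gradients of the loss with respect to w and b via the chain rule."""
    y_hat, _ = forward(w, b, x, y)
    dloss_dyhat = 2.0 * (y_hat - y)      # derivative of the squared error
    dyhat_dz = y_hat * (1.0 - y_hat)     # derivative of the sigmoid
    grad_w = dloss_dyhat * dyhat_dz * x  # dz/dw = x
    grad_b = dloss_dyhat * dyhat_dz      # dz/db = 1
    return grad_w, grad_b


def train_step(w, b, x, y, lr=0.5):
    """Move the weights one step against the gradient."""
    grad_w, grad_b = backward(w, b, x, y)
    return w - lr * grad_w, b - lr * grad_b


w, b, x, y = 0.5, 0.0, 1.0, 1.0
print("loss before:", forward(w, b, x, y)[1])
w, b = train_step(w, b, x, y)
print("loss after: ", forward(w, b, x, y)[1])
```

In a multi-layer network the same products of local derivatives are computed layer by layer, from the output back to the input.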
40. What is a convolutional neural network (CNN)?
A Convolutional Neural Network (CNN) is a deep learning model that works effectively on image-related datasets. It is made up of convolutional layers that automatically recognize features using convolutional filters, followed by pooling layers to reduce dimensionality and fully connected layers for classification or regression.
41. What is a recurrent neural network (RNN)?
A Recurrent Neural Network (RNN) is a type of neural network that processes sequential data by keeping track of previous inputs using internal states. It is especially useful in applications where the order of the data matters, such as time series prediction, natural language processing, and speech recognition.
42. What is overfitting in neural networks?
Overfitting occurs when a neural network performs well on training data but poorly on test data or new data. Regularization, cross-validation, and pruning are some possible ways to avoid overfitting.
43. What is dropout?
Dropout is a deep learning regularization method in which randomly selected neurons are dropped out with a specific probability during training. This helps to prevent overfitting by forcing the network to acquire redundant representations, resulting in better generalization to new data.
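A minimal sketch of dropout on a list of activations is shown below. It uses the common "inverted dropout" variant, which is an assumption beyond the description above: surviving activations are scaled by 1 / (1 - drop_prob) during training so that nothing needs to change at inference time. The function and parameter names are illustrative.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations during training (inverted dropout)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # inference: pass activations through unchanged
    keep_prob = 1.0 - drop_prob
    # Keep each unit with probability keep_prob and rescale the survivors.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded for repeatability
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=rng, training=False))
```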
44. What is batch normalization?
Batch normalization is a deep learning approach for normalizing the input of each layer in a neural network by shifting and scaling activations. It improves training speed, stability, and performance by minimizing internal covariate shift, resulting in more stable gradient flows during training.
45. What is a GAN (Generative Adversarial Network)?
A Generative Adversarial Network (GAN) is a deep learning model made up of two neural networks, a generator and a discriminator. The generator generates fake data, while the discriminator tries to tell the difference between real and fake data. The two networks compete and improve each other until the generator produces realistic data.
Problem-Solving & Application Oriented Machine Learning Interview Questions and Answers
46. What is model deployment?
Model deployment in machine learning is a process of integrating a trained model into a real scenario to make real-time predictions or choices based on new data. This includes getting the model ready for usage, assuring scalability, and monitoring its performance over time.
47. What is hyperparameter tuning?
In machine learning, hyperparameter tuning is the process of determining the ideal combination of hyperparameters (settings or configurations) for a model in order to optimize performance. It entails experimenting with different values for hyperparameters such as learning rate, batch size, and regularization strength, often using techniques such as grid search or random search.
48. What is grid search?
Grid search is a hyperparameter optimization strategy in machine learning that trains and evaluates a model on a predefined set of hyperparameter combinations. It searches systematically through all possible combinations of supplied hyperparameters to determine the optimal configuration based on performance metrics.
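The exhaustive search can be sketched with `itertools.product` from the standard library. Here `toy_score` is a made-up stand-in for training and validating a model, and the hyperparameter names and values are hypothetical.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical validation score that peaks at lr = 0.1, reg = 0.01.
    return -(params["learning_rate"] - 0.1) ** 2 - (params["reg_strength"] - 0.01) ** 2


grid = {"learning_rate": [0.01, 0.1, 1.0], "reg_strength": [0.001, 0.01]}
print(grid_search(grid, toy_score)[0])  # {'learning_rate': 0.1, 'reg_strength': 0.01}
```

The number of evaluations is the product of the list lengths (3 × 2 = 6 here), which is why grid search becomes expensive as more hyperparameters are added.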
49. What is random search?
Random search is a hyperparameter optimization strategy that selects random combinations of hyperparameters from a predetermined search space. It is frequently used in machine learning to determine the optimal model configuration, particularly when the search space is huge and grid search is computationally expensive.
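A minimal Python sketch of random search is given below. It samples each hyperparameter uniformly from a range, which is one simple choice; `toy_score`, the ranges, and the trial count are assumptions for the example.

```python
import random


def random_search(search_space, score_fn, n_trials, rng):
    """Sample n_trials random configurations and return the best one."""
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in search_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical validation score that peaks at learning_rate = 0.1.
    return -(params["learning_rate"] - 0.1) ** 2


space = {"learning_rate": (0.0, 1.0), "reg_strength": (0.0, 0.1)}
best, score = random_search(space, toy_score, n_trials=50, rng=random.Random(42))
print(best, score)
```

Unlike grid search, the cost is fixed by `n_trials` regardless of how many hyperparameters are in the search space.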
50. What are ensemble methods?
Ensemble methods combine multiple models to improve accuracy and robustness (e.g., bagging, boosting).
-
Machine Learning Cheatsheet
This machine learning cheatsheet serves as a quick reference guide for key concepts and commonly used algorithms in machine learning. It includes essential topics such as supervised learning, unsupervised learning, and reinforcement learning, as well as commonly used algorithms like linear regression and decision trees. This machine learning (ML) cheatsheet is valuable for anyone interested in machine learning.
Table of Contents
- Supervised Machine Learning
- Supervised Machine Learning Algorithms
- Unsupervised Machine Learning
- Unsupervised Machine Learning Algorithms
- Reinforcement Learning
- Reinforcement Learning Algorithms
Supervised Machine Learning
Supervised machine learning is a type of machine learning that trains the algorithms using labeled datasets to predict outcomes.
The main objective of supervised learning is for the algorithm to learn an association between input data samples and their corresponding outputs after seeing multiple training instances.
Supervised Machine Learning Algorithms
Supervised learning algorithms are categorized into two types of tasks – classification and regression. Below, we have listed commonly used supervised machine learning algorithms, their applications, advantages and disadvantages.
| Algorithm | Description | Applications | Advantages | Disadvantages |
|---|---|---|---|---|
| Linear Regression | Predicts a continuous numerical value based on a linear relationship between input and output variables. | Predicting house prices, stock prices, sales figures. | Simple to implement, interpretable, efficient. | Sensitive to outliers, assumes linearity. |
| Logistic Regression | Predicts a categorical value (e.g., binary classification) using a logistic function. | Classifying email as spam or not spam, predicting customer churn. | Interpretable, efficient, can handle categorical features. | Prone to overfitting, limited to linear relationships. |
| Ridge Regression | Regularized linear regression that adds a penalty term to the loss function to prevent overfitting. | Regression tasks, feature selection. | Can handle multicollinearity, improves model generalization. | Requires tuning the regularization parameter. |
| Lasso Regression | Regularized linear regression that adds a penalty term to the loss function to encourage sparsity (feature selection). | Regression tasks, feature selection. | Can handle multicollinearity, performs feature selection. | May introduce bias in feature selection. |
| K-Nearest Neighbors (KNN) | Classifies or predicts the value of a new data point based on the majority class or average value of its k nearest neighbors in the training dataset. | Classification, regression, recommendation systems. | Simple to implement, no training phase required, can handle non-linear relationships. | Can be computationally expensive for large datasets, sensitive to the choice of distance metric and the value of k. |
| Support Vector Machines (SVMs) | Finds the optimal hyperplane to separate data points into different classes. | Image classification, text classification, anomaly detection. | Effective for high-dimensional data, handles non-linear relationships with kernels. | Can be computationally expensive for large datasets, sensitive to outliers. |
| Decision Tree | Creates a tree-like model to make decisions based on a series of rules. | Classification, regression, predictive modeling. | Easy to understand and interpret, can handle both numerical and categorical features. | Prone to overfitting, can be sensitive to small changes in data. |
| Random Forests | An ensemble of decision trees, combining multiple models to improve accuracy and reduce overfitting. | Classification, regression, predictive modeling. | More accurate than individual decision trees, robust to noise and outliers. | Can be computationally expensive for large datasets. |
| Naive Bayes | A probabilistic classifier based on Bayes’ theorem, assuming independence of features. | Text classification, spam filtering, sentiment analysis. | Simple to implement, efficient, can handle categorical and numerical features. | Assumes independence of features, which may not always hold true. |
| Gradient Boosting Regression | An ensemble method that iteratively trains weak models to improve accuracy. | Regression, classification, predictive modeling. | Highly accurate, can handle complex relationships. | Can be computationally expensive, requires careful tuning of hyperparameters. |
| XGBoost | A scalable and efficient gradient boosting framework. | Regression, classification, ranking. | Highly accurate, efficient, can handle large datasets. | Can be complex to configure. |
| LightGBM Regressor | A gradient boosting framework that uses histograms and gradient boosting for efficient training. | Regression, classification, ranking. | Faster than XGBoost, efficient for large datasets. | May have slightly lower accuracy than XGBoost in some cases. |
| Neural Networks (Deep Learning) | Complex models with multiple layers, capable of learning complex patterns and relationships. | Image classification, natural language processing, speech recognition. | Highly accurate, can handle complex tasks. | Can be computationally expensive, requires careful tuning of hyperparameters. |

Unsupervised Machine Learning
Unsupervised machine learning is a type of machine learning that learns patterns and structures within the data without human supervision. Unsupervised learning uses machine learning algorithms to analyze the data and discover underlying patterns within unlabeled data sets.
Unsupervised Machine Learning Algorithms
Unsupervised learning algorithms are categorised into three categories − clustering, association, and dimensionality reduction. Below, we have listed commonly used unsupervised machine learning algorithms, their applications, advantages and disadvantages.
| Algorithm | Description | Applications | Advantages | Disadvantages |
|---|---|---|---|---|
| K-Means Clustering | Partitions data into K clusters based on similarity. | Customer segmentation, image segmentation, anomaly detection. | Simple to implement, efficient, can handle large datasets. | Requires specifying the number of clusters, sensitive to initialization. |
| Hierarchical Clustering | Creates a hierarchy of clusters, either agglomerative (bottom-up) or divisive (top-down). | Customer segmentation, image segmentation, outlier detection. | Can reveal hierarchical structures, doesn’t require specifying the number of clusters. | Can be computationally expensive for large datasets, sensitive to distance metrics. |
| Principal Component Analysis (PCA) | Reduces the dimensionality of data while preserving the most important features. | Data visualization, feature engineering, noise reduction. | Efficient, can reveal underlying patterns in data. | May lose some information in the dimensionality reduction process. |
| Singular Value Decomposition (SVD) | Decomposes a matrix into its singular values and vectors. | Data analysis, recommendation systems, image compression. | Can be used for dimensionality reduction and feature extraction. | Can be computationally expensive for large matrices. |
| Independent Component Analysis (ICA) | Identifies independent sources of signals from mixed observations. | Blind source separation, signal processing. | Can separate mixed signals, useful in applications like speech recognition. | Can be sensitive to initialization and assumptions about the independence of sources. |
| Gaussian Mixture Model (GMM) | Models data as a mixture of Gaussian distributions, assuming each cluster is generated from a Gaussian distribution. | Clustering, density estimation, anomaly detection. | Can handle complex data distributions, flexible. | Can be computationally expensive, sensitive to initialization. |
| Apriori Algorithm | A frequent itemset mining algorithm used to discover associations between items in a dataset. | Market basket analysis, recommendation systems. | Efficient for finding frequent itemsets, can be used for association rule mining. | May not be suitable for large datasets with many items. |
| t-SNE | Non-linear dimensionality reduction technique that preserves local structure. | Data visualization, clustering, anomaly detection. | Effective for visualizing high-dimensional data in low-dimensional space. | Can be computationally expensive, sensitive to parameters. |
| UMAP | Another non-linear dimensionality reduction technique that preserves global structure and local relationships. | Data visualization, clustering, anomaly detection. | Often faster and more scalable than t-SNE, preserves global structure well. | May require careful parameter tuning. |

Reinforcement Learning
Reinforcement learning is a type of machine learning where an agent (generally a software entity) is trained to interpret the environment by performing actions and monitoring the results. For every good action, the agent gets positive feedback and for every bad action the agent gets negative feedback. It’s inspired by how animals learn from their experiences, making decisions based on the consequences of their actions.
Reinforcement Learning Algorithms
In this section, we have listed some well known reinforcement learning algorithms, their applications, advantages and disadvantages.
| Algorithm | Description | Applications | Advantages | Disadvantages |
|---|---|---|---|---|
| Q-Learning | Off-policy learning algorithm that learns the optimal action-value function. | Game playing, robotics, control systems. | Simple to implement, can handle complex environments. | Can be computationally expensive for large state spaces. |
| SARSA | On-policy learning algorithm that updates the action-value function based on the current policy. | Game playing, robotics, control systems. | Can handle continuous action spaces, suitable for online learning. | Can be sensitive to exploration-exploitation trade-off. |
| Deep Q-Networks (DQN) | Combines deep learning with Q-learning, using a neural network to approximate the action-value function. | Atari game playing, robotics, self-driving cars. | Can handle complex environments with large state and action spaces. | Requires careful tuning of hyperparameters, can be computationally expensive. |
| Policy Gradients | Directly optimizes the policy function to maximize rewards. | Robotics, game playing, natural language processing. | Can handle continuous action spaces, can be more sample efficient than value-based methods. | Can be sensitive to noise and instability. |
| Actor-Critic | Combines policy-based and value-based methods, using both a policy function and a value function. | Robotics, game playing, natural language processing. | Can be more stable and efficient than pure policy-based or value-based methods. | Requires careful balancing of exploration and exploitation. |
| Asynchronous Advantage Actor-Critic (A3C) | A parallel version of actor-critic that can handle complex environments with large state spaces. | Robotics, game playing, natural language processing. | Can be more efficient than traditional actor-critic methods, suitable for distributed training. | Can be complex to implement. |
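To make the Q-Learning entry more concrete, here is a minimal sketch of its tabular update rule in Python. The state and action names, the learning rate `alpha`, and the discount factor `gamma` are illustrative assumptions; a full agent would also need an environment and an exploration strategy, which are not shown.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table `q` and return the new value.

    `q` maps (state, action) pairs to estimated returns; missing entries count as 0.
    """
    # Off-policy target: assume the best known action is taken in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


actions = ["left", "right"]
q = {}
# Positive feedback for a good action raises its estimated value.
print(q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions))  # 0.5
```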
-
Machine Learning – Quick Guide
Machine Learning – Introduction
Today’s Artificial Intelligence (AI) has far surpassed the hype of blockchain and quantum computing. This is because huge computing resources are now easily available to everyone. Developers take advantage of this to create new Machine Learning models and to retrain existing models for better performance and results. The easy availability of High Performance Computing (HPC) has resulted in a sudden increase in demand for IT professionals with Machine Learning skills.
In this tutorial, you will learn in detail about −
- What is the crux of machine learning?
- What are the different types in machine learning?
- What are the different algorithms available for developing machine learning models?
- What tools are available for developing these models?
- What are the programming language choices?
- What platforms support development and deployment of Machine Learning applications?
- What IDEs (Integrated Development Environment) are available?
- How to quickly upgrade your skills in this important area?
Machine Learning – What Today’s AI Can Do
When you tag a face in a Facebook photo, AI is running behind the scenes to identify the faces in the picture. Face tagging is now common in applications that display pictures with human faces. Why just human faces? There are several applications that detect objects such as cats, dogs, bottles, and cars. We have autonomous cars running on our roads that detect objects in real time to steer the car. When you travel, you use Google Directions to learn the real-time traffic situation and follow the best path suggested by Google at that point in time. This is yet another real-time application of AI.
Let us consider the example of the Google Translate application that we typically use while visiting foreign countries. Google’s online translator app on your mobile helps you communicate with local people who speak a language that is foreign to you.
There are several applications of AI that we use in practice today. In fact, each one of us uses AI in many parts of our lives, even without knowing it. Today’s AI can perform extremely complex jobs with great accuracy and speed. Let us discuss an example of a complex task to understand what capabilities are expected in an AI application that you would be developing today for your clients.
Example
We all use Google Directions for a daily commute within the city or even for inter-city travel. The application suggests the fastest path to our destination at that moment. When we follow this path, we find that Google's suggestions are right most of the time and we save valuable time on the trip.
You can imagine the complexity involved in developing this kind of application, considering that there are multiple paths to your destination and the application has to judge the traffic situation on every possible path to give you a travel time estimate for each of them. Besides, consider the fact that Google Directions covers the entire globe. Undoubtedly, many AI and Machine Learning techniques are in use under the hood of such applications.
Considering the continuous demand for the development of such applications, you will now appreciate why there is a sudden demand for IT professionals with AI skills.
In our next chapter, we will learn what it takes to develop AI programs.
Machine Learning – Traditional AI
The journey of AI began in the 1950s, when computing power was a fraction of what it is today. AI started out with the machine making predictions in the way a statistician does using a calculator. Thus, early AI development was based mainly on statistical techniques.
In this chapter, let us discuss in detail what these statistical techniques are.
Statistical Techniques
The development of today's AI applications started with age-old traditional statistical techniques. You must have used straight-line interpolation in school to predict a future value. There are several other such statistical techniques which are successfully applied in developing so-called AI programs. We say so-called because the AI programs that we have today are much more complex and use techniques far beyond the statistical techniques used by the early AI programs.
Some examples of statistical techniques that were used for developing AI applications in those days, and are still in practice, are listed here −
- Regression
- Classification
- Clustering
- Probability Theories
- Decision Trees
Here we have listed only some primary techniques, enough to get you started on AI without scaring you with the vastness of the field. If you are developing AI applications based on limited data, you would use these statistical techniques.
However, today data is abundant. Statistical techniques are of limited help in analyzing the huge volumes of data that we now possess, as they have limitations of their own. More advanced methods such as deep learning have therefore been developed to solve many complex problems.
As we move ahead in this tutorial, we will understand what Machine Learning is and how it is used for developing such complex AI applications.
Machine Learning – What is Machine Learning?
Consider the following figure that shows a plot of house prices versus its size in sq. ft.
After plotting various data points on the XY plot, we draw a best-fit line to make predictions for any other house given its size. You feed the known data to the machine and ask it to find the best-fit line. Once the machine has found the line, you test its suitability by feeding in a known house size, i.e. the X-value in the above plot. The machine then returns the estimated Y-value, i.e. the expected price of the house. The line can be extrapolated to find the price of a house which is 3000 sq. ft. or even larger. This is called regression in statistics. In particular, this kind of regression is called linear regression because the relationship between the X and Y data points is linear.
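As a minimal sketch of the best-fit line idea, the snippet below fits a straight line to a few made-up house sizes and prices using the standard least-squares formulas. The data values and the helper name `fit_line` are assumptions for illustration, not taken from the tutorial's figure.

```python
# Hypothetical training data: house sizes (sq. ft.) and prices (in thousands).
sizes = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)
# Extrapolate the fitted line to a 3000 sq. ft. house.
predicted_price = slope * 3000 + intercept
print(round(predicted_price, 1))
```

The made-up points lie exactly on a line, so the extrapolated price follows that line. Real data would scatter around it and the prediction would carry some error.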
In many cases, the relationship between the X & Y data points may not be a straight line, and it may be a curve with a complex equation. Your task would be now to find out the best fitting curve which can be extrapolated to predict the future values. One such application plot is shown in the figure below.
You will use statistical optimization techniques to find the equation of the best-fit curve here. And this is exactly what Machine Learning is about: you use known optimization techniques to find the best solution to your problem.
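One common optimization technique is gradient descent. The sketch below uses it to fit a simple curve of the assumed form y = a·x² + c to made-up points. The curve shape, data, learning rate and step count are all illustrative choices, not something the tutorial prescribes.

```python
# Made-up data generated from y = 0.5 * x**2 + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.5, 3.0, 5.5, 9.0]

def fit_curve(xs, ys, learning_rate=0.01, steps=5000):
    """Minimise the mean squared error of y = a * x**2 + c by gradient descent."""
    a, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error of the current curve on every training point.
        errors = [(a * x * x + c) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to a and c.
        grad_a = 2 * sum(e * x * x for e, x in zip(errors, xs)) / n
        grad_c = 2 * sum(errors) / n
        # Step against the gradient to reduce the error.
        a -= learning_rate * grad_a
        c -= learning_rate * grad_c
    return a, c

a, c = fit_curve(xs, ys)
print(round(a, 3), round(c, 3))
```

The learning rate matters: too large a value makes the updates diverge, and too small a value makes convergence slow.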
Next, let us look at the different categories of Machine Learning.
Machine Learning – Categories
Machine Learning is broadly categorized under the following headings −
Machine learning evolved from left to right as shown in the above diagram.
- Initially, researchers started out with Supervised Learning. This is the case of housing price prediction discussed earlier.
- This was followed by unsupervised learning, where the machine is made to learn on its own without any supervision.
- Scientists discovered further that it may be a good idea to reward the machine when it does the job the expected way and there came the Reinforcement Learning.
- Very soon, the available data became so enormous that the conventional techniques developed so far failed to analyze it and provide predictions.
- Thus came deep learning, in which the human brain is loosely simulated by Artificial Neural Networks (ANN) created in our binary computers.
- The machine now learns on its own using the high computing power and huge memory resources that are available today.
- It is now observed that Deep Learning has solved many previously unsolvable problems.
- The technique has been advanced further by giving incentives to Deep Learning networks as rewards, which finally gives us Deep Reinforcement Learning.
Let us now study each of these categories in more detail.
Supervised Learning
Supervised learning is analogous to training a child to walk. You hold the child's hand, show him how to take his foot forward, walk yourself for a demonstration and so on, until the child learns to walk on his own.
Regression
Similarly, in the case of supervised learning, you give concrete known examples to the computer. You say that for given feature value x1 the output is y1, for x2 it is y2, for x3 it is y3, and so on. Based on this data, you let the computer figure out an empirical relationship between x and y.
Once the machine is trained in this way with a sufficient number of data points, you ask the machine to predict Y for a given X. Assuming that you know the real value of Y for this X, you will be able to judge whether the machine's prediction is correct.
Thus, you test whether the machine has learned by using known test data. Once you are satisfied that the machine makes predictions with the desired level of accuracy (say 80 to 90%), you can stop training it further.
Now, you can safely use the machine to do the predictions on unknown data points, or ask the machine to predict Y for a given X for which you do not know the real value of Y. This training comes under the regression that we talked about earlier.
Classification
You may also use machine learning techniques for classification problems. In classification problems, you classify objects of a similar nature into a single group. For example, in a set of 100 students, you may like to divide them into three groups based on their heights: short, medium and tall. By measuring the height of each student, you place them in the proper group.
Now, when a new student comes in, you put him in an appropriate group by measuring his height. By following the principles of regression training, you train the machine to classify a student based on one feature, the height. When the machine learns how the groups are formed, it will be able to classify an unknown new student correctly. Once again, you use test data to verify that the machine has learned your technique of classification before putting the developed model in production.
Supervised Learning is where AI really began its journey. This technique has been applied successfully in several cases. You have used such a model whenever your device performed handwriting recognition. Several algorithms have been developed for supervised learning. You will learn about them in the following chapters.
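The student example can be sketched as follows. This is only one simple way to do it, assuming a nearest-mean rule: the machine learns the mean height of each group from labelled examples and assigns a new student to the group with the closest mean. All heights and the function names `train` and `classify` are made up for illustration.

```python
# Hypothetical labelled training data: (height in cm, group).
training = [(150, "short"), (152, "short"), (155, "short"),
            (165, "medium"), (168, "medium"), (170, "medium"),
            (182, "tall"), (185, "tall"), (188, "tall")]
# Held-out labelled data used only to check what the model has learned.
testing = [(151, "short"), (167, "medium"), (186, "tall"), (162, "medium")]

def train(samples):
    """Learn one representative height (the mean) per group."""
    by_group = {}
    for height, group in samples:
        by_group.setdefault(group, []).append(height)
    return {group: sum(hs) / len(hs) for group, hs in by_group.items()}

def classify(model, height):
    """Assign the group whose mean height is closest to the given height."""
    return min(model, key=lambda group: abs(model[group] - height))

model = train(training)
# Fraction of held-out students placed in the correct group.
correct = sum(classify(model, h) == g for h, g in testing)
accuracy = correct / len(testing)
print(classify(model, 175), accuracy)
```

Checking the accuracy on data that was not used for training is what tells you whether the model is ready for production.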
Unsupervised Learning
In unsupervised learning, we do not specify a target variable to the machine; rather, we ask the machine "What can you tell me about X?". More specifically, given a huge data set X, we may ask questions such as "What are the five best groups we can make out of X?" or "What features occur together most frequently in X?". To arrive at the answers to such questions, the number of data points that the machine requires to deduce a strategy is very large. In supervised learning, the machine can be trained with as few as a few thousand data points. In unsupervised learning, however, the number of data points reasonably accepted for learning starts at a few million. These days, data is generally abundant. Ideally the data requires curating; however, given the amount of data continuously flowing through a social network, curation is in most cases an impossible task.
The following figure shows the boundary between the yellow and red dots as determined by unsupervised machine learning. You can clearly see that the machine would be able to determine the class of each of the black dots with fairly good accuracy.
Source: https://chrisjmccormick.files.wordpress.com/2013/08/approx_decision_boundary.png
Unsupervised learning has shown great success in many modern AI applications, such as face detection, object detection, and so on.
Reinforcement Learning
Consider training a pet dog to bring a ball to us. We throw the ball a certain distance and ask the dog to fetch it. Every time the dog does this right, we reward it. Slowly, the dog learns that doing the job correctly earns a reward, and it starts doing the job the right way every time. Exactly this concept is applied in Reinforcement Learning. The technique was initially developed for machines to play games. The machine is given an algorithm to analyze all possible moves at each stage of the game. The machine may select one of the moves at random. If the move is right, the machine is rewarded; otherwise it may be penalized. Slowly, the machine starts differentiating between right and wrong moves, and after several iterations it learns to solve the game puzzle with better accuracy. Its chances of winning improve as it plays more and more games.
The entire process may be depicted in the following diagram −
This technique of machine learning differs from the supervised learning in that you need not supply the labelled input/output pairs. The focus is on finding the balance between exploring the new solutions versus exploiting the learned solutions.
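To make the reward loop concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. The environment is an assumed toy corridor of five cells in which only reaching the right-most cell is rewarded. The environment, constants and function names are illustrative and not part of the tutorial.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = [0, 1]      # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Move one cell; the reward is 1 only when the goal cell is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not favour one direction.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, rng)
            next_state, reward = step(state, action)
            # Move the estimate towards reward + discounted best future value.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy action in every non-terminal cell after training.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # prints [1, 1, 1, 1]
```

No labelled input/output pairs are supplied here. The agent discovers that stepping right is best purely from the rewards it collects, and `EPSILON` controls how often it explores instead of exploiting what it has learned.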
Deep Learning
Deep learning is a family of models based on Artificial Neural Networks (ANN) with many layers. There are several architectures used in deep learning, such as deep neural networks, deep belief networks, recurrent neural networks, and convolutional neural networks (CNNs).
These networks have been successfully applied to problems in computer vision, speech recognition, natural language processing, bioinformatics, drug design, medical image analysis, and games. There are several other fields in which deep learning is actively applied. Deep learning requires huge processing power and very large amounts of data, both of which are generally easily available these days.
We will talk about deep learning more in detail in the coming chapters.
Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) combines the techniques of both deep learning and reinforcement learning. Reinforcement learning algorithms such as Q-learning are combined with deep learning to create a powerful DRL model. The technique has been applied with great success in the fields of robotics, video games, finance and healthcare. Many previously unsolvable problems are now solved by creating DRL models. There is a lot of research going on in this area, and it is very actively pursued by industry.
So far, you have had a brief introduction to various machine learning models. Now let us explore the various algorithms available under these models in slightly more depth.
Machine Learning – Supervised
Supervised learning is one of the important models of learning involved in training machines. This chapter talks in detail about the same.
Algorithms for Supervised Learning
There are several algorithms available for supervised learning. Some of the widely used algorithms of supervised learning are as shown below −
- k-Nearest Neighbours
- Decision Trees
- Naive Bayes
- Logistic Regression
- Support Vector Machines
As we move ahead in this chapter, let us discuss in detail about each of the algorithms.
k-Nearest Neighbours
k-Nearest Neighbours, simply called kNN, is a statistical technique that can be used for solving classification and regression problems. Let us discuss the case of classifying an unknown object using kNN. Consider the distribution of objects as shown in the image given below −
The diagram shows three types of objects, marked in red, blue and green colors. When you run the kNN classifier on the above dataset, the boundaries for each type of object will be marked as shown below −
Now, consider a new unknown object that you want to classify as red, green or blue. This is depicted in the figure below.
Visually, you can see that the unknown data point belongs to the class of blue objects. Mathematically, this is concluded by measuring the distance from the unknown point to every other point in the data set and looking at its k closest neighbours. When you do so, you will find that most of these neighbours are blue. The average distance to red and green objects would also be larger than the average distance to blue objects. Thus, the unknown object is classified as belonging to the blue class.
The kNN algorithm can also be used for regression problems, and ready-to-use implementations are available in most ML libraries.
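A minimal sketch of kNN classification, assuming Euclidean distance and a simple majority vote among the k closest points. The coordinates and the function name `knn_classify` are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled points: ((x, y), colour).
points = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.5), "red"),
          ((6.0, 6.0), "blue"), ((6.5, 7.0), "blue"), ((7.0, 6.5), "blue"),
          ((1.0, 7.0), "green"), ((1.5, 6.5), "green"), ((2.0, 7.5), "green")]

def knn_classify(query, labelled_points, k=3):
    """Return the majority label among the k points closest to the query."""
    by_distance = sorted(labelled_points, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_classify((6.2, 6.4), points))  # prints blue
```

There is no training step here: the labelled points themselves are the model, and all the work happens when a new point is classified.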
Decision Trees
A simple decision tree in a flowchart format is shown below −
You would write code to classify your input data based on this flowchart. The flowchart is simple and self-explanatory. In this scenario, you are trying to classify an incoming email to decide when to read it.
In reality, decision trees can be large and complex. There are several algorithms available to create and traverse these trees. As a Machine Learning enthusiast, you need to understand and master these techniques of creating and traversing decision trees.
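Traversing a decision tree can be sketched as below. The tree itself is hypothetical: its questions and outcomes are invented for illustration and are not read from the tutorial's flowchart.

```python
# A hypothetical tree; the questions and outcomes are invented for illustration.
# Internal nodes ask a yes/no question about one feature; leaves hold the decision.
tree = {
    "question": "from_manager",
    "yes": {"leaf": "read now"},
    "no": {
        "question": "mentions_deadline",
        "yes": {"leaf": "read today"},
        "no": {"leaf": "read when free"},
    },
}

def decide(node, email):
    """Walk from the root to a leaf, following the branch chosen by each answer."""
    while "leaf" not in node:
        branch = "yes" if email[node["question"]] else "no"
        node = node[branch]
    return node["leaf"]

email = {"from_manager": False, "mentions_deadline": True}
print(decide(tree, email))  # prints read today
```

Here the tree is written by hand. Tree-learning algorithms build such a structure automatically from labelled examples by choosing which question to ask at each node.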
Naive Bayes
Naive Bayes is used for creating classifiers. Suppose you want to sort (classify) fruits of different kinds from a fruit basket. You may use features such as the color, size and shape of a fruit. For example, any fruit that is red in color, round in shape and about 10 cm in diameter may be considered an Apple. To train the model, you use these features and estimate the probability that a given feature matches the desired constraints. The probabilities of the different features are then combined to arrive at the probability that a given fruit is an Apple. Naive Bayes generally requires only a small amount of training data for classification.
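The way per-feature probabilities are combined can be sketched with a small categorical Naive Bayes classifier. The fruit data, the two features used (color and shape) and the Laplace smoothing are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (colour, shape) features and the fruit label.
training = [(("red", "round"), "apple"), (("red", "round"), "apple"),
            (("green", "round"), "apple"), (("yellow", "long"), "banana"),
            (("yellow", "long"), "banana"), (("green", "long"), "banana")]

def train(samples):
    """Count classes, per-class feature values, and the distinct values per feature."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)
    vocabulary = defaultdict(set)
    for features, label in samples:
        for position, value in enumerate(features):
            feature_counts[label][(position, value)] += 1
            vocabulary[position].add(value)
    return class_counts, feature_counts, vocabulary

def predict(model, features):
    """Pick the class with the highest prior times product of feature likelihoods."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior p(class)
        for position, value in enumerate(features):
            # Laplace smoothing keeps unseen values from zeroing the product.
            seen = feature_counts[label][(position, value)]
            score *= (seen + 1) / (count + len(vocabulary[position]))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training)
print(predict(model, ("red", "round")))  # prints apple
```

The method is called naive because it multiplies the feature probabilities as if the features were independent of each other within a class.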
Logistic Regression
Look at the following diagram. It shows the distribution of data points in XY plane.
From the diagram, we can visually inspect the separation of red dots from green dots. You may draw a boundary line to separate out these dots. Now, to classify a new data point, you will just need to determine on which side of the line the point lies.
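A minimal sketch of how such a boundary line can be learned: logistic regression trained by plain gradient descent on a few made-up points. The coordinates, learning rate and step count are illustrative assumptions.

```python
import math

# Hypothetical 2-D points: label 0 (red) lower-left, label 1 (green) upper-right.
points = [(-2.0, -1.0), (-1.0, -2.0), (-1.0, -1.0), (1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(points, labels, learning_rate=0.1, steps=1000):
    """Fit weights w1, w2 and bias b by gradient descent on the log loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for (x, y), label in zip(points, labels):
            # Predicted probability of class 1 minus the true label.
            error = sigmoid(w1 * x + w2 * y + b) - label
            g1 += error * x
            g2 += error * y
            gb += error
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        b -= learning_rate * gb / n
    return w1, w2, b

def classify(model, point):
    """The boundary is the line w1*x + w2*y + b = 0; the sign tells the side."""
    w1, w2, b = model
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else 0

model = train(points, labels)
print(classify(model, (-1.5, -0.5)), classify(model, (0.5, 1.5)))  # prints 0 1
```

Classifying a new point only requires checking which side of the learned line it falls on, which is what `classify` does.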
Support Vector Machines
Look at the following distribution of data. Here the three classes of data cannot be linearly separated. The boundary curves are non-linear. In such a case, finding the equation of the curve becomes a complex job.
Source: http://uc-r.github.io/svm
Support Vector Machines (SVM) come in handy for determining the separation boundaries in such situations.
Machine Learning – Scikit-learn Algorithm
Fortunately, most of the time you do not have to code the algorithms mentioned in the previous lesson. There are many standard libraries which provide the ready-to-use implementation of these algorithms. One such toolkit that is popularly used is scikit-learn. The figure below illustrates the kind of algorithms which are available for your use in this library.
Source: https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Using these algorithms is straightforward, and since they are well tested in the field, you can safely use them in your AI applications. Most of these libraries are free to use even for commercial purposes.
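As a small illustration of how little code a ready-made implementation needs, the sketch below trains scikit-learn's `KNeighborsClassifier` on made-up data. It assumes scikit-learn is installed. The feature values and labels are hypothetical.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: [height in cm, weight in kg] and a made-up label.
X_train = [[150, 50], [152, 53], [155, 55], [180, 80], [183, 85], [185, 88]]
y_train = ["small", "small", "small", "large", "large", "large"]

model = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbours
model.fit(X_train, y_train)

# Predict the labels of two unseen samples.
predictions = model.predict([[153, 52], [182, 84]])
print(list(predictions))
```

Most scikit-learn estimators follow the same `fit` and `predict` pattern, so swapping in a different algorithm usually changes only the line that constructs the model.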
Machine Learning – Unsupervised
So far, what you have seen is making the machine learn to find the solution to our target. In regression, we train the machine to predict a future value. In classification, we train the machine to classify an unknown object into one of the categories defined by us. In short, we have been training machines so that they can predict Y for our data X. Given a huge data set for which the categories are not known in advance, it would be difficult for us to train the machine using supervised learning. What if the machine could look at and analyze big data running into several gigabytes or terabytes and tell us that this data contains so many distinct categories?
As an example, consider voter data. By considering some inputs from each voter (these are called features in AI terminology), let the machine predict that so many voters would vote for political party X, so many would vote for Y, and so on. Thus, in general, given a huge set of data points X, we are asking the machine "What can you tell me about X?". Or it may be a question like "What are the five best groups we can make out of X?". Or it could even be "What three features occur together most frequently in X?".
This is exactly what Unsupervised Learning is all about.
Algorithms for Unsupervised Learning
Let us now discuss one of the widely used algorithms for clustering in unsupervised machine learning.
k-means clustering
The 2000 and 2004 Presidential elections in the United States were close, very close. The largest percentage of the popular vote that any candidate received was 50.7% and the lowest was 47.9%. If a percentage of the voters were to have switched sides, the outcome of the election would have been different. There are small groups of voters who, when properly appealed to, will switch sides. These groups may not be huge, but with such close races, they may be big enough to change the outcome of the election. How do you find these groups of people? How do you appeal to them with a limited budget? The answer is clustering.
Let us understand how it is done.
- First, you collect information on people either with or without their consent: any sort of information that might give some clue about what is important to them and what will influence how they vote.
- Then you put this information into some sort of clustering algorithm.
- Next, for each cluster (it would be smart to choose the largest one first) you craft a message that will appeal to these voters.
- Finally, you deliver the campaign and measure to see if it's working.
Clustering is a type of unsupervised learning that automatically forms clusters of similar things. It is like automatic classification. You can cluster almost anything, and the more similar the items are in the cluster, the better the clusters are. In this chapter, we are going to study one type of clustering algorithm called k-means. It is called k-means because it finds k unique clusters, and the center of each cluster is the mean of the values in that cluster.
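The k-means idea can be sketched as below. This is a bare-bones version that assumes Euclidean distance, a fixed number of iterations and a simplistic initialisation from the first k points. The data points are made up.

```python
import math

def k_means(points, k, iterations=10):
    """Alternate between assigning points to centres and moving the centres."""
    centres = list(points[:k])  # simplistic initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centres, clusters

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centres, clusters = k_means(points, k=2)
print(centres)
```

No labels are supplied anywhere. The two groups emerge only from how close the points are to each other, and each returned centre is the mean of the points in its cluster.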
Cluster Identification
Cluster identification tells an algorithm, "Here's some data. Now group similar things together and tell me about those groups." The key difference from classification is that in classification you know what you are looking for, while that is not the case in clustering.
Clustering is sometimes called unsupervised classification because it produces the same result as classification does but without having predefined classes.
Now, we are comfortable with both supervised and unsupervised learning. To understand the rest of the machine learning categories, we must first understand Artificial Neural Networks (ANN), which we will learn in the next chapter.
Machine Learning – Artificial Neural Networks
The idea of artificial neural networks was derived from the neural networks in the human brain. The human brain is really complex. Carefully studying the brain, the scientists and engineers came up with an architecture that could fit in our digital world of binary computers. One such typical architecture is shown in the diagram below −
There is an input layer which has many sensors to collect data from the outside world. On the right-hand side, we have an output layer that gives us the result predicted by the network. In between these two, several layers are hidden. Each additional layer adds further complexity to training the network, but often provides better results. There are several types of architectures, which we will discuss now.
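The flow of data from the input layer through a hidden layer to the output layer can be sketched as a forward pass. The network below has two inputs, two hidden neurons and one output. Its weights are hand-picked for this example (not trained) so that it computes XOR, and the sigmoid activation is an assumed choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
            for neuron_weights, bias in zip(weights, biases)]

# Hand-picked (not trained) weights that make the network compute XOR.
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]   # neuron 1 acts like OR, neuron 2 like NAND
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [[20.0, 20.0]]                   # output neuron acts like AND
OUTPUT_BIASES = [-30.0]

def forward(inputs):
    """Pass the inputs through the hidden layer and then the output layer."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b])))
```

XOR cannot be computed by a single neuron, so this small case shows what the hidden layer contributes. In practice the weights are found by training rather than set by hand.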
ANN Architectures
The diagram below shows several ANN architectures developed over a period of time and are in practice today.
Each architecture is developed for a specific type of application. Thus, when you use a neural network for your machine learning application, you will have to either use one of the existing architectures or design your own. The architecture that you finally decide upon depends on your application needs. There is no single guideline that tells you to use a specific network architecture.
Machine Learning – Deep Learning
Deep Learning uses ANN. First we will look at a few deep learning applications that will give you an idea of its power.
Applications
Deep Learning has shown a lot of success in several areas of machine learning applications.
Self-driving Cars − Autonomous self-driving cars use deep learning techniques. They generally adapt to ever-changing traffic situations and get better and better at driving over a period of time.
Speech Recognition − Another interesting application of Deep Learning is speech recognition. All of us use several mobile apps today that are capable of recognizing our speech. Apple's Siri, Amazon's Alexa, Microsoft's Cortana and Google Assistant all use deep learning techniques.
Mobile Apps − We use several web-based and mobile apps for organizing our photos. Face detection, face ID, face tagging and identifying objects in an image all use deep learning.
Untapped Opportunities of Deep Learning
After looking at the great success deep learning applications have achieved in many domains, people started exploring other domains where machine learning was not so far applied. There are several domains in which deep learning techniques are successfully applied and there are many other domains which can be exploited. Some of these are discussed here.
- Agriculture is one such industry where people can apply deep learning techniques to improve the crop yield.
- Consumer finance is another area where machine learning can greatly help in the early detection of fraud and in analyzing customers' ability to pay.
- Deep learning techniques are also applied to the field of medicine to create new drugs and provide a personalized prescription to a patient.
The possibilities are endless and one has to keep watching as the new ideas and developments pop up frequently.
What is Required for Achieving More Using Deep Learning
To use deep learning, substantial computing power is required. You need both ample memory and fast processors to develop deep learning models. Fortunately, High Performance Computing (HPC) is easily available today. Because of this, the deep learning applications mentioned above have become a reality, and in the future we can expect applications in the untapped areas discussed earlier as well.
Now, we will look at some of the limitations of deep learning that we must consider before using it in our machine learning application.
Deep Learning Disadvantages
Some of the important points that you need to consider before using deep learning are listed below −
- Black Box approach
- Duration of Development
- Amount of Data
- Computationally Expensive
We will now study each one of these limitations in detail.
Black Box approach
An ANN is like a blackbox. You give it a certain input and it will provide you a specific output. The following diagram shows you one such application where you feed an animal image to a neural network and it tells you that the image is of a dog.
It is called a black-box approach because you do not know why the network came up with a certain result. You do not know how the network concluded that the image is of a dog. Now consider a banking application where the bank wants to decide the creditworthiness of a client. The network will definitely provide an answer to this question. However, will you be able to justify it to a client? Banks need to explain to their customers why a loan is not sanctioned.
Duration of Development
The process of training a neural network is depicted in the diagram below −
You first define the problem that you want to solve, create a specification for it, decide on the input features, design a network, deploy it and test the output. If the output is not as expected, take this as feedback to restructure your network. This is an iterative process and may require several iterations until the network is fully trained to produce the desired outputs.
Amount of Data
Deep learning networks usually require a huge amount of data for training, while traditional machine learning algorithms can be used with great success even with just a few thousand data points. Fortunately, the amount of available data is growing at 40% per year and CPU processing power is growing at 20% per year, as seen in the diagram given below −
Computationally Expensive
Training a neural network requires several times more computational power than running traditional algorithms. Successful training of deep neural networks may require several weeks of training time.
In contrast, traditional machine learning algorithms take only a few minutes or hours to train. Also, the amount of computational power needed for training a deep neural network depends heavily on the size of your data and on how deep and complex the network is.
After having an overview of what Machine Learning is, its capabilities, limitations, and applications, let us now dive into learning Machine Learning.
Machine Learning – Skills
Machine Learning is a very broad field and requires skills across several domains. The skills that you need to acquire to become an expert in Machine Learning are listed below −
- Statistics
- Probability Theories
- Calculus
- Optimization techniques
- Visualization
Necessity of Various Skills of Machine Learning
To give you a brief idea of what skills you need to acquire, let us discuss some examples −
Mathematical Notation
Most machine learning algorithms are heavily based on mathematics. The level of mathematics that you need to know is probably just beginner level. What is important is that you can read the notation that mathematicians use in their equations. For example, if you are able to read the following notation and comprehend what it means, you are ready to learn machine learning. If not, you may need to brush up your mathematics knowledge.
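$$f_{AN}(\mathrm{net} - \theta) = \begin{cases} \gamma & \text{if } \mathrm{net} - \theta \geq \epsilon \\ \mathrm{net} - \theta & \text{if } -\epsilon < \mathrm{net} - \theta < \epsilon \\ -\gamma & \text{if } \mathrm{net} - \theta \leq -\epsilon \end{cases}$$
$$\max_{\alpha} \left[ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \mathrm{label}^{(i)} \cdot \mathrm{label}^{(j)} \cdot \alpha_i \cdot \alpha_j \left\langle x^{(i)}, x^{(j)} \right\rangle \right]$$
$$f_{AN}(\mathrm{net} - \theta) = \frac{e^{\lambda(\mathrm{net} - \theta)} - e^{-\lambda(\mathrm{net} - \theta)}}{e^{\lambda(\mathrm{net} - \theta)} + e^{-\lambda(\mathrm{net} - \theta)}}$$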
Probability Theory
Here is an example to test your current knowledge of probability theory: Classifying with conditional probabilities.
$$p(c_i \mid x, y) = \frac{p(x, y \mid c_i)\, p(c_i)}{p(x, y)}$$
With these definitions, we can define the Bayesian classification rule −
- If p(c1|x, y) > p(c2|x, y), the class is c1.
- If p(c1|x, y) < p(c2|x, y), the class is c2.
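As a worked illustration of the rule, the sketch below computes the posterior probabilities for two classes and picks the larger one. The prior and likelihood values are made up. Only the formula itself comes from the text above.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' rule: p(c | x, y) = p(x, y | c) * p(c) / p(x, y)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(x, y), summed over all classes
    return {c: joint[c] / evidence for c in joint}

# Made-up numbers for one observed point (x, y).
priors = {"c1": 0.6, "c2": 0.4}
likelihoods = {"c1": 0.2, "c2": 0.5}   # p(x, y | c) for each class

post = posteriors(priors, likelihoods)
predicted = max(post, key=post.get)
print(predicted, round(post[predicted], 3))  # prints c2 0.625
```

Although c1 has the larger prior, the observed point is much more likely under c2, so the posterior favors c2.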
Optimization Problem
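Here is an optimization function −
$$\max_{\alpha} \left[ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \mathrm{label}^{(i)} \cdot \mathrm{label}^{(j)} \cdot \alpha_i \cdot \alpha_j \left\langle x^{(i)}, x^{(j)} \right\rangle \right]$$
Subject to the following constraints −
$$\alpha \geq 0, \quad \text{and} \quad \sum_{i=1}^{m} \alpha_i \cdot \mathrm{label}^{(i)} = 0$$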
If you can read and understand the above, you are all set.
Visualization
In many cases, you will need to understand the various types of visualization plots to understand your data distribution and interpret the results of the algorithm's output.
Besides the above theoretical aspects of machine learning, you need good programming skills to code those algorithms.
So what does it take to implement ML? Let us look into this in the next chapter.
Machine Learning – Implementing
To develop ML applications, you will have to decide on the platform, the IDE and the language for development. There are several choices available. Most of these would meet your requirements easily as all of them provide the implementation of AI algorithms discussed so far.
If you are developing the ML algorithm on your own, the following aspects need to be understood carefully −
The language of your choice − this essentially depends on your proficiency in one of the languages supported in ML development.
The IDE that you use − this depends on your familiarity with the existing IDEs and your comfort level.
Development platform − There are several platforms available for development and deployment. Most of these are free to use. In some cases, you may have to pay a license fee beyond a certain amount of usage. Here is a brief list of choices of languages, IDEs and platforms for your ready reference.
Language Choice
Here is a list of languages that support ML development −
- Python
- R
- Matlab
- Octave
- Julia
- C++
- C
This list is not necessarily comprehensive; however, it covers many popular languages used in machine learning development. Depending upon your comfort level, select a language, develop your models and test them.
IDEs
Here is a list of IDEs which support ML development −
- R Studio
- Pycharm
- iPython/Jupyter Notebook
- Julia
- Spyder
- Anaconda
- Rodeo
- Google Colab
The above list is not necessarily comprehensive. Each one has its own merits and demerits. The reader is encouraged to try out these different IDEs before narrowing down to a single one.
Platforms
Here is a list of platforms on which ML applications can be deployed −
- IBM
- Microsoft Azure
- Google Cloud
- Amazon
- Mlflow
Once again, this list is not exhaustive. The reader is encouraged to sign up for the above-mentioned services and try them out.
Machine Learning – Conclusion
This tutorial has introduced you to Machine Learning. You now know that Machine Learning is a technique of training machines to perform activities a human brain can do, albeit a bit faster and better than an average human being. We have seen that machines can beat human champions in games such as Chess and Go, which are considered very complex. You have seen that machines can be trained to perform human activities in several areas and can help humans live better lives.
Machine Learning can be Supervised or Unsupervised. If you have a smaller amount of data that is clearly labelled for training, opt for Supervised Learning. Unsupervised Learning is generally better suited to large, unlabelled data sets. If you have a huge data set easily available, go for deep learning techniques. You have also learned about Reinforcement Learning and Deep Reinforcement Learning. You now know what Neural Networks are, along with their applications and limitations.
Finally, for developing machine learning models of your own, you looked at the choices of development languages, IDEs and platforms. The next thing you need to do is start learning and practicing each machine learning technique. The subject is vast, meaning that it has breadth, but in terms of depth each topic can be learned in a few hours. The topics are largely independent of each other. Take one topic at a time, learn it, practice it and implement its algorithms in a language of your choice. This is the best way to start studying Machine Learning. By practicing one topic at a time, you will soon acquire the breadth that is eventually required of a Machine Learning expert.
Good Luck!
-
Machine Learning – Types of Data
Data in machine learning are broadly categorized into two types − numerical (quantitative) and categorical (qualitative) data. The numerical data can be measured, counted or given a numerical value, for example, age, height, income, etc. The categorical data is non-numeric data that can be arranged in categories with or without meaningful order, for example, gender, blood group, etc.
Further, the numerical data can be categorized into discrete and continuous data. The categorical data can also be categorized into two types − nominal and ordinal. Let’s understand these types of data in machine learning in detail.
What is Data in Machine Learning?
Data in machine learning is a set of observations or measurements that are used to train, validate and test a machine learning model. Data is crucial in machine learning because it is the foundation for creating an accurate machine learning model.
What are Types of Data?
The data used in machine learning can be broadly categorized into two types −
Numerical (Quantitative) Data
Numerical (quantitative) data is data that can be measured, counted or given a numerical value. Examples of numerical data are age, height, income, number of students in a class, number of books on a shelf, shoe size, etc.
Numerical data can be categorized into the following two types −
- Discrete Data
- Continuous Data
1. Discrete Data
Discrete data is numerical data that is countable and can only take certain values, usually whole numbers. Examples of discrete data are number of students in a class, number of books on a shelf, shoe size, number of ducks in a pond, etc.
2. Continuous Data
The continuous data is numerical data that can take any value within a specified range including fractions and decimals. Examples of continuous data are age, height, weight, income, time, temperature, etc.
What is true zero?
True zero represents the absence of the quantity being measured. For example, height, weight, age and temperature in Kelvin all have a true zero: a height of 0 cm represents the absence of height, and 0 K represents the absence of thermal energy. Temperature in Celsius (or Fahrenheit), by contrast, has a false zero, because a reading of 0 does not mean that there is no temperature.
We can categorize numerical data into the following two types on the basis of true zero −
- interval data − quantitative data with equal intervals between data points but no true zero. Examples are temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850), etc.
- ratio data − same as interval data but with a true zero, so ratios between values are meaningful. Examples are weight in kg, number of students, income, speed, etc.
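The practical difference between the two scales can be checked with a little arithmetic. The sketch below uses two made-up temperature readings (the values are assumptions chosen for illustration): differences agree on both scales, but a ratio is only meaningful on the scale with a true zero.

```python
# Illustrative readings; the values are assumptions chosen for this example.
t1_celsius = 10.0
t2_celsius = 20.0


def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15


# Differences are meaningful on both scales and agree with each other.
diff_celsius = t2_celsius - t1_celsius
diff_kelvin = celsius_to_kelvin(t2_celsius) - celsius_to_kelvin(t1_celsius)

# The Celsius ratio suggests "twice as hot", but that is an artifact of where
# the zero point sits. The Kelvin ratio is the physically meaningful one.
ratio_celsius = t2_celsius / t1_celsius
ratio_kelvin = celsius_to_kelvin(t2_celsius) / celsius_to_kelvin(t1_celsius)

print(f"Difference: {diff_celsius:.1f} degrees C, {diff_kelvin:.1f} K")
print(f"Ratio in Celsius: {ratio_celsius:.2f}")
print(f"Ratio in Kelvin:  {ratio_kelvin:.3f}")
```

The same reasoning explains why statements such as "twice as heavy" are valid for weight (ratio data) but "twice the pH" is not (interval data).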
Categorical (Qualitative) Data
The categorical (qualitative) data can be categorized with or without a meaningful order. For example, gender, blood group, hair color, nationality, the school grades, level of education, range of income, ratings, etc.
Categorical data can be divided into the following two types −
- Nominal Data
- Ordinal Data
1. Nominal Data
Nominal data is categorical data that cannot be arranged in an order or rank. Examples of nominal data are gender, blood group, hair color, nationality, etc.
2. Ordinal Data
Ordinal data is categorical data that can be ordered or ranked by a specific attribute. Examples of ordinal data are school grades, level of education, range of income, ratings, etc.
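The nominal/ordinal distinction affects how categories are encoded before training. The following is a minimal sketch using pandas; the column names, the values and the chosen order of education levels are all hypothetical. Nominal values are one-hot encoded so that no order is implied, while ordinal values are mapped to integer codes that preserve their rank.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "blood_group": ["A", "O", "B", "O"],                          # nominal
    "education": ["School", "Graduate", "School", "Doctorate"],   # ordinal
})

# Nominal: there is no order, so use independent indicator columns.
nominal_encoded = pd.get_dummies(df["blood_group"], prefix="blood_group")

# Ordinal: declare the order explicitly, then use the integer codes.
education_levels = ["School", "Graduate", "Doctorate"]  # assumed ranking
education = pd.Categorical(
    df["education"], categories=education_levels, ordered=True
)
ordinal_codes = education.codes

print(nominal_encoded.columns.tolist())
print(ordinal_codes.tolist())
```

Encoding a nominal column with integer codes would invite a model to treat, say, blood group B as "greater than" blood group A, which is why the two cases are handled differently here.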
The Four Levels of Data Measurement
We can categorize data into four levels − nominal, ordinal, interval, and ratio. These levels of measurement are distinguished on the basis of the following four features −
- Categories − data can be grouped into categories.
- Rank Order − the categories or values have a meaningful order.
- Equal Difference − the difference between consecutive values remains the same.
- True Zero − zero represents the absence of the quantity being measured.
The following table shows how the four levels of measurement relate to the four features discussed above.
| Feature | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Categories | Yes | Yes | Yes | Yes |
| Rank Order | No | Yes | Yes | Yes |
| Equal Difference | No | No | Yes | Yes |
| True Zero | No | No | No | Yes |
Nominal data is categorical data with no meaningful order, whereas ordinal data is categorical data with a meaningful order. The concept of true zero differentiates interval and ratio data: ratio data is the same as interval data, but it includes a true zero.
-
Monetizing Machine Learning
Monetizing machine learning refers to transforming machine learning projects into profitable web applications. Monetizing an ML project involves many steps including problem understanding, ML model development, web application development, model integration to web application, serverless cloud deployment of the final web app and finally monetizing the application.
The idea behind monetizing a machine learning project is simple: build a simple, fast SaaS application around the project and monetize it.
Creating a Software as a Service (SaaS) is a good choice for its many benefits such as reduced costs, scalability, ease of management, etc.
To monetize, we can consider subscription based pricing, premium features, API access, advertising, custom service, etc.
Let’s understand how to transform a machine learning project into a web application and monetize it.
Understanding the Problems
Take a real-world problem and do research on whether we can solve the problem using machine learning. If yes, find out if it is feasible to implement the solution using all your resources.
Who will benefit from the ML solution? Who is the end user of the final machine learning application? Understanding the users is very important when you are analyzing a real-world problem.
What type of machine learning task does the problem fall under? What types of models can be used to solve it? Can the problem be solved using regression, classification, or clustering models? A proper understanding of the problem will help you answer these questions.
What would be the business model? A web application, a mobile application, API sales, or a combination of two or more?
What type of data do you have? Structured or unstructured? Analyze the data properly before trying to solve the problem. It will help you decide what type of machine learning approach to follow.
What computational resources do you have? Where will you develop the ML models − on premises or in the cloud?
Understand the real world problem properly that you want to solve.
Defining the Solution
What will be the final solution of the problem?
Define the solution − how you will present the solution to the end user whether you will develop a web application, mobile app, API or a combination.
What is the business model?
Define your business model. What type of product do you want to build around your machine learning model? One of the best options is to create software as a service (SaaS). You can also consider PaaS, AIaaS, mobile applications, API services, selling ML APIs, etc.
Building a web application using serverless technology is a good choice to showcase your machine learning application or solution. It also makes it easy to monetize your solution later on.
Once you decide how you will bring the solution to the world, the next step is to define the core features of your machine learning solution. User interaction with the application, navigation, login, security, data privacy, etc., should be defined before diving into building the machine learning model.
Developing Machine Learning Model
The next step is to start developing your machine learning model. But before actually starting, you need to understand machine learning models in detail. Without a good knowledge of ML models, you will not be able to decide which model to select for your problem.
Understand Machine Learning Models
It is very important to understand different types of machine learning models and how to choose the right one for your project. Understanding the ML models will help select an appropriate model for your machine learning application.
Knowing which machine learning task the underlying solution falls under will help you decide on the proper model. Suppose your solution falls under classification; then you have many choices of machine learning model. You can apply Naïve Bayes, logistic regression, k-nearest neighbors, decision trees, and many more. So a proper understanding of the models is required before getting your hands dirty with data and model training.
Types of ML Models
You should have a good understanding of the following types of machine learning models −
- Supervised − regression, classification
- Unsupervised − clustering, dimensionality reduction
- Reinforcement − game theory, multi agent systems
- Neural Networks − recognition (image, speech), NLP
Select the right model
The most important step in building a machine learning model is to select the right one that solves your business problem. While selecting the right ML model, you should consider different factors such as −
- Data characteristics − consider the nature of data (structured, unstructured, time series data) to select a suitable model.
- Problem type − determine whether your problem is regression, classification or other task.
- Model complexity − determine the optimal model complexity to avoid overfitting or underfitting.
- Computational resources − consider the computational resources available when choosing between a complex and a simple model.
- Desired outcome − consider the outcome you want, since it determines how you evaluate the model.
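One practical way to weigh these factors for a classification problem is to compare a few candidate models with cross-validation. The sketch below is only an illustration, assuming scikit-learn and its built-in breast cancer dataset as a stand-in for your own data; the candidate list and the choice of accuracy as the metric are assumptions, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; replace with your own features and labels.
X, y = load_breast_cancer(return_X_y=True)

# Candidate classifiers. Scale-sensitive models are wrapped in a pipeline so
# that scaling is refitted inside every cross-validation fold.
candidates = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")

best_name = max(cv_scores, key=cv_scores.get)
print("Best by mean accuracy:", best_name)
```

Mean accuracy is only one criterion. Training time, model size and interpretability may matter just as much for the final application.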
Train Machine Learning Model
After selecting the right model for your machine learning problem, the next is to start building the actual machine learning model. There are different ways to build an ML model. The easiest way is to use a pre-trained model and custom train on your own datasets.
Pre-trained models − Pre-trained models are machine learning models that are trained with huge datasets. If your data is similar to the datasets on which the pre-trained models are trained, you can select them for your solution. In such cases, you need only to build a web or mobile application and deploy it on the cloud for worldwide users.
Fine-Tuning Pre-Trained Model − You can consider fine-tuning a pre-trained model on your custom datasets. You can fine-tune any publicly available model using machine learning libraries/ frameworks such as TensorFlow/ Keras, PyTorch, etc. You can also consider some online platforms such as AWS Sagemaker, Vertex AI, IBM Watson Studio, Azure Machine Learning, etc. for fine-tuning purposes.
Build from Scratch − You can consider building a machine learning model from scratch if you have all the required resources. It usually takes more time than the above two ways, and whether it costs less depends on how much data and compute the training needs.
Amazon SageMaker is a cloud-based machine learning platform to create, train, evaluate, and deploy machine learning models on the cloud.
Evaluate Model
You have trained your ML model on your custom dataset. Now you have to evaluate the model on new data to check whether it performs according to your desired outcomes.
For evaluating your machine learning model, you can calculate the metrics such as accuracy, precision, recall, f1 score, confusion matrix, etc. Based on these metrics, you can decide on a further course of action − finalizing the current model or going back with training again.
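As a minimal sketch of how these metrics are computed, the example below uses scikit-learn's metric functions on a small set of hypothetical labels and predictions (both lists are made up for illustration; in practice they come from your held-out data and your trained model).

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical ground-truth labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # share of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two above

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1 score:  {f1:.2f}")
print("Confusion matrix:")
print(cm)
```

With 4 true positives, 4 true negatives, 1 false positive and 1 false negative, all four scores come out at 0.80 for this toy data. On imbalanced data the metrics diverge, which is why accuracy alone is rarely enough.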
You can consider ensemble methods, combining multiple models (bagging and boosting) to improve model performance and reduce overfitting.
Deploy Demo Model online
Before building a full-fledged web application and deploying it on a cloud server, it is advisable to deploy a demo of your machine learning model online. There are many hosting providers, some of them free, where you can deploy your machine learning model and get feedback from real users. You can consider the following providers for this purpose −
- Hugging Face Spaces
- Streamlit Cloud
- Heroku
Creating Machine Learning Web Applications
As of now, you have developed your ML model and deployed the demo model online. Your model is working perfectly. Now you are ready to build a full-fledged machine learning web or mobile application.
You can consider the following technology stack to build web applications −
- Python frameworks – Flask, Django, FastAPI, etc.
- Web development (frontend) concepts − HTML, CSS, JavaScript
- Integrating machine learning models − how to integrate the model using APIs or libraries, for example through a REST API
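To make the integration step concrete, here is a minimal sketch of a prediction endpoint, assuming Flask and scikit-learn are installed. The `/predict` route name, the JSON field `features` and the iris model trained at startup are all illustrative assumptions; a real application would load a previously saved model and validate input more thoroughly.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model at startup. A real app would load a saved model.
iris = load_iris()
model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical endpoint name
def predict():
    """Read four feature values from the JSON body and return the prediction."""
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if not isinstance(features, list) or len(features) != 4:
        return jsonify(error="expected 'features': a list of 4 numbers"), 400
    label = int(model.predict([features])[0])
    return jsonify(label=label, name=str(iris.target_names[label]))


# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.post("/predict", json={"features": [5.1, 3.5, 1.4, 0.2]})
print(response.status_code, response.get_json())
```

The frontend (HTML, CSS, JavaScript) or any external client then only needs to send a POST request with a JSON body to this endpoint. The same handler logic carries over to FastAPI or Django with framework-specific routing.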
Deploying on the Serverless Cloud
Deploying your ML application on a serverless cloud will open doors to monetize your application. It will reach a worldwide audience. Choosing a cloud platform is a good idea to host your app. Going serverless can benefit you with reduced costs, scalability, ease of management, etc.
The following is a list of some well-known serverless cloud service providers best for your machine learning web applications −
- Google Cloud Platform − Google Cloud Functions
- Amazon Web Services − AWS Lambda, AWS Fargate, AWS Amplify Hosting
- Microsoft Azure − Microsoft Azure Functions
- Heroku
- Python Anywhere
- Cloudflare Workers
- Vercel Functions
You can use services like EC2 for computing power and S3 for storage.
Monetizing Your Machine Learning Applications
Now your machine learning application is live on the cloud. You can promote and market it to your users. You can give them special offers to use your application.
Your machine learning application can reach any corner of the world. When you have enough users, you can think about monetizing your application. There are different strategies to monetize an ML web application, including a subscription model, pay-per-use pricing, advertising, premium features, etc.
- Subscription Model − Subscription-based pricing tiers (e.g., basic, premium, enterprise).
- Freemium Model − Offer a free version with limited features, and charge for advanced features.
- API Access − Charge businesses to access your AI tools via an API.
- Custom Solutions − Offer bespoke content generation services for larger clients.
- Advertising − You can also consider placing advertisements in your application, but keep in mind that advertisements can detract from the premium look of your application.
Marketing and Sales
Marketing and sales are important to grow any business. Continuous marketing is required for a better sale of the product.
You can sell your Machine Learning application APIs on different online API marketplaces.
You can consider the following API Marketplaces −
- RapidAPI
- APILayer
- AWS Marketplace
- Infosys API Marketplace
- IBM API Connect
Monetizing machine learning has become easier but also more competitive. Monetizing an ML application needs a detailed market analysis before you start building the application. Each step of the machine learning software development needs deep research. Building a minimum viable product (MVP) and testing it before building a full-fledged web application is advisable.
-
Machine Learning – Data Leakage
Data leakage is a common problem in machine learning that occurs when information from outside the training dataset is used to create or evaluate a model. This can lead to overfitting and overly optimistic evaluation, where the model looks accurate during development but performs poorly on new data.
There are two main types of data leakage: Target Leakage and Train-test Contamination.
Target Leakage
Target leakage occurs when features that are not available during prediction are used to create the model. For example, if we are predicting whether a customer will churn, and we include the customer’s cancellation date as a feature, then the model will have access to information that would not be available in practice. This can lead to unrealistically high accuracy during training and poor performance on new data.
Train-test Contamination
Train-test contamination occurs when information from the test set is inadvertently used in the training process. For example, if we normalize the data based on the mean and standard deviation of the entire dataset instead of just the training set, then the model will have access to information that would not be available in practice. This can lead to overly optimistic estimates of model performance.
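The normalization case can be illustrated with a short sketch, assuming scikit-learn and its built-in breast cancer dataset. It contrasts a scaler fitted on all rows (leaky) with one fitted on the training rows only, and shows that the two learn different statistics.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Leaky: statistics are computed on all rows, so the test set influences them.
leaky_scaler = StandardScaler().fit(X)

# Correct: statistics are computed on the training rows only.
clean_scaler = StandardScaler().fit(X_train)

# The two scalers disagree because the leaky one has "seen" the test rows.
mean_gap = np.abs(leaky_scaler.mean_ - clean_scaler.mean_).max()
print(f"Largest difference between the learned feature means: {mean_gap:.4f}")

# Apply the training-set statistics to both splits.
X_train_scaled = clean_scaler.transform(X_train)
X_test_scaled = clean_scaler.transform(X_test)
```

After this, the scaled training features have a mean of zero by construction, while the scaled test features generally do not. That is expected, and it mirrors what happens with genuinely unseen data.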
How to Prevent Data Leakage?
To prevent data leakage, it is important to carefully preprocess the data and ensure that no information from the test set is used in the training process. Some strategies for preventing data leakage include −
- Splitting the data into separate training and test sets before doing any preprocessing or feature engineering.
- Only using features that would be available at the time of prediction.
- Using cross-validation to evaluate model performance instead of a single train-test split.
- Ensuring that all preprocessing steps (such as normalization or scaling) are fitted on the training set only, and then applying the same fitted transformations to the test set.
- Being aware of any potential sources of leakage, such as date or time-based features, and handling them appropriately.
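Several of these strategies can be combined by wrapping preprocessing and model in a single pipeline and evaluating it with cross-validation. The following is a minimal sketch, assuming scikit-learn and the breast cancer dataset; the choice of 5 folds is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so it is part of what gets refitted.
pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])

# cross_val_score clones and refits the whole pipeline for every fold, so the
# scaler only ever sees the training portion of that fold.
scores = cross_val_score(pipeline, X, y, cv=5)

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f}")
```

Scaling the full dataset first and then cross-validating only the classifier would reintroduce the contamination described above, because every fold's held-out rows would have contributed to the scaling statistics.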
Implementation in Python
Here is an example that uses scikit-learn's breast cancer dataset and ensures that no information from the test set leaks into the model during training −
Example
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the breast cancer dataset
data = load_breast_cancer()

# Separate features and labels
X, y = data.data, data.target

# Split the data into train and test sets before any preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the pipeline; the scaler is fitted together with the model
pipeline = Pipeline([('scaler', StandardScaler()), ('svm', SVC())])

# Fit the pipeline on the train set only
pipeline.fit(X_train, y_train)

# Make predictions on the test set
y_pred = pipeline.predict(X_test)

# Evaluate the model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Output
When you execute this code, it will produce the following output −
Accuracy: 0.9824561403508771
-
Machine Learning – MLOps
MLOps (Machine Learning Operations) is a set of practices and tools that combine software engineering, data science, and operations to enable the automated deployment, monitoring, and management of machine learning models in production environments.
MLOps addresses the challenges of managing and scaling machine learning models in production, which include version control, reproducibility, model deployment, monitoring, and maintenance. It aims to streamline the entire machine learning lifecycle, from data preparation and model training to deployment and maintenance.
MLOps Best Practices
MLOps involves a number of key practices and tools, including −
- Version control − This involves tracking changes to code, data, and models using tools like Git to ensure reproducibility and maintain a history of all changes.
- Continuous integration and delivery (CI/CD) − This involves automating the process of building, testing, and deploying machine learning models using tools like Jenkins, Travis CI, or CircleCI.
- Containerization − This involves packaging machine learning models and their dependencies into containers using tools like Docker, and orchestrating those containers with tools like Kubernetes, which enables easy deployment and scaling of models in production environments.
- Model serving − This involves setting up a server to host machine learning models and serving predictions on incoming data.
- Monitoring and logging − This involves tracking the performance of machine learning models in production environments using tools like Prometheus or Grafana, and logging errors and alerts to enable proactive maintenance.
- Automated testing − This involves automating the testing of machine learning models to ensure they are accurate and robust.
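As one illustration of the automated testing practice, the sketch below trains a model and runs a few named checks that a CI job could execute before deployment. It assumes scikit-learn; the helper names `train_model` and `check_model` and the `MIN_ACCURACY` threshold are hypothetical and would be adapted to your project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MIN_ACCURACY = 0.90  # hypothetical release threshold; tune for your project


def train_model(X_train, y_train):
    """Fit a scaled logistic regression as a stand-in for the real model."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return model.fit(X_train, y_train)


def check_model(model, X_test, y_test):
    """Return a dict of named checks, each True when the check passes."""
    predictions = model.predict(X_test)
    return {
        "output_shape": predictions.shape == (len(X_test),),
        "known_labels": set(predictions.tolist()) <= {0, 1},
        "accuracy": bool(model.score(X_test, y_test) >= MIN_ACCURACY),
    }


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = train_model(X_train, y_train)
results = check_model(model, X_test, y_test)
print(results)
```

In a CI/CD pipeline, a failing check would stop the deployment step, in the same way a failing unit test blocks a code release.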
Python Libraries for MLOps
Python has a number of libraries and tools that can be used for MLOps, including −
- Scikit-learn − A popular machine learning library that provides tools for data preprocessing, model selection, and evaluation.
- TensorFlow − A widely used open-source platform for building and deploying machine learning models.
- Keras − A high-level neural networks API that can run on top of TensorFlow.
- PyTorch − A deep learning framework that provides tools for building and deploying neural networks.
- MLflow − An open-source platform for managing the machine learning lifecycle that provides tools for tracking experiments, packaging code and models, and deploying models in production.
- Kubeflow − A machine learning toolkit for Kubernetes that provides tools for managing and scaling machine learning workflows.
-
Machine Learning – Entropy
Entropy is a concept that originates from thermodynamics and was later applied in various fields, including information theory, statistics, and machine learning. In machine learning, entropy is used as a measure of the impurity or randomness of a set of data. Specifically, entropy is used in decision tree algorithms to decide how to split the data to create a more homogeneous subset. In this article, we will discuss entropy in machine learning, its properties, and its implementation in Python.
Entropy is defined as a measure of disorder or randomness in a system. In the context of decision trees, entropy is used as a measure of the impurity of a node. A node is considered pure if all the examples in it belong to the same class. In contrast, a node is impure if it contains examples from multiple classes.
To calculate entropy, we need to first define the probability of each class in the data set. Let p(i) be the probability of an example belonging to class i. If we have k classes, then the total entropy of the system, denoted by H(S), is calculated as follows −
$$H(S) = -\sum_{i=1}^{k} p(i)\,\log_2 p(i)$$
where the sum is taken over all k classes. This equation is called the Shannon entropy.
For example, suppose we have a dataset with 100 examples, of which 60 belong to class A and 40 belong to class B. Then the probability of class A is 0.6 and the probability of class B is 0.4. The entropy of the dataset is then −
$$H(S) = -\left(0.6 \times \log_2 0.6 + 0.4 \times \log_2 0.4\right) \approx 0.971$$
If all the examples in the dataset belong to the same class, then the entropy is 0, indicating a pure node. On the other hand, if the examples are evenly distributed across all k classes, then the entropy reaches its maximum value of log2(k), indicating a maximally impure node.
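These cases can be checked numerically with a few lines of Python. The helper name `shannon_entropy` is introduced here for illustration only; it applies the formula above to a list of class probabilities.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits; classes with zero probability contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# The 60/40 dataset from the text.
print(f"60/40 split:         {shannon_entropy([0.6, 0.4]):.3f}")

# A pure node: every example belongs to one class.
print(f"Pure node:           {shannon_entropy([1.0]):.3f}")

# Evenly distributed classes give the maximum, log2(k).
print(f"Two equal classes:   {shannon_entropy([0.5, 0.5]):.3f}")
print(f"Three equal classes: {shannon_entropy([1/3, 1/3, 1/3]):.3f}")
```

The 60/40 case reproduces the value 0.971 computed above, the pure node gives 0, and the evenly distributed cases give log2(2) = 1 and log2(3), which is roughly 1.585.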
In decision tree algorithms, entropy is used to determine the best split at each node. The goal is to create a split that results in the most homogeneous subsets. This is done by calculating the weighted entropy of the subsets produced by each possible split and selecting the split with the lowest value, which is equivalent to choosing the split with the highest information gain.
For example, suppose we have a dataset with two features, X1 and X2, and the goal is to predict the class label, Y. We start by calculating the entropy of the entire dataset, H(S). Next, we calculate the entropy of each possible split based on each feature. For example, we could split the data based on the value of X1 or the value of X2. The entropy of each split is calculated as follows −
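$$H(X_1) = p_1 \times H(S_1) + p_2 \times H(S_2)$$
$$H(X_2) = p_3 \times H(S_3) + p_4 \times H(S_4)$$
where p1, p2, p3, and p4 are the proportions of examples that fall into each subset; and H(S1), H(S2), H(S3), and H(S4) are the entropies of each subset.
We then select the split that results in the lowest total entropy, which is given by −
$$H_{\text{split}} = \begin{cases} H(X_1) & \text{if } H(X_1) \le H(X_2) \\ H(X_2) & \text{otherwise} \end{cases}$$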
This split is then used to create the child nodes of the decision tree, and the process is repeated recursively until all nodes are pure or a stopping criterion is met.
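The split selection can be sketched on a tiny hand-made dataset. Everything in the block below is illustrative: the arrays `x1`, `x2` and `y` are made up, both features are binary so that each split produces exactly two subsets, and the helper names are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(np.sum(-probs * np.log2(probs)))


def split_entropy(feature, labels):
    """Weighted entropy of the subsets produced by splitting on a feature."""
    total = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # mask.mean() is the proportion of examples that fall in this subset.
        total += mask.mean() * entropy(labels[mask])
    return total


# Toy dataset: two binary features and a binary class label.
x1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

h_s = entropy(y)
h_x1 = split_entropy(x1, y)
h_x2 = split_entropy(x2, y)

print(f"H(S)  = {h_s:.3f}")
print(f"H(X1) = {h_x1:.3f}, information gain = {h_s - h_x1:.3f}")
print(f"H(X2) = {h_x2:.3f}, information gain = {h_s - h_x2:.3f}")

best = "X1" if h_x1 <= h_x2 else "X2"
print("Best split:", best)
```

Here splitting on X1 leaves one subset completely pure, so its weighted entropy is lower and it is chosen. A real decision tree repeats this comparison over every feature and threshold at every node.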
Example
Let’s take an example to understand how it can be implemented in Python. Here we will use the “iris” dataset −
from sklearn.datasets import load_iris
import numpy as np

# Load iris dataset
iris = load_iris()

# Extract features and target
X = iris.data
y = iris.target

# Define a function to calculate entropy
def entropy(y):
    n = len(y)
    _, counts = np.unique(y, return_counts=True)
    probs = counts / n
    return -np.sum(probs * np.log2(probs))

# Calculate the entropy of the target variable
target_entropy = entropy(y)
print(f"Target entropy: {target_entropy:.3f}")
The above code loads the iris dataset, extracts the features and target, and defines a function to calculate entropy. The entropy() function takes a vector of target values and returns the entropy of the set.
The function first calculates the number of examples in the set and the count of each class. It then calculates the proportion of each class and uses these to calculate the entropy of the set using the entropy formula. Finally, the code calculates the entropy of the target variable in the iris dataset and prints it to the console.
Output
When you execute this code, it will produce the following output −
Target entropy: 1.585
-
Machine Learning – P-value
In machine learning, we use P-value to test the null hypothesis that there is no significant relationship between two variables. For example, if we have a dataset of house prices and we want to determine whether there is a significant relationship between the size of the house and its price, we can use P-value to test this hypothesis.
To understand the concept of P-value in machine learning, we need to first understand the concept of null hypothesis and alternative hypothesis. The null hypothesis is the hypothesis that there is no significant relationship between the two variables, while the alternative hypothesis is the opposite of the null hypothesis, which states that there is a significant relationship between the two variables.
Once we have defined our null hypothesis and alternative hypothesis, we can use P-value to test the significance of our hypothesis. The P-value is the probability of obtaining the observed result or a more extreme result, assuming that the null hypothesis is true.
If the P-value is less than the significance level (usually set at 0.05), then we reject the null hypothesis in favor of the alternative hypothesis. This means that the data provide evidence of a significant relationship between the two variables. On the other hand, if the P-value is greater than the significance level, then we fail to reject the null hypothesis. This does not prove that there is no relationship; it only means that the data do not provide enough evidence of one.
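The definition of the P-value can be made tangible with a permutation test, which estimates it directly by simulation. This is a sketch on synthetic data: the two groups, their means and the number of permutations are assumptions chosen for illustration, and a permutation test is a substitute here for a closed-form test, not the method used later in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic samples; the means and spread are made up for this example.
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=58.0, scale=5.0, size=40)

# Test statistic: absolute difference between the group means.
observed = abs(group_a.mean() - group_b.mean())

# Under the null hypothesis the group labels are interchangeable, so we
# shuffle the pooled values and see how often a difference at least as
# extreme as the observed one appears by chance.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_permutations = 5000
extreme = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
    if diff >= observed:
        extreme += 1

# Add-one correction keeps the estimate strictly above zero.
p_value = (extreme + 1) / (n_permutations + 1)

alpha = 0.05
print(f"Observed difference: {observed:.2f}")
print(f"Permutation p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Because the two synthetic groups were drawn with clearly different means, shuffled labels almost never reproduce a gap as large as the observed one, so the estimated P-value falls well below 0.05.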
Implementation of P-value in Python
Python provides several libraries for statistical analysis and hypothesis testing. One of the most popular libraries for statistical analysis is the scipy library. The scipy library provides a function called ttest_ind() that can be used to calculate the P-value for two independent samples.
To demonstrate the implementation of p-value in Machine Learning, we will use the breast cancer dataset provided by scikit-learn. The goal of this dataset is to predict whether a breast tumor is malignant or benign based on various features such as the tumor’s radius, texture, perimeter, area, smoothness, compactness, concavity, and symmetry.
First, we will load the dataset and split it into training and testing sets −
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Next, we will use the SelectKBest class from scikit-learn to select the top k features based on their scores. With the f_classif score function, a higher F-value corresponds to a lower p-value. Here, we will select the top 5 features −
from sklearn.feature_selection import SelectKBest, f_classif

k = 5
selector = SelectKBest(score_func=f_classif, k=k)
X_train_new = selector.fit_transform(X_train, y_train)
X_test_new = selector.transform(X_test)
The SelectKBest class takes a score function as input, which computes a score and a p-value for each feature. We use the f_classif function, which computes the ANOVA F-value between each feature and the target variable. The k parameter specifies the number of top-scoring features to select.
After fitting the selector on the training data, we transform the data to keep only the top k features using the fit_transform() method. We also transform the testing data to keep only the selected features using the transform() method.
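To see which features were kept and what their statistics look like, the fitted selector can be inspected. The sketch below is self-contained and repeats the loading and splitting steps; it relies on the `scores_` and `pvalues_` attributes and the `get_support()` method of scikit-learn's SelectKBest.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Fit on the training split only, as in the main example.
selector = SelectKBest(score_func=f_classif, k=5).fit(X_train, y_train)

# Column indices of the features that were kept.
selected = selector.get_support(indices=True)

for idx in selected:
    name = data.feature_names[idx]
    f_value = selector.scores_[idx]
    p_value = selector.pvalues_[idx]
    print(f"{name}: F = {f_value:.1f}, p = {p_value:.2e}")
```

Printing the p-values in scientific notation is useful here because they are typically far too small to show up with two decimal places.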
We can now train a model on the selected features and evaluate its performance −
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = LogisticRegression()
model.fit(X_train_new, y_train)
y_pred = model.predict(X_test_new)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
In this example, we trained a logistic regression model on the top 5 selected features and evaluated its performance using accuracy. However, the p-value can also be used for hypothesis testing to determine whether a feature is statistically significant or not.
For example, to test the hypothesis that the mean radius feature is significant, we can use the ttest_ind() function from the scipy.stats module −
from scipy.stats import ttest_ind

# Column 0 is the mean radius; target 0 is malignant and 1 is benign
malignant = X[y == 0, 0]
benign = X[y == 1, 0]
t, p_value = ttest_ind(malignant, benign)
print(f"P-value: {p_value:.2f}")
The ttest_ind() function takes two arrays as input and returns the t-statistic and the two-tailed p-value.
Output
We will get the following output from the above implementation −
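Accuracy: 0.97
P-value: 0.00
In this example, we calculated the p-value for the mean radius feature between the malignant and benign classes. The value is printed as 0.00 only because it is rounded to two decimal places; it is far below the 0.05 significance level, so we reject the null hypothesis that the mean radius is the same in both classes.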