Support Vector Machine: Mathematical intuition

Dr. Arun Kumar Pandey (Ph.D.)
5 min read · Mar 13, 2024


  • Support Vector Machine (SVM) is one of the most popular algorithms among machine learning practitioners. It is a supervised learning algorithm that is robust to outliers and generalizes well in many cases. However, the intuition behind SVM can be tricky for a beginner to grasp, and the name itself, with its three parts Support, Vector, and Machine, sounds intimidating.
  • It can be used for classification as well as regression problems, but in practice it is primarily used for classification.
  • The algorithm tries to find the hyperplane that best separates the two classes. SVM may look similar to logistic regression, since both search for a separating hyperplane, but the key difference is that logistic regression takes a probabilistic approach, whereas SVM takes a geometric, margin-based approach.
  • So which hyperplane does it select? An infinite number of hyperplanes can separate the two classes perfectly, so which one is best? SVM chooses the hyperplane with the maximum margin, that is, the one that keeps the greatest possible distance from both classes (see the short sketch after this list).
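As a rough illustration of the difference mentioned above, here is a minimal sketch, assuming scikit-learn is available; the toy dataset and parameters are illustrative assumptions only, not part of the original derivation.

```python
# Minimal sketch: a linear SVM vs. logistic regression on toy data.
# Both learn a separating hyperplane, but the SVM optimizes the margin,
# while logistic regression fits class probabilities.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Two well-separated clusters (think of them as the green and orange points)
X, y = make_blobs(n_samples=200, centers=2, random_state=42)

svm = LinearSVC(C=1.0).fit(X, y)           # margin-based separator
logreg = LogisticRegression().fit(X, y)    # probabilistic separator

print("SVM coefficients:   ", svm.coef_, svm.intercept_)
print("LogReg coefficients:", logreg.coef_, logreg.intercept_)
```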

Support-vector-machine explanation: 2D

Imagine a scatter plot where the green points represent one class and the orange points represent another. To separate these two classes, an SVM looks for the line or plane (hyperplane) that maximizes the margin between them.

We can find the best line by computing the maximum margin from the equidistant support vectors, i.e., the points of each class that lie closest to the boundary.

Image Credit: Arun Kumar Pandey

Let’s understand this in detail. We want to classify a new data point as either green or orange. Many decision boundaries could classify these points, but which one is best, and how do we find it? The best hyperplane is the one that has the maximum distance from both classes, and finding it is the main aim of SVM. Among all hyperplanes that classify the labels correctly, SVM chooses the one that is farthest from the nearest data points, i.e., the one with the maximum margin; the short sketch after the figure below illustrates this.

Image Credit: Arun Kumar Pandey
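The snippet below is a minimal sketch of this idea, assuming scikit-learn: it fits a linear SVM, reads off the support vectors, and reports the margin width 2/||w||. The dataset and the large value of C (to approximate a hard margin) are illustrative assumptions only.

```python
# Minimal sketch: fit a (nearly) hard-margin linear SVM and inspect
# the support vectors and the margin width 2 / ||w||.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C ~ hard margin

w = clf.coef_[0]
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
```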

We aim to determine whether a point lies on the right (positive) or left (negative) side of the separating line (plane). This is done by projecting the point’s position vector onto the direction perpendicular to the boundary separating the two sets of data points (green and orange). Here ‘c’ is the distance of the light-blue separating line from the origin. Three conditions may then arise:
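Written out, following the usual projection argument with w the unit normal to the boundary, the three cases are:

```latex
\begin{aligned}
\vec{w}\cdot\vec{X} &> c &&\Rightarrow \text{the point lies on the positive side,}\\
\vec{w}\cdot\vec{X} &= c &&\Rightarrow \text{the point lies on the boundary,}\\
\vec{w}\cdot\vec{X} &< c &&\Rightarrow \text{the point lies on the negative side.}
\end{aligned}
```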

Here the dot product w⋅X (where w is a unit vector perpendicular to the decision boundary) gives the projection of the vector X onto that perpendicular direction, and comparing it with c tells us how far X is from the decision boundary and on which side it lies. There are infinitely many points on the boundary from which we could measure a distance, so instead we take the perpendicular direction as the reference, project all the data points onto it, and compare the resulting distances. In SVM we also have the concept of a margin.

Therefore, absorbing the offset into a bias term by writing b = −c, we have:
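In the usual SVM convention, with the two classes labelled +1 and −1, the decision rule reads:

```latex
\vec{X}\cdot\vec{w} + b \;\ge\; 0 \;\Rightarrow\; \text{positive class}, \qquad
\vec{X}\cdot\vec{w} + b \;<\; 0 \;\Rightarrow\; \text{negative class}.
```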

If the value of X⋅w + b > 0, we can say it is a positive point; otherwise it is a negative point. Now we need (w, b) such that the margin has the maximum distance. Let’s call this distance ‘d’. To calculate d, we need the equations of the two margin lines L1 and L2. We want our plane to lie at an equal distance from both classes, which means L should pass through the centre between L1 and L2; that is why we take the magnitudes to be equal. For mathematical convenience, we take the equations of the two lines to be X⋅w + b = 1 and X⋅w + b = −1. Another reason for choosing 1 is that it only fixes the scale: if we multiply the equation of the hyperplane by a factor greater than 1, the two parallel lines move closer to the separating line, and if we multiply by a factor less than 1, they move farther away.

These lines move as we change (w, b), and this is what gets optimized. But what is the optimization function? Let’s calculate it. We know that the aim of SVM is to maximize the margin, i.e., the distance d. But there are a few constraints on this distance; let’s look at what these constraints are.
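Putting this together, and using the usual textbook convention with class labels yᵢ ∈ {+1, −1}, the margin and the resulting optimization problem can be written as:

```latex
\begin{aligned}
&L_1:\;\; \vec{X}\cdot\vec{w} + b = +1, \qquad L_2:\;\; \vec{X}\cdot\vec{w} + b = -1,\\[4pt]
&d = \frac{2}{\lVert \vec{w}\rVert} \quad \text{(the distance between } L_1 \text{ and } L_2\text{)},\\[4pt]
&\max_{\vec{w},\,b}\; \frac{2}{\lVert \vec{w}\rVert}
\;\;\Longleftrightarrow\;\;
\min_{\vec{w},\,b}\; \frac{1}{2}\lVert \vec{w}\rVert^{2}
\quad \text{subject to} \quad y_i\,(\vec{X}_i\cdot\vec{w} + b) \ge 1 \;\;\text{for all } i.
\end{aligned}
```

The constraint simply says that every point must lie on or outside the margin line of its own class, which is exactly the condition on d referred to above.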

Classification problem with higher dimension data

The data set shown below in Image-(a) has no clear linear separation between the two classes. In machine learning parlance, we would say that these are not linearly separable. How can we get the support vector machine to work on such data?

Image Credit: Arun Kumar Pandey

Since we can’t separate the two classes using a line in the original space, we transform the data into a higher dimension by applying a kernel function. The higher dimension lets us clearly separate the two groups with a plane: we can draw a plane between the green dots and the orange dots, again with the goal of maximizing the margin. In this example, the kernel function maps the two-dimensional space ℝ² into a three-dimensional space ℝ³. Once the data is lifted into three dimensions, we can apply SVM and separate the two groups with a two-dimensional plane; for higher-dimensional data, the separator is a higher-dimensional hyperplane, which corresponds to a curved boundary back in the original space.
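As a rough sketch of this kernel idea, assuming scikit-learn is available (the concentric-rings dataset and the RBF kernel are illustrative choices, not taken from the figures above):

```python
# Minimal sketch: data that is not linearly separable in 2D (concentric rings)
# becomes separable once an RBF kernel implicitly maps it to a higher dimension.
from sklearn.datasets import make_circles
from sklearn.svm import SVC, LinearSVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = LinearSVC().fit(X, y)                    # no separating line exists
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)  # kernel trick handles it

print("linear SVM accuracy:    ", linear.score(X, y))
print("RBF-kernel SVM accuracy:", rbf.score(X, y))
```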

For Various Kernels, please check: https://arunp77.github.io/support-vector.html

References

  1. https://arunp77.github.io/support-vector.html
  2. https://arunp77.github.io/machine-learning.html
