Support Vector Machine (SVM) using Log Barrier Method

Introduction: Support Vector Machine

SVM is a popular classification algorithm used in artificial-intelligence applications to separate classes of data. Given a two-class training dataset, SVM finds the optimal separating hyperplane (the one that maximizes the margin) between the two classes by solving a constrained convex quadratic optimization problem.

Preliminaries

In this project, we formulate the SVM classifier as an optimization problem. We cover some preliminaries on optimization before describing the details of SVM.

Convex Optimization

Convex optimization deals with the problem of minimizing (or maximizing) a convex function over a convex set.

\[ \min_{x\in D}f(x) \]

Here \(f(x)\) and \(D\) are convex.

The optimization problem for SVM is a convex program which can be solved efficiently. Moreover, we will consider a special class of cost functions for SVM to obtain a convex quadratic program.

Quadratic Optimization

Quadratic Optimization, or Quadratic Programming (QP), is a type of optimization problem where the objective function is quadratic, and the constraints are linear.

\[\min_{Ax\leq b}x^TPx\]

Here \(P\succeq 0\) for a convex quadratic program.
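As a concrete illustration, a small convex QP of this form can be solved numerically. This is only a sketch: the matrices `P`, `A`, `b` below are toy values of my choosing, and SLSQP from `scipy.optimize` stands in for a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

# Toy convex QP: minimize x^T P x subject to A x <= b (illustrative data).
P = np.array([[2.0, 0.0], [0.0, 1.0]])   # P >= 0, so the QP is convex
A = np.array([[-1.0, -1.0]])             # encodes the constraint x1 + x2 >= 1
b = np.array([-1.0])

# SLSQP expects inequality constraints in the form fun(x) >= 0,
# so Ax <= b becomes b - Ax >= 0.
cons = {"type": "ineq", "fun": lambda x: b - A @ x}
res = minimize(lambda x: x @ P @ x, x0=np.array([1.0, 1.0]),
               constraints=cons, method="SLSQP")
```

For this toy problem the KKT conditions give the optimum \(x^*=(1/3,\,2/3)\), which the solver recovers.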

Indicator Function

Indicator functions are used to make inequality constraints implicit in the objective function.

\[ I_{-}(u)=\begin{cases} 0, & u\leq 0\\ \infty, & u>0 \end{cases} \]

Indicator functions are used to convert a constrained optimization problem into an unconstrained one. The latter is easier to solve.

Approximate Indicator Function (Log Barrier Function)

For numerical computations, we need to approximate the indicator function. One approach is the log barrier function

\[ \hat{I}_{-}(u)=-(1/t)\log(-u),\qquad u<0 \]
Approximate penalty function for increasing value of t
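The behavior shown in the figure can be checked numerically: at a strictly feasible point \(u<0\), the barrier value shrinks toward the exact indicator value \(0\) as \(t\) grows. A minimal numpy sketch (the function name `log_barrier` is my own):

```python
import numpy as np

def log_barrier(u, t):
    """Approximate indicator: -(1/t)*log(-u). Defined only for u < 0."""
    return -(1.0 / t) * np.log(-u)

# As t increases, the penalty at a fixed feasible point u < 0 decreases,
# approaching the exact indicator value I_-(u) = 0.
u = -0.5
values = [log_barrier(u, t) for t in (1, 10, 100)]
```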

Optimization: SVM

Given a set of linearly separable data points and labels \((x_i, y_i)\), finding the maximum margin separating hyperplane is equivalent to the following constrained optimization problem

\[ \begin{array}{l} \displaystyle\min_{w,b}\lVert w \rVert^2 \\ s.t.\ y_i(\langle w,x_i\rangle+b)\geq 1,\ \forall i \end{array} \]

Here \(\langle w,x\rangle+b=0\) is the equation of hyperplane (decision boundary).

Optimal hyperplane and margin hyperplane obtained using SVM optimization problem

Optimization: SVM using Log Barrier Function

To make the constraint in the earlier optimization problem implicit in the objective function, we can make use of the log barrier function defined in the preliminaries.

\[ \displaystyle\min_{w,b}\left[\lVert w \rVert^2-(1/t)\sum_i\log\left(y_i(\langle w,x_i\rangle+b)-1\right)\right] \]

Here \(t\in \mathbb{R}_+\) is the barrier parameter; larger values of \(t\) give a closer approximation to the original constrained problem.
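The barrier objective above translates directly into code. This is a sketch under my own naming (`svm_barrier_objective`); it returns \(\infty\) outside the barrier's domain, mirroring the exact indicator:

```python
import numpy as np

def svm_barrier_objective(w, b, X, y, t):
    """||w||^2 - (1/t) * sum_i log(y_i*(<w,x_i>+b) - 1).
    Requires strict feasibility: y_i*(<w,x_i>+b) > 1 for all i."""
    margins = y * (X @ w + b)
    if np.any(margins <= 1.0):
        return np.inf                 # outside the log's domain
    return w @ w - np.sum(np.log(margins - 1.0)) / t

# Tiny 1-D example: two points at +/-2 with labels +/-1.
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
val = svm_barrier_objective(np.array([1.0]), 0.0, X, y, t=1.0)
```

With \(w=1,\ b=0\) the margins are exactly 2, so the log terms vanish and the objective equals \(\lVert w\rVert^2=1\).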

Optimization Problem to Find Feasible Start Point

The presence of the logarithm in the objective function prevents the optimization from being started at an arbitrary initial point. We therefore need an additional mechanism to find a strictly feasible starting point

\[ \begin{array}{l} \displaystyle\min_{w, b, s} s \\ s.t.\; (1-y_i(\langle w,x_i\rangle+b))\leq s;\; \forall i \end{array} \]

We can put a lower bound on \(s\) to prevent unnecessary computation.
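Since the phase-I problem above is linear in \((w, b, s)\), it can be posed as an LP. A sketch using `scipy.optimize.linprog` (the function name `feasible_start` and the lower bound `s_min=-1` are my choices; the lower bound on \(s\) implements the remark above):

```python
import numpy as np
from scipy.optimize import linprog

def feasible_start(X, y, s_min=-1.0):
    """Phase-I LP: min s  s.t.  1 - y_i*(<w,x_i>+b) <= s.
    If the optimum s* < 0, the returned (w, b) is strictly feasible
    for the log-barrier SVM problem."""
    n, d = X.shape
    c = np.zeros(d + 2)
    c[-1] = 1.0                                  # minimize s
    # Rewrite 1 - y_i(<w,x_i>+b) <= s as  -y_i x_i . w - y_i b - s <= -1.
    A_ub = np.hstack([-y[:, None] * X,           # w block
                      -y[:, None],               # b block
                      -np.ones((n, 1))])         # s block
    b_ub = -np.ones(n)
    bounds = [(None, None)] * (d + 1) + [(s_min, None)]  # lower-bound s
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d], res.x[d], res.x[-1]

X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
w0, b0, s0 = feasible_start(X, y)
```

For this separable toy dataset, the LP reaches \(s^*<0\), so the margins at \((w_0, b_0)\) strictly exceed 1.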

Central Path Method

In SVM using the log barrier method, running the optimization directly with a very large value of t gives poor results, because the Hessian of the objective function becomes large near the boundary of the feasible set. Hence, the optimization is run multiple times with increasing values of t, with the solution from the previous iteration used as the starting point for the current one. This technique is known as the central path method.

Central Path

Let \(x^*(t),\ t>0\) be the solution of

\[ \begin{array}{l} \min\ t\,f_0(x)+\phi(x)\\ s.t.\ Ax=b \end{array} \]

where \(\phi\) is the log barrier term. Then the central path associated with the problem is defined as the set of points \(x^*(t),\ t>0\), which we call the central points.
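The central-path loop can be sketched as follows. This is an illustrative implementation: the name `barrier_svm`, the schedule parameters `t0` and `mu`, and the use of derivative-free Nelder-Mead (via `scipy.optimize.minimize`) as the inner solver in place of the Newton steps an interior-point method would normally use, are all my assumptions, not the project's code.

```python
import numpy as np
from scipy.optimize import minimize

def barrier_svm(X, y, wb0, t0=1.0, mu=10.0, n_outer=5):
    """Central-path sketch for the hard-margin barrier problem:
    minimize ||w||^2 - (1/t) * sum_i log(y_i*(<w,x_i>+b) - 1),
    re-solving with t <- mu*t, warm-started from the previous solution.
    wb0 = (w0, b0) must be strictly feasible: y_i*(<w0,x_i>+b0) > 1."""
    d = X.shape[1]

    def obj(z, t):
        w, b = z[:d], z[d]
        m = y * (X @ w + b)
        if np.any(m <= 1.0):
            return np.inf                    # outside the barrier's domain
        return w @ w - np.sum(np.log(m - 1.0)) / t

    z, t = np.asarray(wb0, dtype=float), t0
    for _ in range(n_outer):
        z = minimize(obj, z, args=(t,), method="Nelder-Mead").x
        t *= mu                              # tighten the barrier
    return z[:d], z[d]

# 1-D toy data: the exact maximum-margin solution is w = 0.5, b = 0.
X = np.array([[2.0], [-2.0], [3.0], [-3.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w_opt, b_opt = barrier_svm(X, y, wb0=[2.0, 0.0])
```

As \(t\) grows along the path, the solution approaches the exact hard-margin optimum for this toy dataset.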

Optimal hyperplane for linearly separable data obtained from SVM optimization problem after incorporating the concept of log-barrier method and central path (t is the parameter of penalty function. Larger the value of t, better the result obtained using central path.)

Soft Margin SVM

When a dataset is not linearly separable due to insufficient features, noise, or spurious data, we cannot find a separating hyperplane. In such cases, we have to use a soft margin SVM classifier.

Optimization: Soft Margin SVM

\[ \begin{array}{l} \displaystyle\min_{\substack{w,b,\\ \xi_i\in\mathbb{R}_+}}\left[\lVert w \rVert^2+C\sum_{i=1}^{n}\xi_i\right]\\ s.t.\;\left[1-y_i(\langle w,x_i\rangle+b)\right]\leq \xi_i;\;\forall i \end{array} \]

Here \(C\) is a regularization/penalty parameter.

Soft margin SVM using exact method

Optimization: Soft Margin SVM using Log Barrier Method

\[ \displaystyle\min_{\substack{w,b,\\ \xi_i\in\mathbb{R}_+}}\left[\lVert w \rVert^2+C\sum_{i=1}^{n}\xi_i-\frac{1}{t}\sum_{i=1}^{n}\left(\log\left(y_i(\langle w,x_i\rangle+b)-1+\xi_i\right)+\log(\xi_i)\right)\right] \]

Here \(C\) is a regularization/penalty parameter.
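The soft-margin barrier objective can be sketched the same way as the hard-margin one. Again the function name `soft_margin_barrier` is my own; the guard returns \(\infty\) whenever a log argument would be non-positive:

```python
import numpy as np

def soft_margin_barrier(w, b, xi, X, y, C, t):
    """||w||^2 + C*sum(xi) - (1/t)*sum_i[ log(y_i*(<w,x_i>+b) - 1 + xi_i)
                                          + log(xi_i) ].
    Requires strict feasibility: xi_i > 0 and y_i*(<w,x_i>+b) - 1 + xi_i > 0."""
    slack_margins = y * (X @ w + b) - 1.0 + xi
    if np.any(slack_margins <= 0.0) or np.any(xi <= 0.0):
        return np.inf                 # outside the barrier's domain
    return (w @ w + C * np.sum(xi)
            - (np.sum(np.log(slack_margins)) + np.sum(np.log(xi))) / t)

# Tiny example: margins of 2 with unit slacks.
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
val = soft_margin_barrier(np.array([1.0]), 0.0, np.array([1.0, 1.0]),
                          X, y, C=1.0, t=1.0)
```

With these values the objective evaluates to \(1 + 2 - 2\log 2\): the \(\lVert w\rVert^2\) term, the slack penalty, and the barrier terms at \(\xi_i=1\).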

The optimization problem for finding the feasible starting point remains the same. Only the minimum value of \(s\) turns out to be negative for a linearly inseparable dataset. This can be handled by appropriately choosing the initial values of \(\xi_i\ \forall i\).

Optimal hyperplane for linearly inseparable data obtained from SVM optimization problem after incorporating the concept of log-barrier method and central path (t is the parameter of penalty function. Larger the value of t, better the result obtained using central path.)

Documentation

For more details about the project, one can go through the presentation.
Presentation.pdf

Author

Anurag Gupta is an M.S. graduate in Electrical and Computer Engineering from Cornell University. He also holds an M.Tech degree in Systems and Control Engineering and a B.Tech degree in Electrical Engineering from the Indian Institute of Technology, Bombay.

