Least Squares Support Vector Machines and a MATLAB Implementation with a Pre-computed Kernel

1. Derivation of the Least Squares Support Vector Machine

Unlike the standard Support Vector Machine (SVM), the Least Squares Support Vector Machine (LSSVM) proposed by Suykens et al. makes three modifications on top of SVM:

  1. The slack variables \xi_{i} are replaced by error terms e_{i}
  2. The inequality constraints are replaced by equality constraints
  3. The hinge loss \left(1-y_{i} f\left(\boldsymbol{x}_{i}\right)\right)_{+} is replaced by the squared loss \left(1-y_{i} f\left(\boldsymbol{x}_{i}\right)\right)^{2}
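
Points 2 and 3 are really one change viewed two ways: with labels y_{i} \in\{-1,+1\} and the equality constraint of Eq. (1) below, the error term satisfies

e_{i}=1-y_{i} f\left(\boldsymbol{x}_{i}\right), \quad f\left(\boldsymbol{x}_{i}\right)=\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b

so the penalty \frac{C}{2} \sum_{i=1}^{n} e_{i}^{2} in the objective is exactly the squared loss summed over the training samples.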

This yields the LSSVM objective, Eq. (1):

\begin{array}{cc}{\min _{\boldsymbol{w}, b, \boldsymbol{e}}} & {\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}+\frac{C}{2} \sum_{i=1}^{n} e_{i}^{2}} \\ { s.t. } & {y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)=1-e_{i}, \quad i=1, \cdots, n}\end{array}

where \phi(\boldsymbol{x}_{i}) is the sample mapped into a Hilbert space; \boldsymbol{w} and b are the classifier parameters, the normal vector and offset of the separating hyperplane respectively; e_{i} is the error term; and C is the regularization parameter, which controls how strongly errors are penalized.

As with SVM, the classifier parameters can be found with the method of Lagrange multipliers. Introducing multipliers \alpha_{i} gives the Lagrangian, Eq. (2):

L(\boldsymbol{w}, b, \boldsymbol{e}, \boldsymbol{\alpha})=\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}+\frac{C}{2} \sum_{i=1}^{n} e_{i}^{2}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1+e_{i}\right)

Setting the partial derivatives of L with respect to \boldsymbol{w}, b, and e_{i} to zero gives:

\begin{array}{l}{\frac{\partial L}{\partial \boldsymbol{w}}=0 \Rightarrow \boldsymbol{w}=\sum_{i=1}^{n} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right)} \\ {\frac{\partial L}{\partial b}=0 \Rightarrow \sum_{i=1}^{n} \alpha_{i} y_{i}=0} \\ {\frac{\partial L}{\partial e_{i}}=0 \Rightarrow \alpha_{i}=C e_{i}, \quad i=1,2, \ldots, n}\end{array}

Unlike SVM, which requires solving a quadratic program (QP), the LSSVM parameters can be obtained by solving a system of linear equations. The problem is convex and satisfies Slater's constraint qualification, so the optimum satisfies the Karush-Kuhn-Tucker (KKT) conditions, from which \boldsymbol{w} and e_{i} can be eliminated.
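
Concretely, substituting \boldsymbol{w}=\sum_{j=1}^{n} \alpha_{j} y_{j} \phi\left(\boldsymbol{x}_{j}\right) and e_{i}=\alpha_{i} / C into the equality constraint of Eq. (1) gives, for each i:

\sum_{j=1}^{n} \alpha_{j} y_{i} y_{j} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)+y_{i} b+\frac{\alpha_{i}}{C}=1

where k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)=\phi\left(\boldsymbol{x}_{i}\right)^{T} \phi\left(\boldsymbol{x}_{j}\right) is the kernel function. Stacking these n equations together with the condition \sum_{i=1}^{n} \alpha_{i} y_{i}=0 gives, in matrix form: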

\left[\begin{array}{cc}{0} & {\boldsymbol{y'}} \\ {\boldsymbol{y}} & {\boldsymbol{K}+\frac{1}{C} \boldsymbol{I}}\end{array}\right]\left[\begin{array}{l}{b}\\ \boldsymbol{{\alpha}}\end{array}\right]=\left[\begin{array}{l}{0} \\ {\boldsymbol{1}}\end{array}\right]

where \boldsymbol{K} is the kernel matrix elementwise multiplied by the label outer product, i.e. K_{ij}=y_{i} y_{j} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right); \boldsymbol{I} is the identity matrix; and \boldsymbol{1} is the all-ones vector. Solving this linear system gives the Lagrange multipliers \boldsymbol{\alpha} and the offset b. As with SVM, the decision function for an unseen sample \boldsymbol{x} is:

\operatorname{sgn}(f(\boldsymbol{x}))=\operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}\right)+b\right)

2. MATLAB Implementation with a Pre-computed Kernel

Since only a linear system has to be solved, implementing LSSVM is very simple. A pre-computed kernel is simply a user-defined kernel: instead of being restricted to built-in classics such as the linear or RBF kernel, we write our own function that generates the kernel matrix.

The pre-computed kernel support itself is trivial: the user-generated kernel matrix is passed directly into the training and prediction functions. We therefore pull kernel-matrix generation out into its own function:

createKernelMatrix.m

function outputKernel = createKernelMatrix(data1, data2, param)
% input: 
% 		data1: train data matrix
% 		data2: train or test data matrix
% 		param: a structure containing all the parameters of the kernel
%			   .kernelType: kernel function type ('lin' or 'rbf')
%			   .gap : gap width of the RBF kernel 
% output:
%		outputKernel: kernel matrix computed from data1 and data2
%

    % kernel parameters extraction
    type = param.kernelType;
    gap = param.gap;
    
    % kernel creation
    switch type
        case 'lin'
            % linear kernel: K(i,j) = <x_i, x_j>
            outputKernel = data1 * data2';

        case 'rbf'
            % RBF kernel: K(i,j) = exp(-||x_i - x_j||^2 / (2 * gap^2))
            gap = 2 * gap * gap;
            
            n1sq = sum(data1.^2, 2);
            n1 = size(data1, 1);
            n2sq = sum(data2.^2, 2);
            n2 = size(data2, 1);
            
            % pairwise squared distances: ||x_i||^2 + ||x_j||^2 - 2<x_i, x_j>
            D = (ones(n2,1)*n1sq')' + ones(n1,1)*n2sq' - 2*data1*data2';
            
            outputKernel = exp(-D / gap);

        otherwise
            error('createKernelMatrix: unknown kernel type "%s"', type);
    end
end
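
A quick sanity check of the RBF branch (this snippet is illustrative; the data and variable names are arbitrary):

% small random data set and its RBF kernel with itself
param.kernelType = 'rbf';
param.gap = 1;
X = randn(5, 3);
K = createKernelMatrix(X, X, param);
disp(diag(K)');  % all ones, since ||x - x||^2 = 0 for each sample with itself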

Only these two classic kernels are defined here; to use another kernel, just add a branch of your own.
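
For example, a polynomial kernel branch could be added to the switch above; the .degree and .coef0 fields here are hypothetical additions to the param structure, not part of the parameters used elsewhere in this post:

        case 'poly'
            % polynomial kernel: K(i,j) = (<x_i, x_j> + coef0) ^ degree
            outputKernel = (data1 * data2' + param.coef0) .^ param.degree;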

The training function:

lssvmtrain.m

function model = lssvmtrain(kernel, label, param)
% input: 
% 		kernel: kernel matrix generated from the train data, eg: createKernelMatrix(trainData, trainData, param)
%   	label : train label
%   	param : a structure containing all the parameters of the model
% output: 
%		model: trained model containing everything needed for prediction

    % model parameters extraction
    cost = param.cost;

    % TrainKernelMatrix creation: y_i * y_j * k(x_i, x_j) + I/C
    nb_data = length(label);

    TrainKernelMatrix = (label*label') .* kernel;
    UnitMatrix = eye(nb_data, nb_data);
    TrainKernelMatrix = TrainKernelMatrix + UnitMatrix ./ cost;
    
    % construct the linear system [0, y'; y, K + I/C] [b; alpha] = [0; 1]
    rightPart = [0; ones(nb_data, 1)];
    leftPart = [0, label'; label, TrainKernelMatrix];
    
    % solve the linear system
    results = linsolve(leftPart, rightPart); 

    bias = results(1);  % b
    coef = results(2 : end);  % alpha
    yalphas = label .* coef;  % y .* alpha, used in the decision function

    % save to model
    model.coef = coef;
    model.bias = bias;
    model.oriKernel = kernel;
    model.trainLabel = label;
    model.yalphas = yalphas;
end
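
As a quick sanity check after training, the condition \sum_{i} \alpha_{i} y_{i}=0 from the KKT derivation above should hold up to numerical error (a minimal sketch, assuming a model returned by lssvmtrain):

% coef stores alpha and trainLabel stores y; their inner product should be ~0
fprintf('KKT check: sum(y .* alpha) = %g\n', sum(model.trainLabel .* model.coef));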

The prediction function:

lssvmpredict.m

function [predictLabel, accuracy, decisionValue] = lssvmpredict(model, kernel, testLabel)
% input:
% 		model: trained model returned by lssvmtrain
% 		kernel: kernel matrix generated from train and test data, eg: createKernelMatrix(trainData, testData, param)
% 		testLabel: label of test data
% output:
%		predictLabel: predicted labels of the test data
%		accuracy: classification accuracy on the test data
%		decisionValue: decision values f(x) of the test data

    nb_test = length(testLabel);

    % decision value per test sample: f(x) = sum_i y_i * alpha_i * k(x_i, x) + b
    testKernel = kernel;  % nb_train x nb_test
    yalphasMat = model.yalphas * ones(1, nb_test);

    decisionValue = sum(testKernel .* yalphasMat, 1) + model.bias;
    predictLabel = sign(decisionValue);

    accuracy = sum(predictLabel' == testLabel) / nb_test;
end

3. Accuracy Comparison with SVM on the Classic ionosphere Dataset

Below, the LSSVM implemented above is compared with SVM (via LibSVM) on the classic ionosphere dataset.

test.m

clear;clc;

% ======================================================== %
% ========       prepare train and test data       ======= %
% ======================================================== %

data = load('ionosphere.mat');
label = data.y;
data = data.x;

nbData = length(label);
trainRatio = 0.7;
nbTrain = fix(nbData * trainRatio);

% ===== select the first trainRatio fraction as train data ===== %
% trainIndice = 1: nbTrain;
% testIndice  = nbTrain + 1 : nbData;

% ===== select train data randomly ===== %
randIndice = randperm(nbData);
trainIndice = randIndice(1: nbTrain);
testIndice  = randIndice(nbTrain + 1 : nbData);

trainData = data(trainIndice, :);
trainLabel = label(trainIndice);
testData = data(testIndice, :);
testLabel = label(testIndice);

% ======================================================== %
% ========            run the comparison           ======= %
% ======================================================== %

% == LibSVM
model = libsvmtrain(trainLabel, trainData, '-t 2 -c 1 -g 1');
[libsvm_label, libsvm_acc, libsvm_decv] = libsvmpredict(testLabel, testData, model);


% == LSSVM
kernelParam.kernelType = 'rbf';
kernelParam.gap  = 1;
modelParam.cost = 1;

trainK = createKernelMatrix(trainData, trainData, kernelParam);
testK  = createKernelMatrix(trainData, testData, kernelParam);

lssvm_model = lssvmtrain(trainK, trainLabel, modelParam);
[lssvm_label, lssvm_acc, lssvm_decv] = lssvmpredict(lssvm_model, testK, testLabel);


fprintf('\n\n\nTest accuracy of SVM is:   %f,\nTest accuracy of LSSVM is: %f \n', libsvm_acc(1)/100, lssvm_acc);

Results from a few kernel and parameter settings, with the train/test split selected randomly each time:

C:1, RBF, gap:1

Test accuracy of SVM is:   0.924528,
Test accuracy of LSSVM is: 0.896226 

Test accuracy of SVM is:   0.943396,
Test accuracy of LSSVM is: 0.952830 

Test accuracy of SVM is:   0.971698,
Test accuracy of LSSVM is: 0.971698 

Test accuracy of SVM is:   0.933962,
Test accuracy of LSSVM is: 0.952830 

C:1, RBF, gap:100

Test accuracy of SVM is:   0.688679,
Test accuracy of LSSVM is: 0.688679 

Test accuracy of SVM is:   0.613208,
Test accuracy of LSSVM is: 0.603774 

C:1, linear

Test accuracy of SVM is:   0.877358,
Test accuracy of LSSVM is: 0.839623 

Test accuracy of SVM is:   0.886792,
Test accuracy of LSSVM is: 0.877358 

Test accuracy of SVM is:   0.877358,
Test accuracy of LSSVM is: 0.905660 

C:100, linear

Test accuracy of SVM is:   0.915094,
Test accuracy of LSSVM is: 0.905660 

Test accuracy of SVM is:   0.858491,
Test accuracy of LSSVM is: 0.820755 

Test accuracy of SVM is:   0.905660,
Test accuracy of LSSVM is: 0.849057 

References

Suykens J A K, Vandewalle J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293-300.