1. 最小二乘支持向量机的推导
与支持向量机(Support Vector Machines, SVM)不同,Suykens 等提出的最小二乘支持向量机(Least Squared Support Vector Machines, LSSVM)在 SVM 之上使用做出了 3 项改进:
- 使用误差项代替了松弛变量
- 使用等式约束代替了不等式约束
- 使用二次损失函数 代替了 hinge 损失
得到 LSSVM 的目标函数为--式 (1) :
其中为投影到希尔伯特空间的样本;和 为分类器参数,分别是分类超平面斜率与偏移量;为误差项; 为正则项参数,控制对误差项的惩罚程度。
与 SVM 的解法相似,可使用拉格朗日乘子法来求解两个分类器参数. 引入拉格朗日乘子 得到其拉格朗日函数为 -- 式 (2):
将上式分别对、、求偏导并令其为0,可得:
与 SVM 要求解二次规划问题(QP)不同,LSSVM 可通过求解线性规划问题得到模型参数,且此凸规划满足 Slater 约束规范,最优解满足(Karush-Kuhn-Tucker,KKT),在 KKT条件中消去变量和,可得如下等式:
其中为核矩阵与样本标签矩阵的点乘积,为单位矩阵,为全1的向量,通过求解此线性等式可求得拉格朗日乘子以及偏移量,与SVM相同,其对未知样本的决策函数为:
2. Pre-computed Kernel 的 matlab 实现
由于只需要求解线性规划问题, 所以LSSVM的实现非常简单。所谓 Pre-computed Kernel 就是自定义的核函数,这与我们常用的线性核、RBF核等经典核函数不同,我们可以自己定义一个核函数生成核矩阵。
LSSVM 的 Pre-computed Kernel 实现非常简单,即使用自定义的核函数输入训练和预测函数中,所以将生成核矩阵的函数抽离出来:
createKernelMatrix.m
function outputKernel = createKernelMatrix(data1, data2, param)
% input:
% data1: train data matrix
% data2: train or test matrix
% param: a strcture contains all the paramters of kernel
% .type: kernel function type
% .gap : gap with of RBF kernel
% output:
% outputKernel: kernel matrix computed by data1 and data2
%
% kernel parameters extraction
type = param.kernelType;
gap = param.gap;
% kernel creation
switch type
case 'lin'
outputKernel = data1 * data2';
case 'rbf'
gap = 2 * gap * gap;
n1sq = sum(data1.^2,2);
n1 = size(data1,1);
n2sq = sum(data2.^2,2);
n2 = size(data2,1);
D = (ones(n2,1)*n1sq')' + ones(n1,1)*n2sq' -2*data1*data2';
outputKernel = exp(-D / gap);
end
end
这里只定义了两个经典核函数,想使用其他核函数自己编写就可以。
训练函数实现:
lssvmtrain.m
function model = lssvmtrain(kernel, label, param)
% input:
% kernel: input kernel matrix generated by train data, eg: createKernelMatrix(trainData, trainData, param)
% label : train label
% param : a strcture contains all the paramters of model
% output:
% model: train model contains all informations
% model parameters extraction
cost = param.cost;
% TrainKernelMatrix creation
nb_data = length(label);
TrainKernelMatrix = (label*label') .* kernel;
UnitMatrix = eye(nb_data,nb_data);
TrainKernelMatrix = TrainKernelMatrix + UnitMatrix ./ cost;
% constrct formula
rightPart = [0; ones(nb_data, 1)];
leftPart = [0, label'; label, TrainKernelMatrix];
% solve liner program question
results = linsolve(leftPart, rightPart);
bias = results(1); % b
coef = results(2 : end); % alpha
yalphas = label .* coef; % y .* alphas
% save to model
model.coef = coef;
model.bias = bias;
model.oriKernel = kernel;
model.trainLabel = label;
model.yalphas = yalphas;
end
预测函数实现:
lssvmpredict.m
function [predictLabel, accuracy, decisionValue] = lssvmpredict(model, kernel, testLabel)
% input:
% model: trained model
% kernel: kernel matrix generated by trainData and testData, eg: createKernelMatrix(trainData, testData, param)
% testLabel: label of test data
% output:
% predictLabel: vector
% accuracy: number
% decisionValue: decision values
nb_test = length(testLabel);
testKernel = kernel;
yalphasMat = model.yalphas * ones(1, nb_test);
decisionValue = sum(testKernel .* yalphasMat) + model.bias;
predictLabel = sign(decisionValue);
accuracy = sum(predictLabel' == testLabel) / nb_test;
end
3. 经典 ionosphere
数据集中与 SVM 的精度对比
下面在经典 ionosphere
数据集中使用刚刚编写的 LSSVM 与 SVM 的精度进行对比。
test.m
clear;clc;
% ======================================================== %
% ======== prepare trian and test data ======= %
% ======================================================== %
data = load('ionosphere.mat');
label = data.y;
data = data.x;
nbData = length(label);
trainRatio = 0.7;
nbTrain = fix(nbData * trainRatio);
% ===== select front trainRatio as train data ===== %
% trainIndice = 1: nbTrain;
% testIndice = nbTrain + 1 : nbData;
% ===== select train data randomly ===== %
randIndice = randperm(nbData);
trainIndice = randIndice(1: nbTrain);
testIndice = randIndice(nbTrain + 1 : nbData);
trainData = data(trainIndice, :);
trainLabel = label(trainIndice);
testData = data(testIndice, :);
testLabel = label(testIndice);
% ======================================================== %
% ======== compare progress ======= %
% ======================================================== %
% == LibSVM
model = libsvmtrain(trainLabel, trainData, '-t 2 -c 1 -g 1');
[libsvm_label, libsvm_acc, libsvm_decv] = libsvmpredict(testLabel, testData, model);
% == LSSVM
kernelParam.kernelType = 'rbf';
kernelParam.gap = 1;
modelParam.cost = 1;
trainK = createKernelMatrix(trainData, trainData, kernelParam);
testK = createKernelMatrix(trainData, testData, kernelParam);
lssvm_model = lssvmtrain(trainK, trainLabel, modelParam);
[lssvm_label, lssvm_acc, lsscm_decv] = lssvmpredict(lssvm_model, testK, testLabel);
fprintf('\n\n\nTest accuracy of SVM is: %f,\nTest accuracy of LSSVM is: %f \n', libsvm_acc(1)/100, lssvm_acc);
随机选取训练测试数据并尝试几组核函数和参数得到的结果:
C:1, RBF, gap:1
Test accuracy of SVM is: 0.924528,
Test accuracy of LSSVM is: 0.896226
Test accuracy of SVM is: 0.943396,
Test accuracy of LSSVM is: 0.952830
Test accuracy of SVM is: 0.971698,
Test accuracy of LSSVM is: 0.971698
Test accuracy of SVM is: 0.933962,
Test accuracy of LSSVM is: 0.952830
C:1, RBF, gap:100
Test accuracy of SVM is: 0.688679,
Test accuracy of LSSVM is: 0.688679
Test accuracy of SVM is: 0.613208,
Test accuracy of LSSVM is: 0.603774
C:1, linear
Test accuracy of SVM is: 0.877358,
Test accuracy of LSSVM is: 0.839623
Test accuracy of SVM is: 0.886792,
Test accuracy of LSSVM is: 0.877358
Test accuracy of SVM is: 0.877358,
Test accuracy of LSSVM is: 0.905660
C:100, linear
Test accuracy of SVM is: 0.915094,
Test accuracy of LSSVM is: 0.905660
Test accuracy of SVM is: 0.858491,
Test accuracy of LSSVM is: 0.820755
Test accuracy of SVM is: 0.905660,
Test accuracy of LSSVM is: 0.849057