例7.1:已知一个如图所示的训练数据集,其正例点是
x
1
=
(
3
,
3
)
T
,
x
2
=
(
4
,
3
)
T
x_1=(3,3)^T, x_2=(4,3)^T
x1=(3,3)T,x2=(4,3)T,负例点是
x
3
=
(
1
,
1
)
T
x_3=(1,1)^T
x3=(1,1)T,试求最大间隔分离超平面。
解:按照算法7.1,根据训练数据集构造最优化问题:
min
w
,
b
1
2
(
w
1
2
+
w
2
2
)
s
.
t
.
3
w
1
+
3
w
2
+
b
≥
1
4
w
1
+
3
w
2
+
b
≥
1
−
w
1
−
w
2
−
b
≥
1
\min_{w,b} \frac{1}{2}(w_1^2 + w_2^2) \\ \qquad \qquad s.t. \qquad3w_1+3w_2+b\geq1 \\ \qquad \qquad\qquad \qquad4w_1+3w_2+b\geq1 \\ \qquad \qquad\qquad \qquad-w_1-w_2-b\geq1
w,bmin21(w12+w22)s.t.3w1+3w2+b≥14w1+3w2+b≥1−w1−w2−b≥1
使用拉格朗日乘数法,构造拉格朗日函数为:
L
(
w
1
,
w
2
,
b
,
α
1
,
α
2
,
α
3
)
=
1
2
(
w
1
2
+
w
2
2
)
+
α
1
(
1
−
3
w
1
−
3
w
2
−
b
)
+
α
2
(
1
−
4
w
1
−
3
w
2
−
b
)
+
α
3
(
1
+
w
1
+
w
2
+
b
)
L(w_1,w_2,b,\alpha_1,\alpha_2,\alpha_3) = \frac{1}{2}(w_1^2 + w_2^2)+\alpha_1(1-3w_1-3w_2-b)+\alpha_2(1-4w_1-3w_2-b)+\alpha_3(1+w_1+w_2+b)
L(w1,w2,b,α1,α2,α3)=21(w12+w22)+α1(1−3w1−3w2−b)+α2(1−4w1−3w2−b)+α3(1+w1+w2+b)
则拉格朗日原始问题为:
min
w
,
b
max
α
1
,
α
2
,
α
3
L
(
w
1
,
w
2
,
b
,
α
1
,
α
2
,
α
3
)
\min_{w,b}\max_{\alpha_1,\alpha_2,\alpha_3}L(w_1,w_2,b,\alpha_1,\alpha_2,\alpha_3)
w,bminα1,α2,α3maxL(w1,w2,b,α1,α2,α3)
转换为对偶问题为:
max
α
1
,
α
2
,
α
3
min
w
,
b
L
(
w
1
,
w
2
,
b
,
α
1
,
α
2
,
α
3
)
\max_{\alpha_1,\alpha_2,\alpha_3}\min_{w,b}L(w_1,w_2,b,\alpha_1,\alpha_2,\alpha_3)
α1,α2,α3maxw,bminL(w1,w2,b,α1,α2,α3)
接下来求解对偶问题:
首先求解
min
w
,
b
L
(
w
1
,
w
2
,
b
,
α
1
,
α
2
,
α
3
)
\min_{w,b}L(w_1,w_2,b,\alpha_1,\alpha_2,\alpha_3)
w,bminL(w1,w2,b,α1,α2,α3)
求极值的方法是求导令其等于0:
∂
L
∂
w
1
=
w
1
−
3
α
1
−
4
α
2
+
α
3
\frac{\partial L}{\partial w_1} =w_1-3\alpha_1-4\alpha_2+\alpha_3
∂w1∂L=w1−3α1−4α2+α3
∂
L
∂
w
2
=
w
2
−
3
α
1
−
3
α
2
+
α
3
\frac{\partial L}{\partial w_2} =w_2-3\alpha_1-3\alpha_2+\alpha_3
∂w2∂L=w2−3α1−3α2+α3
∂
L
∂
b
=
−
α
1
−
α
2
+
α
3
              
\frac{\partial L}{\partial b} =-\alpha_1-\alpha_2+\alpha_3\;\;\;\;\;\;\;
∂b∂L=−α1−α2+α3
令其等于0,得到方程组(1):
w
1
−
3
α
1
−
4
α
2
+
α
3
=
0
①
w
2
−
3
α
1
−
3
α
2
+
α
3
=
0
②
α
1
+
α
2
+
−
α
3
=
0
③
w_1-3\alpha_1-4\alpha_2+\alpha_3=0 \quad ①\\ w_2-3\alpha_1-3\alpha_2+\alpha_3 =0 \quad ② \\ \alpha_1+\alpha_2+-\alpha_3=0 \quad ③
w1−3α1−4α2+α3=0①w2−3α1−3α2+α3=0②α1+α2+−α3=0③
解方程得到:
w
2
=
2
α
3
w
1
=
−
α
1
+
3
α
3
w
1
=
α
2
+
2
α
3
w_2=2\alpha_3 \\ w_1 =-\alpha_1+3\alpha_3 \\ w_1 =\alpha_2+2\alpha_3
w2=2α3w1=−α1+3α3w1=α2+2α3
ps.上述结果不一定都有用,先写下来。
将
w
1
=
α
2
+
2
α
3
,
w
2
=
2
α
3
w_1=\alpha_2+2\alpha_3,w_2=2\alpha_3
w1=α2+2α3,w2=2α3代入拉格朗日函数得:
L
(
w
1
,
w
2
,
b
,
α
1
,
α
2
,
α
3
)
=
1
2
(
w
1
2
+
w
2
2
)
+
α
1
(
1
−
3
w
1
−
3
w
2
−
b
)
L(w_1,w_2,b,\alpha_1,\alpha_2,\alpha_3) = \frac{1}{2}(w_1^2 + w_2^2)+\alpha_1(1-3w_1-3w_2-b)
L(w1,w2,b,α1,α2,α3)=21(w12+w22)+α1(1−3w1−3w2−b)
+
α
2
(
1
−
4
w
1
−
3
w
2
−
b
)
+
α
3
(
1
+
w
1
+
w
2
+
b
)
+\alpha_2(1-4w_1-3w_2-b)+\alpha_3(1+w_1+w_2+b)
+α2(1−4w1−3w2−b)+α3(1+w1+w2+b)
=
1
2
[
(
α
2
+
2
α
3
)
2
+
(
2
α
3
)
2
]
+
α
1
[
1
−
3
∗
(
α
2
+
2
α
3
)
−
3
∗
2
α
3
−
b
]
+
= \frac{1}{2}[(\alpha_2+2\alpha_3)^2 + (2\alpha_3)^2] + \alpha_1[1-3*(\alpha_2+2\alpha_3) - 3*2\alpha_3-b]+
=21[(α2+2α3)2+(2α3)2]+α1[1−3∗(α2+2α3)−3∗2α3−b]+
α
2
[
1
−
4
∗
(
α
2
+
2
α
3
)
−
3
∗
2
α
3
−
b
]
+
α
3
[
1
+
α
2
+
2
α
3
+
2
α
3
+
b
]
\alpha_2[1-4*(\alpha_2+2\alpha_3)-3*2\alpha_3-b]+\alpha_3[1+\alpha_2+2\alpha_3+2\alpha_3+b]
α2[1−4∗(α2+2α3)−3∗2α3−b]+α3[1+α2+2α3+2α3+b]
=
1
2
[
α
2
2
+
4
α
2
α
3
+
4
α
3
2
+
4
α
3
2
]
+
α
1
[
1
−
3
α
2
−
6
α
3
−
6
α
3
−
b
]
+
= \frac{1}{2}[\alpha_2^2+4\alpha_2\alpha_3+4\alpha_3^2 + 4\alpha_3^2]+\alpha_1[1-3\alpha_2-6\alpha_3-6\alpha_3-b]+
=21[α22+4α2α3+4α32+4α32]+α1[1−3α2−6α3−6α3−b]+
α
2
[
1
−
4
α
2
−
8
α
3
−
6
α
3
−
b
]
+
α
3
[
1
+
α
2
+
2
α
3
+
2
α
3
+
b
]
\alpha_2[1-4\alpha_2-8\alpha_3-6\alpha_3-b]+\alpha_3[1+\alpha_2+2\alpha_3+2\alpha_3+b]
α2[1−4α2−8α3−6α3−b]+α3[1+α2+2α3+2α3+b]
=
1
2
[
α
2
2
+
4
α
2
α
3
+
8
α
3
2
]
+
α
1
[
1
−
3
α
2
−
12
α
3
−
b
]
+
=\frac{1}{2}[\alpha_2^2+4\alpha_2\alpha_3+8\alpha_3^2]+\alpha_1[1-3\alpha_2-12\alpha_3-b]+
=21[α22+4α2α3+8α32]+α1[1−3α2−12α3−b]+
α
2
[
1
−
4
α
2
−
14
α
3
−
b
]
+
α
3
[
1
+
α
2
+
4
α
3
+
b
]
\alpha_2[1-4\alpha_2-14\alpha_3-b] + \alpha_3[1+\alpha_2+4\alpha_3+b]
α2[1−4α2−14α3−b]+α3[1+α2+4α3+b]
为了看的更清楚一点,再誊抄一遍:
L
(
w
1
,
w
2
,
b
,
α
1
,
α
2
,
α
3
)
=
1
2
[
α
2
2
+
4
α
2
α
3
+
8
α
3
2
]
+
α
1
[
1
−
3
α
2
−
12
α
3
−
b
]
+
L(w_1,w_2,b,\alpha_1,\alpha_2,\alpha_3)=\frac{1}{2}[\alpha_2^2+4\alpha_2\alpha_3+8\alpha_3^2]+\alpha_1[1-3\alpha_2-12\alpha_3-b]+
L(w1,w2,b,α1,α2,α3)=21[α22+4α2α3+8α32]+α1[1−3α2−12α3−b]+
α
2
[
1
−
4
α
2
−
14
α
3
−
b
]
+
α
3
[
1
+
α
2
+
4
α
3
+
b
]
\alpha_2[1-4\alpha_2-14\alpha_3-b] + \alpha_3[1+\alpha_2+4\alpha_3+b]
α2[1−4α2−14α3−b]+α3[1+α2+4α3+b]
令该式子用
Q
(
b
,
α
1
,
α
2
,
α
3
)
Q(b,\alpha_1,\alpha_2,\alpha_3)
Q(b,α1,α2,α3)表示,
因为方程组(1)中的式子:
α
1
+
α
2
+
−
α
3
=
0
\alpha_1+\alpha_2+-\alpha_3=0
α1+α2+−α3=0,可以将b消掉,
则
Q
(
α
1
,
α
2
,
α
3
)
=
1
2
[
α
2
2
+
4
α
2
α
3
+
8
α
3
2
]
+
α
1
[
1
−
3
α
2
−
12
α
3
]
+
Q(\alpha_1,\alpha_2,\alpha_3)=\frac{1}{2}[\alpha_2^2+4\alpha_2\alpha_3+8\alpha_3^2]+\alpha_1[1-3\alpha_2-12\alpha_3]+
Q(α1,α2,α3)=21[α22+4α2α3+8α32]+α1[1−3α2−12α3]+
α
2
[
1
−
4
α
2
−
14
α
3
]
+
α
3
[
1
+
α
2
+
4
α
3
]
\alpha_2[1-4\alpha_2-14\alpha_3] + \alpha_3[1+\alpha_2+4\alpha_3]
α2[1−4α2−14α3]+α3[1+α2+4α3]
则接下来求
max
α
1
,
α
2
,
α
3
Q
(
α
1
,
α
2
,
α
3
)
\max_{\alpha_1,\alpha_2,\alpha_3}Q(\alpha_1,\alpha_2,\alpha_3)
α1,α2,α3maxQ(α1,α2,α3)
求极值依然是求导令导数等于0:
∂
Q
∂
α
1
=
1
−
3
α
2
−
12
α
3
=
0
\frac{\partial Q}{\partial \alpha_1} = 1-3\alpha_2-12\alpha_3 =0
∂α1∂Q=1−3α2−12α3=0
∂
Q
∂
α
2
=
α
2
+
2
α
3
−
3
α
1
+
1
−
4
α
2
−
14
α
3
−
4
α
2
+
α
3
=
1
−
3
α
1
−
7
α
2
−
11
α
3
=
0
\frac{\partial Q}{\partial \alpha_2} =\alpha_2+2\alpha_3-3\alpha_1+1-4\alpha_2-14\alpha_3-4\alpha_2+\alpha_3 = 1-3\alpha_1-7\alpha_2-11\alpha_3=0
∂α2∂Q=α2+2α3−3α1+1−4α2−14α3−4α2+α3=1−3α1−7α2−11α3=0
∂
Q
∂
α
3
=
2
α
2
+
8
α
3
−
12
α
1
−
14
α
2
+
1
+
α
2
+
4
α
3
+
4
α
3
=
1
−
12
α
1
−
11
α
2
+
16
α
3
=
0
\frac{\partial Q}{\partial \alpha_3} = 2\alpha_2+8\alpha_3-12\alpha_1-14\alpha_2+1+\alpha_2+4\alpha_3+4\alpha_3=1-12\alpha_1-11\alpha_2+16\alpha_3=0
∂α3∂Q=2α2+8α3−12α1−14α2+1+α2+4α3+4α3=1−12α1−11α2+16α3=0
整理得到方程组(2):
3
α
2
+
12
α
3
=
1
①
3\alpha_2+12\alpha_3 = 1 \qquad ①
3α2+12α3=1①
3
α
1
+
7
α
2
+
11
α
3
=
1
②
3\alpha_1+7\alpha_2+11\alpha_3=1 \qquad ②
3α1+7α2+11α3=1②
12
α
1
+
11
α
2
−
16
α
3
=
1
③
12\alpha_1+11\alpha_2-16\alpha_3=1 \qquad ③
12α1+11α2−16α3=1③
解方程组得:
α
1
=
13
9
\alpha_1 = \frac{13}{9}
α1=913
α
2
=
−
1
\alpha_2 = -1
α2=−1
α
3
=
1
3
\alpha_3 = \frac{1}{3}
α3=31
因为拉格朗日乘子需要满足
α
i
≥
0
\alpha_i \geq 0
αi≥0,该解不满足,所以最小值应该在边界上达到。
因为方程组(1)中给出的约束:
α
1
+
α
2
−
α
3
=
0
\alpha_1+\alpha_2-\alpha_3=0
α1+α2−α3=0
- 当
α
1
=
0
\alpha_1=0
α1=0时,
α
2
=
α
3
\alpha_2 =\alpha_3
α2=α3, 代入到Q中得:
Q
=
−
13
2
(
α
2
−
2
13
)
2
+
2
13
Q=-\frac{13}{2}(\alpha_2 - \frac{2}{13})^2+\frac{2}{13}
Q=−213(α2−132)2+132, 明显,Q在
α
2
=
2
13
\alpha_2 = \frac{2}{13}
α2=132处取得最大值
2
13
\frac{2}{13}
132;
- 当
α
2
=
0
\alpha_2=0
α2=0时,
α
1
=
α
3
\alpha_1 =\alpha_3
α1=α3, 同样代入到Q中得:
Q
=
−
4
(
α
1
−
1
4
)
2
+
1
4
Q=-4(\alpha_1 - \frac{1}{4})^2+\frac{1}{4}
Q=−4(α1−41)2+41, 明显,Q在
α
1
=
1
4
\alpha_1 = \frac{1}{4}
α1=41处取得最大值
1
4
\frac{1}{4}
41;
- 当
α
3
=
0
\alpha_3=0
α3=0时,
α
1
=
−
α
2
\alpha_1 =-\alpha_2
α1=−α2,且拉格朗日乘子大于等于0,故
α
1
=
α
2
=
α
3
=
0
\alpha_1 =\alpha_2=\alpha_3=0
α1=α2=α3=0,不成立。
因为
1
4
>
2
13
\frac{1}{4} > \frac{2}{13}
41>132,所以当
α
2
=
0
\alpha_2=0
α2=0时,Q取得最优解,相应的
α
1
=
α
3
=
1
4
\alpha_1 = \alpha_3= \frac{1}{4}
α1=α3=41。
由此可知,
x
1
,
x
3
x_1,x_3
x1,x3为支持向量。
回到方程组(1)的解可以得到:
w
1
=
2
α
3
=
1
2
w_1 = 2\alpha_3 = \frac{1}{2}
w1=2α3=21
w
2
=
α
2
+
2
α
3
=
1
2
w_2 = \alpha_2+ 2\alpha_3= \frac{1}{2}
w2=α2+2α3=21
根据支持向量的定义:
y
i
(
w
T
x
i
+
b
)
−
1
=
0
y_i(w^Tx_i+b)-1=0
yi(wTxi+b)−1=0
将样本
(
x
1
,
y
1
)
(x_1 ,y_1)
(x1,y1)代入可得:
1
×
(
1
2
×
3
+
1
2
×
3
+
b
)
−
1
=
0
1×(\frac{1}{2}×3 + \frac{1}{2}×3 +b)-1=0
1×(21×3+21×3+b)−1=0
解得
b
=
−
2
b=-2
b=−2。
最终得到最大间隔分离超平面为:
1
2
x
(
1
)
+
1
2
x
(
2
)
−
2
=
0
\frac{1}{2}x^{(1)} + \frac{1}{2}x^{(2)} -2 = 0
21x(1)+21x(2)−2=0