Gradient Descent for Linear Regression Fitting (with Python/MATLAB/Julia Source Code)

2023-05-16

Gradient Descent

The Principle of Gradient Descent

Gradient descent is a widely used first-order optimization method and one of the simplest, most classical approaches to solving unconstrained optimization problems.

The most common analogy for gradient descent is walking down a mountain: at every position you look for the steepest downhill direction, take a small step in it, and eventually reach the bottom (ignoring, for now, the possibility of getting stuck in a valley).

First, what is a gradient? To answer that we need derivatives. Everyone has seen derivatives before, and a few examples will bring them back quickly.

A single-variable derivative:

$$\frac{d\left(x^{2}\right)}{dx}=2x$$

And partial derivatives of a multivariable function:

$$\frac{\partial}{\partial x}\left(x^{2}y\right)=2xy, \qquad \frac{\partial}{\partial y}\left(x^{2}y\right)=x^{2}$$

Aside: the difference between a derivative and a differential

The derivative is the slope of a function at a point, i.e. the ratio of Δy to Δx; the differential is the increment of the ordinate of the tangent line when the abscissa is incremented by Δx at that point, usually written dy.

The gradient is the vector formed from these partial derivatives. Let

$$f(x,y,z)=x^{2}+2xy+3yz$$

Its partial derivatives are

$$\frac{\partial f}{\partial x}=2x+2y, \qquad \frac{\partial f}{\partial y}=2x+3z, \qquad \frac{\partial f}{\partial z}=3y$$

Evaluating these partial derivatives of f(x, y, z) at (1, 2, 3) gives

$$\frac{\partial f}{\partial x}=2\cdot 1+2\cdot 2=6, \qquad \frac{\partial f}{\partial y}=2\cdot 1+3\cdot 3=11, \qquad \frac{\partial f}{\partial z}=3\cdot 2=6$$

So the gradient of f(x, y, z) at (1, 2, 3) is (6, 11, 6).
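As a quick sanity check, this gradient can also be verified numerically with central differences. The sketch below is my own addition (the helper numerical_gradient is not from the original article); it approximates each partial derivative of f(x, y, z) = x² + 2xy + 3yz at (1, 2, 3):

import numpy as np

def f(v):
    x, y, z = v
    return x**2 + 2*x*y + 3*y*z

def numerical_gradient(func, v, h=1e-6):
    # central-difference approximation of the gradient of func at v
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = h
        grad[i] = (func(v + e) - func(v - e)) / (2 * h)
    return grad

print(numerical_gradient(f, [1.0, 2.0, 3.0]))  # approximately [ 6. 11.  6.]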

The gradient is a vector. For a function of one variable, the gradient at a point is just the derivative there, the slope of the tangent line. For a multivariable function, the gradient points in the direction in which the function increases fastest at that point.

Gradient descent simply moves in the direction opposite to the gradient at every step, which leads to a local minimum.

Why does moving against the gradient reach a local minimum? Intuitively this is easy to see, but for rigor let us give a short mathematical argument.

For a continuously differentiable function f(x), starting from some arbitrary point and aiming for a local minimum, we can construct a sequence $x_{0}, x_{1}, x_{2}, \dots$ that satisfies

$$f(x_{t+1}) < f(x_{t}), \qquad t=0,1,2,\dots$$

By repeating this step over and over, the sequence converges to a local minimum, as illustrated in the figure below.

The question, then, is how to choose the next point $x_{t+1}$ so that $f(x_{t+1}) < f(x_{t})$. Take a function of one variable as an example. In one dimension a step can go in only two directions: the positive direction, $\Delta x > 0$, or the negative direction, $\Delta x < 0$. To decide which direction to take at each step we turn to the famous Taylor expansion; consider the first-order approximation:

$$f(x+\Delta x) \simeq f(x)+\Delta x\,\nabla f(x)$$

where $\nabla f(x)$ denotes the derivative of f at x.

To get $f(x+\Delta x)<f(x)$ we must ensure $\Delta x\,\nabla f(x)<0$, so set

$$\Delta x=-\alpha\,\nabla f(x), \qquad \alpha>0$$

where the step size $\alpha$ is a small positive number. Then

$$\Delta x\,\nabla f(x)=-\alpha\left(\nabla f(x)\right)^{2}<0$$

Therefore,

$$f(x+\Delta x)=f\!\left(x-\alpha\,\nabla f(x)\right)\simeq f(x)-\alpha\left(\nabla f(x)\right)^{2}<f(x)$$

At each step we update x according to

$$x_{i+1}=x_{i}-\alpha\,\nabla f(x_{i})$$

and this is the whole principle of gradient descent.

A few more words about $\alpha$. In gradient descent it is called the learning rate or step size; it controls how far each step moves. The steps must not be too small, or the sun will have set before you reach the foot of the mountain; they must not be too large either, or you may step right past the lowest point.
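To make this trade-off concrete, here is a small sketch of my own (not part of the original article) that applies the update θ ← θ − α·2θ to J(θ) = θ², starting from θ = 1, with a tiny, a reasonable, and an overly large step size:

def run(alpha, theta=1.0, steps=10):
    # repeatedly apply the gradient descent update for J(theta) = theta**2
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # the gradient of theta**2 is 2*theta
    return theta

for alpha in (0.01, 0.4, 1.1):
    print(alpha, run(alpha))
# alpha = 0.01 -> converges very slowly (theta is still about 0.82 after 10 steps)
# alpha = 0.4  -> converges quickly (theta is about 1e-7 after 10 steps)
# alpha = 1.1  -> diverges (|theta| grows to about 6.2 after 10 steps)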

The minus sign in front of the gradient means moving against the gradient: the gradient is the direction of fastest increase, so its opposite is the direction of fastest decrease.

Examples of Gradient Descent

Gradient descent for a single-variable function

Consider the single-variable function

$$J(\theta)=\theta^{2}$$

Its derivative is

$$\nabla J(\theta)=2\theta$$

Take the starting point $\theta_{0}=1$ and the step size $\alpha=0.4$, and apply the gradient descent update

$$\theta_{k+1}=\theta_{k}-\alpha\,\nabla J(\theta_{k})$$

After three update steps:

$$\begin{aligned}
\theta_{0} &= 1 \\
\theta_{1} &= \theta_{0}-0.4\cdot 2\cdot 1=0.2 \\
\theta_{2} &= \theta_{1}-0.4\cdot 2\cdot 0.2=0.04 \\
\theta_{3} &= \theta_{2}-0.4\cdot 2\cdot 0.04=0.008
\end{aligned}$$
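The same sequence is easy to reproduce with a few lines of Python, a minimal sketch of the update rule above:

theta, alpha = 1.0, 0.4
for k in range(3):
    theta = theta - alpha * 2 * theta   # dJ/dtheta = 2*theta
    print(k + 1, theta)                 # 0.2, 0.04, 0.008 (up to floating-point rounding)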

Gradient descent for a multivariable function

Consider the two-variable function

$$J(\Theta)=\theta_{1}^{2}+\theta_{2}^{2}$$

Its gradient is

$$\nabla J(\Theta)=\left(2\theta_{1},\ 2\theta_{2}\right)$$

Take the starting point (2, 3) and the step size $\alpha=0.1$. Applying the gradient descent update repeatedly gives

$$\begin{aligned}
\Theta_{0} &= (2,\ 3) \\
\Theta_{1} &= \Theta_{0}-0.1\cdot(2\cdot 2,\ 2\cdot 3)=(1.6,\ 2.4) \\
\Theta_{2} &= \Theta_{1}-0.1\cdot(2\cdot 1.6,\ 2\cdot 2.4)=(1.28,\ 1.92) \\
&\ \ \vdots \\
\Theta_{99} &= 0.8^{99}\cdot(2,\ 3)\approx(5.09\times 10^{-10},\ 7.64\times 10^{-10})
\end{aligned}$$

Each iteration scales $\Theta$ by the factor $1-0.1\cdot 2=0.8$, so the iterates shrink geometrically toward the minimum at the origin.
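Again, a short NumPy sketch reproduces these iterates; 99 update steps bring Θ down to roughly (5.09e-10, 7.64e-10):

import numpy as np

theta = np.array([2.0, 3.0])
alpha = 0.1
for k in range(99):
    theta = theta - alpha * 2 * theta   # gradient of theta1**2 + theta2**2 is (2*theta1, 2*theta2)
print(theta)                            # approximately [5.09e-10  7.64e-10]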

Loss Function (Cost Function)

The loss function, also called the cost function, measures the difference between the model's prediction h(θ) and the true value y. When there are multiple samples, the costs of all samples can be averaged, and the result is written J(θ). The cost function has the following properties:

- for any given algorithm the cost function is not unique;
- the cost function is a function of the parameters θ;
- the overall cost J(θ) evaluates how good the model is: the smaller it is, the better the model and parameters fit the training samples (x, y);
- J(θ) is a scalar.

The most common cost function is the mean squared error:

$$J\left(\theta_{0},\theta_{1}\right)=\frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}^{(i)}-y^{(i)}\right)^{2}=\frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$

where m is the number of training samples, $h_{\theta}(x)$ is the predicted value,

$$h_{\Theta}\left(x^{(i)}\right)=\Theta_{0}+\Theta_{1}x_{1}^{(i)}$$

and y is the value from the original training sample.
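As a tiny worked example with made-up numbers: two samples with predictions h = (2, 4) and labels y = (1, 5) give J = ((2 − 1)² + (4 − 5)²) / (2·2) = 0.5. A minimal sketch of the formula (mse_cost is a hypothetical helper name, not from the original article):

import numpy as np

def mse_cost(h, y):
    # J = (1 / (2m)) * sum((h - y)**2)
    h, y = np.asarray(h, dtype=float), np.asarray(y, dtype=float)
    m = len(y)
    return np.sum((h - y) ** 2) / (2 * m)

print(mse_cost([2, 4], [1, 5]))   # 0.5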

Our task is to find the value of θ that minimizes J(θ). The cost surface looks much like the plots drawn above, as shown in the figure below.

Looking at that figure, it should be clear that gradient descent can be used to find the value of θ that minimizes the cost function.

First compute the gradient of the cost function:

$$\begin{aligned}
\nabla J(\Theta) &= \left\langle \frac{\partial J}{\partial \Theta_{0}},\ \frac{\partial J}{\partial \Theta_{1}} \right\rangle \\
\frac{\partial J}{\partial \Theta_{0}} &= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}\left(x^{(i)}\right)-y^{(i)}\right) \\
\frac{\partial J}{\partial \Theta_{1}} &= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{1}^{(i)}
\end{aligned}$$

There are two parameters here, $\theta_{0}$ and $\theta_{1}$. To make the matrix notation convenient, we add an extra column to x whose entries are all 1; this column is the one multiplied by $\theta_{0}$. In matrix form the cost function and its gradient become:

$$J(\Theta)=\frac{1}{2m}\left(X\Theta-\vec{y}\right)^{T}\left(X\Theta-\vec{y}\right), \qquad \nabla J(\Theta)=\frac{1}{m}X^{T}\left(X\Theta-\vec{y}\right)$$

If the formulas in this form look opaque, writing out the contents of each matrix makes them easy to follow.

The matrix $\Theta$ is:

$$\Theta=\begin{bmatrix}\theta_{0}\\ \theta_{1}\end{bmatrix}$$

The matrix X is:

$$X=\begin{bmatrix}1 & x^{(1)}\\ 1 & x^{(2)}\\ \vdots & \vdots\\ 1 & x^{(m)}\end{bmatrix}$$

And the vector y is:

$$\vec{y}=\begin{bmatrix}y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)}\end{bmatrix}$$

Written out this way and matched against the formula above, the matrix form is easy to understand.
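One quick way to convince yourself that the matrix form matches the component-wise formulas is to evaluate both on a small random problem. This check is my own addition, not from the original article:

import numpy as np

rng = np.random.default_rng(0)
m = 5
X = np.hstack((np.ones((m, 1)), rng.random((m, 1))))   # column of ones plus one feature column
y = rng.random((m, 1))
theta = np.array([[0.5], [2.0]])

# vectorized gradient: (1/m) * X^T (X theta - y)
grad_vec = X.T @ (X @ theta - y) / m

# component-wise sums from the formulas above
err = X @ theta - y
grad_0 = np.sum(err) / m
grad_1 = np.sum(err * X[:, [1]]) / m

print(grad_vec.ravel(), grad_0, grad_1)   # the two computations agree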

Below is an example of fitting a linear regression with gradient descent. We have the set of data points shown in the figure below, and we want to find their linear regression model.

First, build the matrix X and the vector y:

import numpy as np

# number of samples
m = 18

# generate matrix X: a column of ones plus the feature column 1..m
X0 = np.ones((m, 1))
X1 = np.arange(1, m+1).reshape(m, 1)
X = np.hstack((X0, X1))

# matrix y
y = np.array([2,3,3,5,8,10,10,13,15,15,16,19,19,20,22,22,25,28]).reshape(m,1)

Define the gradient function according to the formula above:

def gradient_function(theta, X, y):
    diff = np.dot(X, theta) - y
    return (1./m) * np.dot(np.transpose(X), diff)

Next comes the most important part, the gradient descent routine itself. We initialize both $\theta_{0}$ and $\theta_{1}$ to 1 and run the descent:

def gradient_descent(X, y, alpha):
    theta = np.array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, y)
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta

Running this procedure yields $\theta_{0}=0.137$ and $\theta_{1}=1.477$; the fitted regression line is shown in the figure below.
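As a cross-check (my own addition), the same line can be obtained in closed form by solving the least-squares problem directly; the result should be very close to the θ₀ ≈ 0.137, θ₁ ≈ 1.477 found by gradient descent:

import numpy as np

m = 18
X = np.hstack((np.ones((m, 1)), np.arange(1, m + 1).reshape(m, 1)))
y = np.array([2,3,3,5,8,10,10,13,15,15,16,19,19,20,22,22,25,28]).reshape(m, 1)

theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution of X theta = y
print(theta_ols.ravel())                            # intercept and slope, close to (0.137, 1.477)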

Appendix: source code

MATLAB program: gradient descent for a single-variable function

clc;
close all;
clear all;
%%
delta = 1/100000;
x = -1.1:delta:1.1;
y = x.^2;
dot = [1, 0.2, 0.04, 0.008];
figure; plot(x,y);
axis([-1.2, 1.2, -0.2, 1.3]);
grid on
hold on
plot(dot, dot.^2,'r');
for i=1:length(dot)
    text(dot(i),dot(i)^2,['\theta_{',num2str(i),'}']);
end
title('Single variable function gradient descent');

Python program: gradient descent for a single-variable function

import numpy as np
import matplotlib.pyplot as plt

delta = 1/100000
x = np.arange(-1.1, 1.1, delta)
y = x ** 2
dot = np.array([1, 0.2, 0.04, 0.008])
plt.figure(figsize=(7,5))
plt.plot(x, y)
plt.grid(True)
plt.xlim(-1.2, 1.2)
plt.ylim(-0.2, 1.3)
plt.plot(dot, dot**2, 'r')
for i in range(len(dot)):
    plt.text(dot[i], dot[i]**2, r'$\theta_%d$' % i)
plt.title('Single variable function gradient descent')
plt.show()

Julia program: gradient descent for a single-variable function

using PyPlot

delta = 1/100000
x = -1.1:delta:1.1
y = x.^2
dot = [1, 0.2, 0.04, 0.008]
plot(x, y)
grid(true)
axis("tight")
plot(dot, dot.^2, color="r")
for i=1:length(dot)
    text(dot[i], dot[i]^2, "\$\\theta_$i\$")
end
title("Single variable function gradient descent")

MATLAB program: gradient descent for a two-variable function

precision = 1/100;
[x,y] = meshgrid(-3.1:precision:3.1);
z = x.^2 + y.^2;
figure;
mesh(x,y,z);
dot = [[2,3];[1.6,2.4];[1.28,1.92];[5.09e-10, 7.64e-10]];
hold on
scatter3(dot(:,1),dot(:,2),dot(:,1).^2+dot(:,2).^2,'r*');
for i=1:4
    text(dot(i,1)+0.4,dot(i,2),dot(i,1).^2+0.2+dot(i,2).^2+0.2,['\theta_{',num2str(i),'}']);
end
title('Two variable function gradient descent')

Python program: gradient descent for a two-variable function

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

x = np.linspace(-3.1, 3.1, 300)
y = np.linspace(-3.1, 3.1, 300)
x, y = np.meshgrid(x, y)
z = x**2 + y**2
dot = np.array([[2,3],[1.6,2.4],[1.28,1.92],[5.09e-10, 7.64e-10]])
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(projection='3d')   # fig.gca(projection='3d') is removed in recent matplotlib
surf = ax.plot_surface(x, y, z, cmap='YlGnBu')
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.scatter3D(dot[:,0], dot[:,1], dot[:,0]**2 + dot[:,1]**2, marker='H', c='r')
for i in range(len(dot)-1):
    ax.text(dot[i,0]+0.4, dot[i,1], dot[i,0]**2 + dot[i,1]**2, r'$\Theta_%d$' % i)
ax.text(dot[3,0]+0.4, dot[3,1]+0.4, dot[3,0]**2 + dot[3,1]**2-0.4, r'min')
plt.show()

Julia program: gradient descent for a two-variable function

I could not get the text labels onto this plot no matter what I tried; if anyone knows how, please let me know. One more remark: although I have published a Julia tutorial that covers four plotting tools (Plots, GR, Gadfly and PyPlot), I had never drawn a 3D figure before, and producing this one took considerable effort. Essentially none of the 3D plotting examples on the Julia website worked out of the box. I will write up the plotting process and the problems I ran into in a separate article on Zhihu and on my WeChat account.

using Plots
Plots.plotlyjs()

n = 50
x = range(-3, stop=3, length=n)
y = x
z = zeros(n,n)
for i in 1:n, k in 1:n
    z[i,k] = x[i]^2 + y[k]^2
end
surface(x, y, z)
dot = [[2 3]; [1.6 2.4]; [1.28 1.92]; [5.09e-10 7.64e-10]]
scatter!(dot[:,1], dot[:,2], dot[:,1].^2 .+ dot[:,2].^2)

MATLAB program: linear regression with gradient descent

m = 18;
X0 = ones(m,1);
X1 = (1:m)';
X = [X0, X1];
y = [2,3,3,5,8,10,10,13,15,15,16,19,19,20,22,22,25,28]';
alpha = 0.01;
theta = gradient_descent(X, y, alpha, m);

function [grad_res] = gradient_function(theta, X, y, m)
    diff = X * theta - y;
    grad_res = X' * diff / m;
end

function [theta_res] = gradient_descent(X, y, alpha, m)
    theta = [1;1];
    gradient = gradient_function(theta, X, y, m);
    while sum(abs(gradient)>1e-5)>=1
        theta = theta - alpha * gradient;
        gradient = gradient_function(theta, X, y, m);
    end
    theta_res = theta;
end

Python program: linear regression with gradient descent

import numpy as np
import matplotlib.pyplot as plt

# y = np.array([2,3,3,5,8,10,10,13,15,15,16,19,19,20,22,22,25,28])
# x = np.arange(1,len(y)+1)
# plt.figure()
# plt.scatter(x,y)
# plt.grid(True)
# plt.show()

# sample length
m = 18

# generate matrix X
X0 = np.ones((m, 1))
X1 = np.arange(1, m+1).reshape(m, 1)
X = np.hstack((X0, X1))

# matrix y
y = np.array([2,3,3,5,8,10,10,13,15,15,16,19,19,20,22,22,25,28]).reshape(m,1)

# alpha
alpha = 0.01

def cost_function(theta, X, y):
    diff = np.dot(X, theta) - y
    return (1./(2*m)) * np.dot(np.transpose(diff), diff)   # note: 1/(2m), not (1./2)*m

def gradient_function(theta, X, y):
    diff = np.dot(X, theta) - y
    return (1./m) * np.dot(np.transpose(X), diff)

def gradient_descent(X, y, alpha):
    theta = np.array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, y)
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta

[theta0, theta1] = gradient_descent(X, y, alpha)

plt.figure()
plt.scatter(X1, y)
plt.plot(X1, theta0 + theta1*X1, color='r')
plt.title('Linear regression fitted by gradient descent')
plt.grid(True)
plt.show()

Julia program: linear regression with gradient descent

m = 18
X0 = ones(m,1)
X1 = Array(1:m)
X = [X0 X1];
y = [2,3,3,5,8,10,10,13,15,15,16,19,19,20,22,22,25,28];
alpha = 0.01;

function gradient_function(theta, X, y, m)
    diff = X * theta .- y;
    grad_res = X' * diff / m;
end

function gradient_descent(X, y, alpha, m)
    theta = [1,1]
    gradient = gradient_function(theta, X, y, m)
    while any(abs.(gradient) .> 1e-5)   # keep iterating until every gradient component is small
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y, m)
    end
    theta_res = theta
end

theta = gradient_descent(X, y, alpha, m)

Finally, a link to my free Julia video course: Julia 教程 从入门到进阶 (Julia Tutorial: From Beginner to Advanced) on NetEase Cloud Classroom, study.163.com

WeChat official account: Quant_Times
