Notes on Implementing Linear Regression and Z-score Normalization in Python 3

2023-11-08

Python 3 — implementing a linear regression model and Z-score normalization

Preface: these are miscellaneous notes I took while following Andrew Ng's machine learning course. They cover the code implementation of linear regression and of Z-score normalization, together with the main formulas of linear regression. Given my limited level, please point out any mistakes, and suggestions are very welcome!


Gradient Descent for Univariate Linear Regression

The three steps of univariate gradient descent

  1. Compute the gradient
    Apply the update rule:
    $$
    \begin{align*}
    \text{repeat until convergence: } \lbrace \quad & \\
    w &= w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{1} \\
    b &= b - \alpha \frac{\partial J(w,b)}{\partial b} \tag{2} \\
    \rbrace \quad &
    \end{align*}
    $$
    where the partial derivatives are
    $$
    \begin{align}
    \frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)} \tag{3} \\
    \frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) \tag{4}
    \end{align}
    $$

  2. Compute the cost
    Apply the formula:
    $$
    J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2 \tag{5}
    $$

  3. Gradient descent
    Repeat the two steps above until convergence.

Code implementation

import math, copy
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')   # course-provided plot style
from lab_utils_uni import plt_house_x, plt_contour_wgrad, plt_divergence, plt_gradients   # course plotting helpers

# Cost function
def compute_cost(x, y, w, b):
    """Computes the cost J(w,b) over all m training examples."""
    m = x.shape[0]
    cost = 0
    for i in range(m):
        f_wb = w * x[i] + b               # prediction for example i
        cost = cost + (f_wb - y[i])**2    # accumulate the squared error
    total_cost = 1 / (2 * m) * cost
    return total_cost

# Partial derivatives of the cost with respect to the two parameters
def compute_gradient(x, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      x (ndarray (m,)): Data, m examples 
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters  
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b     
    """
    
    # Number of training examples
    m = x.shape[0]    
    dj_dw = 0
    dj_db = 0
    
    for i in range(m):  
        f_wb = w * x[i] + b 
        dj_dw_i = (f_wb - y[i]) * x[i] 
        dj_db_i = f_wb - y[i] 
        dj_db += dj_db_i
        dj_dw += dj_dw_i 
    dj_dw = dj_dw / m 
    dj_db = dj_db / m 
        
    return dj_dw, dj_db

# Gradient descent
def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function): 
    """
    Performs gradient descent to fit w,b. Updates w,b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,))  : Data, m examples 
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient
      
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (List): History of cost values
      p_history (list): History of parameters [w,b] 
      """
    
    w = copy.deepcopy(w_in) # avoid modifying global w_in
    b = b_in
    # lists used to store the returned cost and parameter history
    J_history = []
    p_history = []
    
    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)     
        # Update the parameters using equations (1) and (2) above
        b = b - alpha * dj_db                            
        w = w - alpha * dj_dw                            
        # Save cost J and parameters at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print the cost 10 times over the run (every num_iters/10 iterations)
        if i% math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")

    return w, b, J_history, p_history #return w and J,w history for graphing
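
For reference, here is a minimal usage sketch of the functions above. The x_train/y_train values are hypothetical toy data (sizes in 1000 sqft, prices in $1000s), not part of the original notes:

# Hypothetical toy data: sizes in 1000 sqft, prices in $1000s
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# start from w = 0, b = 0 and run gradient descent
w_init, b_init = 0, 0
alpha = 1.0e-2
iterations = 10000

w_final, b_final, J_hist, p_hist = gradient_descent(
    x_train, y_train, w_init, b_init, alpha, iterations,
    compute_cost, compute_gradient)
print(f"(w, b) found by gradient descent: ({w_final:8.4f}, {b_final:8.4f})")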

Notes

  • In the gradient descent update, w and b are updated simultaneously: compute both partial derivatives first, then update both parameters.
  • The initial values of w and b may start far from the optimal values; gradient descent will still move them toward the minimum.
  • The learning rate α must be chosen appropriately: if it is too large, gradient descent diverges.

Gradient Descent for Multivariate Linear Regression

The data matrix

Training examples

Suppose there are $m$ examples and $n$ features. The data can then be represented by the $(m, n)$ matrix $\mathbf{X}$:
$$
\mathbf{X} = \begin{pmatrix}
x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\
x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
\vdots \\
x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1}
\end{pmatrix}
$$
where:

  • $\mathbf{x}^{(i)}$ denotes one example: $\mathbf{x}^{(i)} = (x^{(i)}_0,\, x^{(i)}_1,\, \cdots,\, x^{(i)}_{n-1})$
  • $x^{(i)}_j$ denotes the $j$-th element of the $i$-th example. The superscript is the example index, the subscript the feature index.

The parameter vector

$\mathbf{w}$ is a vector with $n$ elements:

  • each element of $\mathbf{w}$ is associated with one feature
  • it is a column vector:
    $$
    \mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{n-1} \end{pmatrix}
    $$
  • $b$ is a scalar parameter
    The initial value supplied for the parameter is likewise a vector $\mathbf{w}$.

Computing the prediction $f_{\mathbf{w},b}(\mathbf{x})$

Written element by element:
$$
f_{\mathbf{w},b}(\mathbf{x}) = w_0 x_0 + w_1 x_1 + \cdots + w_{n-1} x_{n-1} + b \tag{1}
$$
Written with vectors:
$$
f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \tag{2}
$$
Here $\cdot$ denotes the vector dot product.

Implementation

Loop version, corresponding to equation (1):
def predict_single_loop(x, w, b): 
    """
    single predict using linear regression 用线性回归预测一次
    
    Args:
      x (ndarray): Shape (n,) example with multiple features(一个样本的向量)
      w (ndarray): Shape (n,) model parameters(权重的向量)
      b (scalar):  model parameter(模型标参)
      
    Returns:
      p (scalar):  prediction(函数预测值)
    """
    n = x.shape[0]
    p = 0
    for i in range(n):
        p_i = x[i] * w[i]  
        p = p + p_i         
    p = p + b                
    return p
Dot-product version, corresponding to equation (2):
def predict(x, w, b): 
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b     
    return p    
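
Both functions can be exercised with hypothetical values; the feature and parameter numbers below are illustrative only and should give identical predictions:

# Hypothetical example: one row of features and illustrative parameters
x_vec = np.array([2104, 5, 1, 45])                 # size, bedrooms, floors, age
w_vec = np.array([0.39, 18.75, -53.36, -26.42])    # illustrative weights
b_val = 785.18                                     # illustrative bias

print(predict_single_loop(x_vec, w_vec, b_val))    # loop version, equation (1)
print(predict(x_vec, w_vec, b_val))                # np.dot version, equation (2) -- same result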

Computing the cost function $J(\mathbf{w},b)$

The expression for $J(\mathbf{w},b)$ is:
$$
J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2 \tag{3}
$$
where the prediction, written with vectors, is:

$$
f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{4}
$$

Implementation

def compute_cost(X, y, w, b): 
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar    
    return cost

The gradient and the parameter update

Parameter update:
$$
\begin{align*}
\text{repeat until convergence: } \lbrace \quad & \\
w_j &= w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} && \text{for } j = 0,\dots,n-1 \tag{5} \\
b &= b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
\rbrace \quad &
\end{align*}
$$
Derivative terms:
$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x^{(i)}_j \tag{6} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) \tag{7}
\end{align}
$$
where $m$ is the number of examples in the dataset, $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, and $y^{(i)}$ is the target value.

def compute_gradient(X, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw
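
As a side note (not part of the course lab), the double loop above can be written equivalently with NumPy matrix operations. A minimal sketch, assuming the same shapes for X, y, w and b:

def compute_gradient_matrix(X, y, w, b):
    """Vectorized version of compute_gradient; same inputs and outputs."""
    m = X.shape[0]
    err = X @ w + b - y          # (m,) vector of prediction errors
    dj_dw = X.T @ err / m        # (n,) gradient w.r.t. each weight
    dj_db = np.sum(err) / m      # scalar gradient w.r.t. b
    return dj_db, dj_dw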

Implementing gradient descent

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient and update the parameters
        dj_db, dj_dw = gradient_function(X, y, w, b)

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))

        # Print the cost 10 times over the run (every num_iters/10 iterations)
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing
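
A minimal usage sketch with hypothetical data (3 examples, 4 features); the numbers are illustrative:

# Hypothetical training set: 3 examples, 4 features (size, bedrooms, floors, age)
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [ 852, 2, 1, 35]], dtype=float)
y_train = np.array([460., 232., 178.])

# initialize parameters to zero and use a deliberately small learning rate
initial_w = np.zeros(X_train.shape[1])
initial_b = 0.
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                            compute_cost, compute_gradient,
                                            alpha=5.0e-7, num_iters=1000)
print(f"b, w found by gradient descent: {b_final:0.2f}, {w_final}")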

Improving the Regression Model

Feature scaling

Purpose: make gradient descent run faster.
Problem: different features have different value ranges. When the scales differ greatly, gradient descent zig-zags back and forth and converges slowly.
Remedy: rescale the features so that they share a similar order of magnitude, e.g. roughly within (-1, 1).
The three methods below are not fully interchangeable: mixing them can shift model accuracy somewhat, depending on the algorithm.
If the model was trained on scaled features, the features supplied at prediction time must be scaled in exactly the same way.

Simple feature scaling

Divide each value by the maximum of its range (a minimal sketch follows the list below):
$$
x_{j,\text{scaled}} = \frac{x_j}{x_{j,\max}}
$$

  • $x_j$ denotes the $j$-th feature of the feature vector.
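
A minimal sketch of max scaling, assuming a hypothetical feature matrix X with one column per feature (values are illustrative):

# Hypothetical raw features: size in sqft (large values) and number of bedrooms (small values)
X = np.array([[2104, 5],
              [1416, 3],
              [ 852, 2]], dtype=float)

X_max_scaled = X / X.max(axis=0)   # divide each column by that column's maximum
print(X_max_scaled)                # every value now lies in (0, 1]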

Mean normalization

Rescales the values so that they are distributed around the origin (in all four quadrants).
Formula (a sketch follows the list below):
$$
x_j := \frac{x_j - \mu_j}{x_{j,\max} - x_{j,\min}}
$$

  • $x_j$ denotes the $j$-th feature of the feature vector.
  • $\mu_j$ is the mean of that feature over the training set.
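
Reusing the hypothetical X from the sketch above, mean normalization looks like this:

mu   = X.mean(axis=0)                                # per-feature mean
X_mn = (X - mu) / (X.max(axis=0) - X.min(axis=0))    # mean normalization
print(X_mn)                                          # values are now centered around 0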

Z-score normalization

Also centers the data around the origin.
After Z-score normalization, every feature has mean 0 and standard deviation 1.
Formula:
$$
x_j := \frac{x_j - \mu_j}{\sigma_j}
$$

  • $\sigma_j$: the standard deviation of the $j$-th feature.

Checking Whether Gradient Descent Has Converged

Plotting the learning curve

  • Plot the learning curve ($J(\mathbf{w},b)$ versus iteration number); a plotting sketch follows this list.
    • $J(\mathbf{w},b)$ must decrease after every iteration.
    • If it ever increases, check:
      • whether $\alpha$ is too large,
      • whether the code has a bug,
      • whether the "$-$" in $w = w - \alpha\,d$ was accidentally written as "$+$".
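
A minimal plotting sketch, assuming J_hist is the cost history returned by gradient_descent above:

fig, ax = plt.subplots()
ax.plot(J_hist)                    # cost recorded after each iteration
ax.set_title("Learning curve")
ax.set_xlabel("iteration")
ax.set_ylabel("cost J(w,b)")
plt.show()                         # the curve should decrease monotonically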

Automatic convergence test

Let $\epsilon$ be a small positive number, e.g. $10^{-3}$.
If $J(\mathbf{w},b)$ decreases by less than $\epsilon$ in one iteration, gradient descent can be declared converged.
Because a good $\epsilon$ is hard to pick, convergence is usually judged by inspecting the learning curve instead. A sketch of the test is shown below.
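
A sketch of the test on the same J_hist (the epsilon value is illustrative):

epsilon = 1e-3
for i in range(1, len(J_hist)):
    if J_hist[i-1] - J_hist[i] < epsilon:   # cost dropped by less than epsilon
        print(f"declare convergence at iteration {i}")
        break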

Choosing a Suitable Learning Rate

Diagnosing problems

Method: run gradient descent with a very small learning rate and watch its learning curve; this tells a code bug apart from a learning-rate problem.
Tip: start around 0.001 and increase roughly threefold each time (0.001, 0.003, 0.01, 0.03, 0.1, 0.3, ...) to home in on the best learning rate.
Goal: pick the largest learning rate that does not cause divergence. A sweep sketch is shown below.
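
One way to run such a sweep, as a sketch: it assumes the X_train, y_train and multivariate functions defined above, with features already scaled (otherwise the larger rates will visibly diverge, which is exactly what the sweep is meant to expose):

# Try increasing learning rates and compare the final cost
for alpha_try in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    _, _, J_hist_a = gradient_descent(X_train, y_train,
                                      np.zeros(X_train.shape[1]), 0.,
                                      compute_cost, compute_gradient,
                                      alpha_try, 100)
    print(f"alpha = {alpha_try:5.3f}  ->  final cost {J_hist_a[-1]:.3e}")
# choose the largest alpha whose learning curve still decreases smoothly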

Feature Engineering

Idea: use domain knowledge and the existing features to create new features, typically by transforming or combining them, so that the algorithm can make accurate predictions more easily.
Example: a plot of land has a length feature ($x_1$) and a width feature ($x_2$), giving the regression model $f(x) = w_1 x_1 + w_2 x_2 + b$. Length and width can also be combined into an area feature ($x_3 = x_1 x_2$); the extended model $f(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$ usually predicts the target more accurately.

Polynomial Regression

Applying feature engineering to the input yields a polynomial regression model such as $f(x) = w_1 x + w_2 x^2 + w_3 x^3 + b$, which can fit curved data. In essence, the best features are linear with respect to the target: if the target is quadratic in $x$ and $x^2$ is fed in as a feature, then a plot of the target against $x^2$ is a straight line. In other words, treat each higher-order term as a single new feature and the model is again an ordinary linear regression. A feature-construction sketch follows.
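
A sketch of building polynomial features by hand and reusing the linear-regression machinery above (the data values are illustrative):

# Hypothetical curved target: y = 1 + x^2
x = np.arange(0, 20, 1)
y = 1 + x**2

# engineer polynomial features, then treat them as ordinary linear-regression inputs
X_poly = np.c_[x, x**2, x**3]                                            # columns: x, x^2, x^3
X_poly = (X_poly - np.mean(X_poly, axis=0)) / np.std(X_poly, axis=0)     # z-score scale each column

w_p, b_p, _ = gradient_descent(X_poly, y, np.zeros(X_poly.shape[1]), 0.,
                               compute_cost, compute_gradient, 1.0e-1, 10000)
print(w_p, b_p)        # after training, the x^2 column should carry most of the weight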

Z-score Normalization (implementation)

Formula:
$$
x_j := \frac{x_j - \mu_j}{\sigma_j}
$$
Code:

# Z-score normalization
def zscore_normalize_features(X):
    """
    Computes X, z-score normalized by column

    Args:
      X (ndarray): Shape (m,n) input data, m examples, n features

    Returns:
      X_norm (ndarray): Shape (m,n) input normalized by column
      mu (ndarray):     Shape (n,)  mean of each feature
      sigma (ndarray):  Shape (n,)  standard deviation of each feature
    """
    # mean of each column/feature
    mu     = np.mean(X, axis=0)     # mu has shape (n,)
    # standard deviation of each column/feature
    sigma  = np.std(X, axis=0)      # sigma has shape (n,)
    # element-wise: subtract the column mean from each example, divide by the column std
    X_norm = (X - mu) / sigma

    return (X_norm, mu, sigma)
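
A usage sketch: normalize the hypothetical training matrix from the earlier examples, train on the normalized features, and reuse the returned mu and sigma to scale a new example before predicting (as noted in the feature-scaling section, prediction inputs must be scaled the same way as the training data):

# Normalize the hypothetical training features used earlier
X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)

# Train on the normalized features (a much larger alpha now works)
w_n, b_n, _ = gradient_descent(X_norm, y_train, np.zeros(X_norm.shape[1]), 0.,
                               compute_cost, compute_gradient, 1.0e-1, 1000)

# A new example must be scaled with the SAME mu and sigma before prediction
x_house = np.array([1200, 3, 1, 40], dtype=float)
x_house_norm = (x_house - X_mu) / X_sigma
print(f"predicted price: {np.dot(x_house_norm, w_n) + b_n:0.0f}")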
