Regression Algorithms - Overview
Introduction to Regression
Regression is another important and widely used statistical and machine learning tool. The key objective of a regression task is to predict output labels or responses that are continuous numeric values for the given input data. The output is based on what the model has learned in the training phase. Regression models use the input data features (independent variables) and their corresponding continuous numeric output values (dependent or outcome variables) to learn a specific association between the inputs and the corresponding outputs.
Types of Regression Models
Regression models are of the following two types −
Simple regression model − This is the most basic regression model, in which predictions are made from a single, univariate feature of the data.
Multiple regression model − As the name implies, in this regression model the predictions are made from multiple features of the data.
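To make the difference between the two model types concrete, here is a minimal sketch using scikit-learn's LinearRegression on hypothetical synthetic data (the three features and the true coefficients 1.5, -2 and 0.5 are assumptions chosen for illustration). The simple model sees only one feature; the multiple model sees all three.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 100 samples, 3 features
X_all = rng.uniform(0, 10, size=(100, 3))
# Assumed true relationship: y = 1.5*x0 - 2*x1 + 0.5*x2 + 4, plus small noise
y = (1.5 * X_all[:, 0] - 2.0 * X_all[:, 1] + 0.5 * X_all[:, 2] + 4.0
     + rng.normal(0, 0.1, size=100))

# Simple regression: a single feature (scikit-learn still expects a 2-D array)
X_simple = X_all[:, [0]]  # shape (100, 1)
simple_model = linear_model.LinearRegression().fit(X_simple, y)

# Multiple regression: all three features
multiple_model = linear_model.LinearRegression().fit(X_all, y)

print("Simple:   coef =", simple_model.coef_.round(2))
print("Multiple: coef =", multiple_model.coef_.round(2))
print("Simple R2   =", round(simple_model.score(X_simple, y), 3))
print("Multiple R2 =", round(multiple_model.score(X_all, y), 3))
```

Because the target depends on all three features, the multiple model should recover coefficients close to the assumed ones, while the simple model can only explain the part of the variation driven by the first feature.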
Building a Regressor in Python
A regressor model in Python can be constructed just as we constructed the classifier. Scikit-learn, a Python library for machine learning, can also be used to build a regressor in Python.
In the following example, we will build a basic regression model that fits a line to the data, i.e. a linear regressor. The steps for building a regressor in Python are as follows −
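"Fitting a line" here means ordinary least squares: choosing the slope and intercept that minimize the sum of squared differences between the line and the data. For a single feature this has a closed-form solution. The sketch below is a from-scratch illustration on small hypothetical data (the helper name fit_line is an assumption, not part of scikit-learn), cross-checked against NumPy's np.polyfit.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y ≈ slope * x + intercept for one feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line passes through the point of means (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data lying roughly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
slope, intercept = fit_line(x, y)
print("slope:", slope, "intercept:", intercept)

# Cross-check against NumPy's degree-1 polynomial fit (returns [slope, intercept])
print(np.polyfit(x, y, 1))
```

For this data the fit gives a slope of about 1.99 and an intercept of about 1.05; scikit-learn's LinearRegression solves the same least-squares problem, generalized to any number of features.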
Step 1: Importing the necessary Python packages
To build a regressor using scikit-learn, we need to import it along with other necessary packages. We can import them using the following script −
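import numpy as np                    # numerical arrays and data loading
from sklearn import linear_model      # linear regression models
import sklearn.metrics as sm          # regression performance metrics
import matplotlib.pyplot as plt       # plotting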
Step 2: Importing the dataset
After importing the necessary packages, we need a dataset to build the regression model. We can load one from sklearn.datasets or use any other dataset that suits our requirements. Here we use our own saved input data, which we can reference with the following script −
input_file = r'C:\linear.txt'  # path to the comma-separated data file
Next, we need to load this data. We use the np.loadtxt function to load it.
input_data = np.loadtxt(input_file, delimiter=',')
# All columns except the last are features (X); the last column is the target (y)
X, y = input_data[:, :-1], input_data[:, -1]
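The file C:\linear.txt itself is not provided with this tutorial. As a sketch of the format the loading code expects (assumed from the slicing: comma-separated rows, feature columns first, target last), the following writes a hypothetical synthetic stand-in to a temporary directory and loads it the same way. It uses the name input_file rather than input to avoid shadowing Python's built-in input function.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for linear.txt: one feature column, then the target column
rng = np.random.default_rng(42)
x = rng.uniform(0, 5, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

input_file = os.path.join(tempfile.mkdtemp(), "linear.txt")
np.savetxt(input_file, np.column_stack([x, y]), delimiter=",", fmt="%.4f")

input_data = np.loadtxt(input_file, delimiter=",")
X, y_loaded = input_data[:, :-1], input_data[:, -1]
print(X.shape, y_loaded.shape)  # (50, 1) (50,): X stays 2-D, y is 1-D
```

Keeping X two-dimensional matters: scikit-learn estimators expect a feature matrix of shape (n_samples, n_features), even when there is only one feature.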
Step 3: Organizing data into training & testing sets
Because we need to test our model on unseen data, we divide the dataset into two parts: a training set (the first 60% of the samples) and a test set (the remaining 40%). The following commands perform the split −
training_samples = int(0.6 * len(X))            # first 60% of rows for training
testing_samples = len(X) - training_samples     # remaining rows for testing
X_train, y_train = X[:training_samples], y[:training_samples]
X_test, y_test = X[training_samples:], y[training_samples:]
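The split above is sequential: it takes rows in file order and does not shuffle them. If the rows happen to be sorted (for example by X), the test set will cover a different range than the training set. A minimal sketch on hypothetical data comparing the sequential split with scikit-learn's shuffled train_test_split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with one feature each, already sorted by X
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + 2.0

# Sequential 60/40 split, as in the tutorial (no shuffling)
training_samples = int(0.6 * len(X))
testing_samples = len(X) - training_samples
X_train, y_train = X[:training_samples], y[:training_samples]
X_test, y_test = X[training_samples:], y[training_samples:]
print(len(X_train), len(X_test))  # 6 4
print(X_test[:, 0])               # only the largest X values end up in the test set

# Shuffled alternative; random_state makes the shuffle reproducible
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)
print(len(X_tr), len(X_te))
```

Both approaches give 6 training and 4 test samples here; the shuffled version simply draws them from across the whole range of the data.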
Step 4: Model building & prediction
After dividing the data into training and testing sets, we need to build the model. For this purpose we use Scikit-learn's LinearRegression class. The following command creates a linear regressor object −
reg_linear = linear_model.LinearRegression()
Next, we train this model on the training samples as follows −
reg_linear.fit(X_train, y_train)
Finally, we use the trained model to make predictions on the test data −
y_test_pred = reg_linear.predict(X_test)
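After fit, the learned line is stored in the regressor's coef_ (one slope per feature) and intercept_ attributes, and predict simply evaluates X @ coef_ + intercept_. The sketch below shows this on hypothetical, noise-free data that follows y = 4x - 3 (an assumption chosen so the expected coefficients are known exactly).

```python
import numpy as np
from sklearn import linear_model

# Hypothetical noise-free training data following y = 4x - 3
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = 4.0 * X_train[:, 0] - 3.0

reg_linear = linear_model.LinearRegression()
reg_linear.fit(X_train, y_train)

print("slope:", reg_linear.coef_)           # one coefficient per feature
print("intercept:", reg_linear.intercept_)

X_test = np.array([[5.0], [6.5]])
y_test_pred = reg_linear.predict(X_test)
# A linear model's prediction is just X @ coef_ + intercept_
manual_pred = X_test @ reg_linear.coef_ + reg_linear.intercept_
print(y_test_pred, manual_pred)
```

Inspecting coef_ and intercept_ like this is a quick sanity check that the model learned a plausible relationship before looking at plots or metrics.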
Step 5: Plotting & visualization
After prediction, we can plot the test data together with the predicted values using the following script −
Example
plt.scatter(X_test, y_test, color='red')                   # actual test values
plt.plot(X_test, y_test_pred, color='black', linewidth=2)  # predicted regression line
plt.xticks(())                                             # hide x-axis ticks
plt.yticks(())                                             # hide y-axis ticks
plt.show()
Output
In the resulting plot, we can see the regression line drawn through the data points.
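The plotting script calls plt.show(), which needs an interactive display. On a headless machine (a server or CI job), one option, sketched here with hypothetical test data and predictions, is to select matplotlib's non-interactive Agg backend and save the figure to a file instead. The file name regression_line.png and the axis labels are placeholders.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical test data and predictions from a fitted line y = 2x + 1
X_test = np.array([[3.0], [1.0], [4.0], [2.0]])
y_test = np.array([6.8, 3.1, 9.2, 4.9])
y_test_pred = 2.0 * X_test[:, 0] + 1.0

# Sort by X so the predicted line is drawn from left to right
order = np.argsort(X_test[:, 0])
fig, ax = plt.subplots()
ax.scatter(X_test[:, 0], y_test, color="red", label="actual")
ax.plot(X_test[order, 0], y_test_pred[order], color="black",
        linewidth=2, label="predicted")
ax.set_xlabel("feature")
ax.set_ylabel("target")
ax.legend()

out_path = os.path.join(tempfile.mkdtemp(), "regression_line.png")
fig.savefig(out_path)
plt.close(fig)  # free the figure's memory
print("saved to", out_path)
```

Sorting is not strictly needed for a single straight line, but it keeps the plotted line correct if the model is later swapped for a non-linear one.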
Step 6: Performance computation
We can also evaluate our regression model with several performance metrics as follows −
Example
print("Regressor model performance:")
print("Mean absolute error (MAE) =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error (MSE) =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
Output
Regressor model performance:
Mean absolute error (MAE) = 1.78
Mean squared error (MSE) = 3.89
Median absolute error = 2.01
Explained variance score = -0.09
R2 score = -0.09
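The R2 score of -0.09 above is worth interpreting: R2 = 1 - SS_res / SS_tot, so a negative value means the model does worse on this test set than simply predicting the mean of y_test. The explained variance score uses the variance of the residuals instead of their sum of squares, so it equals R2 whenever the residuals average to zero. The sketch below recomputes all five metrics with NumPy on small hypothetical arrays (the helper regression_metrics is an illustrative name, not a scikit-learn function), so they can be compared with sklearn.metrics.

```python
import numpy as np
import sklearn.metrics as sm

def regression_metrics(y_true, y_pred):
    """Compute the five metrics from Step 6 directly with NumPy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "mae": np.mean(np.abs(residuals)),
        "mse": np.mean(residuals ** 2),
        "median_ae": np.median(np.abs(residuals)),
        # 1 - Var(residuals) / Var(y_true): ignores a constant offset in the errors
        "explained_variance": 1.0 - np.var(residuals) / np.var(y_true),
        # 1 - SS_res / SS_tot: negative when worse than always predicting the mean
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mine = regression_metrics(y_true, y_pred)
print(mine)
print("sklearn R2:", sm.r2_score(y_true, y_pred))

# Predictions that are worse than the mean give a negative R2
bad_pred = np.array([7.0, 3.0, -0.5, 2.0])
print(regression_metrics(y_true, bad_pred)["r2"], sm.r2_score(y_true, bad_pred))
```

MAE and median absolute error are in the same units as the target, MSE is in squared units and penalizes large errors more heavily, while R2 and explained variance are unitless scores where 1.0 is a perfect fit.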
Types of ML Regression Algorithms
The most popular and widely used ML regression algorithm is linear regression, which is further divided into two types, namely −
Simple linear regression
Multiple linear regression
We will discuss linear regression and implement it in Python in the next chapter.
Applications
The applications of ML regression algorithms include the following −
Forecasting or predictive analysis − One of the important uses of regression is forecasting or predictive analysis. For example, we can forecast GDP, oil prices or, in simple words, any quantitative data that changes over time.
Optimization − We can optimize business processes with the help of regression. For example, a store manager can build a statistical model to understand the peak times at which customers arrive.
Error correction − In business, making correct decisions is as important as optimizing business processes. Regression can help us make correct decisions as well as correct decisions that have already been implemented.
Economics − Regression is one of the most widely used tools in economics. We can use it to predict supply, demand, consumption, inventory investment, etc.
Finance − A financial company is always interested in minimizing portfolio risk and wants to know the factors that affect its customers. Regression models can help predict all of these.
Translated from: https://www.tutorialspoint.com/machine_learning_with_python/regression_algorithms_overview.htm