Big Data Machine Learning Classification Algorithms: 13 Machine Learning Classification Algorithms for Data Science and Their Code

2023-11-05

A roundup of the most common classification algorithms, along with their Python and R code:

Decision Tree, Naive Bayes, Gaussian Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, K Nearest Neighbours (KNN), Support Vector Machine (SVM), Linear Support Vector Classifier (SVC), Stochastic Gradient Descent (SGD) Classifier, Logistic Regression, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Fisher’s Linear Discriminant….

Classification algorithms can be applied to many kinds of data, both structured and unstructured. Classification is a technique for dividing data into a given number of classes. The main goal of a classification problem is to identify the category or class that a new data point falls into.

Important terminology encountered in machine learning classification algorithms:

Classifier: an algorithm that maps the input data to a specific category.

Classification model: a model that draws conclusions from the input data given for training. It then predicts class labels or categories for new data.

Binary classification: a classification task with two possible outcomes. E.g. gender classification (male / female).

Multi-class classification: classification with more than two classes. In multi-class classification, each sample is assigned to one and only one target label. E.g. an animal can be a cat or a dog, but not both at the same time.

Multi-label classification: a classification task where each sample is mapped to a set of target labels (more than one class). E.g. a news article may be about sport, a person, and a location at the same time.
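The difference between these three task types shows up in the shape of the target labels. The following is a minimal sketch in plain Python; the sample labels and the helper `to_indicator` are made up for illustration and do not come from the original article.

```python
# Binary classification: every sample gets one of two labels.
binary_labels = [0, 1, 1, 0]

# Multi-class classification: every sample gets exactly one of several labels.
multiclass_labels = ["cat", "dog", "bird", "dog"]

# Multi-label classification: every sample gets a *set* of labels.
multilabel_labels = [
    {"sport", "person"},
    {"location"},
    {"sport", "person", "location"},
]


def to_indicator(label_sets, classes):
    """Turn label sets into 0/1 rows, one column per class (hypothetical helper)."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


classes = ["sport", "person", "location"]
indicator_rows = to_indicator(multilabel_labels, classes)
for row in indicator_rows:
    print(row)
```

The 0/1 row format is one common way to hand multi-label targets to a learning library; a multi-class target stays a single column.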

Any of these classification algorithms can be used to build a model that predicts the outcome class or category for a given dataset. The datasets can come from a variety of domains. Depending on the dimensionality of the dataset, the attribute types, the sparsity, the missing values, and so on, one algorithm may give you better predictive accuracy than most others. Let's briefly discuss these algorithms.

1. Decision Tree

Decision trees are extremely intuitive ways to classify or label objects: you simply ask a series of questions designed to zero in on the classification. For example, if you wanted to build a decision tree to classify an animal you come across while on a hike, you might construct the one shown in the figure.
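The figure is not reproduced here, so the following is only a rough sketch of what "a series of questions" looks like in code. The questions, their order, and the animal names are assumptions chosen for illustration, not the tree from the original figure.

```python
def classify_animal(bigger_than_one_meter, has_horns, has_two_legs):
    """Hand-written decision tree: each `if` is one yes/no question.

    All three questions and the four leaf labels are hypothetical.
    """
    if bigger_than_one_meter:
        # Second question on the "large animal" branch.
        return "deer" if has_horns else "bear"
    # Second question on the "small animal" branch.
    return "bird" if has_two_legs else "squirrel"


print(classify_animal(True, True, False))    # large, with horns
print(classify_animal(False, False, True))   # small, two legs
```

A learned decision tree has the same shape; the difference is that the questions and thresholds are chosen automatically from training data rather than written by hand.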

Decision tree classification models can easily handle qualitative independent variables without the need to create dummy variables. Missing values are not a problem either, although how both are handled depends on the implementation. Interestingly, decision tree algorithms can be used for regression models as well. The same library that you used to build a classification model can also be used to build a regression model after changing some of the parameters.
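As a hedged illustration of using one library for both tasks, the sketch below assumes scikit-learn is installed; the article does not say which library it has in mind. In scikit-learn the switch is made by choosing `DecisionTreeRegressor` instead of `DecisionTreeClassifier` rather than by changing a parameter, and categorical inputs must be encoded numerically first. The toy data is made up.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# One numeric feature, four training samples (toy data).
X = [[0.0], [1.0], [2.0], [3.0]]
y_class = [0, 0, 1, 1]           # class labels for classification
y_value = [0.1, 0.2, 0.9, 1.1]   # continuous targets for regression

# Same feature matrix, same fit/predict workflow, different estimator.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_class)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_value)

new_points = [[0.5], [2.5]]
class_pred = clf.predict(new_points)   # predicted class labels
value_pred = reg.predict(new_points)   # predicted continuous values

print(class_pred)
print(value_pred)
```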

Although decision tree-based classification models are easy to interpret, they are not robust. One major problem with decision trees is their high variance (they are low-bias, high-variance models): one small change in the training dataset can give an entirely different decision tree model.
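To make the high-variance point concrete, here is a minimal sketch using a one-level tree (a "decision stump") written in plain Python. The Gini-based split search is a simplified stand-in for what tree libraries do, and the six training rows are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(rows):
    """Return (feature_index, threshold) of the split with the lowest weighted Gini.

    rows: list of (x0, x1, label) tuples.
    """
    best = None
    n_features = len(rows[0]) - 1
    for f in range(n_features):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]


original = [(1, 1, 0), (2, 2, 0), (3, 4, 0), (4, 3, 1), (5, 5, 1), (6, 6, 1)]

# Change a single training row and refit.
changed = list(original)
changed[2] = (4.5, 2.5, 0)

stump_a = best_stump(original)
stump_b = best_stump(changed)
print(stump_a)  # splits on feature 0
print(stump_b)  # splits on feature 1
```

With one row changed, the root split moves from feature 0 to feature 1; in a deeper tree every split below the root would then change as well.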

R tutorial

Python tutorial

2. Naive Bayes

Naive Bayes models are a group of extremely fast and simple classification algorithms that are often suitable for very high-dimensional datasets. Because they are so fast and have so few tunable paramete
