sklearn k-Nearest Neighbors: KNeighborsClassifier Parameters Explained

2023-11-18

[Original URL] https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)[source]

Classifier implementing the k-nearest neighbors vote.

Read more in the User Guide.

Parameters:

n_neighbors : int, optional (default = 5)

Number of neighbors to use by default for kneighbors queries.

weights : str or callable, optional (default = ‘uniform’)

Weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
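
A custom callable can implement any decay you like. Below is a minimal sketch (an illustration, not from the original docs) using a hypothetical inverse-square weighting; the small epsilon only guards against division by zero when a query point coincides with a training point:

>>> from sklearn.neighbors import KNeighborsClassifier
>>> def inverse_square(distances):
...     # hypothetical weighting: 1 / d^2, with an epsilon so that an
...     # exact match (distance 0) does not divide by zero
...     return 1.0 / (distances ** 2 + 1e-8)
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> clf = KNeighborsClassifier(n_neighbors=3, weights=inverse_square)
>>> clf.fit(X, y)
KNeighborsClassifier(...)
>>> print(clf.predict([[1.1]]))
[0]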

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use KDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.
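
All of these options compute exact neighbors, so on data without distance ties the choice affects only speed and memory, not the result. A quick sanity check (an illustration, not from the original docs); note that, as the Warning section below explains, exact distance ties could still break differently:

>>> import numpy as np
>>> from sklearn.neighbors import KNeighborsClassifier
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(100, 2)
>>> y = (X[:, 0] > 0.5).astype(int)
>>> clfs = [KNeighborsClassifier(n_neighbors=5, algorithm=a).fit(X, y)
...         for a in ('brute', 'kd_tree', 'ball_tree')]
>>> all((clfs[0].predict(X) == c.predict(X)).all() for c in clfs)
True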

leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric : string or callable, default ‘minkowski’

The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
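
As an illustrative check (not from the original docs), metric='minkowski' with p=1 should give the same predictions as metric='manhattan', since both are the l1 distance:

>>> import numpy as np
>>> from sklearn.neighbors import KNeighborsClassifier
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(50, 3)
>>> y = rng.randint(0, 2, 50)
>>> a = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=1).fit(X, y)
>>> b = KNeighborsClassifier(n_neighbors=3, metric='manhattan').fit(X, y)
>>> bool((a.predict(X) == b.predict(X)).all())
True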

metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.

n_jobs : int or None, optional (default=None)

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect the fit method.

See also

RadiusNeighborsClassifier, KNeighborsRegressor, RadiusNeighborsRegressor, NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y) 
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.66666667 0.33333333]]

Methods

fit(X, y) Fit the model using X as training data and y as target values
get_params([deep]) Get parameters for this estimator.
kneighbors([X, n_neighbors, return_distance]) Finds the K-neighbors of a point.
kneighbors_graph([X, n_neighbors, mode]) Computes the (weighted) graph of k-Neighbors for points in X
predict(X) Predict the class labels for the provided data
predict_proba(X) Return probability estimates for the test data X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.

__init__(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)[source]

fit(X, y)[source]

Fit the model using X as training data and y as target values

Parameters:

X : {array-like, sparse matrix, BallTree, KDTree}

Training data. If array or matrix, shape [n_samples, n_features], or [n_samples, n_samples] if metric=’precomputed’.

y : {array-like, sparse matrix}

Target values of shape = [n_samples] or [n_samples, n_outputs]
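
Since X may be a precomputed distance matrix when metric=’precomputed’, here is a minimal sketch (an illustration, not from the original docs) that fits on a square pairwise-distance matrix and predicts from query-to-training distances:

>>> from sklearn.metrics import pairwise_distances
>>> from sklearn.neighbors import KNeighborsClassifier
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> D = pairwise_distances(X)  # shape (n_samples, n_samples)
>>> clf = KNeighborsClassifier(n_neighbors=3, metric='precomputed')
>>> clf.fit(D, y)
KNeighborsClassifier(...)
>>> print(clf.predict(pairwise_distances([[1.1]], X)))  # (n_query, n_samples)
[0]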

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

kneighbors(X=None, n_neighbors=None, return_distance=True)[source]

Finds the K-neighbors of a point. Returns indices of and distances to the neighbors of each point.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors to get (default is the value passed to the constructor).

return_distance : boolean, optional. Defaults to True.

If False, distances will not be returned

Returns:

dist : array

Array representing the lengths to points, only present if return_distance=True

ind : array

Indices of the nearest points in the population matrix.

Examples

In the following example, we construct a NearestNeighbors class from an array representing our data set and ask which point is closest to [1, 1, 1]:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]])) 
(array([[0.5]]), array([[2]]))

As you can see, it returns [[0.5]], and [[2]], which means that the element is at distance 0.5 and is the third element of samples (indexes start at 0). You can also query for multiple points:

>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False) 
array([[1],
       [2]]...)

kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')[source]

Computes the (weighted) graph of k-Neighbors for points in X

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors for each sample. (default is value passed to the constructor).

mode : {‘connectivity’, ‘distance’}, optional

Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, in ‘distance’ the edges are Euclidean distance between points.

Returns:

A : sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

n_samples_fit is the number of samples in the fitted data. A[i, j] is assigned the weight of the edge that connects i to j.

See also

NearestNeighbors.radius_neighbors_graph

Examples

>>> X = [[0], [3], [1]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=2)
>>> neigh.fit(X) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> A = neigh.kneighbors_graph(X)
>>> A.toarray()
array([[1., 0., 1.],
       [0., 1., 1.],
       [1., 0., 1.]])
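
Continuing the same example, mode=’distance’ stores the actual distances on the edges. The dense view below is what we would expect (an illustration, not from the original docs), with explicit zeros for each point’s self-edge:

>>> D = neigh.kneighbors_graph(X, mode='distance')
>>> D.toarray()
array([[0., 0., 1.],
       [0., 0., 2.],
       [1., 0., 0.]])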

predict(X)[source]

Predict the class labels for the provided data

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:

y : array of shape [n_samples] or [n_samples, n_outputs]

Class labels for each data sample.

predict_proba(X)[source]

Return probability estimates for the test data X.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs of such arrays if n_outputs > 1

The class probabilities of the input samples. Classes are ordered by lexicographic order.
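
The column order matches the classes_ attribute of the fitted classifier. Reusing neigh from the Examples section above (an illustration of the ordering, assuming the same fit):

>>> neigh.classes_
array([0, 1])
>>> neigh.predict_proba([[0.9]])
array([[0.66666667, 0.33333333]])

Here the first column is the probability of class 0 and the second that of class 1.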

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) wrt. y.
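
For instance, reusing neigh from the Examples section above (an illustration, not from the original docs):

>>> neigh.score([[0.5], [2.5]], [0, 1])
1.0

Both test points are predicted correctly (0 and 1 respectively), so the mean accuracy is 1.0.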

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self
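
A short sketch of the nested form (hypothetical step names, not from the original docs): inside a Pipeline, the classifier’s parameters are addressed with the <component>__<parameter> syntax:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neighbors import KNeighborsClassifier
>>> pipe = Pipeline([('scale', StandardScaler()),
...                  ('knn', KNeighborsClassifier())])
>>> pipe.set_params(knn__n_neighbors=7)
Pipeline(...)
>>> pipe.named_steps['knn'].n_neighbors
7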
