sklearn k-Nearest Neighbors: KNeighborsClassifier Parameters Explained

2023-11-18

[Original URL] https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)[source]

Classifier implementing the k-nearest neighbors vote.

Read more in the User Guide.

Parameters:

n_neighbors : int, optional (default = 5)

Number of neighbors to use by default for kneighbors queries.

weights : str or callable, optional (default = ‘uniform’)

Weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
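
A custom callable can implement any decay you like. Below is a minimal sketch (an illustration, not from the original docs) using a hypothetical inverse-square weighting; the small epsilon only guards against division by zero when a query point coincides with a training point:

>>> from sklearn.neighbors import KNeighborsClassifier
>>> def inverse_square(distances):
...     # hypothetical weighting: 1 / d^2, with an epsilon so that an
...     # exact match (distance 0) does not divide by zero
...     return 1.0 / (distances ** 2 + 1e-8)
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> clf = KNeighborsClassifier(n_neighbors=3, weights=inverse_square)
>>> clf.fit(X, y)
KNeighborsClassifier(...)
>>> print(clf.predict([[1.1]]))
[0]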

algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree
  • ‘kd_tree’ will use KDTree
  • ‘brute’ will use a brute-force search.
  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.
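
All of these options compute exact neighbors, so on data without distance ties the choice affects only speed and memory, not the result. A quick sanity check (an illustration, not from the original docs); note that, as the Warning section below explains, exact distance ties could still break differently:

>>> import numpy as np
>>> from sklearn.neighbors import KNeighborsClassifier
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(100, 2)
>>> y = (X[:, 0] > 0.5).astype(int)
>>> clfs = [KNeighborsClassifier(n_neighbors=5, algorithm=a).fit(X, y)
...         for a in ('brute', 'kd_tree', 'ball_tree')]
>>> all((clfs[0].predict(X) == c.predict(X)).all() for c in clfs)
True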

leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric : string or callable, default ‘minkowski’

The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
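
As an illustrative check (not from the original docs), metric='minkowski' with p=1 should give the same predictions as metric='manhattan', since both are the l1 distance:

>>> import numpy as np
>>> from sklearn.neighbors import KNeighborsClassifier
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(50, 3)
>>> y = rng.randint(0, 2, 50)
>>> a = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=1).fit(X, y)
>>> b = KNeighborsClassifier(n_neighbors=3, metric='manhattan').fit(X, y)
>>> bool((a.predict(X) == b.predict(X)).all())
True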

metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.

n_jobs : int or None, optional (default=None)

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect the fit method.

See also

RadiusNeighborsClassifier, KNeighborsRegressor, RadiusNeighborsRegressor, NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y) 
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.66666667 0.33333333]]

Methods

fit(X, y) Fit the model using X as training data and y as target values
get_params([deep]) Get parameters for this estimator.
kneighbors([X, n_neighbors, return_distance]) Finds the K-neighbors of a point.
kneighbors_graph([X, n_neighbors, mode]) Computes the (weighted) graph of k-Neighbors for points in X
predict(X) Predict the class labels for the provided data
predict_proba(X) Return probability estimates for the test data X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters of this estimator.

__init__(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)[source]

fit(X, y)[source]

Fit the model using X as training data and y as target values

Parameters:

X : {array-like, sparse matrix, BallTree, KDTree}

Training data. If array or matrix, shape [n_samples, n_features], or [n_samples, n_samples] if metric=’precomputed’.

y : {array-like, sparse matrix}

Target values of shape = [n_samples] or [n_samples, n_outputs]
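
Since X may be a precomputed distance matrix when metric=’precomputed’, here is a minimal sketch (an illustration, not from the original docs) that fits on a square pairwise-distance matrix and predicts from query-to-training distances:

>>> from sklearn.metrics import pairwise_distances
>>> from sklearn.neighbors import KNeighborsClassifier
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> D = pairwise_distances(X)  # shape (n_samples, n_samples)
>>> clf = KNeighborsClassifier(n_neighbors=3, metric='precomputed')
>>> clf.fit(D, y)
KNeighborsClassifier(...)
>>> print(clf.predict(pairwise_distances([[1.1]], X)))  # (n_query, n_samples)
[0]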

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

kneighbors(X=None, n_neighbors=None, return_distance=True)[source]

Finds the K-neighbors of a point. Returns indices of and distances to the neighbors of each point.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors to get (default is the value passed to the constructor).

return_distance : boolean, optional. Defaults to True.

If False, distances will not be returned

Returns:

dist : array

Array representing the lengths to points, only present if return_distance=True

ind : array

Indices of the nearest points in the population matrix.

Examples

In the following example, we construct a NearestNeighbors class from an array representing our data set and ask which point is closest to [1, 1, 1]:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]])) 
(array([[0.5]]), array([[2]]))

As you can see, it returns [[0.5]], and [[2]], which means that the element is at distance 0.5 and is the third element of samples (indexes start at 0). You can also query for multiple points:

>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False) 
array([[1],
       [2]]...)

kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')[source]

Computes the (weighted) graph of k-Neighbors for points in X

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors for each sample. (default is value passed to the constructor).

mode : {‘connectivity’, ‘distance’}, optional

Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, in ‘distance’ the edges are Euclidean distance between points.

Returns:

A : sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

n_samples_fit is the number of samples in the fitted data. A[i, j] is assigned the weight of the edge that connects i to j.

See also

NearestNeighbors.radius_neighbors_graph

Examples

>>> X = [[0], [3], [1]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=2)
>>> neigh.fit(X) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> A = neigh.kneighbors_graph(X)
>>> A.toarray()
array([[1., 0., 1.],
       [0., 1., 1.],
       [1., 0., 1.]])
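
Continuing the same example, mode=’distance’ stores the actual distances on the edges. The dense view below is what we would expect (an illustration, not from the original docs), with explicit zeros for each point’s self-edge:

>>> D = neigh.kneighbors_graph(X, mode='distance')
>>> D.toarray()
array([[0., 0., 1.],
       [0., 0., 2.],
       [1., 0., 0.]])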

predict(X)[source]

Predict the class labels for the provided data

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:

y : array of shape [n_samples] or [n_samples, n_outputs]

Class labels for each data sample.

predict_proba(X)[source]

Return probability estimates for the test data X.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs of such arrays if n_outputs > 1

The class probabilities of the input samples. Classes are ordered by lexicographic order.
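
The column order matches the classes_ attribute of the fitted classifier. Reusing neigh from the Examples section above (an illustration of the ordering, assuming the same fit):

>>> neigh.classes_
array([0, 1])
>>> neigh.predict_proba([[0.9]])
array([[0.66666667, 0.33333333]])

Here the first column is the probability of class 0 and the second that of class 1.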

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) wrt. y.
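
For instance, reusing neigh from the Examples section above (an illustration, not from the original docs):

>>> neigh.score([[0.5], [2.5]], [0, 1])
1.0

Both test points are predicted correctly (0 and 1 respectively), so the mean accuracy is 1.0.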

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self
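
A short sketch of the nested form (hypothetical step names, not from the original docs): inside a Pipeline, the classifier’s parameters are addressed with the <component>__<parameter> syntax:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.neighbors import KNeighborsClassifier
>>> pipe = Pipeline([('scale', StandardScaler()),
...                  ('knn', KNeighborsClassifier())])
>>> pipe.set_params(knn__n_neighbors=7)
Pipeline(...)
>>> pipe.named_steps['knn'].n_neighbors
7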
