Getting the decision path of a node in sklearn

2023-11-21

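I want the decision path (i.e. the set of rules) from the root node to a given node (which I supply) in a scikit-learn decision tree (DecisionTreeClassifier). clf.decision_path gives the nodes a sample passes through, which may help in getting the set of rules followed by a sample, but how do I get the set of rules leading up to a particular node in the tree?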

To view the decision rules of the nodes, using the `iris` dataset:

from sklearn.datasets import load_iris
from sklearn import tree
import graphviz 

iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

dot_data = tree.export_graphviz(clf, out_file=None, 
                                feature_names=iris.feature_names,  
                                class_names=iris.target_names,  
                                filled=True, rounded=True,  
                                special_characters=True)  
graph = graphviz.Source(dot_data)  
# This writes an iris.pdf file showing the tree with the rule at each node
graph.render("iris")
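If Graphviz is not installed, the same rules can be read as plain text. The following is a minimal sketch using `sklearn.tree.export_text`, which is not part of the original answer; `random_state=0` and the variable name `rules_text` are assumptions added here for reproducibility and illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# random_state is fixed only so that repeated runs give the same tree (assumption).
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# export_text returns the whole tree as an indented string, one test per line.
rules_text = export_text(clf, feature_names=list(iris.feature_names))
print(rules_text)
```

Each indentation level in the printed text corresponds to one step down the tree, so the rules for a node are the tests on the lines above it at shallower indentation.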

For a path based on a sample, use:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
estimator.fit(X_train, y_train)

# The decision estimator has an attribute called tree_  which stores the entire
# tree structure and allows access to low level attributes. The binary tree
# tree_ is represented as a number of parallel arrays. The i-th element of each
# array holds information about the node `i`. Node 0 is the tree's root. NOTE:
# Some of the arrays only apply to either leaves or split nodes, resp. In this
# case the values of nodes of the other type are arbitrary!
#
# Among those arrays, we have:
#   - left_child, id of the left child of the node
#   - right_child, id of the right child of the node
#   - feature, feature used for splitting the node
#   - threshold, threshold value at the node

n_nodes = estimator.tree_.node_count
children_left = estimator.tree_.children_left
children_right = estimator.tree_.children_right
feature = estimator.tree_.feature
threshold = estimator.tree_.threshold

# The tree structure can be traversed to compute various properties such
# as the depth of each node and whether or not it is a leaf.
node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
is_leaves = np.zeros(shape=n_nodes, dtype=bool)
stack = [(0, -1)]  # seed is the root node id and its parent depth
while len(stack) > 0:
    node_id, parent_depth = stack.pop()
    node_depth[node_id] = parent_depth + 1

    # If we have a test node
    if (children_left[node_id] != children_right[node_id]):
        stack.append((children_left[node_id], parent_depth + 1))
        stack.append((children_right[node_id], parent_depth + 1))
    else:
        is_leaves[node_id] = True

print("The binary tree structure has %s nodes and has "
      "the following tree structure:"
      % n_nodes)
for i in range(n_nodes):
    if is_leaves[i]:
        print("%snode=%s leaf node." % (node_depth[i] * "\t", i))
    else:
        print("%snode=%s test node: go to node %s if X[:, %s] <= %s else to "
              "node %s."
              % (node_depth[i] * "\t",
                 i,
                 children_left[i],
                 feature[i],
                 threshold[i],
                 children_right[i],
                 ))
print()

# First let's retrieve the decision path of each sample. The decision_path
# method allows to retrieve the node indicator functions. A non zero element of
# indicator matrix at the position (i, j) indicates that the sample i goes
# through the node j.

node_indicator = estimator.decision_path(X_test)

# Similarly, we can also have the leaves ids reached by each sample.

leave_id = estimator.apply(X_test)

# Now, it's possible to get the tests that were used to predict a sample or
# a group of samples. First, let's make it for the sample.

# Print the tests along the path followed by a single sample.
sample_id = 0
node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]

print('Rules used to predict sample %s: ' % sample_id)
for node_id in node_index:

    if leave_id[sample_id] == node_id:
        # The last node on the path is the leaf; it carries no test.
        print("leaf node {} reached, no decision here".format(leave_id[sample_id]))

    else:
        # Decision node: record which side of the threshold the sample falls on.
        if (X_test[sample_id, feature[node_id]] <= threshold[node_id]):
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("decision id node %s : (X[%s, %s] (= %s) %s %s)"
              % (node_id,
                 sample_id,
                 feature[node_id],
                 X_test[sample_id, feature[node_id]],
                 threshold_sign,
                 threshold[node_id]))

This prints the following at the end:

Rules used to predict sample 0: 
decision id node 0 : (X[0, 3] (= 2.4) > 0.800000011920929)
decision id node 2 : (X[0, 2] (= 5.1) > 4.949999809265137)
leaf node 4 reached, no decision here
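The code above follows a sample, whereas the question asks for the rules leading to a node chosen by id. One possible approach, sketched here and not taken from the original answer, is to record each node's parent from `children_left` and `children_right` and then walk from the target node back up to the root. The helper name `rules_to_node` is hypothetical, and the sketch assumes the same `estimator` settings as above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def rules_to_node(tree, target_node):
    """Hypothetical helper: list the tests on the path from the root to target_node.

    Each entry is (parent_node_id, feature_index, sign, threshold).
    """
    if not 0 <= target_node < tree.node_count:
        raise ValueError("target_node must be a valid node id")

    # Map every child to its parent and to the branch taken to reach it.
    parent = {}
    for node_id in range(tree.node_count):
        left = int(tree.children_left[node_id])
        right = int(tree.children_right[node_id])
        if left != right:  # split node; leaves have both children set to -1
            parent[left] = (node_id, "<=")
            parent[right] = (node_id, ">")

    # Walk upward from the target to the root (node 0), then reverse the list.
    rules = []
    node_id = int(target_node)
    while node_id != 0:
        parent_id, sign = parent[node_id]
        rules.append((parent_id,
                      int(tree.feature[parent_id]),
                      sign,
                      float(tree.threshold[parent_id])))
        node_id = parent_id
    return rules[::-1]


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
estimator = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
estimator.fit(X_train, y_train)

target = 4  # any node id in range(estimator.tree_.node_count)
print("Rules leading to node %s:" % target)
for parent_id, feat, sign, thr in rules_to_node(estimator.tree_, target):
    print("node %s : X[:, %s] %s %s" % (parent_id, feat, sign, thr))
```

Because the walk starts from the node id, it works for internal nodes as well as leaves and needs no sample that happens to reach the node. The root returns an empty list, since no test precedes it.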
