【TensorFlow】"No OpKernel was registered to support Op 'NcclAllReduce'" error

2023-05-16

Problem:

When using tf.distribute.MirroredStrategy on Windows with multiple GPUs, the error "No OpKernel was registered to support Op 'NcclAllReduce'" appears (the same code works fine on Linux). The full error message is:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node training/Adam/NcclAllReduce}}with these attrs: [reduction="sum", T=DT_FLOAT, num_devices=2, shared_name="c0"]
Registered devices: [CPU, GPU]
Registered kernels:
  <no registered kernels>

	 [[training/Adam/NcclAllReduce]] [Op:__inference_keras_scratch_graph_2200]

Cause:

tf.distribute.MirroredStrategy uses NCCL for multi-GPU communication by default, but the official NCCL builds do not support Windows. You can either look for an unofficial Windows build of NCCL or switch away from NCCL.
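For illustration, here is a minimal sketch of the kind of TF2 code that triggers the error on a multi-GPU Windows machine; the toy model and data are placeholders, not from the original report:

import numpy as np
import tensorflow as tf

# With two or more visible GPUs, the default cross-device op of
# MirroredStrategy is NCCL all-reduce, which has no Windows kernel.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1)  # fails here on Windows multi-GPU with the NcclAllReduce error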

Solution:

For TF1:

NCCL is only useful if there are GPU-to-GPU connections available in your setup. Is that the case? If yes, you could try finding an NCCL binary for Windows.

If not, then it would be better to try some of the non-NCCL options. To get those, try the following:

Option 1:
Try using hierarchical copy.

cross_tower_ops = tf.contrib.distribute.AllReduceCrossTowerOps(
    'hierarchical_copy', num_packs=num_gpus)  # num_gpus: number of GPUs in your setup
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)

Option 2:
Reduce to first GPU:

cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps()
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)

Option 3:
Reduce to CPU:

cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps(
    reduce_to_device="/device:CPU:0")
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)

You will have to try out these approaches and see which one works and gives the best performance for your use case.
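For context, here is a rough sketch of how one of these strategies could be wired into a TF 1.x training run via the Estimator API; the model_fn and input_fn are hypothetical placeholders, assuming a TF 1.x environment where tf.contrib is still available:

import tensorflow as tf  # TF 1.x

def my_model_fn(features, labels, mode):
    # Trivial linear model, for illustration only.
    preds = tf.layers.dense(features, 1)
    loss = tf.losses.mean_squared_error(labels, preds)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def my_input_fn():
    # Random toy data standing in for a real input pipeline.
    x = tf.random_uniform([32, 4])
    y = tf.random_uniform([32, 1])
    return tf.data.Dataset.from_tensors((x, y)).repeat(10)

num_gpus = 2
cross_tower_ops = tf.contrib.distribute.AllReduceCrossTowerOps(
    'hierarchical_copy', num_packs=num_gpus)
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)

# The strategy is handed to the Estimator through RunConfig.
run_config = tf.estimator.RunConfig(train_distribute=strategy)
estimator = tf.estimator.Estimator(model_fn=my_model_fn, config=run_config)
estimator.train(input_fn=my_input_fn, max_steps=10)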

@yuefengz - for use cases like this, perhaps we should detect if nccl is not available, give a warning, and default to something else that will work for sure?

Reference: NCCL is not supported on Windows · Issue #21470 · tensorflow/tensorflow (github.com)

For TF2:

In TF2, cross_tower_ops was renamed to cross_device_ops (see the reference link).

As a result, Option 1 from the solution above fails with a module-not-found error under TF2 and needs to be updated to:

cross_device_ops = tf.distribute.HierarchicalCopyAllReduce()
strategy = tf.distribute.MirroredStrategy(cross_device_ops=cross_device_ops)
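Options 2 and 3 above also have TF2 counterparts via tf.distribute.ReductionToOneDevice; a brief sketch, with the reduce device chosen purely as an example:

import tensorflow as tf

# TF2 counterpart of Option 2: reduce on a single device (the first destination by default).
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())

# TF2 counterpart of Option 3: reduce on the CPU instead.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice(reduce_to_device="/device:CPU:0"))

As with TF1, which option performs best depends on the hardware, so it is worth benchmarking these against HierarchicalCopyAllReduce.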

 
