tensorflow-gpu fails with "Blas GEMM launch failed"

2024-02-06

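I installed tensorflow-gpu to run my TensorFlow code on the GPU, but I can't get it to run. It keeps failing with the error in the title. Below is my sample code, followed by the error stack trace: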

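import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (placeholders and sessions)


def check(W, X):
    # Multiply the 2x3 weight matrix by the 3x2 input.
    return tf.matmul(W, X)


def main():
    W = tf.Variable(tf.truncated_normal([2, 3], stddev=0.01))
    X = tf.placeholder(tf.float32, [3, 2])
    check_handle = check(W, X)
    with tf.Session() as sess:
        # tf.initialize_all_variables() is deprecated in favor of this call.
        sess.run(tf.global_variables_initializer())
        num = sess.run(
            check_handle,
            feed_dict={X: np.reshape(np.arange(6, dtype=np.float32), (3, 2))})
        print(num)


if __name__ == '__main__':
    main()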

My GPU is a pretty good GeForce GTX 1080 Ti with 11 GB of VRAM, and nothing else significant is running on it (only Chrome), as you can see in the nvidia-smi output:

Fri Aug  4 16:34:49 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:07:00.0      On |                  N/A |
| 30%   55C    P0    79W / 250W |    711MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7650    G   /usr/lib/xorg/Xorg                             380MiB |
|    0      8233    G   compiz                                         192MiB |
|    0     24226    G   ...el-token=963C169BB38ADFD67B444D57A299CE0A   136MiB |
+-----------------------------------------------------------------------------+

Here is the error stack trace:

2017-08-04 15:44:21.585091: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585110: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585114: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585118: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585122: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.853700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:07:00.0
Total memory: 10.91GiB
Free memory: 9.89GiB
2017-08-04 15:44:21.853724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-04 15:44:21.853728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-04 15:44:21.853734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:07:00.0)
2017-08-04 15:44:24.948616: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-08-04 15:44:24.948640: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
2017-08-04 15:44:24.948805: W tensorflow/core/framework/op_kernel.cc:1158] Internal: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
     [[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
Traceback (most recent call last):
  File "test.py", line 51, in <module>
    _, loss_out, res_out = sess.run([train_op, loss, res], feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
     [[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
     [[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'layer1/MatMul', defined at:
  File "test.py", line 18, in <module>
    pre_activation = tf.matmul(input_ph, weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1816, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1217, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
     [[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
     [[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Apart from this, my previous TensorFlow CPU installation worked fine. Any help is appreciated. Thanks!

Note: I installed cuda-8.0 and cudnn-5.1 and added their paths to my bashrc file.
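As a quick sanity check that the paths exported in bashrc are actually visible to the Python process, something like the following could be used. It is only a sketch: the helper name `cuda_entries` is hypothetical, and it assumes the CUDA libraries are exposed through `LD_LIBRARY_PATH` with "cuda" somewhere in the directory name (for example /usr/local/cuda-8.0/lib64), which may not match every setup.

```python
import os


def cuda_entries(env=None):
    """Return the LD_LIBRARY_PATH entries whose path mentions 'cuda'.

    Hypothetical helper; assumes CUDA/cuDNN directories contain the
    substring 'cuda' (e.g. /usr/local/cuda-8.0/lib64).
    """
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    # Split on the platform path separator and drop empty entries.
    entries = [p for p in raw.split(os.pathsep) if p]
    return [p for p in entries if "cuda" in p.lower()]


if __name__ == "__main__":
    found = cuda_entries()
    if found:
        print("CUDA-related library paths:", found)
    else:
        print("No CUDA-related entries in LD_LIBRARY_PATH")
```

An empty result would suggest that the shell running the script did not source the bashrc changes, though it says nothing about whether the libraries themselves are intact.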

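I had a very similar problem. For me it coincided with an NVIDIA driver update, so I assumed the driver was at fault. But changing the driver had no effect. What finally worked for me was clearing the NVIDIA cache: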

sudo rm -rf ~/.nv/

I found this suggestion on the NVIDIA developer forum: https://devtalk.nvidia.com/default/topic/1007071/cuda-setup-and-installation/cuda-error-when-running-matrixmulcublas-sample-ubuntu-16-04/post/5169223/

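I suspect that during the driver update some compiled files from the old version were left behind and were incompatible, or were even corrupted in the process. Assumptions aside, this solved the problem for me.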
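The same cleanup can be scripted, for instance to run it before retrying the failing program. This is a minimal sketch and not something from the forum post: the function name `clear_nv_cache` is hypothetical, and it assumes the cache lives in `.nv` under the user's home directory, as in the command above. If the directory is owned by root, the script would still need elevated privileges.

```python
import os
import shutil


def clear_nv_cache(home=None):
    """Delete the .nv directory under `home`; return True if it existed."""
    # Default to the current user's home directory (the "~" in ~/.nv/).
    home = os.path.expanduser("~") if home is None else home
    cache_dir = os.path.join(home, ".nv")
    if not os.path.isdir(cache_dir):
        return False
    # Remove the directory and everything inside it, like `rm -rf`.
    shutil.rmtree(cache_dir)
    return True
```

Calling `clear_nv_cache()` with no argument targets the current user's home directory; passing an explicit path makes it easy to try the function on a scratch directory first.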
