Cuda: Calling cudaGetDeviceProperties and Using the cudaDeviceProp Struct

2023-10-26

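First, the cudaGetDeviceCount function:

int count = 0;
cudaError_t err = cudaGetDeviceCount(&count);

It returns, through count, the number of devices that support CUDA programming. If err is not cudaSuccess or count is 0, CUDA initialization has failed and no device on this machine can be used for CUDA programming.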
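A minimal sketch of a more defensive version of this check, assuming a C++ host program built with nvcc. The helper name usableDeviceCount is hypothetical. It treats both a non-cudaSuccess return code and a zero count as "no usable GPU":

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of usable CUDA devices, or 0 if the runtime
// reports an error (for example no device or an insufficient driver).
int usableDeviceCount()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }
    return count;
}

int main()
{
    int count = usableDeviceCount();
    if (count == 0) {
        // Nothing to run on: report it instead of launching kernels.
        std::printf("No CUDA-capable device found.\n");
    } else {
        std::printf("Found %d CUDA device(s).\n", count);
    }
    return 0;
}
```

Checking the return code matters because on a machine without a driver the call fails outright, and the error string from cudaGetErrorString usually says why.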

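When initializing CUDA, you usually need to find out which GPU devices the system contains and what they are capable of. The properties of a device are queried with cudaGetDeviceProperties(), declared as:

cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp *prop, int device);

The function fills *prop with the properties of device number device, where device ranges from 0 to count - 1. It returns cudaSuccess on success, or an error such as cudaErrorInvalidDevice if device is out of range.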
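As a minimal sketch (the helper name printDeviceSummaries is hypothetical), the two calls are typically combined in a loop: count the devices, then query and print a few properties of each one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a one-line summary for every CUDA device and
// returns how many devices were queried successfully.
int printDeviceSummaries()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;  // no driver or no device: nothing to list
    }
    int queried = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        // major.minor is the compute capability; totalGlobalMem is in bytes.
        std::printf("device %d: %s, compute capability %d.%d, %zu MiB global memory\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024));
        ++queried;
    }
    return queried;
}

int main()
{
    int n = printDeviceSummaries();
    std::printf("%d device(s) queried\n", n);
    return 0;
}
```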

The cudaDeviceProp struct contains a series of device attributes. In CUDA 11.1 it is defined as follows:

struct __device_builtin__ cudaDeviceProp
{
    char         name[256];                  /**< ASCII string identifying device */
    cudaUUID_t   uuid;                       /**< 16-byte unique identifier */
    char         luid[8];                    /**< 8-byte locally unique identifier. Value is undefined on TCC and non-Windows platforms */
    unsigned int luidDeviceNodeMask;         /**< LUID device node mask. Value is undefined on TCC and non-Windows platforms */
    size_t       totalGlobalMem;             /**< Global memory available on device in bytes */
    size_t       sharedMemPerBlock;          /**< Shared memory available per block in bytes */
    int          regsPerBlock;               /**< 32-bit registers available per block */
    int          warpSize;                   /**< Warp size in threads */
    size_t       memPitch;                   /**< Maximum pitch in bytes allowed by memory copies */
    int          maxThreadsPerBlock;         /**< Maximum number of threads per block */
    int          maxThreadsDim[3];           /**< Maximum size of each dimension of a block */
    int          maxGridSize[3];             /**< Maximum size of each dimension of a grid */
    int          clockRate;                  /**< Clock frequency in kilohertz */
    size_t       totalConstMem;              /**< Constant memory available on device in bytes */
    int          major;                      /**< Major compute capability */
    int          minor;                      /**< Minor compute capability */
    size_t       textureAlignment;           /**< Alignment requirement for textures */
    size_t       texturePitchAlignment;      /**< Pitch alignment requirement for texture references bound to pitched memory */
    int          deviceOverlap;              /**< Device can concurrently copy memory and execute a kernel. Deprecated. Use instead asyncEngineCount. */
    int          multiProcessorCount;        /**< Number of multiprocessors on device */
    int          kernelExecTimeoutEnabled;   /**< Specified whether there is a run time limit on kernels */
    int          integrated;                 /**< Device is integrated as opposed to discrete */
    int          canMapHostMemory;           /**< Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer */
    int          computeMode;                /**< Compute mode (See ::cudaComputeMode) */
    int          maxTexture1D;               /**< Maximum 1D texture size */
    int          maxTexture1DMipmap;         /**< Maximum 1D mipmapped texture size */
    int          maxTexture1DLinear;         /**< Maximum size for 1D textures bound to linear memory */
    int          maxTexture2D[2];            /**< Maximum 2D texture dimensions */
    int          maxTexture2DMipmap[2];      /**< Maximum 2D mipmapped texture dimensions */
    int          maxTexture2DLinear[3];      /**< Maximum dimensions (width, height, pitch) for 2D textures bound to pitched memory */
    int          maxTexture2DGather[2];      /**< Maximum 2D texture dimensions if texture gather operations have to be performed */
    int          maxTexture3D[3];            /**< Maximum 3D texture dimensions */
    int          maxTexture3DAlt[3];         /**< Maximum alternate 3D texture dimensions */
    int          maxTextureCubemap;          /**< Maximum Cubemap texture dimensions */
    int          maxTexture1DLayered[2];     /**< Maximum 1D layered texture dimensions */
    int          maxTexture2DLayered[3];     /**< Maximum 2D layered texture dimensions */
    int          maxTextureCubemapLayered[2];/**< Maximum Cubemap layered texture dimensions */
    int          maxSurface1D;               /**< Maximum 1D surface size */
    int          maxSurface2D[2];            /**< Maximum 2D surface dimensions */
    int          maxSurface3D[3];            /**< Maximum 3D surface dimensions */
    int          maxSurface1DLayered[2];     /**< Maximum 1D layered surface dimensions */
    int          maxSurface2DLayered[3];     /**< Maximum 2D layered surface dimensions */
    int          maxSurfaceCubemap;          /**< Maximum Cubemap surface dimensions */
    int          maxSurfaceCubemapLayered[2];/**< Maximum Cubemap layered surface dimensions */
    size_t       surfaceAlignment;           /**< Alignment requirements for surfaces */
    int          concurrentKernels;          /**< Device can possibly execute multiple kernels concurrently */
    int          ECCEnabled;                 /**< Device has ECC support enabled */
    int          pciBusID;                   /**< PCI bus ID of the device */
    int          pciDeviceID;                /**< PCI device ID of the device */
    int          pciDomainID;                /**< PCI domain ID of the device */
    int          tccDriver;                  /**< 1 if device is a Tesla device using TCC driver, 0 otherwise */
    int          asyncEngineCount;           /**< Number of asynchronous engines */
    int          unifiedAddressing;          /**< Device shares a unified address space with the host */
    int          memoryClockRate;            /**< Peak memory clock frequency in kilohertz */
    int          memoryBusWidth;             /**< Global memory bus width in bits */
    int          l2CacheSize;                /**< Size of L2 cache in bytes */
    int          persistingL2CacheMaxSize;   /**< Device's maximum l2 persisting lines capacity setting in bytes */
    int          maxThreadsPerMultiProcessor;/**< Maximum resident threads per multiprocessor */
    int          streamPrioritiesSupported;  /**< Device supports stream priorities */
    int          globalL1CacheSupported;     /**< Device supports caching globals in L1 */
    int          localL1CacheSupported;      /**< Device supports caching locals in L1 */
    size_t       sharedMemPerMultiprocessor; /**< Shared memory available per multiprocessor in bytes */
    int          regsPerMultiprocessor;      /**< 32-bit registers available per multiprocessor */
    int          managedMemory;              /**< Device supports allocating managed memory on this system */
    int          isMultiGpuBoard;            /**< Device is on a multi-GPU board */
    int          multiGpuBoardGroupID;       /**< Unique identifier for a group of devices on the same multi-GPU board */
    int          hostNativeAtomicSupported;  /**< Link between the device and the host supports native atomic operations */
    int          singleToDoublePrecisionPerfRatio; /**< Ratio of single precision performance (in floating-point operations per second) to double precision performance */
    int          pageableMemoryAccess;       /**< Device supports coherently accessing pageable memory without calling cudaHostRegister on it */
    int          concurrentManagedAccess;    /**< Device can coherently access managed memory concurrently with the CPU */
    int          computePreemptionSupported; /**< Device supports Compute Preemption */
    int          canUseHostPointerForRegisteredMem; /**< Device can access host registered memory at the same virtual address as the CPU */
    int          cooperativeLaunch;          /**< Device supports launching cooperative kernels via ::cudaLaunchCooperativeKernel */
    int          cooperativeMultiDeviceLaunch; /**< Device can participate in cooperative kernels launched via ::cudaLaunchCooperativeKernelMultiDevice */
    size_t       sharedMemPerBlockOptin;     /**< Per device maximum shared memory per block usable by special opt in */
    int          pageableMemoryAccessUsesHostPageTables; /**< Device accesses pageable memory via the host's page tables */
    int          directManagedMemAccessFromHost; /**< Host can directly access managed memory on the device without migration. */
    int          maxBlocksPerMultiProcessor; /**< Maximum number of resident blocks per multiprocessor */
    int          accessPolicyMaxWindowSize;  /**< The maximum value of ::cudaAccessPolicyWindow::num_bytes. */
    size_t       reservedSharedMemPerBlock;  /**< Shared memory reserved by CUDA driver per block in bytes */
};
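To show how a few of these fields combine, here is a minimal sketch that derives two commonly quoted figures. The helper names are hypothetical, and the bandwidth formula assumes double-data-rate memory (two transfers per memory clock), which may not hold for every memory type. It also assumes a toolkit whose cudaDeviceProp still has memoryClockRate, as in the CUDA 11.1 layout above; that field may be deprecated or removed in newer toolkits, where cudaDeviceGetAttribute with cudaDevAttrMemoryClockRate is an alternative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: theoretical peak memory bandwidth in GB/s.
// memoryClockRate is in kHz and memoryBusWidth in bits (see the struct
// comments above); the factor 2.0 assumes double data rate memory.
double peakBandwidthGBs(int memoryClockRateKHz, int memoryBusWidthBits)
{
    double bytesPerTransfer = memoryBusWidthBits / 8.0;
    return 2.0 * memoryClockRateKHz * 1e3 * bytesPerTransfer / 1e9;
}

// Hypothetical helper: upper bound on threads resident on the whole device.
long long maxResidentThreads(int multiProcessorCount, int maxThreadsPerMultiProcessor)
{
    return static_cast<long long>(multiProcessorCount) * maxThreadsPerMultiProcessor;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found.\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices that cannot be queried
        }
        std::printf("%s: %d SMs, up to %lld resident threads, ~%.1f GB/s peak memory bandwidth\n",
                    prop.name, prop.multiProcessorCount,
                    maxResidentThreads(prop.multiProcessorCount,
                                       prop.maxThreadsPerMultiProcessor),
                    peakBandwidthGBs(prop.memoryClockRate, prop.memoryBusWidth));
    }
    return 0;
}
```

Keeping the arithmetic in plain functions makes it easy to check the units independently of any GPU.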

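Explanations of the main fields (a subset of the struct above):

struct cudaDeviceProp {
    char name[256]; // ASCII string identifying the device (e.g. "GeForce GTX 940M")
    size_t totalGlobalMem; // global memory available on the device, in bytes
    size_t sharedMemPerBlock; // shared memory available per block, in bytes
    int regsPerBlock; // number of 32-bit registers available per block
    int warpSize; // warp size, in threads
    size_t memPitch; // maximum pitch in bytes allowed by memory copies (e.g. cudaMemcpy2D)
    int maxThreadsPerBlock; // maximum number of threads per block
    int maxThreadsDim[3]; // maximum number of threads in each dimension of a block
    int maxGridSize[3]; // maximum number of blocks in each dimension of a grid
    size_t totalConstMem; // constant memory available on the device, in bytes
    int major; // major revision number of the device's compute capability
    int minor; // minor revision number of the device's compute capability
    int clockRate; // clock frequency, in kHz
    size_t textureAlignment; // alignment requirement for textures
    int deviceOverlap; // boolean: device can run cudaMemcpy() concurrently with kernel execution (deprecated; use asyncEngineCount)
    int multiProcessorCount; // number of streaming multiprocessors (SMs) on the device
    int kernelExecTimeoutEnabled; // boolean: kernels executed on this device are subject to a run-time limit
    int integrated; // boolean: device is an integrated GPU (part of the chipset, not a discrete card)
    int canMapHostMemory; // boolean: device can map host memory into the CUDA address space (cudaHostAlloc/cudaHostGetDevicePointer)
    int computeMode; // compute mode of the device: default, exclusive, exclusive-process, or prohibited (see cudaComputeMode)
    int maxTexture1D; // maximum 1D texture size
    int maxTexture2D[2]; // maximum 2D texture dimensions
    int maxTexture3D[3]; // maximum 3D texture dimensions
    int maxTexture2DLayered[3]; // maximum 2D layered texture dimensions (called maxTexture2DArray in older CUDA versions)
    int concurrentKernels; // boolean: device can possibly execute multiple kernels concurrently within the same context
};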
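The per-block and per-grid limits summarized above are what a kernel launch configuration must respect. Below is a minimal sketch of a pre-launch check; launchFits is a hypothetical helper, and it only compares against maxThreadsPerBlock, maxThreadsDim, maxGridSize and sharedMemPerBlock. It ignores register usage and static shared memory, and kernels that opt in to more shared memory are governed by sharedMemPerBlockOptin instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a launch with this grid/block shape and
// dynamic shared memory size stays within the limits reported in prop.
bool launchFits(const cudaDeviceProp &prop, dim3 grid, dim3 block, size_t sharedBytes)
{
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    if (threads == 0 || threads > static_cast<unsigned long long>(prop.maxThreadsPerBlock))
        return false;
    // Each block dimension has its own limit (maxThreadsDim).
    if (block.x > static_cast<unsigned>(prop.maxThreadsDim[0]) ||
        block.y > static_cast<unsigned>(prop.maxThreadsDim[1]) ||
        block.z > static_cast<unsigned>(prop.maxThreadsDim[2]))
        return false;
    // Each grid dimension must be non-empty and within maxGridSize.
    if (grid.x == 0 || grid.y == 0 || grid.z == 0)
        return false;
    if (grid.x > static_cast<unsigned>(prop.maxGridSize[0]) ||
        grid.y > static_cast<unsigned>(prop.maxGridSize[1]) ||
        grid.z > static_cast<unsigned>(prop.maxGridSize[2]))
        return false;
    return sharedBytes <= prop.sharedMemPerBlock;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device; nothing to check.\n");
        return 0;
    }
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        return 1;
    }
    dim3 grid(1024), block(256);
    std::printf("grid(1024) x block(256) on %s: %s\n", prop.name,
                launchFits(prop, grid, block, 0) ? "fits" : "exceeds limits");
    return 0;
}
```

Because launchFits only reads the struct, it can be exercised with hand-filled cudaDeviceProp values on a machine without a GPU.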
