TensorRT: Buffer Management (samplesCommon::BufferManager)

2023-05-16

The BufferManager class handles host and device buffer allocation and deallocation.

This RAII class handles host and device buffer allocation and deallocation, memcpy between host and device buffers to aid with inference, and debugging dumps to validate inference. BufferManager is meant to simplify buffer management and the interactions between buffers and the engine.

Code location: TensorRT\samples\common\buffers.h

Collaboration diagram


Public Members

BufferManager

The BufferManager constructor allocates host and device buffer memory for every binding and creates a BufferManager to handle buffer interactions with the engine. The implementation:

    BufferManager(std::shared_ptr<nvinfer1::ICudaEngine> engine, const int batchSize = 0,
        const nvinfer1::IExecutionContext* context = nullptr)
        : mEngine(engine), mBatchSize(batchSize)
    {
        // Query whether the engine was built with an implicit batch dimension: true if its
        // tensors have an implicit batch dimension, false otherwise (all tensors in an engine
        // either have one or none do).
        // hasImplicitBatchDimension() returns true only when the INetworkDefinition this engine
        // was built from was created via createNetworkV2 without the
        // NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag.
        assert(engine->hasImplicitBatchDimension() || mBatchSize == 0);
        // Create host and device buffers
        for (int i = 0; i < mEngine->getNbBindings(); i++)
        {
            auto dims = context ? context->getBindingDimensions(i) : mEngine->getBindingDimensions(i);
            // With an explicit-batch engine (a context is given) or no legacy batch size,
            // start the element count at 1; otherwise start at the legacy batch size.
            size_t vol = context || !mBatchSize ? 1 : static_cast<size_t>(mBatchSize);
            nvinfer1::DataType type = mEngine->getBindingDataType(i);
            int vecDim = mEngine->getBindingVectorizedDim(i);
            // Vectorized formats pack several scalars per element along one dimension.
            if (-1 != vecDim) // i.e., 0 != lgScalarsPerVector
            {   
                int scalarsPerVec = mEngine->getBindingComponentsPerElement(i);
                // divUp(a, b) = (a + b - 1) / b, i.e. round up to a whole number of vectors.
                dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);
                vol *= scalarsPerVec;
            }
            vol *= samplesCommon::volume(dims);
            std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
            // Allocate device and host memory for this binding's buffer.
            manBuf->deviceBuffer = DeviceBuffer(vol, type);
            manBuf->hostBuffer = HostBuffer(vol, type);
            mDeviceBindings.emplace_back(manBuf->deviceBuffer.data());
            // std::move transfers ownership into the vector instead of copying;
            // only the internal pointer moves, no buffer memory is copied.
            mManagedBuffers.emplace_back(std::move(manBuf));
        }
    }

getDeviceBindings

Returns a vector of device buffers that can be used directly as bindings for the execute and enqueue methods of IExecutionContext.

    //!
    //! \brief Returns a vector of device buffers.
    //!
    const std::vector<void*>& getDeviceBindings() const
    {
        return mDeviceBindings;
    }


getDeviceBuffer

Returns the device buffer corresponding to tensorName, or nullptr if no such tensor can be found.

    void* getDeviceBuffer(const std::string& tensorName) const
    {
        return getBuffer(false, tensorName);
    }

getHostBuffer

Returns the host buffer corresponding to tensorName, or nullptr if no such tensor can be found.

    //!
    //! \brief Returns the host buffer corresponding to tensorName.
    //!        Returns nullptr if no such tensor can be found.
    //!
    void* getHostBuffer(const std::string& tensorName) const
    {
        return getBuffer(true, tensorName);
    }

size

Returns the size of the host and device buffers corresponding to tensorName, or kINVALID_SIZE_VALUE if no such tensor can be found.

    size_t size(const std::string& tensorName) const
    {
        int index = mEngine->getBindingIndex(tensorName.c_str());
        if (index == -1)
            return kINVALID_SIZE_VALUE;
        return mManagedBuffers[index]->hostBuffer.nbBytes();
    }

print

A templated print function that dumps buffers of arbitrary type to a std::ostream. The rowCount parameter controls how many elements appear on each line; a rowCount of 1 puts one element per line.

    template <typename T>
    void print(std::ostream& os, void* buf, size_t bufSize, size_t rowCount)
    {
        assert(rowCount != 0);
        assert(bufSize % sizeof(T) == 0);
        T* typedBuf = static_cast<T*>(buf);
        size_t numItems = bufSize / sizeof(T);
        for (int i = 0; i < static_cast<int>(numItems); i++)
        {
            // Handle rowCount == 1 case
            if (rowCount == 1 && i != static_cast<int>(numItems) - 1)
                os << typedBuf[i] << std::endl;
            else if (rowCount == 1)
                os << typedBuf[i];
            // Handle rowCount > 1 case
            else if (i % rowCount == 0)
                os << typedBuf[i];
            else if (i % rowCount == rowCount - 1)
                os << " " << typedBuf[i] << std::endl;
            else
                os << " " << typedBuf[i];
        }
    }

copyInputToDevice

Synchronously copies the contents of the input host buffers to the input device buffers.

    void copyInputToDevice()
    {
        memcpyBuffers(true, false, false);
    }

copyOutputToHost

Synchronously copies the contents of the output device buffers to the output host buffers.

    void copyOutputToHost()
    {
        memcpyBuffers(false, true, false);
    }

copyInputToDeviceAsync

Asynchronously copies the contents of the input host buffers to the input device buffers on the given CUDA stream.

    void copyInputToDeviceAsync(const cudaStream_t& stream = 0)
    {
        memcpyBuffers(true, false, true, stream);
    }

copyOutputToHostAsync

Asynchronously copies the contents of the output device buffers to the output host buffers on the given CUDA stream.

    void copyOutputToHostAsync(const cudaStream_t& stream = 0)
    {
        memcpyBuffers(false, true, true, stream);
    }

~BufferManager

The destructor is defaulted; each ManagedBuffer's host and device buffers release their own memory through RAII.

    ~BufferManager() = default;

Private Members

getBuffer

Looks up the binding index for tensorName and returns the matching host or device pointer, or nullptr if no such tensor exists.

    void* getBuffer(const bool isHost, const std::string& tensorName) const
    {
        int index = mEngine->getBindingIndex(tensorName.c_str());
        if (index == -1)
            return nullptr;
        return (isHost ? mManagedBuffers[index]->hostBuffer.data() : mManagedBuffers[index]->deviceBuffer.data());
    }

memcpyBuffers

Copies all input bindings (copyInput == true) or all output bindings (copyInput == false) between host and device, either synchronously or asynchronously on the given stream.

    void memcpyBuffers(const bool copyInput, const bool deviceToHost, const bool async, const cudaStream_t& stream = 0)
    {
        for (int i = 0; i < mEngine->getNbBindings(); i++)
        {
            void* dstPtr
                = deviceToHost ? mManagedBuffers[i]->hostBuffer.data() : mManagedBuffers[i]->deviceBuffer.data();
            const void* srcPtr
                = deviceToHost ? mManagedBuffers[i]->deviceBuffer.data() : mManagedBuffers[i]->hostBuffer.data();
            const size_t byteSize = mManagedBuffers[i]->hostBuffer.nbBytes();
            const cudaMemcpyKind memcpyType = deviceToHost ? cudaMemcpyDeviceToHost : cudaMemcpyHostToDevice;
            if ((copyInput && mEngine->bindingIsInput(i)) || (!copyInput && !mEngine->bindingIsInput(i)))
            {
                if (async)
                    CHECK(cudaMemcpyAsync(dstPtr, srcPtr, byteSize, memcpyType, stream));
                else
                    CHECK(cudaMemcpy(dstPtr, srcPtr, byteSize, memcpyType));
            }
        }
    }

Member Data

Static Public Attributes

static const size_t kINVALID_SIZE_VALUE = ~size_t(0)

Private Attributes

// The pointer to the engine. 
std::shared_ptr<nvinfer1::ICudaEngine> samplesCommon::BufferManager::mEngine

// The batch size for legacy networks, 0 otherwise. 
int samplesCommon::BufferManager::mBatchSize

// The vector of pointers to managed buffers. 
std::vector<std::unique_ptr<ManagedBuffer>> samplesCommon::BufferManager::mManagedBuffers

// The vector of device buffers needed for engine execution. 
std::vector<void*> samplesCommon::BufferManager::mDeviceBindings

Full source: buffers.h

/*
 * SPDX-FileCopyrightText: Copyright (c) 1993-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 * SPDX-License-Identifier: Apache-2.0
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef TENSORRT_BUFFERS_H
#define TENSORRT_BUFFERS_H

#include "NvInfer.h"
#include "common.h"
#include "half.h"
#include <cassert>
#include <cuda_runtime_api.h>
#include <iostream>
#include <iterator>
#include <memory>
#include <new>
#include <numeric>
#include <string>
#include <vector>

namespace samplesCommon
{

//!
//! \brief  The GenericBuffer class is a templated class for buffers.
//!
//! \details This templated RAII (Resource Acquisition Is Initialization) class handles the allocation,
//!          deallocation, querying of buffers on both the device and the host.
//!          It can handle data of arbitrary types because it stores byte buffers.
//!          The template parameters AllocFunc and FreeFunc are used for the
//!          allocation and deallocation of the buffer.
//!          AllocFunc must be a functor that takes in (void** ptr, size_t size)
//!          and returns bool. ptr is a pointer to where the allocated buffer address should be stored.
//!          size is the amount of memory in bytes to allocate.
//!          The boolean indicates whether or not the memory allocation was successful.
//!          FreeFunc must be a functor that takes in (void* ptr) and returns void.
//!          ptr is the allocated buffer address. It must work with nullptr input.
//!
template <typename AllocFunc, typename FreeFunc>
class GenericBuffer
{
public:
    //!
    //! \brief Construct an empty buffer.
    //!
    GenericBuffer(nvinfer1::DataType type = nvinfer1::DataType::kFLOAT)
        : mSize(0)
        , mCapacity(0)
        , mType(type)
        , mBuffer(nullptr)
    {
    }

    //!
    //! \brief Construct a buffer with the specified allocation size in bytes.
    //!
    GenericBuffer(size_t size, nvinfer1::DataType type)
        : mSize(size)
        , mCapacity(size)
        , mType(type)
    {
        if (!allocFn(&mBuffer, this->nbBytes()))
        {
            throw std::bad_alloc();
        }
    }

    GenericBuffer(GenericBuffer&& buf)
        : mSize(buf.mSize)
        , mCapacity(buf.mCapacity)
        , mType(buf.mType)
        , mBuffer(buf.mBuffer)
    {
        buf.mSize = 0;
        buf.mCapacity = 0;
        buf.mType = nvinfer1::DataType::kFLOAT;
        buf.mBuffer = nullptr;
    }

    GenericBuffer& operator=(GenericBuffer&& buf)
    {
        if (this != &buf)
        {
            freeFn(mBuffer);
            mSize = buf.mSize;
            mCapacity = buf.mCapacity;
            mType = buf.mType;
            mBuffer = buf.mBuffer;
            // Reset buf.
            buf.mSize = 0;
            buf.mCapacity = 0;
            buf.mBuffer = nullptr;
        }
        return *this;
    }

    //!
    //! \brief Returns pointer to underlying array.
    //!
    void* data()
    {
        return mBuffer;
    }

    //!
    //! \brief Returns pointer to underlying array.
    //!
    const void* data() const
    {
        return mBuffer;
    }

    //!
    //! \brief Returns the size (in number of elements) of the buffer.
    //!
    size_t size() const
    {
        return mSize;
    }

    //!
    //! \brief Returns the size (in bytes) of the buffer.
    //!
    size_t nbBytes() const
    {
        return this->size() * samplesCommon::getElementSize(mType);
    }

    //!
    //! \brief Resizes the buffer. This is a no-op if the new size is smaller than or equal to the current capacity.
    //!
    void resize(size_t newSize)
    {
        mSize = newSize;
        if (mCapacity < newSize)
        {
            freeFn(mBuffer);
            if (!allocFn(&mBuffer, this->nbBytes()))
            {
                throw std::bad_alloc{};
            }
            mCapacity = newSize;
        }
    }

    //!
    //! \brief Overload of resize that accepts Dims
    //!
    void resize(const nvinfer1::Dims& dims)
    {
        return this->resize(samplesCommon::volume(dims));
    }

    ~GenericBuffer()
    {
        freeFn(mBuffer);
    }

private:
    size_t mSize{0}, mCapacity{0};
    nvinfer1::DataType mType;
    void* mBuffer;
    AllocFunc allocFn;
    FreeFunc freeFn;
};

class DeviceAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        return cudaMalloc(ptr, size) == cudaSuccess;
    }
};

class DeviceFree
{
public:
    void operator()(void* ptr) const
    {
        cudaFree(ptr);
    }
};

class HostAllocator
{
public:
    bool operator()(void** ptr, size_t size) const
    {
        *ptr = malloc(size);
        return *ptr != nullptr;
    }
};

class HostFree
{
public:
    void operator()(void* ptr) const
    {
        free(ptr);
    }
};

using DeviceBuffer = GenericBuffer<DeviceAllocator, DeviceFree>;
using HostBuffer = GenericBuffer<HostAllocator, HostFree>;

//!
//! \brief  The ManagedBuffer class groups together a pair of corresponding device and host buffers.
//!
class ManagedBuffer
{
public:
    DeviceBuffer deviceBuffer;
    HostBuffer hostBuffer;
};

//!
//! \brief  The BufferManager class handles host and device buffer allocation and deallocation.
//!
//! \details This RAII class handles host and device buffer allocation and deallocation,
//!          memcpy between host and device buffers to aid with inference,
//!          and debugging dumps to validate inference. The BufferManager class is meant to be
//!          used to simplify buffer management and any interactions between buffers and the engine.
//!
class BufferManager
{
public:
    static const size_t kINVALID_SIZE_VALUE = ~size_t(0);

    //!
    //! \brief Create a BufferManager for handling buffer interactions with engine.
    //!
    BufferManager(std::shared_ptr<nvinfer1::ICudaEngine> engine, const int batchSize = 0,
        const nvinfer1::IExecutionContext* context = nullptr)
        : mEngine(engine)
        , mBatchSize(batchSize)
    {
        // Full Dims implies no batch size.
        assert(engine->hasImplicitBatchDimension() || mBatchSize == 0);
        // Create host and device buffers
        for (int i = 0; i < mEngine->getNbBindings(); i++)
        {
            auto dims = context ? context->getBindingDimensions(i) : mEngine->getBindingDimensions(i);
            size_t vol = context || !mBatchSize ? 1 : static_cast<size_t>(mBatchSize);
            nvinfer1::DataType type = mEngine->getBindingDataType(i);
            int vecDim = mEngine->getBindingVectorizedDim(i);
            if (-1 != vecDim) // i.e., 0 != lgScalarsPerVector
            {
                int scalarsPerVec = mEngine->getBindingComponentsPerElement(i);
                dims.d[vecDim] = divUp(dims.d[vecDim], scalarsPerVec);
                vol *= scalarsPerVec;
            }
            vol *= samplesCommon::volume(dims);
            std::unique_ptr<ManagedBuffer> manBuf{new ManagedBuffer()};
            manBuf->deviceBuffer = DeviceBuffer(vol, type);
            manBuf->hostBuffer = HostBuffer(vol, type);
            mDeviceBindings.emplace_back(manBuf->deviceBuffer.data());
            mManagedBuffers.emplace_back(std::move(manBuf));
        }
    }

    //!
    //! \brief Returns a vector of device buffers that you can use directly as
    //!        bindings for the execute and enqueue methods of IExecutionContext.
    //!
    std::vector<void*>& getDeviceBindings()
    {
        return mDeviceBindings;
    }

    //!
    //! \brief Returns a vector of device buffers.
    //!
    const std::vector<void*>& getDeviceBindings() const
    {
        return mDeviceBindings;
    }

    //!
    //! \brief Returns the device buffer corresponding to tensorName.
    //!        Returns nullptr if no such tensor can be found.
    //!
    void* getDeviceBuffer(const std::string& tensorName) const
    {
        return getBuffer(false, tensorName);
    }

    //!
    //! \brief Returns the host buffer corresponding to tensorName.
    //!        Returns nullptr if no such tensor can be found.
    //!
    void* getHostBuffer(const std::string& tensorName) const
    {
        return getBuffer(true, tensorName);
    }

    //!
    //! \brief Returns the size of the host and device buffers that correspond to tensorName.
    //!        Returns kINVALID_SIZE_VALUE if no such tensor can be found.
    //!
    size_t size(const std::string& tensorName) const
    {
        int index = mEngine->getBindingIndex(tensorName.c_str());
        if (index == -1)
            return kINVALID_SIZE_VALUE;
        return mManagedBuffers[index]->hostBuffer.nbBytes();
    }

    //!
    //! \brief Templated print function that dumps buffers of arbitrary type to std::ostream.
    //!        rowCount parameter controls how many elements are on each line.
    //!        A rowCount of 1 means that there is only 1 element on each line.
    //!
    template <typename T>
    void print(std::ostream& os, void* buf, size_t bufSize, size_t rowCount)
    {
        assert(rowCount != 0);
        assert(bufSize % sizeof(T) == 0);
        T* typedBuf = static_cast<T*>(buf);
        size_t numItems = bufSize / sizeof(T);
        for (int i = 0; i < static_cast<int>(numItems); i++)
        {
            // Handle rowCount == 1 case
            if (rowCount == 1 && i != static_cast<int>(numItems) - 1)
                os << typedBuf[i] << std::endl;
            else if (rowCount == 1)
                os << typedBuf[i];
            // Handle rowCount > 1 case
            else if (i % rowCount == 0)
                os << typedBuf[i];
            else if (i % rowCount == rowCount - 1)
                os << " " << typedBuf[i] << std::endl;
            else
                os << " " << typedBuf[i];
        }
    }

    //!
    //! \brief Copy the contents of input host buffers to input device buffers synchronously.
    //!
    void copyInputToDevice()
    {
        memcpyBuffers(true, false, false);
    }

    //!
    //! \brief Copy the contents of output device buffers to output host buffers synchronously.
    //!
    void copyOutputToHost()
    {
        memcpyBuffers(false, true, false);
    }

    //!
    //! \brief Copy the contents of input host buffers to input device buffers asynchronously.
    //!
    void copyInputToDeviceAsync(const cudaStream_t& stream = 0)
    {
        memcpyBuffers(true, false, true, stream);
    }

    //!
    //! \brief Copy the contents of output device buffers to output host buffers asynchronously.
    //!
    void copyOutputToHostAsync(const cudaStream_t& stream = 0)
    {
        memcpyBuffers(false, true, true, stream);
    }

    ~BufferManager() = default;

private:
    void* getBuffer(const bool isHost, const std::string& tensorName) const
    {
        int index = mEngine->getBindingIndex(tensorName.c_str());
        if (index == -1)
            return nullptr;
        return (isHost ? mManagedBuffers[index]->hostBuffer.data() : mManagedBuffers[index]->deviceBuffer.data());
    }

    void memcpyBuffers(const bool copyInput, const bool deviceToHost, const bool async, const cudaStream_t& stream = 0)
    {
        for (int i = 0; i < mEngine->getNbBindings(); i++)
        {
            void* dstPtr
                = deviceToHost ? mManagedBuffers[i]->hostBuffer.data() : mManagedBuffers[i]->deviceBuffer.data();
            const void* srcPtr
                = deviceToHost ? mManagedBuffers[i]->deviceBuffer.data() : mManagedBuffers[i]->hostBuffer.data();
            const size_t byteSize = mManagedBuffers[i]->hostBuffer.nbBytes();
            const cudaMemcpyKind memcpyType = deviceToHost ? cudaMemcpyDeviceToHost : cudaMemcpyHostToDevice;
            if ((copyInput && mEngine->bindingIsInput(i)) || (!copyInput && !mEngine->bindingIsInput(i)))
            {
                if (async)
                    CHECK(cudaMemcpyAsync(dstPtr, srcPtr, byteSize, memcpyType, stream));
                else
                    CHECK(cudaMemcpy(dstPtr, srcPtr, byteSize, memcpyType));
            }
        }
    }

    std::shared_ptr<nvinfer1::ICudaEngine> mEngine;              //!< The pointer to the engine
    int mBatchSize;                                              //!< The batch size for legacy networks, 0 otherwise.
    std::vector<std::unique_ptr<ManagedBuffer>> mManagedBuffers; //!< The vector of pointers to managed buffers
    std::vector<void*> mDeviceBindings;                          //!< The vector of device buffers needed for engine execution
};

} // namespace samplesCommon

#endif // TENSORRT_BUFFERS_H

References:

  • https://www.ccoderun.ca/programming/doxygen/tensorrt/classsamplesCommon_1_1BufferManager.html#aa64f0092469babe813db491696098eb0
  • https://github.com/NVIDIA/TensorRT