PackDetInputs in MMDetection 3.x

2023-11-06

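A key change to the data pipeline in MMDetection 3.x is the new PackDetInputs transform, which packs the data into one format shared by detection, semantic segmentation, and panoptic segmentation tasks.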

The configuration file shows that the pipeline consists of five steps: LoadImageFromFile, LoadAnnotations, RandomFlip, RandomChoice, and PackDetInputs.
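
A pipeline with these five steps might look like the sketch below. The transform names come from the text above; the argument values (flip probability, resize scales, and the transforms nested inside RandomChoice) are illustrative assumptions, not values taken from a specific config file.

```python
# Sketch of a training pipeline in MMDetection 3.x config style.
# All numeric values and the nested resize transforms are assumed for illustration.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomChoice',
        # Each inner list is one candidate branch; one branch is picked per sample.
        transforms=[
            [dict(type='RandomChoiceResize',
                  scales=[(480, 1333), (512, 1333)],
                  keep_ratio=True)],
            [dict(type='Resize', scale=(800, 1333), keep_ratio=True)],
        ]),
    # PackDetInputs comes last so that it sees every key written by earlier steps.
    dict(type='PackDetInputs'),
]

step_names = [step['type'] for step in train_pipeline]
print(step_names)
```

Placing PackDetInputs at the end matters because it only collects keys that earlier transforms have already written into the results dict.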

For a walkthrough of the source code, see the blog post "MMDetection 3.x Pipeline 源码调试" (debugging the MMDetection 3.x pipeline source code).

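The focus here is PackDetInputs. After this transform, the results dict is repackaged into a standardized form, which makes it easier to feed detection, semantic segmentation, and panoptic segmentation models in the same way. The source code is attached at the end; its default meta_keys are 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', and 'flip_direction'.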
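
To make the repackaging concrete, here is a minimal plain-dict imitation of the packing step. It is a sketch, not the MMDetection API: `pack_sketch` is a hypothetical helper, nested dicts stand in for DetDataSample and InstanceData, and the sample values are made up.

```python
# Plain-Python imitation of what PackDetInputs does to the results dict.
# `pack_sketch` is a hypothetical helper; dicts replace DetDataSample/InstanceData.
DEFAULT_META_KEYS = ('img_id', 'img_path', 'ori_shape', 'img_shape',
                     'scale_factor', 'flip', 'flip_direction')
MAPPING_TABLE = {
    'gt_bboxes': 'bboxes',
    'gt_bboxes_labels': 'labels',
    'gt_masks': 'masks',
}


def pack_sketch(results, meta_keys=DEFAULT_META_KEYS):
    packed = {}
    if 'img' in results:
        packed['inputs'] = results['img']
    # Annotation keys are renamed according to the mapping table.
    gt_instances = {new: results[old]
                    for old, new in MAPPING_TABLE.items() if old in results}
    # Only the keys listed in meta_keys survive as meta information.
    # A missing key raises KeyError, as in the source quoted at the end.
    metainfo = {key: results[key] for key in meta_keys}
    packed['data_samples'] = {'gt_instances': gt_instances,
                              'metainfo': metainfo}
    return packed


results = {
    'img': [[0, 0], [0, 0]],          # placeholder for the image array
    'img_id': 1, 'img_path': 'demo.jpg',
    'ori_shape': (2, 2), 'img_shape': (2, 2),
    'scale_factor': (1.0, 1.0),
    'flip': False, 'flip_direction': None,
    'gt_bboxes': [[0, 0, 1, 1]], 'gt_bboxes_labels': [3],
    'homography_matrix': 'not listed in meta_keys',
}
packed = pack_sketch(results)
print(sorted(packed))
print(packed['data_samples']['gt_instances'])
```

Keys that are not in meta_keys, such as homography_matrix here, are simply left out of the packed meta information. Under the quoted implementation, a pipeline that never writes `flip` would need a shorter meta_keys tuple to avoid a KeyError.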

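So where do these keys come from? Each transform documents them in its docstring. For example, the docstring of the RandomFlip class (mmdetection-3.0.0\mmdet\datasets\transforms\transforms.py) lists the following Added Keys: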

- flip
- flip_direction
- homography_matrix
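
The same lookup can be done programmatically. The sketch below reads one section of a docstring with `inspect.getdoc`; `DemoTransform` and `keys_in_section` are hypothetical names used only for illustration, and the parsing assumes the "Section:" heading plus "- key" bullet layout that the RandomFlip docstring uses.

```python
import inspect


class DemoTransform:
    """Stand-in transform with documented keys.

    Required Keys:

    - img

    Added Keys:

    - flip
    - flip_direction
    """


def keys_in_section(cls, section):
    """Collect the bullet items listed under `section` in the class docstring."""
    doc = inspect.getdoc(cls) or ''
    keys, active = [], False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped.endswith(':') and not stripped.startswith('-'):
            # A new heading starts; it is the wanted one only if the names match.
            active = stripped[:-1] == section
        elif active and stripped.startswith('- '):
            # Keep only the key name, dropping any type annotation after it.
            keys.append(stripped[2:].split()[0])
    return keys


print(keys_in_section(DemoTransform, 'Added Keys'))
```

With MMDetection installed, the same helper could presumably be pointed at a real transform class instead of the stand-in.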

from mmcv.transforms import RandomFlip as MMCV_RandomFlip


class RandomFlip(MMCV_RandomFlip):
    """Flip the image & bbox & mask & segmentation map. Added or Updated keys:
    flip, flip_direction, img, gt_bboxes, and gt_seg_map. There are 3 flip
    modes:

     - ``prob`` is float, ``direction`` is string: the image will be
         ``direction``ly flipped with probability of ``prob`` .
         E.g., ``prob=0.5``, ``direction='horizontal'``,
         then image will be horizontally flipped with probability of 0.5.
     - ``prob`` is float, ``direction`` is list of string: the image will
         be ``direction[i]``ly flipped with probability of
         ``prob/len(direction)``.
         E.g., ``prob=0.5``, ``direction=['horizontal', 'vertical']``,
         then image will be horizontally flipped with probability of 0.25,
         vertically with probability of 0.25.
     - ``prob`` is list of float, ``direction`` is list of string:
         given ``len(prob) == len(direction)``, the image will
         be ``direction[i]``ly flipped with probability of ``prob[i]``.
         E.g., ``prob=[0.3, 0.5]``, ``direction=['horizontal',
         'vertical']``, then image will be horizontally flipped with
         probability of 0.3, vertically with probability of 0.5.


    Required Keys:

    - img
    - gt_bboxes (BaseBoxes[torch.float32]) (optional)
    - gt_masks (BitmapMasks | PolygonMasks) (optional)
    - gt_seg_map (np.uint8) (optional)

    Modified Keys:

    - img
    - gt_bboxes
    - gt_masks
    - gt_seg_map

    Added Keys:

    - flip
    - flip_direction
    - homography_matrix
    """
    ...  # method bodies omitted in this excerpt

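The three flip modes described in the docstring above can be summarized as a small function that returns the flip probability of each direction. This is a sketch of the documented rule only, not the library's implementation; `flip_probabilities` is a hypothetical name.

```python
def flip_probabilities(prob, direction):
    """Return {direction: probability} for the three documented flip modes."""
    if isinstance(direction, str):
        # Mode 1: a single probability for a single direction.
        return {direction: prob}
    if isinstance(prob, (list, tuple)):
        # Mode 3: one probability per direction.
        if len(prob) != len(direction):
            raise ValueError('prob and direction must have the same length')
        return dict(zip(direction, prob))
    # Mode 2: a single probability shared equally among the directions.
    return {d: prob / len(direction) for d in direction}


print(flip_probabilities(0.5, 'horizontal'))
print(flip_probabilities(0.5, ['horizontal', 'vertical']))
print(flip_probabilities([0.3, 0.5], ['horizontal', 'vertical']))
```

In every mode the remaining probability mass corresponds to leaving the image unflipped.
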
Definition of PackDetInputs:

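import numpy as np
from mmcv.transforms import to_tensor
from mmcv.transforms.base import BaseTransform
from mmengine.structures import InstanceData, PixelData

from mmdet.registry import TRANSFORMS
from mmdet.structures import DetDataSample
from mmdet.structures.bbox import BaseBoxes


@TRANSFORMS.register_module()
class PackDetInputs(BaseTransform):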
    """Pack the inputs data for the detection / semantic segmentation /
    panoptic segmentation.

    The ``img_meta`` item is always populated.  The contents of the
    ``img_meta`` dictionary depends on ``meta_keys``. By default this includes:

        - ``img_id``: id of the image

        - ``img_path``: path to the image file

        - ``ori_shape``: original shape of the image as a tuple (h, w, c)

        - ``img_shape``: shape of the image input to the network as a tuple \
            (h, w, c).  Note that images may be zero padded on the \
            bottom/right if the batch tensor is larger than this shape.

        - ``scale_factor``: a float indicating the preprocessing scale

        - ``flip``: a boolean indicating if image flip transform was used

        - ``flip_direction``: the flipping direction

    Args:
        meta_keys (Sequence[str], optional): Meta keys to be converted to
            ``mmcv.DataContainer`` and collected in ``data[img_metas]``.
            Default: ``('img_id', 'img_path', 'ori_shape', 'img_shape',
            'scale_factor', 'flip', 'flip_direction')``
    """
    mapping_table = {
        'gt_bboxes': 'bboxes',
        'gt_bboxes_labels': 'labels',
        'gt_masks': 'masks'
    }

    def __init__(self,
                 meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                            'scale_factor', 'flip', 'flip_direction')):
        self.meta_keys = meta_keys

    def transform(self, results: dict) -> dict:
        """Method to pack the input data.

        Args:
            results (dict): Result dict from the data pipeline.

        Returns:
            dict:

            - 'inputs' (obj:`torch.Tensor`): The forward data of models.
            - 'data_sample' (obj:`DetDataSample`): The annotation info of the
                sample.
        """
        packed_results = dict()
        if 'img' in results:
            img = results['img']
            if len(img.shape) < 3:
                img = np.expand_dims(img, -1)
            img = np.ascontiguousarray(img.transpose(2, 0, 1))
            packed_results['inputs'] = to_tensor(img)

        if 'gt_ignore_flags' in results:
            valid_idx = np.where(results['gt_ignore_flags'] == 0)[0]
            ignore_idx = np.where(results['gt_ignore_flags'] == 1)[0]

        data_sample = DetDataSample()
        instance_data = InstanceData()
        ignore_instance_data = InstanceData()

        for key in self.mapping_table.keys():
            if key not in results:
                continue
            if key == 'gt_masks' or isinstance(results[key], BaseBoxes):
                if 'gt_ignore_flags' in results:
                    instance_data[
                        self.mapping_table[key]] = results[key][valid_idx]
                    ignore_instance_data[
                        self.mapping_table[key]] = results[key][ignore_idx]
                else:
                    instance_data[self.mapping_table[key]] = results[key]
            else:
                if 'gt_ignore_flags' in results:
                    instance_data[self.mapping_table[key]] = to_tensor(
                        results[key][valid_idx])
                    ignore_instance_data[self.mapping_table[key]] = to_tensor(
                        results[key][ignore_idx])
                else:
                    instance_data[self.mapping_table[key]] = to_tensor(
                        results[key])
        data_sample.gt_instances = instance_data
        data_sample.ignored_instances = ignore_instance_data

        if 'proposals' in results:
            data_sample.proposals = InstanceData(bboxes=results['proposals'])

        if 'gt_seg_map' in results:
            gt_sem_seg_data = dict(
                sem_seg=to_tensor(results['gt_seg_map'][None, ...].copy()))
            data_sample.gt_sem_seg = PixelData(**gt_sem_seg_data)

        img_meta = {}
        for key in self.meta_keys:
            img_meta[key] = results[key]

        data_sample.set_metainfo(img_meta)
        packed_results['data_samples'] = data_sample

        return packed_results

    def __repr__(self) -> str:
        repr_str = self.__class__.__name__
        repr_str += f'(meta_keys={self.meta_keys})'
        return repr_str
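
The first step of `transform` moves the channel axis to the front, from (H, W, C) to (C, H, W), after adding a channel axis to 2-D images. The sketch below reproduces that reordering with nested lists so it runs without NumPy; `hwc_to_chw` is a hypothetical helper, and the real code uses `np.expand_dims` and `transpose(2, 0, 1)` as quoted above.

```python
def hwc_to_chw(img):
    """Reorder a nested-list image from (H, W, C) to (C, H, W)."""
    height, width = len(img), len(img[0])
    # A 2-D grayscale image first gets a trailing channel axis of size 1.
    if not isinstance(img[0][0], (list, tuple)):
        img = [[[pixel] for pixel in row] for row in img]
    channels = len(img[0][0])
    return [[[img[y][x][c] for x in range(width)]
             for y in range(height)]
            for c in range(channels)]


rgb = [[[1, 2, 3], [4, 5, 6]]]   # shape (1, 2, 3): one row, two pixels, RGB
gray = [[7, 8]]                  # shape (1, 2): no channel axis yet
print(hwc_to_chw(rgb))
print(hwc_to_chw(gray))
```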

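When `gt_ignore_flags` is present, `transform` splits every annotation field into a valid part (stored in `gt_instances`) and an ignored part (stored in `ignored_instances`). A minimal sketch of that split, using plain lists instead of tensors and a hypothetical helper `split_by_ignore_flags` with made-up sample data:

```python
def split_by_ignore_flags(values, ignore_flags):
    """Split per-instance values into (valid, ignored) lists by their flags."""
    valid = [v for v, flag in zip(values, ignore_flags) if not flag]
    ignored = [v for v, flag in zip(values, ignore_flags) if flag]
    return valid, ignored


bboxes = [[10, 10, 50, 50], [0, 0, 5, 5], [20, 30, 60, 90]]
labels = [1, 7, 2]
ignore_flags = [False, True, False]   # the second instance is marked as ignored

valid_boxes, ignored_boxes = split_by_ignore_flags(bboxes, ignore_flags)
valid_labels, ignored_labels = split_by_ignore_flags(labels, ignore_flags)
print(valid_boxes, valid_labels)
print(ignored_boxes, ignored_labels)
```

Boxes and labels are split with the same flags, so the two fields stay aligned instance by instance.
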