【Pytorch｜Bug】解决 RuntimeError: Error(s) in loading state_dict for Network: size mismatch

2023-05-16

文章目录

问题背景
解决方法

问题背景

Github开源项目：https://github.com/zhang-tao-whu/e2ec

python train_net.py coco_finetune --bs 12 \
--type finetune --checkpoint data/model/model_coco.pth

报错如下：

loading annotations into memory...
Done (t=0.09s)
creating index...
index created!
load model: data/model/model_coco.pth
Traceback (most recent call last):
  File "train_net.py", line 67, in <module>
    main()
  File "train_net.py", line 64, in main
    train(network, cfg)
  File "train_net.py", line 40, in train
    begin_epoch = load_network(network, model_dir=args.checkpoint, strict=False)
  File "/root/autodl-tmp/e2ec/train/model_utils/utils.py", line 66, in load_network
    net.load_state_dict(net_weight, strict=strict)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Network:
        size mismatch for dla.ct_hm.2.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 256, 1, 1]).
        size mismatch for dla.ct_hm.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([1]).

由于我自己的数据集类别只有1，而COCO数据集有80个类别，预训练模型中dla.ct_hm.2 参数的大小与我的不符，所以需要舍弃预训练模型中的这个参数的权重。

解决方法

在 e2ec/train/model_utils/utils.py 中修改：

def load_network(net, model_dir, strict=True, map_location=None):

    if not os.path.exists(model_dir):
        print(colored('WARNING: NO MODEL LOADED !!!', 'red'))
        return 0

    print('load model: {}'.format(model_dir))
    if map_location is None:
        pretrained_model = torch.load(model_dir, map_location={'cuda:0': 'cpu', 'cuda:1': 'cpu',
                                                               'cuda:2': 'cpu', 'cuda:3': 'cpu'})
    else:
        pretrained_model = torch.load(model_dir, map_location=map_location)
    if 'epoch' in pretrained_model.keys():
        epoch = pretrained_model['epoch'] + 1
    else:
        epoch = 0
    pretrained_model = pretrained_model['net']

    net_weight = net.state_dict()
    for key in net_weight.keys():
        net_weight.update({key: pretrained_model[key]})
    '''
	舍弃部分参数
	'''
    net_weight.pop("dla.ct_hm.2.weight")
    net_weight.pop("dla.ct_hm.2.bias")
    
    net.load_state_dict(net_weight, strict=strict)
    return epoch

注意：load_state_dict 中设置 strict=False 只对增加或删除部分层有用，对于在原来参数上改变维度大小的情况不适用。

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)

Pytorch

Bug

RuntimeError

Error

loading

【Pytorch｜Bug】解决 RuntimeError: Error(s) in loading state_dict for Network: size mismatch 的相关文章

如何避免 PyTorch 中的“CUDA 内存不足”

我认为对于 GPU 内存较低的 PyTorch 用户来说这是一个非常常见的消息 RuntimeError CUDA out of memory Tried to allocate X MiB GPU X X GiB total capac
一次热编码期间出现 RunTimeError

我有一个数据集其中类值以 1 步从 2 到 2 i e 2 1 0 1 2 其中 9 标识未标记的数据使用一种热编码 self one hot encode labels 我收到以下错误 RuntimeError index 1 is
如何在pytorch中查看DataLoader中的数据

我在 Github 上的示例中看到类似以下内容如何查看该数据的类型形状和其他属性 train data MyDataset int 1e3 length 50 train iterator DataLoader train data b
PySide.QtGui RuntimeError：对象基类的“__init__”方法未调用...但它是

一些环境基础知识 Python版本 3 4 2 操作系统 Windows 8 1 到目前为止的搜索我怀疑这另一个问题 https stackoverflow com questions 12280371 python runtimeerr
为什么 pytorch matmul 在 cpu 和 gpu 上执行时得到不同的结果？

我试图找出 numpy pytorch gpu cpu float16 float32 数字之间的舍入差异而我发现的内容让我感到困惑基本版本是 a torch rand 3 4 dtype torch float32 b torch r
如何在滚动 iPhone 上向 tableview 添加元素？

我正在使用 UITableView 列出来自 Web 服务的元素我需要做的是首先从Web服务调用20个元素并显示在列表中当用户向下滚动时从Web服务调用另外20个记录并添加到表格视图这个怎么做您可以从 Web 服务加载 20 个项目
pytorch 中的 autograd 可以处理同一模块中层的重复使用吗？

我有一层layer in an nn Module并在一次中使用两次或多次forward步这个的输出layer稍后输入到相同的layer pytorch可以吗autograd正确计算该层权重的梯度 def forward x x self
Blenderbot 微调

我一直在尝试微调 HuggingFace 的对话模型 Blendebot 我已经尝试过官方拥抱脸网站上给出的传统方法该方法要求我们使用 trainer train 方法来完成此操作我使用 compile 方法尝试了它我尝试过使用 Py
Pytorch ValueError：优化器得到一个空参数列表

当尝试创建神经网络并使用 Pytorch 对其进行优化时我得到了 ValueError 优化器得到一个空参数列表这是代码 import torch nn as nn import torch nn functional as F fro
如何有效地对一个数组中某个值在另一个数组中的位置出现的次数求和

我正在寻找一种有效的 for 循环避免解决方案来解决我遇到的数组相关问题我想使用一个巨大的一维数组 A gt size 250 000 用于一维索引的 0 到 40 之间的值以及用于第二维索引的具有 0 到 9995 之间的值的相同大
torch.stack() 和 torch.cat() 函数有什么区别？

OpenAI 的强化学习 REINFORCE 和 actor critic 示例具有以下代码加强 https github com pytorch examples blob master reinforcement learning r
update_all_contenttypes 似乎不适用于 Django 1.8

我在运行某个函数时收到以下错误 django contrib contenttypes models DoesNotExist ContentType matching query does not exist 根据post syncdb
Pytorch GPU 使用率低

我正在尝试 pytorch 的例子https pytorch org tutorials beginner blitz cifar10 tutorial html https pytorch org tutorials beginner b
如何在数据加载期间 IsBusy 为 true 时至少显示一次 Lottie 动画？

On my Xamarin Forms 项目我想显示一个洛蒂动画 during API调用或期间加载网站 in a WebView 为此我限制了IsVisible的财产洛蒂动画 to the IsBusy我的财产视图模型这个效果很好
PyTorch 中的交叉熵

交叉熵公式但为什么下面给出loss 0 7437代替loss 0 since 1 log 1 0 import torch import torch nn as nn from torch autograd import Variable
为什么 Rust 在运行时检查数组边界，而（大多数）其他检查发生在编译时？

正在阅读基本介绍 http doc rust lang org book arrays vectors and slices html 如果您尝试使用不在数组中的下标您将收到错误数组访问在运行时进行边界检查为什么 Rust 在运行时检
HDF5 库错误

我正在使用以下 1 VS 2010 C 2 调试Win 32 3 图书馆从这里 http www hdfgroup org HDF5 release obtain5 html http www hdfgroup org HDF5 relea
样本（）和r样本（）有什么区别？

当我从 PyTorch 中的发行版中采样时两者sample and rsample似乎给出了类似的结果 import torch seaborn as sns x torch distributions Normal torch tens
TensorFlow 相当于 PyTorch 的 Transforms.Normalize()

我正在尝试推断最初在 PyTorch 中构建的 TFLite 模型我一直在遵循PyTorch 实现 https github com leoxiaobin deep high resolution net pytorch blob 1ee
PyTorch 给出 cuda 运行时错误

我对我的代码做了一些小小的修改以便它不使用 DataParallel and DistributedDataParallel 代码如下 import argparse import os import shutil import time

随机推荐