PyTorch profiler shows different average convolution execution times for two different networks

2024-03-20

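I have two networks that I am profiling to see which operations take up most of the time. I noticed that the "CUDA time avg" of the aten::conv2d operation differs between the two networks, and by a wide margin: in my first network it is 22us, while in the second it is 3ms. The convolutional layers in my first network go up to 512 filters, whereas the second network has at most 192 filters, so I expected the average time spent in a convolution to be lower in the second network. Instead, it is roughly two orders of magnitude (about 140x) higher. Why does this happen?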
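The expectation that fewer filters means faster convolutions only holds if everything else is equal. As a rough sketch (assuming dense convolutions with groups=1, and ignoring memory bandwidth and cuDNN algorithm choice), the arithmetic work of a Conv2d layer scales with input channels, output channels, kernel area and output spatial size, so a layer with fewer filters but a larger feature map can cost far more. The layer shapes below are hypothetical and not taken from either network, and `conv2d_macs` is an illustrative helper, not a PyTorch API.

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out, batch=1):
    """Multiply-accumulate count for a dense (groups=1) Conv2d layer."""
    return batch * c_out * h_out * w_out * c_in * k * k


# Hypothetical layers, not taken from either network in the question:
# a 512-filter layer on a small feature map vs. a 192-filter layer on a large one.
deep_small = conv2d_macs(c_in=512, c_out=512, k=3, h_out=8, w_out=8)
wide_large = conv2d_macs(c_in=192, c_out=192, k=3, h_out=256, w_out=256)

print(f"512 filters @ 8x8:     {deep_small / 1e9:.2f} GMACs")
print(f"192 filters @ 256x256: {wide_large / 1e9:.2f} GMACs")
```

Under these assumed shapes the 192-filter layer does about 144 times more work than the 512-filter one, because its output feature map has 1024 times as many pixels.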

The full profiling output is below.

Network 1:

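-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  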
                                      cudaLaunchKernel        99.80%     933.739ms        99.80%     933.739ms      20.750ms       0.000us         0.00%       0.000us       0.000us            45  
                                       model_inference         0.05%     453.000us       100.00%     935.567ms     935.567ms       0.000us         0.00%     195.000us     195.000us             1  
                               aten::cudnn_convolution         0.04%     388.000us        99.84%     934.047ms     103.783ms     195.000us       100.00%     195.000us      21.667us             9  
                                    aten::_convolution         0.01%     138.000us        99.88%     934.419ms     103.824ms       0.000us         0.00%     195.000us      21.667us             9  
                                          aten::conv2d         0.01%     122.000us        99.89%     934.592ms     103.844ms       0.000us         0.00%     195.000us      21.667us             9  
                                            aten::add_         0.01%     112.000us         0.02%     155.000us      17.222us       0.000us         0.00%       0.000us       0.000us             9  
                              aten::upsample_nearest2d         0.01%      82.000us         0.01%     105.000us      26.250us       0.000us         0.00%       0.000us       0.000us             4  
                                           aten::empty         0.01%      79.000us         0.01%      79.000us       3.292us       0.000us         0.00%       0.000us       0.000us            24  
                                       aten::threshold         0.01%      74.000us         0.02%     149.000us      18.625us       0.000us         0.00%       0.000us       0.000us             8  
                                            aten::_cat         0.01%      71.000us         0.01%     119.000us      29.750us       0.000us         0.00%       0.000us       0.000us             4  
                                            aten::relu         0.01%      57.000us         0.02%     206.000us      25.750us       0.000us         0.00%       0.000us       0.000us             8  
                                     aten::convolution         0.01%      51.000us        99.88%     934.470ms     103.830ms       0.000us         0.00%     195.000us      21.667us             9  
                                            aten::view         0.01%      50.000us         0.01%      50.000us       5.556us       0.000us         0.00%       0.000us       0.000us             9  
                                             aten::cat         0.00%      32.000us         0.02%     151.000us      37.750us       0.000us         0.00%       0.000us       0.000us             4  
                                         aten::reshape         0.00%      29.000us         0.01%      79.000us       8.778us       0.000us         0.00%       0.000us       0.000us             9  
                                         aten::resize_         0.00%      25.000us         0.00%      25.000us       0.962us       0.000us         0.00%       0.000us       0.000us            26  
                                            aten::rsub         0.00%      21.000us         0.00%      33.000us      33.000us       0.000us         0.00%       0.000us       0.000us             1  
                                             aten::mul         0.00%      17.000us         0.00%      27.000us      27.000us       0.000us         0.00%       0.000us       0.000us             1  
                                           aten::zeros         0.00%      13.000us         0.00%      16.000us      16.000us       0.000us         0.00%       0.000us       0.000us             1  
                                       cudaEventRecord         0.00%      12.000us         0.00%      12.000us       1.333us       0.000us         0.00%       0.000us       0.000us             9  
                                       cudaBindTexture         0.00%      11.000us         0.00%      11.000us       2.750us       0.000us         0.00%       0.000us       0.000us             4  
                                   aten::empty_strided         0.00%       6.000us         0.00%       6.000us       6.000us       0.000us         0.00%       0.000us       0.000us             1  
                                           aten::zero_         0.00%       1.000us         0.00%       1.000us       1.000us       0.000us         0.00%       0.000us       0.000us             1  
cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::ma...         0.00%       0.000us         0.00%       0.000us       0.000us     195.000us       100.00%     195.000us     195.000us             1  
                                     cudaUnbindTexture         0.00%       0.000us         0.00%       0.000us       0.000us       0.000us         0.00%       0.000us       0.000us             4  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 935.583ms
Self CUDA time total: 195.000us

Network 2:

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                        cudaMemcpyAsync        42.86%        1.035s        42.86%        1.035s      11.495ms       0.000us         0.00%       0.000us       0.000us            90  
                                       cudaLaunchKernel        34.81%     840.325ms        34.81%     840.325ms     169.969us       0.000us         0.00%       0.000us       0.000us          4944  
                                  cudaStreamSynchronize        15.92%     384.331ms        15.92%     384.331ms       5.736ms       0.000us         0.00%       0.000us       0.000us            67  
                                        model_inference         1.51%      36.559ms       100.00%        2.414s        2.414s       0.000us         0.00%        1.215s        1.215s             1  
                                            aten::fill_         1.03%      24.843ms        34.91%     842.670ms       7.731ms       8.759ms         0.72%       8.759ms      80.358us           109  
                                              aten::sum         0.57%      13.648ms         0.91%      22.019ms      18.123us      57.415ms         4.73%      57.415ms      47.255us          1215  
                                            aten::slice         0.50%      12.124ms         0.59%      14.229ms       3.526us       0.000us         0.00%       0.000us       0.000us          4035  
                                              aten::mul         0.49%      11.935ms         0.88%      21.340ms      17.293us     492.228ms        40.52%     492.228ms     398.888us          1234  
                                            aten::empty         0.44%      10.568ms         0.44%      10.568ms       2.556us       0.000us         0.00%       0.000us       0.000us          4134  
                                            aten::clamp         0.31%       7.455ms         0.84%      20.342ms      19.485us      12.405ms         1.02%      24.810ms      23.764us          1044  
                                              aten::add         0.25%       6.053ms         0.36%       8.615ms      14.334us      33.147ms         2.73%      33.147ms      55.153us           601  
                                aten::cudnn_convolution         0.18%       4.459ms         0.27%       6.549ms      46.779us     423.769ms        34.88%     423.769ms       3.027ms           140  
                                              aten::div         0.16%       3.892ms         0.27%       6.584ms      16.098us       3.225ms         0.27%       3.225ms       7.885us           409  
                                          aten::resize_         0.09%       2.287ms         0.10%       2.445ms       2.582us      75.000us         0.01%      75.000us       0.079us           947  
                                            aten::copy_         0.09%       2.226ms        58.96%        1.423s       6.498ms      80.877ms         6.66%      81.024ms     369.973us           219  
                                             aten::_cat         0.09%       2.087ms         0.12%       2.971ms      34.547us      26.689ms         2.20%      26.689ms     310.337us            86  
                                       aten::as_strided         0.09%       2.082ms         0.10%       2.305ms       0.554us       0.000us         0.00%       0.000us       0.000us          4164  
                                  aten::constant_pad_nd         0.06%       1.497ms        34.09%     822.790ms       9.350ms       0.000us         0.00%      46.706ms     530.750us            88  
                                     aten::_convolution         0.05%       1.113ms         0.38%       9.142ms      65.300us       0.000us         0.00%     440.725ms       3.148ms           140  
                                              aten::sub         0.04%       1.082ms         0.08%       1.905ms      18.676us      16.975ms         1.40%      16.975ms     166.422us           102  
                                       aten::leaky_relu         0.03%     727.000us         0.05%       1.253ms      19.277us      11.039ms         0.91%      11.039ms     169.831us            65  
                                       aten::reciprocal         0.03%     722.000us         0.05%       1.258ms      17.971us      10.340ms         0.85%      10.340ms     147.714us            70  
                                            aten::index         0.03%     707.000us         0.09%       2.140ms      66.875us      16.861ms         1.39%      17.207ms     537.719us            32  
                                             aten::add_         0.03%     672.000us         0.04%       1.027ms      14.671us      16.956ms         1.40%      16.956ms     242.229us            70  
                                           aten::conv2d         0.03%     610.000us         0.43%      10.298ms      73.557us       0.000us         0.00%     440.725ms       3.148ms           140  
                                             aten::view         0.03%     605.000us         0.03%     619.000us       2.623us       0.000us         0.00%       0.000us       0.000us           236  
                                    aten::empty_strided         0.02%     564.000us         0.02%     564.000us       6.409us       0.000us         0.00%       0.000us       0.000us            88  
                                      aten::convolution         0.02%     546.000us         0.40%       9.688ms      69.200us       0.000us         0.00%     440.725ms       3.148ms           140  
                                           aten::narrow         0.02%     534.000us         0.06%       1.388ms       4.131us       0.000us         0.00%       0.000us       0.000us           336  
                                              aten::cat         0.02%     511.000us         0.14%       3.482ms      40.488us       0.000us         0.00%      26.689ms     310.337us            86  
                                               aten::to         0.02%     413.000us        58.86%        1.421s       9.665ms       0.000us         0.00%      42.584ms     289.687us           147  
                                             aten::rsub         0.02%     374.000us         0.03%     616.000us      19.250us      92.000us         0.01%      92.000us       2.875us            32  
                                           aten::select         0.01%     311.000us         0.01%     354.000us       4.023us       0.000us         0.00%       0.000us       0.000us            88  
                                          aten::reshape         0.01%     304.000us         0.03%     660.000us       3.976us       0.000us         0.00%       0.000us       0.000us           166  
                                             aten::ceil         0.01%     265.000us         0.03%     717.000us      21.088us     606.000us         0.05%       1.212ms      35.647us            34  
                                          aten::permute         0.01%     214.000us         0.01%     249.000us       4.446us       0.000us         0.00%       0.000us       0.000us            56  
                              aten::upsample_bilinear2d         0.01%     199.000us         0.03%     629.000us      34.944us       2.185ms         0.18%       2.260ms     125.556us            18  
                                           aten::expand         0.01%     189.000us         0.01%     246.000us       3.417us       0.000us         0.00%       0.000us       0.000us            72  
                                             aten::ones         0.01%     180.000us         1.02%      24.632ms     947.385us       0.000us         0.00%       0.000us       0.000us            26  
                                               aten::gt         0.01%     162.000us         0.02%     474.000us      29.625us     496.000us         0.04%     992.000us      62.000us            16  
                                           aten::repeat         0.01%     154.000us         0.03%     724.000us      60.333us       0.000us         0.00%       0.000us       0.000us            12  
                                        cudaEventRecord         0.01%     146.000us         0.01%     146.000us       1.043us       0.000us         0.00%       0.000us       0.000us           140  
                                        aten::unsqueeze         0.01%     144.000us         0.01%     177.000us       3.404us       0.000us         0.00%       0.000us       0.000us            52  
                                       aten::contiguous         0.01%     139.000us         0.03%     735.000us      22.969us       0.000us         0.00%     346.000us      10.812us            32  
                                             aten::mean         0.01%     137.000us         0.01%     214.000us      23.778us     131.000us         0.01%     131.000us      14.556us             9  
                                           aten::arange         0.01%     124.000us         0.01%     242.000us      10.083us       0.000us         0.00%       0.000us       0.000us            24  
                                       aten::empty_like         0.01%     123.000us         0.01%     284.000us       5.680us       0.000us         0.00%       0.000us       0.000us            50  
                                        cudaBindTexture         0.01%     121.000us         0.01%     121.000us       3.025us       0.000us         0.00%       0.000us       0.000us            40  
                                            aten::stack         0.00%     112.000us         0.03%     802.000us      50.125us       0.000us         0.00%     158.000us       9.875us            16  
                                            aten::floor         0.00%      77.000us         0.01%     191.000us      23.875us      18.000us         0.00%      36.000us       4.500us             8  
                                         aten::moveaxis         0.00%      73.000us         0.01%     276.000us      11.500us       0.000us         0.00%       0.000us       0.000us            24  
                                          aten::movedim         0.00%      67.000us         0.01%     203.000us       8.458us       0.000us         0.00%       0.000us       0.000us            24  
                                           aten::unfold         0.00%      61.000us         0.00%      82.000us       2.562us       0.000us         0.00%       0.000us       0.000us            32  
                                      aten::leaky_relu_         0.00%      51.000us         0.00%     119.000us      23.800us       0.000us         0.00%     789.000us     157.800us             5  
                                         aten::_s_where         0.00%      51.000us         0.00%      91.000us      22.750us     536.000us         0.04%     536.000us     134.000us             4  
                                            aten::clone         0.00%      36.000us         0.01%     159.000us      31.800us       0.000us         0.00%     435.000us      87.000us             5  
                                            aten::where         0.00%      34.000us         0.01%     174.000us      43.500us       0.000us         0.00%     536.000us     134.000us             4  
                                        aten::expand_as         0.00%      27.000us         0.00%      70.000us       4.375us       0.000us         0.00%       0.000us       0.000us            16  
                                            aten::zeros         0.00%      18.000us         0.00%      29.000us      14.500us       0.000us         0.00%       0.000us       0.000us             2  
                                             aten::item         0.00%      16.000us         0.00%      22.000us       2.750us       0.000us         0.00%       0.000us       0.000us             8  
                                          aten::detach_         0.00%      10.000us         0.00%      15.000us       3.750us       0.000us         0.00%       0.000us       0.000us             4  
                                            aten::alias         0.00%       8.000us         0.00%       8.000us       0.667us       0.000us         0.00%       0.000us       0.000us            12  
                              aten::_local_scalar_dense         0.00%       6.000us         0.00%       6.000us       0.750us       0.000us         0.00%       0.000us       0.000us             8  
                                                detach_         0.00%       5.000us         0.00%       5.000us       1.250us       0.000us         0.00%       0.000us       0.000us             4  
                                            aten::zero_         0.00%       2.000us         0.00%       2.000us       1.000us       0.000us         0.00%       0.000us       0.000us             2  
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      41.981ms         3.46%      41.981ms     626.582us            67  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       8.759ms         0.72%       8.759ms     105.530us            83  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      37.512ms         3.09%      37.512ms     451.952us            83  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      65.145ms         5.36%      65.145ms     208.131us           313  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     416.783ms        34.31%     416.783ms     494.992us           842  
void at::native::reduce_kernel<256, 2, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       2.070ms         0.17%       2.070ms       8.519us           243  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      12.051ms         0.99%      12.051ms      24.950us           483  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.225ms         0.27%       3.225ms       7.885us           409  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      12.284ms         1.01%      12.284ms      24.277us           506  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      26.580ms         2.19%      26.580ms     359.189us            74  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      11.039ms         0.91%      11.039ms     169.831us            65  
                         Memcpy DtoD (Device -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us     510.000us         0.04%     510.000us      22.174us            23  
cudnn::maxwell::gemm::computeOffsetsKernel(cudnn::ma...         0.00%       0.000us         0.00%       0.000us       0.000us      62.000us         0.01%      62.000us       5.167us            12  
                 maxwell_scudnn_128x32_relu_interior_nn         0.00%       0.000us         0.00%       0.000us       0.000us       1.320ms         0.11%       1.320ms     132.000us            10  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      10.340ms         0.85%      10.340ms     147.714us            70  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      10.300ms         0.85%      10.300ms     130.380us            79  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      50.898ms         4.19%      50.898ms     242.371us           210  
void cudnn::winograd::generateWinogradTilesKernel<0,...         0.00%       0.000us         0.00%       0.000us       0.000us       1.166ms         0.10%       1.166ms      13.250us            88  
maxwell_scudnn_winograd_128x128_ldg1_ldg4_tile148n_n...         0.00%       0.000us         0.00%       0.000us       0.000us     150.355ms        12.38%     150.355ms       1.709ms            88  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.775ms         0.31%       3.775ms      78.646us            48  
                maxwell_scudnn_128x128_relu_interior_nn         0.00%       0.000us         0.00%       0.000us       0.000us     106.000us         0.01%     106.000us     106.000us             1  
                   maxwell_scudnn_128x128_relu_small_nn         0.00%       0.000us         0.00%       0.000us       0.000us     104.000us         0.01%     104.000us     104.000us             1  
                                      cudaUnbindTexture         0.00%       0.000us         0.00%       0.000us       0.000us       0.000us         0.00%       0.000us       0.000us            40  
void cudnn::detail::implicit_convolve_sgemm<float, f...         0.00%       0.000us         0.00%       0.000us       0.000us      12.632ms         1.04%      12.632ms     789.500us            16  
void at::native::reduce_kernel<256, 2, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      10.000us         0.00%      10.000us      10.000us             1  
void at::native::(anonymous namespace)::upsample_bil...         0.00%       0.000us         0.00%       0.000us       0.000us       2.185ms         0.18%       2.185ms     121.389us            18  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     606.000us         0.05%     606.000us      35.647us            17  
void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     121.000us         0.01%     121.000us      15.125us             8  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      18.000us         0.00%      18.000us       4.500us             4  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     103.000us         0.01%     103.000us      12.875us             8  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     121.000us         0.01%     121.000us       7.562us            16  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us     109.000us         0.01%     109.000us      13.625us             8  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     354.000us         0.03%     354.000us      11.062us            32  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      92.000us         0.01%      92.000us       2.875us            32  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     346.000us         0.03%     346.000us      10.812us            32  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 2.414s
Self CUDA time total: 1.215s
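The 22us and 3ms figures are the "CUDA time avg" column of the aten::conv2d rows, which is "CUDA total" divided by "# of Calls". A plain arithmetic check with the values copied by hand from the two tables:

```python
# Values copied from the aten::conv2d rows of the two tables above (in us).
net1_cuda_total_us, net1_calls = 195.0, 9
net2_cuda_total_us, net2_calls = 440_725.0, 140

# "CUDA time avg" is the CUDA total divided by "# of Calls".
net1_avg_us = net1_cuda_total_us / net1_calls
net2_avg_us = net2_cuda_total_us / net2_calls

print(f"Network 1: {net1_avg_us:.3f} us per conv2d call")
print(f"Network 2: {net2_avg_us:.3f} us per conv2d call")
print(f"ratio: {net2_avg_us / net1_avg_us:.0f}x")
```

One observation from the tables themselves: in Network 1 the only row with non-zero Self CUDA time is cudnn::maxwell::gemm::computeOffsetsKernel (195us, 1 call), so its whole conv2d CUDA total comes from a single recorded kernel, while Network 2 records many cuDNN convolution kernels (Winograd, implicit GEMM and others). This suggests, though does not prove, that the two averages may not be measuring comparable GPU work.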

Profiling code:

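import torch
from torch.profiler import profile, record_function, ProfilerActivity

with torch.no_grad():
  with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
      # self.frame_predictor is the network being profiled (Network 1 or Network 2)
      output_batch = self.frame_predictor(input_batch)
  print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10))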
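A variant of the profiling code above, sketched with a small stand-in model: the `nn.Sequential` below is hypothetical and takes the place of `self.frame_predictor`. It runs a few warm-up passes before profiling, so one-off initialization costs (such as lazy CUDA/cuDNN setup) are less likely to land inside the profiled region, and it groups the averages by input shape, so conv layers of different sizes are reported as separate rows instead of being folded into one aten::conv2d average. It falls back to CPU-only profiling when no GPU is present.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Hypothetical stand-in for self.frame_predictor: two conv layers with
# different channel counts.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 192, kernel_size=3, padding=1),
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
input_batch = torch.randn(1, 3, 64, 64, device=device)

# Only ask for CUDA activity when a GPU is actually available.
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with torch.no_grad():
    # Warm-up passes outside the profiler, so one-off setup costs
    # are not averaged into the profiled calls.
    for _ in range(3):
        model(input_batch)
    if device == "cuda":
        torch.cuda.synchronize()

    with profile(activities=activities, record_shapes=True) as prof:
        with record_function("model_inference"):
            output_batch = model(input_batch)

sort_key = "self_cuda_time_total" if device == "cuda" else "self_cpu_time_total"
# group_by_input_shape=True (requires record_shapes=True) gives one row per
# distinct input shape, e.g. one aten::conv2d row per differently sized layer.
table = prof.key_averages(group_by_input_shape=True).table(
    sort_by=sort_key, row_limit=20
)
print(table)
```

Since the question's code already sets record_shapes=True, passing group_by_input_shape=True to key_averages() should be enough to check whether the 3ms average is dominated by a few layers with large inputs rather than spread evenly over all 140 calls.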
