[ModelArts Series] Training a YOLOv3 Model in a Huawei ModelArts Notebook (Development Environment)

2023-11-07

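1. References

2. Background

This post runs a ModelZoo model in a ModelArts notebook, using yolov3 as the example and COCO2014 as the training set.

Runtime environment: ModelArts notebook
Model: ModelZoo, yolov3
Dataset: COCO2014
Image: tensorflow1.15-mindspore1.5.1-cann5.0.2-euler2.8-aarch64
Flavor: Ascend: 1*Ascend-910(32GB) | ARM: 24 cores, 96GB

Before deleting a notebook, back up its contents to OBS so that no code or data is lost.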
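
The backup can be scripted from inside the notebook. The following is a minimal sketch, assuming the MoXing call `mox.file.copy_parallel` that this post uses for its other transfers; the helper name `backup_to_obs` and the target prefix in the usage comment are hypothetical. The copy function is passed in as a parameter so the helper can be dry-run outside ModelArts, where `moxing` is not installed.

```python
def backup_to_obs(local_dir, obs_dir, copy_fn=None):
    """Copy a notebook directory to OBS and return the (source, destination) pair."""
    if copy_fn is None:
        # moxing only exists inside ModelArts images, so import it lazily.
        import moxing as mox
        copy_fn = mox.file.copy_parallel
    copy_fn(local_dir, obs_dir)
    return local_dir, obs_dir


# Usage inside a ModelArts notebook (hypothetical target prefix, use your own bucket):
# backup_to_obs("/home/ma-user/work", "obs://liulingjun-demo/backup/work")
```

Passing a stub such as `copy_fn=print` shows which paths would be copied without touching OBS.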

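3. Key steps

3.1 Prepare the dataset

Transferring data from Huawei OBS to a notebook and from a notebook to OBS

  1. Download the COCO2014 dataset from:

    Link: https://pan.baidu.com/s/16sxIpFs-hd-6FzN2rSHgqA
    Extraction code: 1234

  2. Upload the dataset to OBS

    Use the obs-browser client to upload the COCO2014 dataset to OBS.

  3. Copy the dataset from OBS to the notebook

    Run the following in the notebook: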

import moxing as mox

# COCO_COCO_2014_Train_Val_annotations.zip
mox.file.copy_parallel('obs://liulingjun-demo/datasets/COCO2014/COCO_COCO_2014_Train_Val_annotations.zip','/home/ma-user/work/COCO_COCO_2014_Train_Val_annotations.zip')
print('Copy procedure is completed !')

# COCO_COCO_2014_Val_Images.zip
mox.file.copy_parallel('obs://liulingjun-demo/datasets/COCO2014/COCO_COCO_2014_Val_Images.zip','/home/ma-user/work/COCO_COCO_2014_Val_Images.zip')
print('Copy procedure is completed !')

# COCO_COCO_2014_Train_Images.zip
mox.file.copy_parallel('obs://liulingjun-demo/datasets/COCO2014/COCO_COCO_2014_Train_Images.zip','/home/ma-user/work/COCO_COCO_2014_Train_Images.zip')
print('Copy procedure is completed !')
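  4. Extract the dataset

    Run the following in the notebook. Only the annotations archive is shown; the two image archives need to be extracted as well, so that COCO2014 ends up with the layout in the listing.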

    cd work
    
    unzip COCO_COCO_2014_Train_Val_annotations.zip -d ./COCO2014
    
    [ma-user COCO2014]$ll
    total 6780
    drwx------ 2 ma-user ma-group    4096 Jun 18 08:43 annotations
    drwxrwxr-x 2 ma-user ma-group 4620288 Aug 16  2014 train2014
    drwxrwxr-x 2 ma-user ma-group 2310144 Aug 16  2014 val2014
    
  5. (Optional) Copy the extracted dataset back to OBS

    import moxing as mox
    mox.file.copy_parallel('/home/ma-user/work/COCO2014', 'obs://liulingjun-demo/yolov3/dataset')
    print('Copy procedure is completed !')
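
The extraction step can also be done from Python, followed by a check that the three directories from the listing above are present. This is a sketch using only the standard library; the helper names `extract_archives` and `missing_dirs` are illustrative, and it assumes each archive unpacks directly into `annotations`, `train2014`, or `val2014`, which the post does not state explicitly.

```python
import zipfile
from pathlib import Path

# Directory names taken from the `ll` listing of COCO2014.
EXPECTED_DIRS = ("annotations", "train2014", "val2014")


def extract_archives(archives, target_dir):
    """Extract each zip archive into target_dir (like `unzip <zip> -d <dir>`)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return target


def missing_dirs(dataset_dir, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list from `missing_dirs("/home/ma-user/work/COCO2014")` means the layout matches the listing; anything else names the archive that still has to be extracted.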
    

3.2 Prepare the pretrained model

Copy the model files under YOLOv3_TensorFlow_1.6_model/single/ckpt to YoloV3_for_TensorFlow_1.6_code/data/darknet_weights and rename them to darknet53.ckpt.
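
The copy-and-rename step can be sketched as follows. This assumes the checkpoint is a set of files that share one name prefix (as TensorFlow checkpoints usually are), so each file keeps its suffix and only the prefix becomes darknet53.ckpt. The function name `install_checkpoint` and the `src_prefix` argument are hypothetical, because the post does not give the name of the downloaded checkpoint files.

```python
import shutil
from pathlib import Path


def install_checkpoint(src_dir, src_prefix, dst_dir, dst_prefix="darknet53.ckpt"):
    """Copy files named `src_prefix*` into dst_dir, renaming the prefix to dst_prefix."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file() and path.name.startswith(src_prefix):
            # Keep whatever follows the prefix (for example ".index" or ".meta").
            new_name = dst_prefix + path.name[len(src_prefix):]
            shutil.copy2(path, dst / new_name)
            copied.append(new_name)
    return copied
```

The returned list of new file names makes it easy to confirm that every part of the checkpoint was copied.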

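3.3 Prepare the source code

  1. Download the source code to your local machine (laptop)

  2. Extract and modify the source code

  3. Prepare the txt annotation files

    Using the actual path of the COCO2014 dataset, run coco_trainval_anns.py and coco_minival_anns.py to generate the training and validation annotation files coco2014_trainval.txt and coco2014_minival.txt respectively, and place them in the YoloV3_for_TensorFlow_1.6_code/data directory.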

    # 1. Update the dataset paths in the source code
    
    # 2. Run coco_trainval_anns.py
    python coco_trainval_anns.py
    
    # 3. Run coco_minival_anns.py
    python coco_minival_anns.py
    
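  4. Fix the paths in the txt annotation files

    Replace /opt/npu/dataset/coco/coco2014/ with /home/ma-user/work/COCO2014/.

  5. Modify the training configuration

    The train.py source shows that the default training mode is single, which loads the configuration parameters from args_single.py, so only args_single.py needs to be edited.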

    train.py

    parser.add_argument("--mode", type=str, default='single', help="setting train mode of training.")
    
    if args_input.mode == 'single':
        import args_single as args
    

    args_single.py

    The default parameters are mostly fine as they are.

    ### Some paths
    train_file =        os.path.join(work_path, './data/coco2014_trainval.txt')  # The path of the training txt file.
    val_file =          os.path.join(work_path, './data/coco2014_minival.txt')  # The path of the validation txt file.
    restore_path =      os.path.join(work_path, './data/darknet_weights/darknet53.ckpt')  # The path of the weights to restore.
    anchor_path =       os.path.join(work_path, './data/yolo_anchors.txt')  # The path of the anchor txt file.
    class_name_path =   os.path.join(work_path, './data/coco.names')  # The path of the class names.
    
    ...
    ...
    ...
    
    ### other training strategies
    multi_scale_train = False  # Whether to apply multi-scale training strategy. Image size varies from [320, 320] to [640, 640] by default.
    use_label_smooth = False # Whether to use class label smoothing strategy.
    use_focal_loss = False  # Whether to apply focal loss on the conf loss.
    use_mix_up = False  # Whether to use mix up data augmentation strategy.
    use_warm_up = True  # whether to use warm up strategy to prevent from gradient exploding.
    warm_up_epoch = min(total_epoches*0.1, 3)  # Warm up training epoches. Set to a larger value if gradient explodes.
    
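  6. Compress the source code and upload it to OBS

    Compress the modified source code into a zip file and upload it to OBS.

  7. Copy and extract the source code

    Copy the source code from OBS to the notebook: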

    import moxing as mox
    
    # YoloV3_for_TensorFlow_1.6_code.zip
    mox.file.copy_parallel('obs://liulingjun-demo/cache/YoloV3_for_TensorFlow_1.6_code.zip','/home/ma-user/work/YoloV3_for_TensorFlow_1.6_code.zip')
    print('Copy procedure is completed !')
    

    Extract the source code:

    cd /home/ma-user/work
    unzip YoloV3_for_TensorFlow_1.6_code.zip
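
Step 4 above (replacing the dataset path prefix in the txt annotation files) is easy to get wrong by hand on large files. A minimal sketch of that replacement is shown below; the two prefixes come from the post, while the function name `rewrite_prefix` is illustrative. It treats the file as plain text and makes no assumption about the other fields on each line.

```python
from pathlib import Path

OLD_PREFIX = "/opt/npu/dataset/coco/coco2014/"
NEW_PREFIX = "/home/ma-user/work/COCO2014/"


def rewrite_prefix(txt_path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Replace every occurrence of `old` with `new` in txt_path; return the count."""
    path = Path(txt_path)
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count
```

Run it once for data/coco2014_trainval.txt and once for data/coco2014_minival.txt; a return value of 0 means the file did not contain the old prefix.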
    

3.4 Train the model

Create a notebook under /home/ma-user/work/YoloV3_for_TensorFlow_1.6_code and run the following command to start training:

!python train.py

Training output is written to the YoloV3_for_TensorFlow_1.6_code/training/ directory.
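
Before starting a long run, it can help to confirm that the input files configured in args_single.py actually exist. The sketch below checks the relative paths from the args_single.py excerpt. The function name `missing_inputs` is illustrative, and checking for a `.index` file is an assumption about how the darknet53.ckpt checkpoint is stored on disk.

```python
import os

# Relative paths taken from the args_single.py excerpt.
REQUIRED = {
    "train_file": "data/coco2014_trainval.txt",
    "val_file": "data/coco2014_minival.txt",
    "anchor_path": "data/yolo_anchors.txt",
    "class_name_path": "data/coco.names",
}
# Assumption: restore_path is a TensorFlow checkpoint prefix stored as
# "<prefix>.index" plus data shards, so the .index file is what gets checked.
RESTORE_INDEX = "data/darknet_weights/darknet53.ckpt.index"


def missing_inputs(work_path):
    """Return the names of configured inputs that are absent under work_path."""
    missing = [name for name, rel in REQUIRED.items()
               if not os.path.isfile(os.path.join(work_path, rel))]
    if not os.path.isfile(os.path.join(work_path, RESTORE_INDEX)):
        missing.append("restore_path")
    return missing
```

Calling `missing_inputs("/home/ma-user/work/YoloV3_for_TensorFlow_1.6_code")` should return an empty list when everything from sections 3.1 to 3.3 is in place.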

3.5 Successful run

Thu, 28 Jul 2022 09:52:13 INFO shuffle seed_0 args.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/util/random_seed.py:58: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Thu, 28 Jul 2022 09:52:13 WARNING Entity <function <lambda> at 0xffff6711b560> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Str'
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:139: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
Thu, 28 Jul 2022 09:52:13 WARNING Entity <function valid_shape at 0xffff6711b950> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Index'
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:173: The name tf.data.Iterator is deprecated. Please use tf.compat.v1.data.Iterator instead.

Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:173: DatasetV1.output_types (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(dataset)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:173: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:187: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

Thu, 28 Jul 2022 09:52:13 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
Thu, 28 Jul 2022 09:52:18 WARNING From /home/ma-user/work/yolov3-tensorflow/code/utils/layer_utils.py:114: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

Thu, 28 Jul 2022 09:52:19 WARNING From /home/ma-user/work/yolov3-tensorflow/code/model.py:336: The name tf.log is deprecated. Please use tf.math.log instead.

Thu, 28 Jul 2022 09:52:20 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:190: The name tf.losses.get_regularization_loss is deprecated. Please use tf.compat.v1.losses.get_regularization_loss instead.

Thu, 28 Jul 2022 09:52:20 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:193: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:197: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:230: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

Thu, 28 Jul 2022 09:52:21 INFO total_steps: 200000
Thu, 28 Jul 2022 09:52:21 INFO warmup_steps: 3000
Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/utils/misc_utils.py:184: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:247: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

Thu, 28 Jul 2022 09:52:21 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:247: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

Thu, 28 Jul 2022 09:52:21 DEBUG compute_gradients...
Thu, 28 Jul 2022 09:52:31 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:253: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.

Thu, 28 Jul 2022 09:52:31 WARNING From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/npu_loss_scale_optimizer.py:159: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:262: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:295: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:297: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

Thu, 28 Jul 2022 09:52:37 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:297: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

Thu, 28 Jul 2022 09:53:11 INFO Restoring parameters from /home/ma-user/work/yolov3-tensorflow/code/./data/darknet_weights/darknet53.ckpt
Thu, 28 Jul 2022 09:53:18 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:306: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

Thu, 28 Jul 2022 09:53:18 WARNING From /home/ma-user/work/yolov3-tensorflow/code/train.py:307: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Thu, 28 Jul 2022 09:54:14 WARNING From /home/ma-user/anaconda3/envs/TensorFlow-1.15.0/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Thu, 28 Jul 2022 09:54:14 WARNING From /usr/local/Ascend/tfplugin/latest/tfplugin/python/site-packages/npu_bridge/estimator/npu/util.py:206: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
Thu, 28 Jul 2022 09:58:08 INFO Epoch: 0, global_step: 9 fps: 0.75 lr: 0.000007 | loss: total: 6.61, xy: 0.31, wh: 0.97, conf: 2.96, class: 2.38 | 
Thu, 28 Jul 2022 09:58:09 INFO Epoch: 0, global_step: 19 fps: 168.07 lr: 0.000015 | loss: total: 14.65, xy: 0.72, wh: 2.12, conf: 8.35, class: 3.46 | 
...
...
...
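
The per-step lines at the end of the log carry the metrics worth tracking (step, fps, learning rate, loss terms). A small parser for that line format, written against the two sample lines above, might look like this; the names `STEP_RE` and `parse_step` are illustrative and not part of the training code.

```python
import re

# Pattern matched against lines such as:
# "... INFO Epoch: 0, global_step: 9 fps: 0.75 lr: 0.000007 | loss: total: 6.61, ..."
STEP_RE = re.compile(
    r"Epoch: (?P<epoch>\d+), global_step: (?P<step>\d+) "
    r"fps: (?P<fps>[\d.]+) lr: (?P<lr>[\d.]+) \| "
    r"loss: total: (?P<total>[\d.]+), xy: (?P<xy>[\d.]+), "
    r"wh: (?P<wh>[\d.]+), conf: (?P<conf>[\d.]+), class: (?P<cls>[\d.]+)"
)


def parse_step(line):
    """Return the metrics of one training log line as a dict, or None if it does not match."""
    match = STEP_RE.search(line)
    if match is None:
        return None
    values = match.groupdict()
    return {
        "epoch": int(values.pop("epoch")),
        "step": int(values.pop("step")),
        **{key: float(val) for key, val in values.items()},
    }
```

Lines that are not step reports, such as the deprecation warnings, return None and can simply be skipped when scanning a log file.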

3.6 Resource usage

CPU usage

1*Ascend 910, 24 CPU cores, 96 GiB memory (CUE score 1943) (modelarts.kat1.xlarge)

8*Ascend 910, 192 CPU cores, 720 GiB memory (CUE score 15544) (modelarts.kat1.8xlarge)

Memory usage

NPU usage

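4. FAQ

Q: Changing the notebook flavor is rejected as invalid

ModelArts.6405: RUNNING status not allowed update flavor, "modelarts.kat1.xlarge" is invalid

Downgrading from 8*Ascend910 to 1*Ascend910 failed. According to the error message, the flavor cannot be updated while the instance is in the RUNNING state.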

Q: Uploading a file larger than 100MB to the notebook fails

Solution:
Upload the file to OBS and copy it from there into the notebook.
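
To find out in advance which local files need to go through OBS, the directory can be scanned for files above the limit. This is a sketch using only the standard library; the function name `files_over_limit` is illustrative, and reading the 100MB limit as 100 MiB is an assumption.

```python
import os

LIMIT_BYTES = 100 * 1024 * 1024  # Assumption: the 100MB limit is read as 100 MiB.


def files_over_limit(root, limit=LIMIT_BYTES):
    """Return sorted paths (relative to root) of files larger than limit bytes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.getsize(full) > limit:
                found.append(os.path.relpath(full, root))
    return sorted(found)
```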

Q: Insufficient disk space

Solution:
Expand the storage capacity.
If expansion is not possible, create a new notebook configured with a larger storage capacity.
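
Free space can be checked before copying or extracting the dataset, which is where a notebook is most likely to run out of disk. The sketch below uses `shutil.disk_usage` from the standard library; the helper names `free_gib` and `has_room` are illustrative, and the required size has to come from your own estimate of the archives plus their extracted contents.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, of the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room(path, required_gib):
    """True if the filesystem containing path has at least required_gib free."""
    return free_gib(path) >= required_gib
```

For example, `has_room("/home/ma-user/work", 50)` reports whether 50 GiB are free under the work directory; 50 is only a placeholder value.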
