[TVM Learning Resources] Compiling and Optimizing a Model with the Python Interface (AutoTVM)

2023-11-03

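This article is a translation of the English document Compiling and Optimizing a Model with the Python Interface (AutoTVM).
The author is Chris Hoge. More TVM documentation in Chinese is available at the TVM Chinese site.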

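The TVMC tutorial covered how to compile, run, and tune a pre-trained model, ResNet-50 v2, using the TVM command-line interface (TVMC). TVM is more than a command-line tool, though: it is an optimizing framework with APIs available in several languages, which gives you a great deal of flexibility when working with machine learning models.

This section covers the same ground as the TVMC tutorial, but shows how it is done with the Python API. By the end of this section, we will have used the TVM Python API to accomplish the following tasks: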

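Compile a pre-trained ResNet-50 v2 model for the TVM runtime.
Run a real image through the compiled model, and interpret the output and model performance.
Tune the model on a CPU using TVM.
Recompile an optimized model using the tuning data collected by TVM.
Run the image through the optimized model, and compare the output and model performance.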

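The goal of this section is to give an overview of TVM's capabilities and how to use them through the Python API.

TVM is a deep learning compiler framework with a number of different modules available for working with deep learning models and operators. This tutorial works through how to load, compile, and optimize a model using the Python API.

We begin by importing a number of dependencies: onnx for loading and converting the model, helper utilities for downloading test data, the Python Imaging Library (PIL) for working with image data, numpy for pre- and post-processing of the image data, the TVM Relay framework, and the TVM graph executor.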

import onnx
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np
import tvm.relay as relay
import tvm
from tvm.contrib import graph_executor

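Downloading and Loading the ONNX Model

In this tutorial we will work with ResNet-50 v2. ResNet-50 is a convolutional neural network that is 50 layers deep and designed to classify images. The model we will use has been pre-trained on more than a million images with 1000 different classifications. The network takes input images of size 224x224. To explore how the ResNet-50 v2 model is structured, we recommend downloading Netron, a freely available ML model viewer.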

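TVM provides a helper library to download pre-trained models. Given a model URL, a file name, and a model type (the module argument), TVM downloads the model and saves it to disk. The ONNX model can then be loaded into memory with the onnx package, as onnx.load does in the code below.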

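Working with Other Model Formats
TVM supports many popular model formats. A list can be found in the Compile Deep Learning Models section of the TVM documentation.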

model_url = (
    "https://github.com/onnx/models/raw/main/"
    "vision/classification/resnet/model/"
    "resnet50-v2-7.onnx"
)

model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
onnx_model = onnx.load(model_path)

# Seed numpy's RNG to get consistent results
np.random.seed(0)

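Downloading, Preprocessing, and Loading the Test Image

Each model is particular when it comes to expected tensor shapes, formats, and data types. For this reason, most models require some pre- and post-processing to ensure the input is valid and to interpret the output. TVMC has adopted NumPy's .npz format for both input and output data.

As input for this tutorial we will use an image of a cat, but feel free to substitute any image of your choosing.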


Download the image data, then convert it to a numpy array to use as an input to the model.

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize it to 224x224
resized_image = Image.open(img_path).resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# The input image is in HWC layout while ONNX expects CHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize according to the ImageNet input specification
imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev

# Add the batch dimension, as we are expecting 4-dimensional input: NCHW.
img_data = np.expand_dims(norm_img_data, axis=0)
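
As a quick sanity check of the preprocessing steps above, the following sketch runs the same transformations on a synthetic image instead of the downloaded one. The random input and the preprocess helper name are assumptions made for illustration only. It also shows one detail worth knowing: combining a float32 array with the float64 mean and standard deviation arrays promotes the result to float64, so this sketch casts back to float32 explicitly.

```python
import numpy as np


def preprocess(hwc_image):
    """Convert an HWC uint8 image into a normalized NCHW float32 batch."""
    data = np.asarray(hwc_image).astype("float32")
    data = np.transpose(data, (2, 0, 1))  # HWC -> CHW
    mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
    stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
    # mean and stddev are float64, so the normalized array becomes float64
    norm = (data / 255 - mean) / stddev
    # Add the batch dimension and cast back to float32
    return np.expand_dims(norm, axis=0).astype("float32")


# Synthetic stand-in for a 224x224 RGB image (assumption: not the real cat photo)
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = preprocess(fake_image)
print(batch.shape, batch.dtype)
```

The resulting shape matches the NCHW input that the rest of the tutorial feeds to the model.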

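Compile the Model With Relay

The next step is to compile the ResNet model. We begin by importing the model into Relay using the from_onnx importer. We then build the model, with standard optimizations, into a TVM library. Finally, we create a TVM graph runtime module from the library.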

target = "llvm"

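Defining the Correct Target
Specifying the correct target (the --target option in TVMC, the target variable in this tutorial) can have a huge impact on the performance of the compiled module, because it can take advantage of hardware features available on the target. For more information, see Auto-tuning a convolutional network for x86 CPU. We recommend identifying which CPU you are running, along with its optional features, and setting the target accordingly. For example, for some processors you can use target = "llvm -mcpu=skylake", or target = "llvm -mcpu=skylake-avx512" for processors with the AVX-512 vector instruction set.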
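
The target strings mentioned in this note follow a simple pattern, which the small sketch below makes explicit. The make_llvm_target helper is hypothetical and not part of TVM; it only assembles the string that would be assigned to target.

```python
def make_llvm_target(mcpu=None):
    """Build an LLVM target string, optionally pinned to a CPU model."""
    if mcpu is None:
        return "llvm"  # generic target, as used in this tutorial
    return "llvm -mcpu=%s" % mcpu


print(make_llvm_target())
print(make_llvm_target("skylake-avx512"))
```

Which -mcpu value is appropriate depends on the machine the compiled model will run on.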

# The input name may vary across model types.
# You can use a tool like Netron to check input names
input_name = "data"
shape_dict = {input_name: img_data.shape}

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))

Output:

/workspace/python/tvm/driver/build_module.py:268: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
  "target_host parameter is going to be deprecated. "

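Execute on the TVM Runtime

Now that we have compiled the model, we can use the TVM runtime to make predictions with it. To run the model with TVM and make predictions, we need two things:

The compiled model, which we just produced.
Valid input to the model to make predictions on.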

dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()

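Collect Basic Performance Data

We want to collect some basic performance data for this unoptimized model and compare it to a tuned model later. To account for CPU noise, we run the computation in multiple batches with multiple repetitions, then gather basic statistics on the mean, median, and standard deviation.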

import timeit

timing_number = 10
timing_repeat = 10
unoptimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
unoptimized = {
    "mean": np.mean(unoptimized),
    "median": np.median(unoptimized),
    "std": np.std(unoptimized),
}

print(unoptimized)

Output:

{'mean': 495.13895513002353, 'median': 494.6680843500417, 'std': 1.3081147373726523}
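
The same measurement is repeated later for the tuned model, so it can be convenient to wrap it in a function. The sketch below does this with only the standard library; the benchmark_ms name and the stand-in workload are assumptions, and in the tutorial the callable would be module.run.

```python
import statistics
import timeit


def benchmark_ms(fn, number=10, repeat=10):
    """Return mean/median/std of the per-call latency of fn, in milliseconds."""
    # Each entry of totals is the time in seconds for `number` consecutive calls
    totals = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    per_call_ms = [t * 1000 / number for t in totals]
    return {
        "mean": statistics.mean(per_call_ms),
        "median": statistics.median(per_call_ms),
        "std": statistics.pstdev(per_call_ms),  # population std, like np.std
    }


# Stand-in workload (assumption); replace the lambda with module.run
stats = benchmark_ms(lambda: sum(range(1000)), number=10, repeat=5)
print(stats)
```

Using one helper for both measurements keeps the two sets of numbers directly comparable.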

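Postprocess the Output

As mentioned previously, each model has its own way of providing output tensors.

In our case, we need to run some post-processing to render the outputs of ResNet-50 v2 into a more human-readable form, using the lookup table provided for the model.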

from scipy.special import softmax

# Download a list of labels
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

# Convert the output tensor to probabilities and print the top five classes
scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

Output:

class='n02123045 tabby, tabby cat' with probability=0.621103
class='n02123159 tiger cat' with probability=0.356379
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262

The expected output looks like this (the exact probabilities can differ slightly, as the run above shows):

# class='n02123045 tabby, tabby cat' with probability=0.610553
# class='n02123159 tiger cat' with probability=0.367179
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
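
To show what the post-processing step computes, here is a dependency-free sketch of softmax followed by a top-k lookup. The logits and labels are made up for illustration and are not real model output; the tutorial itself uses scipy.special.softmax and np.argsort on the full 1000-class tensor.

```python
import math


def softmax_1d(logits):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]


# Made-up logits and labels (assumption: not real ResNet-50 output)
toy_logits = [2.0, 1.0, 0.1, -1.0]
toy_labels = ["tabby", "tiger cat", "Egyptian cat", "radiator"]
toy_probs = softmax_1d(toy_logits)
for label, prob in top_k(toy_probs, toy_labels, k=3):
    print("class='%s' with probability=%f" % (label, prob))
```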

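Tune the Model

The previous model was compiled to run on the TVM runtime, but did not include any platform-specific optimization. This section shows how to build an optimized model with TVM that targets your working platform.

In some cases, we might not get the expected performance when running inference with the compiled module. In cases like this, we can use the auto-tuner to find a better configuration for the model and get a boost in performance. Tuning in TVM refers to the process by which a model is optimized to run faster on a given target. This differs from training or fine-tuning in that it does not affect the accuracy of the model, only the runtime performance. As part of the tuning process, TVM implements and runs many different variants of operators to see which perform best. The results of these runs are stored in a tuning records file.

In the simplest form, tuning requires you to provide three things:

The target specification of the device you intend to run this model on.
The path to an output file in which the tuning records will be stored.
A path to the model to be tuned.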

import tvm.auto_scheduler as auto_scheduler
from tvm.autotvm.tuner import XGBTuner
from tvm import autotvm

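Set up some basic parameters for the runner. The runner takes compiled code generated with a specific set of parameters and measures its performance. number is the number of times the generated code is run within one measurement to take an average, and repeat is how many times that measurement is repeated for each configuration. min_repeat_ms specifies the minimum duration of one measurement: if the runs finish sooner than this, the number of runs is increased. This option is necessary for accurate tuning on GPUs and is not required for CPU tuning; setting it to 0 disables it. timeout places an upper limit, in seconds, on how long the code for each tested configuration may run.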

number = 10
repeat = 1
min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
timeout = 10  # in seconds

# Create a TVM runner
runner = autotvm.LocalRunner(
    number=number,
    repeat=repeat,
    timeout=timeout,
    min_repeat_ms=min_repeat_ms,
    enable_cpu_cache_flush=True,
)
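
To make the runner settings more concrete, the sketch below estimates how many times each candidate configuration is executed. It assumes the accounting described in the AutoTVM LocalRunner documentation, where one discarded warm-up run is followed by number x repeat timed runs; the runs_per_config helper is hypothetical.

```python
def runs_per_config(number, repeat, warmup=1):
    """Estimate executions of one candidate: warm-up plus number * repeat timed runs."""
    return warmup + number * repeat


number = 10
repeat = 1
print(runs_per_config(number, repeat))
```

Raising either value gives steadier measurements at the cost of a longer tuning run.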

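Create a simple structure for holding the tuning options. We use an XGBoost algorithm to guide the search. For a production job, you will want to set the number of trials to be larger than the value of 20 used here. For CPU we recommend 1500, for GPU 3000-4000. The number of trials required can depend on the particular model and processor, so it is worth spending some time evaluating performance across a range of values to find the best balance between tuning time and model optimization.

Because tuning is very time-intensive, the number of trials is set to 20 here, but we do not recommend a value this small. The early_stopping parameter is the minimum number of trials to run before a condition that stops the search early can be applied. The measure_option entry indicates where the trial code will be built and where it will be run; in this case we use the LocalRunner we just created and a LocalBuilder. The tuning_records entry specifies the file to which the tuning data is written.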

tuning_option = {
    "tuner": "xgb",
    "trials": 20,
    "early_stopping": 100,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
    ),
    "tuning_records": "resnet-50-v2-autotuning.json",
}

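Defining the Tuning Search Algorithm
By default this search is guided using an XGBoost Grid algorithm. Depending on your model complexity and the amount of time available, you might want to choose a different algorithm.

Setting Tuning Parameters
In this example, in the interest of time, the number of trials and early stopping are set to 20 and 100. You will likely see more performance improvements with larger values, but at the expense of tuning time. The number of trials required for convergence varies with the specifics of the model and the target platform.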
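
The tuning loop in this tutorial passes n_trial=min(trials, len(task.config_space)), so a task with a small search space receives fewer trials than requested. The sketch below illustrates that cap; the search-space sizes are hypothetical and not taken from ResNet-50, and trial_budget is an illustrative helper.

```python
def trial_budget(trials, config_space_sizes):
    """Per-task trial counts: the requested trials, capped by each search space size."""
    return [min(trials, size) for size in config_space_sizes]


# Hypothetical search-space sizes for three tasks
sizes = [6, 120, 5000]
budget = trial_budget(20, sizes)
print(budget, sum(budget))
```

The sum gives a rough upper bound on the total number of measurements for a given trials setting.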

# Begin by extracting the tasks from the onnx model
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

# Tune the extracted tasks sequentially.
for i, task in enumerate(tasks):
    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
    tuner_obj = XGBTuner(task, loss_type="rank")
    tuner_obj.tune(
        n_trial=min(tuning_option["trials"], len(task.config_space)),
        early_stopping=tuning_option["early_stopping"],
        measure_option=tuning_option["measure_option"],
        callbacks=[
            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
        ],
    )

Output (see the original document for the full content):

[Image: output of the tuning process]
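
The tuning records are written as one JSON object per line, which makes them easy to inspect with the standard library. The sketch below picks the fastest successful measurement per task. The record layout it relies on (an input list holding the target, task name, and arguments, and a result list starting with the measured costs and an error code) is an assumption about the AutoTVM log format, and the sample records are made up rather than real tuning output.

```python
import json


def best_per_task(lines):
    """Return {task key: lowest mean cost in seconds} from JSON-lines records."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        costs, error_no = record["result"][0], record["result"][1]
        if error_no != 0:  # skip failed measurements
            continue
        # Assumed layout: input = [target, task name, args, kwargs]
        key = (record["input"][1], json.dumps(record["input"][2]))
        mean_cost = sum(costs) / len(costs)
        if key not in best or mean_cost < best[key]:
            best[key] = mean_cost
    return best


# Made-up records in the assumed layout
sample = [
    '{"input": ["llvm", "conv2d_NCHWc.x86", [[1, 64, 56, 56]], {}], "config": {"index": 3}, "result": [[0.004], 0, 1.2, 0.0]}',
    '{"input": ["llvm", "conv2d_NCHWc.x86", [[1, 64, 56, 56]], {}], "config": {"index": 7}, "result": [[0.002], 0, 1.1, 0.0]}',
    '{"input": ["llvm", "dense_nopack.x86", [[1, 2048]], {}], "config": {"index": 1}, "result": [[1e9], 4, 10.0, 0.0]}',
]
best = best_per_task(sample)
for (name, args), cost in best.items():
    print("%s %s: %.3f ms" % (name, args, cost * 1000))
```

If the layout assumption holds for your TVM version, passing an open handle to resnet-50-v2-autotuning.json in place of sample should work the same way.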

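Compiling an Optimized Model with Tuning Data

As an output of the tuning process above, we obtained the tuning records stored in resnet-50-v2-autotuning.json. The compiler will use the results to generate high-performance code for the model on the specified target.

Now that tuning data for the model has been collected, we can recompile the model using optimized operators to speed up our computations.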

with autotvm.apply_history_best(tuning_option["tuning_records"]):
    with tvm.transform.PassContext(opt_level=3, config={}):
        lib = relay.build(mod, target=target, params=params)

dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))

Output:

/workspace/python/tvm/driver/build_module.py:268: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
  "target_host parameter is going to be deprecated. "

Verify that the optimized model runs and produces the same results:

dtype = "float32"
module.set_input(input_name, img_data)
module.run()
output_shape = (1, 1000)
tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()

scores = softmax(tvm_output)
scores = np.squeeze(scores)
ranks = np.argsort(scores)[::-1]
for rank in ranks[0:5]:
    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))

Output:

class='n02123045 tabby, tabby cat' with probability=0.621104
class='n02123159 tiger cat' with probability=0.356378
class='n02124075 Egyptian cat' with probability=0.019712
class='n02129604 tiger, Panthera tigris' with probability=0.001215
class='n04040759 radiator' with probability=0.000262

The predictions should match the expected values below:

# class='n02123045 tabby, tabby cat' with probability=0.610550
# class='n02123159 tiger cat' with probability=0.367181
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
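
Comparing the two printed lists by eye works for five classes, but the check can also be automated. The sketch below compares the top-5 results printed before and after tuning in this tutorial, requiring the same class order and probabilities that agree within a tolerance. The same_predictions helper and the 1e-4 tolerance are assumptions chosen for illustration.

```python
import math


def same_predictions(before, after, abs_tol=1e-4):
    """True if both top-k lists have the same class order and close probabilities."""
    if [name for name, _ in before] != [name for name, _ in after]:
        return False
    return all(
        math.isclose(p, q, abs_tol=abs_tol)
        for (_, p), (_, q) in zip(before, after)
    )


# Top-5 results printed before and after tuning in this tutorial
untuned = [("n02123045", 0.621103), ("n02123159", 0.356379), ("n02124075", 0.019712),
           ("n02129604", 0.001215), ("n04040759", 0.000262)]
tuned = [("n02123045", 0.621104), ("n02123159", 0.356378), ("n02124075", 0.019712),
         ("n02129604", 0.001215), ("n04040759", 0.000262)]
print(same_predictions(untuned, tuned))
```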

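Comparing the Tuned and Untuned Models

We want to collect some basic performance data for this optimized model and compare it to the unoptimized model. Depending on your underlying hardware, the number of iterations, and other factors, you should see a performance improvement when comparing the optimized model to the unoptimized model.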

import timeit

timing_number = 10
timing_repeat = 10
optimized = (
    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
    * 1000
    / timing_number
)
optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)}



print("optimized: %s" % (optimized))
print("unoptimized: %s" % (unoptimized))

Output:

optimized: {'mean': 407.31687583000166, 'median': 407.3377107500164, 'std': 1.692177042688564}
unoptimized: {'mean': 495.13895513002353, 'median': 494.6680843500417, 'std': 1.3081147373726523}
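
To put the two results side by side, the short calculation below derives the speedup and the relative latency reduction from the mean values printed above. The variable names are illustrative.

```python
# Mean latencies in milliseconds, copied from the output above
unoptimized_mean_ms = 495.13895513002353
optimized_mean_ms = 407.31687583000166

speedup = unoptimized_mean_ms / optimized_mean_ms
reduction_pct = (unoptimized_mean_ms - optimized_mean_ms) / unoptimized_mean_ms * 100
print("speedup: %.2fx, latency reduction: %.1f%%" % (speedup, reduction_pct))
```

With only 20 trials per task this is a modest gain; the text above suggests that larger trial counts are likely to improve it further.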

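Final Remarks

In this tutorial, we gave a short example of how to use the TVM Python API to compile, run, and tune a model. We also discussed the need for pre- and post-processing of inputs and outputs. After the tuning process, we demonstrated how to compare the performance of the unoptimized and optimized models.

This document presented a simple example using ResNet-50 v2 locally. However, TVM supports many more features, including cross-compilation, remote execution, and profiling/benchmarking.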