fit() works as expected, but the model performs at chance during evaluate()

2024-01-07

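I have noticed a problem: during evaluate(), I do not see the results I would expect based on the results from fit(). I have found many discussions online where people report similar problems. For example, this https://github.com/keras-team/keras/issues/6977 open issue discusses dropout layers and batch normalization as possible causes, but some people there also note that there may be problems unrelated to dropout and batch normalization. For a beginner, it is hard to even tell what the problem actually is.

The network architecture I am using does contain batch normalization, but I am not sure whether that is the cause.

The data for this demo can be downloaded here https://drive.google.com/file/d/1wQZbCuw8cI9cyZIKz956wNLfgfz-o3c3/view?usp=sharing.

This script clearly demonstrates the problem I am seeing: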

import random
import os
import matplotlib.image as mpimg
import cv2
import tensorflow as tf
tf.compat.v1.enable_eager_execution()
HEIGHT_WIDTH = 299
BATCH_SIZE = 10
VERBOSE = 2

SANITY_SWITCH = False

print('starting script')

net = tf.keras.applications.InceptionResNetV2(
    include_top=True,
    weights=None,  # 'imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=2,  # 1000,
    classifier_activation='softmax'
)

print_output = True
def utility_metric(y_true, y_pred):
    global print_output
    if print_output:
        print(f'y_true:{y_true.numpy()}')
        print(f'y_pred:{y_pred.numpy()}')
        print_output = False
    return 0


net.compile(
    optimizer='ADAM',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy', utility_metric]
)

net.run_eagerly = True

class_map = {'dog': 0, 'cat': 1}

def preprocess(file):
    imdata = mpimg.imread(file)
    imdata = cv2.resize(imdata, dsize=(HEIGHT_WIDTH, HEIGHT_WIDTH), interpolation=cv2.INTER_LINEAR)
    imdata.shape = (HEIGHT_WIDTH, HEIGHT_WIDTH, 3)
    imdata /= 127.5
    imdata -= 1.
    return imdata, class_map[os.path.basename(os.path.dirname(file))]

train_data = [f'data/Training/cat/{x}' for x in os.listdir('data/Training/cat')] + [f'data/Training/dog/{x}' for x in os.listdir('data/Training/dog')]
test_data = [f'data/Testing/cat/{x}' for x in os.listdir('data/Testing/cat')] + [f'data/Testing/dog/{x}' for x in os.listdir('data/Testing/dog')]

random.shuffle(train_data)
random.shuffle(test_data)

if SANITY_SWITCH:
    tmp_data = train_data
    train_data = test_data
    test_data = tmp_data


def get_gen(data):
    def gen():
        pairs = []
        i = 0
        for im_file in data:
            i += 1
            if i <= BATCH_SIZE:
                pairs += [preprocess(im_file)]
            if i == BATCH_SIZE:
                yield (
                    [pair[0] for pair in pairs],
                    [pair[1] for pair in pairs]
                )
                pairs.clear()
                i = 0
    return gen

def get_ds(data):
    return tf.data.Dataset.from_generator(
        get_gen(data),
        (tf.float32, tf.int64),
        output_shapes=(
            tf.TensorShape((BATCH_SIZE, HEIGHT_WIDTH, HEIGHT_WIDTH, 3)),
            tf.TensorShape(([BATCH_SIZE]))
        )
    )
print('starting training')
net.fit(
    get_ds(train_data),
    epochs=5,
    verbose=VERBOSE,
    use_multiprocessing=True,
    workers=16,
    batch_size=BATCH_SIZE,
    shuffle=False
)
print('starting testing')
print_output = True
net.evaluate(
    get_ds(test_data),
    verbose=VERBOSE,
    batch_size=BATCH_SIZE,
    use_multiprocessing=True,
    workers=16,
)
print('script complete')

The full output is here:

starting script
2020-12-22 15:29:33.896474: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-22 15:29:34.184215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:04:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.186083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:05:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.188086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:08:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.190088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:09:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.192124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 4 with properties: 
pciBusID: 0000:84:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.194144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 5 with properties: 
pciBusID: 0000:85:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.196095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 6 with properties: 
pciBusID: 0000:88:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.197451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 7 with properties: 
pciBusID: 0000:89:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.208178: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-12-22 15:29:34.301110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-12-22 15:29:34.348641: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-12-22 15:29:34.370185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-12-22 15:29:34.459524: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-12-22 15:29:34.471473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-12-22 15:29:34.599447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 15:29:34.634806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2020-12-22 15:29:34.635371: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-12-22 15:29:34.680254: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2000105000 Hz
2020-12-22 15:29:34.687348: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561e331d4820 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 15:29:34.687415: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-22 15:29:35.617673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:04:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.619368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:05:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.621161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:08:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.622953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:09:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.624745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 4 with properties: 
pciBusID: 0000:84:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.626508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 5 with properties: 
pciBusID: 0000:85:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.628264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 6 with properties: 
pciBusID: 0000:88:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.629460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 7 with properties: 
pciBusID: 0000:89:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.629581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-12-22 15:29:35.629633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-12-22 15:29:35.629685: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-12-22 15:29:35.629733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-12-22 15:29:35.629788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-12-22 15:29:35.629837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-12-22 15:29:35.629886: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 15:29:35.657298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2020-12-22 15:29:35.659638: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-12-22 15:29:35.678371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-22 15:29:35.678447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 1 2 3 4 5 6 7 
2020-12-22 15:29:35.678500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N Y Y Y N N N N 
2020-12-22 15:29:35.678538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1:   Y N Y Y N N N N 
2020-12-22 15:29:35.678569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2:   Y Y N Y N N N N 
2020-12-22 15:29:35.678597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 3:   Y Y Y N N N N N 
2020-12-22 15:29:35.678624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 4:   N N N N N Y Y Y 
2020-12-22 15:29:35.678652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 5:   N N N N Y N Y Y 
2020-12-22 15:29:35.678678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 6:   N N N N Y Y N Y 
2020-12-22 15:29:35.678705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 7:   N N N N Y Y Y N 
2020-12-22 15:29:35.703703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10689 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
2020-12-22 15:29:35.711407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8534 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0, compute capability: 3.7)
2020-12-22 15:29:35.716593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10689 MB memory) -> physical GPU (device: 2, name: Tesla K80, pci bus id: 0000:08:00.0, compute capability: 3.7)
2020-12-22 15:29:35.721879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10689 MB memory) -> physical GPU (device: 3, name: Tesla K80, pci bus id: 0000:09:00.0, compute capability: 3.7)
2020-12-22 15:29:35.726952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 10689 MB memory) -> physical GPU (device: 4, name: Tesla K80, pci bus id: 0000:84:00.0, compute capability: 3.7)
2020-12-22 15:29:35.732126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 10689 MB memory) -> physical GPU (device: 5, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
2020-12-22 15:29:35.736838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 10689 MB memory) -> physical GPU (device: 6, name: Tesla K80, pci bus id: 0000:88:00.0, compute capability: 3.7)
2020-12-22 15:29:35.740357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 108 MB memory) -> physical GPU (device: 7, name: Tesla K80, pci bus id: 0000:89:00.0, compute capability: 3.7)
2020-12-22 15:29:35.746472: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561e387dea00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-22 15:29:35.746517: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746537: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746577: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746594: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746614: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (4): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746645: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (5): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746664: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (6): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746694: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (7): Tesla K80, Compute Capability 3.7
starting training
Epoch 1/5
2020-12-22 15:29:48.307104: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 15:29:51.694232: W tensorflow/stream_executor/gpu/asm_compiler.cc:81] Running ptxas --version returned 256
2020-12-22 15:29:51.796020: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-12-22 15:29:52.577156: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
y_true:[[1.]
 [1.]
 [0.]
 [1.]
 [1.]
 [1.]
 [1.]
 [0.]
 [1.]
 [1.]]
y_pred:[[0.58956003 0.41043994]
 [0.63762885 0.36237112]
 [0.53731585 0.46268415]
 [0.5393683  0.4606317 ]
 [0.90735996 0.09264001]
 [0.552977   0.44702297]
 [0.7115651  0.28843486]
 [0.4068687  0.59313136]
 [0.5482196  0.4517804 ]
 [0.4330527  0.56694734]]
72/72 - 81s - loss: 0.9134 - accuracy: 0.5417 - utility_metric: 0.0000e+00
Epoch 2/5
72/72 - 81s - loss: 0.7027 - accuracy: 0.5847 - utility_metric: 0.0000e+00
Epoch 3/5
72/72 - 83s - loss: 0.6851 - accuracy: 0.5819 - utility_metric: 0.0000e+00
Epoch 4/5
72/72 - 83s - loss: 0.6810 - accuracy: 0.5944 - utility_metric: 0.0000e+00
Epoch 5/5
72/72 - 83s - loss: 0.6895 - accuracy: 0.5625 - utility_metric: 0.0000e+00
starting testing
y_true:[[1.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [1.]
 [0.]
 [0.]
 [1.]]
y_pred:[[0.39538118 0.6046188 ]
 [0.39505056 0.6049495 ]
 [0.39406297 0.605937  ]
 [0.3947329  0.60526717]
 [0.3935887  0.60641134]
 [0.39452523 0.60547477]
 [0.39451653 0.6054835 ]
 [0.39475334 0.60524666]
 [0.39559898 0.604401  ]
 [0.3951175  0.60488254]]
90/90 - 37s - loss: 0.7157 - accuracy: 0.5000 - utility_metric: 0.0000e+00
script complete

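The part of the output to focus on is the accuracy:

Training epoch 1: 0.5417

Training epoch 2: 0.5847

Training epoch 3: 0.5819

Training epoch 4: 0.5944

Training epoch 5: 0.5625

Evaluation: 0.5000

I have also included the network's raw output in two cases. One during training: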

y_true:[[1.]
     [1.]
     [0.]
     [1.]
     [1.]
     [1.]
     [1.]
     [0.]
     [1.]
     [1.]]
y_pred:[[0.58956003 0.41043994]
     [0.63762885 0.36237112]
     [0.53731585 0.46268415]
     [0.5393683  0.4606317 ]
     [0.90735996 0.09264001]
     [0.552977   0.44702297]
     [0.7115651  0.28843486]
     [0.4068687  0.59313136]
     [0.5482196  0.4517804 ]
     [0.4330527  0.56694734]]

And one during testing:

y_true:[[1.]
     [1.]
     [0.]
     [0.]
     [0.]
     [1.]
     [1.]
     [0.]
     [0.]
     [1.]]
y_pred:[[0.39538118 0.6046188 ]
     [0.39505056 0.6049495 ]
     [0.39406297 0.605937  ]
     [0.3947329  0.60526717]
     [0.3935887  0.60641134]
     [0.39452523 0.60547477]
     [0.39451653 0.6054835 ]
     [0.39475334 0.60524666]
     [0.39559898 0.604401  ]
     [0.3951175  0.60488254]]

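What I find confusing is why, during testing, the output appears to vary so little from one image to the next. This seems related to the root of the problem, but I do not know what causes it.

I have run this script many times, and some things are consistent. Accuracy during evaluation is always exactly at chance. During evaluation, y_pred always shows very little variation, and every output appears to map to the same label (so, for example, the model may report every input image as "dog" during evaluation).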
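The difference can be quantified from the printed batches alone. The following is a small sketch, not part of the original script: it copies the class-1 probabilities from the two y_pred printouts above and compares their spread. The helper name `spread` is made up for illustration.

```python
# Class-1 probabilities copied from the y_pred printouts in the logs above.
train_p1 = [0.41043994, 0.36237112, 0.46268415, 0.4606317, 0.09264001,
            0.44702297, 0.28843486, 0.59313136, 0.4517804, 0.56694734]
eval_p1 = [0.6046188, 0.6049495, 0.605937, 0.60526717, 0.60641134,
           0.60547477, 0.6054835, 0.60524666, 0.604401, 0.60488254]
# Labels copied from the y_true printout of the evaluation batch.
eval_true = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]


def spread(values):
    """Range (max - min) of a list of probabilities (hypothetical helper)."""
    return max(values) - min(values)


# Predicted label per image: class 1 if its probability exceeds 0.5.
eval_labels = [int(p > 0.5) for p in eval_p1]
eval_acc = sum(int(a == b) for a, b in zip(eval_labels, eval_true)) / len(eval_true)

print(f'train spread: {spread(train_p1):.4f}')
print(f'eval spread:  {spread(eval_p1):.4f}')
print(f'eval labels:  {eval_labels}')
print(f'eval batch accuracy: {eval_acc:.2f}')
```

On this one evaluation batch every image receives the same label, so the batch accuracy simply equals the share of that class in y_true.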

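Sometimes accuracy goes above 60% during training. That does not change the problem. I could keep increasing the dataset size and the number of epochs and try to improve the training results, but I am wary of moving on without first understanding why the evaluation results are as strange as they are.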


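I recently ran into a very similar problem with MobileNetV3Large https://www.tensorflow.org/api_docs/python/tf/keras/applications/MobileNetV3Large.

The problem comes from setting weights=None, which re-initializes all parameters, including the BatchNormalization moving statistics that are used during evaluation.

More importantly, as a friend pointed out to me, the BatchNormalization momentum in that model defaults to 0.999 (the Keras BatchNormalization layer itself defaults to 0.99). This means that the BatchNormalization parameters used only during evaluation (training uses the batch mean/variance instead) move very, very slowly.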
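To make the mechanism concrete, here is a minimal pure-Python sketch of the moving-average update that BatchNormalization applies to its evaluation statistics. It is a simulation under stated assumptions, not Keras code: the input feature is assumed to be Gaussian with mean 5.0 and standard deviation 2.0, the moving statistics start at mean 0 and variance 1, and the function names are made up for illustration.

```python
import math
import random
import statistics


def simulate_moving_stats(momentum, steps, batch_size=10,
                          true_mean=5.0, true_std=2.0, seed=0):
    """Run `steps` moving-average updates on synthetic batches (assumed data)."""
    rng = random.Random(seed)
    moving_mean, moving_var = 0.0, 1.0  # assumed initial values: zeros and ones
    for _ in range(steps):
        batch = [rng.gauss(true_mean, true_std) for _ in range(batch_size)]
        batch_mean = statistics.fmean(batch)
        batch_var = statistics.pvariance(batch)
        # moving = moving * momentum + batch_statistic * (1 - momentum)
        moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
        moving_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return moving_mean, moving_var


def normalize(x, mean, var, eps=1e-3):
    """Normalize one value the way inference mode would, using moving stats."""
    return (x - mean) / math.sqrt(var + eps)


for momentum in (0.999, 0.9):
    mean, var = simulate_moving_stats(momentum, steps=360)
    typical = normalize(5.0, mean, var)
    print(f'momentum={momentum}: moving_mean={mean:.2f} '
          f'moving_var={var:.2f} normalized typical input={typical:.2f}')
```

In training mode a typical input is normalized to roughly 0 because the batch's own statistics are used. With momentum 0.999 and only 360 updates, the moving statistics are still far from the data, so the same input lands well away from 0 in evaluation mode and the later layers see inputs they were never trained on.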

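If you train for millions of steps over several epochs, that is fine. With a small dataset, these parameters barely change from their initial values, and evaluation is completely broken.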
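A rough calculation shows the scale of the effect for the run in the question. After n updates, a fraction momentum**n of the initial moving statistic still remains. The sketch below uses the 72 steps per epoch and 5 epochs shown in the question's log; the 1% target and the function names are arbitrary choices for illustration.

```python
import math

STEPS_PER_EPOCH = 72  # from the "72/72" lines in the training log
EPOCHS = 5            # from the fit() call in the question
total_steps = STEPS_PER_EPOCH * EPOCHS


def residual_fraction(momentum, steps):
    """Fraction of the initial moving statistic left after `steps` updates."""
    return momentum ** steps


def steps_to_reach(momentum, target=0.01):
    """Smallest number of updates that brings the residual down to `target`."""
    return math.ceil(math.log(target) / math.log(momentum))


for momentum in (0.999, 0.99, 0.9):
    print(f'momentum={momentum}: '
          f'residual after {total_steps} steps = '
          f'{residual_fraction(momentum, total_steps):.4f}, '
          f'steps to reach 1% residual = {steps_to_reach(momentum)}')
```

With momentum 0.999, about 70% of the initial statistics are still in place after the 360 training steps, whereas with 0.9 the initial values are forgotten within a few dozen steps.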

If your problem is the same as mine, a quick fix is to set the momentum of all BatchNormalization layers to 0.9. This can be done with this simple recursive function:

import tensorflow as tf

def SetBatchNormalizationMomentum(model, new_value, prefix='', verbose=False):
  """Recursively set `momentum` on every BatchNormalization layer in `model`."""
  for ii, layer in enumerate(model.layers):
    if hasattr(layer, 'layers'):
      # Nested model or layer container: recurse into its sub-layers.
      SetBatchNormalizationMomentum(layer, new_value, f'{prefix}Layer {ii}/', verbose)
    elif isinstance(layer, tf.keras.layers.BatchNormalization):
      if verbose:
        print(f'{prefix}Layer {ii}: name={layer.name} momentum={layer.momentum} --> set momentum={new_value}')
      layer.momentum = new_value

I hope this helps you too; it worked here.

(Edited): the code that sets the BatchNorm momentum in MobileNet is here https://github.com/tensorflow/tensorflow/blob/85c8b2a817f95a3e979ecd1ed95bff1dc1335cff/tensorflow/python/keras/applications/mobilenet_v3.py#L509.
