fit() works as expected, but during evaluate() the model performs at chance

2024-01-07

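I noticed that during evaluate() I don't get the results I would expect from the fit() results. I found many discussions online from people with similar problems. For example, this open issue https://github.com/keras-team/keras/issues/6977 discusses dropout layers and batch normalization as possible causes, but others point out that there may be a problem unrelated to dropout and batch normalization. As a beginner, it's hard to even know what the problem actually is.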

The network architecture I'm using does contain batch normalization, but I'm not sure whether that is the problem.

The data for this demo can be downloaded here https://drive.google.com/file/d/1wQZbCuw8cI9cyZIKz956wNLfgfz-o3c3/view?usp=sharing.

This script clearly illustrates the problem I'm having:

import random
import os
import matplotlib.image as mpimg
import cv2
import tensorflow as tf
tf.compat.v1.enable_eager_execution()
HEIGHT_WIDTH = 299
BATCH_SIZE = 10
VERBOSE = 2

SANITY_SWITCH = False

print('starting script')

net = tf.keras.applications.InceptionResNetV2(
    include_top=True,
    weights=None,  # 'imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=2,  # 1000,
    classifier_activation='softmax'
)

print_output = True
def utility_metric(y_true, y_pred):
    global print_output
    if print_output:
        print(f'y_true:{y_true.numpy()}')
        print(f'y_pred:{y_pred.numpy()}')
        print_output = False
    return 0


net.compile(
    optimizer='ADAM',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy', utility_metric]
)

net.run_eagerly = True

class_map = {'dog': 0, 'cat': 1}

def preprocess(file):
    imdata = mpimg.imread(file)
    imdata = cv2.resize(imdata, dsize=(HEIGHT_WIDTH, HEIGHT_WIDTH), interpolation=cv2.INTER_LINEAR)
    imdata.shape = (HEIGHT_WIDTH, HEIGHT_WIDTH, 3)
    imdata /= 127.5
    imdata -= 1.
    return imdata, class_map[os.path.basename(os.path.dirname(file))]

train_data = [f'data/Training/cat/{x}' for x in os.listdir('data/Training/cat')] + [f'data/Training/dog/{x}' for x in os.listdir('data/Training/dog')]
test_data = [f'data/Testing/cat/{x}' for x in os.listdir('data/Testing/cat')] + [f'data/Testing/dog/{x}' for x in os.listdir('data/Testing/dog')]

random.shuffle(train_data)
random.shuffle(test_data)

if SANITY_SWITCH:
    tmp_data = train_data
    train_data = test_data
    test_data = tmp_data


def get_gen(data):
    def gen():
        pairs = []
        i = 0
        for im_file in data:
            i += 1
            if i <= BATCH_SIZE:
                pairs += [preprocess(im_file)]
            if i == BATCH_SIZE:
                yield (
                    [pair[0] for pair in pairs],
                    [pair[1] for pair in pairs]
                )
                pairs.clear()
                i = 0
    return gen

def get_ds(data):
    return tf.data.Dataset.from_generator(
        get_gen(data),
        (tf.float32, tf.int64),
        output_shapes=(
            tf.TensorShape((BATCH_SIZE, HEIGHT_WIDTH, HEIGHT_WIDTH, 3)),
            tf.TensorShape(([BATCH_SIZE]))
        )
    )
print('starting training')
net.fit(
    get_ds(train_data),
    epochs=5,
    verbose=VERBOSE,
    use_multiprocessing=True,
    workers=16,
    batch_size=BATCH_SIZE,
    shuffle=False
)
print('starting testing')
print_output = True
net.evaluate(
    get_ds(test_data),
    verbose=VERBOSE,
    batch_size=BATCH_SIZE,
    use_multiprocessing=True,
    workers=16,
)
print('script complete')

The full output is here:

starting script
2020-12-22 15:29:33.896474: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-22 15:29:34.184215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:04:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.186083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:05:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.188086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:08:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.190088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:09:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.192124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 4 with properties: 
pciBusID: 0000:84:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.194144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 5 with properties: 
pciBusID: 0000:85:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.196095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 6 with properties: 
pciBusID: 0000:88:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.197451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 7 with properties: 
pciBusID: 0000:89:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:34.208178: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-12-22 15:29:34.301110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-12-22 15:29:34.348641: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-12-22 15:29:34.370185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-12-22 15:29:34.459524: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-12-22 15:29:34.471473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-12-22 15:29:34.599447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 15:29:34.634806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2020-12-22 15:29:34.635371: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-12-22 15:29:34.680254: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2000105000 Hz
2020-12-22 15:29:34.687348: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561e331d4820 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 15:29:34.687415: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-22 15:29:35.617673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:04:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.619368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:05:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.621161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:08:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.622953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:09:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.624745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 4 with properties: 
pciBusID: 0000:84:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.626508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 5 with properties: 
pciBusID: 0000:85:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.628264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 6 with properties: 
pciBusID: 0000:88:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.629460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 7 with properties: 
pciBusID: 0000:89:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-12-22 15:29:35.629581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-12-22 15:29:35.629633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-12-22 15:29:35.629685: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-12-22 15:29:35.629733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-12-22 15:29:35.629788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-12-22 15:29:35.629837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-12-22 15:29:35.629886: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 15:29:35.657298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2020-12-22 15:29:35.659638: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-12-22 15:29:35.678371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-22 15:29:35.678447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 1 2 3 4 5 6 7 
2020-12-22 15:29:35.678500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N Y Y Y N N N N 
2020-12-22 15:29:35.678538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1:   Y N Y Y N N N N 
2020-12-22 15:29:35.678569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2:   Y Y N Y N N N N 
2020-12-22 15:29:35.678597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 3:   Y Y Y N N N N N 
2020-12-22 15:29:35.678624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 4:   N N N N N Y Y Y 
2020-12-22 15:29:35.678652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 5:   N N N N Y N Y Y 
2020-12-22 15:29:35.678678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 6:   N N N N Y Y N Y 
2020-12-22 15:29:35.678705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 7:   N N N N Y Y Y N 
2020-12-22 15:29:35.703703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10689 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
2020-12-22 15:29:35.711407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8534 MB memory) -> physical GPU (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0, compute capability: 3.7)
2020-12-22 15:29:35.716593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10689 MB memory) -> physical GPU (device: 2, name: Tesla K80, pci bus id: 0000:08:00.0, compute capability: 3.7)
2020-12-22 15:29:35.721879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10689 MB memory) -> physical GPU (device: 3, name: Tesla K80, pci bus id: 0000:09:00.0, compute capability: 3.7)
2020-12-22 15:29:35.726952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 10689 MB memory) -> physical GPU (device: 4, name: Tesla K80, pci bus id: 0000:84:00.0, compute capability: 3.7)
2020-12-22 15:29:35.732126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 10689 MB memory) -> physical GPU (device: 5, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
2020-12-22 15:29:35.736838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 10689 MB memory) -> physical GPU (device: 6, name: Tesla K80, pci bus id: 0000:88:00.0, compute capability: 3.7)
2020-12-22 15:29:35.740357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 108 MB memory) -> physical GPU (device: 7, name: Tesla K80, pci bus id: 0000:89:00.0, compute capability: 3.7)
2020-12-22 15:29:35.746472: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561e387dea00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-22 15:29:35.746517: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746537: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746577: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746594: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746614: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (4): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746645: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (5): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746664: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (6): Tesla K80, Compute Capability 3.7
2020-12-22 15:29:35.746694: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (7): Tesla K80, Compute Capability 3.7
starting training
Epoch 1/5
2020-12-22 15:29:48.307104: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-22 15:29:51.694232: W tensorflow/stream_executor/gpu/asm_compiler.cc:81] Running ptxas --version returned 256
2020-12-22 15:29:51.796020: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-12-22 15:29:52.577156: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
y_true:[[1.]
 [1.]
 [0.]
 [1.]
 [1.]
 [1.]
 [1.]
 [0.]
 [1.]
 [1.]]
y_pred:[[0.58956003 0.41043994]
 [0.63762885 0.36237112]
 [0.53731585 0.46268415]
 [0.5393683  0.4606317 ]
 [0.90735996 0.09264001]
 [0.552977   0.44702297]
 [0.7115651  0.28843486]
 [0.4068687  0.59313136]
 [0.5482196  0.4517804 ]
 [0.4330527  0.56694734]]
72/72 - 81s - loss: 0.9134 - accuracy: 0.5417 - utility_metric: 0.0000e+00
Epoch 2/5
72/72 - 81s - loss: 0.7027 - accuracy: 0.5847 - utility_metric: 0.0000e+00
Epoch 3/5
72/72 - 83s - loss: 0.6851 - accuracy: 0.5819 - utility_metric: 0.0000e+00
Epoch 4/5
72/72 - 83s - loss: 0.6810 - accuracy: 0.5944 - utility_metric: 0.0000e+00
Epoch 5/5
72/72 - 83s - loss: 0.6895 - accuracy: 0.5625 - utility_metric: 0.0000e+00
starting testing
y_true:[[1.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [1.]
 [0.]
 [0.]
 [1.]]
y_pred:[[0.39538118 0.6046188 ]
 [0.39505056 0.6049495 ]
 [0.39406297 0.605937  ]
 [0.3947329  0.60526717]
 [0.3935887  0.60641134]
 [0.39452523 0.60547477]
 [0.39451653 0.6054835 ]
 [0.39475334 0.60524666]
 [0.39559898 0.604401  ]
 [0.3951175  0.60488254]]
90/90 - 37s - loss: 0.7157 - accuracy: 0.5000 - utility_metric: 0.0000e+00
script complete

The part of the output to focus on is the accuracy:

Training epoch 1: 0.5417

Training epoch 2: 0.5847

Training epoch 3: 0.5819

Training epoch 4: 0.5944

Training epoch 5: 0.5625

Evaluation: 0.5000

I've also included the raw network output for both cases. One from training:

y_true:[[1.]
 [1.]
 [0.]
 [1.]
 [1.]
 [1.]
 [1.]
 [0.]
 [1.]
 [1.]]
y_pred:[[0.58956003 0.41043994]
 [0.63762885 0.36237112]
 [0.53731585 0.46268415]
 [0.5393683  0.4606317 ]
 [0.90735996 0.09264001]
 [0.552977   0.44702297]
 [0.7115651  0.28843486]
 [0.4068687  0.59313136]
 [0.5482196  0.4517804 ]
 [0.4330527  0.56694734]]

And one from testing:

y_true:[[1.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [1.]
 [0.]
 [0.]
 [1.]]
y_pred:[[0.39538118 0.6046188 ]
 [0.39505056 0.6049495 ]
 [0.39406297 0.605937  ]
 [0.3947329  0.60526717]
 [0.3935887  0.60641134]
 [0.39452523 0.60547477]
 [0.39451653 0.6054835 ]
 [0.39475334 0.60524666]
 [0.39559898 0.604401  ]
 [0.3951175  0.60488254]]

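What confuses me is why, during testing, the output barely changes from one image to the next. This seems related to the root of the problem, but I don't know what is causing it.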
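To put a number on "barely changes", this small check takes the class-1 probabilities (column 1 is "cat", per class_map) from the two y_pred printouts above and compares their spread. It covers only a single batch in each case, so it illustrates the pattern rather than proving it.

```python
import statistics

# Class-1 ("cat") probabilities copied from the two y_pred printouts above
train_p1 = [0.41043994, 0.36237112, 0.46268415, 0.4606317, 0.09264001,
            0.44702297, 0.28843486, 0.59313136, 0.4517804, 0.56694734]
eval_p1 = [0.6046188, 0.6049495, 0.605937, 0.60526717, 0.60641134,
           0.60547477, 0.6054835, 0.60524666, 0.604401, 0.60488254]
eval_true = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]

# Population standard deviation of p(cat) within each batch
train_std = statistics.pstdev(train_p1)
eval_std = statistics.pstdev(eval_p1)

# With two classes, argmax picks class 1 whenever p(cat) > 0.5
eval_pred = [1 if p > 0.5 else 0 for p in eval_p1]
eval_acc = sum(p == t for p, t in zip(eval_pred, eval_true)) / len(eval_true)

print(f"std of p(cat), training batch:   {train_std:.4f}")
print(f"std of p(cat), evaluation batch: {eval_std:.4f}")
print(f"evaluation predictions: {eval_pred}, batch accuracy: {eval_acc:.1f}")
```

The spread during evaluation is more than 200 times smaller than during training, and every evaluation prediction is "cat", so the accuracy on that batch is simply the fraction of cats in it.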

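I've run this script many times, and some things are consistent. Accuracy during evaluation is always exactly chance. During evaluation, y_pred always shows very little variation, and all outputs seem to be the same label (so, for example, during evaluation the model might report every input image as "dog").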

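Sometimes the training accuracy goes above 60%. That doesn't change the problem. I could keep increasing the dataset size and the number of epochs and try to improve the training results, but I'm wary of moving on without first understanding why the evaluation results look so strange.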

I recently ran into a very similar problem with the MobileNetV3Large model https://www.tensorflow.org/api_docs/python/tf/keras/applications/MobileNetV3Large.

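The problem came from setting weights=None, which leaves every parameter at its initial value, including the BatchNormalization moving statistics (moving mean initialized to 0, moving variance to 1) that are used during evaluation.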

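More importantly, as a friend pointed out to me, MobileNetV3 creates its BatchNormalization layers with momentum 0.999 (the default of tf.keras.layers.BatchNormalization itself is 0.99). This means the BatchNormalization parameters that are used only during evaluation (training uses the per-batch mean/variance) move very, very slowly.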
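A minimal sketch of why this matters, in plain Python rather than TensorFlow, assuming gamma=1 and beta=0 and using made-up activations for a single channel (the numbers are hypothetical, not from the question): in training mode the layer normalizes with the current batch's mean and variance, while in inference mode it uses moving_mean/moving_variance, which start at 0 and 1.

```python
import math
import statistics

EPS = 1e-3  # Keras BatchNormalization default epsilon


def bn(values, mean, var, eps=EPS):
    """Normalize one feature channel; gamma=1 and beta=0 are assumed for simplicity."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical activations for one channel across a batch (not taken from the question)
batch = [4.0, 7.0, 10.0, 3.0, 6.0]

# Training mode (fit): normalize with the statistics of the current batch
batch_mean = statistics.mean(batch)
batch_var = statistics.pvariance(batch)  # biased (population) variance
train_out = bn(batch, batch_mean, batch_var)

# Inference mode (evaluate/predict): use the moving statistics instead.
# Right after random initialization they are moving_mean=0 and moving_variance=1,
# and with a high momentum they stay close to those values for a long time.
infer_out = bn(batch, mean=0.0, var=1.0)

print("training-mode output: ", [round(x, 2) for x in train_out])
print("inference-mode output:", [round(x, 2) for x in infer_out])
```

The training-mode output is centered with roughly unit variance, while the inference-mode output is essentially the raw input. In a deep network with many stacked BatchNormalization layers this mismatch compounds, which is a plausible explanation for the nearly constant y_pred seen during evaluate().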

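If you train for millions of steps over several epochs, that's fine. With a small dataset, these parameters barely change, and evaluation is completely broken.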
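To see how slow "very slowly" is, the sketch below applies the Keras moving-average update (moving = moving * momentum + batch_mean * (1 - momentum)) 360 times, which matches the 72 steps × 5 epochs in the question's log. It assumes, for illustration, a constant batch mean of 1.0 and a starting value of 0 (the Keras initial moving mean); the moving variance follows the same rule.

```python
def simulate_moving_mean(momentum, steps, batch_mean, init=0.0):
    """Apply the Keras BatchNormalization moving-average update `steps` times.

    Update rule: moving = moving * momentum + batch_mean * (1 - momentum).
    For illustration the batch mean is assumed constant across steps.
    """
    moving = init
    for _ in range(steps):
        moving = moving * momentum + batch_mean * (1.0 - momentum)
    return moving


# 72 batches per epoch x 5 epochs, as in the question's training log
steps = 72 * 5
for momentum in (0.999, 0.99, 0.9):
    moving = simulate_moving_mean(momentum, steps, batch_mean=1.0)
    print(f"momentum={momentum}: moving_mean={moving:.3f} (target 1.0)")
```

Because the gap to the target shrinks by a factor of momentum on every step, about 0.999**360 ≈ 0.70 of the initial gap is still left with momentum 0.999, whereas with 0.9 the gap drops below 1% after about 44 steps.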

If your problem is the same as mine, a quick fix is to set the momentum of all BatchNormalization layers to 0.9. This can be done with this simple recursive function:

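import tensorflow as tf

def SetBatchNormalizationMomentum(model, new_value, prefix='', verbose=False):
    # Walk the layers; anything with its own .layers (e.g. a nested Model) is handled recursively.
    # Call this before fit() (or re-compile afterwards): an already traced training step may keep the old value.
    for ii, layer in enumerate(model.layers):
        if hasattr(layer, 'layers'):
            SetBatchNormalizationMomentum(layer, new_value, f'{prefix}Layer {ii}/', verbose)
        elif isinstance(layer, tf.keras.layers.BatchNormalization):
            if verbose:
                print(f'{prefix}Layer {ii}: name={layer.name} momentum={layer.momentum} --> set momentum={new_value}')
            # momentum controls how fast the moving mean/variance (used at inference) track the batch statistics
            layer.momentum = new_value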

I hope this helps you too; it worked for me.

(Edit): The code that sets the BatchNorm momentum in MobileNet is here https://github.com/tensorflow/tensorflow/blob/85c8b2a817f95a3e979ecd1ed95bff1dc1335cff/tensorflow/python/keras/applications/mobilenet_v3.py#L509.
