How can I precompute a mask for each input and adjust the weights according to that mask?

2024-01-21

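I want to provide a mask the same size as the input image and adjust the weights learned from the image according to that mask (similar to attention, but precomputed for each image input). How can I do this with Keras (or TensorFlow)?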

Question

How do you add another feature layer (such as a mask) to an image and have the neural network take this new feature layer into account?

Answer

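The short answer is to add it to the image as another color channel. If your image already has 3 color channels (red, green, and blue), adding a fourth channel that holds the mask as 1s and 0s gives the neural network more information to base its decisions on.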
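As a minimal sketch of that idea, the snippet below appends a binary mask to an RGB image as a fourth channel using NumPy. The helper name `add_mask_channel`, the 32x32 image size, and the square mask region are illustrative assumptions, not part of the original answer.

```python
import numpy as np


def add_mask_channel(image, mask):
    """Append a 0/1 mask as an extra channel to an (H, W, C) image.

    Hypothetical helper for illustration only.
    """
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must have the same height and width as the image")
    # Give the mask a trailing channel axis, then join along the channel axis.
    return np.concatenate([image, mask[..., np.newaxis]], axis=-1)


# Assumed example data: a random 32x32 RGB image and a square region of interest.
rgb = np.random.rand(32, 32, 3)
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0

stacked = add_mask_channel(rgb, mask)
print(stacked.shape)  # (32, 32, 4)
```

The network's input shape then needs to be declared with 4 channels instead of 3; nothing else about the architecture has to change.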

Thought experiment

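As a thought experiment, let's tackle MNIST. MNIST images are 28x28. Take 1 image (the "real" image) and 3 other images (the "distractor" images) and arrange the four 28x28 images into one 56x56 image. MNIST is grayscale, so it has only 1 color channel, brightness. Now add a second channel, a mask that is 1 in the region of the 56x56 image occupied by the "real" image and 0 everywhere else.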
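The construction described above could be sketched as follows. Random arrays stand in for actual MNIST digits, and the function name `make_composite` and its `quadrant` parameter are assumptions made for this illustration.

```python
import numpy as np


def make_composite(real, distractors, quadrant):
    """Tile one real and three distractor 28x28 images into a (56, 56, 2) array.

    Channel 0 is brightness; channel 1 is the mask (1 over the real image).
    quadrant is 0..3 in row-major order (0 = top-left, 3 = bottom-right).
    Hypothetical helper for illustration only.
    """
    tiles = list(distractors)
    tiles.insert(quadrant, real)  # place the real image at its quadrant
    top = np.hstack(tiles[:2])
    bottom = np.hstack(tiles[2:])
    brightness = np.vstack([top, bottom]).astype(np.float32)

    # The mask is 1 only over the quadrant that holds the real image.
    mask = np.zeros_like(brightness)
    row, col = divmod(quadrant, 2)
    mask[row * 28:(row + 1) * 28, col * 28:(col + 1) * 28] = 1.0

    return np.stack([brightness, mask], axis=-1)


# Random stand-ins for MNIST digits (assumption: real data would be loaded instead).
rng = np.random.default_rng(0)
real = rng.random((28, 28))
distractors = [rng.random((28, 28)) for _ in range(3)]

composite = make_composite(real, distractors, quadrant=3)
print(composite.shape)  # (56, 56, 2)
```

The label for each composite is simply the label of the "real" digit, so the mask is the only cue telling the network which quadrant matters.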

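If we use the same architecture normally used to solve MNIST, convolutions all the way down, we can imagine that it could use this new information to learn to pay attention only to the "real" region and classify the image correctly.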
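Since the question asks about Keras, here is one way such a network might look. It is a sketch only: the layer sizes are arbitrary choices, the function name `build_masked_cnn` is hypothetical, and the model is untrained. The one point it illustrates is that the input shape declares 2 channels (brightness plus mask), so every filter in the first convolution has weights for both.

```python
import numpy as np
import tensorflow as tf


def build_masked_cnn(height=56, width=56, channels=2, num_classes=10):
    """Small CNN whose input carries the mask as an extra channel.

    Layer sizes are illustrative assumptions, not tuned values.
    """
    inputs = tf.keras.Input(shape=(height, width, channels))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model = build_masked_cnn()

# Forward pass on a dummy batch to confirm the shapes line up.
batch = np.zeros((1, 56, 56, 2), dtype=np.float32)
probs = model.predict(batch, verbose=0)
print(probs.shape)  # (1, 10)
```

Whether the network actually learns to rely on the mask depends on training; the XOR experiment in this answer tests that on a much smaller problem.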

Code example

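In this example we try to solve the XOR problem. We take the classic XOR, double the size of the input by adding noise, and add a channel that is 1 for non-noise and 0 for noise.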


# Adapted from https://github.com/panchishin/learn-to-tensorflow/blob/master/solutions/04-xor-2d.py

import numpy as np

# -- The xor problem --
x = np.array([[0., 0.], [1., 1.], [1., 0.], [0., 1.]])
y_ = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]


def makeBatch() :
    # Build 4 samples, each with 4 positions and 2 channels
    # (channel 0 = value, channel 1 = mask). Two adjacent positions hold
    # the real 'x' pair; the other two positions hold random noise.
    global x
    rx = np.random.rand(4,4,2) > 0.5
    # set the mask to 0 for all items
    rx[:,:,1] = 0
    # the real pair starts at position 0, 1 or 2
    index = int(np.random.random()*3)
    rx[:,index:index+2,0] = x
    # set the mask to 1 for 'real' values
    rx[:,index:index+2,1] = 1
    return rx

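# -- imports --
# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# Under TensorFlow 2.x, use instead:
#   import tensorflow.compat.v1 as tf
#   tf.disable_v2_behavior()
import tensorflow as tf

# print NumPy values with 2 decimal places, without scientific notation
np.set_printoptions(precision=2, suppress=True)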


# -- induction --

# Layer 0
x0 = tf.placeholder(dtype=tf.float32, shape=[None, 4, 2])
y0 = tf.placeholder(dtype=tf.float32, shape=[None, 2])

# Layer 1
f1 = tf.reshape(x0,shape=[-1,8])
m1 = tf.Variable(tf.random_uniform([8, 9], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([9], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(f1, m1) + b1)

# Layer 2
m2 = tf.Variable(tf.random_uniform([9, 2], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([2], minval=0.1, maxval=0.9, dtype=tf.float32))
y_out = tf.nn.softmax(tf.matmul(h1, m2) + b2)


# -- loss --

# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y0 - y_out))

# training step : gradient descent (1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)



# -- training --
# run 5000 training steps, each on a freshly generated batch
# print out the loss every 1000 steps, then the final weights and outputs
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    print("\nloss")
    for step in range(5000):
        sess.run(train, feed_dict={x0: makeBatch(), y0: y_})
        if (step + 1) % 1000 == 0:
            print(sess.run(loss, feed_dict={x0: makeBatch(), y0: y_}))

    results = sess.run([m1, b1, m2, b2, y_out, loss], feed_dict={x0: makeBatch(), y0: y_})
    labels = "m1,b1,m2,b2,y_out,loss".split(",")
    for label, result in zip(*(labels, results)):
        print("")
        print(label)
        print(result)

print("")

Output

We can see that the network solves the problem correctly and gives the right outputs with high certainty.

y_ (ground truth) = [[1., 0.], [1., 0.], [0., 1.], [0., 1.]]

y_out
[[0.99 0.01]
 [0.99 0.01]
 [0.01 0.99]
 [0.01 0.99]]

loss
0.00056630466

Confirming that the mask is doing something

Let's change the mask function so that the mask is random, by commenting out the lines that set it to 0 for noise and 1 for signal.

def makeBatch() :
    global x
    rx = np.random.rand(4,4,2) > 0.5
    #rx[:,:,1] = 0
    index = int(np.random.random()*3)
    rx[:,index:index+2,0] = x
    #rx[:,index:index+2,1] = 1
    return rx

Then rerun the code. Indeed, we can see that without the mask the network fails to learn.

y_out
[[0.99 0.01]
 [0.76 0.24]
 [0.09 0.91]
 [0.58 0.42]]

loss
0.8080765

Conclusion

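If an image (or other data structure) contains both signal and noise, and you add another channel (the mask) that indicates where the signal is and where the noise is, a neural network can use that mask to focus on the signal while still having access to the noise.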


tensorflow

Keras

convneuralnetwork

attentionmodel