The problem is with your input. Your input has shape (100, 64), where the first dimension is the time steps. Ignoring that dimension, the input to the Conv1D has shape (64).
Now refer to the Keras Conv1D documentation https://keras.io/layers/convolutional/#conv1d, which states that the input should be a 3D tensor (batch_size, steps, input_dim). Ignoring batch_size, your input should be a 2D tensor (steps, input_dim).
So you are providing a 1D tensor where a 2D tensor is expected. For example, if you feed sentences to the Conv1D as words, and each sentence has 64 words with each word encoded as a vector of length 50, then your input should have shape (64, 50).
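To make the shape requirement concrete, here is a minimal NumPy sketch of what a 'valid' Conv1D does to a single (steps, input_dim) sample. The sizes (64 words, 50-dim vectors, a 16-wide kernel with 64 filters) mirror the example above; the random weights are purely illustrative:

```python
import numpy as np

steps, input_dim = 64, 50      # 64 words, each a length-50 vector
filters, kernel_size = 64, 16  # mirrors Conv1D(64, 16) above

x = np.random.rand(steps, input_dim)             # one sample: 2D (steps, input_dim)
w = np.random.rand(kernel_size, input_dim, filters)

# 'valid' 1D convolution slides the kernel along the steps axis
out_steps = steps - kernel_size + 1              # 64 - 16 + 1 = 49
y = np.empty((out_steps, filters))
for t in range(out_steps):
    # each output step is the window (kernel_size, input_dim) contracted with the kernel
    y[t] = np.tensordot(x[t:t + kernel_size], w, axes=([0, 1], [0, 1]))

print(y.shape)  # (49, 64)
```

This is why a 1D input of shape (64) cannot work: the convolution needs both a steps axis to slide along and an input_dim axis to contract over.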
Also, make sure you feed the correct input to the LSTM, as shown in the code below.
So the correct code should be:
from keras.layers import Input, Conv1D, MaxPooling1D, BatchNormalization, TimeDistributed, Reshape, LSTM

embedding_size = 50  # Set this accordingly
mfcc_input = Input(shape=(100, 64, embedding_size), dtype='float', name='mfcc_input')
CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64 - 16 + 1), strides=None, padding='valid'))(CNN_out)
# Feeding CNN_out directly to the LSTM would raise an error, since the 3rd dimension is 1; squeeze it out:
CNN_out = Reshape((int(CNN_out.shape[1]), int(CNN_out.shape[3])))(CNN_out)
LSTM_out = LSTM(64, return_sequences=True)(CNN_out)
... (more code) ...
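As a sanity check, the shape bookkeeping for the pipeline above can be done with plain arithmetic, no Keras required (assuming embedding_size = 50 as in the comment):

```python
# Shape bookkeeping for the model above.
time_steps, words, embedding_size = 100, 64, 50   # mfcc_input: (100, 64, 50)
filters, kernel_size = 64, 16                     # Conv1D(64, 16)

conv_steps = words - kernel_size + 1              # 'valid' conv: 64 - 16 + 1 = 49
after_conv = (time_steps, conv_steps, filters)    # (100, 49, 64)

pool_size = conv_steps                            # MaxPooling1D(pool_size=49)
after_pool = (time_steps, conv_steps // pool_size, filters)   # (100, 1, 64)

after_reshape = (time_steps, filters)             # Reshape drops the size-1 axis -> (100, 64)
print(after_conv, after_pool, after_reshape)
```

The final (100, 64) is exactly the (timesteps, features) shape the LSTM expects, which is why the Reshape is needed before it.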