问题是报道的validation accuracy
我从 Keras 获得的价值model.fit
历史显着高于validation accuracy
我得到的指标sklearn.metrics
功能。
我得到的结果model.fit
总结如下:
Last Validation Accuracy: 0.81
Best Validation Accuracy: 0.84
结果(标准化)来自sklearn
非常不同:
True Negatives: 0.78
True Positives: 0.77
Validation Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.775
(see confusion matrix below for reference)
Edit: this calculation is incorrect, because one can not
use the normalized values to calculate the accuracy, since
it does not account for differences in the total absolute
number of points in the dataset. Thanks to the comment by desertnaut
我觉得这个问题和this有点相似Sklearn 指标值与 Keras 值有很大不同 https://stackoverflow.com/questions/54580679/sklearn-metrics-values-are-very-different-from-keras-values但我已经检查过这两种方法都在同一数据池上进行验证,因此这个答案可能不适合我的情况。
还有这个问题Keras 二进制精度度量给出的精度太高 https://stackoverflow.com/questions/46354182/keras-binary-accuracy-metric-gives-too-high-accuracy似乎解决了二元交叉熵影响多类问题的方式的一些问题,但在我的情况下它可能不适用,因为它是一个真正的二元分类问题。
以下是使用的命令:
型号定义:
inputs = Input((Tx, ))
n_e = 30
embeddings = Embedding(n_x, n_e, input_length=Tx)(inputs)
out = Bidirectional(LSTM(32, recurrent_dropout=0.5, return_sequences=True))(embeddings)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5, return_sequences=True))(out)
out = Bidirectional(LSTM(16, recurrent_dropout=0.5))(out)
out = Dense(3, activation='softmax')(out)
modelo = Model(inputs=inputs, outputs=out)
modelo.summary()
型号概要:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 100) 0
_________________________________________________________________
embedding (Embedding) (None, 100, 30) 86610
_________________________________________________________________
bidirectional (Bidirectional (None, 100, 64) 16128
_________________________________________________________________
bidirectional_1 (Bidirection (None, 100, 32) 10368
_________________________________________________________________
bidirectional_2 (Bidirection (None, 32) 6272
_________________________________________________________________
dense (Dense) (None, 3) 99
=================================================================
Total params: 119,477
Trainable params: 119,477
Non-trainable params: 0
_________________________________________________________________
模型编译:
mymodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
模型拟合调用:
num_epochs = 30
myhistory = mymodel.fit(X_pad, y, epochs=num_epochs, batch_size=50, validation_data=[X_val_pad, y_val_oh], shuffle=True, callbacks=callbacks_list)
模型拟合日志:
Train on 505 samples, validate on 127 samples
Epoch 1/30
500/505 [============================>.] - ETA: 0s - loss: 0.6135 - acc: 0.6667
[...]
Epoch 10/30
500/505 [============================>.] - ETA: 0s - loss: 0.1403 - acc: 0.9633
Epoch 00010: val_acc improved from 0.77953 to 0.79528, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 41ms/sample - loss: 0.1393 - acc: 0.9637 - val_loss: 0.5203 - val_acc: 0.7953
Epoch 11/30
500/505 [============================>.] - ETA: 0s - loss: 0.0865 - acc: 0.9840
Epoch 00011: val_acc did not improve from 0.79528
505/505 [==============================] - 21s 41ms/sample - loss: 0.0860 - acc: 0.9842 - val_loss: 0.5257 - val_acc: 0.7953
Epoch 12/30
500/505 [============================>.] - ETA: 0s - loss: 0.0618 - acc: 0.9900
Epoch 00012: val_acc improved from 0.79528 to 0.81102, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0615 - acc: 0.9901 - val_loss: 0.5472 - val_acc: 0.8110
Epoch 13/30
500/505 [============================>.] - ETA: 0s - loss: 0.0415 - acc: 0.9940
Epoch 00013: val_acc improved from 0.81102 to 0.82152, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0413 - acc: 0.9941 - val_loss: 0.5853 - val_acc: 0.8215
Epoch 14/30
500/505 [============================>.] - ETA: 0s - loss: 0.0443 - acc: 0.9933
Epoch 00014: val_acc did not improve from 0.82152
505/505 [==============================] - 21s 42ms/sample - loss: 0.0453 - acc: 0.9921 - val_loss: 0.6043 - val_acc: 0.8136
Epoch 15/30
500/505 [============================>.] - ETA: 0s - loss: 0.0360 - acc: 0.9933
Epoch 00015: val_acc improved from 0.82152 to 0.84777, saving model to modelo-10-melhor-modelo.hdf5
505/505 [==============================] - 21s 42ms/sample - loss: 0.0359 - acc: 0.9934 - val_loss: 0.5663 - val_acc: 0.8478
[...]
Epoch 30/30
500/505 [============================>.] - ETA: 0s - loss: 0.0039 - acc: 1.0000
Epoch 00030: val_acc did not improve from 0.84777
505/505 [==============================] - 20s 41ms/sample - loss: 0.0039 - acc: 1.0000 - val_loss: 0.8340 - val_acc: 0.8110
sklearn 的混淆矩阵:
from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(y_values, predicted_values)
预测值和黄金值确定如下:
preds = mymodel.predict(X_val)
preds_ints = [[el] for el in np.argmax(preds, axis=1)]
values_pred = tokenizer_y.sequences_to_texts(preds_ints)
values_gold = tokenizer_y.sequences_to_texts(y_val)
最后,我想补充一点,我已经打印出了数据和所有预测错误,并且我相信 sklearn 值更可靠,因为它们似乎与我打印出保存的“最佳”模型的预测得到的结果相匹配。
另一方面,我无法理解这些指标怎么会如此不同。由于它们都是众所周知的软件,因此我断定我是犯错误的人,但我无法确定错误的位置或方式。