We can trace the error back to predict_model, which calls predict.textmodel_nb_fitted (I used only the first 10 rows of train_raw to speed up the computation):
traceback()
# 7: stop("feature set in newdata different from that in training set")
# 6: predict.textmodel_nb_fitted(x, newdata = newdata, type = type,
#        ...)
# 5: predict(x, newdata = newdata, type = type, ...)
# 4: predict_model.default(explainer$model, case_perm, type = o_type)
# 3: predict_model(explainer$model, case_perm, type = o_type)
# 2: explain.data.frame(train_raw[1:10, 1:5], explainer, n_labels = 1,
#        n_features = 5, cols = 2, verbose = 0)
# 1: lime::explain(train_raw[1:10, 1:5], explainer, n_labels = 1,
#        n_features = 5, cols = 2, verbose = 0)
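The problem is that predict.textmodel_nb_fitted expects a dfm, not a data frame. For example, predict(nb_model, test_raw[1:5]) gives you the same "feature set in newdata different from that in training set" error. However, explain takes a data frame as its x argument.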
One solution is to write a custom textmodel_nb_fitted method for predict_model that performs the necessary object conversion before calling predict.textmodel_nb_fitted:
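predict_model.textmodel_nb_fitted <- function(x, newdata, type, ...) {
  # Convert the data frame that lime passes in into a corpus, then a dfm
  X <- corpus(newdata)
  # Keep only the features present in the training dfm stored in the model
  X <- dfm_select(dfm(X), x$data$x)
  res <- predict(x, newdata = X, ...)
  # Return the predictions in the shape requested via type
  switch(
    type,
    raw = data.frame(Response = res$nb.predicted, stringsAsFactors = FALSE),
    prob = as.data.frame(res$posterior.prob, check.names = FALSE)
  )
}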
This gives us:
explanation <- lime::explain(train_raw[1:10, 1:5],
                             explainer,
                             n_labels = 1,
                             n_features = 5,
                             cols = 2,
                             verbose = 0)
explanation[1, 1:5]
#       model_type case label label_prob    model_r2
# 1 classification    1 FALSE  0.9999986 0.001693861
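
The fix relies on S3 dispatch: the traceback shows predict_model falling through to predict_model.default, so a method registered for the model's class is picked up instead. The sketch below illustrates that mechanism in base R only. Every name in it (predict_model_demo, toy_nb, toy_model, toy_input) is hypothetical, and as.matrix is just a placeholder for the data frame to dfm conversion; it is not lime or quanteda code.

```r
# Hypothetical stand-in for a generic such as predict_model:
# it dispatches on the class of its first argument.
predict_model_demo <- function(x, newdata, type, ...) {
  UseMethod("predict_model_demo")
}

# Fallback method: hands newdata on unchanged.
predict_model_demo.default <- function(x, newdata, type, ...) {
  paste("default method received a", class(newdata)[1])
}

# Class-specific method: converts newdata before doing anything else.
predict_model_demo.toy_nb <- function(x, newdata, type, ...) {
  converted <- as.matrix(newdata)  # placeholder for the real conversion step
  paste("toy_nb method received a", class(newdata)[1],
        "and converted it to a", class(converted)[1])
}

# A toy model object carrying the hypothetical class "toy_nb".
toy_model <- structure(list(), class = "toy_nb")
toy_input <- data.frame(text = c("a b", "b c"), stringsAsFactors = FALSE)

# With the class attribute, the toy_nb method is selected.
result_custom <- predict_model_demo(toy_model, toy_input, type = "prob")
# Without it, dispatch falls back to the default method.
result_default <- predict_model_demo(unclass(toy_model), toy_input, type = "prob")

print(result_custom)
print(result_default)
```

Because dispatch happens when the generic is called, the custom method only needs to be defined before explain is run.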