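caret does support passing class probabilities to a custom summary function when you specify classProbs = TRUE in trainControl. In that case, the data argument of the custom summary function will have two additional columns, named after the classes, containing the probability of each class. The names of these classes are in the lev argument, which is a vector of length 2.
See the example: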
library(caret)
library(mlbench)
data(Sonar)
A custom LogLoss summary function:
LogLoss <- function (data, lev = NULL, model = NULL){
  obs <- data[, "obs"] # truth
  cls <- levels(obs) # find class names
  probs <- data[, cls[2]] # use the second class name to extract the probabilities for the 2nd class
  probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15) # bound the probabilities; this line and those below are just the log-loss calculation, irrelevant to your question
  logPreds <- log(probs)
  log1Preds <- log(1 - probs)
  real <- (as.numeric(data$obs) - 1) # 0 for the first class, 1 for the second
  out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
  names(out) <- c("LogLoss") # important, since this name is used as the metric in the call to train; the output can be a named vector of multiple values
  out
}
fitControl <- trainControl(method = "cv",
                           number = 5,
                           classProbs = TRUE,
                           summaryFunction = LogLoss)
fit <- train(Class ~ .,
             data = Sonar,
             method = "rpart",
             metric = "LogLoss",
             tuneLength = 5,
             trControl = fitControl,
             maximize = FALSE) # important; depends on the performance measure being calculated
fit
#output
CART
208 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 166, 166, 166, 167, 167
Resampling results across tuning parameters:
cp LogLoss
0.00000000 1.1220902
0.01030928 1.1220902
0.05154639 1.1017268
0.06701031 1.0694052
0.48453608 0.6405134
LogLoss was used to select the optimal model using the smallest value.
The final value used for the model was cp = 0.4845361.
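To see what the summary function actually receives, it can help to call the same calculation on a small hand-built data frame outside of train. The following is only a sketch in base R: the toy data frame and the helper name binary_logloss are made up for illustration, and the column layout (obs, pred, plus one probability column per class) is an assumption based on the description above.

```r
# Hypothetical toy data mimicking what caret passes to a summary function
# when classProbs = TRUE: obs, pred, plus one probability column per class.
toy <- data.frame(
  obs  = factor(c("M", "R", "R", "M"), levels = c("M", "R")),
  pred = factor(c("M", "R", "M", "M"), levels = c("M", "R")),
  M    = c(0.9, 0.2, 0.6, 0.7),
  R    = c(0.1, 0.8, 0.4, 0.3)
)
lev <- levels(toy$obs)

# Binary log-loss, treating the second level as the "positive" class
# (illustrative helper, not part of caret)
binary_logloss <- function(data, lev) {
  probs <- pmax(pmin(data[, lev[2]], 1 - 1e-15), 1e-15)  # bound away from 0 and 1
  real  <- as.numeric(data$obs == lev[2])                # 1 if obs is the 2nd class
  c(LogLoss = -mean(real * log(probs) + (1 - real) * log(1 - probs)))
}

res <- binary_logloss(toy, lev)
print(res)
```

The result is a named numeric vector, which is the shape train expects from a summary function; the name is what you would pass as metric.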
Alternatively, use the lev argument, which contains the class levels, and add some error checking:
LogLoss <- function (data, lev = NULL, model = NULL){
  if (length(lev) > 2) {
    stop(paste("Your outcome has", length(lev), "levels. The LogLoss() function isn't appropriate."))
  }
  obs <- data[, "obs"] # truth
  probs <- data[, lev[2]] # use second class name
  probs <- pmax(pmin(as.numeric(probs), 1 - 1e-15), 1e-15) # bound probability
  logPreds <- log(probs)
  log1Preds <- log(1 - probs)
  real <- (as.numeric(data$obs) - 1)
  out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
  names(out) <- c("LogLoss")
  out
}
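If the outcome has more than two levels, the function above stops with an error. One possible generalisation, sketched here in base R under the assumption that data has one probability column per entry of lev, is to pick the predicted probability of the observed class in each row. The name multi_logloss and the toy data are hypothetical; caret also ships its own mnLogLoss summary function, which may be the simpler choice.

```r
# Sketch of a multiclass log-loss with the same (data, lev, model) signature
# as a caret summary function. Illustrative only, not caret's implementation.
multi_logloss <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  probs <- as.matrix(data[, lev, drop = FALSE])           # one column per class
  # (row, column) index of the observed class for every row
  idx <- cbind(seq_len(nrow(probs)), match(as.character(data$obs), lev))
  p_true <- pmax(pmin(probs[idx], 1 - 1e-15), 1e-15)      # bound away from 0 and 1
  c(LogLoss = -mean(log(p_true)))
}

# Hypothetical three-class toy data
toy3 <- data.frame(
  obs = factor(c("a", "b", "c"), levels = c("a", "b", "c")),
  a   = c(0.7, 0.2, 0.1),
  b   = c(0.2, 0.5, 0.3),
  c   = c(0.1, 0.3, 0.6)
)

res3 <- multi_logloss(toy3, lev = levels(toy3$obs))
print(res3)
```

As with the binary version, you would pass this via summaryFunction in trainControl with classProbs = TRUE, and set metric = "LogLoss" and maximize = FALSE in train.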
Check this section of the caret book for more information: https://topepo.github.io/caret/model-training-and-tuning.html#metrics
If you plan to use caret, and even if you don't, the book is a good read.