


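  1. Get the precision-recall thresholds
  2. For each threshold, binarize the continuous y_scores
  3. Compute the accuracy for each from the contingency table (confusion matrix)
  4. Return the average accuracy over the thresholds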

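    import numpy as np
    from sklearn.metrics import confusion_matrix, precision_recall_curve
    from sklearn.preprocessing import binarize

    # precision_recall_curve returns (precision, recall, thresholds), in that order
    precision, recall, thresholds = precision_recall_curve(np.array(np_y_true), np.array(np_y_scores))
    accuracy = 0
    for threshold in thresholds:
        # binarize expects a 2D array: reshape the scores to one row, then take that row back out
        contingency_table = confusion_matrix(np_y_true, binarize(np.array(np_y_scores).reshape(1, -1), threshold=threshold)[0])
        accuracy += (float(contingency_table[0][0]) + float(contingency_table[1][1]))/float(np.sum(contingency_table))
    print("Classification accuracy is: {}".format(accuracy/len(thresholds)))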

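You are heading in the right direction. The confusion matrix is definitely the right starting point for computing a classifier's accuracy. It seems to me that what you are really after is the receiver operating characteristic.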

In statistics, a receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
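If scikit-learn is already in use, a minimal sketch of that threshold sweep could rely on `roc_curve` and `roc_auc_score` from `sklearn.metrics`. The labels and scores below are made-up toy data, not taken from the question.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy data (assumed for illustration): 0 = negative class, 1 = positive class
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Area under the ROC curve as a single summary number
auc_value = roc_auc_score(y_true, y_scores)
print("AUC: {:.2f}".format(auc_value))  # AUC: 0.75
```

Unlike an accuracy averaged over thresholds, the area under this curve summarizes the whole sweep without depending on how many thresholds happen to exist.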


https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it

http://mlwiki.org/index.php/ROC_Analysis
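One common way to read the AUC is as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. A rough pure-Python sketch of that reading follows; the function name `pairwise_auc` and the toy data are hypothetical, and the double loop is quadratic, so it is only meant for small inputs.

```python
def pairwise_auc(y_true, y_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, y_scores) if t == 1]
    neg = [s for t, s in zip(y_true, y_scores) if t != 1]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative label.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration): 3 of the 4 pairs are ranked correctly
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```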


    import numpy as np

    def auc(y_true, y_val, plot=False):
        # check input
        if len(y_true) != len(y_val):
            raise ValueError('Label vector (y_true) and corresponding value vector (y_val) must have the same length.\n')
        y_true = np.asarray(y_true)
        y_val = np.asarray(y_val)
        # empty lists for the true positive and false positive counts
        tp = []
        fp = []
        # count 1's and -1's in y_true
        cond_positive = np.count_nonzero(y_true == 1)
        cond_negative = np.count_nonzero(y_true == -1)
        # all possibly relevant bias parameters, sorted from highest to lowest
        bias_set = sorted(set(y_val.tolist()), key=float, reverse=True)

        # initialize y_pred array full of negative predictions (-1)
        y_pred = np.ones(len(y_true))*(-1)

        # the computation time is mainly influenced by this for loop
        # for a contamination rate of 1% it already takes ~8s to terminate
        for bias in bias_set:
            # "lower values tend to correspond to label -1"
            # indices of values which exceed the bias
            posIdx = np.where(y_val > bias)
            # set predicted values to 1
            y_pred[posIdx] = 1
            # y_true + 2*y_pred gives a distinct value per outcome:
            # true positive = 1 + 2 = 3, false positive = -1 + 2 = 1
            results = y_true + 2*y_pred
            # append the amount of tp's and fp's
            # (assumed from the encoding above: 3 marks a tp, 1 marks an fp)
            tp.append(np.count_nonzero(results == 3))
            fp.append(np.count_nonzero(results == 1))

        # calculate true positive and false positive rates
        tpr = np.asarray(tp, dtype=float)/cond_positive
        fpr = np.asarray(fp, dtype=float)/cond_negative
        # optional scatterplot
        if plot:
            # assumed plotting code (matplotlib): FPR on the x-axis, TPR on the y-axis
            import matplotlib.pyplot as plt
            plt.scatter(fpr, tpr)
            plt.show()
        # calculate AUC with the trapezoidal rule
        AUC = np.trapz(tpr, fpr)

        return AUC

