What is the use of base_score in xgboost multiclass working?


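I am trying to explore how xgboost works for binary classification as well as for multiclass. In the binary case, I observed that base_score is treated as the starting probability, and that it also has a major impact on the computed Gain and Cover.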

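In the multiclass case, I can't figure out the significance of the base_score parameter, because I get the same Gain and Cover values for different (any) values of base_score.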

I also can't find out why there is a factor of 2 when computing Cover for multiclass, i.e. 2*p*(1-p).


To answer your question, let's look at what multiclass classification actually does in xgboost with the multi:softmax objective and, say, 6 classes.

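Say you want to train a classifier specifying num_boost_round=5. How many trees would you expect xgboost to train for you? The correct answer is 30 trees. The reason is that softmax expects num_classes=6 different scores for each training row, so that xgboost can compute gradients/hessians w.r.t. each of these 6 scores and use them to build a new tree for each score (effectively updating 6 parallel models in order to output 6 updated scores per sample).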
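As a rough illustration of the paragraph above, here is a minimal pure-Python sketch of the per-class gradient and hessian a softmax objective yields for a single training row. The helper names are hypothetical, and the hessian form 2*p*(1-p) is the one quoted in the question; to my understanding it matches what xgboost's softmax objective uses, but treat that as an assumption rather than a statement about the library internals.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_grad_hess(scores, label):
    """Per-class gradient and hessian for one row (hypothetical helper).

    grad_k = p_k - 1[label == k]
    hess_k = 2 * p_k * (1 - p_k)   # assumed form, as quoted in the question
    """
    probs = softmax(scores)
    grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    hess = [2.0 * p * (1.0 - p) for p in probs]
    return grads, hess

num_class = 6
num_boost_round = 5
# One tree per class per boosting round.
total_trees = num_class * num_boost_round

# Every class starts from the same score, so all probabilities are 1/6.
grads, hess = softmax_grad_hess([0.5] * num_class, label=2)
print(total_trees)
print(grads)
print(hess)
```

Each of the 6 gradient/hessian pairs feeds its own tree, which is why one boosting round adds 6 trees.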

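To ask the xgboost classifier to output the final 6 values for each sample, e.g. from the test set, you need to call bst.predict(xg_test, output_margin=True) (where bst is your classifier and xg_test is e.g. the test set). The output of a regular bst.predict(xg_test) is effectively the same as picking the class with the highest of the 6 values in bst.predict(xg_test, output_margin=True).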
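The relation between the two predict calls can be sketched without xgboost at all. The margin rows below are copied from the example output for case #1 further down (3 classes there, not 6); predict_class is a hypothetical helper, not part of the xgboost API.

```python
def predict_class(margins):
    """Return the index of the largest margin (the first index wins ties)."""
    return max(range(len(margins)), key=lambda k: margins[k])

# Rows as printed by bst.predict(xg_test, output_margin=True) in case #1.
margin_rows = [
    [0.50240415, 0.5003637, 0.49870378],
    [0.49863306, 0.5003637, 0.49870378],
]

predicted = [predict_class(row) for row in margin_rows]
print(predicted)
```

The resulting class indices agree with the `[0. 1.]` line printed for the same case.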


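Now to the question of what base_score does in the multi:softmax case. The answer: it is added as the starting score for each of the 6 classes' scores, before any trees are added. So if you, e.g., apply base_score=42. you will observe that all values in bst.predict(xg_test, output_margin=True) also increase by 42. At the same time, for softmax, increasing the scores of all classes by an equal amount doesn't change anything, so in the case of multi:softmax applying a base_score different from 0 has no visible effect.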
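A small pure-Python check of the shift-invariance argument follows. It also sketches why an extremely large base_score can make all margins look identical: the example assumes margins are stored as 32-bit floats (an assumption about xgboost's internals, not something stated here), and to_float32 is a hypothetical helper that emulates that rounding.

```python
import math
import struct

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def to_float32(x):
    """Round a Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Small per-class contributions, similar in size to what a few shallow trees add.
tree_scores = [0.0024, 0.0004, -0.0013]

# Adding the same constant to every class leaves the probabilities unchanged.
shifted = [s + 42.0 for s in tree_scores]
same_probs = all(
    abs(a - b) < 1e-9 for a, b in zip(softmax(tree_scores), softmax(shifted))
)

# With a huge constant, float32 can no longer represent the small differences.
huge = [to_float32(5.8e10 + s) for s in tree_scores]
all_equal = len(set(huge)) == 1

print(same_probs, all_equal)
```

When every margin collapses to the same value, an argmax that breaks ties by taking the first index returns class 0, which is consistent with what case #3 of the example shows.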

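Compare this behavior to binary classification. While it is almost the same as multi:softmax with 2 classes, the big difference is that xgboost only tries to produce 1 score, for class 1, leaving the score for class 0 equal to 0.0. Because of that, when you use base_score in binary classification it is only added to the score of class 1, thus increasing the starting prediction probability for class 1. In theory, with multiple classes it would be meaningful to pass e.g. multiple base scores (one per class), which you can't do using base_score. Instead, you can use the set_base_margin functionality applied to the training set, but it does not work very conveniently with the default predict, so afterwards you will always need to use it with output_margin=True and add the same values as the ones you used in set_base_margin for your training data (if you want to use set_base_margin in the multiclass case, you need to flatten the margin values, as the sbm helper in the code below does).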
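To make the binary comparison concrete, here is a minimal sketch showing that a two-class softmax with the class 0 score pinned at 0.0 is just a sigmoid of the class 1 score, so shifting only that one score does change the probability. It also shows the row-major flattening of per-class base margins. All function and variable names are hypothetical.

```python
import math

def sigmoid(margin):
    """Logistic function applied to the class 1 margin."""
    return 1.0 / (1.0 + math.exp(-margin))

def softmax_class1(margin):
    """P(class 1) from a softmax over the two scores [0.0, margin]."""
    e0, e1 = math.exp(0.0), math.exp(margin)
    return e1 / (e0 + e1)

# Shifting the single class 1 score changes the starting probability.
p_start_zero = sigmoid(0.0)
p_start_two = sigmoid(2.0)
matches = all(
    abs(sigmoid(m) - softmax_class1(m)) < 1e-12 for m in (-1.0, 0.0, 2.0)
)

# Per-class starting margins, flattened row by row:
# [row0_c0, row0_c1, row0_c2, row1_c0, ...]
per_class_margins = [-0.4, 0.0, 0.8]
num_rows = 4
flat_margins = per_class_margins * num_rows  # list repetition, not scaling

print(p_start_zero, p_start_two, matches, len(flat_margins))
```

In the multiclass case a uniform shift cancels out, whereas unequal per-class margins like the ones above do move the starting probabilities.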


import numpy as np
import xgboost as xgb
TRAIN = 1000
TEST = 2
F = 10

def gen_data(M):
    np_train_features = np.random.rand(M, F)
    np_train_labels = np.random.binomial(2, np_train_features[:,0])
    return xgb.DMatrix(np_train_features, label=np_train_labels)

def regenerate_data():
    return gen_data(TRAIN), gen_data(TEST)

param = {}
param['objective'] = 'multi:softmax'
param['eta'] = 0.001
param['max_depth'] = 1
param['nthread'] = 4
param['num_class'] = 3

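def sbm(xg_data, original_scores):
    # Repeat the per-class start scores once per row (list repetition),
    # then pass them to set_base_margin as a single flattened column.
    xg_data.set_base_margin(np.array(original_scores * xg_data.num_row()).reshape(-1, 1))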

num_round = 3

print("#1. No base_score, no set_base_margin")
xg_train, xg_test = regenerate_data()
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))  # raw per-class scores
print(bst.predict(xg_test))  # predicted class labels
print("Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.")
bst1 = bst

print("#2. Use base_score")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.")
bst2 = bst

print("#3. Use very large base_score and screw up numeric precision")
xg_train, xg_test = regenerate_data()
param['base_score'] = 5.8e10
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.")
print("But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).")
xg_train, xg_test = regenerate_data() # if we don't regenerate the dataframe here xgboost seems to be either caching it or somehow else remembering that it didn't have base_margins and result will be different.
sbm(xg_test, [0.1, 0.1, 0.1])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
bst3 = bst

print("#4. Use set_base_margin for training")
xg_train, xg_test = regenerate_data()
# only used in train/test whenever set_base_margin is not applied.
# Peculiar that trained model will remember this value even if it was trained with
# dataset which had set_base_margin. In that case this base_score will be used if
# and only if test set passed to `bst.predict` didn't have `set_base_margin` applied to it.
param['base_score'] = 4.2
sbm(xg_train, [-0.4, 0., 0.8])
bst = xgb.train(param, xg_train, num_round)
sbm(xg_test, [-0.4, 0., 0.8])
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.")
print("If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.")
xg_train, xg_test = regenerate_data() # regenerate test and don't set the base margin values
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
bst4 = bst

print("Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.")
# Show one leaf row (index 1) from the first tree of each model for comparison.
for b in (bst1, bst2, bst3, bst4):
    print(b.trees_to_dataframe().iloc[1])
    print()

Output:


#1. No base_score, no set_base_margin
[[0.50240415 0.5003637  0.49870378]
 [0.49863306 0.5003637  0.49870378]]
[0. 1.]
Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizzare reason, but it doesn't really affect anything, so no one cares.

#2. Use base_score
[[5.8024044 5.800364  5.798704 ]
 [5.798633  5.800364  5.798704 ]]
[0. 1.]
In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.

#3. Use very large base_score and screw up numeric precision
[[5.8e+10 5.8e+10 5.8e+10]
 [5.8e+10 5.8e+10 5.8e+10]]
[0. 0.]
In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.
But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).
[[0.10240632 0.10036398 0.09870315]
 [0.09863247 0.10036398 0.09870315]]
[0. 1.]

#4. Use set_base_margin for training
[[-0.39458954  0.00102317  0.7973728 ]
 [-0.40044016  0.00102317  0.7973728 ]]
[2. 2.]
Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.
If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.
[[4.2054105 4.201023  4.1973724]
 [4.1995597 4.201023  4.1973724]]
[0. 1.]

Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.
Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                0
Node                1
ID                0-1
Feature          Leaf
Split             NaN
Yes               NaN
No                NaN
Missing           NaN
Gain       0.00180733
Cover         100.858
Name: 1, dtype: object
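
The Cover values in the tree dumps above can be reproduced by hand, which also touches on the factor of 2 from the question. The sketch below rests on assumptions that are not stated in the text: that Cover is the sum of per-row hessians 2*p*(1-p) for the class the tree belongs to, that Tree 0 is the first tree for class 0, and that the leaf shown holds 354 training rows (a count inferred from the printed numbers). The helper names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def leaf_cover(start_scores, class_index, rows_in_leaf):
    """Sum of assumed hessians 2*p*(1-p) over rows sharing the same start scores."""
    p = softmax(start_scores)[class_index]
    return rows_in_leaf * 2.0 * p * (1.0 - p)

ROWS_IN_LEAF = 354  # inferred, not given in the source

# Equal starting scores: the value of base_score cancels out in softmax.
cover_default = leaf_cover([0.5, 0.5, 0.5], 0, ROWS_IN_LEAF)
cover_base_score = leaf_cover([5.8, 5.8, 5.8], 0, ROWS_IN_LEAF)

# Unequal per-class base margins change the starting probabilities.
cover_base_margin = leaf_cover([-0.4, 0.0, 0.8], 0, ROWS_IN_LEAF)

print(cover_default, cover_base_score, cover_base_margin)
```

Under these assumptions the first two values agree with each other and land close to the printed 157.333, while the third lands close to the 100.858 shown for bst4. That fits the explanation that a uniform base_score cannot affect Gain or Cover in the multiclass case, whereas per-class base margins can.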

