Using the 'is_unbalance' parameter in LightGBM

2024-02-03

I am trying to use the 'is_unbalance' parameter when training a model for a binary classification problem where the positive class is approximately 3% of the data. If I set 'is_unbalance', I observe that the binary log loss drops in the first iteration but then keeps increasing. I only see this behavior when the 'is_unbalance' parameter is enabled; otherwise there is a steady drop in log_loss. I'd appreciate any help with this. Thanks.
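A minimal sketch of the kind of setup being described (the data split, learning_rate, and num_boost_round below are placeholder assumptions, not details taken from the question):

import lightgbm as lgb
from sklearn.model_selection import train_test_split

# X, y are assumed to exist already: features plus 0/1 labels with ~3% positives.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, stratify=y, test_size=0.2)

params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "is_unbalance": True,      # the parameter in question
    "learning_rate": 0.1,
}

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

# With is_unbalance enabled, the validation binary_logloss can drop at first
# and then keep rising, which is the behaviour reported above.
booster = lgb.train(params, train_set, num_boost_round=100, valid_sets=[valid_set])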


When you do not balance the classes of such a skewed dataset, the objective value will of course keep dropping - it can easily reach the point where the model classifies everything as the majority class while still reporting an excellent objective value.
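To make that concrete, a quick back-of-the-envelope check (toy numbers assumed here, not the asker's data): a model that ignores the features and always predicts the 3% base rate already gets a log loss of roughly 0.13.

import numpy as np
from sklearn.metrics import log_loss

# Toy labels: 100,000 samples with ~3% positives, mirroring the ratio in the question.
y = np.zeros(100_000, dtype=int)
y[:3_000] = 1

# A "model" that ignores the features and predicts the base rate for every row.
p = np.full(y.shape, 0.03)

print(log_loss(y, p))  # roughly 0.135, which looks decent despite the model being useless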

Balancing the classes is necessary, but that does not mean you have to stop at is_unbalance - you can use scale_pos_weight, use a custom metric, or apply weights to the samples, like this:

import pandas as pd
import lightgbm as lgb
# Weight each row by (majority-class frequency / frequency of its own class).
WEIGHTS = y_train.value_counts(normalize=True).min() / y_train.value_counts(normalize=True)
TRAIN_WEIGHTS = y_train.map(WEIGHTS).values
train_data = lgb.Dataset(X_train, label=y_train, weight=TRAIN_WEIGHTS)
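If you would rather not build a weight vector by hand, a minimal sketch of the scale_pos_weight route (the values below are illustrative assumptions; note that scale_pos_weight should not be combined with is_unbalance):

import lightgbm as lgb

# Ratio of negatives to positives in the training labels (~32 for a 3% positive rate).
pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "scale_pos_weight": pos_weight,   # use either this or is_unbalance, not both
}
train_data = lgb.Dataset(X_train, label=y_train)
booster = lgb.train(params, train_data, num_boost_round=100)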

In addition, tuning the other hyperparameters should take care of the increasing log_loss.
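As a rough illustration, a hedged example of the regularisation-oriented parameters that are commonly adjusted in this situation (the specific values are assumptions, not something prescribed above):

params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "is_unbalance": True,
    "learning_rate": 0.05,      # smaller steps tend to stabilise the validation loss
    "num_leaves": 31,
    "min_data_in_leaf": 100,    # avoid leaves fitted to a handful of rare positives
    "feature_fraction": 0.8,    # subsample features per tree
    "bagging_fraction": 0.8,    # subsample rows per iteration
    "bagging_freq": 1,
}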
