This example isn't fully reproducible, because "Prali Marble.xlsx" isn't included.
However, I can reproduce an R2 of 0.0 with the following code, which I think closely matches your example. Like your code, it trains a LightGBM regression model on a dataset with a single feature.
This code uses lightgbm 3.1.1 on Python 3.8.
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import r2_score
# A single feature: 99 rows at 0.5 and one row at 1.0
X = pd.DataFrame({
    "feat1": np.append(np.repeat(0.5, 99), np.ones(1))
})
# The target is random noise, unrelated to feat1
Y = np.random.random(100)
lgb_r = lgb.LGBMRegressor()
lgb_r.fit(X, Y)
y_pred = lgb_r.predict(X)
print("LGBM R2_SCORE:", r2_score(Y, y_pred))
# LGBM R2_SCORE: 0.0
In this case, the R2 is 0 because the model is just predicting the mean of Y. You can see this by examining the structure of the model.
lgb_r.booster_.trees_to_dataframe()
This returns a one-row DataFrame, which is what happens when LightGBM doesn't add any trees.
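To see why a model that always predicts the mean of Y scores exactly 0, recall that R2 = 1 - SS_res / SS_tot. When every prediction equals the mean of Y, the residual sum of squares and the total sum of squares are the same sum, so the ratio is 1. A minimal NumPy sketch of that arithmetic follows; `r2_by_hand` is a hypothetical helper written for illustration, not part of scikit-learn or LightGBM, and it generates its own random Y.

```python
import numpy as np


def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
Y = rng.random(100)

# A "model" with no trees: every prediction is the mean of Y.
constant_pred = np.full_like(Y, Y.mean())
print(r2_by_hand(Y, constant_pred))  # prints 0.0
```

Any other constant prediction makes SS_res larger than SS_tot, so the mean is the best a tree-less model can do, and its R2 is 0.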
LightGBM has some parameters that guard against overfitting. Two of them are relevant here:
- min_data_in_leaf (default=20): https://lightgbm.readthedocs.io/en/latest/Parameters.html#min_data_in_leaf
- min_sum_hessian_in_leaf (default=0.001): https://lightgbm.readthedocs.io/en/latest/Parameters.html#min_sum_hessian_in_leaf
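Here is why no tree gets built with the defaults. `feat1` has only two distinct values (99 rows at 0.5 and 1 row at 1.0), so the only possible split puts 99 rows in one leaf and 1 row in the other. The sketch below is a simplified illustration, not LightGBM's actual split finder: it assumes the default L2 regression objective, where each unweighted row contributes a hessian of 1, so a leaf's hessian sum equals its row count. The function name `valid_splits` is hypothetical.

```python
import numpy as np


def valid_splits(feature, min_data_in_leaf=20, min_sum_hessian_in_leaf=1e-3):
    """Return thresholds whose two leaves both satisfy the leaf constraints.

    Simplification: candidate thresholds sit halfway between consecutive
    distinct values, and every row has hessian 1 (unweighted L2 objective).
    """
    feature = np.asarray(feature, dtype=float)
    values = np.unique(feature)                  # sorted distinct values
    thresholds = (values[:-1] + values[1:]) / 2  # midpoints between them
    ok = []
    for t in thresholds:
        n_left = int(np.sum(feature <= t))
        n_right = feature.size - n_left
        smaller = min(n_left, n_right)
        # With hessian 1 per row, a leaf's hessian sum equals its row count.
        if smaller >= min_data_in_leaf and smaller >= min_sum_hessian_in_leaf:
            ok.append(float(t))
    return ok


feat1 = np.append(np.repeat(0.5, 99), np.ones(1))
print(valid_splits(feat1))  # defaults: [] (no valid split)
print(valid_splits(feat1, min_data_in_leaf=0, min_sum_hessian_in_leaf=0.0))  # [0.75]
```

Under this simplification, the default min_sum_hessian_in_leaf=0.001 is already met by a single row, so min_data_in_leaf is the constraint that blocks the split; LightGBM itself may apply further checks not modelled here.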
You can tell LightGBM to ignore these overfitting protections by setting both parameters to 0.
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import r2_score
X = pd.DataFrame({
    "feat1": np.append(np.repeat(0.5, 99), np.ones(1))
})
Y = np.random.random(100)
# Turn off both leaf-size protections so a split that isolates one row is allowed
lgb_r = lgb.LGBMRegressor(
    min_data_in_leaf=0,
    min_sum_hessian_in_leaf=0.0
)
lgb_r.fit(X, Y)
y_pred = lgb_r.predict(X)
print("LGBM R2_SCORE:", r2_score(Y, y_pred))
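Even with the protections turned off, the in-sample R2 here is capped. Because `feat1` takes only two values, any model that uses it alone can output at most two distinct predictions, and under squared error the best choice is the mean of Y within each group. The NumPy sketch below computes that best-case in-sample R2; `best_two_group_r2` is a hypothetical name, and the sketch draws its own random Y rather than reusing the one above.

```python
import numpy as np


def best_two_group_r2(feature, y):
    """Best in-sample R2 for any model whose prediction depends only on `feature`.

    For squared error, the optimal prediction for each distinct feature value
    is the mean of y over the rows sharing that value.
    """
    feature = np.asarray(feature, dtype=float)
    y = np.asarray(y, dtype=float)
    pred = np.empty_like(y)
    for v in np.unique(feature):
        mask = feature == v
        pred[mask] = y[mask].mean()  # group mean minimises the squared error
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
feat1 = np.append(np.repeat(0.5, 99), np.ones(1))
Y = rng.random(100)
print(best_two_group_r2(feat1, Y))
```

With Y generated independently of `feat1`, most of this bound comes from fitting the single row at 1.0 exactly, so it is likely to be small.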