Replace missing column values with incrementing values in a pandas DataFrame

2024-04-20

Input DataFrame:

import pandas as pd

max_value = 16
x_max = max_value
data = {
    's_id': ['G1', '', '', '', 'G2', 'G3', 'G3', 'G4', '', '', '']
}
df2 = pd.DataFrame.from_dict(data)
df2
Out[365]: 
   s_id
0    G1
1      
2      
3      
4    G2
5    G3
6    G3
7    G4
8      
9      
10     

Expected output DataFrame:

data = {
    's_id': ['G1', 'G17', 'G18', 'G19', 'G2', 'G3', 'G3', 'G4', 'G20', 'G21', 'G22']
}
df3 = pd.DataFrame.from_dict(data)
df3

Out[366]: 
   s_id
0    G1
1   G17
2   G18
3   G19
4    G2
5    G3
6    G3
7    G4
8   G20
9   G21
10  G22

I tried the following:

df2['s_id'] = df2['s_id'].mask(df2['s_id'].eq(''))

s = df2[df2['s_id'].isna()].drop_duplicates()

TypeError: unhashable type: 'list'

d = {v: f'G{k}' for k, v in enumerate(s, x_max + 1)}
print (d)

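How can I produce the output DataFrame, where each empty s_id is replaced with a value incremented from the external variable max_value? Every empty entry in the s_id column should receive the next incremented value. For example, the s_id that follows G1 must be G17, i.e. max_value + 1.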


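The idea is to build a list with range that has the same length as the number of empty values, then assign those values to the column through a boolean mask with DataFrame.loc http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html: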

df2 = pd.DataFrame.from_dict(data)

m = df2['s_id'].eq('')
v = [f'G{x}' for x in range(x_max+1, x_max + m.sum()+1)]
print (v)
['G17', 'G18', 'G19', 'G20', 'G21', 'G22']

df2.loc[m, 's_id'] = v
print (df2)
   s_id
0    G1
1   G17
2   G18
3   G19
4    G2
5    G3
6    G3
7    G4
8   G20
9   G21
10  G22
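
The mask-and-range approach above can be wrapped in a small reusable helper. This is only a sketch: the function name fill_missing_ids and its prefix parameter are hypothetical names introduced here, and treating NaN as missing in addition to the empty string is an assumption that goes beyond the original question.

```python
import pandas as pd


def fill_missing_ids(series, start, prefix="G"):
    """Return a copy where empty-string or NaN entries get prefix + counter.

    Hypothetical helper: counter values start at `start` and increase by 1
    for each missing entry, in row order.
    """
    out = series.copy()
    # Assumption: both '' and NaN count as "missing".
    missing = out.isna() | out.eq("")
    n_missing = int(missing.sum())
    if n_missing:
        out.loc[missing] = [f"{prefix}{i}" for i in range(start, start + n_missing)]
    return out


x_max = 16
df2 = pd.DataFrame({"s_id": ["G1", "", "", "", "G2", "G3", "G3", "G4", "", "", ""]})
df2["s_id"] = fill_missing_ids(df2["s_id"], x_max + 1)
print(df2["s_id"].tolist())
```

Because the replacement list is built once from the count of missing rows, the assignment is a single vectorized .loc write rather than a per-row Python call.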

A solution from @Jon Clements, thanks:

import itertools

df2['s_id'] = df2['s_id'].apply(lambda v, c=itertools.count(x_max + 1): v or f'G{next(c)}')
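
The one-liner relies on two details: the default argument c is evaluated once, so every call shares the same itertools.count iterator, and v or ... only advances the counter when v is falsy (an empty string). A more explicit sketch of the same idea follows; the names counter and fill are illustrative, not part of the original answer. Note that NaN is truthy, so this variant handles empty strings only.

```python
import itertools

import pandas as pd

x_max = 16
df2 = pd.DataFrame({"s_id": ["G1", "", "", "", "G2", "G3", "G3", "G4", "", "", ""]})

# One shared counter: yields 17, 18, 19, ... each time next() is called.
counter = itertools.count(x_max + 1)


def fill(value):
    # An empty string is falsy, so only blanks consume the next counter value.
    return value if value else f"G{next(counter)}"


# Series.map calls fill on each element in row order.
df2["s_id"] = df2["s_id"].map(fill)
print(df2["s_id"].tolist())
```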