I have a data frame containing text data, as shown below,
  name  | address          | number
1 Bob     bob                No.56
2         @gmail.com
3 Carly   [email protected]  No.90
4 Gorge   greg@yahoo
5         .com
6                            No.100
and I want to turn it into this data frame.
  name  | address          | number
1 Bob     bob@gmail.com      No.56
2 Carly   [email protected]  No.90
3 Gorge   greg@yahoo.com     No.100
I am reading the file with pandas, but I am not sure how to use merge or concat for this.
If the name column consists of unique values,
print(df)
    name   address            number
0   Bob    bob                No.56
1   NaN    @gmail.com         NaN
2   Carly  [email protected]   No.90
3   Gorge  greg@yahoo         NaN
4   NaN    .com               NaN
5   NaN    NaN                No.100
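# Forward-fill name so each continuation row belongs to the named row above it
df['name'] = df['name'].ffill()
# Replace NaN with '' and sum per name; summing strings concatenates the fragments
print(df.fillna('').groupby(['name'], as_index=False).sum())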
    name   address            number
0   Bob    bob@gmail.com      No.56
1   Carly  [email protected]   No.90
2   Gorge  greg@yahoo.com     No.100
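For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt by hand from the question, and Carly's address is a placeholder (`carly@example.com`) because the real value is not shown. It also swaps `.sum()` for `.agg("".join)`, which should give the same concatenation here while not depending on how a given pandas version sums string columns.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Carly's address is a placeholder.
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Gorge", np.nan, np.nan],
    "address": ["bob", "@gmail.com", "carly@example.com",
                "greg@yahoo", ".com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})

# Forward-fill name so continuation rows inherit the name above them.
df["name"] = df["name"].ffill()

# Replace NaN with empty strings, then join the fragments of each column per name.
merged = df.fillna("").groupby("name", as_index=False).agg("".join)
print(merged)
```

Note that this relies on the rows of one record being in the right order, since the fragments are joined in the order they appear.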
To extend the code above to more complex data, you may need ffill(), bfill(), [::-1], .groupby('name').apply(lambda x: ' '.join(x['address'])), strip(), lstrip(), rstrip(), or replace().
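As a rough illustration of how a couple of these pieces might combine, the sketch below assumes a hypothetical messier input in which the address fragments carry stray whitespace. It strips each fragment and then joins per name with `groupby(...).apply`. The join uses an empty separator rather than `' '`, since the fragments here are parts of one email address.

```python
import numpy as np
import pandas as pd

# Hypothetical messy input: address fragments with leading/trailing spaces.
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Gorge", np.nan],
    "address": [" bob ", "@gmail.com ", "greg@yahoo", " .com"],
})

# Continuation rows inherit the name from the row above.
df["name"] = df["name"].ffill()

# Remove surrounding whitespace from every fragment before joining.
df["address"] = df["address"].str.strip()

# Join the cleaned fragments of each name into a single string.
address = df.groupby("name")["address"].apply(lambda s: "".join(s))
print(address)
```

If the name sat on the last row of each record instead of the first, `bfill()` would be the natural replacement for `ffill()` in the same place.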