Most efficient way to split a pandas DataFrame into rows that contain NaN and rows that do not.
Input:
ID Gender Dependants Income Education Married
1 Male 2 500 Graduate Yes
2 NaN 4 2500 Graduate No
3 Female 3 NaN NaN Yes
4 Male NaN 7000 Graduate Yes
5 Female 4 500 Graduate NaN
6 Female 2 4500 Graduate Yes
Expected output for rows without NaN:
ID Gender Dependants Income Education Married
1 Male 2 500 Graduate Yes
6 Female 2 4500 Graduate Yes
Expected output for rows with NaN:
ID Gender Dependants Income Education Married
2 NaN 4 2500 Graduate No
3 Female 3 NaN NaN Yes
4 Male NaN 7000 Graduate Yes
5 Female 4 500 Graduate NaN
Use boolean indexing (http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing): isnull flags the missing values, and any (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.any.html) with axis=1 checks whether each row contains at least one True:
mask = df.isnull().any(axis=1)
df1 = df[~mask]
df2 = df[mask]
print (df1)
ID Gender Dependants Income Education Married
0 1 Male 2.0 500.0 Graduate Yes
5 6 Female 2.0 4500.0 Graduate Yes
print (df2)
ID Gender Dependants Income Education Married
1 2 NaN 4.0 2500.0 Graduate No
2 3 Female 3.0 NaN NaN Yes
3 4 Male NaN 7000.0 Graduate Yes
4 5 Female 4.0 500.0 Graduate NaN
Details:
print (df.isnull())
ID Gender Dependants Income Education Married
0 False False False False False False
1 False True False False False False
2 False False False True True False
3 False False True False False False
4 False False False False False True
5 False False False False False False
print (mask)
0 False
1 True
2 True
3 True
4 True
5 False
dtype: bool
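For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the names df_complete and df_missing are only illustrative (they correspond to df1 and df2 above).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; np.nan marks the missing cells.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Gender": ["Male", np.nan, "Female", "Male", "Female", "Female"],
    "Dependants": [2, 4, 3, np.nan, 4, 2],
    "Income": [500, 2500, np.nan, 7000, 500, 4500],
    "Education": ["Graduate", "Graduate", np.nan, "Graduate", "Graduate", "Graduate"],
    "Married": ["Yes", "No", "Yes", "Yes", np.nan, "Yes"],
})

# True for every row that has at least one missing value.
mask = df.isnull().any(axis=1)

df_complete = df[~mask]  # rows with no NaN (df1 in the answer)
df_missing = df[mask]    # rows with at least one NaN (df2 in the answer)

print(df_complete)
print(df_missing)
```

Dependants and Income come out as floats (2.0, 500.0) because a numeric column that holds NaN is stored as float64, which matches the printed output above.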
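You can also write this in a more readable way that avoids inverting the mask. Note that with notna the reduction must be all rather than any, because a row is complete only when every value in it is present:
mask = df.notna().all(axis=1)
df1 = df[mask]
This gives exactly the same df1.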
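As a quick sanity check, the sketch below compares the inverted isnull mask with a mask built from notna().all(axis=1) on a small made-up frame (the columns a and b are hypothetical, not from the question). It also compares against DataFrame.dropna(), which is a different call than the one used above but should select the same complete rows with its default arguments.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame: only the first row has no missing values.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Inverted mask: rows where no value is missing.
inverted = ~df.isnull().any(axis=1)

# Direct mask: rows where every value is present.
direct = df.notna().all(axis=1)

# Both masks should mark the same rows.
print(inverted.equals(direct))

complete_rows = df[direct]

# dropna() with default arguments drops any row containing a missing value.
print(complete_rows.equals(df.dropna()))
```

Using notna().any(axis=1) here instead would keep every row that has at least one present value, which is not the same thing.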