I want to drop the rows whose occupation has fewer than 2 unique names:
name value occupation
a 23 mechanic
a 24 mechanic
b 30 mechanic
c 40 mechanic
c 41 mechanic
d 30 doctor
d 20 doctor
e 70 plumber
e 71 plumber
f 30 plumber
g 50 tailor
I did:
df.groupby('occupation')['name'].nunique()
>>>>>>
occupation
mechanic 3
doctor 1
plumber 2
tailor 1
Name: name, dtype: int64
Is it possible to use something like df = df.drop(df[<some boolean condition>].index)?
Expected output:
name value occupation
a 23 mechanic
a 24 mechanic
b 30 mechanic
c 40 mechanic
c 41 mechanic
e 70 plumber
e 71 plumber
f 30 plumber
Use GroupBy.transform with Series.ge to keep the rows whose per-occupation count of unique names is greater than or equal to 2:
df = df[df.groupby('occupation')['name'].transform('nunique').ge(2)]
print (df)
name value occupation
0 a 23 mechanic
1 a 24 mechanic
2 b 30 mechanic
3 c 40 mechanic
4 c 41 mechanic
7 e 70 plumber
8 e 71 plumber
9 f 30 plumber
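For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the names counts and kept are only illustrative. It shows why transform fits here: it returns one value per original row, so the result can be used directly as a boolean mask.

```python
import pandas as pd

# Sample data rebuilt by hand from the question
df = pd.DataFrame({
    "name": list("aabccddeefg"),
    "value": [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    "occupation": ["mechanic"] * 5 + ["doctor"] * 2 + ["plumber"] * 3 + ["tailor"],
})

# For every row, the number of distinct names within that row's occupation.
# transform keeps the original index, so the Series aligns with df.
counts = df.groupby("occupation")["name"].transform("nunique")

# Keep only rows whose occupation has at least 2 distinct names
kept = df[counts.ge(2)]
print(kept)
```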
Your approach also works if you filter the index of the resulting Series and pass it to Series.isin:
s = df.groupby('occupation')['name'].nunique()
df = df[df['occupation'].isin(s[s.ge(2)].index)]
print (df)
name value occupation
0 a 23 mechanic
1 a 24 mechanic
2 b 30 mechanic
3 c 40 mechanic
4 c 41 mechanic
7 e 70 plumber
8 e 71 plumber
9 f 30 plumber
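As for the df.drop(df[<some boolean condition>].index) form asked about in the question, it should work as well, as long as the index labels are unique (drop removes by label, so duplicate labels would remove more rows than intended). Below is a rough sketch of that form next to a GroupBy.filter variant. The sample data is rebuilt by hand, and the names too_few, dropped and filtered are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question
df = pd.DataFrame({
    "name": list("aabccddeefg"),
    "value": [23, 24, 30, 40, 41, 30, 20, 70, 71, 30, 50],
    "occupation": ["mechanic"] * 5 + ["doctor"] * 2 + ["plumber"] * 3 + ["tailor"],
})

# Boolean condition: True for rows whose occupation has fewer than 2 unique names
too_few = df.groupby("occupation")["name"].transform("nunique").lt(2)

# The drop-based form from the question (assumes a unique index)
dropped = df.drop(df[too_few].index)

# Alternative: let GroupBy.filter keep whole groups that satisfy the predicate
filtered = df.groupby("occupation").filter(lambda g: g["name"].nunique() >= 2)

print(dropped)
```

On large frames the transform-based mask is usually the faster choice, since filter calls a Python function once per group.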