Given a DataFrame like this:
city district date price
0 bj cy 2019-03-01 NaN
1 bj cy 2019-04-01 6.0
2 sh hp 2019-03-01 4.0
3 sh hp 2019-04-01 3.0
4 bj hd 2019-03-01 7.0
5 bj hd 2019-04-01 NaN
I need to filter out the groups of rows (grouped by city and district) in which any row meets both of the following conditions: date is 2019-04-01 and price is NaN.
I have tested with the following code:
df['date'] = pd.to_datetime(df['date']).dt.date.astype(str)
df.groupby(['city','district']).filter(lambda x: (x['price'].isnull() & x['date'].isin(['2019-04-01'])).any())
Out:
city district date price
4 bj hd 2019-03-01 7.0
5 bj hd 2019-04-01 NaN
Another test:
df.groupby(['city','district']).filter(lambda x: (x['price'].isnull() & x['date']).any())
Out:
city district date price
0 bj cy 2019-03-01 NaN
1 bj cy 2019-04-01 6.0
4 bj hd 2019-03-01 7.0
5 bj hd 2019-04-01 NaN
But what I need is the following. How can I modify the code above? Thanks.
city district date price
0 bj cy 2019/3/1 NaN
1 bj cy 2019/4/1 6.0
2 sh hp 2019/3/1 4.0
3 sh hp 2019/4/1 3.0
I think you need to invert the mask: change & to |, isnull to notna, eq to ne, and any to all:
df['date'] = pd.to_datetime(df['date'])
f = lambda x: (x['price'].notna() | x['date'].ne('2019-04-01')).all()
df = df.groupby(['city','district']).filter(f)
print(df)
city district date price
0 bj cy 2019-03-01 NaN
1 bj cy 2019-04-01 6.0
2 sh hp 2019-03-01 4.0
3 sh hp 2019-04-01 3.0
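For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the sample in the question, and the helper name keep_group is only illustrative; comparing against an explicit pd.Timestamp is a choice made here so that it is clear the date column is datetime.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (typed in by hand).
df = pd.DataFrame({
    "city": ["bj", "bj", "sh", "sh", "bj", "bj"],
    "district": ["cy", "cy", "hp", "hp", "hd", "hd"],
    "date": ["2019-03-01", "2019-04-01"] * 3,
    "price": [np.nan, 6.0, 4.0, 3.0, 7.0, np.nan],
})
df["date"] = pd.to_datetime(df["date"])

target = pd.Timestamp("2019-04-01")

def keep_group(g):
    # A row is acceptable if it has a price OR is not dated 2019-04-01.
    # The group is kept only when every one of its rows is acceptable.
    return (g["price"].notna() | g["date"].ne(target)).all()

result = df.groupby(["city", "district"]).filter(keep_group)
print(result)
```

The (bj, hd) group is the only one containing a row that is both dated 2019-04-01 and missing a price, so it is the only group dropped.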
Alternatively, use not to invert the boolean result, turning True into False and False into True:
f = lambda x: not (x['price'].isnull() & x['date'].eq('2019-04-01')).any()
df = df.groupby(['city','district']).filter(f)
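The two predicates are equivalent by De Morgan's laws: "no row is both NaN and dated 2019-04-01" is the same as "every row has a price or a different date". The sketch below checks this on the sample data, and also shows one further variant that is not part of the answer above: building the row mask once and broadcasting it per group with transform('any'), which avoids calling a Python function for each group. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, typed in by hand.
df = pd.DataFrame({
    "city": ["bj", "bj", "sh", "sh", "bj", "bj"],
    "district": ["cy", "cy", "hp", "hp", "hd", "hd"],
    "date": pd.to_datetime(["2019-03-01", "2019-04-01"] * 3),
    "price": [np.nan, 6.0, 4.0, 3.0, 7.0, np.nan],
})
target = pd.Timestamp("2019-04-01")
keys = ["city", "district"]

# Form 1: every row passes the inverted condition.
via_all = df.groupby(keys).filter(
    lambda g: (g["price"].notna() | g["date"].ne(target)).all()
)

# Form 2: no row matches the original condition.
via_not_any = df.groupby(keys).filter(
    lambda g: not (g["price"].isna() & g["date"].eq(target)).any()
)

# Variant (an alternative, not from the answer): flag offending rows once,
# then mark every row of a group that contains at least one offender.
bad_row = df["price"].isna() & df["date"].eq(target)
bad_group = bad_row.groupby([df["city"], df["district"]]).transform("any")
via_transform = df[~bad_group]

print(via_all.equals(via_not_any), via_all.equals(via_transform))
```

On larger frames the transform variant tends to be faster than filter with a lambda, though that is worth measuring on your own data.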