我有一个按日期时间索引的数据框。我想根据行的索引与前一行的索引之间的差异来过滤行。
因此,如果我的标准是“删除比前一行晚一小时以上的所有行”,则应删除下面示例中的第二行:
2005-07-15 17:00:00
2005-07-17 18:00:00
在以下情况下,两行都保留:
2005-07-17 23:00:00
2005-07-18 00:00:00
看来你需要boolean indexing http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing with diff http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.diff.html求差异并与1 hour Timedelta
:
dates=['2005-07-15 17:00:00','2005-07-17 18:00:00', '2005-07-17 19:00:00',
'2005-07-17 23:00:00', '2005-07-18 00:00:00']
df = pd.DataFrame({'a':range(5)}, index=pd.to_datetime(dates))
print (df)
a
2005-07-15 17:00:00 0
2005-07-17 18:00:00 1
2005-07-17 19:00:00 2
2005-07-17 23:00:00 3
2005-07-18 00:00:00 4
diff = df.index.to_series().diff().fillna(0)
print (diff)
2005-07-15 17:00:00 0 days 00:00:00
2005-07-17 18:00:00 2 days 01:00:00
2005-07-17 19:00:00 0 days 01:00:00
2005-07-17 23:00:00 0 days 04:00:00
2005-07-18 00:00:00 0 days 01:00:00
dtype: timedelta64[ns]
mask = diff <= pd.Timedelta(1, unit='h')
print (mask)
2005-07-15 17:00:00 True
2005-07-17 18:00:00 False
2005-07-17 19:00:00 True
2005-07-17 23:00:00 False
2005-07-18 00:00:00 True
dtype: bool
df = df[mask]
print (df)
a
2005-07-15 17:00:00 0
2005-07-17 19:00:00 2
2005-07-18 00:00:00 4
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)