One way to do it:
import pandas as pd
df = pd.DataFrame([[1, 3, 10], [4, 10, 7], [11, 17, 6], [18, 26, 12],
                   [27, 30, 15], [31, 40, 6], [41, 42, 6]], columns=['start', 'end', 'height'])
Use cut to bin the heights into groups:
df['groups']=pd.cut(df.height,[-1,0,5,10,15,1000])
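Find the break points, i.e. the rows where the group differs from the previous row, and number the resulting runs: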
df['categories']=(df.groups!=df.groups.shift()).cumsum()
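Then df is (depending on the pandas version, the labels in categories may start at 1 rather than 0; only the fact that each consecutive run gets its own label matters):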
"""
   start  end  height    groups  categories
0      1    3      10   (5, 10]           0
1      4   10       7   (5, 10]           0
2     11   17       6   (5, 10]           0
3     18   26      12  (10, 15]           1
4     27   30      15  (10, 15]           1
5     31   40       6   (5, 10]           2
6     41   42       6   (5, 10]           2
"""
Define the aggregations of interest:
f = {'start':['first'],'end':['last'], 'groups':['first']}
and apply them with groupby.agg:
df.groupby('categories').agg(f)
"""
              groups  end start
               first last first
categories
0            (5, 10]   17     1
1           (10, 15]   30    18
2            (5, 10]   42    31
"""