I found the accepted solution (update: now deleted) misleading, because it fails to generalize to similar cases. Take the following example:
import pandas as pd
df = pd.DataFrame({'left': [0, 5, 10, 3, 12, 13, 18, 31],
                   'right': [4, 8, 13, 7, 19, 16, 23, 35]})
df
The suggested aggregation function outputs the following dataframe (note that 18-23 should be in group 1, together with 12-19).
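The expected grouping can be checked independently of any particular aggregation function. The following is only a minimal sketch, not the deleted answer's code: it assumes the intervals are closed (so intervals that merely touch at an endpoint count as overlapping), and the column name `expected_group` and the helper variables are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'left': [0, 5, 10, 3, 12, 13, 18, 31],
                   'right': [4, 8, 13, 7, 19, 16, 23, 35]})

# Sort by the left endpoint so that overlapping intervals become adjacent.
s = df.sort_values('left')

# Largest right endpoint among all *earlier* rows (NaN for the first row).
prev_max_right = s['right'].cummax().shift()

# A new group starts only where an interval begins after every earlier one has ended.
# Comparing against the running maximum (not just the previous row) is what keeps
# 18-23 with 12-19, even though the row right before it (13-16) ends at 16.
starts_new_group = s['left'] > prev_max_right

# Counting the group starts gives a 0-based label; assignment aligns on the index.
df['expected_group'] = starts_new_group.cumsum()

print(df['expected_group'].tolist())  # [0, 0, 1, 0, 1, 1, 1, 2]
```

Comparing each `left` only with the `right` of the previous row would split 18-23 off into its own group, which is the kind of failure described above.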
One solution is the following approach, based on the interval-union method posted by @CentAu (https://stackoverflow.com/questions/15273693/python-union-of-multiple-ranges):
# Union intervals by @CentAu
from sympy import Interval, Union

def union(data):
    """Union of a list of intervals, e.g. [(1, 2), (3, 4)]."""
    intervals = [Interval(begin, end) for (begin, end) in data]
    u = Union(*intervals)
    # A single merged interval is not wrapped in a Union, so wrap it in a list.
    return [u] if isinstance(u, Interval) else list(u.args)
# Create a list of intervals
df['left_right'] = df[['left', 'right']].apply(list, axis=1)
intervals = union(df.left_right)

# Add a group column: index of the merged interval that contains each left endpoint
df['group'] = df['left'].apply(
    lambda x: [g for g, l in enumerate(intervals) if l.contains(x)][0])
...which outputs: