Here is one way to compute the required run lengths:
Code:
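import numpy as np
import pandas as pd

def min_run_length(series):
    # Pad with a 0 on both sides so runs touching either end are detected.
    terminal = pd.Series([0])
    diffs = pd.concat([terminal, series, terminal]).diff()
    starts = np.where(diffs == 1)   # rising edges: a run of 1s begins
    ends = np.where(diffs == -1)    # falling edges: first 0 after a run
    # Keep runs of length >= 2 as (length, (first, last)); the leading pad
    # makes the positions 1-based relative to the original series.
    return [(e - s, (s, e - 1)) for s, e in zip(starts[0], ends[0])
            if e - s >= 2]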
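To see why the padded difference marks the run boundaries, here is a minimal sketch that walks a small, made-up 0/1 series through the same steps. The sample values and the variable names `padded` and `runs` are illustrative assumptions, not part of the answer above, and no minimum-length filter is applied.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: runs of 1s at 0-based positions 1-2 and 4-6.
series = pd.Series([0, 1, 1, 0, 1, 1, 1])

# Pad with a 0 on each side so a run touching either edge still produces
# both a rising (+1) and a falling (-1) step in the difference.
terminal = pd.Series([0])
padded = pd.concat([terminal, series, terminal])
diffs = padded.diff().to_numpy()  # first element is NaN and matches neither test

starts = np.where(diffs == 1)[0]   # positions of rising steps in the padded data
ends = np.where(diffs == -1)[0]    # positions of falling steps in the padded data

# (length, (first, last)); the leading pad shifts every position by one,
# so these are 1-based positions in the original series.
runs = [(int(e - s), (int(s), int(e - 1))) for s, e in zip(starts, ends)]
print(runs)  # [(2, (2, 3)), (3, (5, 7))]
```

Each start is paired with the end that follows it, so the difference between the two is the run length.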
Test code:
from io import StringIO

df = pd.read_fwf(StringIO(u"""
12 13 14 15
0 0 1 0
0 0 1 1
1 0 0 1
1 1 0 1
1 1 1 0
0 0 1 0
0 0 1 1
1 1 0 1
0 0 1 1
0 0 1 1
1 1 0 1
1 1 1 1
1 1 1 1
1 0 1 1
0 0 1 1"""), header=1)
print(df.dtypes)
indices = {cname: min_run_length(df[cname]) for cname in df.columns}
print(indices)
Results:
{
u'12': [(3, (3, 5)), (4, (11, 14))],
u'13': [(2, (4, 5)), (3, (11, 13))],
u'14': [(2, (1, 2)), (3, (5, 7)), (2, (9, 10)), (4, (12, 15))],
u'15': [(3, (2, 4)), (9, (7, 15))],
}
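Note that the positions in these results are 1-based row numbers, because the leading padding element shifts every position by one. If 0-based positions or a different minimum length are wanted, a variant along the following lines might work. This is only a sketch: `runs_of_ones` and `min_len` are hypothetical names that do not appear in the answer above, and the input is assumed to contain only 0s and 1s.

```python
import numpy as np

def runs_of_ones(values, min_len=2):
    """Return (length, (first, last)) for each run of 1s, 0-based positions.

    Hypothetical variant of min_run_length; assumes values are only 0 or 1.
    """
    arr = np.asarray(values, dtype=int)
    # Pad with zeros so runs at either edge have a start and an end.
    padded = np.concatenate(([0], arr, [0]))
    # np.diff drops the first element, which undoes the shift from the pad:
    # diffs[i] == 1 means a run starts at arr[i].
    diffs = np.diff(padded)
    starts = np.where(diffs == 1)[0]
    ends = np.where(diffs == -1)[0]   # exclusive end of each run
    return [(int(e - s), (int(s), int(e - 1)))
            for s, e in zip(starts, ends) if e - s >= min_len]

# Values of column '12' from the test data.
col_12 = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
print(runs_of_ones(col_12))  # [(3, (2, 4)), (4, (10, 13))]
```

The lengths match the `u'12'` entry in the results, and each position is one less than the 1-based value reported there.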