Here is my take:
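import pandas as pd
from itertools import combinations

def combine(batch):
    """Combine all products within one batch into pairs"""
    # set() drops duplicate products; combinations() yields each unordered pair once
    return pd.Series(list(combinations(set(batch), 2)))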
edges = df.groupby('Batch_ID')['Product_ID'].apply(combine).value_counts()
edges
#(B, C) 3
#(A, B) 1
#(A, C) 1
#(D, C) 1
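One caveat: a `set` has no guaranteed iteration order, so the same two products could in principle come out as (C, D) in one batch and (D, C) in another, and would then be counted as two different edges. Below is a minimal sketch that sorts each batch before pairing, assuming the product IDs can be compared with each other. The sample `df` and the names `sorted_pairs` and `pair_counts` are made up for illustration and are not part of the code above.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical sample data, chosen only to illustrate the idea.
df = pd.DataFrame({
    "Batch_ID":   [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "Product_ID": ["A", "B", "C", "B", "C", "B", "C", "C", "D"],
})

def sorted_pairs(batch):
    """Return every unordered product pair in a batch, smaller ID first."""
    return list(combinations(sorted(set(batch)), 2))

# Tally each pair across all batches.
pair_counts = Counter()
for _, products in df.groupby("Batch_ID")["Product_ID"]:
    pair_counts.update(sorted_pairs(products))

print(pair_counts.most_common())
```

With sorted pairs, an edge between C and D is always stored as ('C', 'D'), so the counts do not depend on how a particular set happens to be laid out.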
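As far as I understand, edges with 0 occurrences are not actually needed.
If you want, you can further split the index into source and target columns: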
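# Name the index and the counts explicitly so the column names do not depend on the pandas version
edges = edges.rename_axis('index').reset_index(name='Weight')
# Expand each (source, target) tuple into two separate columns
edges = pd.concat([edges, edges['index'].apply(pd.Series)], axis=1)
edges.drop(['index'], axis=1, inplace=True)
edges.columns = ['Weight', 'Source', 'Target']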
# Weight Source Target
#0 3 B C
#1 1 A B
#2 1 A C
#3 1 D C
Or:
c = ['Source', 'Target']
L = edges.index.values.tolist()
edges = pd.DataFrame(L, columns=c).join(edges.reset_index(drop=True))
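A note on versions: as far as I know, since pandas 2.0 `value_counts()` names its result `count` and names the index after the original Series, so code that expects `reset_index()` to produce a column literally called `index` may raise a `KeyError` there. Here is a small sketch that builds the edge table without relying on any auto-generated column name. The `pairs` Series is hypothetical stand-in data for the output of `combine`, and `edge_table` is just an illustrative name.

```python
import pandas as pd

# Hypothetical pair list standing in for the pairs produced by combine().
pairs = pd.Series([
    ("B", "C"), ("A", "B"), ("B", "C"),
    ("A", "C"), ("B", "C"), ("D", "C"),
])
edges = pairs.value_counts()

# Unpack the tuple index into two columns, then attach the counts by position.
edge_table = pd.DataFrame(edges.index.tolist(), columns=["Source", "Target"])
edge_table["Weight"] = edges.to_numpy()

print(edge_table)
```

Attaching the counts by position works because the rows of `edge_table` are created in the same order as the index of `edges`.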