You can create a list of DataFrames and, in a list comprehension, sort each row and remove duplicates:
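import numpy as np
import pandas as pd

dfs = [df1, df2, df3]
# sort the values within each row, then drop rows that became duplicates
L = [pd.DataFrame(np.sort(x.values, axis=1), columns=x.columns).drop_duplicates()
     for x in dfs]
print(L)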
[  col1 col2
0    A    B
1    C    D
3    E    F,   col1 col2
0    A    B
1    C    D
2    M    N
3    E    F,   col1 col2
0    A    B
1    C    D
2    M    N
3    E    F]
Then merge the list of DataFrames (https://stackoverflow.com/a/30512931) on all columns (no on parameter):
from functools import reduce
# pd.merge without on joins on all columns the two frames share
df = reduce(lambda left, right: pd.merge(left, right), L)
print(df)
  col1 col2
0    A    B
1    C    D
2    E    F
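For reference, here is a minimal end-to-end sketch of the steps above. The contents of df1, df2 and df3 are hypothetical sample data (the original frames are not shown here), chosen so that the same pairs appear in a different column order; the helper name normalize is also only illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical sample frames: the same pairs occur in different column order.
df1 = pd.DataFrame({"col1": ["A", "D", "B", "F"], "col2": ["B", "C", "A", "E"]})
df2 = pd.DataFrame({"col1": ["B", "C", "M", "E"], "col2": ["A", "D", "N", "F"]})
df3 = pd.DataFrame({"col1": ["A", "D", "N", "F"], "col2": ["B", "C", "M", "E"]})
dfs = [df1, df2, df3]


def normalize(frame):
    """Sort the values inside each row, then drop rows that became duplicates."""
    # After sorting, (B, A) and (A, B) are the same row.
    sorted_values = np.sort(frame.values, axis=1)
    return pd.DataFrame(sorted_values, columns=frame.columns).drop_duplicates()


normalized = [normalize(x) for x in dfs]

# With no on argument, pd.merge joins on all shared column names (inner join),
# so only rows present in every frame survive the chained merge.
common = reduce(lambda left, right: pd.merge(left, right), normalized)
print(common)
```

Because every frame is normalized first, the merge compares pairs regardless of the order in which they were originally stored.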
Another solution, from @pygo:
Create an index of frozensets, join the DataFrames together with concat (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html) using an inner join, and finally remove duplicates by index with duplicated (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.duplicated.html) and boolean indexing (http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing), using iloc (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html) to get the first 2 columns:
df = pd.concat([x.set_index(x.apply(frozenset, axis=1)) for x in dfs], axis=1, join='inner')
df = df.iloc[~df.index.duplicated(), :2]
print (df)
       col1 col2
(B, A)    A    B
(C, D)    C    D
(F, E)    E    F
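A self-contained sketch of the frozenset approach follows. The sample frames are hypothetical, and there is one deliberate deviation from the snippet above: duplicate keys are dropped from each frame before concat, on the assumption that aligning frames on a non-unique index can raise an error in recent pandas versions. Treat this as an illustration, not as the original author's exact code.

```python
import pandas as pd

# Hypothetical sample frames: the same pairs occur in different column order.
df1 = pd.DataFrame({"col1": ["A", "D", "B", "F"], "col2": ["B", "C", "A", "E"]})
df2 = pd.DataFrame({"col1": ["B", "C", "M", "E"], "col2": ["A", "D", "N", "F"]})
df3 = pd.DataFrame({"col1": ["A", "D", "N", "F"], "col2": ["B", "C", "M", "E"]})
dfs = [df1, df2, df3]

# Index every frame by the unordered set of its row values,
# so (A, B) and (B, A) map to the same key.
indexed = [x.set_index(x.apply(frozenset, axis=1)) for x in dfs]

# Assumption: keep only the first row per key in each frame so that
# concat can align the frames on a unique index.
unique = [x[~x.index.duplicated()] for x in indexed]

# The inner join keeps only the keys that occur in every frame.
joined = pd.concat(unique, axis=1, join="inner")

# Keep one row per key and only the first two columns (those of df1).
result = joined.iloc[~joined.index.duplicated(), :2]
print(result)
```

Note that the values shown in col1 and col2 are taken from the first frame as stored, so they are not sorted within each row the way they are in the np.sort solution.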