我有一个像这样的数据框:
source target weight
1 2 5
2 1 5
1 2 5
1 2 7
3 1 6
1 1 6
1 3 6
我的目标是删除重复的行,但源列和目标列的顺序并不重要。事实上,两列的顺序并不重要,应该将其删除。在这种情况下,预期结果将是
source target weight
1 2 5
1 2 7
3 1 6
1 1 6
有什么办法可以做到这一点而不需要循环吗?
Use frozenset
and duplicated
df[~df[['source', 'target']].apply(frozenset, 1).duplicated()]
source target weight
0 1 2 5
3 3 1 6
4 1 1 6
如果你想考虑无序source
/target
and weight
df[~df[['weight']].assign(A=df[['source', 'target']].apply(frozenset, 1)).duplicated()]
source target weight
0 1 2 5
3 1 2 7
4 3 1 6
5 1 1 6
但是,要明确地使用更具可读性的代码。
# Create series where values are frozensets and therefore hashable.
# With hashable things, we can determine duplicity.
# Note that I also set the index and name to set up for a convenient `join`
s = pd.Series(list(map(frozenset, zip(df.source, df.target))), df.index, name='mixed')
# Use `drop` to focus on just those columns leaving whatever else is there.
# This is more general and accommodates more than just a `weight` column.
mask = df.drop(['source', 'target'], axis=1).join(s).duplicated()
df[~mask]
source target weight
0 1 2 5
3 1 2 7
4 3 1 6
5 1 1 6
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)