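I downloaded a CSV file from Hotmail, but it contains a lot of duplicates. The duplicates are exact copies, and I don't know why my phone created them.
I want to get rid of the duplicates.
Technical specs: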
Windows XP SP 3
Python 2.7
CSV file with 400 contacts
Update: 2016
If you are happy to use the helpful more_itertools external library:
from more_itertools import unique_everseen
with open('1.csv', 'r') as f, open('2.csv', 'w') as out_file:
    out_file.writelines(unique_everseen(f))
A more efficient version of @IcyFlame's solution:
with open('1.csv', 'r') as in_file, open('2.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue  # skip duplicate
        seen.add(line)
        out_file.write(line)
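One edge case with comparing raw lines: if the last line of the file has no trailing newline, it will not compare equal to an otherwise identical earlier line. A minimal sketch of one way around that, assuming Python 3 and the standard csv module, is to compare parsed rows instead of raw text. The function name dedupe_rows is made up for illustration, and note that the rows are re-serialized, so quoting and line endings in the output may differ from the input.

```python
import csv

def dedupe_rows(src_path, dst_path):
    """Copy src_path to dst_path, skipping rows already seen (first one wins).

    Hypothetical helper for illustration; assumes Python 3.
    """
    seen = set()
    # newline='' lets the csv module handle line endings itself
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)  # lists are unhashable, tuples can go in a set
            if key in seen:
                continue  # skip duplicate row
            seen.add(key)
            writer.writerow(row)
```

Usage would be something like dedupe_rows('1.csv', '2.csv'). On Python 2.7 the files would instead need to be opened in 'rb' and 'wb' mode.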
To edit the same file in place, you could use this (old Python 2 code):
import fileinput
seen = set()  # set for fast O(1) amortized lookup
for line in fileinput.FileInput('1.csv', inplace=1):
    if line in seen: continue  # skip duplicate
    seen.add(line)
    print line,  # standard output is now redirected to the file
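For anyone on Python 3, the same in-place idea might look roughly like this. It is only a sketch: the function name dedupe_in_place is made up for illustration, and it relies on fileinput.input being usable as a context manager (Python 3.2 and later).

```python
import fileinput

def dedupe_in_place(path):
    """Remove duplicate lines from the file at `path`, keeping first occurrences.

    Hypothetical helper for illustration; assumes Python 3.2+.
    """
    seen = set()  # set for fast O(1) amortized lookup
    # With inplace=True, standard output is redirected into the file
    with fileinput.input(files=(path,), inplace=True) as lines:
        for line in lines:
            if line in seen:
                continue  # skip duplicate
            seen.add(line)
            print(line, end='')  # line already carries its own newline
```

Calling dedupe_in_place('1.csv') overwrites the file, so it is probably worth keeping a copy first (fileinput also accepts a backup argument, e.g. backup='.bak').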