I am writing Python code to search a CSV file, delete some of its rows, and replace values in some of its columns.
I have three files.
input.csv:
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx
delete.csv:
aaaaaaaa
eeeeeeee
uuuuuuuu
replace.csv:
iiiiiiii,11111111,22222222
mmmmmmmm,33333333,44444444
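One way to hold the two lookup files in memory is a set for the delete keys and a dict that maps the first column of replace.csv to its two replacement values. This is a minimal sketch, assuming every line of delete.csv holds one key and every line of replace.csv has exactly three comma-separated fields; `load_delete_keys` and `load_replacements` are hypothetical helper names, and in-memory strings stand in for the real files.

```python
import csv
import io


def load_delete_keys(f):
    """Return the set of first-column keys whose rows should be dropped."""
    # csv.reader removes the line terminator, so keys compare cleanly
    return {row[0] for row in csv.reader(f) if row}


def load_replacements(f):
    """Map each key in column 1 to its (new first, new second) values."""
    return {row[0]: (row[1], row[2]) for row in csv.reader(f) if row}


# In-memory copies of the sample files; with real files use
# open('delete.csv', newline='') and open('replace.csv', newline='').
delete_keys = load_delete_keys(io.StringIO("aaaaaaaa\neeeeeeee\nuuuuuuuu\n"))
replacements = load_replacements(io.StringIO(
    "iiiiiiii,11111111,22222222\nmmmmmmmm,33333333,44444444\n"))

print(sorted(delete_keys))
print(replacements)
```

Set membership and dict lookups are constant time on average, so each lookup file is read once instead of once per input row.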
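Here is my code:
input_file='input.csv'
delete='delete.csv'    # file names taken from the file list above
replace='replace.csv'
new_array=[]
for line in open(input_file):
    data=line.rstrip('\n').split(',')    # drop the newline before splitting
    a=data[0]
    b=data[1]
    c=data[2]
    d=data[3]
    for line2 in open(delete):
        if a in line2:    # first field found in delete.csv: skip this row
            break
        else:
            for line1 in open(replace):
                data1=line1.rstrip('\n').split(',')
                aa=data1[0]
                replaced_a=data1[1]
                replaced_b=data1[2]
                if data[0]==data1[0]:
                    # replace the first two fields (step 6 below)
                    data[0]=data1[1]
                    data[1]=data1[2]
                    new_array=data
                    print(new_array)
                else:
                    new_array=data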
My logic is:
1) Open input.csv and read it line by line.
2) Load the elements of each line into an array.
3) Compare the first element with every entry in delete.csv.
4) If it is found in delete.csv, do nothing and move on to the next line.
5) If it is not found in delete.csv, compare it with replace.csv.
6) If the first element is found in the first column of replace.csv, replace it with the corresponding value from the second column of replace.csv, and replace the second element with the corresponding value from the third column.
7) Load this array into a larger 10-element array.
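Steps 1 to 7 can be sketched with the standard `csv` module. This is a minimal sketch, not a definitive implementation: it assumes the file layouts shown above, that every input row has at least two fields, and that kept rows go into an ordinary list that grows as needed rather than a fixed 10-element array. `process` is a hypothetical name, and in-memory strings stand in for the three files.

```python
import csv
import io


def process(input_f, delete_f, replace_f):
    """Apply the delete/replace rules and return the surviving rows."""
    # Step 3 lookup: keys whose rows should be dropped
    delete_keys = {row[0] for row in csv.reader(delete_f) if row}
    # Step 6 lookup: key -> (new first field, new second field)
    replacements = {row[0]: (row[1], row[2]) for row in csv.reader(replace_f) if row}

    result = []  # step 7: collect every kept row
    for row in csv.reader(input_f):  # steps 1-2: one parsed record at a time
        if not row:
            continue  # ignore blank lines
        key = row[0]
        if key in delete_keys:  # step 4: drop the row
            continue
        if key in replacements:  # steps 5-6: overwrite the first two fields
            row[0], row[1] = replacements[key]
        result.append(row)  # rows matching neither file are kept unchanged
    return result


input_data = io.StringIO(
    "aaaaaaaa,bbbbbb,cccccc,ddddddd\n"
    "eeeeeeee,ffffff,gggggg,hhhhhhh\n"
    "iiiiiiii,jjjjjj,kkkkkk,lllllll\n"
    "mmmmmmmm,nnnnnn,oooooo,ppppppp\n"
    "qqqqqqqq,rrrrrr,ssssss,ttttttt\n"
    "uuuuuuuu,vvvvvv,wwwwww,xxxxxxx\n"
)
delete_data = io.StringIO("aaaaaaaa\neeeeeeee\nuuuuuuuu\n")
replace_data = io.StringIO("iiiiiiii,11111111,22222222\nmmmmmmmm,33333333,44444444\n")

rows = process(input_data, delete_data, replace_data)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue(), end="")
```

With real files, open each one with `open(name, newline='')`, as the `csv` documentation recommends, and pass the file objects to `process` in place of the `StringIO` objects.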
So the output I want is:
11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
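Right now I am facing the following problems:
1) Rows that appear in neither replace.csv nor delete.csv are not printed.
2) My input.csv may contain line breaks inside an entry, so reading it line by line is a problem. However, any data that spans multiple lines is guaranteed to be enclosed in quotes.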
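In the code above, a row is only printed in the branch where a replacement matched; the `else` branches only reassign `new_array`, and nothing prints it. One common fix is to decide deletion first with Python's `for ... else` (the `else` block runs only when the loop finishes without `break`) and then always keep the row. This minimal sketch assumes the files have already been split into lists of fields; `keep_or_drop` is a hypothetical name. It compares whole fields with `==` rather than the substring test `in`, which would also match a key that is only part of a longer line.

```python
def keep_or_drop(rows, delete_lines, replace_rows):
    """Return the surviving rows, using for/else instead of a flag variable."""
    output = []
    for data in rows:
        for line2 in delete_lines:
            if data[0] == line2:
                break  # found in delete.csv: the else block below is skipped
        else:
            # Runs only when no delete entry matched the first field
            for data1 in replace_rows:
                if data[0] == data1[0]:
                    data = [data1[1], data1[2]] + data[2:]
                    break
            output.append(data)  # kept whether or not it was replaced
    return output


rows = [
    ["aaaaaaaa", "bbbbbb", "cccccc", "ddddddd"],
    ["iiiiiiii", "jjjjjj", "kkkkkk", "lllllll"],
    ["qqqqqqqq", "rrrrrr", "ssssss", "ttttttt"],
]
delete_lines = ["aaaaaaaa", "eeeeeeee", "uuuuuuuu"]
replace_rows = [["iiiiiiii", "11111111", "22222222"], ["mmmmmmmm", "33333333", "44444444"]]

for r in keep_or_drop(rows, delete_lines, replace_rows):
    print(",".join(r))
```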
For example:
aaaaa,bbbb,ccccc,"ddddddddddd
ddddddd"
11111,2222,3333,4444
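Splitting on newlines and commas by hand cannot handle this, but `csv.reader` applies CSV quoting rules: a quoted field may contain line breaks, and the reader returns it as a single field of one record. The `csv` documentation asks for files to be opened with `newline=''` so that such embedded newlines are preserved. A minimal sketch using the example above as an in-memory string:

```python
import csv
import io

# Same content as the example; with a real file use open('input.csv', newline='')
sample = 'aaaaa,bbbb,ccccc,"ddddddddddd\nddddddd"\n11111,2222,3333,4444\n'

records = list(csv.reader(io.StringIO(sample)))
for record in records:
    print(len(record), record)
```

When rows like these are written back with `csv.writer`, fields that contain line breaks are quoted again under the default `QUOTE_MINIMAL` setting.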
Any help combining my code with this logic would be appreciated.