How to delete and replace columns in a CSV file by comparing it to other CSV files in Python?

2024-04-15

I am writing Python code to search, delete and replace columns in a CSV file. I have 3 files.

input.csv:

aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx

delete.csv:

aaaaaaaa
eeeeeeee
uuuuuuuu

replace.csv:

iiiiiiii,11111111,22222222
mmmmmmmm,33333333,44444444

This is my code:

input_file='input.csv'
new_array=[]
for line in open(input_file):
    data=line.split(',')
    a==data[0]
    b=data[1]
    c=data[2]
    d=data[3]
    for line2 in open(delete):
        if (name in line2)==True:
            break
        else:
            for line1 in open(replace):
                data1=line1.split(',')
                aa=data1[0]
                replaced_a=data1[1]
                repalced_b=data1[2]


            if (data[0]==data1[0]):

                data[0]=data1[1]
                data[2]=data1[2]
                new_array=data
                print(new_array)

            else:   
                new_array=data

My logic is:

1)open input.csv read line by line
2)load elements into an array
3)compare first element with entire delete.csv
4)if found in delete.csv then do nothing and take next line in array
5)if not found in delete.csv then compare with replace.csv
6)if the first element is found in the first column of replace.csv then replace the first element with the corresponding second column of replace.csv and the second element with the corresponding third column of replace.csv.
7)load this array into a bigger 10 element array.

So my desired output is:

11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt

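So now I am facing the following problems:

1) Lines that are not present in replace.csv or delete.csv are not printed.
2) My input.csv may contain newlines inside a single entry, so reading line by line is a problem. However, data that spreads over several lines is guaranteed to be enclosed in quotes. For example: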

aaaaa,bbbb,ccccc,"ddddddddddd
ddddddd"
11111,2222,3333,4444

Any help in bringing the code and my logic together is appreciated.


I would suggest changing it up a bit:

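  • read the things you want to replace into a dictionary
    • the key is what your data row has in its 0th position, the value is what the 0th and 1st positions of that row should be replaced with
  • read the things you want to delete into a set
    • if your data row starts with one of these, skip the row; otherwise add it to the output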

Loop over your data and use the two lookups to "do the right thing".
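The two lookups can be sketched on in-memory rows before any file handling is involved. This is only a minimal illustration: the function name `apply_rules` and the sample rows are assumptions made for this example, and it assumes each replacement supplies exactly two values.

```python
def apply_rules(rows, delete, replace):
    """Drop rows whose first field is in `delete`; for rows whose first
    field is a key of `replace`, swap the first two fields for the mapped values."""
    result = []
    for row in rows:
        if not row:
            continue                               # ignore empty rows
        key = row[0]
        if key in delete:
            continue                               # skip rows marked for deletion
        if key in replace:
            result.append(replace[key] + row[2:])  # two new fields + rest of row
        else:
            result.append(row)                     # keep the row unchanged
    return result


# Hypothetical sample data, shaped like the files in the question.
rows = [
    ["aaaaaaaa", "bbbbbb", "cccccc", "ddddddd"],
    ["iiiiiiii", "jjjjjj", "kkkkkk", "lllllll"],
    ["qqqqqqqq", "rrrrrr", "ssssss", "ttttttt"],
]
delete = {"aaaaaaaa"}
replace = {"iiiiiiii": ["11111111", "22222222"]}

cleaned = apply_rules(rows, delete, replace)
print(cleaned)
```

Both membership tests are hash lookups, so each data row is checked once instead of re-reading delete.csv and replace.csv for every input line.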

I changed your data a bit to incorporate the mentioned "escaped" data, including newlines:

File creation:

with open("i.csv","w") as f: 
    f.write("""
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
"mmmm
mmmm",nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx""")

with open ("d.csv","w") as f: 
    f.write("""
aaaaaaaa
eeeeeeee
uuuuuuuu""")

with open ("r.csv","w") as f: 
    f.write("""
iiiiiiii,11111111,22222222
"mmmm
mmmm",33333333,44444444""")
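Quoted entries like the ones written above are read back as a single field by `csv.reader`, even when they span several physical lines. The following is a small sketch of that behaviour. The `sample` string is an assumption that mirrors the multi-line example from the question, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# Hypothetical sample: the last field of the first row contains a newline
# and is therefore wrapped in quotes.
sample = 'aaaaa,bbbb,ccccc,"ddddddddddd\nddddddd"\n11111,2222,3333,4444\n'

# newline="" leaves line endings untouched so the csv module can handle them.
parsed = list(csv.reader(io.StringIO(sample, newline="")))

print(len(parsed))     # number of logical rows, not physical lines
print(parsed[0][3])    # the quoted field, embedded newline preserved
```

Splitting each line on `,` by hand would instead break the first record into two incomplete rows.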

Program:

import csv

def read_file(fn):
    rows = []
    # newline="" is what the csv docs recommend when opening files for the csv module
    with open(fn, newline="") as f:
        reader = csv.reader(f, quotechar='"', delimiter=",")
        for row in reader:
            if row:                     # eliminate empty rows from data read
                rows.append(row)
    return rows

# create a dict for the replace stuff
replace = {x[0]: x[1:] for x in read_file("r.csv")}

# create a set for the delete stuff
delete = {row[0] for row in read_file("d.csv")}

# collect what we need to write back
result = []

# https://docs.python.org/3/library/csv.html
with open("i.csv", newline="") as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[0] in delete:
                continue                                   # skip data row
            elif row[0] in replace:
                # replace with mapping, add rest of row
                result.append(replace[row[0]] + row[2:])   # replace data
            else:
                result.append(row)                         # use as is

# write result back into file
with open ("done.csv", "w", newline="") as f:
    w = csv.writer(f,quotechar='"', delimiter= ",")
    w.writerows(result)

Check the result:

with open ("done.csv") as f:
    print(f.read()) 

Output:

11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt

Documentation:

  • csv.writer/csv.reader https://docs.python.org/3/library/csv.html