我有两个 CSV 文件,
a1.csv
city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf
Aguila,Arizona,http://www.co.apache.az.us/planning-and-zoning-division/zoning-ordinances/
a2.csv
city,state,link
Aguila,Arizona,http://www.co.apache.az.us
我想得到差异。
这是我的尝试:
import pandas as pd
a = pd.read_csv('a1.csv')
b = pd.read_csv('a2.csv')
mask = a.isin(b.to_dict(orient='list'))
# Reverse the mask and remove null rows.
# Upside is that index of original rows that
# are now gone are preserved (see result).
c = a[~mask].dropna()
print c
预期输出:
city,state,link
Aguila,Arizona,https://www.glendaleaz.com/planning/documents/AppendixAZONING.pdf
AkChin,Arizona,http://www.maricopa-az.gov/zoningcode/wp-content/uploads/2014/05/Zoning-Code-Rewrite-Public-Review-Draft-3-Tracked-Edits-lowres1.pdf
但我收到错误:
Empty DataFrame
Columns: [city, state, link]
Index: []**
我想根据前两行进行检查,如果它们相同,则将其删除。