Suppose we have a comma-separated file (CSV) that looks like this:
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The "day" when earth stood still","Michael Rennie,the 'strong' man","robert wise","1951"
"the 'gladiator'","russel "the awesome" crowe","ridley scott","2000"
As you can see above, lines 4 and 5 contain quotes nested inside quoted fields.
The output should look like this:
"name of movie","starring","director","release year"
"dark knight rises","christian bale, anna hathaway","christopher nolan","2012"
"the dark knight","christian bale, heath ledger","christopher nolan","2008"
"The day when earth stood still","Michael Rennie,the strong man","robert wise","1951"
"the gladiator","russel the awesome crowe","ridley scott","2000"
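How can I remove such quotes (both single and double) that appear inside the fields of a CSV file? Note that commas inside a single field are fine, because a parser will recognize that they sit inside quotes and treat the text as one field. This is only a preprocessing step to normalize the CSV file so that it can be fed to several parsers and converted into whatever format we want.
Bash, awk, or Python are all fine. Please no Perl, I'm tired of that language :D
Thanks in advance!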
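
One possible approach, sketched in Python. It rests on two assumptions that hold for the sample above but are not guaranteed for arbitrary files: every field is wrapped in double quotes, and the three-character sequence `","` appears only between fields, never inside one. The function name `clean_line` is made up for illustration. The idea is to strip the outermost quotes, split on `","`, delete every remaining single or double quote inside each field, and then re-wrap the fields.

```python
# Translation table that deletes both quote characters from a string.
_STRIP_QUOTES = str.maketrans("", "", "\"'")


def clean_line(line: str) -> str:
    """Remove stray quotes inside the fields of one fully quoted CSV line.

    Assumes every field is double-quoted and that '","' occurs only as
    the separator between fields (an assumption, not a CSV guarantee).
    """
    line = line.rstrip("\r\n")
    # Leave anything that is not wrapped in double quotes untouched.
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        return line
    # Drop the outermost quotes, then split on the field separator.
    fields = line[1:-1].split('","')
    # Delete every remaining single or double quote inside each field.
    cleaned = [field.translate(_STRIP_QUOTES) for field in fields]
    # Re-wrap each field in double quotes.
    return '"' + '","'.join(cleaned) + '"'


if __name__ == "__main__":
    sample = [
        '"the dark knight","christian bale, heath ledger","christopher nolan","2008"',
        '"The "day" when earth stood still","Michael Rennie,the \'strong\' man","robert wise","1951"',
        '"the \'gladiator\'","russel "the awesome" crowe","ridley scott","2000"',
    ]
    for row in sample:
        print(clean_line(row))
```

Commas inside a field survive because only the full `","` sequence is treated as a separator. Note that this also strips legitimate apostrophes (for example in `don't`), which matches the requested output but may not be wanted for other data.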