Ubuntu 12.04 LTS
Ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]
Rails 3.2.9
Following is the content of the CSV file I received:
"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"
However, when I try to parse the CSV file, I get an error:
1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""}
1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
from (irb):22
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
Then I tried simplifying the data, i.e.
"name","age","email"
"jignesh","30","[email protected]"
But I still get the same error:
1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
from (irb):23
from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
I again tried simplifying the data like this:
name,age,email
jignesh,30,[email protected]
and it works. See the output below:
1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
name
age
email
jignesh
30
[email protected]
=> nil
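But I will be receiving CSV files containing quoted data, so removing the quotes is not really the solution I am looking for. I cannot figure out what is causing the error "CSV::MalformedCSVError: Illegal quoting in line 1." in my earlier examples.
I have verified that the CSV has no leading/trailing whitespace by enabling "Show whitespace characters" and "Show line endings" in my text editor. I also verified the encoding using the following: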
1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
=> #<Encoding:UTF-8>
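Note: I also tried using CSV.read, but that method gives the same error.
Can anybody please help me get out of this problem and make me understand where things are going wrong?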
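The encoding check above reports only the encoding Ruby assigned to the string, not the bytes actually stored in the file. As a further check, here is a minimal sketch that looks at the first three bytes for a UTF-8 byte order mark (EF BB BF), which a text editor may not display even with whitespace made visible. The helper name starts_with_utf8_bom? and the constant UTF8_BOM_BYTES are hypothetical, made up for illustration.

```ruby
# Hypothetical helper: reports whether a file begins with the
# UTF-8 byte order mark (the bytes EF BB BF).
UTF8_BOM_BYTES = [0xEF, 0xBB, 0xBF].freeze

def starts_with_utf8_bom?(path)
  # binread returns raw bytes (or nil for an empty file), so no
  # encoding conversion can hide or alter the leading bytes.
  File.binread(path, 3).to_s.bytes == UTF8_BOM_BYTES
end

# Example call, using the path from the post (only if the file exists):
path = "/tmp/my_data.csv"
puts starts_with_utf8_bom?(path) if File.exist?(path)
```

If this prints true, the first field on line 1 does not really start with the quote character, even though it looks that way in an editor.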
=====================
I just found the following post at http://www.ruby-forum.com/topic/448070 and tried the following:
file_data = file.read
file_data.gsub!('"', "'")
arr_of_arrs = CSV.parse(file_data)
arr_of_arrs.each do |arr|
Rails.logger.debug "=======#{arr}"
end
and got the following output:
=======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
=======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]
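This parses without raising an error, but the data is not read correctly: once the double quotes are replaced, the default col_sep (the comma character) splits the date field 'Mar 1, 2013 12:03:54 AM PST' into two columns.
So I tried using the quote_char option like this: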
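The first field in the output above begins with \xEF\xBB\xBF, the UTF-8 byte order mark. Assuming that mark is what sits in front of the opening quote on line 1 (an assumption, not something confirmed in this post), one possible approach is to drop it before parsing and leave the double quotes alone. The helper name parse_csv_ignoring_bom is hypothetical, and the sample string is made up to resemble the simplified file.

```ruby
require "csv"

# Hypothetical helper: removes a leading UTF-8 byte order mark (U+FEFF),
# if present, then parses the text with CSV's default options.
# Assumes +text+ is a UTF-8 encoded string.
def parse_csv_ignoring_bom(text)
  CSV.parse(text.sub(/\A\uFEFF/, ""))
end

# Sample data modeled on the simplified file, with a BOM prepended.
sample = "\uFEFF\"name\",\"age\"\n\"jignesh\",\"30\"\n"
parse_csv_ignoring_bom(sample).each { |row| puts row.inspect }
```

When reading straight from a file, Ruby's IO layer also accepts "BOM|UTF-8" as the encoding in the open mode, for example File.open(path, "r:BOM|UTF-8"), which skips the mark while reading.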
arr_of_arrs = CSV.parse(file_data, :quote_char => "'")
but ended up with the following error:
CSV::MalformedCSVError (Illegal quoting in line 1.):
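For comparison, a small sketch showing that :quote_char => "'" does accept single-quoted fields, including one with an embedded comma, when nothing precedes the opening quote. The sample string is made up for illustration. That the leading \xEF\xBB\xBF bytes are what trips the parser in the call above is only a guess.

```ruby
require "csv"

# Made-up sample: single-quoted fields, one containing a comma,
# and no byte order mark in front of the first quote.
clean = "'Mar 1, 2013 12:03:54 AM PST','5481545091','Order'\n"

rows = CSV.parse(clean, :quote_char => "'")
puts rows.first.length  # number of fields in the first row
puts rows.first.first   # the date field, kept in one piece
```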
Thanks,
Jignesh