I am trying to create a DataFrame by reading a CSV file
whose fields are separated by '#####' (five hash characters).
The code is:
import dask.dataframe as dd
df = dd.read_csv(r'D:\temp.csv', sep='#####', engine='python')  # raw string so '\t' is not read as a tab
res = df.compute()
The error is:
dask.async.ValueError:
Dask dataframe inspected the first 1,000 rows of your csv file to guess the
data types of your columns. These first 1,000 rows led us to an incorrect
guess.
For example a column may have had integers in the first 1000
rows followed by a float or missing value in the 1,001-st row.
You will need to specify some dtype information explicitly using the
``dtype=`` keyword argument for the right column names and dtypes.
df = dd.read_csv(..., dtype={'my-column': float})
Pandas has given us the following error when trying to parse the file:
"The 'dtype' option is not supported with the 'python' engine"
Traceback
---------
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/dataframe/io.py", line 69, in _read_csv
raise ValueError(msg)
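So how do I get rid of this error?
If I follow the error message, I have to specify a dtype for every column, but that is impractical when the file has more than 100 columns.
If I read the file without specifying a separator, everything works, but then '#####' is everywhere in the data. So, after computing it into a pandas DataFrame,
is there a way to get rid of it?
Any help would be appreciated.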
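
For reference, one workaround that might apply here is to rewrite the file with a single-character separator before handing it to dask, so that the 'python' engine is no longer needed. The following is only a minimal sketch under stated assumptions: the helper name `rewrite_separator`, the replacement character '|', and the sample rows are all made up for illustration, and it assumes '|' never occurs in the real data.

```python
import os
import tempfile


def rewrite_separator(src_path, dst_path, old_sep="#####", new_sep="|"):
    """Copy src_path to dst_path, replacing old_sep with new_sep line by line.

    Hypothetical helper, not part of dask or pandas.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            # Refuse to continue if the replacement character is already in
            # the data, because the columns would then be split incorrectly.
            if new_sep in line:
                raise ValueError("new_sep already occurs in the data")
            dst.write(line.replace(old_sep, new_sep))
    return dst_path


# Tiny demo on a throwaway file (the sample rows are invented).
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, "temp.csv")
dst = os.path.join(demo_dir, "temp_pipe.csv")
with open(src, "w", encoding="utf-8") as f:
    f.write("id#####name#####score\n1#####foo#####2.5\n2#####bar#####\n")

rewrite_separator(src, dst)
with open(dst, encoding="utf-8") as f:
    print(f.read())
```

With the rewritten file, something like `dd.read_csv(dst, sep='|')` should be able to use the default C engine, which accepts the `dtype=` keyword that the error message asks for. The file is processed one line at a time, so it does not have to fit in memory.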