I am using numpy.fromfile to construct an array that I can pass to the pandas.DataFrame constructor:
import numpy as np
import pandas as pd

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names   = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8', 'i4', 'f8', 'i4', 'f8' ]
    offsets = [ 0, 8, 12, 20, 24 ]
    dt = np.dtype({
        'names': names,
        'formats': formats,
        'offsets': offsets
    })
    return pd.DataFrame(np.fromfile(file, dt))
I would like to extend this function to handle gzipped files.
According to the numpy.fromfile documentation (http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html), the first parameter is file:
file : file or str
Open file object or filename
So I added the following check for a gzip file path:
if isinstance(file, str) and file.endswith(".gz"):
    file = gzip.open(file, "r")
However, when I pass the resulting object to numpy.fromfile, I get an IOError:
IOError: first argument must be an open file
Question:
How can I call numpy.fromfile with a gzipped file?
Edit:
As requested in the comments, here is the implementation that checks for gzipped files:
import gzip  # needed for gzip.open below

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names   = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8', 'i4', 'f8', 'i4', 'f8' ]
    offsets = [ 0, 8, 12, 20, 24 ]
    dt = np.dtype({
        'names': names,
        'formats': formats,
        'offsets': offsets
    })
    # Swap a ".gz" path for a gzip file object; this is what triggers the IOError
    if isinstance(file, str) and file.endswith(".gz"):
        file = gzip.open(file, "r")
    return pd.DataFrame(np.fromfile(file, dt))
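
For reference, one direction I have been considering, as a rough sketch rather than a confirmed fix: numpy.fromfile appears to need a real on-disk file, so instead of handing it the gzip object, read the decompressed bytes into memory and parse them with numpy.frombuffer. The helper name read_records and the constant BEST_DTYPE are made up for this illustration, and the sketch assumes the whole decompressed file fits in memory.

```python
import gzip

import numpy as np

# Record layout copied from the question above.
BEST_DTYPE = np.dtype({
    'names': ['time', 'bid_size', 'bid_price', 'ask_size', 'ask_price'],
    'formats': ['u8', 'i4', 'f8', 'i4', 'f8'],
    'offsets': [0, 8, 12, 20, 24],
})


def read_records(path, dtype=BEST_DTYPE):
    """Hypothetical helper: load fixed-size records from a plain or .gz file."""
    if isinstance(path, str) and path.endswith('.gz'):
        # Decompress everything into memory (assumes the data fits in RAM).
        with gzip.open(path, 'rb') as fh:
            raw = fh.read()
        # Interpret the bytes as records; the resulting array is read-only.
        return np.frombuffer(raw, dtype=dtype)
    # Uncompressed files can still go through np.fromfile directly.
    return np.fromfile(path, dtype=dtype)
```

The result could then be wrapped as pd.DataFrame(read_records(file)), as in the original function. Because the array returned by numpy.frombuffer is backed by an immutable bytes object, calling .copy() on it would be needed before modifying it in place.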