numpy: fromfile for gzipped files

2024-04-27

I am using numpy.fromfile to construct an array which I can pass to the pandas.DataFrame constructor:

import numpy as np
import pandas as pd

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names   = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8',   'i4',       'f8',        'i4',       'f8'        ]
    offsets = [  0,      8,          12,          20,         24         ]

    dt = np.dtype({
            'names': names, 
            'formats': formats,
            'offsets': offsets 
        })
    return pd.DataFrame(np.fromfile(file, dt))

I would like to extend this to handle gzipped files.

According to the numpy.fromfile documentation (http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html), the first parameter is file:

file : file or str
Open file object or filename

As such, I added the following check for gzip file paths:

if isinstance(file, str) and file.endswith(".gz"):
    file = gzip.open(file, "r")

However, when I pass the resulting gzip file object to fromfile, I get an IOError:

IOError: first argument must be an open file

Question:

How can I call numpy.fromfile with a gzipped file?

Edit:

As requested in the comments, here is the implementation that checks for gzipped files:

import gzip

def read_best_file(file, **kwargs):
    '''
    Loads best price data into a dataframe
    '''
    names   = [ 'time', 'bid_size', 'bid_price', 'ask_size', 'ask_price' ]
    formats = [ 'u8',   'i4',       'f8',        'i4',       'f8'        ]
    offsets = [  0,      8,          12,          20,         24         ]

    dt = np.dtype({
            'names': names, 
            'formats': formats,
            'offsets': offsets 
        })

    if isinstance(file, str) and file.endswith(".gz"):
        file = gzip.open(file, "r")

    return pd.DataFrame(np.fromfile(file, dt))

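I have had success reading arrays of raw binary data from gzipped files by feeding the read() result to numpy.frombuffer(). This code works on Python 3.7.3, and perhaps on earlier versions as well.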

# Example: read short integers (signed) from gzipped raw binary file

import gzip
import numpy as np

fname_gzipped = 'my_binary_data.dat.gz'
raw_dtype = np.int16
with gzip.open(fname_gzipped, 'rb') as f:
    from_gzipped = np.frombuffer(f.read(), dtype=raw_dtype)

# Demonstrate equivalence with direct np.fromfile()
fname_raw = 'my_binary_data.dat'
from_raw = np.fromfile(fname_raw, dtype=raw_dtype)

# True
print('raw binary and gunzipped are the same: {}'.format(
    np.array_equiv(from_gzipped, from_raw)))

# False
wrong_dtype = np.uint8
binary_as_wrong_dtype = np.fromfile(fname_raw, dtype=wrong_dtype)
print('wrong dtype and gunzipped are the same: {}'.format(
    np.array_equiv(from_gzipped, binary_as_wrong_dtype)))
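
The same approach can probably be applied to the structured dtype from the question. The following is only a minimal sketch: the helper name read_best_file_any and the constant BEST_DTYPE are made up for illustration, and it assumes the gzipped file decompresses to a whole number of 32-byte records and fits in memory. Note that an array created by np.frombuffer from a bytes object is read-only.

```python
import gzip

import numpy as np
import pandas as pd

# Record layout copied from the question's read_best_file.
BEST_DTYPE = np.dtype({
    'names':   ['time', 'bid_size', 'bid_price', 'ask_size', 'ask_price'],
    'formats': ['u8',   'i4',       'f8',        'i4',       'f8'],
    'offsets': [0,      8,          12,          20,         24],
})


def read_best_file_any(path):
    '''
    Loads best price data into a dataframe, from a raw or gzipped file.
    (Hypothetical helper name; not part of numpy or pandas.)
    '''
    if isinstance(path, str) and path.endswith('.gz'):
        # Decompress the whole file into memory, then reinterpret the bytes.
        with gzip.open(path, 'rb') as f:
            records = np.frombuffer(f.read(), dtype=BEST_DTYPE)
    else:
        # Uncompressed files can still go through np.fromfile directly.
        records = np.fromfile(path, dtype=BEST_DTYPE)
    return pd.DataFrame(records)
```

Calling read_best_file_any('best.dat.gz') and read_best_file_any('best.dat') on the same underlying data should then give equal dataframes (the file names here are placeholders).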
