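A file is a stream of bytes with no implied structure. If you want to load a list of binary blobs, you should store some additional metadata so that the structure can be restored. For example, you could use the netstring format (http://cr.yp.to/proto/netstrings.txt):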
#!/usr/bin/env python
blocks = [b'\xa1\r\xa594\x92z\xf8\x16\xaa', b'\xfbI\xfdqx|\xcd\xdb\x1b\xb3']
# save blocks
with open('blocks.netstring', 'wb') as output_file:
    for blob in blocks:
        # netstring layout: [len]":"[string]","
        output_file.write(str(len(blob)).encode())
        output_file.write(b":")
        output_file.write(blob)
        output_file.write(b",")
To read them back:
#!/usr/bin/env python3
import re
from mmap import ACCESS_READ, mmap
blocks = []
match_size = re.compile(br'(\d+):').match
with open('blocks.netstring', 'rb') as file, \
     mmap(file.fileno(), 0, access=ACCESS_READ) as mm:
    position = 0
    for m in iter(lambda: match_size(mm, position), None):
        i, size = m.end(), int(m.group(1))
        blocks.append(mm[i:i + size])
        position = i + size + 1  # shift to the next netstring
print(blocks)
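The two scripts above can also be folded into a pair of in-memory helpers. What follows is a minimal sketch rather than part of the original scripts: the names `dump_netstrings` and `load_netstrings` are hypothetical, and unlike the reader above it also checks that every blob is followed by its trailing comma, so truncated input raises an error instead of being silently accepted.

```python
import re

# Matches the decimal length prefix and the colon of one netstring.
_match_size = re.compile(br'(\d+):').match


def dump_netstrings(blobs):
    """Encode an iterable of bytes objects as concatenated netstrings."""
    return b''.join(
        str(len(blob)).encode() + b':' + blob + b',' for blob in blobs
    )


def load_netstrings(data):
    """Decode concatenated netstrings back into a list of bytes objects."""
    blobs = []
    position = 0
    while position < len(data):
        m = _match_size(data, position)
        if m is None:
            raise ValueError('expected a length prefix at offset %d' % position)
        start, size = m.end(), int(m.group(1))
        end = start + size
        # The byte right after the payload must be the "," terminator.
        if data[end:end + 1] != b',':
            raise ValueError('missing trailing comma at offset %d' % end)
        blobs.append(data[start:end])
        position = end + 1  # shift to the next netstring
    return blobs


# Round trip, including an empty blob and a blob that looks like a netstring.
blocks = [b'\xa1\r\xa594', b'', b'12:34,']
assert load_netstrings(dump_netstrings(blocks)) == blocks
```

Because the length prefix says exactly how many bytes to take, a blob may contain any byte values, including `:` and `,`, without escaping.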
As an alternative, you could consider the BSON format (http://bsonspec.org/) or the ASCII armor format (https://www.rfc-editor.org/rfc/rfc4880#section-6.2) for your data.
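If a text-safe file is the main goal, the idea behind ASCII armor can be approximated with the standard library alone. The sketch below is a simplification and not an implementation of RFC 4880 armor (it has no header lines and no CRC-24 checksum): it base64-encodes each blob onto its own line, so the newline acts as the record separator. The helper names are hypothetical.

```python
import base64


def dump_base64_lines(blobs):
    """Return text with one base64-encoded blob per line."""
    return ''.join(
        base64.b64encode(blob).decode('ascii') + '\n' for blob in blobs
    )


def load_base64_lines(text):
    """Decode text produced by dump_base64_lines() back into bytes objects."""
    return [base64.b64decode(line) for line in text.splitlines()]


# Round trip, including an empty blob and a blob containing a newline.
blocks = [b'\xa1\r\xa594', b'', b'line\nbreak']
assert load_base64_lines(dump_base64_lines(blocks)) == blocks
```

The trade-off is size: base64 output is roughly a third larger than the raw bytes, whereas netstrings add only a few bytes per blob.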