Just make your `reader` subscriptable by wrapping it into a list. Obviously this will break on really large files (see the updates below):
>>> import csv
>>> reader = csv.reader(open('big.csv', newline=''))
>>> lines = list(reader)
>>> print(lines[:100])
...
Further reading: How do you split a list into evenly sized chunks in Python? https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python
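For a quick illustration of the technique in that linked question, here is a minimal sketch of the slicing idiom (the names `chunks`, `data`, and `size` are just illustrative):

```python
def chunks(data, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]

print(chunks(list(range(7)), 3))  # => [[0, 1, 2], [3, 4, 5], [6]]
```

This only works on in-memory sequences, which is exactly why the updates below process the CSV reader incrementally instead.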
Update 1 (list version): Another possible approach is to process each chunk while iterating over the lines:
#!/usr/bin/env python
import csv

reader = csv.reader(open('4956984.csv', newline=''))

chunk, chunksize = [], 100

def process_chunk(chunk):
    print(len(chunk))
    # do something useful ...

for i, line in enumerate(reader):
    if (i % chunksize == 0 and i > 0):
        process_chunk(chunk)
        del chunk[:]  # or: chunk = []
    chunk.append(line)

# process the remainder
process_chunk(chunk)
Update 2 (generator version): I haven't benchmarked it, but maybe you can gain some performance by using a chunk generator:
#!/usr/bin/env python
import csv

reader = csv.reader(open('4956984.csv', newline=''))

def gen_chunks(reader, chunksize=100):
    """
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices.
    """
    chunk = []
    for i, line in enumerate(reader):
        if (i % chunksize == 0 and i > 0):
            yield chunk
            del chunk[:]  # or: chunk = []
        chunk.append(line)
    yield chunk

for chunk in gen_chunks(reader):
    print(chunk)  # process chunk

# test gen_chunks on some dummy sequence:
for chunk in gen_chunks(range(10), chunksize=3):
    print(chunk)  # process chunk
# => yields
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]
There is a minor gotcha, as @totalhack https://stackoverflow.com/users/10682164/totalhack points out https://stackoverflow.com/questions/4956984/how-do-you-split-reading-a-large-csv-file-into-evenly-sized-chunks-in-python/4957046?noredirect=1#comment103177531_4957046:

Be aware that this yields the same object over and over with different contents. This works fine if you plan on doing everything you need to with the chunk between each iteration.
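If you do need the yielded chunks to stay valid after the loop moves on (e.g. to collect them in a list), a minimal fix is to rebind a fresh list instead of clearing the old one in place. A sketch of that variant, using an in-memory `io.StringIO` as a stand-in for a real CSV file:

```python
import csv
import io

def gen_chunks(reader, chunksize=100):
    """Yield independent lists of up to `chunksize` rows each."""
    chunk = []
    for line in reader:
        chunk.append(line)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []  # rebind a new list; the yielded one stays intact
    if chunk:  # remainder, if any
        yield chunk

data = io.StringIO("a,1\nb,2\nc,3\n")
chunks = list(gen_chunks(csv.reader(data), chunksize=2))
print(chunks)  # => [[['a', '1'], ['b', '2']], [['c', '3']]]
```

Because each `yield` hands out a distinct list object, collecting the chunks with `list(...)` is now safe.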