You can do it like this:
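# Read the whole input into memory.
with open('file') as file:
    lines = file.readlines()

headers = lines[0:1]   # the first line is the header
rest = lines[1:]       # everything else is data
chunk_size = 4         # data lines per output file

def chunks(lst, chunk_size):
    # Yield successive chunk_size-sized slices of lst.
    for i in range(0, len(lst), chunk_size):
        yield lst[i:i + chunk_size]

def write_rows(rows, file):
    for row in rows:
        file.write('%s' % row)

# Write part1, part2, ..., each starting with the header.
part = 1
for chunk in chunks(rest, chunk_size):
    with open('part%d' % part, 'w') as file:
        write_rows(headers, file)
        write_rows(chunk, file)
    part += 1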
Here is a test run:
$ cat file && python mkt.py && for p in part*; do echo ---- $p; cat $p; done
header
1
2
3
4
5
6
7
8
9
10
11
12
13
14
---- part1
header
1
2
3
4
---- part2
header
5
6
7
8
---- part3
header
9
10
11
12
---- part4
header
13
14
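Obviously, change the value of chunk_size and the way you retrieve headers to match the number of data lines you want per part and the number of header lines in your file.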
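To make that adjustment explicit, here is a minimal sketch that wraps the same in-memory approach in a function. The name split_with_headers and its parameters headers_count, chunk_size and prefix are assumptions for illustration and are not part of the script above.

```python
def split_with_headers(path, headers_count=1, chunk_size=4, prefix='part'):
    """Split `path` into files holding at most `chunk_size` data lines each,
    repeating the first `headers_count` lines at the top of every file.
    Returns the list of file names written. (Hypothetical helper.)"""
    with open(path) as src:
        lines = src.readlines()

    headers = lines[:headers_count]
    rest = lines[headers_count:]

    written = []
    # Number the parts from 1, one part per chunk_size-sized slice.
    for part, start in enumerate(range(0, len(rest), chunk_size), 1):
        name = '%s%d' % (prefix, part)
        with open(name, 'w') as out:
            out.writelines(headers)
            out.writelines(rest[start:start + chunk_size])
        written.append(name)
    return written
```

For the test file shown above, split_with_headers('file', headers_count=1, chunk_size=4) should write the same part1 to part4. Like the original script, it still reads the whole file into memory.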
Credits:
- https://stackoverflow.com/a/312464/438544
Edit - to do this line by line and avoid memory problems, you can do the following:
from itertools import islice

headers_count = 5
chunk_size = 250000

with open('file') as fin:
    # Only the header lines are kept in memory.
    headers = list(islice(fin, headers_count))

    part = 1
    while True:
        # Lazily take the next chunk_size lines from the input.
        line_iter = islice(fin, chunk_size)
        try:
            # Peek at one line so no empty part file is created at EOF.
            first_line = next(line_iter)
        except StopIteration:
            break
        with open('part%d' % part, 'w') as fout:
            for line in headers:
                fout.write(line)
            fout.write(first_line)
            for line in line_iter:
                fout.write(line)
        part += 1
Test case (put the above in a file called mkt2.py):
Create a file containing 5 header lines and 1234567 data lines:
with open('file', 'w') as fout:
    for i in range(5):
        fout.write(10 * ('header %d ' % i) + '\n')
    for i in range(1234567):
        fout.write(10 * ('line %d ' % i) + '\n')
Shell script to test it (put it in a file called rt.sh):
rm -f part*
echo ---- file
head -n7 file
tail -n2 file
python mkt2.py
for i in part*; do
    echo ---- "$i"
    head -n7 "$i"
    tail -n2 "$i"
done
Sample output:
$ sh rt.sh
---- file
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0
line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1
line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565
line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566
---- part1
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0 line 0
line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1 line 1
line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998 line 249998
line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999 line 249999
---- part2
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000 line 250000
line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001 line 250001
line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998 line 499998
line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999 line 499999
---- part3
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000 line 500000
line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001 line 500001
line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998 line 749998
line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999 line 749999
---- part4
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000 line 750000
line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001 line 750001
line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998 line 999998
line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999 line 999999
---- part5
header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0 header 0
header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1 header 1
header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2 header 2
header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3 header 3
header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4 header 4
line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000 line 1000000
line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001 line 1000001
line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565 line 1234565
line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566 line 1234566
Timing of the above:
real 0m0.935s
user 0m0.708s
sys 0m0.200s
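Beyond eyeballing head and tail, the split can also be checked programmatically. The following is a rough sketch only; check_parts is a hypothetical helper, not part of the scripts above. It confirms that every part starts with the original header lines and that the data lines of the parts, read in order, reproduce the data lines of the original.

```python
from itertools import islice

def check_parts(original, part_names, headers_count):
    """Return True if each part begins with the headers of `original` and
    the parts' data lines, in order, match the original's data lines.
    (Hypothetical helper for illustration.)"""
    with open(original) as src:
        headers = list(islice(src, headers_count))
        for name in part_names:
            with open(name) as part:
                # Each part must start with the same header lines.
                if list(islice(part, headers_count)) != headers:
                    return False
                # Its remaining lines must continue where the last part stopped.
                for line in part:
                    if line != next(src, None):
                        return False
        # No data line of the original may be left unaccounted for.
        return next(src, None) is None
```

For the run above it could be called as check_parts('file', ['part%d' % i for i in range(1, 6)], 5). The part names are built explicitly because a sorted glob would place part10 before part2 once there are ten or more parts.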
Hope this helps.