I'm running a script (in multiprocessing mode) that extracts some parameters from a bunch of JSON files, but it is currently very slow. Here is the script:
from __future__ import print_function, division
import os
from glob import glob
from os import getpid
from time import time
from sys import stdout
import resource
from multiprocessing import Pool
import subprocess

try:
    import simplejson as json
except ImportError:
    import json

path = '/data/data//*.A.1'
print("Running with PID: %d" % getpid())

def process_file(file):
    start = time()
    filename = file.split('/')[-1]
    print(file)
    with open('/data/data/A.1/%s_DI' % filename, 'w') as w:
        with open(file, 'r') as f:
            for n, line in enumerate(f):
                d = json.loads(line)
                try:
                    domain = d['rrname']
                    ips = d['rdata']
                    for i in ips:
                        print("%s|%s" % (i, domain), file=w)
                except:
                    print(d)
                    pass

if __name__ == "__main__":
    files_list = glob(path)
    cores = 12
    print("Using %d cores" % cores)
    pp = Pool(processes=cores)
    pp.imap_unordered(process_file, files_list)
    pp.close()
    pp.join()
Does anyone know how to speed this up?
Switch from

import json

to

import ujson

https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/
Or switch to orjson:

import orjson

https://github.com/ijl/orjson
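A minimal sketch of the suggested swap, with graceful fallback so the script still runs where neither library is installed. The sample input line is hypothetical but mirrors the rrname/rdata fields of the original script:

    # Try the fastest available JSON parser, falling back to the stdlib.
    try:
        import orjson as json      # typically the fastest of the three
    except ImportError:
        try:
            import ujson as json   # still much faster than stdlib json
        except ImportError:
            import json            # stdlib fallback

    # Hypothetical input line in the same shape the script parses.
    line = '{"rrname": "example.com", "rdata": ["192.0.2.1", "192.0.2.2"]}'
    d = json.loads(line)           # loads(str) works identically in all three
    for ip in d["rdata"]:
        print("%s|%s" % (ip, d["rrname"]))

Note that loads() is a drop-in replacement here, but the libraries are not fully interchangeable elsewhere: orjson.dumps() returns bytes rather than str, so a script that also serializes would need a .decode() or a write in binary mode.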