How can I speed up loading and reading JSON files in Python?

2024-05-08

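I am running a script (in multiprocessing mode) that extracts some parameters from a bunch of JSON files, but it is currently very slow. Here is the script: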

from __future__ import print_function, division
import os
from glob import glob
from os import getpid
from time import time
from sys import stdout
import resource
from multiprocessing import Pool
import subprocess
try:
    import simplejson as json
except ImportError:
    import json


path = '/data/data//*.A.1'
print("Running with PID: %d" % getpid())

def process_file(file):
    start = time()
    filename = file.split('/')[-1]
    print(file)
    with open('/data/data/A.1/%s_DI' %filename, 'w') as w:
        with open(file, 'r') as f:
            for n, line in enumerate(f):
                d = json.loads(line)
                try:
                    domain = d['rrname']
                    ips = d['rdata']
                    for i in ips:
                        print("%s|%s" % (i, domain), file=w)
                except Exception:
                    # Record is missing a field or has an unexpected shape; dump it for inspection.
                    print(d)

if __name__ == "__main__":
    files_list = glob(path)
    cores = 12
    print("Using %d cores" % cores)
    pp = Pool(processes=cores)
    # Consume the iterator so exceptions raised in workers are not silently dropped.
    for _ in pp.imap_unordered(process_file, files_list):
        pass
    pp.close()
    pp.join()

Does anyone know how to speed this up?


Switch from

import json 

to

import ujson

https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/
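A minimal sketch of that swap, assuming ujson may or may not be installed. It mirrors the try/except import already used in the question, so the rest of the script can keep calling json.loads unchanged. The sample record and the parse_line helper are hypothetical; only the rrname and rdata field names come from the question.

```python
# Prefer ujson when it is installed; otherwise fall back to the standard library.
try:
    import ujson as json
except ImportError:
    import json


def parse_line(line):
    """Parse one JSON document from a single line of text."""
    return json.loads(line)


# Hypothetical record using the field names from the question.
record = parse_line('{"rrname": "example.com", "rdata": ["192.0.2.1", "192.0.2.2"]}')
pairs = ["%s|%s" % (ip, record["rrname"]) for ip in record["rdata"]]
```

How much this helps depends on how much of the run time is spent parsing rather than reading and writing files, so it is worth timing both variants on a real input file.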

Or switch to orjson

import orjson 

https://github.com/ijl/orjson
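As a rough sketch of how the per-file loop from the question could look with orjson: orjson.loads accepts bytes, so the input can be opened in binary mode and each line parsed without a separate decode step. The function name extract_pairs, the file names, and the sample records are hypothetical, and the fallback to the standard json module is only there so the sketch runs when orjson is not installed. Unlike the question's script, records without the expected fields are skipped rather than printed.

```python
import os
import tempfile

try:
    import orjson
    loads = orjson.loads  # accepts bytes directly
except ImportError:
    import json
    loads = json.loads    # fallback so the sketch still runs without orjson


def extract_pairs(in_path, out_path):
    """Write one "ip|domain" line per rdata entry; return the number of lines written."""
    written = 0
    with open(in_path, "rb") as src, open(out_path, "w") as dst:
        for raw in src:
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines
            d = loads(raw)
            try:
                domain = d["rrname"]
                ips = d["rdata"]
            except KeyError:
                continue  # record lacks the expected fields
            for ip in ips:
                dst.write("%s|%s\n" % (ip, domain))
                written += 1
    return written


# Demo on hypothetical records written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "sample.A.1")
    out_path = os.path.join(tmp, "sample.A.1_DI")
    with open(in_path, "w") as f:
        f.write('{"rrname": "example.com", "rdata": ["192.0.2.1", "192.0.2.2"]}\n')
        f.write('{"rrname": "no-rdata.example"}\n')
    count = extract_pairs(in_path, out_path)
    with open(out_path) as f:
        output = f.read()
```

A function like this can be passed to Pool.imap_unordered in place of process_file, given a way to derive the output path from the input path.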
