Parsing nested JSON into multiple DataFrames using pandas in Python

2024-04-23

I have a nested JSON document, shown below, and I want to parse it into multiple DataFrames in Python. Please help.

{
"tableName": "cases",
"url": "EndpointVoid",
"tableDataList": [{
    "_id": "100017252700",
    "title": "Test",
    "type": "TECH",
    "created": "2016-09-06T19:00:17.071Z",
    "createdBy": "193164275",
    "lastModified": "2016-10-04T21:50:49.539Z",
    "lastModifiedBy": "1074113719",
    "notes": [{
        "id": "30",
        "title": "Multiple devices",
        "type": "INCCL",
        "origin": "D",
        "componentCode": "PD17A",
        "issueCode": "IP321",
        "affectedProduct": "134322",
        "summary": "testing the json",
        "caller": {
            "email": "[email protected]",
            "phone": "651-744-4522"
        }
    }, {
        "id": "50",
        "title": "EDU: Multiple Devices - Lightning-to-USB Cable",
        "type": "INCCL",
        "origin": "D",
        "componentCode": "PD17A",
        "issueCode": "IP321",
        "affectedProduct": "134322",
        "summary": "parsing json 2",
        "caller": {
            "email": "[email protected]",
            "phone": "123-345-1111"
        }
    }],
    "syncCount": 2316,
    "repair": [{
            "id": "D208491610",
            "created": "2016-09-06T19:02:48.000Z",
            "createdBy": "193164275",
            "lastModified": "2016-09-21T12:49:47.000Z"
        }, {
            "id": "D208491610"
        }, {
            "id": "D208491628",
            "created": "2016-09-06T19:03:37.000Z",
            "createdBy": "193164275",
            "lastModified": "2016-09-21T12:49:47.000Z"
        }

    ],
    "enterpriseStatus": "8"
}],
"dateTime": 1475617849,
"primaryKeys": ["$._id"],
"primaryKeyVals": ["100017252700"],
"operation": "UPDATE"

}

I want to parse it and create 3 tables/DataFrames/CSV files, as shown below. Please help.

Output tables in this format: https://i.stack.imgur.com/NTHwa.png


I don't think this is the best approach, but I want to show you what is possible.

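import json

import pandas as pd
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated

# Load the sample document into a plain Python dict
with open('your_sample.json') as f:
    dt = json.load(f)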

Table 1

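# Expand tableDataList into rows, carry the top-level dateTime along, and keep only the case-level columns
df1 = json_normalize(dt, 'tableDataList', 'dateTime')[['_id', 'title', 'type', 'created', 'createdBy', 'lastModified', 'lastModifiedBy', 'dateTime']]
print(df1)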


            _id title  type                   created  createdBy  \
0  100017252700  Test  TECH  2016-09-06T19:00:17.071Z  193164275   

               lastModified lastModifiedBy    dateTime  
0  2016-10-04T21:50:49.539Z     1074113719  1475617849  
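
For anyone who wants to try the Table 1 step without a file on disk, here is a minimal sketch using the keyword names `record_path` and `meta`. The inline `sample` dict is a trimmed copy of the question's JSON and the variable names are my own; it assumes a pandas version that provides `pd.json_normalize` (1.0 or later).

```python
import pandas as pd

# Trimmed inline copy of the question's document (illustrative, not the full data).
sample = {
    "tableName": "cases",
    "tableDataList": [{
        "_id": "100017252700",
        "title": "Test",
        "type": "TECH",
        "notes": [{"id": "30"}, {"id": "50"}],
        "repair": [{"id": "D208491610"}],
    }],
    "dateTime": 1475617849,
}

# record_path names the list to expand into rows;
# meta names top-level fields to repeat on every row.
cases = pd.json_normalize(sample, record_path="tableDataList", meta=["dateTime"])

# The nested lists are left as object columns, so drop them for the case-level table.
cases = cases.drop(columns=["notes", "repair"])
print(cases)
```

Dropping the list columns is an alternative to selecting the wanted columns by name; which one reads better depends on how many fields you want to keep.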

Table 2

df2 = json_normalize(dt['tableDataList'], 'notes', '_id')
# Recent pandas versions flatten the nested "caller" dict into "caller.email" / "caller.phone" columns
df2 = df2.rename(columns={'caller.email': 'email', 'caller.phone': 'phone'})
df2 = df2[['_id', 'id', 'title', 'email', 'phone']]
print(df2)


            _id  id                                           title  \
0  100017252700  30                                Multiple devices   
1  100017252700  50  EDU: Multiple Devices - Lightning-to-USB Cable   

               email         phone  
0  [email protected]  651-744-4522  
1  [email protected]  123-345-1111  
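
As a self-contained illustration of the notes table, the sketch below shows how recent pandas versions flatten the nested `caller` object while expanding `notes`. The `sep="_"` argument and the variable names are my own choices, and the e-mail addresses are placeholders because the real ones are obscured in the question.

```python
import pandas as pd

# Trimmed inline copy of tableDataList; e-mail values are placeholders (assumption).
cases = [{
    "_id": "100017252700",
    "notes": [
        {"id": "30", "title": "Multiple devices",
         "caller": {"email": "caller1@example.com", "phone": "651-744-4522"}},
        {"id": "50", "title": "EDU: Multiple Devices - Lightning-to-USB Cable",
         "caller": {"email": "caller2@example.com", "phone": "123-345-1111"}},
    ],
}]

# Each note becomes a row; the parent _id is repeated on every row.
# Nested dicts inside a note are flattened, joined with sep ("caller_email", ...).
notes = pd.json_normalize(cases, record_path="notes", meta=["_id"], sep="_")

# Shorten the flattened names and keep the columns wanted for the second table.
notes = notes.rename(columns={"caller_email": "email", "caller_phone": "phone"})
notes = notes[["_id", "id", "title", "email", "phone"]]
print(notes)
```

Because the parent `_id` travels with every note, this table can later be joined back to the case-level table on `_id`.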

Table 3

# dropna() removes the repair entry that carries only an id
df3 = json_normalize(dt['tableDataList'], 'repair', '_id').dropna()
print(df3)


                    created  createdBy          id              lastModified  \
0  2016-09-06T19:02:48.000Z  193164275  D208491610  2016-09-21T12:49:47.000Z   
2  2016-09-06T19:03:37.000Z  193164275  D208491628  2016-09-21T12:49:47.000Z   

            _id  
0  100017252700  
2  100017252700  
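
Since the question also asks for CSV output, here is a rough sketch of the repair table written to a file. Two things here are assumptions of mine rather than part of the answer above: it drops only rows that lack `created` (instead of a bare `dropna()`), and it writes to a temporary directory with a made-up file name.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Trimmed inline copy of tableDataList with the three repair entries from the question.
cases = [{
    "_id": "100017252700",
    "repair": [
        {"id": "D208491610", "created": "2016-09-06T19:02:48.000Z",
         "createdBy": "193164275", "lastModified": "2016-09-21T12:49:47.000Z"},
        {"id": "D208491610"},
        {"id": "D208491628", "created": "2016-09-06T19:03:37.000Z",
         "createdBy": "193164275", "lastModified": "2016-09-21T12:49:47.000Z"},
    ],
}]

repairs = pd.json_normalize(cases, record_path="repair", meta=["_id"])

# Drop only the rows without a creation timestamp, then renumber the index.
repairs = repairs.dropna(subset=["created"]).reset_index(drop=True)

# Hypothetical output location; replace with a real path as needed.
out_file = Path(tempfile.mkdtemp()) / "repairs.csv"
repairs.to_csv(out_file, index=False)

# Read the file back as strings to check what was written.
round_trip = pd.read_csv(out_file, dtype=str)
print(round_trip)
```

The same `to_csv(..., index=False)` call should work for the other two DataFrames; `index=False` keeps the row numbers out of the file.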