How to recursively flatten nested JSON with flatten_json?

Question:

Repo: [flatten](https://github.com/amirziai/flatten)

  • The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json
  • This question is specific to the following component of the package:

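    def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
        """
        Flatten a nested dict (including any nested lists and dicts)
        into a single-level dict.
        """
        out = dict()

        def flatten(x: (list, dict, str), name: str='', exclude=exclude):
            if type(x) is dict:
                # Recurse into every key that is not excluded, extending the key path
                for a in x:
                    if a not in exclude:
                        flatten(x[a], f'{name}{a}{sep}')
            elif type(x) is list:
                # List items are keyed by their index
                for i, a in enumerate(x):
                    flatten(a, f'{name}{i}{sep}')
            else:
                # Leaf value: strip the trailing separator (assumes a
                # one-character sep) and store it under the accumulated key
                out[name[:-1]] = x

        flatten(nested_json)
        return out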
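The function above strips the separator with `name[:-1]`, which only works for a one-character `sep` (with `sep='__'`, a key such as `a__b__` would become `a__b_`). Below is a minimal sketch of the same recursive technique that avoids this. `flatten_dict` is a hypothetical name, not part of the flatten_json package: it carries the key path as a tuple and joins it with `sep` only at the leaves, and it uses `isinstance`, so dict and list subclasses are recursed into as well.

```python
from typing import Any, Iterable


def flatten_dict(nested: Any, sep: str = "_", exclude: Iterable[str] = ("",)) -> dict:
    """Recursively flatten nested dicts/lists into a single-level dict.

    Hypothetical variant of flatten_json: keys are built by joining the
    path parts with `sep`, so multi-character separators work too.
    """
    excluded = set(exclude)
    out: dict = {}

    def walk(x: Any, path: tuple) -> None:
        if isinstance(x, dict):
            # Recurse into every key that is not excluded
            for key, value in x.items():
                if key not in excluded:
                    walk(value, path + (str(key),))
        elif isinstance(x, list):
            # List items are keyed by their position
            for i, value in enumerate(x):
                walk(value, path + (str(i),))
        else:
            # Leaf: join the accumulated path into the flat key
            out[sep.join(path)] = x

    walk(nested, ())
    return out
```

For plain JSON-loaded data and a one-character `sep`, this should produce the same keys as `flatten_json` above. Like the original, empty dicts and lists produce no keys at all, which is why `a1`, `m1` and `context` do not show up in the Data 1 output.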

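Using recursion to flatten nested dicts

  • Thinking Recursively in Python
  • Flattening JSON objects in Python

How nested can data be?

  • flatten_json has been used to unpack a file that ended up with more than 100,000 columns

Can the flattened JSON be unflattened?

  • Yes, but this question doesn't cover that. If you install the flatten package, it has an unflatten method, but I haven't tested it.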
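As a rough illustration of the reverse operation (independent of the package's unflatten method), the sketch below rebuilds nested dicts by splitting each flat key on `sep`. `unflatten_dict` is a hypothetical helper. It assumes that no original key contains `sep` and that no key is both a leaf and a prefix of another key, and it does not turn numeric segments back into lists.

```python
def unflatten_dict(flat: dict, sep: str = "_") -> dict:
    """Rebuild nested dicts from flat keys joined with `sep` (hypothetical helper).

    Assumptions: no original key contains `sep`, no key is both a leaf and a
    prefix of another key, and list indices stay as string dict keys.
    """
    nested: dict = {}
    for flat_key, value in flat.items():
        parts = flat_key.split(sep)
        node = nested
        # Create (or reuse) an intermediate dict for every part but the last
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested
```

For example, `unflatten_dict({"metadata_m1_value": "m1_1", "metadata_m1_timestamp": "d1", "id": 1})` returns `{"metadata": {"m1": {"value": "m1_1", "timestamp": "d1"}}, "id": 1}`, while a key like `activity_0_type` comes back as nested dicts keyed by `"0"` rather than as a list.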

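Answer:

How to flatten a JSON or dict is a common question, and there are many answers to it.

  • This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.

Assumptions:

  • This answer assumes you have already loaded the JSON or dict into some variable (e.g. from a file, an API, etc.)

    • In this case, we will use data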
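For completeness, here is a minimal sketch of getting `data` with the standard library's `json` module. The inline string is only a stand-in for a file or API response, and the file name in the comment is hypothetical.

```python
import json

# Stand-in for a JSON payload read from a file or returned by an API
raw = '{"id": 1, "class": "c1", "metadata": {"m1": {"value": "m1_1", "timestamp": "d1"}}}'

# json.loads parses a str; json.load does the same for an open file object, e.g.
#     with open("data.json") as f:   # hypothetical file name
#         data = json.load(f)
data = json.loads(raw)

# flatten_json expects a dict, so check the top-level type before calling it
print(type(data).__name__)  # dict
```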

How data is passed to flatten_json:

  • It accepts a dict, as shown by the function's type hint.

The most common forms of data:

  • Just a dict: {}

    • flatten_json(data)

  • A list of dicts: [{}, {}, {}]

    • [flatten_json(x) for x in data]

  • JSON with top-level keys, where the values repeat: {1: {}, 2: {}, 3: {}}

    • [flatten_json(data[key]) for key in data.keys()]

  • Other

    • {'key': [{}, {}, {}]}: [flatten_json(x) for x in data['key']]
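The three common shapes above can be handled by one small dispatcher. This is a sketch under assumptions: `flatten_records` is a hypothetical helper that takes the flattening function as a parameter (for example `flatten_json` from above), and it uses a simple heuristic, treating a dict as "top-level keys with repeating values" only when every value is itself a dict. A single record whose values all happen to be dicts would therefore be split into several rows, and for the "Other" shapes you still pick out the list yourself, e.g. `data['key']`.

```python
from typing import Any, Callable, Dict, List

FlatDict = Dict[str, Any]


def flatten_records(data: Any, flatten: Callable[[dict], FlatDict]) -> List[FlatDict]:
    """Return one flattened dict per record for the common shapes of `data`."""
    if isinstance(data, list):
        # A list of dicts: [{}, {}, {}]
        return [flatten(x) for x in data]
    if isinstance(data, dict) and data and all(isinstance(v, dict) for v in data.values()):
        # Top-level keys whose values repeat: {1: {}, 2: {}, 3: {}}
        return [flatten(data[key]) for key in data.keys()]
    # Just a dict: {}
    return [flatten(data)]
```

With `flatten_json` defined, `rows = flatten_records(data, flatten_json)` yields a list of flat dicts, which is the same kind of input the `pd.DataFrame(...)` calls in the examples below receive.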

Other considerations:

  • I typically flatten data into a pandas.DataFrame

    • Load pandas with: import pandas as pd

Data 1:

    {
        "id": 1,
        "class": "c1",
        "owner": "myself",
        "metadata": {
            "m1": {
                "value": "m1_1",
                "timestamp": "d1"
            },
            "m2": {
                "value": "m1_2",
                "timestamp": "d2"
            },
            "m3": {
                "value": "m1_3",
                "timestamp": "d3"
            },
            "m4": {
                "value": "m1_4",
                "timestamp": "d4"
            }
        },
        "a1": {
            "a11": []
        },
        "m1": {},
        "comm1": "COMM1",
        "comm2": "COMM21529089656387",
        "share": "xxx",
        "share1": "yyy",
        "hub1": "h1",
        "hub2": "h2",
        "context": []
    }

Flatten 1:

    df = pd.DataFrame([flatten_json(data)])

| id | class | owner | metadata_m1_value | metadata_m1_timestamp | metadata_m2_value | metadata_m2_timestamp | metadata_m3_value | metadata_m3_timestamp | metadata_m4_value | metadata_m4_timestamp | comm1 | comm2 | share | share1 | hub1 | hub2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | c1 | myself | m1_1 | d1 | m1_2 | d2 | m1_3 | d3 | m1_4 | d4 | COMM1 | COMM21529089656387 | xxx | yyy | h1 | h2 |

Data 2:

    [{
        'accuracy': 17,
        'activity': [{
            'activity': [{
                'confidence': 100,
                'type': 'STILL'
            }],
            'timestampMs': '1542652'
        }],
        'altitude': -10,
        'latitudeE7': 3777321,
        'longitudeE7': -122423125,
        'timestampMs': '1542654',
        'verticalAccuracy': 2
    }, {
        'accuracy': 17,
        'activity': [{
            'activity': [{
                'confidence': 100,
                'type': 'STILL'
            }],
            'timestampMs': '1542652'
        }],
        'altitude': -10,
        'latitudeE7': 3777321,
        'longitudeE7': -122423125,
        'timestampMs': '1542654',
        'verticalAccuracy': 2
    }, {
        'accuracy': 17,
        'activity': [{
            'activity': [{
                'confidence': 100,
                'type': 'STILL'
            }],
            'timestampMs': '1542652'
        }],
        'altitude': -10,
        'latitudeE7': 3777321,
        'longitudeE7': -122423125,
        'timestampMs': '1542654',
        'verticalAccuracy': 2
    }]

Flatten 2:

    df = pd.DataFrame([flatten_json(x) for x in data])

| accuracy | activity_0_activity_0_confidence | activity_0_activity_0_type | activity_0_timestampMs | altitude | latitudeE7 | longitudeE7 | timestampMs | verticalAccuracy |
|---|---|---|---|---|---|---|---|---|
| 17 | 100 | STILL | 1542652 | -10 | 3777321 | -122423125 | 1542654 | 2 |
| 17 | 100 | STILL | 1542652 | -10 | 3777321 | -122423125 | 1542654 | 2 |
| 17 | 100 | STILL | 1542652 | -10 | 3777321 | -122423125 | 1542654 | 2 |

Data 3:

    {
        "1": {
            "VENUE": "JOEBURG",
            "COUNTRY": "HAE",
            "ITW": "XAD",
            "RACES": {
                "1": {"NO": 1, "TIME": "12:35"},
                "2": {"NO": 2, "TIME": "13:10"},
                "3": {"NO": 3, "TIME": "13:40"},
                "4": {"NO": 4, "TIME": "14:10"},
                "5": {"NO": 5, "TIME": "14:55"},
                "6": {"NO": 6, "TIME": "15:30"},
                "7": {"NO": 7, "TIME": "16:05"},
                "8": {"NO": 8, "TIME": "16:40"}
            }
        },
        "2": {
            "VENUE": "FOOBURG",
            "COUNTRY": "ABA",
            "ITW": "XAD",
            "RACES": {
                "1": {"NO": 1, "TIME": "12:35"},
                "2": {"NO": 2, "TIME": "13:10"},
                "3": {"NO": 3, "TIME": "13:40"},
                "4": {"NO": 4, "TIME": "14:10"},
                "5": {"NO": 5, "TIME": "14:55"},
                "6": {"NO": 6, "TIME": "15:30"},
                "7": {"NO": 7, "TIME": "16:05"},
                "8": {"NO": 8, "TIME": "16:40"}
            }
        }
    }

Flatten 3:

    df = pd.DataFrame([flatten_json(data[key]) for key in data.keys()])

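| VENUE | COUNTRY | ITW | RACES_1_NO | RACES_1_TIME | RACES_2_NO | RACES_2_TIME | RACES_3_NO | RACES_3_TIME | RACES_4_NO | RACES_4_TIME | RACES_5_NO | RACES_5_TIME | RACES_6_NO | RACES_6_TIME | RACES_7_NO | RACES_7_TIME | RACES_8_NO | RACES_8_TIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| JOEBURG | HAE | XAD | 1 | 12:35 | 2 | 13:10 | 3 | 13:40 | 4 | 14:10 | 5 | 14:55 | 6 | 15:30 | 7 | 16:05 | 8 | 16:40 |
| FOOBURG | ABA | XAD | 1 | 12:35 | 2 | 13:10 | 3 | 13:40 | 4 | 14:10 | 5 | 14:55 | 6 | 15:30 | 7 | 16:05 | 8 | 16:40 |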

Source: utcz.com/qa/423270.html
