一尘不染

解析嵌套的JSON并将其写入CSV

json

我正在努力解决这个问题。我有一个JSON文件,需要将其放到CSV中,如果结构是扁平的,没有深层嵌套的项目,就可以了。

但是在这种情况下,嵌套会使RACES我烦恼。

我将如何以这种格式获取数据:

VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME

每个对象和对象中的每个种族?

{
    "1": {
        "VENUE": "JOEBURG",
        "COUNTRY": "HAE",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    },
    "2": {
        "VENUE": "FOOBURG",
        "COUNTRY": "ABA",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    }, ...
}

我想这样输出到CSV:

VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
JOEBERG, HAE, XAD, 1, 12:35
JOEBERG, HAE, XAD, 2, 13:10
JOEBERG, HAE, XAD, 3, 13:40
...
...
FOOBURG, ABA, XAD, 1, 12:35
FOOBURG, ABA, XAD, 2, 13:10

所以首先我得到正确的密钥:

self.keys = self.data.keys()
keys = ["DATA_KEY"]
for key in self.keys:
    if type(self.data[key]) == dict:
        for k in self.data[key].keys():
            if k not in keys:
                if type(self.data[key][k]) == unicode:
                    keys.append(k)
                elif type(self.data[key][k]) == dict:
                    self.subkey = k
                    for sk in self.data[key][k].values():
                        for subkey in sk.keys():
                            subkey = "%s__%s" % (self.subkey, subkey)
                            if subkey not in keys:
                                keys.append(subkey)

然后添加数据:

但是如何?

对于您熟练的forlooper来说,这应该是一个有趣的过程。 ;-)


阅读 304

收藏
2020-07-27

共1个答案

一尘不染

我只收集第一个对象的键,然后假定格式的其余部分是一致的。

以下代码还将嵌套对象限制为 一个
;您没有指定当一个以上时应该发生的情况。具有两个或多个相同长度的嵌套结构可以工作(将它们“压缩”在一起),但是,如果您具有不同长度的结构,则需要做出明确的选择来处理它们。用空列压缩以填充或写出这些条目的乘积(A
x B行,每次找到B条目都从A重复信息)。

import csv
from operator import itemgetter


with open(outputfile, 'wb') as outf:
    writer = None  # will be set to a csv.DictWriter later

    for key, item in sorted(data.items(), key=itemgetter(0)):
        row = {}
        nested_name, nested_items = '', {}
        for k, v in item.items():
            if not isinstance(v, dict):
                row[k] = v
            else:
                assert not nested_items, 'Only one nested structure is supported'
                nested_name, nested_items = k, v

        if writer is None:
            # build fields for each first key of each nested item first
            fields = sorted(row)

            # sorted keys of first item in key sorted order
            nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
            fields.extend('__'.join((nested_name, k)) for k in nested_keys)

            writer = csv.DictWriter(outf, fields)
            writer.writeheader()

        for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
            row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
            writer.writerow(row)

对于您的样本输入,将产生:

COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME
HAE,XAD,JOEBURG,1,12:35
HAE,XAD,JOEBURG,2,13:10
HAE,XAD,JOEBURG,3,13:40
HAE,XAD,JOEBURG,4,14:10
HAE,XAD,JOEBURG,5,14:55
HAE,XAD,JOEBURG,6,15:30
HAE,XAD,JOEBURG,7,16:05
HAE,XAD,JOEBURG,8,16:40
ABA,XAD,FOOBURG,1,12:35
ABA,XAD,FOOBURG,2,13:10
ABA,XAD,FOOBURG,3,13:40
ABA,XAD,FOOBURG,4,14:10
ABA,XAD,FOOBURG,5,14:55
ABA,XAD,FOOBURG,6,15:30
ABA,XAD,FOOBURG,7,16:05
ABA,XAD,FOOBURG,8,16:40
2020-07-27