熊猫groupby嵌套JSON

一尘不染

熊猫groupby嵌套JSON

json

我经常使用pandas groupby生成堆积表。但是然后我经常想将生成的嵌套关系输出到json。有什么方法可以从生成的堆叠表中提取嵌套的json文件吗？

假设我有一个df，例如：

year office candidate  amount
2010 mayor  joe smith  100.00
2010 mayor  jay gould   12.00
2010 govnr  pati mara  500.00
2010 govnr  jess rapp   50.00
2010 govnr  jess rapp   30.00

我可以：

grouped = df.groupby('year', 'office', 'candidate').sum()

print grouped
                       amount
year office candidate 
2010 mayor  joe smith   100
            jay gould    12
     govnr  pati mara   500
            jess rapp    80

美丽！当然，我真正想做的是通过命令沿着grouped.to_json嵌套嵌套的json。但是该功能不可用。任何解决方法？

所以，我真正想要的是这样的：

{"2010": {"mayor": [
                    {"joe smith": 100},
                    {"jay gould": 12}
                   ]
         }, 
          {"govnr": [
                     {"pati mara":500}, 
                     {"jess rapp": 80}
                    ]
          }
}

唐

阅读 333

2020-07-27

共1个答案

一尘不染

我认为熊猫没有内置任何东西可以创建嵌套的数据字典。以下是一些代码，对于带有MultiIndex的系列，通常应使用defaultdict

嵌套代码遍历MultIndex的每个级别，将层添加到字典中，直到将最深层分配给Series值为止。

In  [99]: from collections import defaultdict

In [100]: results = defaultdict(lambda: defaultdict(dict))

In [101]: for index, value in grouped.itertuples():
     ...:     for i, key in enumerate(index):
     ...:         if i == 0:
     ...:             nested = results[key]
     ...:         elif i == len(index) - 1:
     ...:             nested[key] = value
     ...:         else:
     ...:             nested = nested[key]

In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})

In [106]: print json.dumps(results, indent=4)
{
    "2010": {
        "govnr": {
            "pati mara": 500.0, 
            "jess rapp": 80.0
        }, 
        "mayor": {
            "joe smith": 100.0, 
            "jay gould": 12.0
        }
    }
}

2020-07-27