tangguo

如何操作pandas数据框中的 json 列

python

我有一个pandas数据框,如下例所示:

        id  isDeleted   properties          
        1        False  {'taxa': {'value': '1,49%', 'timestamp': 1627418661971, 'source': 'IMPORT', 'sourceId': '13212434', 'updatedByUserId': 4875, 'versions': [{'name': 'taxa', 'value': '1,49%', 'timestamp': 1626565659525, 'sourceId': '151951296', 'source': 'IMPORT', 'sourceVid': [], 'sourceMetadata': '44524858245==', 'updatedByUserId': 12847957}]}, 'num_associated': {'value': '1', 'timestamp': 1619471103959, 'source': 'CALCULATED', 'sourceId': 'RollupProperties', 'updatedByUserId': None, 'versions': [{'name': 'num_associated', 'value': '1', 'timestamp': 1619471103959, 'sourceId': 'RollupProperties', 'source': 'CALCULATED', 'sourceVid': []}]}
        2        False  {'num_associated': {'value': '1', 'timestamp': 1619471103670, 'source': 'CALCULATED', 'sourceId': 'RollupProperties', 'updatedByUserId': None, 'versions': [{'name': 'num_associated', 'value': '1', 'timestamp': 1619471103670, 'sourceId': 'RollupProperties', 'source': 'CALCULATED', 'sourceVid': []}]}, 'time_available': {'value': '12313', 'timestamp': 1621361432266, 'source': 'BATCH_UPDATE', 'sourceId': 'userId:12847957', 'updatedByUserId': 1200111, 'versions': [{'name': 'time_available', 'value': '12313', 'timestamp': 1655565656598, 'sourceId': 'userId:1200111', 'source': 'BATCH_UPDATE', 'sourceVid': [], 'requestId': '9c86-4ae8-14185858', 'updatedByUserId': 1200111}]}, 'createdate': {'value': '1612285284796', 'timestamp': 1612285284796, 'source': 'IMPORT', 'sourceId': None, 'updatedByUserId': None, 'versions': [{'name': 'createdate', 'value': '1612285284796', 'timestamp': 1612285284796, 'source': 'IMPORT', 'sourceVid': []}]}

如何将列属性转换为只有键和值,如下所示:

        id  isDeleted   properties          
        1        False  {"taxa": "1,49%", "num_associated": "1"}"
        2        False  {"num_associated":"3", "time_available": "12313", "createdate": "1612285284796"}

json键和值将永远是动态的。


阅读 158

收藏
2022-06-13

共1个答案

一尘不染

你可以试试

df['properties'] = df['properties'].apply(lambda d: {k:v.get('value') for k, v in d.items()})
print(df)

                                                                          properties
0                                           {'taxa': '1,49%', 'num_associated': '1'}
1  {'num_associated': '1', 'time_available': '12313', 'createdate': '1612285284796'}
2022-06-13