我正在尝试将json文件加载到熊猫数据框。我发现有一些嵌套的json。以下是示例json:
{'events': [{'id': 142896214, 'playerId': 37831, 'teamId': 3157, 'matchId': 2214569, 'matchPeriod': '1H', 'eventSec': 0.8935539999999946, 'eventId': 8, 'eventName': 'Pass', 'subEventId': 85, 'subEventName': 'Simple pass', 'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}], 'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}
我使用以下代码将json加载到数据帧中:
with open('EVENTS.json') as f: jsonstr = json.load(f) df = pd.io.json.json_normalize(jsonstr['events'])
以下是df.head()的输出
但是我发现了两个嵌套的列,例如位置和标签。
我尝试使用以下代码对其进行展平:
Position_data = json_normalize(data =jsonstr['events'], record_path='positions', meta = ['x','y','x','y'] )
它显示了如下错误:
KeyError: "Try running with errors='ignore' as key 'x' is not always present"
您能建议我如何展平位置和标签(那些具有嵌套数据的位置和标签)。
谢谢,Zep
如果您正在寻找一种更通用的方法来从json展开多个层次结构,则可以使用recursion并列出理解来重塑数据。下面介绍了一种替代方法:
recursion
def flatten_json(nested_json, exclude=['']): """Flatten json object with nested keys into a single level. Args: nested_json: A nested json object. exclude: Keys to exclude from output. Returns: The flattened json object if successful, None otherwise. """ out = {} def flatten(x, name='', exclude=exclude): if type(x) is dict: for a in x: if a not in exclude: flatten(x[a], name + a + '_') elif type(x) is list: i = 0 for a in x: flatten(a, name + str(i) + '_') i += 1 else: out[name[:-1]] = x flatten(nested_json) return out
然后,您可以独立于嵌套级别将其应用于数据:
新样本数据
this_dict = {'events': [ {'id': 142896214, 'playerId': 37831, 'teamId': 3157, 'matchId': 2214569, 'matchPeriod': '1H', 'eventSec': 0.8935539999999946, 'eventId': 8, 'eventName': 'Pass', 'subEventId': 85, 'subEventName': 'Simple pass', 'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}], 'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}, {'id': 142896214, 'playerId': 37831, 'teamId': 3157, 'matchId': 2214569, 'matchPeriod': '1H', 'eventSec': 0.8935539999999946, 'eventId': 8, 'eventName': 'Pass', 'subEventId': 85, 'subEventName': 'Simple pass', 'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53},{'x': 51, 'y': 49}], 'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]} ]}
用法
pd.DataFrame([flatten_json(x) for x in this_dict['events']]) Out[1]: id playerId teamId matchId matchPeriod eventSec eventId \ 0 142896214 37831 3157 2214569 1H 0.893554 8 1 142896214 37831 3157 2214569 1H 0.893554 8 eventName subEventId subEventName positions_0_x positions_0_y \ 0 Pass 85 Simple pass 51 49 1 Pass 85 Simple pass 51 49 positions_1_x positions_1_y tags_0_id tags_0_tag_label positions_2_x \ 0 40 53 1801 accurate NaN 1 40 53 1801 accurate 51.0 positions_2_y 0 NaN 1 49.0
请注意,该flatten_json代码不是我的代码,我在这里和这里都看到了它,而对原始源代码的不确定性很高。
flatten_json