I am using the unleashed_py library to extract data from Unleashed.
A sample of the output is shown below; an invoice may contain several line items:
[{'OrderNumber': 'SO-00000742',
  'QuoteNumber': None,
  'InvoiceDate': '/Date(1658496322067)/',
  'InvoiceLines': [{'LineNumber': 1, 'LineType': None},
                   {'LineNumber': 2, 'LineType': None}],
  'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
  'LastModifiedOn': '/Date(1658496322221)/'}]
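I need to turn this into a DataFrame in which each invoice line is its own row and the common fields are repeated on every row.
If I run the script below, the invoice lines are only appended next to the common fields (OrderNumber, QuoteNumber, InvoiceDate, Guid and LastModifiedOn); the common fields are not repeated for each line.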
order_number = []
quote_number = []
invoice_date = []
invoicelines = []
invoice_line_number = []
invoice_line_type = []
guid = []
last_modified = []

for item in df:
    order_number.append(item.get('OrderNumber'))
    quote_number.append(item.get('QuoteNumber'))
    invoice_date.append(item.get('InvoiceDate'))
    guid.append(item.get('Guid'))
    last_modified.append(item.get('LastModifiedOn'))
    lines = item.get('InvoiceLines')
    for item_sub_2 in lines:
        invoice_line_number.append('LineNumber')
        invoice_line_type.append('LineType')

df_order_number = pd.DataFrame(order_number)
df_quote_number = pd.DataFrame(quote_number)
df_invoice_date = pd.DataFrame(invoice_date)
df_invoice_line_number = pd.DataFrame(invoice_line_number)
df_invoice_line_type = pd.DataFrame(invoice_line_type)
df_guid = pd.DataFrame(guid)
df_last_modified = pd.DataFrame(last_modified)

df_row = pd.concat([
    df_order_number,
    df_quote_number,
    df_invoice_date,
    df_invoice_line_number,
    df_invoice_line_type,
    df_guid,
    df_last_modified
], axis = 1)
What am I doing wrong?
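You don't need to iterate. Build a DataFrame directly from the list of dicts you have, explode the InvoiceLines column, apply pd.Series to it, and concatenate the result with the rest of the DataFrame: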
import pandas as pd

data = [{'OrderNumber': 'SO-00000742',
         'QuoteNumber': None,
         'InvoiceDate': '/Date(1658496322067)/',
         'InvoiceLines': [{'LineNumber': 1, 'LineType': None},
                          {'LineNumber': 2, 'LineType': None}],
         'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
         'LastModifiedOn': '/Date(1658496322221)/'}]

# One row per invoice line; the other columns are repeated for each line
df = pd.DataFrame(data).explode('InvoiceLines')

# Expand each line dict into its own columns and put them next to the common fields
out = pd.concat([df['InvoiceLines'].apply(pd.Series),
                 df.drop(columns=['InvoiceLines'])], axis=1)
Output:
#out
   LineNumber  LineType  OrderNumber QuoteNumber            InvoiceDate  \
0         1.0       NaN  SO-00000742        None  /Date(1658496322067)/
0         2.0       NaN  SO-00000742        None  /Date(1658496322067)/

                                       Guid         LastModifiedOn
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/
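As an alternative to explode plus apply(pd.Series), the same flattening can probably be done in one call with pd.json_normalize. This is only a sketch, assuming every record has an InvoiceLines list and the five common keys shown in the sample; the variable names are illustrative.

```python
import pandas as pd

data = [{'OrderNumber': 'SO-00000742',
         'QuoteNumber': None,
         'InvoiceDate': '/Date(1658496322067)/',
         'InvoiceLines': [{'LineNumber': 1, 'LineType': None},
                          {'LineNumber': 2, 'LineType': None}],
         'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
         'LastModifiedOn': '/Date(1658496322221)/'}]

# record_path: the nested list that becomes the rows
# meta: the parent-level fields repeated on every row
out_alt = pd.json_normalize(
    data,
    record_path='InvoiceLines',
    meta=['OrderNumber', 'QuoteNumber', 'InvoiceDate', 'Guid', 'LastModifiedOn'],
)
```

Unlike the explode approach, this gives a fresh 0..n-1 index rather than repeating the parent row's index.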
I'll leave the date conversion and column renaming to you, as I believe you can handle those yourself.
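For completeness, here is one way the date conversion and renaming might look. It is a sketch that assumes the '/Date(...)/' values hold milliseconds since the Unix epoch (the usual meaning of this format); the helper parse_ms_date and the snake_case column names are hypothetical, chosen to match the list names in the question.

```python
import pandas as pd

def parse_ms_date(series):
    # Hypothetical helper: pull the digits out of '/Date(1658496322067)/'
    # and interpret them as milliseconds since the epoch (an assumption).
    ms = series.str.extract(r'/Date\((-?\d+)\)/', expand=False)
    return pd.to_datetime(pd.to_numeric(ms), unit='ms')

# Small stand-in for the flattened frame produced above
out = pd.DataFrame({
    'LineNumber': [1, 2],
    'InvoiceDate': ['/Date(1658496322067)/'] * 2,
    'LastModifiedOn': ['/Date(1658496322221)/'] * 2,
})

for col in ['InvoiceDate', 'LastModifiedOn']:
    out[col] = parse_ms_date(out[col])

# Illustrative target names only
out = out.rename(columns={
    'LineNumber': 'invoice_line_number',
    'InvoiceDate': 'invoice_date',
    'LastModifiedOn': 'last_modified',
})
```

The resulting timestamps are timezone-naive; if the source values are UTC, passing utc=True to pd.to_datetime would make that explicit.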