小能豆

How to get a single DataFrame instead of multiple DataFrames when converting lists of dictionaries to a DataFrame

py

I tried many methods to get a single DataFrame instead of multiple DataFrames.

When I convert lists of separate dictionaries to a DataFrame, it creates a number of separate DataFrames.

Here is an example of the lists of dictionaries (the rows_data output):

[{'case_id': 22, 'case_subject': 'followup'}, {'case_id': 22, 'case_subject': 'rma'}, {'case_id': 22, 'case_subject': 'ticket'}, {'case_id': 22, 'case_subject': '61555'}]
[{'case_id': 26, 'case_subject': 'c'}, {'case_id': 26, 'case_subject': 'ge'}, {'case_id': 26, 'case_subject': 'app'}, {'case_id': 26, 'case_subject': 'logs'}, {'case_id': 26, 'case_subject': 'request'}]
[{'case_id': 30, 'case_subject': 'refund'}, {'case_id': 30, 'case_subject': 'request'}, {'case_id': 30, 'case_subject': 'return'}, {'case_id': 30, 'case_subject': 'refund'}, {'case_id': 30, 'case_subject': 'pending'}, {'case_id': 30, 'case_subject': 'partial'}, {'case_id': 30, 'case_subject': 'payment'}]
[{'case_id': 34, 'case_subject': 'unable'}, {'case_id': 34, 'case_subject': 'control'}, {'case_id': 34, 'case_subject': 'devices'}, {'case_id': 34, 'case_subject': 'via'}, {'case_id': 34, 'case_subject': 'mfg'}, {'case_id': 34, 'case_subject': 'configured'}, {'case_id': 34, 'case_subject': 'devices'}]
[{'case_id': 38, 'case_subject': 'trouble'}, {'case_id': 38, 'case_subject': 'connecting'}, {'case_id': 38, 'case_subject': 'alexa'}]

After that, I tried to convert it to a DataFrame with counts of repeated words. Here is the code:

df = pd.DataFrame(rows_data)
out = df.assign(case_subject=df['case_subject'].str.split())\
        .explode('case_subject').value_counts()\
        .rename('Case_Subject_Split_Count').reset_index()
print(out)

after that I got output like below:

   case_id case_subject  Case_Subject_Split_Count
0       22        61555                         1
1       22     followup                         1
2       22          rma                         1
3       22       ticket                         1
   case_id case_subject  Case_Subject_Split_Count
0       26          app                         1
1       26            c                         1
2       26           ge                         1
3       26         logs                         1
4       26      request                         1
   case_id case_subject  Case_Subject_Split_Count
0       30       refund                         2
1       30      partial                         1
2       30      payment                         1
3       30      pending                         1
4       30      request                         1
5       30       return                         1
   case_id case_subject  Case_Subject_Split_Count
0       34      devices                         2
1       34   configured                         1
2       34      control                         1
3       34          mfg                         1
4       34       unable                         1
5       34          via                         1
   case_id case_subject  Case_Subject_Split_Count
0       38        alexa                         1
1       38   connecting                         1
2       38      trouble                         1

But I need one DataFrame, not separate DataFrames like above.

I want output like below:

case_id case_subject  Case_Subject_Split_Count
     22        61555                         1
     22     followup                         1
     22          rma                         1
     22       ticket                         1
     26          app                         1
     26            c                         1
     26           ge                         1
     26         logs                         1
     26      request                         1
     30       refund                         2
     30      partial                         1
     30      payment                         1
     30      pending                         1
     30      request                         1
     30       return                         1
     34      devices                         2
     34   configured                         1
     34      control                         1
     34          mfg                         1
     34       unable                         1
     34          via                         1
     38        alexa                         1
     38   connecting                         1
     38      trouble                         1

Can anyone please help me overcome this issue?

Please read the question carefully before answering, thanks.


2023-12-07

1 answer

小能豆

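The separate DataFrames most likely appear because the code runs once per list: rows_data holds a new list for each case, so every pass builds and prints its own DataFrame. To get a single DataFrame, put all the dictionaries into one flat list, build the DataFrame once, and count afterwards. Here’s the modified code: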

import pandas as pd

rows_data = [
    {'case_id': 22, 'case_subject': 'followup'},
    {'case_id': 22, 'case_subject': 'rma'},
    {'case_id': 22, 'case_subject': 'ticket'},
    {'case_id': 22, 'case_subject': '61555'},
    {'case_id': 26, 'case_subject': 'c'},
    {'case_id': 26, 'case_subject': 'ge'},
    {'case_id': 26, 'case_subject': 'app'},
    {'case_id': 26, 'case_subject': 'logs'},
    {'case_id': 26, 'case_subject': 'request'},
    {'case_id': 30, 'case_subject': 'refund'},
    {'case_id': 30, 'case_subject': 'request'},
    {'case_id': 30, 'case_subject': 'return'},
    {'case_id': 30, 'case_subject': 'refund'},
    {'case_id': 30, 'case_subject': 'pending'},
    {'case_id': 30, 'case_subject': 'partial'},
    {'case_id': 30, 'case_subject': 'payment'},
    {'case_id': 34, 'case_subject': 'unable'},
    {'case_id': 34, 'case_subject': 'control'},
    {'case_id': 34, 'case_subject': 'devices'},
    {'case_id': 34, 'case_subject': 'via'},
    {'case_id': 34, 'case_subject': 'mfg'},
    {'case_id': 34, 'case_subject': 'configured'},
    {'case_id': 34, 'case_subject': 'devices'},
    {'case_id': 38, 'case_subject': 'trouble'},
    {'case_id': 38, 'case_subject': 'connecting'},
    {'case_id': 38, 'case_subject': 'alexa'}
]

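df = pd.DataFrame(rows_data)

# value_counts() orders rows by count across all cases, so sort afterwards
# to keep each case_id together (highest count first within a case).
out = df.assign(case_subject=df['case_subject'].str.split()) \
    .explode('case_subject').value_counts() \
    .rename('Case_Subject_Split_Count').reset_index() \
    .sort_values(['case_id', 'Case_Subject_Split_Count', 'case_subject'],
                 ascending=[True, False, True], ignore_index=True)

print(out)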

This will give you the desired output:

    case_id case_subject  Case_Subject_Split_Count
0        22        61555                         1
1        22     followup                         1
2        22          rma                         1
3        22       ticket                         1
4        26          app                         1
5        26            c                         1
6        26           ge                         1
7        26         logs                         1
8        26      request                         1
9        30       refund                         2
10       30      partial                         1
11       30      payment                         1
12       30      pending                         1
13       30      request                         1
14       30       return                         1
15       34      devices                         2
16       34   configured                         1
17       34      control                         1
18       34          mfg                         1
19       34       unable                         1
20       34          via                         1
21       38        alexa                         1
22       38   connecting                         1
23       38      trouble                         1

Because the DataFrame is built once from a single flat list, you get one DataFrame with one continuous index, as desired.
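
If the data really does arrive as one list per case, as the printed rows_data in the question suggests, the lists have to be merged before the DataFrame is built. Here is a minimal sketch of that step. It assumes the per-case lists are collected in a list of lists called batches; that name and the shortened sample data are made up for illustration and do not come from the question.

```python
from itertools import chain

import pandas as pd

# Hypothetical stand-in for the per-case lists shown in the question.
batches = [
    [{'case_id': 22, 'case_subject': 'followup'},
     {'case_id': 22, 'case_subject': 'rma'}],
    [{'case_id': 30, 'case_subject': 'refund'},
     {'case_id': 30, 'case_subject': 'request'},
     {'case_id': 30, 'case_subject': 'refund'}],
]

# Flatten the list of lists into one flat list of dictionaries.
rows_data = list(chain.from_iterable(batches))

# Build the DataFrame once, then count each (case_id, word) pair.
df = pd.DataFrame(rows_data)
out = (
    df.assign(case_subject=df['case_subject'].str.split())
      .explode('case_subject')
      .value_counts()
      .rename('Case_Subject_Split_Count')
      .reset_index()
      # Group the rows by case_id again, highest count first within a case.
      .sort_values(['case_id', 'Case_Subject_Split_Count', 'case_subject'],
                   ascending=[True, False, True], ignore_index=True)
)

print(out)
```

If the lists are produced inside a loop, the same effect can be had by starting with an empty list before the loop, calling rows_data.extend(...) on each pass, and moving the DataFrame code after the loop.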

2023-12-07