I tried many methods to get a single dataframe instead of multiple dataframes.
When I convert a list of separate dictionaries to a dataframe, it creates a number of separate dataframes.
Here is an example of the separate lists of dictionaries (the printed output of rows_data):
[{'case_id': 22, 'case_subject': 'followup'}, {'case_id': 22, 'case_subject': 'rma'}, {'case_id': 22, 'case_subject': 'ticket'}, {'case_id': 22, 'case_subject': '61555'}]
[{'case_id': 26, 'case_subject': 'c'}, {'case_id': 26, 'case_subject': 'ge'}, {'case_id': 26, 'case_subject': 'app'}, {'case_id': 26, 'case_subject': 'logs'}, {'case_id': 26, 'case_subject': 'request'}]
[{'case_id': 30, 'case_subject': 'refund'}, {'case_id': 30, 'case_subject': 'request'}, {'case_id': 30, 'case_subject': 'return'}, {'case_id': 30, 'case_subject': 'refund'}, {'case_id': 30, 'case_subject': 'pending'}, {'case_id': 30, 'case_subject': 'partial'}, {'case_id': 30, 'case_subject': 'payment'}]
[{'case_id': 34, 'case_subject': 'unable'}, {'case_id': 34, 'case_subject': 'control'}, {'case_id': 34, 'case_subject': 'devices'}, {'case_id': 34, 'case_subject': 'via'}, {'case_id': 34, 'case_subject': 'mfg'}, {'case_id': 34, 'case_subject': 'configured'}, {'case_id': 34, 'case_subject': 'devices'}]
[{'case_id': 38, 'case_subject': 'trouble'}, {'case_id': 38, 'case_subject': 'connecting'}, {'case_id': 38, 'case_subject': 'alexa'}]
After that I tried to convert it to a dataframe with counts of the repeated words. Here is the code:
df = pd.DataFrame(rows_data)
out = df.assign(case_subject=df['case_subject'].str.split())\
        .explode('case_subject').value_counts()\
        .rename('Case_Subject_Split_Count').reset_index()
print(out)
After that I got output like this:
   case_id  case_subject  Case_Subject_Split_Count
0       22         61555                         1
1       22      followup                         1
2       22           rma                         1
3       22        ticket                         1
   case_id  case_subject  Case_Subject_Split_Count
0       26           app                         1
1       26             c                         1
2       26            ge                         1
3       26          logs                         1
4       26       request                         1
   case_id  case_subject  Case_Subject_Split_Count
0       30        refund                         2
1       30       partial                         1
2       30       payment                         1
3       30       pending                         1
4       30       request                         1
5       30        return                         1
   case_id  case_subject  Case_Subject_Split_Count
0       34       devices                         2
1       34    configured                         1
2       34       control                         1
3       34           mfg                         1
4       34        unable                         1
5       34           via                         1
   case_id  case_subject  Case_Subject_Split_Count
0       38         alexa                         1
1       38    connecting                         1
2       38       trouble                         1
But I need one dataframe, not separate dataframes like the ones above.
I want output like this:
case_id  case_subject  Case_Subject_Split_Count
     22         61555                         1
     22      followup                         1
     22           rma                         1
     22        ticket                         1
     26           app                         1
     26             c                         1
     26            ge                         1
     26          logs                         1
     26       request                         1
     30        refund                         2
     30       partial                         1
     30       payment                         1
     30       pending                         1
     30       request                         1
     30        return                         1
     34       devices                         2
     34    configured                         1
     34       control                         1
     34           mfg                         1
     34        unable                         1
     34           via                         1
     38         alexa                         1
     38    connecting                         1
     38       trouble                         1
Can anyone please help me work out how to overcome this issue?
Please read the question carefully before answering, thanks.
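To get a single DataFrame, build it from one flat list of dictionaries instead of converting each list separately. You do not need to drop reset_index(); it only turns the counts back into regular columns. Here's the modified code: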
import pandas as pd

# All dictionaries go into ONE flat list, not one list per case_id.
rows_data = [
    {'case_id': 22, 'case_subject': 'followup'},
    {'case_id': 22, 'case_subject': 'rma'},
    {'case_id': 22, 'case_subject': 'ticket'},
    {'case_id': 22, 'case_subject': '61555'},
    {'case_id': 26, 'case_subject': 'c'},
    {'case_id': 26, 'case_subject': 'ge'},
    {'case_id': 26, 'case_subject': 'app'},
    {'case_id': 26, 'case_subject': 'logs'},
    {'case_id': 26, 'case_subject': 'request'},
    {'case_id': 30, 'case_subject': 'refund'},
    {'case_id': 30, 'case_subject': 'request'},
    {'case_id': 30, 'case_subject': 'return'},
    {'case_id': 30, 'case_subject': 'refund'},
    {'case_id': 30, 'case_subject': 'pending'},
    {'case_id': 30, 'case_subject': 'partial'},
    {'case_id': 30, 'case_subject': 'payment'},
    {'case_id': 34, 'case_subject': 'unable'},
    {'case_id': 34, 'case_subject': 'control'},
    {'case_id': 34, 'case_subject': 'devices'},
    {'case_id': 34, 'case_subject': 'via'},
    {'case_id': 34, 'case_subject': 'mfg'},
    {'case_id': 34, 'case_subject': 'configured'},
    {'case_id': 34, 'case_subject': 'devices'},
    {'case_id': 38, 'case_subject': 'trouble'},
    {'case_id': 38, 'case_subject': 'connecting'},
    {'case_id': 38, 'case_subject': 'alexa'},
]

df = pd.DataFrame(rows_data)

# Count each (case_id, case_subject) pair. value_counts() orders by count
# across the whole frame, so sort afterwards: by case_id, then highest
# count first, then alphabetically within ties.
out = (
    df.assign(case_subject=df['case_subject'].str.split())
      .explode('case_subject')
      .value_counts()
      .rename('Case_Subject_Split_Count')
      .reset_index()
      .sort_values(
          ['case_id', 'Case_Subject_Split_Count', 'case_subject'],
          ascending=[True, False, True],
          ignore_index=True,
      )
)
print(out)
This will give you the desired output:
    case_id  case_subject  Case_Subject_Split_Count
 0       22         61555                         1
 1       22      followup                         1
 2       22           rma                         1
 3       22        ticket                         1
 4       26           app                         1
 5       26             c                         1
 6       26            ge                         1
 7       26          logs                         1
 8       26       request                         1
 9       30        refund                         2
10       30       partial                         1
11       30       payment                         1
12       30       pending                         1
13       30       request                         1
14       30        return                         1
15       34       devices                         2
16       34    configured                         1
17       34       control                         1
18       34           mfg                         1
19       34        unable                         1
20       34           via                         1
21       38         alexa                         1
22       38    connecting                         1
23       38       trouble                         1
Because all the rows come from one list, the result is a single DataFrame with one continuous index (0 to 23) rather than one DataFrame per case_id.
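If the rows arrive as several separate lists (as printed in the question) rather than as one literal list, they can be combined before the DataFrame is built. The following is only a minimal sketch under assumptions: the names `batches` and `flat_rows` are hypothetical and do not appear in the original code, and the sample data is shortened. It uses `itertools.chain.from_iterable` from the standard library to flatten the lists, and counts with `value_counts()` as in the question.

```python
from itertools import chain

import pandas as pd

# Hypothetical container: one inner list per case, shaped like the
# separate lists printed in the question (shortened sample data).
batches = [
    [{'case_id': 22, 'case_subject': 'followup'},
     {'case_id': 22, 'case_subject': 'rma'}],
    [{'case_id': 26, 'case_subject': 'app'},
     {'case_id': 26, 'case_subject': 'app'}],
]

# Flatten the list of lists into one list of row dictionaries.
flat_rows = list(chain.from_iterable(batches))

# Build ONE DataFrame from all rows, then count each
# (case_id, case_subject) pair and turn the counts into a column.
df = pd.DataFrame(flat_rows)
out = (
    df.value_counts()
      .rename('Case_Subject_Split_Count')
      .reset_index()
)
print(out)
```

If the lists are produced inside a loop, the same effect can be had by collecting them with `list.extend` and calling `pd.DataFrame` once after the loop, instead of once per iteration.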