I have a dataframe with 12 columns. I would like to extract the values of one column depending on the values of another column.
Sample of my dataframe:

```
order_id  order_type  order_items
45        Lunch       [('Burger', 5), ('Fries', 6)]
12        Dinner      [('Shrimp', 10), ('Fish&Chips', 7)]
44        Lunch       [('Salad', 9), ('Steak', 9)]
23        Breakfast   [('Coffee', 2), ('Eggs', 3)]
```
I would like to extract the breakfast, lunch and dinner menus from the first element of each tuple, and the number of orders from the second element of the tuple.
According to this line of code, each item is of type string:

```python
print(type(df['order_items'][0]))
# >> <class 'str'>
```
I tried to apply a filter to extract the breakfast menu:
```python
BreakfastLst = df.loc[df['order_type'] == 'Breakfast']['order_items']
```
but the output looks like this, and I can’t use a for loop to iterate through the sublists and access the tuples:
```
2    [('Coffee', 4), ('Eggs', 7)]
7    [('Coffee', 2), ('Eggs', 3)]
8    [('Cereal', 7), ('Pancake', 8), ('Coffee', 4),...
9    [('Cereal', 3), ('Eggs', 1), ('Coffee', 1), ('...
```
I also tried to convert to lists:
```python
orderTypeLst = df.groupby(['order_type'])['order_items'].apply(list)
```
and then extract the lists by doing this:
```python
breakFast = orderTypeLst['Breakfast']
lunch = orderTypeLst['Lunch']
dinner = orderTypeLst['Dinner']
```
but each element of the output is a string, so I can’t iterate through that either:
```
["[('Coffee', 4), ('Eggs', 7)]", "[('Coffee', 2), ('Eggs', 3)]", "[('Cereal', 7), ('Pancake', 8), ('Coffee', 4), ('Eggs', 8)]"]
```
As for dictionaries, I tried the following, but the output is duplicated:
```python
pd.Series(outlierFile.order_type.values, index=outlierFile.order_items).to_dict()
```
Output sample:

```
"[('Fries', 1), ('Steak', 6), ('Salad', 8), ('Chicken', 10)]": 'Lunch',
"[('Cereal', 6), ('Pancake', 8), ('Eggs', 3)]": 'Breakfast',
"[('Shrimp', 9), ('Salmon', 9)]": 'Dinner',
"[('Pancake', 3), ('Coffee', 5)]": 'Breakfast',
"[('Eggs', 1), ('Pancake', 1), ('Coffee', 5), ('Cereal', 5)]": 'Breakfast'
```
My desired output is a clean version of each order_type (as a list or dictionary) so that I can iterate through the tuples and extract the items I need.
To achieve your goal of extracting the menu items and their quantities based on the order_type column (Breakfast, Lunch, Dinner), while cleaning up the data for easy iteration, you can proceed with the following steps:
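You correctly filtered the rows with df.loc. The real problem is the contents of order_items: as your type() check shows, each cell is a str such as "[('Coffee', 2), ('Eggs', 3)]", not a list of tuples. This is what you get, for example, when a column of lists is saved to CSV and read back. Iterating over a string yields one character at a time, which is why none of your loops or conversions gave you tuples.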
To iterate through the menu items and quantities, first turn each string back into a list of tuples, for example with ast.literal_eval from Python's standard library. Once order_items holds real lists, you can loop over the filtered rows and unpack each (item, quantity) tuple.
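A minimal sketch of that conversion, assuming order_items was loaded from a file so that every cell holds text like the output in the question (the two-row DataFrame here is made up for illustration):

```python
import ast

import pandas as pd

# Assumption: the column came from a file, so each cell is a *string*
# that merely looks like a list of tuples.
df = pd.DataFrame({
    'order_id': [45, 23],
    'order_type': ['Lunch', 'Breakfast'],
    'order_items': ["[('Burger', 5), ('Fries', 6)]", "[('Coffee', 2), ('Eggs', 3)]"],
})
print(type(df['order_items'][0]))  # <class 'str'>

# ast.literal_eval parses Python literals (lists, tuples, strings, numbers)
# without executing arbitrary code, turning each string into a real list.
df['order_items'] = df['order_items'].apply(ast.literal_eval)
print(type(df['order_items'][0]))  # <class 'list'>

# Now the tuples can be unpacked directly.
for item, quantity in df['order_items'][1]:
    print(item, quantity)
```

literal_eval raises an exception for cells that are not valid literals (for example NaN from a missing value), so such rows may need to be dropped or filled first.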
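Here’s how to clean up your process. The sample data below is built from real lists; with your own DataFrame, first convert the string cells with df['order_items'] = df['order_items'].apply(ast.literal_eval) (after import ast).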
```python
import pandas as pd

# Sample data (order_items already holds real lists of tuples here)
data = {
    'order_id': [45, 12, 44, 23],
    'order_type': ['Lunch', 'Dinner', 'Lunch', 'Breakfast'],
    'order_items': [
        [('Burger', 5), ('Fries', 6)],
        [('Shrimp', 10), ('Fish&Chips', 7)],
        [('Salad', 9), ('Steak', 9)],
        [('Coffee', 2), ('Eggs', 3)],
    ],
}
df = pd.DataFrame(data)

# Filter the rows based on order_type
breakfast_orders = df.loc[df['order_type'] == 'Breakfast', 'order_items']
lunch_orders = df.loc[df['order_type'] == 'Lunch', 'order_items']
dinner_orders = df.loc[df['order_type'] == 'Dinner', 'order_items']

# Collect the (item, quantity) tuples in one list per menu
breakfast_menu = []
lunch_menu = []
dinner_menu = []

# Iterate through each order and extract the items and quantities
for order in breakfast_orders:
    for item, quantity in order:
        breakfast_menu.append((item, quantity))

for order in lunch_orders:
    for item, quantity in order:
        lunch_menu.append((item, quantity))

for order in dinner_orders:
    for item, quantity in order:
        dinner_menu.append((item, quantity))

# Print the results for each menu
print("Breakfast Menu:", breakfast_menu)
print("Lunch Menu:", lunch_menu)
print("Dinner Menu:", dinner_menu)
```
Filter by order_type: df.loc selects the order_items of the rows whose order_type is Breakfast, Lunch, or Dinner.
Iterate through each list of tuples: The order_items column contains a list of tuples like [('Burger', 5), ('Fries', 6)]. You loop through each list (filtered by order_type) and extract the item names and quantities.
Store the results in lists: After extracting the items and quantities, you store them in breakfast_menu, lunch_menu, or dinner_menu as a list of tuples, making it easier to work with them.
This will give you the following output:
```
Breakfast Menu: [('Coffee', 2), ('Eggs', 3)]
Lunch Menu: [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)]
Dinner Menu: [('Shrimp', 10), ('Fish&Chips', 7)]
```
If you want to store each menu type in a dictionary for easier access, you can modify the code as follows:
```python
# Create a dictionary to hold the menus by order_type
menu_dict = {
    'Breakfast': [],
    'Lunch': [],
    'Dinner': []
}

# Populate the dictionary
for order_type in ['Breakfast', 'Lunch', 'Dinner']:
    orders = df.loc[df['order_type'] == order_type, 'order_items']
    for order in orders:
        for item, quantity in order:
            menu_dict[order_type].append((item, quantity))

# Print the dictionary
print(menu_dict)
```
```
{'Breakfast': [('Coffee', 2), ('Eggs', 3)], 'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)], 'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)]}
```
This approach organizes the menu items into a dictionary where each order_type is a key, and the associated value is the list of items and quantities.
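If you would rather not hard-code the three order types, one variant (a sketch, assuming order_items already holds lists of tuples) iterates over df.groupby('order_type') and flattens each group's lists with a nested comprehension. Because groupby sorts its keys, the dictionary comes out in alphabetical order (Breakfast, Dinner, Lunch).

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [45, 12, 44, 23],
    'order_type': ['Lunch', 'Dinner', 'Lunch', 'Breakfast'],
    'order_items': [
        [('Burger', 5), ('Fries', 6)],
        [('Shrimp', 10), ('Fish&Chips', 7)],
        [('Salad', 9), ('Steak', 9)],
        [('Coffee', 2), ('Eggs', 3)],
    ],
})

# Iterating a grouped column yields (group key, Series of that group's cells).
# The inner comprehension flattens the group's lists into one list of tuples.
menu_dict = {
    order_type: [pair for order in orders for pair in order]
    for order_type, orders in df.groupby('order_type')['order_items']
}
print(menu_dict)
```

Any new order type that appears in the data gets its own key automatically.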
By filtering the DataFrame and iterating over the order_items column, you can extract the menu items and their quantities into a more usable format, whether that’s a list or a dictionary. You can easily iterate over this cleaned-up data and perform further analysis or processing as needed.
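As one example of such further processing, here is a sketch that adds up the quantities per item within each order_type. It assumes order_items already holds lists of tuples; explode gives every (item, quantity) tuple its own row, and groupby then sums the quantities. The names long_df and totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [45, 12, 44, 23],
    'order_type': ['Lunch', 'Dinner', 'Lunch', 'Breakfast'],
    'order_items': [
        [('Burger', 5), ('Fries', 6)],
        [('Shrimp', 10), ('Fish&Chips', 7)],
        [('Salad', 9), ('Steak', 9)],
        [('Coffee', 2), ('Eggs', 3)],
    ],
})

# One row per (item, quantity) tuple, with a fresh 0..n-1 index.
long_df = df.explode('order_items', ignore_index=True)

# Split each tuple into separate item and quantity columns.
long_df['item'] = long_df['order_items'].map(lambda pair: pair[0])
long_df['quantity'] = long_df['order_items'].map(lambda pair: pair[1])

# Total quantity of each item within each order_type.
totals = long_df.groupby(['order_type', 'item'])['quantity'].sum()
print(totals)
```

With the full dataset, items that appear in several orders of the same type are summed into a single row.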