How can I merge two dataframes in pandas that both have multi-index columns. The merge should be an outer merge on a column that is present in both dataframes. See minimal example and resulting error below:
pandas
import pandas as pd # Sample DataFrames with multi-index columns data1 = { ('A', 'X'): [1, 2, 3], ('A', 'Y'): [4, 5, 6], ('B', 'X'): [7, 8, 9], ('B', 'Y'): [10, 11, 12], } data2 = { ('A', 'X'): [13, 14, 15], ('A', 'Y'): [16, 17, 18], ('B', 'X'): [19, 20, 21], ('B', 'Y'): [22, 23, 24], } df1 = pd.DataFrame(data1, index=['row1', 'row2', 'row3']) df2 = pd.DataFrame(data2, index=['row1', 'row2', 'row3']) # Merge on a specific column (e.g., ('A', 'X')) column_to_merge_on = ('A', 'X') merged_df = pd.merge(df1, df2, left_on=column_to_merge_on, right_on=column_to_merge_on) print(merged_df)
Resulting error:
ValueError: The column label 'A' is not unique. For a multi-index, the label must be a tuple with elements corresponding to each level.
For me working add list like [('A', 'X')]:
[('A', 'X')]
#changed data for match data1 = { ('A', 'X'): [1, 2, 13], ('A', 'Y'): [4, 5, 6], ('B', 'X'): [7, 8, 9], ('B', 'Y'): [10, 11, 12], } data2 = { ('A', 'X'): [13, 14, 15], ('A', 'Y'): [16, 17, 18], ('B', 'X'): [19, 20, 21], ('B', 'Y'): [22, 23, 24], } df1 = pd.DataFrame(data1, index=['row1', 'row2', 'row3']) df2 = pd.DataFrame(data2, index=['row1', 'row2', 'row3']) # Merge on a specific column (e.g., ('A', 'X')) column_to_merge_on = [('A', 'X')] merged_df = pd.merge(df1, df2, left_on=column_to_merge_on, right_on=column_to_merge_on, how='outer')
print(merged_df) A A_x B_x A_y B_y X X Y X Y Y X Y 0 1 1.0 4.0 7.0 10.0 NaN NaN NaN 1 2 2.0 5.0 8.0 11.0 NaN NaN NaN 2 13 13.0 6.0 9.0 12.0 16.0 19.0 22.0 3 14 NaN NaN NaN NaN 17.0 20.0 23.0 4 15 NaN NaN NaN NaN 18.0 21.0 24.0