Given two dataframes df_1 and df_2, how can they be joined so that the datetime column of df_1 (timestamp) falls between start and end in df_2:
    print df_1
                 timestamp         A         B
    0  2016-05-14 10:54:33  0.020228  0.026572
    1  2016-05-14 10:54:34  0.057780  0.175499
    2  2016-05-14 10:54:35  0.098808  0.620986
    3  2016-05-14 10:54:36  0.158789  1.014819
    4  2016-05-14 10:54:39  0.038129  2.384590

    print df_2
                     start                  end event
    0  2016-05-14 10:54:31  2016-05-14 10:54:33    E1
    1  2016-05-14 10:54:34  2016-05-14 10:54:37    E2
    2  2016-05-14 10:54:38  2016-05-14 10:54:42    E3
The goal is to get the corresponding event for each row where df_1.timestamp lies between df_2.start and df_2.end:
                 timestamp         A         B event
    0  2016-05-14 10:54:33  0.020228  0.026572    E1
    1  2016-05-14 10:54:34  0.057780  0.175499    E2
    2  2016-05-14 10:54:35  0.098808  0.620986    E2
    3  2016-05-14 10:54:36  0.158789  1.014819    E2
    4  2016-05-14 10:54:39  0.038129  2.384590    E3
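One simple solution is to create an IntervalIndex from start and end with closed='both', then use get_loc to look up the event for each timestamp. This assumes all the datetime columns already have a timestamp dtype, and that every timestamp falls inside exactly one interval.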
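    import pandas as pd

    # Index df_2 by the closed intervals [start, end]
    df_2.index = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
    # For each timestamp, find the position of the interval containing it and read its event
    df_1['event'] = df_1['timestamp'].apply(lambda x: df_2.iloc[df_2.index.get_loc(x)]['event'])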
Output:

                 timestamp         A         B event
    0  2016-05-14 10:54:33  0.020228  0.026572    E1
    1  2016-05-14 10:54:34  0.057780  0.175499    E2
    2  2016-05-14 10:54:35  0.098808  0.620986    E2
    3  2016-05-14 10:54:36  0.158789  1.014819    E2
    4  2016-05-14 10:54:39  0.038129  2.384590    E3
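
For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the sample frames from the question and does the lookup in one call. It swaps the per-row get_loc for IntervalIndex.get_indexer, which is a variation rather than the method shown above: get_indexer returns -1 for a timestamp that falls in no interval instead of raising KeyError. It assumes the intervals in df_2 do not overlap, and the None fill for unmatched rows is an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question
df_1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-14 10:54:33", "2016-05-14 10:54:34", "2016-05-14 10:54:35",
        "2016-05-14 10:54:36", "2016-05-14 10:54:39",
    ]),
    "A": [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    "B": [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    "start": pd.to_datetime(["2016-05-14 10:54:31", "2016-05-14 10:54:34", "2016-05-14 10:54:38"]),
    "end": pd.to_datetime(["2016-05-14 10:54:33", "2016-05-14 10:54:37", "2016-05-14 10:54:42"]),
    "event": ["E1", "E2", "E3"],
})

# Closed intervals [start, end]; assumed not to overlap
intervals = pd.IntervalIndex.from_arrays(df_2["start"], df_2["end"], closed="both")

# Position of the containing interval for every timestamp, or -1 if there is none
pos = intervals.get_indexer(df_1["timestamp"])

# Map positions back to events; unmatched timestamps get None (illustrative choice)
events = df_2["event"].to_numpy()
df_1["event"] = [events[i] if i >= 0 else None for i in pos]

print(df_1)
```

Because df_2 keeps its original integer index here, the lookup does not modify df_2, which can be convenient if the frame is reused elsewhere.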