一尘不染

Python - How to merge two DataFrames when a column value falls within a range?

python

Given two DataFrames df_1 and df_2, how can I join them so that the datetime column timestamp of df_1 falls between the start and end columns of df_2?

print(df_1)

  timestamp              A          B
0 2016-05-14 10:54:33    0.020228   0.026572
1 2016-05-14 10:54:34    0.057780   0.175499
2 2016-05-14 10:54:35    0.098808   0.620986
3 2016-05-14 10:54:36    0.158789   1.014819
4 2016-05-14 10:54:39    0.038129   2.384590


print(df_2)

  start                end                  event    
0 2016-05-14 10:54:31  2016-05-14 10:54:33  E1
1 2016-05-14 10:54:34  2016-05-14 10:54:37  E2
2 2016-05-14 10:54:38  2016-05-14 10:54:42  E3

I want to get the corresponding event wherever df_1.timestamp lies between df_2.start and df_2.end:

  timestamp              A          B          event
0 2016-05-14 10:54:33    0.020228   0.026572   E1
1 2016-05-14 10:54:34    0.057780   0.175499   E2
2 2016-05-14 10:54:35    0.098808   0.620986   E2
3 2016-05-14 10:54:36    0.158789   1.014819   E2
4 2016-05-14 10:54:39    0.038129   2.384590   E3
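For reference, the condition "start <= timestamp <= end" can be expressed literally as a cross join followed by a filter. The sketch below is an illustration, not the accepted answer: it re-types the sample data from the question, assumes pandas 1.2 or newer (for merge(how='cross')), and builds len(df_1) * len(df_2) rows, so it only suits small frames.

```python
import pandas as pd

# Sample data re-typed from the question
df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-14 10:54:33', '2016-05-14 10:54:34',
                                 '2016-05-14 10:54:35', '2016-05-14 10:54:36',
                                 '2016-05-14 10:54:39']),
    'A': [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    'B': [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14 10:54:31', '2016-05-14 10:54:34',
                             '2016-05-14 10:54:38']),
    'end': pd.to_datetime(['2016-05-14 10:54:33', '2016-05-14 10:54:37',
                           '2016-05-14 10:54:42']),
    'event': ['E1', 'E2', 'E3'],
})

# Pair every row of df_1 with every row of df_2 (requires pandas >= 1.2)
pairs = df_1.merge(df_2, how='cross')

# Keep only the pairs where start <= timestamp <= end (between is inclusive on both ends by default)
in_range = pairs['timestamp'].between(pairs['start'], pairs['end'])
result = pairs.loc[in_range, ['timestamp', 'A', 'B', 'event']].reset_index(drop=True)
print(result)
```

This behaves like an inner join: a timestamp that falls in no interval is dropped rather than given an empty event, and a timestamp inside several overlapping intervals produces several rows.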

2020-02-18

1 answer

一尘不染

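A simple solution is to build an IntervalIndex from start and end with closed='both', then use get_loc to look up the event for each timestamp. This assumes all the date/time columns already have a datetime64 (Timestamp) dtype: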

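import pandas as pd
# Index df_2 by closed intervals [start, end], so both endpoints count as inside
df_2.index = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')
df_1['event'] = df_1['timestamp'].apply(lambda x: df_2.iloc[df_2.index.get_loc(x)]['event'])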

Output:

  timestamp              A          B          event
0 2016-05-14 10:54:33    0.020228   0.026572   E1
1 2016-05-14 10:54:34    0.057780   0.175499   E2
2 2016-05-14 10:54:35    0.098808   0.620986   E2
3 2016-05-14 10:54:36    0.158789   1.014819   E2
4 2016-05-14 10:54:39    0.038129   2.384590   E3
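The apply/get_loc call does one lookup per row and raises a KeyError for a timestamp that falls in no interval. A vectorized variant of the same IntervalIndex idea is sketched below; it assumes the intervals in df_2 do not overlap (IntervalIndex.get_indexer raises for an overlapping index) and re-types the sample data from the question. Unmatched timestamps get NaN instead of an error.

```python
import pandas as pd

# Sample data re-typed from the question
df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-14 10:54:33', '2016-05-14 10:54:34',
                                 '2016-05-14 10:54:35', '2016-05-14 10:54:36',
                                 '2016-05-14 10:54:39']),
    'A': [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    'B': [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14 10:54:31', '2016-05-14 10:54:34',
                             '2016-05-14 10:54:38']),
    'end': pd.to_datetime(['2016-05-14 10:54:33', '2016-05-14 10:54:37',
                           '2016-05-14 10:54:42']),
    'event': ['E1', 'E2', 'E3'],
})

# Closed intervals [start, end]; assumes they do not overlap
intervals = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='both')

# For each timestamp: position of the interval containing it, or -1 if none does
pos = intervals.get_indexer(df_1['timestamp'])

# Look up the event by position; pos == -1 would wrongly pick the last row, so mask it to NaN
matched = df_2['event'].to_numpy()[pos]
df_1['event'] = pd.Series(matched, index=df_1.index).where(pos != -1)
print(df_1)
```

On the sample data this assigns E1, E2, E2, E2, E3, the same events as the output shown above, with a single index lookup instead of a Python-level loop.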
2020-02-18