我陷入了SQL问题。假设我们在Redshift中有一个像这样的数据集:
account_id day event_id 111 2019-01-01 1000 111 2019-01-02 1001 111 2019-01-02 1002 111 2019-01-10 1003 111 2019-01-25 1004 111 2019-02-05 1005 111 2019-02-24 1006 111 2019-02-28 1007 111 2019-03-02 1008 111 2019-03-15 1009 222 2019-01-01 1000 222 2019-01-02 1001 222 2019-01-02 1002 222 2019-01-10 1003 222 2019-01-25 1004 222 2019-02-05 1005 222 2019-02-24 1006 222 2019-02-28 1007 222 2019-03-02 1008 222 2019-03-15 1009
我需要选择在窗口PER ACCOUNT_ID的30天后发生的event_id,但是然后根据我发现的第一个事件日期更改新窗口的开始日期。
因此在这种情况下,对于ACCOUNT_IDS 111和222都是这样:
你得到图片..
这该怎么做?
我也找不到纯粹基于窗口函数的解决方案。
但是在PostgreSql中,递归CTE可以做到这一点。
临时表用于具有可用于连接到下一条记录的ID。
CREATE TEMPORARY TABLE tempEventDates ( id SERIAL primary key, account_id int not null, day date not null, min_day date not null, event_id int not null ); INSERT INTO tempEventDates (account_id, day, min_day, event_id) SELECT account_id, day, MIN(day) OVER (PARTITION BY account_id) as min_day, event_id FROM yourtable GROUP BY account_id, day, event_id ORDER BY account_id, day, event_id; WITH RECURSIVE RCTE AS ( SELECT id, account_id, event_id, day, min_day FROM tempEventDates WHERE day = min_day UNION ALL SELECT t.id, t.account_id, t.event_id, t.day, CASE WHEN t.day > c.min_day + interval '30 days' THEN t.day ELSE c.min_day END FROM RCTE c JOIN tempEventDates t ON t.account_id = c.account_id AND t.id = c.id + 1 ) SELECT account_id, day, event_id FROM RCTE WHERE day = min_day ORDER BY account_id, day;