一尘不染

Count the number of events before and after an event "A" in BigQuery, until another event "A" is encountered

sql

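I have a table containing date, event, and user columns. There is an event named "A". In BigQuery SQL, I want to find out how many times a particular event occurred before and after event "A". Event A can appear multiple times, but events should only be counted up to the nearest other event A on either side.
For example: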

User         Date             Events
123          2018-02-14       X.Y.A
123          2018-02-12       X.Y.B
134          2018-02-10       Y.Z.A
123          2018-02-11       A
123          2018-02-01       X.Y.Z
134          2018-02-05       X.Y.B
134          2018-02-04       A
123          2018-02-13       A

The output would look like this:

User       Event    Before   After
123          A      1        1
123          A      0        1
134          A      0        1

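The rest of the conditions remain the same.

This question is an extension of my previous question.

The events I have to count share a specific prefix: I need to check for events whose names start with X.Y. followed by some event name. So events of the form X.Y.SomeEvent are the ones I need to set up a counter for. Any suggestions?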


2021-05-16

1 answer

一尘不染

Below is for BigQuery Standard SQL:

#standardSQL
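WITH grps AS (
  -- grp is a running count of 'A' events per user, so each 'A' row and the
  -- rows that follow it (up to the next 'A') share the same grp value
  SELECT user, dt, event,
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after
FROM (
  SELECT dt, user, event,
    -- before: prefixed events in the previous group (grp - 1)
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    -- after: prefixed events in the current group (same grp)
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
-- ORDER BY user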

You can test and play with the above using the dummy data from your example, as below:

#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user,  '2018-02-14' dt, 'X.Y.A' event UNION ALL
  SELECT 123,       '2018-02-13', 'A'     UNION ALL
  SELECT 123,       '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123,       '2018-02-11', 'A'     UNION ALL
  SELECT 123,       '2018-02-01', 'X.Y.Z' UNION ALL
  SELECT 134,       '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134,       '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134,       '2018-02-04', 'A'     
), grps AS (
  SELECT user, dt, event, 
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after 
FROM (
  SELECT dt, user, event, 
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING ) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
ORDER BY user

with the result:

Row dt          user    event   before  after    
1   2018-02-11  123     A       1       1    
2   2018-02-13  123     A       1       1    
3   2018-02-04  134     A       0       1
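
To see why the counts come out this way, the same grouping idea can be traced outside BigQuery. The following is only a minimal Python sketch, not part of the original answer: the function name `count_before_after` and its parameters `marker` and `prefix` are made up for illustration, and it assumes each user's events have distinct dates (the SQL window treats rows with equal `dt` as peers, which this sketch does not replicate).

```python
from collections import defaultdict

# Dummy rows copied from the answer's sample data: (user, dt, event).
rows = [
    (123, "2018-02-14", "X.Y.A"),
    (123, "2018-02-13", "A"),
    (123, "2018-02-12", "X.Y.B"),
    (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"),
    (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"),
]


def count_before_after(rows, marker="A", prefix="X.Y."):
    """Return (dt, user, before, after) for every marker event (hypothetical helper)."""
    by_user = defaultdict(list)
    for user, dt, event in rows:
        by_user[user].append((dt, event))

    result = []
    for user in sorted(by_user):
        grp = 0                     # running count of marker events, like `grp` in the query
        matches = defaultdict(int)  # grp -> number of prefixed events in that group
        markers = []                # (dt, grp) for each marker row
        # ISO-formatted date strings sort chronologically.
        for dt, event in sorted(by_user[user]):
            if event == marker:
                grp += 1            # a marker row opens a new group
                markers.append((dt, grp))
            if event.startswith(prefix):
                matches[grp] += 1
        for dt, g in markers:
            # before = prefixed events in group g - 1, after = those in group g
            result.append((dt, user, matches.get(g - 1, 0), matches.get(g, 0)))
    return result


for row in count_before_after(rows):
    print(row)
```

Each marker row opens a new group, so "before" is the tally for the previous group and "after" is the tally for the marker's own group. That is also why the single X.Y.B event of user 123 counts as "after" for the first A and as "before" for the second.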
2021-05-16