一尘不染

Building an array from add and remove event rows in BigQuery

sql

I have a table in BigQuery with the following structure:

datetime | event  | value
==========================
1        | add    | 1
---------+--------+-------
2        | remove | 1
---------+--------+-------
6        | add    | 2
---------+--------+-------
8        | add    | 3
---------+--------+-------
11       | add    | 4
---------+--------+-------
23       | remove | 3
---------+--------+-------

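I am trying to build a view that adds a column, list, to every row, holding the state of the array after that row's event has been applied. The array will never contain duplicates. The result should be: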

datetime | event  | value | list
===================================
1        | add    | 1     | [1]
---------+--------+-------+--------
2        | remove | 1     | []
---------+--------+-------+--------
6        | add    | 2     | [2]
---------+--------+-------+--------
8        | add    | 3     | [2,3]
---------+--------+-------+--------
11       | add    | 4     | [2,3,4]
---------+--------+-------+--------
23       | remove | 3     | [2,4]
---------+--------+-------+--------
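
The intended behaviour can be pinned down with a minimal plain-Python sketch (not BigQuery): replay the rows in datetime order and record the list after each event. The function name `replay_events` and the tuple layout are assumptions made for illustration only.

```python
def replay_events(events):
    """Replay (datetime, event, value) rows; return each row with the list state after it."""
    current = []  # values currently in the list, in the order they were added
    rows = []
    for dt, event, value in sorted(events, key=lambda e: e[0]):
        if event == "add" and value not in current:
            current.append(value)      # the list never holds duplicates
        elif event == "remove" and value in current:
            current.remove(value)
        rows.append((dt, event, value, list(current)))  # copy the current state
    return rows


# The sample rows from the table above
EVENTS = [
    (1, "add", 1),
    (2, "remove", 1),
    (6, "add", 2),
    (8, "add", 3),
    (11, "add", 4),
    (23, "remove", 3),
]

if __name__ == "__main__":
    for row in replay_events(EVENTS):
        print(row)
```

The difficulty in SQL is that each row's list depends on the row before it, which is what this loop expresses directly.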

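I tried using analytic functions, without success. The API for working with arrays is quite limited. I think I could manage it with a recursive WITH clause, but unfortunately that is not possible in BigQuery.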

I am using BigQuery with standard SQL enabled.


2021-05-16

1 answer

一尘不染

The version below works in BigQuery standard SQL and uses pure SQL only (no JS UDF):

#standardSQL
-- Sample data standing in for the real table
WITH `project.dataset.events` AS (
  SELECT 1 dt,'add' event,'1' value UNION ALL
  SELECT 2,   'remove',   '1' UNION ALL
  SELECT 6,   'add',      '2' UNION ALL
  SELECT 8,   'add',      '3' UNION ALL
  SELECT 11,  'add',      '4' UNION ALL
  SELECT 23,  'remove',   '3'
), cum AS (
  -- Running +1/-1 sum per value: state is 1 while the value is in the list, 0 once removed
  SELECT dt, event, value,
    SUM(IF(event = 'add', 1, -1)) OVER(PARTITION BY value ORDER BY dt) state
  FROM `project.dataset.events`
), pre AS (
  -- For each row a, take the latest state and latest event time of every value seen up to a.dt
  SELECT
    a.dt, a.event, a.value, a.state, b.value AS b_value,
    ARRAY_AGG(b.state ORDER BY b.dt DESC)[SAFE_OFFSET(0)] b_state,
    MAX(b.dt) b_dt
  FROM cum a
  JOIN cum b ON b.dt <= a.dt
  GROUP BY a.dt, a.event, a.value, a.state, b.value
)
-- Keep only values whose latest state is 1, ordered by the time they were added.
-- Note: SPLIT('') yields an array holding one empty string, not an empty array.
SELECT dt, event, value,
  SPLIT(IFNULL(STRING_AGG(IF(b_state = 1, b_value, NULL) ORDER BY b_dt), '')) list_as_array,
  CONCAT('[', IFNULL(STRING_AGG(IF(b_state = 1, b_value, NULL) ORDER BY b_dt), ''), ']') list_as_string
FROM pre
GROUP BY dt, event, value
ORDER BY dt

The result is, "surprisingly" :o), exactly the same as that of the JS UDF version I answered/posted earlier:

Row  dt  event   value  list_as_array  list_as_string
1    1   add     1      1              [1]
2    2   remove  1                     []
3    6   add     2      2              [2]
4    8   add     3      2              [2,3]
                        3
5    11  add     4      2              [2,3,4]
                        3
                        4
6    23  remove  3      2              [2,4]
                        4
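
To see why the query produces these lists, its stages can be mirrored in plain Python (an illustration only, not something that runs in BigQuery). The names `cum` and `latest` follow the query's `cum` and `pre` steps; the function name `list_states` is hypothetical. Like the query, the sketch assumes a value is never added twice without being removed in between.

```python
from collections import defaultdict


def list_states(events):
    """Mirror the query: per-value running state, then the latest state per value for each row."""
    events = sorted(events, key=lambda e: e[0])

    # Stage "cum": running +1/-1 sum per value (1 = in the list, 0 = removed)
    running = defaultdict(int)
    cum = []
    for dt, event, value in events:
        running[value] += 1 if event == "add" else -1
        cum.append((dt, event, value, running[value]))

    # Stage "pre" plus the final SELECT: for each row a, look at all rows b with b.dt <= a.dt
    result = []
    for a_dt, a_event, a_value, _ in cum:
        latest = {}  # value -> (dt, state) of its latest event at or before a_dt
        for b_dt, _, b_value, b_state in cum:
            if b_dt <= a_dt:
                latest[b_value] = (b_dt, b_state)  # cum is sorted by dt, so later rows overwrite
        # Keep values whose latest state is 1, ordered by the time of that latest event
        present = sorted((b_dt, v) for v, (b_dt, state) in latest.items() if state == 1)
        result.append((a_dt, a_event, a_value, [v for _, v in present]))
    return result


# Same sample rows as the query, with string values
SAMPLE = [
    (1, "add", "1"),
    (2, "remove", "1"),
    (6, "add", "2"),
    (8, "add", "3"),
    (11, "add", "4"),
    (23, "remove", "3"),
]

if __name__ == "__main__":
    for row in list_states(SAMPLE):
        print(row)
```

The nested loop corresponds to the self-join `cum a JOIN cum b ON b.dt <= a.dt`, which is quadratic in the number of rows; that is the part most worth optimizing on a large table.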

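Note: I think the above may be somewhat over-engineered, but I simply do not have time to polish/optimize it. It should be doable; I leave that to the question owner.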

2021-05-16