一尘不染

How to pivot data in BigQuery Standard SQL without manual hard-coding? [duplicate]

sql

This question already has answers here:

How to pivot a table in BigQuery (6 answers)

Closed 8 months ago.

I have a table like the following:

| user_id | product_purchased |
-------------------------------
|   111   |         A         |
|   111   |         B         |
|   222   |         B         |
|   222   |         B         |
|   333   |         C         |
|   444   |         A         |

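I want to pivot the table so that each user ID is a row and the count of each product the user purchased is a column. For the table above, the result would look like this: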

| user_id | product A | product B | product C |
-----------------------------------------------
|   111   |     1     |     1     |     0     |
|   222   |     0     |     2     |     0     |
|   333   |     0     |     0     |     1     |
|   444   |     1     |     0     |     0     |

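I know this can be done manually with COUNTIF statements:

#standardSQL
SELECT user_id,
       COUNTIF(product_purchased = 'A') AS A,
       COUNTIF(product_purchased = 'B') AS B
       -- ... and so on, one COUNTIF per product
FROM your_table  -- placeholder: the question does not name the table
GROUP BY user_id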

In practice, however, the table has too many possible products to write out every option by hand. Is there a more automated and elegant way to do this?


2021-03-17

1 answer

一尘不染

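"In practice, the table has too many possible products to write out every option by hand"

The following is for BigQuery Standard SQL.

You can do this in two steps. First, prepare the dynamic pivot query by running the following: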

#standardSQL
SELECT CONCAT('SELECT user_id, ', 
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ), 
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased 
  FROM `project.dataset.your_table`  
  GROUP BY product_purchased
)

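As a result, you get a string containing the query that you need to run to obtain the desired result.

For example, if you apply it to the dummy data from your question: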

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A' 
)
SELECT CONCAT('SELECT user_id, ', 
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ), 
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased 
  FROM `project.dataset.your_table`  
  GROUP BY product_purchased
)

you get the following query (formatted here for readability):

SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id

Now you can simply run this query to get the desired result, with no manual coding.

Again, if you run it against the dummy data from your question:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A' 
)
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id   
-- ORDER BY user_id

you get the expected result:

Row user_id product_A   product_B   product_C    
1   111     1           1           0    
2   222     0           2           0    
3   333     0           0           1    
4   444     1           0           0

"Is there a more automated and elegant way to do this?"

You can easily automate the above using any client of your choice.
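
One possible way to chain the two steps without an external client is BigQuery scripting. The sketch below is not part of the original answer. It assumes that scripting and `EXECUTE IMMEDIATE` are available in your environment, it uses a hypothetical temporary table named `purchases` that holds the sample rows from the question, and it assumes every product value is safe to append to a column name (letters, digits, underscores only).

```sql
-- DECLARE must come before any other statement in a BigQuery script.
DECLARE pivot_sql STRING;

-- Hypothetical temp table holding the question's sample data.
CREATE TEMP TABLE purchases AS
SELECT 111 AS user_id, 'A' AS product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A';

-- Step 1: build the pivot query text, one COUNTIF per distinct product.
SET pivot_sql = (
  SELECT CONCAT(
    'SELECT user_id, ',
    STRING_AGG(
      CONCAT('COUNTIF(product_purchased = "', product_purchased,
             '") AS product_', product_purchased),
      ', ' ORDER BY product_purchased
    ),
    ' FROM purchases GROUP BY user_id ORDER BY user_id'
  )
  FROM (
    SELECT DISTINCT product_purchased
    FROM purchases
  )
);

-- Step 2: run the generated query in the same script.
EXECUTE IMMEDIATE pivot_sql;
```

To run this against a real table, drop the `CREATE TEMP TABLE` statement and replace `purchases` with the backtick-quoted table name in both places. Product values containing quotes or other special characters would need escaping before being concatenated into the query text.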

2021-03-17