我在Postgres中有下表,在两列a_sno和中有重叠的数据b_sno。
a_sno
b_sno
create table data ( a_sno integer not null, b_sno integer not null, PRIMARY KEY (a_sno,b_sno) ); insert into data (a_sno,b_sno) values ( 4, 5 ) , ( 5, 4 ) , ( 5, 6 ) , ( 6, 5 ) , ( 6, 7 ) , ( 7, 6 ) , ( 9, 10) , ( 9, 13) , (10, 9 ) , (13, 9 ) , (10, 13) , (13, 10) , (10, 14) , (14, 10) , (13, 14) , (14, 13) , (11, 15) , (15, 11);
从前6行可以看到,两列中的数据值4,5,6和7相交/重叠,需要将其划分为一组。第7-16行和第17-18行将分别标记为组2和3。
结果输出应如下所示:
group | value ------+------ 1 | 4 1 | 5 1 | 6 1 | 7 2 | 9 2 | 10 2 | 13 2 | 14 3 | 11 3 | 15
假设所有对在它们的镜像的组合中也存在(4,5)和(5,4)。但是以下解决方案也可以在没有镜像重复对象的情况下正常工作。
(4,5)
(5,4)
所有连接都可以 按单个升序排列 ,不可能出现像我在小提琴中添加的那样的复杂情况,我们可以使用此解决方案,而无需在rCTE中重复:
我从获取a_sno每个组的最小值开始,并具有最小的关联值b_sno:
SELECT row_number() OVER (ORDER BY a_sno) AS grp , a_sno, min(b_sno) AS b_sno FROM data d WHERE a_sno < b_sno AND NOT EXISTS ( SELECT 1 FROM data WHERE b_sno = d.a_sno AND a_sno < b_sno ) GROUP BY a_sno;
结果:
grp a_sno b_sno 1 4 5 2 9 10 3 11 15
我避免使用分支和重复的(重复的)行-长链可能 会 更昂贵。我ORDER BY b_sno LIMIT 1在相关子查询中使用了递归CTE。
ORDER BY b_sno LIMIT 1
性能的关键是匹配索引,它已经由PK约束提供了:反之则PRIMARY KEY (a_sno,b_sno)不行 (b_sno, a_sno) :
PRIMARY KEY (a_sno,b_sno)
(b_sno, a_sno)
复合索引对第一个字段的查询是否也有用?
WITH RECURSIVE t AS ( SELECT row_number() OVER (ORDER BY d.a_sno) AS grp , a_sno, min(b_sno) AS b_sno – the smallest one FROM data d WHERE a_sno < b_sno AND NOT EXISTS ( SELECT 1 FROM data WHERE b_sno = d.a_sno AND a_sno < b_sno ) GROUP BY a_sno )
, cte AS ( SELECT grp, b_sno AS sno FROM t
UNION ALL SELECT c.grp , (SELECT b_sno – correlated subquery FROM data WHERE a_sno = c.sno AND a_sno < b_sno ORDER BY b_sno LIMIT 1) FROM cte c WHERE c.sno IS NOT NULL ) SELECT * FROM cte WHERE sno IS NOT NULL – eliminate row with NULL UNION ALL – no duplicates SELECT grp, a_sno FROM t ORDER BY grp, sno;
从根(最小sno)到一个或多个分支,可以按升序到达所有节点。
sno
这次,获得 所有 更大sno和重复数据删除的节点,这些节点可能在最后访问了多次UNION:
UNION
WITH RECURSIVE t AS ( SELECT rank() OVER (ORDER BY d.a_sno) AS grp , a_sno, b_sno -- get all rows for smallest a_sno FROM data d WHERE a_sno < b_sno AND NOT EXISTS ( SELECT 1 FROM data WHERE b_sno = d.a_sno AND a_sno < b_sno ) ) , cte AS ( SELECT grp, b_sno AS sno FROM t UNION ALL SELECT c.grp, d.b_sno FROM cte c JOIN data d ON d.a_sno = c.sno AND d.a_sno < d.b_sno -- join to all connected rows ) SELECT grp, sno FROM cte UNION -- eliminate duplicates SELECT grp, a_sno FROM t -- add first rows ORDER BY grp, sno;
与第一种解决方案不同,我们在这里没有得到带有NULL的最后一行(由相关子查询引起)。