我当前的查询如下所示:
WITH points AS ( SELECT unnest(array_of_points) AS p ), gtps AS ( SELECT DISTINCT ON(points.p) points.p, m.groundtruth FROM measurement m, points WHERE st_distance(m.groundtruth, points.p) < distance ORDER BY points.p, RANDOM() ) SELECT DISTINCT ON(gtps.p, gtps.groundtruth, m.anchor_id) m.id, m.anchor_id, gtps.groundtruth, gtps.p FROM measurement m, gtps ORDER BY gtps.p, gtps.groundtruth, m.anchor_id, RANDOM()
语义:
有两个输入值:
array_of_points
distance
第一段(第1-6行):
从点数组创建表以用于…
第二段(第8-14行):
对于points表中的每个点:从表中获取一个随机(!)groundtruth点measurement,其距离<distance
points
groundtruth
measurement
gtps
第三段(第16-19行):
对于表中的每个groundtruth值gtps:获取所有anchor_id值并…
anchor_id
id
p
表格示例:
id | anchor_id | groundtruth | data ----------------------------------- 1 | 1 | POINT(1 4) | ... 2 | 3 | POINT(1 4) | ... 3 | 8 | POINT(1 4) | ... 4 | 6 | POINT(1 4) | ... ----------------------------------- 5 | 2 | POINT(3 2) | ... 6 | 4 | POINT(3 2) | ... ----------------------------------- 7 | 1 | POINT(4 3) | ... 8 | 1 | POINT(4 3) | ... 9 | 6 | POINT(4 3) | ... 10 | 7 | POINT(4 3) | ... 11 | 3 | POINT(4 3) | ... ----------------------------------- 12 | 1 | POINT(6 2) | ... 13 | 5 | POINT(6 2) | ...
结果示例:
id | anchor_id | groundtruth | p ----------------------------------------- 1 | 1 | POINT(1 4) | POINT(1 0) 2 | 3 | POINT(1 4) | POINT(1 0) 4 | 6 | POINT(1 4) | POINT(1 0) 3 | 8 | POINT(1 4) | POINT(1 0) 5 | 2 | POINT(3 2) | POINT(2 2) 6 | 4 | POINT(3 2) | POINT(2 2) 1 | 1 | POINT(1 4) | POINT(4 8) 2 | 3 | POINT(1 4) | POINT(4 8) 4 | 6 | POINT(1 4) | POINT(4 8) 3 | 8 | POINT(1 4) | POINT(4 8) 12 | 1 | POINT(6 2) | POINT(7 3) 13 | 5 | POINT(6 2) | POINT(7 3) 1 | 1 | POINT(4 3) | POINT(9 1) 11 | 3 | POINT(4 3) | POINT(9 1) 9 | 6 | POINT(4 3) | POINT(9 1) 10 | 7 | POINT(4 3) | POINT(9 1)
如你看到的:
基准 (两个输入值):
详细说明:
Unique (cost=11119.32..11348.33 rows=18 width=72) Output: m.id, m.anchor_id, gtps.groundtruth, gtps.p, (random()) CTE points -> Result (cost=0.00..0.01 rows=1 width=0) Output: unnest('{0101000000EE7C3F355EF24F4019390B7BDA011940:01010000003480B74082FA44402CD49AE61D173C40}'::geometry[]) CTE gtps -> Unique (cost=7659.95..7698.12 rows=1 width=160) Output: points.p, m.groundtruth, (random()) -> Sort (cost=7659.95..7679.04 rows=7634 width=160) Output: points.p, m.groundtruth, (random()) Sort Key: points.p, (random()) -> Nested Loop (cost=0.00..6565.63 rows=7634 width=160) Output: points.p, m.groundtruth, random() Join Filter: (st_distance(m.groundtruth, points.p) < m.distance) -> CTE Scan on points (cost=0.00..0.02 rows=1 width=32) Output: points.p -> Seq Scan on public.measurement m (cost=0.00..535.01 rows=22901 width=132) Output: m.id, m.anchor_id, m.tag_node_id, m.experiment_id, m.run_id, m.anchor_node_id, m.groundtruth, m.distance, m.distance_error, m.distance_truth, m."timestamp" -> Sort (cost=3421.18..3478.43 rows=22901 width=72) Output: m.id, m.anchor_id, gtps.groundtruth, gtps.p, (random()) Sort Key: gtps.p, gtps.groundtruth, m.anchor_id, (random()) -> Nested Loop (cost=0.00..821.29 rows=22901 width=72) Output: m.id, m.anchor_id, gtps.groundtruth, gtps.p, random() -> CTE Scan on gtps (cost=0.00..0.02 rows=1 width=64) Output: gtps.p, gtps.groundtruth -> Seq Scan on public.measurement m (cost=0.00..535.01 rows=22901 width=8) Output: m.id, m.anchor_id, m.tag_node_id, m.experiment_id, m.run_id, m.anchor_node_id, m.groundtruth, m.distance, m.distance_error, m.distance_truth, m."timestamp"
说明分析:
Unique (cost=11119.32..11348.33 rows=18 width=72) (actual time=548.991..657.992 rows=36 loops=1) CTE points -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..0.011 rows=2 loops=1) CTE gtps -> Unique (cost=7659.95..7698.12 rows=1 width=160) (actual time=133.416..146.745 rows=2 loops=1) -> Sort (cost=7659.95..7679.04 rows=7634 width=160) (actual time=133.415..142.255 rows=15683 loops=1) Sort Key: points.p, (random()) Sort Method: external merge Disk: 1248kB -> Nested Loop (cost=0.00..6565.63 rows=7634 width=160) (actual time=0.045..46.670 rows=15683 loops=1) Join Filter: (st_distance(m.groundtruth, points.p) < m.distance) -> CTE Scan on points (cost=0.00..0.02 rows=1 width=32) (actual time=0.007..0.020 rows=2 loops=1) -> Seq Scan on measurement m (cost=0.00..535.01 rows=22901 width=132) (actual time=0.013..3.902 rows=22901 loops=2) -> Sort (cost=3421.18..3478.43 rows=22901 width=72) (actual time=548.989..631.323 rows=45802 loops=1) Sort Key: gtps.p, gtps.groundtruth, m.anchor_id, (random())" Sort Method: external merge Disk: 4008kB -> Nested Loop (cost=0.00..821.29 rows=22901 width=72) (actual time=133.449..166.294 rows=45802 loops=1) -> CTE Scan on gtps (cost=0.00..0.02 rows=1 width=64) (actual time=133.420..146.753 rows=2 loops=1) -> Seq Scan on measurement m (cost=0.00..535.01 rows=22901 width=8) (actual time=0.014..4.409 rows=22901 loops=2) Total runtime: 834.626 ms
实时运行时,应以大约100-1000个输入值运行。因此,到目前为止,这将花费35到350秒,这远远不算多。
我已经尝试删除RANDOM()功能。这将运行时间(对于2个输入值)从大约670ms减少到大约530ms。因此,目前这还不是主要的影响。
RANDOM()
如果更容易/更快,也可以运行2或3个独立的查询,并在软件中做一些工作(在Ruby on Rails服务器上运行)。例如随机选择?!
SELECT m.groundtruth, ps.p, ARRAY_AGG(m.anchor_id), ARRAY_AGG(m.id) FROM measurement m JOIN (SELECT unnest(point_array) AS p) AS ps ON ST_DWithin(ps.p, m.groundtruth, distance) GROUP BY groundtruth, ps.p
使用此查询,它的速度非常快( 15ms ),但有很多不足:
ps.p
如果anchor_id并且id可以将其存储在元组数组中,那也很好。例如:({[4,1],[6,3],[4,2],[8,5],[4,4]}约束:每个元组都是唯一的,每个id(在示例中==第2个值)都是唯一的,anchor_ids不是唯一的)。本示例显示查询,其中没有必须仍然应用的过滤器。应用过滤器后,它看起来像是{[6,3],[4,4],[8,5]}。
{[4,1],[6,3],[4,2],[8,5],[4,4]}
{[6,3],[4,4],[8,5]}
SELECT DISTINCT ON (ps.p) m.groundtruth, ps.p, ARRAY_AGG(m.anchor_id), ARRAY_AGG(m.id) FROM measurement m JOIN (SELECT unnest(point_array) AS p) AS ps ON ST_DWithin(ps.p, m.groundtruth, distance) GROUP BY ps.p, m.groundtruth ORDER BY ps.p, RANDOM()
现在,这给出了非常不错的结果,并且仍然非常快: 16ms 剩下要做的一件事:
ARRAY_AGG(m.anchor_id)
ARRAY_AGG(m.id)
如果anchor_id和id可以存储在元组数组中,那也很好。
我想您为此创建了一个 二维数组 。这比起来更容易处理ARRAY of record。Standardarray_agg()无法聚合多维数组。但是您可以为此轻松地编写自己的聚合函数:
ARRAY of record
array_agg()
CREATE AGGREGATE array_agg_mult (anyarray) ( SFUNC = array_cat ,STYPE = anyarray ,INITCOND = '{}' );
对于出现多次的数组中的每个anchor_id:保留一个随机数,然后删除所有其他。这还意味着从每个已删除的anchor_id的ID数组中删除相应的ID
SELECT DISTINCT ON (p) p, groundtruth, array_agg_mult(ARRAY[ARRAY[anchor_id, id]]) AS ids FROM ( SELECT DISTINCT ON (ps.p, m.groundtruth, m.anchor_id) ps.p, m.groundtruth, m.anchor_id, m.id FROM (SELECT unnest(point_array) AS p) AS ps JOIN measurement m ON ST_DWithin(ps.p, m.groundtruth, distance) ORDER BY ps.p, m.groundtruth, m.anchor_id, random() ) x GROUP BY p, groundtruth ORDER BY p, random();
子查询每个x都不同,如果有多个同级,则选择一个随机行。这样,连接保持完整。anchor_id``(p, groundtruth)``anchor_id - id
x
anchor_id``(p, groundtruth)``anchor_id - id
外部查询聚合您想要的二维数组,按排序anchor_id。如果要anchor_id随机订购,请再次使用随机:
array_agg_mult(ARRAY[ARRAY[anchor_id, id]] ORDER BY random())
最后,每个人DISTINCT ON只能随机选择1groundtruth个p。
DISTINCT ON