To eliminate duplicates, this is probably the most efficient query in PostgreSQL:
SELECT DISTINCT ON (anchor_id, groundtruth) *
FROM measurement
WHERE st_distance(p, groundtruth) < d
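More about this query style:
- Select first row in each GROUP BY group? https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group/7630564#7630564
As mentioned in the comments, this gives you an arbitrary pick. If you need a random pick, that is somewhat more expensive: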
SELECT DISTINCT ON (anchor_id, groundtruth) *
FROM measurement
WHERE st_distance(p, groundtruth) < d
ORDER BY anchor_id, groundtruth, random()
The second part is harder to optimize. An EXISTS https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-EXISTS semi-join is probably the fastest choice. For a given table ps (p point):
SELECT DISTINCT ON (anchor_id, groundtruth) *
FROM measurement m
WHERE EXISTS (
SELECT 1
FROM ps
WHERE st_distance(ps.p, m.groundtruth) < d
)
ORDER BY anchor_id, groundtruth, random();
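This can stop evaluating as soon as one p is close enough, and it keeps the rest of the query simple.
Be sure to back this up with a matching GiST index http://blog.opengeo.org/2011/09/28/indexed-nearest-neighbour-search-in-postgis/.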
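A minimal sketch of such an index, assuming measurement.groundtruth is a PostGIS geometry column (the index name is made up for this example):

```sql
-- Assumption: groundtruth is a PostGIS geometry column.
-- The index name is hypothetical; pick any name you like.
CREATE INDEX measurement_groundtruth_gist_idx
ON     measurement USING gist (groundtruth);
```

Keep in mind that a spatial index only helps predicates that are index-aware, such as ST_DWithin(). A plain comparison on the result of st_distance() typically cannot use it.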
If you have an array as input, create a CTE https://www.postgresql.org/docs/current/queries-with.html with unnest() https://www.postgresql.org/docs/current/functions-array.html#ARRAY-FUNCTIONS-TABLE on the fly:
WITH ps AS (SELECT unnest(p_array) AS p)
SELECT ...
Update according to comment
If you only need a single row as the answer, you can simplify:
WITH ps AS (SELECT unnest(p_array) AS p)
SELECT *
FROM measurement m
WHERE EXISTS (
SELECT 1
FROM ps
WHERE st_distance(ps.p, m.groundtruth) < d
)
LIMIT 1;
Faster with ST_DWithin()
It is probably more efficient to use the function ST_DWithin() https://postgis.net/docs/ST_DWithin.html (and a matching GiST index!).
To get one row (using a sub-select instead of a CTE here):
SELECT *
FROM measurement m
JOIN (SELECT unnest(p_array) AS p) ps ON ST_DWithin(ps.p, m.groundtruth, d)
LIMIT 1;
To get one row for every point p within distance d:
SELECT DISTINCT ON (ps.p) *
FROM measurement m
JOIN (SELECT unnest(p_array) AS p) ps ON ST_DWithin(ps.p, m.groundtruth, d)
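Adding ORDER BY random() will make this query more expensive. Without random(), Postgres can just pick the first matching row from the GiST index. Otherwise, all possible matches have to be retrieved and ordered randomly.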
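To check whether the planner actually uses the GiST index for a query like this, one could inspect the execution plan. A sketch, assuming groundtruth is a geometry column with no SRID set; the two input points and the distance 10 are made-up stand-ins for p_array and d:

```sql
-- Hypothetical inputs: two points and a distance of 10 (in the units of the data).
-- EXPLAIN (ANALYZE) actually runs the query and reports the plan that was used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (ps.p) *
FROM   measurement m
JOIN  (SELECT unnest(ARRAY['POINT(1 2)'::geometry
                         , 'POINT(3 4)'::geometry]) AS p) ps
       ON ST_DWithin(ps.p, m.groundtruth, 10);
```

If the plan shows an index scan on measurement for the ST_DWithin() condition, the index is being used. A sequential scan on a large table suggests the index is missing or does not match the column.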
BTW, LIMIT 1 inside EXISTS is pointless. Read the manual at the link I provided https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-EXISTS or this related question https://stackoverflow.com/questions/7710153/what-is-easier-to-read-in-exists-subqueries.