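First things first: you can use the result of a CTE multiple times in the same query; that is a main feature of CTEs (https://www.postgresql.org/docs/current/queries-with.html). What you have would work like this (while still using the CTE only once):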
WITH cte AS (
   SELECT * FROM (
      SELECT *, row_number()  -- see below
                OVER (PARTITION BY person_id
                      ORDER BY submission_date DESC NULLS LAST  -- see below
                             , last_updated DESC NULLS LAST  -- see below
                             , id DESC) AS rn
      FROM   tbl
      ) sub
   WHERE  rn = 1
   AND    status IN ('ACCEPTED', 'CORRECTED')
   )
SELECT *, count(*) OVER () AS total_rows_in_cte
FROM   cte
LIMIT  10
OFFSET 0;  -- see below
Caveat 1: rank()
rank() can return multiple rows per person_id with rank = 1. DISTINCT ON (person_id) (as Gordon provided) is an applicable replacement for row_number() - which works for you, as the additional information clarified. See:
- Select first row in each GROUP BY group? https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group/7630564#7630564
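To make the difference concrete, here is a minimal sketch on hypothetical inline data (the `sample` rows are made up for illustration, not taken from the question). Two rows for person 1 tie on submission_date:

```sql
-- Hypothetical sample data: ids 1 and 2 belong to the same person
-- and share the same submission_date.
WITH sample(id, person_id, submission_date) AS (
   VALUES (1, 1, date '2020-01-02')
        , (2, 1, date '2020-01-02')
        , (3, 2, date '2020-01-01')
   )
SELECT id, person_id
     , rank()       OVER w AS rnk  -- tied rows share the same rank
     , row_number() OVER w AS rn   -- tied rows still get distinct numbers
FROM   sample
WINDOW w AS (PARTITION BY person_id ORDER BY submission_date DESC)
ORDER  BY id;
```

Both rows of person 1 get rnk = 1, so filtering on rank() = 1 would return two rows for that person. row_number() numbers them 1 and 2, but which row gets 1 is arbitrary unless the ORDER BY is made deterministic, for example by adding id DESC as a tiebreaker.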
Caveat 2: ORDER BY submission_date DESC
Neither submission_date nor last_updated is defined NOT NULL. That can be an issue with ORDER BY submission_date DESC, last_updated DESC ... See:
- PostgreSQL sort by datetime asc, null first? https://stackoverflow.com/questions/9510509/postgresql-sort-by-datetime-asc-null-first/9511492#9511492
Should those columns really be NOT NULL?
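You replied:
"Yes, all those columns should be non-null. I can add that constraint. I made them nullable since the data we get in files is not always perfect. But this is a very rare condition and I can put in an empty string instead."
Empty strings are not allowed for type date. Keep the columns nullable. NULL is the proper value for those cases. Use NULLS LAST as demonstrated to avoid NULL being sorted on top.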
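A small sketch of the sort behavior, using hypothetical inline rows instead of the real table:

```sql
-- Hypothetical rows; id 2 has no submission_date.
SELECT *
FROM  (
   VALUES (1, date '2020-01-02')
        , (2, NULL)
        , (3, date '2020-01-01')
   ) t(id, submission_date)
ORDER  BY submission_date DESC NULLS LAST;  -- yields ids 1, 3, 2

-- Without NULLS LAST, descending order defaults to NULLS FIRST,
-- so the row with id 2 would sort on top.
-- And an empty string is no substitute for NULL:
-- SELECT ''::date;  -- raises an "invalid input syntax" error
```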
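Caveat 3: OFFSET
If OFFSET is equal to or greater than the number of rows returned by the CTE, you get no row, and hence no total count either. See:
- Run a query with a LIMIT/OFFSET and also get the total number of rows https://stackoverflow.com/questions/28888375/run-a-query-with-a-limit-offset-and-also-get-the-total-number-of-rows/28888696#28888696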
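The effect is easy to reproduce with a hypothetical three-row CTE (inline data, for illustration only):

```sql
-- The window function is computed before LIMIT / OFFSET are applied,
-- so every returned row would carry the total of 3 ...
WITH cte(id) AS (VALUES (1), (2), (3))
SELECT id, count(*) OVER () AS total_rows_in_cte
FROM   cte
ORDER  BY id
LIMIT  10
OFFSET 5;  -- ... but OFFSET skips all 3 rows: empty result, no total
```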
Interim solution
Addressing all caveats so far, and based on the added information, we might arrive at this query:
WITH cte AS (
   SELECT DISTINCT ON (person_id) *
   FROM   tbl
   WHERE  status IN ('ACCEPTED', 'CORRECTED')
   ORDER  BY person_id, submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY person_id  -- ?? see below
   LIMIT  10
   OFFSET 0
   ) sub
RIGHT  JOIN (SELECT count(*) FROM cte) c(total_rows_in_cte) ON true;
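Now the CTE is actually used twice. The RIGHT JOIN guarantees we get the total count, no matter the OFFSET. DISTINCT ON should perform OK-ish as long as there are only a few rows per (person_id) in the base query.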
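To see the RIGHT JOIN in isolation, here is a sketch on a hypothetical three-row CTE (inline data, not the real table) with an OFFSET past the end:

```sql
WITH cte(id) AS (VALUES (1), (2), (3))
SELECT *
FROM  (
   SELECT id
   FROM   cte
   ORDER  BY id
   LIMIT  10
   OFFSET 5            -- skips all rows: this subquery is empty
   ) sub
RIGHT  JOIN (SELECT count(*) FROM cte) c(total_rows_in_cte) ON true;
-- Result: a single row with id = NULL and total_rows_in_cte = 3
```

The count subquery always returns exactly one row, and the RIGHT JOIN preserves it even when the paged subquery is empty. The caller has to be prepared for that one row of NULL values.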
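But you have wide rows. How wide on average? The query will likely result in a sequential scan of the whole table. Indexes won't help (much). All of this will remain hugely inefficient for paging. See:
- Optimize query with OFFSET on large table https://stackoverflow.com/questions/34110504/optimize-query-with-offset-on-large-table/34291099#34291099
You cannot use an index for paging, as the paging operates on the derived table from the CTE. And your actual sort criteria for paging are still unclear (ORDER BY id ?). If paging is the goal, you desperately need a different query style. If you are only interested in the first few pages, you need yet another query style. The best solution depends on information still missing from the question ...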
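One such different style is keyset pagination: remember the sort key of the last row on the previous page and continue from there with a row-wise comparison, instead of counting rows with OFFSET. The following is only a sketch under loud assumptions: tbl_demo, its columns, the sort order and the bookmark values are all hypothetical, and the sort columns are NOT NULL (NULL values would complicate the row comparison). It pages over a plain table, not over the "latest row per person" result.

```sql
-- Hypothetical stand-in table with NOT NULL sort columns
CREATE TEMP TABLE tbl_demo (
   id              int  PRIMARY KEY
 , person_id       int  NOT NULL
 , submission_date date NOT NULL
);

INSERT INTO tbl_demo
SELECT g, g % 100, date '2020-01-01' + g % 365
FROM   generate_series(1, 1000) g;

-- Index matching the sort order used for paging
CREATE INDEX ON tbl_demo (submission_date DESC, id DESC);

-- First page
SELECT *
FROM   tbl_demo
ORDER  BY submission_date DESC, id DESC
LIMIT  10;

-- Next page: continue after the last row of the previous page.
-- The two values below are placeholders for that row's sort key.
SELECT *
FROM   tbl_demo
WHERE  (submission_date, id) < (date '2020-12-28', 362)
ORDER  BY submission_date DESC, id DESC
LIMIT  10;
```

Because the WHERE clause matches the index order, each page can start from an index position rather than reading and discarding all preceding rows.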
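Radically faster
For your updated objective:
"Find the latest entries for a person_id by submission_date"
(Ignoring "for specified filter criteria, type, plan, status" for simplicity.)
And:
"Find the latest row per person_id only if that has status IN ('ACCEPTED','CORRECTED')"
Based on these two specialized indices: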
CREATE INDEX ON tbl (submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST)
   WHERE status IN ('ACCEPTED', 'CORRECTED');  -- optional

CREATE INDEX ON tbl (person_id, submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST);
Run this query:
WITH RECURSIVE cte AS (
   (
   SELECT t  -- whole row
   FROM   tbl t
   WHERE  status IN ('ACCEPTED', 'CORRECTED')
   AND    NOT EXISTS (SELECT FROM tbl
                      WHERE  person_id = t.person_id
                      AND   (  submission_date,   last_updated,   id)
                          > (t.submission_date, t.last_updated, t.id)  -- row-wise comparison
                      )
   ORDER  BY submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT (SELECT t1  -- whole row
           FROM   tbl t1
           WHERE ( t1.submission_date, t1.last_updated, t1.id)
               < ((t).submission_date,(t).last_updated,(t).id)  -- row-wise comparison
           AND    t1.status IN ('ACCEPTED', 'CORRECTED')
           AND    NOT EXISTS (SELECT FROM tbl
                              WHERE  person_id = t1.person_id
                              AND   (   submission_date,    last_updated,    id)
                                  > (t1.submission_date, t1.last_updated, t1.id)  -- row-wise comparison
                              )
           ORDER  BY submission_date DESC NULLS LAST, last_updated DESC NULLS LAST, id DESC NULLS LAST
           LIMIT  1)
   FROM   cte c
   WHERE  (t).id IS NOT NULL
   )
SELECT (t).*
FROM   cte
LIMIT  10
OFFSET 0;
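Every set of parentheses here is required.
This level of sophistication should retrieve a relatively small set of top rows radically faster by using the given indices instead of a sequential scan. See:
- Optimize GROUP BY query to retrieve latest row per user https://stackoverflow.com/questions/25536422/optimize-group-by-query-to-retrieve-latest-row-per-user/25536748#25536748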
submission_date should most probably be type timestamptz or date, not character varying(255) - which is an odd type definition in Postgres in any case. See:
- Refactor foreign key to fields https://stackoverflow.com/questions/24558650/refactor-foreign-key-to-fields/24560486#24560486
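If you do change the type, a conversion could look roughly like this. It is a sketch on a hypothetical scratch table (tbl_demo2), and it assumes the stored strings are ISO-formatted dates, which the question does not confirm:

```sql
-- Hypothetical stand-in for the real table
CREATE TEMP TABLE tbl_demo2 (
   id              int PRIMARY KEY
 , submission_date character varying(255)
);

INSERT INTO tbl_demo2 VALUES
   (1, '2020-01-02')
 , (2, '')          -- imperfect input: empty string
 , (3, NULL);

-- Convert in place; NULLIF turns empty strings into NULL,
-- since '' is not valid input for type date.
ALTER TABLE tbl_demo2
   ALTER COLUMN submission_date TYPE date
   USING NULLIF(submission_date, '')::date;
```

Any value that is neither empty nor a valid date makes the ALTER TABLE fail, so malformed rows would have to be cleaned up first. Also note that changing the column type rewrites the table under an exclusive lock.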
Many more details might be optimized, but this is getting out of hand. You might consider professional consulting.