Show partitions with ORDER BY in Amazon Athena

2024-01-25

I have this query:

SHOW PARTITIONS tablename;

The result is:

dt=2018-01-12
dt=2018-01-20
dt=2018-05-21
dt=2018-04-07
dt=2018-01-03

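This gives the list of partitions for the table. The partition field of the table is dt, which is a date column. I want to see the partitions sorted.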

The documentation doesn't explain how to do this: https://docs.aws.amazon.com/athena/latest/ug/show-partitions.html

I tried adding an ORDER BY like this:

SHOW PARTITIONS tablename order by dt;

But it gives:

Amazon Athena; Status Code: 400; Error Code: InvalidRequestException;
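Since SHOW PARTITIONS rejects ORDER BY, one workaround is to sort its output on the client side. Below is a minimal Python sketch, assuming the output has been captured as Hive-style key=value lines (with multiple partition keys separated by /). The helper names partition_values and sort_partitions are hypothetical. Values are compared as strings, which is enough here because zero-padded ISO dates (YYYY-MM-DD) sort correctly as text.

```python
def partition_values(line):
    """Split a Hive-style partition string into a tuple of its values.

    'dt=2018-01-12'      -> ('2018-01-12',)
    'year=2018/month=10' -> ('2018', '10')
    """
    return tuple(seg.partition("=")[2] for seg in line.strip().split("/"))


def sort_partitions(lines):
    """Return the non-empty SHOW PARTITIONS lines sorted by partition value.

    Comparison is plain string comparison, which orders zero-padded
    ISO dates and zero-padded month numbers correctly.
    """
    cleaned = (line.strip() for line in lines if line.strip())
    return sorted(cleaned, key=partition_values)


if __name__ == "__main__":
    output = """dt=2018-01-12
dt=2018-01-20
dt=2018-05-21
dt=2018-04-07
dt=2018-01-03"""
    for p in sort_partitions(output.splitlines()):
        print(p)
```

Sorting by the extracted values rather than by the whole line keeps multi-key partitions ordered key by key in the order they appear in the string.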


AWS currently (as of Nov 2020, see https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-athena-announces-availability-of-engine-version-2/) supports two versions of the Athena engine. How you select and sort partitions depends on which version is used.

Version 1:

Use the information_schema table. Assuming you have year and month as partitions (with a single partition key this is of course simpler):

WITH 
 a as (
SELECT partition_number as pn, partition_key as key, partition_value as val
FROM   information_schema.__internal_partitions__
WHERE  table_schema = 'my_database'
       AND table_name = 'my_table'
 )
SELECT 
  year, month
FROM (
    SELECT val as year, pn FROM a WHERE key = 'year'
) y
JOIN (
    SELECT val as month, pn FROM a WHERE key = 'month'
) m ON m.pn = y.pn
ORDER BY year, month

Its output:

year  month
2018  10
2018  11
2018  12
2019  01
...
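The self-join above turns one row per (partition, key) into one row per partition. As an illustration of that logic only, here is a minimal Python sketch, assuming the rows have already been fetched as (partition_number, partition_key, partition_value) tuples in the shape the CTE a produces. The function name pivot_partitions is hypothetical and the sample rows are made up.

```python
from collections import defaultdict


def pivot_partitions(rows, keys=("year", "month")):
    """Mimic the self-join on partition_number, then ORDER BY the keys.

    rows: iterable of (partition_number, partition_key, partition_value),
    i.e. one row per key per partition.
    Returns one tuple of values per partition, in the order given by `keys`.
    """
    by_partition = defaultdict(dict)
    for pn, key, val in rows:
        by_partition[pn][key] = val
    # Like the inner JOIN: only partitions that have every key survive.
    result = [
        tuple(parts[k] for k in keys)
        for parts in by_partition.values()
        if all(k in parts for k in keys)
    ]
    result.sort()  # ORDER BY year, month (string comparison)
    return result


if __name__ == "__main__":
    sample = [  # hypothetical rows, for illustration only
        (1, "year", "2019"), (1, "month", "01"),
        (2, "year", "2018"), (2, "month", "12"),
        (3, "year", "2018"), (3, "month", "10"),
        (4, "year", "2018"), (4, "month", "11"),
    ]
    for year, month in pivot_partitions(sample):
        print(year, month)
```

Because partition values come back as strings, the ordering is lexicographic; it matches numeric order here only because the months are zero-padded.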

Version 2:

Use the built-in $partitions table, which exposes the partitions explicitly as columns and has a simpler syntax:

SELECT year, month FROM my_database."my_table$partitions" ORDER BY year, month

year  month
2018  10
2018  11
2018  12
2019  01
...
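The table name must be double-quoted because of the $ character. If the query is generated from code, a minimal Python sketch like the following builds it for arbitrary partition columns, for example the single dt column from the question. The helpers quote_ident and partitions_query are hypothetical; quoting every identifier with double quotes (doubling any embedded double quote) is an assumption based on standard SQL identifier quoting.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def partitions_query(database, table, columns):
    """Build an ordered query against the table's "$partitions" table.

    The '$' in the table name is why the identifier has to be quoted.
    """
    cols = ", ".join(quote_ident(c) for c in columns)
    source = f'{quote_ident(database)}.{quote_ident(table + "$partitions")}'
    return f"SELECT {cols} FROM {source} ORDER BY {cols}"


if __name__ == "__main__":
    # For the question's table, partitioned by the single column dt.
    print(partitions_query("my_database", "tablename", ["dt"]))
```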

For more information, see:

https://docs.aws.amazon.com/athena/latest/ug/querying-glue-catalog.html#querying-glue-catalog-listing-partitions
