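Split your string to get an array, explode the array, and join it with the detail table to attach desc to each code (my example uses an inline subquery in its place; use your normal table instead). Then reassemble the string: collect_list(desc) builds an array and concat_ws() joins it into the concatenated string: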
select concat_ws('-',collect_list(d.desc)) as code_desc
from
( --initial string explode
  select explode(split('0-1-3','-')) as code
) s
inner join
( -- use your table instead of this subquery
  select 0 code, 'AAA' desc union all
  select 1, 'BBB' desc union all
  select 2, 'CCC' desc union all
  select 3, 'DDD' desc
) d on s.code=d.code;
Result:
OK
AAA-BBB-DDD
Time taken: 114.798 seconds, Fetched: 1 row(s)
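If you need to preserve the original order, use posexplode: it returns each element together with its position in the original array. You can then order by the record ID and pos before collect_list().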
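To see what posexplode returns, here is a minimal sketch run on the same literal string as in the first query. The column aliases pos and code are simply the names used in this answer, not anything Hive requires.

```sql
-- posexplode emits one row per array element: (position, value).
-- Positions are zero-based.
select posexplode(split('0-1-3','-')) as (pos, code);

-- Expected rows (pos, code):
-- 0    0
-- 1    1
-- 2    3
```

The pos column is what lets you restore the original element order after the join.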
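If your string is a table column, use lateral view to select the exploded values.
Here is a more complex example that preserves the order and uses lateral view.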
select str as original_string, concat_ws('-',collect_list(s.desc)) as transformed_string
from
(
  select s.str, s.pos, d.desc
  from
  ( --initial string explode with ordering by str and pos
    --(better use your table PK, like ID instead of str for ordering), pos
    select str, pos, code from ( --use your table instead of this subquery
      select '0-1-3' as str union all
      select '2-1-3' as str union all
      select '3-2-1' as str
    )s
    lateral view outer posexplode(split(s.str,'-')) v as pos,code
  ) s
  inner join
  ( -- use your table instead of this subquery
    select 0 code, 'AAA' desc union all
    select 1, 'BBB' desc union all
    select 2, 'CCC' desc union all
    select 3, 'DDD' desc
  ) d on s.code=d.code
  distribute by s.str -- this should be record PK candidate
  sort by s.str, s.pos --sort on each reducer
)s
group by str;
Result:
OK
0-1-3 AAA-BBB-DDD
2-1-3 CCC-BBB-DDD
3-2-1 DDD-CCC-BBB
Time taken: 67.534 seconds, Fetched: 3 row(s)
Note that distribute by + sort by is used instead of a simple order by str, pos. Distribute + sort works in fully distributed mode; order by will also work correctly, but on a single reducer.
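For comparison, here is a sketch of the same query with the simpler order by in place of distribute by + sort by. It is the query above with only the ordering clause changed, and it has not been benchmarked here. Expect the ordering step to run on one reducer, which can be slow on large inputs.

```sql
select str as original_string, concat_ws('-',collect_list(s.desc)) as transformed_string
from
(
  select s.str, s.pos, d.desc
  from
  ( -- explode each string, keeping the position of every code
    select str, pos, code from ( -- use your table instead of this subquery
      select '0-1-3' as str union all
      select '2-1-3' as str union all
      select '3-2-1' as str
    )s
    lateral view outer posexplode(split(s.str,'-')) v as pos,code
  ) s
  inner join
  ( -- use your table instead of this subquery
    select 0 code, 'AAA' desc union all
    select 1, 'BBB' desc union all
    select 2, 'CCC' desc union all
    select 3, 'DDD' desc
  ) d on s.code=d.code
  order by s.str, s.pos -- global ordering, done on a single reducer
)s
group by str;
```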