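We receive survey webhook data in BigQuery. Comments written in local languages are captured as Unicode escape sequences, and some comments also contain special characters.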
Example:
- Survey comment: “别老是晚点,现场补行李费太贵”
- Comment as stored in BigQuery: “\u522b\u8001\u662f\u665a\u70b9\uff0c\u73b0\u573a\u8865\u884c\u674e\u8d39\u592a\u8d35”
We found a solution that decodes an individual comment:
CREATE TEMPORARY FUNCTION utf8convert(s STRING)
RETURNS STRING
LANGUAGE js AS """
return unescape( ( s ) );
""";
with sample AS (SELECT '\u522b\u8001\u662f\u665a' AS S)
SELECT utf8convert(s) from sample
When the same function is applied to the comment column, which holds thousands of comments in different languages, it does not work:
CREATE TEMPORARY FUNCTION utf8convert(s STRING)
RETURNS STRING
LANGUAGE js AS """
return unescape( ( s ) );
""";
SELECT Comment, utf8convert(Comment) as Convert
FROM `airasia-nps.nps_production.NPSDashboard_Webhook_Data1`
where Comment is not null
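The query runs without errors, but the Unicode escapes are not converted to the local language. The result still shows the comments as \uXXXX sequences.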
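One plausible explanation, sketched in plain JavaScript rather than inside BigQuery: `unescape` decodes only `%XX` and `%uXXXX` sequences, so a literal backslash followed by `uXXXX` passes through unchanged. The single-comment test appears to work because a non-raw SQL string literal such as `'\u522b'` has its escape resolved by the SQL parser before the UDF ever sees it. The variable names below are illustrative only.

```javascript
// Models the SQL literal '\u522b\u8001': the escape is resolved at parse time,
// so the function receives the real characters (modeled here with JS escapes).
const fromSqlLiteral = "\u522b\u8001";

// Models the table column: a literal backslash followed by 'u522b'.
const fromTable = "\\u522b\\u8001";

// Nothing left to decode, so unescape returns the characters as they are.
console.log(unescape(fromSqlLiteral) === fromSqlLiteral); // true

// unescape ignores \uXXXX, so the stored value comes back unchanged.
console.log(unescape(fromTable) === fromTable); // true

// unescape does decode the percent form.
console.log(unescape("%u522b%u8001") === fromSqlLiteral); // true
```

Under this reading, the UDF never decoded anything in either query; the first one only looked correct because its input was already decoded.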
I have already tried this code:
CREATE TEMP FUNCTION DecodeUnicode(s STRING) AS (
IF(s NOT LIKE '%\\u%', s,
(SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(CAST(CONCAT('0x', x) AS INT64)))
FROM UNNEST(SPLIT(s, '\\u')) AS x
WHERE x != ''))
);
SELECT
original,
DecodeUnicode(original) AS decoded
FROM (
SELECT trim(r'$-\u6599\u91d1\u304c\u9ad8\u3059\u304e\uff01\uff01\uff01') AS original UNION ALL
SELECT trim(r'abcd')
);
This shows an error. I think it is because the comment starts with special characters?
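The leading characters do look like the cause. As a rough illustration in JavaScript (the helper `inspectPieces` is hypothetical, not part of BigQuery), the sketch below mirrors what `SPLIT(s, '\\u')` produces and flags which pieces could survive `CAST(CONCAT('0x', x) AS INT64)`:

```javascript
// Hypothetical helper: split on the literal "\u" marker, drop empty pieces,
// and report whether each remaining piece is purely hexadecimal.
function inspectPieces(s) {
  return s
    .split("\\u")
    .filter((x) => x !== "")
    .map((x) => ({ piece: x, isHex: /^[0-9a-fA-F]+$/.test(x) }));
}

// The failing sample: the first piece is "$-", which is not a hex number.
console.log(inspectPieces(String.raw`$-\u6599\u91d1`));

// A subtler case: "abcd" IS valid hex, so it would be cast without error
// and silently decoded as the single code point 0xABCD.
console.log(inspectPieces(String.raw`abcd\u6599`));
```

In the first sample the piece `$-` cannot be cast to INT64, which matches the error. The same applies to any text that follows an escape, for example `\u8d35 thanks`, because the piece after the split would be `8d35 thanks`.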
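See if this works. It "manually" decodes strings that contain \u by converting each escape to a Unicode code point and then converting the code points to a string. It should also be faster than using JavaScript. Note that it assumes the string consists only of \uXXXX escapes.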
CREATE TEMP FUNCTION DecodeUnicode(s STRING) AS (
IF(s NOT LIKE '%\\u%', s,
(SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(CAST(CONCAT('0x', x) AS INT64)))
FROM UNNEST(SPLIT(s, '\\u')) AS x
WHERE x != ''))
);
SELECT
original,
DecodeUnicode(original) AS decoded
FROM (
SELECT r'\u522b\u8001\u662f\u665a\u70b9\uff0c\u73b0\u573a\u8865\u884c\u674e\u8d39\u592a\u8d35' AS original UNION ALL
SELECT r'abcd'
);
As output, this returns 别老是晚点,现场补行李费太贵 and abcd.
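For comments that mix plain text with escapes, a regular-expression replacement may be more forgiving than splitting. The following is a minimal JavaScript sketch, not a tested BigQuery UDF: the function name `decodeUnicodeEscapes` is hypothetical, and using its body inside a `LANGUAGE js` function is an assumption. It replaces each `\uXXXX` with exactly four hex digits and leaves everything else untouched.

```javascript
// Hypothetical helper: decode every literal \uXXXX escape in a string.
// Text that is not part of an escape (for example "$-") is kept as is.
function decodeUnicodeEscapes(s) {
  if (s === null || s === undefined) return s; // pass NULL-like input through
  return s.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    // fromCharCode works on UTF-16 code units, so two consecutive escapes
    // forming a surrogate pair combine into one character.
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes literal, like r'...' in BigQuery.
const samples = [
  String.raw`\u522b\u8001\u662f\u665a\u70b9`,
  String.raw`$-\u6599\u91d1\u304c\u9ad8\u3059\u304e\uff01\uff01\uff01`,
  "abcd",
];
for (const s of samples) {
  console.log(s, "->", decodeUnicodeEscapes(s));
}
```

The trade-off is the one already mentioned: a JavaScript UDF is likely slower than the pure SQL version, so this is worth considering only when comments are not made up entirely of escapes.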