Javascript charAt() breaks multi-byte strings

2024-01-28

This code crashes in Node.js v0.10.21:

#!/usr/bin/env node
"use strict";

var urlEncoded = 'http://zh.wikipedia.org/wiki/%F0%A8%A8%8F';
var urlDecoded = decodeURI( urlEncoded );
var urlLeafEncoded = urlEncoded.substr( 29 );
var urlLeafDecoded = decodeURIComponent( urlLeafEncoded );
var urlLeafFirstCharacterDecoded = urlLeafDecoded.charAt( 0 );
var urlLeafFirstCharacterEncoded = encodeURIComponent( urlLeafFirstCharacterDecoded );

console.log( 'URL encoded = ' + urlEncoded );
console.log( 'URL decoded = ' + urlDecoded );
console.log( 'URL leaf encoded = ' + urlLeafEncoded );
console.log( 'URL leaf decoded = ' + urlLeafDecoded );
console.log( 'URL leaf first character encoded = ' + urlLeafFirstCharacterEncoded );
console.log( 'URL leaf first character decoded = ' + urlLeafFirstCharacterDecoded );

I get the following error:

var urlLeafFirstCharacterEncoded = encodeURIComponent( urlLeafFirstCharacterDe
                               ^
URIError: URI malformed
    at encodeURIComponent (native)
    at Object.<anonymous> (/media/data/tmp/mwoffliner/test.js:9:36)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:901:3

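Javascript used to handle multi-byte characters correctly, but in this case it does not. It looks as though, although "%F0%A8%A8%8F" represents a single Chinese character, Javascript treats it as two characters. I'm confused whether this is a bug in the Javascript runtime, an encoding problem, or a misunderstanding on my side.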
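A quick way to check what is happening is to inspect the decoded leaf directly. The sketch below (variable names are illustrative, not from the question) shows that the single character U+28A0F occupies two UTF-16 code units in a Javascript string. Note that the last line uses codePointAt, an ES2015 method that is probably not available in Node.js v0.10.21, so it assumes a newer runtime.

```javascript
"use strict";

// The leaf from the question: one character, percent-encoded as 4 UTF-8 bytes.
var leafEncoded = "%F0%A8%A8%8F";
var leafDecoded = decodeURIComponent(leafEncoded);

// .length counts UTF-16 code units, not characters: U+28A0F needs two.
console.log(leafDecoded.length); // 2

// charCodeAt() exposes the two halves of the surrogate pair.
console.log(leafDecoded.charCodeAt(0).toString(16)); // d862
console.log(leafDecoded.charCodeAt(1).toString(16)); // de0f

// codePointAt() (ES2015) reads the pair as a single code point.
console.log(leafDecoded.codePointAt(0).toString(16)); // 28a0f
```

So the runtime is not buggy: it stores one character as two code units, and charAt(0) returns only the first of them.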


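The character U+28A0F (http://www.fileformat.info/info/unicode/char/28a0f/index.htm) lies outside the BMP, and because Javascript strings store text as 16-bit (2-byte) UTF-16 code units, it is represented as a surrogate pair (http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates). charAt(0) returns only the first code unit of that pair, the high surrogate. While encodeURIComponent can operate on a surrogate pair and produce the correct UTF-8 encoding for it, it cannot encode a lone surrogate. So while encodeURIComponent("\uD862\uDE0F") works fine, encodeURIComponent("\uD862\uDE0F".charAt(0)) will fail with "URIError: URI malformed".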
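The behaviour described above can be reproduced in a few lines. This is a minimal sketch assuming an ES2015+ runtime (a modern Node.js); `firstCodePoint` is a hypothetical helper, not part of any library. It takes the first code point instead of the first code unit, so the surrogate pair stays intact before it reaches encodeURIComponent.

```javascript
"use strict";

var leaf = decodeURIComponent("%F0%A8%A8%8F"); // "\uD862\uDE0F"

// Encoding the whole surrogate pair yields the original 4 UTF-8 bytes.
console.log(encodeURIComponent(leaf)); // %F0%A8%A8%8F

// charAt(0) returns only the high surrogate, which has no UTF-8
// encoding on its own, so encodeURIComponent throws a URIError.
try {
  encodeURIComponent(leaf.charAt(0));
} catch (e) {
  console.log(e.name); // URIError
}

// Hypothetical helper (ES2015+): take the first code point, not the
// first code unit. String.fromCodePoint rebuilds the full surrogate pair.
function firstCodePoint(str) {
  return str.length === 0 ? "" : String.fromCodePoint(str.codePointAt(0));
}

console.log(encodeURIComponent(firstCodePoint(leaf))); // %F0%A8%A8%8F
```

`Array.from(str)[0]` is an equivalent ES2015 idiom, since string iteration walks code points. Input that already contains an unpaired surrogate will still make encodeURIComponent throw.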

See http://mathiasbynens.be/notes/javascript-encoding for more details. Also, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent specifically documents this use case.
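Because the question targets Node.js v0.10.21, which likely predates ES2015 string methods such as codePointAt and String.fromCodePoint, the same fix can be sketched in plain ES5 by checking the surrogate ranges by hand. `firstCharacterES5` is a hypothetical name used only for illustration.

```javascript
"use strict";

// Hypothetical ES5-compatible helper: returns the first code point of
// str as a string of one or two UTF-16 code units.
function firstCharacterES5(str) {
  if (str.length === 0) {
    return "";
  }
  var high = str.charCodeAt(0);
  // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
  // (0xDC00-0xDFFF) encodes one code point outside the BMP.
  if (high >= 0xD800 && high <= 0xDBFF && str.length > 1) {
    var low = str.charCodeAt(1);
    if (low >= 0xDC00 && low <= 0xDFFF) {
      return str.slice(0, 2);
    }
  }
  // BMP character (or a malformed lone surrogate): one code unit.
  return str.charAt(0);
}

var leaf = decodeURIComponent("%F0%A8%A8%8F");
console.log(encodeURIComponent(firstCharacterES5(leaf))); // %F0%A8%A8%8F
console.log(firstCharacterES5("wiki")); // w
```

In the question's script, replacing `urlLeafDecoded.charAt( 0 )` with a call like this should keep the pair together.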
