As for the hex encoding that is missing in the related answer https://stackoverflow.com/a/11310258/367456:
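$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
    // $match[0] is a single non-ASCII character, encoded as UTF-8
    list($utf8) = $match;
    // UTF-32BE represents the code point as four big-endian bytes
    $binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
    // read those bytes as an unsigned 32-bit integer and format it as a hex entity
    $entity = vsprintf('&#x%X;', unpack('N', $binary));
    return $entity;
}, $input);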
This is similar to @Baba's answer: it converts to UTF-32BE http://en.wikipedia.org/wiki/UTF-32 and then uses unpack http://php.net/unpack and vsprintf http://php.net/vsprintf for the formatting.
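To make the three steps concrete, here is a minimal sketch that runs them on a single character and keeps the intermediate values. The sample character and the variable names `$hexBytes` and `$parts` are only illustrative, and the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample character; any non-ASCII UTF-8 character works here.
$utf8 = "\u{20AC}"; // the euro sign, U+20AC (this escape syntax needs PHP 7.0+)

// UTF-32BE stores every code point as exactly four big-endian bytes.
$binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
$hexBytes = bin2hex($binary); // "000020ac"

// 'N' reads an unsigned 32-bit big-endian integer; unpack() returns a 1-indexed array.
$parts = unpack('N', $binary); // [1 => 8364]

// vsprintf() takes its arguments as an array, so the unpack() result is passed straight in.
$entity = vsprintf('&#x%X;', $parts); // "&#x20AC;"

echo $hexBytes, "\n", $parts[1], "\n", $entity, "\n";
```

Because unpack already returns an array, vsprintf saves the extra step of pulling the integer out before formatting it.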
If you prefer iconv http://php.net/iconv over mb_convert_encoding http://php.net/mb_convert_encoding, it works much the same way:
$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
    list($utf8) = $match;
    // iconv() takes the source encoding first, then the target encoding
    $binary = iconv('UTF-8', 'UTF-32BE', $utf8);
    $entity = vsprintf('&#x%X;', unpack('N', $binary));
    return $entity;
}, $input);
I find this string manipulation a bit easier to read than the one in Get hexcode of html entities https://stackoverflow.com/q/7482977/367456.
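For a quick check of the whole replacement, it can be wrapped in a function and run on a short string. This is only a sketch: the function name `utf8_to_hex_entities` and the sample input are made up for illustration, and it assumes valid UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper name; the body is the same callback as in the answer.
function utf8_to_hex_entities($input)
{
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
        list($utf8) = $match;
        $binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
        return vsprintf('&#x%X;', unpack('N', $binary));
    }, $input);
}

// Sample input written with escapes so the file encoding does not matter:
// "Gr", U+00FC, U+00DF, "e", space, U+20AC, space, U+1F600.
$input = "Gr\u{FC}\u{DF}e \u{20AC} \u{1F600}";
$output = utf8_to_hex_entities($input);

echo $output, "\n"; // Gr&#xFC;&#xDF;e &#x20AC; &#x1F600;
```

ASCII characters, including `&` and `<`, are below `\x{80}` and pass through unchanged, so this only covers the non-ASCII part of the string.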