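You are running into problems because you neglected to decode binary data into Perl strings on input and to encode Perl strings into binary data on output. This matters because regular expressions and related functions such as split only work correctly on Perl strings, that is, on characters rather than raw bytes.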
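To illustrate the difference between bytes and characters, here is a minimal sketch. It assumes the input arrives as UTF-8; the byte escapes spell out the name 张小三, and the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of the name 张小三, written as escapes so that this
# sketch does not depend on how the source file itself is saved.
my $bytes = "\xE5\xBC\xA0\xE5\xB0\x8F\xE4\xB8\x89";
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints: bytes: 9, characters: 3

# The same split behaves differently on undecoded and decoded data.
my ($first_of_bytes) = split(/(?<=.)/, $bytes, 2);
my ($first_of_chars) = split(/(?<=.)/, $chars, 2);

printf "first piece of byte string: %02X\n", ord($first_of_bytes);
# prints: first piece of byte string: E5
printf "first piece of character string: U+%04X\n", ord($first_of_chars);
# prints: first piece of character string: U+5F20
```

On the undecoded data, split cuts off only the first byte of a three-byte character, which is not a valid character on its own. On the decoded string it cuts off the whole first character, U+5F20 (张).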
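Combined with the limit of 2, the pattern (?<=.) splits the string after its first character. As a result, the program does not handle compound family names (复姓) correctly; they are rare, but they do exist. To always split a name correctly into family name and given name, you need a dictionary of family names.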
Linux version:
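use strict;
use warnings;
use Encode qw(decode encode);

# Assumes this file is saved as UTF-8, so <DATA> yields UTF-8 bytes.
while (my $full_name = <DATA>) {
    # Bytes in: decode to a Perl character string before any text processing.
    $full_name = decode('UTF-8', $full_name);
    chomp $full_name;
    # With a limit of 2, this splits once, right after the first character.
    my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
    # Bytes out: encode the character string again before printing.
    print encode('UTF-8',
        sprintf('The full name is %s, the family name is %s, the given name is %s.', $full_name, $family_name, $given_name)
    );
}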
__DATA__
张小三
Output:
The full name is 张小三, the family name is 张, the given name is 小三.
Windows version:
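use strict;
use warnings;
use Encode qw(decode encode);
use Encode::HanExtra qw();  # provides the GB18030 encoding

# Assumes this file is saved as GB18030, so <DATA> yields GB18030 bytes.
while (my $full_name = <DATA>) {
    # Bytes in: decode to a Perl character string before any text processing.
    $full_name = decode('GB18030', $full_name);
    chomp $full_name;
    # With a limit of 2, this splits once, right after the first character.
    my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
    # Bytes out: encode the character string again before printing.
    print encode('GB18030',
        sprintf('The full name is %s, the family name is %s, the given name is %s.', $full_name, $family_name, $given_name)
    );
}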
__DATA__
张小三
Output:
The full name is 张小三, the family name is 张, the given name is 小三.
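The dictionary approach for compound family names could be sketched roughly as follows. This is only an illustration under stated assumptions: the hash %compound_family_name and the function split_name are hypothetical names, the four entries are a tiny sample rather than a real dictionary, and the file is assumed to be saved as UTF-8 (hence use utf8 for the literals).

```perl
use strict;
use warnings;
use utf8;                       # the literals below are UTF-8 source text
use Encode qw(encode);

# Hypothetical, deliberately tiny sample; a real dictionary would be much larger.
my %compound_family_name = map { $_ => 1 } qw(欧阳 司马 诸葛 上官);

# Takes a decoded Perl string, returns (family name, given name).
sub split_name {
    my ($full_name) = @_;
    # Try a two-character family name first, but only if a given name remains.
    if (length($full_name) > 2) {
        my $prefix = substr($full_name, 0, 2);
        return ($prefix, substr($full_name, 2)) if $compound_family_name{$prefix};
    }
    # Otherwise fall back to the single-character rule.
    return (substr($full_name, 0, 1), substr($full_name, 1));
}

for my $full_name (qw(张小三 欧阳修 司马相如)) {
    my ($family_name, $given_name) = split_name($full_name);
    print encode('UTF-8', "$full_name: family name $family_name, given name $given_name\n");
}
```

A lookup like this cannot resolve truly ambiguous cases, for example a person whose single-character family name happens to be followed by a given name that starts with the second character of a compound family name.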