See Perl's Lingua::EN::Words2Nums http://search.cpan.org/perldoc/Lingua::EN::Words2Nums and Lingua::EN::FindNumber http://search.cpan.org/perldoc/Lingua::EN::FindNumber.
In particular, the source code of Lingua::EN::FindNumber http://cpansearch.perl.org/src/TMTM/Lingua-EN-FindNumber-1.2/lib/Lingua/EN/FindNumber.pm contains:
# This is from Lingua::EN::Words2Nums, after being thrown through
# Regex::PreSuf
my $numbers =
qr/((?:b(?:akers?dozen|illi(?:ard|on))|centillion|d(?:ecilli(?:ard|on)|ozen|u(?:o(?:decilli(?:ard|on)|vigintillion)|vigintillion))|e(?:ight(?:een|ieth|[yh])?|leven(?:ty(?:first|one))?|s)|f(?:i(?:ft(?:een|ieth|[yh])|rst|ve)|o(?:rt(?:ieth|y)|ur(?:t(?:ieth|[yh]))?))|g(?:oogol(?:plex)?|ross)|hundred|mi(?:l(?:ion|li(?:ard|on))|nus)|n(?:aught|egative|in(?:et(?:ieth|y)|t(?:een|[yh])|e)|o(?:nilli(?:ard|on)|ught|vem(?:dec|vigint)illion))|o(?:ct(?:illi(?:ard|on)|o(?:dec|vigint)illion)|ne)|qu(?:a(?:drilli(?:ard|on)|ttuor(?:decilli(?:ard|on)|vigintillion))|in(?:decilli(?:ard|on)|tilli(?:ard|on)|vigintillion))|s(?:core|e(?:cond|pt(?:en(?:dec|vigint)illion|illi(?:ard|on))|ven(?:t(?:ieth|y))?|x(?:decillion|tilli(?:ard|on)|vigintillion))|ix(?:t(?:ieth|y))?)|t(?:ee?n|h(?:ir(?:t(?:een|ieth|y)|d)|ousand|ree)|r(?:e(?:decilli(?:ard|on)|vigintillion)|i(?:gintillion|lli(?:ard|on)))|w(?:e(?:l(?:fth|ve)|nt(?:ieth|y))|o)|h)|un(?:decilli(?:ard|on)|vigintillion)|vigintillion|zero|s))/i;
That code is covered by Perl's Artistic License http://dev.perl.org/licenses/artistic.html.
You can use Regex::PreSuf http://search.cpan.org/perldoc/Regex::PreSuf to automatically factor out common prefixes and suffixes:
#!/usr/bin/perl
use strict;
use warnings;
use Regex::PreSuf;
my %singledigit = (
one => 1,
two => 2,
three => 3,
four => 4,
five => 5,
six => 6,
seven => 7,
eight => 8,
nine => 9,
);
my $singledigit = presuf(keys %singledigit);
print $singledigit, "\n";
my $text = "one two three four five six seven eight nine";
$text =~ s/($singledigit)/$singledigit{$1}/g;
print $text, "\n";
Output:
C:\Temp> cvb
(?:eight|f(?:ive|our)|nine|one|s(?:even|ix)|t(?:hree|wo))
1 2 3 4 5 6 7 8 9
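Note that the substitution above is not anchored, so it would also rewrite the "one" inside "someone". Here is a minimal sketch of one way to guard against that, assuming word boundaries (\b) and case-insensitive matching are what you want. To keep it runnable with core Perl only, it builds a plain alternation instead of calling Regex::PreSuf; the names %word2num and replace_number_words are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table for this sketch; extend it as needed.
my %word2num = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);

# Plain alternation (not Regex::PreSuf), longest words first so that a
# longer word is never shadowed by a shorter one sharing its prefix.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %word2num;

# \b on both sides keeps "one" from matching inside "someone" or "stones".
my $word_re = qr/\b($alternation)\b/i;

# Replace every standalone number word in $text with its digit.
sub replace_number_words {
    my ($text) = @_;
    $text =~ s/$word_re/$word2num{lc $1}/g;   # lc: hash keys are lowercase
    return $text;
}

print replace_number_words("Someone ate Nine of the two dozen stones"), "\n";
# prints: Someone ate 9 of the 2 dozen stones
```

The same anchoring should work with the pattern returned by presuf(), since that is already wrapped in a non-capturing group.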
I am afraid it gets harder after that ;-)