#!/usr/bin/perl
use strict;
use warnings;
my $string = "[p1 text1/label1] [p2 text2/label2] textX/labelX [p3 text3/label3] [...] textY/labelY textZ/labelZ [...]";
# don't split inside the [], i.e. not at blanks that have p\d in front of them
my @items = split(/(?<!p\d)\s+/, $string);
# modify the items that are not inside []
my @new_items = map {
    ($_ =~ m/\[/)                   ? $_
  : ((split("/", $_))[1] eq "IN")   ? "[PP $_]"
  :                                   "[BLA $_]";
} @items;
print join(' ', @new_items), "\n";
Before the edit described further down, this printed:
[p1 text1/label1] [p2 text2/label2] [PP textX/labelX] [p3 text3/label3] [...] [PP textY/labelY] [PP textZ/labelZ] [...]
I assumed PP is the tag you meant to use here; otherwise the map would have to become a bit more complex.
EDIT
I have edited the code in response to your comment. If you use
"[p1 text1/label1] [p2 text2/label2] textX/IN [p3 text3/label3] [...] textY/labelY textZ/labelZ [...]";
as the sample string, this is the output:
[p1 text1/label1] [p2 text2/label2] [PP textX/IN] [p3 text3/label3] [...] [BLA textY/labelY] [BLA textZ/labelZ] [...]
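The decision made inside the map can also be pulled out into a small helper, which makes it easier to test on its own. This is only a sketch: wrap_token is a hypothetical name that does not appear in the script above, and the defined check is an extra assumption so that a token without a "/" falls through to BLA instead of triggering an "uninitialized value" warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# wrap_token is a hypothetical helper, not part of the original script.
sub wrap_token {
    my ($token) = @_;
    return $token if $token =~ /^\[/;       # already inside []: leave untouched
    my $label = (split m{/}, $token)[1];    # the part after the first "/"
    # Assumption: a token with no "/" has no label and is treated like any non-IN label.
    my $tag = (defined $label && $label eq 'IN') ? 'PP' : 'BLA';
    return "[$tag $token]";
}

print wrap_token($_), "\n" for ('textX/IN', 'textY/labelY', '[p1 text1/label1]');
```

With this in place, the map body in the script could shrink to a single call such as map { wrap_token($_) } @items.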
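Just keep one thing in mind: the regex used for split will not work for pn with n > 9, because the lookbehind (?<!p\d) only covers a single digit. If you run into that case, you had better look for an alternative, since variable-length lookbehind is not implemented (or at least not in my version of Perl, 5.10.1).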
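One possible alternative, sketched here under the assumption that brackets are never nested, is to stop splitting on whitespace altogether and match the tokens directly: either a complete [...] group or a run of non-blank characters. The helper name tokenize and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# tokenize is a hypothetical helper, not part of the original script.
# It returns bracketed groups and bare tokens in their original order.
sub tokenize {
    my ($string) = @_;
    # Try a whole [...] group first, otherwise take a run of non-whitespace.
    return $string =~ /(\[[^\]]*\]|\S+)/g;
}

# Made-up sample with multi-digit pn, which the lookbehind version cannot handle.
my $string = "[p10 text1/label1] textX/IN [p123 text3/label3] textY/labelY";
print "$_\n" for tokenize($string);
```

Because no lookbehind is involved, the number of digits after p no longer matters, and the bracket contents do not need to start with pn at all.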
EDIT 2
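In reply to your second comment, here is a modified version of the script. You will notice that I also added something to the sample string, to show that it now works even when there is no pn inside the [...].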
#!/usr/bin/perl
use strict;
use warnings;
my $string = "[p1 text1/label1] [p2 text2/label2] textX/IN [p3 text3/label3] [...] textY/labelY textZ/labelZ [...] xyx/IN [opq rs/abc]";
# we're using a non-greedy match to only capture the contents of one set of [],
# otherwise we'd simply match everything between the first [ and the last ].
# The parentheses around the match ensure that our delimiter is KEPT.
my @items = split(/(\[.+?\])/, $string);
#print "..$_--\n" for @items; # uncomment this to see what the split result looks like
# modify the items that are not inside []
my @new_items = map {
    if (/^\[/) {                      # items in []: keep them unchanged
        $_;
    }
    elsif (/(?: \w)|(?:\w )/) {       # an arbitrary number of items without []
        # split on whitespace and wrap each item on its own
        map { ($_ =~ m/\[/)                   ? $_
            : ((split("/", $_))[1] eq "IN")   ? "[PP $_]"
            :                                   "[BLA $_]";
        } split;
    }
    else {                            # some items are '' or blank, let's just discard those
        ();                           # explicit empty list, so nothing is added to @new_items
    }
} @items;
print join(' ', @new_items), "\n";
The output looks like this:
[p1 text1/label1] [p2 text2/label2] [PP textX/IN] [p3 text3/label3] [...] [BLA textY/labelY] [BLA textZ/labelZ] [...] [PP xyx/IN] [opq rs/abc]
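To see what the capturing parentheses in the split pattern change, here is a small side-by-side sketch. The sample string is shortened and made up; the two patterns differ only in the parentheses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sample = "[a b] c/IN [d e]";   # shortened, made-up sample string

my @kept    = split /(\[.+?\])/, $sample;  # delimiter captured: the [] groups are returned too
my @dropped = split /\[.+?\]/,   $sample;  # no capture: the [] groups are thrown away

# Angle brackets make empty and whitespace-only pieces visible.
print "kept:    ", join('|', map { "<$_>" } @kept),    "\n";
print "dropped: ", join('|', map { "<$_>" } @dropped), "\n";
```

The captured variant also yields an empty leading piece (the string starts with a delimiter) and pieces padded with blanks, which is why the script needs a branch that discards empty items and a second, whitespace-based split for the rest.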
I noticed that you have already received the help you needed, but I thought I could still answer your question...