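You can use preg_split() (http://php.net/preg_split) combined with a PCRE lookbehind assertion (http://www.regular-expressions.info/lookaround.html) to split the string at the whitespace that follows each occurrence of ".", ";", ":", "?" or "!", while keeping the punctuation itself intact: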
Code:
$subject = 'abc sdfs. def ghi; this is an.email@example.com! asdasdasd? abc xyz';
// split on whitespace that is preceded by a punctuation mark;
// the dots inside the e-mail address are not followed by whitespace, so they are left alone
$result = preg_split('/(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => def ghi;
[2] => this is an.email@example.com!
[3] => asdasdasd?
[4] => abc xyz
)
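To illustrate why the lookbehind is what keeps the punctuation, here is a small sketch that compares it with a pattern that consumes the punctuation as part of the delimiter. The sample string and the variable names $consumed and $kept are made up for this illustration.

```php
<?php
$subject = 'One. Two; three? Four';

// The punctuation is part of the delimiter here, so preg_split() drops it.
$consumed = preg_split('/[.?!;:]\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);

// The lookbehind only checks for the punctuation without consuming it,
// so only the whitespace is removed and each mark stays with its sentence.
$kept = preg_split('/(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);

print_r($consumed); // One, Two, three, Four
print_r($kept);     // One., Two;, three?, Four
```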
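You can also add a blacklist of abbreviations (Mr., Mrs., Dr., ...) that should not be split off into their own sentences, by inserting a negative lookbehind assertion:
Code:
$subject = 'abc sdfs. Dr. Foo said he is not a sentence; asdasdasd? abc xyz';
// split on whitespace preceded by a punctuation mark, unless that mark ends a blacklisted abbreviation
// (the dots are escaped so they match a literal period rather than any character)
$result = preg_split('/(?<!Mr\.|Mrs\.|Dr\.)(?<=[.?!;:])\s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);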
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => Dr. Foo said he is not a sentence;
[2] => asdasdasd?
[3] => abc xyz
)
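If the blacklist grows, it may be easier to build the pattern from an array than to edit the regex by hand. The following is only a sketch under that assumption: the function name splitSentences() and its default abbreviation list are hypothetical and not part of PHP. It emits one negative lookbehind per abbreviation and uses preg_quote() to escape the dots.

```php
<?php
/**
 * Hypothetical helper: split $text into sentences at whitespace that
 * follows . ? ! ; or :, except after the given abbreviations.
 */
function splitSentences(string $text, array $abbreviations = ['Mr.', 'Mrs.', 'Dr.']): array
{
    $guards = '';
    foreach ($abbreviations as $abbr) {
        // One fixed-length negative lookbehind per abbreviation;
        // preg_quote() escapes the dot and the "/" delimiter.
        $guards .= '(?<!\b' . preg_quote($abbr, '/') . ')';
    }

    $pattern = '/' . $guards . '(?<=[.?!;:])\s+/';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure; fall back to the unsplit text.
    return $parts === false ? [$text] : $parts;
}

print_r(splitSentences('abc sdfs. Dr. Foo said he is not a sentence; asdasdasd? abc xyz'));
```

Using a separate lookbehind for each abbreviation keeps every assertion fixed-length, which PCRE requires, and an empty list simply falls back to the plain punctuation rule.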