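I'd like to share this fairly clever problem with everyone here.
I'm trying to remove unbalanced/unpaired double quotes from a string.
I've been working on it and may be close to a solution, but I haven't found one that works yet: I still can't remove the unpaired double quotes from the string.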
Sample input
string1=injunct! alter ego."
string2=successor "alter ego" single employer" "proceeding "citation assets"
The output should be
string1=injunct! alter ego.
string2=successor "alter ego" single employer proceeding "citation assets"
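This problem sounds similar to "Using Java remove unbalanced/unpartnered parenthesis": https://stackoverflow.com/questions/9898455/using-java-remove-unbalanced-unpartnered-paranthesis
Here is my code so far (it does not remove all of the unpaired double quotes):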
private String removeUnattachedDoubleQuotes(String stringWithDoubleQuotes) {
    String firstPass = "";
    String openingQuotePattern = "\\\"[a-z0-9\\p{Punct}]";
    String closingQuotePattern = "[a-z0-9\\p{Punct}]\\\"";
    int doubleQuoteLevel = 0;
    for (int i = 0; i < stringWithDoubleQuotes.length() - 3; i++) {
        String c = stringWithDoubleQuotes.substring(i, i + 2);
        if (c.matches(openingQuotePattern)) {
            doubleQuoteLevel++;
            firstPass += c;
        }
        else if (c.matches(closingQuotePattern)) {
            if (doubleQuoteLevel > 0) {
                doubleQuoteLevel--;
                firstPass += c;
            }
        }
        else {
            firstPass += c;
        }
    }
    String secondPass = "";
    doubleQuoteLevel = 0;
    for (int i = firstPass.length() - 1; i >= 0; i--) {
        String c = stringWithDoubleQuotes.substring(i, i + 2);
        if (c.matches(closingQuotePattern)) {
            doubleQuoteLevel++;
            secondPass = c + secondPass;
        }
        else if (c.matches(openingQuotePattern)) {
            if (doubleQuoteLevel > 0) {
                doubleQuoteLevel--;
                secondPass = c + secondPass;
            }
        }
        else {
            secondPass = c + secondPass;
        }
    }
    String result = secondPass;
    return result;
}
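For comparison, here is a minimal sketch of the two-pass balancing idea from the linked parenthesis question, repaired to look at one character at a time. (The code above appends overlapping two-character windows, so characters get duplicated, and its second loop reads from stringWithDoubleQuotes instead of firstPass.) The opener/closer rule is an assumption of this sketch, not something from the question: a quote counts as an opener when it follows whitespace or the start of the string and precedes a non-whitespace character, as a closer when it follows a non-whitespace character, and anything else is treated as stray. The class name UnpairedQuotes is hypothetical.

```java
import java.util.Arrays;

final class UnpairedQuotes {

    // Assumed heuristic: opener = whitespace/start before, non-whitespace after.
    private static boolean isOpener(String s, int i) {
        boolean spaceBefore = i == 0 || Character.isWhitespace(s.charAt(i - 1));
        boolean textAfter = i + 1 < s.length() && !Character.isWhitespace(s.charAt(i + 1));
        return spaceBefore && textAfter;
    }

    // Assumed heuristic: closer = non-whitespace character right before the quote.
    private static boolean isCloser(String s, int i) {
        return i > 0 && !Character.isWhitespace(s.charAt(i - 1));
    }

    static String removeUnattachedDoubleQuotes(String s) {
        boolean[] keep = new boolean[s.length()];
        Arrays.fill(keep, true);

        // Pass 1 (left to right): drop closers with no opener before them, and stray quotes.
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '"') continue;
            if (isOpener(s, i)) {
                depth++;
            } else if (isCloser(s, i) && depth > 0) {
                depth--;
            } else {
                keep[i] = false;
            }
        }

        // Pass 2 (right to left): drop surviving openers with no closer after them.
        depth = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            if (s.charAt(i) != '"' || !keep[i]) continue;
            if (isCloser(s, i)) {
                depth++;
            } else if (depth > 0) { // an opener that survived pass 1
                depth--;
            } else {
                keep[i] = false;
            }
        }

        // Copy every character that was not marked for removal.
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (keep[i]) out.append(s.charAt(i));
        }
        return out.toString();
    }
}
```

Marking positions in a boolean array, instead of rebuilding strings in each pass, keeps the second pass aligned with the original indices.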
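If there is no nesting, this can probably be done with a single regex.
The delimiters here are only loosely defined, so you can "bias"
the rules to get better results.
It all depends on which rules you set. This regex considers
three possible cases, in this order:
- valid pair
- invalid pair (with bias)
- invalid single quote
It also won't match a pair of quotes across the end of a line, but it does handle
multiple lines combined into a single string. To change that, remove the \n
wherever you see it.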
Global context, raw find regex (compact form):
(?:("[a-zA-Z0-9\p{Punct}][^"\n]*(?<=[a-zA-Z0-9\p{Punct}])")|(?<![a-zA-Z0-9\p{Punct}])"([^"\n]*)"(?![a-zA-Z0-9\p{Punct}])|")
Replacement (group references):
$1$2 or \1\2
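In Java, the same find/replace could look like the sketch below, assuming each input is handled as its own string; the class and method names QuoteRegex and clean are hypothetical. In the string literal every backslash is doubled and every " is escaped. Java's Matcher expands a group that did not take part in the match to an empty string, so when only the last branch matches, "$1$2" deletes the lone quote.

```java
import java.util.regex.Pattern;

final class QuoteRegex {
    // The compact regex above, written as a Java string literal.
    private static final Pattern QUOTES = Pattern.compile(
        "(?:(\"[a-zA-Z0-9\\p{Punct}][^\"\\n]*(?<=[a-zA-Z0-9\\p{Punct}])\")"  // group 1: valid pair
        + "|(?<![a-zA-Z0-9\\p{Punct}])\"([^\"\\n]*)\"(?![a-zA-Z0-9\\p{Punct}])" // group 2: text of an invalid pair
        + "|\")");                                                              // lone quote

    static String clean(String s) {
        // Keep group 1 (whole valid pair) or group 2 (text without its quotes); otherwise nothing.
        return QUOTES.matcher(s).replaceAll("$1$2");
    }
}
```

One difference to keep in mind: Java's \p{Punct} is the ASCII punctuation set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~, while Perl's \p{Punct} is the Unicode punctuation category, so symbols such as $ or + can be classified differently by the two versions.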
Expanded raw regex:
(?:                                   // Grouping
     // Try to line up a valid pair
     (                                // Capt grp (1) start
          "                           // "
          [a-zA-Z0-9\p{Punct}]        // 1 of [a-zA-Z0-9\p{Punct}]
          [^"\n]*                     // 0 or more [^"\n] characters (anything except " or newline)
          (?<=[a-zA-Z0-9\p{Punct}])   // 1 of [a-zA-Z0-9\p{Punct}] behind us
          "                           // "
     )                                // End capt grp (1)
  |                                   // OR, try to line up an invalid pair
     (?<![a-zA-Z0-9\p{Punct}])        // Bias, not 1 of [a-zA-Z0-9\p{Punct}] behind us
     "                                // "
     ( [^"\n]* )                      // Capt grp (2) - 0 or more [^"\n] characters
     "                                // "
     (?![a-zA-Z0-9\p{Punct}])         // Bias, not 1 of [a-zA-Z0-9\p{Punct}] ahead of us
  |                                   // OR, this single " is considered invalid
     "                                // "
)                                     // End Grouping
Perl test case (no Java)
use strict;

my $str = '
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
';
print "\n'$str'\n";

$str =~ s/
    (?:
        (
            "[a-zA-Z0-9\p{Punct}]
            [^"\n]*
            (?<=[a-zA-Z0-9\p{Punct}])
            "
        )
      |
        (?<![a-zA-Z0-9\p{Punct}])
        "
        ( [^"\n]* )
        " (?![a-zA-Z0-9\p{Punct}])
      |
        "
    )
/$1$2/xg;

print "\n'$str'\n";
Output
'
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
'
'
string1=injunct! alter ego.
string2=successor "alter ego" single employer "a" free proceeding "citation assets"
'
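A Java counterpart of the Perl test might look like the following sketch. It uses Pattern.COMMENTS, Java's equivalent of Perl's /x flag, so the pattern keeps the expanded layout (with # comments, since // is not comment syntax inside a regex). Lines are joined with real newlines so each # comment ends where it should. The class name QuoteRegexVerbose is hypothetical; the input is the two-line Perl test string without its leading and trailing newline.

```java
import java.util.regex.Pattern;

final class QuoteRegexVerbose {
    // Whitespace and #-comments are ignored in COMMENTS mode, as with Perl's /x.
    private static final Pattern QUOTES = Pattern.compile(String.join("\n",
        "(?:",
        "    (\"[a-zA-Z0-9\\p{Punct}][^\"\\n]*(?<=[a-zA-Z0-9\\p{Punct}])\")   # group 1: valid pair",
        "  | (?<![a-zA-Z0-9\\p{Punct}]) \" ([^\"\\n]*) \" (?![a-zA-Z0-9\\p{Punct}])  # group 2: invalid pair",
        "  | \"                                                               # single invalid quote",
        ")"), Pattern.COMMENTS);

    static String clean(String s) {
        // [^"\n] stops a match at a line end, so each line is handled on its own.
        return QUOTES.matcher(s).replaceAll("$1$2");
    }
}
```

Running clean on that two-line input should give the same two lines as the Perl output above.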