更直接的解决方案
这个解决方案将省略空子串或仅包含空格的子串
public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
List<String> subTreeList = new ArrayList<String>();
int level = 0;
int lastCloseBracket= 0;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == markStart) {
level++;
if (level == 1 && i != 0 && i!=lastCloseBracket &&
!s.substring(lastCloseBracket, i).trim().isEmpty()) {
subTreeList.add(s.substring(lastCloseBracket, i).trim());
}
}
} else if (c == markEnd) {
if (level > 0) {
level--;
lastCloseBracket = i+1;
}
}
}
if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
subTreeList.add(s.substring(lastCloseBracket).trim());
}
return subTreeList;
}
然后,将其用作
String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
List<String> between_balanced = getStrsBetweenBalancedSubstrings(input, '[', ']');
System.out.println("Result: " + between_balanced);
// => Result: [Jim ate a, with, and, and ], and more here]
原始答案(更复杂,显示了提取嵌套括号的方法)
您还可以提取平衡括号内的所有子字符串,然后用它们分割:
String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
System.out.println("Balanced ones: " + balanced);
List<String> rx_split = new ArrayList<String>();
for (String item : balanced) {
rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
}
String rx = String.join("|", rx_split);
System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));
这个函数会找到所有[]
- 平衡子串:
public static List<String> getBalancedSubstrings(String s, Character markStart,
Character markEnd, Boolean includeMarkers) {
List<String> subTreeList = new ArrayList<String>();
int level = 0;
int lastOpenBracket = -1;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == markStart) {
level++;
if (level == 1) {
lastOpenBracket = (includeMarkers ? i : i + 1);
}
}
else if (c == markEnd) {
if (level == 1) {
subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
}
if (level > 0) level--;
}
}
return subTreeList;
}
See IDEONE演示
代码执行结果:
Balanced ones: ['[sandwich], [pickles], [dried [onions]]', '[an[other] match]']
In-betweens: ['Jim ate a', 'with', 'and', 'and ]']
学分:getBalancedSubstrings
是基于彼得·默里·拉斯特的答案如何在Java正则表达式中分割这个“树状”字符串? post.