import re

text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."
pattern = '[A-Z]+[A-Z]+[A-Z]*[\s]+'
re.findall(pattern, text)
This gives the output -->
['TEXT ', 'CONTAINING ', 'UPPER ', 'CASE ', 'WORDS ', 'SECOND ', 'SENTENCE ']
But I want output like this -->
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
You can use this regex:
\b[A-Z]+(?:\s+[A-Z]+)*\b
RegEx Demo: https://regex101.com/r/XyMdgv/1
RegEx details:
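- \b : word boundary
- [A-Z]+ : match a word made up only of uppercase letters
- (?:\s+[A-Z]+)* : match 1 or more whitespace characters followed by another uppercase-only word; this group is matched 0 or more times
- \b : word boundary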
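The breakdown above can also be written directly into the pattern. This is only a sketch of the same regex using Python's re.VERBOSE flag, which ignores whitespace inside the pattern and allows inline comments; the name UPPER_RUN is made up for illustration.

```python
import re

# Same regex as above, spelled out piece by piece.
# UPPER_RUN is a hypothetical name, not part of the original answer.
UPPER_RUN = re.compile(
    r"""
    \b              # word boundary
    [A-Z]+          # a word made up only of uppercase letters
    (?:\s+[A-Z]+)*  # whitespace plus another uppercase word, 0 or more times
    \b              # word boundary
    """,
    re.VERBOSE,
)

text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."
matches = UPPER_RUN.findall(text)
print(matches)
```

Because the group is non-capturing (?:...), findall returns the whole match for each run instead of only the last repeated word.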
Code:
>>> import re
>>> s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
>>> print(re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', s))
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
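One thing to be aware of: since [A-Z]+ accepts a single letter, standalone words such as "I" or "A" also count as matches. If that is not wanted, a possible variation is to require a minimum number of letters per word. The helper below is a rough sketch; find_upper_runs and min_letters are names invented here, not part of the original answer.

```python
import re

def find_upper_runs(text, min_letters=1):
    """Return runs of consecutive all-uppercase words found in text.

    min_letters is a hypothetical knob: each word in a run must have at
    least this many letters. With min_letters=1 the pattern behaves like
    \\b[A-Z]+(?:\\s+[A-Z]+)*\\b from the answer above.
    """
    word = r"[A-Z]{%d,}" % min_letters
    pattern = r"\b%s(?:\s+%s)*\b" % (word, word)
    return re.findall(pattern, text)

sample = "I saw A BIG RED dog"
print(find_upper_runs(sample))                 # single letters are included
print(find_upper_runs(sample, min_letters=2))  # single letters are skipped
```

Note that with min_letters=2 a single-letter uppercase word also ends a run, so "AB C DE" would come back as two separate matches.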