I want to remove the entire link:
https://www.linkedin.com/in/ACoAAAJv1l4BATlBOVqhEEaqrVNojJPWnID9Nk0
whenever the link contains ACo.
The regular expression should remove every link that matches my pattern.
regex2 = re.compile(r"\bhttps?://www.linkedin.com/in/\b[^in]+")
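For some reason I can't get this to work. The idea is to remove the link when the part after /in/ starts with "ACo" (capital A and capital C).
We have 4 links, and I only want to print https://www.linkedin.com/in/joao1 and https://www.linkedin.com/in/joao2.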
unique_hrefs = ['https://www.linkedin.com/in/joao1','https://www.linkedin.com/in/joao2','https://www.linkedin.com/in/ACoAAAI3JyABlHv1LxXa27GHFneEbdrqAtMu9eY','https://www.linkedin.com/in/ACoAABWYG0kB8IXhFzDTCFGOwAZ18YbXprOLcmg']
regex = re.compile(r"\bhttps?://www.linkedin.com/in/\b[^in]+")
regex2 = re.compile(r"""\bhttps?://www\.linkedin\.com/in/ACo[^<>"'\s]*""")
filtered = [i for i in unique_hrefs if regex.search(i) and regex2.search(i)]
for i in filtered:
    print(i)
Use a pattern that matches the ACo links and keep only the URLs that do not match it:
import re
unique_hrefs = ['https://www.linkedin.com/in/joao1','https://www.linkedin.com/in/joao2','https://www.linkedin.com/in/ACoAAAI3JyABlHv1LxXa27GHFneEbdrqAtMu9eY','https://www.linkedin.com/in/ACoAABWYG0kB8IXhFzDTCFGOwAZ18YbXprOLcmg']
pattern = re.compile(r'https?://www\.linkedin\.com/in/ACo')
results = list(filter(lambda x: not pattern.match(x), unique_hrefs))
print(results)
See the Python proof.
Results: ['https://www.linkedin.com/in/joao1', 'https://www.linkedin.com/in/joao2']
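If the links are embedded in a longer string rather than stored in a list, the same idea could be applied with re.sub. The following is only a sketch under that assumption: it reuses the second pattern from the question, while the function name strip_aco_links and the sample text are made up for illustration and do not come from the original post.

```python
import re

# Pattern taken from the question: an ACo-prefixed profile link, running up to
# the next whitespace, quote, or angle bracket.
ACO_LINK = re.compile(r"""\bhttps?://www\.linkedin\.com/in/ACo[^<>"'\s]*""")


def strip_aco_links(text):
    """Return text with every ACo-prefixed LinkedIn profile link removed."""
    return ACO_LINK.sub("", text)


# Hypothetical sample text (an assumption, not data from the question).
sample = (
    "keep https://www.linkedin.com/in/joao1 "
    "drop https://www.linkedin.com/in/ACoAAAI3JyABlHv1LxXa27GHFneEbdrqAtMu9eY end"
)
print(strip_aco_links(sample))
```

As a side note on the original attempt, [^in]+ is a negated character class: it matches one or more characters that are neither i nor n. It does not mean "not followed by in", which is likely why that pattern did not behave as intended.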