假设我有一个 HTML<p>
and <br>
里面有标签。之后,我将剥离 HTML 以清理标签。我怎样才能把它们变成换行符?
我正在使用Python的美丽汤 http://www.crummy.com/software/BeautifulSoup/图书馆,如果有帮助的话。
如果没有一些细节,很难确定这完全符合您的要求,但这应该给您一个想法......它假设您的 b 标签包含在 p 元素内。
from BeautifulSoup import BeautifulSoup
import six
def replace_with_newlines(element):
text = ''
for elem in element.recursiveChildGenerator():
if isinstance(elem, six.string_types):
text += elem.strip()
elif elem.name == 'br':
text += '\n'
return text
page = """<html>
<body>
<p>America,<br>
Now is the<br>time for all good men to come to the aid<br>of their country.</p>
<p>pile on taxpayer debt<br></p>
<p>Now is the<br>time for all good men to come to the aid<br>of their country.</p>
</body>
</html>
"""
soup = BeautifulSoup(page)
lines = soup.find("body")
for line in lines.findAll('p'):
line = replace_with_newlines(line)
print line
运行此结果...
(py26_default)[mpenning@Bucksnort ~]$ python thing.py
America,
Now is the
time for all good men to come to the aid
of their country.
pile on taxpayer debt
Now is the
time for all good men to come to the aid
of their country.
(py26_default)[mpenning@Bucksnort ~]$
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)