How do I extract data from sample HTML with BeautifulSoup?
<Tag1>
<message code="able to extract text from here"/>
<text value="able to extract text that is here"/>
<htmlText><![CDATA[<p>some thing <lite>OR</lite>get exact data from here</p>]]></htmlText>
</Tag1>
I have tried both .findall and .get_text, but I cannot extract the text value from the htmlText element.
Expected output:
some thing ORget exact data from here
You can use BeautifulSoup twice: first extract the htmlText element, then parse its contents as HTML. For example:
from bs4 import BeautifulSoup
html = """
<Tag1>
<message code="able to extract text from here"/>
<text value="able to extract text that is here"/>
<htmlText><![CDATA[<p>some thing <lite>OR</lite>get exact data from here</p>]]></htmlText>
</Tag1>
"""
soup = BeautifulSoup(html, "lxml")
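for tag1 in soup.find_all("tag1"):  # the HTML parser lowercases tag names
    # First pass: pull the raw markup held inside the <htmlText> CDATA section
    cdata_html = tag1.htmltext.text
    # Second pass: parse that markup and read the text of its <p> element
    cdata_soup = BeautifulSoup(cdata_html, "lxml")
    print(cdata_soup.p.text)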
This displays:
some thing ORget exact data from here
Note: the lxml parser (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser) also needs to be installed, using pip install lxml. BeautifulSoup imports it automatically.
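If installing lxml is not an option, the same two-pass idea can be sketched with the standard library alone. This is only an illustration, not part of the answer above: it assumes the outer document is well-formed XML, and the names XML_DOC, TextCollector, and extract_html_text are hypothetical helpers introduced here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Same sample document as in the question (assumed to be well-formed XML).
XML_DOC = """
<Tag1>
<message code="able to extract text from here"/>
<text value="able to extract text that is here"/>
<htmlText><![CDATA[<p>some thing <lite>OR</lite>get exact data from here</p>]]></htmlText>
</Tag1>
"""


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of character data between tags.
        self.parts.append(data)


def extract_html_text(xml_doc):
    # Pass 1: the XML parser unwraps the CDATA section into a plain string.
    root = ET.fromstring(xml_doc.strip())
    fragment = root.findtext("htmlText", default="")
    # Pass 2: parse that string as HTML and keep only the character data.
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.parts)


print(extract_html_text(XML_DOC))
```

Unlike the BeautifulSoup version, ElementTree is case-sensitive, so the element is looked up as htmlText rather than htmltext.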