lxml cssselect 解析

2024-02-23

我有一个包含以下数据的文档:

<div class="ds-list">
    <b>1. </b> 
    A domesticated carnivorous mammal 
    <i>(Canis familiaris)</i> 
    related to the foxes and wolves and raised in a wide variety of breeds.
</div>

我想得到课堂上的一切ds-list(没有<b> and <i>标签)。目前我的代码是doc.cssselect('div.ds-list'),但所有这些都会出现在之前的换行符<b>。我怎样才能让它做我想做的事?


也许您正在寻找text_content方法?:

import lxml.html as lh
content='''\
<div class="ds-list">
    <b>1. </b> 
    A domesticated carnivorous mammal 
    <i>(Canis familiaris)</i> 
    related to the foxes and wolves and raised in a wide variety of breeds.
</div>'''
doc=lh.fromstring(content)
for div in doc.cssselect('div.ds-list'):
    print(div.text_content())

yields

1.  
A domesticated carnivorous mammal 
(Canis familiaris) 
related to the foxes and wolves and raised in a wide variety of breeds.
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

lxml cssselect 解析 的相关文章

随机推荐