If your element contains only text, use the .string attribute http://www.crummy.com/software/BeautifulSoup/bs4/doc/#string:
headline = soup.find(class_='cd__headline-text')
print(headline.string)
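If the element contains other tags, you can either get all the text in the current element and everything nested below it, or get only specific strings from the current element.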
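The element.get_text() function http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text recurses through the element and its children, collects all the strings, joins them with a separator of your choice (the empty string by default), and optionally strips whitespace from each one.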
To get only specific strings, you can iterate over the .strings or .stripped_strings generators http://www.crummy.com/software/BeautifulSoup/bs4/doc/#strings-and-stripped-strings, or use the element's contents http://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children to access all contained elements and then pick out those of the NavigableString type.
Demo using your sample:
>>> from bs4 import BeautifulSoup
>>> markup = '<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')
>>> headline = soup.find(class_='cd__headline-text')
>>> print(headline.string)
Is this model too thin for Yves Saint Laurent?
>>> print(list(headline.strings))
['Is this model too thin for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model too thin for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent?
>>> print(headline.get_text(strip=True))
Is this model too thin for Yves Saint Laurent?
And with an additional element added:
>>> markup = '<span class="cd__headline-text">Is this model <em>too thin</em> for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')
>>> headline = soup.find(class_='cd__headline-text')
>>> headline.string is None
True
>>> print(list(headline.strings))
['Is this model ', 'too thin', ' for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model', 'too thin', 'for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent?
>>> print(headline.get_text(' - ', strip=True))
Is this model - too thin - for Yves Saint Laurent?
>>> headline.contents
['Is this model ', <em>too thin</em>, ' for Yves Saint Laurent? ']
>>> from bs4 import NavigableString
>>> [el for el in headline.children if isinstance(el, NavigableString)]
['Is this model ', ' for Yves Saint Laurent? ']
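If you need the "direct text only" case more than once, the NavigableString filter above can be wrapped in a small helper. This is only a sketch under some assumptions: own_text is a hypothetical name (it is not part of BeautifulSoup), the code targets Python 3, and it uses the built-in 'html.parser' backend.

```python
from bs4 import BeautifulSoup, NavigableString


def own_text(tag, separator=" "):
    """Hypothetical helper: join only the strings that are direct children of tag."""
    parts = (
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    )
    # Drop pieces that were whitespace only before joining.
    return separator.join(part for part in parts if part)


markup = (
    '<span class="cd__headline-text">Is this model <em>too thin</em> '
    'for Yves Saint Laurent? </span>'
)
soup = BeautifulSoup(markup, "html.parser")
headline = soup.find(class_="cd__headline-text")

# Direct text only: the nested <em> element is skipped.
print(own_text(headline))
# All text, including nested tags, for comparison.
print(headline.get_text(" ", strip=True))
```

Note that Comment objects are a subclass of NavigableString, so HTML comments that are direct children would be included by this filter; exclude them explicitly if that matters for your markup.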