import mechanize
br = mechanize.Browser()
br.open("http://www.example.com/")
# follow the second link (nr is zero-based) whose text matches the regular expression
html_response = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
print(br.title())
print(html_response.get_data())  # the page body; printing the response object itself only shows its repr
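The text_regex/nr pair is easy to misread: nr counts only the links whose text matches the pattern, starting from 0, so nr=1 means "the second matching link", not "the second link on the page". The sketch below models that selection with plain re over a hypothetical list of link texts. It assumes an unanchored (search-style) match and is only an approximation for illustration, not mechanize's actual code; pick_link is a made-up helper.

```python
import re

# Hypothetical link texts standing in for the links on a page
# (with mechanize these would come from br.links()).
link_texts = [
    "Home",
    "Cheese Shop",        # no match: the pattern is case-sensitive
    "cheese shop (old)",  # first match  -> index 0
    "cheeseshop",         # second match -> index 1, what nr=1 selects
    "Contact",
]


def pick_link(texts, text_regex, nr=0):
    """Return the nr-th (zero-based) text that re.search() matches (hypothetical helper)."""
    matches = [t for t in texts if re.search(text_regex, t)]
    if nr >= len(matches):
        raise LookupError("no link matching %r at index %d" % (text_regex, nr))
    return matches[nr]


print(pick_link(link_texts, r"cheese\s*shop", nr=1))  # -> cheeseshop
```

Note that \s* also allows zero spaces, which is why "cheeseshop" counts as a match.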
Beautiful Soup makes it very easy to parse HTML content (which you can fetch with mechanize), and it supports regular expressions.
Some example code:
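from bs4 import BeautifulSoup  # Beautiful Soup 4; the old BeautifulSoup 3 package only runs on Python 2
soup = BeautifulSoup(html_response, "html.parser")  # accepts a file-like response object or a string
rows = soup.find_all('tr')
for r in rows[2:]:  # ignore the first two rows
    cols = r.find_all('td')
    if cols:  # skip rows that have no <td> cells
        print(cols[0].decode_contents().strip())  # print the content of the first column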
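To illustrate the regular-expression support mentioned above: in Beautiful Soup 4, a compiled pattern can be passed to find_all() either as the tag-name filter (matched against tag names) or as the string filter (matched against text nodes). This is a small sketch that assumes bs4 is installed and uses a hypothetical table instead of a fetched page.

```python
import re

from bs4 import BeautifulSoup  # assumes Beautiful Soup 4 (pip install beautifulsoup4)

# Hypothetical sample markup standing in for a page fetched with mechanize.
html = """
<table>
  <tr><th>Name</th><th>Shop</th></tr>
  <tr><td>Wensleydale</td><td>Cheese Shop</td></tr>
  <tr><td>Cheddar</td><td>cheeseshop</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Regex as the tag-name filter: matches both header (<th>) and data (<td>) cells.
cells = soup.find_all(re.compile(r"^t[dh]$"))
print([c.get_text() for c in cells])

# Regex as the string filter: matches text nodes rather than tags.
shops = soup.find_all(string=re.compile(r"cheese\s*shop", re.IGNORECASE))
print([str(s) for s in shops])
```

Matching on text rather than on tag names is often the more robust choice when the page's markup is inconsistent but the visible labels are stable.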