I am trying to scrape this page using the python-requests library:
import requests
from lxml import etree,html
url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'
r = requests.get(url)
tree = etree.HTML(r.text)
print tree
But I get the above error (TooManyRedirects).
I tried passing the allow_redirects parameter, but I get the same error:
r = requests.get(url, allow_redirects=True)
I even tried sending headers and data along with the URL, but I'm not sure whether this is the correct way to do it.
headers = {'content-type': 'text/html'}
payload = {'ie':'UTF8','node':'976419031'}
r = requests.post(url,data=payload,headers=headers,allow_redirects=True)
How can I resolve this error? Out of curiosity I also tried beautiful-soup4 with urllib2, and got a differently worded error of the same kind:
page = BeautifulSoup(urllib2.urlopen(url))
urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently
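
To see where the loop comes from, it may help to follow the redirects one hop at a time instead of letting the library do it. The following is only a diagnostic sketch under some assumptions: it is written for Python 3 with the standard library only (unlike the Python 2 snippets above), and the names `fetch_once` and `trace_redirects` are made up for illustration. It does not fix the problem; it only prints which URLs redirect to which.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 30x response.
        return None


def fetch_once(url, headers=None):
    """Request url without following redirects.

    Returns (status_code, location), where location is the Location
    header of an error/redirect response, or None otherwise.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(url, headers=headers or {})
    try:
        resp = opener.open(req, timeout=10)
        return resp.getcode(), None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow redirects manually and report whether they loop.

    Returns (hops, looped): hops is a list of (url, status) pairs in the
    order visited; looped is True if a redirect pointed back to a URL
    that was already visited.
    """
    hops = []
    seen = set()
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            return hops, False
        seen.add(url)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            return hops, True
    return hops, False


# Example use (performs real network requests, so it is left commented out):
# hops, looped = trace_redirects(
#     "http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031")
# for hop_url, status in hops:
#     print(status, hop_url)
# print("redirect loop detected" if looped else "no loop detected")
```

If the trace shows the same URL appearing again, the server is redirecting in a circle for this particular client. Passing different request headers to `fetch_once` and comparing the traces is one way to check whether the response depends on what the client sends.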