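I'm using the Python function urllib2.urlopen to read the website http://www.bad.org.uk/, but I keep getting a 302 error, even though the site loads fine when I visit it in a browser. Does anyone know why?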
import socket
import urllib2

headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }
socket.setdefaulttimeout(10)

# The enclosing function was not shown in the question; check_url is a placeholder name.
def check_url():
    try:
        req = urllib2.Request('http://www.bad.org.uk/', None, headers)
        urllib2.urlopen(req)
        return True # URL exists
    except ValueError as ex:
        print 'URL: %s not well formatted' % 'http://www.bad.org.uk/'
        return False # URL not well formatted
    except urllib2.HTTPError as ex:
        # Must come before URLError, because HTTPError is a subclass of it
        print 'The server couldn\'t fulfill the request for %s.' % 'http://www.bad.org.uk/'
        print 'Error code: ', ex.code
        return False
    except urllib2.URLError as ex:
        print 'We failed to reach a server for %s.' % 'http://www.bad.org.uk/'
        print 'Reason: ', ex.reason
        return False # URL doesn't seem to be alive
This prints the error:
The server couldn't fulfill the request for http://www.bad.org.uk//site/1/default.aspx.
Error code: 302
The page at http://www.bad.org.uk/ is broken when cookies are disabled.
A request for http://www.bad.org.uk/ returns:
HTTP/1.1 302 Found
Location: http://www.bad.org.uk/DesktopDefault.aspx
Set-Cookie: Esperantus_Language_bad=en-GB; path=/
Set-Cookie: Esperantus_Language_rainbow=en-GB; path=/
Set-Cookie: PortalAlias=rainbow; path=/
Set-Cookie: refreshed=true; expires=Thu, 04-Nov-2010 16:21:23 GMT; path=/
Set-Cookie: .ASPXAUTH=; expires=Mon, 11-Oct-1999 23:00:00 GMT; path=/; HttpOnly
Set-Cookie: portalroles=; expires=Mon, 11-Oct-1999 23:00:00 GMT; path=/
If I then request http://www.bad.org.uk/DesktopDefault.aspx without setting those cookies, it responds with another 302 and a redirect to itself.
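To look at a redirect response like the one above from Python instead of following it, one option is to switch off redirect handling and read the headers directly. The following is only a sketch and rests on an assumption: it is written for Python 3's urllib.request (the successor of urllib2) rather than for the Python 2 code in the question. NoRedirectHandler and inspect_redirect are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: refuse to follow redirects so the 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect;
        # the 3xx response then surfaces as an HTTPError.
        return None


def inspect_redirect(url, timeout=10):
    """Return (status, Location header, list of Set-Cookie values) for the first response."""
    opener = urllib.request.build_opener(NoRedirectHandler())
    try:
        response = opener.open(url, timeout=timeout)
        status, headers = response.status, response.headers
    except urllib.error.HTTPError as err:
        # With redirects disabled, a 302 arrives here with its headers intact.
        status, headers = err.code, err.headers
    return status, headers.get("Location"), headers.get_all("Set-Cookie") or []
```

Calling inspect_redirect('http://www.bad.org.uk/') would be expected to report the status, the Location target and the cookies the server tries to set, assuming the site still behaves as described here.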
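By default, urllib2 ignores the cookies and sends the follow-up request without them, which causes a redirect loop at that URL. urllib2 eventually gives up on the loop and reports the last 302 as an HTTPError. To handle this, you need to add a cookie handler: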
import urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
response = opener.open('http://www.bad.org.uk')
print response.read()
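For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, a roughly equivalent sketch is shown below. It assumes Python 3 and uses http.cookiejar.CookieJar to hold the cookies in memory. make_cookie_opener is a hypothetical helper name, not something from the original answer.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()  # in-memory cookie store
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

It could then be used as opener, jar = make_cookie_opener() followed by opener.open('http://www.bad.org.uk/', timeout=10).read(). Because the cookies set by the first 302 are replayed on the redirected request, the loop should not occur, provided the server still behaves as described. Returning the jar as well makes it possible to inspect which cookies were actually set.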