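This is my first time programming in Python, and I am trying to log in to this webpage. After searching, I found that many people recommend mechanize. To make sure I was set up correctly before I started coding, I downloaded the mechanize zip from the website and placed my Python script inside the unzipped mechanize folder.
So far, I have written this code using different examples I found: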
import mechanize
theurl = 'http://voyager.umeres.maine.edu/Login'
mech = mechanize.Browser()
mech.open(theurl)
mech.select_form(nr=0)
mech["userid"] = "MYUSERNAME"
mech["password"] = "MYPASSWORD"
results = mech.submit().read()
# open() replaces the Python 2-only file(); 'wb' because read() returns raw bytes
with open('test.html', 'wb') as f:
    f.write(results)
From looking at the source of the webpage, I believe userid and password are the correct field names for the form. When I run the script in IDLE
I get a bunch of errors, including a timeout error and a robot error. The full traceback:
I'm also not exactly sure what I should expect even if the code works. The login is for my school email, which has class folders as well. My end goal is, once I am logged in to my account, to parse some folders for information and store it in a file that can later be converted into JSON or an RSS feed. That is much further down the road, once I have a much better understanding of Python; I am just trying to give a clearer idea of what I want to accomplish.
The problem is that mechanize respects robots.txt.
You have to turn that off.
Solution:
mech = mechanize.Browser()
# needs to be set before you call open
mech.set_handle_robots(False)
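To illustrate what "respecting robots.txt" means in practice, here is a minimal sketch using only the standard library's urllib.robotparser (Python 3). The robots.txt contents and the user-agent string are made up for illustration and are not taken from the actual site; mechanize performs a comparable check internally before it fetches a page, unless robots handling is switched off.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file on the site may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


login_url = "http://voyager.umeres.maine.edu/Login"
# "Python-urllib" is an assumed user-agent string, used only for this sketch.
allowed = is_allowed(SAMPLE_ROBOTS_TXT, "Python-urllib", login_url)
print("fetch allowed:", allowed)
```

With a blanket "Disallow: /" rule like the one assumed here, a robots-aware client refuses to request the login page at all, which would explain a robot error appearing before any form is filled in.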
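Edit: The site appears to use some additional POST values
that are generated via JavaScript. Recreating them yourself could be painful; check the page's source to see what is going on.
The POST values actually sent: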
challenge [a14b1f67-11edcc01]
charset UTF-8
login Login
origurl /Login/
password
savedpw 0
sha1 3f77d1e8c2ab0470ef8005a85f5f9c0d7aeedba6
userid sdsads
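The sha1 value above is 40 hex characters, the length of a SHA-1 digest, and it is sent alongside a challenge, which hints at some kind of challenge-response scheme computed in the page's JavaScript. The exact formula is not shown here, so the sketch below is only a starting point: sha1_hex is a plain hashlib wrapper, while guess_response and its "challenge + password" concatenation are hypothetical placeholders to be replaced with whatever the page's script really does.

```python
import hashlib


def sha1_hex(text, charset="utf-8"):
    """Return the SHA-1 digest of text as a 40-character hex string."""
    return hashlib.sha1(text.encode(charset)).hexdigest()


def guess_response(challenge, password):
    """HYPOTHETICAL: combine challenge and password, then hash.

    The real combination rule must be read from the page's JavaScript;
    simple concatenation is an assumption made only for this sketch.
    """
    return sha1_hex(challenge + password)


# Example with the challenge from the captured POST and a dummy password.
digest = guess_response("a14b1f67-11edcc01", "MYPASSWORD")
print(len(digest), digest)
```

If the digest produced this way does not match what the browser sends for the same challenge and password, the combination rule is different and needs to be worked out from the page source.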