Python requests and the __doPostBack function

2024-01-14

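I've been struggling with this all day. I need to scrape data from a website where you have to click a button before the data is shown. The button calls the well-known __doPostBack() JavaScript function that ASP.NET sites use: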

<a id="ContentPlaceHolder1_lbCoach" class="btn btn-dark-blue" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$lbCoach','')"><i class="fa fa-eye"></i>&nbsp;Display HS Coach Info</a>
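For reference, the __doPostBack(eventTarget, eventArgument) function that ASP.NET WebForms emits copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT inputs and submits the form; on pages with an UpdatePanel, the ASP.NET AJAX script intercepts that submit and sends it asynchronously instead. The two values the POST body needs can therefore be read straight from the link's href. A small sketch, assuming the arguments are single-quoted string literals as in the link above (postback_args is a hypothetical helper name):

```python
import re

# Matches __doPostBack('target','argument') with single-quoted literals
_POSTBACK_RE = re.compile(r"__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def postback_args(href):
    """Return (__EVENTTARGET, __EVENTARGUMENT) parsed from a javascript: href."""
    match = _POSTBACK_RE.search(href)
    if match is None:
        raise ValueError(f"no __doPostBack call found in {href!r}")
    return match.group(1), match.group(2)


target, argument = postback_args(
    "javascript:__doPostBack('ctl00$ContentPlaceHolder1$lbCoach','')"
)
print(target, repr(argument))  # ctl00$ContentPlaceHolder1$lbCoach ''
```

In a scraper the href would come from the parsed page, for example soup.find('a', id='ContentPlaceHolder1_lbCoach')['href'] with BeautifulSoup.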

As this answer (https://stackoverflow.com/a/5157699/3357517) suggests, I should mimic the POST request the button sends and I should get the data back. I did that as follows:

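import requests
from bs4 import BeautifulSoup

# Hypothetical host: the question does not give the URL (page and ID come from formAction in the response)
contact_url = 'https://example.com/PlayerProfile_ContactInfo.aspx?ID=J34665D097ED'
s = requests.Session()  # keeps the ASP.NET session cookie between the GET and the POST
soup = BeautifulSoup(s.get(contact_url).text, "html.parser")

# Hidden WebForms state fields that have to be echoed back in the postback
VIEWSTATE = soup.find('input', {'id': '__VIEWSTATE'}).get('value')
EVENTVALIDATION = soup.find('input', {'id': '__EVENTVALIDATION'}).get('value')
headers = {
    'Cache-Control': 'no-cache',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Referer': contact_url,
    'X-MicrosoftAjax': 'Delta=true',  # ask for an UpdatePanel (partial) response
}
payload = {
    # "<UpdatePanel>|<control that triggered the async postback>"
    "ctl00$ToolkitScriptManager2": "ctl00$ContentPlaceHolder1$updCoach|ctl00$ContentPlaceHolder1$lbCoach",
    "ToolkitScriptManager2_HiddenField": "",
    "ctl00$Header1$Menu1$txtSearchBox": "",
    "ctl00$Header1$Menu1$txtSearchBox2": "",
    # The link shown above calls __doPostBack('ctl00$ContentPlaceHolder1$lbCoach','')
    "__EVENTTARGET": "ctl00$ContentPlaceHolder1$lbDisplayContact",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": VIEWSTATE,
    "__SCROLLPOSITIONX": "0",
    "__SCROLLPOSITIONY": "0",
    "__EVENTVALIDATION": EVENTVALIDATION,
    "__ASYNCPOST": "true",
}
r = s.post(contact_url, headers=headers, data=payload)
page_content = r.content.decode()
soup = BeautifulSoup(page_content, "html.parser")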

The response looks fine, but there is nothing useful in it:

b'1|#||4|40|updatePanel|ContentPlaceHolder1_Bio1_udpAdminMenu|\r\n                    \r\n                |0|hiddenField|__EVENTTARGET||0|hiddenField|__EVENTARGUMENT||16992|hiddenField|__VIEWSTATE||1|hiddenField|__SCROLLPOSITIONX|0|1|hiddenField|__SCROLLPOSITIONY|0|292|hiddenField|__EVENTVALIDATION|/wEdAAxsD18kXuyPL5ofgcnYES9y+7zziCikaDB50o6O1pxxXbDWcw39S27yDoDwzfIvSl/82S52cVbB2NeFUXKE4Mx+O+TegoiNwQAdWnT22jPmzI4v73G0IN877PxHm4GlN3cV9hFWoAb20O4Q+9Ls96AskeglIWLjtf4N+HDDRWBUXzFl5Dm8D+CLbHmC0vzJAV2dMNOfX5+XKgQp7nrLXr1R1UFtN09quhqZEMqLAngnkseO4VALrQwmvGPQfIrd43K9AvIrswshyn58y8V7WKC8hka6Yg==|0|asyncPostBackControlIDs|||0|postBackControlIDs|||285|updatePanelIDs||tctl00$ContentPlaceHolder1$Bio1$udpAdminMenu,ContentPlaceHolder1_Bio1_udpAdminMenu,tctl00$ContentPlaceHolder1$udpAddress,ContentPlaceHolder1_udpAddress,tctl00$ContentPlaceHolder1$updCoach,ContentPlaceHolder1_updCoach,tctl00$ContentPlaceHolder1$updDetails,ContentPlaceHolder1_updDetails|0|childUpdatePanelIDs|||81|panelsToRefreshIDs||ctl00$ContentPlaceHolder1$Bio1$udpAdminMenu,ContentPlaceHolder1_Bio1_udpAdminMenu|2|asyncPostBackTimeout||90|48|formAction||./PlayerProfile_ContactInfo.aspx?ID=J34665D097ED|'
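The body above is not broken HTML: it is the length-prefixed "delta" format that ASP.NET AJAX returns for UpdatePanel requests sent with X-MicrosoftAjax: Delta=true and __ASYNCPOST=true. Each segment has the layout length|type|id|content|, where length is the number of characters in content (so content may itself contain "|"). A minimal parser sketch, assuming that layout and assuming lengths match Python str characters (the browser counts UTF-16 units, which can differ for characters outside the Basic Multilingual Plane); parse_delta is a hypothetical name:

```python
def parse_delta(text):
    """Split an UpdatePanel delta response into (type, id, content) tuples.

    Assumes the layout length|type|id|content| repeated to the end of the text,
    where length is the number of characters in content.
    """
    segments = []
    pos = 0
    while pos < len(text):
        fields = []
        for _ in range(3):  # length, type and id are each terminated by "|"
            bar = text.index("|", pos)
            fields.append(text[pos:bar])
            pos = bar + 1
        length, kind, ident = int(fields[0]), fields[1], fields[2]
        content = text[pos:pos + length]  # read by length, not by splitting on "|"
        pos += length
        if text[pos:pos + 1] != "|":
            raise ValueError(f"malformed delta segment ending at offset {pos}")
        pos += 1
        segments.append((kind, ident, content))
    return segments
```

With a live response, decode r.content and keep the updatePanel segments, e.g. panels = {ident: html for kind, ident, html in parse_delta(r.content.decode()) if kind == "updatePanel"}, then pass a panel's HTML to BeautifulSoup. Read this way, the response quoted above contains a single updatePanel segment, ContentPlaceHolder1_Bio1_udpAdminMenu, whose content is only whitespace, and its panelsToRefreshIDs entry lists only that panel, so ContentPlaceHolder1_updCoach is not refreshed by this particular request.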

In Fiddler, the request and response I get after clicking the real button look the same as the request and response from my code.

Request data

Response data

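The most interesting part: the same request, viewed in Chrome DevTools, renders normally. Instead of the \r\n \r\n from the previous response, you can see the whole HTML along with all the additional data.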
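"Looks the same" is easy to check mechanically by decoding both form bodies and comparing them field by field. A sketch using only urllib.parse; diff_form_bodies is a hypothetical helper, and it assumes each field name appears once (true for typical WebForms posts):

```python
from urllib.parse import parse_qsl


def diff_form_bodies(browser_body, script_body):
    """Return {field: (browser_value, script_value)} for fields that differ."""
    browser = dict(parse_qsl(browser_body, keep_blank_values=True))
    script = dict(parse_qsl(script_body, keep_blank_values=True))
    return {
        key: (browser.get(key), script.get(key))
        for key in sorted(browser.keys() | script.keys())
        if browser.get(key) != script.get(key)
    }


# Made-up example bodies; a field missing on one side shows up as None
browser_body = "__EVENTARGUMENT=&__SCROLLPOSITIONY=250&__ASYNCPOST=true"
script_body = "__EVENTARGUMENT=&__SCROLLPOSITIONY=0"
print(diff_form_bodies(browser_body, script_body))
# {'__ASYNCPOST': ('true', None), '__SCROLLPOSITIONY': ('250', '0')}
```

With requests, the body that was actually sent is available as r.request.body; the browser's body can be copied from Fiddler's raw request view.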

Is it possible that I'm actually getting the data but don't know how to render it?


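To scrape this kind of page, press F12, open the Network tab, and then click one of the pager links to change the page; you will see every request the page sends.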

The first request is most likely the one that changes the page.

Click it. You need to take all of the fields the page submits and send them with Python requests to change the page.
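Collecting "all the fields" by hand is error-prone, so here is a minimal sketch that gathers every <input> a browser would submit, using only the standard library's html.parser. Assumptions: all inputs on the page belong to the single WebForms form, and <select>/<textarea> values (such as ddlPageSize in the code below) are added by hand; collect_input_fields is a hypothetical name:

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name -> value for the <input> elements a browser would submit."""

    SKIP_TYPES = {"submit", "button", "image", "reset", "file"}

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)  # attribute names arrive lowercased, values unescaped
        name = a.get("name")
        if not name:
            return
        kind = (a.get("type") or "text").lower()
        if kind in self.SKIP_TYPES:
            return  # buttons are only sent when they are the clicked control
        if kind in ("checkbox", "radio") and "checked" not in a:
            return  # unchecked boxes are not submitted
        value = a.get("value")
        if value is None:
            value = "on" if kind in ("checkbox", "radio") else ""
        self.fields[name] = value


def collect_input_fields(html):
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Typical use: data = collect_input_fields(page_html), then data.update({"__EVENTTARGET": ..., "__EVENTARGUMENT": ""}) with the target of the link being clicked before posting.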

I followed the steps shown in the screenshot below:

Python code:

import requests
from bs4 import BeautifulSoup

url = 'https://Site/Your-Page'  # placeholder URL, replace with the real page
firstPage = requests.get(url).text
soup = BeautifulSoup(firstPage, 'html.parser')
# Hidden WebForms state fields that must be sent back with every postback
VIEWSTATEGENERATOR = soup.find('input', {'id': '__VIEWSTATEGENERATOR'}).get('value')
VIEWSTATE = soup.find('input', {'id': '__VIEWSTATE'}).get('value')
for page in range(1, 5):  ### I read pages one to four ###
    data = {
        # Same values __doPostBack('rptPager$ctl0N$lnkPage', '') would set for pager link N
        "__EVENTTARGET": "rptPager$ctl0{}$lnkPage".format(page),
        "__EVENTARGUMENT": "",
        "__LASTFOCUS": "",
        "__VIEWSTATE": VIEWSTATE,
        "ddlPageSize": 24,
        "__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR,
    }
    res = requests.post(url, data=data).content.decode('utf8')
    print(res)

This code can only fetch the first few pages. To get the other pages you have to add another parameter to the request:



___________________________________________
if parameter = 1
page number:    01 02 03 04 05 06 07 08 09
                 ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  "refers to"
pagination: prev 1  2  3  4  5  6  7  8  9 next
-------------------------------------------
if parameter = 2
page number:    10 11 12 13 14 15 16 17 18
                 ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑ 
pagination: prev 1  2  3  4  5  6  7  8  9 next
-------------------------------------------
if parameter = 3
page number:    19 20 21 22 23 24 25 26 27
                 ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑ 
pagination: prev 1  2  3  4  5  6  7  8  9 next
___________________________________________
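The table can be read as a formula: with 9 pager links per block, absolute page n is reached with parameter = (n - 1) // 9 + 1 and link = (n - 1) % 9 + 1. The sketch below encodes that. pager_event_target is hypothetical and assumes link k posts back as rptPager$ctl0k$lnkPage, as in the code above; the answer does not say how the parameter itself is sent, so the sketch only computes it:

```python
LINKS_PER_BLOCK = 9  # the pager shows links 1..9 between "prev" and "next"


def pager_position(page):
    """Map an absolute page number to (parameter, link) as in the table."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    block, offset = divmod(page - 1, LINKS_PER_BLOCK)
    return block + 1, offset + 1


def pager_event_target(link):
    """Hypothetical: assumes pager link k posts back as rptPager$ctl0k$lnkPage."""
    return "rptPager$ctl{:02d}$lnkPage".format(link)


for page in (1, 9, 10, 27):
    print(page, pager_position(page))
# 1 (1, 1)
# 9 (1, 9)
# 10 (2, 1)
# 27 (3, 9)
```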
  

The parameter above may be inside "__VIEWSTATE" or...
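If the parameter is stored in __VIEWSTATE, you can take a rough look inside without a full deserializer: the field is base64-encoded binary, and when the site does not encrypt its view state, the strings stored in it show up as readable ASCII runs, much like the Unix strings tool prints. A sketch under those assumptions (viewstate_strings is a hypothetical name); encrypted view state will yield only noise:

```python
import base64
import re


def viewstate_strings(viewstate, min_len=4):
    """Return printable-ASCII runs of at least min_len bytes from a __VIEWSTATE value."""
    raw = base64.b64decode(viewstate)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(raw)]
```

For example, printing viewstate_strings(VIEWSTATE) before and after clicking "next" and comparing the two lists may show which stored value changes with the page block.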
