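I've created a script in Python combined with Selenium to parse `id`, `vikey` and `cbhtmlfragid`, which are meant to be used as the payload of a POST HTTP request. Because I found it difficult to scrape `id`, `vikey` and `cbhtmlfragid` using requests, I want to use Selenium to grab them so that I can use them when issuing the POST request.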
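The script's three regular expressions index `[0]` into `re.findall`, so any pattern that does not match raises `IndexError`, and the `guid` pattern would keep the quotes if that value is quoted in the page. Below is a minimal sketch of a more defensive extraction step. It assumes the page source embeds the values as `sessionId:'...'`, `viewInstanceKey:'...'` and `guid:...` (the same assumption the script's patterns make). `extract_tokens` is a hypothetical helper, and it reads `id` with `urllib.parse` instead of splitting the URL by hand. The sample HTML and URL in the demo are made up.

```python
import re
from urllib.parse import urlparse, parse_qs

# Same patterns as the script; the page-source format is an assumption, not verified.
PATTERNS = {
    "x_catalyst": r"sessionId:'(.*?)',",
    "vikey": r"viewInstanceKey:'(.*?)',",
    "guid": r"guid:(.*?),",
}


def extract_tokens(page_source, current_url):
    """Return the tokens as a dict; a missing value becomes None instead of raising IndexError."""
    tokens = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, page_source, flags=re.DOTALL)
        # Strip whitespace, then any surrounding quotes, in case the value is quoted
        tokens[name] = match.group(1).strip().strip("'\"") if match else None
    # Read the "id" query parameter rather than splitting on "id=" and "&_timestamp"
    query = parse_qs(urlparse(current_url).query)
    tokens["item_id"] = query.get("id", [None])[0]
    return tokens


if __name__ == "__main__":
    # Made-up page source and URL, only to show the output shape
    html = "init({sessionId:'abc123', viewInstanceKey:'vk42', guid:'W1', debug:false});"
    url = "https://example.com/viewInstance/view.html?id=xyz&_timestamp=123"
    print(extract_tokens(html, url))
    # {'x_catalyst': 'abc123', 'vikey': 'vk42', 'guid': 'W1', 'item_id': 'xyz'}
```

Returning `None` makes it easy to check which token failed to match before any request is sent.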
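I'm trying to populate the results by entering `a` in the input box next to `Entity Name Or Identifier`. I can see that the results are populated through a POST request, which is what I'm trying to reproduce programmatically.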
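Because the results come from that POST request, one way to narrow down the problem is to compare the form body the script sends with the Form Data the browser sends, as shown in the browser's DevTools Network tab. The sketch below assumes the field names from the script's `payload` dict. `build_payload` and `encoded_body` are hypothetical helpers, and the template and token values in the demo are illustrative only. It also fills in a copy instead of mutating the module-level dict.

```python
from urllib.parse import urlencode


def build_payload(base, query, vikey, frag_id):
    """Return a copy of the base form fields with the per-request values filled in."""
    payload = dict(base)  # shallow copy: the template dict stays untouched
    payload['QueryString'] = query
    payload['_VIKEY_'] = vikey
    payload['_CBHTMLFRAGID_'] = frag_id
    return payload


def encoded_body(payload):
    """Form-encode the fields; for plain string values this should match
    what requests sends for data=payload."""
    return urlencode(payload)


if __name__ == "__main__":
    # Trimmed template using a few of the script's field names
    template = {
        'QueryString': '',
        '_CBHTMLFRAGID_': '',
        '_VIKEY_': '',
        '_CBNAME_': 'buttonPush',
    }
    body = encoded_body(build_payload(template, 'a', 'vk42', 'W1'))
    print(body)  # QueryString=a&_CBHTMLFRAGID_=W1&_VIKEY_=vk42&_CBNAME_=buttonPush
```

Any field that differs between this body and the browser's request (a missing key, a different node id, an empty token) is a candidate for why the server returns no results.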
Website link
To populate the results, the steps in this section have to be performed in order [image], which eventually leads to this [image].
What I've tried:
```python
import re
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = 'https://www.businessregistration.moc.gov.kh/'
post_url = 'https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/update.html?id={}'

# Form fields for the search request; _CBHTMLFRAGID_ and _VIKEY_ are filled in at runtime
payload = {
    'QueryString': 'a',
    'SourceAppCode': 'cambodia-br-soleproprietorships',
    'OriginalVersionIdentifier': '',
    'nodeW772-Advanced': 'N',
    '_CBASYNCUPDATE_': 'true',
    '_CBHTMLFRAGNODEID_': 'W762',
    '_CBHTMLFRAGID_': '',
    '_CBHTMLFRAG_': 'true',
    '_CBNODE_': 'W778',
    '_VIKEY_': '',
    '_CBNAME_': 'buttonPush'
}


def get_content(wait, link):
    # Open the site and navigate to the sole-proprietorship search page via the menus
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[data-rel='#appMainNavigation']"))).click()
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[class$='menu-soleproprietorships']"))).click()
    elem = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[class$='menu-brSoleProprietorSearch']")))
    driver.execute_script("arguments[0].click();", elem)
    # Pull the dynamic values needed for the POST request from the URL and page source
    item_id = driver.current_url.split("id=")[1].split("&_timestamp")[0]
    x_catalyst = re.findall(r"sessionId:'(.*?)',", str(driver.page_source), flags=re.DOTALL)[0]
    item = re.findall(r"viewInstanceKey:'(.*?)',", str(driver.page_source), flags=re.DOTALL)[0]
    elem = re.findall(r"guid:(.*?),", str(driver.page_source), flags=re.DOTALL)[0]
    return item_id, x_catalyst, item, elem


def make_post_requests(item_id, x_catalyst, item, elem):
    # Fill in the per-session tokens and send the search as an XHR-style POST
    payload['_VIKEY_'] = item
    payload['_CBHTMLFRAGID_'] = elem
    res = requests.post(post_url.format(item_id), data=payload, headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
        'x-catalyst-session-global': x_catalyst
    })
    # The pager banner shows the result count when the search succeeds
    soup = BeautifulSoup(res.text, "lxml")
    result_count = soup.select_one("[class='appPagerBanner']")
    print(result_count)


if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    item_id, x_catalyst, item, elem = get_content(wait, link)
    make_post_requests(item_id, x_catalyst, item, elem)
    driver.quit()
```
When I run the script above, I get no results, so I suppose I'm going wrong somewhere.
How can I make my script populate the results using the POST request?
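One thing worth checking, offered as an assumption rather than a confirmed cause: `requests.post` in the script runs without any of the browser's cookies. If the server ties `viewInstanceKey` and the fragment id to the Selenium session (for example through a session cookie), the POST may be treated as coming from a new, unrelated session. A minimal sketch of forwarding those cookies follows. `cookie_header` is a hypothetical helper, and the cookie names in the demo are made up.

```python
def cookie_header(selenium_cookies):
    """Join cookies from driver.get_cookies() into a single Cookie header value.

    Each item is expected to be a dict with at least 'name' and 'value' keys,
    which is the shape Selenium's get_cookies() returns.
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


# Hypothetical use inside make_post_requests (driver is the Selenium instance):
# headers = {
#     'x-requested-with': 'XMLHttpRequest',
#     'x-catalyst-session-global': x_catalyst,
#     'cookie': cookie_header(driver.get_cookies()),
# }
# res = requests.post(post_url.format(item_id), data=payload, headers=headers)

if __name__ == "__main__":
    # Made-up cookie list in the shape get_cookies() returns
    sample = [
        {"name": "JSESSIONID", "value": "abc123", "domain": "example.com", "path": "/"},
        {"name": "lang", "value": "en", "domain": "example.com", "path": "/"},
    ]
    print(cookie_header(sample))  # JSESSIONID=abc123; lang=en
```

An alternative with the same effect is a `requests.Session` whose jar is filled with `session.cookies.set(c['name'], c['value'])` for each Selenium cookie; the session then also keeps any cookies the server sets on later responses.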