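I'm very new to scraping and have a question. I'm scraping the Worldometers COVID-19 data. Since the page is dynamic, I'm using Selenium.
The code is as follows: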
from selenium import webdriver
import time
URL = "https://www.worldometers.info/coronavirus/"
# Start the Driver
driver = webdriver.Chrome(executable_path = r"C:\Webdriver\chromedriver.exe")
# Hit the url and wait for 10 seconds.
driver.get(URL)
time.sleep(10)
#find class element
data= driver.find_elements_by_class_name("odd" and "even")
#for loop
for d in data:
    country=d.find_element_by_xpath(".//*[@id='main_table_countries_today']").text
    print(country)
Current output:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//*[@id='main_table_countries_today']"}
(Session info: chrome=96.0.4664.45)
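As for why the code above raises this exception, two things stand out. First, `"odd" and "even"` is an ordinary Python expression that evaluates to `"even"`, so only rows with the class `even` are ever requested. Second, the XPath starts with `.`, so it searches inside each row for an element with the table's id, but the table is an ancestor of the row, not a descendant. The sketch below illustrates both points without a browser; it uses the standard library's `xml.etree.ElementTree` as a stand-in for Selenium, and the tiny table markup is a made-up sample, not the real page.

```python
import xml.etree.ElementTree as ET

# `and` returns its last operand when every operand is truthy,
# so this is just the string "even".
class_name = "odd" and "even"
print(class_name)

# Made-up miniature of the page structure (an assumption for illustration).
table = ET.fromstring(
    "<table id='main_table_countries_today'><tbody>"
    "<tr class='odd'><td>USA</td></tr>"
    "<tr class='even'><td>India</td></tr>"
    "</tbody></table>"
)

# Only the 'even' row is matched, mirroring what the original call asks for.
rows = table.findall(f".//tr[@class='{class_name}']")

# Searching *inside* a row for the table's id finds nothing,
# because the table contains the row and not the other way around.
found = rows[0].find(".//*[@id='main_table_countries_today']")
print(found)
```

In Selenium itself, the same failed lookup surfaces as the `NoSuchElementException` shown above rather than as `None`.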
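To scrape the table of Worldometers COVID-19 data, you need to induce WebDriverWait for visibility_of_element_located() and, using a DataFrame from Pandas, you can use the following Locator Strategy:
Code block: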
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
options = Options()
options.add_argument("start-maximized")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.worldometers.info/coronavirus/")
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#main_table_countries_today"))).get_attribute("outerHTML")
df = pd.read_html(data)
print(df)
driver.quit()
Console output:
[ # Country,Other TotalCases NewCases ... Deaths/1M pop TotalTests Tests/ 1M pop Population
0 NaN World 264359298 632349.0 ... 673.3 NaN NaN NaN
1 1.0 USA 49662381 89259.0 ... 2415.0 756671013.0 2267182.0 3.337495e+08
2 2.0 India 34609741 3200.0 ... 336.0 643510926.0 459914.0 1.399198e+09
3 3.0 Brazil 22118782 12910.0 ... 2865.0 63776166.0 297051.0 2.146975e+08
4 4.0 UK 10329074 53945.0 ... 2124.0 364875273.0 5335159.0 6.839070e+07
.. ... ... ... ... ... ... ... ... ...
221 221.0 Samoa 3 NaN ... NaN NaN NaN 2.002800e+05
222 222.0 Saint Helena 2 NaN ... NaN NaN NaN 6.103000e+03
223 223.0 Micronesia 1 NaN ... NaN NaN NaN 1.167290e+05
224 224.0 Tonga 1 NaN ... NaN NaN NaN 1.073890e+05
225 NaN Total: 264359298 632349.0 ... 673.3 NaN NaN NaN
[226 rows x 15 columns]]
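Note the outer square brackets in the output: `pd.read_html` returns a list of DataFrames, one per table it finds, so the table itself is the first element. The following is a minimal sketch of how the result could be post-processed, assuming a small hand-written HTML snippet in place of the real `outerHTML` (values copied from the output above, columns trimmed to three). Wrapping the string in `StringIO` is an addition here, since newer pandas versions warn when literal HTML is passed directly; the variable names `tables` and `countries` are illustrative.

```python
from io import StringIO

import pandas as pd

# Hand-written stand-in for the table's outerHTML (an assumption for illustration).
html = """
<table id="main_table_countries_today">
  <thead><tr><th>#</th><th>Country,Other</th><th>TotalCases</th></tr></thead>
  <tbody>
    <tr><td></td><td>World</td><td>264359298</td></tr>
    <tr><td>1</td><td>USA</td><td>49662381</td></tr>
    <tr><td>2</td><td>India</td><td>34609741</td></tr>
    <tr><td></td><td>Total:</td><td>264359298</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found.
tables = pd.read_html(StringIO(html))
df = tables[0]

# The "World" and "Total:" summary rows have no rank in the "#" column,
# so dropping rows where "#" is missing leaves only the country rows.
countries = df.dropna(subset=["#"]).reset_index(drop=True)
print(countries)
```

With the real page, the same two steps would apply to the `data` string obtained from `get_attribute("outerHTML")` in the code block above.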