I am trying to find broken links with Selenium and Python, but my code raises an error:
import requests
from selenium import webdriver

chrome_driver_path = "D:\\drivers\\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get('https://google.co.in/')
links = driver.find_elements_by_css_selector("a")
images = driver.find_elements_by_css_selector("img")
for link in links:
    # Send a HEAD request to each link target and report whether it returned 200
    r = requests.head(link.get_attribute('href'))
    print(r.status_code == 200)
I am not able to find the broken links on the page. Is there any other solution?
I am getting:
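raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='myaccount.google.com', port=443): Max retries exceeded with url: /?utm_source=OGB&utm_medium=app (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:777)'),))
During handling of the above exception, another exception occurred:
self._sslobj.do_handshake() ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:777)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):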
To find the status of the links on the page, you can use the following solution:
Code block:
import requests
from selenium import webdriver

# Written against the Selenium 3 API. Selenium 4 uses
# find_elements(By.CSS_SELECTOR, "a") and a Service object for the driver path.
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://google.co.in/')
links = driver.find_elements_by_css_selector("a")
for link in links:
    # Print each link target next to the status code of a HEAD request to it
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)
Console output:
https://mail.google.com/mail/?tab=wm 302
https://www.google.co.in/imghp?hl=en&tab=wi 200
https://www.google.co.in/intl/en/options/ 301
https://myaccount.google.com/?utm_source=OGB&utm_medium=app 302
https://www.google.co.in/webhp?tab=ww 200
https://maps.google.co.in/maps?hl=en&tab=wl 302
https://www.youtube.com/?gl=IN 200
https://play.google.com/?hl=en&tab=w8 302
https://news.google.co.in/nwshp?hl=en&tab=wn 301
https://mail.google.com/mail/?tab=wm 302
https://www.google.com/contacts/?hl=en&tab=wC 302
https://drive.google.com/?tab=wo 302
https://www.google.com/calendar?tab=wc 302
https://plus.google.com/?gpsrc=ogpy0&tab=wX 302
https://translate.google.co.in/?hl=en&tab=wT 200
https://photos.google.com/?tab=wq&pageId=none 302
https://www.google.co.in/intl/en/options/ 301
https://docs.google.com/document/?usp=docs_alc 302
https://books.google.co.in/bkshp?hl=en&tab=wp 200
https://www.blogger.com/?tab=wj 405
https://hangouts.google.com/ 302
https://keep.google.com/ 302
https://earth.google.com/web/ 200
https://www.google.co.in/intl/en/options/ 301
https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.co.in/ 200
https://www.google.co.in/webhp?hl=en&sa=X&ved=0ahUKEwj0qNPqnqHbAhXYdn0KHXpeAo0QPAgD 200
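The output above contains several 301/302 redirects and one 405 (the server rejected HEAD), and the traceback in the question shows that a single failed SSL handshake is enough to abort the whole loop. The following is a minimal sketch of a more forgiving check, not a definitive implementation. The names check_link and find_broken_links, the 10 second timeout, and the rule that any status of 400 or above counts as broken are all assumptions made for illustration. It relies on requests normally re-raising connection and SSL failures as subclasses of requests.exceptions.RequestException.

```python
import requests


def check_link(url, session=None, timeout=10):
    """Return (status, is_broken); status is an int or an exception name.

    Hypothetical helper: the name, the timeout and the >= 400 rule are assumptions.
    """
    http = session or requests
    try:
        # HEAD is cheap; follow redirects so 301/302 resolve to a final status.
        response = http.head(url, allow_redirects=True, timeout=timeout)
        # Some servers reject HEAD with 405, so retry once with GET.
        if response.status_code == 405:
            response = http.get(url, allow_redirects=True, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # SSLError, ConnectionError, Timeout, MissingSchema and similar land here,
        # so one unreachable host no longer stops the scan.
        return type(exc).__name__, True
    return response.status_code, response.status_code >= 400


def find_broken_links(hrefs, session=None):
    """Check each distinct http(s) URL once and return the broken ones."""
    seen = set()
    broken = []
    for href in hrefs:
        # Skip anchors without an href, non-HTTP schemes, and duplicates.
        if not href or not href.startswith(("http://", "https://")) or href in seen:
            continue
        seen.add(href)
        status, is_broken = check_link(href, session)
        if is_broken:
            broken.append((href, status))
    return broken
```

With the links list from the code block above, this could be called as find_broken_links(link.get_attribute('href') for link in links). Passing a requests.Session() as the second argument would reuse connections across the many google.com hosts, though that is an optional refinement.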