I purchased my first VPS, and it runs CentOS 7 64-bit. Before I started using this VPS today I had absolutely zero experience with CentOS 7, so please go easy on me.
When trying to scrape some dynamically generated content with Scrapy and Selenium, the script eventually fails and the log throws an error that reads:
DevToolsActivePort file doesn't exist
On the next line of the log, it prints information about the Chrome WebDriver:
(Driver info: chromedriver=2.40.565383 ...
So I suspect the problem is unrelated to locating the webdriver executable.
I've included part of the log below. The script always hangs for a long stretch when Selenium is queried for the first time and then eventually fails, which is why I haven't included the purely-Scrapy part of the log.
The second-to-last answer in this thread, https://stackoverflow.com/questions/50642308/org-openqa-selenium-webdriverexception-unknown-error-devtoolsactiveport-file-d, with 4 votes, reads: “This error message implies that ChromeDriver was unable to initiate/spawn a new WebBrowser i.e. Chrome Browser session.”
I installed the Chrome browser from the official repository, following these instructions: https://tecadmin.net/install-google-chrome-in-centos-rhel-and-fedora/
Chrome is installed at /usr/bin/google-chrome, while chromedriver sits in the /usr/local/bin/ directory. Both locations have been added to PATH.
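A quick sanity check along these lines should confirm whether both binaries actually resolve from PATH (just a sketch; the binary names are my assumption based on the install locations above):

import shutil

# Each call should print an absolute path if PATH is picking the binary up
print(shutil.which("google-chrome"))   # expecting /usr/bin/google-chrome
print(shutil.which("chromedriver"))    # expecting /usr/local/bin/chromedriver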
I've tried searching section 7.1, Exceptions, of this unofficial Selenium documentation, https://selenium-python.readthedocs.io/api.html?highlight=exception#module-selenium.common.exceptions, for anything related to this error, but came up empty-handed.
When I try to launch Google Chrome on the VPS over SSH, I get an error message that links to a page which is no longer available:

[83526:83526:0622/212649.156252:ERROR:zygote_host_impl_linux.cc(88)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.

When I then try to open Chrome with the --no-sandbox argument, I get these errors instead:

(google-chrome-stable:85573): Gtk-WARNING **: cannot open display:
[0622/221013.556327:ERROR:nacl_helper_linux.cc(310)] NaCl helper process running without a sandbox! Most likely you need to configure your SUID sandbox correctly
There is nothing wrong with my code, although I'll include it below anyway. The script runs fine locally on my own computer.
What is going on here? I'm at a loss. Any help would be greatly appreciated!
I haven't yet tried tinkering with the options argument to webdriver.Chrome(...), but I plan to try that right after posting this question.
The above is just some of what I've tried so far to remedy the situation.
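For completeness, here is roughly what I intend to try with the options argument once I post this, going off the two errors above (the --no-sandbox and --headless flags come straight from the error messages; --disable-dev-shm-usage is an assumption I picked up from other threads, not something I've verified):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--no-sandbox')              # Chrome refuses to run as root without this
options.add_argument('--headless')                # the VPS has no X display to open
options.add_argument('--disable-dev-shm-usage')   # assumption: works around small /dev/shm on VPSes
options.add_argument('--window-size=960,540')

self.driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver',
                               chrome_options=options)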
Part of the log, starting where the problem begins:
2018-06-22 20:31:22 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:41533/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "platformName": "any", "goog:chromeOptions": {"extensions": [], "args": []}}}, "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY", "goog:chromeOptions": {"extensions": [], "args": []}}}
2018-06-22 20:32:22 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2018-06-22 20:32:22 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.amazon.ca/b/ref=sr_aj?node=2055586011> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
for x in result:
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/bldsprt/public_html/spiders/selen.py", line 53, in parse
self.driver = webdriver.Chrome('/usr/local/bin/chromedriver')
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: DevToolsActivePort file doesn't exist
(Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 3.10.0-862.3.3.el7.x86_64 x86_64)
2018-06-22 20:32:23 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2018-06-22 20:32:23 [scrapy.core.engine] INFO: Closing spider (finished)
2018-06-22 20:32:23 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 310,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 111488,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 6, 23, 0, 32, 23, 3101),
'log_count/DEBUG': 4,
'log_count/ERROR': 1,
'log_count/INFO': 8,
'memusage/max': 54161408,
'memusage/startup': 46567424,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/WebDriverException': 1,
'start_time': datetime.datetime(2018, 6, 23, 0, 31, 21, 19959)}
2018-06-22 20:32:23 [scrapy.core.engine] INFO: Spider closed (finished)
[root@host spiders]#
Part of the script:
self.driver = webdriver.Chrome(executable_path='../../../../usr/local/bin/chromedriver')
self.driver.get(response.url)
self.driver.set_window_size(960, 540)
self.driver.wait = WebDriverWait(self.driver, 10)
next = self.driver.find_element_by_xpath('//a[@id="pagnNextLink"]')
href = next.get_attribute('href')
self.driver.quit()
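(Side note: I noticed the WebDriverWait above is created but never used. Locally that hasn't bitten me, but a version that actually waits for the link before grabbing it might look like the sketch below; the expected_conditions usage is my addition, not part of the failing script.)

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(self.driver, 10)
# Block for up to 10 seconds until the "next page" link is present in the DOM
next_link = wait.until(
    EC.presence_of_element_located((By.XPATH, '//a[@id="pagnNextLink"]'))
)
href = next_link.get_attribute('href')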