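I'm writing some automation software using selenium==3.141.0, python 3.6.7 and chromedriver 2.44.
Most of the logic can be executed by a single browser instance, but for some parts I have to launch 10-20 instances to get a decent execution speed.
As soon as it reaches the part executed by ThreadPoolExecutor, browser interactions start throwing this error: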
WARNING|05/Dec/2018 17:33:11|connectionpool|_put_conn|274|Connection pool is full, discarding connection: 127.0.0.1
WARNING|05/Dec/2018 17:33:11|connectionpool|urlopen|662|Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))': /session/119df5b95710793a0421c13ec3a83847/url
WARNING|05/Dec/2018 17:33:11|connectionpool|urlopen|662|Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcee7ada048>: Failed to establish a new connection: [Errno 111] Connection refused',)': /session/119df5b95710793a0421c13ec3a83847/url
Browser setup:
def init_chromedriver(cls):
    try:
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument(f"user-agent={Utils.get_random_browser_agent()}")
        prefs = {"profile.managed_default_content_settings.images": 2}
        chrome_options.add_experimental_option("prefs", prefs)
        driver = webdriver.Chrome(driver_paths['chrome'],
                                  chrome_options=chrome_options,
                                  service_args=['--verbose', f'--log-path={bundle_dir}/selenium/chromedriver.log'])
        driver.implicitly_wait(10)
        return driver
    except Exception as e:
        logger.error(e)
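Relevant code:
ProfileParser instantiates a webdriver and performs a few page interactions. I don't think the interactions themselves are relevant, because everything works without ThreadPoolExecutor.
In short, however: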
class ProfileParser(object):
    def __init__(self, acc):
        self.driver = Utils.init_chromedriver()

    # required by the `with` statements used below
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        Utils.shutdown_chromedriver(self.driver)
        self.driver = None

    def collect_user_info(self, post_url):
        self.driver.get(post_url)
        profile_url = self.driver.find_element_by_xpath('xpath_here').get_attribute('href')
When run through ThreadPoolExecutor, the error above appears at self.driver.find_element_by_xpath or at self.driver.get.
This works:
with ProfileParser(acc) as pparser:
    pparser.collect_user_info(posts[0])
These options do not work (connectionpool errors):
futures = []

# one worker, one future
with ThreadPoolExecutor(max_workers=1) as executor:
    with ProfileParser(acc) as pparser:
        futures.append(executor.submit(pparser.collect_user_info, posts[0]))

# 10 workers, multiple futures
with ThreadPoolExecutor(max_workers=10) as executor:
    for p in posts:
        with ProfileParser(acc) as pparser:
            futures.append(executor.submit(pparser.collect_user_info, p))
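One possible explanation, sketched here without Selenium: executor.submit returns immediately, so the inner `with` block can exit (and shut the driver down) before the submitted task has finished using it. In the sketch below, FakeDriver and FakeParser are hypothetical stand-ins for the webdriver and for ProfileParser, and the sleep only simulates a slow page interaction. It is an illustration of the timing under those assumptions, not a confirmed diagnosis.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeDriver:
    """Hypothetical stand-in for a webdriver; only tracks whether it was shut down."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        if self.closed:
            raise ConnectionRefusedError("driver already shut down")
        return f"loaded {url}"

    def quit(self):
        self.closed = True


class FakeParser:
    """Hypothetical stand-in for ProfileParser: owns a driver, closes it on exit."""

    def __init__(self):
        self.driver = FakeDriver()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.driver.quit()

    def collect_user_info(self, post_url):
        time.sleep(0.2)  # simulated work; the caller has long since left its `with` block
        return self.driver.get(post_url)


def run_broken(url):
    # Same shape as the failing options: the parser is closed right after submit().
    with ThreadPoolExecutor(max_workers=1) as executor:
        with FakeParser() as parser:
            future = executor.submit(parser.collect_user_info, url)
    return future


def run_fixed(url):
    # Waiting for the result inside the `with` keeps the driver alive for the task.
    with ThreadPoolExecutor(max_workers=1) as executor:
        with FakeParser() as parser:
            future = executor.submit(parser.collect_user_info, url)
            future.result()
    return future
```

Note that calling result() inside the loop would serialize the tasks when several are submitted, so it only demonstrates the lifetime issue; it would not give the intended parallel speed-up.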
UPDATE:
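I found a temporary solution (which does not invalidate the original question): instantiating the webdriver outside the ProfileParser class. I don't know why this works while the original version does not. Some language detail, I suppose?
Thanks for the answers, but the problem doesn't seem to be the ThreadPoolExecutor max_workers limit: as you can see in one of the options, I tried submitting a single instance and it still didn't work.
Current workaround: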
futures = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for p in posts:
        driver = Utils.init_chromedriver()
        futures.append({
            'future': executor.submit(collect_user_info, driver, acc, p),
            'driver': driver
        })

for f in futures:
    f['future'].done()
    Utils.shutdown_chromedriver(f['driver'])
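The workaround above creates one driver per post up front, in the submitting thread. A variation that might be worth trying is to let each task create and shut down its own driver, so that at most max_workers browsers exist at a time. The sketch below is self-contained and uses hypothetical stand-ins: StubDriver and init_driver replace the real webdriver and Utils.init_chromedriver, and collect_user_info is a dummy with the same argument order as in the workaround.

```python
from concurrent.futures import ThreadPoolExecutor

CREATED = []  # records every stub driver, only so the sketch can be inspected


class StubDriver:
    """Hypothetical stand-in for a webdriver."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        if self.closed:
            raise RuntimeError("driver already shut down")

    def quit(self):
        self.closed = True


def init_driver():
    # Stand-in for Utils.init_chromedriver() (assumption: it returns a ready driver).
    driver = StubDriver()
    CREATED.append(driver)
    return driver


def collect_user_info(driver, acc, post_url):
    # Dummy page interaction; the real function would query the page.
    driver.get(post_url)
    return f"{acc}:{post_url}"


def worker(acc, post_url):
    # The driver lives exactly as long as the task that uses it.
    driver = init_driver()
    try:
        return collect_user_info(driver, acc, post_url)
    finally:
        driver.quit()


def collect_all(acc, posts, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, acc, p) for p in posts]
        # result() blocks until each task finishes and re-raises its exception, if any.
        return [f.result() for f in futures]
```

With this shape the caller never holds a driver, so there is nothing for it to close too early, and result() (unlike done(), which only reports status) surfaces errors raised inside the tasks.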