Question:
How can I proxy Scrapy https://scrapy.org/ requests through SOCKS5?
I know I could use polipo https://www.irif.fr/~jch/software/polipo/ to convert a SOCKS proxy into an HTTP proxy https://www.codevoila.com/post/16/convert-socks-proxy-to-http-proxy-using-polipo
But:
I would rather set up a middleware or make some change to scrapy.Request:
import scrapy

class BaseSpider(scrapy.Spider):
    """A base class that implements major functionality for the crawling application."""
    start_urls = ('https://google.com',)  # trailing comma: a one-element tuple, not a string

    def start_requests(self):
        proxies = {
            'http': 'socks5://127.0.0.1:1080',
            'https': 'socks5://127.0.0.1:1080'
        }
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={'proxy': proxies}  # proxy should be a string, not a dict
            )

    def parse(self, response):
        # do ...
        pass
What should I assign to the proxies variable?
It is possible.
HTTP proxy over SOCKS5
Install python-proxy https://github.com/qwj/python-proxy
$ pip3 install pproxy
Run it, listening for HTTP on port 8181 and forwarding to the SOCKS5 proxy on port 9150:
$ pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv
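Before wiring this into Scrapy, you can sanity-check the forwarder from plain Python (stdlib only). The port 8181 matches the `-l` flag given to pproxy above; the actual fetch is left commented out so the snippet runs even without the proxy up:

```python
import urllib.request

# Assumed: pproxy is listening on 127.0.0.1:8181 as started above.
PROXY = "http://127.0.0.1:8181"

# Register the local HTTP forwarder for both schemes.
proxy_handler = urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
opener = urllib.request.build_opener(proxy_handler)
print(proxy_handler.proxies)

# Uncomment to actually exercise the tunnel (requires pproxy running):
# print(opener.open("https://httpbin.org/ip", timeout=10).read())
```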
Scrapy with the HTTP proxy
Create a middleware (middlewares.py):
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Route every request through the local HTTP proxy started above
        request.meta['proxy'] = "http://127.0.0.1:8181"
Enable it in DOWNLOADER_MIDDLEWARES (settings.py):
DOWNLOADER_MIDDLEWARES = {
'PROJECT_NAME_HERE.middlewares.ProxyMiddleware': 350
}
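To answer the inline question directly: Scrapy reads `request.meta['proxy']` as a single URL string, not a requests-style scheme-to-URL dict like the `proxies` variable in the question. A minimal sketch of the correct per-request form (the URL assumes the pproxy forwarder above); with this you can also skip the middleware and pass the meta dict straight to `scrapy.Request`:

```python
# Correct: meta['proxy'] is one URL string (here, the local HTTP forwarder).
proxy_meta = {"proxy": "http://127.0.0.1:8181"}

# Wrong (the question's code): a scheme -> URL mapping.
requests_style = {
    "http": "socks5://127.0.0.1:1080",
    "https": "socks5://127.0.0.1:1080",
}

print(type(proxy_meta["proxy"]).__name__)  # the value Scrapy reads must be a str
```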