My code is based on "Scrapy Image Pipeline: How to rename images?". I tested it a week ago, and it worked with my own spider.
import scrapy
from scrapy.pipelines.images import ImagesPipeline


# This pipeline is designed for an item with multiple images
class ImagesWithNamesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Values in the field "image_names" must end with the suffix ".jpg".
        # You can change "image_names" to your own image-name field,
        # but it must be a list with one name per image URL.
        for image_url, image_name in zip(item[self.IMAGES_URLS_FIELD], item["image_names"]):
            yield scrapy.Request(url=image_url, meta={"image_name": image_name})

    def file_path(self, request, response=None, info=None):
        # Use the name carried in the request meta as the relative path
        image_name = request.meta["image_name"]
        return image_name
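For illustration, an item fed to this pipeline might look like the sketch below. It assumes the default URL field name image_urls; the URLs and file names are made up. Because the pipeline pairs the two lists with zip, they should have the same length and order, since zip silently drops any unmatched extras.

```python
# Hypothetical item; the URLs and names are placeholders, not from a real site.
item = {
    "image_urls": [
        "http://example.com/a/1.png",
        "http://example.com/a/2.png",
    ],
    "image_names": ["front.jpg", "back.jpg"],
}

# Same pairing the pipeline performs in get_media_requests.
pairs = list(zip(item["image_urls"], item["image_names"]))
for image_url, image_name in pairs:
    print(image_url, "->", image_name)
```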
Here is how ImagesPipeline works:
The pipeline executes image_downloaded -> get_images -> file_path, in that order. ("->" means "calls".)
- image_downloaded: saves the images returned by get_images by calling persist_file
- get_images: converts the image to JPEG
- file_path: returns the relative path of the image
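The call chain can be sketched with a simplified stand-in class. This is not Scrapy's actual implementation: SketchPipeline, the dict-based request and response, and the in-memory saved store are all hypothetical, and the JPEG conversion is skipped. It only shows why the path returned by file_path decides where the image ends up.

```python
# Simplified stand-in for the call chain; NOT Scrapy's actual implementation.
class SketchPipeline:
    def __init__(self):
        self.saved = {}  # path -> image bytes, stands in for the file store

    def file_path(self, request):
        # Return the path relative to the store root.
        return request["meta"]["image_name"]

    def get_images(self, response, request):
        # The real method also converts the image to JPEG; here bytes pass through.
        path = self.file_path(request)
        yield path, response["body"]

    def persist_file(self, path, data):
        # Store the data under the given relative path.
        self.saved[path] = data

    def image_downloaded(self, response, request):
        # Save every (path, data) pair produced by get_images.
        for path, data in self.get_images(response, request):
            self.persist_file(path, data)


pipeline = SketchPipeline()
request = {"url": "http://example.com/1.png", "meta": {"image_name": "front.jpg"}}
response = {"body": b"fake image bytes"}
pipeline.image_downloaded(response, request)
print(pipeline.saved)
```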
I scanned the source code of ImagesPipeline and did not find any special field for renaming images. By default, Scrapy names the image this way:
def file_path(self, request, response=None, info=None):
    # (excerpt; the omitted code above sets url to the request URL)
    image_guid = hashlib.sha1(to_bytes(url)).hexdigest()  # change to request.url after deprecation
    return 'full/%s.jpg' % (image_guid)
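To see what that default name looks like, the same hashing step can be reproduced with the standard library alone. This is a minimal sketch: default_image_path is a hypothetical helper, the URL is a placeholder, and UTF-8 encoding is assumed in place of Scrapy's to_bytes.

```python
import hashlib


def default_image_path(url):
    # Mirrors the excerpt: SHA-1 hex digest of the URL, stored under "full/".
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid


path = default_image_path("http://example.com/a/1.png")
print(path)
```

The result is a 40-character hexadecimal name, which is why the downloaded files carry no readable name unless file_path is overridden.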
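Therefore we should override the file_path method. According to the source code of FilesPipeline, which ImagesPipeline inherits from, we only need to return the relative path, and persist_file will take care of the rest.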
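For the custom pipeline to run, it also has to be enabled in the project settings. A minimal sketch, assuming the class lives in a hypothetical module myproject/pipelines.py and that images go to a placeholder directory:

```python
# settings.py (sketch; the module path and directory are placeholders)
ITEM_PIPELINES = {
    "myproject.pipelines.ImagesWithNamesPipeline": 1,
}

# Paths returned by file_path are relative to this directory.
IMAGES_STORE = "/path/to/images"
```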