I am using Scrapy to consume messages (URLs) from RabbitMQ, but when I use yield to call my parse method with the URL passed as an argument, the program never reaches the callback method. Below is my spider code:
# -*- coding: utf-8 -*-
import json

import pika
import scrapy
from scrapy import cmdline


class MydeletespiderSpider(scrapy.Spider):
    name = 'Mydeletespider'
    allowed_domains = []
    start_urls = []

    def callback(self, ch, method, properties, body):
        # pika delivers each message here; the URL is pulled out of the JSON body
        print(" [x] Received %r" % body)
        body = json.loads(body)
        url = body.get('url')
        yield scrapy.Request(url=url, callback=self.parse)

    def start_requests(self):
        cre = pika.PlainCredentials('test', 'test')
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='10.0.12.103', port=5672,
                                      credentials=cre, socket_timeout=60))
        channel = connection.channel()
        channel.basic_consume(self.callback,
                              queue='Deletespider_Batch_Test',
                              no_ack=True)
        print(' [*] Waiting for messages. To exit press CTRL+C')
        channel.start_consuming()

    def parse(self, response):
        print(response.url)


cmdline.execute('scrapy crawl Mydeletespider'.split())
My goal is to pass the URL's response to the parse method.
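Most likely the yield is the problem: because callback contains a yield, it is a generator function, so pika's call to it only creates a generator object that nothing ever iterates, and the scrapy.Request is never handed to Scrapy's engine; on top of that, channel.start_consuming() blocks forever. A minimal sketch of a workaround (mine, not from the original post), assuming pika < 1.0 to match the basic_consume(callback, queue=..., no_ack=True) signature used above, is to drain the queue with basic_get directly inside start_requests and yield the requests from there:

# Sketch only: pull messages synchronously with basic_get inside start_requests,
# so the yielded Requests actually reach Scrapy's engine.
# Assumes pika < 1.0 (in pika >= 1.0 the keyword is auto_ack instead of no_ack).
import json

import pika
import scrapy


class MydeletespiderSpider(scrapy.Spider):
    name = 'Mydeletespider'

    def start_requests(self):
        cre = pika.PlainCredentials('test', 'test')
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host='10.0.12.103', port=5672,
                                      credentials=cre, socket_timeout=60))
        channel = connection.channel()
        while True:
            # basic_get returns (None, None, None) when the queue is empty
            method_frame, header_frame, body = channel.basic_get(
                queue='Deletespider_Batch_Test', no_ack=True)
            if method_frame is None:
                break
            url = json.loads(body).get('url')
            if url:
                yield scrapy.Request(url=url, callback=self.parse)
        connection.close()

    def parse(self, response):
        print(response.url)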
To consume URLs from RabbitMQ, you can take a look at the scrapy-rabbitmq package (https://github.com/roycehaynes/scrapy-rabbitmq):

Scrapy-rabbitmq is a tool that lets you feed and queue URLs from RabbitMQ via Scrapy spiders, using the Scrapy framework.

To enable it, put the following in your settings.py:
# Enables scheduling storing requests queue in rabbitmq.
SCHEDULER = "scrapy_rabbitmq.scheduler.Scheduler"
# Don't cleanup rabbitmq queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True
# Schedule requests using a priority queue. (default)
SCHEDULER_QUEUE_CLASS = 'scrapy_rabbitmq.queue.SpiderQueue'
# RabbitMQ Queue to use to store requests
RABBITMQ_QUEUE_NAME = 'scrapy_queue'
# Provide host and port to RabbitMQ daemon
RABBITMQ_CONNECTION_PARAMETERS = {'host': 'localhost', 'port': 6666}
# Bonus:
# Store scraped item in rabbitmq for post-processing.
# ITEM_PIPELINES = {
# 'scrapy_rabbitmq.pipelines.RabbitMQPipeline': 1
# }
In your spider:
from scrapy import Spider
from scrapy_rabbitmq.spiders import RabbitMQMixin


class RabbitSpider(RabbitMQMixin, Spider):
    name = 'rabbitspider'

    def parse(self, response):
        # mixin will take urls from rabbit queue by itself
        pass
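To actually feed the spider, URLs are then published to the queue named in RABBITMQ_QUEUE_NAME. A rough producer sketch with plain pika follows; the host/port and the plain-URL-string payload are my assumptions rather than part of the original answer, so check the scrapy-rabbitmq README for the exact message format it expects:

import pika

# Sketch of a producer pushing a URL into the queue the spider reads from.
# Host and payload format are assumptions; adjust to your
# RABBITMQ_CONNECTION_PARAMETERS and to what scrapy-rabbitmq expects.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='scrapy_queue')
channel.basic_publish(exchange='',
                      routing_key='scrapy_queue',
                      body='https://example.com/page-to-scrape')
connection.close()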