为什么我的 scrapy 蜘蛛没有遵循我的项目解析函数中的请求回调?

2024-03-29

我正在抓取一个网站来检查各种产品的库存状态。不幸的是,这需要实际单击产品页面上的“添加到购物车”并检查下一页的消息以确定是否有库存(即它需要解析两个响应)。

我跟着优秀的文档 http://doc.scrapy.org/en/latest/topics/request-response.html#passing-additional-data-to-callback-functions对于这种情况,并编写了我的解析函数来返回Request带有对我的辅助解析函数的回调的对象。然而,这个函数很少被调用。大多数产品只会在日志中看到“退货请求之前”,但一小部分产品确实会正确调用它。

知道这里出了什么问题吗?我已经没有主意了。

foo/spiders/atlantic_firearms_spider.py:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import FormRequest
from foo.items import AtlanticFirearmsItem

import datetime
import re

class AtlanticFirearmsSpider(CrawlSpider):
    name = "atlantic_firearms"
    allowed_domains = ["atlanticfirearms.com"]
    start_urls = [
        "http://www.atlanticfirearms.com"
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=['detail.html']), callback='parse_product'),
        Rule(SgmlLinkExtractor(allow=[], deny=['/bro', '/news', '/howtobuy', '/component/search', 'askquestion'])),
    )

    def parse_product(self, response):
      hxs = HtmlXPathSelector(response)
      product = AtlanticFirearmsItem()
      add_to_cart = any([hxs.select("descendant-or-self::input[@name = 'addtocart']"),
                         hxs.select("descendant-or-self::input[@value = 'Add to Cart']"),
                         hxs.select("//a[text() = 'Add to Cart']")])
      product['url'] = response.url
      product['as_of_time'] = datetime.datetime.now()

      if add_to_cart:
          # attempt to add to cart to verify availability
          request = FormRequest.from_response(response, formname="addtocartForm", callback=self.parse_add_to_cart)
          request.meta['product'] = product
          print "Before return request"
          return request
      else:
          product['in_stock'] = False
          return product

    def parse_add_to_cart(self, response):
        print "Inside parse_add_to_cart"
        product = response.meta['product']
        hxs = HtmlXPathSelector(response)
        product['in_stock'] = not(hxs.select("//text()[contains(.,'We regret to inform you that this product')]"))
        return product

foo/items.py:

from scrapy.item import Item, Field

class AtlanticFirearmsItem(Item):
    in_stock = Field()
    url = Field()
    as_of_time = Field()

Edit:按要求添加日志文件:

2013-09-21 07:25:14-0500 [scrapy] INFO: Scrapy 0.18.2 started (bot: foo)
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Optional features available: ssl, http11
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Overridden settings: {'SPIDER_MODULES': ['foo.spiders'], 'BOT_NAME': 'foo'}
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRef
reshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Spider opened
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com> (referer: None)
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cloudflare.com': <GET http://www.cloudflare.com/email-protection>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.constantcontact.com': <GET http://www.constantcontact.com/jmml/email-marketing.jsp>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.fdicreative.com': <GET http://www.fdicreative.com/>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redjacketfirearms.com': <GET https://www.redjacketfirearms.com/>
2013-09-21 07:25:17-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-ammunition-45acp-500-round-case-detail.
html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.h
tml?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Filtered duplicate request: <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pisto
l-9mm-detail.html?Itemid=0> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-223-ar15-rifle-ammo-500-round-case-deta
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/us-palm-air-save-plate-carrier-detail.html?I
temid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/545-x-39-russian-ak74-ammo-1080-round-case-d
etail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/red-army-standard-7-62x39mm-360-round-range-
pack-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-mp5-style-rifle-detail.html?Itemid=0> (
referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/wolf-ammunition-for-sale-ak47-detail.html?Item
id=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/dsa-zm4-flat-top-ar15-carbine-dszm4cv1r-detail.html
?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/m92-ak47-yugoslavian-7-62x39mm-bolt-hold-open-
metal-mags-pack-of-two-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-v94-9mm-mp5-style-pistol-full-size-deta
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/zastava-ak-47-m70b1-pap-7-62x39mm-rifles-w-2-hi-cap
-mags-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ptr-91-gi-rifle-939-atlanticfirearms-com-detail.htm
l?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/pap-m92-7-62x39-pistol-detail.html?Itemid=0> (refer
er: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/content/article/86-static-pages/159-resources.html> (referer: http://www.atlan
ticfirearms.com)
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atsconsultingcorp.com': <GET http://www.atsconsultingcorp.com/>                                  [52/1905]
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bullseyemarket.com': <GET http://www.bullseyemarket.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.corilam.com': <GET http://www.corilam.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'chancebrownrealestate.com': <GET http://chancebrownrealestate.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.delsolservices.com': <GET http://www.delsolservices.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.elkhornoutfitters.com': <GET http://www.elkhornoutfitters.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.frontierlogistics.com': <GET http://www.frontierlogistics.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gpstrackingkey.com': <GET http://www.gpstrackingkey.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.hanshawkennedy.com': <GET http://www.hanshawkennedy.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'worldenv.com': <GET http://worldenv.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'purgexonline.com': <GET http://purgexonline.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bumpfirestocks.com': <GET http://bumpfirestocks.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texrestaurantequipment.com': <GET http://www.texrestaurantequipment.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.houston-refinance.com': <GET http://www.houston-refinance.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'johnson-bryan.com': <GET http://johnson-bryan.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'kanesforms.com': <GET http://kanesforms.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.markfoxrealestate.com': <GET http://www.markfoxrealestate.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.mphoa.org': <GET http://www.mphoa.org/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.outfitterwebsites.com': <GET http://www.outfitterwebsites.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'outdoortrailsnetwork.com': <GET http://outdoortrailsnetwork.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.psychologicalriskservices.com': <GET http://www.psychologicalriskservices.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rcshouston.com': <GET http://www.rcshouston.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rollingcreekcarwash.com': <GET http://www.rollingcreekcarwash.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'slammc.com': <GET http://slammc.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texassaltwaterfishingguide.com': <GET http://www.texassaltwaterfishingguide.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.waynepigment.com': <GET http://www.waynepigment.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bancroftfeldman.com': <GET http://bancroftfeldman.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'elilanddesign.com': <GET http://elilanddesign.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpharms.com': <GET http://dpharms.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'contractlandstaff.com': <GET http://contractlandstaff.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'knightsplumbing.com': <GET http://knightsplumbing.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ati-omni-5-56-poly-competition-m4-carbine-detail.ht
ml?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-accessories/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/dallas-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-accessories/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/component/content/?Itemid=803&id=148> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/houston-texas-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/california-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/browse-our-products.html> (referer: http://www.atlanticfirearms.com/component/virtuemart
/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0)
Inside parse_add_to_cart
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Scraped from <200 http://www.atlanticfirearms.com/browse-our-products.html>
        {'as_of_time': datetime.datetime(2013, 9, 21, 7, 25, 18, 365559),
         'in_stock': True,
         'url': 'http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0'}
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/www.atlanticfirearms.com> (referer: http://www.atlanticfirearms.com/dallas-gun-shop.html
)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/login-or-register/editaddress.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/privacy-policy.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/subscribe.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/links.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunbroker.com': <GET http://www.gunbroker.com/user/dealernetwork.asp>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.auctionarms.com': <GET http://www.auctionarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunsamerica.com': <GET http://www.gunsamerica.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ar15.com': <GET http://www.ar15.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.olyarms.com': <GET http://www.olyarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cheaperthandirt.com': <GET http://www.cheaperthandirt.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ammoman.com': <GET http://www.ammoman.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.net': <GET http://www.ak47.net/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atf.treas.gov': <GET http://www.atf.treas.gov/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'caag.state.ca.us': <GET http://caag.state.ca.us/firearms/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.nra.org': <GET http://www.nra.org/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.masterpiecearms.com': <GET http://www.masterpiecearms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'atlantic1.readyhosting.com': <GET http://atlantic1.readyhosting.com/programming/listview.asp?CatId=2>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vulcanarmament.com': <GET http://www.vulcanarmament.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bushmaster.com': <GET http://www.bushmaster.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rockriverarms.com': <GET http://www.rockriverarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpmsinc.com': <GET http://dpmsinc.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.colt.com': <GET http://www.colt.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.armalite.com': <GET http://www.armalite.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redstick-firearms.com': <GET http://www.redstick-firearms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vectorarms.com': <GET http://www.vectorarms.com/indexframe.html>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.arsenalinc.com': <GET http://www.arsenalinc.com/about.htm>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.com': <GET http://www.ak47.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.jldenter.com': <GET http://www.jldenter.com/store/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.springfield-armory.com': <GET http://www.springfield-armory.com/index.shtml>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.dsarms.com': <GET http://www.dsarms.com/>
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT twice, forcing unclean shutdown

发布我之前的评论作为答案。

由于您所有的 POST 请求(来自FormRequest.from_response()) 被重定向到http://www.atlanticfirearms.com/browse-our-products.html,你应该设置dont_filter=True:

    if add_to_cart:
        # attempt to add to cart to verify availability
        request = FormRequest.from_response(response, formname="addtocartForm",
                      callback=self.parse_add_to_cart, dont_filter=True)

See 根据请求提供 Scrapy 文档 http://doc.scrapy.org/en/latest/topics/request-response.html#request-objects:

不过滤(boolean) – 指示调度程序不应过滤此请求。当您想要多次执行相同的请求以忽略重复项过滤器时,可以使用此选项。

此外,您可能想要设置CONCURRENT_REQUESTS = 1在购物车中逐一添加商品(我想知道服务器如何处理并行购物车添加。)

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

为什么我的 scrapy 蜘蛛没有遵循我的项目解析函数中的请求回调? 的相关文章

随机推荐

  • 无需表单即可在屏幕上绘图

    是否可以在不使用表单应用程序的情况下在屏幕上创建一个大的白色矩形 如果可能的话 它应该覆盖整个屏幕 我知道我必须使用System Drawing并尝试了几个步骤 但没有一个真正在我的屏幕上打印任何内容 方法一 调用Windows API 你
  • ANDROID从url获取内容

    我在使用 Android 时遇到问题 我必须能够从生成内容的 URL 中读取内容 然后将它们放入数组中 分割读取的字符串 我使用了不同的方法 但无法读取该字符串 我能怎么做 这是我现在使用的函数 继续出现字符串 nn va private
  • 具有可比较列表与 TreeSet

    选项 1 创建一个实现 Comparable 的列表 并在每次添加值时使用 collections sort List l 对它进行排序 选项 2 创建一个 TreeSet 它始终保持排序 哪一个会更快 我问这个是因为 List 为我提供了
  • git am 错误:“补丁不适用”

    我正在尝试使用 git 将多个提交从一个项目移动到第二个类似的项目 所以我创建了一个补丁 包含 5 个提交 git format patch 4af51 stdout gt changes patch 然后将补丁移动到第二个项目的文件夹并想
  • 如何在 C# 中使用 SendMessageTimeout 获取窗口标题

    我需要枚举所有打开的窗口并获取它们的标题 但问题是某些窗口属于同一进程但属于不同的线程 该线程被阻塞 等待互斥体 因此 我不能对属于我自己的进程的窗口使用 GetWindowText 因为这将导致 SendMessage 调用 这将阻止我的
  • 向类型类实例添加类约束

    我正在尝试实现康托配对函数 作为 通用 Pair 类型类 如下所示 module Pair Pair CantorPair where Pair interface class Pair p where pi a gt a gt p a k
  • 如何使用 C++ 从 RAM 运行可执行文件?

    如何使用 C 从 RAM 运行可执行文件 可执行文件位于 RAM 中 并且我知道地址 如何从我的程序中调用该程序 这类事情通常来自世界的黑暗角落 结合像metasploit这样的工具 用内存创建进程会很棒 因此一些人尝试重新实现Create
  • 向应用程序小部件发送更新广播

    我有一个带有配置活动的应用程序小部件 我想在单击活动中的 确定 按钮时触发小部件的更新 我写了这段代码 Intent initialUpdateIntent new Intent AppWidgetManager ACTION APPWID
  • 为什么 firefox/chrome 显示的页面与 IE8 不同?

    当我看着 我看到最新版本 with Firefox and Chrome 但是一个旧版本 with IE8 另外 通过屏幕抓取PHP Curl给我一个旧版本 我试过了CTRL 刷新在 IE8 中 但我无法让它向我显示最新版本 无论heade
  • 警告:格式字符串不是字符串文字[重复]

    这个问题在这里已经有答案了 我从以下行收到 格式字符串不是字符串文字 警告 NSString formattedString NSString alloc initWithFormat format arguments valist 我在以
  • 从 Delphi 7 中的 C++ DLL 接收字符串数组

    我正在用 C 创建一个 DLL 它将在 Delphi 7 项目中使用 这个问题与this one https codereview stackexchange com questions 43347 produce an array of
  • 尝试从列表中删除对象时迭代期间的并发修改

    我试图在多个列表中循环 最后比较它们的名称 如果它们不匹配 则将其从列表中删除 但我收到此错误 迭代期间并发修改 虽然我已经复制了原始列表只是为了避免这个错误 但仍然得到它 我尝试的是 globals filteredPollsList p
  • 在 R 中获取生存估计

    我试图获得不同人在特定时间的生存估计 我的代码如下 s Surv outcome 1 outcome 2 survplot survfit s person list 1 plot survplot mark time FALSE summ
  • 什么是serialVersionUID?为什么要使用它?

    当出现以下情况时 Eclipse 会发出警告serialVersionUID不见了 可序列化类 Foo 未声明静态final long 类型的serialVersionUID 字段 What is serialVersionUID为什么它很
  • 如何优化Excel VBA点击url

    VBA 运行时出现运行时错误 70 有时代码运行顺利 但有时则不然 想知道是否有更可靠的代码可以继续 它总是停在If link innerHTML Balance Sheet Then end if Public Sub Get Dim i
  • 如何以编程方式将 QMainWindow 大小调整为其最小大小

    当我有一个带有网格布局的 QMainWindow 时 当用鼠标调整它的大小时 它不会低于其中所有控件正确显示所需的最小尺寸 在我的应用程序中 我有时会以编程方式隐藏控件 但随后窗口保持相同的大小 其余控件看起来分散开来 它们之间的空间太大
  • 将图像文件打开到 WriteableBitmap

    问题就在这里 我想从本地驱动器打开一个文件 然后将其放入 WritableBitmap 中 以便我可以编辑它 但问题是 我无法从 Uri 或类似的东西创建 WritableBitmap 我也知道如何将文件打开到 BitmapImage 中
  • 设置EditText的HINT多行

    我知道我可以更改 EditeText 文本的行数 但我也可以更改 EditText 提示的行数吗 我在网上找不到解决方案 谢谢您的帮助 My Code Override public void onCreateOptionsMenu Men
  • 在控制台窗口中以横向树格式漂亮打印输出

    我有一本使用 Python 创建的字典 d a Adam Book 4 b Bill TV 6 Jill Sports 1 Bill Computer 5 c Bill Sports 3 d Quin Computer 3 Adam Com
  • 为什么我的 scrapy 蜘蛛没有遵循我的项目解析函数中的请求回调?

    我正在抓取一个网站来检查各种产品的库存状态 不幸的是 这需要实际单击产品页面上的 添加到购物车 并检查下一页的消息以确定是否有库存 即它需要解析两个响应 我跟着优秀的文档 http doc scrapy org en latest topi