Two things come to mind with your code: first, whether the mailer code is executing at all, and second, the smtpuser parameter should be populated.
Below is working code for sending email from Scrapy through Gmail. This answer has four parts: the email code, a complete example, logging, and the Gmail configuration. The complete example is included because a few pieces need to be coordinated for this to work.
Email code
To have Scrapy send an email, you can add the following to your Spider class (a complete example follows in the next section). These additions make Scrapy send an email once the crawl has finished.
There are two pieces of code to add: the first imports the modules, and the second sends the email.
Import the modules:
from scrapy import signals
from scrapy.mail import MailSender
In your Spider class definition:
class MySpider(Spider):

    <SPIDER CODE>

    @classmethod
    def from_crawler(cls, crawler):
        spider = cls()
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # Replace the placeholder addresses and password with your own details.
        mailer = MailSender(
            mailfrom="your.account@gmail.com",
            smtphost="smtp.gmail.com",
            smtpport=587,
            smtpuser="your.account@gmail.com",
            smtppass="MySecretPassword",
        )
        return mailer.send(
            to=["another.address@gmail.com"],
            subject="Some subject",
            body="Some body",
        )
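As an alternative to hard-coding the credentials in the spider, MailSender also provides a from_settings() classmethod, so the SMTP details can live in the project's settings.py instead. A minimal sketch of the relevant settings (all values shown are placeholders):

```python
# settings.py -- mail settings read by MailSender.from_settings()
MAIL_FROM = "your.account@gmail.com"   # placeholder address
MAIL_HOST = "smtp.gmail.com"
MAIL_PORT = 587
MAIL_USER = "your.account@gmail.com"   # placeholder address
MAIL_PASS = "MySecretPassword"         # placeholder password
MAIL_TLS = True                        # Gmail on port 587 requires STARTTLS
```

The handler would then build the mailer with MailSender.from_settings(crawler.settings) (the crawler object is available in from_crawler) instead of passing each argument explicitly.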
Complete example
Putting it all together, this example uses the dirbot example project located at:
https://github.com/scrapy/dirbot
Only one file needs to be edited:
./dirbot/spiders/dmoz.py
Here is the entire working file, with the imports near the top and the email code at the end of the spider class:
from scrapy.spider import Spider
from scrapy.selector import Selector

from dirbot.items import Website

from scrapy import signals
from scrapy.mail import MailSender


class DmozSpider(Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        """
        The lines below is a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/
        @scrapes name
        """
        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')
        items = []

        for site in sites:
            item = Website()
            item['name'] = site.xpath('a/text()').extract()
            item['url'] = site.xpath('a/@href').extract()
            item['description'] = site.xpath('text()').re('-\s[^\n]*\\r')
            items.append(item)

        return items

    @classmethod
    def from_crawler(cls, crawler):
        spider = cls()
        crawler.signals.connect(spider.spider_closed, signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # Replace the placeholder addresses and password with your own details.
        mailer = MailSender(
            mailfrom="your.account@gmail.com",
            smtphost="smtp.gmail.com",
            smtpport=587,
            smtpuser="your.account@gmail.com",
            smtppass="MySecretPassword",
        )
        return mailer.send(
            to=["another.address@gmail.com"],
            subject="Some subject",
            body="Some body",
        )
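One possible refinement to the spider_closed handler above: the spider_closed signal also passes a reason argument ("finished" for a normal shutdown, "cancelled" or "shutdown" otherwise), so the handler could skip the email for interrupted crawls. A minimal sketch of that check, kept free of Scrapy so it runs standalone:

```python
# Decide whether a crawl's close reason warrants the notification email.
# Scrapy passes reason="finished" when a crawl runs to completion.
def should_send_mail(reason):
    return reason == "finished"
```

To use it, the handler signature would become def spider_closed(self, spider, reason): and the send call would be wrapped in if should_send_mail(reason):.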
After updating this file, run the standard crawl command from the project directory to crawl and send the email:
$ scrapy crawl dmoz
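Since the example embeds the password in source, one common variation (my own suggestion, not something Scrapy requires) is to read the credentials from environment variables before constructing MailSender. The variable names GMAIL_USER and GMAIL_PASS below are made up for illustration:

```python
import os

# Hypothetical helper: pull the Gmail credentials from the environment so
# the password is not stored in the spider's source code. GMAIL_USER and
# GMAIL_PASS are illustrative names, not Scrapy settings.
def gmail_credentials():
    user = os.environ["GMAIL_USER"]
    password = os.environ["GMAIL_PASS"]
    return user, password
```

The returned values would then be passed as the smtpuser and smtppass arguments (and typically mailfrom as well) when building the MailSender.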
Logging
By returning the output of the mailer.send call from the spider_closed method, Scrapy automatically adds the result to its log. Here are examples of success and failure:
Success log message:
2015-03-22 23:24:30-0000 [scrapy] INFO: Mail sent OK: To=['another.address@gmail.com'] Cc=None Subject="Some subject" Attachs=0
Error log message - unable to connect:
2015-03-22 23:39:45-0000 [scrapy] ERROR: Unable to send mail: To=['another.address@gmail.com'] Cc=None Subject="Some subject" Attachs=0 - Unable to connect to server.
Error log message - authentication failure:
2015-03-22 23:38:29-0000 [scrapy] ERROR: Unable to send mail: To=['another.address@gmail.com'] Cc=None Subject="Some subject" Attachs=0 - 535 5.7.8 Username and Password not accepted. Learn more at 5.7.8 http://support.google.com/mail/bin/answer.py?answer=14257 sb4sm6116233pbb.5 - gsmtp
Gmail configuration
To configure Gmail to accept email this way, you need to enable "Access for less secure apps", which you can do while logged in to the account at the following URL:
https://www.google.com/settings/security/lesssecureapps