Detecting whether an email is a "Delivery Status Notification" and extracting information - Python




After parsing with .parsestr(email), the object I get looks like this:

{'Content-Transfer-Encoding': 'quoted-printable',
 'Content-Type': 'text/plain; charset=ISO-8859-1',
 'Date': 'Mon, 14 Mar 2011 11:26:24 +0000',
 'Delivered-To': '[email protected]',
 'From': 'Mail Delivery Subsystem <[email protected]>',
 'MIME-Version': '1.0',
 'Message-ID': '<[email protected]>',
 'Received': 'by with SMTP id 8cs63078wfm;\r\n        Mon, 14 Mar 2011 04:26:24 -0700 (PDT)',
 'Return-Path': '<>',
 'Subject': 'Delivery Status Notification (Failure)',
 'To': 'se[email protected]',
 'X-Failed-Recipients': '[email protected]'}

Firstly, how can I tell that this is a DSN without using a regular expression?



The email docs http://docs.python.org/library/email#differences-from-mimelib say:

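The Parser class has no differences in its public interface. It does have some additional smarts to recognize message/delivery-status type messages, which it represents as a Message instance containing separate Message subparts for each header block in the delivery status notification.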


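Basically, I need to be able to reliably detect whether an email is a DSN, and then extract the original message so that I can parse it with email.Parser() and get information about it.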

The docs you cited http://docs.python.org/library/email#differences-from-mimelib say that the message is multipart if it is a DSN https://www.rfc-editor.org/rfc/rfc1894.html:

import email

msg = email.message_from_string(emailstr)

if (msg.is_multipart() and len(msg.get_payload()) > 1 and 
    msg.get_payload(1).get_content_type() == 'message/delivery-status'):
    # email is DSN
    print(msg.get_payload(0).get_payload()) # human-readable section
    for dsn in msg.get_payload(1).get_payload():
        print('action: %s' % dsn['action']) # e.g., "failed", "delivered"
    if len(msg.get_payload()) > 2:
        print(msg.get_payload(2)) # original message
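
One thing to keep in mind with the loop above: the message/delivery-status part is split into one Message per header block, and the first block normally carries the per-message fields (such as Reporting-MTA) rather than a per-recipient Action, so dsn['action'] can come back as None for it. Here is a minimal sketch that shows this. The SAMPLE_DSN text, its addresses, and the BOUND boundary are made up for illustration and are not taken from the question.

```python
import email

# Hypothetical, hand-written DSN used only for illustration.
SAMPLE_DSN = "\n".join([
    "From: Mail Delivery Subsystem <mailer-daemon@example.com>",
    "To: sender@example.com",
    "Subject: Delivery Status Notification (Failure)",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"',
    "",
    "--BOUND",
    "Content-Type: text/plain",
    "",
    "Delivery to the following recipient failed permanently.",
    "",
    "--BOUND",
    "Content-Type: message/delivery-status",
    "",
    "Reporting-MTA: dns; mail.example.com",
    "",
    "Final-Recipient: rfc822; nobody@example.com",
    "Action: failed",
    "Status: 5.1.1",
    "",
    "--BOUND",
    "Content-Type: message/rfc822",
    "",
    "From: sender@example.com",
    "To: nobody@example.com",
    "Subject: Hello",
    "",
    "Original body.",
    "",
    "--BOUND--",
    "",
])

msg = email.message_from_string(SAMPLE_DSN)

# Second component: the machine-readable delivery status.
status_part = msg.get_payload(1)

# The parser exposes each header block as its own Message object.
blocks = status_part.get_payload()

for block in blocks:
    # Per-message block has no Action; per-recipient blocks do.
    print(block.get("Reporting-MTA"), block.get("Action"), block.get("Status"))
```

Using block.get(...) instead of block[...] makes no difference to the result (both give None for a missing header), it just makes the "may be absent" case more obvious when reading the code.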

Format of a Delivery Status Notification (from RFC 3464 https://www.rfc-editor.org/rfc/rfc3464#page-7):

A DSN is a MIME message with a top-level content-type of
multipart/report (defined in [REPORT]).  When a multipart/report
content is used to transmit a DSN:

(a) The report-type parameter of the multipart/report content is
    "delivery-status".

(b) The first component of the multipart/report contains a human-
    readable explanation of the DSN, as described in [REPORT].

(c) The second component of the multipart/report is of content-type
    message/delivery-status, described in section 2.1 of this
    document.

(d) If the original message or a portion of the message is to be
    returned to the sender, it appears as the third component of the
    multipart/report.

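Points (a) through (d) of the RFC excerpt can be turned into a stricter check than just looking at the second part. The following is only a sketch under some assumptions: is_dsn and original_message are hypothetical helper names, the sample message is made up, and the text/rfc822-headers branch assumes the returned headers are not transfer-encoded.

```python
import email
from email.message import Message


def is_dsn(msg: Message) -> bool:
    """Return True if msg has the multipart/report DSN shape from RFC 3464."""
    if msg.get_content_type() != "multipart/report":
        return False
    # (a) report-type must be "delivery-status"
    if str(msg.get_param("report-type", "")).lower() != "delivery-status":
        return False
    parts = msg.get_payload()
    # (c) the second component must be message/delivery-status
    return (
        isinstance(parts, list)
        and len(parts) >= 2
        and parts[1].get_content_type() == "message/delivery-status"
    )


def original_message(msg: Message):
    """Return the returned original message (component (d)), or None."""
    if not is_dsn(msg):
        return None
    parts = msg.get_payload()
    if len(parts) < 3:
        return None
    third = parts[2]
    if third.get_content_type() == "message/rfc822":
        # The parser stores the nested message as a one-element list.
        return third.get_payload(0)
    if third.get_content_type() == "text/rfc822-headers":
        # Only the headers were returned; parse them as a body-less message.
        return email.message_from_string(third.get_payload())
    return None


# Hypothetical sample, written by hand for this example.
raw = "\n".join([
    "Subject: Delivery Status Notification (Failure)",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"',
    "",
    "--BOUND",
    "Content-Type: text/plain",
    "",
    "Delivery failed.",
    "",
    "--BOUND",
    "Content-Type: message/delivery-status",
    "",
    "Reporting-MTA: dns; mail.example.com",
    "",
    "Final-Recipient: rfc822; nobody@example.com",
    "Action: failed",
    "Status: 5.1.1",
    "",
    "--BOUND",
    "Content-Type: message/rfc822",
    "",
    "From: sender@example.com",
    "Subject: Hello",
    "",
    "Original body.",
    "",
    "--BOUND--",
    "",
])

dsn_msg = email.message_from_string(raw)
print(is_dsn(dsn_msg))
orig = original_message(dsn_msg)
print(orig["Subject"] if orig is not None else "no original message returned")
```

Note that a bounce like the one shown in the question, whose top-level Content-Type is text/plain, would not pass this check. For such messages, headers like X-Failed-Recipients or an empty Return-Path are at best heuristics, not something the RFC guarantees.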
