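Your code loops over every post and prints its data. The part of your code that adds the post data to the DataFrame is not inside the loop (indentation is significant in Python!), so the DataFrame only ever contains the data from one feed.
Instead, you can build up a list of posts while looping over the feeds, and then create the DataFrame once at the end: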
import feedparser
import pandas as pd
rawrss = [
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
    'https://www.yahoo.com/news/rss/',
    'http://www.huffingtonpost.co.uk/feeds/index.xml',
    'http://feeds.feedburner.com/TechCrunch/',
]

feeds = []  # list of feed objects
for url in rawrss:
    feeds.append(feedparser.parse(url))

posts = []  # list of posts [(title1, link1, summary1), (title2, link2, summary2) ... ]
for feed in feeds:
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])  # pass data to init
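To see why the indentation matters, here is a minimal sketch that uses hypothetical in-memory data (`sample_feeds`) in place of real feeds. It is only a guess at the shape of the original problem, not a copy of your code: when the collecting line is dedented out of the loop, it runs once, after the loop has finished, and only sees whatever the loop variable last held.

```python
# Hypothetical stand-in for parsed feeds: (feed name, list of post titles).
sample_feeds = [
    ("feed_a", ["a1", "a2"]),
    ("feed_b", ["b1", "b2", "b3"]),
]

# Mistake: the collecting line is outside the loop, so it runs once,
# after the loop ends, with `titles` still bound to the last feed.
for name, titles in sample_feeds:
    print(name, titles)
collected_wrong = list(titles)

# Fix: the collecting line is indented into the loop and runs for every feed.
collected = []
for name, titles in sample_feeds:
    collected.extend(titles)

print(collected_wrong)
print(collected)
```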
You can streamline this slightly by merging the two for loops:
posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))
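One thing to watch for: not every feed supplies every field, and attribute access such as `post.summary` fails on an entry that lacks it. As far as I know, feedparser entries are dict-like and support `.get`, so a tolerant variant might look like the sketch below. `rows_from_entries` is a hypothetical helper name, and the sample entries are plain dicts made up for illustration rather than real feed data.

```python
def rows_from_entries(entries, fields=("title", "link", "summary")):
    """Return one tuple per entry, using '' for any field the entry lacks."""
    return [tuple(entry.get(field, "") for field in fields) for entry in entries]


# Made-up entries standing in for feed.entries; the second has no summary.
sample_entries = [
    {"title": "First", "link": "http://example.com/1", "summary": "One"},
    {"title": "Second", "link": "http://example.com/2"},
]

rows = rows_from_entries(sample_entries)
print(rows)
```

With real feeds, you would call `posts.extend(rows_from_entries(feed.entries))` inside the loop over `rawrss` and build the DataFrame from `posts` exactly as before.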