Help! Please read the Scrapy code and crawl results below. I want to scrape some data from http://china.fathom.info/data/data.json, and only Scrapy is allowed. But I don't know how to control the order of the yields. I expected to process all the member requests inside the parse loop first and then yield the group item, but yield group_item always seems to execute before the requests are handled.
import json

from scrapy import Request

    start_urls = [
        "http://china.fathom.info/data/data.json"
    ]

    def parse(self, response):
        groups = json.loads(response.body)['group_members']
        for i in groups:
            group_item = GroupItem()
            group_item['name'] = groups[i]['name']
            group_item['chinese'] = groups[i]['chinese']
            group_item['members'] = []
            members = groups[i]['members']
            for member in members:
                yield Request(self.person_url % member['id'],
                              meta={'group_item': group_item, 'member': member},
                              callback=self.parse_member, priority=100)
            yield group_item

    def parse_member(self, response):
        group_item = response.meta['group_item']
        member = response.meta['member']
        person = json.loads(response.body)
        ego = person['ego']
        group_item['members'].append({
            'id': ego['id'],
            'name': ego['name'],
            'chinese': ego['chinese'],
            'role': member['role']
        })
The data in MongoDB: https://i.stack.imgur.com/woLIT.jpg
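The ordering observed here is inherent to how Scrapy drives callbacks: parse runs to completion, yielding all of its requests and the item, before any downloaded response ever invokes parse_member. A minimal sketch of that scheduling, with a hypothetical one-loop "scheduler" standing in for Scrapy's engine (the function and string names are illustrative, not Scrapy API):

```python
# Hypothetical mini-scheduler mimicking Scrapy's behavior: everything a
# callback yields is collected first; downloads (and their callbacks)
# only run after the yielding callback has finished.
log = []

def parse():
    log.append('parse: yield requests')
    yield 'request-1'
    yield 'request-2'
    log.append('parse: yield group_item')
    yield 'group_item'

def parse_member(request):
    # Stands in for the callback that runs once a response arrives.
    log.append('parse_member for ' + request)

# Drain the callback's generator completely, collecting its requests,
# before dispatching any of them -- as Scrapy's engine does.
pending = [out for out in parse() if out.startswith('request')]
for req in pending:
    parse_member(req)
```

After this runs, the log shows 'parse: yield group_item' before any 'parse_member for ...' entry: the item left parse before any member response was processed, which is exactly what the crawl results show.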
You need to yield the item in the last callback. parse does not pause and wait for parse_member to finish, so the group_item seen in parse has not yet been modified while parse_member is still doing its work.

Don't yield group_item in parse; yield it only from parse_member, because you have already passed the item along in meta and you recover it inside parse_member with response.meta['group_item'].
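Yielding only from parse_member raises one remaining question: which response is the last one for a group? One possible approach (my assumption, not stated in the answer) is to store a pending-response counter on the item and emit the item when the counter reaches zero. Stripped of Scrapy specifics so the pattern is visible on its own, a sketch:

```python
import json

# Completion-counter pattern (hypothetical helper names): each member
# callback appends its data and decrements a pending counter stored on
# the item; whichever callback handles the last member emits the item.

def make_group_item(group):
    return {
        'name': group['name'],
        'chinese': group['chinese'],
        'members': [],
        '_pending': len(group['members']),  # responses still outstanding
    }

def on_member_response(group_item, member, person_json):
    """Simulates parse_member: append the member's data, and return the
    finished group item only when this was the last outstanding response."""
    ego = json.loads(person_json)['ego']
    group_item['members'].append({
        'id': ego['id'],
        'name': ego['name'],
        'chinese': ego['chinese'],
        'role': member['role'],
    })
    group_item['_pending'] -= 1
    if group_item['_pending'] == 0:
        group_item.pop('_pending')
        return group_item  # in the real spider: yield group_item here
    return None
```

In the real spider, parse would set the counter before yielding the member requests, and parse_member would yield group_item only when the counter hits zero, so each group is emitted exactly once and only after all its members have been filled in.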