我想知道为什么列出all_links
and all_titles
不想接收列表中的任何记录titles
and links
。我也尝试过.extend()
方法,但没有帮助。
import requests
from bs4 import BeautifulSoup
all_links = []
all_titles = []
def title_link(page_num):
page = requests.get(
'https://www.gumtree.pl/s-mieszkania-i-domy-sprzedam-i-kupie/warszawa/page-%d/v%dc9073l3200008p%d'
% (page_num, page_num, page_num))
soup = BeautifulSoup(page.content, 'html.parser')
links = ['https://www.gumtree.pl' + link.get('href')
for link in soup.find_all('a', class_ ="href-link tile-title-text")]
titles = [flat.next_element for flat in soup.find_all('a', class_ = "href-link tile-title-text")]
print(titles)
for i in range(1,5+1):
title_link(i)
all_links = all_links + links
all_titles = all_titles + titles
i+=1
print(all_links)
import pandas as pd
df = pd.DataFrame(data = {'title': all_titles ,'link': all_links})
df.head(100)
#df.to_csv("./gumtree_page_1.csv", sep=';',index=False, encoding = 'utf-8')
#df.to_excel('./gumtree_page_1.xlsx')
当我运行你的代码时,我得到了
NameError Traceback (most recent call last)
<ipython-input-3-6fff0b33d73b> in <module>
16 for i in range(1,5+1):
17 title_link(i)
---> 18 all_links = all_links + links
19 all_titles = all_titles + titles
20 i+=1
NameError: name 'links' is not defined
这指出了一个问题 - 名为变量links
未在全局范围内定义(您将其添加到all_links
)。您可以阅读有关 python 范围的内容here https://www.geeksforgeeks.org/namespaces-and-scope-in-python/。你需要return
链接和标题来自title_link
。与此类似的东西:
def title_link(page_sum):
# your code here
return links, titles
for i in range(1,5+1):
links, titles = title_link(i)
all_links = all_links + links
all_titles = all_titles + titles
print(all_links)
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)