You need to add a user agent:
import requests

headers = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}

def find_city(zip_code):
    zip_code = str(zip_code)
    url = 'http://www.unitedstateszipcodes.org/' + zip_code
    source_code = requests.get(url, headers=headers)
    return source_code
Once you do that, the response is 200 and you get the page source:
In [8]: url = 'http://www.unitedstateszipcodes.org/54115'
In [9]: headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
In [11]: source_code = requests.get(url,headers=headers)
In [12]: source_code.status_code
Out[12]: 200
If you want the details, they are easy to parse:
In [59]: soup = BeautifulSoup(source_code.text, "lxml")
In [60]: soup.find('div', id='zip-links').h3.text
Out[60]: 'ZIP Code: 54115'
In [61]: soup.find('div', id='zip-links').h3.next_sibling.strip()
Out[61]: 'De Pere, WI 54115'
In [62]: url = 'http://www.unitedstateszipcodes.org/90210'
In [63]: source_code = requests.get(url,headers=headers).text
In [64]: soup = BeautifulSoup(source_code, "lxml")
In [65]: soup.find('div', id='zip-links').h3.text
Out[65]: 'ZIP Code: 90210'
In [70]: soup.find('div', id='zip-links').h3.next_sibling.strip()
Out[70]: 'Beverly Hills, CA 90210'
You could also store each result in a database, then try a database lookup first before making any HTTP request.
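A minimal sketch of that caching idea, using SQLite from the standard library. Here `fetch_city` is a hypothetical placeholder for the scraping code above (fetch the page, parse out the city string); the table name and function names are my own, not from the site:

```python
import sqlite3

def make_cache(path=":memory:"):
    """Open (or create) a SQLite database with a zip -> city table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS zips (zip TEXT PRIMARY KEY, city TEXT)")
    return conn

def lookup_city(conn, zip_code, fetch_city):
    """Return the city for zip_code, scraping only on a cache miss."""
    zip_code = str(zip_code)
    row = conn.execute("SELECT city FROM zips WHERE zip = ?", (zip_code,)).fetchone()
    if row:                          # cache hit: no HTTP request needed
        return row[0]
    city = fetch_city(zip_code)      # cache miss: scrape the site...
    conn.execute("INSERT INTO zips VALUES (?, ?)", (zip_code, city))
    conn.commit()                    # ...and remember the answer
    return city
```

Repeated lookups for the same ZIP then cost one SELECT instead of one request, which also keeps you from hammering the site.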