Python Scrapy crawler: generating and starting spiders, the crawlspider command, and an example of scraping a sample site's data

2023-11-10

Create a Scrapy project
scrapy startproject myscrapy
Generate a spider
scrapy genspider example example.com
Start the spider
scrapy crawl example
Generate a CrawlSpider
scrapy genspider -t crawl example "example.com"

Example: scraping data from a website (the site name is elided in the original post)

import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


class Spider(scrapy.Spider):
    name = ''
    allowed_domains = ['.com']
    start_urls = ['http://.com/']
    page = 1

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--disable-gpu')
        # Selenium 4 removed the executable_path and chrome_options keywords;
        # pass a Service object and options= instead.
        service = Service(r'C:\Program Files\Google\Chrome\Application\chromedriver.exe')
        self.browser = webdriver.Chrome(service=service, options=chrome_options)

    def closed(self, spider):
        # Quit the headless browser when the spider finishes,
        # otherwise the chromedriver process is left running.
        self.browser.quit()

    def parse(self, response):
        res_div_list = response.xpath("//div[@class='recruit-list']")
        for div in res_div_list:
            item = {}
            item["title"] = div.xpath(".....")  # XPath elided in the original post
            yield scrapy.Request("https://.com/....?...=....",  # detail URL elided in the original post
                                 callback=self.detail,
                                 meta={"item": item})

        # res = response.xpath("/html").extract()
        # print(res)
        # The original condition was `self.page <= 0`, which never loops;
        # the real upper bound is not given, so use an explicit placeholder.
        max_page = 10  # placeholder: adjust to the site's actual page count
        while self.page <= max_page:
            self.page += 1
            next_url = self.start_urls[0] + "?index=" + str(self.page)
            yield scrapy.Request(next_url, callback=self.parse)  # this URL is handled by the callback

    def detail(self, response):
        item = response.meta["item"]
        item["duty"] = response.xpath("//div[@class='duty-text']//li[@class='explain-item']/text()").extract()[0]
        yield item

