Why does Scrapy appear to hang?

Z时代
2024-01-10
Category: Tech sharing

http://news.ifeng.com/listpage/11502/20150924/1/rtlist.shtml
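I am crawling the ifeng.com (凤凰网) link above, and a loop changes the date in the link. After running for a while, Scrapy stops crawling but the process keeps running, and the Python process uses a lot of CPU, around 30 to 40 percent.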
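For reference, the date loop described above could look roughly like the sketch below. The post does not show that part of the spider, so `URL_TEMPLATE`, `daily_list_urls`, and the date range are assumptions inferred from the single example link, not the original code.

```python
from datetime import date, timedelta

# Assumed URL pattern, inferred from the one example link in the post.
URL_TEMPLATE = "http://news.ifeng.com/listpage/11502/{day}/1/rtlist.shtml"


def daily_list_urls(start, end):
    """Yield one list-page URL per day from start to end, inclusive."""
    current = start
    while current <= end:
        yield URL_TEMPLATE.format(day=current.strftime("%Y%m%d"))
        current += timedelta(days=1)


# Hypothetical three-day range around the date in the example link.
urls = list(daily_list_urls(date(2015, 9, 23), date(2015, 9, 25)))
```

A list like this could be assigned to the spider's `start_urls`, or yielded as requests from `start_requests`.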
Here is a screenshot of the log: (image)

Crawling stopped at 10:35, and I killed the process at 12:38.

Here is the code:

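    # Assumes `import scrapy` at the top of the spider module.
    def parse(self, response):
        div = response.xpath('//div[@class="newsList"]')
        # Collect the links to the news articles.
        for ul in div.xpath('ul'):
            for li in ul.xpath('li'):
                # extract_first() returns None instead of raising IndexError
                # when an <li> contains no link.
                url = li.xpath('a/@href').extract_first()
                if url:
                    yield scrapy.Request(response.urljoin(url), callback=self.parse_detail)
        spans = response.xpath('//div[@class="m_page"]')
        # Collect the pagination links.
        for span in spans.xpath('span'):
            url = span.xpath('a/@href').extract_first()
            if url:
                yield scrapy.Request(response.urljoin(url), callback=self.parse)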
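
The `href` values collected in `parse` may be relative or missing. The post does not show the page's HTML, so this is an assumption. A minimal standard-library sketch of resolving them against the page URL before building requests follows; `absolute_links` and the sample `hrefs` are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against page_url, skipping missing or empty values."""
    result = []
    for href in hrefs:
        if not href:
            continue  # e.g. a <span> or <li> that has no <a> inside it
        result.append(urljoin(page_url, href.strip()))
    return result


page = "http://news.ifeng.com/listpage/11502/20150924/1/rtlist.shtml"
# Hypothetical hrefs: one root-relative, one missing, one already absolute.
links = absolute_links(
    page,
    ["/listpage/11502/20150924/2/rtlist.shtml", None, "http://news.ifeng.com/example.shtml"],
)
```

Inside a spider, `response.urljoin(href)` does the same resolution against the response URL.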