Empty .json file

Z时代
2024-01-10
Category: Q&A

I wrote this short spider to extract the titles from the Hacker News front page.

import scrapy
class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()
class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']
    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # printing titles using print statement
        print(item['hackertitle'])

But when I run scrapy crawl hackernewscrawler -o hntitles.json -t json

I get an empty .json file with nothing in it.

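Answer:

You should change the print statement to yield. Scrapy only exports the items that the parse callback yields or returns; print just writes the titles to the console, so the output file stays empty: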

import scrapy
class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()
class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']
    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # hand the item back to Scrapy so it can be exported
        yield item

Then run:

scrapy crawl hackernewscrawler -o hntitles.json -t json
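
To see why swapping print for yield changes what ends up in the file, here is a rough standard-library sketch. It is not Scrapy's actual code: export_json is a hypothetical stand-in for a feed exporter, the two parse_* functions are simplified stand-ins for the spider callbacks, and the titles are made-up sample data.

```python
import json


def parse_with_print(titles):
    # Mirrors the question's callback: prints the titles and implicitly returns None.
    print(titles)


def parse_with_yield(titles):
    # Mirrors the answer's approach, here yielding one dict per title.
    for title in titles:
        yield {"hackertitle": title}


def export_json(callback_result):
    # Hypothetical stand-in for an exporter: it can only serialize
    # what the callback actually handed back.
    items = list(callback_result or [])
    return json.dumps(items)


sample_titles = ["First story", "Second story"]  # made-up data

empty_output = export_json(parse_with_print(sample_titles))
filled_output = export_json(parse_with_yield(sample_titles))

print(empty_output)
print(filled_output)
```

The printing callback gives the exporter nothing to write, while the generator version gives it one record per title. Yielding one item per title, rather than a single item holding the whole list as in the answer, is a design choice made only for this sketch.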

That is the full content of "Empty .json file". Source link: utcz.com/qa/417889.html
