Empty .json file

I wrote this short spider to extract the titles from the Hacker News front page.

```python
import scrapy

class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()

class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # printing titles using print statement
        print(item['hackertitle'])
```

But when I run the spider with `scrapy crawl hackernewscrawler -o hntitles.json -t json`, I get an empty .json file with nothing in it.

Answer:

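You should change the `print` statement to `yield`. Scrapy only exports the items that a callback returns or yields; `print` just writes the titles to the console, so the feed exporter never receives anything to write to the file.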
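To see why the `print` version produces nothing, here is a toy model in plain Python (no Scrapy involved). The names `parse_with_print`, `parse_with_yield` and `collect_items` are hypothetical; `collect_items` only mimics the idea that the framework iterates over whatever the callback returns and exports those objects. A callback that only prints returns `None`, so there is nothing to iterate over.

```python
import json

def parse_with_print(titles):
    # Mirrors the question: the titles go to stdout, nothing is returned.
    item = {"hackertitle": titles}
    print(item["hackertitle"])

def parse_with_yield(titles):
    # Mirrors the answer: the item is handed back to the caller.
    item = {"hackertitle": titles}
    yield item

def collect_items(callback, titles):
    # Hypothetical stand-in for the framework (not Scrapy's real internals):
    # export whatever the callback returns or yields; None means no items.
    result = callback(titles)
    items = [] if result is None else list(result)
    return json.dumps(items)

titles = ["First title", "Second title"]
print(collect_items(parse_with_print, titles))  # titles printed, but no items exported
print(collect_items(parse_with_yield, titles))  # one item holding the list of titles
```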

```python
import scrapy

class HackerItem(scrapy.Item):  # declaring the item
    hackertitle = scrapy.Field()

class HackerSpider(scrapy.Spider):
    name = 'hackernewscrawler'
    allowed_domains = ['news.ycombinator.com']  # website we chose
    start_urls = ['http://news.ycombinator.com/']

    def parse(self, response):
        sel = scrapy.Selector(response)  # selector to help us extract the titles
        item = HackerItem()  # the item declared above
        # xpath of the titles
        item['hackertitle'] = sel.xpath("//tr[@class='athing']/td[3]/a[@href]/text()").extract()
        # return the item so Scrapy can export it
        yield item
```

Then run:

```shell
scrapy crawl hackernewscrawler -o hntitles.json -t json
```
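The spider above exports a single item whose `hackertitle` field holds the whole list of titles. If you would rather have one JSON object per story, a common approach is to loop over the extracted titles and yield one item for each. The stdlib-only sketch below shows that shape and exercises the same XPath using `xml.etree.ElementTree` instead of Scrapy's selectors. The sample markup and the `parse_titles` name are hypothetical, and ElementTree supports only a subset of XPath (no `text()`), so the sketch reads each link's `.text` instead.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup shaped like the rows the XPath expects;
# it is not a copy of the live Hacker News page.
SAMPLE = """
<table>
  <tr class="athing">
    <td class="title">1.</td>
    <td class="votelinks"><a href="vote?id=1">up</a></td>
    <td class="title"><a href="https://example.com/a">First title</a></td>
  </tr>
  <tr class="athing">
    <td class="title">2.</td>
    <td class="votelinks"><a href="vote?id=2">up</a></td>
    <td class="title"><a href="https://example.com/b">Second title</a></td>
  </tr>
</table>
"""

def parse_titles(markup):
    root = ET.fromstring(markup.strip())
    # ElementTree handles attribute tests and [n] positions, but not text().
    for link in root.findall(".//tr[@class='athing']/td[3]/a[@href]"):
        # One item per title instead of one item holding a list.
        yield {"hackertitle": link.text}

items = list(parse_titles(SAMPLE))
print(json.dumps(items, indent=2))
```

Note that the XPath encodes assumptions about the page layout (the `athing` class, the third `td`). If the real markup differs, the expression matches nothing and the exported file will again contain no titles, even with `yield` in place.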

Source: utcz.com/qa/417889.html
