How do I implement a custom crawl depth in Scrapy?


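I want to download the images on a page, then download the images on every page linked from that page's a tags, and so on recursively. Below is the code I have written. How can I limit the crawl to a custom depth, for example with a loop in the spider?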

import scrapy

from ..items import ImgspiderItem


class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    # Prompts once, when the spider class is loaded.
    url = input("Enter the URL to crawl: ")
    start_urls = [url]

    def parse(self, response):
        img_list = response.xpath('//img/@src').extract()
        a_list = response.xpath('//a/@href').extract()

        # Build the list per page: a module-level list keeps growing and
        # re-yields every earlier page's images with each new item.
        # urljoin resolves relative and protocol-relative URLs against the page URL.
        full_img_list = [response.urljoin(img) for img in img_list]
        if full_img_list:
            item = ImgspiderItem()
            item['image_urls'] = full_img_list
            yield item

        for a in a_list:
            link = response.urljoin(a)
            # Skip non-HTTP links such as mailto: or javascript:.
            if link.startswith('http'):
                yield scrapy.Request(link, callback=self.parse)

Source: utcz.com/p/937800.html
