How can I set a custom crawl depth in Scrapy?
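I want to download the images on a given page, then the images on every page linked from that page's a tags, and so on for the pages after that. Below is the code I have written. Could anyone advise how to use a loop in the spider to get a custom crawl depth?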
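import scrapy

from ..items import ImgspiderItem


class TestSpiderSpider(scrapy.Spider):
    name = 'test_spider'
    # Asked once, when Scrapy loads the spider class.
    url = input("Enter the URL to crawl: ")
    start_urls = [url]

    def to_https(self, response, link):
        # Resolve relative and protocol-relative links against the page URL,
        # then force the https scheme, as the original string slicing intended.
        full = response.urljoin(link)
        if full.startswith('http://'):
            full = 'https://' + full[len('http://'):]
        return full

    def parse(self, response):
        img_list = response.xpath('//img/@src').extract()
        a_list = response.xpath('//a/@href').extract()

        # Build the list per page. A module-level list keeps growing and
        # re-yields every earlier image with each new item.
        full_img_list = [self.to_https(response, img) for img in img_list]
        # Drop data: URIs and other sources that are not web URLs.
        full_img_list = [u for u in full_img_list if u.startswith('https://')]
        if full_img_list:
            item = ImgspiderItem()
            item['image_urls'] = full_img_list
            yield item

        for a in a_list:
            link = self.to_https(response, a)
            # Skip mailto:, javascript: and similar non-HTTP links.
            if link.startswith('https://'):
                yield scrapy.Request(
                    link,
                    callback=self.parse
                )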