Running all spiders in a Scrapy project locally

Is there a way to run all of the spiders in a Scrapy project without using the Scrapy daemon? There used to be a way to run multiple spiders with scrapy crawl, but that syntax was removed and Scrapy's code has changed quite a bit since then.

I tried creating my own command:

    from scrapy.command import ScrapyCommand
    from scrapy.utils.misc import load_object
    from scrapy.conf import settings


    class Command(ScrapyCommand):
        requires_project = True

        def syntax(self):
            return '[options]'

        def short_desc(self):
            return 'Runs all of the spiders'

        def run(self, args, opts):
            spman_cls = load_object(settings['SPIDER_MANAGER_CLASS'])
            spiders = spman_cls.from_settings(settings)

            for spider_name in spiders.list():
                spider = self.crawler.spiders.create(spider_name)
                self.crawler.crawl(spider)

            self.crawler.start()

However, as soon as one spider is registered via self.crawler.crawl(), I get assertion errors for all of the other spiders:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 138, in _run_command
        cmd.run(args, opts)
      File "/home/blender/Projects/scrapers/store_crawler/store_crawler/commands/crawlall.py", line 22, in run
        self.crawler.crawl(spider)
      File "/usr/lib/python2.7/site-packages/scrapy/crawler.py", line 47, in crawl
        return self.engine.open_spider(spider, requests)
      File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 1214, in unwindGenerator
        return _inlineCallbacks(None, gen, Deferred())
    --- <exception caught here> ---
      File "/usr/lib/python2.7/site-packages/twisted/internet/defer.py", line 1071, in _inlineCallbacks
        result = g.send(result)
      File "/usr/lib/python2.7/site-packages/scrapy/core/engine.py", line 215, in open_spider
        spider.name
    exceptions.AssertionError: No free spider slots when opening 'spidername'

Is there any way to do this? I'd rather not have to subclass core Scrapy components just to run all of my spiders like this.

Answer:

Why not just use something like:

    scrapy list | xargs -n 1 scrapy crawl
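
This prints every spider name in the project, one per line, and invokes scrapy crawl once for each name, so the spiders run one after another, each in its own process.

If you would rather run everything inside a single Python process, note that the AssertionError above appears to come from the old engine only allowing one open spider per crawler. Later Scrapy releases (1.x and up) dropped that restriction, and the following minimal sketch, assuming that newer CrawlerProcess API, runs every spider in one process:

    # run_all.py - a sketch assuming Scrapy 1.x or later.
    # Run it from inside the project directory so the project settings are found.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())

    # spider_loader.list() returns the same names as the `scrapy list` command.
    for spider_name in process.spider_loader.list():
        process.crawl(spider_name)  # schedule each spider on the same process

    process.start()  # starts the reactor; blocks until all spiders finish

Unlike the xargs one-liner, this runs the spiders concurrently under one Twisted reactor rather than sequentially in separate processes.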
