How to use different pipelines for different spiders in a single Scrapy project

Z时代
2024-01-10
Category: Q&A

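I have a Scrapy project that contains multiple spiders. Is there any way to define which pipelines are used for which spider? Not all of the pipelines I have defined apply to every spider.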

Answer:

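Building on Pablo Hoffman's solution, you can apply the following decorator to the process_item method of a pipeline object so that it checks the spider's pipeline attribute to decide whether that pipeline should run. For example: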

import functools
import logging

def check_spider_pipeline(process_item_method):
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)
        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)
        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=logging.DEBUG)
            return item
    return wrapper
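To see how the decorator behaves without running a crawl, here is a minimal sketch that exercises it against stand-ins. FakeSpider and the Tag pipeline are hypothetical names invented for this illustration; the stand-in only provides the two things the decorator touches, a pipeline attribute and a log method.

```python
import functools
import logging


def check_spider_pipeline(process_item_method):
    """Run process_item only if the pipeline class is listed on the spider."""
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=logging.DEBUG)
            return process_item_method(self, item, spider)
        spider.log(msg % 'skipping', level=logging.DEBUG)
        return item
    return wrapper


class Tag:
    """Hypothetical pipeline that marks every item it processes."""
    @check_spider_pipeline
    def process_item(self, item, spider):
        return dict(item, tagged=True)


class FakeSpider:
    """Stand-in for a Scrapy spider: only `pipeline` and `log` are needed."""
    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.messages = []

    def log(self, message, level=logging.DEBUG):
        # Collect messages instead of emitting them, so they can be inspected.
        self.messages.append(message)


opted_in = FakeSpider({Tag})
opted_out = FakeSpider(set())

ran = Tag().process_item({"id": 1}, opted_in)
skipped = Tag().process_item({"id": 1}, opted_out)

print(ran, opted_in.messages)
print(skipped, opted_out.messages)
```

The spider that lists Tag gets a modified item, while the other spider gets its item back unchanged, which is how a skipped step stays transparent to the rest of the pipeline chain.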

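For this decorator to work, the spider must have a pipeline attribute holding a container of the pipeline classes that should process its items, for example: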

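import scrapy
from myproject import pipelines  # "myproject" is a placeholder for your project package

class MySpider(scrapy.Spider):
    name = 'my_spider'  # placeholder name; Scrapy requires every spider to have one
    # pipeline classes that should process this spider's items
    pipeline = {
        pipelines.Save,
        pipelines.Validate,
    }

    def parse(self, response):
        # insert scrapy goodness here (build `item` from the response)
        return item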

Then, in a pipelines.py file:

# check_spider_pipeline is the decorator shown above; define or import it in this file

class Save:
    @check_spider_pipeline
    def process_item(self, item, spider):
        # do saving here
        return item

class Validate:
    @check_spider_pipeline
    def process_item(self, item, spider):
        # do validating here
        return item
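Because the decorator reads spider.pipeline directly, a spider that does not define that attribute would raise an AttributeError on its first item. One possible variation, not part of the original answer, is to treat a missing attribute as "no pipelines enabled". The sketch below assumes that behavior is what you want; check_spider_pipeline_lenient, Upper, PlainSpider and OptedInSpider are hypothetical names used only for illustration.

```python
import functools


def check_spider_pipeline_lenient(process_item_method):
    """Like check_spider_pipeline, but tolerate spiders without `pipeline`."""
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        # Assumption: a missing attribute means "no pipelines enabled".
        enabled = getattr(spider, "pipeline", ())
        if self.__class__ in enabled:
            return process_item_method(self, item, spider)
        return item
    return wrapper


class Upper:
    """Hypothetical pipeline used only for this demonstration."""
    @check_spider_pipeline_lenient
    def process_item(self, item, spider):
        return {key: str(value).upper() for key, value in item.items()}


class PlainSpider:
    """Stand-in spider that never declares a `pipeline` attribute."""


class OptedInSpider:
    """Stand-in spider that opts in to the Upper pipeline."""
    pipeline = {Upper}


untouched = Upper().process_item({"title": "abc"}, PlainSpider())
changed = Upper().process_item({"title": "abc"}, OptedInSpider())
```

With this variant, spiders written before the decorator was introduced keep working and simply skip every decorated pipeline.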

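All pipeline objects must still be listed in ITEM_PIPELINES in the settings, in the correct order. It would be nice to change this so that the order could also be specified on the spider.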
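As a rough sketch of that settings entry: the module path myproject.pipelines is a hypothetical placeholder, and running Validate before Save is an assumption about the intended order. Scrapy runs pipelines in ascending order of the integer assigned to each one.

```python
# settings.py (sketch; "myproject" is a hypothetical package name)
ITEM_PIPELINES = {
    "myproject.pipelines.Validate": 100,  # lower number, runs first
    "myproject.pipelines.Save": 200,      # higher number, runs afterwards
}

# Illustration only: the order implied by the numbers above.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)
```

Scrapy also lets a spider class define a custom_settings dict that can override ITEM_PIPELINES for that spider alone, which may be a simpler route when the per-item logging of the decorator is not needed.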

Source: utcz.com/qa/432381.html