Scrapy: How do I use items in a spider, and how do I send items to a pipeline?

Z时代
2024-01-10
Category: Q&A

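I am new to Scrapy and my task is simple.

For a given e-commerce website:

- crawl all pages of the site
- look for product pages
- if a URL points to a product page, create an item
- process the item to store it in a database

I created the spider, but the products are just printed to a simple file.

My question is about project structure: how do I use items in a spider, and how do I send items to a pipeline?

I can't find a simple example of a project that uses both items and pipelines.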

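Answer:

How do I use items in my spider?

The main purpose of items is to store the data you crawl. scrapy.Item objects behave much like dictionaries. To declare your item, create a class that subclasses scrapy.Item and add one scrapy.Field attribute per field: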

import scrapy
class Product(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()

Now you can use it in your spider by importing Product.

For more advanced usage, see the Scrapy documentation on items.

How do I send items to the pipeline?

First, you need to tell Scrapy to use your custom pipeline.

In the settings.py file:

ITEM_PIPELINES = {
    'myproject.pipelines.CustomPipeline': 300,
}

You can now write your pipeline and process your items.

In the pipelines.py file:

from scrapy.exceptions import DropItem


class CustomPipeline:
    def __init__(self):
        # Create your database connection here
        self.connection = None

    def process_item(self, item, spider):
        # Index or store your item here.
        # Raise DropItem("reason") instead of returning to discard an item.
        return item
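To make the "store it in a database" step concrete, here is a rough sketch of a pipeline backed by SQLite from the Python standard library. The class name `SQLitePipeline`, the `products` table, and the `products.db` path are all assumptions made for illustration; the original answer does not say which database is used. The `open_spider` and `close_spider` hooks are the ones Scrapy calls when a spider starts and finishes.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline that stores each item's url and title in SQLite."""

    def __init__(self, db_path='products.db'):
        # db_path is an assumed default; ':memory:' is handy for experiments.
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the database and
        # make sure the table exists.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            'CREATE TABLE IF NOT EXISTS products (url TEXT PRIMARY KEY, title TEXT)'
        )

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and release the connection.
        self.connection.commit()
        self.connection.close()

    def process_item(self, item, spider):
        # Items support dict-style access, so plain dicts work here too.
        self.connection.execute(
            'INSERT OR REPLACE INTO products (url, title) VALUES (?, ?)',
            (item['url'], item.get('title')),
        )
        return item  # pass the item on to the next pipeline, if any
```

If you placed this class in pipelines.py, it would be enabled the same way as above, with an entry such as 'myproject.pipelines.SQLitePipeline': 300 in ITEM_PIPELINES.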

Finally, in your spider, you need to yield the item once it has been filled in.

spider.py example:

import scrapy
from myproject.items import Product


class MySpider(scrapy.Spider):
    name = "test"
    start_urls = [
        'http://www.exemple.com',
    ]

    def parse(self, response):
        doc = Product()
        doc['url'] = response.url
        # .get() returns the first matching text node as a string (or None)
        doc['title'] = response.xpath('//div/p/text()').get()
        yield doc  # Will go to your pipeline

Source link: utcz.com/qa/415120.html
