Scrapy doesn't print exception stack traces

Is there some special mechanism to force Scrapy to print out all Python exceptions and stack traces?

I made a simple mistake, calling a nonexistent attribute on a list, which raised an AttributeError; but the error was never shown in full in the log:

2019-11-15 22:13:50 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 264,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 40342,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 11, 15, 22, 13, 50, 860480),
 'log_count/CRITICAL': 1,
 'log_count/DEBUG': 1,
 'log_count/INFO': 1,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/AttributeError': 1,
 'start_time': datetime.datetime(2019, 11, 15, 22, 13, 49, 222371)}

So it reports an AttributeError count of 1, but it tells me nothing about where or how it happened. I had to put ipdb.set_trace() into the code by hand to find out what went wrong, while Scrapy itself just carried on with the rest of its work without printing anything:

ipdb>
AttributeError: "'list' object has no attribute 'match'"
> /Users/username/Programming/regent/regentscraper/spiders/regent_spider.py(139)request_listing_detail_pages_from_listing_id_list()
    138 volatile_props = ListingScanVolatilePropertiesItem()
--> 139 volatile_props['position_in_search'] = list_of_listing_ids.match(listing_id) + rank_of_first_item_in_page
    140
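Incidentally, the traceback already points at the likely fix: a Python list has no match method. If the intent was to find the position of listing_id within the list, list.index() is the natural replacement. A minimal, self-contained illustration (the values are hypothetical, since the post never shows the actual data):

# Hypothetical data, standing in for the variables in the traceback above.
list_of_listing_ids = ['a101', 'b202', 'c303']
listing_id = 'b202'
rank_of_first_item_in_page = 1

# list.index() returns the element's position; .match() does not exist on lists.
position_in_search = list_of_listing_ids.index(listing_id) + rank_of_first_item_in_page
print(position_in_search)  # 2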

Scrapy settings

# -*- coding: utf-8 -*-

# Scrapy settings for regentscraper project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     http://doc.scrapy.org/en/latest/topics/settings.html
#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html

import sys
import os

import django

# Put the parent directory on sys.path so the Django project is importable,
# then bootstrap Django so spiders and pipelines can use its models.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir)))
print(sys.path)

os.environ['DJANGO_SETTINGS_MODULE'] = 'regent.settings'
django.setup()  # new for Django 1.8

BOT_NAME = 'regentscraper'

SPIDER_MODULES = ['regentscraper.spiders']
NEWSPIDER_MODULE = 'regentscraper.spiders'

ITEM_PIPELINES = {
    'regentscraper.pipelines.ListingScanPipeline': 300,
}

Answer:

I ran into the same issue as described above. My environment uses the following versions:

  • Django (1.11.4)
  • Scrapy (1.4.0)
  • scrapy-djangoitem (1.1.1)

Adding LOGGING_CONFIG = None to the Django settings module that Scrapy loads solved the problem for me. I created a new Django settings file named settings_scrapy, with the following content:

mysite.settings_scrapy

try:
    from mysite.settings import *

    LOGGING_CONFIG = None
except ImportError:
    pass
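For context, LOGGING_CONFIG = None is documented Django behaviour: it tells Django to skip configuring logging altogether. Keeping the override in a separate settings module, as above, leaves normal Django runs (manage.py, WSGI) with their usual logging; if that separation is not needed, the same flag could be set directly in the main settings file (a sketch, with the trade-off noted in the comment):

# mysite/settings.py (inline alternative; note that Django will then never
# configure logging, even outside Scrapy)
LOGGING_CONFIG = None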

Then, in Scrapy's settings.py, load that module as the Django settings, like this:

import sys
import os

import django

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings_scrapy'
django.setup()

After that, stack traces were printed for exceptions raised in spiders and pipelines. Presumably this works because django.setup() applies Django's logging configuration (LOGGING_CONFIG defaults to logging.config.dictConfig), which can disable the existing loggers that Scrapy relies on; setting LOGGING_CONFIG = None makes Django skip configuring logging entirely.
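A quick way to verify the fix is a throwaway spider whose callback is guaranteed to raise; the spider below is hypothetical, not from the original post:

import scrapy

class TracebackCheckSpider(scrapy.Spider):
    # Throwaway spider: its only purpose is to raise inside a callback so we
    # can check that the full traceback reaches the log.
    name = 'traceback_check'
    start_urls = ['http://example.com']

    def parse(self, response):
        # A list has no .match() method, reproducing the AttributeError
        # from the question.
        [].match('anything')

With LOGGING_CONFIG = None in effect, running scrapy crawl traceback_check should log a "Spider error processing <GET http://example.com>" ERROR line followed by the full traceback, instead of only bumping spider_exceptions/AttributeError in the final stats dump.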

