Scrapy: read a list of URLs from a file to scrape?

Z时代
2024-01-10
Category: Q&A

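I've just installed scrapy and worked through their simple dmoz tutorial. I only just looked up basic file handling in Python and tried to make the crawler read its list of URLs from a file, but I got some errors. This is probably wrong, but I gave it a shot. Could someone please show me an example of reading a list of URLs into scrapy? Thanks in advance.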

from scrapy.spider import BaseSpider
class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    f = open("urls.txt")
    start_urls = f
    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

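Answer:

You were close. start_urls should be a list of URL strings, and lines read from a file keep their trailing newline, so strip each one: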

f = open("urls.txt")
start_urls = [url.strip() for url in f.readlines()]
f.close()

...better still, use a context manager to make sure the file is closed as expected:

with open("urls.txt", "rt") as f:
    start_urls = [url.strip() for url in f.readlines()]
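
If urls.txt may contain blank lines or notes, the list comprehension above would pass empty strings or non-URLs to the spider. A minimal sketch of a more forgiving loader follows; the helper name load_urls and the convention that lines starting with '#' are comments are assumptions for illustration, not part of scrapy or of the original answer.

```python
def load_urls(path):
    """Return the URLs listed in a text file, one per line."""
    urls = []
    # The context manager closes the file even if reading fails.
    with open(path, "rt", encoding="utf-8") as f:
        for line in f:
            url = line.strip()  # drop the trailing newline and stray spaces
            # Skip blank lines and, by assumption, '#' comment lines.
            if url and not url.startswith("#"):
                urls.append(url)
    return urls
```

Inside the spider class this would be used as start_urls = load_urls("urls.txt"). Note that a relative path is resolved against the directory the crawl is started from.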

Source: utcz.com/qa/424634.html
