Help: scraping a movie with coroutines raises 'Task was destroyed but it is pending!'?


I'm scraping a single movie, following an online tutorial step by step. The code raises no errors, yet I still get
'Task was destroyed but it is pending!'. I've searched a lot online without finding a suitable fix. I need every task to finish downloading, not to be skipped.
Source code:

import requests
from lxml import etree
import re
from urllib import parse
import asyncio
import aiohttp
import aiofiles

# Issue a request
def get_page_source(url):
    # Fetch the page source for a URL
    resp = requests.get(url)
    return resp.text

def get_iframe_src(url):
    # 1. Get the page source
    page_source = get_page_source(url)
    # 2. Extract the original video URL from the source, i.e. the src of the iframe
    obj = re.compile(r'u60c5"},"url":"(?P<src_url_one>.*?)"')
    result = obj.search(page_source)
    src_url = result.group('src_url_one')
    src_url = src_url.replace('\\', '')
    src_url = 'https://api.imgqiyu.com/bf.php?url=' + src_url
    return src_url

# Get the first-level m3u8 URL
def get_fisrt_m3u8_url(src_url):
    page_source = get_page_source(src_url)
    obj = re.compile(r'"source":"(?P<m3u8_url>.*?)"', re.S)
    result = obj.search(page_source)
    m3u8_url = result.group('m3u8_url')
    return m3u8_url

# Get the second-level m3u8 URL and save the file
def get_m3u8_file(fisrt_m3u8_url):
    # Download the first level
    print('Downloading first-level m3u8')
    fisrt_m3u8 = get_page_source(fisrt_m3u8_url)
    second_m3u8_url = fisrt_m3u8.split()[-1]
    second_m3u8_url = parse.urljoin(fisrt_m3u8_url, second_m3u8_url)
    print('Downloaded; joined second-level URL:', second_m3u8_url)
    # Download the second level
    print('Downloading second level')
    second_m3u8 = get_page_source(second_m3u8_url)
    with open('second_m3u8.txt', mode='w', encoding='utf-8') as f:
        f.write(second_m3u8)
    print('Downloaded and saved')

async def download_one(url):
    for i in range(10):
        try:
            print('Starting download')
            file_name = url.split('/')[-1].strip()
            async with aiohttp.ClientSession() as session:
                async with session.get(url) as resp:
                    content = await resp.content.read()
                    async with aiofiles.open(f'./ts文件_加密/{file_name}', mode='wb') as f:
                        await f.write(content)
            print(url, 'downloaded')
            return  # success: stop retrying (without this, a finished segment is re-downloaded 9 more times)
        except Exception:
            print(f'Download failed (attempt {i}), retrying')

async def download_all_ts():
    tasks = []
    with open('second_m3u8.txt', mode='r', encoding='utf-8') as f:
        for line in f:
            if line.startswith('#'):
                continue
            line = 'https://v9.dious.cc' + line.strip()
            print(line)
            task = asyncio.create_task(download_one(line))
            tasks.append(task)
    await asyncio.wait(tasks)

def main():
    # url = 'https://www.bj-pfct.com/play/36552-2-1/'
    # # That site uses JS encryption I can't handle yet,
    # # so start directly from the first m3u8 file URL
    # # Get the player address
    # print('Getting player address')
    # src_url = get_iframe_src(url)
    # print('Got it:', src_url)
    # # 3. Request the src page source to get the real m3u8 file URL (the main goal); the iframe is not strictly needed
    # fisrt_m3u8_url = get_fisrt_m3u8_url(src_url)
    # # get_m3u8_file(fisrt_m3u8_url)
    asyncio.run(download_all_ts())

if __name__ == '__main__':
    main()

The error is:

Traceback (most recent call last):
  File "D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\base_events.py", line 574, in run_until_complete
    self.run_forever()
  File "D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\base_events.py", line 541, in run_forever
    self._run_once()
  File "D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\base_events.py", line 1750, in _run_once
    event_list = self._selector.select(timeout)
  File "D:\APP\anaconda3.5.3\envs\env_env1\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "D:\APP\anaconda3.5.3\envs\env_env1\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()
Task was destroyed but it is pending!
task: <Task pending coro=<download_one() running at D:\pythonProject\pythonscrapy\多线程与多进程\14.异步协程_实战_网吧电影.py:88> wait_for=<Future cancelled> cb=[gather.<locals>._done_callback() at D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\tasks.py:691]>
Task was destroyed but it is pending!
task: <Task pending coro=<download_one() running at D:\pythonProject\pythonscrapy\多线程与多进程\14.异步协程_实战_网吧电影.py:88> wait_for=<Future cancelled> cb=[gather.<locals>._done_callback() at D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\tasks.py:691]>
Task was destroyed but it is pending!
task: <Task cancelling coro=<download_one() running at D:\pythonProject\pythonscrapy\多线程与多进程\14.异步协程_实战_网吧电影.py:88> wait_for=<Future finished result=None> cb=[gather.<locals>._done_callback() at D:\APP\anaconda3.5.3\envs\env_env1\lib\asyncio\tasks.py:691]>
Exception ignored in: <coroutine object download_one at 0x00000129B68FF248>
RuntimeError: coroutine ignored GeneratorExit


Answer:

The code is solid and the async structure is right, but async scraping has one catch: you must cap the number of concurrent tasks. Your script creates one task per .ts segment and each task opens its own connection, so the event loop ends up watching more sockets than select() can handle; that is the `ValueError: too many file descriptors in select()` (select() on Windows is limited to 512 sockets by default). Once the loop crashes, every task still in flight is destroyed, which produces the 'Task was destroyed but it is pending!' warnings. The fix is to limit how many downloads run at once. On Unix systems, one option is to raise the file-descriptor limit before the asyncio.run call:

import resource  # note: Unix-only; this module does not exist on Windows
resource.setrlimit(resource.RLIMIT_NOFILE, (1024, 2048))
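The traceback paths show Windows, where the `resource` module is unavailable, so the portable fix is to cap concurrency with `asyncio.Semaphore` and wait for every task with `asyncio.gather`. Below is a minimal, self-contained sketch of the pattern (not your exact scraper): `asyncio.sleep` stands in for the real `aiohttp` request, and the `in_flight`/`peak` counters exist only to demonstrate that the cap holds; the URLs and the limit of 3 are illustrative.

```python
import asyncio

LIMIT = 3  # at most 3 "downloads" in flight at once

async def download_one(sem, url, in_flight, peak):
    async with sem:                    # blocks here until a slot is free
        in_flight[0] += 1
        peak[0] = max(peak[0], in_flight[0])
        await asyncio.sleep(0.01)      # stand-in for the real aiohttp GET + file write
        in_flight[0] -= 1
        return url

async def download_all(urls):
    sem = asyncio.Semaphore(LIMIT)
    in_flight, peak = [0], [0]
    tasks = [asyncio.create_task(download_one(sem, u, in_flight, peak))
             for u in urls]
    results = await asyncio.gather(*tasks)  # waits for *every* task, so none
    return results, peak[0]                 # is destroyed while still pending

results, peak = asyncio.run(download_all([f'seg{i}.ts' for i in range(20)]))
print(len(results), peak)  # all 20 finish; peak concurrency never exceeds LIMIT
```

Applied to your script: create the semaphore once in `download_all_ts`, pass it to `download_one`, and wrap the `session.get` block in `async with sem:`. It is also worth sharing a single `aiohttp.ClientSession` across all tasks instead of opening one per segment, since each session carries its own connection pool.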

The above is the full content of "Help: scraping a movie with coroutines raises 'Task was destroyed but it is pending!'?". Source link: utcz.com/p/938915.html
