Python multithreaded GET requests raise urllib3.connectionpool "Failed to parse headers"
Requirement
Use multiple threads to send GET requests to a given list of URLs in batch (all URLs are unique).
Problem description
The requests call already sets timeout=3. After starting, the program prints results normally at first, then stops producing output for a while; inspecting the process shows a large number of threads that never exit. After running for some time, the program raises the error in the title (the full error message is pasted below). Similar cases I found suggest it may be a keep-alive problem, so I set req.keep_alive = False, but that made no difference.
Could anyone help me figure out what causes this? Many thanks, and apologies if the description is unclear.
The program code is at the end.
Full error message (URLs replaced with XXX; the URL in the error was accessible at the time):
2020-01-18 02:33:55,363 urllib3.connectionpool [WARNING] - Failed to parse headers (url=https://XXX/XXX.conf): [MissingHeaderBodySeparatorDefect()], unparsed data: "Mí\x99\x81M\x8fMIû+Pµ!ó:aç\x96\x90QÔIhvNOÄÍùS\x16.\x03UiqØÉó\x0c\x9b®'Oj\x15þ\x06\x1b\x93\x18\x8dçøÈþjw\x89è\\\x0bõ\x7f\x10Q*¢\xa0\x06ÿm/\x02^(aÐ\x12\x9b˯ÈkfÙSÉ\x81\x9a8§\xa0\\\x9938g\x88Âdñ=ÊaÑuv®\x8e^õ2\x9a»»\x1cÎê¾ásóÆðAÅ:÷ú¯·2®\x1fyä{¼ãÀ¢¦,ÃR7L\x9ff!`\x15\x81<©*»{ï(+.ÐW½Ñ»ß\x8dÅ.\x1c¨·¢\x91àr´cÙÆ=-ÄÜ¡;HttpOnly;Path=/;Secure\r\nSet-Cookie: NSC_AAAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_EPAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_USER=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TEMP=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_PERS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_BASEURL=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: CsrfToken=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: CtxsAuthId=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: ASP.NET_SessionId=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TMAA=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_TMAS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TEMP=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_PERS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_AAAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nConnection: close\r\nContent-Length: 551\r\nCache-control: no-cache, no-store, must-revalidate\r\nPragma: no-cache\r\nContent-Type: text/html\r\n\r\n"Traceback (most recent call last):
  File "D:\soft\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 441, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "D:\soft\Python\Python37\lib\site-packages\urllib3\util\response.py", line 71, in assert_header_parsing
    raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: "Mí\x99\x81M\x8fMIû+Pµ!ó:aç\x96\x90QÔIhvNOÄÍùS
Program code
def checking(url):
    # business logic
    try:
        url_new = '%s/xxx.html' % url
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
        }
        req = requests.session()
        req.keep_alive = False  # attempt to disable keep-alive in urllib3
        res = req.get(url_new, headers=header, timeout=3, verify=False, allow_redirects=False)
        if 'target_text' in str(res.content):
            logger.info('[+] task %s is SUCC' % (url))
        else:
            logger.info('[-] task %s is FAIL' % (url))
    except:
        pass
def get_url_list(filename):
    url_list = []
    with open(filename, 'r', encoding='utf-8') as file:
        while True:
            url = file.readline().strip()
            if not url:
                break
            else:
                if url != '': url_list.append(url)
            print('\rRead %s' % len(url_list), end='', flush=True)
    print('')
    return url_list
if __name__ == '__main__':
    # read the URL list
    url_list = get_url_list('data/host.txt')
    thread_list = []
    for url in url_list:
        thread = Thread(target=checking, args=(url,))
        thread.start()  # start() returns None, so keep the Thread object itself
        thread_list.append(thread)
        time.sleep(0.05)
    for th in thread_list:
        th.join()
    print('All threads finished')
Answer:
Try this pattern, so the session is always closed when you are done with it:
with requests.session() as req:
    pass  # do the request inside the with-block
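Expanding that one-liner into a runnable sketch of the question's checking() function (the /xxx.html suffix and 'target_text' marker are the question's own placeholders; this version returns a bool instead of logging, for brevity). Using requests.Session as a context manager guarantees session.close() runs, releasing the pooled sockets even when the request raises, so half-closed keep-alive connections do not accumulate across threads.

```python
import requests

def checking(url):
    """Probe url + '/xxx.html' and report whether the marker text appears.

    A sketch of the question's checking(), rewritten so the Session is
    always closed via a context manager.
    """
    headers = {'User-Agent': 'Mozilla/5.0'}
    url_new = '%s/xxx.html' % url
    try:
        # The with-block closes the session (and its connection pool)
        # on exit, whether the request succeeds or raises.
        with requests.Session() as session:
            res = session.get(url_new, headers=headers, timeout=3,
                              verify=False, allow_redirects=False)
            return 'target_text' in res.text
    except requests.RequestException:
        # Network errors (DNS failure, timeout, reset) count as a miss.
        return False
```

Beyond closing sessions, it may also help to bound concurrency (e.g. a fixed-size thread pool) rather than starting one thread per URL, since the question reports a large number of threads that never exit.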