Python multithreaded GET requests report "urllib3.connectionpool Failed to parse headers"

Program requirements

Use multiple threads to send GET requests to a batch of specified URLs (all URLs are distinct).

Problem description

The requests call already sets a timeout of 3 seconds. After the program starts it produces normal output at first, then stops printing results for a while; inspecting the process shows a large number of threads that have not exited. After running for some time, the program raises the error in the title (the full error message is pasted below). Related reports suggested a keep-alive problem, so I set req.keep_alive = False, but it made no difference.

Could anyone help me figure out what is causing this? Many thanks, and apologies if the description is unclear.

The program code is pasted at the end.

Full error message

The URLs are replaced with XXX; the URLs in the error message were accessible normally at the time.

2020-01-18 02:33:55,363 urllib3.connectionpool [WARNING] - Failed to parse headers (url=https://XXX/XXX.conf): [MissingHeaderBodySeparatorDefect()], unparsed data: "Mí\x99\x81M\x8fMIû+Pµ!ó:aç\x96\x90QÔIhvNOÄÍùS\x16.\x03UiqØÉó\x0c\x9b®'Oj\x15þ\x06\x1b\x93\x18\x8dçøÈþjw\x89è\\\x0bõ\x7f\x10Q*¢\xa0\x06ÿm/\x02^(aÐ\x12\x9b˯ÈkfÙSÉ\x81\x9a8§\xa0\\\x9938g\x88Âdñ=ÊaÑuv®\x8e^õ2\x9a»»\x1cÎê¾ásóÆðAÅ:÷ú¯·2®\x1fyä{¼ãÀ¢¦,ÃR7L\x9ff!`\x15\x81<©*»{ï(+.ÐW½Ñ»ß\x8dÅ.\x1c¨·¢\x91àr´cÙÆ=-ÄÜ¡;HttpOnly;Path=/;Secure\r\nSet-Cookie: NSC_AAAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_EPAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_USER=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TEMP=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_PERS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_BASEURL=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: CsrfToken=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: CtxsAuthId=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: ASP.NET_SessionId=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TMAA=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_TMAS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT;Secure\r\nSet-Cookie: NSC_TEMP=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_PERS=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nSet-Cookie: NSC_AAAC=xyz;Path=/;expires=Wednesday, 09-Nov-1999 23:12:40 GMT\r\nConnection: close\r\nContent-Length: 551\r\nCache-control: no-cache, no-store, must-revalidate\r\nPragma: no-cache\r\nContent-Type: text/html\r\n\r\n"

Traceback (most recent call last):
  File "D:\soft\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 441, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "D:\soft\Python\Python37\lib\site-packages\urllib3\util\response.py", line 71, in assert_header_parsing
    raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: "Mí\x99\x81M\x8fMIû+Pµ!ó:aç\x96\x90QÔIhvNOÄÍùS
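For context, the warning comes from urllib3's assert_header_parsing helper (the function in the traceback above) validating the parsed header block: a header line without a "name: value" separator produces exactly this defect, which is what happens when body or garbled bytes bleed into the header section. A minimal sketch reproducing the defect (reproduce_defect is a hypothetical helper name; it assumes urllib3 is installed):

```python
import email.parser
from http.client import HTTPMessage

from urllib3.exceptions import HeaderParsingError
from urllib3.util.response import assert_header_parsing

def reproduce_defect():
    # Build a header block whose second line has no "name: value" separator,
    # mimicking body bytes bleeding into the header section.
    raw = 'Content-Type: text/html\r\nGARBAGE WITHOUT SEPARATOR\r\n\r\n'
    msg = email.parser.Parser(_class=HTTPMessage).parsestr(raw)
    try:
        assert_header_parsing(msg)  # the same check urllib3 runs per response
    except HeaderParsingError as e:
        return e
    return None
```

The defect list in the returned error contains MissingHeaderBodySeparatorDefect, matching the log above.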

Program code

import time
import logging
from threading import Thread

import requests

logger = logging.getLogger(__name__)  # the original logger setup is not shown

def checking(url):
    # business logic
    try:
        url_new = '%s/xxx.html' % url
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
        }
        req = requests.session()
        req.keep_alive = False  # attempt to disable keep-alive in urllib
        res = req.get(url_new, headers=header, timeout=3, verify=False, allow_redirects=False)
        if 'target_text' in str(res.content):
            logger.info('[+] task %s is SUCC' % (url))
        else:
            logger.info('[-] task %s is FAIL' % (url))
    except:
        pass  # swallows every exception, including timeouts
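Note that keep_alive is not a documented attribute of requests.Session, so setting it has no effect on connection reuse. One way to actually disable keep-alive at the HTTP level is to send a "Connection: close" header and use the session as a context manager; a sketch of that idea (checking_no_keepalive is a hypothetical variant, not the original function):

```python
import requests

def checking_no_keepalive(url):
    # "Connection: close" asks the server to close the TCP connection after
    # each response, so a stale pooled connection cannot be reused.
    headers = {
        'User-Agent': 'Mozilla/5.0',
        'Connection': 'close',  # disable HTTP keep-alive
    }
    # The with-block closes the session's connection pool on exit.
    with requests.Session() as sess:
        res = sess.get('%s/xxx.html' % url, headers=headers,
                       timeout=3, verify=False, allow_redirects=False)
        return res.status_code
```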

def get_url_list(filename):
    url_list = []
    with open(filename, 'r', encoding='utf-8') as file:
        while True:
            url = file.readline().strip()
            if not url:
                break  # stops at the first blank line as well as at end of file
            url_list.append(url)
            print('\r%s URLs read' % len(url_list), end='', flush=True)
    print('')
    return url_list
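Incidentally, the readline() loop above treats the first blank line as end of file, so any URLs after a blank line would be silently skipped. A variant that skips blank lines instead of stopping (get_url_list_simple is a hypothetical name, same assumptions about the file format):

```python
def get_url_list_simple(filename):
    # Read every non-empty line; blank lines are skipped rather than
    # terminating the read.
    with open(filename, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```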

if __name__ == '__main__':
    # read the URLs
    url_list = get_url_list('data/host.txt')

    thread_list = []
    for url in url_list:
        # Thread(...).start() returns None, so create and start separately
        # to keep the Thread object for join()
        thread = Thread(target=checking, args=(url,))
        thread.start()
        thread_list.append(thread)
        time.sleep(0.05)
    for th in thread_list:
        th.join()
    print('All threads finished')
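Since the report mentions a large number of threads that never exit, a bounded thread pool is one way to cap concurrency instead of starting one Thread per URL; a minimal sketch using the standard library (run_all is a hypothetical helper, and the task function stands in for checking):

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(task, url_list, workers=50):
    # At most `workers` threads run at once; leaving the with-block waits
    # for every submitted task to finish, like join()ing every thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, url_list))
```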


Answer:

Try this pattern, so the session (and the connections it pools) is closed as soon as the block exits:

with requests.session() as req:
    res = req.get(url_new, headers=header, timeout=3, verify=False, allow_redirects=False)

The above is the full content of "Python multithreaded GET requests report urllib3.connectionpool Failed to parse headers". Source: utcz.com/a/159394.html
