Python urllib: A Summary of Practical Methods, Attributes, and Workflow


1. urllib, urllib2, urllib3, and requests

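urllib2 belongs to Python 2. Python 3 merged urllib and urllib2 into the urllib package, so Python 3 code uses urllib directly.

urllib3 is a third-party library. It provides connection pooling, client-side SSL/TLS verification, multipart file upload encoding, HTTP redirect handling, gzip and deflate content encoding, and HTTP and SOCKS proxy support.

requests is also a third-party library. It depends on urllib3 and wraps it in a friendlier API, which is why requests is the one most people use.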

2. urlopen

from urllib import request, parse

# timeout is given in seconds
response = request.urlopen("http://www.baidu.com", timeout=3)
# <class 'http.client.HTTPResponse'>
print(type(response))
content = response.read()
# <class 'bytes'>
print(type(content))
print(content.decode("utf-8"))

# Pass parameters: supplying data turns the request into a POST
param = parse.urlencode({"id": "2"})
data = bytes(param, encoding="utf8")
response = request.urlopen("http://www.baidu.com", data=data)

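The timeout argument of urlopen sets the timeout in seconds, and data carries the request parameters as bytes. When data is given, the request is sent as a POST instead of a GET.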
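To see the effect of data without depending on an external site, the following sketch starts a throwaway local server that echoes back the request method and body. The EchoHandler class and the port-0 binding are assumptions made for this illustration only; they are not part of the original example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the method and body it received."""

    def _reply(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = f"{self.command} {body}".strip().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = _reply
    do_POST = _reply

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Without data, urlopen sends a GET.
with request.urlopen(url, timeout=3) as resp:
    get_result = resp.read().decode("utf-8")

# With data (bytes), urlopen sends a POST carrying that body.
data = parse.urlencode({"id": "2"}).encode("utf-8")
with request.urlopen(url, data=data, timeout=3) as resp:
    post_result = resp.read().decode("utf-8")

server.shutdown()
server.server_close()
print(get_result)
print(post_result)
```

The same call switches from GET to POST purely because data was supplied, which is easy to miss when reading urlopen code.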

urlencode encodes a dict of parameters into a URL query string:

param = parse.urlencode({"id": "2", "name": "中文"}, encoding="utf-8")
# id=2&name=%E4%B8%AD%E6%96%87
print(param)
# %E4%B8%AD%E6%96%87
print(parse.quote("中文"))
# 中文
print(parse.unquote("%E4%B8%AD%E6%96%87"))
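A few related urllib.parse helpers are worth knowing alongside urlencode and quote. The sketch below is only an illustration; the sample keys and values are made up.

```python
from urllib import parse

query = parse.urlencode({"id": "2", "name": "中文"})

# parse_qs reverses urlencode; every value comes back as a list
decoded = parse.parse_qs(query)

# doseq=True expands a sequence value into repeated keys
multi = parse.urlencode({"tag": ["a", "b"]}, doseq=True)

# quote leaves "/" unescaped by default; pass safe="" to escape it as well
path_default = parse.quote("a/b c")
path_strict = parse.quote("a/b c", safe="")

# quote_plus encodes spaces as "+", the form-encoding convention
plus = parse.quote_plus("a b")

print(decoded)
print(multi, path_default, path_strict, plus)
```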

3. Response

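Method or attribute        Description
-------------------        -----------
read()                     Reads the response body (as bytes)
status                     HTTP status code; 200 means success
getcode()                  HTTP status code, same as status
reason                     Reason phrase; "OK" on success
msg                        "OK" on success
getheader("header_name")   Returns the value of the named header
getheaders()               Returns all headers as a list of tuples
version                    HTTP protocol version (10 for HTTP/1.0, 11 for HTTP/1.1)
debuglevel                 Debug level
closed                     Boolean indicating whether the object is closed
geturl()                   Returns the URL of the request
info()                     Returns other response information (the response headers)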

import urllib.request

# timeout is given in seconds
response = urllib.request.urlopen("http://www.baidu.com", timeout=3)
# Get the page content
print(response.read().decode("utf-8"))
# Get a specific header
print(response.getheader("Content-Type"))
# Get all headers as a list of tuples
print(response.getheaders())
# Get the HTTP protocol version
print(response.version)
# Get the status code
print(response.status)
# Get the debug level
print(response.debuglevel)
# Get whether the object is closed (boolean)
print(response.closed)
# Get the URL
print(response.geturl())
# Get the HTTP status code
print(response.getcode())
# Get msg
print(response.msg)
# Get the reason phrase
print(response.reason)
# Get other information (the response headers)
print(response.info())
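In practice the response is usually opened in a with block so that it is always closed, and the body is decoded with the charset the server declares rather than a hard-coded "utf-8". The following is a minimal sketch of that pattern; the PageHandler class and its GBK page are hypothetical stand-ins for a real site.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler serving one GBK-encoded page."""

    def do_GET(self):
        payload = "你好".encode("gbk")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=gbk")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# The with block closes the response even if reading fails.
with request.urlopen(url, timeout=3) as response:
    status = response.status
    # response.headers is the same object that info() returns
    charset = response.headers.get_content_charset() or "utf-8"
    text = response.read().decode(charset)

closed_after = response.closed
server.shutdown()
server.server_close()
print(status, charset, text, closed_after)
```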

4. Request

from urllib import request, parse

url = "http://127.0.0.1:8080/test/user"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
}
data = {"id": "1", "name": "tim"}
# The request body must be bytes
params = parse.urlencode(data)
byte_params = bytes(params, encoding="utf-8")
rst = request.Request(url=url, data=byte_params, headers=headers, method="POST")
rst.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
# urllib does not decompress the body itself; if the server honors this header,
# decompress the bytes before decoding them
rst.add_header("Accept-Encoding", "gzip, deflate, br")
rst.add_header("Accept-Language", "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2")
response = request.urlopen(rst)
print(response.read().decode("utf-8"))
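A Request object can be inspected before it is sent, which helps when debugging the method or headers. The sketch below builds requests without sending anything over the network; the "demo-client/1.0" value is a made-up placeholder.

```python
from urllib import parse, request

url = "http://127.0.0.1:8080/test/user"
body = parse.urlencode({"id": "1", "name": "tim"}).encode("utf-8")

get_req = request.Request(url)
post_req = request.Request(url, data=body)
put_req = request.Request(url, data=body, method="PUT")
put_req.add_header("User-Agent", "demo-client/1.0")  # hypothetical value

# The method defaults to GET without data and POST with data
methods = (get_req.get_method(), post_req.get_method(), put_req.get_method())

# Header names are stored capitalized ("User-agent"), so look them up in that form
agent = put_req.get_header("User-agent")
has_agent = put_req.has_header("User-agent")

print(methods, agent, has_agent)
print(put_req.full_url, put_req.data)
```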

5. Exceptions

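URLError lives in the error module of urllib and is a subclass of OSError. Exceptions raised by the request module can be handled by catching this class. URLError has a reason attribute that describes the cause of the error.

HTTPError is a subclass of URLError with three attributes: code is the HTTP status code, reason is the cause of the error, and headers holds the response headers. Because it is a subclass, it must be caught before URLError.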

from urllib import request, error

url = "http://127.0.0.1:8080/test/user"
try:
    response = request.urlopen(url, timeout=1)
except error.HTTPError as e:
    # Catch the subclass first
    print(e.reason, e.code, e.headers)
    print("HTTPError:" + str(type(e)))
except error.URLError as e:
    print(e.reason)
    print("URLError:" + str(type(e)))
else:
    print("success")
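The difference between the two exceptions can be reproduced locally: HTTPError means the server answered with an error status (and its body is still readable), while a plain URLError means no HTTP response arrived at all. The sketch below assumes a hypothetical NotFoundHandler that always answers 404, and then stops the server to provoke a refused connection.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import error, request


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a 404 and a short body."""

    def do_GET(self):
        payload = b"no such user"
        self.send_response(404)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/test/user"

results = []
for attempt in ("server up", "server down"):
    try:
        request.urlopen(url, timeout=3)
    except error.HTTPError as e:
        # The server answered, but with an error status; the body can still be read
        results.append(("HTTPError", e.code, e.read()))
    except error.URLError as e:
        # No HTTP response at all (here the connection is refused)
        results.append(("URLError", type(e.reason).__name__))
    if attempt == "server up":
        server.shutdown()
        server.server_close()

print(results)
```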

6. urllib handler processing flow
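As a rough sketch of the flow: an opener created by build_opener keeps a chain of handlers, and its open() method first runs every matching <scheme>_request method, then tries the <scheme>_open methods until one returns a response, and finally runs every matching <scheme>_response method. The DemoHandler class and the made-up demo:// scheme below are assumptions for illustration only; no network access is involved.

```python
import io
from email.message import Message
from urllib import request
from urllib.response import addinfourl

calls = []  # records the order in which the handler methods run


class DemoHandler(request.BaseHandler):
    """Hypothetical handler for a made-up "demo" URL scheme."""

    def demo_request(self, req):
        # Pre-processing step: may modify or replace the request
        calls.append("request")
        return req

    def demo_open(self, req):
        # Produces the response; returning None would let the next handler try
        calls.append("open")
        return addinfourl(io.BytesIO(b"hello"), Message(), req.full_url, code=200)

    def demo_response(self, req, response):
        # Post-processing step: may modify or replace the response
        calls.append("response")
        return response


opener = request.build_opener(DemoHandler())
body = opener.open("demo://example/path").read()
print(calls, body)
```

HTTPCookieProcessor, ProxyHandler, and HTTPBasicAuthHandler plug into the same chain through methods named after the http and https schemes.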

7. Cookies

7.1 Getting cookies

from http import cookiejar
from urllib import request

url = "http://127.0.0.1:8080/test/cookie"
# The CookieJar collects the cookies set by the server
cookie = cookiejar.CookieJar()
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
response = opener.open(url)
print(response.read().decode("utf-8"))
for ck in cookie:
    print(ck.name + ":" + ck.value)

7.2 Saving and reusing cookies

from http import cookiejar
from urllib import request

url = "http://127.0.0.1:8080/test/cookie"
filename = r"F:\tmp\cookies.txt"

# First run: fetch the page and save the cookies to a file
# cookie = cookiejar.MozillaCookieJar(filename=filename)
cookie = cookiejar.LWPCookieJar(filename=filename)
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
response = opener.open(url)
print(response.read().decode("utf-8"))
# ignore_discard keeps session cookies; ignore_expires keeps expired ones
cookie.save(ignore_discard=True, ignore_expires=True)

# Later: load the saved cookies and send them with the next request
# cookie = cookiejar.MozillaCookieJar()
cookie = cookiejar.LWPCookieJar()
cookie.load(filename, ignore_discard=True, ignore_expires=True)
handler = request.HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
response = opener.open(url)
print(response.read().decode("utf-8"))
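The save-and-reload cycle can be tried without the Java server. The sketch below is a self-contained variant under stated assumptions: CookieHandler is a hypothetical local server that echoes the Cookie header it receives and sets pyck=happy, and the cookie file goes to a temporary directory instead of a fixed path.

```python
import os
import tempfile
import threading
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: echoes the Cookie header and sets pyck=happy."""

    def do_GET(self):
        payload = self.headers.get("Cookie", "none").encode("utf-8")
        self.send_response(200)
        self.send_header("Set-Cookie", "pyck=happy; Path=/")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/test/cookie"
filename = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# First session: no cookie is sent yet; the server's cookie is stored and saved.
jar = cookiejar.LWPCookieJar(filename=filename)
opener = request.build_opener(request.HTTPCookieProcessor(jar))
first = opener.open(url, timeout=3).read().decode("utf-8")
jar.save(ignore_discard=True, ignore_expires=True)

# Second session: a fresh jar loads the file and sends the cookie back.
jar2 = cookiejar.LWPCookieJar()
jar2.load(filename, ignore_discard=True, ignore_expires=True)
opener2 = request.build_opener(request.HTTPCookieProcessor(jar2))
second = opener2.open(url, timeout=3).read().decode("utf-8")

server.shutdown()
server.server_close()
print(first, second)
```

Because the cookie has no expiry it is a session cookie, so without ignore_discard=True it would not be written to the file.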

7.3 Server-side code

@RequestMapping("/cookie")
public String cookie(HttpServletRequest request,
                     HttpServletResponse response,
                     @CookieValue(value = "pyck", required = false, defaultValue = "dfck") String pyck) {
    // Print every cookie sent by the client
    Cookie[] cookies = request.getCookies();
    if (cookies != null) {
        for (Cookie cookie : cookies) {
            System.out.println(cookie.getName() + " " + cookie.getValue());
        }
    }
    // Set the cookie that the Python client will store
    Cookie cookie = new Cookie("pyck", "happy");
    response.addCookie(cookie);
    System.out.println("pyck:" + pyck);
    return pyck;
}

8. Proxies

from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener

# Map each URL scheme to the proxy that should handle it
proxy = ProxyHandler({
    "http": "http://127.0.0.1:7777",
    "https": "http://127.0.0.1:8888"
})
opener = build_opener(proxy)
try:
    response = opener.open("https://www.baidu.com")
    print(response.read().decode("utf-8"))
except URLError as e:
    print(e.reason)
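To check that traffic really goes through the configured proxy, a local stand-in can record what it receives. In the sketch below, FakeProxy is a hypothetical handler that only logs the request target and Host header (it is not a working proxy), and example.invalid is a host name that never resolves, so the request can only succeed via the proxy. It assumes no no_proxy setting in the environment exempts that host.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

seen = []  # (request target, Host header) pairs received by the stand-in


class FakeProxy(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a proxy: records what the client sent."""

    def do_GET(self):
        seen.append((self.path, self.headers.get("Host")))
        payload = b"via proxy"
        self.send_response(200)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), FakeProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ProxyHandler({"http": f"http://127.0.0.1:{server.server_port}"})
opener = build_opener(proxy)
# The client connects to the proxy and puts the full URL in the request line
body = opener.open("http://example.invalid/page", timeout=3).read()

server.shutdown()
server.server_close()
print(body, seen)
```

Passing an empty dict, ProxyHandler({}), does the opposite: it disables proxies, including any picked up from environment variables.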

9. Auth

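Auth here means HTTP Basic Auth. It is usually implemented by the server, with users, passwords, and permissions configured directly. It is not the login flow we usually see, because applications generally implement their own login.

It is still worth understanding, because many monitoring components do not implement their own login and registration and simply rely on the HTTP Basic Auth provided by the server, for example Tomcat's monitoring pages.

The following shows how to use HTTP Basic Auth from Python. First download Tomcat, then open tomcat-users.xml in the conf directory under the Tomcat root and add the following inside the tomcat-users node: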

<role rolename="admin-gui"/>
<role rolename="manager-gui"/>
<role rolename="manager-jmx"/>
<role rolename="manager-script"/>
<role rolename="manager-status"/>
<user username="tim" password="123456" roles="admin-gui,manager-gui,manager-jmx,manager-script,manager-status"/>

Run the startup script in Tomcat's bin directory to start the server.

from urllib import request, error
from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener

username = "tim"
password = "123456"
url = "http://localhost:8080/manager/status"

pwdMg = HTTPPasswordMgrWithDefaultRealm()
# realm=None: use these credentials for any realm at this URL
pwdMg.add_password(None, url, username, password)
auth_handler = HTTPBasicAuthHandler(pwdMg)
opener = build_opener(auth_handler)
try:
    response = opener.open(url)
    html = response.read().decode("utf8")
    print(html)
except error.URLError as e:
    print(e.reason)

# Without auth: 401
try:
    response = request.urlopen(url)
except error.HTTPError as e:
    print(e.reason, e.code, e.headers)
else:
    print("success")
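Under the hood, Basic Auth is just an Authorization header containing "username:password" in base64. HTTPBasicAuthHandler adds it only after the server replies with a 401 challenge; as an alternative sketch (not what the code above does), the header can be built by hand and sent with the first request. Nothing is sent here, so the example runs without Tomcat.

```python
import base64
from urllib import request

username = "tim"
password = "123456"
url = "http://localhost:8080/manager/status"

# Basic auth is "username:password", base64-encoded, in the Authorization header
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
req = request.Request(url)
req.add_header("Authorization", f"Basic {token}")

auth_header = req.get_header("Authorization")
print(auth_header)
# request.urlopen(req) would send it; omitted so the sketch runs without a server
```

Base64 is an encoding, not encryption, so Basic Auth should only be used over HTTPS outside of local testing.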
