19

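(2) Use the get() function of the requests library to access one of the following websites 20 times. Print the returned status code and the text content, and compute the length of the page content returned by the text attribute and by the content attribute. (Pick the page according to your student ID; required for a passing grade.)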

import requests

url = 'https://www.baidu.com/'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3823.400 QQBrowser/10.7.4307.400'
}

# Request the page 20 times, as the exercise asks
for i in range(20):
    req = requests.get(url=url, headers=headers, timeout=10)
    req.encoding = 'utf-8'
    print(req.status_code)    # HTTP status code of this request
    print(req.text)           # page body decoded to str
    print(len(req.text))      # length in characters
    print(len(req.content))   # length in bytes (not len(str(...)), which measures the repr)
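
The two lengths in this exercise usually differ, because the text attribute is a decoded string (counted in characters) while the content attribute is raw bytes. The following is a minimal offline sketch of that difference. The sample markup is a hypothetical stand-in for a downloaded page, not the real response from the site.

```python
# Hypothetical sample standing in for a downloaded page body.
body_bytes = "<title>百度一下</title>".encode("utf-8")  # raw bytes, like `content`
body_text = body_bytes.decode("utf-8")                  # decoded str, like `text`

char_count = len(body_text)        # counts characters
byte_count = len(body_bytes)       # counts bytes; each Chinese character takes 3 in UTF-8
repr_count = len(str(body_bytes))  # counts the printable repr b'...', not the data

print(char_count, byte_count, repr_count)
```

Calling str() on a bytes object yields its printable representation with escape sequences, so its length is neither the character count nor the byte count. Measuring the bytes object directly avoids that.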

This is a simple HTML page. Keep it as a string and complete the calculation requirements that follow. (Required for a "good" grade.)


Scrape the content of the Chinese university ranking website, http://www.zuihaodaxue.com/zuihaodaxuepaiming2018.html

import csv

import requests
from lxml import etree

# Note: the exercise names zuihaodaxue.com; this script requests the
# shanghairanking.cn ranking page instead.
url = 'https://www.shanghairanking.cn/rankings/bcur/201911'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3823.400 QQBrowser/10.7.4307.400'
}

req = requests.get(url=url, headers=headers)
req.encoding = 'utf-8'
# print(req.text)

# Parse the page and collect the university names in page order
html = etree.HTML(req.text)
rank = html.xpath("//td[@class='align-left']/a/text()")

# utf-8-sig writes a BOM so that Excel detects the encoding of the Chinese names
with open(r'E:\python\test.csv', 'w', newline='', encoding='utf-8-sig') as f:
    csv_write = csv.writer(f, dialect='excel')
    csv_write.writerow(['rank', 'name'])
    # r is the 1-based position of each name in the extracted list
    for r, i in enumerate(rank, start=1):
        item = [r, i]
        print(item)
        csv_write.writerow(item)
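
The extract-then-write pattern above can be tried without network access. This is a minimal sketch under stated assumptions: the markup in SAMPLE is a hypothetical fragment that imitates the table cells the XPath targets, the university names are placeholders, and the standard library's xml.etree.ElementTree is swapped in for lxml so that nothing needs to be installed. ElementTree only handles well-formed markup and a subset of XPath, so it is not a drop-in replacement for real pages.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment imitating the ranking table's markup.
SAMPLE = """
<table>
  <tr><td>1</td><td class="align-left"><a href="#">University A</a></td></tr>
  <tr><td>2</td><td class="align-left"><a href="#">University B</a></td></tr>
  <tr><td>3</td><td class="align-left"><a href="#">University C</a></td></tr>
</table>
"""


def extract_names(markup):
    """Return the link text of every <td class="align-left"> cell, in order."""
    root = ET.fromstring(markup.strip())
    return [a.text.strip() for a in root.findall(".//td[@class='align-left']/a")]


def rows_to_csv(names):
    """Write a header plus (rank, name) rows to an in-memory CSV string."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, dialect="excel")
    writer.writerow(["rank", "name"])
    for position, name in enumerate(names, start=1):
        writer.writerow([position, name])
    return buffer.getvalue()


names = extract_names(SAMPLE)
csv_text = rows_to_csv(names)
print(csv_text)
```

Writing to an in-memory buffer makes it easy to check the rows before pointing the writer at a real file. The rank here is simply the position in the extracted list, as in the script above, so it is only correct if the page lists the universities in ranking order.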

 
