XPath scraper returns an empty result for a Baike page. How can I fix this?

Z时代
2024-02-07
Category: IT

My XPath scraper returns an empty result when I run it against a Baike page. How can I fix this?

import urllib.request
import urllib.parse
from lxml import etree


def query(content):
    # Request URL
    url = 'https://baike.baidu.com/item/' + urllib.parse.quote(content)
    # Request headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    }
    # Build the request object from the URL and the headers
    req = urllib.request.Request(url=url, headers=headers, method='GET')
    # Send the request and get the response
    response = urllib.request.urlopen(req)
    # Read the response body and decode it as text
    text = response.read().decode('utf-8')
    # Parse the text into an _Element object
    html = etree.HTML(text)
    # Match with XPath; the result is a list of text strings
    sen_list = html.xpath('//div[contains(@class,"lemma-summary") or contains(@class,"lemmaWgt-lemmaSummary")]//text()')
    # Strip leading and trailing newlines from each string
    sen_list_after_filter = [item.strip('\n') for item in sen_list]
    # Join the list into a single string and return it
    return ''.join(sen_list_after_filter)


if __name__ == '__main__':
    while True:
        content = input('Search term: ')
        result = query(content)
        print("Result: %s" % result)

Any advice would be much appreciated.

Answer:

curl https://baike.baidu.com/item/叶挺 -v

* Uses proxy env variable NO_PROXY == '127.0.0.1,localhost'
*   Trying 157.255.77.133...
* TCP_NODELAY set
* Connected to baike.baidu.com (157.255.77.133) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: C=CN; ST=beijing; L=beijing; OU=service operation department; O=Beijing Baidu Netcom Science Technology Co., Ltd; CN=baidu.com
*  start date: Jul  5 05:16:02 2022 GMT
*  expire date: Aug  6 05:16:01 2023 GMT
*  subjectAltName: host "baike.baidu.com" matched cert's "*.baidu.com"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign RSA OV SSL CA 2018
*  SSL certificate verify ok.
> GET /item/%E5%8F%B6%E6%8C%BA HTTP/1.1
> Host: baike.baidu.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 302 Found
< Connection: keep-alive
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8
< Date: Mon, 16 Jan 2023 02:58:23 GMT
< Location: /item/%E5%8F%B6%E6%8C%BA/299649
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< P3p: CP=" OTI DSP COR IVA OUR IND COM "
< Server: nginx/1.8.0
< Set-Cookie: X_ST_FLOW=0; expires=Mon, 16-Jan-2023 03:08:23 GMT; Max-Age=600; path=/
< Set-Cookie: BAIDUID=5FC2411197265E4B5F181B1BB28C2293:FG=1; expires=Tue, 16-Jan-24 02:58:23 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
< Set-Cookie: BAIDUID=5FC2411197265E4BE03F24AD35285668:FG=1; expires=Tue, 16-Jan-24 02:58:23 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
<
* Connection #0 to host baike.baidu.com left intact
* Closing connection 0

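The server responds with 302 Found and an empty body (Content-Length: 0), so the client has to follow the redirect given in the Location header: /item/%E5%8F%B6%E6%8C%BA/299649. With curl, the -L option follows it.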
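As a rough illustration of what handling the redirect involves, here is a minimal sketch using only the standard library. The helper names `item_url`, `resolve_location` and `fetch` are hypothetical and not part of the original code, and the sketch assumes the Location value is either an absolute URL or a path relative to the request URL, as in the log above. Note that `urllib.request.urlopen` normally follows such redirects on its own, so this is mainly useful for clients that do not, or for inspecting each hop.

```python
import http.client
from urllib.parse import quote, urljoin, urlsplit


def item_url(term):
    # Same URL construction as in the question: percent-encode the term.
    return 'https://baike.baidu.com/item/' + quote(term)


def resolve_location(request_url, location):
    # A Location header may be relative (as in the curl log); resolve it
    # against the URL that was requested to get an absolute URL.
    return urljoin(request_url, location)


def fetch(url, headers=None, max_redirects=5):
    # Hypothetical helper: GET the URL and follow 3xx responses by hand.
    # Returns (final_url, status, body_bytes). Cookies are not handled.
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        if parts.scheme == 'https':
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            path = parts.path or '/'
            if parts.query:
                path += '?' + parts.query
            conn.request('GET', path, headers=headers or {})
            resp = conn.getresponse()
            location = resp.getheader('Location')
            if resp.status in (301, 302, 303, 307, 308) and location:
                # Redirect: compute the next URL and try again.
                url = resolve_location(url, location)
                continue
            return url, resp.status, resp.read()
        finally:
            conn.close()
    raise RuntimeError('too many redirects')
```

Calling `resolve_location(item_url('叶挺'), '/item/%E5%8F%B6%E6%8C%BA/299649')` gives the absolute URL that the second request should target. The body returned by `fetch` could then be decoded and passed to `etree.HTML` as in the question.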

Source: utcz.com/p/938722.html
