Python data scraping


    from pyquery import PyQuery as pq
    from proxies import proxy  # proxy() supplies the proxies dict passed to requests (module not shown)
    import requests
    from fake_useragent import UserAgent

    ua = UserAgent()


    def url_open(url):
        # Send the request through a proxy with a random User-Agent.
        headers = {'User-Agent': ua.random}
        response = requests.get(url, headers=headers, proxies=proxy())
        # Keep retrying for as long as the server answers 401.
        while response.status_code == 401:
            print("Retrying")
            response = requests.get(url, headers=headers, proxies=proxy())
        # Any status other than 200 falls through and returns None.
        if response.status_code == 200:
            return response


    response = url_open("https://china.guidechem.com/datacenter/msds/c/7821.html")
    doc = pq(response.text)

    dic = {}   # rows 0-14 of the fourth table
    dicc = {}  # rows 15-99 of the fourth table
    for i in range(100):
        # First cell of the row is the key, second cell is the value.
        key = doc("table:eq(3) tr:eq(%d) td:eq(0)" % i).text()
        value = doc("table:eq(3) tr:eq(%d) td:eq(1)" % i).text()
        if i < 15:
            dic[key] = value
        else:
            dicc[key] = value

    print(dic)
    print(dicc)
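The retry loop in url_open keeps requesting for as long as the server answers 401, so it can spin forever. Below is a minimal sketch of a bounded retry, not the poster's code: `fetch_with_retry`, `max_attempts` and `delay` are hypothetical names, and `fetch` is assumed to be any callable that returns an object with a `status_code` attribute (for example a small wrapper around `requests.get`).

```python
import time


def fetch_with_retry(fetch, url, max_attempts=5, delay=1.0):
    """Call fetch(url) until it returns status 200 or the attempts run out.

    Returns the successful response, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        response = fetch(url)
        if response.status_code == 200:
            return response
        print("attempt %d got status %d" % (attempt, response.status_code))
        # Wait before the next try, but not after the last one.
        if attempt < max_attempts:
            time.sleep(delay)
    return None
```

With the code above it could be called as `fetch_with_retry(lambda u: requests.get(u, headers=headers, proxies=proxy()), url)`. The caller still has to handle a `None` result.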

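This site has a lot of pages, and each page seems to follow a different layout.
For example:

https://china.guidechem.com/datacenter/msds/c/756.html

https://china.guidechem.com/datacenter/msds/c/7821.html

Of these two pages, the data matched on 756.html comes out misaligned. Is there a single rule that matches both pages correctly, so that the data does not end up shifted?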
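One possible direction, sketched under assumptions: instead of addressing cells by a fixed table and row index, walk every table row and use the text of the first cell as the key, so an extra table or an extra row does not shift the result. The real markup of the two pages has not been checked here. The sample HTML and its labels are invented, `RowCollector` and `extract_fields` are hypothetical names, and the sketch uses only the standard library `html.parser` so it runs without network access. The same idea could be written with pyquery by iterating over all `tr` elements.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every table row as a list of cell texts, nested tables included."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, in the order their </tr> is seen
        self._open_rows = []   # stack of rows currently being built
        self._open_cells = []  # stack of text buffers for cells currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._open_rows.append([])
        elif tag in ("td", "th") and self._open_rows:
            self._open_cells.append([])

    def handle_data(self, data):
        # Text belongs to the innermost open cell, if there is one.
        if self._open_cells:
            self._open_cells[-1].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._open_cells and self._open_rows:
            # Join the cell's text fragments and collapse whitespace.
            text = " ".join("".join(self._open_cells.pop()).split())
            self._open_rows[-1].append(text)
        elif tag == "tr" and self._open_rows:
            self.rows.append(self._open_rows.pop())


def extract_fields(html):
    """Map label -> value for every row with a non-empty label and a second cell."""
    parser = RowCollector()
    parser.feed(html)
    fields = {}
    for cells in parser.rows:
        if len(cells) >= 2 and cells[0]:
            # Keep the first value seen for a repeated label.
            fields.setdefault(cells[0], cells[1])
    return fields


# Invented sample pages: the data table sits at a different position in each,
# and page_b has an extra row.
page_a = ("<table><tr><td>nav</td></tr></table>"
          "<table><tr><td>Name</td><td>Water</td></tr>"
          "<tr><td>CAS</td><td>7732-18-5</td></tr></table>")
page_b = ("<table><tr><td>Name</td><td>Ethanol</td></tr>"
          "<tr><td>Alias</td><td>Alcohol</td></tr>"
          "<tr><td>CAS</td><td>64-17-5</td></tr></table>")

print(extract_fields(page_a)["CAS"])
print(extract_fields(page_b)["CAS"])
```

Looking values up by label only works if the labels are spelled the same way on every page. Pages with unclosed `td` or `tr` tags would need a more forgiving parser.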

That is the full content of "Python data scraping". Source link: utcz.com/a/158210.html
