Python data scraping
from pyquery import PyQuery as pq
from proxies import proxy  # the poster's own module (not shown); proxy() is passed to requests as proxies
import requests
from fake_useragent import UserAgent

ua = UserAgent()

def url_open(url):
    # Send a random User-Agent with this request
    headers = {'User-Agent': ua.random}
    response = requests.get(url, headers=headers, proxies=proxy())
    # Retry through another proxy for as long as the server answers 401
    while response.status_code == 401:
        print("Trying again")
        response = requests.get(url, headers=headers, proxies=proxy())
    if response.status_code == 200:
        return response
    # Any other status code falls through and returns None

response = url_open("https://china.guidechem.com/datacenter/msds/c/7821.html")
doc = pq(response.text)
dic = {}
dicc = {}
for i in range(100):
    # In the fourth table (index 3), cell 0 of row i is the key and cell 1 is the value
    key = doc("table:eq(3) tr:eq(%d) td:eq(0)" % i).text()
    value = doc("table:eq(3) tr:eq(%d) td:eq(1)" % i).text()
    if i < 15:
        dic[key] = value    # first 15 rows
    else:
        dicc[key] = value   # remaining rows
print(dic)
print(dicc)

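This site has many pages, and each page seems to follow a different layout.
For example:
https://china.guidechem.com/datacenter/msds/c/756.html
https://china.guidechem.com/datacenter/msds/c/7821.html
With the code above, the data matched from 756.html comes out misaligned. Is there a single rule that matches both pages correctly, so that the data is not shifted?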
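
One possible direction, as a rough sketch rather than a tested answer for this site: instead of addressing rows by fixed position (tr:eq(i), with the split at row 15), walk every table row and key each value by the label text in its first cell. Rows that are not a label/value pair are skipped, so an extra heading or spacer row on one page no longer shifts everything after it. The sketch uses only the standard library's html.parser so that it runs on its own. The names RowCollector and label_value_pairs and the sample HTML are made up for illustration, and the real markup of the two pages has not been checked.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def label_value_pairs(html):
    """Map each row's first cell (label) to its second cell (value).

    Only rows with at least two cells and a non-empty label are kept,
    so heading or spacer rows do not shift the data.
    """
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    result = {}
    for cells in parser.rows:
        if len(cells) >= 2 and cells[0]:
            result[cells[0]] = cells[1]
    return result


# Hypothetical markup: the same two fields, with an extra heading row on top
sample = (
    "<table>"
    "<tr><td colspan='2'>Section 1</td></tr>"
    "<tr><td>Name</td><td>Water</td></tr>"
    "<tr><td>CAS</td><td> 7732-18-5 </td></tr>"
    "</table>"
)
print(label_value_pairs(sample))
```

Because this reads every table on the page, a real page would probably need the HTML narrowed to the data table first, and repeated labels would overwrite each other in the dict. The same idea should carry over to pyquery by looping over the rows of the chosen table and reading the cells of each row, instead of formatting a row index into the selector.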