Scraping the apkpure site: the headers are copied exactly from the browser, so why does a requests call still return 403?
The code is below. Can anyone help? orz
import requests

url = 'https://d.apkpure.com/b/APK/tv.danmaku.bilibilihd?version=latest'
# url = 'https://apkpure.com/cn/bi-li-bi-li-hd/tv.danmaku.bilibilihd/download'
headers = {
'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36',
'Referer': 'https://apkpure.com/cn/bi-li-bi-li-hd/tv.danmaku.bilibilihd/download',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Cookie': '_apk_uid=33BXPQ2B2D4dwk21M25xf5cBNyZmNZEb; apkpure__lang=cn; apkpure__country=SG; __gsas=ID=673f0f7a0a46f976:T=1682566363:S=ALNI_MaMd8xd1x6YMPK5-rwWI5cK-NBr8g; _qimei=attar72DjXiABtN49PwA89dnMC0Mm952; g_state={"i_p":1683345673685,"i_l":1}; apkpure__policy_review=20180525; recommend_id=; apkpure__sample=0.8095019481907848; _dt_sample=0.3399179836773658; _dt_referrer_fix=0.793447613425174; _tag_sample=0.9914902904015179; _home_article_entry_sample=0.3423054671955059; _related_recommend=0.7101508526816152; _download_detail_sample=0.1538198911838038; _f_sp=993198767; _gid=GA1.2.1389546599.1688103882; download_id=1086909951247822; m1=19539; m2=fdf23cba2c548d13a95bc4edd58f669c; apkpure__next=/cn/bi-li-bi-li-hd/tv.danmaku.bilibilihd/download; _usi=s:dd50b518b9000f69890b75e64f334863285493385f65340fb873b24922151af0.rjLEMaPiYFEdP54GVVtrXHzlxwG9uExUPxvMtBmjDh0; _user_tag=j:{"language":"cn","source_language":"zh-CN","country":"SG"}; __gpi=UID=00000be5e508338b:T=1682566355:RT=1688113399:S=ALNI_MZngMPfqB_NNxJHA02IooXEANtiYw; __gads=ID=ea35821c8c3a777a-22cec57fb5df0018:T=1682566355:RT=1688113399:S=ALNI_MbpNpIXZxmiNm7rAEaUjiI4ZV0HTw; translate-token=MTY4ODExMzQ3MjMxOQ==; FCNEC=[["AKsRol_AzE7pwI8N-CyJIojidVZomCi52Mou9SjVdhwFSzJxOmOGA2c9ayhF0z6XU9T8PpIp7khxOsFKiW3NcLztwBPwGK2ILwDWxQxJalAdGzEUJPXJS9TTaxJoRhh7xfztwGgw03SAR2ZEnnYuzqU32jipA4UlvA=="],null,[]]; _ga=GA1.2.1726778835.1682566355; _apk_sid=1.1.1688113397076.30.6.1688113488603.-480; _client_id=GA1.2.1726778835.1682566355; _ga_NT1VQC8HKJ=GS1.1.1688113399.31.1.1688113519.11.0.0',
'Host': 'd.apkpure.com',
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?1',
'sec-ch-ua-platform': '"Android"',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-site',
'Upgrade-Insecure-Requests': '1'
}
response = requests.get(url=url, headers=headers)
print(response.status_code)
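Answer:
First, an interesting detail. Before answering, I went out of habit to check the footer of the apkpure site (I wanted to see whether its legal terms say anything about scraping, for example whether they treat it as prohibited), only to find that the site has been blocked, and it cannot be found through search either. Here are two screenshots of the search.
So that is the first possible problem. The second, I suspect, is that the request triggered an anti-scraping mechanism: sending too many requests, or sending headers stuffed with browser data, can each trigger a block.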
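One way to tell these causes apart might be to look at what the server actually sent back with the 403. The sketch below is not from the original answer: `describe_response` is a hypothetical helper, and it assumes only that the object passed in exposes `status_code`, `headers` and `text`, as a `requests.Response` does. It is meant for error responses, not for a successfully downloaded APK.

```python
def describe_response(response, body_chars=200):
    """Return a short, printable summary of an HTTP response."""
    lines = [f"status: {response.status_code}"]
    # The Server and Content-Type headers often hint at who produced the error page.
    for name in ("Server", "Content-Type"):
        lines.append(f"{name}: {response.headers.get(name, '<missing>')}")
    # The start of the body usually shows whether this is a block page.
    body_start = response.text[:body_chars].replace("\n", " ")
    lines.append("body starts with: " + body_start)
    return "\n".join(lines)
```

With the question's code, calling `print(describe_response(response))` right after `requests.get(...)` would show whether the 403 comes with a recognizable block page or with an empty body.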
So the scraper can be improved a little:
import random
import time

import requests

url = 'https://d.apkpure.com/b/APK/tv.danmaku.bilibilihd?version=latest'
# Keep only a minimal set of headers instead of the full browser dump.
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Mobile Safari/537.36',
    'Referer': 'https://apkpure.com/cn/bi-li-bi-li-hd/tv.danmaku.bilibilihd/download',
}
# A session reuses the connection and sends the same headers on every request.
session = requests.Session()
session.headers.update(headers)
try:
    response = session.get(url)
    response.raise_for_status()  # raises HTTPError on 4xx/5xx responses
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print("Request failed:", e)
# Add a random delay
time.sleep(random.uniform(0.5, 1))
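To explain: this code is a modification of yours, and mostly a simplification. It keeps only the necessary User-Agent (plus the Referer) and drops the rest of the headers, and at the end it adds a random delay of 0.5 to 1 second. That last part should be familiar: anti-scraping mechanisms generally get suspicious of requests that arrive at fixed intervals and block them, so a randomized delay can get around this to some extent. You can of course widen the time range and lower the request frequency.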
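The delay only matters when more than one request is sent, so here is a minimal sketch of how it might be applied between requests. `fetch_with_delays` and its parameters are hypothetical names, not part of the original answer; `fetch` is any callable that takes a URL, and `sleep` is injectable so the pause can be replaced when experimenting.

```python
import random
import time


def fetch_with_delays(urls, fetch, min_delay=0.5, max_delay=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

With the session from the code above, this could be called as `fetch_with_delays(urls, lambda u: session.get(u, timeout=10).status_code)`, where `urls` is your own list of download links.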
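One last point: I suggest running this through a proxy. If you run it from your current environment, there is a good chance your access will get banned.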
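As a rough sketch of the proxy suggestion: a `requests.Session` keeps a `proxies` mapping keyed by URL scheme, so a small helper can point both schemes at one proxy. `apply_proxy` is a hypothetical helper name, and the proxy address in the usage comment is a placeholder, not a working proxy.

```python
def apply_proxy(session, proxy_url):
    """Route both HTTP and HTTPS traffic of `session` through `proxy_url`."""
    # requests picks the proxy by URL scheme from the session's `proxies` mapping.
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


# Usage with the session from the answer (placeholder address, not executed here):
#   apply_proxy(session, "http://127.0.0.1:8080")
#   response = session.get(url)
```

Whether this helps depends on the proxy itself: if its address is already blocked by the site, the result will likely be the same 403.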