Python爬虫实战:网易云音乐爬取[Python基础]

python

前言

本文的文字及图片来源于网络,仅供学习、交流使用,不具有任何商业用途,如有问题请及时联系我们以作处理。

本次目标

爬取网易云音乐

https://music.163.com/

 

环境

  • python 3.6
  • pycharm

爬虫代码

导入工具

import requests

import re

请求网站、解析网站数据

def get_music_url(music_id, music_title):

url = "https://api.zhuolin.wang/api.php"

headers = {

"Accept": "*/*",

"Accept-Encoding": "gzip, deflate, br",

"Accept-Language": "zh-CN,zh;q=0.9",

"Cache-Control": "no-cache",

"Connection": "keep-alive",

"Cookie": "UM_distinctid=175aca5b31d39e-06d658eceb014a-3962420d-1fa400-175aca5b31e92e",

"Host": "api.zhuolin.wang",

"Pragma": "no-cache",

"Referer": "https://music.zhuolin.wang/",

"Sec-Fetch-Dest": "script",

"Sec-Fetch-Mode": "no-cors",

"Sec-Fetch-Site": "same-site",

"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",

}

params = {

"callback": "jQuery111305698848623906863_1604919341715",

"types": "url",

"id": "{}".format(music_id),

"source": "netease",

"_": "1604919341751",

}

response = requests.get(url=url, params=params, headers=headers)

html_data = response.text

if music_url == "":

print("无音频下载链接")

def music_id():

url = "https://music.163.com/discover/toplist"

headers = {

"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"

}

response = requests.get(url=url, headers=headers)

lis = re.findall("<li><a href="(.*?)">(.*?)</a></li>", response.text, re.S)[0:100]

for i in lis:

music_id = i[0].split("id=")[-1]

title = i[1]

pattern = re.compile(r"[/:*?"<>|]") # "/ : * ? " < > |"

music_title = re.sub(pattern, "_", title) # 替换为下划线

get_music_url(music_id, music_title)

保存数据

else:

path = "保存地址" + music_title + ".mp3"

response = requests.get(url=music_url)

with open(path, mode="wb") as f:

f.write(response.content)

print(music_title, music_url)

运行代码,结果如下图

 

以上是 Python爬虫实战:网易云音乐爬取[Python基础] 的全部内容, 来源链接: utcz.com/z/530611.html

回到顶部