Python爬虫训练:爬取酷燃网视频数据

python

前言

本文的文字及图片来源于网络,仅供学习、交流使用,不具有任何商业用途,版权归原作者所有,如有问题请及时联系我们以作处理

项目目标

爬取酷燃网视频数据

https://krcom.cn/

 

环境

Python3.6

pycharm

 

爬虫代码

import pprint

import requests

import re

def download_video(title, url):

filename_video = "C:UsersAdministratorDesktop酷燃网" + title + ".mp4"

response_video = requests.get(url=url)

with open(filename_video, mode="wb") as f:

f.write(response_video.content)

def download_mp3(title, url):

filename_mp3 = "C:UsersAdministratorDesktop酷燃网" + title + ".mp3"

response_mp3 = requests.get(url=url)

with open(filename_mp3, mode="wb") as f:

f.write(response_mp3.content)

for page in range(0, 101, 20):

url = "https://krcom.cn/aj/hot/loadingmore?ajwvr=6&cursor=0;2020102014&YmdH=&__rnd=1603176486876"

headers = {

"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"

}

response = requests.get(url=url, headers=headers)

html_data = response.text.encode("utf-8").decode("unicode_escape")

urls = re.findall("vid=(.*?)"", html_data, re.S)

titles = re.findall("<h3 class="V_autocut_2l">(.*?)<", html_data, re.S)

data = zip(urls, titles)

for i in data:

vid = i[0]

title = i[1]

page_url = "https://krcom.cn/aj/dash/media?media_ids={}&protocols=dash&watermarks=krcom".format(vid)

response_2 = requests.get(url=page_url, headers=headers)

html_json = response_2.json()

video_url = html_json["data"]["list"][0]["details"][1]["play_info"]["url"]

mp3_url = html_json["data"]["list"][0]["details"][-1]["play_info"]["url"]

download_video(title, video_url)

download_mp3(title, mp3_url)

print(title)

以上是 Python爬虫训练:爬取酷燃网视频数据 的全部内容, 来源链接: utcz.com/z/530999.html

回到顶部