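The previous three posts explained how to crack three kinds of encryption. Scraping that way was fairly tedious, though. This post introduces Fiddler, a packet-capture tool that makes grabbing web content much easier.
A crawler scrapes the data you can see ("if it's visible, it's crawlable"). For some apps and websites, the data is hard to analyze straight from the page. In those cases Fiddler helps us analyze the requests and responses.
Fiddler sits between the client and the server as a proxy, so it can see both the requests and the responses.
With that said, let's download Fiddler first: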
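Fiddler Chinese edition download link (unofficial): http://www.32r.com/soft/43364.html
After installing, open it. It looks like the figure below:
Not sure how to use it yet? No problem: here is a summary of how to use the capture tool.
Fiddler capture tool summary: https://www.cnblogs.com/yyhh/p/5140852.html
Next, a few settings need to be changed.
To capture web traffic on your own computer, the traffic has to pass through Fiddler's proxy. By default, Fiddler acts as the proxy for the local browser.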
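You may also want your own Python script's traffic to show up in Fiddler. The script does not go through the browser, so it has to be pointed at the proxy explicitly. The sketch below assumes Fiddler is listening on its default address, 127.0.0.1:8888; check the port in Fiddler's connection options. `fiddler_proxies` is a hypothetical helper name. The commented-out request needs the third-party requests library and a running Fiddler.

```python
# Hypothetical helper: build a requests-style proxies mapping that routes
# traffic through a local Fiddler instance (assumed default port 8888).
def fiddler_proxies(host="127.0.0.1", port=8888):
    proxy = "http://{}:{}".format(host, port)
    # Fiddler is an HTTP proxy; HTTPS traffic is tunnelled through it via CONNECT
    return {"http": proxy, "https": proxy}


if __name__ == "__main__":
    print(fiddler_proxies())
    # Usage with requests (needs Fiddler running; verify=False because Fiddler
    # re-signs HTTPS traffic with its own root certificate):
    # requests.get("https://y.qq.com/", proxies=fiddler_proxies(), verify=False)
```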
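With the groundwork done, let's get to the main topic.
Here is the web address of QQ Music: https://y.qq.com/
Because of licensing, some songs can only be played by members. It is best to log in with a member account and then grab its cookie.
On the web page: search box -> a song you like (e.g. 春娇与志明) -> pick a song and play it -> wait for it to finish loading.
Then switch back to the capture tool: press Ctrl+F, search for mp4 and check the content types -> save the largest file locally.
I saved the file to my desktop; save it wherever is convenient for you. The system's built-in player can open it. If the download is correct, you will hear the music, as shown below: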
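Once the capture succeeds, we can start analyzing the link.
In the figure above, the upper and lower Raw tabs show the request and the response respectively. We can ignore the response pane below.
In the request pane, the GET URL can be accessed directly, so we can click to open it. The first time, a security warning appears; just continue.
The URL is: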
https://123.6.21.20/amobile.music.tc.qq.com/C400003K4R1k1UFHot.m4a?guid=455328485&vkey=989DD22B314AA8F897A78B189A6599625AF831C0160449364EA8300956E1B684F72196D02C0170DA55F6E8215E5D9782DF0365EFE921E7BD&uin=4116&fromtag=66
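Open this URL. If the music plays, we have found the right request, as shown below:
With the link found, the next step is to analyze its parameters. They are listed in the WebForms tab on the right:
The response to the URL in the figure above is the data we want to grab. Next, let's test a few more songs and observe the parameters.
Let's capture the information for another song.
After comparing several captures, we find that vkey differs. So how do we get vkey?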
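Comparing long URLs by eye is error-prone, so a small script can diff the query parameters instead. The sketch below uses only the standard library; `query_params` and `diff_params` are hypothetical helper names. It is fed the two captures of C400003K4R1k1UFHot.m4a that appear in this article: the first capture above and the re-capture further down. For these two, it reports that only uin and vkey differ, so vkey changes even between captures of the same song.

```python
from urllib.parse import urlsplit, parse_qs


def query_params(url):
    """Return the query string of url as {name: first value}."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def diff_params(url_a, url_b):
    """Return {name: (value in a, value in b)} for every parameter whose value differs."""
    a, b = query_params(url_a), query_params(url_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


# Two captures of the same track, both copied from this article
first = ("https://123.6.21.20/amobile.music.tc.qq.com/C400003K4R1k1UFHot.m4a?guid=455328485"
         "&vkey=989DD22B314AA8F897A78B189A6599625AF831C0160449364EA8300956E1B684F72196D02C0170DA55F6E8215E5D9782DF0365EFE921E7BD"
         "&uin=4116&fromtag=66")
second = ("https://123.6.21.20/amobile.music.tc.qq.com/C400003K4R1k1UFHot.m4a?guid=455328485"
          "&vkey=EA9D246E2EB9BDCFFBF840E6479B26495C97E5C0EE12BC8FA81BDAEFB47399C5169A4813DA033FA9FEE710887216632680B11940CFC77F64"
          "&uin=0&fromtag=66")

print(diff_params(first, second))
```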
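After some searching, we finally find vk in the response of the URL shown in the figure above. Searching for purl directly is even more convenient.
So how do we send a request to the URL in the figure? Again, we need to look at its parameters.
After comparing several captures, we find that several parameters differ, in particular songmid. So how do we get songmid? After some more searching, we finally find songmid on the search page.
In the figure above we can see that songmid differs. To verify this further, let's format the response first.
Here is an online JSON parser: https://www.sojson.com/ . Let's parse the JSON with it.
Once it is formatted, the structure is easy to see.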
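If you prefer not to paste responses into a website, Python's standard json module can do the same formatting locally. The sketch below is minimal. `pretty` is a hypothetical helper, and the sample string is a made-up placeholder, not a real response.

```python
import json


def pretty(raw):
    """Pretty-print a JSON string; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)


# Made-up placeholder input, not a captured response
sample = '{"songname":"春娇与志明","songmid":"PLACEHOLDER"}'
print(pretty(sample))
```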
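What we need next is the ID of each song on the search page.
The capture shows that this request returns a JSON string.
Comparing the two, this part is exactly what we need. Now let's find the song's mid.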
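The spider later in this article extracts songmid and songname with two regular expressions and zips the results. If one key shows up more often than the other, the pairs silently misalign. A structure-agnostic alternative is to parse the JSON and walk it, collecting every object that has both keys.

This sketch assumes the response is JSON, possibly wrapped in a JSONP callback like `callback({...})`. The wrapper handling is a precaution, not something observed in this article. The sample response shape is made up, and `load_maybe_jsonp` and `collect_songs` are hypothetical names.

```python
import json


def load_maybe_jsonp(text):
    """Parse text as JSON; if it looks like callback({...}), strip the wrapper first."""
    text = text.strip()
    if not text.startswith(("{", "[")):
        text = text[text.index("(") + 1 : text.rindex(")")]
    return json.loads(text)


def collect_songs(node, found=None):
    """Walk parsed JSON and collect (songname, songmid) from every dict holding both keys."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if "songname" in node and "songmid" in node:
            found.append((node["songname"], node["songmid"]))
        for value in node.values():
            collect_songs(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_songs(item, found)
    return found


# Made-up response shape, used only to exercise the functions
sample = 'callback({"data":{"song":{"list":[{"songname":"A","songmid":"m1"},{"songname":"B","songmid":"m2"}]}}})'
print(collect_songs(load_maybe_jsonp(sample)))
```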
First, here is the complete URL:
https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.center&searchid=41287009571950165&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w=%E6%98%A5%E5%A8%87%E4%B8%8E%E5%BF%97%E6%98%8E&g_tk_new_20200303=5381&g_tk=5381&loginUin=459804692&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0
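Opening it gives the result below:
There are many parameters, so let's keep only the song name and see whether the request still works. The link is:
https://c.y.qq.com/soso/fcgi-bin/client_search_cp?w=春娇与志明
So the simplified request is:
URL: https://c.y.qq.com/soso/fcgi-bin/client_search_cp; simplified parameter: w=春娇与志明; returns: song names and mids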
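The Chinese keyword has to be percent-encoded as UTF-8 in the URL; that is exactly the `w=%E6%98%A5...` value in the full URL above. The sketch below does this with the standard library; `build_search_url` is a hypothetical helper. requests performs the same encoding when you pass `params={"w": ...}`, which is what the spider below relies on.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://c.y.qq.com/soso/fcgi-bin/client_search_cp"


def build_search_url(keyword):
    """Build the simplified search URL; urlencode percent-encodes the UTF-8 keyword."""
    return SEARCH_URL + "?" + urlencode({"w": keyword})


print(build_search_url("春娇与志明"))
```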
# encoding: utf-8
'''
@author 李华鑫
@create 2020-10-21 18:06
Mycsdn:https://buwenbuhuo.blog.csdn.net/
@contact: 459804692@qq.com
@software: Pycharm
@file: test.py
@Version:1.0
'''
import re

import requests
from lxml import etree


class QQMusicSpider:
    def __init__(self):
        # Search endpoint; only the w (keyword) parameter is required
        self.url1 = "https://c.y.qq.com/soso/fcgi-bin/client_search_cp"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
            "Referer": "https://y.qq.com/",
            # Paste the cookie of your own logged-in (ideally member) account here
            "Cookie": "YOUR_COOKIE_HERE",
        }
        self.w = ""

    def get_search_content(self):
        """Search for self.w and return a list of (song name, songmid) tuples."""
        data = self.parse_content(url=self.url1, headers=self.headers, params={"w": self.w}).decode("utf-8")
        song_mid_list = re.findall(r'"songmid":"(.*?)"', data)
        song_name_list = re.findall(r'"songname":"(.*?)"', data)
        return list(zip(song_name_list, song_mid_list))

    def parse_content(self, url, headers, params=None):
        """Request url and return the response body as bytes."""
        response = requests.get(url, headers=headers, params=params)
        return response.content

    def parse_json(self, url, headers, params=None):
        """Request url and return the response parsed as JSON."""
        response = requests.get(url, headers=headers, params=params)
        return response.json()

    def parse_xpath(self, html):
        """Parse html with lxml and return an element that supports XPath."""
        etree_obj = etree.HTML(html)
        return etree_obj

    def start(self):
        """Run the spider."""
        self.w = input("Enter a song/singer to search for: ")
        search_data = self.get_search_content()
        for index, value in enumerate(search_data):
            print("({}){}".format(index + 1, value[0]))
        index = int(input("Enter the number of the song to download: "))
        print(search_data[index - 1])


if __name__ == '__main__':
    QQMusicSpider().start()
With the code above, we can already get each song's mid and name. Next, we use the mid to look up vk.
First, find the URL that returns vk in the capture tool:
https://u.y.qq.com/cgi-bin/musics.fcg?-=getplaysongvkey6182361656698969&g_tk=5381&sign=zzancg5ohny8sa7re612a04ee9b6b58b82bcdaa064240f964&loginUin=459804692&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data=%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%22455328485%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%22455328485%22%2C%22songmid%22%3A%5B%22004Uln1G2Aunqw%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%22459804692%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C%22comm%22%3A%7B%22uin%22%3A459804692%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D
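Opening it gives the following:
Next, find the value of the data parameter and copy it out. In the copy below, the req block has been dropped, songmid is replaced by a %s placeholder, and both uin values are set to 0:
{"req_0":{"module":"vkey.GetVkeyServer","method":"CgiGetVkey","param":{"guid":"455328485","songmid":["%s"],"songtype":[0],"uin":"0","loginflag":1,"platform":"20"}},"comm":{"uin":0,"format":"json","ct":24,"cv":0}}
This shows that our test was successful.
URL: https://u.y.qq.com/cgi-bin/musics.fcg (the code below sends the same data to musicu.fcg); simplified parameter: data=xxx (the value contains {} braces, so be careful when splicing it into the URL); returns: purl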
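The copy-and-trim step can also be done in code. First decode the captured `data=` value, then rebuild the trimmed payload with `json.dumps` instead of splicing braces into a string by hand. The sketch below rests on a few assumptions:

* `CAPTURED_DATA` is the encoded value copied from the musics.fcg URL above.
* `build_vkey_data` is a hypothetical helper.
* The guid default is the one the code below uses.

The compact separators make the output match the hand-written `%s` template byte for byte.

```python
import json
from urllib.parse import unquote

# The data= value copied (still percent-encoded) from the captured musics.fcg URL
CAPTURED_DATA = (
    "%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%22455328485%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D"
    "%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%22455328485%22%2C%22songmid%22%3A%5B%22004Uln1G2Aunqw%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%22459804692%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D"
    "%2C%22comm%22%3A%7B%22uin%22%3A459804692%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D"
)
captured = json.loads(unquote(CAPTURED_DATA))


def build_vkey_data(songmid, guid="602087500"):
    """Serialize the trimmed req_0 + comm payload for one songmid."""
    payload = {
        "req_0": {
            "module": "vkey.GetVkeyServer",
            "method": "CgiGetVkey",
            "param": {"guid": guid, "songmid": [songmid], "songtype": [0],
                      "uin": "0", "loginflag": 1, "platform": "20"},
        },
        "comm": {"uin": 0, "format": "json", "ct": 24, "cv": 0},
    }
    # Compact separators avoid the spaces json.dumps inserts by default
    return json.dumps(payload, separators=(",", ":"))


print(sorted(captured))  # top-level blocks in the capture, including the dropped "req"
print(build_vkey_data(captured["req_0"]["param"]["songmid"][0]))
```

With requests, the result could be sent as `params={"data": build_vkey_data(songmid)}` to the musicu.fcg URL used below, and requests would handle the percent-encoding. Whether the server still accepts this trimmed payload should be re-checked against a fresh capture.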
import re

import requests
from lxml import etree


class QQMusicSpider:
    def __init__(self):
        self.url1 = "https://c.y.qq.com/soso/fcgi-bin/client_search_cp"
        # vkey endpoint: %s is filled with the songmid. The value contains JSON braces,
        # which is why %-formatting is used here rather than str.format
        self.url2 = 'https://u.y.qq.com/cgi-bin/musicu.fcg?data={"req_0":{"module":"vkey.GetVkeyServer","method":"CgiGetVkey","param":{"guid":"602087500","songmid":["%s"],"songtype":[0],"uin":"0","loginflag":1,"platform":"20"}},"comm":{"uin":0,"format":"json","ct":24,"cv":0}}'
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
            "Referer": "https://y.qq.com/",
            # Paste the cookie of your own logged-in (ideally member) account here
            "Cookie": "YOUR_COOKIE_HERE",
        }
        self.w = ""

    def get_search_content(self):
        """Search for self.w and return a list of (song name, songmid) tuples."""
        data = self.parse_content(url=self.url1, headers=self.headers, params={"w": self.w}).decode("utf-8")
        song_mid_list = re.findall(r'"songmid":"(.*?)"', data)
        song_name_list = re.findall(r'"songname":"(.*?)"', data)
        return list(zip(song_name_list, song_mid_list))

    def get_vk(self, songmid):
        """Request the vkey/purl information for a songmid and print the JSON."""
        data = self.parse_json(url=self.url2 % songmid, headers=self.headers)
        print(data)

    def parse_content(self, url, headers, params=None):
        """Request url and return the response body as bytes."""
        response = requests.get(url, headers=headers, params=params)
        return response.content

    def parse_json(self, url, headers, params=None):
        """Request url and return the response parsed as JSON."""
        response = requests.get(url, headers=headers, params=params)
        return response.json()

    def parse_xpath(self, html):
        """Parse html with lxml and return an element that supports XPath."""
        etree_obj = etree.HTML(html)
        return etree_obj

    def start(self):
        """Run the spider."""
        self.w = input("Enter a song/singer to search for: ")
        search_data = self.get_search_content()
        for index, value in enumerate(search_data):
            print("({}){}".format(index + 1, value[0]))
        index = int(input("Enter the number of the song to download: "))
        self.get_vk(search_data[index - 1][1])


if __name__ == '__main__':
    QQMusicSpider().start()
# Path of purl inside the JSON that get_vk prints
data["req_0"]["data"]["midurlinfo"][0]["purl"]
In theory, this is enough to obtain the URL we need.
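The one-line lookup raises KeyError or IndexError if the response shape differs. The purl may also come back empty when no playable URL is granted, presumably for member-only songs requested without a member cookie. Below is a defensive sketch; `extract_purl` is a hypothetical helper. The sample responses are trimmed by hand to just the keys on that path, and the purl value is the one copied later in this article.

```python
def extract_purl(resp):
    """Return the first purl in a CgiGetVkey-style response, or "" if it is missing or empty."""
    try:
        return resp["req_0"]["data"]["midurlinfo"][0]["purl"] or ""
    except (KeyError, IndexError, TypeError):
        return ""


# Hand-trimmed sample responses containing only the keys on the purl path
ok_resp = {"req_0": {"data": {"midurlinfo": [{"purl": "C400003K4R1k1UFHot.m4a?guid=602087500&vkey=D81DEE79601184E377220701E1EB24FA3B4F9B869F8C4D77E8A82ACAA73A088D8A6FB8C934D8430328B3FA93628F4B723AC0759214464A6F&uin=0&fromtag=66"}]}}}
empty_resp = {"req_0": {"data": {"midurlinfo": [{"purl": ""}]}}}

for resp in (ok_resp, empty_resp, {}):
    purl = extract_purl(resp)
    print(purl if purl else "no playable URL returned")
```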
First, recall the URL of the music file we captured with the tool at the very beginning.
Copy that URL out:
https://123.6.21.20/amobile.music.tc.qq.com/C400003K4R1k1UFHot.m4a?guid=455328485&vkey=EA9D246E2EB9BDCFFBF840E6479B26495C97E5C0EE12BC8FA81BDAEFB47399C5169A4813DA033FA9FEE710887216632680B11940CFC77F64&uin=0&fromtag=66
Now look at the purl we extracted in the previous step.
Copy it out as well:
C400003K4R1k1UFHot.m4a?guid=602087500&vkey=D81DEE79601184E377220701E1EB24FA3B4F9B869F8C4D77E8A82ACAA73A088D8A6FB8C934D8430328B3FA93628F4B723AC0759214464A6F&uin=0&fromtag=66
Let's splice the two into a URL and try it:
https://123.6.21.20/amobile.music.tc.qq.com/C400003K4R1k1UFHot.m4a?guid=602087500&vkey=D81DEE79601184E377220701E1EB24FA3B4F9B869F8C4D77E8A82ACAA73A088D8A6FB8C934D8430328B3FA93628F4B723AC0759214464A6F&uin=0&fromtag=66
The spliced URL has exactly the form we expected.
URL: https://123.6.21.20/amobile.music.tc.qq.com/; parameter: the purl appended to it; returns: the song's bytes
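Plain `+` concatenation works only because the prefix ends with "/". `urllib.parse.urljoin` gives the same result in that case. The sketch below also shows what happens if the slash is missing. Both strings are copied from this article. The CDN IP 123.6.21.20 is simply the node that appeared in this capture, and yours may differ.

```python
from urllib.parse import urljoin

# CDN prefix and purl, both copied from this article's capture
BASE = "https://123.6.21.20/amobile.music.tc.qq.com/"
PURL = ("C400003K4R1k1UFHot.m4a?guid=602087500"
        "&vkey=D81DEE79601184E377220701E1EB24FA3B4F9B869F8C4D77E8A82ACAA73A088D8A6FB8C934D8430328B3FA93628F4B723AC0759214464A6F"
        "&uin=0&fromtag=66")

song_url = urljoin(BASE, PURL)
# Same result as BASE + PURL, because BASE ends with "/"
print(song_url == BASE + PURL)
# Without the trailing slash, urljoin replaces the last path segment instead
print(urljoin(BASE.rstrip("/"), PURL))
```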
import re

import requests
from lxml import etree

# Suppress urllib3's InsecureRequestWarning for unverified HTTPS requests
requests.packages.urllib3.disable_warnings()


class QQMusicSpider:
    def __init__(self):
        self.url1 = "https://c.y.qq.com/soso/fcgi-bin/client_search_cp"
        # vkey endpoint: %s is filled with the songmid (the JSON braces rule out str.format)
        self.url2 = 'https://u.y.qq.com/cgi-bin/musicu.fcg?data={"req_0":{"module":"vkey.GetVkeyServer","method":"CgiGetVkey","param":{"guid":"602087500","songmid":["%s"],"songtype":[0],"uin":"0","loginflag":1,"platform":"20"}},"comm":{"uin":0,"format":"json","ct":24,"cv":0}}'
        # CDN prefix captured in Fiddler; purl is appended to it
        self.url3 = "https://123.6.21.20/amobile.music.tc.qq.com/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
            "Referer": "https://y.qq.com/",
            # Paste the cookie of your own logged-in (ideally member) account here
            "Cookie": "YOUR_COOKIE_HERE",
        }
        self.w = ""

    def get_search_content(self):
        """Search for self.w and return a list of (song name, songmid) tuples."""
        data = self.parse_content(url=self.url1, headers=self.headers, params={"w": self.w}).decode("utf-8")
        song_mid_list = re.findall(r'"songmid":"(.*?)"', data)
        song_name_list = re.findall(r'"songname":"(.*?)"', data)
        return list(zip(song_name_list, song_mid_list))

    def get_purl(self, songmid):
        """Look up the purl (which carries the vkey) for a songmid."""
        data = self.parse_json(url=self.url2 % songmid, headers=self.headers)
        return data["req_0"]["data"]["midurlinfo"][0]["purl"]

    def parse_content(self, url, headers, params=None):
        """Request url and return the response body as bytes."""
        response = requests.get(url, headers=headers, params=params)
        return response.content

    def parse_json(self, url, headers, params=None):
        """Request url and return the response parsed as JSON."""
        response = requests.get(url, headers=headers, params=params)
        return response.json()

    def parse_xpath(self, html):
        """Parse html with lxml and return an element that supports XPath."""
        etree_obj = etree.HTML(html)
        return etree_obj

    def start(self):
        """Run the spider and print the full download URL of the chosen song."""
        self.w = input("Enter a song/singer to search for: ")
        search_data = self.get_search_content()
        for index, value in enumerate(search_data):
            print("({}){}".format(index + 1, value[0]))
        index = int(input("Enter the number of the song to download: "))
        purl = self.get_purl(search_data[index - 1][1])
        print(self.url3 + purl)


if __name__ == '__main__':
    QQMusicSpider().start()
# encoding: utf-8
'''
@author 李华鑫
@create 2020-10-24 14:37
Mycsdn:https://buwenbuhuo.blog.csdn.net/
@contact: 459804692@qq.com
@software: Pycharm
@file: QQ音乐.py
@Version:1.0
'''
import re

import requests
from lxml import etree

# The CDN is requested by IP with verify=False, so silence urllib3's InsecureRequestWarning
requests.packages.urllib3.disable_warnings()


class QQMusicSpider:
    def __init__(self):
        # Search endpoint: returns song names and songmids
        self.url1 = "https://c.y.qq.com/soso/fcgi-bin/client_search_cp"
        # vkey endpoint: %s is filled with the songmid, returns purl
        self.url2 = 'https://u.y.qq.com/cgi-bin/musicu.fcg?data={"req_0":{"module":"vkey.GetVkeyServer","method":"CgiGetVkey","param":{"guid":"602087500","songmid":["%s"],"songtype":[0],"uin":"0","loginflag":1,"platform":"20"}},"comm":{"uin":0,"format":"json","ct":24,"cv":0}}'
        # CDN prefix captured in Fiddler; purl is appended to it
        self.url3 = "https://123.6.21.20/amobile.music.tc.qq.com/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
            "Referer": "https://y.qq.com/",
            # Paste the cookie of your own logged-in (ideally member) account here
            "Cookie": "YOUR_COOKIE_HERE",
        }
        self.w = ""

    def get_search_content(self):
        """Search for self.w and return a list of (song name, songmid) tuples."""
        data = self.parse_content(url=self.url1, headers=self.headers, params={"w": self.w}).decode("utf-8")
        song_mid_list = re.findall(r'"songmid":"(.*?)"', data)
        song_name_list = re.findall(r'"songname":"(.*?)"', data)
        return list(zip(song_name_list, song_mid_list))

    def get_purl(self, songmid):
        """Look up the purl (which carries the vkey) for a songmid."""
        data = self.parse_json(url=self.url2 % songmid, headers=self.headers)
        return data["req_0"]["data"]["midurlinfo"][0]["purl"]

    def save_song(self, url, song_name):
        """Download the song at url and save it in the current directory."""
        print("-" * 100)
        # purl points to an .m4a file, so keep that extension
        filename = song_name + ".m4a"
        print("Downloading {}...".format(filename))
        data = self.parse_content(url=url, headers=self.headers)
        if data:
            with open(filename, "wb") as file:
                file.write(data)
            print("{} downloaded".format(filename))
        else:
            print("Failed to download {}; Deluxe Green Diamond members should put their member account's cookie into the program".format(filename))
        print("-" * 100)

    def parse_content(self, url, headers, params=None):
        """Request url and return the response body as bytes."""
        response = requests.get(url, headers=headers, params=params, verify=False)
        return response.content

    def parse_json(self, url, headers, params=None):
        """Request url and return the response parsed as JSON."""
        response = requests.get(url, headers=headers, params=params, verify=False)
        return response.json()

    def parse_xpath(self, html):
        """Parse html with lxml and return an element that supports XPath."""
        etree_obj = etree.HTML(html)
        return etree_obj

    def start(self):
        """Search once, then download songs until the user enters 0."""
        self.w = input("Enter a song/singer to search for: ")
        search_data = self.get_search_content()
        while True:
            for index, value in enumerate(search_data):
                print("({}){}".format(index + 1, value[0]))
            print("(0) Exit")
            choice = input("Choose a song number to download: ")
            # Reject non-numeric input and numbers outside the list
            if not choice.isdigit() or int(choice) > len(search_data):
                print("Invalid number, please try again")
                continue
            index = int(choice)
            if index == 0:
                break
            song_name, songmid = search_data[index - 1]
            purl = self.get_purl(songmid)
            # An empty purl means no playable URL came back (presumably a member-only song)
            if not purl:
                print("No download URL for {}; try a member account's cookie".format(song_name))
                continue
            self.save_song(url=self.url3 + purl, song_name=song_name)


if __name__ == '__main__':
    QQMusicSpider().start()
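One more practical snag: the program uses the song name from the search results directly as the file name. A name containing characters such as / or ? cannot be saved on Windows. Below is a minimal sketch of a sanitizer, under these assumptions:

* `safe_filename` is a hypothetical helper.
* The replaced characters are the ones Windows forbids in file names.
* The .m4a default matches the extension of the purl seen earlier.

```python
import re


def safe_filename(name, ext=".m4a"):
    """Replace characters that Windows forbids in file names with "_" and add ext."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    # Fall back to a fixed name if nothing usable is left
    return (cleaned or "untitled") + ext


print(safe_filename("春娇与志明"))
print(safe_filename("a/b:c?"))
```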