Scraping the Bilibili Video Leaderboard

2023-05-16

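Scraping the bilibili leaderboard

As everyone knows, Bilibili is a "study app". Ha, so today we will scrape Bilibili's leaderboard. Enough chatter, let's get started.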

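#Analysis:
In Figure 1 we can see that each video's info sits inside an li tag, which I can reach with XPath. I want each video's cover image, play count, overall score, and link. Everything except the cover is available on this page; I later found the cover at another link, which I cover further down.
Figure 1:
(image)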
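
To make the "one li per video" idea concrete, here is a minimal sketch of walking such a list with relative paths. It uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) rather than lxml, and the markup, class names other than rank-list, and BV ids are invented placeholders, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup that mimics the nesting used in this post:
# li -> second div -> second div -> a
SAMPLE = """
<html><body>
<ul class="rank-list">
  <li>
    <div class="num">1</div>
    <div class="content">
      <div class="img"></div>
      <div class="info"><a href="//www.bilibili.com/video/BV_EXAMPLE_1">Example video one</a></div>
    </div>
  </li>
  <li>
    <div class="num">2</div>
    <div class="content">
      <div class="img"></div>
      <div class="info"><a href="//www.bilibili.com/video/BV_EXAMPLE_2">Example video two</a></div>
    </div>
  </li>
</ul>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)
results = []
# Select every li under the ranking list, then use a path relative to each li.
for li in root.findall(".//ul[@class='rank-list']/li"):
    link = li.find('div[2]/div[2]/a')
    # The href is protocol-relative, so prepend the scheme.
    results.append((link.text, 'https:' + link.get('href')))

for title, url in results:
    print(title, url)
```

The real page is HTML rather than well-formed XML, which is why the code later in this post parses it with lxml's etree.HTML instead.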

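Open a video link to reach the playback page, press F12, open the Network tab, and let the video play. Many XHR requests keep appearing (see Figure 2), each for a file ending in m4s.
Figure 2:
(image)

From this we can infer that the video is assembled from many small m4s segments. I copied one of the links and opened it; the result is shown in Figure 3.
Figure 3:
(image)
This raises another question: even if we can fetch one such file, how do we collect all of these links? I searched for quite a while and found nothing, so a different approach was needed: is there a complete video link stored somewhere? Eventually I found it hidden in the Elements panel of the original page. Searching there for window turns up what is shown in Figure 4.
Figure 4:
(image)
Now open the page source and take a look. At first glance it looks like JSON, which we can extract with a regular expression. Let's analyze it: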
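
As a rough sketch of this step, the snippet below pulls the embedded JSON out of a page with a regular expression and parses it. The HTML string is a made-up, heavily shortened stand-in for the real page source; only the __playinfo__ marker and the field names come from the page analysed here, and extract_playinfo is a helper name chosen for illustration.

```python
import json
import re

# Hypothetical, shortened page source; the real page embeds a much larger object.
SAMPLE_HTML = (
    '<script>window.__playinfo__={"code":0,"data":{"quality":80,'
    '"timelength":146787}}</script><script>window.other={}</script>'
)


def extract_playinfo(html):
    """Return the parsed __playinfo__ object, or None if the marker is absent."""
    match = re.search(r'__playinfo__=(.*?)</script>', html, re.S)
    if match is None:
        return None
    return json.loads(match.group(1))


playinfo = extract_playinfo(SAMPLE_HTML)
print(playinfo['data']['quality'])
```

Returning None when the marker is missing avoids the AttributeError that calling .group(1) on a failed search would raise.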


dic={"code":0,"message":"0","ttl":1,"data":{"from":"local","result":"suee","message":"","quality":80,"format":"flv","timelength":146787,"accept_format":"hdflv2,flv,flv720,flv480,mp4","accept_description":["高清 1080P+","高清 1080P","高清 720P","清晰 480P","流畅 360P"],"accept_quality":[112,80,64,32,16],"video_codecid":7,"seek_param":"start","seek_type":"offset","dash":{"duration":147,"minBufferTime":1.5,"min_buffer_time":1.5,"video":[{"id":80,"baseUrl":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/
upgcxcode/22/07/254420722/254420722-1-30080.m4s?expires=1605283871&platform=pc&ssig=XlRvLfX0CDoVZxxwdIYxbA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":1288827,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.640032","width":1920,"height":1080,"frameRate":"16000/544","frame_rate":"16000/544","sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1005","indexRange":"1006-1385"},"segment_base":{"initialization":"0-1005","index_range":"1006-1385"},"codecid":7},{"id":80,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjhz-cmcc-v-24.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=40061&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjhz-cmcc-v-24.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnz
A&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=40061&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?expires=1605283871&platform=pc&ssig=XeJS13gzcoySCzuU_3lnzA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":777178,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":1920,"height":1080,"frameRate":"16000/544","frame_rate":"16000/544","sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1178","indexRange":"1179-1558"},"segment_base":{"initialization":"0-1178","index_range":"1179-1558"},"codecid":12},{"id":64,"baseUrl":"http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2163&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2163&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897
&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-14.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?expires=1605283871&platform=pc&ssig=9jIZlATVmCseR5eGS1ivfg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5159&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":937924,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.640028","width":1280,"height":720,"frameRate":"16000/544","frame_rate":"16000/544","sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1003","indexRange":"1004-1383"},"segment_base":{"initialization":"0-1003","index_range":"1004-1383"},"codecid":7},{"id":64,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/25442
0722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20113&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?expires=1605283871&platform=pc&ssig=xS3nNEHtDk7HFcVVGq6fZQ&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":567464,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":1280,"height":720,"frameRate":"16000/544","frame_rate":"16000/544","sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1179","indexRange":"1180-1559"},"segment_base":{"initialization":"0-1179","index_range":"1180-1559"},"codecid":12},{"id":32,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff9888
2c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?expires=1605283871&platform=pc&ssig=RftyvJ8DxVK9VwJKmmTzVg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":557917,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.64001F","width":852,"height":480,"frameRate":"16000/544","frame_rate":"16000/544","sar":"640:639","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1007","indexRange":"1008-1387"},"segment_base":{"initialization":"0-1007","index_range":"1008-1387"},"codecid":7},{"id":32,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=8000
0000","backupUrl":["http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjhz-cmcc-v-17.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5162&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjwz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?expires=1605283871&platform=pc&ssig=-m8N-lidyREkwlIp0PLVjg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=5175&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":339786,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":852,"height":480,"frameRate":"16000/544","frame_rate":"16000/544","sar":"640:639","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1182","indexRange":"1183-1562"},"segment_base":{"initialization":"0-1182","index_range":"1183-1562"},"codecid":12},{"id":16,"baseUrl":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expir
es=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-06.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20116&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-18.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=11314&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-06.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20116&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-18.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?expires=1605283871&platform=pc&ssig=cVpa1fZbZ72Fgow5rWBhUA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=11314&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":217071,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"hev1.1.6.L120.90","width":640,"height":360,"frameRate":"16000/544","frame_rate":"16000/544","sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1179","indexRange":"1180-1559"},"segment_base":{"initialization":"0-1179","index_range":"1180-1559"},"codecid":12},{"id":16,"baseUrl":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=
1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2165&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4059&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20115&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?expires=1605283871&platform=pc&ssig=eZ8L3vv-fwpq1BVHXwzNMA&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4059&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":353246,"mimeType":"video/mp4","mime_type":"video/mp4","codecs":"avc1.64001E","width":640,"height":360,"frameRate":"16000/544","frame_rate":"16000/544","sar":"1:1","startWithSap":1,"start_with_sap":1,"SegmentBase":{"Initialization":"0-1028","indexRange":"1029-1408"},"segment_base":{"initialization":"0-1028","index_range":"1029-1408"},"codecid":7}],"audio":[{"id":30280,"baseUrl":"http
://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2162&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2162&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20112&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4062&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20112&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?expires=1605283871&platform=pc&ssig=ijWE5AKPMxysK6kbTxOurg&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4062&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":117388,"mimeType":"audio/mp4","mime_type":"audio/mp4","codecs":"mp4a.40.2","width":0,"height":0,"frameRate":"","frame_rate":"","sar":"","startWithSap":0,"start_with_sap":0,"SegmentBase"
:{"Initialization":"0-907","indexRange":"908-1299"},"segment_base":{"initialization":"0-907","index_range":"908-1299"},"codecid":0},{"id":30216,"baseUrl":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2164&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4069&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20114&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-11.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?expires=1605283871&platform=pc&ssig=3VYJK4sTVkNDvn3AMrni-Q&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4069&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":67328,"mimeType":"audio/mp
4","mime_type":"audio/mp4","codecs":"mp4a.40.2","width":0,"height":0,"frameRate":"","frame_rate":"","sar":"","startWithSap":0,"start_with_sap":0,"SegmentBase":{"Initialization":"0-932","indexRange":"933-1324"},"segment_base":{"initialization":"0-932","index_range":"933-1324"},"codecid":0},{"id":30232,"baseUrl":"http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2161&mid=481314897&orderid=0,3&agrr=0&logo=80000000","base_url":"http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=2161&mid=481314897&orderid=0,3&agrr=0&logo=80000000","backupUrl":["http://cn-zjnb-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20111&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"backup_url":["http://cn-zjnb-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c75fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=20111&mid=481314897&orderid=1,3&agrr=0&logo=40000000","http://cn-zjhz-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?expires=1605283871&platform=pc&ssig=cGJi49KnW-oQE0EsnOpwdw&oi=1880138977&trid=ff98882c7
5fc400cb477f1d2889bf9f1u&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&cdnid=4063&mid=481314897&orderid=2,3&agrr=0&logo=40000000"],"bandwidth":117388,"mimeType":"audio/mp4","mime_type":"audio/mp4","codecs":"mp4a.40.2","width":0,"height":0,"frameRate":"","frame_rate":"","sar":"","startWithSap":0,"start_with_sap":0,"SegmentBase":{"Initialization":"0-907","indexRange":"908-1299"},"segment_base":{"initialization":"0-907","index_range":"908-1299"},"codecid":0}]},"support_formats":[{"quality":112,"format":"hdflv2","new_description":"1080P 高码率","display_desc":"1080P","superscript":"高码率"},{"quality":80,"format":"flv","new_description":"1080P 高清","display_desc":"1080P","superscript":""},{"quality":64,"format":"flv720","new_description":"720P 高清","display_desc":"720P","superscript":""},{"quality":32,"format":"flv480","new_description":"480P 清晰","display_desc":"480P","superscript":""},{"quality":16,"format":"mp4","new_description":"360P 流畅","display_desc":"360P","superscript":""}]},"session":"b80375f9a61937c9ce93ee13909c1bca"}
for key,value in dic['data'].items():
    print(key,':',value)
print('===================================')
for key,value in dic['data']['dash'].items():
    print(key,':',value)
print('===================================')
for key,value in dic['data']['support_formats'][0].items():
    print(key,':',value)

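dic is the JSON data we obtained. Peeling it apart layer by layer, I found that the video and the audio are two separate files, so we can download both and then merge them. Here is the result of my analysis:
Figure 5:
(image)
accept_description lists the available video qualities, and accept_quality lists the matching quality ids. I don't have a premium membership, so the best I can get is 高清 1080P (HD 1080P). The video file is in the baseUrl of video, and the audio file is in the baseUrl of audio.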
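
The relationship between these fields can be sketched as follows. The quality lists are copied from the sample response above; the stream lists are shortened and their URLs are placeholders. Note that the code later in this post simply takes the first video entry; picking the highest id, as done here, is just one possible alternative and an assumption on my part.

```python
# Quality lists copied from the sample response; stream URLs are placeholders.
data = {
    'accept_description': ['高清 1080P+', '高清 1080P', '高清 720P', '清晰 480P', '流畅 360P'],
    'accept_quality': [112, 80, 64, 32, 16],
    'dash': {
        'video': [
            {'id': 64, 'bandwidth': 937924, 'baseUrl': 'https://example.invalid/30064.m4s'},
            {'id': 80, 'bandwidth': 1288827, 'baseUrl': 'https://example.invalid/30080.m4s'},
        ],
        'audio': [
            {'id': 30216, 'bandwidth': 67328, 'baseUrl': 'https://example.invalid/30216.m4s'},
            {'id': 30280, 'bandwidth': 117388, 'baseUrl': 'https://example.invalid/30280.m4s'},
        ],
    },
}

# Pair each quality id with its human-readable description.
quality_names = dict(zip(data['accept_quality'], data['accept_description']))

# Choose the video stream with the highest quality id (bandwidth breaks ties)
# and the audio stream with the highest bandwidth.
best_video = max(data['dash']['video'], key=lambda s: (s['id'], s['bandwidth']))
best_audio = max(data['dash']['audio'], key=lambda s: s['bandwidth'])

print(quality_names[best_video['id']], best_video['baseUrl'])
print(best_audio['baseUrl'])
```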
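On a hunch, I also copied the string underlined in red in Figure 1 and searched for it in the Elements panel of the video page, and it was there (see Figure 7). Opening that link showed the original cover, and trying the same on other video pages gave their covers as well, so a regular expression is enough to extract it.
Figure 7:
(image)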
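
A minimal sketch of that regular-expression lookup, assuming the cover sits in a meta tag with itemprop="image" as in the code further down. The HTML fragment and the image URL are invented placeholders, and find_cover_url is a hypothetical helper name.

```python
import re

# Hypothetical fragment of a video page; the image URL is a placeholder.
SAMPLE_HTML = (
    '<meta data-vue-meta="true" itemprop="name" content="Example title">'
    '<meta data-vue-meta="true" itemprop="image" '
    'content="http://i2.hdslb.com/bfs/archive/example.jpg">'
)

# Match the meta tag whose itemprop is "image" and capture its content value.
COVER_RE = re.compile(r'<meta[^>]*itemprop="image"[^>]*content="(.*?)"')


def find_cover_url(html):
    """Return the cover URL, or None when no matching meta tag exists."""
    match = COVER_RE.search(html)
    return match.group(1) if match else None


print(find_cover_url(SAMPLE_HTML))
```

Using [^>]* instead of a fixed attribute order makes the pattern a little more tolerant if the page adds or reorders attributes.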
That completes the analysis; now for the code.

Code:

1: Import libraries


import re
from random import randint
import requests
from lxml import etree
from time import sleep
import json
import os

2: Create a session to share cookies


# Create the session
print('Creating session')
session = requests.Session()
base_url = 'https://www.bilibili.com/'
base_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'cookie': 'YOUR_COOKIE',  # placeholder: paste your own cookie string here
    'referer': 'https://www.google.com/',
}
session.get(url=base_url, headers=base_headers)
sleep(randint(3,5))

3: Scrape the video leaderboard (I found that adding a referer to the headers matters a great deal here; the referer is the URL of the page you came from)


# Scrape the leaderboard videos:
print('Scraping leaderboard videos')
dic={}
leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
leaderboard_headers = {
    'referer': leaderboard_url,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
}
response = session.get(url=leaderboard_url, headers=leaderboard_headers)
sleep(randint(3,5))
content = response.content
html = etree.HTML(content)
info_list = html.xpath('//ul[@class="rank-list"]/li')
for li in info_list:
    name = li.xpath('div[2]/div[2]/a/text()')[0]             # video title
    href = 'https:'+li.xpath('div[2]/div[2]/a/@href')[0]     # video link
    score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0]+' overall score'       # overall score
    play_volume=li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()        # play count
    info=[href,score,play_volume]   # renamed from "list" to avoid shadowing the built-in
    dic[name]=info
    # print(name,href,score,play_volume)
    # print(dic)

Here I use the video's name as the dictionary key and put the video link, overall score, and play count into a list, which becomes the dictionary value.
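
The resulting structure looks roughly like the sketch below. The titles, links, and numbers are invented for illustration. One thing to keep in mind with this layout is that two videos with the same title would overwrite each other, since dictionary keys are unique.

```python
# Hypothetical rows; titles, links and numbers are invented for illustration.
rows = [
    ('Example video A', 'https://www.bilibili.com/video/BV_EXAMPLE_A', '1234567', '98.7万'),
    ('Example video B', 'https://www.bilibili.com/video/BV_EXAMPLE_B', '7654321', '45.6万'),
]

dic = {}
for name, href, score, play_volume in rows:
    # Key: video title. Value: a list whose positions carry meaning.
    dic[name] = [href, score, play_volume]

# Later steps append more items to the same list, so the positions matter:
# index 0 = link, index 1 = score, index 2 = play count.
for name, (href, score, play_volume) in dic.items():
    print(name, href, score, play_volume)
```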

4: While scraping, the session sometimes failed, so I wrap the request in a try. If the session works, the except branch is never reached; if it does not, I fall back to requests.get. Don't forget to include the cookie.

As I scrape, I put the video link and the audio link into a list, then append that list to the list created earlier.


# Get the video and audio links
print('Fetching video pages')
video_headers={
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'referer':leaderboard_url,
}
num=0
for i in dic.keys():
    video_url=dic[i][0]
    # Fetch the page; fall back to a plain request if the session fails
    try:
        response=session.get(url=video_url,headers=video_headers)
    except requests.RequestException:
        video_headers['cookie'] = 'YOUR_COOKIE'  # placeholder: your own cookie string
        response=requests.get(url=video_url,headers=video_headers)
    text = response.text
    # Get the cover link
    img_url=re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">',text).group(1)
    dic[i].append(img_url)              # append the cover link to the list
    data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
    data = json.loads(data)
    # print(data)

    try:
        duration = data['data']['dash']['duration']
        minute = int(duration) // 60
        second = int(duration) % 60
        # video link
        video_url = data['data']['dash']['video'][0]['baseUrl']
        # audio link
        audio_url = data['data']['dash']['audio'][0]['baseUrl']
        media_urls=[video_url,audio_url]
        dic[i].append(media_urls)
        print(video_url)
        print(audio_url)
        print('Video length: {} min {} s'.format(minute, second))
    except KeyError:
        # A few videos use a different format with a single file, so there is
        # nothing to merge. This is rare.
        duration = data['data']['timelength'] // 1000
        minute = int(duration) // 60
        second = int(duration) % 60
        video_url = data['data']['durl'][0]['url']
        media_urls = [video_url]
        dic[i].append(media_urls)
        print('Video length: {} min {} s'.format(minute, second))
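
The two response shapes handled above can be sketched in isolation like this. The sample dictionaries are trimmed to the fields used, the URLs are placeholders, and extract_media_urls and split_duration are hypothetical helper names. Checking for the dash key explicitly is an alternative to catching KeyError, not what the code above does.

```python
def split_duration(seconds):
    """Convert a duration in seconds to (minutes, seconds)."""
    return divmod(int(seconds), 60)


def extract_media_urls(playinfo):
    """Return (video_url, audio_url); audio_url is None for single-file responses."""
    data = playinfo['data']
    if 'dash' in data:
        dash = data['dash']
        return dash['video'][0]['baseUrl'], dash['audio'][0]['baseUrl']
    # Fallback shape: one complete file under "durl", duration in milliseconds.
    return data['durl'][0]['url'], None


# Trimmed samples with placeholder URLs.
dash_sample = {'data': {'dash': {
    'duration': 147,
    'video': [{'baseUrl': 'https://example.invalid/video.m4s'}],
    'audio': [{'baseUrl': 'https://example.invalid/audio.m4s'}],
}}}
durl_sample = {'data': {
    'timelength': 146787,
    'durl': [{'url': 'https://example.invalid/full.flv'}],
}}

print(extract_media_urls(dash_sample))
print(split_duration(dash_sample['data']['dash']['duration']))    # (2, 27)
print(extract_media_urls(durl_sample))
print(split_duration(durl_sample['data']['timelength'] // 1000))  # (2, 26)
```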

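5: Download the video and audio

'origin': 'https://www.bilibili.com',
'referer': 'https://www.bilibili.com/',

The requests for the media files all carried these two headers; once I added them, the download succeeded.


# Download the video and audio (this runs inside the for loop from step 4)
print('Downloading')
headers={
    'cookie': 'YOUR_COOKIE',  # placeholder: your own cookie string
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'origin': 'https://www.bilibili.com',
    'referer': 'https://www.bilibili.com/',
}

path=r'C:\Users\jyj34\Desktop\bilibili\{}'.format(num)
# The original calls a mkdir() helper that is not shown in this section;
# os.makedirs stands in for it here, and an existing folder is skipped.
created = not os.path.exists(path)
if created:
    os.makedirs(path)
    video_path=os.path.join(path,'_video.mp4')
    audio_path=os.path.join(path,'_audio.mp4')
    save_path=os.path.join(path,'{}.mp4'.format(num))
    info_path=os.path.join(path,'{}.text'.format(num))
    img_path=os.path.join(path,'{}.png'.format(num))
    num += 1
    print('{}: video download started'.format(i))

    with open(video_path, 'wb') as f:  # video part
        response = requests.get(dic[i][-1][0], headers=headers)
        print(response.status_code)
        f.write(response.content)
    print('{}: video download finished'.format(i))

    print('{}: audio download started'.format(i))
    with open(audio_path, 'wb') as f:  # audio part
        response = requests.get(dic[i][-1][-1], headers=headers)
        f.write(response.content)
    print('{}: audio download finished'.format(i))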
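
The code above reads each whole file into memory through response.content. For large videos, writing the data in chunks may be gentler on memory. The sketch below shows only the writing side with a stand-in list of byte chunks; save_chunks is a hypothetical helper, and with requests the chunks would typically come from a request made with stream=True and its iter_content method.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes objects to path and return the byte count."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# With requests this would typically be fed by something like:
#   requests.get(url, headers=headers, stream=True).iter_content(chunk_size=1 << 20)
# Here a plain list stands in for the network response.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
total = save_chunks([b'abc', b'', b'defg'], demo_path)
print(total)
```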

6: Download the cover and save the info:


# Download the cover
with open(img_path, 'wb') as f:
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
    # dic[i][3] is the cover link saved in step 4, for example
    # http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg
    response = requests.get(url=dic[i][3], headers=headers)
    f.write(response.content)
# Save the info
with open(info_path,'w',encoding='utf-8') as f:  # utf-8 so that any title can be written
    info=i+'\n'+dic[i][1]+'\n'+dic[i][2]
    f.write(info)

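7: Merge the video and audio

To merge the video, I had to run the editor as administrator (I use PyCharm), and the editor encoding had to be set to 'gbk' rather than 'utf-8'.

# Quote the paths so that folders containing spaces still work
cmd=r'ffmpeg -i "{}" -i "{}" -acodec copy -vcodec copy "{}"'.format(video_path,audio_path,save_path)
p = os.popen(cmd)
p.close()  # wait for ffmpeg to finish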
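
An alternative to building one command string is to pass ffmpeg its arguments as a list through subprocess, which sidesteps shell quoting. This is a sketch under the assumption that ffmpeg is on the PATH; build_merge_command and merge are hypothetical helper names, the demo paths are invented, and the flags are the same ones used above.

```python
import subprocess


def build_merge_command(video_path, audio_path, save_path):
    """Build the ffmpeg argument list that copies both streams into one file."""
    return ['ffmpeg', '-i', video_path, '-i', audio_path,
            '-acodec', 'copy', '-vcodec', 'copy', save_path]


def merge(video_path, audio_path, save_path):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_merge_command(video_path, audio_path, save_path), check=True)


# Only the command is built here; calling merge() would actually invoke ffmpeg.
cmd = build_merge_command(r'C:\demo dir\_video.mp4',
                          r'C:\demo dir\_audio.mp4',
                          r'C:\demo dir\0.mp4')
print(cmd)
```

Because each path is its own list element, spaces in folder names need no extra quoting.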

Full code:

import re
from random import randint
import requests
from lxml import etree
from time import sleep
import json
import os


def get_link_and_img():
    # Create a session
    print('Creating session')
    session = requests.Session()
    base_url = 'https://www.bilibili.com/'
    base_headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'cookie': 'YOUR_COOKIE',  # placeholder: paste your own cookie string here
        'referer': 'https://www.google.com/',
    }
    session.get(url=base_url, headers=base_headers)
    sleep(randint(3, 5))

    # Crawl the ranking page:
    print('Crawling the ranking page')
    dic = {}
    leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
    leaderboard_headers = {
        'referer': leaderboard_url,
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
    }
    response = session.get(url=leaderboard_url, headers=leaderboard_headers)
    sleep(randint(3, 5))
    content = response.content
    html = etree.HTML(content)
    info_list = html.xpath('//ul[@class="rank-list"]/li')
    for li in info_list:
        name = li.xpath('div[2]/div[2]/a/text()')[0]  # video title
        href = 'https:' + li.xpath('div[2]/div[2]/a/@href')[0]  # video link
        score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0] + ' overall score'  # overall score
        play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()  # play count
        entry = [href, score, play_volume]  # renamed from `list` to avoid shadowing the built-in
        dic[name] = entry
        # print(name,href,score,play_volume)
        # print(dic)

    # Crawl each video page (dedented so it runs once, after the whole ranking list is parsed)
    print('Crawling video pages')
    video_headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'referer': leaderboard_url,
    }
    num = 0
    for i in dic.keys():
        video_url = dic[i][0]
        # Get the cover URL
        try:
            response = session.get(url=video_url, headers=video_headers)
        except requests.RequestException:
            video_headers['cookie'] = 'YOUR_COOKIE'  # placeholder: paste your own cookie string here
            response = requests.get(url=video_url, headers=video_headers)
        text = response.text
        img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
        dic[i].append(img_url)  # append the cover URL to the list
        data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
        data = json.loads(data)
        # print(data)

        try:
            duration = data['data']['dash']['duration']
            minute = int(duration) // 60
            second = int(duration) % 60
            video_url = data['data']['dash']['video'][0]['baseUrl']
            audio_url = data['data']['dash']['audio'][0]['baseUrl']
            dic[i].append([video_url, audio_url])
            # print(video_url)
            # print(audio_url)
            # print('Duration: {} min {} s'.format(minute, second))
        except KeyError:
            # A few videos use a different format with a single stream, so nothing needs merging.
            duration = data['data']['timelength'] // 1000
            minute = int(duration) // 60
            second = int(duration) % 60
            video_url = data['data']['durl'][0]['url']
            dic[i].append([video_url])
            # print('Duration: {} min {} s'.format(minute, second))

        # Download the video and audio
        print('Downloading video and audio')
        headers = {
            'cookie': 'YOUR_COOKIE',  # placeholder: paste your own cookie string here
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
            'origin': 'https://www.bilibili.com',
            'referer': 'https://www.bilibili.com/',
        }

        path = r'C:\Users\jyj34\Desktop\bilibili\{}'.format(num)
        created = mkdir(path)  # 1 if the folder was just created, 0 if it already existed
        # print(created)
        # print(path)

        if created == 1:
            # os.path.join avoids the invalid '\_' and '\{' escape sequences
            video_path = os.path.join(path, '_video.mp4')
            audio_path = os.path.join(path, '_audio.mp4')
            save_path = os.path.join(path, '{}.mp4'.format(num))
            info_path = os.path.join(path, '{}.text'.format(num))
            img_path = os.path.join(path, '{}.png'.format(num))
            print('{}: video download started'.format(i))

            with open(video_path, 'wb') as f:  # video part
                response = requests.get(dic[i][-1][0], headers=headers)
                print(response.status_code)
                f.write(response.content)
            print('{}: video download finished'.format(i))

            print('{}: audio download started'.format(i))
            with open(audio_path, 'wb') as f:  # audio part
                # For single-stream videos, [-1][-1] is the same URL as [-1][0].
                response = requests.get(dic[i][-1][-1], headers=headers)
                f.write(response.content)
            print('{}: audio download finished'.format(i))

            # Download the cover
            with open(img_path, 'wb') as f:
                headers = {
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
                }
                response = requests.get(url=dic[i][3], headers=headers)
                f.write(response.content)

            # Save the info (title, overall score, play count)
            with open(info_path, 'w', encoding='utf-8') as f:
                info = i + '\n' + dic[i][1] + '\n' + dic[i][2]
                f.write(info)

            # Merge the audio and video
            composite(video_path, audio_path, save_path)
            sleep(randint(5, 8))

        else:
            print('{} has already been crawled'.format(i))
        num = num + 1


def mkdir(path):
    folder = os.path.exists(path)
    if not folder:                      # create the folder if it does not exist yet
        os.makedirs(path)
        return 1
    else:
        return 0


def composite(video_path, audio_path, save_path):
    # Quote the paths so that folders containing spaces still work
    cmd = r'ffmpeg -i "{}" -i "{}" -acodec copy -vcodec copy "{}"'.format(video_path, audio_path, save_path)
    p = os.popen(cmd)
    # print(p.read())


get_link_and_img()

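The steps for downloading the video, audio, and cover, as well as merging the video and audio, could each be moved into their own function with def, which would make the code tidier and easier to read.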
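
As one small step in that direction, the path construction could live in its own helper. This is a sketch only: build_paths is a hypothetical name that does not appear in the code above, and the file names simply mirror the ones the script already uses.

```python
import os


def build_paths(root, num):
    """Return the folder and file paths used for ranking entry number `num`."""
    folder = os.path.join(root, str(num))
    return {
        'folder': folder,
        'video': os.path.join(folder, '_video.mp4'),   # raw video stream
        'audio': os.path.join(folder, '_audio.mp4'),   # raw audio stream
        'merged': os.path.join(folder, '{}.mp4'.format(num)),  # ffmpeg output
        'info': os.path.join(folder, '{}.text'.format(num)),   # title, score, play count
        'cover': os.path.join(folder, '{}.png'.format(num)),   # cover image
    }
```

The download, cover, and merge steps could then each take the relevant entry of this dictionary as an argument instead of rebuilding paths inline.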

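For reference, here is how the dictionary is laid out: name: [href, score, play_volume, img_url, [video_url, audio_url]]. The key is the video title, and img_url is the cover link that the code reads back as dic[i][3].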
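
Since the full code reads the cover URL from dic[i][3] and the media URLs from dic[i][-1], each stored list effectively has five positions. A named tuple could make those positions self-describing. This is only a sketch: RankEntry and to_entry are hypothetical names, not part of the original script.

```python
from typing import List, NamedTuple


class RankEntry(NamedTuple):
    href: str               # video page link, dic[name][0]
    score: str              # overall score text, dic[name][1]
    play_volume: str        # play count text, dic[name][2]
    img_url: str            # cover link, dic[name][3]
    media_urls: List[str]   # [video_url, audio_url], or [video_url] for single-stream videos


def to_entry(values):
    """Convert one list stored in `dic` into a RankEntry."""
    return RankEntry(*values)
```

A named tuple still supports positional indexing, so existing expressions such as entry[3] keep working alongside entry.img_url.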

You may also notice the sleep calls in the code. Why? Because we play fair: pausing between requests avoids hammering the server.
That's all for this crawler walkthrough. If anything is unclear, you can reach me through my WeChat official account.
