3 - Access Logic (Heart-to-Heart): Analysis
1. Request the page and capture the traffic
https://match.yuanrenxue.cn/match/3
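2. Analyze the captured traffic
The capture shows that every data request is preceded by a request to https://match.yuanrenxue.cn/jssm.
This endpoint sets a sessionid cookie,
and the following data request carries that sessionid in its Cookie header, so sessionid is the key parameter.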
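Because the data request must carry the sessionid that /jssm just returned, it can help to splice the fresh value into the captured Cookie string instead of hard-coding it. This is a minimal stdlib sketch; with_sessionid is a hypothetical helper, not part of the original script, and it assumes the header uses the usual "name=value; name=value" form:

```python
def with_sessionid(cookie_header: str, sessionid: str) -> str:
    """Hypothetical helper: return cookie_header with sessionid set to the given value.

    Existing cookies keep their position; if there is no sessionid yet,
    it is appended at the end.
    """
    pairs = []
    found = False
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split only on the first "=", so values containing commas or "=" survive
        name, _, value = part.partition("=")
        if name == "sessionid":
            value = sessionid
            found = True
        pairs.append(f"{name}={value}")
    if not found:
        pairs.append(f"sessionid={sessionid}")
    return "; ".join(pairs)


captured = "Hm_lvt_x=1682492258,1682492748; sessionid=old; Hm_lpvt_x=1682575959"
print(with_sessionid(captured, "nunhqjqjhq6anqpfxdbq9bu9iubqw5gt"))
```

Whether the server also cares about cookie order is not known from the capture; keeping the existing position is simply the conservative choice.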
3. Request the endpoint to obtain the sessionid
https://match.yuanrenxue.cn/jssm
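I turned the captured request into Python code, but the response came back with no cookies.
Suspecting the headers, I tried many combinations of them, but the cookies were still empty.
A search turned up another possibility: the server checks the order of the request headers. Passing the captured headers through the headers argument does not reproduce that order. Python dicts keep insertion order (since 3.7) and requests does not sort headers, but every request goes through a Session whose default headers (User-Agent, Accept-Encoding, Accept, Connection) are merged in first, so those names keep their default positions at the front.
The header order therefore has to be pinned to the captured order.
I used the following approach: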
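```python
import requests


def get_sessionid(url, headers):
    # url and headers are the ones taken from the capture
    session = requests.session()
    # Drop the Session's default headers so that only the captured headers
    # are sent, in exactly the order of the headers dict
    session.headers.clear()
    session.headers.update(headers)
    resp = session.post(url)
    cookie = requests.utils.dict_from_cookiejar(resp.cookies)
    print(cookie)
    return cookie
```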
After this change, the sessionid is returned in the cookies:
{'sessionid': 'nunhqjqjhq6anqpfxdbq9bu9iubqw5gt'}
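Why clearing the defaults matters can be checked offline: Session.prepare_request builds the request essentially as requests hands it to urllib3, without sending anything. The sketch below uses a shortened, illustrative header dict rather than the full capture; when a request is actually sent, urllib3/http.client may still add a header the dict lacks (for example Accept-Encoding).

```python
import requests

# Illustrative subset of the captured headers, in browser order
captured = {
    "Host": "match.yuanrenxue.cn",
    "Connection": "keep-alive",
    "Content-Length": "0",
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
}


def prepared_order(session, **kwargs):
    """Build (but do not send) a POST to /jssm and return header names in order."""
    req = requests.Request("POST", "https://match.yuanrenxue.cn/jssm", **kwargs)
    return list(session.prepare_request(req).headers)


# 1) Headers passed per request: the Session's defaults are merged in first,
#    so User-Agent, Accept-Encoding, Accept, Connection stay at the front
s1 = requests.Session()
order1 = prepared_order(s1, headers=captured)
print(order1)

# 2) Session defaults cleared first: only our headers, in our order
s2 = requests.Session()
s2.headers.clear()
s2.headers.update(captured)
order2 = prepared_order(s2)
print(order2)
```

requests' own documentation also describes setting headers on the Session for cases where order matters, which is the same idea as the approach above.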
4. Final result
Python code:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2023/4/27 14:14
# @Author : QYF
# @File : 3.py
from collections import Counter

import requests

session = requests.session()


def get_sessionid():
    url = "https://match.yuanrenxue.cn/jssm"
    headers = {
        "Host": "match.yuanrenxue.cn",
        "Connection": "keep-alive",
        "Content-Length": "0",
        "sec-ch-ua": "\"Chromium\";v=\"112\", \"Google Chrome\";v=\"112\", \"Not:A-Brand\";v=\"99\"",
        "sec-ch-ua-mobile": "?0",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
        "sec-ch-ua-platform": "\"Windows\"",
        "Accept": "*/*",
        "Origin": "https://match.yuanrenxue.cn",
        "Sec-Fetch-Site": "same-origin",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Dest": "empty",
        "Referer": "https://match.yuanrenxue.cn/match/3",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cookie": "Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1682492258,1682492748,1682496325; Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1682496325,1682575933; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1682575951; sessionid=nunhqjqjhq6anqpfxdbq9bu9iubqw5gt; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1682575959"
    }
    # Clear the Session's default headers so that only these headers are sent,
    # in exactly this order
    session.headers.clear()
    session.headers.update(headers)
    resp = session.post(url)
    cookie = requests.utils.dict_from_cookiejar(resp.cookies)
    print(cookie)
    return cookie


num_list = []
for page in range(1, 6):
    url = 'https://match.yuanrenxue.cn/api/match/3?page=' + str(page)
    print(url)
    # The browser calls /jssm before every data request, so do the same
    cookie = get_sessionid()
    headers = {
        "Host": "match.yuanrenxue.cn",
        "Connection": "keep-alive",
        "sec-ch-ua": "\"Chromium\";v=\"112\", \"Google Chrome\";v=\"112\", \"Not:A-Brand\";v=\"99\"",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest",
        "sec-ch-ua-mobile": "?0",
        "User-Agent": "yuanrenxue.project",
        "sec-ch-ua-platform": "\"Windows\"",
        "Sec-Fetch-Site": "same-origin",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Dest": "empty",
        "Referer": "https://match.yuanrenxue.cn/match/3",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cookie": "Hm_lvt_c99546cf032aaa5a679230de9a95c7db=1682492258,1682492748,1682496325; Hm_lvt_9bcbda9cbf86757998a2339a0437208e=1682496325,1682575933; Hm_lpvt_9bcbda9cbf86757998a2339a0437208e=1682575951; Hm_lpvt_c99546cf032aaa5a679230de9a95c7db=1682575959; sessionid={}".format(cookie['sessionid'])
    }
    # Same trick for the data request: replace all session headers, keep this order
    session.headers.clear()
    session.headers.update(headers)
    resp = session.get(url)
    data = resp.json()
    print(data)
    for num in data['data']:
        num_list.append(num['value'])

print(num_list)
# Count each value, then sort by frequency, most frequent first
c = Counter(num_list)
print(dict(c))
c1 = sorted(c.items(), key=lambda x: x[1], reverse=True)
print(c1)
```
Output
https://match.yuanrenxue.cn/api/match/3?page=1
{'sessionid': 'nunhqjqjhq6anqpfxdbq9bu9iubqw5gt'}
{'status': '1', 'state': 'success', 'data': [{'value': 2838}, {'value': 7609}, {'value': 8717}, {'value': 6923}, {'value': 5325}, {'value': 4118}, {'value': 8884}, {'value': 8717}, {'value': 2680}, {'value': 3721}]}
https://match.yuanrenxue.cn/api/match/3?page=2
{'sessionid': 'nunhqjqjhq6anqpfxdbq9bu9iubqw5gt'}
{'status': '1', 'state': 'success', 'data': [{'value': 8490}, {'value': 3148}, {'value': 6025}, {'value': 8526}, {'value': 8529}, {'value': 6481}, {'value': 9489}, {'value': 6599}, {'value': 5500}, {'value': 8717}]}
https://match.yuanrenxue.cn/api/match/3?page=3
{'sessionid': 'nunhqjqjhq6anqpfxdbq9bu9iubqw5gt'}
{'status': '1', 'state': 'success', 'data': [{'value': 185}, {'value': 8498}, {'value': 6102}, {'value': 9222}, {'value': 8717}, {'value': 2008}, {'value': 9827}, {'value': 8717}, {'value': 8224}, {'value': 2929}]}
https://match.yuanrenxue.cn/api/match/3?page=4
{'sessionid': 'nunhqjqjhq6anqpfxdbq9bu9iubqw5gt'}
{'status': '1', 'state': 'success', 'data': [{'value': 3762}, {'value': 567}, {'value': 672}, {'value': 8717}, {'value': 9524}, {'value': 7159}, {'value': 986}, {'value': 505}, {'value': 6535}, {'value': 9491}]}
https://match.yuanrenxue.cn/api/match/3?page=5
{'sessionid': 'nunhqjqjhq6anqpfxdbq9bu9iubqw5gt'}
{'status': '1', 'state': 'success', 'data': [{'value': 3612}, {'value': 9095}, {'value': 7357}, {'value': 9307}, {'value': 5650}, {'value': 2109}, {'value': 23}, {'value': 8717}, {'value': 2110}, {'value': 2792}]}
[2838, 7609, 8717, 6923, 5325, 4118, 8884, 8717, 2680, 3721, 8490, 3148, 6025, 8526, 8529, 6481, 9489, 6599, 5500, 8717, 185, 8498, 6102, 9222, 8717, 2008, 9827, 8717, 8224, 2929, 3762, 567, 672, 8717, 9524, 7159, 986, 505, 6535, 9491, 3612, 9095, 7357, 9307, 5650, 2109, 23, 8717, 2110, 2792]
{2838: 1, 7609: 1, 8717: 7, 6923: 1, 5325: 1, 4118: 1, 8884: 1, 2680: 1, 3721: 1, 8490: 1, 3148: 1, 6025: 1, 8526: 1, 8529: 1, 6481: 1, 9489: 1, 6599: 1, 5500: 1, 185: 1, 8498: 1, 6102: 1, 9222: 1, 2008: 1, 9827: 1, 8224: 1, 2929: 1, 3762: 1, 567: 1, 672: 1, 9524: 1, 7159: 1, 986: 1, 505: 1, 6535: 1, 9491: 1, 3612: 1, 9095: 1, 7357: 1, 9307: 1, 5650: 1, 2109: 1, 23: 1, 2110: 1, 2792: 1}
[(8717, 7), (2838, 1), (7609, 1), (6923, 1), (5325, 1), (4118, 1), (8884, 1), (2680, 1), (3721, 1), (8490, 1), (3148, 1), (6025, 1), (8526, 1), (8529, 1), (6481, 1), (9489, 1), (6599, 1), (5500, 1), (185, 1), (8498, 1), (6102, 1), (9222, 1), (2008, 1), (9827, 1), (8224, 1), (2929, 1), (3762, 1), (567, 1), (672, 1), (9524, 1), (7159, 1), (986, 1), (505, 1), (6535, 1), (9491, 1), (3612, 1), (9095, 1), (7357, 1), (9307, 1), (5650, 1), (2109, 1), (23, 1), (2110, 1), (2792, 1)]
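The sorted() call in the script can be replaced by Counter.most_common, which already returns (value, count) pairs ordered by count. A minimal sketch reusing the 50 values printed above (pages 1 to 5, ten per page):

```python
from collections import Counter

# The values collected from pages 1-5 in the output above
values = [2838, 7609, 8717, 6923, 5325, 4118, 8884, 8717, 2680, 3721,
          8490, 3148, 6025, 8526, 8529, 6481, 9489, 6599, 5500, 8717,
          185, 8498, 6102, 9222, 8717, 2008, 9827, 8717, 8224, 2929,
          3762, 567, 672, 8717, 9524, 7159, 986, 505, 6535, 9491,
          3612, 9095, 7357, 9307, 5650, 2109, 23, 8717, 2110, 2792]

counts = Counter(values)
# most_common(1) returns a one-element list with the highest-count pair
print(counts.most_common(1))
```

This prints [(8717, 7)], matching the first entry of the sorted list: 8717 is the most frequent value across the five pages, appearing 7 times.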