I'm trying to scrape dynamic content from this URL: https://www.prokabaddi.com/stats/0-102-total-points-statistics. I tried Selenium and BeautifulSoup, but both gave me an empty list.
My code is:
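from bs4 import BeautifulSoup
from selenium import webdriver
url = "https://www.prokabaddi.com/stats/0-102-total-points-statistics"
# create a new Chrome session
driver = webdriver.Chrome()
driver.get(url)
# parse the page source that Selenium loaded
soup = BeautifulSoup(driver.page_source, "html.parser")
soup.find_all("div", class_="sipk-lb-playerName")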
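This returns an empty list. When I inspect the page in the browser console, the data is there, but neither the data nor the div tags appear in the page source. I believe this is because the content is rendered by JavaScript.
How can I extract the player names and points from this URL?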
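Open the browser's developer tools and look at the XHR requests. You'll see the URL the page pulls its data from directly. It comes back as JSON, but you can convert it to a table: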
Code:
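import pandas as pd
import requests
# JSON feed found under XHR in the browser's developer tools
url = 'https://www.prokabaddi.com/sifeeds/kabaddi/static/json/1_0_102_stats.json'
jsonData = requests.get(url).json()
# flatten the list of player records under 'data' into a DataFrame
table = pd.json_normalize(jsonData['data'])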
Output:
print(table.head(5).to_string())
   match_played  player_id         player_name  position_id  position_name  rank  team        team_full_name  team_id  team_name  value
0           101        197      Pardeep Narwal          8.0         Raider     1   PAT         Patna Pirates        6        PAT   1055
1           116         81     Rahul Chaudhari          8.0         Raider     2    TT       Tamil Thalaivas       29         TT    987
2           118         41  Deepak Niwas Hooda          1.0    All Rounder     3   JAI  Jaipur Pink Panthers        3        JAI    892
3           115         26         Ajay Thakur          8.0         Raider     4    TT       Tamil Thalaivas       29         TT    811
4            88        326         Rohit Kumar          8.0         Raider     5   BEN       Bengaluru Bulls        1        BEN    689
And filter to get only the names and points:
print(table[['player_name', 'value']])
player_name value
0 Pardeep Narwal 1055
1 Rahul Chaudhari 987
2 Deepak Niwas Hooda 892
3 Ajay Thakur 811
4 Rohit Kumar 689
5 Maninder Singh 673
6 Rishank Devadiga 619
7 Kashiling Adake 612
8 Anup Kumar 596
9 Pawan Kumar Sehrawat 572
10 Manjeet Chhillar 562
11 Sandeep Narwal 533
12 Monu Goyat 475
13 Jang Kun Lee 462
14 Sachin Tanwar 456
15 Nitin Tomar 445
16 Jasvir Singh 412
17 Rajesh Narwal 397
18 Sukesh Hegde 395
19 Meraj Sheykh 393
20 Naveen Kumar 364
21 Vikash Kandola 358
22 Prashanth Kumar Rai 358
23 K. Prapanjan 357
24 Shrikant Jadhav 342
25 Siddharth Sirish Desai 337
26 Ran Singh 319
27 Ravinder Pahal 317
28 Deepak Narwal 306
29 Wazir Singh 300
.. ... ...
359 Rohit Kumar Prajapat 1
360 Kazuhiro Takano 1
361 Inderpal Bishnoi 1
362 Amit Kumar 1
363 Sunil Subhash Lande 1
364 Atif Waheed 1
365 Nithesh B R 1
366 Mohammad Taghi Paein Mahali 1
367 Yong Joo Ok 1
368 Vishnu Uthaman 1
369 Ajvender Singh 1
370 Sanju 1
371 Ravinandan G.M. 1
372 Navjot Singh 1
373 Parvesh Attri 1
374 Hardeep Duhan 1
375 Parveen Narwal 1
376 Ajay Singh 1
377 Nitin Kumar 1
378 Jishnu 1
379 Naveen Narwal 1
380 M. Abishek 1
381 Vikas Chhillar 1
382 Aman 1
383 Satywan 1
384 Vikram Kandola 1
385 Emad Sedaghatnia 1
386 Aashish Nagar 1
387 Ajinkya Rohidas Kapre 1
388 Munish 1
[389 rows x 2 columns]
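If you don't need a DataFrame, the same idea can be sketched with only the standard library. This is a rough sketch, not a definitive implementation: the helper names `fetch_stats` and `names_and_points` are made up for illustration, the browser-like User-Agent header is an assumption (some sites reject the default one), and the sample payload is hand-written to mirror the first two rows of the output above rather than fetched from the feed.

```python
import json
from urllib.request import Request, urlopen

STATS_URL = "https://www.prokabaddi.com/sifeeds/kabaddi/static/json/1_0_102_stats.json"


def fetch_stats(url=STATS_URL, timeout=10):
    """Download the JSON feed and return the parsed payload (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; the site may not require it.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def names_and_points(payload):
    """Return (player_name, value) pairs from the payload's 'data' list."""
    return [(row["player_name"], row["value"]) for row in payload.get("data", [])]


# Hand-written sample shaped like the first two rows shown above,
# so the extraction step can be tried without network access.
sample = {
    "data": [
        {"player_name": "Pardeep Narwal", "value": 1055},
        {"player_name": "Rahul Chaudhari", "value": 987},
    ]
}
pairs = names_and_points(sample)
```

Against the live feed you would call `names_and_points(fetch_stats())` instead. Whether `value` arrives as a number or a string depends on the feed, so convert it with `int(...)` before sorting or summing if needed.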