Exploring NBA Data

We start by installing the Goldsberry package. The project source is at:

https://github.com/bradleyfay/py-Goldsberry

Install it with pip:

pip install py-goldsberry 

The package's interface is compatible with pandas, so its results can be used together with pandas DataFrame objects.

In [1]:

import goldsberry as gb
import pandas as pd

The version used here is:

In [2]:

gb.__version__

Out[2]:

'0.8.0.1'

Player Information

Get the list of players for the 2015-2016 season:

In [3]:

players = gb.PlayerList().players()
players = pd.DataFrame(players)

players.head()

Out[3]:

   DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE     ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY      TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
0  Acy, Quincy               2012       Y                  203112     quincy_acy     1             SAC                Sacramento     kings      1610612758  Kings      2015
1  Adams, Jordan             2014       Y                  203919     jordan_adams   1             MEM                Memphis        grizzlies  1610612763  Grizzlies  2015
2  Adams, Steven             2013       Y                  203500     steven_adams   1             OKC                Oklahoma City  thunder    1610612760  Thunder    2015
3  Afflalo, Arron            2007       Y                  201167     arron_afflalo  1             NYK                New York       knicks     1610612752  Knicks     2015
4  Ajinca, Alexis            2008       Y                  201582     alexis_ajinca  1             NOP                New Orleans    pelicans   1610612740  Pelicans   2015

The total number of players is:

In [4]:

print(len(players))
464

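By filtering on a specific TEAM_ABBREVIATION, we can list a team's players for this season, for example the Golden State Warriors (GSW), champions of the 2014-2015 season:

In [5]:

gsw_players = players.loc[players["TEAM_ABBREVIATION"] == "GSW"]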

gsw_players[["DISPLAY_LAST_COMMA_FIRST", "FROM_YEAR", "TEAM_ABBREVIATION", "TEAM_CITY", "TEAM_NAME", "PERSON_ID"]]

Out[5]:

     DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  TEAM_ABBREVIATION  TEAM_CITY     TEAM_NAME  PERSON_ID
30   Barbosa, Leandro          2003       GSW                Golden State  Warriors   2571
33   Barnes, Harrison          2012       GSW                Golden State  Warriors   203084
52   Bogut, Andrew             2005       GSW                Golden State  Warriors   101106
86   Clark, Ian                2013       GSW                Golden State  Warriors   203546
103  Curry, Stephen            2009       GSW                Golden State  Warriors   201939
135  Ezeli, Festus             2012       GSW                Golden State  Warriors   203105
164  Green, Draymond           2012       GSW                Golden State  Warriors   203110
209  Iguodala, Andre           2004       GSW                Golden State  Warriors   2738
262  Livingston, Shaun         2004       GSW                Golden State  Warriors   2733
263  Looney, Kevon             2015       GSW                Golden State  Warriors   1626172
279  McAdoo, James Michael     2014       GSW                Golden State  Warriors   203949
377  Rush, Brandon             2008       GSW                Golden State  Warriors   201575
398  Speights, Marreese        2008       GSW                Golden State  Warriors   201578
414  Thompson, Jason           2008       GSW                Golden State  Warriors   201574
415  Thompson, Klay            2011       GSW                Golden State  Warriors   202691
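
The team filter in In [5] works by building a boolean mask and passing it to the indexer. A minimal sketch of that idea on three hand-copied rows (the small `sample_players` frame is a stand-in for illustration, not the real API result):

```python
import pandas as pd

# Three rows copied from the player list above, with the columns trimmed.
sample_players = pd.DataFrame({
    "DISPLAY_LAST_COMMA_FIRST": ["Acy, Quincy", "Curry, Stephen", "Thompson, Klay"],
    "TEAM_ABBREVIATION": ["SAC", "GSW", "GSW"],
    "PERSON_ID": [203112, 201939, 202691],
})

# The comparison yields a boolean Series; .loc keeps the rows where it is True
# and, with a second argument, only the requested columns.
is_gsw = sample_players["TEAM_ABBREVIATION"] == "GSW"
gsw_sample = sample_players.loc[is_gsw, ["DISPLAY_LAST_COMMA_FIRST", "PERSON_ID"]]
print(gsw_sample)
```

The filtered frame keeps the original row labels, which is why the GSW table above shows labels such as 103 and 415 rather than 0, 1, 2.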

Player Game Data

Using DISPLAY_LAST_COMMA_FIRST, we look up Kobe Bryant, who has announced that he will retire after this season:

In [6]:

kobe = players.loc[players["DISPLAY_LAST_COMMA_FIRST"].str.contains("Kobe")]

kobe

Out[6]:

    DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE     ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY      TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
64  Bryant, Kobe              1996       Y                  977        kobe_bryant    1             LAL                Los Angeles    lakers     1610612747  Lakers     2015

For convenience, we store Kobe's ID in a variable:

In [7]:

kobe_id = 977
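
Instead of typing the ID by hand, it can be read from the filtered frame. The sketch below uses a two-row stand-in for the player list; `find_player_id` is a hypothetical helper written for this example and is not part of goldsberry:

```python
import pandas as pd

# Two rows copied from the outputs above.
sample_players = pd.DataFrame({
    "DISPLAY_LAST_COMMA_FIRST": ["Bryant, Kobe", "Curry, Stephen"],
    "PERSON_ID": [977, 201939],
})


def find_player_id(players, name_fragment):
    """Return the PERSON_ID of the single player whose name contains name_fragment."""
    mask = players["DISPLAY_LAST_COMMA_FIRST"].str.contains(name_fragment, regex=False)
    matches = players.loc[mask, "PERSON_ID"]
    if len(matches) != 1:
        # Refuse to guess when the fragment is ambiguous or matches nothing.
        raise ValueError("expected exactly one match, found {}".format(len(matches)))
    return int(matches.iloc[0])


print(find_player_id(sample_players, "Kobe"))
```

Raising on zero or multiple matches avoids silently picking the wrong player when a fragment such as "Thompson" matches more than one row.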

Let's look at Kobe's game logs for this season:

In [8]:

kobe_logs = gb.player.game_logs(kobe_id)

kobe_logs = pd.DataFrame(kobe_logs.logs())

# The five most recent games
kobe_logs.head()

Out[8]:

   AST  BLK  DREB  FG3A  FG3M  FG3_PCT  FGA  FGM  FG_PCT  FTA  ...  PF  PLUS_MINUS  PTS  Player_ID  REB  SEASON_ID  STL  TOV  VIDEO_AVAILABLE  WL
0  3    0    6     7     3     0.429    16   5    0.313   4    ...  2   -19         17   977        6    22015      1    3    1                L
1  0    0    4     14    4     0.286    25   6    0.240   4    ...  0   -6          19   977        5    22015      0    0    1                L
2  4    1    1     14    4     0.286    28   9    0.321   3    ...  4   -2          25   977        2    22015      0    2    1                L
3  2    0    9     11    4     0.364    24   10   0.417   4    ...  0   16          27   977        12   22015      2    1    1                W
4  5    0    3     11    7     0.636    21   10   0.476   12   ...  3   6           38   977        5    22015      2    2    1                W

5 rows × 27 columns

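As of the All-Star break, Kobe has played 45 games this season (the Game_ID listing below runs from index 0 to 44). His per-game averages are computed in In [10]: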

In [9]:

kobe_logs.Game_ID

Out[9]:

0     0021500795
1     0021500776
2     0021500767
3     0021500747
4     0021500734
5     0021500720
6     0021500697
7     0021500662
8     0021500653
9     0021500638
10    0021500614
11    0021500608
12    0021500592
13    0021500576
14    0021500549
15    0021500539
16    0021500476
17    0021500458
18    0021500455
19    0021500440
20    0021500435
21    0021500422
22    0021500385
23    0021500370
24    0021500349
25    0021500342
26    0021500325
27    0021500308
28    0021500301
29    0021500286
30    0021500269
31    0021500263
32    0021500253
33    0021500244
34    0021500214
35    0021500201
36    0021500188
37    0021500151
38    0021500135
39    0021500095
40    0021500077
41    0021500059
42    0021500045
43    0021500031
44    0021500017
Name: Game_ID, dtype: object
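
The `dtype: object` in this output indicates that the IDs are stored as strings, which is what preserves their leading zeros. A small sketch on the first three IDs from the listing shows how to count games and what happens on a round trip through integers:

```python
import pandas as pd

# The first three Game_ID values from the output above.
game_ids = pd.Series(["0021500795", "0021500776", "0021500767"], name="Game_ID")

# One row per game, so the length of the Series is the number of games.
print(len(game_ids))

# Converting to int drops the leading zeros; zfill(10) restores the 10-character form.
as_int = game_ids.astype(int)
restored = as_int.astype(str).str.zfill(10)
print(list(restored))
```

Applied to the full listing, `len` counts every entry from index 0 through 44.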

In [10]:

def show_avg_info(avg):
    # avg is a Series of per-game means indexed by stat name
    print("Points: {:.1f}".format(avg["PTS"]))
    print("Rebounds: {:.1f}".format(avg["REB"]))
    print("Assists: {:.1f}".format(avg["AST"]))
    print("Blocks: {:.1f}".format(avg["BLK"]))
    print("Minutes: {:.1f}".format(avg["MIN"]))
    print("Steals: {:.1f}".format(avg["STL"]))
    print("Turnovers: {:.1f}".format(avg["TOV"]))
    print("Fouls: {:.1f}".format(avg["PF"]))
    # Percentages are computed from made/attempted averages
    print("Field goals: {:.1f}%".format(avg["FGM"] * 100 / avg["FGA"]))
    print("Three-pointers: {:.1f}%".format(avg["FG3M"] * 100 / avg["FG3A"]))
    print("Free throws: {:.1f}%".format(avg["FTM"] * 100 / avg["FTA"]))
    print("Defensive rebounds: {:.1f}".format(avg["DREB"]))
    print("Offensive rebounds: {:.1f}".format(avg["OREB"]))
    print("Plus/minus: {:.1f}".format(avg["PLUS_MINUS"]))

show_avg_info(kobe_logs.mean())
Points: 16.9
Rebounds: 4.2
Assists: 3.4
Blocks: 0.2
Minutes: 29.3
Steals: 1.0
Turnovers: 2.2
Fouls: 1.9
Field goals: 34.9%
Three-pointers: 28.0%
Free throws: 80.3%
Defensive rebounds: 3.5
Offensive rebounds: 0.7
Plus/minus: -7.9
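
Note that `show_avg_info` derives the shooting percentages from the averaged makes and attempts rather than by averaging the per-game FG_PCT column. The sketch below, using the FGM, FGA and FG_PCT values of the five games shown in Out[8], illustrates why that choice matters: the ratio of means equals the ratio of totals, while the mean of per-game percentages weights every game equally regardless of attempts.

```python
import pandas as pd

# FGM, FGA and FG_PCT for the five most recent games in Out[8].
recent = pd.DataFrame({
    "FGM": [5, 6, 9, 10, 10],
    "FGA": [16, 25, 28, 24, 21],
    "FG_PCT": [0.313, 0.240, 0.321, 0.417, 0.476],
})

avg = recent.mean()
pct_from_means = avg["FGM"] * 100 / avg["FGA"]  # the approach used by show_avg_info
pct_from_totals = recent["FGM"].sum() * 100 / recent["FGA"].sum()
mean_of_game_pcts = recent["FG_PCT"].mean() * 100

print("{:.1f} {:.1f} {:.1f}".format(pct_from_means, pct_from_totals, mean_of_game_pcts))
# prints "35.1 35.1 35.3"
```

Over these five games the difference is small, but it grows when attempts vary a lot from game to game.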

Now let's look at Stephen Curry's per-game averages (don't ask me why I watch his games on my knees):

In [11]:

curry_id = 201939
curry_logs = gb.player.game_logs(curry_id)
curry_logs = pd.DataFrame(curry_logs.logs())

show_avg_info(curry_logs.mean())
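Points: 29.8
Rebounds: 5.3
Assists: 6.6
Blocks: 0.2
Minutes: 33.9
Steals: 2.1
Turnovers: 3.3
Fouls: 2.0
Field goals: 50.8%
Three-pointers: 45.4%
Free throws: 91.2%
Defensive rebounds: 4.5
Offensive rebounds: 0.9
Plus/minus: 15.5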
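
The two printed summaries are easier to compare side by side. A minimal sketch that places a few of the averages into one DataFrame; the numbers are copied by hand from the outputs above, whereas with live data the two Series returned by `.mean()` could presumably be passed in directly:

```python
import pandas as pd

# Per-game averages copied from the two printed summaries above.
kobe_avg = pd.Series({"PTS": 16.9, "REB": 4.2, "AST": 3.4, "STL": 1.0, "TOV": 2.2})
curry_avg = pd.Series({"PTS": 29.8, "REB": 5.3, "AST": 6.6, "STL": 2.1, "TOV": 3.3})

# Each Series becomes a column; rows are aligned on the stat name.
comparison = pd.DataFrame({"Kobe": kobe_avg, "Curry": curry_avg})
comparison["Curry - Kobe"] = (comparison["Curry"] - comparison["Kobe"]).round(1)
print(comparison)
```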

We can also compare their career statistics:

In [12]:

kobe_career = gb.player.career_stats(kobe_id)
curry_career = gb.player.career_stats(curry_id)

Career highs:

In [13]:

def show_career_high(career):
    # Keep rows 0, 1 and 5 of the career-high table
    career_high = pd.DataFrame(career.career_high()).loc[[0, 1, 5]]
    print(career_high[["GAME_DATE", "STAT", "STAT_VALUE", "VS_TEAM_CITY", "VS_TEAM_NAME"]])

print("Kobe")
show_career_high(kobe_career)

print("Curry")
show_career_high(curry_career)
Kobe
     GAME_DATE STAT  STAT_VALUE VS_TEAM_CITY VS_TEAM_NAME
0  JAN 22 2006  PTS          81      Toronto      Raptors
1  JAN 24 2010  REB          16      Toronto      Raptors
5  JAN 15 2015  AST          17    Cleveland    Cavaliers
Curry
     GAME_DATE STAT  STAT_VALUE VS_TEAM_CITY VS_TEAM_NAME
0  FEB 27 2013  PTS          54     New York       Knicks
1  DEC 28 2015  REB          14   Sacramento        Kings
5  DEC 27 2013  AST          16      Phoenix         Suns

Season highs:

In [14]:

def show_season_high(career):
    # Keep rows 0, 1 and 5 of the season-high table
    season_high = pd.DataFrame(career.season_high()).loc[[0, 1, 5]]
    print(season_high[["GAME_DATE", "STAT", "STAT_VALUE", "VS_TEAM_CITY", "VS_TEAM_NAME"]])

print("Kobe")
show_season_high(kobe_career)

print("Curry")
show_season_high(curry_career)
Kobe
     GAME_DATE STAT  STAT_VALUE VS_TEAM_CITY  VS_TEAM_NAME
0  FEB 02 2016  PTS          38    Minnesota  Timberwolves
1  FEB 04 2016  REB          12  New Orleans      Pelicans
5  NOV 15 2015  AST           9      Detroit       Pistons
Curry
     GAME_DATE STAT  STAT_VALUE VS_TEAM_CITY VS_TEAM_NAME
0  OCT 31 2015  PTS          53  New Orleans     Pelicans
1  DEC 28 2015  REB          14   Sacramento        Kings
5  JAN 25 2016  STL           5  San Antonio        Spurs
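
In the output above, row 5 of Curry's season highs is a STL record rather than AST, so selecting rows by position does not always return the same three statistics. One alternative is to filter on the STAT label. A sketch using the three Curry rows shown above; `pick_stats` is a hypothetical helper for this example:

```python
import pandas as pd

# Curry's season-high rows 0, 1 and 5, copied from the output above.
curry_season_high = pd.DataFrame(
    {
        "GAME_DATE": ["OCT 31 2015", "DEC 28 2015", "JAN 25 2016"],
        "STAT": ["PTS", "REB", "STL"],
        "STAT_VALUE": [53, 14, 5],
    },
    index=[0, 1, 5],
)


def pick_stats(highs, wanted=("PTS", "REB", "AST")):
    """Keep only the rows whose STAT label is one of `wanted`."""
    return highs.loc[highs["STAT"].isin(list(wanted))]


picked = pick_stats(curry_season_high)
print(picked)
```

With only these three rows as input, the result contains PTS and REB; the STL row is dropped instead of being mistaken for an assist record.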

Game Information

In [15]:

game_ids = gb.GameIDs()
game_ids = pd.DataFrame(game_ids.game_list())

game_ids.head()

Out[15]:

   AST  BLK  DREB  FG3A  FG3M  FG3_PCT  FGA  FGM  FG_PCT  FTA  ...  PTS  REB  SEASON_ID  STL  TEAM_ABBREVIATION  TEAM_ID     TEAM_NAME              TOV  VIDEO_AVAILABLE  WL
0  28   4    45    29    8     0.276    124  56   0.452   46   ...  147  64   22015      7    DET                1610612765  Detroit Pistons        11   1                W
1  30   2    36    23    9     0.391    87   53   0.609   34   ...  142  46   22015      9    SAC                1610612758  Sacramento Kings       15   1                W
2  34   2    30    21    9     0.429    86   52   0.605   13   ...  123  38   22015      10   SAS                1610612759  San Antonio Spurs      13   1                W
3  29   6    36    35    16    0.457    95   52   0.547   15   ...  131  46   22015      10   GSW                1610612744  Golden State Warriors  15   1                W
4  34   8    38    31    8     0.258    104  52   0.500   16   ...  122  46   22015      10   SAC                1610612758  Sacramento Kings       20   1                L

5 rows × 29 columns

Getting Player Headshots

In [16]:

from IPython.display import Image

Image("http://stats.nba.com/media/players/230x185/"+str(kobe_id)+".png")

Out[16]:

In [17]:

Image("http://stats.nba.com/media/players/230x185/"+str(curry_id)+".png")

Out[17]:
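
Both cells build the image address by concatenating the same prefix with a player ID. A small helper keeps that pattern in one place; `headshot_url` is a hypothetical name, and the sketch only reproduces the URL pattern used above without checking that the address still resolves:

```python
def headshot_url(player_id):
    """Build the headshot URL following the pattern used in the cells above."""
    return "http://stats.nba.com/media/players/230x185/{}.png".format(int(player_id))


# 977 is Kobe's PERSON_ID from the player list.
print(headshot_url(977))
```

In the notebook this could be used as `Image(headshot_url(kobe_id))`.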

More

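A bug in goldsberry\player\_Player.py was fixed so that it can query retired players. The modified file is in this folder; once it is copied into the installation directory, all of the code below runs: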

In [18]:

from goldsberry.player import _Player as pl_old

The player list for 1997:

In [19]:

players_1997 = pl_old.PlayerList(1997)

players_1997 = pd.DataFrame(players_1997)

Jordan's player ID:

In [20]:

jordan_id = players_1997["PERSON_ID"].loc[players_1997["DISPLAY_LAST_COMMA_FIRST"].str.contains("Jordan, Michael")]
jordan_id = jordan_id[jordan_id.index[0]]
jordan_id

Out[20]:

893

Jordan's regular-season performance in 1997-1998:

In [21]:

jordan_logs_1997 = pl_old.game_logs(jordan_id, season="1997")
jordan_logs_1997 = pd.DataFrame(jordan_logs_1997.logs())

show_avg_info(jordan_logs_1997.mean())
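Points: 28.7
Rebounds: 5.8
Assists: 3.5
Blocks: 0.5
Minutes: 38.9
Steals: 1.7
Turnovers: 2.3
Fouls: 1.8
Field goals: 46.5%
Three-pointers: 23.8%
Free throws: 78.4%
Defensive rebounds: 4.2
Offensive rebounds: 1.6
Plus/minus: 7.3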

Jordan's playoff performance in 1997-1998:

In [22]:

jordan_logs_1997 = pl_old.game_logs(jordan_id, season="1997", seasontype=2)
jordan_logs_1997 = pd.DataFrame(jordan_logs_1997.logs())

show_avg_info(jordan_logs_1997.mean())
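Points: 32.4
Rebounds: 5.1
Assists: 3.5
Blocks: 0.6
Minutes: 41.0
Steals: 1.5
Turnovers: 2.1
Fouls: 2.2
Field goals: 46.2%
Three-pointers: 30.2%
Free throws: 81.2%
Defensive rebounds: 3.5
Offensive rebounds: 1.6
Plus/minus: 7.5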

Headshot:

In [23]:

Image("http://stats.nba.com/media/players/230x185/"+str(jordan_id)+".png")

Out[23]:


