Donovan Mitchell’s scoring using R

Robert Bischoff

March 4th, 2019

The Utah Jazz and Donovan Mitchell had an exciting week. Jazz fans know Donovan Mitchell started off the season slow but has been on fire lately. I wanted to take a look at some of his recent scoring, but I wanted to do it in R because it’s fun to write my own code and I can run this analysis over again whenever I want. Maybe you like stats and want to do it yourself and maybe you don’t. The code is hidden by default, but click the code button and you can follow along*.

The first step in the analysis is to load the packages and download the data. I’ll use R to pull the up-to-date stats from NBA.com.

Here’s what the first few rows look like:

# first step - load packages
# install.packages('devtools') # uncomment to install if needed
# install.packages('pacman') # uncomment to install if needed
# devtools::install_github('stephematician/statsnbaR') # uncomment to install if needed
library(pacman) # this package loads or downloads packages as needed.
# load all packages used
p_load(statsnbaR,tidyverse,formattable,ggthemes) 
# download player data
players <- player_game_logs('nba',2018) # takes a long time depending on connection
formattable(head(players))
season  person_id  player_name        team_abbr  team_name          game_id   game_date   home   win    mins  fgm  fga  fg3m  fg3a  ftm  fta  oreb  dreb  reb  ast  stl  blk  tov  pf  pts  plus_minus  video
22018   201935     James Harden       HOU        Houston Rockets    21800710  2019-01-23  FALSE  TRUE   40    17   38   5     20    22   25   6     9     15   4    5    0    5    3   61   19          TRUE
22018   202689     Kemba Walker       CHA        Charlotte Hornets  21800225  2018-11-17  TRUE   FALSE  45    21   34   6     14    12   12   0     7     7    4    4    0    9    2   60   1           TRUE
22018   201935     James Harden       HOU        Houston Rockets    21800925  2019-02-28  TRUE   TRUE   44    16   32   8     18    18   18   2     5     7    10   4    1    4    1   58   10          TRUE
22018   201935     James Harden       HOU        Houston Rockets    21800659  2019-01-16  TRUE   FALSE  45    16   34   5     19    21   23   1     9     10   6    1    1    4    5   58   -8          TRUE
22018   201935     James Harden       HOU        Houston Rockets    21800646  2019-01-14  TRUE   TRUE   34    17   33   6     15    17   18   2     7     9    2    2    1    5    4   57   25          TRUE
22018   200746     LaMarcus Aldridge  SAS        San Antonio Spurs  21800619  2019-01-10  TRUE   TRUE   49    20   33   0     0     16   16   4     5     9    4    0    4    5    2   56   7           TRUE
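
Because the download is slow, it can be worth caching the result between runs. The sketch below shows one way to do that with base R. Note that `cached_fetch()` is a hypothetical helper, not part of statsnbaR, and the stand-in download function only returns a tiny made-up data frame.

```r
# Hypothetical helper (not part of statsnbaR): run `fetch` once and reuse
# the saved result on later runs.
cached_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the saved copy
  }
  result <- fetch()         # slow step, only run when no cache exists
  saveRDS(result, path)
  result
}

# Stand-in for the slow download (made-up data), counting how often it runs.
calls <- 0
fake_download <- function() {
  calls <<- calls + 1
  data.frame(player_name = "Donovan Mitchell", pts = 30)
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_fetch(cache_file, fake_download)  # runs fake_download()
second <- cached_fetch(cache_file, fake_download)  # reads the cache instead
```

In this post the call could look like `players <- cached_fetch('players_2018.rds', function() player_game_logs('nba',2018))`, where the file name is just an example. Deleting the file forces a fresh download.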

Improvement since the start of the season

Next, let’s get Mitchell’s data, look at his scoring over the season, and plot his per-game scoring.

# filter for Donovan Mitchell
DM45 <- players %>% filter(player_name == "Donovan Mitchell")
# arrange the games from most recent to oldest
DM45 <- DM45 %>% arrange(desc(game_date))
# create a data frame containing splits of Mitchell's scoring over the season
splits <- tibble(Player = "Donovan Mitchell",
                 L10 = round(mean(DM45$pts[1:10]),1),
                 L20 = round(mean(DM45$pts[1:20]),1),
                 L30 = round(mean(DM45$pts[1:30]),1),
                 L40 = round(mean(DM45$pts[1:40]),1),
                 L50 = round(mean(DM45$pts[1:50]),1),
                 `All games` = round(mean(DM45$pts),1))
# display table
formattable(splits, align = c('l','c','c','c','c','c','c'))
Player            L10   L20   L30   L40   L50   All games
Donovan Mitchell  27.5  28.4  25.7  24.4  23.5  23.3

His scoring has clearly gone up as the season has progressed: he averages 23.3 points over all games but 28.4 over his last 20. Next year’s task is to figure out how not to start as slowly as he did in his first two seasons.
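
The split columns repeat the same expression with a different window each time. As a sketch, the pattern could be wrapped in a small helper. `last_n_avg()` is a hypothetical name and the points below are made up for illustration, not Mitchell’s real numbers.

```r
# Hypothetical helper: mean of the first n values, rounded to one decimal.
# Expects `pts` ordered from most recent game to oldest, as DM45 is above.
last_n_avg <- function(pts, n) {
  round(mean(head(pts, n)), 1)
}

# Made-up points, most recent game first.
pts <- c(30, 24, 18, 26, 22, 20)

# One entry per window, plus the full-season average.
windows <- c(L2 = 2, L4 = 4)
splits_demo <- sapply(windows, function(n) last_n_avg(pts, n))
splits_demo["All games"] <- last_n_avg(pts, length(pts))
```

Using `head(pts, n)` instead of `pts[1:n]` means a window longer than the number of games played averages the games that exist rather than returning `NA`.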

Now let’s look at the plot:

# plot each game's scoring over the course of the season
ggplot(DM45, aes(game_date,pts)) + geom_path(size = 1.5, color = "#002B5C") +
  geom_point(size = 2, color = "#F9A01B") +
  ggtitle("Donovan Mitchell's per game scoring") + theme_fivethirtyeight()

He had a rough stretch through a tough schedule from mid-November to mid-December. Since then his scoring has been higher and more consistent, with only one game below 20 points since just after the start of the year.

Scoring comparison

Let’s cherry pick January 5th, as his scoring really picks up then. We’ll compare the top scorers after that date and take a look at scoring and efficiency. My biggest beef with Mitchell, as much as I love watching him play, is he’s not the most efficient scorer at times and can put up a lot of misses in a hurry.

Let’s use true shooting percentage as the efficiency metric. It accounts for two-pointers, three-pointers, and free throws in a single number, which makes it a little easier to compare players across positions.
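
For reference, the function defined in the code below uses the common approximation of true shooting percentage, with 0.44 as the weight on free throw attempts. As a worked check, plugging in Mitchell’s totals from the top-ten table in this section (653 points, 523 field goal attempts, 164 free throw attempts) reproduces his listed 54.9:

```latex
\[
\mathrm{TS\%} = 100 \times \frac{\mathrm{PTS}}{2\,(\mathrm{FGA} + 0.44 \times \mathrm{FTA})}
= 100 \times \frac{653}{2\,(523 + 0.44 \times 164)}
= 100 \times \frac{653}{1190.32} \approx 54.9
\]
```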

# define a function for true shooting percentage
TSPcalc <- function(pts,fga,fta){
  TSPresult = round(pts / (2 * (fga + (.44 * fta))) * 100,1)
  return(TSPresult)
}
# filter data by date
playersSub <- players %>% filter(game_date >= as.Date("2019-01-05"))
# calculate true shooting percentage by game and games played
playersSub <- playersSub %>% group_by(player_name) %>% 
  mutate(TSP = TSPcalc(pts,fga,fta), games = n())
# summarize totals and games played for each player
playersummary <- playersSub %>% group_by(player_name) %>% 
  summarize(pts = sum(pts), fga = sum(fga), fta = sum(fta), games = n())
# calculate overall true shooting percentage and points per game and sort by scoring
playersummary <- playersummary %>% group_by(player_name) %>% 
  mutate(TSP = TSPcalc(pts,fga,fta), PPG = round(pts/games,1)) %>% arrange(desc(PPG))
# get pretty column names
names(playersummary) <- toupper(names(playersummary))
names(playersummary)[1] <- "Player"
# display table of top ten scorers
formattable(playersummary[1:10,], align = c('l','c','c'),
            list(PPG = color_tile("yellow","green"),
                 TSP = color_tile("yellow","green")))
Player                 PTS   FGA  FTA  GAMES  TSP   PPG
James Harden           1022  701  305  25     61.2  40.9
Paul George            690   486  182  22     60.9  31.4
Bradley Beal           690   506  144  24     60.6  28.8
Giannis Antetokounmpo  663   421  220  23     64.0  28.8
Donovan Mitchell       653   523  164  23     54.9  28.4
Joel Embiid            476   322  178  17     59.5  28.0
Stephen Curry          668   483  89   24     64.0  27.8
Karl-Anthony Towns     600   388  138  22     66.9  27.3
Kawhi Leonard          403   280  117  15     60.8  26.9
Blake Griffin          662   473  183  25     59.8  26.5

Fifth in the NBA in scoring over the last two months is impressive, but he’s the least efficient scorer in the top 10. It doesn’t help that he’s the only go-to scorer the Jazz have, but how bad is that efficiency?

Let’s look at the averages for everyone who has played at least ten games.

# filter by ten games
playersummary <- playersummary %>% filter(GAMES >= 10)
# get averages
averages <- playersummary %>% ungroup() %>%  summarize_at(vars(PPG,TSP),mean)
averages <- averages %>% mutate_at(1:2, round, 1)
formattable(averages, align = 'c')
PPG TSP
10.5 55.5

So Donovan’s 54.9 is below the 55.5 average true shooting percentage for this period: not by a lot, but still not great.

Let’s look at his true shooting percentage by game to see if there are any trends.

# filter for Donovan again and arrange by date
DM45 <- playersSub %>% filter(player_name == "Donovan Mitchell") %>% 
  arrange(desc(game_date))
# plot data
ggplot(DM45, aes(game_date,TSP)) + geom_path(size = 1.5, color = "#002B5C") +
  geom_point(size = 2, color = "#F9A01B") +
  ggtitle("Donovan Mitchell's true shooting percentage") + theme_fivethirtyeight()

It looks like he started out great, had a rough stretch, and has been shooting well the last few games.
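
Game-to-game true shooting is noisy, so a trailing average can make a trend easier to see. The sketch below uses base R’s `stats::filter()`. `rolling_mean()` is a hypothetical helper and the percentages are made up, not taken from the plot.

```r
# Hypothetical helper: trailing k-game average. The first k - 1 entries are
# NA because there are not yet k games to average.
rolling_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Made-up per-game true shooting percentages, oldest game first.
tsp <- c(60, 50, 40, 55, 65, 70)
smooth <- rolling_mean(tsp, 3)
```

The input has to be ordered oldest game first, so a data frame sorted with `arrange(desc(game_date))` would need to be re-sorted in ascending order before smoothing.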

Mitchell’s best games vs other stars

Let’s see how he does in his best (most efficient) games compared to other stars during this stretch. We’ll look at players averaging at least 20 points per game during this period and keep only their games with a true shooting percentage above 60, which is roughly the upper quartile of performance. This should show who can score the most points while still being efficient.
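
The code below stores the quartiles in `TSPsum` but never prints them. As a sketch of how a cutoff like this can be read off, here is the same idea on made-up values (eight invented percentages, not the real player data):

```r
# Made-up true shooting percentages for eight players (not real data).
tsp <- c(48, 52, 54, 55, 56, 58, 61, 64)

# summary() reports all quartiles at once;
# quantile() extracts just the 75th percentile as a plain number.
tsp_summary <- summary(tsp)
upper_quartile <- quantile(tsp, 0.75, names = FALSE)
```

Printing `TSPsum` in the real analysis would show how close the third quartile actually is to 60.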

# true shooting percentage quartiles
TSPsum <- summary(playersummary$TSP)
# filter best scorers
stars <- playersummary %>% filter(PPG >= 20)
# filter by best scorers and TSP above 60
starsStats <- playersSub %>% filter(player_name %in% stars$Player, TSP > 60) %>%  arrange(desc(TSP))
# recalculate games played
starsStats <- starsStats %>% group_by(player_name) %>% mutate(games = n())
# summarize data for each player 
starSummary <- starsStats %>% group_by(player_name) %>% 
  summarize_at(vars(pts,fga,fta),sum)
# add games played to data
games <- starsStats %>% select(player_name,games) %>% summarize_at(vars(games),max)
starSummary <- left_join(starSummary,games, by = 'player_name')
# calculate overall true shooting percentage and points per game and sort by scoring
starSummary <- starSummary %>% group_by(player_name) %>% 
  mutate(TSP = TSPcalc(pts,fga,fta), PPG = round(pts/games,1)) %>% arrange(desc(PPG))
# get pretty column names
names(starSummary) <- toupper(names(starSummary))
names(starSummary)[1] <- "Player"
# display table of top ten scorers
formattable(starSummary[1:10,], align = c('l','c','c'),
            list(PPG = color_tile("yellow","green"),
                 TSP = color_tile("yellow","green")))
Player                 PTS  FGA  FTA  GAMES  TSP   PPG
James Harden           713  446  214  16     66.0  44.6
Paul George            405  241  104  11     70.6  36.8
Russell Westbrook      173  115  42   5      64.8  34.6
Donovan Mitchell       196  131  43   6      65.4  32.7
Stephen Curry          452  283  53   14     73.8  32.3
Giannis Antetokounmpo  540  318  180  17     68.0  31.8
Kemba Walker           285  204  49   9      63.2  31.7
Bradley Beal           463  297  95   15     68.3  30.9
Joel Embiid            277  147  110  9      70.9  30.8
Blake Griffin          430  264  123  14     67.6  30.7

I’m actually a little surprised to see Mitchell in the top 10. He’s only had six games that match the criteria, but in those games he has been money. I’m not surprised to see him next to Westbrook, as both can be streaky players. Here’s hoping that streakiness is a temporary, early-career problem, not a permanent weakness.

Wins and losses

Last, let’s look at how he does in wins and losses over the course of the entire season.

# filter Donovan Mitchell from the full season log
DM45 <- players %>% filter(player_name == "Donovan Mitchell")
# get true shooting percentage
DM45 <- DM45 %>% mutate(TSP = TSPcalc(pts,fga,fta))
# average the per-game values separately for wins and losses
results <- DM45 %>% group_by(win) %>% 
  summarize(TSP = mean(TSP), PTS = mean(pts), FGM = mean(fgm), FGA = mean(fga)) %>% 
  arrange(desc(win))
results[,2:5] <- round(results[,2:5],1)
names(results) <- toupper(names(results))
formattable(results, align = "c")
WIN TSP PTS FGM FGA
TRUE 55.5 24.2 8.7 19.2
FALSE 46.8 22.1 8.0 21.3

He shoots better and scores more in a win, which is not a big surprise. I think the main takeaway is that in wins he takes about two fewer shots per game (19.2 vs. 21.3) and still makes slightly more of them (8.7 vs. 8.0).

One last look to see if this changes over the time span we’ve been looking at.

# filter Donovan Mitchell from the games since January 5th
DM45 <- playersSub %>% filter(player_name == "Donovan Mitchell")
# get true shooting percentage
DM45 <- DM45 %>% mutate(TSP = TSPcalc(pts,fga,fta))
# average the per-game values separately for wins and losses
results <- DM45 %>% group_by(win) %>% 
  summarize(TSP = mean(TSP), PTS = mean(pts), FGM = mean(fgm), FGA = mean(fga)) %>% 
  arrange(desc(win))
results[,2:5] <- round(results[,2:5],1)
names(results) <- toupper(names(results))
formattable(results, align = "c")
WIN TSP PTS FGM FGA
TRUE 57.3 28.2 10.0 21.6
FALSE 50.1 28.8 9.8 26.0

This last look is a bit more informative, I think. Mitchell actually scores slightly more in a loss (28.8 vs. 28.2 points). The main difference is that he takes about four and a half fewer shots in wins while making basically the same number. In other words, he’s a lot more efficient in a win.
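
One caveat on the win/loss tables: the TSP column is the average of per-game percentages, which is not the same as computing true shooting once from the summed totals. The sketch below contrasts the two approaches on made-up box scores. It uses base R’s `aggregate()` rather than dplyr so it runs on its own, and `ts_pct()` simply repeats the formula from `TSPcalc()`.

```r
# Same formula as TSPcalc() in the post, repeated so this block is standalone.
ts_pct <- function(pts, fga, fta) {
  round(pts / (2 * (fga + 0.44 * fta)) * 100, 1)
}

# Made-up box scores for four games (not Mitchell's real numbers).
games <- data.frame(
  win = c(TRUE, TRUE, FALSE, FALSE),
  pts = c(30, 10, 28, 20),
  fga = c(20, 10, 25, 20),
  fta = c(5, 0, 5, 5)
)

# Approach used in the tables above: average the per-game percentages.
games$tsp <- ts_pct(games$pts, games$fga, games$fta)
mean_of_games <- aggregate(tsp ~ win, data = games, FUN = mean)

# Alternative: sum the counting stats first, then apply the formula once.
totals <- aggregate(cbind(pts, fga, fta) ~ win, data = games, FUN = sum)
totals$tsp <- ts_pct(totals$pts, totals$fga, totals$fta)
```

In this toy data the two methods give 58.8 and 62.1 for wins, because the pooled version weights high-volume games more heavily. Neither is wrong, but it helps to say which one a table reports.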

I’m a big Donovan Mitchell fan, and I love the way he and the team are playing. I also can’t place the blame for his inefficiency solely on his shoulders. Mitchell’s efficiency will improve, but I think he’s going to need a teammate who can create his own shot to take some pressure off him.

*You’ll need a little knowledge of R to follow along, but it’s free, open source, and used by some of the top scientists, journalists, and data scientists around the world. Best of all, there are a lot of tutorials and guides online.