Key Stats to Success
What stats are the most significant in having a successful season, and how does Buffalo look in these categories?
With the NFL preseason upon us, the NHL season isn’t too far away. Hoping to find some optimism after a 62-point year for the Buffalo Sabres, I explored which factors are most crucial, statistically speaking, to a team’s overall success, in order to start a discussion on how the Sabres might fare in these areas in the upcoming season.
I accomplished this using a Generalized Linear Model (GLM). It’s “fancy stats”, but the results are fairly easy to interpret. Long story short, it’s a more complicated version of that good old slope-intercept formula, Y = mX + b, in which multiple inputs are used to predict a single variable. I pulled data from NHL.com/stats and prepped several “season summary” variables for seasons going back to 2011-2012 (not including lockout seasons) to create a model that predicts total points for the season. After some data prep, my initial list of variables was:
- Face-off Win Percentage
- Penalty Kill Percentage
- Power Play Percentage
- Shots Against Per Game
- Shots For Per Game
- Indicator of goalie with most games played being in the top ten save percentages
- Indicator of having a top ten goal scorer
- Indicator of having a top ten assist getter
- Percentage of games scoring first
- Shooting Percentage
- Major Penalties Per Game
- Minor Penalties Per Game
- Penalty Minutes Per Game
- Percent of games the starting goalie started
- Secondary Scoring percent (defined as percent of goals scored by non top-5 goal scorers)
- Corsi Percent
- Shot Blocking Percent
- Average Draft Round on Roster
- Average Overall Pick on Roster
- Average Age on Roster
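As a rough sketch, the multi-input version of Y = mX + b described above can be written out as follows. The symbols are illustrative labels rather than anything taken from the model output, and the form assumes the simplest case (an identity link), which is not stated anywhere in this write-up:

```latex
\[
\widehat{\text{Points}} = b + m_1 X_1 + m_2 X_2 + \cdots + m_k X_k
\]
```

Here each X is one of the variables in the list above, each m is the coefficient the model fits for that variable, and b is the intercept. A GLM more generally allows a link function to be applied to the left-hand side, so this is the special case that lines up most directly with the familiar line formula.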
Several of these variables were rounded in a way that creates “buckets” for an easier GLM process. For example, Shots For Per Game is rounded to the nearest shot, Minor Penalties Per Game is rounded to the nearest .25, etc. Once I had my variables, I used 70 percent of the data to create, or “train”, the model, and the remaining 30 percent to test it.
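To make the bucketing and the 70/30 split concrete, here is a minimal Python sketch. The helper names, the sample rows, and the choice of a seeded random shuffle are all assumptions for illustration; the split described above may have been done differently (by season, for example).

```python
import random


def round_to_step(value, step):
    """Round value to the nearest multiple of step (1 for shots, 0.25 for minors)."""
    return round(value / step) * step


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows with a fixed seed, then cut it into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (shots for per game, minor penalties per game) pairs.
# These numbers are invented for illustration, not pulled from NHL.com.
rows = [
    (31.4, 3.38), (29.6, 2.91), (33.1, 3.62), (30.2, 3.10), (28.7, 2.55),
    (32.5, 3.40), (30.9, 2.80), (29.1, 3.95), (31.8, 3.20), (34.0, 2.70),
]

# Bucket each variable: shots to the nearest shot, minors to the nearest 0.25.
bucketed = [(round_to_step(s, 1), round_to_step(m, 0.25)) for s, m in rows]

train, test = split_train_test(bucketed)
print(len(train), len(test))  # 7 rows to train on, 3 held out for testing
```

Fixing the seed keeps the split reproducible, so repeated model runs are compared on the same held-out rows.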
The GLM is designed to mathematically determine which of these variables are the most “statistically significant”; in other words, variation in which of these variables has the most impact on overall points. This significance is represented by a probability, the P-value: the lower the P-value, the more likely the variable is significant. The process of creating a model involves experimenting with different variable combinations, groupings within variables, etc. across multiple “runs” of the model in order to find the most accurate one. While trying different combinations, the P-values are considered, as well as univariate graphs. These graphs show the predicted values (in red) of the test data set compared to the actual values (in blue). Here’s an example of the Shooting Percentage univariate graph from the initial model run:
The predicted and actual total points are averaged within each shooting percentage “bucket” and plotted against each other. The yellow bars represent the frequency distribution across the shooting buckets. This graph shows that the model built from the training data performs reasonably well when used to score (predict) the test data. The difference between predicted and actual tends to be larger in buckets with less data, since fewer observations make the averages less reliable.
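The numbers behind a univariate graph like this can be sketched in a few lines of Python: group the test rows by bucket, then take the count and the two averages per bucket. The function name and the sample values below are hypothetical and only show the mechanics.

```python
from collections import defaultdict


def univariate_summary(buckets, actual, predicted):
    """Per bucket: row count, mean actual points, and mean predicted points."""
    grouped = defaultdict(list)
    for bucket, a, p in zip(buckets, actual, predicted):
        grouped[bucket].append((a, p))

    summary = {}
    for bucket in sorted(grouped):
        pairs = grouped[bucket]
        n = len(pairs)
        summary[bucket] = {
            "count": n,  # what the frequency bars would show
            "mean_actual": sum(a for a, _ in pairs) / n,  # the "actual" line
            "mean_predicted": sum(p for _, p in pairs) / n,  # the "predicted" line
        }
    return summary


# Invented test-set rows: shooting percentage bucket, actual and predicted points.
buckets = [8, 8, 9, 9, 9, 10]
actual = [80, 90, 95, 100, 105, 110]
predicted = [82, 86, 97, 99, 101, 104]

for bucket, stats in univariate_summary(buckets, actual, predicted).items():
    print(bucket, stats)
```

A bucket with a single row, like the 10% bucket here, is exactly the kind of thin slice where the two lines can drift apart.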
After experimenting with several combinations, here are the variables included in the final model as well as their P-values:
- Penalty Kill Percentage:
- Indicator of goalie with most games played being in the top ten save percentages:
- Percent of Games Scoring First:
- Shooting Percentage:
- Percent of games the starting goalie started:
- Corsi For Percent [CF / (CF + CA)]:
These variables were all statistically significant in the final model. When the test data set was scored using this model, the average absolute error between actual and predicted total points was.
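For reference, the average absolute error is calculated by taking the size of each miss between actual and predicted points and averaging those misses. A minimal sketch, with a hypothetical function name and made-up point totals:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| across all rows."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented point totals for three team-seasons in a test set.
actual_points = [90, 100, 62]
predicted_points = [85, 104, 68]

print(mean_absolute_error(actual_points, predicted_points))  # (5 + 4 + 6) / 3 = 5.0
```

Because the misses are taken as absolute values, over-predictions and under-predictions cannot cancel each other out.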
Let’s take a look at the univariates for each of these variables and see where the Sabres left off in each category. Recall that the red line is the test set prediction from the model and the blue line is the actual points data from the test set, so the charts offer both a check on the general validity of the model and a look at the actual statistics.
Penalty Kill Percentage
Sabres Last Season: 77.9% (22nd)
NHL Average: 79.7%
The Sabres’ PK struggled last season, preventing them from taking advantage of being one of the cleaner teams in the league (12th in fewest PIMs). With some fresh faces this season, it will be interesting to see if there’s a significant improvement in penalty killing.
Goalie with Top Ten SV%
Sabres Last Season: No
The Sabres signed Carter Hutton to replace Robin Lehner as the main man in the pipes this year. Hutton’s coming off a career year in which he was in the top ten for save percentage, which is a great sign for Buffalo (literally). For better or worse, Hutton’s play this year is a critical factor for the Sabres, who have struggled to find consistency in their goaltending.
Percent of Games Scoring First
Sabres Last Season: 45.1% (22nd)
NHL Average: 50%
Although it was tough to see the Sabres get scored on first more often than not last season, they weren’t nearly as bad as the basement of the league. That crown goes to the Senators, who scored first in just 36.6% of their games. With the additional offensive power added this season as well as a goalie coming off a hot year, this is a stat I can definitely see the Sabres improving on in the upcoming season.
Shooting Percentage
Sabres Last Season: 7.74% (30th)
NHL Average: 9.18%
Only the Montreal Canadiens had a worse shooting percentage than Buffalo last year, bringing up the rear of the NHL at 7.70%. That said, I have a lot of optimism for this stat given some of the new names in the lineup this season (Conor Sheary - 13.8%, Jeff Skinner - 10.7%). Hopefully this is an area that sees immediate improvement, as the univariate graph’s actual results (blue) show a significant jump once even just the 8% range is reached.
Percent of Games Started by Starting Goalie
Sabres Last Season: 61.0%
NHL Average: 65%
Besides some volatility in the lower values, there does seem to be a benefit in having a clear starter handle most of the work, without going so far as to have one guy start upwards of 90% of the games. With Hutton likely claiming the starting role this season, it’ll be interesting to see how often Ullmark sees action given that he’s likely the long-term plan at this point. Given the results above, it could be ideal to see Ullmark play somewhere around 15 games or so.
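To put the 15-game figure in terms of this stat, here is a quick back-of-the-envelope sketch. It assumes an 82-game regular season and that the starter takes every game the backup does not, which ignores injuries and any third goalie; the function name is hypothetical.

```python
REGULAR_SEASON_GAMES = 82  # NHL regular season length


def starter_share(backup_starts, season_games=REGULAR_SEASON_GAMES):
    """Percent of games started by the starter if he takes all non-backup starts."""
    if not 0 <= backup_starts <= season_games:
        raise ValueError("backup_starts must be between 0 and season_games")
    return 100.0 * (season_games - backup_starts) / season_games


print(f"{starter_share(15):.1f}%")  # 67 of 82 starts, about 81.7%
```

Under those assumptions, 15 starts for the backup puts the starter comfortably above the 65% league average while staying below the 90% range.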
Corsi For Percent
Sabres Last Season: 47.6% (26th)
NHL Average: 50%
Corsi For percent serves as a proxy for possession in general, as it’s defined as Corsi For divided by (Corsi For + Corsi Against). The Sabres need to see an improvement in this area if they want to start picking up enough points to even be in the playoff conversation. With off-season acquisitions such as Dahlin, Skinner, and Sheary joining up with the likes of Eichel and Mittelstadt, this is another stat that I’m optimistic about. I wouldn’t be at all surprised if the Sabres improved to 50%+ in this category this coming season.
All things considered from the results of this model, I’m excited to see how the Sabres start off next season. I believe there’s reason to expect significant improvement in these areas given GM Jason Botterill’s busy off-season. Of course only time will tell when they actually get there, but Buffalo is doing what it needs to be doing in order to become a playoff team once again.
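The Corsi For percent definition translates directly into code. In this small sketch the function name and the shot-attempt counts are made up for illustration, with the counts chosen to land on a 47.6% result:

```python
def corsi_for_percent(corsi_for, corsi_against):
    """CF% = CF / (CF + CA), expressed as a percentage."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * corsi_for / total


# Invented shot-attempt counts: 476 for and 524 against.
print(f"{corsi_for_percent(476, 524):.1f}%")  # 47.6%
```

A team that generates exactly as many shot attempts as it allows sits at 50%, which is why the league average lands there.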
How many points do you expect the Sabres to finish with in the 2018-2019 season?