Wednesday, March 27, 2013

Building NCAA Tournament Bracket Spreadsheets

Recently I wanted to do some analysis on the NCAA tournament brackets, but I couldn't find a simple source of data.  There are lots of .pdfs of previous years' brackets on the internet, but those don't convert to Excel very well.
So, let's do something about it.

I have a spreadsheet template built on one designed by the great folks at Vertex42.com that does a lot of the calculations.  We just have to input the first round's games and the scores for the following rounds.  I built spreadsheet brackets for 2000-2012.   I want to build brackets all the way back to 1985.  

When we do, we'll make them available for anyone to use.  There are great analytics here...we just need to get the data into a usable format.

So, download the .zip file.  Open the template file, input the first round games on the second tab, then flip back to the first tab and add the scores for the rest of the rounds.  The spreadsheet will populate the winners into the next round and add the info to the game summaries at the bottom.  Remember, make sure you get the seeds right...the box scores usually have the winner first.  We need to make sure the correct seed is first.


If you want to complete a year, please add a comment below about which year you're working on.  When you're done with a spreadsheet, email it to me at paul@huskermath.com and I'll add it to the .zip file.

I've been using the CBSSports site to fill in first round data.  

Thanks for the help!

Paul

2000-2012 - complete
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1987
1986 - complete
1985 - complete

Monday, March 25, 2013

A crazy Sweet 16?

This year saw the first #15 seed, Florida Gulf Coast, advance to the Sweet 16.  In all, it feels like an absolutely crazy year, with one upset after another.  

But is the current Sweet 16 lineup really that much crazier than past years?


If the top 4 seeds in each region advance to the Sweet 16 then the average seeding in the tournament is 2.5.  So, an average seeding higher than 2.5 indicates that a lower seeded team advanced in place of one of the top four in each region.
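As a quick check of the 2.5 figure, here is a minimal Python sketch. The function name `average_seed` and the example upset region are illustrative assumptions, not something taken from the spreadsheet.

```python
def average_seed(seeds):
    """Return the mean seed of the teams in a round."""
    return sum(seeds) / len(seeds)

# A 'perfect' Sweet 16: seeds 1-4 advance in each of the four regions.
perfect_sweet_16 = [1, 2, 3, 4] * 4
perfect_average = average_seed(perfect_sweet_16)
print(perfect_average)

# Illustrative region (assumption): a 15 seed advances in place of the 2 seed.
upset_region_average = average_seed([1, 15, 3, 4])
print(upset_region_average)
```

Any average above the first value means at least one team outside the top four made it through.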



This chart shows two data points for the last 11 Sweet 16 rounds.  The line in blue is the average seed of the 16 teams playing each year.  The red line marks a hypothetically perfect year...a year in which the 1, 2, 3, and 4 seeds in each region advance to the Sweet 16.

Over the 11 years, the average seed of the Sweet 16 round is 4.35.  From 2007-2009 the average seed decreased significantly, but then increased dramatically in 2010 and 2011, dropped a bit in 2012, and then reached an 11-year high this year.  So maybe you aren't imagining things.  There really is a #15 seed in the Sweet 16, and it really is crazy.


Each year the NCAA publishes a bracket with 32 teams on each side.  Although the names of the regionals have changed depending on the locations, the design of the bracket doesn't.  If I give each region a number, looking like this, I can break down the averages even further.




Remember, the average seed for a 'perfect' region is 2.5.  From 2003-2013 there have been only five 'perfect' regions in the Sweet 16, and two of those five were in 2009.  


So, next year, when you plan your brackets before the tournament starts, you might want to remember this...pick at least one team in the top four of each region to lose in the first or second round.  



This year's West Regional, which appears in this chart as Region 2, had the 2nd highest average seed of the 11 years. 

And FGCU wasn't even playing in the West.  It is in the South, or Region 3 on the chart for 2013.



If I assign a 'bracket position' to each game, numbered 1-16, it looks like the chart below.



Returning to the idea of a 'perfect region', for each bracket position, there is an expected seed.  And from that we can calculate a variance each year from the expected or perfect seed.  
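One way to sketch that calculation in Python is shown here. Two things are assumptions on my part: the order of expected seeds within a region (1, 4, 3, 2, the usual top-to-bottom bracket layout) and the sign convention (expected minus actual), which I picked so that a 15 seed sitting in a 2 seed's slot comes out as -13.

```python
# ASSUMPTION: expected Sweet 16 seeds for one region, top to bottom,
# following the standard bracket layout.
EXPECTED_SEEDS = [1, 4, 3, 2]

def seed_deltas(actual_seeds, expected_seeds=EXPECTED_SEEDS):
    """Expected seed minus actual seed for each bracket position."""
    return [exp - act for exp, act in zip(expected_seeds, actual_seeds)]

# Illustrative region: three favorites advance and a 15 seed
# fills the slot where the 2 seed was expected.
deltas = seed_deltas([1, 4, 3, 15])
print(deltas)

# A delta of zero means the expected seed advanced.
chalk_count = deltas.count(0)
print(chalk_count)
```

Repeating this for all four regions gives the 16 deltas for a single year.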


Over the 11 years, the delta between the team actually sitting in the bracket position and the expected perfect seed, looks like this:












See that bright red -13 on the bottom line?  Yeah, that's FGCU.  

The green zeroes represent bracket positions where the expected team (#1, 2, 3, or 4 seed) advanced to the Sweet 16.  The average number of expected seeds advancing over the 11 years was 9.9.  In 2013, 10 expected seeds advanced.  


Based on that, one has to conclude that FGCU's advance to the Sweet 16, while very Cinderella-ish, is largely responsible for the 11-year high average seed in the Sweet 16 for 2013.  


So, how did your bracket turn out?  Raise your hand if you picked FGCU in the Sweet 16.


Liar.


GBR!




Tuesday, March 12, 2013

Comparing Nebraska's and Alabama's Dynasties

The first part of this analysis was published on CornNation.com in January.  The second and third parts, however, have not been published yet.

Intro

There are myriad ways to compare the 5-year dynasties that Nebraska and Alabama have put together.  Some indicate Nebraska’s was more impressive, others indicate that Alabama’s was more so.   This is my attempt to compare the two using defensible statistical analysis.  It is not the final word on this issue; I’m doing it simply to attempt to get at the question of who accomplished more during their 5-year run.


I organized the analysis along offense, defense, margin of victory, win-loss record, and strength of schedule.

Where I state that there is sufficient evidence to conclude 'X', the statement is based on standard hypothesis testing (t-tests) and evaluated at the alpha=.10 level of significance.  The conclusions I draw regarding win/loss records and the overall conclusions are subjective and not based on hypothesis testing.
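For readers who want to see the mechanics of such a test, here is a minimal sketch in Python. It assumes Welch's unequal-variance version of the two-sample t-test, which may not be the exact variant behind the charts, and the two score lists are made-up placeholders, not the Nebraska or Alabama data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    # Squared standard error of each sample mean.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof

# Placeholder inputs (hypothetical), used only to show the call.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, dof)
```

The t statistic and degrees of freedom are then compared against the critical value for alpha=.10 from a t table or a statistics package.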


Offense 1


This first chart illustrates the average points that Nebraska and Alabama scored when you categorize opponents by end of season ranking.  For simplicity's sake, I used the end of season Congrove Composite Index.  Using the end of season ranking is better because it’s actually available and goes a long way toward identifying teams that were ranked at one point in the season but should not have been, or that were unranked or lower ranked but proved to be better that season.

 The number next to each point on the graph is the number of games Nebraska and Alabama played against teams of that rank.

As you would expect, as the opponents’ rank goes down, the average score that Nebraska and Alabama scored climbs.  For opponents of all ranks, Nebraska’s average points scored is markedly higher.  For all ranks, the average Nebraska score is 42.82 and the average Alabama score is 33.84.  There is sufficient evidence to conclude that Nebraska's offense was superior to Alabama's.

Offense 2

This next chart compares Nebraska and Alabama scoring as a percentage of the average scoring allowed by their opponents.  While this is much the same as the chart above, it factors in the additional information of their opponents' defenses.  Obviously, if Nebraska's average scoring difference came because it played a 5-year slate of defensive duds then the argument that Nebraska scores more points is suspect.


Considering Nebraska and Alabama scoring as a percentage of their opponents’ scoring defense, one would expect that the percentage would remain basically steady, or show a slight increase as the rank of an opponent decreases.  This, however, does not seem to be the case.  If anything, there is a slight negative correlation between percentage scored and opponent rank.  It’s difficult to say why this is the case, but I would speculate that it’s because 150% of an opponent ranked 1-10 is, in real points, much less than 150% of an opponent ranked 100-110.  These games would be the times that 3rd and 4th string is played, which sometimes leads to offensive mistakes and ‘garbage time’ points for the opponent.

The flatter trend of the Alabama line may indicate that Alabama was more consistent on offense than Nebraska over the five years.  While they did not put up the sheer number of points that Nebraska did, their offensive production was remarkably steady.  Nebraska, on the other hand, was less consistent, and their performance against teams ranked 101-110 actually fell short of Alabama’s against comparably ranked opponents.  This notwithstanding, there is sufficient evidence to conclude that NU's offense was superior to Alabama's.


Margin of Victory



As one would expect, the average margin of victory by Nebraska and Alabama increases as their opponents’ rank decreases.  Both show a steady and reasonably linear relationship between average margin of victory and opponent rank.  The exception is Nebraska’s average margin of victory against opponents ranked 21-30.  This is because there are only two games here, and Nebraska lost one of them, leading to a much smaller average.

Generally, we can state that both teams did what great teams are supposed to do…they consistently beat other teams up.  As their opponents’ rank decreases, those beatings are more severe.  It’s worth noting that Nebraska's average margin of victory against teams ranked 1-10 was about 2.8 times that of Alabama’s (17.5 vs 6.3).  Against teams ranked 11-20, Nebraska’s average margin of victory was almost three times that of Alabama (20.0 vs 7.3). For opponents of all ranks, Nebraska’s average margin of victory was 28.2 points and Alabama's was 22.0 points.  There is sufficient evidence to conclude that Nebraska had a greater average margin of victory than Alabama.


Defense 1

This next chart illustrates a very similar comparison to Offense 1, but it compares the defenses that Nebraska and Alabama put on the field by measuring the average opponent score, again broken down by end of season rank.  As above, the numbers on the chart indicate the number of teams Nebraska and Alabama played in that rank group.




For teams ranked in the top-10, the difference in scoring defense is small. For Nebraska, it’s 18.2, for Alabama it’s 20.2. For most other opponent rank categories Alabama has a slight performance advantage, ranging from about 3-7 points.  For opponents of all ranks, Nebraska's opponents averaged 14.41 points and Alabama’s averaged 11.82 points. There is insufficient evidence to identify one defense as better than the other over the entire five years.



Defense 2

The data points illustrated in the next chart are the average opponents’ score as a percentage of Nebraska and Alabama season scoring defense. A lower percentage indicates a better performance for Nebraska and Alabama.




At first blush, I would have expected there to be a negative correlation between the average percentage score by an opponent and the opponent’s rank…better opponents should do better offensively against Nebraska and Alabama than crummy opponents should. Both teams’ opponent scoring shows this general trend, with a decreasing effect as the quality of opponent decreases. This might be explained by the fact that games against far inferior opponents present opportunities to play the 3rd and 4th string, often meaning the opponent has opportunities to score that would not have otherwise been presented had the starters remained in the game.

Nebraska’s defensive performance shows no obvious correlation between rank and opponent points.  Alabama shows a strong negative correlation between opponent rank and the percent of scoring they allowed their opponents.  In other words, Alabama held inferior opponents to well under their season scoring offense average but gave up points to highly ranked teams.

For opponents of all ranks, Nebraska’s opponents' average score as a percentage of NU’s scoring defense was 102%.  Alabama’s opponents' average score as a percentage of Alabama’s scoring defense was 101%.  As with Defense 1, there is insufficient evidence to conclude that one defense is better than the other.

Wins and Losses

This one is simple: Nebraska had a better overall win-loss record (95.2% vs 89.6%) and a much better winning percentage versus top-20 teams (95% vs 83%). Nebraska also had three undefeated seasons while Alabama had one undefeated season. Only twice did Nebraska's winning percentage dip to 92% for the season (’93 and ’96). Alabama had three seasons at or below 92% (’08-86%, ’10-75%, and ’11-92%).




For Alabama, six of their seven losses (86%) were to teams ranked 1-20 (UF-2008-#1, Utah-2008-#4, Auburn-2010-#2, LSU-2011-#2, LSU-2010-#11, and A&M-2012-#5) while two of Nebraska’s three losses (67%) were to teams in the top 20 (FSU-1993-#1, ASU-1996-#4). Alabama and Nebraska both lost to one team ranked 21-30 (South Carolina-2010-#27 and Texas-1996-#25).

Looking at where losses occurred, Alabama lost three at home, two in Bowl/CCGs, and two away. Nebraska lost zero at home, one away, and two in Bowl/CCGs.




Though it is a subjective assessment, NU's three undefeated seasons, zero home losses, and the same number of bowl and conference championship game losses are sufficient to conclude that NU's win-loss record is better than Alabama's.

Strength of Schedule

Considering the season average rank of Nebraska’s and Alabama’s opponents, Nebraska’s opponents' five-year season average is 48.29 and Alabama’s is 53.82.  There is sufficient evidence to conclude that NU’s average season opponent ranking was more difficult than Alabama’s. It follows, therefore, that Nebraska’s dynasty was established during seasons of greater difficulty than Alabama’s.


Combined with Nebraska’s better win-loss record, this may present the strongest evidence that Nebraska’s dynasty was more impressive than Alabama’s. It’s hard to ignore that Nebraska's three undefeated seasons were all more difficult than the average of the 10 seasons considered, while Alabama’s were less difficult than the average of the 10 seasons considered. Alabama’s one season more difficult than average was 2010…the season in which it lost three games and finished with a .75 win-loss record. Finally, the most recent two seasons, in which Alabama claimed its back to back National Championships, were the two least difficult seasons considered in this analysis.


Conclusion 

Nebraska was better on offense; neither team demonstrated a clear superiority in defense; and Nebraska had a better win-loss record and a stronger average strength of schedule. 

I'll allow my readers to make the final conclusions.  Who has the best 5-year dynasty?

Monday, March 11, 2013

Army Accessions 1993-2012

I know, it's not college football, but I got a really great data set to work with, and being an Army Officer and all....

Did you ever wonder where all those Soldiers come from?





Saturday, March 9, 2013

Placing Fumbles in Context (Part 2)

Continuing my breakdown of fumbles, Part 2 looks at fumbles by distance to go, player position, and quarter.


Distance (to go)


Looking at the entire football field, the breakdown of fumbles by down and distance looks like this (fumbles on kickoffs and punts are excluded):

1st and 10 accounts for the overwhelming majority of fumbles, but it accounts for the lion's share of the down-distance pairings during a game, so there's nothing particularly surprising in that.  

For distances of 10 or greater, 12% of fumbles occur on 2nd down, 5% on 3rd down, and less than 1% on 4th down.  
For distances of fewer than 10 yards, 2nd down accounts for 21%, 3rd down accounts for 17%, and 4th down accounts for 3% of fumbles.

Because the fumble percentages correspond closely to the actual play distribution by distance to go in a football game I'm led to conclude that distance to go is not a significant contributing factor to the probability of a fumble occurring on a play.



Player Position





Across the FBS, QBs accounted for about 50% of fumbles in the opponent red zone. The percentage of QB fumbles decreased steadily as the team approached the end zone. Running backs' percentage of fumbles increased steadily as a team approached the end zone. WRs were most likely to fumble in the middle of the field. Interestingly, DBs accounted for a not-insignificant number of fumbles. I suppose this is following interceptions or fumble recoveries.




When I look at this chart, absolutely nothing important jumps out at me. While there is some variation in frequency of fumbles between quarters, there's no reason to think that it is due to anything other than chance.  


Conclusion

And this concludes my breakdown of fumbles.  If there's a useful takeaway from Parts 1 and 2, I think it is the improbable frequency of fumbles on punt returns.  


Thursday, March 7, 2013

What can and can't sports analytics do?

Andrew Sharp at SBNation has a great article called Paralysis by Analysis in which he details his visit, as a confessed analytics skeptic, to the MIT Sloan Sports Analytics Conference.  This conference, to guys like me, is like making the Hajj to Mecca for the world's Muslims.  It has to be done, but everybody knows it's damn expensive, so Allah (or in my case, Nate Silver) understands if it doesn't work out.

It got me thinking, along with a negative comment left by a reader this week, that some folks are misunderstanding what I'm trying to do, and what sports data and statistics analysis can do and can't do.

What can't sports analytics do?  It can't predict what's going to happen on the next play, series, inning, snap, or whatever.  It can't explain WHY something happened. And it can't take the place of a coach's experience.

What can sports analytics do?  It can provide insights into aspects of the game that are not readily apparent to someone watching, coaching, or browsing the box scores. It can serve as an early warning to coaches and managers about potential problem areas and trends before they manifest themselves in the box score (at which time it's probably too late).  And it can function as a way to evaluate players and coaches in a (mostly) objective manner.

The negative comment I mentioned above said this:  

After all of that it means really nothing...You still cannot prevent these kind of mistakes, and you surely will never be able to look at these graphs and charts, and decide before the next play "the fumbles a coming, better tell so and so to hang on to the ball".....Pretty much a big ole waste of time.....
The comment was directed at the first part of a two-part piece on fumbles that I wrote earlier this week.  I appreciate the commenter's feedback, but I think he's missing the point.  Or maybe I failed to help him understand the point.

That piece wasn't about saying "this play will result in a fumble".  It was about digging into the limited data available to identify relationships between separate events that might be exploitable.  What I found was that fumbles on punt returns occur far more often than they should if they happened at the same frequency as punts.  They don't, and that is an exploitable nugget of information.  A coach could take that to heart and realize that he needs to place more emphasis (read: time, practice, and coaching) into the act of catching and returning a punt.

Whether the analytics are the low-budget work I'm doing or the amazing technology gathering and analysis that companies who went to the Sloan conference are engaging in, we are trying to do the same thing...uncover the hidden information in the game so coaches and players can make better-informed decisions.

GBR!

Paul

Wednesday, March 6, 2013

The Heat is on

This is a heat map of all FBS seasons from 2002-2012.  A team's average PF is on the Y-axis and average PA is on the X-axis.  The average power ranking by Jeff Howell is the value.  It sort of illustrates that if you score a lot of points and your opponents don't you will be highly ranked.

I marked where NU 2012 ended up.

So simple, yet so hard.


Does NU fit the profile of elite-win teams?



Ask any true Nebraska fan and he or she should be able to tell you, almost reflexively, how many wins Tom Osborne always had.  “9” is a magical number to Cornhusker fans, and has become the de facto minimum standard of what Husker Nation will tolerate.

Tom Osborne, however, played more than 12 games per season only 6 times, and those were 13 game seasons.  He averaged 12.24 games per season over his career.  Bo Pelini has played 3 14-game seasons and 2 13-game seasons for an average of 13.6 games per season.  
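The two averages can be reproduced with a short weighted-average sketch in Python. It assumes Osborne coached 25 seasons and that the 19 seasons that weren't 13 games were exactly 12 games each, which is consistent with the 12.24 figure; the function name `average_games` is just for illustration.

```python
def average_games(season_counts):
    """season_counts maps games-per-season to the number of such seasons."""
    total_games = sum(games * seasons for games, seasons in season_counts.items())
    total_seasons = sum(season_counts.values())
    return total_games / total_seasons

# ASSUMPTION: 25 Osborne seasons, 19 of them exactly 12 games long.
osborne = average_games({12: 19, 13: 6})
# Pelini: three 14-game seasons and two 13-game seasons.
pelini = average_games({14: 3, 13: 2})

print(round(osborne, 2), round(pelini, 2), round(pelini - osborne, 2))
```

The difference between the two averages is the "extra game and ½" (about 1.36 games per season).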

Should that extra game and ½ mean that the minimum standard should be raised?  Is a 9-win season no longer the impressive feat that it was under Tom Osborne?  Both are subjective questions outside the scope of analysis based on statistics.  What is within that scope, however, is a look at where Nebraska is in relation to other 9, 10, 11, and 12 win teams.  With that in mind it might help frame the issue of whether Nebraskans want to hold Coach Pelini to a 10- or 11-win standard.

To do this I took all teams with 9 or more wins from 2002-2012 and calculated the average PF and average PA for 9-, 10-, 11-, and 12-win teams.  There weren’t enough 13- and 14-win teams to draw statistical inferences from.  Using that info, I broke the teams into conference averages as well.  

I’ll skip the rest of the nerdy stuff and get right to the point.  

Finding 1:  Offensively, (particularly as a member of the B1G) Nebraska is well positioned to move into the realms of  10-11 win teams.  The 2012 Huskers performed at an offensive level that is well above the average for 9 and 10 win teams, slightly above average for 11-win teams, and right at average for 12-win B1G teams.  Despite NU’s turnover problems they scored a lot of points.


Finding 2:  Nebraska’s defensive performance this year is well below average for even 9-win teams over the last decade.  For B1G teams, it is even worse.  Nebraska’s PA this year would be in the bottom 10% of 10-win teams, the bottom 5% of 9- and 12-win teams, and dead last for 11-win B1G teams.

The conclusion is clear:  Offensively, Nebraska matches the profile of elite win teams.  Defensively, Nebraska's performance does not merit consideration as an elite win team and will almost certainly preclude it from becoming one if it does not improve.  Bo Pelini’s emphasis needs to be on the defense next year.

Tuesday, March 5, 2013

Placing Fumbles in Context (Part 1)

The official statistics published by the NCAA list fumbles under the 'turnover margin' section.  For Nebraska, it records 22 fumbles lost (YIKES!) for 2012.  And that's basically it.  Other than to elicit the obvious response of "damn, that's a lot of fumbles" there isn't much more useful information there.

However, with a little digging I was able to come up with some more specific data on fumbles from 2010-2012 that helps place fumbles into a better context within the space and time of the game.

First, let's look at the types of plays that teams were running when fumbles occurred. There are really only four kinds of plays that can result in a fumble: rush (duh), reception, KO returns, and punt returns.
Imagine that a team is moving the ball from right to left...or from the South to North endzone if you're sitting in the pressbox at Memorial Stadium. Fumbled KO returns accounted for about 2/3 of fumbles in the endzone across the FBS, and it was about that for NU. Rush fumbles show a marked decrease as teams approach the end zone, but then increase significantly inside the 10-yard line. NU shows a similar trend. Generally, NU's fumbles appear to be similarly distributed by type and location as the rest of the NCAA's.

UPDATE - 2PM, 6 March

When I first wrote this analysis I failed to break down the actual number of plays (all plays) for each down.  Because of that, I may have drawn exactly the wrong conclusion about fumbles by down.

First down accounts for 38.4% of all FBS plays, but only 33% of fumbles.  That means that 1st down is actually the safest down when it comes to fumbles, rather than the most dangerous as I assessed yesterday.  Third down is slightly more dangerous, but the real fumble danger zone is 4th downs and punt returns.  Fumbles occur on 4th down and punt returns almost twice as often as they should if they occurred at the same frequency as the plays are run.

Down           NCAA Fum    All NCAA Plays
1              32.95%      38.42%
2              27.60%      28.73%
3              19.59%      17.96%
4              3.12%       1.88%
P              10.49%      6.16%
K              6.25%       6.85%
Grand Total    100.00%     100.00%
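To make the "more often than they should" comparison concrete, here is a small Python sketch that divides each situation's share of fumbles by its share of plays. The name `fumble_index` is my own label for that ratio. One assumption to flag: the punt-return fumble share is taken as 10.49%, the figure given in the NU comparison table later in this post, because that is the value that makes the fumble column total 100%.

```python
# Share of fumbles and share of all plays by situation, in percent.
# ASSUMPTION: punt-return ("P") fumble share of 10.49%, so the column sums to 100%.
fumble_share = {"1": 32.95, "2": 27.60, "3": 19.59, "4": 3.12, "P": 10.49, "K": 6.25}
play_share = {"1": 38.42, "2": 28.73, "3": 17.96, "4": 1.88, "P": 6.16, "K": 6.85}

def fumble_index(situation):
    """Ratio above 1.0 means fumbles are over-represented relative to plays."""
    return fumble_share[situation] / play_share[situation]

indexes = {s: round(fumble_index(s), 2) for s in fumble_share}
for situation, index in indexes.items():
    print(situation, index)
```

A ratio below 1.0 (as on 1st down) means fewer fumbles than the play count alone would suggest; 4th downs and punt returns come out well above 1.0.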

Using the methodology to calculate a 'lost points' value for turnovers I proposed in this post, the average 'lost points' for each play type looks like this:

Fumble Play    Ave Lost Points
KO Ret         -6.14
Punt Ret       -5.6
Rec            -5.71
Rush           -5.73
Overall Ave    -5.74

While the other play types have nearly identical average values, KO returns are different, and cost more on average. 
While 1st and 2nd down are clearly the 'danger zone' for fumbles, the chart above shows that fumbles on returns are a not-insignificant portion of all fumbles, and considering the preponderance of return fumbles deep in an opponent's territory, I think this presents strong evidence to justify spending extra time on safely fielding returns. 

I was surprised when I first saw this chart and how clearly it illustrates that fumbles are more likely to occur on 1st and 2nd Downs than on 3rd downs. 

This can be accounted for somewhat by the greater number of 1st and 2nd downs in a game, but I don't think that is enough to account for the entire difference. Something appears to be happening here. Does knowing that you have a fresh set of downs lend itself to a false sense of security? Are coaches calling riskier plays on 1st and 2nd downs? I don't know the answer, but it's worth looking into, I think. 


 When I compare NU's by-frequency to the NCAA's, I see this:


Down    NCAA %    NU %
1       32.95%    27.08%
2       27.60%    27.08%
3       19.59%    25.00%
4       3.12%
K       6.25%     6.25%
P       10.49%    14.58%



Nebraska's fumbles are much more uniformly distributed across 1st, 2nd, and 3rd down situations.  They fumbled a slightly higher percentage of punts than the NCAA average, and they had a significantly greater percentage of fumbles coming on 3rd down.  This may be another needed point of emphasis for the coaching staff to address in 2013.  While this analysis doesn't cover this, I suspect that this is symptomatic of NU's generally poorer execution on 3rd down across all offensive areas.  Also, NU's fumble percentage for punt returns is even higher than the already-high FBS average.
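Those comparisons can be checked with a few lines of Python using the percentages from the NU table. This is only a sketch: the variable names are mine, and 4th down is left out because the NU cell in the table is blank.

```python
# Fumble share by situation (percent), from the NU comparison table.
# 4th down is omitted because no NU value is listed for it.
ncaa = {"1": 32.95, "2": 27.60, "3": 19.59, "K": 6.25, "P": 10.49}
nu = {"1": 27.08, "2": 27.08, "3": 25.00, "K": 6.25, "P": 14.58}

# Positive gap: NU fumbles make up a larger share than the NCAA average.
gaps = {s: round(nu[s] - ncaa[s], 2) for s in ncaa}
print(gaps)

# Spread between the highest and lowest share on downs 1-3;
# a smaller spread means a more uniform distribution.
nu_spread = max(nu[d] for d in "123") - min(nu[d] for d in "123")
ncaa_spread = max(ncaa[d] for d in "123") - min(ncaa[d] for d in "123")
print(round(nu_spread, 2), round(ncaa_spread, 2))
```

The 3rd-down and punt-return gaps are the two positive ones, and NU's spread across downs 1-3 is far smaller than the NCAA's.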

Stay tuned for Part 2 where I'll break down fumbles by Distance-To-Go, Player Position, and Game Quarter.  

GBR!

As always, you can download the data supporting my analyses.

I used data in this analysis from cfbstats.com and Rivals.com.

@paul_dalen