Monday, March 12, 2018

Simulating the NCAA Tournament

This is March, as Jon Rothstein likes to remind us all, and with March comes the NCAA Men's Basketball Tournament! Bracket-mania is undoubtedly sweeping your social groups/school/workplace/inner cabal, as it is mine, and with it, questions of "How do I fill out my bracket?" "What upset should I pick?" "Why did my wife leave me?" "Who should I pick as my Final Four?" I too have been confronted with such questions, and since I have an extreme aversion to decision making, I created a March Madness Simulator to aid in my bracket making (and yours too)!

The simulator works using data from Kenpom.com, probably the single best college basketball analytics site (no, I'm not biased in giving someone at The Athletic a plug - I challenge you to name a better college basketball site anywhere). I scraped Adjusted Offense and Adjusted Defense scores for each tournament team, then feature-scaled each to a maximum value of 0.5. Then, I subtracted the feature-scaled Adjusted Defense score from 0.5, so that higher is better for both numbers (by itself, Adjusted Defense is best when it's lowest, but by subtracting it from 0.5, the best scores are the highest). Then, I added the two scores together to determine a team's overall ability. We'll call this their "Power Score". The top 10 teams in the country by Power Score are displayed below.


School            Power Score
Virginia          0.866
Villanova         0.858
Duke              0.829
Cincinnati        0.803
Purdue            0.799
Michigan St.      0.797
North Carolina    0.781
Gonzaga           0.776
Kansas            0.758
Michigan          0.757
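For the curious, here's a minimal sketch of the Power Score calculation described above. It assumes "feature-scaled" means ordinary min-max scaling across the field, which is my reading rather than something spelled out in the post, and the team names and efficiency numbers are made up for illustration (they are not real Kenpom figures). The function names are hypothetical too.

```python
def scale_to_half(values):
    """Min-max scale a list of numbers into the range [0, 0.5]."""
    lo, hi = min(values), max(values)
    return [0.5 * (v - lo) / (hi - lo) for v in values]


def power_scores(adj_offense, adj_defense):
    """Add scaled offense to inverted scaled defense, team by team."""
    offense = scale_to_half(adj_offense)
    # Lower Adjusted Defense is better, so flip it: the best defense maps to 0.5.
    defense = [0.5 - d for d in scale_to_half(adj_defense)]
    return [o + d for o, d in zip(offense, defense)]


# Hypothetical efficiency numbers for three made-up teams (NOT real data).
teams = ["Team A", "Team B", "Team C"]
adj_o = [120.0, 110.0, 100.0]  # points scored per 100 possessions
adj_d = [90.0, 100.0, 110.0]   # points allowed per 100 possessions

scores = dict(zip(teams, power_scores(adj_o, adj_d)))
for team, score in scores.items():
    print(f"{team}: {score:.3f}")
```

A team that led the field in both offense and defense would score a perfect 1.0 under this scheme, which is why the real leaders sit a bit below that.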

After calculating a Power Score for each team, I then set about simulating tournament brackets. To simulate a matchup, I first calculated a team's expected odds of winning using Pythagorean Expectation. Ken Pomeroy discussed how an exponent of 10.25 is generally most accurate when dealing with adjusted scores such as his own, so I used that exponent with the Power Scores to calculate win probabilities. For example, let's say Virginia (Power Score of .866) played against Michigan (.757). To determine Virginia's win probability, the calculation would be:

\[ P(\text{Virginia wins}) = \frac{0.866^{10.25}}{0.866^{10.25} + 0.757^{10.25}} \approx 0.799 \]

So Virginia would win that game about 80% of the time that they played against Michigan.
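The same Pythagorean Expectation calculation can be sketched in a few lines of Python. The function name and its default argument are my own choices for illustration; only the 10.25 exponent and the two Power Scores come from the post.

```python
def win_probability(score_a, score_b, exponent=10.25):
    """Pythagorean Expectation that team A beats team B."""
    a = score_a ** exponent
    b = score_b ** exponent
    return a / (a + b)


virginia, michigan = 0.866, 0.757
p_virginia = win_probability(virginia, michigan)
print(f"Virginia beats Michigan {p_virginia:.1%} of the time")
```

The two teams' probabilities always sum to 1, so Michigan's chance is simply whatever is left over.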

For "last four in" teams, I approximated their power level by finding the average Power Score of teams ranked 65-75 in AdjEM by Kenpom, and then plugged that into the 64 team bracket.

To simulate a game between two teams, I used a random number generator to spit out a random percent. If the percentage is lower than the expected win percentage of team A, then team A is considered to have won the game. If it's higher than the expected win percentage of team A, then team B is considered to have won. In the Virginia-Michigan example, if the random number generator spits out anything between 0% and 79.9%, then the simulation credits UVA with the win, but if it spits out a number over 79.9%, then Michigan gets credit for the win.
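Here's a rough sketch of that coin flip, using Python's standard `random` module. The `simulate_game` helper is a hypothetical name, and the seed is there only so the example gives the same answer every time it runs.

```python
import random


def simulate_game(team_a, team_b, p_a_wins, rng):
    """Return the winner of one simulated game."""
    # rng.random() is uniform on [0, 1); a draw below p_a_wins is a win for team A.
    return team_a if rng.random() < p_a_wins else team_b


rng = random.Random(2018)  # seeded only to make the example repeatable
games = 100_000
uva_wins = sum(
    simulate_game("Virginia", "Michigan", 0.799, rng) == "Virginia"
    for _ in range(games)
)
uva_share = uva_wins / games
print(f"Virginia won {uva_share:.2%} of {games:,} simulated games")
```

Over enough games, Virginia's share of wins settles close to the 79.9% it was given.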

I scraped this year's bracket and then ran a simulation for each first-round game. Then, I took the winners from each of those games and pitted them against their appropriate opponents, and ran another simulation, and repeated until I had simulated the entire tournament.

Then I did that 99,999 more times.
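Putting the last two steps together, a whole-tournament simulation might look something like the sketch below. It is not the actual code from the repo: the four-team bracket and its pairings are invented to keep the example small, the function names are hypothetical, and it assumes the bracket list is ordered so that adjacent teams play each other and has a power-of-two length. The Power Scores are taken from the top-10 table.

```python
import random
from collections import Counter


def win_probability(score_a, score_b, exponent=10.25):
    """Pythagorean Expectation that team A beats team B."""
    a, b = score_a ** exponent, score_b ** exponent
    return a / (a + b)


def simulate_tournament(bracket, scores, rng):
    """Play one bracket; return the list of winners from each round."""
    alive = list(bracket)
    winners_by_round = []
    while len(alive) > 1:
        winners = []
        # Adjacent entries play each other: (0, 1), (2, 3), ...
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_probability(scores[a], scores[b])
            winners.append(a if rng.random() < p else b)
        winners_by_round.append(winners)
        alive = winners
    return winners_by_round


def round_odds(bracket, scores, n_sims, seed=0):
    """Share of simulations in which each team survives each round."""
    rng = random.Random(seed)
    n_rounds = len(bracket).bit_length() - 1  # 4 teams -> 2 rounds
    tallies = [Counter() for _ in range(n_rounds)]
    for _ in range(n_sims):
        results = simulate_tournament(bracket, scores, rng)
        for rnd, winners in enumerate(results):
            tallies[rnd].update(winners)
    return [{t: tallies[r][t] / n_sims for t in bracket} for r in range(n_rounds)]


# Toy bracket with invented pairings (NOT the real 2018 bracket).
scores = {"Virginia": 0.866, "Michigan": 0.757, "Villanova": 0.858, "Kansas": 0.758}
bracket = ["Virginia", "Michigan", "Villanova", "Kansas"]
odds = round_odds(bracket, scores, n_sims=20_000)
for rnd, table in enumerate(odds, start=1):
    print(f"Round {rnd}:", {t: f"{p:.1%}" for t, p in table.items()})
```

Each round's tally counts how often a team is still standing afterward, which is the same shape as the round-by-round tables in this post.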

The result? Masses of dead digital basketball players, killed from the exhaustion of being forced to play several million basketball games in the span of an hour, and round by round probabilities for each team! The results are as follows:

Odds to reach the round of 32:



School                      Round of 32
Virginia (1)                99.79%
North Carolina (2)          99.31%
Duke (2)                    99.29%
Purdue (2)                  99.27%
Tennessee (3)               98.00%
Kansas (1)                  97.95%
Michigan St. (3)            97.49%
Cincinnati (2)              97.43%
Villanova (1)               97.06%
Auburn (4)                  96.10%
Texas Tech (3)              96.05%
Gonzaga (4)                 94.77%
Wichita St. (4)             94.68%
Michigan (3)                90.75%
Xavier (1)                  88.16%
Ohio St. (5)                88.10%
Arizona (4)                 84.77%
West Virginia (5)           84.21%
TCU (6)                     82.12%
Florida (6)                 81.35%
Clemson (5)                 77.55%
Houston (6)                 76.28%
Kentucky (5)                73.05%
Texas A&M (7)               72.49%
Virginia Tech (8)           62.54%
Nevada (7)                  62.26%
Creighton (8)               61.38%
Butler (10)                 60.33%
Seton Hall (8)              60.07%
Miami FL (6)                52.57%
Florida St. (9)             51.42%
Last In (11/16)             51.30%
Rhode Island (7)            50.06%
Oklahoma (10)               49.94%
Missouri (8)                48.58%
Loyola Chicago (11)         47.43%
North Carolina St. (9)      39.93%
Arkansas (7)                39.67%
Kansas St. (8)              38.62%
Texas (10)                  37.74%
Alabama (9)                 37.46%
Providence (10)             27.51%
Davidson (12)               26.95%
San Diego St. (11)          23.72%
New Mexico St. (12)         22.45%
Murray St. (12)             15.79%
Buffalo (13)                15.23%
South Dakota St. (12)       11.90%
Montana (14)                9.25%
Marshall (13)               5.32%
UNC Greensboro (13)         5.23%
Stephen F. Austin (14)      3.95%
College of Charleston (13)  3.90%
Georgia St. (15)            2.57%
Bucknell (14)               2.51%
Penn (16)                   2.05%
Wright St. (14)             2.00%
Cal St. Fullerton (15)      0.73%
Iona (15)                   0.71%
Lipscomb (15)               0.69%
UMBC (16)                   0.21%

This is fairly straightforward - it's essentially the odds of one team winning while facing the other. Take Providence vs. Texas A&M for example. Texas A&M's Pythagorean expectation says that they should beat Providence 72.48% of the time, and the simulated results of that game played over and over again have the Aggies winning 72.49% of the time - virtually identical to the prediction.
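How close should we expect those two numbers to be? A quick back-of-the-envelope check uses the standard error of a proportion, sqrt(p(1 - p) / n). This is a textbook formula I'm bringing in for illustration, not something from the original simulation code, and the variable names are my own.

```python
import math

expected = 0.7248    # Texas A&M's Pythagorean expectation vs. Providence
simulated = 0.7249   # share of simulations Texas A&M actually won
n_sims = 100_000

# Standard error of a simulated win share after n_sims independent games.
std_error = math.sqrt(expected * (1 - expected) / n_sims)
gap = abs(simulated - expected)

print(f"standard error: {std_error:.4%}")
print(f"observed gap:   {gap:.4%}")
```

The standard error works out to roughly 0.14 percentage points, so a gap of 0.01 points is comfortably inside the noise you'd expect from 100,000 runs.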

Where could we see some first-round upsets? Xavier seems to be fairly weak for a #1 seed, and for what it's worth, they rank only 15th in the country in Power Score, but they're also up against either Texas Southern (#247) or North Carolina Central (#309 in the nation). Our approximated Last-In value severely overrates these teams - while Xavier is undoubtedly the weakest #1 seed, it seems like their struggles would probably not come against schools in the lower-half of skill nationwide.

There is some bona fide upset material here though! Miami (6) won only 52.57% of their games against Loyola (11), and Butler (10) is actually favored over Arkansas (7), winning about 60% of the time. Other than that, it's business as usual - good teams beat bad teams most of the time. It's just the way it happens.

Onto the Sweet Sixteen, where things start getting dicey.

Odds to reach the Sweet Sixteen


School                      Sweet Sixteen
Virginia (1)                92.70%
Duke (2)                    91.78%
Villanova (1)               90.33%
North Carolina (2)          83.07%
Purdue (2)                  83.05%
Cincinnati (2)              81.51%
Michigan St. (3)            78.68%
Tennessee (3)               75.09%
Kansas (1)                  74.91%
Xavier (1)                  65.75%
Texas Tech (3)              65.40%
Gonzaga (4)                 64.02%
Michigan (3)                60.21%
Auburn (4)                  55.19%
West Virginia (5)           50.16%
Wichita St. (4)             45.73%
Arizona (4)                 44.15%
Kentucky (5)                43.43%
Clemson (5)                 39.16%
Ohio St. (5)                34.17%
Houston (6)                 33.15%
Florida (6)                 31.65%
TCU (6)                     19.71%
Seton Hall (8)              16.83%
Florida St. (9)             16.11%
Missouri (8)                14.73%
Texas A&M (7)               14.35%
Miami FL (6)                13.52%
Nevada (7)                  12.89%
Butler (10)                 11.53%
Loyola Chicago (11)         11.28%
Davidson (12)               9.67%
North Carolina St. (9)      8.13%
Last In (11/16)             8.11%
Virginia Tech (8)           6.28%
New Mexico St. (12)         5.43%
Arkansas (7)                5.39%
Texas (10)                  5.29%
Creighton (8)               5.24%
San Diego St. (11)          4.95%
Oklahoma (10)               4.12%
Rhode Island (7)            4.04%
Murray St. (12)             3.79%
Buffalo (13)                2.75%
Providence (10)             2.56%
Alabama (9)                 2.53%
Kansas St. (8)              2.07%
Montana (14)                1.68%
South Dakota St. (12)       1.15%
UNC Greensboro (13)         0.66%
Stephen F. Austin (14)      0.41%
Marshall (13)               0.32%
Georgia St. (15)            0.30%
Bucknell (14)               0.30%
College of Charleston (13)  0.22%
Penn (16)                   0.13%
Wright St. (14)             0.11%
Iona (15)                   0.05%
Cal St. Fullerton (15)      0.03%
Lipscomb (15)               0.03%
UMBC (16)                   0.00%

As one might expect, the top 16 teams are heavily favored to make the Sweet Sixteen, especially compared to the rest of the field. The worst 4th seed, Arizona, still made the Sweet Sixteen in 44% of the simulations. This isn't to say that the field will be solely the top 16 teams - only that they showed up the most often.

Note the drop-off in percentage appearance, however, for some of the favored upset teams like Butler and Loyola - both teams make it to the Sweet Sixteen only about 11% of the time! Why? If Butler wins, they have the pleasure of running into Purdue (2) on the way to the Sweet Sixteen 99% of the time, and if Loyola wins, they usually run into Tennessee. Ouch.

Did someone say "Elite Eight"? It's Elite Eight time.

Odds to reach Elite Eight


School                      Elite Eight
Virginia (1)                81.78%
Villanova (1)               76.24%
Purdue (2)                  60.35%
Duke (2)                    60.28%
Cincinnati (2)              60.22%
North Carolina (2)          52.72%
Kansas (1)                  46.82%
Gonzaga (4)                 44.47%
Michigan St. (3)            34.31%
Xavier (1)                  28.81%
Michigan (3)                28.76%
Tennessee (3)               28.52%
Auburn (4)                  25.71%
Texas Tech (3)              24.89%
Ohio St. (5)                19.40%
Clemson (5)                 17.60%
Houston (6)                 12.67%
West Virginia (5)           11.87%
Wichita St. (4)             8.86%
Florida (6)                 8.39%
Kentucky (5)                7.57%
Arizona (4)                 6.82%
Seton Hall (8)              6.32%
Nevada (7)                  5.29%
Butler (10)                 4.56%
Texas A&M (7)               4.48%
TCU (6)                     4.02%
Florida St. (9)             3.59%
Missouri (8)                3.11%
Miami FL (6)                2.44%
North Carolina St. (9)      2.43%
Creighton (8)               2.18%
Virginia Tech (8)           2.05%
Loyola Chicago (11)         1.89%
Texas (10)                  1.62%
Arkansas (7)                1.57%
New Mexico St. (12)         1.12%
Davidson (12)               0.87%
San Diego St. (11)          0.82%
Last In (11/16)             0.77%
Kansas St. (8)              0.68%
Rhode Island (7)            0.65%
Oklahoma (10)               0.64%
Alabama (9)                 0.59%
Providence (10)             0.39%
Murray St. (12)             0.26%
South Dakota St. (12)       0.19%
Montana (14)                0.17%
Buffalo (13)                0.11%
UNC Greensboro (13)         0.09%
Stephen F. Austin (14)      0.02%
Georgia St. (15)            0.02%
Bucknell (14)               0.01%
College of Charleston (13)  0.01%
Marshall (13)               0.01%
Penn (16)                   0.01%
Wright St. (14)             0.00%
Iona (15)                   0.00%
Cal St. Fullerton (15)      0.00%

Failed to reach Elite Eight in simulations: Lipscomb (15), UMBC (16)

Our first casualties! In all 100,000 simulations, neither Lipscomb nor UMBC reached the Elite Eight at any point. Some, like Cal State Fullerton, Iona, and Wright State, made it only once. And then there's Virginia and Villanova, each reaching the Elite Eight in more than 75% of simulations (if you don't have either team in your bracket, you need to take a long, hard look at yourself in the mirror).

I find it surprising (and at the same time, unsurprising) that Kansas has relatively poor odds of reaching the Elite Eight compared to their fellow #1 seeds - it's the fate of sharing a bracket with Duke.

It's the Final (Four) Countdown! Ba-da-da-da, ba-da-da-da-da, ba-da-da-da...

Odds to reach Final Four



School                      Final Four
Virginia (1)                61.51%
Villanova (1)               56.56%
Duke (2)                    46.30%
North Carolina (2)          31.78%
Purdue (2)                  25.65%
Gonzaga (4)                 24.39%
Cincinnati (2)              24.09%
Michigan St. (3)            23.67%
Kansas (1)                  15.88%
Michigan (3)                15.27%
Xavier (1)                  12.34%
Ohio St. (5)                7.91%
Tennessee (3)               7.38%
Texas Tech (3)              6.66%
Auburn (4)                  6.54%
Houston (6)                 5.43%
West Virginia (5)           4.84%
Clemson (5)                 4.28%
Wichita St. (4)             3.12%
Kentucky (5)                2.60%
Arizona (4)                 2.13%
TCU (6)                     1.56%
Florida (6)                 1.55%
Texas A&M (7)               1.26%
Seton Hall (8)              1.04%
Nevada (7)                  0.87%
Butler (10)                 0.81%
Florida St. (9)             0.73%
Missouri (8)                0.62%
Creighton (8)               0.51%
Virginia Tech (8)           0.48%
North Carolina St. (9)      0.28%
Miami FL (6)                0.28%
Loyola Chicago (11)         0.20%
Texas (10)                  0.18%
Arkansas (7)                0.18%
Rhode Island (7)            0.17%
Oklahoma (10)               0.16%
San Diego St. (11)          0.15%
Davidson (12)               0.14%
Kansas St. (8)              0.10%
New Mexico St. (12)         0.10%
Alabama (9)                 0.09%
Last In (11/16)             0.08%
Providence (10)             0.05%
Murray St. (12)             0.03%
South Dakota St. (12)       0.03%
Montana (14)                0.02%
Buffalo (13)                0.01%
UNC Greensboro (13)         0.01%

Failed to reach Final Four in simulations: Lipscomb (15), UMBC (16), Cal State Fullerton (15), Wright State (14), Penn (16), Marshall (13), College of Charleston (13), Bucknell (14), Georgia State (15), Stephen F Austin (14), Iona (15)

Almost all of our 14+ seeds have fallen! And in true #GoACC fashion, three of our top four teams are ACC teams. There's a considerable gap between the top three teams (Virginia, Villanova, Duke) and the fourth best team (UNC). In bracket building, it looks like the Final Four will most likely consist of our top three, plus a surprise mystery team (BAW GAWD, THAT'S RHODE ISLAND'S MUSIC!).

We're almost there! Which teams reach the championship game?

Odds to reach Championship Game


School                      Championship Game
Virginia (1)                48.43%
Villanova (1)               38.35%
Duke (2)                    24.71%
Cincinnati (2)              15.15%
Purdue (2)                  13.04%
North Carolina (2)          11.35%
Michigan St. (3)            10.12%
Gonzaga (4)                 8.38%
Kansas (1)                  5.22%
Michigan (3)                4.57%
Tennessee (3)               3.27%
Xavier (1)                  3.13%
Texas Tech (3)              2.27%
Ohio St. (5)                1.90%
Auburn (4)                  1.63%
West Virginia (5)           1.60%
Houston (6)                 1.22%
Clemson (5)                 1.01%
Kentucky (5)                0.97%
Wichita St. (4)             0.84%
Arizona (4)                 0.70%
Florida (6)                 0.37%
TCU (6)                     0.32%
Nevada (7)                  0.26%
Butler (10)                 0.17%
Texas A&M (7)               0.17%
Seton Hall (8)              0.15%
Creighton (8)               0.13%
Virginia Tech (8)           0.09%
Missouri (8)                0.09%
Florida St. (9)             0.08%
Miami FL (6)                0.06%
Texas (10)                  0.04%
Loyola Chicago (11)         0.03%
North Carolina St. (9)      0.03%
Arkansas (7)                0.03%
Davidson (12)               0.03%
Rhode Island (7)            0.02%
Oklahoma (10)               0.02%
Kansas St. (8)              0.02%
San Diego St. (11)          0.01%
New Mexico St. (12)         0.01%
Alabama (9)                 0.01%
Last In (11/16)             0.01%
Providence (10)             0.00%
Murray St. (12)             0.00%

Failed to reach Championship Game in simulations: Lipscomb (15), UMBC (16), Cal State Fullerton (15), Wright State (14), Penn (16), Marshall (13), College of Charleston (13), Bucknell (14), Georgia State (15), Stephen F Austin (14), Iona (15), UNC Greensboro (13), Buffalo (13), Montana (14), South Dakota State (12)

I hope you have UVA or Villanova in the title game - there's a 68% chance to see either team in the title game according to our simulations.
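That 68% figure can be roughly reproduced from the table above. The sketch below assumes UVA and Villanova sit in opposite halves of the bracket, so that their runs to the title game don't depend on each other. That independence is my assumption for the arithmetic, not something pulled from the simulation output.

```python
p_uva = 0.4843   # Virginia's odds of reaching the championship game
p_nova = 0.3835  # Villanova's odds of reaching the championship game

# P(at least one) = P(A) + P(B) - P(both), with P(both) = P(A) * P(B)
# under the independence assumption stated above.
p_both = p_uva * p_nova
p_at_least_one = p_uva + p_nova - p_both

print(f"Both in the title game:         {p_both:.1%}")
print(f"At least one in the title game: {p_at_least_one:.1%}")
```

Simply adding the two percentages would overshoot, because it double-counts the brackets where both teams make it.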

Alright, here's what you've been waiting for: how frequently teams have won our NCAA championships!

Odds to win Championship Game

School                      Champions
Virginia (1)                30.38%
Villanova (1)               23.48%
Duke (2)                    13.23%
Cincinnati (2)              6.79%
Purdue (2)                  5.85%
North Carolina (2)          4.36%
Michigan St. (3)            4.58%
Gonzaga (4)                 3.13%
Kansas (1)                  1.68%
Michigan (3)                1.41%
Tennessee (3)               0.90%
Xavier (1)                  0.83%
Texas Tech (3)              0.64%
Ohio St. (5)                0.48%
Auburn (4)                  0.41%
West Virginia (5)           0.44%
Houston (6)                 0.29%
Clemson (5)                 0.25%
Kentucky (5)                0.20%
Wichita St. (4)             0.17%
Arizona (4)                 0.16%
Florida (6)                 0.08%
TCU (6)                     0.07%
Nevada (7)                  0.04%
Butler (10)                 0.02%
Texas A&M (7)               0.03%
Seton Hall (8)              0.02%
Creighton (8)               0.01%
Virginia Tech (8)           0.01%
Missouri (8)                0.01%
Florida St. (9)             0.01%
Miami FL (6)                0.01%
Texas (10)                  0.01%
Loyola Chicago (11)         0.00%
North Carolina St. (9)      0.00%
Arkansas (7)                0.00%
Davidson (12)               0.00%
Rhode Island (7)            0.01%
Oklahoma (10)               0.00%
Kansas St. (8)              0.00%
San Diego St. (11)          0.00%
New Mexico St. (12)         0.00%
Alabama (9)                 0.00%
Last In (11/16)             0.00%

Failed to win Championship Game in simulations: Lipscomb (15), UMBC (16), Cal State Fullerton (15), Wright State (14), Penn (16), Marshall (13), College of Charleston (13), Bucknell (14), Georgia State (15), Stephen F Austin (14), Iona (15), UNC Greensboro (13), Buffalo (13), Montana (14), South Dakota State (12), Providence (10)

Virginia, Villanova, and Duke have double-digit championship percentages, and everyone else is playing for hope. All I can do is pray for someone to knock off Duke and save us from misery.

If you're interested in the code and raw figures I used to calculate this, it's up on GitHub here! Have fun with it, and as always, to HELL with georgia!
