The big question going into the Selection Sunday event was what combination of Brown, Georgetown, Princeton, and Ohio State would make the NCAA Tournament field as an at-large selection. There was a ton of chatter on the issue, and the stock response I gave to everyone was, "It's going to be close." There was no other reasonable answer: The delineations between the quartet of teams were minuscule, and the eventual decisions on the four teams' candidacies turned on how the Selection Committee applied the selection criteria it was given.
The selection criteria that the NCAA has developed are a good enough way to construct a tournament field, but they aren't an exceptional method for determining the relative strength of teams. The RPI remains a highly incomplete mechanism for ranking teams, especially because the measure does not consider margin of victory (the RPI's lone inputs are wins and losses, and study after study has determined that margin of victory is an important factor in valuing a team's strength). The strength of schedule measure that the NCAA utilizes is built from the RPI, a shaky foundation in and of itself. The NCAA's quality wins and losses analysis also turns on the RPI, and overall record is inherently misleading because it does not consider strength of schedule. The Selection Committee is also charged with considering head-to-head results, but a head-to-head outcome only captures a single moment in time; it says nothing about whether the result was an upset or how often Team A is likely to topple Team B.
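To make the RPI's blind spot concrete, here's a minimal sketch of the standard RPI formula. The 25/50/25 weights on winning percentage, opponents' winning percentage, and opponents' opponents' winning percentage are the commonly cited values (the NCAA's sport-specific weights and adjustments may differ), and the game results below are made up for illustration. The point is structural: margin of victory never enters the calculation.

```python
# Minimal RPI sketch with hypothetical results. Standard weights assumed:
# RPI = 0.25 * WP + 0.50 * OWP + 0.25 * OOWP
# Only wins and losses matter -- final scores appear nowhere in the formula.

# games: list of (winner, loser) pairs; margins are irrelevant to the RPI
games = [
    ("Brown", "Princeton"),
    ("Princeton", "Georgetown"),
    ("Georgetown", "Ohio State"),
    ("Ohio State", "Brown"),
    ("Brown", "Georgetown"),
    ("Princeton", "Ohio State"),
]

teams = {t for g in games for t in g}

def win_pct(team, exclude_opponent=None):
    """Winning percentage, optionally ignoring games against one opponent."""
    wins = losses = 0
    for w, l in games:
        if exclude_opponent is not None and exclude_opponent in (w, l):
            continue
        if w == team:
            wins += 1
        elif l == team:
            losses += 1
    return wins / (wins + losses) if wins + losses else 0.0

def opponents(team):
    """Every opponent faced, counted once per game played."""
    return [w if l == team else l for w, l in games if team in (w, l)]

def rpi(team):
    wp = win_pct(team)
    opps = opponents(team)
    # OWP: opponents' winning percentage, excluding their games against this team
    owp = sum(win_pct(o, exclude_opponent=team) for o in opps) / len(opps)
    # OOWP: opponents' opponents' winning percentage (simplified here)
    oowp = sum(win_pct(o2) for o in opps for o2 in opponents(o)) / \
        sum(len(opponents(o)) for o in opps)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

for t in sorted(teams, key=rpi, reverse=True):
    print(f"{t}: {rpi(t):.4f}")
```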
Again: These are all vehicles to populate a bracket, but they're not necessarily methods to determine the relative strength of teams. This became important in the bubble conversation this past weekend, with the Bears, Hoyas, Tigers, and Buckeyes all carrying comparable criteria-focused resumes. Assuming that the selection criteria were jettisoned into space for all of eternity and the Selection Committee was permitted to adopt another method to pick two of these four teams to fill out the rest of the championship bracket, which duo was the strongest of the foursome?
To address that issue, I fired up the Massey Ratings prediction machine. Ken Massey runs a well-respected system for analyzing the performance of teams in a cohort: His ratings are built from the score, venue, and date of games, and a Bayesian win-loss correction is applied to the resulting power ratings (in other words, his model makes sense). Assuming that Brown, Georgetown, Princeton, and Ohio State were to meet on a neutral field, the 12 possible matchup scenarios (each of the six pairings viewed from both sides) yield the following win probability in each hypothetical meeting:
| | BROWN | GEORGETOWN | PRINCETON | OHIO STATE |
| --- | --- | --- | --- | --- |
| BROWN | | 54% | 50% | 56% |
| GEORGETOWN | 46% | | 45% | 51% |
| PRINCETON | 50% | 55% | | 56% |
| OHIO STATE | 44% | 49% | 44% | |
How to read this: Focus first on the rows, then the columns. For example, Brown's win probability against Georgetown is 54 percent; Georgetown's win probability against Brown is 46 percent.
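Massey doesn't spell out the exact function that turns his ratings into head-to-head probabilities, but the general shape of the conversion is well understood: the bigger the rating gap, the further the probability drifts from 50 percent. Here's a minimal, purely illustrative sketch -- the ratings and the scale parameter below are made up, not Massey's numbers -- showing how a logistic transform of a rating difference produces a probability grid like the one above.

```python
import math

# Hypothetical power ratings on an arbitrary scale -- NOT Massey's actual values.
ratings = {
    "Brown": 1.15,
    "Georgetown": 1.00,
    "Princeton": 1.15,
    "Ohio State": 0.92,
}

SCALE = 1.0  # assumed steepness of the rating-to-probability curve

def win_probability(team_a, team_b):
    """Logistic transform of the rating gap: a zero gap is a 50/50 game."""
    diff = ratings[team_a] - ratings[team_b]
    return 1.0 / (1.0 + math.exp(-diff / SCALE))

# Print a grid in the same row-then-column format as the table above
teams = list(ratings)
for a in teams:
    row = [f"{win_probability(a, b):.0%}" if a != b else " . " for b in teams]
    print(f"{a:<11}", " | ".join(row))
```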
That's absolutely insane: Every single one of the matchups is a toss-up game, with Brown-Ohio State and Princeton-Ohio State only on the fringe of being toss-up scenarios. Massey's model is essentially saying that these are all evenly matched teams, with Brown and Princeton holding a slight edge over Georgetown and Ohio State. In a year in which there were highly fractured debates over the resumes of the four schools with little difference in their profiles, a science-infused analysis confirms that these teams were frighteningly similar in the context of their relative strengths. Asking "If these teams played 100 times, how many times would Team A win?" doesn't provide rubber-stamped clarity to the situation. These teams were stupidly close to each other, although it is interesting that the Selection Committee went with Brown and Ohio State instead of Brown and Princeton: In a log5 tournament where the teams were seeded based on their rank in the Massey Ratings, the Bears and Tigers would come out with more victories than the Hoyas and Buckeyes in Massey's model (a quick sketch of that scenario follows the table):
| RANK | TEAM | WIN PROBABILITY |
| --- | --- | --- |
| 1. | Princeton | 29.29% |
| 2. | Brown | 28.43% |
| 3. | Georgetown | 21.91% |
| 4. | Ohio State | 20.37% |
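For the curious, here's a rough sketch of how you get numbers in that neighborhood: take the pairwise probabilities from the first table, seed the teams one through four by Massey rank, pair 1-vs-4 and 2-vs-3, and sum each team's probability of winning both of its games. The bracket pairing is my assumption, but under it the sketch reproduces the percentages in the table above.

```python
# Probability each team wins a hypothetical four-team bracket (1v4 and 2v3,
# winners meet), using the pairwise win probabilities from the first table.

# p[a][b] = probability that team a beats team b
p = {
    "Princeton":  {"Brown": 0.50, "Georgetown": 0.55, "Ohio State": 0.56},
    "Brown":      {"Princeton": 0.50, "Georgetown": 0.54, "Ohio State": 0.56},
    "Georgetown": {"Princeton": 0.45, "Brown": 0.46, "Ohio State": 0.51},
    "Ohio State": {"Princeton": 0.44, "Brown": 0.44, "Georgetown": 0.49},
}

# Seeds by Massey rank: 1 Princeton, 2 Brown, 3 Georgetown, 4 Ohio State
semis = [("Princeton", "Ohio State"), ("Brown", "Georgetown")]

def tournament_odds():
    odds = {}
    (a1, b1), (a2, b2) = semis
    matchups = [(a1, b1, (a2, b2)), (b1, a1, (a2, b2)),
                (a2, b2, (a1, b1)), (b2, a2, (a1, b1))]
    for team, opponent, other_pair in matchups:
        reach_final = p[team][opponent]
        x, y = other_pair
        # Condition on which team emerges from the other semifinal
        win_final = p[x][y] * p[team][x] + p[y][x] * p[team][y]
        odds[team] = reach_final * win_final
    return odds

for team, prob in sorted(tournament_odds().items(), key=lambda kv: -kv[1]):
    print(f"{team:<11} {prob:.2%}")
```

Running that produces roughly 29.29% for Princeton, 28.43% for Brown, 21.91% for Georgetown, and 20.37% for Ohio State, which lines up with the table.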
The Selection Committee was ultimately choosing between similarly situated teams, none of which was decisively stronger than the others. Princeton probably has a beef that it was excluded from the field almost exclusively because of the Tigers' head-to-head loss to the Bears -- Princeton is a 50-50 opponent against Brown and rates slightly more powerful than Georgetown and Ohio State -- but it's not like the Tigers heavily distinguished themselves from the Hoyas and Buckeyes: All four teams would win a four-team tournament between the schools around a quarter of the time each. This makes my brain hurt.
The takeaway here, I think, is pretty straightforward: While the selection criteria adopted for composing the bracket aren't bulletproof, it's not like the Selection Committee would have had an easier time completing the field using advanced metrics that focus on team strength and performance. This was still an impossible decision to make; Princeton suffered a bit from the way the selection criteria were applied, and Ohio State was the beneficiary of some questionable mandatory guidance.