College football playoffs, statistical modeling and smarter marketing decisions

There’s a high probability that if you’re reading this, you’re intrigued by the prowess of statistical modeling, you’re a college football fan, or maybe you just want to know how statistical modeling can create a competitive advantage that inspires strategic business decisions. If it’s the latter, stay with me. This is worth the read. I promise.

There’s power in understanding consumer spend patterns for your business. From a marketing perspective, you can reduce spend and implement a hyper-focused marketing campaign that targets consumers during periods when they have more discretionary income and are more likely to purchase a product. And, spoiler alert: I ran a model for one of our clients that identified its consumers’ spend pattern, and we created a data-driven campaign that worked.

So, what does this have to do with college football, you ask? Well, I used the exact same statistical model to determine the teams selected for the College Football Playoff (CFP) as the one I used to determine whether sales performance increases (and to what extent) during weeks when consumers are paid compared to weeks when they’re not. Read on.

I set out with a simple goal: to accurately predict the teams that would make the CFP. The nine-month journey began with wading through four years (2014-2017) of university- and team-specific data for all 129 Division I college football teams, and I came out with a powerful algorithm that helps us understand the CFP at new depths. (Yes, I’m a nerd, and I know how to have fun in my free time.)

The purpose of this research is to establish a statistical model that:

  • Identifies the CFP selection committee’s evaluation criteria and the significance (i.e., coefficient weights) given to each benchmark
  • Determines the four teams and respective rankings that should have made the CFP

But there are a plethora of reasons and approaches to create statistical models to understand and improve your business. More on that later.

The best place to start an analysis of this magnitude is by establishing a hypothesis, which focuses the research and subsequent analysis. The hypothesis of my CFP analysis was that the selection committee neither selected the correct teams nor ranked them correctly. If either part of that hypothesis proved entirely or partially true, it would mean team matchups are subject to influence, which may affect which team wins the national championship.

Now that we have a hypothesis in our back pocket, we can select a statistical model that works best for the data we have and for our theory. Unless the analysis is purely novel, we have permission to stand on the shoulders of giants and use previous research as a guide in the model-selection process.

Although my research was primarily novel (the CFP was only established in 2014), there’s similar research that predicted the 68 college basketball teams in the March Madness NCAA men’s basketball tournament. The NCAA research applied a Probit statistical model (i.e., a regression for binary dependent variables), and upon investigation, I found that the same model would be most effective for proving out my CFP hypothesis.
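To make the idea concrete: a Probit model pushes a weighted sum of predictors through the standard normal CDF to produce a probability that the binary outcome equals 1 (here, "makes the playoff"). Below is a minimal sketch in plain Python; the coefficients and feature values are entirely made up for illustration and are not the fitted model's weights.

```python
from math import erf, sqrt

def probit_probability(features, coefficients, intercept):
    """P(y = 1 | x) = Phi(intercept + sum(b_i * x_i)),
    where Phi is the standard normal CDF."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # Phi(z) via the error function

# Hypothetical weights for illustration only, e.g. conference champ (0/1),
# strength of schedule, and size of largest loss.
coefs = [1.2, 0.8, -0.5]
p = probit_probability([1.0, 0.7, 0.3], coefs, intercept=-1.0)
```

Because the CDF is monotonic, raising a positively weighted feature (say, strength of schedule) always raises the predicted playoff probability.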

Once I selected the model, I moved on to gathering data to assess the CFP selection committee’s evaluation criteria and the significance given to each principle within it.

The good news is, there wasn’t much guesswork in determining which data attributes to gather: the CFP selection committee publishes its information on the web. I also gathered and analyzed publicly available information put out by the CFP selection committee and sports analysts. And I gathered and evaluated school- and team-specific information in order to detect any voting bias.

Once I gathered and verified all the data from all my sources, it was time to harmonize it. This is where things get technical, but don’t worry, I’ll break it all down.

  • Using Callahan’s Intelligence Platform statistical software, I ran a multivariate Probit regression analysis
  • After performing diagnostics to stabilize the model, I removed variables one at a time until only the statistically significant ones remained
  • The end result was a highly predictive model/algorithm (McFadden’s R-squared of ~0.86)
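The fitting itself happened inside Callahan's proprietary software, but the mechanics behind the steps above can be sketched with an off-the-shelf maximum-likelihood probit fit. This is an illustrative sketch on synthetic data (assuming NumPy and SciPy, not the actual CFP dataset); it shows where McFadden's R-squared comes from: one minus the ratio of the fitted model's log-likelihood to the intercept-only model's.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_neg_loglik(beta, X, y):
    """Negative log-likelihood of a probit model: y ~ Bernoulli(Phi(X @ beta))."""
    p = np.clip(norm.cdf(X @ beta), 1e-9, 1 - 1e-9)  # clip to avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_probit(X, y):
    """Fit a probit by maximum likelihood; return coefficients and McFadden's R-squared."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    full = minimize(probit_neg_loglik, np.zeros(X1.shape[1]), args=(X1, y))
    null = minimize(probit_neg_loglik, np.zeros(1), args=(np.ones((len(y), 1)), y))
    # full.fun and null.fun are negative log-likelihoods, so the ratio is LL_full / LL_null
    mcfadden_r2 = 1.0 - full.fun / null.fun
    return full.x, mcfadden_r2

# Synthetic demo data (NOT the CFP dataset): one predictor that truly drives the outcome
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = (1.5 * x + rng.normal(size=300) > 0).astype(float)
beta, r2 = fit_probit(x.reshape(-1, 1), y)
```

In the real analysis, the elimination step would repeat a fit like this, dropping the least significant variable each round until everything left clears the significance threshold.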

The results of the analysis indicated that winning a conference championship, strength of schedule, point value of the largest loss, average point differential throughout the season and a school’s rights and licensing revenue statistically influence which teams make the college football playoffs. This is the first hint of bias in the selection committee’s voting process.

The results also demonstrate that the playoff committee does, in fact, give consideration, whether direct or indirect, to the profitability of having a particular school included in the college football playoffs. Schools with stronger brand positions are more likely to produce larger merchandise sales and game viewership than schools with weaker ones. This effect is magnified in the CFP, so the final team matchup is likely to include two teams with established brand positions.

In order to test the model’s accuracy and to take a deeper dive through reperformance, I pushed all university- and team-related information through the model for each year.

This is where things get interesting. But, don’t worry, I’ll break it all down for you.

  • For 2014-2017, the statistical model agreed with 14 of the 16 teams (87.5%) the playoff committee selected, but with only 3 of the 16 team rankings (18.8%).
  • In 2014, the statistical model predicted that Texas Christian University (TCU) should’ve made the playoffs, and that Oregon shouldn’t have.
  • In 2017, the statistical model predicted that the University of Central Florida (UCF) should’ve made the playoffs, and that Alabama shouldn’t have.

All this to say, the hypothesis was proven correct. And, if you were paying attention to the CFP landscape in 2014 and 2017, you’ll remember that these two cases were heavily debated.
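Mechanically, the reperformance step boils down to scoring every team in a season and taking the four highest model probabilities as the predicted field. Here is a sketch of that final ranking step; the probabilities below are fabricated for illustration (they are not the model's actual 2014 outputs), chosen only to mirror the TCU/Oregon debate.

```python
def predicted_playoff_field(team_probs, n=4):
    """Rank teams by model probability and return the top n as the predicted CFP field."""
    ranked = sorted(team_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [team for team, _ in ranked[:n]]

# Fabricated example probabilities, for illustration only
probs_2014 = {
    "Alabama": 0.97,
    "Florida State": 0.88,
    "Ohio State": 0.83,
    "TCU": 0.74,
    "Baylor": 0.60,
    "Oregon": 0.55,
}
field = predicted_playoff_field(probs_2014)
```

With numbers like these, TCU lands in the predicted field and Oregon falls out, which is exactly the kind of disagreement with the committee that the analysis surfaced.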

So, stepping back into my day job as a business analyst at a marketing agency, our analytics group was tasked with understanding consumer spend patterns for one of our clients.

As a team, we brainstormed different ways to approach the challenge (e.g., creating test groups and monitoring them over time), and we determined the most efficient and cost-effective way would be to perform a statistical analysis by tagging paydays (based on federal paydays) up to and through the weekend, with consideration for bank holidays and when funds become available.

Then, we used the client’s sales data to tag pay periods as a binary variable (1, 0), and used Callahan’s Intelligence Platform statistical software to perform a Probit regression analysis.
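The tagging step can be sketched as a simple calendar function. Everything here is assumed for illustration: the anchor payday date, a biweekly federal pay schedule, and week-level granularity; the real work also adjusted for bank holidays and funds availability, which this sketch ignores.

```python
from datetime import date

# Assumed anchor: a known federal payday (illustrative date, not from the client work)
ANCHOR_PAYDAY = date(2024, 1, 5)

def is_pay_week(d):
    """Tag a date 1 if it falls in a biweekly pay week, else 0.

    This 1/0 flag is the binary regressor a Probit model needs."""
    days_since = (d - ANCHOR_PAYDAY).days
    week_index = days_since // 7          # whole weeks since the anchor payday
    return 1 if week_index % 2 == 0 else 0
```

Joining this flag onto each day of sales data gives the (1, 0) pay-period variable described above, ready to regress sales against.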

The results not only proved our hypothesis that customers, on average, spend more on our client’s products immediately after a payday, but also showed how much more they spend. Using this knowledge, we ran a highly targeted marketing campaign during pay weeks to increase sales performance, and then we paused the campaign during non-pay weeks.

We made a smarter, cost-conscious, strategic decision to improve our client’s bottom line, and we got to use a whole bunch of math!