Podcast

A predictive model that forecasts Oscar winners in advance? Inconceivable!

Callahan’s senior business analyst used his spare time once again (football season is over) to construct an algorithm to predict which movies have the highest chance of winning “Best Picture” at the 2019 Oscars. To do this, James built a predictive model that looks for trends and key indicators and, in the end, produces a formula he says is 86 percent accurate.

The same principles can be applied to a business’s data to create a predictive model that forecasts future results and trends, merging traditional sales data with a creative mix of additional data sets. In this episode, James and Eric explain how their Oscars model works and how that same approach could uncover aha moments when applied to business and marketing challenges.

Listen here (Subscribe on iTunes, Stitcher, Google Play, Google Podcasts, Pocket Casts or your favorite podcast service. You can also ask Alexa or Siri to “play the Uncovering Aha! podcast.”):

Welcome to Callahan’s Uncovering Aha! podcast. We talk about a range of topics for marketing decision-makers, with a special focus on how to uncover insights in data to drive brand strategy and inspire creativity. Featuring James Meyerhoffer-Kubalik and Eric Melin.

Eric:
Welcome to Uncovering Aha, the Callahan podcast about business intelligence analytics. Today, you may notice a different voice. I am not Jan-Eric Anderson or Zach Pike. I am Eric Melin, Senior Social Media Community Manager at Callahan. And joining me today for this special episode is our senior business analyst, James Meyerhoffer-Kubalik. Welcome.

James:
Thank you. Thank you for having me.

Eric:
Today we are going to talk about predictive analysis, and this is kind of a jumping-off point from a podcast from a month ago, which everybody can go back and listen to if you’d like, where James put together an algorithm, if you will.

James:
Correct.

Eric:
And again, I am not the business intelligence guy, so I’m going to moderate this discussion and let James talk with authority about it. But he put together an algorithm to predict which teams were going to make it into the college football playoffs. And the reason that you did this, if I’m not mistaken, is to save yourself a certain amount of suffering because you’re a huge Ohio State fan.

James:
Yeah, it hurts at times.

Eric:
And you wanted to know whether they actually had a chance of making it. And what was the conclusion?

James:
They had no chance, based on their loss to Iowa by 29 points.

Eric:
And through this, we’re kind of using these fun one-off things that James is building as a way of talking about the importance of predictive analytics when it comes to your business. Right? So this is one of the things that we do here at Callahan. It’s one of our differentiators, if you will, that we are able to take all the inputs from a company. And then when we have the variables that we’re putting in about what they’re going to do for their business plan going forward, we kind of know what to expect and how to improve it.

James:
And I think the key there is it’s unbiased.

Eric:
Awesome. Well, that brings us to today’s episode, in which we’re going to do another fun one-off algorithm discussion. And this one is about the 2019 Academy Awards, the 91st.

James:
The 91st.

Eric:
Academy Awards. And the reason I’m hosting this episode is because I’m also a film critic. I’m the president of the Kansas City Film Critics Circle. I’ve been doing this for about 15 years, in my spare time. Rotten Tomatoes certified, which is another algorithm discussion, actually, that we can get into later. But James has built an algorithm with which he is trying to predict what movie is going to win Best Picture this year at the 2019 Oscars. And before we get into this year, tell us a little bit about the one that you created a couple years ago.

James:
Okay. Correct. So in 2017, as part of a class at Wichita State, we were creating kind of an algorithm for Best Picture. Some of the things we used were IMDb rating, the IMDb category, Golden Globe wins, Golden Globe nominations. We put those all in, went from 1985, that’s Back to the Future talk, forward. And we came out with a model that was 36% accurate, I would say. While that was okay at the time, there was always that thing in the back of my mind. I was like, “Well, we can do so much better.” Fast forward a year and a half, when I started at Callahan, I met you, and in the back of my mind a little leprechaun was like, “Oh, we can do better. We can do better.” So to that point, this year I met with you, we talked about the process, we talked about changes in the process, and your input was very valuable. I was able to go back from 1985 and gather more variables. I actually didn’t try and travel back in time, just went back to 1985 data-wise.

Eric:
Your flux capacitor is no longer functioning.

James:
No, it’s broken. So like I said, through your input we were able to understand what you thought was a good predictor: past awards, box office, and things like that, outside of the actual voting process. And we were able to create a very predictive model.

Eric:
It’s funny because when you first mentioned this to me, I was like, “Ooh, I’m all in for this.” And then you showed me the inputs, the variables that you were using. And one of the first things I mentioned was the Critics’ Choice Awards, because I used to be a member of the BFCA. I know that as a predictor of the Oscars in recent years, there’s nothing that’s more accurate. When I saw that you didn’t have that in there, we started talking, and I started talking crap on the Golden Globes and how terrible they are. So you added in all of these other predictive award shows, so to speak, not even shows, guild awards as well, like the Producers Guild, the Directors Guild and whatnot. And what you came up with was really great, because we talk about removing bias, and I have bias. I think the Golden Globes are trash. And what you found out through your data analysis, tell me what the Golden Globes mean for the Oscars, in the way that you discovered it going back that far.

James:
Sure. So to your point, I’ll just kind of go briefly through the process. I brought in all these variables and I bucketed the years: 1985 to 1995, then I cut it and went 1996 to 2010. And then we talked about a change that happened in 2011 forward. So I-

Eric:
And that is in the way that they vote, the weighted ballot.

James:
Correct.

Eric:
And it’s really important because the Academy of Motion Picture Arts and Sciences is a closed body and we are trying to predict the winner that a closed body of people votes on.

James:
Correct.

Eric:
Go ahead.

James:
Correct. So after that, what I was able to do is run this analysis on those different sections of years. And kind of what fell out, so you have to do a lot of diagnostics on your model to stabilize it and remove all the errors and whatnot. But what came out with the Golden Globes is I landed on Golden Globe nominations. What I found out there is if you are nominated for the Golden Globes, you’re more likely to win the Oscar for Best Picture. But if you win the Golden Globes, it actually has a negative effect. So you’re less likely to win. So go figure.
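
For listeners who want a picture of what James is describing, here is a minimal sketch of how separate “nominated” and “won” indicators can pull in opposite directions in a regression. It assumes a simple linear model, in line with the variance-explained language used later; the file name and column names (oscar_nominees.csv, golden_globe_nom, won_best_picture, and so on) are hypothetical placeholders, not Callahan’s actual model.

```python
# Minimal sketch (hypothetical data and columns): regress a won-Best-Picture
# indicator on separate "nominated" and "won" flags and inspect the signs.
import pandas as pd
import statsmodels.api as sm

# One row per Best Picture nominee per year; these column names are made up here.
films = pd.read_csv("oscar_nominees.csv")

X = films[["golden_globe_nom", "golden_globe_win", "dga_win", "critics_choice_win"]]
X = sm.add_constant(X)               # intercept term
y = films["won_best_picture"]        # 1 if the film won Best Picture, else 0

model = sm.OLS(y, X).fit()
print(model.summary())               # a positive coefficient on golden_globe_nom and a
                                     # negative one on golden_globe_win would match
                                     # the pattern James describes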

Eric:
Well, this is great because I have a bias, right? My bias is I don’t like the Golden Globes, and they proved again this year why they’re a terrible voting body, according to my taste, right? I’m not a big fan of Green Book or Bohemian Rhapsody, so I kind of was laughing at those. But what you’re saying is that, through the data, my bias was confirmed as a predictor of the Oscars in one respect: getting a nomination is great, but when you actually win, it’s not so great.

James:
Correct. Correct.

Eric:
So that’s exciting for me, because, as a film critic, I would like nothing more than for Green Book and Bohemian Rhapsody to not win Best Picture this year.

James:
And I’m not biased so I don’t have an opinion on that.

Eric:
Exactly. That’s why we do this.

James:
Yep.

Eric:
Let’s talk a little bit more about some of the other variables and how you design this algorithm before we get to your actual predictions.

James:
Okay. So the top weighted variable, that would be the most important, and you kind of saw this, but Nate Silver uses this as well. So one of the things we did is we went out and researched past models. We tried to understand the limitations of past models. We did see a lot of restrictions, and I hate to call out Nate Silver, but what I noticed is he did 25 years’ worth of data, but he didn’t cut them where he should have. He just included them all. So when you do that, you really reduce the true impact or weight of the variable, so what its importance actually is. By cutting it from 2011 forward, we were able to accurately reflect today instead of having the past weigh it down.
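
To illustrate the “cut the data where the process changed” idea, here is a rough sketch that fits the same specification separately within each era and compares how a variable’s weight moves. The era boundaries, file, and column names are again illustrative assumptions, not the actual Callahan model.

```python
# Sketch: fit the same regression within each era bucket and compare the coefficients.
import pandas as pd
import statsmodels.api as sm

films = pd.read_csv("oscar_nominees.csv")           # hypothetical file and columns
eras = {"1985-1995": (1985, 1995),
        "1996-2010": (1996, 2010),
        "2011-2018": (2011, 2018)}                   # post-weighted-ballot era

for label, (start, end) in eras.items():
    era = films[films["year"].between(start, end)]
    X = sm.add_constant(era[["dga_win", "golden_globe_nom", "golden_globe_win"]])
    fit = sm.OLS(era["won_best_picture"], X).fit()
    print(label, fit.params.round(3).to_dict())      # watch how each weight shifts by era
```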

Eric:
And again, what you’re trying to get at here is that from 2011 forward, the voting body is more like the voting body today.

James:
Correct.

Eric:
Whereas, in the past, many of the voting body, which has always been accused of being white and old and male, have died.

James:
Right.

Eric:
And so they’re no longer voting.

James:
Right. Right. So you wouldn’t want that past to really taint the present. And so that’s why you’d want to create separate models, so you can see how the variable changes over time. So then I could really tell you a story that, “Hey, this was important back then, but it’s not important anymore.”

Eric:
Gotcha.

James:
So one of the things I thought was most interesting was, and you’re going to know more about this than I do, of course, the DGA Award for Outstanding Directing. Whatever film won, or the director of that film, that’s the highest indicator of which film is most likely to win the Oscar for Best Picture.

Eric:
That’s interesting.

James:
I don’t know if you have context for that?

Eric:
Well, certainly it makes sense, although Best Picture and Best Director have split more times in the last decade than ever before. What you’re saying is that if the director won the DGA, the chances are far greater that their film will win Best Picture. In other words, winning the DGA is not a predictor of the Best Director award, but rather a predictor of Best Picture.

James:
Correct.

Eric:
That’s really interesting.

James:
Another one we had is the Critics’ Choice Award for Best Picture. That was kind of the second in weight of importance there. Then the Producers Guild Award for Best Picture was the third. And then we went to the Golden Globe nominations, which we talked about earlier. Another one I thought was interesting was the box office, the box office average at the time the Oscars took place. So what we did there is we took the total gross and divided it by the number of weeks the film had been running, to get kind of the average at that time. Otherwise, movies that have been running longer would kind of skew that. What that told us is that movies that, on average, gross more weekly do worse when it comes to Best Picture. So you have those examples I think we talked about in the past, like Avatar. I can’t remember who won-
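
The box office variable James describes is just the total gross divided by the number of weeks in release at the time of the ceremony. A tiny sketch with made-up numbers:

```python
# Sketch: average weekly gross at Oscar time = total gross to date / weeks in release.
def avg_weekly_gross(total_gross, weeks_running):
    return total_gross / weeks_running

# Made-up figures, purely to show the normalization.
print(avg_weekly_gross(700_000_000, 28))   # long-running blockbuster: 25,000,000 per week
print(avg_weekly_gross(12_000_000, 12))    # small release: 1,000,000 per week
```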

Eric:
That lost to The Hurt Locker, which I believe made $9 million, and Avatar is the biggest motion picture of all time.

James:
Right. So what this says, through this statistical analysis, is that it understood how the system moved, and that was typically what happened. So to me, just looking at the data without a bias, I see that it’s kind of an obscurity thing, where they are more likely to vote for things that are more obscure, based on the box office figures.

Eric:
Well, this is an outlier year for sure. And before we reveal your pick for Best Picture, let’s say that this variable, in particular, has affected the model. And the reason is that Roma is a movie this year that’s on Netflix, and it’s one of the top contenders. It’s already won the DGA, as you said, and it’s already won the Critics’ Choice Award. It didn’t win the PGA, but we’re considering it very high on this algorithm for some of those other reasons. And the box office throws it off, because Netflix did not put this film in theaters in a way that theaters would then report their box office returns. Netflix is very mysterious, and they hold onto all of their data. So what they did was they bought theaters, they bought full houses regardless of the number of people who came, and they paid the theater to, in essence, rent the auditorium.

James:
Correct. Yeah.

Eric:
Right? So without any box office returns from Roma, this throws this part of the algorithm off a lot.

James:
Correct. Yeah, and I did see some numbers out there, like $3.3 million. But I wouldn’t be able to accept that, because of what we talked about. Now you have a company with capital, as opposed to an individual consumer going and paying and having that choice to go see it or not. You’re taking that consumer out of the picture and bringing in a titan, really, a company with capital. That kind of skews those figures. So I would never want to bring it in, because there would be that bias in that number.

Eric:
And there’s no way, in the history of the Oscars up to this point, to compare box office receipts with what Netflix might call eyeballs at home. Right. And I believe that the way that Netflix measures this at home is by accounts that watched it, which is funny because they’ve also been accused of putting out inaccurate stats because so many people share the same Netflix account. Right? So even then if we say, “Two million Netflix accounts worldwide saw the film over the weekend,” that could mean that up to six million people watched it.

James:
Correct.

Eric:
Depending on how many people were in the room or used the account. And then the other thing is how long did they watch it? Does Netflix give us starts? How much time did they watch it for? I have a lot of friends that watched this movie and shut it off after a half hour because it’s black and white. It’s in Spanish mostly, and it’s really slow. I saw it in the theater twice and thought it was beautiful. It’s my number one of the year. So I love Roma and I would love for it to win Best Picture. But for a lot of people, the Netflix home experience may not be the best way to see it, which is really ironic. So knowing all that and me being very upfront about my bias, let’s talk about the results.

James:
Sure. So when we did include the box office weekly average, the kind we talked about before, that had a negative impact, because it looks like, based on the system, people are voting for more obscure films for Best Picture. So when we include that in there, because we can’t really account for any domestic box office for Roma, Roma actually comes out ahead: 48.7% of first-place votes is how you would read this, compared to Green Book at 45%. So they’re neck and neck.

Eric:
And before we go any further, when AMPAS votes, the first thing that qualifies you to win Best Picture is if you have over 50% of first-place votes. They keep going down the weighted ballot, and the moment they come up with something that has 50% or more of first-place votes, that is the one that’s declared Best Picture. So that’s another reason why this is a really great model.
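
Eric is describing the Academy’s preferential ballot: if no film has a majority of first-place votes, the last-place film is dropped and its ballots transfer to each voter’s next surviving choice, and the count repeats until something crosses 50%. Here is a toy sketch of that counting process with invented ballots, not real Academy data.

```python
# Toy sketch of a preferential (instant-runoff) count like the one Eric describes.
def preferential_winner(ballots):
    """Each ballot ranks films from first choice onward; invented data, not Academy ballots."""
    remaining = {film for ballot in ballots for film in ballot}
    while True:
        tally = {film: 0 for film in remaining}
        for ballot in ballots:
            top = next(f for f in ballot if f in remaining)    # highest surviving choice
            tally[top] += 1
        leader = max(tally, key=tally.get)
        if tally[leader] / len(ballots) > 0.5:                 # majority of first-place votes wins
            return leader
        remaining.remove(min(tally, key=tally.get))            # drop the last-place film, recount

# Invented example ballots.
ballots = [["Roma", "Green Book"], ["Green Book", "Roma"],
           ["BlacKkKlansman", "Roma"], ["Roma", "BlacKkKlansman"],
           ["Green Book", "BlacKkKlansman"]]
print(preferential_winner(ballots))   # "Roma", after BlacKkKlansman's ballots transfer
```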

James:
Right. Right. So then we kind of had a talk about the importance of this year and how it’s an anomaly, right? We know that, as we talked about the box office with Roma. So then your challenge to me was: what about excluding it altogether as a factor? So we did. As we data people do, we ran another model, because we love it. It was interesting: Green Book and Roma flipped, and it actually kind of re-weighted the whole group of nominees. And so Green Book was number one at 21.66% of first-place votes, to Roma at 21.08%.

Eric:
That is about as close as you can get.

James:
Yes, yes. And like I said, this year’s an anomaly, and we use the past to predict the future. Even though we have confidence in the past, as our model was .86, or 86%, accurate, this year’s anomaly kind of throws a wrench into things. But we can still reduce that risk and uncertainty to an extent and feel more confident about our picks, whether we’re in an Oscars pool or whatnot. We can definitely use this to be more confident in our predictions.

Eric:
Tell me about that .86 and what it means to you, when you say that that’s what we’re scoring at right now.

James:
Sure. Yeah. No, that’s a good point, because I’m really an economist/statistician. What that means, in my world, is that the variables that I set are highly predictive of, or indicators of, the Oscar for Best Picture. Those factors account for 86% of the variability in the dependent factor, whether they won Best Picture. So all those variables combined that I used, whether they are statistically significant or they were control variables, they account for that. Having 100% is like having a mathematical identity. So this is saying that it’s close to a mathematical identity, which is essentially a sure thing. That’s what it’s saying. It’s saying that there’s 14% out there; I could probably find some other variables to bring in to keep improving my model, though I’ll never be able to get to 100%. But my model did improve from .36 in 2017 to .86, based on my conversations with you and becoming more educated on how I should split the data and everything else like that. So I feel really confident about my model.
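
For reference, the .86 James cites reads like an R-squared: the share of variation in the won-or-lost outcome that the model’s variables explain. A small sketch of the calculation with placeholder numbers:

```python
# Sketch: R^2 = 1 - SS_residual / SS_total, i.e. the share of outcome variance explained.
import numpy as np
from sklearn.metrics import r2_score

y_actual = np.array([1, 0, 0, 1, 0, 1, 0, 0])                      # placeholder: 1 = won Best Picture
y_predicted = np.array([0.9, 0.1, 0.2, 0.7, 0.1, 0.8, 0.3, 0.1])   # placeholder model output

print(r2_score(y_actual, y_predicted))   # closer to 1.0 means more variance explained
```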

Eric:
Okay. So let’s go over the results one more time for everybody. We have two sets of results. Let’s talk about the first one. With the box office and everything else in there, we have Roma at the top, at 48.7%. We have Green Book right under it at 45%. And then BlacKkKlansman, The Favourite, Vice, A Star Is Born, Bohemian Rhapsody and Black Panther all have three, two or less than 1%. And so they’re not even in the race anymore, which is funny, because before you spit out this data, I thought A Star Is Born and, maybe slightly, BlacKkKlansman were in there as possibilities. But what you’re saying is this is a two-film race in both of your analyses?

James:
Right. So based on kind of how they fell out, whether they won the various awards that we talked about, that were statistically significant or control variables, a lot of these films didn’t win those. So that’s where you get below that 1%, and it kind of knocks them out of the race altogether.

Eric:
Okay. So Roma by just over 3%, if we are considering box office. If we remove box office, which we have to statistically, because we don’t have it for one of the films, right? So we have to remove it as a variable and now we have Green Book on top at 21.66, and Roma right under it at 21.08.

James:
If only those films could kind of hold hands and take number one together. That’s essentially how close it is.

Eric:
Oh boy, Peter Farrelly and Alfonso Cuaron.

James:
Wouldn’t that be great if they tied?

Eric:
Yeah, okay. But then right underneath it, this is also interesting, because now, instead of having single digits that go from three and two to zero almost immediately, the other films are a little bit higher. Black Panther rates third at 14.8. Vice rates fourth at 14.6, and then we have BlacKkKlansman and A Star Is Born, which were the two that I thought had more of a chance, at 9.54, and The Favourite at 6.8. And this warms my heart, I love this: Bohemian Rhapsody, 1.93%. One of my least favorite movies of the year. And God, I love Queen, but boy, did they ruin that story. So anyway, we’re going to wrap things up. But at this point, I guess what we’re trying to get at is that when you put together a predictive model like this, it can be really, really useful for your business, because a lot of people go with their gut.

Eric:
And that’s what I did, right? And that’s what I do every year when I pick the Oscars. And that’s why I lose my Oscar pool every year: because I go with my gut, and I let my bias affect things. Oftentimes, I’ll pick this actor because I like this performance better, even though I know this other one is favored to win, right? I let my bias get in the way. So when you do this kind of a thing, you can use the past to inform the future. And if you have a flexible model, there’s a lot of ways to decide what you’re going to do going forward with your business.

James:
Instead of running it like a market test, you could probably run it through predictive analytics and stuff. Like I said before, we’re kind of an unbiased look. We can, like you said, help you make a more concrete decision about things. In the end, you could end up saving a lot of money, so you’re not running, like, nine 18-week market tests.

Eric:
Right. I have been working with James on a couple of different accounts, and it’s really, really interesting to see, from a social media standpoint, how much money we’ve saved by being strategic, and how much lift we’ve gotten over previous years by doing it when we were supposed to do it, on what we were supposed to do it, and with the right target. So let’s end the podcast there. But I want a final prediction. What you’re saying essentially is Green Book or Roma? That’s the number one takeaway from this podcast.

James:
Right.

Eric:
But knowing what you know about the box office and that weird outlier year that we’re having this year, what does your gut tell you? Does it tell you to go with the other model and pick Green Book because it has .5% more of a chance?

James:
Yeah, I would probably go with the other model, because there are those things that we can’t really account for with Netflix and whatnot. I would definitely go with the other model. I’d remove that variable altogether and feel more confident at the end of the day.

Eric:
You’ve just broken my heart.

James:
I do that. I’ll do that with data. Yes, I break hearts with data.

Eric:
Oh, my God. Well, this is Eric and James, signing off. Thanks for listening this week.