The Real Best Picture:
A Look at the Data

Enthusiasm Curbed, February 20, 2019

Overview

Using critic scores for each Best Picture nominee from Metacritic, we first examine how the critics view each of the films. Next, using over 200 year-end top 10 lists, we simulate which movie would win Best Picture given this population of voters. Since the diverse authors of these lists include well-known critics, staff of local papers, and niche international film websites, this is perhaps the best approximation of a true Best Picture available. One can also infer a prediction for the Oscar Best Picture winner from this simulation; however, the Academy's ability and motivation to select the best movie appear questionable at best.

There must be a better way

I like movies. I dislike awards shows. I particularly dislike the Oscars.1 First of all, who is this "Academy," and why do they have so much cultural power to define the "best" movies? As this excellent NYT article by Brooks Barnes confirmed this past week, the Academy of Motion Picture Arts and Sciences is made up of some pretty ridiculous people. You should really read the whole piece, but I'll include a few of my favorite highlights below.

One Academy member explains why he or she is voting for "Vice":

"I'm pals with one of the producers, so I feel like I have to"

Apparently a few other Academy members thought "Roma" was the best film, but they have mixed feelings about how to vote because Netflix is destroying the good old days -- or something like that. Don't worry though, as Barnes notes:

A couple of those in the anti-Netflix group told me that they would vote for Cuarón for best director as a way to assuage their guilt.

Ahh good, let's mislead the American public regarding the quality of movies because they'd rather stream a movie than pay $15 at a theater. These heroes are saving us all from ourselves one vote at a time.

Another voter apparently believes that no heroes wear capes. He described superhero movies as:

"the stuff that oozes out of dumpsters behind fast-food restaurants"

The kicker: this dude hadn't even seen "Black Panther" yet. But is his reasoning better or worse than the voter who didn't want to vote for "Black Panther" because he didn't want to give Disney any more influence? Did I mention he works at a rival studio? The average middle-school bully is less vindictive than these people.

Finally, this last voter really made me laugh:

"I love this nasty little movie, but I don't want to throw my vote away."

Apparently, no one explained to this member that the Academy uses a preferential voting system designed to prevent wasted votes! There is also some other great stuff in the article such as the studio executive voting for "Green Book" out of rage. He is apparently tired of being told what movies to like and dislike. Hmm… he might want to reconsider his role in the Oscars!

Let's look at the data

Since this current system is so terrible, it would make sense to give up caring who wins the Oscar for best picture. What even is the definition of "best" they are awarding anyway (ostensibly "best group of pals on the production team" is not an acceptable definition)? Best social message? Best backstory? Best symbol of hope? Best escape from realizing that life is "one big kick in the urethra?" (Thanks Bojack). Nobody knows, and most people have realized it is healthy not to care. Yet, alas, I get mad about way too much stuff. As XKCD famously put it, duty calls!

Duty Calls

Therefore, I decided to turn to the critics, who are paid to have ostensibly interesting opinions about movies. Admittedly, these critics likely also fall prey to many of the same questionable tendencies as those plaguing Oscar voters -- people are people. And I'm sure many have friends making movies or working at studios, but at least they are required to be correct often enough that someone will read them. And most importantly, I actually can get data on what the individual critics think, unlike the mysterious Academy.

Though I could have just performed a simple Rotten Tomatoes search, I wanted a more complete picture of how the critics perceived each Best Picture nominee. Which film received the most consistent ratings? Which film polarized the raters? Is there a clear winner? Are all the films about equal in critics' eyes? These are all questions which can be more appropriately answered using Metacritic.

And for a more in-depth overview of how all these sites rate movies, I highly suggest Alissa Wilkinson's article explaining the various aggregators. The article also has some nice introspection on the role of a critic.

In brief, Metacritic gives each movie a score from 0 to 100, known as the METASCORE. It's a bit like a grade. Rotten Tomatoes, in contrast, simply gives the percentage of critics who like a movie. That is also a score from 0 to 100; however, one can easily imagine a likeable-but-not-great movie racking up a very high Rotten Tomatoes score. The METASCORE for that same movie, meanwhile, should sit just above average (roughly 60-80).
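
To make the distinction concrete, here is a toy Python example with made-up review scores (not real data) contrasting a Rotten Tomatoes-style percentage of positive reviews with a simple Metacritic-style average of the graded reviews:

```python
# Hypothetical review scores for two imaginary films.
crowd_pleaser = [65, 70, 70, 75, 60, 70, 65, 75]    # everyone mildly likes it
divisive_gem  = [100, 95, 90, 40, 100, 35, 90, 95]  # loved by most, hated by a few

def rt_style(scores, threshold=60):
    """Rotten Tomatoes-style: percent of reviews at or above a 'fresh' threshold."""
    return 100 * sum(s >= threshold for s in scores) / len(scores)

def metacritic_style(scores):
    """Metacritic-style (simplified): a plain average of the graded reviews."""
    return sum(scores) / len(scores)

print(rt_style(crowd_pleaser), metacritic_style(crowd_pleaser))  # 100.0 68.75
print(rt_style(divisive_gem), metacritic_style(divisive_gem))    # 75.0 80.625
```

The likeable-but-not-great film sweeps the percentage metric, while the divisive film ends up with the higher average.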

Wilkinson explains a bit more detail in her article:

Metacritic maintains an even smaller and more exclusive group of critics than Rotten Tomatoes — its aggregated scores cap out around 50 reviews per movie, instead of the hundreds that can make up a Tomatometer score. Metacritic’s score for a film is different from Rotten Tomatoes’ insofar as each individual review is assigned a rating on a scale of 100 and the overall Metacritic score is a weighted average, the mechanics of which Metacritic absolutely refuses to divulge. But because the site’s ratings are even more carefully controlled to include only experienced professional critics — and because the reviews it aggregates are given a higher level of granularity, and presumably weighted by the perceived influence of the critic’s publication — most critics consider Metacritic a better gauge of critical opinion.

One clear advantage of Metacritic, as Wilkinson mentions, is the grade given by each critic. On Rotten Tomatoes, what does it mean that a single critic liked a movie? Did he find it the least displeasing of the multitudinous superhero movies to come out this year, or is it one of his favorite movies of all time? The Metacritic grade from each critic at least gives us an approximation. And since the METASCORE is hidden behind a weighted average formula with additional scaling (those fun bell curves don't go away), I'm going to largely ignore it. Instead, I'll work with the individual critic scores for each movie and aggregate them myself.2

Let's start by plotting the individual critic scores given for each Best Picture nominee.

Each dot corresponds to a single critic's score of a film. Stacked dots indicate that multiple critics gave the film that score.
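
For anyone curious how such a chart might be built, here is a minimal Altair sketch. It assumes a tidy DataFrame with one row per review (the toy values below are illustrative, not the real data) and plots plain circles rather than the stacked dots shown above:

```python
import altair as alt
import pandas as pd

# Hypothetical tidy data: one row per (film, critic, score) review.
scores = pd.DataFrame({
    "film":   ["Roma", "Roma", "The Favourite", "Green Book"],
    "critic": ["A.O. Scott", "Richard Roeper", "A.A. Dowd", "A.A. Dowd"],
    "score":  [100, 90, 91, 58],
})

chart = alt.Chart(scores).mark_circle(size=60).encode(
    x=alt.X("score:Q", title="Critic score"),
    y=alt.Y("film:N", title=None),
    tooltip=["critic", "score"],
)
chart.save("critic_scores.html")  # self-contained interactive HTML
```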

As one can see, the majority of scores are multiples of 10. This makes sense: the Metacritic staff assigning the scores probably can't tell whether an A.O. Scott review should be a 67 or a 63, but they might be able to tell whether it should be a 70 instead of a 60. Therefore, they generally keep the scores to nice round numbers. Some critics do write to Metacritic to correct their scores, so some scores are more precise. For example, A.A. Dowd's score for "Green Book" is a 58; maybe he is like the sadistic teacher who just doesn't want to see that one kid pass.3

Another noteworthy detail from this plot is that a lot of critics give scores of 100 to good movies. Furthermore, the reviews for "Green Book," "Vice," and "Bohemian Rhapsody" are quite spread out -- indicating that critics' opinions of these films vary significantly. Given the subject matter of "Green Book" and "Vice," this is not that surprising.

Another, less convoluted way to visualize this data is to plot the average critic score for each movie. The plot below shows just that, along with bootstrapped 95% confidence intervals (always give a measure of the variation, my former Physics professor shouts in my mind).

Each black dot corresponds to the simple average critic rating for a given film. The error bars give 95% confidence intervals obtained through statistical resampling. The red dots give the METASCORE, a proprietary weighted average of the scores.
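
As a rough sketch of how such an interval can be computed, the snippet below takes a plain list of one film's critic scores (the numbers here are made up) and returns a percentile-bootstrap 95% confidence interval for the mean using NumPy:

```python
import numpy as np

def bootstrap_ci(scores, n_boot=10_000, ci=95, seed=0):
    """Percentile-bootstrap confidence interval for the mean critic score of one film."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    # Resample the critic scores with replacement and record the mean of each resample.
    means = rng.choice(scores, size=(n_boot, len(scores)), replace=True).mean(axis=1)
    lo, hi = np.percentile(means, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    return scores.mean(), lo, hi

# Hypothetical scores for one nominee:
mean, lo, hi = bootstrap_ci([100, 90, 90, 80, 100, 70, 90, 100])
print(f"mean={mean:.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```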

Again, we see that "Roma" is the film most favored by the critics. However, for those of you who took statistics, the overlapping confidence intervals between "Roma" and "The Favourite" cast some doubt on whether "Roma" is unequivocally the most liked movie. Our later analysis of year-end top 10 lists will add further evidence that "Roma" is the number one choice of the critics.

Interestingly, the red dots show that the aggregated METASCORE is not always close to the simple average. For example, the "Bohemian Rhapsody" METASCORE falls outside the confidence interval entirely. This suggests that some of the critics who rated this film negatively are weighted quite heavily in Metacritic's weighted average.

In contrast, the METASCORE for "Roma" is higher than the simple average. This perhaps indicates that the potentially highfalutin critics preferred by Metacritic favored the film more than those critics who share my plebeian taste. For what it's worth, I thought "Roma" was important, but also slow and a bit like listening to a grandpa recall his childhood with beautiful imagery to make it more bearable. (I think it's safe to assume Metacritic will not be adding my views to the METASCORE anytime soon.)

To find these specific critics and to see which critics like which nominees, I have constructed the heatmap below showing how each critic rated each film. Note that only those critics who reviewed over half of the films are included. As can be seen, the two critics rating "Bohemian Rhapsody" most negatively were Richard Roeper and A.O. Scott, so it makes some sense that these two giants of film criticism would be weighted heavily in the Metacritic algorithm. However, the score discrepancy could also be partly due to Metacritic adjusting the scores to give a larger separation between films.4

PLEASE NOTE: This whole post is optimized for viewing on a computer, NOT on a mobile device. Several of the upcoming charts are interactive and do not render correctly unless viewed on a computer.

Each square corresponds to a single review. Hover over the square with your mouse to discover the details of the review. To isolate specific films, click on the dots on the legend on the left. Several movies may be chosen at once by using shift+click. The critics are sorted left to right based on the average score given by that reviewer for the selected films. By selecting only the films you have seen, you can see which critics best match up with your tastes.5
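
For the curious, a bare-bones (and non-interactive) Altair sketch of this kind of heatmap might look like the following; the DataFrame values are hypothetical, and the sort mirrors the left-to-right ordering by average score described above:

```python
import altair as alt
import pandas as pd

# Hypothetical tidy data with columns film, critic, score.
reviews = pd.DataFrame({
    "film":   ["Roma", "Roma", "Green Book", "Green Book"],
    "critic": ["A.O. Scott", "Richard Roeper", "A.O. Scott", "A.A. Dowd"],
    "score":  [100, 90, 50, 58],
})

heatmap = alt.Chart(reviews).mark_rect().encode(
    # Sort critics left to right by the average score they gave the shown films.
    x=alt.X("critic:N", sort=alt.EncodingSortField(field="score",
                                                   op="mean", order="descending")),
    y=alt.Y("film:N"),
    color=alt.Color("score:Q", scale=alt.Scale(scheme="redyellowgreen")),
    tooltip=["critic", "film", "score"],
)
heatmap.save("critic_heatmap.html")
```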

It is also interesting to examine how these Metacritic scores compare to those on Rotten Tomatoes and IMDb. In particular, the IMDb score is an audience score and gives some idea of how the public enjoyed each film. However, one must be careful: IMDb reviewers are often movie fans and presumably actually saw the movie, which means that much of the general public is not reflected in any of these ratings.

In any case, the data is available in the plot below. Interestingly, the IMDb scores are all quite close (audience scores are often quite high due to the aforementioned biases -- unless a smear campaign happens). We especially notice that "Green Book" and "Bohemian Rhapsody" do better with general audiences than with critics. The movie that does the worst on both axes is "Vice."

Each circle in the left plot corresponds to a single film. Using your mouse and its scroll wheel, you can zoom in on the left plot, as well as click and drag to bring certain points to the center of view. Clicking on a point will highlight a single histogram in the right chart, which shows the distribution of critic reviews on Metacritic for that film.

To see the usefulness of Metacritic, I highly suggest you click on the yellow dot for "Roma," then click on the orange dot corresponding to "BlacKkKlansman." The two films are almost on top of each other in the left plot, with similar Rotten Tomatoes and IMDb scores, but the difference between their histograms in the right chart is significant. Thus, I highly suggest looking at individual critic scores on Metacritic when deciding whether to watch a film. You can even use the previously shown heatmap to find which critics you agree with and just look at their scores.

So who would win?

All of the previous charts have shown that "Roma" is the movie most loved by critics. However, this does not necessarily mean that it would always win the Oscar for Best Picture if the critics were voting. That is because second-place votes can matter a lot in the preferential voting method used to decide Best Picture.

To understand the voting process, I highly suggest this article written by Walt Hickey at FiveThirtyEight. He also wrote another good article on how this voting process can favor films "with broad majority appeal over ones that have strong plurality appeal." In case you care, this is one reason why many political scientists favor ranked-choice voting; it forces candidates to try to appeal to everyone instead of villainizing their opponents and those opponents' supporters.

Anyway, here is the process as nicely described in the second FiveThirtyEight article:

  1. Count up all the first-choice votes.

  2. If a movie gets a majority, that’s the winner. If none does, eliminate the last-place movie from contention.

  3. Take all of the ballots with the eliminated movie at the top. Reapportion them to each voter’s next preferred movie.

  4. Go back to step two.
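
To make these steps concrete, here is a small Python sketch of this instant-runoff count. It is only an illustration of the process described above, not the code behind my simulation, and the tiny ballots at the end are hypothetical:

```python
from collections import Counter

def preferential_winner(ballots):
    """Instant-runoff count over ranked ballots (each a list, best film first)."""
    remaining = {film for ballot in ballots for film in ballot}
    while True:
        # 1. Count each ballot toward its highest-ranked surviving film.
        live = [b for b in ballots if any(f in remaining for f in b)]
        firsts = Counter(next(f for f in b if f in remaining) for b in live)
        leader, votes = firsts.most_common(1)[0]
        # 2. A majority of the ballots still in play wins outright.
        if votes > len(live) / 2 or len(remaining) == 1:
            return leader
        # 3-4. Otherwise, eliminate the last-place film and recount.
        last_place = min(remaining, key=lambda f: firsts.get(f, 0))
        remaining.discard(last_place)

# Tiny hypothetical election with five voters:
ballots = [["Roma", "Vice"], ["Roma", "Vice"],
           ["Vice", "Green Book"], ["Vice", "Green Book"],
           ["Green Book", "Vice"]]
print(preferential_winner(ballots))  # 'Vice' wins once 'Green Book' is eliminated
```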

Unfortunately, fully simulating this process requires knowing how each critic ranks all eight nominees, from 1 to 8. I don't have this data. However, Metacritic nicely collects the data from all of the year-end top 10 lists put out by various critics and publications. The sources are quite varied and include the usual critics as well as those from local papers and some interesting film blogs. This population is certainly more diverse than the select Metacritic-approved critics used earlier. Metacritic aggregates the list data using its own scoring system, which you can definitely check out, but it seems a bit arbitrary to me. Instead, I figured I might as well try to approximate the Oscar voting process as closely as possible.

To start, I had the data from approximately 300 top ten lists. Interestingly, a good number of these lists did not include a single one of the Oscar Best Picture nominees. This is odd, but it is partly because these critics were choosing from all of 2018's films, including documentaries, and probably wanted to show how unique their tastes were. I made the explicit modeling decision not to include these critics in the simulation. This is quite an assumption, but I am also having trouble imagining an Oscar voter who does not have a single Best Picture nominee in his or her top ten films of the year (I could be really wrong about this).

Either way, I first wanted to visualize how the nominees perform on these lists. Is "Roma" again the most liked film? The resounding answer is yes. To see this, I plotted below how many times each film was placed at a specific rank in the top 10 lists. For example, the top left plot shows that "Roma" was chosen as the best film of the year on approximately 65 lists, as the second best film on ~30 lists, as the third best film on ~20 lists, and so on.

Plots showing the number of times each film was placed at a specific rank in the approximately 300 top ten films of the year lists from critics. For example, "The Favourite" was listed as the second best film of the year on 15 separate lists.

I provide another way of viewing this info in the interactive chart below, which shows all of the films on the same plot. By clicking and dragging with your mouse, you can also brush over specific ranks on the list to see how many times each film is included at that position on the lists. By brushing over all ranks, one can see that "Roma" is included on 161 of the top ten lists, many more times than any other film. "The Favourite" comes in second with just over 100 inclusions on the lists. At the low end, "Bohemian Rhapsody" is included on just 6 lists, and does not get a single first place vote! That's pretty amazing for a film that won Best Drama at the Golden Globes.

Click and drag over horizontal axis values on the top plot to see how many times each film is included at those ranks in the bottom bar chart.

Finally, we can simulate the Oscar voting process from these lists. The simple result is that "Roma" is such a lopsided favorite of the critics that it would win Best Picture without any reapportioning of second-place votes. One may wonder how this is true since "Roma" only has 65 first-place votes out of the 247 lists that contain at least one nominee, which is not even close to 50%. However, what we really care about is how often it is ranked first among the Best Picture nominees on each list. Since "Roma" also has lots of second and third place votes among all 2018 films, this happens quite a lot.

For a simple demonstration of my methodology, take the example top ten list below, decided by staff consensus at POPCULTURE.COM (yes, I consider the staff consensus to be one voter in my simulation):

  1. Paddington 2
  2. Spider-Man: Into the Spider-Verse
  3. Black Panther
  4. A Quiet Place
  5. Widows
  6. Hereditary
  7. Incredibles 2
  8. Mission: Impossible – Fallout
  9. Avengers: Infinity War
  10. A Star Is Born

This list then gets translated into the following list only including the Best Picture nominees:

  1. Black Panther
  2. A Star Is Born
  3. ?
  4. ?
  5. ?
  6. ?
  7. ?
  8. ?

where the ? can be filled in randomly, from distributions, etc.
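
To illustrate this translation step, here is a minimal Python sketch. The nominee set is the real 2019 slate, but the helper name `to_nominee_ballot` and the choice to fill the '?' slots randomly are just one simple option among those mentioned above:

```python
import random

# The eight 2019 Best Picture nominees.
NOMINEES = {"Black Panther", "BlacKkKlansman", "Bohemian Rhapsody", "The Favourite",
            "Green Book", "Roma", "A Star Is Born", "Vice"}

def to_nominee_ballot(top_ten, rng):
    """Keep the nominees in the order they appear on the critic's top-ten list,
    then fill the remaining '?' slots with the other nominees in random order."""
    ranked = [film for film in top_ten if film in NOMINEES]
    leftovers = [film for film in NOMINEES if film not in ranked]
    rng.shuffle(leftovers)
    return ranked + leftovers

popculture = ["Paddington 2", "Spider-Man: Into the Spider-Verse", "Black Panther",
              "A Quiet Place", "Widows", "Hereditary", "Incredibles 2",
              "Mission: Impossible – Fallout", "Avengers: Infinity War", "A Star Is Born"]
print(to_nominee_ballot(popculture, random.Random(0))[:2])
# -> ['Black Panther', 'A Star Is Born']
```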

When we transform the lists like so, "Roma" gets 51% of the first place votes and wins our simulated Oscars automatically in the first round.

Since this is a pretty boring result, I decided to use a statistical resampling approach to see on average how often this would occur. In this approach, I resample with replacement from all of the lists and simulate the preferential voting system. But what happens if "Black Panther" and "A Star Is Born" both finish last in subsequent rounds of voting and their votes must be reapportioned to the unknown rankings? For our purposes, the answer didn't really matter. I tried making the unknown rankings (question marks in the list above) completely random and/or drawing from distributions of Metacritic scores for each film -- "Roma" always wins among the critics making these top ten lists.

For a visualization of the results after the first round, see the histogram below. We see that in our simulation "Roma" automatically wins without needing to advance to a second round of voting about 70% of the time. That is pretty incredible. The other 30% of the time, "Roma" basically wins after reapportioning second place votes.

The histogram shows how often "Roma" wins automatically after a first round of voting. The simulation used a bootstrapping approach where top ten lists were statistically resampled.
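
For a rough idea of what the resampling loop looks like, here is a simplified sketch that bootstraps the nominee-only ballots (as produced by the translation step above) and tallies how often "Roma" gets an outright first-round majority. It is a simplification of my actual simulation, which also runs the later rounds:

```python
import random

def first_round_win_rate(ballots, film="Roma", n_boot=5000, seed=0):
    """Resample the nominee-only ballots with replacement and record how often
    `film` takes an outright majority of first-place votes in round one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        sample = rng.choices(ballots, k=len(ballots))  # bootstrap resample
        firsts = sum(1 for b in sample if b and b[0] == film)
        if firsts > len(sample) / 2:
            wins += 1
    return wins / n_boot

# `nominee_ballots` would be the output of to_nominee_ballot() applied to each list:
# print(first_round_win_rate(nominee_ballots))
```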

If you are interested in more of the details of the simulation, feel free to comment or ask; the code involves a bit of trickery, but is not too complex. Overall, these lopsided results are pretty boring; however, rest assured, these critics are not the Oscar voters, so there is plenty of room for surprises come Sunday. For what it's worth, betting markets seem to have "Roma" as a 60%-70% favorite to win, which seems maybe a bit low to me.6 However, I generally think these prediction markets are pretty well calibrated, so that's probably not too far off. Therefore, even though "Roma" definitely wins 'Real Best Picture' under my artificial criterion, I'm sure the Academy will find a way to create drama (which could have been used in the first hour of Roma...sorry!).

Conclusion

Ultimately, the critics loved "Roma," and the data reflects that quite well. It's really quite boring how universally they loved it -- to tell the truth. Now's the embarrassing moment when I admit that I have only seen one of this year's Best Picture nominees. It was indeed "Roma," which I'm glad I saw once but will probably never watch again. My favorite movie this year was the documentary "Minding the Gap"; I really hope it wins something. But who knows? I don't know if Hulu is supposed to be destroying the film industry like Netflix, or who the producers' friends are, or if movies about skateboarders are also "the stuff that oozes out of dumpsters behind fast-food restaurants." I guess I'll find out soon and probably be angry for a little bit, just like every other year. Just another year of being angry on the internet.

Thanks to:

First, big thanks to the people who develop and maintain Altair, a Python visualization library. It is a beautiful library, and a mere couple of weeks ago I couldn't have imagined how easy it would be to make some of these charts. Also, big thanks to Metacritic for just being an awesome site, which I spend far too much time on. And lastly, to the Oscar voters whose pettiness encouraged me to write this post!


  1. In fairness, there have been recent attempts to better appeal to millennials like me; apparently the Oscars are just terrible at it.

  2. The central assumption of this piece is that the people at Metacritic assigning scores to each critic review are somewhat accurate and consistent -- at least hopefully on a relative scale. See footnote 3, which gives some reason to doubt this is always true.

  3. I have no idea if A.A. Dowd actually wrote to Metacritic to correct his score. The tweet below would indicate he generally does not, since he has the same rating for "BlacKkKlansman" and "A Star Is Born" on Metacritic but shows a clear preference here.

  4. What Metacritic does is essentially the opposite of grading on a curve. In grading, the goal is often to bring the low end up to a higher average. In Metacritic's case, they want to further separate the scores so that they aren't all bunched together.

  5. If you are wondering which critics are most correlated with each other, I am working on it. I actually have a preliminary result, but the problem is that the confidence intervals around correlation coefficients are quite large for a modest number of points -- especially when r is not that close to 1.

  6. This year is particularly hard to predict since the earlier awards shows, which usually help predict the Oscars, each gave their top prize to a different film. Thus, classical prediction methods won't work as well. For more, see this article by Todd VanDerWerff at Vox.

Source code

I am working on getting everything nice and clean; keep an eye on GitHub.

You can follow me on Twitter for lots of random thoughts on random topics.