It’s that time of year again. Last year, you all participated in our second annual subparforecasting competition, where you made probabilistic predictions on some things that might happen in 2022. The results are now in, and this is our summary of how things went.
You also might want to look at last year’s results for comparison’s sake.
Subparforecasting is a joking reference to superforecasting, which comes from Philip Tetlock’s research about how to find and develop a group of people who are unusually good at making predictions.
A key part of Tetlock’s process is getting feedback: making predictions, seeing how they work out, and learning from the process. Matthew Yglesias did a nice summary of this in his piece on how to be less full of shit. You can also look at his follow-up post on how his own predictions fared last year.
We scored everyone’s submissions as a way of generating that feedback, and we’re using almost the same approach as last year.
Last year we used the logarithmic scoring rule. For log scoring, you should think of the probability p of a prediction as a number between 0 and 1, with 0 representing 0%, and 1 representing 100%.
A traditional log score is then defined as follows.
\text{score}(p) = \begin{cases} \ln(p), & \text{if the outcome was true} \\ \ln(1-p), & \text{if the outcome was false} \end{cases}
We’re essentially using log-scoring, but we’re subtracting out \ln(0.5) from the score (which increases the score, since \ln(0.5) is negative!). As a result, your score is always zero if your prediction is 50%, and higher scores are better. This reflects the fact that we treat a 50% prediction as a kind of baseline, as we’ll discuss below.
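In other words, the rule we’re actually using is just the one above with \ln(0.5) subtracted in each case.

\text{score}(p) = \begin{cases} \ln(p) - \ln(0.5), & \text{if the outcome was true} \\ \ln(1-p) - \ln(0.5), & \text{if the outcome was false} \end{cases}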
This is a bit easier to understand with an example. Say you predicted that a given outcome had a 90% chance of happening, and it did. Then your score would be \ln(0.9) - \ln(0.5) = 0.59. If it didn’t happen, your score would be \ln(0.1) - \ln(0.5) = -1.61.
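Here’s a minimal Python sketch of that scoring rule, purely for illustration; the function name and structure are assumptions of ours, not the code we actually used to score the competition.

```python
import math

def score(p, outcome):
    """Adjusted log score: the log of the probability assigned to what
    actually happened, minus ln(0.5) so that a 50% prediction scores 0."""
    prob_of_what_happened = p if outcome else 1.0 - p
    if prob_of_what_happened == 0.0:
        # An absolutely confident prediction that turns out wrong
        # earns an infinite penalty.
        return -math.inf
    return math.log(prob_of_what_happened) - math.log(0.5)

# The worked example from the text: a 90% prediction.
print(round(score(0.9, True), 2))   # 0.59
print(round(score(0.9, False), 2))  # -1.61
```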
One thing to watch out for with log scoring is how it treats absolutely confident predictions. By “absolutely confident”, we mean a prediction of either 0% or 100%. An absolutely confident prediction that turns out wrong should in some sense be infinitely surprising! And log scoring says as much, punishing you infinitely for such a prediction.
There were a lot fewer absolutely confident predictions this year: just a single one, rather than the dozen or so we saw last year.
As you might remember, 50% of the absolutely confident predictions were wrong in 2021. This year, the one fully confident prediction turned out to be wrong.
As we explained last year, not making a prediction was the same as assigning a probability of 50%. That’s why you’ll notice in our rankings that everyone is counted as having made the same number of predictions.
Here are our rankings, where each person’s score is just the average of their scores across all the questions.
name | score |
---|---|
Franco Baseggio | 0.305 |
Sharon Fenick | 0.255 |
Michael Farbiarz | 0.248 |
Jonas Peters | 0.246 |
Nick Salter (and family) | 0.246 |
Dmitry Gorenburg | 0.245 |
Blanca | 0.234 |
Dianne Newman | 0.220 |
Zev Isaac Minsky-Primus | 0.207 |
Yaron Minsky | 0.204 |
Gabriel Farbiarz | 0.188 |
Richard Primus | 0.182 |
Averagey McAverageface | 0.181 |
Lois | 0.181 |
Ida Gorenburg | 0.172 |
Eyal Minsky-Fenick | 0.153 |
Lucas Kimball | 0.141 |
Yair | 0.124 |
Lisa | 0.123 |
Ada Fenick | 0.122 |
Alan Promer | 0.122 |
Sigal Minsky-Primus | 0.114 |
Martha A. Escobar | 0.105 |
Megan Lewis | 0.097 |
Romana Primus | 0.092 |
NAFTALY MINSKY | 0.086 |
Eitan Minsky-Fenick | 0.085 |
Sarah Williams | 0.083 |
Shula Minsky | 0.078 |
Jacob Gorenburg | 0.077 |
Monse | 0.046 |
David R H Miller | 0.040 |
Jeremy Dauber | 0.033 |
Sam Wurzel | 0.022 |
Sarah Farbiarz | 0.001 |
Michelle Fisher | -0.006 |
Sally Gottesman | -0.011 |
Debra Fine | -0.030 |
Daniel Primus Cohen | -0.052 |
Elana Farbiarz | -0.068 |
Eli Cohen | -0.096 |
Belinda | -0.109 |
Nava Minsky-Primus | -0.164 |
Sanjyot Dunung | -0.354 |
nancy koziol | -inf |
In order to represent the wisdom of the crowds, we added a synthetic player, Averagey McAverageface. Averagey’s prediction on any particular question is just the average of everyone else’s predictions on that question. Averagey did pretty well this year, though he didn’t make it to the very top of the ranking. That’s pretty much in line with last year.
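For the curious, here’s a rough Python sketch of how a synthetic player like Averagey can be computed. The data layout (a dictionary mapping each person to their per-question predictions) and the names in the example are made up for illustration; skipped questions are filled in at 50%, as described above.

```python
def crowd_average(predictions_by_person, question):
    """Averagey McAverageface's prediction on a question: the mean of
    every real player's prediction, treating skipped questions as 50%."""
    probs = [preds.get(question, 0.5)
             for preds in predictions_by_person.values()]
    return sum(probs) / len(probs)

# A toy example with made-up players and questions:
predictions = {
    "alice": {"q1": 0.9, "q2": 0.3},
    "bob":   {"q1": 0.6},  # bob skipped q2, so it counts as 0.5
}
print(crowd_average(predictions, "q1"))  # 0.75
print(crowd_average(predictions, "q2"))  # 0.4
```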
Another thing you can look at is how many of the scores are positive, which corresponds to doing better than someone who didn’t answer any questions, which is in turn the same as guessing 50% for everything.
Last year, most people’s scores were in negative territory, but this year, a solid majority of people are positive. That’s a sign of real progress!
The following graphic lets you visualize the data and see how the predictors did collectively and individually.
This visualization works best on an ordinary computer, not a tablet or phone. That’s because you need to hover with your mouse in order to uncover more information. To hover over something, move the tip of your mouse pointer over the element in question and leave it there for a moment!
Each barbell represents a question. Hover over the line to see what the actual question was.
The ends of the barbell represent the truth-value of the result, with false on the left and true on the right. Depending on which way the given question worked out, the appropriate end is highlighted in green.
The red squares correspond to individual predictions. Hovering will tell you the person’s identity and their probabilistic prediction.
You can click a prediction square to select it, at which point it will turn blue, as will all of that person’s predictions across all the questions.
The black mark shows the average of all the predictions on a question, and the questions are sorted by that average probability.
Here’s the visualization: