Super Tuesday is coming up, and I have my predictions.
Last cycle, I predicted the relative performance of Sanders and Clinton in each race, and my predictions were uncannily accurate. I did better than polling and other predicting agencies or individuals. This year, things are more complicated, and I have less confidence now than I did near the end of the last primary cycle. One reason is the larger number of candidates that are not as clearly distinct. Another reason is Bloomberg. It is simply hard to tell what effect he is having.
A third point of difficulty is, of course, those odd states. Minnesota is one of these this year, since Senator Klobuchar is popular here, so a model based on any information from outside of Minnesota does not help. Same with Vermont (Sanders).
This model is a little complex, so I’ll explain how it works. The simple version is that I predict the performance of each candidate, in absolute terms (but incidentally scaled from 0 to 100), based on a regression model using the ethnic makeup of each state. The regression model is derived from the actual performance of that individual (out of 100% among the investigated individuals). However, the “actual performance” data exists only for Iowa, New Hampshire, Nevada and South Carolina. This is not enough data, and it is, for various reasons, screwy data. So, I add in the polling data from the better polled states (such as California, Texas, etc.). Actual and polling data from 13 states are thus used to predict all of the states. Since some of those polling states are on Super Tuesday and some are not, the resulting table of data includes a mix of polling and prediction. This is fair because those polling data are used IN the prediction.
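For readers who want to see the mechanics, here is a minimal sketch in Python of the general idea: fit one line per candidate, then rescale the predicted shares so they sum to 100. It is not the actual model. It assumes a single predictor (percent Black in a state) instead of the fuller ethnic breakdown, the candidate names are placeholders, and every number in it is made up purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_state(models, pct_black):
    """Predict each candidate's share, then rescale so the shares sum to 100."""
    # Clamp at zero so a steep line cannot produce a negative vote share.
    raw = {name: max(0.0, a + b * pct_black) for name, (a, b) in models.items()}
    total = sum(raw.values())
    if total == 0:
        return raw
    return {name: 100.0 * value / total for name, value in raw.items()}


# HYPOTHETICAL training rows, not real results or polls:
# (percent Black in state, candidate A share, candidate B share)
training = [
    (5.0, 60.0, 40.0),
    (10.0, 55.0, 45.0),
    (20.0, 45.0, 55.0),
    (30.0, 35.0, 65.0),
]

xs = [row[0] for row in training]
models = {
    "candidate_a": fit_line(xs, [row[1] for row in training]),
    "candidate_b": fit_line(xs, [row[2] for row in training]),
}

# Predict a state that has no results or polling of its own.
prediction = predict_state(models, pct_black=15.0)
print(prediction)
```

In the real exercise the training rows would be the 13 states with actual results or decent polling, and the prediction step would be run for every remaining state.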
The original ethnic data included various flavors of Asian and Hispanic numbers. When these numbers were included, statistical confidence dropped. As was the case last cycle, the best predictive data is simply the percentage of white vs. black in a state. This makes sense for a lot of reasons we can discuss at another time. I will simply point out at this time that when it comes to Democratic Primary and Caucus results, #BLM in a big way.
R-squared values for the regression runs were generally close to 0.85.
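For anyone unfamiliar with R-squared, it is the share of the variation in the actual numbers that the fitted model accounts for, with 1.0 being a perfect fit. A minimal sketch of the calculation in Python follows. The function name and the sample numbers are assumptions for illustration only, not values from my regression runs.

```python
def r_squared(actual, predicted):
    """Return 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# HYPOTHETICAL vote shares, made up for illustration.
actual_shares = [10.0, 20.0, 30.0, 40.0]
predicted_shares = [12.0, 18.0, 33.0, 37.0]

fit_quality = r_squared(actual_shares, predicted_shares)
print(round(fit_quality, 3))
```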
Bottom line: Sanders comes in first in most states, with Biden second in number of firsts. But remember, this is a delegate fight, so the number of delegates matters.
I’m not going to try to predict the number of delegates since that is so dependent on things like the 15% threshold that it would be easier to just wait until Wednesday and see how it comes out!
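To show why the 15% threshold makes delegate counts so touchy, here is a rough sketch in Python. It assumes a simplified rule: candidates under the threshold get nothing, and the rest split the pool in proportion to their votes. The real rules add per-district allocation and rounding, which this ignores, and the names and numbers are made up.

```python
def viable_shares(shares, threshold=15.0):
    """Drop candidates below the threshold and rescale the rest to sum to 100."""
    viable = {name: s for name, s in shares.items() if s >= threshold}
    total = sum(viable.values())
    if total == 0:
        # Nobody cleared the threshold; the simplified rule has no answer here.
        return {}
    return {name: 100.0 * s / total for name, s in viable.items()}


# HYPOTHETICAL vote shares for four placeholder candidates.
vote_shares = {"a": 40.0, "b": 30.0, "c": 20.0, "d": 10.0}

delegate_shares = viable_shares(vote_shares)
print(delegate_shares)
```

A candidate sitting at 14% versus 16% in the prediction makes no real difference to who comes in first, but it changes the delegate split for everybody else.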
Here are my predictions, UPDATED to reflect recent changes in the field of candidates:
Since Buttigieg has now dropped out and his share of the vote might make a big difference if shifted to one of the other candidates, how seriously should we take this prediction? (I have no idea other than it seems in general that his ideas are closer to Biden than to Sanders.)
I’ve updated the data.
It’s now Thursday and it is clear that the prediction of 10 states for Sanders and only 4 states for Biden was not fulfilled. In fact, the numbers were essentially just the opposite.* (The Bloomberg and Warren results were also way off.) It just goes to show that predicting human behavior and the direction of humans’ choices is fraught with pitfalls. I’m sure the results were a surprise and shock to most if not all of the political pundits, to the Sanders camp, and probably the Trump-Putin camp as well.
* The California vote count is not expected to be finished for weeks. Like our justice system, it grinds slowly.