Super Tuesday is coming up, and I have my predictions.
Last cycle, I predicted the relative performance of Sanders and Clinton in each race, and my predictions were uncannily accurate. I did better than polling and other predicting agencies or individuals. This year, things are more complicated, and I have less confidence now than I did near the end of the last primary cycle. One reason is the larger number of candidates that are not as clearly distinct. Another reason is Bloomberg. It is simply hard to tell what effect he is having.
A third point of difficulty is, of course, those odd states. Minnesota is one of these this year, since Senator Klobuchar is popular here, so a model based on any information from outside of Minnesota does not help. Same with Vermont (Sanders).
This model is a little complex so I’ll explain how it works. The simple version is that I predict the performance of each candidate, in absolute terms (but incidentally scaled from 0 to 100) based on a regression model using ethnic makeup of each state. The regression model is derived from actual performance of that individual (out of 100% among the investigated individuals). However, the “actual performance” data exists only for Iowa, New Hampshire, Nevada and South Carolina. This is not enough data, and it is for various reasons screwy data. So, I add in the polling data from the better polled states (such as California, Texas, etc.). So actual and polling data from 13 states are used to predict all of the states. Since some of those polling states are on Super Tuesday and some are not, the resulting table of data includes a mix of polling and prediction. This is fair because those polling data are used IN the prediction.
The original ethnic data included various flavors of Asian and Hispanic numbers. When I included these numbers, statistical confidence dropped. As was the case last cycle, the best predictive data is simply percentage of white vs black in a state. This makes sense for a lot of reasons we can discuss at another time. I will simply point out at this time that when it comes to Democratic Primary and Caucus results, #BLM in a big way.
R-squared values for the regression runs were generally close to 0.85.
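For readers who want to see the mechanics, here is a minimal sketch of that kind of regression for one candidate, written in Python with NumPy. It is not the actual spreadsheet or code behind the numbers in this post. The function names are made up for illustration, and every number in the example arrays is an invented placeholder, not real demographic, polling, or election data.

```python
import numpy as np

def fit_share_model(pct_white, pct_black, share):
    """Ordinary least squares fit of: share ~ b0 + b1*pct_white + b2*pct_black.

    Returns the three coefficients and the R-squared of the fit.
    """
    y = np.asarray(share, dtype=float)
    # Design matrix: intercept column, then the two demographic predictors.
    X = np.column_stack([np.ones(len(y)), pct_white, pct_black])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((y - fitted) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation
    return coef, 1.0 - ss_res / ss_tot

def predict_share(coef, pct_white, pct_black):
    """Predicted share for one state, clipped to the 0 to 100 scale."""
    raw = coef[0] + coef[1] * pct_white + coef[2] * pct_black
    return float(np.clip(raw, 0.0, 100.0))

# INVENTED placeholder inputs for six hypothetical "training" states
# (a mix of actual results and polling would go here in a real run).
PCT_WHITE = [90, 93, 66, 64, 60, 72]
PCT_BLACK = [4, 2, 10, 27, 6, 13]
SHARE = [26, 26, 40, 20, 33, 30]   # candidate share, out of 100

coef, r_squared = fit_share_model(PCT_WHITE, PCT_BLACK, SHARE)
print("R-squared:", round(r_squared, 2))
print("Predicted share for a hypothetical state:", predict_share(coef, 80, 8))
```

One such fit would be run per candidate, and the fitted coefficients then applied to the census mix of each state that has no results or good polling.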
Bottom line: Sanders comes in first in most states, with Biden second in number of firsts. But remember, this is a delegate fight, so the number of delegates matters.
I’m not going to try to predict the number of delegates since that is so dependent on things like the 15% threshold that it would be easier to just wait until Wednesday and see how it comes out!
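To see why the threshold makes delegate counts so touchy, here is a rough Python sketch of a proportional split with a 15% cutoff. It is a simplification under stated assumptions: it treats a state as one pool of delegates (real allocations are split between statewide and district-level pools), it rounds by largest remainder, and it simply hands out nothing if no one clears the cutoff. The function name and the vote shares are invented for illustration.

```python
def allocate_delegates(shares, delegates, threshold=15.0):
    """Split `delegates` proportionally among candidates at or above `threshold`.

    `shares` maps candidate name -> vote share in percent.
    Simplified single-pool sketch; not the full party rulebook.
    """
    viable = {name: s for name, s in shares.items() if s >= threshold}
    if not viable:
        # Simplification: this sketch awards nothing if nobody is viable.
        return {name: 0 for name in shares}

    total = sum(viable.values())
    quotas = {name: delegates * s / total for name, s in viable.items()}
    result = {name: int(q) for name, q in quotas.items()}  # whole delegates first

    # Hand out the leftover delegates to the largest fractional remainders.
    leftover = delegates - sum(result.values())
    by_remainder = sorted(viable, key=lambda n: quotas[n] - result[n], reverse=True)
    for name in by_remainder[:leftover]:
        result[name] += 1

    return {name: result.get(name, 0) for name in shares}

# INVENTED shares: candidate C sits one point either side of the cutoff.
just_under = allocate_delegates({"A": 30, "B": 16, "C": 14, "D": 40}, 100)
just_over = allocate_delegates({"A": 30, "B": 16, "C": 15, "D": 40}, 100)
print(just_under)
print(just_over)
```

In this made-up example a single point of vote share moves candidate C from no delegates to a double-digit count, which is the kind of swing that makes delegate prediction so dependent on small errors.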
Here are my predictions, UPDATED to reflect recent changes in the field of candidates:
That model made certain assumptions, and allowed me to produce two projections (well, many, but I picked two) depending on how each candidate actually fares with different ethnic groups (White, Black, Hispanic, since those are the groupings typically used).
The two different versions of this model were designed to favor each candidate differently. The Clinton-favored model started with the basic assumption that among white Democratic Party voters, both candidates are similar, and that Clinton has a strong lead among Hispanic voters and an even stronger lead among African American voters. The Sanders-favored model assumes that Sanders has a stronger position among White voters and less of a disadvantage among non-White voters.
The logic behind the equivalence among White voters is that this is how the two candidates did in Iowa, which is representative of the United States White vote, unadulterated by a favorite son effect in New Hampshire. Nevada failed to indicate that this assumption should be changed.
The favoring of Clinton among non-White voters is based on national polling with respect to ethnic effects. The logic behind the Sanders-favored version is that Sanders’ strategy, to win, has to involve a large young, white, male turnout (evidenced in the polls) and a narrowing of the gap among African American and Hispanic voters.
In that model, presented here, I used statewide demographic data to establish the ethnic term. However, that is incorrect, because one’s chances of engaging in the Republican vs. Democratic process in one’s state are tied to ethnicity. More Whites are Republicans, more Blacks are Democrats. I knew that at the time I worked out the model, but sloth and laziness, combined with lack of time, caused me to simplify.
The newer version of the model adjusts for likely Democratic Party membership. The results are the same but less dramatic, with a much longer slog to the finish line and the two candidates doing about the same as each other for the entire primary season.
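The adjustment can be sketched in a few lines of Python. This is an illustration of the idea, not the actual calculation behind the numbers in this post: each group's share of the state population is weighted by how likely members of that group are to take part on the Democratic side, and a candidate's expected share is then the weighted average of support across groups. The function names are hypothetical, and every rate and share below is an invented placeholder.

```python
def primary_electorate(pop_share, dem_rate):
    """Reweight population shares by each group's Democratic participation rate.

    Both arguments map group name -> fraction. The result sums to 1.
    """
    weighted = {g: pop_share[g] * dem_rate[g] for g in pop_share}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}

def candidate_share(electorate, support):
    """Expected vote share: support within each group, weighted by group size."""
    return sum(electorate[g] * support[g] for g in electorate)

# INVENTED placeholders for one hypothetical state.
POP_SHARE = {"White": 0.65, "Black": 0.28, "Hispanic": 0.07}
DEM_RATE = {"White": 0.40, "Black": 0.80, "Hispanic": 0.60}   # assumed rates
SUPPORT = {"White": 0.50, "Black": 0.70, "Hispanic": 0.60}    # assumed support

electorate = primary_electorate(POP_SHARE, DEM_RATE)
print("Unadjusted share:", candidate_share(POP_SHARE, SUPPORT))
print("Adjusted share:", candidate_share(electorate, SUPPORT))
```

Swapping in a different set of assumed support numbers gives the other candidate's favored version of the same calculation.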
The outcome of my modeling (reflected in the non-adjusted and adjusted versions, each with a Clinton- and Sanders-favored version) is different from the expectations of either campaign, as far as I can tell. Clinton boosters are claiming that the Democratic Party is mainly behind her, and these first primaries are aberrant. Sanders boosters are claiming the Sanders strategy of having a surge of support will carry him to victory. Both of these characterizations require that the candidate in question surge ahead pretty soon, and not look back. The opportunity to surge ahead is, certainly, Super Tuesday (March 1st).
The models I produced, with the assumptions listed above, show a close race all along, so either the campaigns are wrong or I am wrong.
The graphic at the top of the post represents how far ahead each candidate will be across the primary season, for each of their respective favored strategies.
So for Clinton, the ethnic gap is maintained as wide, and the blue line shows that she will surge nearly 40 committed delegates ahead of Sanders (a modest surge) and continue to develop a wider and wider gap past mid-March, and thereafter, maintain but not increase that gap, of about 80 committed delegates, until the end.
For Sanders, the orange line, the initial gap formed on Super Tuesday does not start out very large, but it steadily increases until the end of the primary season, ending at over 120 committed delegates.
So, that is the new model. But, it is a bogus model.
I’m trying to stick with empirical data that do not rely on polling. Why? Because everybody else is relying on polling, and this is an election season where the polling is not doing a good job of predicting outcomes. Also, my modeling gives credit to each campaign’s claims, which is at least interesting, if not valid, as a way of approaching this problem. If Clinton is right, she wins this way. If Sanders is right, he wins that way.
However, the data are insufficient to have much faith in this model. Super Tuesday will provide a lot more information, and with that information I can rework the model and have some confidence in it.
Who will win the South Carolina Primary, Clinton or Sanders?
While working this out, I naturally came up with predictions for what will happen in all of the future primaries. So let’s look at some of that.
In South Carolina, according to my model, if Clinton’s strategy holds, she will win 29 delegates, and Sanders will win 24 delegates. If the Sanders strategy pertains, they will tie, or possibly, Clinton will win one more delegate than Sanders.
Who will win the Super Tuesday primaries?
The following table shows the results predicted by this model, for both the Clinton-favored and Sanders-favored versions, for all the Super Tuesday state primaries or caucuses.
The Clinton-favored model suggests that Clinton will win six out of 11 primaries, and take the majority of uncommitted delegates. The Sanders-favored model suggests that Sanders will take 9 out of 11 primaries, and win the majority of uncommitted delegates.
Notice that I put Vermont in italics, because Sanders is likely to win big in Vermont no matter what happens. This underscores the nature of this model in an important way. I’m not using any data from the actual states, other than the ethnic mix from census data, with an adjustment applied to produce an estimate of Democratic Party membership across ethnic groups. That estimate is based on national data as well as data specifically from Virginia, to provide some empirical basis.
I suspect most people will have two responses to this table. First, they will say that a model that incorporates Clinton’s strategic expectations should have her winning more. Second, they will say that all the numbers, for all states and all models, are too close.
These are both legitimate complaints about my model, and will explain why it will turn out to be totally wrong. Or, they are suppositions people are making that are totally wrong, and when my model turns out to be uncannily accurate, those suppositions will have to be put aside for the rest of the primary season. (Or, some other outcome happens.)
I will restate this: I’m looking for Super Tuesday to provide the best empirical data to make this model work for the rest of the primary season. But, in the meantime, this seemed like an interesting result to let you know about.
First, I want to remind you that I totally predicted the current situation with Santorum vis-a-vis Romney. Just so you know.
And, that situation is that Santorum has become a factor in the primary, and has had a steady position in the race, while Gingrich and Romney have seesawed. Ron Paul is irrelevant.
But whatever has happened so far, this Tuesday is an important day in this primary process because Michigan is considered one of Romney’s home states, and that is one of the two loci of activity on that day. The other primary is in Arizona, which is almost the same size state (which I find shocking, by the way, but that’s the reality).
Following closely on the heels of this Tuesday’s two-state contest will be a quickie in Washington (small, non-binding) and then Super Tuesday, with ten states running all at once. So, the nature and tenor of the candidacies and the overall process going into Super Tuesday will be important, making Michigan and Arizona important.