This uses the technique previously described. However, instead of using RCP averages for all polled states and then using extreme (non-tossup) states to develop the regression model, this method uses only polling from states with one or more recent polls, and only good polls. These poll numbers are then “predicted” by black/hispanic/white/Voted_Romney numbers, and that generates a model, based on just over 20 states, designed to predict all the states.
As expected, the r-squared value is much lower using this method, but this method does not violate any important statistical laws like the last one did.
Most of the polling data pre-dates last Friday’s revelation of Trump’s interest in sexual assault, and of course, Sunday’s “I’ll throw my opponent in prison when I win” debate. If you believe those events will influence the election further, then you can figure this is a conservative estimate from the perspective of Clinton.
All of the blue states, both shades, are projected to go to Clinton, but I left the three closest to 50-50 in light blue.
I suspect the most controversial state here is actually Iowa, which seems to be throwing some sort of hissyfit in the polls.
And this, of course, is why my model is different from everyone else’s. The polls are used in this case to calibrate (in the absence of earlier results, as could be done in the primaries!) but the actual prediction then does not use the polls directly. So, even though a recent poll shows Iowa going to Trump, the model does not, because the model does not lie like the Iowans do, apparently!
I’ve made my first stab at a prediction for the electoral college outcome for the US Presidential race, 2016. I use a methodology roughly similar to the one I used to accurately predict most of the Democratic primaries. However, since primaries are different from a general election, the methodology had to be adapted.
For the primaries, I eventually used this methodology. I used results from prior primaries to predict voter behavior by ethnicity, in order to predict final behavior. That worked because primaries are done a few states at a time, and because all the people being modeled were Democrats.
It turns out that white people vary a lot across the country with how many per state are assholes. I think there is some variation among Hispanics as well, but African Americans are pretty consistent. So, here, I combined ethnicity with a “Romney Index” indicating how many people in a given state voted for Romney against Obama.
I then put down the poll numbers, the averages of the last several polls, from RCP, where available. I then ranked the results to knock out states with no polls. I then took out the middle, which included swing states, close states, etc., leaving only the 23 most distinct states for which there were data. I used those to produce a multivariable regression model with “white”, “black”, “hispanic”, and “romney_index” as independent variables. The dependent variable was the poll value. In future iterations, that is what will change. I’ll do a more refined version of that.
I then applied this formula to predict the breakdown between Clinton and Trump in the other ca. half of the states that are more ambiguous.
The multiple R-squared for this model was 0.952, so that’s great. But, I was using only the values at the extreme, so I violated the law of homoscedasticity. But I don’t care about no stinking homoscedasticity, because I have only one data set, am predicting only one election, and I am basically using the regression model as a fancy fill-in-the-blank formula. The fact that the R-squared is so high is great; were it low, I’d be in trouble, but its actual value is not important.
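For readers who want to see the mechanics, here is a minimal sketch of a regression of this kind: fit poll values on “white”, “black”, “hispanic”, and “romney_index”, then use the fitted formula to fill in other states. It is not the actual spreadsheet or code behind the numbers above. The rows in demo_rows and demo_polls are invented for illustration, and the helper names (fit_ols, predict, r_squared) are hypothetical.

```python
# Hypothetical illustration only: every number below is invented,
# not taken from the polls or census figures used in the post.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares fit with an intercept; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations

def predict(coefs, row):
    """Apply the fitted formula to one state's predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def r_squared(coefs, rows, y):
    """Share of variance in y explained by the fitted formula."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - predict(coefs, r)) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

# Columns: white %, black %, hispanic %, romney_index (all made up).
demo_rows = [
    (62, 27, 4, 60), (73, 6, 12, 41), (80, 13, 3, 55), (40, 7, 38, 37),
    (90, 1, 3, 64), (57, 15, 18, 36), (85, 4, 6, 59), (64, 30, 3, 47),
    (70, 3, 20, 52), (94, 1, 2, 31),
]
demo_polls = [38, 52, 41, 58, 33, 57, 37, 50, 44, 60]  # made-up poll values

coefs = fit_ols(demo_rows, demo_polls)
print("coefficients:", [round(c, 3) for c in coefs])
print("R-squared:", round(r_squared(coefs, demo_rows, demo_polls), 3))
# "Fill in the blank" for a state that was left out of the fit (made up).
print("predicted poll value:", round(predict(coefs, (75, 10, 8, 48)), 1))
```

The fit here goes through the normal equations only to keep the sketch dependency-free; a statistics package would do the same job with better numerical behavior.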
I then took all the states where Trump gets over 50% of the vote and gave them to him. I then gave almost all the other states to Clinton, but I left out a few that were very close, to leave them as unknown. Even if all those unknowns go to Trump, however, the outcome is the same: Clinton wins. Trump loses.
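The allocation rule in that paragraph can be sketched roughly as follows. The state names, shares, and electoral vote counts are placeholders, and the width of the “very close” band (the margin parameter) is an assumption, since the post does not say exactly how close a state had to be to stay unknown. The 270-vote threshold is the usual majority of 538 electoral votes.

```python
def allocate(states, margin=2.0):
    """Tally electoral votes from predicted Trump shares.

    states maps a state name to (predicted Trump share in percent,
    electoral votes). margin is an assumed cutoff for "very close".
    """
    totals = {"Trump": 0, "Clinton": 0, "Unknown": 0}
    for name, (trump_share, votes) in states.items():
        if trump_share > 50.0:
            totals["Trump"] += votes
        elif trump_share > 50.0 - margin:
            totals["Unknown"] += votes  # too close to call
        else:
            totals["Clinton"] += votes
    return totals

def clinton_wins_regardless(totals, needed=270):
    """True if Clinton reaches the threshold even with no unknown states."""
    return totals["Clinton"] >= needed

# Placeholder inputs, not the model's real output.
example = {
    "State A": (55.0, 10),
    "State B": (49.0, 5),
    "State C": (40.0, 300),
}
totals = allocate(example)
print(totals, clinton_wins_regardless(totals))
```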
I’ll refine and revise again with more care given to the various parts of the model. I’d love to do this poll free, but not sure if that is possible.
2) It does use exit polling and other information to set certain parameters;
3) It mainly uses prior primary or caucus results to predict the future, and thus assumes that the status quo is the best indicator.
4) It calculates likely voting patterns based on ethnicity (White, African American, Hispanic), and using likely Democratic party distribution among these groups to predict each contest’s outcome.
That method outperformed most other predictions for Super Tuesday and accurately predicted who would win in the four contests held over the last weekend. However, in states that Sanders won last weekend, and in at least two of the Super Tuesday results, the method underestimated how well Sanders would do. Notably, the numbers used to predict those primaries accurately predicted how Clinton would do in Louisiana, and generally.
In other words, mostly, where Clinton won, the model was accurate, but where Sanders won, Sanders did better than expected, not counting “favorite son” states where he did even better.
The most likely reason for the difference between prediction and reality over last weekend, since this is a status quo poll, is a change in voting patterns. In other words, it is possible that Sanders is picking up some momentum. That does not explain why the largest of the primaries, Louisiana, fit the predicted pattern while the others do not.
A second possibility is that Sanders outperforms expectations in caucus states. That seems almost certainly a factor, which I cannot explain.
A third possibility is crossover voting or independents favoring Sanders in some, but not all, states. If Republicans are voting in the Democratic contest, or independents are showing up at the Democratic events, specifically because they want to vote for Sanders, that could explain a localized Sanders surge. This does not explain last weekend’s results well, because Sanders won in closed caucuses. But, it could explain some earlier results, such as Massachusetts and Minnesota. I know for a fact that some Republicans and a lot of “independent” (as in, “I never did this before, see how independent I am”) voters showed up in the Minnesota caucus. The question remains, of course, where were these voters in Louisiana?
One explanation for this may be that the indies and centrists in more conservative southern states, which also happen to have a lot of pro-Clinton African American voters, are mostly registered Republicans or chose to participate in the Republican rather than Democratic process, while similar voters in less conservative or liberal states were already more likely to be Democrats or to at least participate this year in the Democratic primaries or caucuses. Differences in voter turnout across states seem to conform to this pattern.
Last weekend barely added enough data to consider revising the model. Assuming that the status quo method still works, but with somewhat adjusted numbers to match Sanders wins so far, and combining projections into the future with primary results so far, this model now puts Sanders on top at the very end of the primary process, like this:
I quickly add that I don’t have a lot more confidence in this projection than the previously developed projection that has Clinton winning. But this new projection is important because it accounts for what might be recent changes in how people are voting.
Michigan’s primary, to be held tomorrow, is important. Michigan is relatively diverse, and is northern (less conservative, etc.). The modified model predicts that Sanders will swamp Clinton in Michigan, picking up over 70 delegates to Clinton’s low-fifties. In contrast, the previous iteration of the model predicts that Clinton will win with about 66 delegates and Sanders will pick up a healthy 60 or so.
Michigan’s contest is a primary, not a caucus, but it is open, so cross-party activity is possible.
Michigan will be a test between the two models, the older one that ultimately favored Clinton, and the revised (but far less certain) one that suggests that Sanders could eke out a victory.
Michigan plus last weekend’s contests combined will give me enough data to produce The Model of Models which will accurately predict the outcome of primaries coming up in Florida, Illinois, Missouri, North Carolina and Ohio. Or not. We’ll see. It is possible that I’ll add an element to the model, using one set of assumptions for red states, another set for blue states.
One week after Michigan, Son of Super Tuesday happens. If either one of the candidates is very strong on that day, that may finish off the other candidate. The actual number of committed delegates is not too different between the two candidates, and the so-called “Super Delegates” will probably be obligated to go with whoever enters the Convention with the most delegates.
As you know, I developed a simple model for projecting future primary outcomes in the Democratic party. This model is based on the ethnic mix in each state, among Democratic Party voters. The model attributes a likely voting choice to theoretical primary goers or caucusers based on previous behavior by ethnicity. Originally I made two models, one using numbers that the Clinton campaign was banking on, and one using numbers that the Sanders campaign was banking on.
The results of the Super Tuesday primaries demonstrated that the Sanders-favoring model does not predict primary outcomes. Those same results showed that the Clinton-favoring model worked better. But the numbers also indicated that the Clinton favoring model estimates Clinton’s ultimate delegate take somewhat inaccurately.
I adjusted the model parameter so the model now matches reality for a subset of the primaries that have already happened to within five percent. The model still slightly favors Clinton, but not by much. The subset of primaries includes only the US states (not territories, where I don’t expect the ethnic mix approach to work at all) and excludes states with a strong favorite son effect. This therefore excludes New Hampshire and Vermont. Due to oddities in the Texas delegate system, the adjustment was also made by excluding Texas, though the model results for Texas match very well proportionately.
(Note: Using only the subset of states, the model predicts previously held primaries and caucuses to within less than two tenths of a percent).
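One way to picture this kind of adjustment is a simple one-parameter search: vary a single group split until the model’s total for the included states lines up with the actual total, skipping the excluded states. The grid search below is a stand-in chosen for illustration, not necessarily how the adjustment was actually done, and the state records are invented.

```python
def predicted_clinton(states, split):
    """Total Clinton delegates predicted from each state's ethnic mix."""
    total = 0.0
    for s in states:
        share = sum(s["mix"][g] * split[g] for g in split)
        total += s["delegates"] * share
    return total

def calibrate_white_share(states, split, excluded=()):
    """Scan Clinton's share of white voters in 1% steps.

    Returns (best share, relative error of the total against actual).
    """
    used = [s for s in states if s["name"] not in excluded]
    actual = sum(s["clinton_actual"] for s in used)
    best = None
    for step in range(0, 101):
        trial = dict(split, white=step / 100)
        error = abs(predicted_clinton(used, trial) - actual) / actual
        if best is None or error < best[1]:
            best = (step / 100, error)
    return best

# Invented records; "Favorite Son State" stands in for an excluded state.
demo_states = [
    {"name": "State A", "delegates": 100, "clinton_actual": 65,
     "mix": {"white": 0.5, "black": 0.5, "hispanic": 0.0}},
    {"name": "Favorite Son State", "delegates": 20, "clinton_actual": 3,
     "mix": {"white": 0.95, "black": 0.02, "hispanic": 0.03}},
]
start = {"white": 0.50, "black": 0.86, "hispanic": 0.57}
share, err = calibrate_white_share(
    demo_states, start, excluded=("Favorite Son State",))
print(share, err < 0.05)
```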
The new model now only has one version, which as noted matches primaries so far very well. While there is a somewhat southern bias in the set of primaries that have been carried out so far, that bias is probably not important. I have a fairly high level of confidence in the model.
The result is best seen in this graphic, which shows the cumulative delegate count of committed delegates in US states. So this excludes non-committed delegates (known as “Super Delegates”) and it excludes territories and other non-states (but it does include DC, because DC is like a state).
Assuming a large proportion of the Democratic Party’s uncommitted delegates support Clinton, Clinton will probably achieve the necessary number of delegates to lock the nomination either on the 19th of April with the New York primary, or on the 26th of April, with the Maryland, Connecticut, Delaware, Pennsylvania and Rhode Island primaries.
There are two phases of primaries coming up. First we have a series of weeks with only one or two primaries happening at once, with a total of 300 committed delegates (130 from Michigan). Then we have what is effectively Return of Super Tuesday, with 691 committed delegates, including Florida with 214. For Sanders to regain traction, he has to do well in some of these big states. In particular, Sanders has to outperform the model in Michigan, Florida, Illinois and possibly North Carolina and Ohio.
When we look at many of these states, the model seems to fit very well with the available polling data, except in cases where the polls suggest a stronger outcome for Clinton. The following table compares the model projections with estimates of the delegate split based on polls. All delegates are assumed to be awarded (among the committed delegates only) and the polling data is not very dense and in some cases not too recent, so this is a very rough estimate.
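A rough way to turn a poll into a delegate estimate, under the same “all committed delegates are awarded” assumption, is to split the delegates in proportion to the two candidates’ shares of the combined Clinton plus Sanders number. This is only a sketch: real Democratic allocation works through thresholds and district-level results, and the poll figures in the example are made up.

```python
def poll_delegate_split(delegates, clinton_pct, sanders_pct):
    """Split all committed delegates by two-way poll share.

    Undecided and other responses are ignored, so the two poll
    numbers do not need to sum to 100.
    """
    two_way = clinton_pct / (clinton_pct + sanders_pct)
    clinton = round(delegates * two_way)
    return clinton, delegates - clinton

# Hypothetical state with 100 committed delegates and a 54-36 poll.
print(poll_delegate_split(100, 54, 36))
```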
Prior to Super Tuesday, the then-current version of this model projected results that conformed closely with polls. For most states, the outcome of the actual voting matched the projections and the polls pretty well, except in a couple of places. Now, the refined model matches polling data even more closely, but the polling data is not necessarily to be trusted because there has not been enough polling. (I avoided comparisons with really old polls which are entirely useless).
Clinton’s path to the nomination is clear. Sanders’ path to the nomination requires something to change, and to change dramatically and quickly.
You may be asking yourself the same question, especially if, like me, you vote on Tuesday, March 1st.
For some of us, a related question is which of the two is likely to win the nomination.
If one of the two is highly likely to win the nomination, then it may be smart to vote for that candidate in order to add to the momentum effect and, frankly, to end the internecine fighting and eating of young within the party sooner. If, however, one of the two is only somewhat likely to win the nomination, and your preference is for the one slightly more likely to lose, then you better vote for the projected loser so they become the winner!
National polls of who is ahead have been unreliable, and also, relying on those polls obviates the democratic process, so they should be considered but not used to drive one’s choice. However, a number of primaries have already happened, so there is some information from those contests to help estimate what might happen in the future. On the other hand, there have been only a few primaries so far. Making a choice based wholly or in part on who is likely to win is better left until after Super Tuesday, when there will be more data. But, circling back to the original question, that does not help those of us voting in two days, does it?
Let’s look at the primaries so far.
Overall, Sanders has done better than polls might have suggested weeks before the primaries started. This tells us that his insurgency is valid and should be paid attention to.
There has been a lot of talk about which candidate is electable vs. not, and about theoretical match-ups with Trump or other GOP candidates. If you look at ALL the match-ups, instead of the one cherry-picked match-up the supporter of one or the other candidate might pick, both candidates do OK against the GOP. Also, such early theoretical match-ups are probably very unreliable. So, best to ignore them.
Iowa told us that the two candidates are roughly matched.
New Hampshire confirmed that the two candidates are roughly matched, given that Sanders has a partial “favorite son” effect going in the Granite State.
Nevada confirmed, again, that the two candidates are roughly matched, because the difference wasn’t great between the two.
So far, given those three races, in combination with exit polls, we can surmise that among White voters, the two candidates are roughly matched, but with Sanders doing better with younger voters, and Clinton doing better with older voters.
The good news for Sanders about younger voters is that he is bringing people into the process, which means more voters, and that is good. The bad news is two part: 1) Younger voters are unreliable. They were supposed to elect Kerry, but never showed up, for example; and 2) Some (a small number, I hope) of Sanders’ younger voters claim that they will abandon the race, or the Democrats, if their candidate does not win, write in Sanders, vote for Trump, or some other idiotic thing. So, if Clinton ends up being the nominee, thanks Bernie, but really, no thanks.
Then came South Carolina. Before South Carolina, we knew that there were two likely outcomes down the road starting with this first southern state. One is that expectations surrounding Clinton’s campaign would be confirmed, and she would do about 70-30 among African American voters, which in the end would give her a likely win in the primary. The other possibility is that Sanders would close this ethnic gap, which, given his support among men and white voters, could allow him to win the primary.
What happened in South Carolina is that Clinton did way better than even those optimistic predictions suggested. This is not good for Sanders.
Some have claimed that South Carolina was an aberration. But, that claim is being made only by Sanders supporters, and only after the fact. Also, the claim is largely bogus because it suggests that somehow Democratic and especially African American Democratic voters are somehow conservative southern yahoos, and that is why they voted so heavily in favor of Clinton. But really, there is no reason to suggest that Democratic African American voters aren’t reasonably well represented by South Carolina.
In addition to that, polling for other southern states conforms pretty closely to expectations based on the actual results for South Carolina.
I developed an ethnic-based model for the Democratic primary (see this for an earlier version). The idea of the model is simple. Most of the variation we will ultimately observe among the states in voting patterns for the two candidates will be explained by the ethnic mix in each state. This is certainly an oversimplification, but has a good chance of working given that before breaking out voters by ethnicity, we are subsetting them by party affiliation. So this is not how White, Black and Hispanic people will vote across the states, but rather, how White, Black and Hispanic Democrats will vote across the state. I’m pretty confident that this is a useful model.
My model has two versions (chosen by me, there could be many other versions), one giving Sanders’ strategy a nod by having him do 10% better among white voters, but only 60-40 among non-white voters. The Clinton-favored strategy gives Clinton 50-50 among white voters, and a strong advantage among African American voters, based on South Carolina’s results and polling, of 86-14%. Clinton also has a small advantage among Hispanic voters (based mainly on polls) with a 57:43% mix.
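In code, the two versions boil down to a weighted average of group-level splits. The sketch below uses the Clinton-favored numbers given above. The Sanders-favored numbers are one reading of “10% better among white voters” (Sanders 55, Clinton 45) and “60-40 among non-white voters” (Clinton 60), so treat them as an assumption. The ethnic mix and delegate count in the example are invented, and simple proportional rounding stands in for the real delegate rules.

```python
# Clinton's share within each group of Democratic voters.
CLINTON_FAVORED = {"white": 0.50, "black": 0.86, "hispanic": 0.57}
# Assumed reading of the Sanders-favored version (see lead-in).
SANDERS_FAVORED = {"white": 0.45, "black": 0.60, "hispanic": 0.60}

def clinton_share(mix, split):
    """Weighted average of Clinton's group shares for one state.

    mix gives each group's fraction of the state's Democratic voters.
    """
    total = sum(mix.values())
    return sum(mix[g] * split[g] for g in mix) / total

def delegate_split(delegates, share):
    """Proportional split of committed delegates (Clinton, Sanders)."""
    clinton = round(delegates * share)
    return clinton, delegates - clinton

# Invented state: 100 delegates, made-up ethnic mix.
demo_mix = {"white": 0.60, "black": 0.35, "hispanic": 0.05}
for label, split in (("Clinton-favored", CLINTON_FAVORED),
                     ("Sanders-favored", SANDERS_FAVORED)):
    share = clinton_share(demo_mix, split)
    print(label, round(share, 4), delegate_split(100, share))
```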
These are the numbers I’ve settled on today, after South Carolina. But, I will adjust these numbers after Super Tuesday, and at that point, I’ll have some real confidence in the model. But, at the moment, the model seems to be potentially useful, and I’ll be happy to tell you why.
First, let us dispose of some of the circular logic. Given both polls and South Carolina’s results, the model, based partly on South Carolina, predicts South Carolina pretty well using the Clinton-favored version (not the Sanders-favored version), with a predicted cf. actual outcome of 34:19% cf. 39:14%. This is obviously not an independent prediction, but rather a calibration. The Sanders-favored model predicts a nearly even outcome of 27:26%.
The following table shows the likely results for the Clinton-favored and Sanders-favored model in each state having a primary on Tuesday.
The two columns on the right are estimates from polling where available. This is highly variable in quality and should be used cautiously. I highlighted the Clinton- or Sanders-favored model that most closely matches the polling. The matches are generally very close. This strongly suggests that the Clinton-favored version of the model essentially works, even given the limited information, and simplicity of the model.
Please note that in both the Clinton- and Sanders-favored model, Clinton wins the day on Tuesday, but only barely for the Sanders-favored model (note that territories are not considered here).
I applied the same model over the entire primary season (states only) to produce two graphs, shown below.
The Clinton-favored model has Clinton pulling ahead in committed delegates (I ignore Super Delegates, who are not committed) on Tuesday, then widening her lead over time, winning handily. The Sanders-favored model projects a horserace, where the two candidates are ridiculously close for the entire election.
So, who am I going to vote for?
I like both candidates. The current model suggests I should vote for Clinton because she is going to pull ahead, and it is better to vote for the likely winner, since I like them both, so that person gets more momentum (a tiny fraction of momentum, given one vote, but still…). On the other hand, a Sanders insurgency would be revolutionary and change the world in interesting ways, and for that to happen, Sanders needs as many votes on Tuesday as possible.
It is quite possible, then, that I’ll vote for Sanders, then work hard for Hillary if Super Tuesday confirms the Clinton favored model. That is how I am leaning now, having made that decision while typing the first few words of this very paragraph.
Or I could change my mind.
Either way, I want to see people stop being so mean to the candidate they are not supporting. That is only going to hurt, and be a regrettable decision, if your candidate is not the chosen one. Also, you are annoying the heck out of everyone else. So just stop, OK?