As you know, I developed a simple model for projecting future primary outcomes in the Democratic party. This model is based on the ethnic mix in each state, among Democratic Party voters. The model attributes a likely voting choice to theoretical primary goers or caucus goers based on previous behavior by ethnicity. Originally I made two models, one using numbers that the Clinton campaign was banking on, and one using numbers that the Sanders campaign was banking on.
The results of the Super Tuesday primaries demonstrated that the Sanders-favoring model does not predict primary outcomes. Those same results showed that the Clinton-favoring model worked better. But the numbers also indicated that the Clinton-favoring model estimates Clinton’s ultimate delegate take somewhat inaccurately.
I adjusted the model parameters so the model now matches reality for a subset of the primaries that have already happened to within five percent. The model still slightly favors Clinton, but not by much. The subset of primaries includes only the US states (not territories, where I don’t expect the ethnic mix approach to work at all) and excludes states with a strong favorite son effect. This therefore excludes New Hampshire and Vermont. Due to oddities in the Texas delegate system, Texas was also excluded when making the adjustment, though the model results for Texas match very well proportionately.
(Note: Using only the subset of states, the model predicts previously held primaries and caucuses to within less than two tenths of a percent).
The new model now only has one version, which as noted matches primaries so far very well. While there is a somewhat southern bias in the set of primaries that have been carried out so far, that bias is probably not important. I have a fairly high level of confidence in the model.
The result is best seen in this graphic, which shows the cumulative delegate count of committed delegates in US states. So this excludes non-committed delegates (known as “Super Delegates”) and it excludes territories and other non-states (but it does include DC, because DC is like a state).
Assuming a large proportion of the Democratic Party’s uncommitted delegates support Clinton, Clinton will probably achieve the necessary number of delegates to lock the nomination either on the 19th of April with the New York primary, or on the 26th of April, with the Maryland, Connecticut, Delaware, Pennsylvania and Rhode Island primaries.
There are two phases of primaries coming up. First we have a series of weeks with only one or two primaries happening at once, with a total of 300 committed delegates (130 from Michigan). Then we have what is effectively Return of Super Tuesday, with 691 committed delegates, including Florida with 214. For Sanders to regain traction, he has to do well in some of these big states. In particular, Sanders has to outperform the model in Michigan, Florida, Illinois and possibly North Carolina and Ohio.
When we look at many of these states, the model seems to fit very well with the available polling data, except in cases where the polls suggest a stronger outcome for Clinton. The following table compares the model projections with estimates of the delegate split based on polls. All delegates are assumed to be awarded (among the committed delegates only) and the polling data is not very dense and in some cases not too recent, so this is a very rough estimate.
Prior to Super Tuesday, the then-current version of this model projected results that conformed closely with polls. For most states, the outcome of the actual voting matched the projections and the polls pretty well, except in a couple of places. Now, the refined model matches polling data even more closely, but the polling data is not necessarily to be trusted because there has not been enough polling. (I avoided comparisons with really old polls which are entirely useless).
Clinton’s path to the nomination is clear. Sanders’ path to the nomination requires something to change, and to change dramatically and quickly.
Both Hillary Clinton and Bernie Sanders are viable candidates to win the Democratic nomination to run for President of the United States.
There are polls and pundits to which we may refer to make a guess as to who will win. Or, we could ignore all that, and let the process play out and see what happens. But, spreadsheets exist, so it really is impossible to resist the temptation of creating a simplistic spreadsheet model that predicts the outcome.
But we can take that a step further and suggest alternate scenarios, based on available data. So I did that.
I have removed the so-called “Super Delegates” from the process. This model assumes that the super delegates will ultimately either divide themselves up to reflect the overall distribution of committed delegates, or will mass towards the apparent leader. In any event, it is important that you know that the term “Super Delegate” is an unofficial, made-up term. They are really called “Uncommitted Delegates” because they are uncommitted. They will walk into the National Convention with no requirement as to whom they cast their vote for. That is their purpose. Meanwhile, it is true that individual Uncommitted Delegates will “endorse” a candidate during the process. Personally, I’m against this because it leads to conspiratorial ideation among activists and other interested parties. If I were King of the Democratic Party, I would make a rule that Uncommitted Delegates may not endorse or in any other way imply support for a candidate. (I would also probably reduce the total number of Uncommitted Delegates somewhat.)
So, in this model, the number of delegates it takes to be assured the nomination, pragmatically if not fully realistically, is the number required by the process minus the number of Uncommitted Delegates, or 2382-712=1670. In the graphs below, I represent this threshold by a wide blue line to reflect uncertainty. When a candidate’s delegate count makes it to the vague blue line first, that is an indicator that this candidate may be anointed. But, if the two candidates are close in delegate count at this point, a proper degree of uncertainty has to be assumed.
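The threshold arithmetic above is easy to check in a few lines. This is only a sketch in Python (the model itself lives in a spreadsheet). The function names and the sample running totals are invented for illustration and are not actual results.

```python
# Numbers from the text: 2382 delegates needed, 712 Uncommitted Delegates.
DELEGATES_NEEDED = 2382
UNCOMMITTED_DELEGATES = 712


def committed_threshold(needed=DELEGATES_NEEDED, uncommitted=UNCOMMITTED_DELEGATES):
    """Threshold used in this model: required delegates minus Uncommitted Delegates."""
    return needed - uncommitted


def first_crossing(cumulative_counts, threshold):
    """Return the index of the first contest date at which a running
    delegate total reaches the threshold, or None if it never does."""
    for i, count in enumerate(cumulative_counts):
        if count >= threshold:
            return i
    return None


if __name__ == "__main__":
    threshold = committed_threshold()
    print("Committed-delegate threshold:", threshold)
    # Hypothetical running totals, for illustration only (not real results).
    print("Crossed at contest index:", first_crossing([400, 1100, 1700, 2000], threshold))
```

A hard cutoff like this is cruder than the wide blue line in the graphs: it says nothing about how close the other candidate is when the line is crossed.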
This modeling effort explores the effect of ethnicity on the outcome. I assume all voters are White, Black, or Hispanic. I also only look at US states and DC, because things may be very different in the territories and possessions with respect to ethnicity. It is not too hard to estimate the relative preference for either of the two candidates among White, Black, and Hispanic subpopulations. It is probably true that these ethnic divisions work very differently in different areas. For example, union endorsements may affect ethnic voting patterns more or less for different ethnicities in different states. Importantly, it is likely that both preference and turnout will evolve among the ethnic groups as the primary process continues. This, of course, is why we use a spreadsheet. You can change the numbers any time as more information is available.
This model does not involve age directly, but does so indirectly, in that variations in age-graded participation factor into ethnicity. Same with sex, or more accurately, sex is divided evenly across the primary states (I assume) while age might not be, so again, it can factor into ethnicity. But a more sophisticated model that looks at turnout differentials or anomalies across age and sex would be better, and if the information related to this becomes available, perhaps I’ll update the model.
The Iowa Caucus involved mostly White voters, and told us that Clinton and Sanders are very close to even in this demographic. So, the model could assume a 50-50 split among White voters. Currently available and fairly recent polling data tell us that Clinton is preferred by African American Democrats and Hispanic Democrats, but to different levels. So, a first stab at this model can use a Clinton-Sanders ratio of 70-30 for African American primary voters, and 60-40 for Hispanic primary voters. Using these three sets of ratios, and known statewide demographics across the primary, we can estimate the effects of ethnicity.
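For anyone who would rather see the calculation than the spreadsheet, here is a rough Python sketch of the idea. The three ratios are the ones given above. Everything else is an assumption made for illustration: the function names, the made-up state with its ethnic mix and 100 delegates, and the simple proportional delegate split (real allocation rules are more involved).

```python
# Clinton's assumed share of each group's vote (from the text); Sanders gets the rest.
CLINTON_SHARE = {"white": 0.50, "black": 0.70, "hispanic": 0.60}


def projected_clinton_share(ethnic_mix, clinton_share=CLINTON_SHARE):
    """Weighted average of the group-level ratios, weighted by each group's
    share of the state's voters. The mix is normalized, so it need not sum to 1."""
    total = sum(ethnic_mix.values())
    return sum(ethnic_mix[g] * clinton_share[g] for g in ethnic_mix) / total


def split_delegates(delegates, clinton_fraction):
    """Assumption: committed delegates are divided in simple proportion to the vote."""
    clinton = round(delegates * clinton_fraction)
    return clinton, delegates - clinton


if __name__ == "__main__":
    # Hypothetical state, NOT real demographic data.
    mix = {"white": 0.60, "black": 0.30, "hispanic": 0.10}
    share = projected_clinton_share(mix)
    print("Projected Clinton share:", round(share, 3))
    print("Clinton, Sanders delegates out of 100:", split_delegates(100, share))
```

Changing a ratio or a state's mix and rerunning is the same thing as changing a cell in the spreadsheet.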
One problem you might note right away is that the statewide ethnicity profiles are not the same as the Democratic Party ethnicity profiles. A better version of this model will use the primary participant profiles instead. But, the last two election cycles of data are probably biased in this regard because of Obama’s candidacy, and thus may be incorrect. The preferred method will be to recalculate state by state ethnicity profiles, to estimate how many of each of three groups will vote, based on the returns from the first several primaries. I’ll do that. Right now this is impossible because both Iowa and New Hampshire lack the diversity in the voting population to allow it.
I am ignoring the New Hampshire results because I don’t know how to adjust for the Favorite Son Effect there. Also, New Hampshire is an odd state when it comes to primaries. The largest voting bloc, in the New Hampshire Primary, is uncommitted, and they can vote in either primary (but Republican and Democratic voters cannot switch). This, and some other factors, has resulted in a special culture among New Hampshire voters. So, between the Favorite Son Effect and the special snowflake nature of New Hampshire (which is what makes New Hampshire so interesting and important, of course) I’m ignoring it for now, but will include data from the Granite State when there are more states to consider.
So, the first model assumes the above stated numbers, and produces this effect:
In this model, Clinton wins the primary. The pattern of delegate accumulation is interesting, and is actually one of the main reasons to do this modeling, but it only becomes understandable when compared to other outcomes, so let’s look at the alternative model I ran and then compare.
The second model takes a cue from the large number of new young voters combined with their Bernie-ness and their whiteness to suggest a change in the White Ratio to favor Sanders. I sucked on my thumb for a minute and came up with a 40-60 ratio. This model gives credit to the Sanders campaign’s claims that African Americans will grok the Bern, and lowers the differential among Black voters to 60-40. This model assumes something similar for Hispanic voters, and adds another element. It is possible that in some states labor related issues will cause Hispanic votes to shift even more strongly to Sanders, so my thumb-suck estimate for this ratio is 40-60.
The second model is designed to favor Sanders in a way that might reasonably reflect actual possible voting preference shifts that the Sanders campaign is attempting. So, this model assumes Sanders succeeds where he is clearly trying, and produces this result:
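To make the difference between the two sets of assumptions concrete, here is a small Python sketch that runs both on the same state. The ratios are the ones stated in the text (50-50, 70-30, 60-40 for the first model and 40-60, 60-40, 40-60 for the second, each as Clinton-Sanders). The dictionary keys, function names, and the example ethnic mix are invented for illustration and are not real data.

```python
# Clinton's share of each group's vote under the two scenarios in the text.
MODELS = {
    "clinton_favoring": {"white": 0.50, "black": 0.70, "hispanic": 0.60},
    "sanders_favoring": {"white": 0.40, "black": 0.60, "hispanic": 0.40},
}


def clinton_share(ethnic_mix, ratios):
    """Weighted average of group-level ratios, weighted by the (normalized) mix."""
    total = sum(ethnic_mix.values())
    return sum(ethnic_mix[g] * ratios[g] for g in ethnic_mix) / total


def compare_models(ethnic_mix, models=MODELS):
    """Return Clinton's projected vote share under each scenario."""
    return {name: clinton_share(ethnic_mix, ratios) for name, ratios in models.items()}


if __name__ == "__main__":
    # Hypothetical state, NOT real demographic data.
    mix = {"white": 0.60, "black": 0.30, "hispanic": 0.10}
    for name, share in compare_models(mix).items():
        print(name, round(share, 3))
```

For this made-up state, the same voters produce a Clinton win under one set of ratios and a Sanders win under the other, which is the whole point of running two scenarios.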
Now, we can compare the two models, which I think are a) reasonable given what we know and b) need to be taken with a grain of salt because of what we don’t know.
The two models show a difference in how the spread between the candidates evolves, and when the projected winner can be seen as anointed by the process. In the case of the Clinton win, which assumes the status quo maintained for the entire campaign, and gives credit to the idea that “Sanders can’t win in the South” (more or less), the two candidates stay close enough to each other that there will be no clear winner for a long time, even if Clinton actually does stay ahead of Sanders the whole time. In this case, the jump into the blue zone, though not by a very large margin, does not happen until April 26th, when there are several primaries including Pennsylvania, with a massive delegate count. Also, importantly, after this date there are still some very large states including New Jersey and especially California, that could flip a result. If this is the pattern that develops, the day after the big primary day on April 26th, if I was Sanders, I’d camp out in California!
In the case of the Sanders win, the pattern is very different. (This is why this is interesting.) Here, Sanders pulls farther ahead, and sooner. The big jump would be on March 15th, which is a day of several primaries, including Florida, Illinois, and North Carolina. In this model, a close campaign shifts to a strong Sanders lead, and Bernie does not look back.
Those two scenarios represent two very different primary seasons, indeed!
I will update or redo these models after the next primary or two. Between Nevada and South Carolina, we can get much better data on the ethnic effects on the numbers, though of course, it will still be very provisional. Those data will be limited by not being extensive, but will represent a lot of diversity. On Super Tuesday (March 1st) enough data from a bunch of primaries across the US will allow, I think, a very accurate model that will probably predict the outcome of the primary season IF whatever the status quo on that day happens to be is maintained into the future. After that, differences from whatever looks apparent will require something to happen or change to cause voters to do the unexpected.