The Rest of the Democratic Primary

We are in the Primary Doldrums. For the last several days and the next several, not much is happening, with big gaps between the action. Wisconsin is important, and it is on Tuesday. Then comes Wyoming by itself, then New York by itself, then a sort of Super Tuesday with several states.

As you know, I’ve created a multivariable model that has a good record of predicting primary and caucus outcomes in the contests between Hillary Clinton and Bernie Sanders. For the rest of the primary season, this is what it looks like.

Screen Shot 2016-04-02 at 12.49.47 PM

I used yellow highlighting to indicate who is expected to win the most delegates on each primary/caucus day. Sanders will do well in Wisconsin, tie (or maybe even better) in Wyoming, do well in Indiana, and on balance, do well on June 7th when there will be six contests at once including Pennsylvania. But while Sanders may win the day on three (or four) days, Clinton will win the day on five. In total, Clinton is predicted to take 886 delegates, and Sanders 790.

This is the distribution of cumulative delegates starting with now and moving across this range of primary dates, showing the evolution of the difference between the two candidates throughout.
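For readers who want to reproduce this kind of chart, here is a minimal Python sketch of how a running delegate gap can be computed from per-day projections. The per-day numbers below are made-up placeholders, not the values in the table above, and the variable names are my own.

```python
from itertools import accumulate

# (contest day, Clinton delegates, Sanders delegates).
# Placeholder values for illustration only; not the post's projections.
days = [
    ("day 1", 40, 46),
    ("day 2", 7, 7),
    ("day 3", 140, 107),
]

# Running totals for each candidate across the contest days.
clinton_total = list(accumulate(c for _, c, _ in days))
sanders_total = list(accumulate(s for _, _, s in days))

# Positive gap means Clinton leads; negative means Sanders leads.
gap = [c - s for c, s in zip(clinton_total, sanders_total)]

for (label, _, _), g in zip(days, gap):
    print(label, g)
```

The gap here is measured from the first day in the list; adding each candidate's existing delegate count to the running totals would give the overall standing instead.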

Screen Shot 2016-04-02 at 12.50.32 PM

On balance, according to this model, Clinton will widen her lead over Sanders. If Sanders does better than projected this gap will narrow, but he’ll have to do very well to close it.

How Will Clinton And Sanders Do On Tuesday? (Updated)

Most polls and FiveThirtyEight predict a Clinton blow-out on Tuesday, with her winning all five states, in some cases by a large margin. My model, however, predicts that each candidate will win a subset of these states, but with Clinton still winning the day.

I’ve been working on a model to predict primary outcomes for the Democratic selection process, and generally, the model has proved very effective. After each set of primaries I’ve adjusted the model to try to do a better job of predicting the upcoming contests. The most important adjustment is the one that affects the current model.

The model assumes that we can predict voting behavior by ethnicity. Given this assumption, the distribution of potential Democratic participants by ethnic group gives the likely division of primary voters or caucus goers between the two candidates, and this translates directly into the division of committed delegates for that state. The estimates of within-group voting are made from exit polls.
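The calculation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual spreadsheet: the function names, the example state, and the within-group percentages are all assumptions made up for the example, and real delegate allocation rules are more involved than a single statewide proportional split.

```python
def clinton_share(group_shares, clinton_pref):
    """Weighted average of Clinton's within-group support.

    group_shares: fraction of likely Democratic participants in each group.
    clinton_pref: fraction of each group expected to vote for Clinton.
    """
    total = sum(group_shares.values())
    return sum(group_shares[g] * clinton_pref[g] for g in group_shares) / total


def split_delegates(n_delegates, share):
    """Split a state's committed delegates in proportion to the vote share."""
    clinton = round(n_delegates * share)
    return clinton, n_delegates - clinton


# Hypothetical state (not a real one): participant mix by group.
example_shares = {"white": 0.60, "black": 0.30, "hispanic": 0.10}
# Hypothetical within-group Clinton support (placeholders, not exit-poll data).
example_pref = {"white": 0.50, "black": 0.80, "hispanic": 0.60}

share = clinton_share(example_shares, example_pref)
print(split_delegates(100, share))  # (Clinton, Sanders)
```

Changing either dictionary and rerunning is the code equivalent of editing a cell in the spreadsheet.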

The most recent revision divides states into “Southern” (meaning deep south) and “Not Southern,” and uses different sets of numbers for each of the two kinds of states.
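To show why the Southern/Not Southern choice matters, here is a small hedged Python sketch. The two sets of within-group numbers are placeholders I invented for the example (the actual figures are not listed in this post), and the state is hypothetical.

```python
# Placeholder within-group Clinton support by region type.
# These are NOT the model's actual figures; they only illustrate the mechanism.
PREFS = {
    "southern": {"white": 0.55, "black": 0.85, "hispanic": 0.60},
    "not_southern": {"white": 0.45, "black": 0.65, "hispanic": 0.55},
}


def clinton_share(group_shares, region):
    """Clinton's expected vote share using the parameter set for the region."""
    pref = PREFS[region]
    return sum(group_shares[g] * pref[g] for g in group_shares)


# Hypothetical state whose classification is ambiguous.
shares = {"white": 0.75, "black": 0.20, "hispanic": 0.05}

for region in PREFS:
    print(region, round(clinton_share(shares, region), 3))
```

With these placeholder numbers the same state goes to Clinton under one classification and to Sanders under the other, which is why a hard-to-classify state is a weak spot for this kind of model.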

To date, about 32% of the committed delegates have been assigned, with 769 for Clinton and 502 for Sanders. Next Tuesday, March 15th, an additional 691 delegates will be committed to the two candidates. So, almost exactly 50% of all the delegates for the entire process will be committed. (None of this counts uncommitted delegates, sometimes called “Super Delegates.”)

If Clinton and Sanders each do about as well as they have done in the past, this will leave Sanders with a significant gap to close, and he probably can’t win the nomination. If Clinton does better, that closes the door to Sanders even more firmly. But, if Sanders does well, that may help close the gap and considering Sanders as a possible nominee is reasonable.

The current model has an interesting dual property: it gives Sanders more delegates than the polls currently predict, but, according to my own evaluation of it, it probably still underestimates Sanders’ performance. It suggests that Clinton will earn more delegates than Sanders, but not by too much. So, if the model’s underestimate is large enough, they could come close to a tie. At present, here are my predictions for the outcome of Tuesday’s set of primaries:

Florida: Clinton will win but by less than expected. The outcome will be so close that I can’t rule out a Sanders win here.
Illinois: Sanders will win, but this may be close to a tie.
Missouri: Sanders may win by a small margin. However, keep in mind that it is very difficult to classify Missouri as a “Southern” vs. “not-Southern” state. I picked “Not-Southern” for this prediction. But we’ll see. If Missouri goes all “Southern” then Clinton wins there.
North Carolina: Clinton will win by a very large margin (70-something to 30-something delegates).
Ohio: Sanders will win by a small margin.


Here is the output of the model indicating the expected number of committed delegates to be awarded on Tuesday to the two Democratic candidates:
Screen Shot 2016-03-14 at 2.34.04 PM

If these numbers are close to what happens, or if Sanders does better, then Sanders is still in the race, though with a tough road ahead of him. If, in contrast, the polls turn out to be right, it would indicate that Sanders’ overperformance in earlier contests may have been temporary, and the chance of him winning the primary is very small. At present the polls show Clinton way ahead in Florida, Clinton barely ahead in Illinois, a near tie in Missouri, Clinton way ahead in North Carolina, and Clinton a little ahead in Ohio. In other words, I’m suggesting that Sanders will win three out of the five races, while the polls suggest he will win one or maybe two.

Let’s look at the FiveThirtyEight predictions to see how they compare.

FiveThirtyEight gives Florida to Clinton (nearly 100% chance of winning). They predict a strong Clinton finish in the state, about 2:1.

For Illinois, FiveThirtyEight says about the same, a better than 2:1 projected result, with Clinton carrying away a lot of the delegates.

For Missouri, FiveThirtyEight has Clinton probably winning, but not by too much, so only a small pickup for her.

For North Carolina, FiveThirtyEight has Clinton winning just shy of 2:1 over Sanders.

For Ohio, FiveThirtyEight predicts a Clinton win, and a fairly strong one.

So we can see that there is a huge difference between FiveThirtyEight’s prediction and mine, and the two methods are very different. Both of the methods used by FiveThirtyEight rely on some combination of opinion or support-related information, while my method uses none of that. For this reason it is not surprising that the two methods produce very different results.

The point of going over the FiveThirtyEight predictions is that they do a very good job of representing the polling data, which overall strongly suggest that Clinton will run away with the nomination. The problem is, these data have been suggesting this since Iowa, and generally speaking, Sanders has far outperformed those estimates.

The final outcome in terms of delegates from all five races will be approximately:

Clinton: ca 364 delegates

Sanders: ca 326 delegates

This will mean that, at the end of the day Tuesday, Hillary Clinton will have about 56% of the committed delegates, to Sanders’ 44%, with about 50% of the committed delegates assigned.

If you hurry, you can vote in the Climate Primary (Closes March 8th)

Climate Hawks Votes is running a primary in which you can choose either Hillary Clinton, Bernie Sanders, or No Endorsement.

The web page where you can vote is here. You are required to enter some identifying information in order to eliminate or significantly reduce gaming of the poll, so the results should be reasonably fair.

There is a tendency for climate hawks (using the term generally, not in reference to this specific group) to favor Sanders on climate over Clinton, because Clinton is not 100% anti-fracking and anti-methane, while Sanders is. However, I think this is a bit unfair. Sanders has never been in the executive branch, and the Obama White House made its transition from being softer on climate change to stronger on climate change only recently. In fact, the two candidates for the Democratic Nomination are very similar in their stated positions on climate change. (See this post for more discussion on that, and links out to various other sources of information).

Also, though I like both Hillary and Bernie a lot, the truth is that neither of them really qualifies as a true Climate Hawk, in my opinion. Science has been telling us about the importance of climate change for years. By 1990, the reality and importance of climate change was clear, and should have permeated the political discourse during that decade, but it didn’t. Very few politicians can really be considered climate hawks by that standard.

More recently, a small number of politicians, none of whom are named “Sanders” or “Clinton,” have been pushing to implement policies that will address climate change. They may be considered climate hawks for this reason.

In my view, the real contrast will be between whichever Democrat gets the nomination and, almost certainly (though with a brokered convention, who knows?), the Republican nominee. If you care about the climate, you will want to vote for the Democrat, whoever that is. Full disclosure: when I voted in the Climate Hawks primary, I went for “No Endorsement” for this reason. To my mind, this is in line with what Climate Hawks Votes has tended to do: they avoid giving endorsements to candidates without real climate-savvy records. But, the choice is up to you.

Democratic Primary Results: Predicted vs actual (Updated with Maine)

Yesterday, the Democrats held three contests, in Louisiana, Nebraska and Kansas. I had predicted a Sanders win in Nebraska and Kansas, and a Clinton win in Louisiana, using my ever-evolving ethnicity-based projection model. Those predictions came to fruition. Like this:

Predicted on top, Actual on bottom.


Clinton did a bit better than projected in Louisiana, and Sanders did a bit better than projected in Nebraska and much better in Kansas.

I had projected the final delegate count to be 60:49 (Clinton:Sanders) for that day, and it turned out to be 55:49 (Clinton:Sanders). The difference lies primarily in the number of delegates actually awarded: what my model assumed versus what the states (Louisiana) actually did. Overall, I’d say that the model, which currently predicts Clinton reaching lock-in on delegate count in mid or late April, is accurate, but with enough of a difference to allow for Sanders to close the gap somewhat. At this point, though, Sanders will have to start performing better in order to catch up.
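The predicted-versus-actual comparison for that day can be written out as a short Python sketch. It uses only the 60:49 and 55:49 figures given above; the helper name `shares` is my own.

```python
# Delegate counts for the day, from the text above (Clinton:Sanders).
predicted = {"Clinton": 60, "Sanders": 49}
actual = {"Clinton": 55, "Sanders": 49}


def shares(counts):
    """Convert raw delegate counts into each candidate's fraction of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Positive error means the model over-predicted that candidate's delegates.
errors = {name: predicted[name] - actual[name] for name in predicted}

print("delegate error:", errors)
print("totals (predicted, actual):", sum(predicted.values()), sum(actual.values()))
print("predicted Clinton share:", round(shares(predicted)["Clinton"], 3))
print("actual Clinton share:", round(shares(actual)["Clinton"], 3))
```

Note that the predicted and actual totals differ (109 versus 104), so part of the miss comes from how many delegates were counted rather than from how the vote split.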

Lately we’ve seen a discussion that runs something like this. Clinton is winning in states where a Democrat is unlikely to lose, and Sanders is doing well in states where a Democrat is likely to lose. Therefore, Clinton would lose the general election, and Sanders would win it.

This proposition fails to take into account that for the most part the two candidates are interchangeable at the level of the general election. All those people who preferred one candidate in the primary will prefer the other candidate in the general, should that other candidate win the nomination. The only way for Sanders to beat Clinton is to start winning more delegates than the model projects, and soon.

Sanders’ better than predicted performance yesterday is not enough for him to overtake Clinton, but perhaps it is a sign that he is increasing his performance. Every primary or caucus is a test of the running hypothesis of status quo, and at the moment, status quo gives Clinton the nomination. Sanders will have to start falsifying that hypothesis very soon. There is no reason to say that will happen, or not happen, at this time.

By the way, a similar model (using the status quo as the determining factor in making predictions, but with no ethnic adjustment) for the Republican party predicts that Trump will lock in the nomination late enough in the process that he could actually fail to do so if his performance falters. The possibility of a brokered Republican convention is very real.

That is not the case, probably, for the Democratic convention, as the uncommitted delegates (called Super Delegates) will likely vote for the winner at the end of the process, to lock in that candidate.

UPDATE: Today, Sanders won in Maine. I had predicted a Sanders win, though Bernie got more delegates than my model had suggested.

Predicted on top, Actual on bottom.

Screen Shot 2016-03-06 at 10.05.06 PM

The delegate total for this weekend is now 72:62 (Clinton:Sanders) predicted, 62:64 (Clinton:Sanders) actual.

I will assume that the extra strong showing by Sanders in Maine is partly a result of the Favorite Son effect, and not adjust the model. Mississippi and Michigan, in just a couple of days, together with this weekend’s contests, should provide excellent calibration in preparation for primaries in Florida, Illinois, Missouri, North Carolina and Ohio.

Whom Should I Vote For: Clinton or Sanders?

You may be asking yourself the same question, especially if, like me, you vote on Tuesday, March 1st.

For some of us, a related question is which of the two is likely to win the nomination.

If one of the two is highly likely to win the nomination, then it may be smart to vote for that candidate in order to add to the momentum effect and, frankly, to end the internecine fighting and eating of young within the party sooner. If, however, one of the two is only somewhat likely to win the nomination, and your preference is for the one slightly more likely to lose, then you better vote for the projected loser so they become the winner!

National polls of who is ahead have been unreliable, and also, relying on those polls obviates the democratic process, so they should be considered but not used to drive one’s choice. However, a number of primaries have already happened, so there is some information from those contests to help estimate what might happen in the future. On the other hand, there have been only a few primaries so far. Making a choice based wholly or in part on who is likely to win is better left until after Super Tuesday, when there will be more data. But, circling back to the original question, that does not help those of us voting in two days, does it?

Let’s look at the primaries so far.

Overall, Sanders has done better than polls might have suggested weeks before the primaries started. This tells us that his insurgency is valid and should be paid attention to.

There has been a lot of talk about which candidate is electable vs. not, and about theoretical match-ups with Trump or other GOP candidates. If you look at ALL the match-ups, instead of the one cherry-picked match-up that a supporter of one or the other candidate might pick, both candidates do OK against the GOP. Also, such early theoretical match-ups are probably very unreliable. So, best to ignore them.

Iowa told us that the two candidates are roughly matched.

New Hampshire confirmed that the two candidates are roughly matched, given that Sanders has a partial “favorite son” effect going in the Granite State.

Nevada confirmed, again, that the two candidates are roughly matched, because the difference wasn’t great between the two.

So far, given those three races, in combination with exit polls, we can surmise that among White voters, the two candidates are roughly matched, but with Sanders doing better with younger voters, and Clinton doing better with older voters.

The good news for Sanders about younger voters is that he is bringing people into the process, which means more voters, and that is good. The bad news comes in two parts: 1) Younger voters are unreliable. They were supposed to elect Kerry, but never showed up, for example; and 2) Some (a small number, I hope) of Sanders’ younger voters claim that if their candidate does not win they will abandon the race, or the Democrats, and write in Sanders, vote for Trump, or do some other idiotic thing. So, if Clinton ends up being the nominee, thanks Bernie, but really, no thanks.

Then came South Carolina. Before South Carolina, we knew that there were two likely outcomes down the road starting with this first southern state. One is that expectations surrounding Clinton’s campaign would be confirmed, and she would do about 70-30 among African American voters, which in the end would give her a likely win in the primary. The other possibility is that Sanders would close this ethnic gap, which, given his support among men and white voters, could allow him to win the primary.

What happened in South Carolina is that Clinton did way better than even those optimistic predictions suggested. This is not good for Sanders.

Some have claimed that South Carolina was an aberration. But, that claim is being made only by Sanders supporters, and only after the fact. Also, the claim is largely bogus because it suggests that Democratic, and especially African American Democratic, voters are somehow conservative southern yahoos, and that is why they voted so heavily in favor of Clinton. But really, there is no reason to suggest that Democratic African American voters aren’t reasonably well represented by South Carolina.

In addition to that, polling for other southern states conforms pretty closely to expectations based on the actual results for South Carolina.

I developed an ethnic-based model for the Democratic primary (see this for an earlier version). The idea of the model is simple. Most of the variation we will ultimately observe among the states in voting patterns for the two candidates will be explained by the ethnic mix in each state. This is certainly an oversimplification, but has a good chance of working given that before breaking out voters by ethnicity, we are subsetting them by party affiliation. So this is not how White, Black and Hispanic people will vote across the states, but rather, how White, Black and Hispanic Democrats will vote across the states. I’m pretty confident that this is a useful model.

My model has two versions (chosen by me; there could be many others). The Sanders-favored version gives Sanders’ strategy a nod by having him do 10% better among white voters, with Clinton ahead only 60-40 among non-white voters. The Clinton-favored version has the two candidates at 50-50 among white voters and gives Clinton a strong advantage among African American voters, 86-14%, based on South Carolina’s results and polling. Clinton also has a small advantage among Hispanic voters (based mainly on polls), with a 57:43% mix.

These are the numbers I’ve settled on today, after South Carolina. But, I will adjust these numbers after Super Tuesday, and at that point, I’ll have some real confidence in the model. But, at the moment, the model seems to be potentially useful, and I’ll be happy to tell you why.

First, let us dispose of some of the circular logic. Given both polls and South Carolina’s results, the model, based partly on South Carolina, predicts South Carolina pretty well using the Clinton-favored version (not the Sanders-favored version), with a predicted vs. actual outcome of 34:19 vs. 39:14 delegates. This is obviously not an independent prediction, but rather a calibration. The Sanders-favored model predicts a nearly even outcome of 27:26.
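Here is a hedged Python sketch of what such a calibration check looks like. The within-group numbers follow the two versions described above, with two assumptions of mine: "10% better among white voters" is read as a 45-55 Clinton-Sanders split, and the participant mix is a hypothetical one chosen for illustration, not the input actually used for South Carolina.

```python
# Clinton's within-group share under each version of the model.
CLINTON_FAVORED = {"white": 0.50, "black": 0.86, "hispanic": 0.57}
# Assumption: "10% better among white voters" read as 45-55; non-white 60-40.
SANDERS_FAVORED = {"white": 0.45, "black": 0.60, "hispanic": 0.60}

# Hypothetical participant mix, chosen for illustration only.
HYPOTHETICAL_MIX = {"white": 0.60, "black": 0.37, "hispanic": 0.03}

DELEGATES = 53      # 34 + 19, the total implied by the text above
ACTUAL = (39, 14)   # actual Clinton:Sanders result quoted above


def predict(pref, mix=HYPOTHETICAL_MIX, delegates=DELEGATES):
    """Return (Clinton, Sanders) delegates for a given set of preferences."""
    share = sum(mix[g] * pref[g] for g in mix)
    clinton = round(delegates * share)
    return clinton, delegates - clinton


for name, pref in [("Clinton-favored", CLINTON_FAVORED),
                   ("Sanders-favored", SANDERS_FAVORED)]:
    clinton, sanders = predict(pref)
    print(name, f"{clinton}:{sanders}", "Clinton miss:", clinton - ACTUAL[0])
```

With this particular made-up mix the sketch returns 34:19 and 27:26, which lines up with the figures above, but other mixes would do so as well, so this should not be read as a reconstruction of the actual inputs.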

The following table shows the likely results for the Clinton-favored and Sanders-favored model in each state having a primary on Tuesday.
Screen Shot 2016-02-28 at 12.50.21 PM
The two columns on the right are estimates from polling where available. This is highly variable in quality and should be used cautiously. I highlighted the Clinton- or Sanders-favored model that most closely matches the polling. The matches are generally very close. This strongly suggests that the Clinton-favored version of the model essentially works, even given the limited information, and simplicity of the model.

Please note that in both the Clinton- and Sanders-favored model, Clinton wins the day on Tuesday, but only barely for the Sanders-favored model (note that territories are not considered here).

I applied the same model over the entire primary season (states only) to produce two graphs, shown below.

The Clinton-favored model has Clinton pulling ahead in committed delegates (I ignore Super Delegates, who are not committed) on Tuesday, then widening her lead over time, winning handily. The Sanders-favored model projects a horserace, where the two candidates are ridiculously close for the entire election.


So, who am I going to vote for?

I like both candidates. The current model suggests I should vote for Clinton because she is going to pull ahead, and it is better to vote for the likely winner, since I like them both, so that person gets more momentum (a tiny fraction of momentum, given one vote, but still…). On the other hand, a Sanders insurgency would be revolutionary and change the world in interesting ways, and for that to happen, Sanders needs as many votes on Tuesday as possible.

It is quite possible, then, that I’ll vote for Sanders, then work hard for Hillary if Super Tuesday confirms the Clinton-favored model. That is how I am leaning now, having made that decision while typing the first few words of this very paragraph.

Or I could change my mind.

Either way, I want to see people stop being so mean to the candidate they are not supporting. That is only going to hurt, and be a regrettable decision, if your candidate is not the chosen one. Also, you are annoying the heck out of everyone else. So just stop, OK?

Who Will Win The Next Several Primaries: Clinton or Sanders?

I recently developed a model of how the primary race will play out between Democratic presidential hopefuls Hillary Clinton and Bernie Sanders.

That model made certain assumptions, and allowed me to produce two projections (well, many, but I picked two) depending on how each candidate actually fares with different ethnic groups (White, Black, Hispanic, since those are the groupings typically used).

The two different versions of this model were designed to favor each candidate differently. The Clinton-favored model started with the basic assumption that among white Democratic Party voters, both candidates are similar, and that Clinton has a strong lead among Hispanic voters and an even stronger lead among African American voters. The Sanders-favored model assumes that Sanders has a stronger position among White voters and less of a disadvantage among non-White voters.

The logic behind the equivalence among White voters is that this is how the two candidates did in Iowa, which is representative of the United States White vote, unadulterated by a favorite son effect as in New Hampshire. Nevada failed to indicate that this assumption should be changed.

The favoring of Clinton among non-White voters is based on national polling with respect to ethnic effects. The logic behind the Sanders-favored version is that Sanders’ strategy, to win, has to involve a large young, white, male turnout (evidenced in the polls) and a narrowing of the gap among African American and Hispanic voters.

In that model, presented here, I used statewide demographic data to establish the ethnic term. However, that is incorrect, because one’s chances of engaging in the Republican vs. Democratic process in one’s state are tied to ethnicity. More Whites are Republicans, more Blacks are Democrats. I knew that at the time I worked out the model, but sloth and laziness, combined with lack of time, caused me to simplify.

The newer version of the model adjusts for likely Democratic Party membership. The results are the same but less dramatic, with a much longer slog to the finish line and the two candidates doing about the same as each other for the entire primary season.
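The adjustment amounts to reweighting each group's share of the state population by how likely members of that group are to take part on the Democratic side. A minimal Python sketch follows; the population shares and participation rates are invented placeholders, and the function name is my own.

```python
def democratic_electorate(pop_shares, dem_rate):
    """Reweight statewide group shares by each group's Democratic participation.

    pop_shares: each group's fraction of the statewide population.
    dem_rate:   fraction of each group assumed to participate as Democrats.
    Returns each group's share of the Democratic electorate (sums to 1).
    """
    weighted = {g: pop_shares[g] * dem_rate[g] for g in pop_shares}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


# Hypothetical statewide demographics (placeholder values).
pop = {"white": 0.70, "black": 0.20, "hispanic": 0.10}
# Hypothetical Democratic participation rates by group (placeholder values).
rate = {"white": 0.40, "black": 0.85, "hispanic": 0.60}

adjusted = democratic_electorate(pop, rate)
for group, value in adjusted.items():
    print(group, round(value, 3))
```

In this made-up case the White share of the electorate drops and the Black share rises relative to the statewide figures, which is the direction of the correction described above.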

The outcome of my modeling (reflected in the non-adjusted and adjusted versions, each with a Clinton- and Sanders-favored version) is different from the expectations of either campaign, as far as I can tell. Clinton boosters are claiming that the Democratic Party is mainly behind her, and these first primaries are aberrant. Sanders boosters are claiming the Sanders strategy of having a surge of support will carry him to victory. Both of these characterizations require that each candidate surge ahead pretty soon, and not look back. The opportunity to surge ahead is, certainly, Super Tuesday (March 1st).

The models I produced, with the assumptions listed above, show a close race all along, so either the campaigns are wrong or I am wrong.

The graphic at the top of the post represents how far ahead each candidate will be across the primary season, for each of their respective favored strategies.

So for Clinton, the ethnic gap is maintained as wide, and the blue line shows that she will surge nearly 40 committed delegates ahead of Sanders (a modest surge) and continue to develop a wider and wider gap past mid-March, and thereafter, maintain but not increase that gap, of about 80 committed delegates, until the end.

For Sanders (the orange line), the initial gap formed on Super Tuesday does not start out very large, but it steadily increases until the end of the primary season, ending at over 120 committed delegates.

So, that is the new model. But, it is a bogus model.

I’m trying to stick with empirical data that do not rely on polling. Why? Because everybody else is relying on polling, and this is an election season where the polling is not doing a good job of predicting outcomes. Also, my modeling gives credit to each campaign’s claims, which is at least interesting, if not valid, as a way of approaching this problem. If Clinton is right, she wins this way. If Sanders is right, he wins that way.

However, the data are insufficient to have much faith in this model. Super Tuesday will provide a lot more information, and with that information I can rework the model and have some confidence in it.

Who will win the South Carolina Primary, Clinton or Sanders?

While working this out, I naturally came up with predictions for what will happen in all of the future primaries. So let’s look at some of that.

In South Carolina, according to my model, if Clinton’s strategy holds, she will win 29 delegates, and Sanders will win 24 delegates. If the Sanders strategy pertains, they will tie, or possibly, Clinton will win one more delegate than Sanders.

Who will win the Super Tuesday primaries?

The following table shows the results predicted by this model, for both the Clinton-favored and Sanders-favored versions, for all the Super Tuesday state primaries or caucuses.


The Clinton-favored model suggests that Clinton will win six out of 11 primaries, and take the majority of committed delegates. The Sanders-favored model suggests that Sanders will take 9 out of 11 primaries, and win the majority of committed delegates.

Notice that I put Vermont in italics, because Sanders is likely to win big in Vermont no matter what happens. This underscores the nature of this model in an important way. I’m not using any data from the actual states, other than the ethnic mix from census data, with an adjustment applied to produce an estimate of Democratic Party membership across ethnic groups. That estimate is based on national data as well as data specifically from Virginia, to provide some empirical basis.

I suspect most people will have two responses to this table. First, they will say that a model that incorporates Clinton’s strategic expectations should have her winning more. Second, they will say that all the numbers, for all states and all models, are too close.

These are both legitimate complaints about my model, and will explain why it will turn out to be totally wrong. Or, they are suppositions people are making that are totally wrong, and when my model turns out to be uncannily accurate, those suppositions will have to be put aside for the rest of the primary season. (Or, some other outcome happens.)

I will restate this: I’m looking for Super Tuesday to provide the best empirical data to make this model work for the rest of the primary season. But, in the meantime, this seemed like an interesting result to let you know about.

Will Clinton or Sanders win the Democratic Nomination?

Both Hillary Clinton and Bernie Sanders are viable candidates to win the Democratic nomination to run for President of the United States.

There are polls and pundits to which we may refer to make a guess as to who will win. Or, we could ignore all that, and let the process play out and see what happens. But, spreadsheets exist, so it really is impossible to resist the temptation of creating a simplistic spreadsheet model that predicts the outcome.

But we can take that a step further and suggest alternate scenarios, based on available data. So I did that.

I have removed the so-called “Super Delegates” from the process. This model assumes that the super delegates will ultimately either divide themselves up to reflect the overall distribution of committed delegates, or will mass towards the apparent leader. In any event, it is important that you know that the term “Super Delegate” is an unofficial made-up term. They are really called “Uncommitted Delegates” because they are uncommitted. They will walk into the National Convention with no requirement as to whom they cast their vote for. That is their purpose. Meanwhile, it is true that individual Uncommitted Delegates will “endorse” a candidate during the process. Personally, I’m against this because it leads to conspiratorial ideation among activists and other interested parties. If I was King of the Democratic Party, I would make a rule that if you are going to be an Uncommitted Delegate, you don’t endorse or in any other way imply support for a candidate. (I would also probably reduce the total number of Uncommitted Delegates somewhat.)

So, in this model, the number of delegates it takes to be assured the nomination, pragmatically if not fully realistically, is the number required by the process minus the number of Uncommitted Delegates, or 2382-712=1670. In the graphs below, I represent this threshold by a wide blue line to reflect uncertainty. When a candidate’s delegate count makes it to the vague blue line first, that is an indicator that this candidate may be anointed. But, if the two candidates are close in delegate count at this point, a proper degree of uncertainty has to be assumed.
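The threshold arithmetic and the "who gets to the line first" check can be sketched in Python as follows. The 2382 and 712 figures come from the paragraph above; the running totals in the example are hypothetical, and the function name is my own.

```python
NEEDED_WITH_UNCOMMITTED = 2382   # delegates required by the process
UNCOMMITTED = 712                # Uncommitted ("Super") Delegates
THRESHOLD = NEEDED_WITH_UNCOMMITTED - UNCOMMITTED  # 1670


def first_to_reach(cumulative, threshold=THRESHOLD):
    """Find the first date on which either candidate reaches the threshold.

    cumulative: list of (label, clinton_total, sanders_total) in date order.
    Returns (label, leader, lead in delegates), or None if nobody gets there.
    """
    for label, clinton, sanders in cumulative:
        if clinton >= threshold or sanders >= threshold:
            leader = "Clinton" if clinton >= sanders else "Sanders"
            return label, leader, abs(clinton - sanders)
    return None


# Hypothetical running totals, for illustration only.
running = [("early", 1000, 900), ("late", 1700, 1500)]
print(THRESHOLD)
print(first_to_reach(running))
```

Returning the size of the lead alongside the leader reflects the caveat above: if the two candidates are close when the line is crossed, the result should be treated as uncertain.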

This modeling effort explores the effect of ethnicity on the outcome. I assume all voters are White, Black, or Hispanic. I also only look at US states and DC, because things may be very different in the territories and possessions with respect to ethnicity. It is not too hard to estimate the relative preference for either of the two candidates among White, Black, and Hispanic subpopulations. It is probably true that these ethnic divisions work very differently in different areas. For example, union endorsements may affect ethnic voting patterns more or less for different ethnicities in different states. Importantly, it is likely that both preference and turnout will evolve among the ethnic groups as the primary process continues. This, of course, is why we use a spreadsheet. You can change the numbers any time as more information is available.

This model does not involve age directly, but does so indirectly, in that variations in age graded participation factor into ethnicity. Same with sex, or more accurately, sex is divided evenly across the primary states (I assume) while age might not be, so again, it can factor into ethnicity. But a more sophisticated model that looks at turnout differentials or anomalies across age and sex would be better, and if the information related to this becomes available, perhaps I’ll update the model.

The Iowa Caucus involved mostly White voters, and told us that Clinton and Sanders are very close to even in this demographic. So, the model could assume a 50-50 split among White voters. Currently available and fairly recent polling data tell us that Clinton is preferred by African American Democrats and Hispanic Democrats, but to different levels. So, a first stab at this model can use a Clinton-Sanders ratio of 70-30 for African American primary voters, and 60-40 for Hispanic primary voters. Using these three sets of ratios, and known statewide demographics across the primary, we can estimate the effects of ethnicity.

One problem you might note right away is that the statewide ethnicity profiles are not the same as the Democratic Party ethnicity profiles. A better version of this model will use the primary participant profiles instead. But the data from the last two election cycles are probably biased in this regard because of Obama’s candidacy, and thus may be misleading. The preferred method will be to recalculate state-by-state ethnicity profiles, to estimate how many of each of the three groups will vote, based on the returns from the first several primaries. I’ll do that. Right now this is impossible because both Iowa and New Hampshire lack the diversity in the voting population to allow it.

I am ignoring the New Hampshire results because I don’t know how to adjust for the Favorite Son Effect there. Also, New Hampshire is an odd state when it comes to primaries. The largest voting bloc in the New Hampshire Primary is uncommitted, and they can vote in either primary (but Republican and Democratic voters cannot switch). This, and some other factors, has resulted in a special culture among New Hampshire voters. So, between the Favorite Son Effect and the special snowflake nature of New Hampshire (which is what makes New Hampshire so interesting and important, of course) I’m ignoring it for now, but will include data from the Granite State when there are more states to consider.

So, the first model assumes the above stated numbers, and produces this effect:

Screen Shot 2016-02-11 at 1.46.00 PM

In this model, Clinton wins the primary. The pattern of delegate accumulation is interesting, and is actually one of the main reasons to do this modeling, but it only becomes understandable when compared to other outcomes, so let’s look at the alternative model I ran and then compare.

The second model takes a cue from the large number of new young voters combined with their Bernie-ness and their whiteness to suggest a change in the White Ratio to favor Sanders. I sucked on my thumb for a minute and came up with a 40-60 ratio. This model gives credit to the Sanders campaign’s claims that African Americans will grok the Bern, and lowers the differential among Black voters to 60-40. This model assumes something similar for Hispanic voters, and adds another element. It is possible that in some states labor-related issues will cause Hispanic votes to shift even more strongly to Sanders, so my thumb-suck estimate for this ratio is 40-60.
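To see how much the change in ratios matters, the two sets of assumptions can be put side by side in a short Python sketch. The ratios are the ones stated in the text for each model. The demographic mix of the example state is hypothetical and is there only to show the direction and rough size of the shift.

```python
# Clinton's assumed share of the two-way vote within each group.
MODEL_1 = {"white": 0.50, "black": 0.70, "hispanic": 0.60}  # status quo
MODEL_2 = {"white": 0.40, "black": 0.60, "hispanic": 0.40}  # favors Sanders


def clinton_vote_share(demographics, ratios):
    """Weighted average of group preferences, weights renormalized to 1."""
    total = sum(demographics.values())
    return sum(frac * ratios[group] for group, frac in demographics.items()) / total


# Hypothetical state: 60% White, 30% Black, 10% Hispanic.
example_state = {"white": 0.60, "black": 0.30, "hispanic": 0.10}

share_1 = clinton_vote_share(example_state, MODEL_1)
share_2 = clinton_vote_share(example_state, MODEL_2)
print(f"Model 1: Clinton {share_1:.0%}, Sanders {1 - share_1:.0%}")
print(f"Model 2: Clinton {share_2:.0%}, Sanders {1 - share_2:.0%}")
```

In this made-up state the same electorate goes from a Clinton lead under the first set of ratios to a Sanders lead under the second, which is the kind of flip the second model is built to explore.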

The second model is designed to favor Sanders in a way that might reasonably reflect the actual shifts in voting preference that the Sanders campaign is attempting to bring about. So, this model assumes Sanders succeeds where he is clearly trying, and produces this result:

Screen Shot 2016-02-11 at 1.48.32 PM

Now, we can compare the two models, which I think a) are reasonable given what we know and b) need to be taken with a grain of salt because of what we don’t know.

The two models show a difference in how the spread between the candidates evolves, and in when the projected winner can be seen as anointed by the process. In the case of the Clinton win, which assumes the status quo is maintained for the entire campaign, and gives credit to the idea that “Sanders can’t win in the South” (more or less), the two candidates stay close enough to each other that there will be no clear winner for a long time, even if Clinton actually does stay ahead of Sanders the whole time. In this case, the jump into the blue zone, though not by a very large margin, does not happen until April 26th, when there are several primaries including Pennsylvania, with a massive delegate count. Also, importantly, after this date there are still some very large states, including New Jersey and especially California, that could flip the result. If this is the pattern that develops, the day after the big primary day on April 26th, if I were Sanders, I’d camp out in California!

In the case of the Sanders win, the pattern is very different. (This is why this is interesting.) Here, Sanders pulls farther ahead, and sooner. The big jump would be on March 15th, which is a day of several primaries, including Florida, Illinois, and North Carolina. In this model, a close campaign shifts to a strong Sanders lead, and Bernie does not look back.

Those two scenarios represent two very different primary seasons, indeed!

I will update or redo these models after the next primary or two. Between Nevada and South Carolina, we can get much better data on the ethnic effects on the numbers, though of course, it will still be very provisional. Those data will not be extensive, but they will represent a lot of diversity. On Super Tuesday (March 1st) enough data from a bunch of primaries across the US will allow, I think, a very accurate model that will probably predict the outcome of the primary season IF whatever the status quo on that day happens to be is maintained into the future. After that, any departure from the apparent pattern will require something to happen or change to cause voters to do the unexpected.