Tag Archives: Democratic Primaries

Who will win the remaining Democratic primaries?

As you know, I’ve been running a model to predict the outcomes of upcoming Democratic Primary contests. The model has changed over time, as described below, but has always been pretty accurate. Here, I present the final, last, ultimate version of the model, covering the final contests coming up in June.

Why predict primaries and caucuses?

Predicting primaries and caucuses is annoying to some people. Why not just let people vote? Polls predict primaries and caucuses, and people get annoyed at polls.

But there are good reasons to make these predictions. Campaign managers might want to have some idea of what to expect, in order to better deploy resources, or to control expectations. But why would a voter who is not involved in a campaign care?

I had a very particular reason for working on this project, of predicting primaries and, ultimately, the course of the race for the Democratic nomination as a whole. When this campaign started, there were several candidates, and they all had positive and negative features. Very early in the process, all but two candidates dropped out, and I found myself liking both of them, though for different reasons. I would have been happy supporting either Hillary Clinton or Bernie Sanders.

Personally I believe that it is good to vote, during a primary, for the person you like best in direct comparison among the other candidates. But at some point, it may be wise to support the one you feel is most likely to win. There are two closely related reasons to do this, and I think most observers of the current campaign can easily understand them. One is to help build momentum for the candidate that is going to win anyway. The other is to limit the damage that is inevitable during a primary campaign as the candidates fight it out.

So, early on in the process, I decided to see if I could produce a reliable method to predict the final outcome of the primary process, in order to know if and when I should get behind one of the candidates. That is the main reason I did this. In order for this method to meet this and other goals, it had to be more accurate than polls.

There are other reasons. One is that it is fun. I’ve been doing this in primaries and general election campaigns for quite a few elections. I like data, I like analyzing data, I like politics, I like trying to understand what is going on in a given political scenario. So, obviously, I’m going to do this.

Another reason is to test the idea that the voters are changing their minds over time. In order to do this one might use all the primaries and caucuses to date to predict future primaries and caucuses, and then, if the predictions go out of whack, you can probably figure that something new is going on. This relates to overall feelings among the electorate as sampled by each state, but it also relates specifically to ideas about why a particular state reacted to the campaigns the way it did.

An example of this came up recently when Bernie Sanders won in West Virginia. My model had predicted a Sanders win there, and the actual vote count was very close to the prediction. Since that prediction was based on voter behavior across the country to date, I was confident that nothing unusual happened in West Virginia. But, something unusual should have happened there, according to some conceptions of this campaign.

The economy of West Virginia is based largely on coal mining, and there are a lot of Democrats there. (Democrats in local elections; they tend to vote for Republicans in the general.) So, it was thought that the voters would pick a candidate based on a perceived position on climate change and coal. Clinton went so far as to pander to the West Virginians with a rather mealy-mouthed comment about how we could still keep mining coal as long as we figured out a way to have it not harm the environment. That was the Clinton campaign doing something about the coal mining vote. Others thought that a Sanders win there would indicate that he somehow managed to get a strong climate change message across to coal miners. That idea is a bit weak because when it comes down to it, Clinton and Sanders are not different enough on climate change to be distinguished by most voters, let alone coal-supporting voters. In any event, the win there by Sanders was touted as a special case of a certain candidate bringing a certain message to certain voters. But he then lost in the next coal mining state over, Kentucky. In both states the percentage of voters that picked Clinton and Sanders was almost exactly what my model predicted, and that model was not based on climate change, coal, or perceptions or strategies related to these things, but rather on what voters had been doing all along.

So, nothing interesting actually happened in West Virginia. Or, two interesting things happened that cancelled each other out perfectly. Which is not likely.

In short, the closeness of my model to actual results, and the lack of significant outliers in the overall pattern (see below), seem to indicate that the voters have been behaving the same way during the entire primary season, by and large. This is a bit surprising when considered in light of the assumption that Sanders would take some time to get his message across, and pick up steam (or, I suppose, drive people over to Clinton) over time. That did not happen. Democratic voters became aware of Sanders and what he represents right away, and probably already had a sense of Clinton, and that has not changed measurably since Iowa.

How does this model work?

For the first few weeks of this campaign I used one model, then switched to an entirely different one. Then I stuck with the second model until now, but with a major refinement that I introduce today. The reason for using different models has to do with the availability of data.

All the models use the same basic assumption. Simply put, what happened will continue to happen. This is why I sometimes refer to this approach as a “status quo model.” I don’t use polling data at all, but rather, I assume that whatever voters were doing in states already done, their compatriots will do in states not yet done. But, I also break the voters down into major ethnic groups based on census data. So, for each state, I have data dividing the voting populace into White, Black, Hispanic and Asian. These racial categories are, of course, bogus in many ways (click on the “race and racism” category in the sidebar if you want to explore that). But as far as American voters go, these categories tend to be meaningful.

The first version of the model used exit polling (ok, so I did use that kind of polling for a while) to estimate the percentage of black voters who would prefer Sanders vs. Clinton. I used the simple fact that in non-favorite son states that were nearly all white Clinton and Sanders essentially tied to estimate the ratio of preferences for white voters at about even. I ignored Hispanic and Asian voters because the data were unavailable or unclear.

This model simply simulated voters’ behavior (in the simplest way, no randomization or multiple iterations or anything like that). I also used some guesses (sort of based on data) of the ethnic mix for Democrats specifically in so doing. That somewhat clumsy model worked well for the first several primaries, but then, after Super Tuesday there were (sort of) enough data points to use a different, superior method.
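The arithmetic behind that first version can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clinton_share_v1` and the 80% preference figure for Black voters are made up for the example, while the even split among white voters is the assumption described above.

```python
def clinton_share_v1(frac_black, frac_white, clinton_among_black,
                     clinton_among_white=0.5):
    """Estimate Clinton's share of the two-candidate vote in one state.

    Each group's preference is weighted by its share of the electorate.
    Only the two groups used in the first model are included, so the
    weights are renormalized to sum to 1.
    """
    total = frac_black + frac_white
    if total <= 0:
        raise ValueError("group fractions must sum to a positive number")
    weighted = (frac_black * clinton_among_black
                + frac_white * clinton_among_white)
    return weighted / total


# Hypothetical state: 25% Black, 75% white, with an assumed 80% of
# Black voters preferring Clinton (illustrative, not exit-poll data).
example_share = clinton_share_v1(0.25, 0.75, clinton_among_black=0.80)
print(f"Estimated Clinton share: {example_share:.1%}")
```

There is no randomization here, which matches the "simplest way" described above: one weighted average per state.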

This method simply regressed the outcome of the primary (in terms of one candidate’s percentage of the vote) against the available ethnic variables by state. Early on, the percentage of Hispanic or Asian did not factor in as meaningful at all, and White and Black together or White on its own did not work too well. What gave the best results was simply the percent of African Americans per state.

“Best results,” by the way, is simply measured as the r-squared value of the regression analysis, which can be thought of as the percentage of variation (in voting) explained by variation in the independent variable(s) of ethnicity.
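For readers who want to see what that looks like in practice, here is a minimal sketch of a one-variable regression and its r-squared value in plain Python. The five data points are invented for illustration and are not actual primary results; the function names are likewise hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up example: percent African American by state vs. Clinton's
# percent of the two-candidate vote.
pct_black = [2, 5, 10, 20, 30]
clinton_pct = [45, 48, 55, 62, 75]

slope, intercept = fit_line(pct_black, clinton_pct)
fit_quality = r_squared(pct_black, clinton_pct, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.1f} r-squared={fit_quality:.3f}")
```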

Primaries vs. Caucuses and Open vs. Closed

Many things have been said about how each of the two candidates does in various kinds of contests. We heard many say that Sanders does better in Caucuses, or that Clinton does better in closed primaries. During the middle of the primary season, I tested that idea and found it wanting. Yes, Sanders does well in caucuses, but the ethnic model predicts Sanders’ performance much better than the caucus-no caucus difference. It turns out that caucusing is a white people thing. There are no high diversity states where caucusing happens. It is not the caucus, but rather the Caucasian, that gives Sanders the edge.

This graph shows how Sanders vs. Clinton over-performed in caucuses vs. primaries.


The value plotted is the residual of each contest in relation to the model, or how far off a theoretical straight line approximating the pattern of results each contest was. Two things are apparent. One is that caucuses are less predictable than primaries. The other is that while Sanders did over-perform in several caucuses, this was not a fixed pattern.

This graph shows the residuals divided on the basis of whether the contest was open (so people could switch parties, or engage as an independent) vs closed (more restricted).


Open contests were more variable than closed contests, but it is not clear that either candidate did generally better in one or the other.
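The residual comparisons above boil down to a small calculation: subtract the model's prediction from the actual result, then summarize by kind of contest. The sketch below assumes made-up numbers and a hypothetical record layout of (kind, actual Clinton percent, predicted Clinton percent). A mean residual near zero says neither candidate systematically beat the model in that group, and a larger standard deviation says that kind of contest is less predictable.

```python
from statistics import mean, pstdev


def residuals_by_group(records):
    """Return {kind: (mean residual, spread of residuals)}.

    A residual is actual minus predicted, so a negative value means
    Clinton did worse (Sanders did better) than the model expected.
    """
    grouped = {}
    for kind, actual, predicted in records:
        grouped.setdefault(kind, []).append(actual - predicted)
    return {kind: (mean(vals), pstdev(vals)) for kind, vals in grouped.items()}


# Invented contests for illustration only.
contests = [
    ("caucus", 30, 40),
    ("caucus", 55, 45),
    ("primary", 52, 50),
    ("primary", 58, 60),
]

summary = residuals_by_group(contests)
for kind, (avg, spread) in summary.items():
    print(f"{kind}: mean residual {avg:+.1f}, spread {spread:.1f}")
```

The same function works for the open vs. closed comparison by changing the label stored in the first field.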

After many primaries and caucuses were finished, there were enough data to use the kind of contest as a factor in conducting the regression analysis. There are a lot of ways to do this, but I chose the simplified brute force method because it actually gives cleaner, and more understandable, results.

I simply divided the sample into the kind of contest, and then ran a multivariable regression analysis with each group, with the percent of Sanders plus Clinton votes cast for Clinton as the dependent variable, and the percentage of each of the four ethnic categories as the independent variables. There are some combinations of caucus-primary and open-closed/semi-open/semi-closed that are too infrequent to allow this. For those contests, I simply developed a regression model based on all the data to use to make a prediction in each of those states. The results, shown below, use this method of developing the most accurate possible model.
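The split-then-fall-back logic can be sketched as follows. To keep the example self-contained it uses a single predictor rather than all four ethnic variables, which is a simplification of the multivariable approach described above. The cutoff `MIN_CONTESTS`, the function names, and the data are all hypothetical.

```python
MIN_CONTESTS = 3  # assumed cutoff below which a contest type is "too infrequent"


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_by_contest_type(rows, min_contests=MIN_CONTESTS):
    """rows are (contest_type, predictor, clinton_pct) tuples.

    Types with enough contests get their own model; rare types reuse
    the model fitted to all the data.
    """
    pooled = fit_line([x for _, x, _ in rows], [y for _, _, y in rows])
    by_type = {}
    for kind, x, y in rows:
        by_type.setdefault(kind, []).append((x, y))
    models = {}
    for kind, points in by_type.items():
        enough = len(points) >= min_contests and len({x for x, _ in points}) > 1
        if enough:
            models[kind] = fit_line([x for x, _ in points], [y for _, y in points])
        else:
            models[kind] = pooled
    return models, pooled


# Invented data: three closed primaries and a single open caucus.
rows = [
    ("closed primary", 0, 40),
    ("closed primary", 10, 50),
    ("closed primary", 20, 60),
    ("open caucus", 5, 30),
]
models, pooled = fit_by_contest_type(rows)
print(models)
```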

How does this sort of model actually make a prediction?

The actual method is simple, and most of you either know this or don’t care, but for those who would like a refresher or do care…

The regression model, using multiple variables, produces a series of coefficients and an intercept. You will remember from High School algebra that the formula for a line is

Y = mX + b

X is the independent variable, along the x axis, and Y is what you are trying to predict. m is the slope of the line (a higher positive number is a steeply upward sloping line, for example) and b is the point where the line crosses the Y axis.

For multiple variables, the formula looks like this:

Y = m1(X1) + m2(X2) + … mn(Xn) + b

Here, each coefficient (m1, m2, up to mn) is a different number that you multiply by each corresponding variable (percent White, Black, etc.) and then you add on the intercept value (b). So, the regression gives the “m’s” and the ethnic data gives the “X’s” and you don’t forget the “b” and you can calculate Y (percent of voters casting a vote for Clinton) for any given state.
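As a worked example, the multiple-variable formula translates directly into code. The coefficients, intercept, and state demographics below are invented for illustration; they are not the fitted values from the model.

```python
def predict_clinton_pct(coefficients, intercept, demographics):
    """Y = m1(X1) + m2(X2) + ... + mn(Xn) + b for one state."""
    return intercept + sum(
        coefficients[group] * demographics[group] for group in coefficients
    )


# Hypothetical "m" values and "b" (not the real regression output).
coefficients = {"white": 0.1, "black": 0.9, "hispanic": 0.3, "asian": 0.2}
intercept = 30.0

# Hypothetical state, percent of population in each category (the "X" values).
state = {"white": 60, "black": 20, "hispanic": 15, "asian": 5}

predicted = predict_clinton_pct(coefficients, intercept, state)
print(f"Predicted Clinton percent: {predicted:.1f}")
```

With these made-up inputs the terms are 6 + 18 + 4.5 + 1, plus the intercept of 30, so Y comes out to about 59.5.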

So, enough already, who is going to win what primary when?

Not so fast, I have more to say about my wonderful model.

How have the public opinion polls done in predicting the contests?

Everybody hates polls, but like train wrecks, you can’t look away from them.

Actually, I love polls, because they are data, and they are data about what people are thinking. The idea that polls are inaccurate, misleading, or otherwise bogus is an unsubstantiated and generally false meme. Naturally, there are bad polls, biased polls, and so on, but for the most part polls are carried out by professionals who know what they are doing, and I promise that those professionals are aware of the things you feel make polls wrong, such as the shift from landlines to cell phones.

Anyway, polls can be expected to be reasonable predictors of election outcomes, but just how good are they?

Looking at a number of races today, excluding only a few because there were no polls, I got the Real Clear Politics web site averages for polls across the states, transformed those numbers to get a percentage of the Sanders + Clinton vote that went to Clinton, and plotted that with the similarly transformed data from the actual primaries and caucuses. The r-squared value is 0.52443, which is not terrible, and the graphic shows that there is a clear correlation between the two numbers, though the spread is rather messy.
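The transformation mentioned above just drops undecided voters and other candidates so that polls and results are on the same scale. A minimal sketch, with a hypothetical function name and made-up poll numbers:

```python
def clinton_two_way_share(clinton, sanders):
    """Clinton's percent of the combined Clinton + Sanders total."""
    total = clinton + sanders
    if total <= 0:
        raise ValueError("combined support must be positive")
    return 100.0 * clinton / total


# Invented poll average: Clinton 48, Sanders 42, the rest undecided.
poll_share = clinton_two_way_share(48, 42)
print(f"Clinton share of the two-candidate vote: {poll_share:.1f}%")
```

The same function applies to actual vote counts, which is what makes the poll numbers and the results directly comparable.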


The ethnic status quo model outperformed polls

My model is actually many models, as mentioned. I have a separate regression model for each of several kinds of primary, including Closed Caucuses, Closed Primaries, Semi-Closed Primaries, and Open primaries. I did not create separate models for the much rarer Semi-Open Primary, Semi-Open Caucus or Open Caucus style contests, as each of these categories had only one or a few states. Rather, the model used to calculate values for these states is derived from all the data, so addressing specific quirkiness of each kind of contest is sacrificed for large sample size.

I also generated models that included White, Black, Hispanic, and Asian; each of these separately; and various combinations of them. As noted above, the best single predictor was Black. Hispanic and Asian were very poor predictors. White was OK but not as good as Black. But, combining all the variables worked best. That is not what usually happens when throwing together variables. It is more like mixing water colors, you end up with muddy grayish brown most of the time. But this worked because, I think, diversity matters but in different ways when it comes in different flavors.

When the total data set was analyzed with the all-ethnicity model, that worked well. But when the major categories of contest type were analyzed with the all-ethnicity model, some of the data really popped, producing some very nice r-squared values. Closed caucuses cannot be predicted well at all (r-squared = 0.2577) while Open Caucuses perform very well (over 0.90, but there are only a few). The most helpful and useful results, though, were for the closed primary, open primary, and semi-closed primary, which had r-squared values of 0.69, 0.61, and 0.74, respectively.

What this means is that the percentage of the major ethnic groups across states, which varies, explains between about 61 and 74% of the variation in what percentage of voters or caucusers chose Clinton vs. Sanders.

Polls did not do as well, “explaining” only about half the variation.

So, the following graph is based on all that. This is a composite of the several different models (same basic model recalculated separately for some of the major categories of contest), using nominal ethnic categories. The model retrodicts, in this case, the percentage of the vote that would be given to Clinton across races. Notice that this works very well. The few outliers both above and below the line are mainly caucuses, but they are also mainly smaller states, which may be a factor.


Who will win the California, New Jersey, Montana, New Mexico, North Dakota, South Dakota, and D.C. primaries?

Clinton will win the California, New Jersey, New Mexico and D.C. Primaries. Sanders will win the Montana, North Dakota, and South Dakota primaries. According to this model.

The distribution of votes and delegates will be as shown here:


This will leave Sanders 576 pledged delegates short of a lock on the convention, and Clinton 212 pledged delegates short of a lock on the convention. If Super Delegates do what Sanders has asked them to do, to respect the will of the voters in their own states, then the final count will be Sanders with 2131 delegates, and Clinton with 2560 delegates. Clinton would then have enough delegates to take the nomination on the first ballot.

In the end, Clinton will win the nomination on the first ballot, and she will win it with more delegates than Obama did in 2008, most likely.

Bernie Sanders’ Strategy to Win the Nomination

Bernie Sanders has either stated or implied two features that make up his strategy to win the Democratic nomination to be the party’s candidate for President this November.

Implied, sort of stated: Convince so-called “Superdelegates” (properly called “uncommitted delegates”) in states where he has won to vote for him, even if he is in second. That is a good idea, and if the two candidates are close, it could happen. However, when I run the numbers, giving Bernie “his” uncommitted delegates and Hillary “her” uncommitted delegates, it is pretty much a wash. The uncommitted delegates are not perfectly evenly distributed across the various voting units (states and such) but they are evenly enough distributed that not much happens. Not that this can’t come into play when Spooky Delegate Math is applied, but there isn’t much there.

Stated, the other part of the strategy: Get more votes. The idea here is that the second half of the primary season (counted in terms of numbers of delegates awarded over time), which started on March 22nd, is more favorable to Sanders than it is to Clinton.

Earlier work I did showed that this strategy has only a small chance of working, because Clinton will in fact win plenty of delegates during this second half of the season, and she has plenty of delegates under her belt now. Bernie just can’t catch up. See this post for details.

Unless …

As I have said many times, each primary or caucus, or each day on which there are a number of contests at once, is a test of one or more hypotheses. One hypothesis at stake last Tuesday was the accuracy of the model noted above. The various iterations and updates of my models for predicting primary outcomes have been very accurate all season, and I accurately predicted the outcome of Tuesday’s primary in terms of wins. I predicted that Hillary would win Arizona, and Bernie would win Idaho and Utah, and they did.

However, the magnitude of the predictions was off. Hillary won fewer votes than expected in Arizona and Bernie did way better in Utah and Idaho than predicted. (Also, the role of crossover voting was reduced as a likely factor in these elections, because Bernie did so well in Arizona with no crossovers.)

The difference in magnitude was so great that the seemingly assured Clinton victory in delegate count was turned on its head, and Sanders got more delegates than Clinton.

Is that a wakeup call? Or is it random variation?

Well, let’s assume for a minute that this is not random, and that this small set of contests tells us that the model is fundamentally wrong(ish). One thing I could do to fix that is to add the new data into the multivariable model and recalculate, but the number of new data points is insufficient to make a difference.

Another thing I could do is to assume that there is change over time in voting behavior, and add a variable for time. There are two reasons to not do that. One is that the more variables you add, the more accurately the model can predict the past (i.e., predict the value of the variables that are used to make the model), but not necessarily the future. The second reason is that if time is in fact a variable, simply adding it now would not work because of imbalance over time in sample size for the relevant variable.

So, what to do? Well, a third possibility is to fudge the data. Let us take a chance and provisionally assume that Arizona, Utah, and Idaho indicate that from here on in the expected outcomes based on my model are off by a certain amount, and then adjust future states to reflect that.

I quickly add that I’ve done this before … fudging the model to see if a Sanders claim about future outcomes might change the numbers … and each time that new hypothesis was falsified by subsequent primaries. But, why not try it again? The numbers from yesterday’s contests are startling enough to make it, actually, necessary, if one wants to remain honest about what is happening on the ground.

I have felt all along, and still feel, and most people agree with this, that there are two kinds of states, those that tend to favor Bernie and those that tend to favor Hillary. Also, the variables used in the multivariable analysis may have asymmetries across the nearly-even-state boundary of bias. (In fact I’m pretty sure they do.) So, let’s consider Arizona as a Clinton-favoring state in which she underperformed a certain amount that we estimate by comparing the expected results with the actual results. Let us also assume that Utah and Idaho are Sanders-favoring states in which he over-performed by an amount that we can similarly estimate.

This is conservative because the estimates are based on the differences between the candidates, not the absolute magnitude of their delegate takes in each contest.

In this revision, then, I put Clinton’s expected future performance in Clinton favoring states as a 30% reduction in the spread, and Sanders’s expected future performance in Sanders favoring states as a 300% increase in spread. (Notice the asymmetry emerges here.)

Those sound like really different numbers, but they are not. The typical predicted Sanders win is small, so the total number of extra delegates Sanders ends up with is pretty similar in the two kinds of states.
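Here is a rough sketch of how such a spread adjustment could be applied to one state's predicted delegate split. It rests on one interpretive assumption: the "300% increase" is treated as tripling the spread, which is consistent with the Washington example later in this post (a 10 delegate margin becoming 30). The function name, the parameter names, and the 100 delegate states are hypothetical.

```python
def sanders_ii_adjust(clinton, sanders,
                      clinton_spread_factor=0.70,
                      sanders_spread_factor=3.0):
    """Rescale the gap between the candidates, keeping the total fixed.

    Clinton-favoring states: her margin shrinks by 30% (factor 0.70).
    Sanders-favoring states: his margin is tripled (assumed reading of
    the "300% increase").
    """
    total = clinton + sanders
    spread = clinton - sanders  # positive means Clinton is ahead
    if spread > 0:
        spread *= clinton_spread_factor
    elif spread < 0:
        spread *= sanders_spread_factor
    # A margin can never exceed the number of delegates available.
    spread = max(-total, min(total, spread))
    return (total + spread) / 2, (total - spread) / 2


# Hypothetical 100 delegate states.
clinton_state = sanders_ii_adjust(60, 40)   # Clinton-favoring
sanders_state = sanders_ii_adjust(45, 55)   # Sanders-favoring
print(clinton_state, sanders_state)
```

In this toy case the Clinton-favoring state moves from 60:40 to 57:43 and the Sanders-favoring state from 45:55 to 35:65. The results can be fractional and would need rounding to whole delegates.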

When I do this, Clinton still wins. See the chart at the top of the post. But, there are three very important things to note.

First, this is too close to call. If this Sanders II strategy works out over the next few contests, and we believe it is the New Normal for this primary season, then it will simply be impossible to say who will win. The outcome here is very close, and had I used just slightly different numbers, I could have come up with an equally close outcome with Sanders winning.

Second, it is possible, depending on what happens with uncommitted delegates, that if the race is this close, there could be a brokered convention. I actually think this is unlikely, because in order to have that happen you probably need three or more candidates staying in it until the end, so a bunch of delegates are bound to vote for someone other than the two front runners. But, I’ve not looked at the numbers and the data and the rules closely enough to be sure. Consider it something to look into.

Third, the role of the big states now emerges as more important than it was before. The really big states, including New York, Pennsylvania, California, and New Jersey, were actually all very close in the model, and frankly, I can’t tell if they are Sanders vs. Clinton favored states. This is for a good reason. These states are so large that they are internally fairly diverse, and also, not easily affected by odd rules in the primary or caucus process the way some other states are. The apparent bimodality of states in general applies mainly to the smaller states.

Putting this another way, the larger the state, the closer to the national average response we see, and the national preference between the two candidates is similar. Smaller states stray away from the mean, larger states regress towards the mean. Like this:

[Image: Screen Shot 2016-03-23 at 11.29.35 AM]

So what does this mean? This means that larger states are not going to break strongly for either candidate. But, it also remains true that there are a lot of delegates in these states. So, this could mean that a strategy that effectively focuses on the big states, or one or two of them, could push that state over to one side or another.

I can make you this promise. Both campaigns are currently having this conversation and there will be intense campaigning in the big states. It is possible, maybe probable, that the candidates will watch each other doing this and end up differentiating, with the different states being focused on by different candidates. But, there are also states neither will give up. I suspect New York and California will be fought over heavily, while Clinton may give way to Sanders in Pennsylvania and Sanders may give way to Clinton in New Jersey.

The cycle over the last several weeks has been to see Sanders as possibly moving closer to Clinton, but then, failing to do so. But this week, he did. And, this is the first week in a series of contests where elements of the stated or implied Sanders strategy are supposed to come into play. And maybe they did. Or maybe not.

Frustratingly, the next several states are not going to be too informative. Washington is big, and Sanders will probably make big gains there. My main model, which I will continue to assume is the most accurate projection until proven otherwise, has Sanders getting ten more delegates there than Clinton. The revised Sanders II concept, in contrast, has him getting 30 more delegates than Clinton. That will be a test of the Sanders II hypothesis.

Then, eventually, comes New York, where we will see a test of the Too Big To Fail In State strategies. My model has Clinton winning in New York by just a few delegates, and the Sanders II model says pretty much the same (remember, it is conservative, addressing only the gap). If New York is close to a draw, as predicted, then we will be left wondering. If Sanders takes 20 or more delegates more than Clinton in New York, then we will be left in wonderment.

Following that is Little Big Tuesday, with several small states and Pennsylvania. That should also be close to a draw, according to my primary model, with Clinton winning a few more delegates than Sanders. But the Sanders II model has Sanders winning not just a few more, but many more delegates.

According to the Sanders II model, at the end of the day on Tuesday, April 26th, after Pennsylvania, a roughly 320-delegate lead by Clinton will be cut to a 190-delegate lead. According to the main model, the one I still trust until proven otherwise (perhaps over the next few weeks), the Clinton lead will still be over 300.

So, that’s my story, and I’m sticking to it. Both of them. For now.

Democratic Primaries in Arizona, Utah, and Idaho: Sanders is still in the race

This post was written in two parts, pre-primary and post-primary. To see the results and a discussion of what they mean, skip down to the last part of the post, where I’ll discuss why Tuesday’s results may mean that Sanders could win the primary.


As already discussed, Clinton is likely to win the Democratic nomination. Sanders is too far behind to catch up without extraordinary results, as outlined here. However, it is also true that Sanders is likely to win a majority of contests from here on out, while at the same time, Clinton is likely to win many (if not most?) of the actual delegates.

Here, I’ll review what my recently upgraded predictive model indicates for today’s primaries in Arizona, Idaho, and Utah. Also I’ll provide a list of states and delegate counts for the upcoming primaries (including today’s) that would have to be realized for Sanders to catch up to Clinton. At the end of the post, you’ll find results of today’s primaries, and some discussion, when available.

First, the expected outcome of today’s primaries based on this model:

[Image: Screen Shot 2016-03-22 at 9.59.26 AM]

Clinton is expected to win big in Arizona, while Sanders is expected to squeak by in Idaho and win handily in Utah. The total delegate count for the day would be 82:49, Clinton:Sanders, so if this model is accurate, Clinton will win the day. As I’ve noted before, this model tends to under-predict Sanders’ wins when he does win, so the delegate count could be closer.

Or, this could be totally wrong and Sanders does much better, which would require me to go back to the drawing board. Which, of course, I’ll do.

In order for Sanders to catch up to Clinton, he’ll have to do much better than he’s done, even given the fact that he is favored in a lot of upcoming states. If we take all the upcoming states together and simply give Sanders even wins across the states sufficient to tie Clinton on the last day of contests, then he’ll need to win Arizona 44:31, Idaho 14:9, and Utah 19:14.
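The "even wins sufficient to tie" idea comes down to one formula: the trailing candidate needs half of the remaining delegates plus half of the current gap. The sketch below uses invented values for the gap and the remaining delegates, so it will not reproduce the per-state targets quoted above; only Arizona's 75 pledged delegates is a figure from this post.

```python
def required_share_to_tie(current_deficit, remaining_delegates):
    """Fraction of remaining delegates the trailing candidate must win.

    To tie, the trailing candidate needs (remaining + deficit) / 2 of
    what is left. A result above 1.0 means a tie is no longer possible.
    """
    if remaining_delegates <= 0:
        raise ValueError("there must be delegates left to award")
    return 0.5 + current_deficit / (2 * remaining_delegates)


def even_split(state_delegates, share):
    """Apply the same share to one state, rounded to whole delegates."""
    trailing = round(state_delegates * share)
    return trailing, state_delegates - trailing


# Hypothetical inputs: a 300 delegate gap with 2000 delegates remaining.
needed = required_share_to_tie(300, 2000)
arizona_target = even_split(75, needed)
print(f"Needed share: {needed:.1%}, Arizona target: {arizona_target}")
```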

Here’s a chart of the outcomes across all states for Sanders and Clinton to finish the primary season in a tie.

[Image: Screen Shot 2016-03-22 at 10.03.27 AM]

This is, of course, totally unrealistic. Sanders would likely do much better in some places, and just OK in others. But this chart serves as a basis of comparison for future races.

Every primary or caucus is a test of a hypothesis. The hypothesis that Clinton will do what I suggested she will do here is being tested by today’s contests. If Clinton gets somewhere around 75 to 89 delegates, the hypothesis is not rejected. If Sanders manages to perform much better than 49 delegates, say, over 62 or so, then the hypothesis has to be rejected (I’m not being formal here with rejection levels) and the possibility of him catching up has to be re-evaluated. If, of course, Sanders gets fewer than 40 or so delegates today, then he will have an even steeper uphill battle for the rest of the primary season.

I’ll add more information and commentary below after we get results!


OK, it is early the next morning and we have results, but the results are not entirely complete. Because of oddness in the way delegates are assigned, it is often the case that the votes are counted, the primary results published, but the delegate allocation incomplete. Texas took forever, it seems, to post its actual delegate count, for example. All three states that had contests yesterday have incomplete delegate counts, even though we know how people voted. Proportional representation applies in all three states but things are not so simple.

For example, in Arizona, a certain number of delegates are eventually (in April) selected at the congressional district level, a certain number are at large, a certain number are linked to party officialdom, and a certain number are linked to constitutional office. Delegates are selected at different times (most at the convention in April). There is a right of review (by the candidates) of some of these delegates. Some delegates are committed to vote a certain way on the first ballot at the national convention, some are uncommitted. There are threshold effects whereby certain delegates may not be assigned if the threshold is not met in the preference ballot (I think … this part confuses me). A delegate is both a number (i.e., 10 delegates for Mary and 10 Delegates for Sam) and a person (Joe Bleaugh will go to the National Convention as a delegate). There are lists of delegates (as in number as well as personage) and the exact number of “delegates” that might be on that list depends on … all of the above.

And that is the simple version of it. The rules are 18 pages long. Arizona is not unusual. Anyway, Arizona has 75 pledged delegates, of which 63 are counted as pledged (though they don’t exist yet as people) now, but the rest will be eventually. In the end, the allocation will be close to proportional, but because of the precinct and district level math, and other things, the exact number for each candidate probably can’t be known at this time.

So, given all that, I’m going to go out on a limb and say that in Arizona, Clinton will have 44 pledged delegates, and Sanders will have 31. This is not the same number you will see reported, because several delegates are listed as “available” for reasons cited above.

Using the same method, Idaho will award 5 delegates to Clinton and a whopping 18 delegates to Sanders. This is close to the reported amount, but off by one. I’ll attribute that to the Washington Post’s rounding error.

Meanwhile, Utah has reported delegates nice and clean like, straight shooters that they are, and we have 18 assigned to Sanders and 5 assigned to Clinton.

My model predicted that Sanders would win Utah and Idaho, and he did. My model predicted (along with everyone else in the country) that Clinton would win Arizona.

However, the numbers are different than expected. Sanders did much better than my model suggested and better than mainstream media expected.

My model had predicted that Clinton would walk away from yesterday’s contests with 82 delegates to Sanders’ 49 delegates. Instead, depending on rounding and other factors, Clinton will have 54 and Sanders 67.

This means that Sanders is walking away from Tuesday’s contests with more delegates than Clinton instead of the other way around.
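The totals can be checked by adding up the three state estimates given above. This is just the arithmetic, sketched in Python; the variable names are mine.

```python
# Delegate estimates for each state, as given in this post.
results = {
    "Arizona": {"Clinton": 44, "Sanders": 31},
    "Idaho": {"Clinton": 5, "Sanders": 18},
    "Utah": {"Clinton": 5, "Sanders": 18},
}

totals = {"Clinton": 0, "Sanders": 0}
for state_result in results.values():
    for candidate, delegates in state_result.items():
        totals[candidate] += delegates

# Sanders' share of all delegates awarded across the three contests.
sanders_share = totals["Sanders"] / sum(totals.values())
print(totals, f"Sanders share: {sanders_share:.1%}")
```

That gives Clinton 54 and Sanders 67, or a bit over 55% of the day's delegates for Sanders.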

I’ve stated several times that Sanders has to average 60% of the take for the rest of the contest in order to tie Clinton. He didn’t do that this time; he only got 55%. But that is 55% on a day when the largest contest, Arizona, was expected to go very favorably towards Clinton. In other words, because of the variation across primaries noted above in the discussion of “what Sanders needs to do to tie Clinton,” Sanders may have actually done what he needs to do this week.

You see, Sanders is expected to get about 48% of the votes from here on in, with Clinton at about 52%. He needs to achieve a seemingly unlikely 60%. Last night, he got 55%. My model suggested that last night he’d get less than the overall expected share, by a tiny amount (46.7%), but he got much more.
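The logic behind a "needs X% to tie" figure is simple enough to sketch. If the trailing candidate is behind by some number of pledged delegates and a certain number remain, then to tie, that candidate must win half of the remaining delegates plus half of the deficit. The function name and the delegate counts below are hypothetical round numbers used only to illustrate the formula, not the actual counts behind the 60% figure.

```python
def required_share_to_tie(deficit, remaining):
    """Share of remaining delegates the trailing candidate must win to tie.

    The trailing candidate needs (remaining + deficit) / 2 delegates out of
    `remaining`, which works out to 0.5 + deficit / (2 * remaining).
    """
    if remaining <= 0:
        raise ValueError("no delegates remaining")
    return 0.5 + deficit / (2 * remaining)


# Hypothetical illustration: trailing by 300 with 1,500 delegates still to come.
needed = required_share_to_tie(deficit=300, remaining=1500)
print(f"{needed:.0%}")
```

The same function shows why each good night matters: every delegate shaved off the deficit lowers the share needed in the contests that remain.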

Sanders is expected to win many of the upcoming states. Hawaii and Alaska are next, and I have no idea what will happen in Hawaii. But he will likely win Alaska, then Washington, then Wisconsin, then Wyoming and, I think, New York. My current prediction is that he’ll take about 57% of the delegates through that period, but if he outperforms expectations during that time by the same margin as he did yesterday, he could easily exceed the required 60% return and move significantly toward catching up to Clinton.

All I can say is that I like both candidates a lot and will be happy with either one. If you are supporting either of these candidates, I hope you keep in mind that it is still possible that the other candidate, the one you don’t support, whoever that is, may win. Vote blue no matter who!

Sanders can win the nomination: New Analysis

I developed a predictive model for the Democratic primaries that was designed to have the following features:

1) It does not rely on polling;

2) It does use exit polling and other information to set certain parameters;

3) It mainly uses prior primary or caucus results to predict the future, and thus assumes that the status quo is the best indicator;

4) It calculates likely voting patterns based on ethnicity (White, African American, Hispanic), and uses the likely Democratic party distribution among these groups to predict each contest’s outcome.
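To make item 4 concrete, here is a minimal sketch of how a calculation of that general kind could work: weight each group's support for a candidate by that group's share of a state's Democratic electorate, and add it up. This is an illustration under my own assumptions, not the model's actual code, and every number in it is hypothetical.

```python
import math


def predict_share(electorate_mix, support_by_group):
    """Predicted statewide share for one candidate.

    electorate_mix: each group's fraction of the state's Democratic voters.
    support_by_group: the candidate's share of the vote within each group,
        which a status quo approach would estimate from earlier contests.
    """
    if not math.isclose(sum(electorate_mix.values()), 1.0):
        raise ValueError("electorate fractions must sum to 1")
    return sum(electorate_mix[group] * support_by_group[group]
               for group in electorate_mix)


# Hypothetical state and hypothetical support levels, for illustration only.
mix = {"White": 0.60, "African American": 0.25, "Hispanic": 0.15}
sanders_support = {"White": 0.55, "African American": 0.20, "Hispanic": 0.40}

sanders_predicted = predict_share(mix, sanders_support)
print(f"Predicted Sanders share: {sanders_predicted:.1%}")
```

Updating the per-group support numbers as new contests come in is what lets a status quo approach shift its projections without relying on polls.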

That method outperformed most other predictions for Super Tuesday and accurately predicted who would win in the four contests held over the last weekend. However, in states that Sanders won last weekend, and in at least two of the Super Tuesday results, the method underestimated how well Sanders would do. Notably, the numbers used to predict those primaries accurately predicted how Clinton would do in Louisiana, and generally.

In other words, mostly, where Clinton won, the model was accurate, but where Sanders won, Sanders did better than expected, not counting “favorite son” states where he did even better.

The most likely reason for the difference between prediction and reality over last weekend, since this is a status quo model, is a change in voting patterns. In other words, it is possible that Sanders is picking up some momentum. That does not explain why the largest of the primaries, Louisiana, fit the predicted pattern while the others did not.

A second possibility is that Sanders outperforms expectations in caucus states. That seems almost certainly to be a factor, though I cannot explain it.

A third possibility is crossover voting or independents favoring Sanders in some, but not all, states. If Republicans are voting in the Democratic contest, or independents are showing up at the Democratic events, specifically because they want to vote for Sanders, that could explain a localized Sanders surge. This does not do well explaining last weekend’s results, because Sanders won in closed caucuses. But it could explain some earlier results, such as Massachusetts and Minnesota. I know for a fact that some Republicans and a lot of “independent” voters (as in, “I never did this before, see how independent I am”) showed up in the Minnesota caucus. The question remains, of course, where were these voters in Louisiana?

One explanation for this may be that the indies and centrists in more conservative southern states, which also happen to have a lot of pro-Clinton African American voters, are mostly registered Republicans or chose to participate in the Republican rather than Democratic process, while similar voters in less conservative or liberal states were already more likely to be Democrats or to at least participate this year in the Democratic primaries or caucuses. Differences in voter turnout across states seem to conform to this pattern.

Last weekend barely added enough data to consider revising the model. Assuming that the status quo method still works, but with somewhat adjusted numbers to match Sanders’ wins so far, and combining projections into the future with primary results so far, this model now puts Sanders on top at the very end of the primary process, like this:

[Image: Screen Shot 2016-03-07 at 9.57.31 AM]

I quickly add that I don’t have a lot more confidence in this projection than the previously developed projection that has Clinton winning. But this new projection is important because it accounts for what might be recent changes in how people are voting.

Michigan’s primary, to be held tomorrow, is important. Michigan is relatively diverse, and is northern (less conservative, etc.). The modified model predicts that Sanders will swamp Clinton in Michigan, picking up over 70 delegates to Clinton’s low-fifties. In contrast, the previous iteration of the model predicts that Clinton will win with about 66 delegates and Sanders will pick up a healthy 60 or so.

Michigan’s contest is a primary, not a caucus, but it is open, so cross-party activity is possible.

Michigan will be a test between the two models, the older one that ultimately favored Clinton, and the revised (but far less certain) one that suggests that Sanders could eke out a victory.

Michigan plus last weekend’s contests combined will give me enough data to produce The Model of Models which will accurately predict the outcome of primaries coming up in Florida, Illinois, Missouri, North Carolina and Ohio. Or not. We’ll see. It is possible that I’ll add an element to the model, using one set of assumptions for red states, another set for blue states.

One week after Michigan, Son of Super Tuesday happens. If either one of the candidates is very strong on that day, that may finish off the other candidate. The actual number of committed delegates is not too different between the two candidates, and the so-called “Super Delegates” will probably be obligated to go with whoever enters the Convention with the most delegates.