Bernie Sanders has either stated or implied two features that make up his strategy to win the Democratic nomination to be the party’s candidate for President this November.

Implied, sort of stated: Convince so-called “Superdelegates” (properly called “uncommitted delegates”) in states where he has won to vote for him, even if he is in second. That is a good idea, and if the two candidates are close, it could happen. However, when I run the numbers, giving Bernie “his” uncommitted delegates and Hillary “her” uncommitted delegates, it is pretty much a wash. The uncommitted delegates are not perfectly evenly distributed across the various voting units (states and such) but they are evenly enough distributed that not much happens. Not that this can’t come into play when Spooky Delegate Math is applied, but there isn’t much there.

Stated, the other part of the strategy: Get more votes. The idea here is that the second half of the primary season (counted in terms of numbers of delegates awarded over time), which started on March 22nd, is more favorable to Sanders than it is to Clinton.

Earlier work I did showed that this strategy has only a small chance of working, because Clinton will in fact win plenty of delegates during this second half of the season, and she has plenty of delegates under her belt now. Bernie just can’t catch up. See this post for details.

Unless …

As I have said many times, each primary or caucus, or each day on which there are a number of contests at once, is a test of one or more hypotheses. One hypothesis at stake last Tuesday was the accuracy of the model noted above. The various iterations and updates of my models for predicting primary outcomes have been very accurate all season, and I accurately predicted the outcome of Tuesday’s primary in terms of wins. I predicted that Hillary would win Arizona, and Bernie would win Idaho and Utah, and they did.

However, the magnitude of the predictions was off. Hillary won fewer votes than expected in Arizona and Bernie did way better in Utah and Idaho than predicted. (Also, the role of crossover voting was reduced as a likely factor in these elections, because Bernie did so well in Arizona with no crossovers.)

The difference in magnitude was so great that the seemingly assured Clinton victory in delegate count was turned on its head, and Sanders got more delegates than Clinton.

Is that a wakeup call? Or is it random variation?

Well, let’s assume for a minute that this is not random, and that this small set of contests tells us that the model is fundamentally wrong(ish). One thing I could do to fix that is to add the new data into the multivariable model and recalculate, but the number of new data points is insufficient to make a difference.

Another thing I could do is to assume that there is change over time in voting behavior, and add a variable for time. There are two reasons not to do that. One is that the more variables you add, the more accurately the model can predict the past (i.e., predict the values of the variables that are used to make the model), but not necessarily the future. The second reason is that if time is in fact a variable, simply adding it now would not work because of imbalance over time in sample size for the relevant variable.

So, what to do? Well, a third possibility is to fudge the data. Let us take a chance and provisionally assume that Arizona, Utah, and Idaho indicate that from here on in the expected outcomes based on my model are off by a certain amount, and then adjust future states to reflect that.

I quickly add that I’ve done this before … fudging the model to see if a Sanders claim about future outcomes might change the numbers … and each time that new hypothesis was falsified by subsequent primaries. But why not try it again? The numbers from yesterday’s contests are startling enough to make it actually necessary, if one wants to remain honest about what is happening on the ground.

I have felt all along, and still feel, and most people agree with this, that there are two kinds of states: those that tend to favor Bernie and those that tend to favor Hillary. Also, the variables used in the multivariable analysis may have asymmetries across the nearly-even-state boundary of bias. (In fact I’m pretty sure they do.) So, let’s consider Arizona as a Clinton-favoring state in which she underperformed by a certain amount that we estimate by comparing the expected results with the actual results. Let us also assume that Utah and Idaho are Sanders-favoring states in which he overperformed by an amount that we can similarly estimate.

This is conservative because the estimates are based on the differences between the candidates, not the absolute magnitude of their delegate takes in each contest.

In this revision, then, I put Clinton’s expected future performance in Clinton-favoring states as a 30% reduction in the spread, and Sanders’s expected future performance in Sanders-favoring states as a 300% increase in the spread. (Notice the asymmetry emerges here.)

Those sound like really different numbers, but they are not. The typical predicted Sanders win is small, so the total number of extra delegates Sanders ends up with is pretty similar in the two kinds of states.
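The adjustment can be sketched in a few lines of Python. This is only an illustration, not the actual model: the function names and the example spreads are made up, and the multiplier of 3 for Sanders-favoring states is an assumption that reads the "300% increase" as a tripling, which matches the Washington figures given later in this post (ten delegates becoming 30).

```python
# Spread = Clinton delegates minus Sanders delegates in one contest.
# Positive spread: Clinton-favoring state. Negative: Sanders-favoring state.

CLINTON_FACTOR = 0.7  # a 30% reduction in Clinton's predicted spread
SANDERS_FACTOR = 3.0  # assumption: "300% increase" read as tripling (10 -> 30)


def adjust_spread(predicted_spread,
                  clinton_factor=CLINTON_FACTOR,
                  sanders_factor=SANDERS_FACTOR):
    """Return the adjusted ("Sanders II") spread for a single contest."""
    if predicted_spread > 0:
        # Clinton-favoring state: shrink her predicted edge.
        return predicted_spread * clinton_factor
    # Sanders-favoring (or even) state: enlarge his predicted edge.
    return predicted_spread * sanders_factor


def adjusted_total(predicted_spreads):
    """Sum of adjusted spreads over a list of future contests."""
    return sum(adjust_spread(s) for s in predicted_spreads)


# Hypothetical predicted spreads, invented purely for illustration:
# one Clinton-favoring state (+40) and one Sanders-favoring state (-10).
example = [40, -10]
for spread in example:
    shift = adjust_spread(spread) - spread
    print(f"predicted {spread:+d}, shift toward Sanders of {-shift:.0f} delegates")
```

With these invented inputs, the big percentage applied to a small Sanders edge and the small percentage applied to a larger Clinton edge move a comparable number of delegates, which is the point of the paragraph above.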

When I do this, Clinton still wins. See the chart at the top of the post. But, there are three very important things to note.

First, this is too close to call. If this Sanders II strategy works out over the next few contests, and we believe it is the New Normal for this primary season, then it will simply be impossible to say who will win. The outcome here is very close, and had I used just slightly different numbers, I could have come up with an equally close outcome with Sanders winning.

Second, it is possible, depending on what happens with uncommitted delegates, that if the race is this close, there could be a brokered convention. I actually think this is unlikely, because in order to have that happen you probably need three or more candidates staying in it until the end, so a bunch of delegates are bound to vote for someone other than the two front runners. But, I’ve not looked at the numbers and the data and the rules closely enough to be sure. Consider it something to look into.

Third, the role of the big states now emerges as more important than it was before. The really big states, including New York, Pennsylvania, California, and New Jersey, were actually all very close in the model, and frankly, I can’t tell whether they are Sanders-favored or Clinton-favored states. This is for a good reason. These states are so large that they are internally fairly diverse, and also not easily affected by odd rules in the primary or caucus process the way some other states are. The apparent bimodality of states in general applies mainly to the smaller states.

Putting this another way, the larger the state, the closer to the national average response we see, and the national preference between the two candidates is similar. Smaller states stray away from the mean, larger states regress towards the mean. Like this:

So what does this mean? This means that larger states are not going to break strongly for either candidate. But, it also remains true that there are a lot of delegates in these states. So, this could mean that a strategy that effectively focuses on the big states, or one or two of them, could push that state over to one side or another.
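The size effect described above, with bigger states landing closer to the national average, can be illustrated with a toy simulation. This is a sketch under loud assumptions, not the multivariable model: each state is treated as a bundle of independent local "blocs", the national margin is set to zero, and the bloc-level scatter (`LOCAL_SD`) is an arbitrary number.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

NATIONAL_MARGIN = 0.0  # assumption: national preference is roughly even
LOCAL_SD = 0.25        # assumption: scatter of one local bloc around that margin


def state_margin(n_blocs):
    """Average margin of a state built from n_blocs independent local blocs."""
    blocs = [random.gauss(NATIONAL_MARGIN, LOCAL_SD) for _ in range(n_blocs)]
    return sum(blocs) / n_blocs


def margin_spread(n_blocs, n_states=500):
    """Standard deviation of state-level margins across simulated states."""
    margins = [state_margin(n_blocs) for _ in range(n_states)]
    return statistics.pstdev(margins)


small_states = margin_spread(n_blocs=2)   # small, less diverse states
large_states = margin_spread(n_blocs=50)  # large, internally diverse states

print(f"scatter of small-state margins: {small_states:.3f}")
print(f"scatter of large-state margins: {large_states:.3f}")
```

In this toy setup the large states cluster tightly around the national margin while the small ones scatter widely, simply because averaging over more blocs cancels out more of the local variation.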

I can make you this promise. Both campaigns are currently having this conversation and there will be intense campaigning in the big states. It is possible, maybe probable, that the candidates will watch each other doing this and end up differentiating, with the different states being focused on by different candidates. But, there are also states neither will give up. I suspect New York and California will be fought over heavily, while Clinton may give way to Sanders in Pennsylvania and Sanders may give way to Clinton in New Jersey.

The cycle over the last several weeks has been to see Sanders as possibly moving closer to Clinton, but then, failing to do so. But this week, he did. And, this is the first week in a series of contests where elements of the stated or implied Sanders strategy are supposed to come into play. And maybe they did. Or maybe not.

Frustratingly, the next several states are not going to be too informative. Washington is big, and Sanders will probably make big gains there. My main model, which I will continue to assume is the most accurate projection until proven otherwise, has Sanders getting ten more delegates there than Clinton. The revised Sanders II concept, in contrast, has him getting 30 more delegates than Clinton. That will be a test of the Sanders II hypothesis.
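Treating a contest as a test between the two projections amounts to asking which one lands closer to the actual result. A minimal sketch, assuming a simple absolute-error comparison (the function name and the "actual" values in the loop are hypothetical placeholders, not results):

```python
def closer_model(actual_spread, main_prediction, sanders2_prediction):
    """Say which projection is closer to the observed delegate spread."""
    main_error = abs(actual_spread - main_prediction)
    sanders2_error = abs(actual_spread - sanders2_prediction)
    if main_error < sanders2_error:
        return "main"
    if sanders2_error < main_error:
        return "sanders2"
    return "tie"


# Washington, expressed as Sanders delegates minus Clinton delegates.
WASHINGTON_MAIN = 10      # main model: Sanders +10
WASHINGTON_SANDERS2 = 30  # Sanders II: Sanders +30

# Hypothetical outcomes, for illustration only.
for hypothetical_actual in (12, 20, 28):
    verdict = closer_model(hypothetical_actual,
                           WASHINGTON_MAIN, WASHINGTON_SANDERS2)
    print(f"Sanders +{hypothetical_actual}: closer to {verdict}")
```

A single contest is weak evidence either way, so a comparison like this is best read across several contests rather than one.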

Then, eventually, comes New York, where we will see a test of the Too Big To Fail In State strategies. My model has Clinton winning in New York by just a few delegates, and the Sanders II model says pretty much the same (remember, it is conservative, addressing only the gap). If New York is close to a draw, as predicted, then we will be left wondering. If Sanders takes 20 or more delegates more than Clinton in New York, then we will be left in wonderment.

Following that is Little Big Tuesday, with several small states and Pennsylvania. That should also be close to a draw, according to my primary model, with Clinton winning a few more delegates than Sanders. But the Sanders II model has Sanders winning not just a few more, but many more delegates.

According to the Sanders II model, at the end of the day on Tuesday, April 26th, after Pennsylvania, a roughly 320-delegate lead by Clinton will be cut to a 190-delegate lead. According to the main model, the one I still trust until proven otherwise (perhaps over the next few weeks), the Clinton lead will still be over 300.

So, that’s my story, and I’m sticking to it. Both of them. For now.