Tag Archives: Prediction

Who will win the remaining Democratic primaries?

As you know, I’ve been running a model to predict the outcomes of upcoming Democratic Primary contests. The model has changed over time, as described below, but has always been pretty accurate. Here, I present the final, last, ultimate version of the model, covering the final contests coming up in June.

Why predict primaries and caucuses?

Predicting primaries and caucuses is annoying to some people. Why not just let people vote? Polls predict primaries and caucuses, and people get annoyed at polls.

But there are good reasons to make these predictions. Campaign managers might want to have some idea of what to expect, in order to better deploy resources, or to control expectations. But why would a voter who is not involved in a campaign care?

I had a very particular reason for working on this project of predicting primaries and, ultimately, the course of the race for the Democratic nomination as a whole. When this campaign started, there were several candidates, and they all had positive and negative features. Very early in the process, all but two candidates dropped out, and I found myself liking both of them, though for different reasons. I would have been happy supporting either Hillary Clinton or Bernie Sanders.

Personally I believe that it is good to vote, during a primary, for the person you like best in direct comparison among the other candidates. But at some point, it may be wise to support the one you feel is most likely to win. There are two closely related reasons to do this, and I think most observers of the current campaign can easily understand them. One is to help build momentum for the candidate that is going to win anyway. The other is to limit the damage that is inevitable during a primary campaign as the candidates fight it out.

So, early on in the process, I decided to see if I could produce a reliable method to predict the final outcome of the primary process, in order to know if and when I should get behind one of the candidates. That is the main reason I did this. In order for this method to meet this and other goals, it had to be more accurate than polls.

There are other reasons. One is that it is fun. I’ve been doing this in primaries and general election campaigns for quite a few elections. I like data, I like analyzing data, I like politics, I like trying to understand what is going on in a given political scenario. So, obviously, I’m going to do this.

Another reason is to test the idea that the voters are changing their minds over time. In order to do this one might use all the primaries and caucuses to date to predict future primaries and caucuses, and then, if the predictions go out of whack, you can probably figure that something new is going on. This relates to overall feelings among the electorate as sampled by each state, but it also relates specifically to ideas about why a particular state reacted to the campaigns the way it did.

An example of this came up recently when Bernie Sanders won in West Virginia. My model had predicted a Sanders win there, and the actual vote count was very close to the prediction. Since that prediction was based on voter behavior across the country to date, I was confident that nothing unusual happened in West Virginia. But, something unusual should have happened there, according to some conceptions of this campaign.

The economy of West Virginia is based largely on coal mining, and there are a lot of Democrats there. (Democrats in local elections; they tend to vote for Republicans in the general.) So, it was thought that the voters would pick a candidate based on a perceived position on climate change and coal. Clinton went so far as to pander to the West Virginians with a rather mealy-mouthed comment about how we could still keep mining coal as long as we figured out a way to have it not harm the environment. That was the Clinton campaign doing something about the coal mining vote. Others thought that a Sanders win there would indicate that he somehow managed to get a strong climate change message across to coal miners. That idea is a bit weak because when it comes down to it, Clinton and Sanders are not different enough on climate change to be distinguished by most voters, let alone coal-supporting voters. In any event, the win there by Sanders was touted as a special case of a certain candidate bringing a certain message to certain voters. But he then lost in the next coal mining state over, Kentucky. In both states, the percentages of voters that picked Clinton and Sanders were almost exactly what my model predicted, and that model was not based on climate change, coal, or perceptions or strategies related to these things, but rather on what voters had been doing all along.

So, nothing interesting actually happened in West Virginia. Or, two interesting things happened that cancelled each other out perfectly. Which is not likely.

In short, the closeness of my model to actual results, and the lack of significant outliers in the overall pattern (see below), seems to indicate that the voters have been behaving the same way during the entire primary season, by and large. This is a bit surprising when considered in light of the assumption that Sanders would take some time to get his message across, and pick up steam (or, I suppose, drive people over to Clinton) over time. That did not happen. Democratic voters became aware of Sanders and what he represents right away, and probably already had a sense of Clinton, and that has not changed measurably since Iowa.

How does this model work?

For the first few weeks of this campaign I used one model, then switched to an entirely different one. Then I stuck with the second model until now, but with a major refinement that I introduce today. The reason for using different models has to do with the availability of data.

All the models use the same basic assumption. Simply put, what happened will continue to happen. This is why I sometimes refer to this approach as a “status quo model.” I don’t use polling data at all, but rather, I assume that whatever voters were doing in states already done, their compatriots will do in states not yet done. But, I also break the voters down into major ethnic groups based on census data. So, for each state, I have data dividing the voting populace into White, Black, Hispanic and Asian. These racial categories are, of course, bogus in many ways (click on the “race and racism” category in the sidebar if you want to explore that). But as far as American voters go, these categories tend to be meaningful.

The first version of the model used exit polling (ok, so I did use that kind of polling for a while) to estimate the percentage of black voters who would prefer Sanders vs. Clinton. In non-favorite son states that were nearly all white, Clinton and Sanders essentially tied, so I estimated the ratio of preferences among white voters at about even. I ignored Hispanic and Asian voters because the data were unavailable or unclear.

This model simply simulated voters’ behavior (in the simplest way, no randomization or multiple iterations or anything like that). I also used some guesses (sort of based on data) of the ethnic mix for Democrats specifically in so doing. That somewhat clumsy model worked well for the first several primaries, but then, after Super Tuesday there were (sort of) enough data points to use a different, superior method.

This method simply regressed the outcome of the primary (in terms of one candidate’s percentage of the vote) against the available ethnic variables by state. Early on, the percentage of Hispanic or Asian did not factor in as meaningful at all, and White and Black together or White on its own did not work too well. What gave the best results was simply the percent of African Americans per state.

“Best results,” by the way, is simply measured as the r-squared value of the regression analysis, which can be thought of as the percentage of variation (in voting) explained by variation in the independent variable(s) of ethnicity.
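To make the r-squared idea concrete, here is a minimal sketch of a single-predictor regression and its r-squared value. The function names and the numbers are made up for illustration; they are not the data or code behind the results in this post.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical values: percent African American by state vs. Clinton's
# percent of the two-candidate vote. NOT real results.
pct_black = [2.0, 5.0, 12.0, 20.0, 30.0]
pct_clinton = [45.0, 50.0, 53.0, 64.0, 70.0]
print(round(r_squared(pct_black, pct_clinton), 3))
```

An r-squared near 1 means the points sit close to the fitted line; a value near 0 means the predictor explains almost none of the variation.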

Primaries vs. Caucuses and Open vs. Closed

Many things have been said about how each of the two candidates does in various kinds of contests. We heard many say that Sanders does better in Caucuses, or that Clinton does better in closed primaries. During the middle of the primary season, I tested that idea and found it wanting. Yes, Sanders does well in caucuses, but the ethnic model predicts Sanders’ performance much better than the caucus vs. no-caucus difference. It turns out that caucusing is a white people thing. There are no high diversity states where caucusing happens. It is not the caucus, but rather the Caucasian, that gives Sanders the edge.

This graph shows how Sanders vs. Clinton over-performed in caucuses vs. primaries.


The value plotted is the residual of each contest in relation to the model, or how far off a theoretical straight line approximating the pattern of results each contest was. Two things are apparent. One is that caucuses are less predictable than primaries. The other is that while Sanders did over-perform in several caucuses, this was not a fixed pattern.
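A residual is just the actual result minus the model's prediction. The following is a rough sketch of how residuals could be grouped by kind of contest and summarized; the contest rows and function names are hypothetical, and the spread measure (population standard deviation) is my choice for the illustration.

```python
from statistics import mean, pstdev


def residuals_by_kind(contests):
    """Group residuals (actual minus predicted Clinton share) by contest kind."""
    groups = {}
    for kind, actual, predicted in contests:
        groups.setdefault(kind, []).append(actual - predicted)
    return groups


def summarize(groups):
    """Return {kind: (mean residual, spread of residuals)}."""
    return {kind: (mean(vals), pstdev(vals)) for kind, vals in groups.items()}


# Hypothetical rows: (kind, actual Clinton %, predicted Clinton %)
contests = [
    ("primary", 55.0, 54.0),
    ("primary", 48.0, 49.0),
    ("caucus", 30.0, 40.0),
    ("caucus", 52.0, 44.0),
]
for kind, (avg, spread) in summarize(residuals_by_kind(contests)).items():
    print(kind, round(avg, 2), round(spread, 2))
```

With Clinton's share as the predicted value, a negative residual means Sanders over-performed the model in that contest. A larger spread for one kind of contest means that kind is less predictable.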

This graph shows the residuals divided on the basis of whether the contest was open (so people could switch parties, or engage as an independent) vs closed (more restricted).


Open contests were more variable than closed contests, but it is not clear that either candidate did generally better in one or the other.

After many primaries and caucuses were finished, there were enough data to use the kind of contest as a factor in conducting the regression analysis. There are a lot of ways to do this, but I chose the simplified brute force method because it actually gives cleaner, and more understandable, results.

I simply divided the sample into the kind of contest, and then ran a multivariable regression analysis with each group, with the percent of Sanders plus Clinton votes cast for Clinton as the dependent variable, and the percentage of each of the four ethnic categories as the independent variables. There are some combinations of caucus-primary and open-closed/semi-open/semi-closed that are too infrequent to allow this. For those contests, I simply developed a regression model based on all the data to use to make a prediction in each of those states. The results, shown below, use this method of developing the most accurate possible model.
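The split-and-fall-back procedure can be sketched as follows. To keep it short, this sketch fits a single predictor per group rather than all four ethnic variables, and the minimum group size is a hypothetical cutoff (the post does not state one). All names and numbers are assumptions for illustration.

```python
MIN_CONTESTS = 5  # hypothetical cutoff, not taken from the post


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_models(rows, min_contests=MIN_CONTESTS):
    """rows: (contest kind, predictor value, Clinton % of two-way vote).

    Kinds with enough contests get their own fit; rarer kinds reuse the
    model fitted to all rows together.
    """
    pooled = fit_line([r[1] for r in rows], [r[2] for r in rows])
    by_kind = {}
    for kind, x, y in rows:
        by_kind.setdefault(kind, []).append((x, y))
    models = {}
    for kind, pts in by_kind.items():
        if len(pts) >= min_contests:
            models[kind] = fit_line([p[0] for p in pts], [p[1] for p in pts])
        else:
            models[kind] = pooled
    return models, pooled


def predict(models, pooled, kind, x):
    """Predicted Clinton share for a contest of the given kind."""
    slope, intercept = models.get(kind, pooled)
    return slope * x + intercept
```

The trade-off is the one described above: a separate fit can capture the quirks of one kind of contest, while the pooled fit gives up that detail in exchange for a larger sample.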

How does this sort of model actually make a prediction?

The actual method is simple, and most of you either know this or don’t care, but for those who would like a refresher or do care…

The regression model, using multiple variables, produces a series of coefficients and an intercept. You will remember from High School algebra that the formula for a line is

Y = mX + b

X is the independent variable, along the x axis, and Y is what you are trying to predict. m is the slope of the line (a higher positive number is a more steeply upward sloping line, for example) and b is the point where the line crosses the Y axis.

For multiple variables, the formula looks like this:

Y = m1(X1) + m2(X2) + … + mn(Xn) + b

Here, each coefficient (m1, m2, up to mn) is a different number that you multiply by each corresponding variable (percent White, Black, etc.) and then you add on the intercept value (b). So, the regression gives the “m’s” and the ethnic data gives the “X’s” and you don’t forget the “b” and you can calculate Y (percent of voters casting a vote for Clinton) for any given state.
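As a small illustration of the formula, here is a sketch that plugs a state's ethnic percentages into a set of coefficients and an intercept. The coefficient values, the intercept, and the state percentages are invented for the example; they are not the fitted values from my regressions.

```python
def predict_clinton_share(coefficients, intercept, state_pcts):
    """Y = m1*X1 + m2*X2 + ... + mn*Xn + b for one state."""
    return intercept + sum(
        coefficients[group] * state_pcts[group] for group in coefficients
    )


# Hypothetical coefficients (the "m's") and intercept (the "b")
coefficients = {"white": 0.10, "black": 0.90, "hispanic": 0.20, "asian": 0.05}
intercept = 30.0

# Hypothetical state (the "X's"), in percent of population
state = {"white": 70.0, "black": 15.0, "hispanic": 10.0, "asian": 5.0}

print(round(predict_clinton_share(coefficients, intercept, state), 2))
```

The same function works for any number of variables, since it just walks through whatever coefficients it is given.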

So, enough already, who is going to win what primary when?

Not so fast, I have more to say about my wonderful model.

How have the public opinion polls done in predicting the contests?

Everybody hates polls, but like train wrecks, you can’t look away from them.

Actually, I love polls, because they are data, and they are data about what people are thinking. The idea that polls are inaccurate, misleading, or otherwise bogus is an unsubstantiated and generally false meme. Naturally, there are bad polls, biased polls, and so on, but for the most part polls are carried out by professionals who know what they are doing, and I promise that those professionals are aware of the things you feel make polls wrong, such as the shift from landlines to cell phones.

Anyway, polls can be expected to be reasonable predictors of election outcomes, but just how good are they?

Looking at a number of races today, excluding only a few because there were no polls, I got the Real Clear Politics web site averages for polls across the states, transformed those numbers to get a percentage of the Sanders + Clinton vote that went to Clinton, and plotted that with the similarly transformed data from the actual primaries and caucuses. The r-squared value is 0.52443, which is not terrible, and the graphic shows that there is a clear correlation between the two numbers, though the spread is rather messy.
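The transformation mentioned here simply drops undecided and other-candidate responses and expresses Clinton's number as a share of the Clinton plus Sanders total. A minimal sketch, with a hypothetical function name and made-up poll numbers:

```python
def two_way_share(clinton, sanders):
    """Clinton's percentage of the combined Clinton + Sanders figure."""
    return 100.0 * clinton / (clinton + sanders)


# Hypothetical poll average: Clinton 50, Sanders 40, 10 undecided or other
print(round(two_way_share(50.0, 40.0), 1))  # 55.6
```

Applying the same transformation to both the poll averages and the actual results puts the two sets of numbers on the same scale before comparing them.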


The ethnic status quo model outperformed polls

My model is actually many models, as mentioned. I have a separate regression model for each of several kinds of primary, including Closed Caucuses, Closed Primaries, Semi-Closed Primaries, and Open primaries. I did not create separate models for the much rarer Semi-Open Primary, Semi-Open Caucus or Open Caucus style contests, as each of these categories had only one or a few states. Rather, the model used to calculate values for these states is derived from all the data, so addressing specific quirkiness of each kind of contest is sacrificed for large sample size.

I also generated models that included White, Black, Hispanic, and Asian; each of these separately; and various combinations of them. As noted above, the best single predictor was Black. Hispanic and Asian were very poor predictors. White was OK but not as good as Black. But, combining all the variables worked best. That is not what usually happens when throwing together variables. It is more like mixing water colors, you end up with muddy grayish brown most of the time. But this worked because, I think, diversity matters but in different ways when it comes in different flavors.

When the total data set was analyzed with the all-ethnicity model, that worked well. But when the major categories of contest type were analyzed with the all-ethnicity model, some of the data really popped, producing some very nice r-squared values. Closed caucuses cannot be predicted well at all (r-squared = 0.2577) while Open Caucuses perform very well (over 0.90, but there are only a few). The most helpful and useful results, though, were for the closed primary, open primary, and semi-closed primary, which had r-squared values of 0.69, 0.61, and 0.74, respectively.

What this means is that the percentage of the major ethnic groups across states, which varies, explains between about 61 and 74% of the variation in what percentage of voters or caucusers chose Clinton vs. Sanders.

Polls did not do as well, “explaining” only about half the variation.

So, the following graph is based on all that. This is a composite of the several different models (same basic model recalculated separately for some of the major categories of contest), using nominal ethnic categories. The model retrodicts, in this case, the percentage of the vote that would be given to Clinton across races. Notice that this works very well. The few outliers both above and below the line are mainly caucuses, but they are also mainly smaller states, which may be a factor.


Who will win the California, New Jersey, Montana, New Mexico, North Dakota, South Dakota, and D.C. primaries?

Clinton will win the California, New Jersey, New Mexico and D.C. Primaries. Sanders will win the Montana, North Dakota, and South Dakota primaries. According to this model.

The distribution of votes and delegates will be as shown here:


This will leave Sanders 576 pledged delegates short of a lock on the convention, and Clinton 212 pledged delegates short of a lock on the convention. If Super Delegates do what Sanders has asked them to do, to respect the will of the voters in their own states, then the final count will be Sanders with 2131 delegates, and Clinton with 2560 delegates. Clinton would then have enough delegates to take the nomination on the first ballot.

In the end, Clinton will win the nomination on the first ballot, and she will win it with more delegates than Obama did in 2008, most likely.

Who Will Win The New York Democratic Primary?

As you know, I’ve been applying a model to predict the outcome of each of the Democratic Primary contests, and have done pretty well at predicting results.

All of the future contests are primaries, not caucuses. It turns out that the two modes have very different patterns. Many have suggested that this has to do with how the process works, and somehow caucuses, or open contests, favor Sanders, who has won several. However, it also turns out that caucusing is a northern thing (and Sanders does somewhat better in the north, or more accurately perhaps, rarely wins in the south). Caucusing is also a white thing, apparently. Caucuses happen in non-southern mostly white states, and these are states that Sanders can (but does not always) win.

Since the remainder of the contests are primaries, I used my simple ethnic-based model, which predicts the outcome of the various contests based on the estimated percentage of African American voters. I used only data from previous primaries to develop a simple linear model. This model applied to all of the future contests, starting with New York, tells us that Clinton will win in New York.

After that, Sanders wins in several smaller and mostly northern states, but also California. Clinton wins in Pennsylvania and New Jersey, which are relatively large. If this plays out as predicted, between now and the end of the primary season, Hillary Clinton will pick up about 795 delegates and Sanders will pick up about 778 delegates.

How many delegates does each candidate have so far? Clinton has approximately 1310 and Sanders approximately 1094. (This is approximate because in some states it is actually a little hard to count because of the nature of the system.)

Here is a table showing all of my projections from here on out. I’ll probably redo the model a few more times, especially if anything unexpected happens, so stay tuned.


The Rest of the Democratic Primary

We are in the Primary Doldrums. For the last several days and the next several days, there is not too much happening, with big gaps between the action. Wisconsin is important, and it is Tuesday. Then Wyoming by itself, then New York by itself, then a sort of Super Tuesday with several states.

As you know I’ve created a multivariable model that has a good record of predicting primary and caucus outcomes in the contests between Hillary Clinton and Bernie Sanders. For the rest of the primary season, this is what it looks like.

Screen Shot 2016-04-02 at 12.49.47 PM

I used yellow highlighting to indicate who is expected to win the most delegates on each primary/caucus day. Sanders will do well in Wisconsin, tie (or maybe even better) in Wyoming, do well in Indiana, and on balance, do well on June 7th when there will be six contests at once including Pennsylvania. But while Sanders may win the day on three (or four) days, Clinton will win the day on five. In total, Clinton is predicted to take 886 delegates, and Sanders 790.

This is the distribution of cumulative delegates starting with now and moving across this range of primary dates, showing the evolution of the difference between the two candidates throughout.

Screen Shot 2016-04-02 at 12.50.32 PM

On balance, Clinton will, according to this model, widen her lead over Sanders. If Sanders does better than projected this gap will narrow, but he’ll have to do very well to close the gap.

Clinton Likely To Win Democratic Party Nomination

Almost exactly 50% of the votes have been cast in the Democratic Party primary and caucus process. I’ve been updating a model to predict primary and caucus results all along, and the model has done fairly well. The most recent update, however, was a bit off. That update involved separating states into two groups, southern vs northern, then calculating different sets of likely voting patterns by ethnicity for those two groups, and integrating that with estimates of ethnic distribution (“white, black, hispanic”) among Democratic voters by state.

What I did not do in those models was to incorporate the effect of whether or not a primary or caucus is open, closed, or somewhere in between.

Now that we have had quite a few primaries and caucuses, it is possible to move to a somewhat more sophisticated model, because there is (probably) enough data.

I ran a multi-variable regression analysis that coded primary openness (0=closed, 1=semi open, 2=open) and whether a state is southern or not, then included the percent of each ethnic group by state.

The result indicated that the percent of a voting group (by state) that is hispanic did not influence the result. In doing the analysis I looked only at states, and excluded Vermont and New Hampshire because of the strong favorite son effect. The resulting model, naturally, predicts the number of delegates that have already been awarded to each candidate, in total, precisely, for the simple reason that the model is based on that number. Within the data set, the R-squared value is 0.83, which is pretty good. This means, roughly, that 83% of the variation in voting (by percent who voted for each candidate) is explained by those variables. The following table shows the actual delegates won vs. the delegates predicted by the model.

Screen Shot 2016-03-16 at 11.07.20 AM

Also indicated is the spread between the two candidates in percent. The spread starts off a bit wonky because there are only a few contests, but then settles in to about 20% and remains at that level. Not shown is an analysis of the degree to which Sanders performed relative to expectations. If that number changed a lot, showing a trend, this would be important for predicting the future. The first half of the contests show Sanders under performing, according to this model, by 2%, and the last half have him over performing by 2%. So there may be a very low level “surge,” but not enough to make any real difference in the outcome.

So, what does the future look like? There are several states coming up where Sanders is likely to do well. But is it enough to make it likely for him to overtake Clinton? With a 20% spread and half the votes counted, Sanders would have to take an average of 60% of the delegates from here on. That is very unlikely.
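The 60% figure follows from simple arithmetic: a 20% spread means the awarded delegates have split roughly 60/40, so with half still to come, the trailing candidate needs 60% of the rest just to draw even. Here is a small sketch of that calculation; the function name and the tie-at-50% framing are my own for the illustration.

```python
def share_needed_to_tie(fraction_awarded, leader_share):
    """Share of the remaining delegates the trailing candidate needs to
    finish with exactly half of all delegates.

    fraction_awarded: portion of all delegates already awarded (0 to 1)
    leader_share: leader's share of the delegates awarded so far (0 to 1)
    """
    trailing_so_far = (1 - leader_share) * fraction_awarded
    return (0.5 - trailing_so_far) / (1 - fraction_awarded)


# Half the delegates awarded, leader ahead 60/40 (a 20% spread)
print(round(share_needed_to_tie(0.5, 0.6), 3))  # 0.6
```

Anything above that share overtakes the leader; anything below it leaves the gap in place.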

The following table shows the primary and caucus outcomes through the present, followed by the predicted delegate commitments for the rest of the primary season. The percent spread between the candidates is indicated, and it does indeed drop over time, though slowly, reaching a minimum of 8% for the last few races.

Screen Shot 2016-03-16 at 11.49.30 AM

The total number of delegates required to lock the nomination is 2,383. There are 717 uncommitted delegates (aka “Super Delegates”). If we assume that all of those uncommitted delegates will simply vote for the majority candidate, then the number of delegates required to have a likely lock on the nomination is 1669. This is not a fully supportable assumption because some of the uncommitted delegates may choose a different path, but it is a reasonable approximation.

The part of the table above marked in yellow indicates the approximate point in time when the leading candidate, Clinton, will get somewhere around 1669 delegates. So, if this model is reasonably accurate, Clinton will achieve a lock about mid May.

The next set of primaries, next week, are Arizona, Idaho, and Utah. In my view, these are somewhat hard to predict. Polls suggest a weak Sanders win in Idaho and a weak Clinton win in Utah. My model predicts a strong Clinton win in Arizona, and Sanders victories in Idaho and Utah. The total number of delegates at stake next week is small (131 in total). In order for Sanders to signal that he can overtake Clinton, he would have to win about 79 delegates in total. If he falls short of that, the rest of the road is more uphill. If he does better than that, then he may be seriously in the running.

Sanders is also expected to do well in the next several races (Alaska, Hawaii, Washington, Wisconsin, and Wyoming) according to my model. However, I don’t actually expect my model to work at all in Hawaii. My model suggests that he may well achieve over 55% of the vote in those primaries, but again, he will have to have already achieved 60% (unlikely) on the 22nd for this to start to accumulate to a catch-up number.

Following Wyoming is New York State followed by Super Tuesday III, six states with 631 delegates. My model suggests he will get less than half of these delegates, though he will do well in Pennsylvania and lose by not much in New York. I’m also predicting that he will win in California, in June, but not by much.

Between now and the end of the race, there are 1946 uncommitted delegates to fight for. Of these, the top five states account for a whopping 1138 delegates. These states are Washington, New York, Pennsylvania, California, and New Jersey. I predict he will come close to even with Clinton or win most of these states (but Clinton will do very well in New Jersey), but in order for Sanders to overtake Clinton by focusing on these states, he’ll have to do VERY well in all or most of them.

This model uses everything that happened before (mostly) to predict everything that will happen in the future. The first half of this series of events is over (in terms of delegate counts) and there is no evidence of any dynamic change occurring at the moment. This model does an excellent job at retrodicting the prior races, but it might slightly underestimate Sanders’ performance, since for the last half of the retrodicted contests Sanders outperforms the model by an average of 2%. However, in order for him to catch up to Clinton, he has to outperform the model by 10%.

The graphic at the top of the post is the predicted delegate counts for the entire primary season. The already-held contests are represented as predictions instead of actual because the final number (today’s delegate count) is the same for both predicted and actual. There is a slight narrowing of the gap (see table above) but not enough to change the outcome of Clinton achieving a lock on the Democratic Party nomination in May.

How Will Clinton And Sanders Do On Tuesday? (Updated)

Most polls and FiveThirtyEight predict a Clinton blow-out on Tuesday, with her winning all five states, in some cases by a large margin. My model, however, predicts that each candidate will win a subset of these states, but with Clinton still winning the day.

I’ve been working on a model to predict primary outcomes for the Democratic selection process, and generally, the model has proved very effective. After each set of primaries I’ve adjusted the model to try to do a better job of predicting the upcoming contests. The most important adjustment is the one that affects the current model.

The model assumes that we can predict voting behavior by ethnicity. Given this assumption, the distribution of potential Democratic participants by ethnic group gives the likely division of primary voters or caucus goers between the two candidates, and this translates directly into the division of committed delegates for that state. The estimates of within-group voting are made from exit polls.

The most recent revision divides states into “Southern” (meaning deep south) and “Not Southern,” and uses different sets of numbers for each of the two kinds of states.
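In outline, this version of the model is a weighted average: each group's share of the Democratic electorate times that group's preference for a candidate, with a different preference table for each kind of state. The sketch below shows the idea. Every number in it is invented for illustration (these are not my exit-poll estimates), and the straight proportional split of delegates is a simplification.

```python
# Hypothetical within-group preference for Clinton (fraction of each group),
# one table per kind of state. Illustrative values only.
CLINTON_PREF = {
    "southern": {"white": 0.55, "black": 0.85, "hispanic": 0.60},
    "not_southern": {"white": 0.45, "black": 0.70, "hispanic": 0.50},
}


def expected_delegates(group_shares, region, delegates):
    """Split a state's committed delegates between Clinton and Sanders.

    group_shares: each group's fraction of the Democratic electorate
                  (should sum to 1)
    region: "southern" or "not_southern"
    Returns (clinton_delegates, sanders_delegates).
    """
    prefs = CLINTON_PREF[region]
    clinton_share = sum(group_shares[g] * prefs[g] for g in group_shares)
    clinton = round(clinton_share * delegates)
    return clinton, delegates - clinton


# Hypothetical state: 60% white, 30% black, 10% hispanic, 100 delegates
mix = {"white": 0.60, "black": 0.30, "hispanic": 0.10}
print(expected_delegates(mix, "southern", 100))
print(expected_delegates(mix, "not_southern", 100))
```

Classifying a borderline state one way or the other changes which preference table is used, which is why the same ethnic mix can produce a different winner.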

To date, about 32% of the committed delegates have been assigned, with 769 for Clinton and 502 for Sanders. Next Tuesday, March 15th, an additional 691 delegates will be committed to the two candidates. So, almost exactly 50% of all the delegates for the entire process will be committed. (None of this counts uncommitted delegates, sometimes called “Super Delegates.”)

If Clinton and Sanders each do about as well as they have done in the past, this will leave Sanders with a significant gap to close, and he probably can’t win the nomination. If Clinton does better, that closes the door to Sanders even more firmly. But, if Sanders does well, that may help close the gap and considering Sanders as a possible nominee is reasonable.

The current model, which has the interesting dual property of giving Sanders more delegates than the polls currently predict, but also, according to my own evaluation of my own model, probably underestimates Sanders’ performance, suggests that Clinton will earn more delegates than Sanders, but not by too much. So, if the underperformance of the model is strong enough, they could come close to a tie. At present, here are my predictions for the outcome of Tuesday’s set of primaries:

Florida: Clinton will win but by less than expected. The outcome will be so close that I can’t rule out a Sanders win here.
Illinois: Sanders will win, but this may be close to a tie.
Missouri: Sanders may win by a small margin. However, keep in mind that it is very difficult to classify Missouri as a “Southern” vs. “not-Southern” state. I picked “Not-Southern” for this prediction. But we’ll see. If Missouri goes all “Southern” then Clinton wins there.
North Carolina: Clinton will win by a very large margin (70-something to 30-something delegates).
Ohio: Sanders will win by a small margin.


Here is the output of the model indicating the expected number of committed delegates to be awarded on Tuesday to the two Democratic candidates:
Screen Shot 2016-03-14 at 2.34.04 PM

If these numbers are close to what happens, or if Sanders does better, then Sanders is still in the race, though with a tough road ahead of him. If, in contrast, the polls turn out to be right, it would indicate that Sanders’ overperformance in earlier contests may have been temporary, and the chance of him winning the primary is very small. At present the polls show Clinton way ahead in Florida, Clinton barely ahead in Illinois, a near tie in Missouri, Clinton way ahead in North Carolina, and Clinton a little ahead in Ohio. In other words, I’m suggesting that Sanders will win three out of the five races, while the polls suggest he will win one or maybe two.

Let’s look at the FiveThirtyEight predictions to see how they compare.

FiveThirtyEight gives Florida to Clinton (nearly 100% chance of winning). They predict a strong Clinton finish in the state, about 2:1.

For Illinois, FiveThirtyEight says about the same, a better than 2:1 projected result, with Clinton carrying away a lot of the delegates.

For Missouri, FiveThirtyEight has Clinton probably winning, but not by too much, so only a small pickup for her.

For North Carolina, FiveThirtyEight has Clinton winning just shy of 2:1 over Sanders.

For Ohio, FiveThirtyEight predicts a Clinton win, and a fairly strong one.

So we can see that there is a huge difference between FiveThirtyEight’s prediction and mine, and the two methods are very different. Both of the methods used by FiveThirtyEight rely on some combination of opinion or support-related information, while my method uses none of that. For this reason it is not surprising that the two methods produce very different results.

The point of going over the FiveThirtyEight predictions is that they do a very good job of representing the polling data, which overall strongly suggest that Clinton will run away with the nomination. The problem is, these data have been suggesting this since Iowa, and generally speaking, Sanders has far outperformed those estimates.

The final outcome in terms of delegates from all five races will be approximately:

Clinton: ca 364 delegates

Sanders: ca 326 delegates

This will mean that, at the end of the day Tuesday, Hillary Clinton will have about 56% of the committed delegates, to Sanders’ 44%, with about 50% of the committed delegates assigned.
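For reference, the arithmetic for the five-state split alone can be sketched as follows. This is only a minimal illustration using the two approximate totals given above; the 56% to 44% figure is cumulative and also depends on delegates from earlier contests, which are not listed here. The function name `delegate_shares` is made up for this example.

```python
def delegate_shares(clinton, sanders):
    """Return each candidate's fraction of the delegates awarded."""
    total = clinton + sanders
    return clinton / total, sanders / total


# Approximate projected totals for the five races, taken from the text above.
clinton_share, sanders_share = delegate_shares(364, 326)
print(f"Clinton {clinton_share:.1%}, Sanders {sanders_share:.1%}")
# prints "Clinton 52.8%, Sanders 47.2%"
```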

Whom Should I Vote For: Clinton or Sanders?

You may be asking yourself the same question, especially if, like me, you vote on Tuesday, March 1st.

For some of us, a related question is which of the two is likely to win the nomination.

If one of the two is highly likely to win the nomination, then it may be smart to vote for that candidate in order to add to the momentum effect and, frankly, to end the internecine fighting and eating of young within the party sooner. If, however, one of the two is only somewhat likely to win the nomination, and your preference is for the one slightly more likely to lose, then you better vote for the projected loser so they become the winner!

National polls of who is ahead have been unreliable, and also, relying on those polls obviates the democratic process, so they should be considered but not used to drive one’s choice. However, a number of primaries have already happened, so there is some information from those contests to help estimate what might happen in the future. On the other hand, there have been only a few primaries so far. Making a choice based wholly or in part on who is likely to win is better left until after Super Tuesday, when there will be more data. But, circling back to the original question, that does not help those of us voting in two days, does it?

Let’s look at the primaries so far.

Overall, Sanders has done better than polls might have suggested weeks before the primaries started. This tells us that his insurgency is valid and should be paid attention to.

There has been a lot of talk about which candidate is electable vs. not, and about theoretical match-ups with Trump or other GOP candidates. If you look at ALL the match-ups, instead of the one cherry-picked match-up that a supporter of one or the other candidate might pick, both candidates do OK against the GOP. Also, such early theoretical match-ups are probably very unreliable. So, best to ignore them.

Iowa told us that the two candidates are roughly matched.

New Hampshire confirmed that the two candidates are roughly matched, given that Sanders has a partial “favorite son” effect going in the Granite State.

Nevada confirmed, again, that the two candidates are roughly matched, because the difference wasn’t great between the two.

So far, given those three races, in combination with exit polls, we can surmise that among White voters, the two candidates are roughly matched, but with Sanders doing better with younger voters, and Clinton doing better with older voters.

The good news for Sanders about younger voters is that he is bringing people into the process, which means more voters, and that is good. The bad news is two part: 1) Younger voters are unreliable. They were supposed to elect Kerry, but never showed up, for example; and 2) Some (a small number, I hope) of Sanders’ younger voters claim that they will abandon the race, or the Democrats, if their candidate does not win, write in Sanders, vote for Trump, or some other idiotic thing. So, if Clinton ends up being the nominee, thanks Bernie, but really, no thanks.

Then came South Carolina. Before South Carolina, we knew that there were two likely outcomes down the road starting with this first southern state. One is that expectations surrounding Clinton’s campaign would be confirmed, and she would do about 70-30 among African American voters, which in the end would give her a likely win in the primary. The other possibility is that Sanders would close this ethnic gap, which, given his support among men and white voters, could allow him to win the primary.

What happened in South Carolina is that Clinton did way better than even those optimistic predictions suggested. This is not good for Sanders.

Some have claimed that South Carolina was an aberration. But that claim is being made only by Sanders supporters, and only after the fact. Also, the claim is largely bogus because it suggests that Democratic, and especially African American Democratic, voters there are conservative southern yahoos, and that is why they voted so heavily in favor of Clinton. But really, there is no reason to suggest that Democratic African American voters aren’t reasonably well represented by South Carolina.

In addition to that, polling for other southern states conforms pretty closely to expectations based on the actual results for South Carolina.

I developed an ethnic-based model for the Democratic primary (see this for an earlier version). The idea of the model is simple. Most of the variation we will ultimately observe among the states in voting patterns for the two candidates will be explained by the ethnic mix in each state. This is certainly an oversimplification, but has a good chance of working given that before breaking out voters by ethnicity, we are subsetting them by party affiliation. So this is not how White, Black and Hispanic people will vote across the states, but rather, how White, Black and Hispanic Democrats will vote across the states. I’m pretty confident that this is a useful model.

My model has two versions (chosen by me, there could be many other versions), one giving Sanders’ strategy a nod by having him do 10% better among white voters, but only 60-40 among non-white voters. The Clinton-favored strategy gives Clinton 50-50 among white voters, and a strong advantage among African American voters, based on South Carolina’s results and polling, of 86-14%. Clinton also has a small advantage among Hispanic voters (based mainly on polls) with a 57:43% mix.

These are the numbers I’ve settled on today, after South Carolina. But, I will adjust these numbers after Super Tuesday, and at that point, I’ll have some real confidence in the model. But, at the moment, the model seems to be potentially useful, and I’ll be happy to tell you why.

First, let us dispose of some of the circular logic. Given both polls and South Carolina’s results, the model, based partly on South Carolina, predicts South Carolina pretty well using the Clinton-favored version (not the Sanders-favored version), with a predicted cf. actual outcome of 34:19% cf. 39:14%. This is obviously not an independent prediction, but rather a calibration. The Sanders-favored model predicts an even outcome of 27:26%.
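The mechanics of the model described above can be sketched in a few lines. This is a minimal illustration, not my actual spreadsheet: the group splits are the Clinton-favored numbers given above, but the state ethnic mix and the delegate count are hypothetical placeholders, the function names are invented for the example, and the simple proportional rounding ignores real allocation rules such as viability thresholds and district-level delegates.

```python
# Clinton's share within each group under the Clinton-favored version
# (50-50 among white voters, 86-14 among African American voters,
# 57-43 among Hispanic voters), as stated in the text.
CLINTON_FAVORED = {
    "white": 0.50,
    "african_american": 0.86,
    "hispanic": 0.57,
}


def clinton_share(mix, splits):
    """Weighted average of Clinton's share across the groups in `mix`.

    `mix` maps each group to its fraction of Democratic voters in a state.
    """
    total = sum(mix.values())
    return sum(mix[group] * splits[group] for group in mix) / total


def split_delegates(n_delegates, share):
    """Split delegates proportionally (a simplifying assumption)."""
    clinton = round(n_delegates * share)
    return clinton, n_delegates - clinton


# HYPOTHETICAL state: this mix and delegate count are made up for illustration.
example_mix = {"white": 0.60, "african_american": 0.30, "hispanic": 0.10}
share = clinton_share(example_mix, CLINTON_FAVORED)
print(round(share, 3))             # 0.615
print(split_delegates(50, share))  # (31, 19)
```

Swapping in a different dictionary of group splits gives the Sanders-favored version; everything else stays the same, which is the point of keeping the model this simple.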

The following table shows the likely results for the Clinton-favored and Sanders-favored model in each state having a primary on Tuesday.
[Table image: Screen Shot 2016-02-28 at 12.50.21 PM]
The two columns on the right are estimates from polling where available. This is highly variable in quality and should be used cautiously. I highlighted the Clinton- or Sanders-favored model that most closely matches the polling. The matches are generally very close. This strongly suggests that the Clinton-favored version of the model essentially works, even given the limited information, and simplicity of the model.

Please note that in both the Clinton- and Sanders-favored model, Clinton wins the day on Tuesday, but only barely for the Sanders-favored model (note that territories are not considered here).

I applied the same model over the entire primary season (states only) to produce two graphs, shown below.

The Clinton-favored model has Clinton pulling ahead in committed delegates (I ignore Super Delegates, who are not committed) on Tuesday, then widening her lead over time and winning handily. The Sanders-favored model projects a horserace, where the two candidates are ridiculously close for the entire election.


So, who am I going to vote for?

I like both candidates. The current model suggests I should vote for Clinton because she is going to pull ahead, and it is better to vote for the likely winner, since I like them both, so that person gets more momentum (a tiny fraction of momentum, given one vote, but still…). On the other hand, a Sanders insurgency would be revolutionary and change the world in interesting ways, and for that to happen, Sanders needs as many votes on Tuesday as possible.

It is quite possible, then, that I’ll vote for Sanders, then work hard for Hillary if Super Tuesday confirms the Clinton-favored model. That is how I am leaning now, having made that decision while typing the first few words of this very paragraph.

Or I could change my mind.

Either way, I want to see people stop being so mean to the candidate they are not supporting. That is only going to hurt, and be a regrettable decision, if your candidate is not the chosen one. Also, you are annoying the heck out of everyone else. So just stop, OK?

Trump is going to lose the Iowa Caucus, and here’s why

As of 8:45 or so PM:

Cruz 28.9 Trump 25.6 Rubio 20.8

I’m privileged to live in Minnesota, which is Iowa’s neighbor and thus not so different from Iowa, except our college football teams are better.

And it isn’t just the corn, but also, the caucus. We do that here too. Our caucus system is similar enough to Iowa that one can have a sense of what goes on over the border just with some local experience.

So let me tell you a story. I volunteered one day to help out a friend with a local campaign. The idea was to show up at the local VFW post and engage in a caucus to determine a DFL (that’s what we call Democrats in Minnesota) candidate for a local election. I met the candidate and the other volunteers in the parking lot, and coffee was passed around. As we stood around sipping our coffee, the other candidate’s team showed up, parked their van with that candidate’s name on it near the door, and attacked the VFW hall in the prescribed way. They plastered signs up everywhere, and positioned themselves around to meet and greet everybody who walked into the hall, giving them literature and buttons.

I asked the person who seemed to be in charge of our team where our signs were, suggesting that we needed to get in there and take some wall space before it was all used up. The response, “Well, people shouldn’t really be picking a candidate on the basis of signs, but rather, on where they stand on the issues.”

A little while later, I suggested that we get in position around the entrance ways and by the food table and bathrooms and such in order to hand out buttons and literature. “We didn’t make any literature, but here’s some buttons, if you want to hand them out. It shouldn’t really matter, though, our candidate is so much better that we don’t need to do that.”

A little while later each candidate got to make a speech outlining their respective positions. My candidate was indeed way better. Articulate, intelligent, made sense. The other candidate mainly talked about her inexperience, and how she didn’t really want this job but her neighbors talked her into it.

Then the process started. We were creamed. We got something like single digit support.


No signs. No buttons. No literature.

Here’s the thing. A caucus is a commitment of time. It takes a few hours. The majority of caucus goers are party activists or people otherwise motivated to spend a few hours in a confusing and sometimes frustrating environment. There are elements to the caucus process, at least in Minnesota, that seem to be designed to weed out the less committed or interested individuals, such as votes on who should be in this or that job that nobody ever even heard of, or resolutions that everyone already supports, etc.

So when you get a room full of activists and they are trying to decide who to put up for election, what do they base that decision on? Well, first, they eliminate the candidates that are simply untenable. At another caucus a few years ago, a candidate who would be running against Michele Bachmann got up and explained that she was the best DFL candidate because she was anti-abortion and anti-gay marriage and such, and so she was the only Democrat that could get those votes away from Bachmann. The room remained silent as she exited the stage, and not another word was said about her. (That is the modern day Minnesotan method of drawing and quartering someone.)

Once the untenable candidates are suitably ignored, we then get to the number one actual question we must ask of this candidate: Can this candidate win?

In a general election, it has been suggested that lawn signs and such matter little. Everybody knows that most of the literature ends up in the recycling. In fact, too much lit can annoy people. Campaign buttons don’t do much either, because once you’ve handed them out most people will not wear them again. None of that really means much in a general election.

But in a caucus it means everything. These are signals a candidate sends out to the activists indicating that they have a clue as to how the process works. I know this does not make much sense at first, but then again, the giant schnoz on the front end of a male elephant seal does not make much sense either. Nor does the giant tail on a male peacock, which mainly serves to make it hard to get away from predators. But these are signals sent out to indicate not too indirectly some aspect of quality.

Sexual selection in animals often causes the evolution of traits that make no sense in most contexts, but end up serving as honest advertisements of some innate quality that females will prefer. Union-printed wall and lawn signs, literature and buttons, and having a lot of volunteers standing around clearly identified as working for a given candidate are honest indicators of seriousness, ability, knowledge of the process, support, and so on.

At the local caucus for my friend, the activists saw a candidate that knew the ropes, and a candidate that did not. They picked the one who sent out the proper signals, even though the choice based on positions, speaking ability, etc. should have gone the other way.

Why will Donald Trump lose the Iowa Caucus?

The word on the street in Iowa is that the Cruz campaign is running a tight and effective ground game. They have all the parts. People have arrived from hundreds of miles away to phone bank and door knock … having someone at your door telling you they just drove in from Montana to visit their grandmother in the ancestral Iowa home, oh and caucus for this candidate please, is effective.

Meanwhile Trump is not letting the press near or in the local headquarters. They are playing the ground game totally differently, more like the run up to the latest greatest reality TV show. Trump is inviting random children to tour his private plane. His daughter made a video on how to caucus, as though anyone in Iowa needs to know how to caucus. In short, Trump is sending almost none of the proper signals, and if anything, is sending bad signals. Iowans don’t care about someone’s private plane and they don’t need to be told how to do their jobs.

Iowans, today, will see on the news Cruz’s machine pulling out all the stops and doing all the things. They will see some dude in the parking lot outside of the blacked out windows of what appears to be Trump’s headquarters saying that they have no comment about anything, asking the press to go away. Caucus delegates who might have been leaning towards Trump will caucus instead for someone else, most likely Cruz. And Cruz will trounce Trump.

That’s my story and I’m sticking to it. For now. I’ll delete this post in shame if I’m wrong. Which is a distinct possibility. Because you never know with a caucus…

Who Will Win The Iowa Caucus?

The answer: One Republican and One Democrat/Independent.

The Iowa Caucus is pretty much up for grabs in both parties. Over recent days, a clear Trump lead has been erased, and Cruz is now ahead in recent polls. Over roughly the same period, a clear Clinton lead has been erased, and Sanders is now ahead in recent polls.

FiveThirtyEight (Nate Silver) is still predicting a Clinton victory for the Dems, but a Cruz victory for the GOPs. The Clinton victory prediction is of high confidence, while the Cruz prediction is not, and Trump is close behind.

One way to look at the polls is to track changes and put a lot of faith in the most recent information. Another way is to use as much data as seems relevant (even looking outside polls) and assume that this gives a better prediction, and go with that. The latter is the method used by FiveThirtyEight. So, Nate Silver’s method will be a big winner if Clinton and Cruz cinch the Caucus, but not so much if Sanders sandbags Hillary and Trump trumps Cruz.

People put a lot of significance on the Iowa Caucus because it is the first real contest among candidates. But then, after the caucus has become history, they are less likely to care too much about it. How important is it as a predictor of the outcome of the entire primary season?

That depends on the party.

Barack Obama, John Kerry, Al Gore, Bill Clinton, Walter Mondale, Jimmy Carter and George McGovern all won the Iowa caucus (or came in above the other candidates) and went on to be the Democratic Party nominee. Dick Gephardt and Tom Harkin also won the caucus, but did not become the nominee. One might say that the Iowa Caucus predicts the nominee pretty well for Democrats.

Gerald Ford, Bob Dole, and George W. Bush all beat the other contenders and went on to get the nomination. But most of the time, the Iowa Caucus was either won by an unopposed Republican (so we can’t count those years in assessing its significance) or was won by a candidate other than the eventual nominee (such as Rick Santorum in 2012, Mike Huckabee in 2008, and Bob Dole in 1988). Overall, the Iowa Caucus means little in the Republican Party, if we go on history, especially in recent years.

Despite FiveThirtyEight’s claims, based on a good analysis of hefty data, I’m going to say that there has been too much flux in the polling numbers to call the caucus at this stage, just over a week prior.

How much will the Arctic Sea ice melt this year?

We are reaching the point where Arctic Sea ice tends to max out, in terms of extent (I will not be talking about volume here, though that is vitally important). Using data provided by the National Snow and Ice Data Center, I ran an informal “Science by Spreadsheet” analysis and came up with a prediction for the minimum extent of sea ice this year, which would be some time in September.

This is mostly a seat-of-the-pants analysis, so don’t take it too seriously, but feel free to put your bets in the comments section.

The data over the last few decades show a generally declining extent of sea ice, especially at the minimum in September. But the maximum extent (where we are now, typically) seems uncorrelated to the minimum extent. Different processes are involved at different times of the year. Also, the shape of those data indicates to me a shift from a slower annual decline to a faster annual decline, happening some time around 1995 or 1996. So, I used September data only from 1996 to last year. I ran a simple regression analysis and from the model it produced I calculate that the AVERAGE September value of sea ice (an odd number that no one ever uses, but I have it anyway) will be 4.1 million square kilometers.
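The regression step described above can be sketched like this. It is only an illustration of a simple least-squares line fit extrapolated one year forward, written in Python rather than a spreadsheet; the extent values below are a synthetic, perfectly linear placeholder series and are NOT the National Snow and Ice Data Center numbers, and the function names are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(year, slope, intercept):
    """Extrapolate the fitted line to a given year."""
    return slope * year + intercept


# SYNTHETIC placeholder series (million square km), not real observations:
# starts at 7.0 in 1996 and drops by exactly 0.1 per year through 2013.
years = list(range(1996, 2014))
extents = [7.0 - 0.1 * (year - 1996) for year in years]

slope, intercept = fit_line(years, extents)
print(round(predict(2014, slope, intercept), 2))  # 5.2 for this made-up series
```

To repeat the actual exercise, replace `extents` with the September values (average or minimum) for 1996 onward and read off the prediction for the coming year.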

Using the minima for September for this range of years, the MINIMUM sea ice extent for 2014 is predicted to be 3.9458 million square kilometers.

This places this year’s minimum above the extraordinary year of 2012, which to cherry-picking denialists will mean a “recovery” (though it isn’t) but below any prior year. The value will be somewhere in the crudely drawn box on this chart:

[Chart image: Screen Shot 2014-03-12 at 11.50.42 AM]

We’ll see.

The other thing going on right now, obviously, is the shift from adding ice to removing ice that happens as the seasons shift. It looks like the ice may be starting its seasonal decline now, but in previous years, the squiggly line representing sea ice extent has continued to squiggle up and down for a few more days. In a week or so I think we’ll have a better idea. But it is quite possible that the highest value was reached over the last few days. Again, we’ll see.