Clinton Likely To Win Democratic Party Nomination

Almost exactly 50% of the votes have been cast in the Democratic Party primary and caucus process. I’ve been updating a model to predict primary and caucus results all along, and the model has done fairly well. The most recent update, however, was a bit off. That update involved separating states into two groups, southern vs northern, then calculating different sets of likely voting patterns by ethnicity for those two groups, and integrating that with estimates of ethnic distribution (“white, black, hispanic”) among Democratic voters by state.

What I did not do in those models was incorporate the effect of whether a primary or caucus is open, closed, or somewhere in between.

Now that we have had quite a few primaries and caucuses, it is possible to move to a somewhat more sophisticated model, because there is (probably) enough data.

I ran a multi-variable regression analysis that coded primary openness (0 = closed, 1 = semi-open, 2 = open) and whether a state is southern, then included the percent of each ethnic group by state.
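As a rough sketch of what such a regression looks like, the Python below fits ordinary least squares by solving the normal equations in pure Python. It assumes Clinton's vote share per state as the response, with openness (coded 0/1/2 as above), a southern indicator, percent Black, and percent White as predictors. The Hispanic share is left out: the analysis found no effect, and if the three percentages sum to 100 they would be perfectly collinear with the intercept. The function names and the eight rows of data are hypothetical placeholders, not the post's actual data or spreadsheet.

```python
"""Sketch: multi-variable OLS of state results on openness, region, and
ethnic mix. All data below are hypothetical placeholders."""
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Return (coefficients with the intercept first, R-squared)."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical rows: (openness 0/1/2, southern 0/1, % Black, % White)
ROWS = [
    (0, 1, 30, 60), (2, 1, 25, 65), (1, 0, 10, 80), (2, 0, 5, 88),
    (0, 0, 12, 75), (2, 1, 35, 55), (1, 1, 20, 70), (0, 0, 8, 85),
]
CLINTON_PCT = [65, 60, 50, 40, 55, 68, 58, 47]  # hypothetical results

coefs, r2 = ols(ROWS, CLINTON_PCT)
for name, c in zip(["intercept", "openness", "southern", "pct_black", "pct_white"], coefs):
    print(f"{name:>10}: {c:+.3f}")
print(f"R-squared: {r2:.2f}")
```

R-squared here is computed the usual way, as one minus the ratio of residual to total sum of squares, which is the quantity the 0.83 figure above refers to.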

The analysis indicated that the Hispanic share of each state's Democratic electorate did not influence the result. In doing the analysis I looked only at states, and excluded Vermont and New Hampshire because of the strong favorite son effect. The resulting model, naturally, predicts the total number of delegates already awarded to each candidate precisely, for the simple reason that the model is based on that number. Within the data set, the R-squared value is 0.83, which is pretty good. This means, roughly, that 83% of the variation in voting (the percent who voted for each candidate) is explained by those variables. The following table shows the actual delegates won vs. the delegates predicted by the model.

[Table: actual delegates won vs. delegates predicted by the model]

Also indicated is the spread between the two candidates in percent. The spread starts off a bit wonky because there are only a few contests, but then settles in at about 20% and remains at that level. Not shown is an analysis of how Sanders performed relative to expectations. If that number changed a lot, showing a trend, this would be important for predicting the future. According to this model, Sanders underperformed by 2% in the first half of the contests and overperformed by 2% in the second half. So there may be a very low-level “surge,” but not enough to make any real difference in the outcome.
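The over/underperformance check described above amounts to averaging the residuals (actual minus predicted) separately over the first and second halves of the contests in calendar order. A minimal sketch, assuming an unweighted average per contest (the post does not say whether contests were weighted by size); the function name is hypothetical and the numbers in the usage line are made up.

```python
from typing import Sequence, Tuple


def mean_residual_by_half(actual: Sequence[float],
                          predicted: Sequence[float]) -> Tuple[float, float]:
    """Mean of (actual - predicted) over the first and second half of contests.

    Contests must be in calendar order. A positive value means the candidate
    did better than the model predicted. With an odd number of contests, the
    middle one is counted in the second half.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    if len(actual) < 2:
        raise ValueError("need at least two contests")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mid = len(residuals) // 2
    first, second = residuals[:mid], residuals[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Hypothetical Sanders vote shares (%) vs. model predictions, four contests
print(mean_residual_by_half([48, 50, 52, 54], [50, 52, 50, 52]))  # → (-2.0, 2.0)
```

A shift from negative to positive, as in this toy example, is the kind of pattern the post describes; a trend that kept growing would matter much more for projections than a constant offset.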

So, what does the future look like? There are several states coming up where Sanders is likely to do well. But is it enough to make it likely for him to overtake Clinton? With a 20% spread and half the votes counted, Sanders would have to take an average of 60% of the delegates from here on. That is very unlikely.
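The 60% figure follows from a weighted average. If a fraction f of the delegates has been awarded and a candidate holds a share s of them, the share x needed from the remainder to reach a target T overall satisfies s·f + x·(1 - f) = T. A small Python sketch, assuming a two-way race and reading the 20% spread as 40-60; `required_share` is a hypothetical helper name.

```python
def required_share(share_so_far: float, fraction_awarded: float,
                   target: float = 0.5) -> float:
    """Share of the remaining delegates needed to end with `target` overall.

    Solves share_so_far * f + x * (1 - f) = target for x, where f is the
    fraction of delegates already awarded.
    """
    if not 0.0 <= fraction_awarded < 1.0:
        raise ValueError("fraction_awarded must be in [0, 1)")
    return (target - share_so_far * fraction_awarded) / (1.0 - fraction_awarded)


# A 20-point spread in a two-way race means Sanders holds about 40%.
needed = required_share(0.40, 0.50)
print(round(needed, 3))      # → 0.6
print(round(needed * 131))   # → 79 of the 131 delegates at stake on March 22
```

A result above 1.0 would mean the target is mathematically out of reach from pledged delegates alone.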

The following table shows the primary and caucus outcomes through the present, followed by the predicted delegate commitments for the rest of the primary season. The percent spread between the candidates is indicated, and it does indeed drop over time, though slowly, reaching a minimum of 8% for the last few races.

[Table: primary and caucus outcomes to date, followed by predicted delegate commitments for the rest of the primary season]

The total number of delegates required to lock the nomination is 2,383. There are 717 uncommitted delegates (aka “Super Delegates”). If we assume that all of those uncommitted delegates will simply vote for the majority candidate, then the number of committed delegates required to have a likely lock on the nomination is 2,383 minus 717, or 1,666. This is not a fully supportable assumption because some of the uncommitted delegates may choose a different path, but it is a reasonable approximation.

The part of the table above marked in yellow indicates the approximate point in time when the leading candidate, Clinton, will reach roughly that many delegates. So, if this model is reasonably accurate, Clinton will achieve a lock around mid-May.
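The threshold arithmetic and the question of when the leader crosses it can be sketched together. This assumes, as the post does, that every uncommitted delegate ends up with the leader; `lock_threshold` and `first_crossing` are hypothetical helpers, and the contest labels and delegate counts in the example are placeholders, not projections from the model.

```python
from typing import Iterable, Optional, Tuple


def lock_threshold(needed_to_win: int, uncommitted: int) -> int:
    """Committed delegates needed if all uncommitted delegates back the leader."""
    return needed_to_win - uncommitted


def first_crossing(schedule: Iterable[Tuple[str, int]], threshold: int,
                   start: int = 0) -> Optional[Tuple[str, int]]:
    """Return (contest, running total) for the first contest at which the
    candidate's cumulative committed delegates reach `threshold`, or None.

    `schedule` lists (contest label, delegates won there) in calendar order;
    `start` is the delegate count before the first listed contest.
    """
    total = start
    for label, won in schedule:
        total += won
        if total >= threshold:
            return label, total
    return None


threshold = lock_threshold(2383, 717)  # the figures quoted above
print(threshold)                       # → 1666

# Hypothetical schedule: placeholder labels and counts
print(first_crossing([("contest 1", 300), ("contest 2", 200), ("contest 3", 250)],
                     threshold, start=1000))  # → ('contest 3', 1750)
```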

The next set of primaries, next week, are Arizona, Idaho, and Utah. In my view, these are somewhat hard to predict. Polls suggest a weak Sanders win in Idaho and a weak Clinton win in Utah. My model predicts a strong Clinton win in Arizona, and Sanders victories in Idaho and Utah. The total number of delegates at stake next week is small (131 in total). In order for Sanders to signal that he can overtake Clinton, he would have to win about 79 delegates in total. If he falls short of that, the rest of the road is more uphill. If he does better than that, then he may be seriously in the running.

Sanders is also expected to do well in the next several races (Alaska, Hawaii, Washington, Wisconsin, and Wyoming), according to my model. However, I don’t actually expect my model to work at all in Hawaii. My model suggests that he may well get over 55% of the vote in those contests, but again, he would already need to have won about 60% (unlikely) on the 22nd for these gains to start adding up to a catch-up number.

Following Wyoming is New York State, followed by Super Tuesday III, six states with 631 delegates. My model suggests he will get fewer than half of these delegates, though he will do well in Pennsylvania and lose narrowly in New York. I’m also predicting that he will narrowly win California in June.

Between now and the end of the race, there are 1,946 committed delegates still to be awarded. Of these, the top five states account for a whopping 1,138 delegates: Washington, New York, Pennsylvania, California, and New Jersey. I predict he will come close to even with Clinton in, or win, most of these states (though Clinton will do very well in New Jersey), but for Sanders to overtake Clinton by focusing on these states, he’ll have to do VERY well in all or most of them.

This model uses everything that happened before (mostly) to predict everything that will happen in the future. The first half of this series of events is over (in terms of delegate counts), and there is no evidence of any dynamic change occurring at the moment. This model does an excellent job of retrodicting the prior races, but it might slightly underestimate Sanders’ performance, since for the second half of the retrodicted contests Sanders outperforms the model by an average of 2%. However, to catch up to Clinton, he has to outperform the model by 10%.

The graphic at the top of the post is the predicted delegate counts for the entire primary season. The already-held contests are represented as predictions instead of actual because the final number (today’s delegate count) is the same for both predicted and actual. There is a slight narrowing of the gap (see table above) but not enough to change the outcome of Clinton achieving a lock on the Democratic Party nomination in May.

Super Tuesday: What does it mean for the Democratic Primary?

As you know, I developed a simple model for projecting future primary outcomes in the Democratic Party. This model is based on the ethnic mix among Democratic Party voters in each state. The model attributes a likely voting choice to theoretical primary voters or caucus-goers based on previous behavior by ethnicity. Originally I made two models, one using numbers that the Clinton campaign was banking on, and one using numbers that the Sanders campaign was banking on.

The results of the Super Tuesday primaries demonstrated that the Sanders-favoring model does not predict primary outcomes. Those same results showed that the Clinton-favoring model worked better. But the numbers also indicated that the Clinton favoring model estimates Clinton’s ultimate delegate take somewhat inaccurately.

I adjusted the model parameter so that the model now matches the results of a subset of the primaries already held to within five percent. The model still slightly favors Clinton, but not by much. The subset includes only US states (not territories, where I don’t expect the ethnic mix approach to work at all) and excludes states with a strong favorite son effect, namely New Hampshire and Vermont. Because of oddities in the Texas delegate system, the adjustment also excludes Texas, though the model's results for Texas match very well proportionately.

(Note: Using only the subset of states, the model predicts the previously held primaries and caucuses to within two tenths of a percent.)
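The post does not spell out which parameter was adjusted or how. As one hypothetical way to do such a calibration, the sketch below holds the Black and Hispanic preferences fixed at the earlier 70% and 60% Clinton shares, grid-searches the White voters' Clinton share, and keeps the value with the smallest squared error against actual results in the calibration states. The choice of which parameter to tune is an assumption, the helper names are hypothetical, and the state mixes and results in the example are made up.

```python
from typing import Dict, List, Sequence, Tuple

Mix = Dict[str, float]  # fraction of a state's Democratic voters in each group


def predict_share(mix: Mix, prefs: Mix) -> float:
    """Predicted Clinton share: group fractions weighted by group preference."""
    return sum(mix[group] * prefs[group] for group in mix)


def calibrate_white(states: Sequence[Mix], results: Sequence[float],
                    black: float = 0.70, hispanic: float = 0.60,
                    steps: int = 100) -> Tuple[float, float]:
    """Grid-search the White Clinton share over [0, 1].

    Returns (best value, largest absolute error across states), so the
    "within five percent" check is simply max_error <= 0.05.
    """
    best_w, best_sse = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        prefs = {"white": w, "black": black, "hispanic": hispanic}
        sse = sum((predict_share(m, prefs) - r) ** 2 for m, r in zip(states, results))
        if sse < best_sse:
            best_w, best_sse = w, sse
    prefs = {"white": best_w, "black": black, "hispanic": hispanic}
    max_err = max(abs(predict_share(m, prefs) - r) for m, r in zip(states, results))
    return best_w, max_err


# Hypothetical calibration states and Clinton vote shares
STATES: List[Mix] = [
    {"white": 0.60, "black": 0.30, "hispanic": 0.10},
    {"white": 0.85, "black": 0.10, "hispanic": 0.05},
    {"white": 0.50, "black": 0.20, "hispanic": 0.30},
]
RESULTS = [0.55, 0.45, 0.57]

w, err = calibrate_white(STATES, RESULTS)
print(f"White Clinton share: {w:.2f}, worst state error: {err:.3f}")
```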

The new model has only one version, which, as noted, matches the primaries so far very well. While the set of primaries carried out so far is somewhat biased toward the South, that bias is probably not important. I have a fairly high level of confidence in the model.

The result is best seen in this graphic, which shows the cumulative delegate count of committed delegates in US states. So this excludes non-committed delegates (known as “Super Delegates”) and it excludes territories and other non-states (but it does include DC, because DC is like a state).


Assuming a large proportion of the Democratic Party’s uncommitted delegates support Clinton, Clinton will probably achieve the necessary number of delegates to lock the nomination either on the 19th of April with the New York primary, or on the 26th of April, with the Maryland, Connecticut, Delaware, Pennsylvania and Rhode Island primaries.

There are two phases of primaries coming up. First we have a series of weeks with only one or two primaries happening at once, with a total of 300 committed delegates (130 from Michigan). Then we have what is effectively Return of Super Tuesday, with 691 committed delegates, including Florida with 214. For Sanders to regain traction, he has to do well in some of these big states. In particular, Sanders has to outperform the model in Michigan, Florida, Illinois and possibly North Carolina and Ohio.

When we look at many of these states, the model seems to fit very well with the available polling data, except in cases where the polls suggest a stronger outcome for Clinton. The following table compares the model projections with estimates of the delegate split based on polls. All delegates are assumed to be awarded (among the committed delegates only) and the polling data is not very dense and in some cases not too recent, so this is a very rough estimate.


Prior to Super Tuesday, the then-current version of this model projected results that conformed closely with polls. For most states, the actual voting matched the projections and the polls pretty well, except in a couple of places. Now the refined model matches polling data even more closely, but the polling data is not necessarily to be trusted because there has not been enough polling. (I avoided comparisons with really old polls, which are entirely useless.)
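Turning a projected vote share into a delegate split, as in the model-vs-polls comparison above, can be approximated by awarding a state's committed delegates in proportion to vote share. This is a deliberate simplification: actual Democratic allocation is done partly at the congressional-district level, with a 15% viability threshold, so real splits can differ by a few delegates. The helper names are hypothetical, and the states and shares in the example are placeholders, not the post's table.

```python
from typing import Dict, Tuple


def split_delegates(clinton_share: float, delegates: int) -> Tuple[int, int]:
    """Proportionally split a state's committed delegates in a two-way race.

    Simplification: ignores district-level allocation and the 15% threshold.
    Note that Python's round() sends exact halves to the nearest even number.
    """
    if not 0.0 <= clinton_share <= 1.0:
        raise ValueError("clinton_share must be between 0 and 1")
    clinton = round(clinton_share * delegates)
    return clinton, delegates - clinton


def compare(model: Dict[str, float], polls: Dict[str, float],
            delegates: Dict[str, int]) -> Dict[str, int]:
    """Clinton delegates (model minus polls) for each state in both inputs."""
    out = {}
    for state in model.keys() & polls.keys():
        m, _ = split_delegates(model[state], delegates[state])
        p, _ = split_delegates(polls[state], delegates[state])
        out[state] = m - p
    return out


# Placeholder inputs: Clinton share of the two-way vote, committed delegates
MODEL = {"State A": 0.58, "State B": 0.49}
POLLS = {"State A": 0.62, "State B": 0.47}
DELEGATES = {"State A": 100, "State B": 60}

for state, diff in sorted(compare(MODEL, POLLS, DELEGATES).items()):
    print(f"{state}: model minus polls = {diff:+d} Clinton delegates")
```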

Clinton’s path to the nomination is clear. Sanders’ path to the nomination requires something to change, and to change dramatically and quickly.

Will Clinton or Sanders win the Democratic Nomination?

Both Hillary Clinton and Bernie Sanders are viable candidates to win the Democratic nomination to run for President of the United States.

There are polls and pundits to which we may refer to make a guess as to who will win. Or, we could ignore all that, and let the process play out and see what happens. But, spreadsheets exist, so it really is impossible to resist the temptation of creating a simplistic spreadsheet model that predicts the outcome.

But we can take that a step further and suggest alternate scenarios, based on available data. So I did that.

I have removed the so-called “Super Delegates” from the process. This model assumes that the super delegates will ultimately either divide themselves up to reflect the overall distribution of committed delegates, or will mass towards the apparent leader. In any event, it is important that you know that “Super Delegate” is an unofficial, made-up term. They are really called “Uncommitted Delegates” because they are uncommitted. They will walk into the National Convention with no requirement as to whom they cast their vote for. That is their purpose. Meanwhile, it is true that individual Uncommitted Delegates will “endorse” a candidate during the process. Personally, I’m against this because it leads to conspiratorial ideation among activists and other interested parties. If I were King of the Democratic Party, I would make a rule that if you are going to be an Uncommitted Delegate, you don’t endorse or in any other way imply support for a candidate. (I would also probably reduce the total number of Uncommitted Delegates somewhat.)

So, in this model, the number of delegates it takes to be assured the nomination, pragmatically if not fully realistically, is the number required by the process minus the number of Uncommitted Delegates, or 2382-712=1670. In the graphs below, I represent this threshold by a wide blue line to reflect uncertainty. When a candidate’s delegate count makes it to the vague blue line first, that is an indicator that this candidate may be anointed. But, if the two candidates are close in delegate count at this point, a proper degree of uncertainty has to be assumed.

This modeling effort explores the effect of ethnicity on the outcome. I assume all voters are White, Black, or Hispanic. I also only look at US states and DC, because things may be very different in the territories and possessions with respect to ethnicity. It is not too hard to estimate the relative preference for either of the two candidates among White, Black, and Hispanic subpopulations. It is probably true that these ethnic divisions work very differently in different areas. For example, union endorsements may affect ethnic voting patterns more or less for different ethnicities in different states. Importantly, it is likely that both preference and turnout will evolve among the ethnic groups as the primary process continues. This, of course, is why we use a spreadsheet. You can change the numbers any time as more information is available.

This model does not involve age directly, but does so indirectly, in that variations in age-graded participation factor into ethnicity. The same goes for sex, or more accurately, sex is (I assume) divided evenly across the primary states while age might not be, so again, age can factor into ethnicity. But a more sophisticated model that looks at turnout differentials or anomalies across age and sex would be better, and if the relevant information becomes available, perhaps I’ll update the model.

The Iowa Caucus involved mostly White voters, and told us that Clinton and Sanders are very close to even in this demographic. So, the model can assume a 50-50 split among White voters. Currently available and fairly recent polling data tell us that Clinton is preferred by both African American and Hispanic Democrats, but to different degrees. So, a first stab at this model can use a Clinton-Sanders ratio of 70-30 for African American primary voters and 60-40 for Hispanic primary voters. Using these three sets of ratios, and known statewide demographics across the primary states, we can estimate the effects of ethnicity.
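The first model reduces to a weighted average: each state's predicted Clinton share is the sum, over the three groups, of the group's fraction of voters times its Clinton preference, with Sanders getting the remainder. A minimal sketch using the 50-50, 70-30, and 60-40 ratios above. Because the post assumes every voter is White, Black, or Hispanic, the (hypothetical) `clinton_share` helper rescales a state's three percentages to sum to 1. The example state profile is made up.

```python
from typing import Dict

# Clinton share of the two-way vote by group (first model, from the text)
FIRST_MODEL: Dict[str, float] = {"white": 0.50, "black": 0.70, "hispanic": 0.60}


def clinton_share(profile: Dict[str, float], prefs: Dict[str, float]) -> float:
    """Predicted Clinton share for a state.

    `profile` gives the size of each group (percentages or counts); it is
    rescaled so the three groups sum to 1, matching the assumption that
    every voter is White, Black, or Hispanic.
    """
    total = sum(profile[g] for g in prefs)
    if total <= 0:
        raise ValueError("profile must have a positive size for some group")
    return sum(profile[g] / total * prefs[g] for g in prefs)


# Hypothetical state: 60% White, 30% Black, 10% Hispanic
share = clinton_share({"white": 60, "black": 30, "hispanic": 10}, FIRST_MODEL)
print(f"Clinton {share:.0%}, Sanders {1 - share:.0%}")  # → Clinton 57%, Sanders 43%
```

Changing a scenario means editing only the preference dictionary, which is the same flexibility the spreadsheet provides.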

One problem you might note right away is that the statewide ethnicity profiles are not the same as the Democratic Party ethnicity profiles. A better version of this model will use the primary participant profiles instead. But, the last two election cycles of data are probably biased in this regard because of Obama’s candidacy, and thus may be incorrect. The preferred method will be to recalculate state by state ethnicity profiles, to estimate how many of each of three groups will vote, based on the returns from the first several primaries. I’ll do that. Right now this is impossible because both Iowa and New Hampshire lack the diversity in the voting population to allow it.

I am ignoring the New Hampshire results because I don’t know how to adjust for the Favorite Son Effect there. Also, New Hampshire is an odd state when it comes to primaries. The largest voting bloc in the New Hampshire Primary is uncommitted (not registered with either party), and these voters can vote in either primary (but registered Republicans and Democrats cannot switch). This, along with some other factors, has resulted in a special culture among New Hampshire voters. So, between the Favorite Son Effect and the special snowflake nature of New Hampshire (which is what makes New Hampshire so interesting and important, of course), I’m ignoring it for now, but will include data from the Granite State when there are more states to consider.

So, the first model assumes the above stated numbers, and produces this effect:

[Figure: projected cumulative committed delegates for Clinton and Sanders under the first model]

In this model, Clinton wins the primary. The pattern of delegate accumulation is interesting, and is actually one of the main reasons to do this modeling, but it only becomes understandable when compared to other outcomes, so let’s look at the alternative model I ran and then compare.

The second model takes a cue from the large number of new young voters, combined with their Bernie-ness and their whiteness, to suggest a change in the White Ratio to favor Sanders. I sucked on my thumb for a minute and came up with a 40-60 ratio. This model gives credit to the Sanders campaign's claims that African Americans will grok the Bern, and lowers the differential among Black voters to 60-40. This model assumes something similar for Hispanic voters, and adds another element. It is possible that in some states labor-related issues will cause Hispanic votes to shift even more strongly to Sanders, so my thumb-suck estimate for this ratio is 40-60.
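Putting the two sets of ratios side by side shows how far the second model moves a given state. White and Black voters each shift 10 points toward Sanders and Hispanic voters shift 20, so the swing is biggest where the Hispanic share is largest. A self-contained sketch, with Clinton shares read off the Clinton-Sanders ratios in the text (second model: 40-60 White, 60-40 Black, 40-60 Hispanic); the helper name and the two state profiles are hypothetical.

```python
from typing import Dict

# Clinton share of the two-way vote by group in each scenario (from the text)
SCENARIOS: Dict[str, Dict[str, float]] = {
    "first (status quo)": {"white": 0.50, "black": 0.70, "hispanic": 0.60},
    "second (Sanders-favoring)": {"white": 0.40, "black": 0.60, "hispanic": 0.40},
}


def clinton_share(profile: Dict[str, float], prefs: Dict[str, float]) -> float:
    """Group-weighted Clinton share; the profile is rescaled to sum to 1."""
    total = sum(profile[g] for g in prefs)
    return sum(profile[g] / total * prefs[g] for g in prefs)


# Hypothetical state profiles (percent of Democratic primary voters)
STATES = {
    "mostly White": {"white": 85, "black": 10, "hispanic": 5},
    "heavily Hispanic": {"white": 45, "black": 10, "hispanic": 45},
}

for name, profile in STATES.items():
    shares = {s: clinton_share(profile, p) for s, p in SCENARIOS.items()}
    swing = shares["first (status quo)"] - shares["second (Sanders-favoring)"]
    print(f"{name}: " + ", ".join(f"{s} {v:.0%}" for s, v in shares.items())
          + f"; swing {swing:.1%}")
```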

The second model is designed to favor Sanders in a way that might reasonably reflect the plausible shifts in voting preference that the Sanders campaign is attempting to bring about. So, this model assumes Sanders succeeds where he is clearly trying, and produces this result:

[Figure: projected cumulative committed delegates for Clinton and Sanders under the second (Sanders-favoring) model]

Now we can compare the two models, which I think are a) reasonable given what we know and b) to be taken with a grain of salt because of what we don’t know.

The two models show a difference in how the spread between the candidates evolves, and in when the projected winner can be seen as anointed by the process. In the case of the Clinton win, which assumes the status quo is maintained for the entire campaign and gives credit to the idea that “Sanders can’t win in the South” (more or less), the two candidates stay close enough to each other that there will be no clear winner for a long time, even if Clinton actually does stay ahead of Sanders the whole time. In this case, the jump into the blue zone, though not by a very large margin, does not happen until April 26th, when there are several primaries including Pennsylvania, with its massive delegate count. Also, importantly, after this date there are still some very large states, including New Jersey and especially California, that could flip the result. If this is the pattern that develops, then the day after the big primary day on April 26th, if I were Sanders, I’d camp out in California!

In the case of the Sanders win, the pattern is very different. (This is why this is interesting.) Here, Sanders pulls farther ahead, and sooner. The big jump would be on March 15th, which is a day of several primaries, including Florida, Illinois, and North Carolina. In this model, a close campaign shifts to a strong Sanders lead, and Bernie does not look back.

Those two scenarios represent two very different primary seasons, indeed!

I will update or redo these models after the next primary or two. Between Nevada and South Carolina, we can get much better data on the ethnic effects on the numbers, though of course, it will still be very provisional. Those data will be limited in extent, but will represent a lot of diversity. On Super Tuesday (March 1st), enough data from a bunch of primaries across the US will allow, I think, a very accurate model that will probably predict the outcome of the primary season IF whatever status quo exists on that day holds into the future. After that, any departure from the apparent pattern will require something to happen or change that causes voters to do the unexpected.

Hillary Clinton Opposes Keystone XL Pipeline

This just came in from NBC

Last week, Clinton said,

“I have been waiting for the administration to make a decision,” she said last week in Concord, NH. “I thought I owed them that. I worked in the administration. I started the process that is supposed to lead to a decision. I can’t wait too much longer. And I am putting the White House on notice. I’m gonna tell you what I think soon because I can’t wait. I thought they would have it decided way, you know, way by now and they haven’t.”

And moments ago she said:

“I think it is imperative that we look at the Keystone XL pipeline as what I believe it is: A distraction from the important work we have to do to combat climate change, and, unfortunately from my perspective, one that interferes with our ability to move forward and deal with other issues,” she said during a campaign event in Iowa Tuesday.

“Therefore, I oppose it. I oppose it because I don’t think it’s in the best interest of what we need to do to combat climate change.”