Tag Archives: Clinton

Electoral College Prediction: Trump 241 vs. Clinton 297

I’ve got a new set of electoral college predictions. I’m using the same method as before, but with these differences: a) I had to use less than ideal polls (c rating, a few that overlapped with days prior to POTUS debate III) on the last run, this time no such polls are used; and b) there are some new polls added in this time.

screen-shot-2016-10-27-at-2-41-38-pm

The difference is interesting, and somewhat concerning (compare to this result). For example, in this run, Arizona, Virginia, and New Hampshire go for Trump. Most people think of that as unlikely. Personally, I don’t see Virginia doing that. New Hampshire is conservative and is very white (thus potentially Trump-leaning), but is in transition. However, these states are all within a very small fraction of the 50-50 cutoff. Oddly, North Carolina is not that close.

I did a second map (using 270 to win) with the same data but adding a roughly 3% correction for ground game. Trump seems to not have much of one. I asked a number of colleagues what percentage correction they might use for a good vs. bad ground game. These are people who have ground game experience and a good record. They were all over the place in their suggestions, and noted that any such guess would be iffy this year. So, I picked 3%. North Carolina is actually slightly more than 3% from the 50-50 line, but I included it anyway in this latest run, which adds New Hampshire, Arizona, Virginia, Ohio, and North Carolina to Clinton’s column.
screen-shot-2016-10-27-at-2-42-38-pm
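For anyone who wants to play with the ground game idea, here is a minimal sketch of one way such a correction could be applied. It assumes the correction is a flat shift taken off Trump's share of the two-way vote in every state, which is only one possible reading. The function names and the shares below are made up for illustration; they are not the numbers behind the maps.

```python
def apply_ground_game(trump_share, correction=0.03):
    """Shift every state's Trump share down by a flat correction (an assumption)."""
    return {state: share - correction for state, share in trump_share.items()}


def flipped_states(trump_share, correction=0.03, cutoff=0.5):
    """States that lean Trump before the correction but not after it."""
    adjusted = apply_ground_game(trump_share, correction)
    return sorted(
        state
        for state, share in trump_share.items()
        if share > cutoff and adjusted[state] <= cutoff
    )


# Hypothetical Trump shares of the two-way vote, NOT real model output
shares = {"State A": 0.505, "State B": 0.52, "State C": 0.56, "State D": 0.47}

flipped = flipped_states(shares)
print(flipped)
```

With these made-up numbers, only the states sitting within 3 points of the 50-50 line on the Trump side change columns. A state well past the line, or one already on the Clinton side, is unaffected.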

I want to remind you of a couple of things about this model. It is indifferent to your opinion as it might be derived from polls. That is the point. This is meant to account for some of that potential observational bias, or at least, ignore it. Also, this model tends to work very well. However, it is accurate mainly with respect to the percentage of the vote assigned to each candidate in a unit area (a state), not whether the candidate takes the state or not. In other words, we look at this and freak out about a state being blue or red, and the model says, “Who cares about that, I’m trying to tell you the PERCENT of the vote per candidate. So, 49 vs. 51 are two points off, and 81 and 83 are two points off, they are the same, silly human!”

The real meaning of this particular prediction, which uses BETTER DATA POINTS than the last one but FEWER OF THEM, is that a) it is not closing in on a Clinton landslide (that isn’t going to happen) and b) it shows the kind of crazy variation over time that should keep us up at night. On the 8th. But not so much other nights, because it is, essentially, impossible for Clinton to lose.

And, to underscore that point, here are the states that my model currently says will go to Clinton, on the stronger Clinton side of the distribution, that are the minimal needed to get 270 votes:

screen-shot-2016-10-27-at-2-54-30-pm
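The idea of walking in from the strongest Clinton end of the distribution until the votes add up can be sketched as follows. This is only an illustration under assumptions: the function name is hypothetical, the threshold is lowered to 60 so that a six-state toy example is enough, the shares are borrowed from the older Trumposity table rather than from this run, and the electoral vote counts are the 2016 allocations, which are not part of the model.

```python
def minimal_path(trump_share, electoral_votes, needed=270, cutoff=0.5):
    """Add Clinton-leaning states, strongest first, until `needed` votes are reached."""
    leaning = [s for s in trump_share if trump_share[s] < cutoff]
    chosen, total = [], 0
    for state in sorted(leaning, key=trump_share.get):
        if total >= needed:
            break
        chosen.append(state)
        total += electoral_votes[state]
    return chosen, total


# Trump share of the two-way vote (illustrative subset of an older table)
trump_share = {
    "District of Columbia": 0.339691168,
    "Hawaii": 0.371874288,
    "Vermont": 0.411357768,
    "California": 0.425306503,
    "Rhode Island": 0.427783887,
    "New York": 0.429018521,
}
# 2016 electoral vote allocations (an outside input, not from the model)
electoral_votes = {
    "District of Columbia": 3,
    "Hawaii": 4,
    "Vermont": 3,
    "California": 55,
    "Rhode Island": 4,
    "New York": 29,
}

path, votes = minimal_path(trump_share, electoral_votes, needed=60)
print(path, votes)
```

Note that "minimal" here means stopping as soon as the target is reached while going in order of Clinton strength. It does not search for the smallest possible set of states.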

So, that’s how Clinton can win without Pennsylvania or Ohio. And, again, this is the quirky nature of variation near the 50-50 line. Clinton will probably win Pennsylvania (according to everything) and a couple of other states, and does not, therefore, need Florida. Probably.

Clinton Vs. Trump: Latest Electoral Prediction

It is fun to look at polls, and using such data, decide which candidate will win which state, and ultimately, which candidate will win the electoral college. A lot of people and organizations do that, and for this reason, I don’t. I do not have access to polls that no one else sees. Were I to use polling data to directly predict outcomes per state, I’d use a method like that used by FiveThirtyEight, and probably come up with similar results. How boring. It would be a waste of my time to try to replicate the excellent work done by Nate Silver and his team.

Back during the Democratic Primaries, I decided that I wanted to get a handle on which candidate was likely to win, fairly early on. The polling based estimates were inadequate because most states simply didn’t have polling data that early in the process. So, I invented an alternative method, which made certain estimates of how voters with different ethnic identities would vote. That method accurately predicted several primary outcomes, outperforming the poll based methods such as those used by FiveThirtyEight.

After a while, enough primaries had been carried out that I could switch methods slightly. Using the same exact model, but primed with the results of prior primaries (that year) rather than my estimates of voter behavior, I used the ethnic distribution data for each state to predict the outcome of upcoming primary contests.

Once again, my method was very accurate, and once again, it outperformed the polling based methods.

So, recently, I’ve tried to apply a similar method to estimating the electoral outcome for this year’s presidential race. But, it is impossible to use the same exact method because the entire thing happens all on one day. I can’t use the election results from a handful of states to estimate the likely future outcomes in other states.

I recognize that polling data is very limited on a national level. Things happen during an election season that probably change people’s likely voting behavior, especially among independents. Solid states are rarely polled, and small states, swing or not, are rarely polled. Many polls are of low quality. Right now, for instance, fewer than half of the states have polls that were a) taken fully after the final POTUS debate and b) have an A- or better rating from FiveThirtyEight. If I allow the use of B and occasional C ratings for recent polls, and allow a few polls to include periods of time prior to the last POTUS debate, but only in states that are very strongly in favor of one candidate or the other (and thus likely to not move anyway), I can find 32 states that have sort of usable polling data. Interestingly, states with some of the more controversial changes happening, like Utah and Iowa, are not adequately polled.

In order to apply a model like the one I used in the Primaries to the current election, I used the 32 states for which there was somewhat acceptable recent polling data to inform the model (to calculate the regression coefficients) in order to then, separately, predict the likely voting behavior (Trump vs. Clinton) in all of the states.

Before I show you the map, however, I need to discuss something else.

About a week ago the press, especially the somewhat more left leaning press, and various commenters, seeing much reaction to a series of events beginning with the NYT release of Trump’s tax return and ending with the final POTUS debate, events which sandwiched the sexual assault tapes and accusations, collectively decided that a huge gap between Clinton and Trump was rapidly opening up and the race would end with a double digit spread, an electoral rout, and a big party.

Soon after, I pointed out that this may not be correct. That polling data seemed to show, rather, that there was an expansion of the difference between the two candidates followed by a re-closing of the gap, with Clinton still leading but by about as much as before this temporary shift. To this I added a concern. If too many people assumed that the race was over and in the double digit range, perhaps there could be a GOTV backlash effect, or a funding effect, that would shift things to within shooting distance for Trump.

I was not alone in thinking this, and I was probably right. The GOP sank, via PACs, 25 million dollars into Senate races in response to the Democrats shifting from the national race to the Senate, which was followed by the Democrats shifting back to the national race in certain states, presumably recognizing that the polls were artificially spread. Indeed, some who criticized (arguing mainly from incredulity and good wishes) my admonition noted, correctly, that some of that narrowing was because a bunch of right-leaning polls had come out all at once. This is true, but it ignores that a bunch of left-leaning polls had made the formation of the Great Gap of GOP Defeat look a lot bigger than it ever really was.

I say all this as part one of my preparation for what I’m going to tell you below, which is not the news you want to hear. Part two is some logic I’d like to bludgeon you with.

Consider these points:

1) True Trump supporters could give a rat’s ass about sexual assault, poor debate performance, or tax forms. Donald Trump was correct when he said, weeks ago, now forgotten, that he could gun someone down on the streets of Manhattan and he would not lose support from his base. These people did not abandon him when he was heard to talk about sexual assault. If anything, they were energized by it. And, I’m talking about something just shy of 40% of the voters. We live in a barely civilized asshole country.

2) Please tell me exactly which people who are NOW going to pick Clinton (if polled or on voting day) changed from not being Clinton supporters to being Clinton supporters, as opposed to people who were going to vote for Clinton over Trump all along? In other words (this is a somewhat subtle point) which people who hated Trump became True Haters of Trump after the sexual assault thing? Almost none. They were already there.

3) The third category of people, the undecideds (who are only lying about being undecided, in most cases) and the so-called “reasonable Republicans” (of which there are very, very few), who could conceivably shift from Trump to Clinton are going to divide their voting activities between Johnson, a write in (as they are being advised by Republican leaders in some cases) or simply staying home.

In other words, over the last few weeks, no source has emerged that hands Secretary Clinton more electoral votes than she probably had about a month ago, and Trump is not going to have any, or at least not many, electoral votes go away.

Those observations (part one) and that logic (part two) cause me to be utterly unsurprised to find out that an analysis of the electoral map I did on October 16th and one I did today do not show Clinton pulling farther ahead. In fact, the two analyses have Clinton being less far ahead of Trump now than ten days ago. The difference is in Ohio (shifting from Clinton to Trump) which is almost certainly going to happen, and North Carolina (which shifted from Clinton to Trump in this analysis) which seems much less likely to happen, and Arizona shifting from Clinton (that was probably wishful thinking) to Trump.

The point here is this, plain and simple. An analysis using a technique that has worked very well for me in the past shows that the difference between that moment of Maximal Clintonosity and today is plus or minus a couple of states. In other words, not different. Maybe a little worse. Really, about the same.

Here’s the current map:

screen-shot-2016-10-26-at-11-34-33-am

Obviously, I will be watching for more data over the next few days. I assume there will be a spate of polls as we approach November 8th (the day Democrats vote. Republicans vote on the 28th of November). If so, then there will be convergence between my method of calibration and my method of calculation, and the model will consume itself by the tail and become very accurate at the same time.

But between now and then, perhaps that very small number of polls that are both recent and high quality will grow a bit more and I can do this again and resolve those closer states.

By the way, the “swing states” according to my model, the states where things are close, are Ohio, North Carolina, Arizona, and Georgia of those now in the Trump column. Those are indeed swing states. Numerically, the close states that are in the Clinton column are Virginia, New Hampshire, and Pennsylvania.

The real meaning of Trump’s Al Smith fiasco

A presidential election season involves a series of debates. After the last debate, a day or a few days after, the main candidates attend and speak at a charity dinner run by the Archdiocese of New York, to raise money for Catholic Charities. It is the last event at which the candidates will appear together, and the format is that of a roast.

That is more or less the tradition.

Last night, Secretary Hillary Clinton and Donald Trump were at the Al Smith dinner. Here are the salient facts:

Trump spoke first. He had two or three pretty funny jokes, but the one that I think will go down in history as the funniest required that he throw a woman (who did not know the joke was going to be told) under the bus for his own benefit. Figures.

His other “jokes” were almost entirely taken from his stump and debate speeches; and they were offensive. He didn’t use the term “crooked Hillary” but almost. People in the room booed him and yelled out insults to him. The people sitting behind him looked like they had just swallowed live baby porcupines.

I assume both candidates were given the same amount of time to talk. Trump’s time on the podium, however, was very short. It appears that he was, essentially, booed off the stage.

Secretary Clinton spoke second. She was very funny. She was gracious. The roast parts of her speech … and here is the important part … were just as effective as anything Trump said as jabs against one’s opponent, even more so. If you took at face value all the bad things Trump implied in his awkward statements about Clinton, and all the bad things Clinton implied in her very entertaining routine about Trump, Trump would end up with a truly deplorable resumé, while Clinton would look just a tad shady, well within the normal range for a politician.

After Secretary Clinton finished the roast part of her monologue, she talked about other things, larger things, important things, eloquently and effectively.

Trump had everyone booing him and squirming. Secretary Clinton had everyone in stitches, then a bit weepy-eyed.

The final score: Clinton 9, Trump -2. The difference in performance between the two at this event was double the difference between them during the most differentiating of the debates.

So, what is the real significance?

There has always been the suggestion that Trump’s intention, from the beginning of the primary process, was to increase his brand’s value, maybe sell a book, increase his speaker fee, etc., and not really run for President. I never believed that, I said so at the time, and everyone else was wrong. But, the idea that ultimately he would use this entire run for the presidency as brand enhancement, win or lose, was clearly correct. That would be correct for anyone running for president, and especially for a professional entertainer, which is what Trump is.

(Actually, he is something else. Not an entertainer and not a business person. See the graphic at the top of this post for a hint as to what he is.)

Here’s the thing. Last night, two people got to get up in front of a fairly tough audience, including major members of the press, major east coast politicians, and the mucky-mucks of the Catholic Church, and be entertainers for a few minutes. Hillary Clinton, not known to be an Obama-level speaker (either Obama) and often seen as a bit dry, killed it. Donald Trump, the great entertainer, totally screwed the proverbial pooch.

So, now, imagine yourself as a network executive, or a potential investor in the entertainment industry. You are presented with a proposal to develop Trump TV or some other Trumpy project. But you were at the Al Smith dinner, because you are rich and you happen to live in New York. Or maybe you just saw the video. And now you are going to decide whether or not to put substantial funds at risk. While you are thinking about it, you also realize that you would be putting your reputation at risk.

No, that won’t happen. Invest elsewhere.

Yes, Trump will be able to develop a post-election quasi network (on the Internet) that will fit in with the broad panoply of such projects, and it may have some value (fiscally, not morally or ethically). But Trump’s entertainment mojo as demonstrated in this campaign is negative. He doesn’t kill the room, he kills the mood. He was apparently suitable to play the asshole boss on a TV show or two, but his range is very limited, his basic talent non-existent, and his ability to develop in this area nil.

This campaign, rather than preparing him, and a large audience, for an entertainment coup, has proven that he is not up for it, lacks the talent, lacks the appeal. The Al Smith Dinner, which happened at the end of a long period of time during which Trump could have developed his talent, and his act, shows that there is nothing there worth looking at. Indeed, Trump’s performance at the Al Smith dinner was so bad, so cringeworthy, that a producer or investor in entertainment would gong the likes of him off the stage in record time.

Trump went bankrupt how many times? Failed in how many relationships? Is going to lose the presidency by how much? Couldn’t even handle a roast at a charity dinner? It just might be that the man isn’t really good at anything.

UPDATED: Was there a Clinton Surge or not?

Updated to include polls through Oct 26th (AM, more polls later in the day on the 26th will be added at the next update):

screen-shot-2016-10-26-at-9-46-18-am

Updated, 25 October AM

As I expected, and demonstrated much to the consternation of everyone, the ever widening double digit lead of Clinton over Trump in an increasing number of polls meme is a falsehood. Here is the latest graphic using the same approach as described below, but updated to reflect additional polls.

screen-shot-2016-10-25-at-8-28-14-am

Rather than a widening, or even consistent, gap, or a gap that is double digit, we see Clinton continuing to lead, but pretty much in the same way that she has led since the conventions. In other words, the three presidential debates, the release of Trump’s tax records, the sexual assault tape, the confirmation of many actual groping cases, and the VEEP debate, may have had some short term effects on the polls, and if you look closely and squint, may have actually re-widened Clinton’s lead to post convention levels a bit, but for the most part, we are looking at a pretty steady relationship between the two candidates from the end of the convention period to the present.

When the general polls conform to expectations, they matter. When they don’t conform to expectations, say “yeah, but what really matters is the electoral college, and in the electoral college … bla bla bla.”

And yes, since we attempt to choose our president using the Electoral College (though that doesn’t always work) that is what matters, and it may be the case, though I can not independently confirm this at this exact moment in time (Tuesday AM), that Clinton is either taking or widening the lead in some of the swing states, and some red states are turning less red, as we speak. But, it turns out that we DO look at the general numbers for a number of reasons, including the fact that we expect general trends to conform to state wide trends, as a check on what we are seeing, and general trends may matter down ballot.

The original reason that I wrote this post is that I was concerned that a lot of commenters (and maybe voters) had come to the conclusion that Clinton’s lead was growing, nearing or in the double digit range, and that the Clinton campaign need not look back, and could start doing other things, but, my read on the polls was that the debate/scandal swing looked like earlier swings, and I had little faith that it was long lasting. I took a look at the data and saw preliminary information suggesting that this may be the case. And now, that is confirmed. I conclude for now that the three presidential debates, the release of Trump’s tax records, the sexual assault tape, the confirmation of many actual groping cases, and the VEEP debate, may have had some short term effects on the polls, and if you look closely and squint, may have actually re-widened Clinton’s lead to post convention levels a bit, but for the most part, we are looking at a pretty steady relationship between the two candidates from the end of the convention period to the present.

And yes, I said the part that the incredulous will ignore twice.

I may do another electoral projection to replace this one later today.

Original Post:
America. Democracy. Decency. Thoughtfulness. Everybody and every thing, it feels like.

Everyone is upset this morning about Trump’s comment that he will wait and see about the results before he accepts them. His comments are deplorable and astonishing, but I think they are also a distraction. If he ignores the results, it may be a bit messy but he will be ignored. A few militia groups will go and take over a Federal facility or two, but that will be managed. Unless the Congress gets on board with denying Clinton the presidency, nothing really bad will happen.

I’m more alarmed by all the comments he made in this debate, and previously, about how he would handle wars, the military, the economy, the law, the Supreme Court, trade, ethnic/race relations, and his comments about women (which continued last night). Those are all problems that will ruin us as a country if he wins, and that have damaged us as a country already even if he walks away from this race right now. I’m not all that worried about him having a tantrum if he loses.

And, of course, it is far more concerning that Trump might win the election than that he might lose and refuse to go quietly. This is because it is simply not the case that Hillary Clinton and the Democrats have this sewn up. Let me show you why.

screen-shot-2016-10-20-at-9-45-37-am

This graph shows the daily averaged-out polls, all of them, as listed by RCP’s site, since July 1st (plotted on a y-axis of days before the election). There is a 3 day moving average imposed on this (a shorter moving average than usual, but this is an average of averages, and those averages are of polls taken over varying numbers of prior days, so we have plenty of helpful smoothosity on that curve).
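For readers unfamiliar with the smoothing step, here is a minimal sketch of a 3 day moving average. It assumes a trailing window (each point averages that day and the two days before it); the graph may have used a centered window instead. The function name and the daily numbers are invented for illustration and are not RCP data.

```python
def moving_average(values, window=3):
    """Trailing moving average; returns one value per complete window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily averages of Clinton's lead, in points (not real poll data)
daily_lead = [6.0, 5.0, 7.0, 4.0, 4.0]

smoothed = moving_average(daily_lead)
print(smoothed)
```

Five daily values yield three smoothed points, since the first two days do not have a full window behind them.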

Never mind the details for a moment. Notice first that over this time, which starts in the month of the conventions and goes up to the present, there is an overall pattern of oscillation. For much of the time all of the polls are within the margin of error, but Clinton’s polls are usually higher than Trump’s, when averaged out. If you apply the FiveThirtyEight method, or use similar approaches, to combine the different polls into probability statements, one can be more definitive about Clinton’s overall and consistent lead since the conventions.

But, notice that about 50 days out, the two candidates’ polling became close before Clinton started to separate again, and also notice, that this cycle of Clinton pulling ahead and then drawing down again seems to be happening one more time. There was probably a lot of pressure separating Clinton and Trump, with Trump’s bizarre and generally poor performance in the debates, the revelation of the tape in which he seems to have no clue that sexual harassment is not OK, and the revelations seeming to confirm that he is a serial sexual molester, and the tax story from the NYT, and all of that. But then, about 27 days out, that pressure relaxes, and all the numbers regress towards the mean again.

Let me put this another way, as a stark but supportable hypothesis. About 50% of the United States would vote for Trump, and about 50% would vote for Clinton. People talk about the 35% to 40% Trump base, and that’s real. And Clinton has a similar base. But the rest of the country, the 20% to 30% that are not part of those groups, are divided roughly in half, in terms of preference for either candidate, and their preference is soft.

If there are no more strong events pushing people away from Trump, the numbers will settle down to where they were between days 40 and 50. This will place Trump within about one point of Clinton. And, one point is very very close.

The current widespread rhetoric that Clinton is going to win no matter what may be the exact cause of her losing. How many people will not bother to vote, when they otherwise might have, because they are confident that Clinton will win? If the two candidates are 1% apart, then only 1 in 200 voters have to do that to put Trump in the White House.

Let me note what may end up being the greatest situational irony of our times. MSNBC has lots of great commentators and reporters, like Rachel Maddow and Chris Hayes. They are providing the most thoughtful and coherent analyses of what is going on during this election cycle. But, they are also constantly repeating and supporting the rhetoric that Trump can’t win. And, their audience corresponds closely to that subset of people who are going to vote for Clinton.

Unless…

Unless MSNBC and other sources fail to shut up about how Clinton can’t possibly lose, and one in 200 otherwise-Clinton-voters stay home.

There are, of course, other possibilities. The apparent closing of the gap we see on the above chart could be an artifact of polling and disappear by itself over the next 48 hours, or it could be real, but reverses because of something Trump does. However, keep this in mind: Trump is being such a distraction from the race that a lot of information that could be used against Clinton (legitimately or not) is currently piling up and not coming into play. It is quite possible that forces that work to push Trump down on this graph could be weak, and forces that work to push Clinton down on this graph could be strong, and we might not be looking at a dangerously weak 1% lead by Clinton when the first week of November rolls around. We may be looking at a distinct Trump lead.

I should mention that today’s polls are not shown on this graph because they are mostly not available. Those that are available are in that subset that tends to favor Trump, but they are all showing a virtual dead heat.

Today, tomorrow, through Monday, we should be looking very closely at the polls. If they show narrowing, then my Hypothesis from Hell can’t be ruled out and the idea that the race is really about 50-50 between scandals needs to be taken seriously.

How have events shaped the Clinton-Trump race?

It is unfortunate that “all the pundits” are now saying that Clinton will now win no matter what, and that Trump will likely suffer more scandal before the end of the process.

This is unfortunate because a weak get out the vote effort is probably worth a couple of points on election day. It is unfortunate because some Trump scandals increase, rather than decrease, his numbers. He could suddenly gain a couple of points if he says or does just the right/wrong things. It is unfortunate because, for whatever reason, Hillary “My Middle Name is Target” Clinton has turned into the Teflon Candidate for now, but that won’t stick, as it were, for more than a day or two. Then Wikileaks, weak as it is, or some other issue, will come into play and knock two points off of her numbers.

It is unfortunate because the difference between Clinton and Trump is now between about 5 and 7 points, and 2 + 2 + 2 = 6.

Do the math. This race is not over.

In order to give some idea of the magnitude of things like the post-sexual-assault-revelations Trump Slump, or the conventions, or a given debate, in relation to the overall shifts of numbers across this race, I made this chart, using RCP’s national polling averages, and adding in some key moments from the campaign:

screen-shot-2016-10-19-at-3-00-12-pm

While Clinton has always been ahead, on average, she has not always been that far ahead, and was, in fact, farther ahead at various points in the past than she is now. In other words, for all the talk about BusTapeGate and debate performances, Clinton has not pulled out ahead of Trump farther than she has been in the past. If you look at this graph, you do not see a clear breakout. And, if you look at the MOST current version from RCP, as I write this, the blue line on top is dropping (those data came in while I was drawing this graphic, and I did not adjust). See that earlier peak in September for Clinton? The current peak is starting to look like that.

So, no, this is not over, and it is not wise to insist that it is.

The Current Trump-Clinton Electoral Prediction

There are some interesting, and in some cases, potentially disturbing, things going on with the state by state numbers in the current election. Most of this has to do with third party candidates, and most of it with Gary Johnson.

First, I’ll note, that despite fears among liberals and progressives that a lot of Bernie Bots would flock to third party candidates and eschew Clinton, there is no strong evidence that Clinton is losing much to any third party candidates. However, in some states, especially those with libertarian tendencies, Gary Johnson is doing fairly well. And, this had been hurting Trump.

However, lately, there has been a shift backwards in at least one state, New Hampshire. Johnson supporters are abandoning Johnson and switching to Trump, as though they were trying to shore up his position there. This has brought the Trump-Clinton numbers to within the margin of error.

In other words, Libertarian White Males in the “Live Free or Die” state are flocking to Misogynist Racist Trump’s aid rather than “voting on principle” which is what, I assume, they were formerly pretending to do. And, this could cost Clinton a couple of electoral votes if the trend continues.

Meanwhile, something like this may be happening in Virginia, but in the opposite direction, where Johnson appears to be getting a lot of Trump votes, maybe more as time goes on.

I don’t have time to do any of this right now, but when this is all over, it would be very interesting to look at the third party effects in this race.

OK, now on to the model. Let me explain the basic approach I take, which is different from other predictors (though 538 may have quietly adopted part of my approach for the general, as they’ve added something that looks a lot like my primary methods to their analysis).

Assume that all polls are good, and that all states are recently sampled with high quality polls with good methods and good samples.

OK, after you’ve stopped laughing, work with this assumption for a minute. If this was the case, then you could use those polls to predict the electoral outcome, and unless the electoral outcome was really close, or something major went wrong, your prediction would be clear as to who won, and very close if not spot on as to how many electoral votes ultimately go to each candidate.

Now assume that we don’t have polls at all, but we have some numbers indicating how people in a given state are likely to vote (like, if they went for Romney, they are likely to go GOP) or numbers indicating how people will vote based on ethnicity (like, African Americans are not likely to vote for any of the candidates other than Clinton, or among whites there is a certain percentage of White Supremacists, so they’ll vote for Johnson or Trump, etc.) If these numbers are accurate, you can predict the state by state outcome.

We don’t have either of these, but we do have a little of each.

My method uses only a subset of polls, hopefully across a range of states (geographically, politically, etc.), that are taken by higher end polling agencies and recently. These are then combined with data on percentage of voters in that state that voted for Romney, and the classically defined ethnic breakdown for that state, to come up with a multi-variable regression model. This model uses the percent of the vote that Trump gets out of Trump vs Clinton as the dependent variable, and a Romney number, and the ethnic breakdowns, as the independent variables.

I exclude some states that have recent data but that are beating to their own drums. In this case, Iowa is doing something different, and nobody understands it. Also, Virginia is doing something different and has not been analyzed yet. So, even though I have recent data from those two states, they are excluded.

The polls need to be mostly or entirely after the famous “bus tape” and most are after the second presidential debate. These polls come from Utah, Wisconsin, Georgia, Missouri, Indiana, Texas, Alaska, Ohio, Colorado, New Hampshire, Florida, North Carolina, Nevada, Maine, Pennsylvania, Oregon, Michigan, and Washington.

So, good polls are assumed to be nearly perfect, and they show the relationship between available prior voting patterns and demographics and the likely outcome. Then, this model is applied to all states (even those with the good polls) to come up with a list of states and their corresponding “Trumposity.”
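A minimal sketch of this kind of regression, in Python, may make the mechanics clearer. This is an illustration, not the code behind the numbers in this post: the state rows and poll values are invented placeholders, the predictors are trimmed to three columns (Romney share, percent black, percent Hispanic), and the helper names `solve_linear_system`, `fit_ols`, and `predict` are hypothetical.

```python
# Illustrative only: every number below is a made-up placeholder.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations apply to both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Apply fitted coefficients (intercept first) to one state's predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical "good poll" states:
# (Romney share, pct black, pct Hispanic) -> Trump share of the two-way vote.
polled = [
    ((0.60, 0.05, 0.10), 0.55),
    ((0.45, 0.15, 0.20), 0.44),
    ((0.55, 0.30, 0.05), 0.50),
    ((0.40, 0.10, 0.35), 0.42),
    ((0.50, 0.02, 0.03), 0.49),
    ((0.65, 0.25, 0.15), 0.56),
]
beta = fit_ols([row for row, _ in polled], [y for _, y in polled])

# The fitted model is then applied to every state, polled or not.
unpolled_state = (0.52, 0.12, 0.08)  # hypothetical predictors
print(f"Predicted Trumposity: {predict(beta, unpolled_state):.3f}")
```

The point of the design is that the polls only calibrate the coefficients; the final number for each state comes from its Romney share and demographics, not from its own polling.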

The result of that analysis is this:

State Trumposity
Utah 0.58063023
Wyoming 0.567768212
Oklahoma 0.549223043
Idaho 0.548614992
Alabama 0.546790641
West Virginia 0.541869467
Arkansas 0.541711727
Louisiana 0.539524138
Tennessee 0.536052614
Kentucky 0.53465294
Mississippi 0.532941927
Nebraska 0.529403232
Kansas 0.527964979
North Dakota 0.524771763
South Carolina 0.522670157
South Dakota 0.519464786
Georgia 0.517394808
Texas 0.513464342
Montana 0.51208424
Missouri 0.509752111
Indiana 0.509722982
Alaska 0.503877982
North Carolina 0.498999354
Arizona 0.494053798
Florida 0.486219535
Ohio 0.485026033
Virginia 0.48473854
Pennsylvania 0.477290142
Michigan 0.472823203
New Hampshire 0.472629126
Iowa 0.472420972
Wisconsin 0.470889284
Minnesota 0.470213444
Colorado 0.46956662
Nevada 0.465734145
Delaware 0.456986898
Oregon 0.454603152
Illinois 0.452734653
Maine 0.45123086
Connecticut 0.450282382
New Jersey 0.448700354
Washington 0.448170787
New Mexico 0.447855158
Maryland 0.445995181
Massachusetts 0.436816062
New York 0.429018521
Rhode Island 0.427783887
California 0.425306503
Vermont 0.411357768
Hawaii 0.371874288
District of Columbia 0.339691168

You can now split the table at the 50-50% mark to decide which states will break for Clinton and which will break for Trump.

(Note: Alaska will always break for Trump. It is located near the 50-50 line because Alaska is a special snowflake state. Ignore it, just keep it red on any map, and that will do.)
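Splitting the table can be sketched in a few lines of Python. The values below are copied from a handful of rows in the table above; the function name `call_state` and the `ALWAYS_TRUMP` override set are hypothetical names used only for this illustration.

```python
# Trumposity values copied from the table above (a few rows only).
trumposity = {
    "Indiana": 0.509722982,
    "Alaska": 0.503877982,
    "North Carolina": 0.498999354,
    "Arizona": 0.494053798,
    "Florida": 0.486219535,
}

# Per the note above, Alaska stays red wherever it lands in the table.
ALWAYS_TRUMP = {"Alaska"}


def call_state(state, value, cutoff=0.5):
    """Assign a state to Trump if it is above the 50-50 cutoff (or overridden)."""
    if state in ALWAYS_TRUMP or value > cutoff:
        return "Trump"
    return "Clinton"


calls = {state: call_state(state, value) for state, value in trumposity.items()}
for state in sorted(trumposity, key=trumposity.get, reverse=True):
    margin = trumposity[state] - 0.5  # signed distance from the 50-50 line
    print(f"{state:15s} {margin:+.3f}  {calls[state]}")
```

Printing the signed distance from 0.5 alongside the call is a reminder that a state a fraction of a point from the line is a much weaker call than its color suggests.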

The first map I want to show you is the map of states that are in the Clinton Camp that are a) most Clinton leaning in this analysis, and b) sufficient to get Clinton to 270:

screen-shot-2016-10-16-at-10-59-42-am

I added Virginia and colored it light blue. The reason I did this is that Iowa is a presumed-Clinton state in this model, but is, in fact, polling for Trump, because people in Iowa seem to have a new goal in life: Pissing off the parties and the electorate sufficiently that nobody cares about them any more, and the Iowa Caucus is no longer allowed to take the prominent role it has for all these years. I predict that if Iowa breaks for Trump, in four years, the first contest will not be the Iowa Caucus.

By adding Virginia and thus potentially starting early on the process of regarding Iowa as irrelevant to electoral politics, we have a list of states that is clearly Clinton and sufficient to put the former first lady back in the White House but with a different job.

Now, let’s do the same thing for Trump. What states are required to put him past the 270 line?

screen-shot-2016-10-16-at-11-05-19-am

In this case, I’ve colored pink the states that my model puts in the Clinton column but that are on the Trump-end of that part of the list (see table above), that are required to give Trump the election.

Ohio is actually possible. My model shows Ohio going to Clinton, but recent polling shows that Ohioans are more white supremacist than we might have thought. So maybe Trump gets Ohio, but I don’t think he’s going to get all those other pink states, or even any of them, likely.

Putting this a slightly different way, the solid Trump states (in my model) plus Ohio is still under 200 electoral points.
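That total can be checked with a short sketch. It assumes “solid Trump states” means every state above 0.5 in the Trumposity table. The electoral vote counts are the 2016 allocations (2010 census apportionment); they are not taken from this post.

```python
# Electoral votes per state in 2016 (2010 census apportionment).
# Assumption: "solid Trump" = every state above 0.5 in the Trumposity table.
SOLID_TRUMP_EV = {
    "Utah": 6, "Wyoming": 3, "Oklahoma": 7, "Idaho": 4, "Alabama": 9,
    "West Virginia": 5, "Arkansas": 6, "Louisiana": 8, "Tennessee": 11,
    "Kentucky": 8, "Mississippi": 6, "Nebraska": 5, "Kansas": 6,
    "North Dakota": 3, "South Carolina": 9, "South Dakota": 3,
    "Georgia": 16, "Texas": 38, "Montana": 3, "Missouri": 10,
    "Indiana": 11, "Alaska": 3,
}
OHIO_EV = 18
NEEDED_TO_WIN = 270

solid_total = sum(SOLID_TRUMP_EV.values())
with_ohio = solid_total + OHIO_EV

print(solid_total, with_ohio)        # 180 198
print(NEEDED_TO_WIN - with_ohio)     # 72 more electoral votes still needed
```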

The current most likely outcome according to this model is this:

screen-shot-2016-10-16-at-11-08-44-am

That would be an electoral blowout.

What happens if some of the more suspect states go backwards and vote for Trump? Iowa is threatening its own irrelevance, New Hampshire is acting strange, Ohio is polling towards Trump, and North Carolina, Arizona and Florida are close to the mid point. Change all of those states to Trump, and we get this nailbiter:

screen-shot-2016-10-16-at-11-11-41-am

The difference between these last two maps is clearly going to be the focus of interest over the next several days.

Colored here in red, for Danger, not for Trump/GOP, are the states that need to be watched closely, for which we eagerly await new polling, because they are either close, near the middle, or acting strange over recent days:

screen-shot-2016-10-16-at-11-14-01-am

That’s my story and I’m sticking to it. Until at least Tuesday or so.

Go to 270 to win to make your own maps!

The Electoral Map: Clinton Vs. Trump

Above is my latest electoral college projection.

This uses the technique previously described. However, instead of using RCP averages for all polled states and then using extreme (non-tossup) states to develop the regression model, this method uses only polling from states with one or more recent polls, and only with good polls. These poll numbers are then “predicted” by black/hispanic/white/Voted_Romney numbers, and that generates a model, based on just over 20 states, designed to predict all the states.

As expected, the r-squared value is much lower using this method, but this method does not violate any important statistical laws like the last one did.

Most of the polling data pre-dates the revelation of Trump’s interest in sexual assault, last Friday, and of course, Sunday’s “I’ll throw my opponent in prison when I win” debate. If you believe those events influence the election further, then you can figure this is a conservative estimate from the perspective of Clinton.

All of the blue states, both shades, are projected to go to Clinton, but I left the three closest to 50-50 in light blue.

I suspect the most controversial state here is actually Iowa, which seems to be throwing some sort of hissyfit in the polls.

And this, of course, is why my model is different from everyone else’s. The polls are used in this case to calibrate (in the absence of earlier results, like could be done in the primary!) but the actual prediction then does not use the polls directly. So, even though a recent poll shows Iowa as Trump, the model does not, because the model does not lie like the Iowans do, apparently!

PREVIOUS PREDICTION

Who Won The Presidential Debate Weekend?

You can’t say who really won the debate, because on Friday, news broke, confirming other news from the prior Monday (and general suspicions) indicating that Donald Trump is not fit to be President in Yet Another Way, and his campaign essentially imploded. So, instead, we’ll ask, “who won the weekend?”

As you know, I’m the last person to write off Donald Trump. From the very beginning, without fail, I’ve been warning you that he’ll do well, that he’ll win the GOP debates, that he’ll win various primaries, that he’ll win the nomination, etc. All of it. I have never once been wrong about this.

The reason I’m never wrong is because I know something that you also know but that you refuse to admit because it is too painful. Most Americans, perhaps way more than a majority, share one or more opinions with the core Republican political and social philosophy. A smaller number, a minority but not fewer than about 40%, agree with most or all of those points of policy. Added to this, Republicans tend to work better in lockstep than Democrats.

And this, dear reader, is why Republicans have been mostly in charge for most of the time since the Republican party became what it is today (starting in the 1970s).

Donald Trump, meanwhile, perfectly represents most of that ~40% of Americans, and that is why he is their candidate.

However, more than one thing must be in place to win an election. One of these things is having a large and loyal base, and Trump has that. Another is money, from multiple big donors. Trump had that (including himself) but it is gone (except himself). Another is the support of the party elite and all those great surrogates that go out and stump for you. Trump lost whatever he had along those lines a while back, and as of a couple of weeks ago has had absolutely nothing in the way of surrogate support. It has been just Trump and Pence. And now, Pence seems to have stopped campaigning, so it is just Trump over the last few days, today, and tomorrow, at least.

There remained for a while the Basket of Hypocrites, such as Mike Pence, Paul Ryan, Ted Cruz and the others. These are mostly evangelical conservatives who were willing to throw everyone in the country under the bus just to defeat Hillary Clinton, regardless of the cost. But with the culmination of sufficient evidence to regard Donald Trump as a supporter and likely doer of sexual assault on arbitrary females as a given part of his privilege, even the Hypocrites cannot survive being associated with him.

And for this reason, over the weekend, these rats left the ship.

As of some time over the last 48 hours or so, the Trump Campaign is over, and this is true regardless of any debate.

Then, there was the debate.

One could argue that Trump did better than expected, and Clinton could have done better, but everyone who is not extremely partisan thinks Clinton pretty much won.

So, what do the polls show? A new poll by NBC and the Wall Street Journal, that does not include the debate (because it was conducted on Saturday and Sunday, before the debate) puts Clinton at 46% to Trump’s 35% in a four way match. Head to head, the split is 52% to 38%, so if some of those third party snowflakes get with the program and actually vote in the election, the spread widens from 11 to 14 points.
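The arithmetic behind those spreads, as a tiny sketch using only the poll numbers quoted above (the function name `spread` is just for illustration):

```python
# Poll numbers quoted above (NBC / Wall Street Journal), in percent.
four_way = {"Clinton": 46, "Trump": 35}
head_to_head = {"Clinton": 52, "Trump": 38}


def spread(poll):
    """Clinton's lead over Trump in percentage points."""
    return poll["Clinton"] - poll["Trump"]


print(spread(four_way))                          # 11
print(spread(head_to_head))                      # 14
print(spread(head_to_head) - spread(four_way))   # 3
```

Note that going head to head adds 6 points to Clinton and only 3 to Trump, which is where the extra 3 points of spread come from.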

Those are double digit numbers. We’ve not seen double digit numbers from a major and legit poll since, I think, the start of the national campaign.

I’m pretty sure the debate did not push the polls back the other way. I’m pretty sure this weekend poll reflects the current situation, more or less. Of course, it is only one poll.

Looking at phone polls by major pollsters and/or major news agencies, excluding one outlier because its numbers are so far different (FOX), from September 1 to the present (including the poll mentioned above) we get this from HuffPo Pollster:

screen-shot-2016-10-10-at-12-45-18-pm

OK, now, pretend I’m wearing a Steve Kornacki mask and I’ve got a sharpie.

screen-shot-2016-10-10-at-12-48-12-pm

screen-shot-2016-10-10-at-12-49-37-pm

I could do more, but I think you get the point.

I expect more scandalous news.

Last week there were indications that the NYT had more about taxes that would eventually come out. I’ve heard rumors of a tape with Trump saying the “N-word.” Right now there is strong evidence that Trump is on board with the whole idea of sexual assault, and there is already some information out there about this, but with the Access Hollywood tapes out, maybe we will start seeing actual victims, if any, come to the fore. And, there are known to be tapes from The Apprentice said to be similar to, maybe worse than, the Access Hollywood tapes.

These things will not come out today, because today, the news cycle is still finishing with Friday’s information, and still working on the Debate, so any editor or producer with something to say will wait until tomorrow. So, if something is out there, maybe we’ll hear of it then. A few days ago I suggested that we’d be seeing approximately one Trump news dump about every four days until the election. The time span between Monday’s revelations (already forgotten) and Friday’s was about four days, right? Then there was Friday … let’s see … (counting on fingers) … Friday, Saturday, Sunday, Monday, … TUESDAY! So, Tuesday, or maybe Wednesday. Stay tuned.

Who won the first presidential debate?

It was a tossup, but in a rather complicated way.

Even the regular commenters with major network news, and PBS, clearly indicated that Hillary Clinton won this debate. And she did. She not only had better answers, but actual answers. Trump acted very poorly and Clinton acted presidential. Trump got caught in several lies, and made several more lies that were to be caught later. He made a fool of himself and Clinton did very well.

Therefore, it was a tossup. It was a tossup because a couple percent of the populace are former Bernie Sanders supporters with so much butt hurt that they will not vote for Clinton and may even vote for Trump, not because they like Trump, but because they want to punish the rest of us by supporting Trump since they did not get their way. A few percent of the voters are Special Snowflakes who know that the only way to advance civilization is if they vote for a candidate that can’t win in a single state and that no one will remember exists in two years, even if that means Ralphing the election. It was a tossup because the worse Trump performs the more his Deplorables love him, and the more likely they are to go out and vote.

Everybody who already supported Secretary Clinton thinks she won the debate, and now they are going to vote for her, just like they already were going to vote for her. Everybody who doesn’t care which of the two major candidates will win saw what everyone else saw, but they have been reminded that there is an election coming up, and are now more likely to either not vote for either candidate, or to vote for Trump out of spite. Everybody who was already supporting Trump was already going to vote for Trump, if they showed up at the polls, and they are now slightly more likely to show up at the polls.

So, perhaps, Trump won by a percentage point or two, with respect to how this debate will affect the outcome at the voting booth.

So, that’s what happened last night.

The most important single act you can carry out period.

This is it. Don’t mess this up.

It isn’t that common that a single event can have a cascading effect on so many things. And if it does, such an event would not be that likely to have an entirely negative effect on all it touches. But, the election of Donald Trump as President of the United States would be such an event.

Therefore, in turn and in opposition, your vote this November 8th matters as much as his presidency would matter. So, you must vote. (And please remember to NOT VOTE FOR TRUMP. That’s the point. Do not vote for Trump.)

Also please, make sure that if you intend to vote for a third party candidate because you have calculated that “my vote doesn’t matter in my state because my state bla bla bla” that you aren’t wrong. You might be wrong. A lot of people will be wrong this year because of the simple fact that the electorate is behaving differently than it has behaved in decades, so expectations that allow you to feel safe are not valid.

For example, if you live in Minnesota, and you think it is safe to vote for Stein or write in Bernie, you should know that there will not be helpful polling in this state, and Minnesotans tend to vote very conservatively, suddenly, and surprisingly, and in groups, now and then. This could be such a year. We sent the Worst Senator in the World to the US Senate, twice, because the other side violated a fundamental law of Minnesota culture, even though the DFL candidate was already widely recognized as the best candidate in all of the Senate races that year. We elected Jesse Ventura as governor. No, Minnesotans, your vote is not “safe.” Vote for Hillary Clinton. Anything else you do IS a vote for Trump.

This applies as well to all of the battleground states. Too much is at stake for you to let your special snowflake voting status, your own personal feeling of wanting to do the “right thing,” lead you.

Meanwhile, for those who have actually been paying attention to the careers and policies of the candidates for many years, Hillary Clinton is a great candidate, and it is a shame that so many of the smears against her, perpetuated by Karl Rove, Newt Gingrich, and the GOP, have convinced so many otherwise smart people that she is not. Sure, disagree with her on any issue you like, and advocate and activate on behalf of your positions. But do recognize that she is a legit candidate and none of the others are.

Anyway, this all comes as introduction to the following video. Which, by the way, includes a LOT of people who supported Bernie in the primaries, but who are now warning of the dangers of Trump. Please pay attention to this. Special bonus appearances for West Wing fans. And, Mark Ruffalo, naked. Full Monty. But only if you do the right thing.

If you are not a voter in the United States of America you may disregard this message.

Here’s the link mentioned.

Hat tip: Julia

Science Questions for the Candidates

ScienceDebate.org is an organization that, for years now, has been pushing to get the candidates running for President of the United States to engage in a debate over science policy, just as they debate foreign policy, or economic policy, etc.

And, ScienceDebate.org has had some success. Some of the candidates, at the primary level, have engaged in such a debate, and at the national level, some of the candidates have contributed written answers to citizen-generated questions about science policy.

And now, they’ve done it again.

The four main candidates (two actual main candidates and two “third party” candidates) were provided with several science policy related questions. Three of the candidates have provided answers.

The entire project is to be found HERE. There are 20 questions.

I’m still going through them. If you have comments on any, please post them, I’d love to hear what you think.

Personally, I think Trump’s answer on climate change was probably written by Bjorn Lomborg. Or, cribbed from something he wrote.

(I suppose someone should be running these answers through a plagiarism checker???)

Gary Johnson apparently has nothing to say about science policy. That makes sense. He’s a Libertarian, and Libertarians don’t believe in science policy.

Jill Stein gave an interesting answer on Vaccines.

Trump wants to stop the inflow of opioids into the United States. He may not have understood the question.

The word “wall” does not appear among the answers, though Immigration is asked about.

Interesting answers on space as well.

Go look. Report back!

And, if you’ve not seen this, enjoy:

The reason Hillary Clinton has cinched the nomination

This is an excellent moment to revel in the complexity of life, and argument, and to appreciate the value of the honest conversation.

A candidate is the presumed nominee when she or he obtains the required number of pledged delegates to be at 50% plus a fraction in the total pledged delegate count. This is because a candidate must have a true majority to win the nomination when the delegates are all counted up at the convention, and the pledged delegates are required to cast their lot with the candidate they are pledged to, assuming that candidate exists at the time of the convention.
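The “50% plus a fraction” rule amounts to the smallest whole number of delegates strictly greater than half. Here is a minimal sketch; the delegate totals in it are hypothetical round numbers, not either party’s real counts, and the function names are made up for illustration.

```python
def majority_threshold(total_pledged):
    """Smallest delegate count that is strictly more than half the total."""
    return total_pledged // 2 + 1


def is_presumed_nominee(pledged_won, total_pledged):
    """True once a candidate's pledged delegates alone form a true majority."""
    return pledged_won >= majority_threshold(total_pledged)


# Hypothetical totals, chosen only to show the even and odd cases.
print(majority_threshold(1000))          # 501
print(majority_threshold(1001))          # 501
print(is_presumed_nominee(500, 1000))    # False: exactly half is not a majority
print(is_presumed_nominee(501, 1000))    # True
```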

Hillary Clinton and Donald Trump have not reached that bar. Therefore, neither is the presumed nominee for their party.

But then there are the unpledged delegates. Unpledged delegates can vote for whomever they like at the convention, and therefore, anything can happen. However, it is the practice among unpledged delegates to “endorse” or otherwise show support for a particular candidate. News agencies may use that statement of support to place that unpledged delegate in the column for a particular candidate.

Using this form of math, Trump did not reach a true majority of delegates a couple of weeks ago for two reasons. First, the Republicans have very few truly unpledged delegates. (The Republican and Democratic systems are not parallel or comparable, but the Republicans do have a certain number of delegates who can do what they want when the convention rolls around.) But then, one day, a bunch of unpledged delegates from one of the Dakotas made a statement. They said that they would definitely cast their ballot for Trump in the Convention. This was just enough to put Trump over the top, by adding together the pledged delegates that were pledged to him, and this small number of “unpledged” but now “pledged-ish” delegates.

That is still not clinching the nomination, because even though those Dakota delegates went beyond support or endorsement, to the level of actually promising to vote for Trump, they really still don’t have to vote for him.

But, the press took this as an event, and decided to go with it, and Trump became the actual nominee.

That may seem like a digression in a post about the Democratic primary, but it is relevant because the press has this thing they do where they balance or equalize. Therefore, even though the systems are not truly comparable or parallel, and the event in the Dakotas was actually meaningless, the press did in fact go with the “Trump is the presumed nominee” thing, and therefore, one should expect, even in the absence of a logical underpinning to the argument, the press to do the same in the Democratic party. That is only a small part of the story, but it is part of the story.

I should reiterate that unpledged delegates (in the Democratic party, unofficially called “Super Delegates”) are unpledged even when they pledge. That is a simple fact. But, there are nuances. For example, I know one Super Delegate that on principle will not declare for a candidate until the convention. But I also know that this individual liked Bernie Sanders. I suspect that this means that under some conditions, this delegate would vote for Sanders, but maybe not. I know another Super Delegate who has endorsed Clinton, and another who has endorsed Sanders, publicly. However, I do not assume that either one of them will absolutely vote for that candidate. An endorsement is not a pledge. If Bernie Sanders is found sitting in a hot tub full of fruit jello with the leader of North Korea on a yacht owned by the Koch Brothers, making a deal to trade nuclear warheads, the delegate that endorsed Sanders will not cast a ballot for Sanders at the convention. But the pledged delegates from the same state will be forced to by the rules. (This is why we have Super Delegates. This is also why we can expect the Republicans to add a higher percentage of unpledged delegates when they rewrite the rules for the next primary season.)

The Dakota delegates, however, did something different. They did not endorse, or show support, but they pledged. However, their pledge is, in fact, legally irrelevant.

And now, we come to 2008. It could be said not too inaccurately that a point in time came during the 2008 nomination battle between now President Obama and Hillary Clinton, when it became apparent that Obama was going to win, the press said so, and Clinton took two days or so off and came back into the ring no longer fighting Obama, but now as part of his tag team.

And, it could be said not too inaccurately that this same moment came in the present election about now. Starting a few days ago, various members of the press began to note that this moment was upon us, and to imply that it would be unfair to Hillary to have given this moment to Obama in 2008, but not give it to Clinton now. I think the belief 48 hours ago might have been that this moment would definitely be on us by the end of the voting process in today’s primaries, but then another thing about the press came into play. The press has to treat everybody and every event like they are all identical blue Smurfs but they also have to do things first, to beat out their rivals, to scoop. In fact, this “moment of clinch” could have been after the Puerto Rico primary, or even earlier. And it was absolutely going to happen after Tuesday. So, AP jumped out of the gate and made it happen Monday, and this is now the True Reality.

So, let us review.

Hillary Clinton is the presumed nominee because she has almost enough pledged delegates plus a gazillion unpledged delegates.

However, part of the impetus for declaring this is that Trump got that courtesy two weeks ago.

But, Trump was the only person running in that race, and Clinton still has an opponent.

Still, numerically, Clinton can’t not get the nomination because she has many hundreds of Super Delegates and Sanders has only a few dozen.

On the other hand, Super Delegates are unpledged. UNpledged. We argue all along that they should not be counted. Then suddenly we count them. Is that fair?

One could say, however, that it is fair. At some point it becomes fair because the numbers become so tilted. If the hundreds of Super Delegates that have endorsed Hillary decided to randomize their preference using a coin biased in favor of Sanders, there would still be more than enough to put Clinton over the top.

It is only fair to Clinton that she gets the same treatment as Obama.

It is only fair to Sanders that she not.

And on and on it goes.

So, is there a good reason that Hillary Clinton is now regarded as the Democratic Party nominee for the office of the President?

Yes.

And no.

A good part of the reason that both answers are valid is because the press has painted themselves into a corner located between a rock and a hard spot and have only a Hobson’s choice. That is a bad reason. Another reason is fairness. That is a good but not overwhelmingly good reason. There is no reason that the process one year needs to be the same as other years, since presidential election years are so different in so many ways. Another reason is math. While we wish to keep the Super Delegate count separate and let the pledged delegates do their job, at some point the Super Delegates should probably be considered as a factor, if not counted precisely. (See this, “Fixing The Super Delegate Problem,” for an alternative way of doing this whole thing.) That is probably reasonable and fair. If the numbers are big enough. But there is no objective criterion for when the numbers are big enough. So maybe not so fair.

So here is where the honest conversation part comes in. There really is no considered, informed, honest position on this that ignores the complexity and dismisses other opinions out of hand.

I hope you read this post on Tuesday, June 7th, because starting the next day it is not going to matter too much.

Who will win the remaining Democratic primaries?

As you know, I’ve been running a model to predict the outcomes of upcoming Democratic Primary contests. The model has changed over time, as described below, but has always been pretty accurate. Here, I present the final, last, ultimate version of the model, covering the final contests coming up in June.

Why predict primaries and caucuses?

Predicting primaries and caucuses is annoying to some people. Why not just let people vote? Polls predict primaries and caucuses, and people get annoyed at polls.

But there are good reasons to make these predictions. Campaign managers might want to have some idea of what to expect, in order to better deploy resources, or to control expectations. But why would a voter who is not involved in a campaign care?

I had a very particular reason for working on this project, of predicting primaries and, ultimately, the course of the Democratic race for the Democratic nomination as a whole. When this campaign started, there were several candidates, and they all had positive and negative features. Very early in the process, all but two candidates dropped out, and I found myself liking both of them, though for different reasons. I would have been happy supporting either Hillary Clinton or Bernie Sanders.

Personally I believe that it is good to vote, during a primary, for the person you like best in direct comparison among the other candidates. But at some point, it may be wise to support the one you feel is most likely to win. There are two closely related reasons to do this, and I think most observers of the current campaign can easily understand them. One is to help build momentum for the candidate that is going to win anyway. The other is to limit the damage that is inevitable during a primary campaign as the candidates fight it out.

So, early on in the process, I decided to see if I could produce a reliable method to predict the final outcome of the primary process, in order to know if and when I should get behind one of the candidates. That is the main reason I did this. In order for this method to meet this and other goals, it had to be more accurate than polls.

There are other reasons. One is that it is fun. I’ve been doing this in primaries and general election campaigns for quite a few elections. I like data, I like analyzing data, I like politics, I like trying to understand what is going on in a given political scenario. So, obviously, I’m going to do this.

Another reason is to test the idea that the voters are changing their minds over time. In order to do this one might use all the primaries and caucuses to date to predict future primaries and caucuses, and then, if the predictions go out of whack, you can probably figure that something new is going on. This relates to overall feelings among the electorate as sampled by each state, but it also relates specifically to ideas about why a particular state reacted to the campaigns the way it did.

An example of this came up recently when Bernie Sanders won in West Virginia. My model had predicted a Sanders win there, and the actual vote count was very close to the prediction. Since that prediction was based on voter behavior across the country to date, I was confident that nothing unusual happened in West Virginia. But, something unusual should have happened there, according to some conceptions of this campaign.

The economy of West Virginia is based largely on coal mining, and there are a lot of Democrats there. (Democrats in local elections; they tend to vote for Republicans in the general.) So, it was thought that the voters would pick a candidate based on a perceived position on climate change and coal. Clinton went so far as to pander to the West Virginians with a rather mealy mouthed comment about how we could still keep mining coal as long as we figured out a way to have it not harm the environment. That was the Clinton campaign doing something about the coal mining vote. Others thought that a Sanders win there would indicate that he somehow managed to get a strong climate change message across to coal miners. That idea is a bit weak because when it comes down to it, Clinton and Sanders are not different enough on climate change to be distinguished by most voters, let alone coal supporting voters. In any event, the win there by Sanders was touted as a special case of a certain candidate bringing a certain message to certain voters. But, he then lost in the next coal mining state over, Kentucky, and in both states the percentage of voters that picked Clinton and Sanders was almost exactly what my model predicted, and that model was not based on climate change, coal, or perceptions or strategies related to these things, but rather, on what voters had been doing all along.

So, nothing interesting actually happened in West Virginia. Or, two interesting things happened that cancelled each other out perfectly. Which is not likely.

In short, the closeness of my model to actual results, and the lack of significant outliers in the overall pattern (see below), seems to indicate that the voters have been behaving the same way during the entire primary season, by and large. This is a bit surprising when considered in light of the assumption that Sanders would take some time to get his message across, and pick up steam (or, I suppose, drive people over to Clinton) over time. That did not happen. Democratic voters became aware of Sanders and what he represents right away, and probably already had a sense of Clinton, and that has not changed measurably since Iowa.

How does this model work?

For the first few weeks of this campaign I used one model, then switched to an entirely different one. Then I stuck with the second model until now, but with a major refinement that I introduce today. The reason for using different models has to do with the availability of data.

All the models use the same basic assumption. Simply put, what happened will continue to happen. This is why I sometimes refer to this approach as a “status quo model.” I don’t use polling data at all, but rather, I assume that whatever voters were doing in states already done, their compatriots will do in states not yet done. But, I also break the voters down into major ethnic groups based on census data. So, for each state, I have data dividing the voting populace into White, Black, Hispanic and Asian. These racial categories are, of course, bogus in many ways (click on the “race and racism” category in the sidebar if you want to explore that). But as far as American voters go, these categories tend to be meaningful.

The first version of the model used exit polling (ok, so I did use that kind of polling for a while) to estimate the percentage of black voters who would prefer Sanders vs. Clinton. I used the simple fact that in non-favorite son states that were nearly all white, Clinton and Sanders essentially tied, to estimate the ratio of preferences for white voters at about even. I ignored Hispanic and Asian voters because the data were unavailable or unclear.

This model simply simulated voters’ behavior (in the simplest way, no randomization or multiple iterations or anything like that). I also used some guesses (sort of based on data) of the ethnic mix for Democrats specifically in so doing. That somewhat clumsy model worked well for the first several primaries, but then, after Super Tuesday there were (sort of) enough data points to use a different, superior method.

This method simply regressed the outcome of the primary (in terms of one candidate’s percentage of the vote) against the available ethnic variables by state. Early on, the percentage of Hispanic or Asian did not factor in as meaningful at all, and White and Black together or White on its own did not work too well. What gave the best results was simply the percent of African Americans per state.

“Best results,” by the way, is simply measured as the r-squared value of the regression analysis, which can be thought of as the percentage of variation (in voting) explained by variation in the independent variable(s) of ethnicity.
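For readers who want to see what that measure looks like in practice, here is a minimal sketch of a one-variable least-squares fit and its r-squared. The function name and the numbers in the usage line are made up for illustration; they are not my data or my actual spreadsheet.

```python
def fit_line_with_r_squared(xs, ys):
    """Fit y = m*x + b by ordinary least squares; return (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    # r-squared = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot


# Hypothetical inputs: percent Black per state (x) and Clinton's share (y).
slope, intercept, r2 = fit_line_with_r_squared([2, 10, 20, 30], [45, 52, 63, 70])
```

An r-squared near 1 means the points sit close to the fitted line; near 0 means the independent variable tells you very little.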

Primaries vs. Caucuses and Open vs. Closed

Many things have been said about how each of the two candidates does in various kinds of contests. We heard many say that Sanders does better in caucuses, or that Clinton does better in closed primaries. During the middle of the primary season, I tested that idea and found it wanting. Yes, Sanders does well in caucuses, but the ethnic model predicts Sanders’ performance much better than the caucus-no caucus difference. It turns out that caucusing is a white people thing. There are no high diversity states where caucusing happens. It is not the caucus, but rather the Caucasian, that gives Sanders the edge.

This graph shows how Sanders vs. Clinton over-performed in caucuses vs. primaries.

Caucus_vs_Primaries_Clinton_vs_Sanders

The value plotted is the residual of each contest in relation to the model, or how far off a theoretical straight line approximating the pattern of results each contest was. Two things are apparent. One is that caucuses are less predictable than primaries. The other is that while Sanders did over-perform in several caucuses, this was not a fixed pattern.

This graph shows the residuals divided on the basis of whether the contest was open (so people could switch parties, or engage as an independent) vs closed (more restricted).

Open_vs_Closed_Primaries_Caucuses

Open contests were more variable than closed contests, but it is not clear that either candidate did generally better in one or the other.

After many primaries and caucuses were finished, there were enough data to use the kind of contest as a factor in conducting the regression analysis. There are a lot of ways to do this, but I chose the simplified brute force method because it actually gives cleaner, and more understandable, results.

I simply divided the sample into the kind of contest, and then ran a multivariable regression analysis with each group, with the percent of Sanders plus Clinton votes cast for Clinton as the dependent variable, and the percentage of each of the four ethnic categories as the independent variables. There are some combinations of caucus-primary and open-closed/semi-open/semi-closed that are too infrequent to allow this. For those contests, I simply developed a regression model based on all the data to use to make a prediction in each of those states. The results, shown below, use this method of developing the most accurate possible model.
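The splitting step can be sketched roughly as follows. This is only an illustration: the function name, the "kind" field, and the cutoff of five contests are assumptions I am making for the example, since "too infrequent" is a judgment call rather than a fixed number.

```python
from collections import defaultdict


def split_by_contest_type(contests, min_contests=5):
    """Group contests by kind; report which kinds are too rare for their own model.

    `contests` is a list of dicts with a hypothetical "kind" key.
    `min_contests` is an assumed cutoff, not a value taken from the analysis.
    """
    groups = defaultdict(list)
    for contest in contests:
        groups[contest["kind"]].append(contest)
    # Kinds with enough contests get their own regression.
    own_model = {k: v for k, v in groups.items() if len(v) >= min_contests}
    # Rare kinds are predicted from one regression fit to ALL contests.
    pooled_kinds = sorted(k for k, v in groups.items() if len(v) < min_contests)
    return own_model, pooled_kinds
```

Each group in `own_model` would then be handed to whatever regression routine you like, and the rare kinds fall back on the all-data model.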

How does this sort of model actually make a prediction?

The actual method is simple, and most of you either know this or don’t care, but for those who would like a refresher or do care…

The regression model, using multiple variables, produces a series of coefficients and an intercept. You will remember from High School algebra that the formula for a line is

Y = mX + b

X is the independent variable, along the x axis, and Y is what you are trying to predict. m is the slope of the line (a higher positive number is a steeply upward sloping line, for example) and b is the point where the line crosses the Y axis.

For multiple variables, the formula looks like this:

Y = m1(X1) + m2(X2) + … + mn(Xn) + b

Here, each coefficient (m1, m2, up to mn) is a different number that you multiply by each corresponding variable (percent White, Black, etc.) and then you add on the intercept value (b). So, the regression gives the “m’s” and the ethnic data gives the “X’s” and you don’t forget the “b” and you can calculate Y (percent of voters casting a vote for Clinton) for any given state.
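In code, that calculation is just a sum of products plus the intercept. Here is a minimal sketch; the function name, the coefficients, and the state percentages are all invented for illustration and are not the values my regressions produced.

```python
def predict_share(coefficients, intercept, state_percentages):
    """Return Y = m1*X1 + m2*X2 + ... + mn*Xn + b for one state.

    `coefficients` maps each ethnic category to its "m";
    `state_percentages` maps the same categories to the state's "X" values.
    """
    return intercept + sum(
        coefficients[group] * state_percentages[group] for group in coefficients
    )


# Made-up numbers, only to show the arithmetic: 10 + 0.5*60 + 1.0*20
example = predict_share(
    {"white": 0.5, "black": 1.0},
    10.0,
    {"white": 60.0, "black": 20.0},
)
print(example)  # prints 60.0
```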

So, enough already, who is going to win what primary when?

Not so fast, I have more to say about my wonderful model.

How have the public opinion polls done in predicting the contests?

Everybody hates polls, but like train wrecks, you can’t look away from them.

Actually, I love polls, because they are data, and they are data about what people are thinking. The idea that polls are inaccurate, misleading, or otherwise bogus is an unsubstantiated and generally false meme. Naturally, there are bad polls, biased polls, and so on, but for the most part polls are carried out by professionals who know what they are doing, and I promise that those professionals are aware of the things you feel make polls wrong, such as the shift from landlines to cell phones.

Anyway, polls can be expected to be reasonable predictors of election outcomes, but just how good are they?

Looking at a number of races today, excluding only a few because there were no polls, I got the Real Clear Politics web site averages for polls across the states, transformed those numbers to get a percentage of the Sanders + Clinton vote that went to Clinton, and plotted that with the similarly transformed data from the actual primaries and caucuses. The r-squared value is 0.52443, which is not terrible, and the graphic shows that there is a clear correlation between the two numbers, though the spread is rather messy.

How_Good_Are_Polls_2016_Democratic_Primary
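The transformation applied to both the poll averages and the actual results is simple enough to show. A minimal sketch, with a function name of my own choosing and made-up inputs:

```python
def clinton_two_way_share(clinton_pct, sanders_pct):
    """Percent of the combined Clinton + Sanders total that went to Clinton.

    Undecided voters and other candidates are dropped, so a poll and an
    election result can be compared on the same scale.
    """
    return 100.0 * clinton_pct / (clinton_pct + sanders_pct)


# Hypothetical poll average: Clinton 50, Sanders 30, the rest undecided.
print(clinton_two_way_share(50, 30))  # prints 62.5
```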

The ethnic status quo model outperformed polls

My model is actually many models, as mentioned. I have a separate regression model for each of several kinds of primary, including Closed Caucuses, Closed Primaries, Semi-Closed Primaries, and Open primaries. I did not create separate models for the much rarer Semi-Open Primary, Semi-Open Caucus or Open Caucus style contests, as each of these categories had only one or a few states. Rather, the model used to calculate values for these states is derived from all the data, so addressing specific quirkiness of each kind of contest is sacrificed for large sample size.

I also generated models that included White, Black, Hispanic, and Asian; each of these separately; and various combinations of them. As noted above, the best single predictor was Black. Hispanic and Asian were very poor predictors. White was OK but not as good as Black. But, combining all the variables worked best. That is not what usually happens when throwing together variables. It is more like mixing water colors, you end up with muddy grayish brown most of the time. But this worked because, I think, diversity matters but in different ways when it comes in different flavors.

When the total data set was analyzed with the all-ethnicity model, that worked well. But when the major categories of contest type were analyzed with the all-ethnicity model, some of the data really popped, producing some very nice r-squared values. Closed caucuses cannot be predicted well at all (r-squared = 0.2577) while Open Caucuses perform very well (over 0.90, but there are only a few). The most helpful and useful results, though, were for the closed primary, open primary, and semi-closed primary, which had r-squared values of 0.69, 0.61, and 0.74, respectively.

What this means is that the percentage of the major ethnic groups across states, which varies, explains between about 61 and 74% of the variation in what percentage of voters or caucusers chose Clinton vs. Sanders.

Polls did not do as well, “explaining” only about half the variation.

So, the following graph is based on all that. This is a composite of the several different models (same basic model recalculated separately for some of the major categories of contest), using nominal ethnic categories. The model retrodicts, in this case, the percentage of the vote that would be given to Clinton across races. Notice that this works very well. The few outliers both above and below the line are mainly caucuses, but they are also mainly smaller states, which may be a factor.

Laden_Model_Performance_Democratic_Primary_2016_May

Who will win the California, New Jersey, Montana, New Mexico, North Dakota, South Dakota, and D.C. primaries?

Clinton will win the California, New Jersey, New Mexico and D.C. Primaries. Sanders will win the Montana, North Dakota, and South Dakota primaries. According to this model.

The distribution of votes and delegates will be as shown here:

Democratic_Primary_California_New_Jersey_Others_Winners

This will leave Sanders 576 pledged delegates short of a lock on the convention, and Clinton 212 pledged delegates short of a lock on the convention. If Super Delegates do what Sanders has asked them to do, to respect the will of the voters in their own states, then the final count will be Sanders with 2131 delegates, and Clinton with 2560 delegates. Clinton would then have enough delegates to take the nomination on the first ballot.

In the end, Clinton will win the nomination on the first ballot, and she will win it with more delegates than Obama did in 2008, most likely.