… if Nate Silver’s analysis is correct.
If a person is asked to make up a bunch of numbers … random numbers … s/he will tend to make up non-random numbers instead. So, for instance, if I ask you to state random numbers that have two digits, and I plot the second digits on a histogram, and then I ask my computer to make up two digit random numbers and plot the second digits on a histogram, my computer’s histogram will show roughly the same number of 0’s as 1’s as 2’s … as 9’s (especially with a large sample size) but you, being a silly you-man, will make up numbers with more of one than another in a non-random fashion. If you are a typical Westerner you will come up with more sevens, most likely.
Nate Silver studied the polling data produced over a long period of time by the conservative publicity firm and polling company “Strategic Vision” and showed that the trailing digits of their poll numbers are not evenly distributed, as they should be.
Maybe polling data is just not random? If you studied the trailing digits of, say, prices of gasoline to the tenth of a cent in the United States, you’d find that all gasoline costs something-something-point-nine (very non random). If you studied the trailing digits of retail prices in most areas, again, you’d find more 9’s than expected. But, when Nate Silver studies Five Thirty Eight’s polling data, he gets a pretty even distribution across the trailing digits, which is what you’d expect in a large sample.
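For anyone who wants to try the trailing-digit idea at home, here is a rough sketch in Python. It is not Nate Silver’s actual method or data. The "machine" and "human" lists are made-up stand-ins, the extra weight on 7 is my own assumption for illustration, and the chi-square comparison is just one common way to check whether ten digit counts look even.

```python
import random
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CRITICAL_95_DF9 = 16.919


def trailing_digit_counts(values):
    """Count how often each digit 0-9 appears as the last digit."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts for every digit."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(538)

# Hypothetical stand-in for computer-generated two-digit numbers.
machine = [rng.randint(10, 99) for _ in range(5000)]

# Hypothetical stand-in for human-invented numbers: a trailing 7 is
# assumed to be twice as likely as any other digit.
weights = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1]
human = [
    10 * rng.randint(1, 9) + rng.choices(range(10), weights=weights)[0]
    for _ in range(5000)
]

machine_stat = chi_square_uniform(trailing_digit_counts(machine))
human_stat = chi_square_uniform(trailing_digit_counts(human))

print("machine:", round(machine_stat, 1))
print("human:  ", round(human_stat, 1))
print("5% critical value:", CRITICAL_95_DF9)
```

A statistic far above the critical value says the digits are very unlikely to have come from an even distribution. It does not say why, which is the reason the comparison against other pollsters’ numbers matters.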
Nate’s article is here.
It will be interesting to see how Strategic Vision reacts to this. They’ll probably have a poll that shows that most people think they are being honest.
They’ve already reacted with bluster and insinuation of legal threats. As I noted over at my place, they’ve also been sanctioned by the AAPOR for not disclosing their methodologies, which I figure is because who wants to actually disclose that their methodology is “pick some numbers that are as close to reality as we can manage, then skew them heavily toward whoever paid us”.
I did a quickie in Excel 2000 using the RND function. I ran it for 100 numbers.
Interestingly, on pass 1 the most frequent numbers are 4, 5, and 8.
Second pass gets 5, 6, 9.
So if I combine the two the most frequent numbers are 4, 5, 6, 8, 9. 0, 2, and 3 seem to be missing.
I know there’s a randomization bug in VBA so maybe that accounts for it.
They’re threatening Nate with lawyers:
Maybe polling data is just not random? If you studied the trailing digits of, say, prices of gasoline to the tenth of a cent in the United States, you’d find that all gasoline costs something-something-point-nine (very non random). If you studied the trailing digits of retail prices in most areas, again, you’d find more 9’s than expected.
He does say the numbers might not be random in a homogeneous sample (say, only McCain-Obama polls in the presidential election), but he’s using data from a wide variety of polls so it doesn’t really apply.
Anyway, there’s a later post addressing that problem, where he compares Strategic Vision’s numbers to Quinnipiac polls, and although the comparison shows that polling data might not follow a completely uniform distribution, SV’s data is still way off.
Also, it turns out that citizenship test that Oklahoma high-schoolers flunked so badly that 5% of them thought Obama was our first president? Strategic Vision.
That would explain a lot…
They’re threatening Silver with legal action yet still refuse to disclose their polling methodologies? That sounds like the polling firm’s equivalent to leading police on an ultra-slow chase through LA to me.
They won’t sue for the simple reason that Silver would have to get access to lots of their raw data & methodologies in order to defend himself, which they probably won’t be able to allow.
@TonyP: that’s not how you do statistics. There’s always a most common digit in any run of any size, or a set of three most common digits. Running two samples likely will give you two different sets of three most common digits. None of this gives you any information on whether some digits are less common than others in your RNG. It’s even possible that there might be a digit that is in the top three most common digits of the combined data set, even though it’s not in the top three of each data set separately.
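To make that concrete, here is a small simulation sketch in Python (my own illustration, not anything from Excel or VBA). It draws many runs of 100 random digits, notes the three most common digits in each run, and tallies how often each digit lands in that top three. The run count and seed are arbitrary choices.

```python
import random
from collections import Counter


def top_three(digits):
    """Return the three most frequent digits in a sample.

    Ties are broken by whichever digit appeared first in the sample.
    """
    return [d for d, _ in Counter(digits).most_common(3)]


rng = random.Random(2009)
runs = 2000
appearances = Counter()

for _ in range(runs):
    sample = [rng.randrange(10) for _ in range(100)]
    appearances.update(top_three(sample))

# Share of runs in which each digit made the top three.
for digit in range(10):
    print(digit, round(appearances[digit] / runs, 2))
```

With a fair generator every digit should make the top three in roughly 3 runs out of 10, so seeing 4, 5, and 8 on one pass and 5, 6, and 9 on the next is exactly what an unbiased source looks like.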
So what? Everybody juices their polls, one way or another. Skewed samples, loaded questions, finagled analyses … you can find something “questionable” about any poll done by any firm regardless of its political persuasion. That’s why I never trust any of them, and haven’t in many years.
wolfwalker: well, no. And the specific accusation here is that they pulled the data out of their asses. That is not juicing the poll, it is totally making the thing up.
I like the fact that Nate has asked for people to contact him if they were polled by this group and have some story. At first I thought this was dumb because the ‘story’ would be pretty useless. But then I realized his true motivation.
Has anyone ever been polled by this group?
*subdued cackle* – they “have a call into”? So their attorney doesn’t take their calls straight away? Maybe a receptionist makes a note, and later in the afternoon, when the serious work for the day is done, their attorney will look through the notes and decide whether to call them back – or whether to put in a quick nine holes on the local course?
And the specific accusation here is that they pulled the data out of their asses.
Again: so what? What’s the difference between data “pulled out of their asses” and data slanted using a skewed party affiliation, or selective sampling, or loaded questions, or any of the other ways that pollsters have learned to use over the years? It’s all fraud, and all sides do it. If you get hot and bothered over this case and not over others, you’re being hypocritical.
That question came up at dinner last night, and the answer is best described here. Given the degree of ingenuity displayed (always assuming Nate is right) I wouldn’t rule out other demonstrations of brilliance. Still, the “thumb on the scales” hypothesis can’t be ruled out. Ockham’s Razor rather than Hanlon’s.
Either way, discovery will tell. Which is a good reason to expect that this will never actually go to court — too much other dirty laundry (e.g. the objectives requested by the sponsors) would come out.
wolfwalker, the “pox on both your houses” BS is getting really old. Many polling orgs put a lot of effort into actually measuring the opinions of the public and generally do a pretty good job of it.
There are plenty which make serious but honest mistakes. Certainly, some polling orgs get clever with sampling, ordering, and wording to get their desired results. However, these dishonest manipulations are far more often designed to produce “newsworthy” results than a particular partisan bias. Subtle changes in wording which result in dramatic swings in polls which follow the existing media narratives are a classic example.
Several of the polling organizations are actually academic endeavors, where getting the methodology right is actually the central goal.
Nate Silver is popular (and respected) because he is extremely competent at statistical measurement. He has also quickly gained real expertise in the subtleties of polling and the use of surveys and polls to actually make real measurements.
It is notable that he started off in the world of predictive baseball statistics, where being right is all that counts.
PS: Nate had a post a while back on How to Poll on the Public Option.
Beyond being interesting, I think it also illustrates Nate’s essential geekiness and concern for ‘doing it right’.
wolfwalker, the “pox on both your houses” BS is getting really old.
[shrug] It’s what I think of polling and pollsters. I have no doubt that at least some pollsters try to get it right. I remain unconvinced that they can succeed reliably. I also remain unconvinced that there is anything honest about political polling — by anyone, on any side, of any issue whatever. Some of them bend the results on purpose, others do it unintentionally — but one way or another, they all do it.
I don’t trust any political poll, and I don’t recommend anyone else do so either.
wolfwalker, are you claiming that you make no moral distinction between pollsters making erroneous assumptions and fabricating the data?
I can see why it was so easy to con the sheeple on the “conservative” side of U.S. politics. They don’t care what the truth is.
Doesn’t this just mean that 538 uses a computer to fake their polls?