I spent about 45 minutes yesterday in the local HMO clinic. They had turned the main waiting room into a Pandemic Novel A/H1N1 Swine (nee Mexican) Influenza quarantine area, and I could feel the flu viruses poking at my skin looking for a way in the whole time I was there.
Amanda, who is 8.3 months pregnant, started getting symptoms of the flu two days ago. As a high school teacher in a school being affected in a state being affected (as most are) she is at high risk for this. She was one of the first people around here to get the vaccine, just a couple of days ago, but it takes about 10 days to take full effect, so it was recommended that she go on Tamiflu for a while.
Tamiflu seems to not work very well against the current (or should I say expected) seasonal flu, but it appears that the Pandemic Swine Flu has virtually no resistance to it. And it normally works fast. Within 24 hours Amanda’s symptoms disappeared. There are three possible explanations for that:
- Utter chance;
- Tamiflu did its thing; or
- The Tamiflu pill was actually a sugar pill with an especially strong Placebo effect.
Today, Amanda and many many other teachers from across the country are meeting at the national Science Teachers Association. So any mixing up and spreading of the flu that the students have not yet accomplished will be compensated for by the teachers exchanging the virus today and over the weekend. But Amanda has her Tamiflu and the vaccine, so she should be fine. I may ask her to take some extra placebo tonight with dinner.
In the next iteration of a pandemic, we should be providing vaccine for free at conferences and conventions. (Maybe we’re doing that now…. anybody know?)
There are three things you should read on the internet this morning about the flu, vaccines, and related issues:
This is by Revere, and it covers a paper just published in expedited form, OpenAccess, so you can read it yourself. I’ll have a few comments to make about this paper below, but the best summary of its results is Revere’s post at Effect Measure.
Then there are these two items by Orac and James Hrynyshyn, respectively, on related issues: 2) The anti-vaccine movement strikes back using misogyny and 3) The link between the climate denial and anti-vaccine crowds
OK, now, about this flu paper. My comments are restricted to two aspects of the method used in this paper, and all I really want to do is add a little to your comfort level in relation to these methods. These are methods commonly used in my own fields of research (including archaeology) and that I’ve thought a bit about and taught in various classes, and I’ve found that people, once they start to learn about them, get all freaked out and refuse to believe that they are of any use. The methods are, to adopt terminology for this post that may not be reflected perfectly in the paper at hand, extrapolation and resampling.
Resampling first. Bootstrapping is also known, depending on its implementation, as Monte Carlo Simulation, Resampling, or just Simulation. There are other terms as well. It is probably best to consider them all under the heading “Resampling.”
To really understand the value of resampling, it is best to start with a concept of the inadequacy of normal parametric statistics. What the heck does that mean? At the risk of oversimplifying…. Let’s say you have calculated two averages and you want to find out if it is statistically OK for you to say that they are different … that they are averages of different populations, instead of two numbers that look different but only for random reasons. So you take the averages, the difference between them, and some kind of estimate of the variation in the population(s) you think you are sampling, and the number of samples you took to get the average.
If the two numbers are farther apart, you can have more confidence that they are different. If the amount of variation in the actual populations you are sampling is low, then you can have more confidence that they are different. If the number of samples you’ve taken is greater, you can have greater confidence that they are different.
Standard statistical methods evaluate this information … difference between means, variation in the population, and size of your sample(s), to give you a couple of numbers you can use to determine if it is statistically valid to say that the numbers are different.
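For the concretely minded, here is a minimal Python sketch of one such standard calculation, assuming Welch's version of the t statistic (my choice for illustration; nothing here is taken from the flu paper). The function name `welch_t` and the sample numbers are made up. It combines exactly the three ingredients just described: the difference between the means, the variation, and the sample sizes.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for the difference between two sample means."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator) stand in for the
    # unknown variation in the populations.
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # The standard error of the difference shrinks as the samples grow.
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / std_err


# Made-up numbers, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5]
group_b = [4.2, 4.8, 4.4, 4.9, 4.1, 4.6]
print(welch_t(group_a, group_b))
```

A larger absolute value means more confidence that the means differ. Turning that value into a p-value is where the assumed distribution comes in, which is the part the sketch leaves out.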
But, there is a problem with this. In order for out of the box statistical methods to be used to do this, there has to be a number of assumptions made about the underlying distributions of the population(s) you are looking at. For instance, it is common to assume that these populations are “normally distributed” (like a bell curve) or that they follow some other standard, well studied distribution. So, you plug the numbers you have … the means or the difference between them, the info on variance, and the sample size … and those parameters are evaluated by magic statistical formulas built into computer programs in relation to some pre-existing model using distributions and statistics derived from earlier study with those distributions.
Often that works well because the previously studied distributions, and the relationships between the numbers and the distributions and stuff, tend to be the same time after time. If you are studying the behavior of a roulette wheel, the frequency over time of raindrops falling into a bucket, people getting the flu, Russian soldiers getting killed by their horses, the distribution of stars in the sky, and so on, you may be able to use research on the distributions and statistical measures (and their interactions) carefully carried out on one or two of these phenomena to develop shortcuts to apply in the other situations.
And that is the crux of what I want to say: Standard statistical tests (the z-test, the t-test, the F-test, chi-square statistics, etc. etc.) whether they be “parametric” or “non-parametric” are all shortcuts.
The reason these shortcuts exist is because it is impossible to take thousands or tens of thousands of data points, analyze them to determine the nature of the distributions they represent, then use those discovered empirically based situationally dependent distributions to calculate test statistics and confidence intervals and stuff.
Unless, of course, we had a machine to do this! If only we had a machine into which we could put all the data, and then this machine would do calculations on the data!
Yes, folks, with modern computers it is quite straightforward to replace the old fashioned shortcuts with a brute force, direct analysis of actual data which produces (using proper methods and theory) much much better statistics than before.
I want to re-explain this two more ways keeping in mind that I’m still oversimplifying.
1) Here is the actual sequence of events one would like to do in statistical analysis.
a) Formulate a hypothesis about some numbers.
b) Fully analyze the distributional context of those numbers … are the populations they come from uniformly distributed? skewed? unary (only one possible number can be obtained no matter how often you sample it)? distributed like a bell curve?
c) Calculate the parameters of the actual distribution linked to the actual numbers you are using.
d) Calculate the actual probability related to your hypothesis, such as “the probability that these two numbers I say are different are actually drawn from the same population and only look different because of the nature of the distributions I analyzed in step ‘b’ is …”
Here’s what really happens in traditional statistical analysis:
a) Some guy, like two hundred years ago, gets interested in numbers and creates idealized distributions of things and figures out that there are some interesting relationships between and among them.
b) Some other guys, over the next couple of centuries, do the same thing with a bunch of other phenomena and come up with a handful of additional relationship types. Having no computers for any of this, that was hard.
c) Meanwhile, people figure out how to take this handful of distribution sets and use them to estimate what may or may not be going on with a particular data set. But each time one must worry about the degree to which one’s own data matches the original distribution on which a certain test statistic is based. Over time, people forget what the original distributions even were, and begin to fetishize them. For instance, the degree to which one’s data behave just like Russian Army horses’ tendency to kick soldiers to death becomes a matter of great angst and consternation, especially in graduate school.
d) Individual researchers learn which other researchers to emulate, and then they just do what they do and hope nothing goes wrong. The important thing is the p-value anyway.
Here is how resampling works:
a) All of the above is compressed into a single analysis of your actual data.
The distributional behavior of your data is determined by taking repeated random samples of the data (with replacement). Perhaps you will do this at several sample sizes. The result tells you how badly wrong your hypothesis can be … and if the answer is “not too bad” then you’re good. (This is all done with numbers, of course.)
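Here is a minimal sketch of that idea in Python, assuming the simplest flavor of bootstrap (a percentile interval for the difference between two means). The function name `bootstrap_mean_diff`, the number of resamples, and the data are all made up for illustration, and this is not necessarily the procedure used in the flu paper.

```python
import random


def bootstrap_mean_diff(sample_a, sample_b, n_resamples=10_000, seed=0):
    """Return a 95% percentile interval for mean(sample_a) - mean(sample_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, at its original size.
        resampled_a = rng.choices(sample_a, k=len(sample_a))
        resampled_b = rng.choices(sample_b, k=len(sample_b))
        mean_a = sum(resampled_a) / len(resampled_a)
        mean_b = sum(resampled_b) / len(resampled_b)
        diffs.append(mean_a - mean_b)
    diffs.sort()
    # Cut off the lowest and highest 2.5% of the resampled differences.
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


# Made-up numbers, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5]
group_b = [4.2, 4.8, 4.4, 4.9, 4.1, 4.6]
print(bootstrap_mean_diff(group_a, group_b))
```

If the interval sits well away from zero, the data themselves (and no borrowed bell curve) are telling you the two means are probably different. With only a handful of data points the interval should be taken with a grain of salt.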
2) For my second parable, imagine that you are in a situation that has nothing to do with statistics but requires you to make a decision. It is complex. The situation is unique although it falls into a known category of situations. So, you go to an experienced expert in this kind of situation and you describe only the basic outline, leaving out all details, and ask the expert what she would normally do in this situation.
The expert replies “Well, I don’t know the details, but generally, in this situation, I’d punt (or whatever).”
Alternatively, you are facing the same situation. So you get the expert (from above) and bring them to wherever it is you are working on this. The expert gets to see the exact situation you are in, and how your situation differs from the typical situation. Based on all the information, she draws a very different conclusion than above because there are particulars that matter.
“Don’t punt (or whatever).”
Which would you prefer? The first scenario is your data in a t-test. The second scenario is your data bootstrapped.
The second analytical technique talked about in the paper covered by Revere is extrapolation. Obviously, extrapolation is dangerous and scary. Which would you feel more comfortable with:
1) Estimate the percentage of people who are sick in the hospital with a possible flu who require IV fluids in a particular hospital in the United States. You are given data on the number of people who walk into a hospital with flu-like symptoms, and the number of these people who get IV’s, for five one week periods distributed evenly across the flu season in ten randomly chosen hospitals plus the one you are charged to calculate this number for. In other words, you are having a statistician’s wet dream.
2) Estimate the number of people who have the flu in the United States for a given flu season based on the number of IV’s doled out to patients in ten randomly chosen hospitals. You are now having a statistician’s nightmare.
Or, consider this somewhat cleaner comparison:
You must dig a hole into which will be placed the concrete base for a gate you hope to have in a fence you are installing in your yard.
1) All of the fence posts are in place, and you are told to put the gate post half way between two of the posts.
2) None of the fence posts are in place, and you are told to measure a line that is 47.5 feet from the NW corner of your house at bearing 312 degrees. You are not quite sure what is meant by “corner” of your house because your foundation has a vertical jog in it, and the original measurement may have been from the siding and not the foundation. Your compass sucks. You are not sure if this is 312 degrees off magnetic north or true north. You don’t have a tape measure that long. And so on.
Taking numbers that are fairly good numbers and dividing them up, looking within their ranges, breaking them into bits, is interpolation, and that can be done fairly accurately. Extending numbers outward long ‘distances’ (sometimes real distances, sometimes time, sometimes frequencies, etc.) involves a lot more uncertainty. That is what you see in the flu paper. The authors use appropriate techniques, and you will see that the range of numbers they conclude in answer to the question proposed in the title of the paper is quite large … that is because it is extrapolation that they are using, but these numbers are well confirmed by a kind of resampling.
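To put a number on that difference, here is a rough Python sketch assuming the simplest possible case: a straight line fitted by ordinary least squares to a few made-up points. The function names and data are hypothetical, and this is not the model in the flu paper. The textbook standard error of the fitted line at a point x0 is s * sqrt(1/n + (x0 - mean of x)^2 / Sxx), which grows the farther x0 is from the middle of the data.

```python
import math


def fit_line(xs, ys):
    """Least squares fit; returns slope, intercept, residual sd, mean of x, Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation, with n - 2 degrees of freedom.
    resid_sd = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return slope, intercept, resid_sd, x_bar, sxx


def std_error_of_fit(x0, n, resid_sd, x_bar, sxx):
    """Standard error of the fitted line at x0; grows with distance from x_bar."""
    return resid_sd * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


# Made-up measurements, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
slope, intercept, resid_sd, x_bar, sxx = fit_line(xs, ys)

for x0 in (3.5, 6, 30):  # inside the data, at its edge, and far outside it
    print(x0, std_error_of_fit(x0, len(xs), resid_sd, x_bar, sxx))
```

The uncertainty is smallest in the middle of the measured range (interpolation) and balloons once you march far outside it (extrapolation). And that is assuming the straight line even holds out there, which is a separate leap of faith.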
How well all this works depends, as usual, on the question you are asking. One time I needed to find out if a particular house was made of brick vs. timber. The remote farm house had been torn down and most of the debris seemed to be dumped in the cellar hole. There were a lot of bricks, but there would have been one or two chimneys in a frame house. Also, a frame house could be “nogged” which is where clunkers and seconds (low quality bricks) are used to fill in between the timbers. Or, it could have been a brick house.
So, I did two things. Using the foundation size and what was known for houses at the time, I estimated how many bricks would be used for the following:
- A two story brick house
- A one story brick house
- A two story nogged house
- A one story nogged house
- A two story house with a brick chimney
- A one story house with a brick chimney
Separately, I weighed all the bricks we dug up in several holes, and extrapolated that number to estimate how many bricks would likely be found if we dug up the whole property. I came up with a number closest to the last choice on the list: one chimney on a one story frame house.
I did not need to know the actual number of bricks. What I needed to know was which of the plausible alternatives the estimate of brick quantity matched most closely. For the flu, it may be enough at this time to know if the Swine Flu is like the seasonal flu, not nearly as bad, much worse, etc.
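The logic of the brick exercise can be sketched in a few lines of Python. Every number below is invented for illustration (I am not reproducing the real field data), and the function names are hypothetical. The point is only the shape of the reasoning: scale the sample up to the whole site, then ask which plausible alternative the rough estimate lands nearest.

```python
def extrapolate_brick_count(sample_weight_kg, sampled_area_m2,
                            total_area_m2, brick_weight_kg):
    """Scale the brick weight found in the test holes up to a whole-site count."""
    weight_per_m2 = sample_weight_kg / sampled_area_m2
    return weight_per_m2 * total_area_m2 / brick_weight_kg


def closest_scenario(estimate, scenarios):
    """Return the scenario whose expected brick count is nearest the estimate."""
    return min(scenarios, key=lambda name: abs(scenarios[name] - estimate))


# Hypothetical expected brick counts for each kind of house.
scenarios = {
    "two story brick house": 30000,
    "one story brick house": 16000,
    "two story nogged house": 9000,
    "one story nogged house": 5000,
    "two story house with a brick chimney": 2000,
    "one story house with a brick chimney": 1000,
}

# Hypothetical dig: 90 kg of brick from 4 square meters of holes,
# on a 100 square meter site, at 2.5 kg per brick.
estimate = extrapolate_brick_count(90, 4, 100, 2.5)
print(estimate, closest_scenario(estimate, scenarios))
```

Because the alternatives are far apart, the estimate can be off by quite a lot and still point to the same answer.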
Confidence can be increased in extrapolation with confirming evidence. In the case of the farm house, I counted the number of brick faces that were heavily charred (from being inside the chimney) and found that this number relative to uncharred faces was very high. This suggests a fireplace. I noted that the bricks were mostly in one area of the foundation like maybe there was a chimney there. That suggests the chimney idea is more likely than the other ideas. And, I noted that most houses built in Saugerties NY in the 1870s were one story unnogged timber with brick chimneys. Had I started with that last observation and drawn conclusions I might be guilty of confirmation bias. But instead, I ended with it, and got reasonable confirmation.
The first estimate was truly unworthy …. I could have been way far off with the brick count for a lot of reasons, and I had to make a lot of assumptions (we had not dug very many holes!). But the ratio of burned surfaces was an independent confirmation, and the conclusion was not unexpected. So, I was able to argue against confirmation bias (finding what we expected) and put this house down in the data base as yet another timber framed farm house.
Extrapolation is dangerous. Ask any Marine artillery forward observer you may happen to know, because it is what they do, but they do it with bombs and a misplaced bomb may fall right on him or herself, or a nearby baby food factory, or some other thing you don’t want to drop a bomb on. But with strong empirical background, experience, good theory, and independent confirmation it works. Or at least, it is often the best we can do and our best is good enough.
Reed, C., Angulo, F., Swerdlow, D., Lipsitch, M., Meltzer, M., Jernigan, F., & Harvard School of Public Health (2009). Estimates of the Prevalence of Pandemic (H1N1) 2009, United States, April-July 2009
Emerging Infectious Diseases, 15 (11)