The Times doesn’t know Bayes

If you’ve spent any time at all reading science and medicine blogs, you know that many of us are quite critical of the way the traditional media covers science. The economics of the business allows for fewer and fewer dedicated science and medical journalists. In the blogosphere, writers have a certain freedom: the freedom not to be paid, which means that our fortunes are not directly tied to how many readers a headline brings in. But all this is just a lot of words introducing my critique of a recent New York Times article.

The article is titled “Using Science to Sort Claims of Alternative Medicine”. It’s well-written and interesting, but suffers from a fatal flaw (or perhaps just recapitulates it)—like most of us, it fails to take into account how likely (or unlikely) a bizarre medical claim is when evaluating evidence for it.

The author doesn’t realize it, but he points out the fatal flaw in the modus operandi of the National Center for Complementary and Alternative Medicine (NCCAM). Lately, the alternative medicine community has seen some of its bigger trials fall apart.

The alternative medicine community has a few different sects. The largest is the group of various snake-oil salesmen out to make a buck on others’ suffering. Then there is the “supplement industry”. Finally there is the saddest sect—that of real scientists trying to use evidence-based medicine to evaluate improbable claims. These folks mean well, but they’ve picked the wrong tool for the job.


Those who try to design bigger and better studies to study improbable medical claims forget to ask one simple question. Observe:

Now the federal government is working hard to raise the standards of evidence, seeking to distinguish between what is effective, useless and harmful or even dangerous.

“The research has been making steady progress,” said Dr. Josephine P. Briggs, director of the National Center for Complementary and Alternative Medicine, a division of the National Institutes of Health. “It’s reasonably new that rigorous methods are being used to study these health practices.”

[…]

That kind of fog [poor study design, false positives, false negatives] is what Dr. Briggs and the National Center for Complementary and Alternative Medicine, with a budget of $122 million this year, are trying to eliminate. Their trials tend to be longer and larger. And if a treatment shows promise, the center extends the trials to many centers, further lowering the odds of false positives and investigator bias.

They are forgetting one critical question: is the treatment being studied even plausible? This question is crucial, as without it, the data are meaningless. Bayes’ Theorem is a statistical way to include plausibility when looking at a clinical trial (and I’ll refer you to the linked source for a more complete explanation). Basically, the lower the prior probability of a result, the more likely a positive result is to be false.

I can’t resist dragging you through a medical example of the effect of prior probability (and without all that pesky math!) so stick with me. One of the most difficult and important diagnostic problems in medicine is pulmonary embolism (PE), in which a blood clot forms in the legs, breaks off, and lodges in the lungs. It affects around 600,000 Americans every year, killing a hefty percentage of them. There have been a number of recent advances in diagnosing PE, but fifteen years ago, before the widespread use of advanced CT technology, it wasn’t at all clear how to diagnose this condition in a relatively non-invasive way.

The most accurate method (the “gold standard”) for diagnosing PE is pulmonary angiography, where a catheter is inserted into the pulmonary arteries and dye is introduced while taking X-rays. This is invasive, costly, and time-consuming. It would be great if we could use a combination of patient history, physical exam, and laboratory results to make the diagnosis, but studies have shown that this doesn’t work very well. And in comes a middle ground, the ventilation-perfusion (V/Q) scan. This easy-to-administer nuclear study gives a picture of blood supply and gas exchange in the lungs. A group of investigators designed a large study (PIOPED I) to see how best to use this technology. What they found was that when we combine the results of a V/Q scan with our initial suspicion (prior probability) of PE, we can estimate the likelihood of the patient having a PE.

Let’s take an example (remembering that this methodology is historical). Let’s say two patients come to the emergency room. I decide, based on clinical criteria, that the first has a high likelihood of having a PE. I get a V/Q scan which is read as “high probability”. The data show that a patient with a high pre-test probability and a high probability scan has about a 96% chance of really having a PE. Another patient comes in, and using the clinical criteria, he has a low pre-test probability of having a PE. His scan comes back as high probability, the same as the first patient. According to the data, the likelihood of Patient 2 having a PE is 56 percent.

Both patients had a test that showed a high probability of PE, but simply changing how likely we think the patient is to have a PE changes what that high-prob test actually means. That’s a big deal. If I were to rely on only the test results, or only my clinical impression, I would not have any real idea of the likelihood of the patient having a PE.
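
For anyone who does want a little of the pesky math, the two-patient example can be sketched in a few lines of Python using the odds form of Bayes’ Theorem. This is a minimal sketch, not the PIOPED analysis: the function name, the two pre-test probabilities, and the likelihood ratio of 10 are all hypothetical values I picked for illustration, and a real V/Q scan is not summarized by a single number like this.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Post-test probability from a pre-test probability and a likelihood ratio.

    Uses the odds form of Bayes' Theorem:
        posterior odds = prior odds * likelihood ratio
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratio for a "high probability" scan (assumption).
ASSUMED_LR = 10.0

# Hypothetical pre-test probabilities for the two patients (assumptions).
high_suspicion = posterior_probability(0.70, ASSUMED_LR)
low_suspicion = posterior_probability(0.10, ASSUMED_LR)

print(f"High clinical suspicion: {high_suspicion:.0%} post-test probability")
print(f"Low clinical suspicion:  {low_suspicion:.0%} post-test probability")
```

The same scan result goes in both times; only the starting suspicion differs, and the post-test probabilities end up far apart.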

Now let’s apply this to a CAM study mentioned in the Times article. I’ve reviewed the study before, but let’s add in a dose of Bayes.

Another large study enrolled 570 participants to see if acupuncture provided pain relief and improved function for people with osteoarthritis of the knee. In 2004, it reported positive results. Dr. Brian M. Berman, the study’s director and a professor of medicine at the University of Maryland, said the inquiry “establishes that acupuncture is an effective complement to conventional arthritis treatment.”

Well, in fact, this acupuncture trial does no such thing. It is technically well-designed, in that it is a randomized controlled trial and proper statistical techniques were used. But this is where evidence-based medicine can be wielded improperly. If you think simply crunching numbers is all there is to proving a clinical point, UR DOING IT RONG! (I don’t even need to go into the other real problem with the study, which compared “real” with “sham” acupuncture, but did not include a group that got standard therapy.)

What Bayes’ Theorem teaches us is that the more implausible the hypothesis, the less likely that any numerical data can confirm it. In other words, if you’re testing an idea with little scientific merit to start, it doesn’t really matter how well you design a clinical study, how many patients you enroll, how well you blind it—any positive results are more likely to be due to chance or confounding variables than to any real effect of the treatment. This is the root problem with NCCAM and with the Times article—no matter how many times you check, pigs can’t fly under their own power. Spending money to try to refute this will only create a hole in your wallet (and a lot of dead pigs).
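
One rough way to put numbers on this is to ask what fraction of “statistically significant” trials reflect a real effect, given how plausible the hypothesis was before the trial. The sketch below is an illustration under stated assumptions, not an analysis of any NCCAM trial: the function name, the 5% false-positive rate, the 80% power, and the three prior probabilities are hypothetical, and it ignores bias and confounding entirely (which would only make things worse).

```python
def chance_positive_is_real(prior: float,
                            power: float = 0.80,
                            alpha: float = 0.05) -> float:
    """Probability that a positive trial reflects a real effect.

    prior: probability the treatment works, judged before the trial
    power: chance the trial is positive if the treatment works (assumed)
    alpha: chance the trial is positive if it does not work (assumed)
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical priors: plausible, long shot, and highly implausible.
for prior in (0.50, 0.10, 0.01):
    ppv = chance_positive_is_real(prior)
    print(f"prior {prior:.2f} -> positive result is real with probability {ppv:.2f}")
```

Under these assumed numbers, an identically designed trial becomes less and less convincing as the prior drops, which is the whole argument in miniature.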


Comments

  1. Sunil D’Monte

    Excellent article, thanks. I had come to believe that clinical trials were all that were needed, not realising one has to think about scientific plausibility as well. Would you say then that in order to be proven as effective, a drug/treatment must satisfy these two conditions?
    1) Be scientifically plausible; and
    2) Be proven by clinical trials.

  2. Dr Khalsa’s quote at the end was the part that made my head explode:

    “It’s a big problem. These grants are still very hard to get and the emphasis is still on conventional medicine, on the magic pill or procedure that’s going to take away all these diseases.”

    She thinks “conventional” scientists are the ones wasting money looking for magic pills?!

  3. Are you trying to start a blog war with Ben Goldacre?

    I must admit that, even as a Bayesian, I think that crunching the numbers is the thing to do. What you’re suggesting runs face-first into the problem that you’re using subjective priors, and these differ between people. Better to get data from several trials, and do a meta-analysis. With enough data, we shouldn’t have to care about our priors.

    One technical point: it’s not clear to me reading the paper, but was the acupuncture trial only single-blinded?

  4. Ironic, since the author of the accompanying article in the Times (A Primer on Medical Studies) does discuss the Bayes Factor:

    “Large and definitive clinical trials can be hugely expensive and take years, so they usually are undertaken only after a large body of evidence indicates that a claim is plausible enough to be worth the investment.”

  5. William Broad is evidently Bayesian at the Moon….

  6. That’s nice, but you cannot honestly *a priori* estimate prior probability. Using Bayes to weigh two different sized fish is like tipping the scale toward the bigger one and comparing the difference in results to the difference in the “tip” (at best). However, when it comes to pseudo-science and “alternative medicine”, there are no objectively measured priors: any pre-measuring tipping might be too much.

  7. There are many ways to estimate prior probability, but the point here is that some things are so improbable as not to warrant investigation. Homeopathy, chelation for autism, and many other bizarre practices have no basis in reality, and therefore have a prior probability approaching zero, meaning that it would take a nearly infinite number of positive clinical trials to make us even consider them as effective.

  8. minimalist

    Far be it from me to be seen as “defending” woo, but believe me, as an institution only founded in order to satisfy the whim of an influential but scientifically-illiterate Senator, NCCAM could be a lot worse.

    Under the current administration at NCCAM, the grant review committees are usually equally divided between woomeisters and more sensible evidence-based researchers. The nuttier grant applications tend, then, to get ‘average’ ratings (around 3 on a scale of 5), while stuff that would at least reasonably be expected to have a measurable physical effect scores higher.

    Thus, most grants now go out for stuff at the level of, say, herb-based remedies, because chemicals at least exist without reference to “Qi” or whatever. Even acupuncture could, at a stretch, seem worthy of testing under some limited circumstances since you’re actually doing something with tangible consequences, i.e. sticking needles into nerve bundles, which might maybe possibly have some limited effect somewhere in some limited application.

    Especially given the inevitable utter collapse of our economy, I’d rather NCCAM just be dissolved and its resources poured into the more useful institutes (like mine), but as long as it has to exist, its administration is at least tipping it in favor of EBM as much as possible, given the constraints.

    Personally I see it as a bit of a Potemkin Village, given the people in charge: they have to put on a show to satisfy Congress and the woomeisters, but at the end of the day they know when to cut their losses, as with that cancelled chelation study. They’re now far less likely to fund chelation studies in the future (if at all), so I’d bet admin figured it was best to lay out some money for a large-scale study now in order to put that to rest — because if they didn’t, then the autism cranks would continue whining, and pressuring them to waste even more money and effort, and so on and so on. Of course they’re not going to shut up anyway, they’re going to try to lobby Congress to lean on NCCAM to fund more crank-autism studies, but NCCAM now at least has the ammunition to go to Congress and say “look, it didn’t work, period, we owe nothing to these cranks”, and get them off their backs.

  9. That’s nice, but you cannot honestly *a priori* estimate prior probability.

    Eh? Sorry, that’s either bollocks, or you’re saying that Bayesians are liars. Which is bollocks.

    I guess you mean that we can’t objectively assign prior probabilities. Which is largely true, but then you end up with the well-known criticism that the Bayesian approach is subjective. But I think here we have the next-best thing: a consensus about what the prior should be, and PalMD’s prior is pretty close to most medical experts’ opinions.

    Musing on this a bit more, I think it’s not as simple as PalMD argues. Given that we have a data set with a positive result, for which we had a low prior probability, his argument is that we think it’s more likely that there is a mistake somewhere. I think this is fine. But he then argues that this means we should disregard the study. But for me this is going too far – it implies that we are right to ignore all unexpected results, and indeed that authority can trump everything.

    I think a better approach is to say that if we get an unlikely result, then we should first look to see if we can find any flaws. If so, we’re happy – our authoritative intuition was correct. But we have to look for them – it’s rather insulting to say that we’ll assume someone is incompetent just because we don’t believe their results. If we can’t find any flaws, then we start to worry, and (for example) ask for replication by other groups. Only if enough evidence can be produced should we accept a result, but the definition of “enough” is tricky.

  10. otheus, depending on the treatment being investigated, I think you can safely make some assumptions.

    Homeopathy violates everything we know about chemistry and physics. If it were true, it would turn the scientific world upside-down. That’s pretty exciting to think about, but it also means that the amount of evidence required to prove homeopathy “works” through any mechanism other than the placebo effect is enormous.

  11. I think the organization is useful just for being able to seriously try things and say, “That doesn’t work. We’re not going to waste any more energy on it.” And maybe that would have the effect of marking them INVALID, the way Judge Jones’ decision did for Intelligent Design.

    Here’s a thought experiment: if homeopathy worked, when we started cleaning up air pollution we would have all suffocated and when we started cleaning up water pollution we would have all been poisoned. So it doesn’t. Q.E.D.

  12. This is why competent statisticians do stress-testing on their modelling prior to using the model to work out what kind of model they should be using. Using Bayes in medical trials means that, for example, if Drug X is generally considered before the trial hugely unlikely to cure Illness Y, but in fact does so in effectively all cases, it is still rated as low probability / low effectiveness. As this is clearly nonsensical, the statistician should be building a more accurate model to test effectiveness than simple Bayes. I think PalMD may have a good point, that we can make qualitative judgements about whether or not the trial of something should go ahead, but using Bayes in this sort of situation just muddies the water, and stops the trial being good science.

  13. “pigs can’t fly under their own power”

    Not even if they’re wearing lipstick!

  14. What Bayes’ Theorem teaches us is that the more implausible the hypothesis, the less likely that any numerical data can confirm it. In other words, if you’re testing an idea with little scientific merit to start, it doesn’t really matter how well you design a clinical study, how many patients you enroll, how well you blind it—any positive results are more likely to be due to chance or confounding variables than to any real effect of the treatment. This is the root problem with NCCAM and with the Times article—no matter how many times you check, pigs can’t fly under their own power. Spending money to try to refute this will only create a hole in your wallet (and a lot of dead pigs).

    In principle, if you have enough data, it doesn’t much matter where you start with in terms of your estimate of prior probability–the probability will converge on the correct value. But if the amount of data is small enough that the estimate of prior probability has a large influence on the conclusions, then the estimate of prior probability must be based upon a strong foundation, as it is for diagnostic testing.

    But how do you estimate the prior probability for acupuncture analgesia? Of course, it is based upon a nonsensical pneumatic theory of physiology, but that is beside the point, because what is required is not the prior probability that it works in that particular way, but the probability that it works at all. To estimate the prior probability, you need to rigorously answer the question, “What is the probability that a biological mechanism exists that could produce such an effect?” Well, it certainly violates none of the “laws” of physics–you’ve got nerves all over the body, with their input integrated by a complex, highly connected, and poorly understood organ, so it certainly could happen. There’s no particular reason to expect those particular points to be important, but again, no clear reason why they couldn’t be. I haven’t the vaguest idea of how to rigorously estimate a probability for such a phenomenon.

    While I agree with the fundamental notion that Bayes’s theorem provides a rational basis for the adage that “extraordinary claims require extraordinary evidence,” it doesn’t seem to me that Bayes carries us very far when it comes to converting that intuition into any kind of meaningful probability for any non-trivial case.

  15. …how do you estimate the prior probability for acupuncture analgesia?

    Estimating prior probability is of course an issue, but I think it’s safe to say that based on basic science (and even on previous studies), the prior prob would have to be “low”.

  16. Estimating prior probability is of course an issue, but I think it’s safe to say that based on basic science (and even on previous studies), the prior prob would have to be “low”.

    Perhaps, but unless you are able to quantitatively answer the question, “Yes, but how low?” I don’t think that invoking Bayes takes you beyond the qualitative adage, “Extraordinary claims require extraordinary evidence.” (And certainly, a physiological phenomenon with no known physiological mechanism qualifies as an extraordinary claim).

  17. So you don’t believe in alternative medicine. Your best effort to discredit it is a chicken and egg theory that claims that if it is not scientifically sound, then it cannot be scientifically proven to be sound.

    You obviously see alternative medicine as a foe that threatens to destroy everything you hold as sacred to your scientific beliefs.

    What if there are flaws in scientific method? Where will you be then?

    Glad we didn’t have you around 400 or so years ago, otherwise we would still think the world was flat.

  18. LanceR, JSG

    BUZZ! Not even wrong.

    Nice try, O2 B.A. (Oxygenated Bullsh*t Artist?) Try to keep up with the adults. Learn what a “scientifically sound” theory is. Learn what “falsification” means. And try to keep the irrelevancies to a minimum, M’kay?
