Are metrics in medicine a good thing?

The Washington Post reports on new efforts by insurance companies to rate doctors’ performance, and on policies that penalize doctors who perform poorly according to those metrics.

After 26 years of a successful medical practice, Alan Berkenwald took for granted that he had a good reputation. But last month he was told he didn’t measure up — by a new computerized rating system.

A patient said an insurance company had added $10 to the cost of seeing Berkenwald instead of other physicians in his western Massachusetts town because the system had demoted him to its Tier 2 for quality.

In the quest to control spiraling costs, insurance companies and employers are looking more closely than ever at how physicians perform, using computers, mountains of health claims and billing data and sophisticated software. Such data-driven surveillance offers the prospect of using incentives to steer patients to care that is both effective and sensibly priced.

Now, on the surface, many people might say this is a good thing. But I will argue that this kind of superficial measurement of performance will not only demoralize doctors but also adversely affect patient care.

As medical records are increasingly digitized, the prospect of mining them for information about physician performance becomes more and more tempting. It’s an obvious and noble idea, but is it a good one? Just think about it. If you could rate doctors based on simple data culled from their records (how well they manage blood sugars in diabetics, blood pressure in hypertensive patients, and so on), you could figure out how good a job they are doing in patient outcomes, right?

Well, luckily it has been studied, and this article in the New England Journal of Medicine (free) shows what happened when the British NHS, which has extensive electronic record infrastructure, rewarded physicians based on metrics of quality of care. I blogged about this before and I’ll give a similar warning: don’t just read the abstract, read the data too, because I think the authors misread their own data a little. What they really showed, when they rewarded doctors for maintaining good health metrics in their patients, was that doctors who treated the young, healthy, and rich did well, whereas those who served more patients, the poor, the elderly, and difficult patients were paid poorly. Also, those who filed lots of “exception reports” to justify excluding patients from the data set did best of all.

Basically, they show that rewarding doctors based on patient health metrics (and I suspect this will apply to penalizing as well) led doctors serving “easy” populations to do well, and those serving “difficult” populations to do poorly (or to try to cook the books). There is no evidence it led to a significant improvement in care. I’m all for getting paid more, but the problem with penalizing or rewarding doctors based on how the patients perform is that it rewards doctors for avoiding difficult patients.

Now, on top of the problems shown in a national health care system, imagine the insurance companies in the United States rating doctors based on similar metrics. To the problem of rewarding doctors for seeing easy patient pools, you add a lack of transparency, no appeals process, no system for exception reporting, and a different system for each insurer. Since it’s a penalty system, that one patient who’s still smoking despite being on oxygen might cause the co-pays for all your patients to go up by 100%. What do you do? You try to avoid the poor, the elderly, and the non-compliant, because they won’t just cost you more money; they may ruin your practice, and you won’t be able to do anything about it. Further, insurance companies often encourage doctors to use specific drugs and less expensive procedures. The next step will obviously be to make sure that doctors perform the cheapest procedures they can justify and select the drugs the insurance companies prefer because of rebate deals they make with drug companies.

So before people jump on medical rating systems as some great new idea, realize that superficial data collection might actually hurt medical care, especially for the most at-risk and ill populations of patients. If data collection is going to be tied to medical reimbursement, or worse, penalties, the system needs to be designed so that doctors can see how the ratings are calculated and can appeal a rating based on errors or on patient characteristics that might not be in the records. It also must not discourage doctors from seeing the patients who most need medical care.
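
To make the case-mix problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two doctors, the two risk strata, and the patient counts are invented and do not come from the NHS study or from any insurer's rating system. It compares a raw "percent of patients meeting the target" score with a simple observed-to-expected ratio, which asks how a doctor did relative to what the pooled rate for each kind of patient would predict.

```python
# Hypothetical numbers for illustration only; none come from the NHS study.
# Each panel maps a risk stratum to (patients seen, patients meeting the target).
PANELS = {
    "doctor_easy": {"low_risk": (90, 81), "high_risk": (10, 5)},
    "doctor_hard": {"low_risk": (10, 9), "high_risk": (90, 45)},
}


def raw_rate(panel):
    """Share of all of a doctor's patients who met the target."""
    total = sum(n for n, _ in panel.values())
    met = sum(m for _, m in panel.values())
    return met / total


def pooled_stratum_rates(panels):
    """Success rate per risk stratum, pooled across every doctor."""
    totals = {}
    for panel in panels.values():
        for stratum, (n, m) in panel.items():
            seen, met = totals.get(stratum, (0, 0))
            totals[stratum] = (seen + n, met + m)
    return {stratum: met / seen for stratum, (seen, met) in totals.items()}


def observed_to_expected(panel, pooled):
    """Observed successes divided by the number expected for this case mix."""
    observed = sum(m for _, m in panel.values())
    expected = sum(n * pooled[stratum] for stratum, (n, _) in panel.items())
    return observed / expected


pooled = pooled_stratum_rates(PANELS)
for name, panel in PANELS.items():
    print(name, round(raw_rate(panel), 2),
          round(observed_to_expected(panel, pooled), 2))
```

In this made-up example both doctors do equally well with each kind of patient (90% of low-risk and 50% of high-risk patients meet the target), yet the raw score is 0.86 for one and 0.54 for the other purely because of who walks through the door. The observed-to-expected ratio comes out at 1.0 for both. Note that this kind of adjustment only corrects for characteristics that are actually recorded, which is why an appeals process still matters.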


  1. Interesting study. Something struck me about one of the paragraphs, so here it is, slightly revised:

    “Basically, they show that rewarding (and I suspect this will apply to penalizing as well) teachers based on student exam results led to teachers serving “easy” populations to do well, and those serving “difficult” populations to do poorly (or try and cook the books). There is no evidence it led to a significant improvement in test scores. I’m all for getting paid more, but the problem with penalizing or rewarding teachers based on how the students perform is that it rewards teachers for avoiding difficult students.”

  2. David Spiegelhalter has been looking at these sorts of metrics as well. He made a similar point to you: he looked at the worst doctors in terms of death rates, and found that they did some of their work in hospices for the terminally ill. So no wonder they had high death rates.

    On the other hand, David suggested that some sort of monitoring is needed: he was also involved in looking at the case of Harold Shipman, who was murdering his patients. I guess the problem is not so much the monitoring, but how the information is used.


  3. I’ll have to check this study out and maybe blog on it next week.

    In the meantime, measuring physician performance seems like a good thing in principle, but the devil, as always, is in the details. What, specifically, do you measure? How do you account for population differences? (Practicing in Westchester County is very different from practicing in inner city Newark, for instance, and the results will be different.) How do we “normalize” results to patient populations, in order not to penalize physicians who do take care of sicker, older, and poorer patients when they have poorer outcomes? The other problem with metrics is that they tend to reward “cookbook” medicine, and the metrics start to become codified as unyielding dogma not necessarily based on the best evidence. I see signs of this starting at my own hospital with respect to certain aspects of surgical practice.

    None of this is to say that we shouldn’t measure metrics, just that, when applied simplistically (which is how bean-counters tend to do it), such measurements have the potential to do far more harm than good.

  4. G. Shelley

    Well, that is one of the objections to the league tables for exam results here: they don’t take account of any factors other than the final performance of pupils.

  5. Ray, I think that is a perfectly valid paragraph as well. Judging by the fraud that the “Texas Educational miracle” turned out to be, the same kind of cherry-picking and data fudging occurred when similar superficial measurements were inflicted by Bush during his years as governor. Now it happens nationwide and has done nothing for education.

    I think it’s a good general principle: tying rewards to superficial measurements does everyone a disservice.

  6. Ray, yes, we have that problem here, too, or did for a while, until the headline metric was changed to a ‘value added’ measure which looks at the difference between pupil performance on arrival at the school and final exam result. Now the private schools are complaining because their intake is such that it’s hard for them to add much ‘value’. Also, it won’t surprise me if schools start paying less attention to very good and very poor pupils on the grounds that it’s hard to improve the measured performance of either.

    As for:

    You try to avoid the poor, the elderly, and the non-compliant,

    I can hear the invisible insurance salesman over my shoulder moaning “you say that like it’s a bad thing”.

  7. In theory, a company using misleading metrics should place itself at a long-term competitive disadvantage.

    In practice, for this to work requires low-cost consumer mobility and choice.

  8. Measuring is generally good. Whether we’re talking about schoolkids or doctors. But giving the measurement data to politicians and businessmen is generally bad, it appears…

    W.r.t. measuring schoolkids, the quantity to use depends on what you want to use your measurement for. If you want to use it for college admissions, using gain is worthless. If you want to use it for funding, a case could be made that gain is a good measure.

    I would argue, however, that a) funding of schools should not be based solely on output, b) output should be measured and averaged over a couple of years to get a reasonable sample size, and c) you’ll want to use a non-linear scale so that improvement in the abilities of a poorly off child rates higher than a comparable (relative or absolute) improvement in a well-off child.

    This is both because the well-off child will learn more by itself, irrespective of teacher input, than the less well-off child, and because for society as a whole making sure that there are fewer truly ignorant people around is arguably more important than making sure that there are more very well-educated ones around.

    – JS

  9. Dr Peter Suchsland

    This is yet another nifty way for insurance companies to legitimize their position in the healthcare equation. How many MBAs are your insurance dollars supporting? Yet how much actual use is there for this layer of middle management?

    The insurance companies in the United States are a cottage industry based on the good will and hard work of doctors who actually take care of patients.

    The actual value THEY (the insurance cos) provide is what should be measured. Leave the doctors alone.

  10. Stagyar zil Doggo

    What if you randomized the (initial) assignment of patients to physicians? The metrics would then carry more meaning. Of course, the extent of possible randomization is limited: geography and physician specialization come to mind as constraints, and of course patients won’t stick with a physician they dislike.

    So the comparisons would only have limited and local validity …

    But do I really care that there are better cardiologists on the other side of the country serving Blue Cross patients, while I am looking for an orthopedist in a 50 mile radius who’ll accept my Kaiser Insurance card?

  11. Provenge and Our F.D.A.’s Etiology For Not Being Approved

    Terminal patients are those who are not expected to survive, usually because of an illness such as advanced prostate cancer (cT3). Patients with six months or less to live are considered terminally ill. Regardless, a terminal patient has no cure or tolerable treatment for the illness. Since such patients will likely die within a short period, they often want treatment options, even unproven ones. This is understandable: at so severe a stage of an illness such as prostate cancer, a possible extension of life with comfort is worth it to them, even without evidence that a given treatment will help. The FDA, however, claims authority over the treatment options of such patients, although the agency has proven itself rather inadequate over the years, with frequent drug recalls and black box warnings that it usually issues only under public pressure.
    Prostate cancer is rather common: between 10 and 20 percent of men are predicted to acquire the disease during their lifetime, and about 30,000 of the one million men who have prostate cancer in the United States die of it each year. Furthermore, prostate cancer has different stages, and severity is determined by such methods as bone scans and the Gleason score, which assesses biopsied prostate tissue to gauge how severe the cancer is and, if the tissue proves malignant, to help estimate proper treatment options. Typically, the initial suspicion of prostate cancer comes from a PSA blood test, as PSA is a protein produced by prostate cells, including cancer cells. If the PSA result is above normal limits, a prostate biopsy is performed to confirm not only the presence of cancer but also the severity of the disease in that patient.
    Yet fortunately, as you will read, innovation still exists in medicine. A few years ago, a small biotechnology company called Dendreon was working on a conceptually new treatment for the worst prostate cancer patients, which it named Provenge. Provenge is the first immunotherapy biologic treatment for patients with progressed prostate cancer, and it has proven to be a very novel and innovative option for advanced prostate cancer patients who are terminally ill. These patients usually do not respond to the usual treatments for prostate cancer and are left with chemotherapy as their only option at such a traumatic stage of the disease. Understandably, most patients at this stage refuse treatment entirely, largely because of the brutal side effects of chemotherapy drugs such as Taxotere. The immunotherapy method developed by Dendreon involved removing white blood cells from the patient, altering them, and re-injecting them so that they attack what is called PAP, which is found only on prostate cancer cells. The treatment required only three such injections over a period of six weeks. It resulted in a life extension twice that of chemotherapy-treated prostate cancer patients of the same severity, without the concerning side effects of chemotherapy. The medical community and survivors of prostate cancer were elated and waited with great anticipation for access to this treatment.
    Fortunately, by 2007 Provenge had convinced others of its safety and efficacy for patients with severe prostate cancer. This caused great joy to such patients and their families. Perhaps even greater elation was felt by the caregivers and specialists who treat the disease, such as urologists and oncologists. While Provenge was on fast-track status at the FDA, the FDA panel thankfully and clearly recommended approving it, based on the substantial efficacy and safety it had demonstrated in past trials. The FDA announced this to the public in the early spring of 2007, I believe.
    Now for the bad news: to great shock and surprise, in May of 2007 the FDA rejected this great treatment for very sick patients, citing a ‘lack of data’. This contradicted the favorable opinion of Provenge it had delivered only weeks earlier, which is especially striking when one considers that the FDA Commissioner is a prostate cancer survivor himself!
    Soon after the FDA passed this judgment, others discovered conflicts of interest. For example, Dr. Scher, who was evaluating Provenge for the FDA, was found to have a financial commitment to a future competitor of Provenge, a similar prostate cancer drug being developed by a company called Novacea, which had signed a co-promotion agreement for it with Schering. Dr. Scher never disclosed this conflict during the approval process for Provenge. As it turns out, the anticipated Novacea drug was discovered to have serious flaws, and Schering pulled out of the agreement. In addition, before May of 2007, baseless letters were anonymously delivered to the FDA making negative and speculative claims about Provenge that were without merit. Overall, the FDA’s disapproval of Provenge angered many, and at the end of last year a newly formed advocacy group called Care to Live filed a lawsuit against the FDA over its clear lack of protocol or knowledge about such complex treatment agents as Provenge.
    Terminal patients, I surmise, desire comfort during the progressive disease that has placed them in the last chapter of their lives, and they certainly should have a right to choose any treatment that could possibly benefit them. At this stage, one could argue, the safety of a treatment option is not a concern for these patients, because they are going to die anyway. Yet the FDA, with reckless disregard and overt harshness toward these very ill patients, ultimately did more harm by deliberately not approving Provenge.
    The FDA does in fact have the ability to grant what is called conditional approval for treatments such as Provenge, and why it has not extended this approval process to all terminally ill patients remains completely unknown. What is known is that it is harming those it pledged to protect so long ago by depriving patients in need of treatment, as no other option presently available is as safe, effective, and well tolerated as Provenge. So now the FDA appears to be a bought, corrupt, and incompetent administration without loyalty or dedication to the public and its health. This needs to be corrected in any way possible, for the sake of others’ lives. A terminally ill patient has a personal right to obtain such treatments of their own volition and at the discretion of their doctor, just as a terminally ill patient is granted an individual right to die, if they choose to do so. In such cases it is an individual decision that should be free of interference from others.

    “Facts do not cease to exist because they are ignored.” — Aldous Huxley

    William Abshear
