Are you familiar with IBM’s supercomputer Watson? IBM’s stated aspiration is to make it “the world’s best diagnostician,” because it can store way more medical information than a human doctor and makes decisions based on evidence, free of cognitive biases. It is consistent, and, according to the company, “given the same inputs, Dr. Watson will always output the same diagnosis.”
What if a flawed human doctor inputs incomplete or inaccurate data for Watson, though? If he or she doesn’t ask the patient the right questions, or the patient wasn’t truthful or was misleading, will Watson go down the wrong path? What if the same input from a different patient should actually lead to a different diagnosis because there are different underlying factors that need to be taken into consideration?
Watson may never be sleep-deprived, tired, upset, or in the middle of a divorce, but it also isn’t creative, intuitive, or observant. Do you know what I call a healthcare provider who simply collects information from the patient and then does a data dump into a more experienced, more knowledgeable receptacle, expecting the diagnosis to be served on a silver platter? A medical student.
The best diagnosticians take into consideration many different factors and constantly combat cognitive bias. The best of both worlds would be to have a compassionate, astute clinician consult a computerized entity that can evaluate medical literature at a rate of 200 million pages every three seconds and then take the computer’s input into consideration as he or she thoughtfully makes excellent decisions on the patient’s behalf. Personally, I would not be comfortable solely relying on even the world’s fastest, biggest supercomputer to diagnose and treat me.
Computer-assisted coding (CAC) has been touted as being capable of streamlining coding workflow and increasing productivity. And I think it may be able to…eventually. But I think the issues with CAC in its current state are multiple and multifactorial. No system should rely solely on CAC for its coding needs at this juncture.
CAC is possible because we transitioned to electronic health records (EHRs), and the computer can sift through the text, finding words and phrases that merit codes. In the documentation arena in healthcare, however, technology is actually in the toddler stage. I used to refer to it as being in its infancy, but I believe it has evolved to some degree in the past five years. I am certain that all of you have been on the receiving end of at least one tirade about how much your healthcare providers (HCPs) hate whichever EHR your organization uses.
The way CAC works is through natural language processing (NLP). A computer program is designed according to rules determined by human beings. For example, as you well know, if you are trying to code heart failure, you need acuity and type for the most granular code. You wouldn’t want a computer to stop at the verbiage “heart failure” and code I50.9, Heart failure, unspecified, if all the requisite qualifiers are present in the documentation. But you need the acuity to be in proximity to the heart failure in the documentation, because a computer shouldn’t take a patient with acute pancreatitis in paragraph 1 of the history of present illness, with heart failure documented in paragraph 3, and conclude that the “acute” relates to the “heart failure.” You would include in your rules the terminology “HF,” “HFpEF,” “HFrEF,” and “cardiac failure.” What if the HCP types “heart failuer” by accident and doesn’t proofread it? What if he or she just writes “failure”? You must set up the program so that the algorithm anticipates all the likely or possible permutations and synonyms and captures the condition every time. This is very complex; imagine what a task it would be to do it for more than 70,000 ICD-10-CM codes. CAC must be constantly evolving and learning from frequent feedback, or it is doomed to fail.
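To make the proximity idea concrete, here is a minimal sketch of what one such rule might look like. It is not how any actual CAC product works. The synonym list, the 40-character window, and the function name `suggest_hf_code` are all assumptions made for illustration, and the code table covers only systolic and diastolic heart failure.

```python
import re
from typing import Optional

# Hypothetical, deliberately incomplete synonym list (an assumption).
HF_PATTERN = re.compile(
    r"\b(heart failure|cardiac failure|HFpEF|HFrEF|HF)\b", re.IGNORECASE
)

# Assumed proximity window, in characters, on each side of the mention.
WINDOW_CHARS = 40

# (type, acuity) -> ICD-10-CM code for systolic and diastolic heart failure.
HF_CODES = {
    ("systolic", None): "I50.20",
    ("systolic", "acute"): "I50.21",
    ("systolic", "chronic"): "I50.22",
    ("systolic", "acute on chronic"): "I50.23",
    ("diastolic", None): "I50.30",
    ("diastolic", "acute"): "I50.31",
    ("diastolic", "chronic"): "I50.32",
    ("diastolic", "acute on chronic"): "I50.33",
}


def suggest_hf_code(note: str) -> Optional[str]:
    """Return a suggested code for the first heart failure mention, or None."""
    match = HF_PATTERN.search(note)
    if match is None:
        return None  # no recognized synonym, so nothing is suggested

    # Only qualifiers inside the window count as belonging to this mention.
    start = max(0, match.start() - WINDOW_CHARS)
    context = note[start:match.end() + WINDOW_CHARS].lower()
    term = match.group(1).lower()

    if term == "hfref" or re.search(r"\bsystolic\b", context):
        hf_type = "systolic"
    elif term == "hfpef" or re.search(r"\bdiastolic\b", context):
        hf_type = "diastolic"
    else:
        return "I50.9"  # type not found nearby: heart failure, unspecified

    # Check the longest phrase first so "acute on chronic" is not read as "acute".
    acuity = None
    for phrase in ("acute on chronic", "acute", "chronic"):
        if re.search(rf"\b{phrase}\b", context):
            acuity = phrase
            break

    return HF_CODES[(hf_type, acuity)]
```

In this sketch, a note reading “acute on chronic systolic heart failure” yields I50.23, while a distant “Acute pancreatitis” earlier in the note does not attach its “acute” to a later “systolic heart failure.” The typo “heart failuer” returns nothing at all, which is exactly the kind of miss described above.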
It’s as the old adage says: garbage in, garbage out. Until we can actually develop EHRs such that it is easy and efficient for healthcare providers (HCPs) to document completely, accurately, and precisely, the text the CAC has available to it is limited. If a medical record has excessive copy-and-paste material, there may be voluminous note bloat with very little substance. If it isn’t documented, you can’t code it, whether you are a computer or a coder.
This reflects one of the biggest problems with CAC. There is a code for “altered mental status,” R41.82. It doesn’t risk-adjust, and human clinical documentation integrity specialists (CDISs) see that as a red flag indicating that there is an opportunity for query. Would CAC simply be satisfied, having found suitably codable verbiage?
There are systems designed to root out CDI opportunities while performing CAC. You can set up a rule mandating that “if ‘altered mental status’ or ‘altered mentation’ or ‘decreased level of consciousness’ is present, with no ‘encephalopathy,’ then trigger CDI review for query.” This might be helpful, but what if the doctor instead typed “impaired mental status”? It is really hard to foresee every phrase that might be suggestive of a codable, risk-adjusting, clinically significant condition.
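A rule like the one quoted above can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s logic; the function name `needs_cdi_review` and the phrase list are assumptions, and the check is a naive presence test over the whole note.

```python
import re

# Trigger phrases from the rule quoted above; the list is intentionally
# incomplete, which is the point of the example.
TRIGGER = re.compile(
    r"\b(altered mental status|altered mentation|"
    r"decreased level of consciousness)\b",
    re.IGNORECASE,
)

# A documented diagnosis that makes the query unnecessary.
RESOLVING = re.compile(r"\bencephalopathy\b", re.IGNORECASE)


def needs_cdi_review(note: str) -> bool:
    """True when a trigger phrase appears and 'encephalopathy' does not."""
    return bool(TRIGGER.search(note)) and not RESOLVING.search(note)
```

With this sketch, “Patient presents with altered mental status” is flagged for review, but “Impaired mental status on arrival” slips through unflagged because nobody anticipated that wording.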
Let’s go one step further. If you review for suboptimal diagnoses such as Z74.01, Bed confinement status, which might actually indicate a possible functional quadriplegia, what happens if the clinicians never explicitly document that the patient is bedridden? What happens if the clinician documents “history of”? If the rule is set to exclude conditions specified as “history of,” you miss out on diagnoses that are actually current and were just incorrectly characterized by the provider.
This is why human review of the EHR verifying the CAC is so crucial. The utility of the CDIS is to read between the lines and respond to suggestive clinical indicators. A good coder will also recognize conditions that are intimated but not documented, and will refer to the CDIS for consideration of a potential query. A human being can assess the context of the entirety of the documentation, whereas a computer has an algorithm that samples many words before and after the index term; I just do not think these actions are equivalent. Totally surrendering coding to a computerized algorithm is analogous to completely relinquishing medical diagnosis determination to Watson.
I am not even addressing the issue of sequencing. Algorithms may be able to manipulate the codes to determine maximal relative weight and severity of illness/risk of mortality calculations, but they cannot adjust for the nuances of the coding rules and guidelines. That requires human involvement by a certified coder. And don’t get me started on procedures. I have to confess that even I, an emergency physician for 25 years but not a real coder, often scratch my head at operative notes and have no clue whether I am missing a PCS code.
Why do organizations use CAC? As we approached implementation of ICD-10, institutions were grasping at ways to improve coder productivity. Coder time drops by approximately 20 percent when coders are essentially performing a second-pass review of CAC output. This results in a significant and quite desirable increase in productivity.
However, there is an approximately 20-percent projected decrease in accuracy if one were to use CAC exclusively. So clearly, the optimal process is to perform primary CAC and secondary coder review. I would add to this system a strategic third-pass audit for accuracy and completeness, and let me explain why.
It is my opinion that reviewing CAC may lead coders to fall prey to some of the cognitive biases I alluded to initially regarding Watson the diagnostician. These biases are usually discussed in the context of clinical medicine, but they apply here too. Coders reviewing CAC may anchor, for example by picking up a code early on in a record and not acknowledging further specificity that would take them to a different, more granular ICD-10 code. In their haste to produce, they might miss linkage or qualifiers that would change a code completely. They may commit premature closure, or search satisficing, which means failing to continue to look for additional diagnoses after CAC has identified those diagnoses it was able to find. This means they might confirm the set that the computer picked up, but out of expediency fail to scrutinize the documentation for additional conditions that were overlooked.
Convenience is compelling, but complacency is concerning. I contend that without the check and balance of a recurring “third-level” quality review of a subset of records, coders might allow their secondary review of CAC to become cursory rather than substantive. This would be a serious mistake; it would deflate quality metrics and reimbursement, and lead us right back into the clinical validation denials morass.
It is best to remember that technology is a tool for people, not a replacement for them.