You may have seen recent headlines about COVID-19 and hospital data. In early July, the Trump Administration directed hospitals to send crucial data on COVID-19 hospitalizations and intensive care capacity directly to a new data management tool run by the Department of Health and Human Services. Previously, hospitals shared this data with the Centers for Disease Control. The new system has not been without errors, and many of them are a little too familiar to us in the benefits and healthcare data industries.
With COVID-19 data not flowing to a tried-and-true CDC system, many are worried about the quality and trustworthiness of the data. The latest reports on this issue suggest that a new data management system is being rolled out at the CDC, and data will flow back there once it’s complete. Even if the novel coronavirus were a more familiar infectious virus, the data around it would be difficult to load, normalize, and analyze. Because it’s causing a brand-new disease and our understanding of it is shifting as we learn more, fighting COVID-19 with data is even more difficult. It’s a moving target at best, and, as WIRED reported recently, an “information catastrophe”:
“Behind the crisis lies a difficult reality: Covid-19 data in the US—in fact, almost all public health data—is chaotic: not one pipe, but a tangle. If the nation had a single, seamless system for collecting, storing, and analyzing health data, HHS and the Coronavirus Task Force would have had a much harder time prying the CDC’s Covid-19 data loose. Not having a comprehensive system made the HHS move possible, and however well or badly the department handles the data it will now receive, the lack of a comprehensive data system is harming the US coronavirus response.”
While the coronavirus pandemic has revealed the challenges of gathering and analyzing public health data, Benefits/HR leaders and their consultants are all too familiar with the challenges of gathering and analyzing private health data. Approximately 50% of Americans access healthcare through their employer-paid benefits plans, and this data ends up with a health carrier, pharmacy plan, benefit program vendors, and, for self-insured employers, a benefits management team. Benefits teams use data to help measure population health, analyze trends, measure program success, and budget for healthcare costs.
All of the same challenges facing public health data also apply to employee healthcare data. There are real, messy, and frustrating things about working with healthcare data. If you’ve seen our CEO Grant Gordon speak at a conference or event, you may have heard him say, “There are a lot of ways to make money and build a successful company that don’t involve benefits data, and sometimes I wish I’d pursued them.” While tongue in cheek, Grant is right that the benefits data industry isn’t an easy one to work in.
Here are a few challenges that COVID-19 data and benefits data share that have reached the public eye during the pandemic.
Many benefits consultants and employee benefits teams have been following the spread of COVID-19 all around the country. So have we, and we’ve noticed that each country, state, and even county seems to track different metrics around COVID-19. Some are measuring deaths as a percentage of the population, while others are measuring deaths as a percentage of known cases. Some are looking at confirmed cases, and some are looking at presumed cases due to testing shortages. Some counties are reporting as a group due to health departments that cover multiple rural counties, which makes it seem to residents like their county has no COVID-19 cases.
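To see why the choice of metric matters, here is a minimal sketch comparing the two death measures mentioned above. The county figures and function names are invented for illustration; they are not real COVID-19 data.

```python
def deaths_per_100k(deaths: int, population: int) -> float:
    """Deaths as a share of the whole population, scaled per 100,000 residents."""
    return deaths * 100_000 / population


def case_fatality_pct(deaths: int, known_cases: int) -> float:
    """Deaths as a percentage of known cases."""
    return deaths * 100 / known_cases


# Hypothetical counties, invented for illustration only.
counties = {
    "County A": {"population": 500_000, "known_cases": 2_000, "deaths": 50},
    "County B": {"population": 100_000, "known_cases": 250, "deaths": 20},
}

for name, c in counties.items():
    per_100k = deaths_per_100k(c["deaths"], c["population"])
    cfr = case_fatality_pct(c["deaths"], c["known_cases"])
    print(f"{name}: {per_100k:.1f} deaths per 100k, {cfr:.1f}% of known cases")
```

In this made-up example, County B has twice County A's deaths per 100,000 residents (20.0 vs. 10.0) but more than three times its deaths as a percentage of known cases (8.0% vs. 2.5%), partly because it has confirmed far fewer cases. Two dashboards built on different metrics would tell noticeably different stories about the same two counties.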
The same issues affect healthcare benefits analytics, with different areas of the country reporting and recording data differently. Codes used for medical billing help tremendously with making sense of the data, and make it possible to figure out how many members on a given plan have been affected by a particular disease or condition. For established illnesses and injuries, we have fairly reliable data that can be used to reach conclusions about prevalence, trends, and population health.
In the case of COVID-19, the diagnosis and testing codes are still being rolled out, and Artemis is continually adding new codes to our analyses. One complication is that testing is being done outside of ordinary points of care. State and county health departments, private industries, and providers have all scrambled to launch their own testing initiatives, so diagnosis codes are still in flux at this stage of the pandemic. That leads to our next challenge: the vast variety of data sources.
The global pandemic has demanded data sharing from not just counties and states, but from every nation on Earth. One frustration early in the outbreak was the difficulty in getting accurate data from Chinese authorities, but that has now been surpassed by the difficulty of dealing with a vast mountain of data. Experts are trying to find answers in hospital data, state data, county data, claims data, pharmacy data, research data, testing and diagnostics data, and self-reporting data, just to name a few.
The same mountain of data exists for health and benefits professionals, and it poses a challenge to go from so many sources to accurate answers. The average self-insured benefits plan has 10+ programs, and that means 10+ data feeds. These include everything from traditional data sources, like medical and pharmacy claims, to more innovative benefits programs, like chronic condition management and virtual physical therapy. Not to mention everything in between: dental, vision, 401k and financial, smoking cessation, health risk assessments, and wellness programs.
The key to finding opportunities in a vast data set is to “warehouse” it all in one place, where it can be compared across feeds and analyzed for trends. Data warehouse solutions provide this ability for benefits consultants and employers, and the new HHS database was looking to solve this challenge. Reports are mixed about whether or not COVID-19 data is being warehoused in a useful way, and Becker’s Hospital Review reports that the CDC is working on a new data management system of its own. Dr. Deborah Birx indicated that data collection and warehousing may be restored to the CDC when this system is complete.
We’re hopeful that with time, public health officials will have access to the same next-generation data warehouse solutions that benefits teams do.
While we haven’t run across any news stories about member matching issues specific to COVID-19, we’re sure that, with the wide variety of testing sites and providers, it has been a challenge during the pandemic.
What is member matching? It refers to the process of ensuring that a patient treated for, say, a knee replacement at one surgical center is accurately known in the data as the same patient in post-op physical therapy and the same one who picked up a prescription for hydrocodone. Different electronic medical records systems, healthcare claims data, and pharmacy claims data may label the same patient (John Doe) as John R. Doe, J. Doe, Jon Doe, or J.R. Doe. Birth dates may be mistyped, or rendered as Month/Day/Year in one system and Day/Month/Year in another. Member matching is the process (often manual) of matching up other identifying information to make sure that an analytics tool shows that this is the same patient.
This process can be sped up with algorithms, but it remains a difficult challenge to overcome. Benefits teams working with a data partner need to rely on the accuracy of member matching so they can get a realistic picture of their population health. Consultants and employers should look for a high member matching rate (above 98%) to ensure trustworthiness.
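As a rough illustration of what such an algorithm has to do, here is a minimal sketch that reduces each record to a comparable key and computes a match rate against an eligibility roster. The function names, the sample records, and the matching rule (first initial, last name, and birth date) are our assumptions for this example. The rule is deliberately simplistic: it would wrongly merge twins named John and Jane Doe, for instance, so treat it as a teaching aid and not as a production approach.

```python
from datetime import date, datetime


def normalize_name(raw: str) -> tuple[str, str]:
    """Reduce a name to (first initial, last name), ignoring case, periods, and middle names."""
    parts = raw.replace(".", " ").lower().split()
    return parts[0][0], parts[-1]


def parse_birth_date(raw: str, fmt: str) -> date:
    """Parse a birth date using the format its source system is known to use."""
    return datetime.strptime(raw, fmt).date()


def match_key(name: str, birth_date: str, date_fmt: str) -> tuple[str, str, date]:
    """Build the key used to decide whether two records describe the same member."""
    return (*normalize_name(name), parse_birth_date(birth_date, date_fmt))


def match_rate(records, roster_keys) -> float:
    """Fraction of incoming records whose key is found in the roster."""
    matched = sum(1 for record in records if match_key(*record) in roster_keys)
    return matched / len(records)


# Hypothetical eligibility roster with a single member.
roster = {match_key("John Doe", "1970-03-04", "%Y-%m-%d")}

# Hypothetical records from three feeds: (name, birth date, that feed's date format).
incoming = [
    ("John R. Doe", "03/04/1970", "%m/%d/%Y"),  # surgical center, Month/Day/Year
    ("J.R. Doe", "04/03/1970", "%d/%m/%Y"),     # physical therapy, Day/Month/Year
    ("Jon Doe", "1970-03-04", "%Y-%m-%d"),      # pharmacy claim
    ("Jane Roe", "12/31/1985", "%m/%d/%Y"),     # not on the roster
]

print(f"Match rate: {match_rate(incoming, roster):.0%}")
```

Note that the date format is supplied per feed. A value like 03/04/1970 is ambiguous on its own, so the sketch assumes each source system's convention is known in advance.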
Member matching is just one example of a data quality indicator, and there are other ways in which benefits data (and COVID-19 data) can be messy. A data quality process will also need to account for eligibility for coverage, costs for the employer and members, copays, consistency between carrier feeds and existing data, and much more. A lot of these checks happen during the data integration and refresh processes, where new files are loaded into a system for normalization.
With millions of rows in multiple spreadsheets of raw data, you’ll run into columns that don’t match up or reflect the same information. For example, one carrier might have the member’s name, birthdate, and member ID in a different order from another. Still more complex, vendors from other types of benefits programs will include different metrics in their data, like weight or resting heart rate for a wellness/fitness benefit program. Traditional data warehouses and benefits analytics solutions like Artemis use a process called “ETL,” which stands for “extract, transform, and load” to normalize the data, match up the columns and metrics, and load the data into the system.
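The column problem can be sketched in a few lines. The example below is a minimal illustration of the extract, transform, and load steps using only the Python standard library. The feed names, column headers, and sample rows are hypothetical, and this is not how any particular product implements ETL.

```python
import csv
import io

# Canonical schema every feed is normalized into (an assumption for this sketch).
CANONICAL_FIELDS = ("member_id", "name", "birth_date")

# Hypothetical per-feed mapping: source column header -> canonical field.
COLUMN_MAPS = {
    "carrier_a": {"Member Name": "name", "DOB": "birth_date", "Member ID": "member_id"},
    "carrier_b": {"id": "member_id", "birthdate": "birth_date", "full_name": "name"},
}

# Hypothetical raw files; note the different column names and column order.
RAW_FEEDS = {
    "carrier_a": "Member Name,DOB,Member ID\nJohn Doe,1970-03-04,A123\n",
    "carrier_b": "id,birthdate,full_name\nB456,1985-12-31,Jane Roe\n",
}


def extract(raw_csv: str) -> list[dict]:
    """Extract: read a raw CSV file into one dict per row, keyed by its own headers."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict], column_map: dict) -> list[dict]:
    """Transform: rename mapped columns and emit them in canonical order."""
    normalized = []
    for row in rows:
        renamed = {
            column_map[col]: (value or "").strip()
            for col, value in row.items()
            if col in column_map
        }
        normalized.append({field: renamed.get(field, "") for field in CANONICAL_FIELDS})
    return normalized


def load(warehouse: list, rows: list[dict], source: str) -> None:
    """Load: append normalized rows to the shared store, tagged with their source feed."""
    for row in rows:
        warehouse.append({**row, "source": source})


warehouse = []
for feed, raw in RAW_FEEDS.items():
    load(warehouse, transform(extract(raw), COLUMN_MAPS[feed]), feed)

for row in warehouse:
    print(row)
```

Columns that are not in a feed's mapping are simply dropped here. A real pipeline would need a place for program-specific metrics such as weight or resting heart rate, plus validation of each value's type and format.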
At Artemis, we use a powerful, purpose-built tool we call Zeus to help with this process. It automates much of the ETL work, matching up the data and transforming it into a usable format for our system. While an automated tool like Zeus eliminates a fair amount of manual work, it’s critical to always keep some checks in human hands. There’s nothing quite like the human eye to spot outliers or breaks in a pattern, and that’s how we ensure reliable data quality.
COVID-19 has brought health data science to the national stage, while also exposing some of the biggest challenges in our field. We hope that the newfound attention on these issues will help the public better understand both the challenges and the great value in healthcare and benefits data.