As the country moves to the next phase of reopening businesses and activities, many employers are looking to their benefits data. They are trying to find the information they need to help employees return to work safely, but it’s not always easy to do. We have learned more and more about this novel coronavirus as it spreads. New data means new recommendations, updated protocols, and innovative treatments to help COVID-19 patients fight the infection.
New information is part of the scientific process, and Artemis clients and other employers and advisors we work with are eager for reliable data that can help them make decisions. As a Data Scientist at Artemis, I get a lot of questions about which data are reliable and how to use it appropriately. Employers and advisors are making plans to bring employees back to the office or move to a new phase of operations with essential employees, and these benefits leaders are looking at one guideline in particular: the White House’s recommendation of reopening based on a “downward trend over two weeks.”
Here are the details from whitehouse.gov:
Symptoms
Cases
Hospitals
Essentially, the guidelines suggest that in order to reopen, a geography needs to have a downward trend over a two-week period that shows fewer people developing COVID-19 symptoms, fewer people testing positive for COVID-19, and a capable healthcare system that’s ready to treat those who do become infected.
It’s not necessarily easy to calculate these data points for a specific employee population, so most of the folks we’re working with are looking for geographic data for their worksite locations. However, there are many ways to calculate this, on top of a wide variety of data sources to use. In my capacity on Artemis’ data science team, I’ve worked with our experts to develop a simple way for employers and advisors to evaluate COVID-19 trend at the state or county level in the hopes that this will arm clients and the wider benefits industry with the information they need to safely and cautiously reopen their offices.
First, we had to select a data source. There is some flexibility here based on geography, and it doesn’t make a huge difference which data source you choose. However, it’s critical to use the same data source consistently over time. We recommend against mixing data sources, as some will calculate data points differently. For our purposes, we have chosen to use the usafacts.com data set, which is where the Centers for Disease Control is sourcing its data. It offers a handy, downloadable file with confirmed COVID-19 cases broken down by county.
Note that this specific data file counts cumulative total case counts by day, so we had to create a second tab to calculate the daily new case count. This was as simple as subtracting each day from the day after in order to get the daily case counts for each county.
Once you have your data set with the daily count for the county or state that you’re interested in, it’s time to use a simple equation to calculate the trend of infections and the percentage increase. Here’s our equation:
One way we could have calculated slope would be to simply take the number of cases on the last day of the 2-week period and subtract off the number of cases on the first day of the two week period. This would not necessarily be wrong, but is not very resistant to outlier values on those two days (i.e., when case counts were lower or higher and not representative of the overall trend). Instead, we choose to run a regression through the observations throughout the two-week period and report the slope of the regression line. This more accurately depicts the change in cases over these 14 days. In this case, a slope of -3 would mean that on average over the course of the 2-week period, there were 3 fewer cases each day.
The slope can be easily calculated by using the slope function in Excel using this formula: =SLOPE([casecounts],[daterange]). For example, =SLOPE(F2:S2,$F$1:$S:2) is what we used to calculate the slope of the first two weeks. Note that we used the dollar sign in front of the dates so that we could easily reuse the formula for multiple geographies.
Calculating the mean over the two week period is relatively straightforward as well. This can also be done in Excel with this formula: =AVERAGE([casecounts]). In the example above, it would be =AVERAGE(F2:S2).
Now that we have the slope and the mean, we can calculate the percent growth by simply dividing the results. There are a couple of ways to do this in Excel. We simply used the above two functions in their own Excel cells and divided the first cell by the second cell. Then we applied the “percent” formatting to the cell.
Advanced Excel users (shoutout to all our data scientist customers!) could use one cell and one more complex formula to accomplish the same thing: =SLOPE([casecounts],[daterange])/AVERAGE([casecounts]). That would look like this: =SLOPE(F2:S2,$F$1:$S:2)/AVERAGE(F2:S2). After applying the “percent” formatting to the cell, you have a reliable number for percentage growth of COVID-19 cases.
Ideally, what you would see is a negative number. This would indicate that cases are decreasing in the area that you’re exploring. A positive number, on the other hand, shows that cases are increasing in that area. Per the most recent White House briefings, 10% growth is the threshold deemed as “significant.” If the results of your calculation shows a 10% upward trend or more, the geography you selected is still experiencing a surge in infections. If you see a result lower than -10%, that would indicate a “significant drop” in new cases over a two-week period.
This downward trend is what we’re looking for as the economy reopens. A -10% result shows that individuals are less likely to become infected with COVID-19 as they venture out. The data set we used from usafacts.com allows you to dive down to the county level, which will help employers and advisors get specific information to help them make clear, localized decisions.
We hope this formula and explanation helps benefits leaders get the information they need to see how COVID-19 is trending in the areas around their worksites, how to prioritize return-to-work strategies based on geography, and whether or not their population is still at elevated risk for exposure to the coronavirus.
If you’d like to learn more about how health and benefits data can help you tackle the challenges of COVID-19, please visit our Resource Center or get in touch with our team.