Insufficient Data to Accurately Monitor Most At-Risk Communities for Flu

It is important to understand how and why disease risk and health burden vary by socioeconomic status.

Samuel Scarpino, PhD

There is a critical lack of data for accurately monitoring influenza (flu) in the most at-risk communities.

Samuel Scarpino, PhD, and colleagues developed a method to design robust and efficient forecasting systems for flu hospitalizations. They found that zip codes in the highest poverty quartile were a critical vulnerability for ILINet, the United States’ primary national influenza surveillance system, and that integrating next-generation data failed to close the gap. ILINet currently monitors outpatient healthcare providers, which may be largely inaccessible to lower socioeconomic populations, Scarpino and the team suggested.

“Understanding how and why disease risk and health burden vary by socioeconomic status, race, ethnicity, immigration status, and other factors is essential for supporting a healthy and equitable society and economy,” Scarpino said in a statement. “Otherwise, new machine-learning and big-data systems are likely to perpetuate the existing biases of traditional decision-making systems.”

The investigators used 3 data aggregators to collect information on 6 counties in Texas. Weekly BioSense 2.0 data were extracted from an online repository. Data included the percent of emergency department visits for an upper respiratory infection. They also used data from ILINet, which contained information from thousands of healthcare providers across the US. Such information consisted of the weekly number of cases of influenza-like illness (ILI) treated and the total number of patients seen by age group. The final data source was Google Flu Trends, which estimated the number of ILI patients per 100,000 people based on the daily number of Google searches for terms associated with the signs, symptoms, and treatment of acute respiratory infections.

Surveillance models were used to predict hospitalizations aggregated by income quartile. Hospital discharge records were obtained from the Texas Health Care Information Collection and filtered for flu-related diagnostic codes.

The team used its new system to predict flu-related hospitalization rates in the Dallas-Fort Worth metro area in Texas between 2007 and 2012 and compared the predictions with real-world rates. The region included 305 zip codes.

Scarpino and the investigators estimated the hospitalization rate per 1000 people in each zip code. They found rates were positively correlated with both the poverty level and the proportion of the 2010 census population over 65 years old. After controlling for age, the team found poverty and flu burden were significantly correlated among people under 65 years old (P <.001) but not among those over 65 years old.
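As a rough illustration of the two quantities described above, the following Python sketch computes a hospitalization rate per 1000 people and a Pearson correlation between poverty and that rate. The zip code figures are made up for illustration and do not come from the study, the function names are hypothetical, and the sketch does not reproduce the age-controlled analysis the investigators performed.

```python
from math import sqrt


def rate_per_1000(hospitalizations, population):
    """Hospitalizations per 1000 residents of a zip code."""
    return 1000.0 * hospitalizations / population


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical zip codes: (flu hospitalizations, population, poverty fraction).
# These numbers are illustrative assumptions, not data from the study.
zip_codes = [
    (12, 20000, 0.05),
    (30, 25000, 0.12),
    (45, 18000, 0.25),
    (80, 22000, 0.40),
]

rates = [rate_per_1000(h, p) for h, p, _ in zip_codes]
poverty = [pov for _, _, pov in zip_codes]
r = pearson(poverty, rates)
print(f"correlation between poverty and hospitalization rate: {r:.2f}")
```

A positive value of `r` corresponds to the pattern reported in the study, in which hospitalization rates rise with poverty level.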

Zip codes were classified into quartiles based on the proportion of the population living below the federally defined poverty line. The investigators found the data were less informative as the poverty level increased. The models made the best predictions in the most affluent 25% of zip codes, with poverty levels between 0% and 7.5%, and the worst predictions in the most impoverished 25% of zip codes, with poverty levels between 21.2% and 48.1%. Differences in prediction errors between the upper and lower poverty quartiles were statistically significant (P <.0001).
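The quartile comparison described above can be sketched in Python as follows. This is a minimal illustration, not the investigators' method: the zip code labels and all numbers are invented, the function names are hypothetical, and mean absolute error is assumed as the error measure because the article does not say which one the study used.

```python
def assign_quartiles(poverty_by_zip):
    """Map each zip code to a quartile 1..4 by poverty rank (1 = lowest poverty)."""
    ranked = sorted(poverty_by_zip, key=poverty_by_zip.get)
    n = len(ranked)
    return {z: (i * 4) // n + 1 for i, z in enumerate(ranked)}


def mean_abs_error_by_quartile(quartiles, observed, predicted):
    """Average absolute prediction error within each poverty quartile."""
    totals, counts = {}, {}
    for z, q in quartiles.items():
        totals[q] = totals.get(q, 0.0) + abs(observed[z] - predicted[z])
        counts[q] = counts.get(q, 0) + 1
    return {q: totals[q] / counts[q] for q in sorted(totals)}


# Illustrative assumptions only: poverty fractions and hospitalization
# rates per 1000 people for eight hypothetical zip codes.
poverty = {"A": 0.02, "B": 0.05, "C": 0.09, "D": 0.11,
           "E": 0.15, "F": 0.19, "G": 0.30, "H": 0.45}
observed = {"A": 0.5, "B": 0.6, "C": 0.9, "D": 1.0,
            "E": 1.4, "F": 1.5, "G": 2.8, "H": 3.2}
predicted = {"A": 0.5, "B": 0.7, "C": 1.0, "D": 0.8,
             "E": 1.0, "F": 1.9, "G": 1.8, "H": 2.0}

quartiles = assign_quartiles(poverty)
errors = mean_abs_error_by_quartile(quartiles, observed, predicted)
for q, err in errors.items():
    print(f"poverty quartile {q}: mean absolute error {err:.2f}")
```

In this toy data the error grows from the lowest-poverty quartile to the highest, mirroring the direction of the study's finding. Testing whether such a difference is statistically significant would require a separate step that is not shown here.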

The system designed by the investigators performed well for higher socioeconomic brackets but could not accurately predict flu hospitalization rates for the lowest-income quartile. People living in those neighborhoods had a two to three times higher rate of hospitalization for influenza.

The investigators noted the discrepancy in prediction accuracy was likely due to bias or under-sampling in the data. They said inequalities that reduce access to care also increase data gaps and biases.

The study, “Socioeconomic bias in influenza surveillance,” was published online in the journal PLOS Computational Biology.