Researchers used random forests analysis to gain insight that will be useful in a new questionnaire designed to identify undiagnosed cases of chronic obstructive pulmonary disease
Researchers used "random forests" analysis to gain insight that should be useful in developing a new questionnaire designed to identify undiagnosed cases of chronic obstructive pulmonary disease (COPD). The study, conducted by Nancy K. Leidy, PhD, of Evidera in Bethesda, Maryland, and colleagues, was published in Chronic Obstructive Pulmonary Diseases: Journal of the COPD Foundation in January 2016.
The authors described their work by saying, “The current study was part of a larger multi-method project to develop a practical and effective primary care strategy for identifying undiagnosed patients with clinically significant COPD, defined as FEV1 % predicted < 60%, or at risk of developing exacerbations.”
They examined 3 databases: the COPD Foundation PEF study; the Burden of Obstructive Lung Disease (BOLD), Kentucky site; and COPD Genetic Epidemiology (COPDGene). They looked for categories, types of variables, and attributes that could be useful for identifying undiagnosed COPD.
The random forests method of analysis, say the researchers, “is a machine learning statistical method that uses decision trees to identify and validate variables most important in prediction.”
Using this method avoids some of the problems that arise with conventional model-based analyses.
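To make the idea concrete, here is a minimal sketch of how a random forest can rank variables by importance. It is not the study's analysis: the article does not say which software, settings, or importance measure the researchers used. The sketch assumes mean decrease in Gini impurity as the importance measure, and it runs on synthetic data whose variable names (`smoking`, `symptom`, `noise`) are hypothetical.

```python
import random

def gini(pos, n):
    """Gini impurity of a node holding n samples, pos of them labelled 1."""
    p = pos / n
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Best (gain, feature, threshold) among the candidate features, or None."""
    n, total_pos = len(rows), sum(labels)
    parent = gini(total_pos, n)
    best = None
    for f in features:
        pairs = sorted((r[f], y) for r, y in zip(rows, labels))
        left_pos = 0
        for i in range(1, n):
            left_pos += pairs[i - 1][1]
            if pairs[i][0] == pairs[i - 1][0]:
                continue  # a threshold must separate two distinct values
            children = (i * gini(left_pos, i)
                        + (n - i) * gini(total_pos - left_pos, n - i)) / n
            if best is None or parent - children > best[0]:
                best = (parent - children, f, pairs[i - 1][0])
    return best

def grow_tree(rows, labels, n_try, importance, rng, n_root, depth=0, max_depth=3):
    """Grow one tree, adding each split's weighted impurity decrease to importance."""
    if depth == max_depth or len(set(labels)) < 2:
        return
    # Each node considers only a random subset of the variables.
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, labels, features)
    if split is None or split[0] <= 0:
        return
    gain, f, t = split
    importance[f] += gain * len(rows) / n_root
    for go_left in (True, False):
        idx = [i for i, r in enumerate(rows) if (r[f] <= t) == go_left]
        grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                  n_try, importance, rng, n_root, depth + 1, max_depth)

def forest_importance(rows, labels, n_trees=100, seed=0):
    """Normalized importance per variable, summed over bootstrapped trees."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    n_try = max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                  n_try, importance, rng, n)
    total = sum(importance) or 1.0
    return [v / total for v in importance]

def make_synthetic(n=400, seed=1):
    """Synthetic cases and controls; all three variables are hypothetical."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        smoking = rng.uniform(0, 60)  # stands in for an exposure variable
        symptom = rng.randint(0, 4)   # stands in for a symptom score
        noise = rng.random()          # unrelated to the outcome
        risk = 0.02 * smoking + 0.3 * symptom + rng.gauss(0, 0.3)
        rows.append((smoking, symptom, noise))
        labels.append(1 if risk > 1.2 else 0)
    return rows, labels

if __name__ == "__main__":
    rows, labels = make_synthetic()
    for name, score in zip(("smoking", "symptom", "noise"),
                           forest_importance(rows, labels)):
        print(f"{name}: {score:.3f}")
```

In this toy setup the unrelated variable should receive the lowest score. Screening out weak predictors in this way is the kind of step the researchers describe.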
The researchers used 4 case-control scenarios, which they described as follows: “Scenario 1 was designed to identify variables that best differentiate COPD patients with moderate to severe airflow limitations from those without COPD”;
“The purpose of Scenario 2 was to identify variables that distinguish undiagnosed and diagnosed COPD”;
“Scenario 3 differentiated COPD patients with an exacerbation history and those without exacerbation history”; and
“Scenario 4 identified attributes differentiating patients with an FEV1 <60% or an exacerbation history from all others, including COPD with higher FEV1% predicted and no exacerbation and non-COPD patients.”
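As an illustration only, the Scenario 4 case definition can be written as a simple rule. The `Participant` record and its field names are hypothetical, and the sketch assumes that cases are limited to people with COPD, which the quoted description implies but does not state outright.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """Hypothetical record; the field names are not taken from the study."""
    has_copd: bool
    fev1_pct_predicted: float
    exacerbation_history: bool

def is_scenario4_case(p: Participant) -> bool:
    """True for a Scenario 4 case; everyone else is treated as a control.

    Assumption: a case has COPD plus either FEV1 % predicted below 60
    or a history of exacerbations.
    """
    return p.has_copd and (p.fev1_pct_predicted < 60 or p.exacerbation_history)
```

Under this rule, a COPD patient with an FEV1 of 75% predicted and no exacerbation history counts as a control, as does anyone without COPD.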
The researchers grouped the variables identified across the scenarios into 6 categories: exposure (smoking), personal health history, recent health history, respiratory symptoms, activity limitations, and demographics.
“These analyses attempted to identify the best and smallest set of predictors capable of differentiating cases and controls,” said the researchers. They acknowledged that the results are limited by several factors. However, although several available screening tools include either symptom-related or exacerbation-related questions, few, if any, ask both, and this study suggests that doing so could be useful.