Machine Learning Boosts Accuracy in Identifying Familial Hypercholesterolemia

Machine learning outperformed clinical diagnostic criteria and the recommended screening criteria in the United Kingdom in identifying familial hypercholesterolemia.

Christophe A. T. Stevens, PhD

Credit: Imperial College London

Machine learning can be part of the solution for improving the detection of potential cases of familial hypercholesterolemia, new data suggest. A machine learning model performed better at identifying familial hypercholesterolemia cases than clinical diagnostic criteria and the recommended screening criteria in the United Kingdom.

“Our top‐performing model exhibits superior predictive accuracy compared with FAMCAT, the algorithm currently recommended in primary care settings in England,” wrote investigators, led by Christophe A. T. Stevens, PhD, from the department of primary care and public health at the Imperial Centre for Cardiovascular Disease Prevention at Imperial College London.¹

Approximately 1 in 250 people in the general population in the US and globally have this genetic condition, which causes lifelong exposure to high low-density lipoprotein cholesterol levels, according to the Centers for Disease Control and Prevention (CDC).² Despite its high prevalence, familial hypercholesterolemia is substantially underdiagnosed.¹

Patients with familial hypercholesterolemia have an increased risk of premature atherosclerotic cardiovascular disease—specifically, coronary artery disease. Detecting familial hypercholesterolemia can help patients start effective cholesterol-lowering treatments early to avoid future adverse health outcomes.

However, patients whose condition goes undetected may not be aware of their increased cardiovascular risk and may therefore not take preventive measures. This leaves them more vulnerable to cardiovascular events such as myocardial infarction or stroke. Accurate familial hypercholesterolemia detection can reduce the number of cardiovascular events among this patient population.

Aiming to create a more accurate way to identify carriers of pathogenic familial hypercholesterolemia variants, investigators evaluated whether their novel stacking ensemble machine learning model outperformed clinical diagnostic criteria (signs, history, and biomarkers) and the recommended screening criteria in the United Kingdom. The team drew on UK Biobank, a prospective study of more than 500,000 participants aged 40 to 69 years recruited from 2006 to 2010. The database included information on participants’ lifestyle, environmental influences, genotype, phenotype, and health status. Only participants with whole exome sequencing were included in the analysis.

The team classified patients as having familial hypercholesterolemia when pathogenic or likely pathogenic variants were detected in 1 of 3 genes: low‐density lipoprotein receptor (LDLR), apolipoprotein B (APOB), and proprotein convertase subtilisin/kexin type 9 (PCSK9). The LDLR gene variants were classified based on a simplified version of the familial hypercholesterolemia variant interpretation guidelines from the American College of Medical Genetics and Genomics. Because APOB and PCSK9 variants lack classification guidelines, these were identified through the ClinVar database.

The team designed their models with real-world EHR data that contained detailed clinical signs of familial hypercholesterolemia. They created 3 stacking ensembles. The first used all their initial machine learning models as base learners, the second used the top 5 models in terms of positive predictive value (PPV) as base learners, and the third used the “best” model in terms of PPV for their 7 initial algorithms as base learners. Afterward, they compared the models’ effectiveness with that of clinical diagnostic criteria and FAMCAT screening criteria in identifying patients with a confirmed genetic diagnosis of familial hypercholesterolemia.

In total, 1,003 out of 454,710 participants were identified as familial hypercholesterolemia carriers. These individuals had ≥ 1 pathogenic or likely pathogenic variant detected. A pathogenic variant in the LDLR gene was the most common one in the data set (99.6%).

When examining comorbidities, investigators found no significant differences in traditional cardiovascular disease risk factors, such as hypertension, diabetes, body mass index, and smoking, between individuals with and without familial hypercholesterolemia (FH). However, absolute differences existed between the 2 groups in the prevalence of atherosclerotic cardiovascular disease (ASCVD) (+8.02%), coronary artery disease (CAD) (+9.38%), and peripheral artery disease (+2.4%) (P ≤ .0001).

The stacking ensemble built from the top 5 models in terms of PPV had the best predictive performance in identifying familial hypercholesterolemia variant carriers. This model outperformed clinical diagnostic criteria and the recommended screening criteria, demonstrating 74.93% sensitivity, 0.61% precision, 72.80% accuracy, and an area under the receiver operating characteristic curve of 79.12%. Additionally, the model lowered the number needed to screen compared with the Familial Case Ascertainment Tool (FAMCAT) (164 vs 227).
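
The reported precision and number needed to screen can be related with a little arithmetic. The Python sketch below assumes the number needed to screen is the reciprocal of precision, a common definition that the article does not explicitly confirm; the function and variable names are illustrative, and only the input figures come from the article.

```python
# Figures reported in the article.
CARRIERS = 1003          # participants with a pathogenic or likely pathogenic variant
PARTICIPANTS = 454710    # participants included in the analysis
MODEL_PRECISION = 0.0061 # 0.61% precision (positive predictive value)


def number_needed_to_screen(ppv: float) -> float:
    """Assumed definition: people tested per true carrier found (1 / PPV)."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


# Carrier prevalence in the analyzed cohort.
prevalence = CARRIERS / PARTICIPANTS

# Number needed to screen among people flagged by the model.
nns_model = number_needed_to_screen(MODEL_PRECISION)

# Number needed to screen if participants were tested without any selection.
nns_unselected = number_needed_to_screen(prevalence)

# How much more often a flagged person is a carrier than a random participant.
enrichment = MODEL_PRECISION / prevalence

print(f"Cohort prevalence: {prevalence:.3%}")
print(f"Number needed to screen (model): {nns_model:.0f}")
print(f"Number needed to screen (unselected): {nns_unselected:.0f}")
print(f"Enrichment over prevalence: {enrichment:.2f}x")
```

Under that assumption, a precision of 0.61% corresponds to roughly 164 people tested per carrier found, which matches the figure reported for the model. The low absolute precision reflects how rare carriers are in the cohort rather than a weak model.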

“Application of our [machine learning] algorithm would reduce the [number needed to screen] for genetic confirmation by about one‐quarter, which would reduce unnecessary referrals to lipid clinics and potentially unnecessary genetic testing, a relevant consideration at a time of increasing health care workload and ever restricted health care budgets,” investigators wrote.
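
The “about one‐quarter” figure in the quote can be checked against the reported numbers needed to screen. This is a minimal sketch using only the 2 values given in the article (164 for the model, 227 for FAMCAT); the function name is illustrative.

```python
# Numbers needed to screen reported in the article.
NNS_MODEL = 164   # machine learning model
NNS_FAMCAT = 227  # Familial Case Ascertainment Tool


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


reduction = relative_reduction(NNS_FAMCAT, NNS_MODEL)
print(f"Fewer people screened per carrier found: {NNS_FAMCAT - NNS_MODEL}")
print(f"Relative reduction vs FAMCAT: {reduction:.1%}")
```

The result is a reduction of 63 people screened per carrier found, or a little under 28%, consistent with the investigators’ description.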

The investigators’ model also showed better calibration than FAMCAT (Hosmer–Lemeshow test: P=.015 vs <2.2e‐16; Spiegelhalter's z‐test: P=.001 vs <2.2e‐16).

“Integration of [machine learning]‐derived models into electronic health records for prioritizing genetic confirmation of FH offers a more effective and efficient approach in finding FH in adult populations in general populations and diverse clinical settings,” investigators wrote. “Implementing [machine learning] screening criteria may enhance early identification and management, potentially reducing acute myocardial infarctions, revascularizations, and cardiovascular death resulting from undetected cases.”

References

1. Stevens CAT, Vallejo-Vaz AJ, Chora JR, et al. Improving the Detection of Potential Cases of Familial Hypercholesterolemia: Could Machine Learning Be Part of the Solution? J Am Heart Assoc. 2024;13(12):e034434. doi:10.1161/JAHA.123.034434
2. How Common is Familial Hypercholesterolemia? Centers for Disease Control and Prevention. https://blogs.cdc.gov/genomics/2021/01/25/how-common-is-fh/. Accessed June 24, 2024.