A Machine Learning Approach to Identify Distinct Subgroups of Veterans at Risk for Hospitalization or Death Using Administrative and Electronic Health Record Data

Authors

Ravi B. Parikh

Kristin A. Linn

Jiali Yan

Matthew L. Maciejewski

Ann-Marie Rosland

Kevin G. Volpp

Peter W. Groeneveld

Amol S. Navathe

Source

PLOS ONE https://journals.plos.org/plosone/

Peer-Reviewed Article

February 2021

Headline

Use of machine learning clustering algorithms revealed 30 distinct subgroups of patients among high-risk veterans, indicating a need for tailored approaches to health care.

Context

The Veterans Health Administration developed the Care Assessment Needs (CAN) score to predict the risk of future hospitalization or mortality for all veterans who receive primary care. Interventions applied broadly across veterans, such as telemedicine or case management, have had limited effectiveness in improving care for individuals with high CAN scores. Because approaches to subgrouping high-risk patients that rely on diagnosis/disease criteria or expert opinion are prone to human error, this study examined whether machine learning algorithms could identify distinct subpopulations of individuals at high risk for hospitalization or mortality using electronic health record and administrative data.

Findings

Within a national random sample of 110,000 veterans with high CAN scores, this study identified 30 unique subgroups that varied widely in characteristics, health care utilization, and mortality risk. Though most subgroups were defined by chronic conditions, the analysis identified several subgroups based on non-clinical factors, including sociodemographic (e.g., Medicaid, Hispanic ethnicity) and psychobehavioral (e.g., specific polysubstance use, psychoses without polysubstance use) characteristics. About 25 percent of the high-risk veterans did not fit clearly into one subgroup, either because their characteristics spanned multiple subgroups or because relevant characteristics were not captured.
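
To make the idea of data-driven subgrouping concrete, the following is a minimal sketch, not the study's actual method. It assumes a plain k-means clustering over a handful of hypothetical yes/no patient indicators, and it leaves a patient unassigned when no subgroup center is close enough, loosely mirroring the veterans who did not fit clearly into one subgroup. The feature names, the toy records, and the `max_dist` cutoff are all illustrative assumptions.

```python
from math import dist

# Hypothetical indicators per patient (assumed for illustration only):
# (heart_failure, diabetes, substance_use, medicaid)
patients = [
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),   # mostly chronic conditions
    (0, 0, 1, 1), (0, 0, 1, 1), (0, 0, 1, 0),   # mostly non-clinical factors
    (1, 0, 1, 0),                               # traits from both patterns
]

def farthest_point_init(rows, k):
    """Deterministic start: first row, then the row farthest from chosen centers."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(dist(r, c) for c in centers)))
    return centers

def kmeans(rows, k, iters=10):
    """Plain k-means: alternate nearest-center assignment and mean updates."""
    centers = [tuple(map(float, c)) for c in farthest_point_init(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda i: dist(r, centers[i]))
            groups[nearest].append(r)
        # Keep the old center if a group ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def assign(rows, centers, max_dist):
    """Label each row with its nearest center, or None if it is too far away."""
    labels = []
    for r in rows:
        nearest = min(range(len(centers)), key=lambda i: dist(r, centers[i]))
        labels.append(nearest if dist(r, centers[nearest]) <= max_dist else None)
    return labels

centers = kmeans(patients, k=2)
labels = assign(patients, centers, max_dist=0.8)  # cutoff is an arbitrary assumption
print(labels)
```

In this toy data, the last patient combines traits from both patterns and sits farther from either center than the cutoff allows, so it is left unassigned. A real analysis would involve far more variables, a principled choice of the number of subgroups, and clinical review of the resulting groups.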

Takeaways

Care management programs for high-risk patients should be tailored to the distinct needs of the population. The clustering methods described in this article can guide health care organizations in using both clinical and administrative data to identify unique needs and inform appropriate interventions for targeted subgroups.

Posted to The Playbook on

April 8, 2021

Topics

Population Identification

Level of Evidence

Moderate
