Machine Learning Model Accurately Predicts Long-Term Risk of Type 2 Diabetes
Researchers have unveiled a new electronic health record-based prediction model designed to identify individuals at the highest risk for developing type 2 diabetes up to 10 years in advance. The findings were presented during the 2026 Scientific Sessions of the American Diabetes Association in New Orleans.
The tool arrives as more than 60% of U.S. adults have risk factors for type 2 diabetes, a number that exceeds the capacity of current prevention programs. Because the condition often develops gradually without clear warning signs, healthcare professionals frequently struggle to identify those who would benefit most from early treatment.
A Massive Data-Driven Approach
The retrospective cohort study analyzed 3,365,464 adults between the ages of 18 and 70 who received care at Kaiser Permanente Northern California from 2012 to 2024. The study group had a median age of 39, and 55% of the patients were female.

To estimate risk over one-, three-, and 10-year intervals, researchers used a hazard-based super learning approach. This method combines multiple survival-analysis models into a single prediction, allowing it to draw on a wide array of data points.
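The core idea of combining several models can be sketched in a few lines. The example below is a simplified illustration, not the study's method: it blends the held-out risk predictions of two hypothetical candidate models using non-negative weights that sum to 1, and picks the weights with the lowest Brier score. The study's approach is hazard-based and works with survival models; this sketch ignores censoring and follow-up time, and all names and numbers are made up for illustration.

```python
from itertools import product

def brier_score(pred, outcome):
    """Mean squared difference between predicted risks and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(pred, outcome)) / len(outcome)

def convex_weights(n_models, ticks=10):
    """Yield every non-negative weight vector on a grid that sums to 1."""
    for combo in product(range(ticks + 1), repeat=n_models):
        if sum(combo) == ticks:
            yield tuple(c / ticks for c in combo)

def blend(weights, candidate_preds):
    """Weighted average of the candidates' predictions, person by person."""
    return [sum(w * p for w, p in zip(weights, person))
            for person in zip(*candidate_preds)]

def fit_ensemble(candidate_preds, outcome, ticks=10):
    """Return the grid weights with the lowest Brier score on held-out data."""
    return min(convex_weights(len(candidate_preds), ticks),
               key=lambda w: brier_score(blend(w, candidate_preds), outcome))

# Made-up held-out predictions from two hypothetical candidate models.
model_a = [0.10, 0.20, 0.70, 0.60, 0.30, 0.90]
model_b = [0.05, 0.40, 0.50, 0.80, 0.10, 0.60]
developed_diabetes = [0, 0, 1, 1, 0, 1]

weights = fit_ensemble([model_a, model_b], developed_diabetes)
ensemble_loss = brier_score(blend(weights, [model_a, model_b]), developed_diabetes)
print(weights, round(ensemble_loss, 4))
```

Because the weight grid includes putting all weight on a single model, the blended prediction can never score worse on this data than the best individual candidate.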
The model relied on clinical and demographic information routinely gathered during medical visits, such as weight, age, medications, medical history, and blood glucose levels. It also incorporated public data on walkable areas and access to healthy food.
Measuring Precision and Performance
During a median follow-up of 5.4 years, the study recorded a type 2 diabetes incidence of 10.7 per 1,000 person-years. The model achieved an area under the curve (AUC) of 0.886 in the training data and 0.883 in the validation data.
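For readers unfamiliar with these two measures, the short sketch below shows how each is typically calculated. The counts and scores are illustrative only and are not the study's data: the event and person-year totals were chosen simply to reproduce a rate of 10.7, and the AUC function is the plain binary version, which does not account for follow-up time the way a survival analysis would.

```python
def incidence_per_1000(events, person_years):
    """New cases per 1,000 person-years of follow-up."""
    return 1000 * events / person_years

def auc(scores, labels):
    """Chance that a random case is scored above a random non-case (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))

# Illustrative counts: 107 new cases over 10,000 person-years.
rate = incidence_per_1000(107, 10_000)

# Illustrative predicted risks and outcomes (1 = developed diabetes).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)

print(rate, toy_auc)
```

An AUC of 0.5 corresponds to guessing at random and 1.0 to perfect ranking, so values near 0.88 indicate that the model usually ranks people who go on to develop diabetes above those who do not.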
Using a predicted risk greater than 1.2% to define high risk, the model demonstrated a sensitivity of 74% and a specificity of 82% over a 10-year follow-up period. One-year follow-up data showed near-ideal calibration, with a mean predicted risk of 1.03% compared with an observed risk of 1.01%.
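The sketch below illustrates how sensitivity, specificity, and a simple calibration check are computed. The 1.2% cutoff and the 1.03% and 1.01% figures come from the reported results; the individual risks and outcomes are made up for illustration, and the function names are hypothetical.

```python
def sensitivity_specificity(risks, outcomes, threshold):
    """Flag risk > threshold as high risk and compare with what actually happened."""
    pairs = list(zip(risks, outcomes))
    tp = sum(1 for r, y in pairs if y == 1 and r > threshold)   # cases flagged
    fn = sum(1 for r, y in pairs if y == 1 and r <= threshold)  # cases missed
    tn = sum(1 for r, y in pairs if y == 0 and r <= threshold)  # non-cases cleared
    fp = sum(1 for r, y in pairs if y == 0 and r > threshold)   # non-cases flagged
    return tp / (tp + fn), tn / (tn + fp)

def calibration_ratio(mean_predicted, observed):
    """Mean predicted risk divided by observed risk; 1.0 means perfect agreement."""
    return mean_predicted / observed

HIGH_RISK_THRESHOLD = 0.012  # 1.2%, the cutoff quoted in the article

# Made-up predicted risks and outcomes (1 = developed diabetes).
toy_risks = [0.005, 0.020, 0.030, 0.008, 0.015, 0.001]
toy_outcomes = [0, 1, 1, 1, 0, 0]
sens, spec = sensitivity_specificity(toy_risks, toy_outcomes, HIGH_RISK_THRESHOLD)

# Reported one-year figures: 1.03% predicted vs. 1.01% observed.
one_year_ratio = calibration_ratio(1.03, 1.01)

print(round(sens, 3), round(spec, 3), round(one_year_ratio, 3))
```

With the reported one-year figures, the ratio is about 1.02, meaning the model overestimated average risk by roughly 2% in relative terms.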
The Path Toward Proactive Prevention
“These findings represent a potential advancement over existing approaches for identifying individuals at risk of developing type 2 diabetes by enabling earlier, more precise detection and supporting a more targeted, proactive approach to prevention,” said lead author Luis A. Rodriguez, PhD, MPH, RD.

Dr. Rodriguez noted that the model could allow clinicians and health systems to focus their resources on high-risk individuals who have the most to gain from treatment and prevention.
As a possible next step, the authors intend to test the model within a clinical setting. This phase may determine if the tool can successfully increase engagement in prevention programs and lead to a reduction in diabetes incidence.
For more information on fighting diabetes, visit diabetes.org.
Frequently Asked Questions
Who was included in the research study?
The study included 3,365,464 adults aged 18–70 who received care at Kaiser Permanente Northern California between 2012 and 2024.
What types of data did the prediction model use?
The model used clinical and demographic data from medical visits—including weight, age, blood glucose levels, medications, and medical history—as well as public data on walkable areas and access to healthy food.
What is the ultimate goal of testing this model in a clinical setting?
Researchers aim to see if the model helps increase participation in type 2 diabetes prevention programs and reduces the incidence of the disease.