NIH scientists use machine learning AI to narrow down risk factors for long COVID

Since the early months of the COVID-19 pandemic, doctors and scientists have been mystified by what's come to be known as long-haul COVID, or simply long COVID, in which individuals experience symptoms that last for weeks or months after the initial coronavirus infection has passed.

Long COVID affects people of all ages, races and ethnicities; those with and without preexisting health conditions; and both vaccinated and unvaccinated individuals. Its long list of symptoms includes lasting fatigue, “brain fog,” headaches and chest pain, to name just a few.

But while it may seem that there’s no clear rhyme or reason to long COVID diagnoses, researchers from the National Institutes of Health (NIH) are determined to suss out a pattern that could shed at least a little light on the factors that put a person at higher risk of developing the condition.

They’ve turned to machine learning, a branch of artificial intelligence, for help, and, according to a study published in The Lancet Digital Health this week, they’ve already had some success in narrowing down the risk factors for long COVID.

In the study, the researchers pulled data from the electronic health records (EHRs) of nearly 100,000 adults who had tested positive for the virus, whether or not they were hospitalized, including almost 600 who had been diagnosed with long COVID and treated at a long COVID clinic.

Using information about the patients’ demographics, healthcare utilization, diagnoses and medications, the NIH-backed team trained three machine learning models to identify the data points that distinguish long-haulers from COVID-19 patients who did not go on to develop the condition.

With that training, the AI was able to sift through a larger database of de-identified EHRs representing nearly 5 million people who had tested positive for COVID. In records through October 2021, the models flagged more than 100,000 people who had many of the risk factors and symptoms of long COVID; the researchers estimate that number has since doubled.

“Once you’re able to determine who has long COVID in a large database of people, you can begin to ask questions about those people,” said Josh Fessel, M.D., Ph.D., a scientific program lead in the NIH’s Researching COVID to Enhance Recovery (RECOVER) initiative. “Was there something different about those people before they developed long COVID? Did they have certain risk factors? Was there something about how they were treated during acute COVID that might have increased or decreased their risk for long COVID?”

Indeed, in its analysis, the AI pinpointed several factors that appear to carry the most weight in predicting whether a COVID-positive person will go on to develop long COVID. They include respiratory and non-respiratory symptoms, such as sleep disorders, chest pain and malaise, that persist after the acute infection has passed, as well as preexisting conditions that make acute COVID infections more severe, including diabetes, chronic kidney disease and chronic pulmonary disease.

The models also found that people who received a COVID vaccine after recovering from the infection were less likely to be flagged as potential long COVID patients.

“This result is noteworthy and indicates that not only does vaccination against SARS-CoV-2 protect against hospitalization and death, but that it might also protect against long COVID,” the study’s authors wrote.

Moving forward, the researchers said they plan to keep training the AI models on more patient data to improve their accuracy. The refined models could then be used to identify long COVID patients for recruitment into clinical trials.