To help drugs withstand the 'white hot heat' of human trials, scientists model the preclinic after the clinic

It’s no secret that most drugs that work in mice fail to work in men (and women). Yet thousands of animal lives and taxpayer dollars are spent on preclinical research that, in the end, offers little more than false hope. A new method for conducting preclinical studies could help change that.

In an article published Sept. 18 in Science Translational Medicine, a research team from the Keck School of Medicine at the University of Southern California (USC) described the outcome of a National Institutes of Health (NIH)-sponsored multilaboratory preclinical trial for six potential stroke treatments—only one of which, uric acid, was found to be effective when combined with surgery. The trial, dubbed Stroke Preclinical Assessment Network, or SPAN, could ultimately serve as a proof-of-concept for improving translational research overall.

“It’s done nicely,” Malcolm Macleod, Ph.D., a neuroscientist specializing in translational research who was not involved in SPAN, told Fierce Biotech Research in an interview. “Their findings, I think, demonstrate the power of the approach, because five out of the six compounds didn’t really do anything in spite of the preclinical data from other individual experiments looking promising.”

The problem in translational research is one facet of a broader “replication crisis” in science, referred to interchangeably as the reproducibility crisis. When scientists from a different lab run the same experiments that were conducted in a study, they often—or even most of the time, by some estimates—fail to replicate the same results. That implies that many researchers are using faulty methods, at best. At worst, it suggests outright fraud.

“There’s kind of a general Gestalt concern shared by the whole community, because we’ve seen so many poster children from the lab fall over in the white hot heat of human clinical trials,” Macleod said.

Scientists began raising concerns about reproducibility as early as the 1960s, but it wasn’t until the early 2000s that debates began in earnest. Pre-2020, the conversation was largely limited to the scientific community, with some public attention piqued by high-profile controversies.

“Amongst ourselves, we all thought it had gotten to be a crisis,” Patrick Lyden, M.D., a physician and stroke researcher who led the SPAN study, recalled to Fierce Biotech Research.

Then along came COVID and a twin pandemic of misinformation. Suddenly, the reproducibility crisis was tinder for the fires of conspiracy theorists and snake oil salesmen who sought to discredit scientists and physicians for their own gain, pointing to the science world’s long history of irreplicable studies and failed medicines as proof that it couldn’t be trusted.

“Now you’ve got anti-vaccine people, you’ve got Q Anon, you’ve got bleach,” Lyden said. “It just became hypercritical that we identify a pathway where science can proceed with extraordinary confidence that we have a result that we believe in, that we can trust and that we can go forward with.”

Rigor and transparency

That was particularly true in Lyden’s own specialty. Dozens of potential stroke treatments have failed in patients despite strong preclinical support for their efficacy. Lyden himself has witnessed this firsthand as a principal investigator for clinical trials of all sizes, including large ones sponsored by the NIH.

That experience, coupled with his preclinical research, made Lyden’s lab at USC an ideal coordinating center for SPAN. His team designed the protocols and set up a network of six trial sites across the U.S. just as they would a clinical trial for humans, “enrolling” more than 2,600 mice and rats.

“The intent was to bring to science some of the hard-earned, hard-learned lessons from clinical trials,” Lyden said. In this case, that meant addressing rigor and transparency.

“Everywhere that there could be unconscious bias—or overt, fraudulent bias—we eliminated it,” he added.

Take concealment and randomization, for instance. Often, the “random” aspect of preclinical studies comes from a researcher taking a mouse out of a box without knowing its identity. But while the scientist may not know which mouse they’re using, they do know whether they’re giving it a treatment or a placebo, Lyden explained.

“Instead of that, we created this very elaborate and complex system where our six study sites were required to purchase their animals and have them delivered to the lab,” he said. Each of the animals received an ear tag, then was entered into a database controlled by the coordinating center. Meanwhile, the five medicines were delivered to the centers in the form of powders in amber vials marked only with a number. (The sixth treatment, a medical device, couldn’t be disguised.)

Forty-eight hours before a treatment, the surgeon performing the experiments at the trial site had to contact the clinical coordinators and tell them which rodent was scheduled for treatment. The researchers then randomized whether the animal would receive an active or placebo treatment and gave the surgeon the number on the vial to use.

“Now, the surgeon has absolutely no way of influencing consciously or unconsciously how the stroke gets done, how the drug is administered and what they do with that animal during surgery,” Lyden explained.
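The concealment workflow described above can be sketched as a small allocation service run by the coordinating center. This is a hypothetical illustration of the information flow only; the class and method names are invented, and the paper’s actual database and vial-coding scheme aren’t specified here:

```python
import random

# Hypothetical sketch of SPAN-style concealed randomization: the coordinating
# center holds the only mapping between numbered vials and their contents,
# so the surgeon never learns whether a vial holds active drug or placebo.

class CoordinatingCenter:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.animals = {}        # ear_tag -> study site
        self.vial_contents = {}  # vial_number -> "active" or "placebo"
        self.next_vial = 1

    def register_animal(self, ear_tag, site):
        """Sites enter each delivered, ear-tagged animal into the database."""
        self.animals[ear_tag] = site

    def request_allocation(self, ear_tag):
        """Called ~48 hours before surgery. Returns only a vial number;
        the arm assignment stays concealed at the coordinating center."""
        if ear_tag not in self.animals:
            raise KeyError(f"animal {ear_tag} was never registered")
        arm = self.rng.choice(["active", "placebo"])
        vial = self.next_vial
        self.next_vial += 1
        self.vial_contents[vial] = arm
        return vial  # the surgeon sees a number, not the arm

center = CoordinatingCenter(seed=42)
center.register_animal("R-1017", site="Site 3")
vial = center.request_allocation("R-1017")
```

In a real trial the allocation would be balanced (for example, permuted blocks within each site) rather than a simple coin flip; the sketch only illustrates why the surgeon cannot influence, or even know, the assignment.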

Besides making the randomization and concealment processes more rigorous, the researchers attempted to improve the models themselves. Many preclinical studies on drugs show benefits in young, otherwise healthy animals that don’t reflect the most likely consumers of the end product, Lyden said.

“This is one of those tropes that at scientific meetings people are always yelping about, but no one ever changes,” he said. There are practical reasons for this: Younger animals live longer, giving more time to carry out the study. Males don’t have estrous cycles, so there’s no need to consider the effects of hormonal shifts the way there is in females.

“Well, now our drug is only going to work in teenage boys,” Lyden mused. “And, you know, I have had a few stroke patients that were teenage boys, but only a few.”

To make the trial more representative, the researchers used older models with co-morbidities like diabetes and hypertension—a closer resemblance to the average stroke patient. Half of them were female.

“There’s the limitation that these models are not really perfect mimics of human hypertension or human diabetes, or even human aging,” Lyden said. “But at least we're trying to make an attempt to demonstrate that the drug shows efficacy in models that have the type of diseases humans have when they get their stroke.”

The scientists also saw an opportunity to raise the threshold for what constituted an effective treatment—and to add efficiency. In a traditional clinical trial, a drug is studied all the way to the end of a phase, then the data are analyzed. But in SPAN, the researchers were studying six drugs at once.

“If one of those six is futile, it would be nice to know that early on and exclude it so you don’t keep studying it,” Lyden said. “By that same token, if one of them is a winner, and that shows early on, you also don’t want to be studying it to the end.”

To speed up the elimination and affirmation process, Cedars-Sinai Medical Center statistician Márcio A. Diniz, Ph.D., developed a “multi-stage, multi-arm” method of analyzing the data, with interim analyses set at 25%, 50%, 75% and 100% of the predicted sample size. At each of those points, the researchers analyzed the data on primary and secondary endpoints and decided whether the treatment could be deemed “efficacious”—that is, whether it showed a benefit in at least 50% of the subjects—or “futile,” showing less than a 15% treatment benefit. The treatment would then either go on to more animals or be discontinued.

These thresholds are relatively high compared to the ones set in clinical trials, Lyden said. Stroke trials, for instance, are typically designed to detect a 7% treatment benefit, a figure derived from the threshold physicians need to see before they’re willing to prescribe a drug, he explained.

“We set it a bit stringently because we’re all tired of finding things that don’t work,” Lyden said. 

Strengthening SPAN

Those thresholds worked well when it came to eliminating treatments from the list. Only one, uric acid, made it beyond the third stage, where the drug was given to 75% of the treatment population. The rest were discontinued, despite prior evidence for their effectiveness. Even for uric acid, the researchers noted, more studies are warranted before it’s tested in humans: while most animals passed the primary endpoint, the corner test, a common assessment in stroke studies, they didn’t meet the secondary endpoint of gait improvement.

SPAN doesn’t tackle all the problems with translational research. One looming challenge for the field is whether animal models, especially rodents, can ever really reflect what happens in humans. Some researchers say it’s impossible, for reasons ranging from inherent species differences to the ethical dilemmas raised by making models more accurately reflect human disease. Older and sicker mice, for instance, have to suffer longer.

“SPAN does not address the overarching question of how suitable animal models are for modeling human research,” Lyden said. “That’s a question that challenges everybody—cancer, heart, pulmonary, everybody struggles with that one.”

From Macleod’s perspective, there are ways to make SPAN stronger. The first is to make it a long-term, systematic means of assessing potential treatments.

That same systematic approach to testing drugs should be coupled with one for selecting them for the program in the first place, he said. The treatments in this case were chosen after “independent, rigorous peer review by the NIH,” according to the paper, but the criteria weren’t published.

Together, those changes could make it so that the most promising drugs get into the clinic faster, with fewer duds along the way. As SPAN stands now, Macleod said, “I think it’s a very important step on the way, but I don’t think it’s the final road-ready engine for transforming stroke research that I think we might need.”

Down the road

Improvements aside, weeding out drugs before they get to critical, expensive points in human clinical trials could translate to major savings for the NIH, Macleod said. By his estimates, it would have cost around $300 million to take all six of the treatments tested into clinical trials. The budget for SPAN was $500,000 per year, according to NIH documents.

To Lyden, it’s not yet exactly clear where SPAN will fit in the pipeline from mice to men. A couple of the drugs in the study were “nowhere near” ready for use in humans, as he put it. To avoid wasting resources in the future, perhaps the program should be implemented before the start of phase 2b or phase 3, with proof of concept and target engagement already established and human safety data known.

That may be more clear when SPAN 2.0 is on the books. Preliminary prep work for the study is underway now, with five new treatments and Lyden and his team again at the helm. The researchers are also in the process of posting all of their protocols online, so similar preclinical trial networks can be carried out for other disease areas. Oncology, Alzheimer’s disease and cardiac disease could all stand to benefit, Lyden said.

“For any fields that have a pretty high failure rate between preclinical and clinical translation, this is one tool to think about,” he said.