“New Hype about Toxicity Prediction: Myth or Reality?”—Takeaways from the “Mouse vs. Machine” Debate


At the “From the Pages of ToxSci: Mouse vs. Machine … Are Animal Studies Being Supplanted by Computers?” Toxicological Sciences Featured Session during the 2019 SOT Annual Meeting and ToxExpo, top scientists from around the country came together to discuss the current state of computational toxicology and whether it will lead to a paradigm change in toxicological research. The big question under discussion was: Will computer models permanently replace animal studies?

Most of the discussion focused on a recently published paper in Toxicological Sciences (Luechtefeld et al. 2018) in which Dr. Thomas Hartung and his research team from Johns Hopkins University in Baltimore, Maryland, reported an algorithm that can predict toxicity for thousands of chemicals. The first presenter was Dr. Hartung, who discussed key points in the paper, which showed that the team’s machine learning software was better at predicting chemical toxicity than research using animal models. Dr. Hartung excitedly ended his short presentation by stating: “Artificial intelligence machine learning will be a new contribution to safety science.”

It wasn’t surprising that Dr. Hartung’s paper gained so much attention and positive publicity: it was covered the same day by Nature, Science, and 140 other press outlets. But it also ignited enormous controversy among other toxicologists who questioned the validity and integrity of Dr. Hartung’s study, topics that were addressed thoroughly during the Annual Meeting debate.

Dr. Ivan Rusyn from Texas A&M University in College Station, Texas, agreed that in the future we will use less animal research and more computational studies. However, Dr. Rusyn directly stated that Dr. Hartung’s paper made hyperbolic and provocative claims. Dr. Rusyn had many reservations about the paper, claiming, among other things, that the model validation and the curation of the data weren’t clearly explained. “Testing the accuracy of the predictions can’t be done in the context of reproducibility of the animal test,” said Dr. Rusyn, explaining that these are very different concepts.

Dr. Rusyn mentioned that there’s a lot more to this debate than just taking a reductionist perspective and picking sides between mouse versus machine. “Regulators and decision-makers need a safety value rather than just stating that something is black or white,” said Dr. Rusyn, who pointed out that machine learning software gives you a number and not a classification. Indeed, this is not enough information in the context of chemical safety and risk assessment. Dr. Rusyn summed up his presentation with a quote from the EliteDataScience website: “Better data beats fancier algorithms.” According to Dr. Rusyn, what is most important to your outcome is the quality of the information.

Another reason why computational approaches in toxicology are a contentious issue came up briefly, and humorously, in Dr. Rusyn’s presentation: computers might replace toxicologists’ jobs, much as machines and automation replaced human workers during the Industrial Revolution.

The third speaker was Dr. Nicole Kleinstreuer from the National Institute of Environmental Health Sciences (NIEHS) NTP Interagency Center for the Evaluation of Alternative Toxicological Methods in Research Triangle Park, North Carolina. For Dr. Kleinstreuer, the most important need in toxicology (and in all biomedical research) is to move beyond animal models as our basis of comparison and instead shift toward a new gold standard: human biology. “There is a reproducibility crisis in animal testing,” said Dr. Kleinstreuer, and the numbers are terrifying: up to 70% of research could not be reproduced by other scientists. Dr. Kleinstreuer explained that policy and regulatory decision-makers have used animal experimentation as the gold standard for decades. “Animal studies are our benchmark for new alternative approaches,” said Dr. Kleinstreuer. “You can’t necessarily expect a predictive model to outperform the ability of the animal tests to predict itself. So, it’s important to set appropriate expectations for the performance of these new [computational] approaches when we are using animal data as our benchmark,” she added.

But Dr. Kleinstreuer thinks that we need something much bigger than conventional testing—a veritable paradigm shift in science. “We need to evolve past trying to predict cancer in a rat, trying to predict cardiotoxicity in a dog and actually trying to predict human toxicity as an endpoint and gold standard,” she said.

The last speaker was Dr. Alison Harrill from the NIEHS National Toxicology Program in Research Triangle Park, North Carolina. Dr. Harrill took a different approach than the previous speakers and discussed the ethical considerations around artificial intelligence and machine learning approaches. “I think that there is a tendency to assume that since we are moving away from animal studies to more computational approaches, all of the ethical concerns go away. But that’s not the case,” said Dr. Harrill. “Artificial intelligence are opinions embedded in code,” she added. Indeed, a poorly designed algorithm will generate weak data. Dr. Harrill ended her talk with a quote from a book titled Weapons of Math Destruction written by Cathy O’Neil stating: “Artificial intelligence are more like training a puppy-machine-creature we don’t really understand or control.”

During the Q&A, the audience posed provocative questions to the panelists. An attendee, Mr. Thomas Luechtefeld, commented: “I would argue that there is an ethical concern about NOT using these models [artificial intelligence]. We certainly should and we would save a lot of animal and human lives,” Mr. Luechtefeld said. And he is right: we now have a lot of data. Thousands of models are being published, millions of data points are being generated, and thousands of datasets are being produced through computational approaches. Mr. Luechtefeld asked the panelists: “How are we going to create a structure for use and give value for these data points?”

The panelists responded with “good question!” followed by shared laughter—yet overall, the question was vaguely answered. Most of the answers focused on the challenges of machine learning software approaches, with a particular focus on deconstructing Dr. Hartung’s study published in ToxSci.

But it might be important to take a few steps back, look at the “bigger picture” of where toxicology is heading, and talk about the critical issues that Dr. Hartung, Dr. Kleinstreuer, and Mr. Luechtefeld addressed. One of the biggest goals in science is to reduce the number of animals (and humans) used for research. This goal stems not just from an ethical standpoint, as mentioned by Dr. Hartung and Mr. Luechtefeld, but also from a financial one: animal studies cost billions of dollars per year in biomedical research, and a big chunk of that budget is spent on toxicological studies.

Next, we have the reproducibility crisis mentioned by Dr. Kleinstreuer, which has been broadly detrimental to scientific progress. Fifty percent of scientists report that they aren’t even able to reproduce data generated in their own lab. So, what are some ways that computational research can be designed to preempt reproducibility issues? Dr. Harrill emphasized that the code needs to be amenable to us. Open-source code is absolutely crucial for reproducibility and transparency. Closed-source code will make reproducibility incredibly difficult between various groups.

Lastly, Dr. Kleinstreuer raised a point about moving beyond animal data and focusing on human biology. The biggest financial consequences of using animal models as a research paradigm are seen in the billion-dollar process of drug development, where a drug can work wonderfully in animal models but show serious adverse effects in humans. On the other hand, Dr. Hartung briefly mentioned advancing in vitro assays and tissue-on-a-chip technologies, which, when combined with computational technology, might significantly reduce animal testing. The points raised by Dr. Kleinstreuer and Dr. Hartung seem to reflect ideologically contrasting perspectives.

So, in what direction is toxicological research heading (human versus animal versus machine)? The answer is most certainly not black and white: each has its own merits and complications, as mentioned by most panelists. Should we abandon animal research in favor of computational models or vice versa? Absolutely not. Dr. Harrill emphasized the importance of integrating these approaches. The big question, then, is: How can these seemingly diametrically opposed approaches be reconciled and used in a concerted effort by the greater scientific community?

Author’s Notes: Some of the career panelist quotes were shortened in order to deliver the message in a concise manner. The title “New Hype about Toxicity Prediction: Myth or Reality?” was originally used in Dr. Ivan Rusyn’s presentation during the Toxicological Sciences Featured Session debate “From the Pages of ToxSci: Mouse vs. Machine … Are Animal Studies Being Supplanted by Computers?”

Editor’s Note: This blog was prepared by an SOT Reporter. SOT Reporters are SOT members who volunteer to write about sessions and events they attend during the SOT Annual Meeting and ToxExpo. If you are interested in participating in the SOT Reporter program in the future, please email Giuliana Macaluso.
