Clarkson University Fall 2024 David A. Walsh ’67 Arts & Sciences Seminar Series
Friday, October 18th at 12pm in SN 213
Active learning in drug discovery
When trying to build a fitted (i.e., machine-learned) mathematical model of some scientific phenomenon, more data is better, but not all data is equally useful. In drug discovery (and in many other domains), we computational scientists and laboratory scientists often find ourselves in an “active learning” loop:
(1) Acquire experimental data in the lab and add it to our pool of labeled data
(2) Train a model on our data
(3) Using our existing data and model, decide which data to acquire next
(Repeat)
The active learning problem is about deciding which data to acquire next in order to improve the performance of our model as rapidly as possible. Doing this well gets us the same performance with fewer rounds in the laboratory, which can save a great deal of time and money.
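As a rough illustration of the loop above, here is a minimal toy sketch in Python. It is not the method presented in the talk: the `oracle` function (a stand-in for a lab experiment), the one-dimensional threshold model, the hidden boundary at 0.62, and the pool of 1001 candidates are all assumptions made up for this example. The "uncertainty" strategy asks for the label of the candidate closest to the model's current decision boundary, and is compared against picking candidates at random.

```python
import random

TRUE_BOUNDARY = 0.62  # hidden ground truth (assumed for this toy example)

def oracle(x):
    """Hypothetical stand-in for a lab experiment: returns the true label."""
    return int(x > TRUE_BOUNDARY)

def fit_threshold(labeled):
    """Step (2): fit a 1-D threshold model to the labeled data.

    The threshold is the midpoint between the largest point labeled 0
    and the smallest point labeled 1.
    """
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2

def run(strategy, rounds=8, seed=0):
    """Run the active learning loop and return the final threshold error."""
    rng = random.Random(seed)
    pool = [i / 1000 for i in range(1, 1000)]  # unlabeled candidates
    labeled = [(0.0, oracle(0.0)), (1.0, oracle(1.0))]  # initial labeled data
    for _ in range(rounds):
        threshold = fit_threshold(labeled)  # (2) train a model
        if strategy == "uncertainty":
            # (3) choose the candidate the model is least sure about:
            # the one nearest the current decision boundary
            x = min(pool, key=lambda p: abs(p - threshold))
        else:
            x = rng.choice(pool)  # baseline: choose at random
        pool.remove(x)
        labeled.append((x, oracle(x)))  # (1) acquire the label "in the lab"
    return abs(fit_threshold(labeled) - TRUE_BOUNDARY)

if __name__ == "__main__":
    print("uncertainty sampling error:", run("uncertainty"))
    random_errors = [run("random", seed=s) for s in range(50)]
    print("mean random sampling error:", sum(random_errors) / len(random_errors))
```

In this toy setting the uncertainty strategy behaves like a bisection search, so each round roughly halves the interval that can contain the boundary, while random acquisition spends most of its experiments on points the model could already classify.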
I will describe the problem using examples from pharma R&D (including protein structure prediction!), and discuss various strategies for solving it, including new state-of-the-art strategies we have developed at Sanofi. I will also discuss some of the mathematical underpinnings of the problem, including the role of entropy in measuring uncertainty (this connection is one of the excuses the Nobel prize committee used to give Geoff Hinton the physics prize).
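To make the role of entropy concrete, here is a small Python sketch of Shannon entropy used as an uncertainty score. The candidate names and predicted probabilities are invented for illustration, and this is one common textbook acquisition rule rather than the specific strategy developed at Sanofi.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical model predictions: P(inactive), P(active) for each candidate.
predictions = {
    "compound_A": [0.95, 0.05],  # model is fairly confident
    "compound_B": [0.50, 0.50],  # model is maximally unsure
    "compound_C": [0.80, 0.20],
}

# Entropy-based acquisition: measure next the candidate whose
# predicted label distribution has the highest entropy.
next_to_measure = max(predictions, key=lambda name: entropy(predictions[name]))

if __name__ == "__main__":
    for name, probs in predictions.items():
        print(name, round(entropy(probs), 3))
    print("acquire next:", next_to_measure)
```

A 50/50 prediction has an entropy of exactly one bit, the maximum for two outcomes, so `compound_B` is the one selected here; a prediction of certainty has entropy zero and would be the last to be chosen.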
Michael Bailey, PhD, computational scientist at Sanofi
Michael did his PhD in mathematics at the University of Toronto, with a specialization in differential geometry. After a series of postdocs, he joined Sanofi (a French big-pharma company) where he works as a computational scientist, developing AI tools for all parts of the R&D pipeline, with a special love for anything involving molecular geometry.
The Arts & Sciences Seminar Series is a weekly colloquium series supported by the School of Arts & Sciences Advisory Council at Clarkson University, especially through generous gifts from David A. Walsh ’67. Please contact ansseminar@clarkson.edu
SA&S 300: Arts and Sciences Seminar is a one-credit course intended to foster an interdisciplinary outlook in undergraduates.