Biological Hypothesis Discovery

Aug 8, 2022

How do we discover new biology using neural networks? Even as we’ve trained models that can sensitively detect biological signal, we still need a way to extract and organize these signals into actionable hypotheses. For example, what phenotypes did we miss because they are too rare, or only occur in certain conditions? How do phenotypes correlate with disease or molecular functions? Is a biological phenotype really a group of multiple subclasses informed by different underlying mechanisms? To address these challenges, I focus on two main strategies in my research: visualization and interpretation.

Visualization: Unsupervised cluster analysis groups together the representations learned by neural networks, so we can analyze patterns and global trends. We analyzed the entire yeast proteome across 24 different screens, integrating over 600,000 images. The ways different proteins responded showed many unexpected patterns. In addition to specific and general responses to stress, we found proteins that behaved very differently under different stresses, as well as proteins thought to have a single role whose changes implied additional functions. These analyses help us generate new biological hypotheses, highlighting biology that would likely have gone unnoticed without this kind of large, systematic analysis.
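As a rough illustration of the clustering idea, the sketch below groups a handful of toy feature vectors with a small k-means routine. The protein names, the three-dimensional "change profiles", and the choice of k-means are all assumptions made for illustration; this is not the pipeline used in the published work.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, n_iter=50):
    """Small k-means with deterministic farthest-point initialisation."""
    # Start from the first vector, then repeatedly add the vector that is
    # farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(euclidean(v, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _step in range(n_iter):
        # Assign every vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels, centroids

# Hypothetical per-protein change profiles (one value per toy condition).
profiles = {
    "protA": [0.9, 0.8, 1.0],   # changes in every condition
    "protB": [1.0, 0.9, 0.9],
    "protC": [0.0, 0.1, 0.0],   # essentially unchanged
    "protD": [0.1, 0.0, 0.1],
    "protE": [1.0, 0.0, -1.0],  # opposite changes in different conditions
}

names = list(profiles)
labels, centroids = kmeans([profiles[n] for n in names], k=3)

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)

for label, members in sorted(clusters.items()):
    print(label, members)
```

In practice the vectors would be learned representations rather than hand-written numbers, and hierarchical clustering or another method may suit the data better than k-means.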

Interpretation methods: In our work on the intrinsically disordered “dark proteome”, we developed a method for interpreting individual neurons of a neural network as sequence logos, which show the subsequences each neuron preferentially identifies in protein sequences. We also developed a method for interpreting which specific properties of a given sequence the network relies on to identify evolutionary conservation in that sequence. With these methods, we could identify, for example, new subclasses of motifs that affect the proteome at a global level, as well as specific hypotheses about functional elements in individual sequences.
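One way to picture the neuron-to-logo idea is to collect the subsequences a neuron scores most highly and summarize them position by position. The sketch below does this with a hand-written stand-in for a neuron; `toy_neuron`, the example sequences, and the window width are hypothetical, and the published method may differ in its details.

```python
import math
from collections import Counter

def top_windows(sequences, score_fn, width, n_top):
    """Return the n_top highest-scoring windows of the given width."""
    windows = [
        seq[i:i + width]
        for seq in sequences
        for i in range(len(seq) - width + 1)
    ]
    return sorted(windows, key=score_fn, reverse=True)[:n_top]

def position_frequencies(windows):
    """Per-position residue frequencies across equal-length windows."""
    n = len(windows)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*windows)
    ]

def information_content(freqs, alphabet_size=20):
    """Information in bits at one position: log2(alphabet) minus entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return math.log2(alphabet_size) - entropy

def toy_neuron(window):
    """Hypothetical stand-in for a neuron: prefers S or T followed by P."""
    return (1.0 if window[0] in "ST" else 0.0) + (1.0 if window[1] == "P" else 0.0)

# Made-up protein fragments, for illustration only.
sequences = ["MSPKR", "ATPEE", "GSPLK"]

top = top_windows(sequences, toy_neuron, width=3, n_top=3)
logo = position_frequencies(top)

for position, freqs in enumerate(logo):
    print(position, freqs, round(information_content(freqs), 2))
```

Positions where the top-scoring windows agree carry more information, so they would appear as the tall letters in a sequence logo. With a real network, `toy_neuron` would be replaced by the activation of one neuron on each window.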

Alex Lu

Senior Researcher

Senior Researcher at Microsoft Research New England.

Publications

Assessing the limits of zero-shot foundation models in single-cell biology

Kasia Z Kedzierska, Lorin Crawford, Ava P Amini, Alex X Lu

A functional map of the human intrinsically disordered proteome

Iva Pritišanac, T Reid Alderson, Đesika Kolaric, Taraneh Zarin, Shuting Xie, Alex X Lu, Aqsa Alam, Abdullah Maqsood, Ji-Young Youn, Julie D Forman-Kay, Alan M Moses

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

Alex X Lu, Amy X Lu, Iva Pritisanac, Taraneh Zarin, Julie D Forman-Kay, Alan M Moses

Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting

Alex X Lu, Oren Z Kraus, Sam Cooper, Alan M Moses

Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins

Alex X Lu, Yolanda T Chong, Ian S Hsu, Bob Strome, Louis-Francois Handfield, Oren Kraus, Brenda J Andrews, Alan M Moses