Alex Lu

Senior Researcher

Microsoft Research


Modern biological experiments generate an unprecedented amount of data. How do we discover new biology when we have millions of microscopy images or protein sequences, and it is no longer possible to examine the data one item at a time? My research develops machine learning methods for discovering hypotheses in biology.

I have broad research interests, but central themes include:

  • Reducing effort and bias in applying machine learning: The best-performing machine learning methods often require a large volume of labeled training data. Not only is labeling time-consuming, but it also biases models toward biology we already have prior knowledge of: we might not be able to discover unknown biology, since we can’t label it. To address these barriers, I focus on self-supervised machine learning methods.
  • Learning relevant signal without direct specification: In biology, one scientist’s signal is another scientist’s noise: different biologists have different questions and are interested in learning different things, even from the same data. I research how we can train machine learning models that extract relevant biological signal from data (even when we don’t know how to specify it directly) and that are robust to non-biological noise.
  • Interpretation and visualization: After we’ve trained a model, how do we extract insights? I’m interested in how we can use interpretation and visualization techniques to identify new biological hypotheses.


How do we discover biological hypotheses with machine learning?

Self-Supervised Learning

Self-supervised representation learning produces unbiased representations of biology

Biological Hypothesis Discovery

Visualization and interpretation discover biological hypotheses

Robust Models

How do we build models that ignore non-biological variation?

Recent Publications

Improved Conditional Flow Models for Molecule to Image Synthesis