Publications

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning

Self-supervised learning exploiting principles of comparative genomics can help us understand the intrinsically disordered dark proteome.

Convolutions are competitive with transformers for protein sequence pretraining

Convolutional models are competitive with transformers for protein sequences.

Improved Conditional Flow Models for Molecule to Image Synthesis

In this paper, we aim to synthesize cell microscopy images under different molecular interventions, motivated by practical applications to drug development. Building on the recent success of graph neural networks for learning molecular embeddings and flow-based models for image generation, we propose Mol2Image: a flow-based generative model for molecule to cell image synthesis. To generate cell features at different resolutions and scale to high-resolution images, we develop a novel multi-scale flow architecture based on a Haar wavelet image pyramid. To maximize the mutual information between the generated images and the molecular interventions, we devise a training strategy based on contrastive learning. To evaluate our model, we propose a new set of metrics for biological image generation that are robust, interpretable, and relevant to practitioners. We show quantitatively that our method learns a meaningful embedding of the molecular intervention, which is translated into an image representation reflecting the biological effects of the intervention.

Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning

Self-supervised representation learning of biological sequence embeddings alleviates computational resource constraints on downstream tasks while circumventing expensive experimental label acquisition. However, existing methods mostly borrow directly from large language models designed for NLP, rather than with bioinformatics philosophies in mind. Recently, contrastive mutual information maximization methods have achieved state-of-the-art representations for ImageNet. In this perspective piece, we discuss how viewing evolution as natural sequence augmentation and maximizing information across phylogenetic “noisy channels” is a biologically and theoretically desirable objective for pretraining encoders. We first provide a review of current contrastive learning literature, then provide an illustrative example where we show that contrastive learning using evolutionary augmentation can be used as a representation learning objective which maximizes the mutual information between biological sequences and their conserved function, and finally outline rationale for this approach.

YeastSpotter: accurate and parameter-free web segmentation for microscopy images of yeast cells

We introduce YeastSpotter, a web application for the segmentation of yeast microscopy images into single cells.

The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers

We created a public dataset of 132,209 images of mouse cells, COOS-7, to test how robust classifiers are to covariate shifts.

Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting

By training models with a self-supervised learning task, we learn highly effective representations of protein biology with no labels.

Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins

We integrate 400,000 images from 24 experiments conducted on yeast to reveal novel aspects of cell biology.

Influence of repetitive mechanical loading on MMP2 activity in tendon fibroblasts

Matrix metalloproteinase-2 has been implicated in tendon pathology caused by repetitive movements. However, its activity in the early stages of the tendon’s response to overuse, and its presence in the circulation as a possible indicator of tendon degradation, remain unknown. Human tendon cells were repetitively stretched for 5 days, and the rabbit Achilles tendon complex underwent repetitive motion 3× per week for 2 weeks. Quantitative polymerase chain reaction analysis was performed to detect matrix metalloproteinase-2/14 and tissue inhibitor of matrix metalloproteinase-2 messenger ribonucleic acid in cells and rabbit tissue, and matrix metalloproteinase-2 protein levels were determined with an enzyme-linked immunoassay. Matrix metalloproteinase-2 activity was examined using zymography of the conditioned media, tendon and serum. Immunohistochemistry was used to localize matrix metalloproteinase-2 in tendon tissue, and the density of fibrillar collagen in tendons was examined using second harmonic generation microscopy. Tendon cells stretched with high strain or high frequency demonstrated increased matrix metalloproteinase-2 messenger ribonucleic acid and protein levels. Matrix metalloproteinase-2 activity was increased in the rabbit Achilles tendon tissue at weeks 1 and 2; however, serum activity was only increased at week 1. After 2 weeks of exercise, the collagen density was lower in specific regions of the exercised rabbit Achilles tendon complex. Matrix metalloproteinase-2 expression in exercised rabbit Achilles tendons was detected surrounding tendon fibroblasts. Repetitive mechanical stimulation of tendon cells results in a small increase in matrix metalloproteinase-2 levels, but it appears unlikely that serum matrix metalloproteinase-2 will be a useful indicator of tendon overuse injury.

An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images

A simple k-nearest neighbor algorithm can locally correct for covariate shifts when comparing image screens.

Angiopoietin-like 4 promotes angiogenesis in the tendon and is increased in cyclically loaded tendon fibroblasts

The mechanisms that regulate angiogenic activity in injured or mechanically loaded tendons are poorly understood. The present study examined the potential role of angiopoietin-like 4 (ANGPTL4) in the angiogenic response of tendons subjected to repetitive mechanical loading or injury. Cyclic stretching of human tendon fibroblasts stimulated the expression and release of ANGPTL4 protein via transforming growth factor-β (TGF-β) and hypoxia-inducible factor 1α (HIF-1α) signalling, and the released ANGPTL4 was pro-angiogenic. Angiogenic activity was increased following ANGPTL4 injection into mouse patellar tendons, whereas the patellar tendons of ANGPTL4 knockout mice displayed reduced angiogenesis following injury. In human rotator cuff tendons, the expression of ANGPTL4 was correlated with the density of tendon endothelial cells. To our knowledge, this is the first study characterizing a role of ANGPTL4 in the tendon. ANGPTL4 may assist in the regulation of vascularity in the injured or mechanically loaded tendon. TGF-β and HIF-1α comprise two signalling pathways that modulate the expression of ANGPTL4 by mechanically stimulated tendon fibroblasts and, in the future, these could be manipulated to influence tendon healing or adaptation.

Accumulation of oxidized LDL in the tendon tissues of C57BL/6 or apolipoprotein E knock-out mice that consume a high fat diet: potential impact on tendon health

Clinical studies have suggested an association between dyslipidemia and tendon injuries or chronic tendon pain; the mechanisms underlying this association are not yet known. The objectives of this study were (1) to evaluate the impact of a high fat diet on the function of load-bearing tendons and on the distribution in tendons of oxidized low density lipoprotein (oxLDL), and (2) to examine the effect of oxLDL on tendon fibroblast proliferation and gene expression.

Enhanced collagen type I synthesis by human tenocytes subjected to periodic in vitro mechanical stimulation

Mechanical stimulation (e.g. slow heavy loading) has proven beneficial in the rehabilitation of chronic tendinopathy; however, the optimal parameters of stimulation have not been experimentally determined. In this study of mechanically stimulated human tenocytes, the influence of rest insertion and cycle number on (1) the protein and mRNA levels of type I and III collagen; (2) the mRNA levels of transforming growth factor beta (TGFB1) and scleraxis (SCXA); and (3) tenocyte morphology was assessed.

Podocalyxin Regulates Murine Lung Vascular Permeability by Altering Endothelial Cell Adhesion

Despite the widespread use of CD34-family sialomucins (CD34, podocalyxin and endoglycan) as vascular endothelial cell markers, there is remarkably little known of their vascular function. Podocalyxin (gene name Podxl), in particular, has been difficult to study in adult vasculature as germ-line deletion of podocalyxin in mice leads to kidney malformations and perinatal death. We generated mice that conditionally delete podocalyxin in vascular endothelial cells (PodxlΔEC mice) to study the homeostatic role of podocalyxin in adult mouse vessels. Although PodxlΔEC adult mice are viable, their lungs display increased lung volume and changes to the matrix composition. Intriguingly, this was associated with increased basal and inflammation-induced pulmonary vascular permeability. To further investigate the etiology of these defects, we isolated mouse pulmonary endothelial cells. PodxlΔEC endothelial cells display mildly enhanced static adhesion to fibronectin but spread normally when plated on fibronectin-coated transwells. In contrast, PodxlΔEC endothelial cells exhibit a severely impaired ability to spread on laminin and, to a lesser extent, collagen I coated transwells. The data suggest that, in endothelial cells, podocalyxin plays a previously unrecognized role in maintaining vascular integrity, likely through orchestrating interactions with extracellular matrix components and basement membranes, and that this influences downstream epithelial architecture.

Mast cells exert pro-inflammatory effects of relevance to the pathophysiology of tendinopathy

We have previously found an increased mast cell density in tendon biopsies from patients with patellar tendinopathy compared to controls. This study examined the influence of mast cells on basic tenocyte functions, including production of the inflammatory mediator prostaglandin E2 (PGE2), extracellular matrix remodeling and matrix metalloproteinase (MMP) gene transcription, and collagen synthesis.