2024-04-15 16:30
IMBI

Niklas Brunn (IMBI): Infusing structural assumptions into dimension reduction for single-cell RNA sequencing data to identify small gene sets.

Abstract

Dimension reduction techniques significantly enhance the exploration of cellular heterogeneity in single-cell RNA sequencing data. While these approaches are predominantly data-driven, it may still be useful to incorporate biologically informed assumptions about the underlying structure or the experimental design. For instance, dimensions that help to distinguish between cell groups should, intuitively, be characterized by distinct small sets of genes. Additionally, the design of a time series experiment should be incorporated such that gradual changes in the corresponding gene sets characterize temporal changes of cell states. To this end, we propose the boosting autoencoder approach, which combines unsupervised deep learning models for dimension reduction with boosting to formalize such constraints. Specifically, the approach selects distinct small sets of genes by maximizing a score function to explain cell state-related patterns in separate latent dimensions. We showcase the functionality of our approach through applications to simulated data, accounting for different structural assumptions. Additionally, we explore the diversity of neural cell identities and temporal patterns of embryonic development in different applications to gene expression data. Our approach presents a complementary dimension reduction strategy to identify cell stage-related genes without needing pre-defined cluster memberships.
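
As a rough illustration of how score-based selection can yield small gene sets, the sketch below implements generic componentwise L2 boosting in Python with NumPy. It is not the speaker's implementation: the function name `componentwise_boost`, the squared-correlation score, the step size, and the simulated response `y` are all assumptions made for illustration. In particular, `y` is a hypothetical stand-in for whatever signal a single latent dimension would need to explain, and the coupling to an autoencoder is not shown.

```python
import numpy as np

def componentwise_boost(X, y, n_steps=50, step_size=0.1):
    """Generic componentwise L2 boosting (illustrative, hypothetical helper).

    In each step, every gene (column of X) is scored by how much of the
    current residual it explains on its own; only the best-scoring gene
    gets a small coefficient update, so most coefficients stay at zero.
    """
    n_cells, n_genes = X.shape
    beta = np.zeros(n_genes)
    residual = y - y.mean()
    col_norm2 = (X ** 2).sum(axis=0)  # squared column norms

    for _ in range(n_steps):
        corr = X.T @ residual              # inner product of each gene with the residual
        score = corr ** 2 / col_norm2      # reduction in squared error per gene
        j = int(np.argmax(score))          # gene with the maximal score
        update = step_size * corr[j] / col_norm2[j]
        beta[j] += update                  # damped univariate least-squares step
        residual = residual - update * X[:, j]
    return beta

# Simulated data (assumption): 200 cells, 30 genes, two informative genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each gene
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.1 * rng.normal(size=200)

beta = componentwise_boost(X, y)
selected = np.flatnonzero(beta)            # indices of genes with nonzero weight
print("selected genes:", selected)
```

Because only one coefficient changes per step, the number of boosting steps bounds the number of selected genes, which is one simple way to obtain sparse, gene-level descriptions of a dimension.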