Cis-regulatory analysis can guide the identification of transcription factors and cell states, providing biological insight into the mechanisms that drive cellular heterogeneity. Starting from single-cell gene expression data, we offer the SCENIC protocol as a service, in both R and Python (on ACCRE or AWS).
SCENIC and its Python implementation, pySCENIC, share a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (Aibar et al., 2017; Van de Sande et al., 2020).
In the SCENIC workflow, coexpression modules between transcription factors and candidate target genes are first inferred using GENIE3 (R) or GRNBoost2 (Python). RcisTarget then identifies modules in which the regulator's binding motif is significantly enriched across the target genes, and prunes each module to a regulon containing only direct targets. AUCell scores the activity of each regulon in each cell, yielding an activity (AUC) matrix that can then be binarized by thresholding. Cell states are predicted from the shared activity of regulatory subnetworks.
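The scoring and binarization step can be illustrated with a minimal sketch. This is a simplified, pure-Python imitation of the AUCell idea, not the AUCell or pySCENIC implementation: the function name `regulon_auc`, the toy cells, the `top_fraction` value, the normalization to a 0-1 range, and the fixed 0.5 threshold are all assumptions made for illustration. In practice, thresholds are chosen per regulon from the distribution of AUC values across cells.

```python
def regulon_auc(cell, regulon, top_fraction=0.05):
    """Simplified AUCell-style score for one regulon in one cell.

    cell: dict mapping gene name -> expression value (hypothetical input format).
    regulon: iterable of target gene names.
    Returns the area under the recovery curve within the top-ranked genes,
    scaled so that 1.0 means every top position that could be a hit is a hit.
    """
    # Rank genes from highest to lowest expression. Ties are broken by gene
    # name here for reproducibility (an assumption of this sketch).
    ranked = sorted(cell, key=lambda gene: (-cell[gene], gene))
    cutoff = max(1, int(top_fraction * len(ranked)))
    targets = set(regulon) & set(cell)
    if not targets:
        return 0.0

    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in targets:
            hits += 1
        area += hits  # height of the recovery curve at this rank

    # Best case: all targets occupy the first ranks.
    n = min(len(targets), cutoff)
    max_area = n * (n + 1) // 2 + (cutoff - n) * n
    return area / max_area


genes = [f"g{i}" for i in range(1, 11)]
cells = {
    # Regulon targets g1-g3 are the most highly expressed genes.
    "cell_a": {g: 10 - i for i, g in enumerate(genes)},
    # Regulon targets are the least expressed genes.
    "cell_b": {g: i for i, g in enumerate(genes)},
    # Two of the three targets fall within the top-ranked genes.
    "cell_c": {"g1": 9, "g4": 8, "g5": 7, "g2": 6, "g6": 5,
               "g3": 4, "g7": 3, "g8": 2, "g9": 1, "g10": 0},
}
regulon = ["g1", "g2", "g3"]

# With only 10 toy genes, a large top_fraction is used so the window is not trivial.
auc = {name: regulon_auc(cell, regulon, top_fraction=0.5)
       for name, cell in cells.items()}

# Binarize with a fixed illustrative threshold (an assumption of this sketch).
binary = {name: int(score >= 0.5) for name, score in auc.items()}
```

Here `auc` plays the role of one row of the regulon-by-cell activity matrix and `binary` the corresponding row of the binarized matrix. Cells sharing the same on/off pattern across many regulons are candidates for a common cell state.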
CDS also offers several downstream reporting and visualization solutions specific to SCENIC; without them, interpreting the results requires substantial data wrangling and custom visualization.
Our standard service provides the following deliverables and assumes that the expression data are already processed and, where applicable, normalized.
- Preparation of single-cell dataset(s)
- Running the SCENIC workflow on the desired data with the selected binding motif database(s)
- Regulon enrichment reporting
- Re-integration of SCENIC results into the single-cell object (e.g., Seurat)
- UMAP of all regulons (binarized and AUC enrichment data)
- Heatmap visualization of binarized data alongside cell/sample annotations
Extended services are also available to supplement the standard workflow described above:
- Construction of gene correlation network graphs via Cytoscape
- Community detection analysis for meta-module detection (clusters of clusters)
- Acquiring data from third-party repositories, such as ArrayExpress or GEO
- Data processing from FASTQ files
- Iterative customization of results and plots for presentations, publications, grants, etc.