Strand Lab analysis of single-cell RNA sequencing
- Contact: Gervaise H. Henry
- Institution: UT Southwestern Medical Center
- Department: Urology
- Lab: Strand Lab
- PI: Douglas W. Strand, PhD
Data Analysis
Requirements:
- /analysis/DATA/
  - ProjectName-demultiplex.csv
  - 10x/
    - [sample]/
      - filtered_feature_bc_matrix/
        - "barcodes.tsv.gz"
        - "genes.tsv.gz"
        - "matrix.mtx.gz"
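As a minimal sketch, the layout above could be checked before starting a run. It assumes Python is available and that `ProjectName` is a placeholder for the actual project name. The function name `missing_inputs` and its arguments are hypothetical and not part of the lab's code; only the file and directory names come from the list above.

```python
from pathlib import Path

# File names taken from the Requirements list above.
MATRIX_FILES = ("barcodes.tsv.gz", "genes.tsv.gz", "matrix.mtx.gz")


def missing_inputs(data_dir, project_name):
    """Return the expected input paths that are missing under data_dir.

    Hypothetical helper: data_dir plays the role of /analysis/DATA/ and
    project_name replaces the ProjectName placeholder.
    """
    data_dir = Path(data_dir)
    missing = []

    # Demultiplex sheet sits directly under the data directory.
    demux = data_dir / f"{project_name}-demultiplex.csv"
    if not demux.is_file():
        missing.append(str(demux))

    # Each sub-directory of 10x/ is treated as one sample.
    tenx = data_dir / "10x"
    sample_dirs = sorted(p for p in tenx.iterdir() if p.is_dir()) if tenx.is_dir() else []
    if not sample_dirs:
        missing.append(str(tenx / "[sample]"))

    # Every sample needs the three filtered matrix files.
    for sample in sample_dirs:
        matrix_dir = sample / "filtered_feature_bc_matrix"
        for name in MATRIX_FILES:
            path = matrix_dir / name
            if not path.is_file():
                missing.append(str(path))
    return missing
```

Calling `missing_inputs("/analysis/DATA", "MyProject")` returns an empty list when every expected file is present, and otherwise lists the paths that still need to be supplied.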
METHODS
- Single cell sequencing data analysis: Data analysis was based on previously published analyses [1,2], modified in the following ways:
- Cellranger: Cellranger mkfastq and count (version 3.1.0) were used to demultiplex, to combine sequencing runs on the same sample, and to call cells. The reference genomes used were GRCh38 and mm10 (10x Genomics version 3.0.0). These 10x Genomics tools were automated and parallelized using the UT Southwestern Bioinformatics Core Facility (BICF) pipelines. Cellranger aggr was not used in this analysis, to prevent downsampling of the data; instead, the data were normalized relative to depth (see Aggregation below).
- Cell filtering: Low-quality cells were removed by filtering each sample individually, in sequence, on UMI count, percentage mitochondrial content (%mito), and number of genes (in that order). Filter thresholds were chosen dynamically for each sample based on the distribution of the parameter. Upper and lower filters were applied to unique molecular identifier (UMI) counts, while %mito had only an upper filter and gene number had only a lower filter. The goal of the UMI filters is primarily to remove multiplets as well as gel bead-in emulsions (GEMs) composed purely of ambient RNA. Apoptotic cells preferentially degrade their nuclear genome, resulting in higher percentages of mitochondrial genes, and are the target of the %mito filter. The gene number filter removes residual ambient RNA and other low-quality GEMs. The upper bound of the UMI filter was determined by removing cells with UMI counts lower than the highest-frequency bin (10 bins), rescaling the UMI counts of the remaining cells to between 0 and 360, and applying the RenyiEntropy thresholding technique. The lower bound of the UMI filter was set to 200. The upper %mito threshold was determined using the Triangle method on the rescaled (0-360) parameter, from cells with a %mito greater than or equal to the highest-frequency bin (100 bins). The lower bound of the gene count filter was determined using the MinErrorI method on the rescaled (0-360) parameter, from cells with a gene count less than the highest-frequency bin (100 bins). RenyiEntropy, Triangle, and MinErrorI thresholding were applied using autothresholdr version 1.3.5 [3].
- Aggregation: Samples were aggregated by normalizing with the sctransform (version 0.2.0) method and using Seurat’s [4,5] (version 3.1.0) reciprocal PCA method. Samples with fewer than 750 cells (post-filter) were merged prior to aggregation.
- Stressed cell removal: Cells displaying high stress signatures were removed. Aggregated cells were scored for stress using Seurat’s AddModuleScore method with genesets enriched in stressed cells [6,7]. For human samples, the geneset came from Henry, G.H. et al. (2018) [8], while the mouse geneset came from van den Brink, S. et al. (2017) [9].
- Clustering: Principal component analysis (PCA) was performed on the data. Graph-based clustering was performed (using Seurat) on the principal components that together represent 90% of the cumulative variance.
- Human cell type identification: The tool SingleR [10] (version 1.0.0) was used to identify cell types. Broad cell lineages were identified using the main labels from the built-in Human Cell Atlas data [11]. This was done in cluster mode using a resolution of 0.5. Cell types were merged based on their membership in the following major groups: epithelia, fibromuscular stroma, endothelia, and leukocytes. The prostatic epithelial and fibromuscular stromal cell types were identified in single-cell mode using cell identities from Henry, G.H. et al. (2018) [8].
- Human to mouse cell type transfer: Mouse cell types were identified similarly to the human cell types. Broad cell lineages were identified using the main labels from the built-in mouse Immunological Genome Project data [12]. The cell types were merged as was done with the human data (above). Epithelial cells were subset and re-clustered (as described above). Human hillock and club epithelial cells were merged into a pan-urethral cluster. The human genes were converted to mouse orthologs (genes with no ortholog were removed). SingleR was used to identify mouse epithelial clusters using the edited human data as a reference in cluster mode, at a resolution of 0.1. The ortholog map was retrieved from Ensembl’s [13] BioMart, using release 99 GRCm38.p6 data.
- DEG calculation: Differentially expressed genes (DEGs) were calculated from the sctransform-normalized data using the MAST [14] method, as implemented in Seurat. Significance was determined using the Bonferroni-corrected p-value, with an alpha of 0.05. Correlation between human and mouse urethral cells was confirmed by relating DEGs calculated (as above) from human urethral cells to mouse urethral cells. These urethral cell DEGs were obtained by comparing urethral cells against all other epithelial cell types. Human gene names were converted to mouse orthologs (as described in Human to mouse cell type transfer, using Ensembl’s BioMart data).
- Geneset Enrichment Analysis: Differentially expressed genes were calculated from the E-MTAB-4991 [15] microarray data using Transcriptome Analysis Console with default settings (SST-RMA). Each sample type’s DEGs were calculated relative to the other two groups (alpha = 0.05). These DEGs were correlated with the expression of the identified mouse epithelial cell types using the tool QuSAGE (version 2.20.0).
- KEGG Analysis: Enrichment of KEGG-annotated genesets was calculated using DAVID (version 6.8) from DEGs upregulated in human BPH club cells compared to human normal club cells. Significant genesets were determined using an alpha of 0.05 on Benjamini-corrected p-values.
- Analysis code: All code used for the single-cell analysis is publicly accessible [6,7,16,17].
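The preparation that precedes the upper UMI threshold in Cell filtering can be sketched in Python as follows. This is an illustration under stated assumptions, not the lab's implementation: the bins are taken to be equal-width over the observed range, "lower than the highest-frequency bin" is read as "below that bin's lower edge", and the rescaled values are rounded to integers on the assumption that the thresholding routine expects integer input. All function names are hypothetical. The RenyiEntropy threshold itself is not implemented here; the analysis applied it with autothresholdr.

```python
def modal_bin_lower_edge(values, n_bins):
    """Lower edge of the most populated of n_bins equal-width bins (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the last edge; clip it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    modal = counts.index(max(counts))
    return lo + modal * width


def rescale(values, new_max=360):
    """Linearly rescale values onto 0..new_max and round to integers."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    return [round((v - lo) / (hi - lo) * new_max) for v in values]


def prepare_for_upper_threshold(values, n_bins=10, new_max=360):
    """Drop values below the modal bin, then rescale the remainder to 0..new_max."""
    edge = modal_bin_lower_edge(values, n_bins)
    kept = [v for v in values if v >= edge]
    return kept, rescale(kept, new_max)
```

The %mito and gene count filters described above follow the same pattern with 100 bins; the gene count filter keeps the cells below the highest-frequency bin instead of those at or above it, which this sketch does not cover.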
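Two of the numeric rules in the methods above, the 90% cumulative-variance cutoff used in Clustering and the p-value corrections used in DEG calculation and KEGG Analysis, can be sketched in plain Python. The sketch assumes that the variance share is computed relative to the principal components supplied, that the Bonferroni factor is the number of p-values passed in, and that "Benjamini corrected" refers to the Benjamini-Hochberg procedure. The function names are hypothetical; the analysis itself relied on Seurat and DAVID for these steps.

```python
def n_components_for_variance(variances, target=0.90):
    """Smallest number of leading PCs whose cumulative variance share reaches target.

    variances: per-component variances in decreasing order (for example,
    squared standard deviations of the computed PCs).
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("variances must sum to a positive value")
    cumulative = 0.0
    for count, var in enumerate(variances, start=1):
        cumulative += var
        if cumulative / total >= target:
            return count
    return len(variances)


def bonferroni_adjust(p_values):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(adjusted_p_values, alpha=0.05):
    """Flag adjusted p-values that fall below alpha."""
    return [p < alpha for p in adjusted_p_values]
```

For example, `significant(bonferroni_adjust(p_values))` mirrors the DEG rule (Bonferroni, alpha of 0.05), and `significant(benjamini_hochberg_adjust(p_values))` mirrors the KEGG rule.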
References
1. Henry GH, Malewska A, Joseph DB, et al. A Cellular Anatomy of the Normal Adult Human Prostate and Prostatic Urethra. Cell Reports. 2018;25(12):3530-3542.e3535.
2. Henry G, Strand D. Determining cellular heterogeneity in the human prostate with single-cell RNA sequencing. 2018.
3. Landini G, Randell DA, Fouad S, Galton A. Automatic thresholding from the gradients of region boundaries. J Microsc. 2017;265(2):185-195.
4. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36(5):411-420.
5. Stuart T, Butler A, Hoffman P, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177(7):1888-1902.e1821.
6. Henry GH, Mathews J, Gesell J, Malladi VS. BICF Cellranger mkfastq Analysis Workflow. https://doi.org/10.5281/zenodo.2652611. 2019.
7. Henry GH, Mathews J, Malladi VS. BICF Cellranger count Analysis Workflow. https://doi.org/10.5281/zenodo.2652622. 2019.
8. Henry GH, Malewska A, Joseph DB, et al. A Cellular Anatomy of the Normal Adult Human Prostate and Prostatic Urethra. Cell Reports. 2018;25(12):3530-3542.e3535.
9. van den Brink SC, Sage F, Vértesy Á, et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat Methods. 2017;14(10):935-936.
10. Aran D, Looney AP, Liu L, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol. 2019;20(2):163-172.
11. Regev A, Teichmann SA, Lander ES, et al. The Human Cell Atlas. Elife. 2017;6.
12. Heng TS, Painter MW, Immunological Genome Project Consortium. The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunol. 2008;9(10):1091-1094.
13. Cunningham F, Achuthan P, Akanni W, et al. Ensembl 2019. Nucleic Acids Res. 2019;47(D1):D745-D751.
14. Finak G, McDavid A, Yajima M, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16:278.
15. Sackmann Sala L, Boutillon F, Menara G, et al. A rare castration-resistant progenitor cell population is highly enriched in Pten-null prostate tumours. J Pathol. 2017;243(1):51-64.
16. Henry GH, Strand DW. Strand Lab analysis of single-cell RNA sequencing. https://doi.org/10.5281/zenodo.3687064. 2020.
17. Henry GH, Strand DW. Determining cellular heterogeneity in the human prostate with single-cell RNA sequencing. https://doi.org/10.5281/zenodo.2654018. 2018.