Gervaise Henry authored

Determining cellular heterogeneity in the human prostate with single-cell RNA sequencing

Data Analysis

  • Requirements:

    • /analysis/DATA/
      • Pd-demultiplex.csv
      • 10x/
        • aggregation_csv.csv
        • GRCh38/
          • "barcodes.tsv"
          • "genes.tsv"
          • "matrix.mtx"
    • 10x CellRanger analyzed data:
      • "filtered_gene_bc_matrices_mex" folder
      • csv file used by cellranger aggr for aggregation
      • demultiplex csv file to define subsets of samples
    • R (v3.4.1) packages:
      • methods (v3.4.1)
      • optparse (v1.4.4)
      • Seurat (v2.3.1)
      • readr (v1.1.1)
      • fBasics (v3042.89)
      • pastecs (v1.3.21)
      • qusage (v2.10.0)
      • RColorBrewer (v1.1-2)
      • monocle (v2.6.3)
      • dplyr (v0.7.6)
      • viridis (v0.5.1)
      • and all dependencies
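
Because the analysis pins specific package versions, it can help to compare an installation against the list above before running anything. The following is a minimal sketch in base R; `check_versions()` is a hypothetical helper and is not part of this repository.

```r
# Pinned versions copied from the requirements list above.
required <- c(
  methods = "3.4.1", optparse = "1.4.4", Seurat = "2.3.1", readr = "1.1.1",
  fBasics = "3042.89", pastecs = "1.3.21", qusage = "2.10.0",
  RColorBrewer = "1.1-2", monocle = "2.6.3", dplyr = "0.7.6", viridis = "0.5.1"
)

# Hypothetical helper: report, for each package, whether it is installed
# and whether the installed version equals the pinned one.
check_versions <- function(required) {
  rows <- lapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed.
    have <- nzchar(system.file(package = pkg))
    installed <- if (have) utils::packageVersion(pkg) else NA
    data.frame(
      package = pkg,
      required = required[[pkg]],
      installed = if (have) as.character(installed) else NA_character_,
      # Compare as version objects so "1.1-2" and "1.1.2" are treated alike.
      match = have && installed == package_version(required[[pkg]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

print(check_versions(required))
```

A `FALSE` in the `match` column only indicates a different version, not necessarily an incompatible one.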
  • HOW TO RUN

  • Pipeline:

    • Link cellranger count/aggr output to analysis
    • Create demultiplex file to add custom sample groups
    • Load R packages
    • Create analysis folders
    • Load analysis parameters (from default or overwrite from command line)
    • Load cellranger data into R/Seurat
    • Label cells based on their cell cycle state using a Seurat-based method
    • QC and filter cells/genes
    • If analyzing samples from multiple patients: Align experiments using canonical correlation analysis (CCA)
    • If analyzing samples from one patient: Perform principal component analysis (PCA) using the most highly variable genes (HVG) for downstream clustering, etc.
    • Perform initial "over" clustering
    • Identify "highly stressed" cells using custom PCA-based analysis, remove stressed clusters/cells, and re-cluster
    • Correlate cluster gene expression using Quantitative Set Analysis for Gene Expression (QuSAGE) on lineage genesets for identification (epithelia and stroma)
    • Subset epithelia from stroma for additional analysis
      • Re-cluster cell types separately
        • Correlate cluster gene expression using QuSAGE on epithelial subtype genesets for identification (basal, luminal and "other")
        • Correlate cluster gene expression using QuSAGE on stromal subtype genesets for identification (fibroblasts, smooth muscle, endothelia and leukocyte)
      • Optional: Correlate cluster gene expression using QuSAGE on additional genesets for analysis
    • Merge epithelial and stromal cells
    • Identify neuroendocrine cells from epithelial cells using custom PCA-based analysis
    • Tabulate population cell numbers
    • Generate lists of differentially expressed genes (DEGs) for each population
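
The "Create analysis folders" and "Load analysis parameters" steps above can be sketched with optparse, which is in the requirements list. This is only an illustration under stated assumptions: the option names (`--project`, `--outdir`, `--resolution`), their defaults, and the subfolder names are hypothetical placeholders, not the ones the pipeline scripts actually use.

```r
library(optparse)

# Hypothetical options: defaults apply unless overridden on the command line.
parse_params <- function(args = commandArgs(trailingOnly = TRUE)) {
  option_list <- list(
    make_option("--project", type = "character", default = "demo",
                help = "Project or sample group name [default %default]"),
    make_option("--outdir", type = "character", default = "analysis",
                help = "Root folder for analysis output [default %default]"),
    make_option("--resolution", type = "double", default = 0.5,
                help = "Clustering resolution [default %default]")
  )
  parse_args(OptionParser(option_list = option_list), args = args)
}

# Hypothetical folder layout: create one subfolder per analysis stage.
make_analysis_dirs <- function(root, subdirs = c("qc", "cluster", "deg")) {
  paths <- file.path(root, subdirs)
  for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  paths
}

# Demo: override one default and create the folders under a temp directory.
opts <- parse_params(c("--resolution", "0.2"))
demo_dirs <- make_analysis_dirs(file.path(tempdir(), opts$outdir))
print(opts$resolution)
```

Passing `args` explicitly, as in the demo, makes the parser easy to exercise from an interactive session; when run through `Rscript`, the default `commandArgs(trailingOnly = TRUE)` picks up the real command line.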
  • Genesets:

    • Cell cycle:
    • Stress:
      • "DEG_C2.CGP.M10970.txt" MSigDB C2 Chemical and Genetic Perturbations M10970 CHUANG_OXIDATIVE_STRESS_RESPONSE_UP
      • "genes.deg.Stress.csv" DWS generated DEGs of stressed cells from scRNA-Seq of 3 patient aggregate
    • Lineage:
      • "DEG_Epi_5FC.txt" DWS generated DEGs of epithelia from FACS population (bulk) RNA-sequencing
      • "DEG_FMSt_5FC.txt" DWS generated DEGs of fibromuscular stroma from FACS population (bulk) RNA-sequencing
    • Epithelia:
      • "DEG_BE_5FC.txt" DWS generated DEGs of basal epithelia from FACS population (bulk) RNA-sequencing
      • "DEG_LE_5FC.txt" DWS generated DEGs of luminal epithelia from FACS population (bulk) RNA-sequencing
      • "DEG_OE_5FC.txt" DWS generated DEGs of "other" epithelia from FACS population (bulk) RNA-sequencing
      • "genes.deg.BE.csv" DWS generated DEGs of basal epithelial cells from scRNA-Seq of 3 patient aggregate
      • "genes.deg.LE.csv" DWS generated DEGs of luminal epithelial cells from scRNA-Seq of 3 patient aggregate
      • "genes.deg.OE1.csv" DWS generated DEGs of "other" epithelia cluster 1 cells from scRNA-Seq of 3 patient aggregate
      • "genes.deg.OE2.csv" DWS generated DEGs of "other" epithelia cluster 2 cells from scRNA-Seq of 3 patient aggregate
    • Stroma:
      • "DEG_C5.BP.M11704.txt" MSigDB C5 GO Biological Processes M11704 GO_ENDOTHELIAL_CELL_DIFFERENTIATION
      • "DEG_C5.BP.M10794.txt" MSigDB C5 GO Biological Processes M10794 GO_SMOOTH_MUSCLE_CELL_DIFFERENTIATION
      • "DEG_C5.BP.M13024.txt" MSigDB C5 GO Biological Processes M13024 GO_REGULATION_OF_FIBROBLAST_PROLIFERATION
      • "DEG_C5.BP.M10124.txt" MSigDB C5 GO Biological Processes M10124 GO_LEUKOCYTE_ACTIVATION
      • "genes.deg.Fib.csv" DWS generated DEGs of fibroblast cells from scRNA-Seq of 3 patient aggregate
      • "genes.deg.SM.csv" DWS generated DEGs of smooth muscle cells from scRNA-Seq of 3 patient aggregate
      • "genes.deg.Endo.csv" DWS generated DEGs of endothelial cells from scRNA-Seq of 3 patient aggregate
      • "genes.deg.Leu.csv" DWS generated DEGs of leukocyte cells from scRNA-Seq of 3 patient aggregate
    • Neuroendocrine:
      • "EurUrol.2005.NE.txt" Neuroendocrine markers from Table 1 of Eur Urol. 2005 Feb;47(2):147-55
      • "genes.deg.NE.csv" DWS generated DEGs of neuroendocrine epithelial cells from scRNA-Seq of 3 patient aggregate
    • Lung epithelia from Lung Gene Expression Analysis (LGEA) Web Portal:
    • Lung epithelia from Nature 2018 Aug;560(7718):319
      • "SupTab3_Consensus_Sigs.csv" scRNA-Sequencing DEGs of mouse basal, club, ciliated, tuft, neuroendocrine, ionocyte cells (Supplementary Tables, SupTab3_Consensus_Sigs)
      • "SupTab6_Krt13_Hillock.csv" scRNA-Sequencing DEGs of mouse Krt13+ hillock cells (Supplementary Tables, SupTab6_Krt13_Hillock)
      • "Ensemble.mus-hum.txt" Ensembl export of mouse (GRCm38.p6) to human ortholog mapping
    • General MSigDB:
      • "c2.all.v6.1.symbols.gmt" MSigDB C2 Curated Gene Sets MSigDB C2
      • "c2.cp.kegg.v6.1.symbols" MSigDB C2 KEGG Gene Subsets KEGG
      • "c5.all.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Gene Sets MSigDB C5
      • "c5.bp.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Biological Processes Gene Subsets GO BP
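
The MSigDB files listed above use the GMT format: one gene set per line, tab-separated, with the set name first, a description second, and the gene symbols after that. The following base R sketch shows how such a file maps to a named list of gene sets. `read_gmt_sketch()` is a hypothetical helper, and the two toy sets are made up for the demo rather than taken from the repository's files.

```r
# Hypothetical helper: parse a GMT file into a named list of gene symbols.
read_gmt_sketch <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                 # skip empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)]) # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

# Demo with toy data written to a temporary file.
toy_gmt <- tempfile(fileext = ".gmt")
writeLines(c(
  "TOY_BASAL\ttoy description\tKRT5\tKRT15",
  "TOY_LUMINAL\ttoy description\tKRT8\tKRT18"
), toy_gmt)

toy_sets <- read_gmt_sketch(toy_gmt)
str(toy_sets)
```

The result is a plain named list with one character vector of gene symbols per set.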