Commit 641f5c1f authored by Gervaise Henry

Update readme.md

parent 7008fdcf
@@ -14,52 +14,88 @@ Determining cellular heterogeneity in the human prostate with single-cell RNA se
Data Analysis
-------------
* **Requirements**:
* **Requirements:**
* 10x CellRanger analyzed data:
* "filtered\_gene\_bc\_matrices\_mex" folder
* csv file cellranger aggr used for aggregation
* demultiplex csv file to define subsets of samples
* R (v3.4.1) packages:
* methods (v3.4.1)
* optparse (v1.4.4)
* Seurat (v2.3.1)
* readr (v1.1.1)
* fBasics (v3042.89)
* pastecs (v1.3.21)
* qusage (v2.10.0)
* RColorBrewer (v1.1-2)
* *and all dependencies*
* **Pipeline** (r.scripts):
* sc_Demultiplex (import 10x Cell Ranger data into R and subset samples based on user input)
* sc_D-SampleReorder (reorder sample names)
* sc_SeuratScore.CellCycle (identify cell cycle state)
* sc_QC (filter cells, scale and remove variation associated with UMI, % mitochondrial genes, S and G2M phase scores; identify most variable genes and run PCA on them)
* sc_Cluster (perform tSNE and cluster using graph-based approach)
* sc_PC.Score.Stress (PCA analysis of stress gene signature, projection to PC1 used as "Stress Score", stressed cells and clusters removed, cells re-clustered)
* sc_QuSAGE.Lineage (clusters correlated to prostate population RNA-Seq DEGs of Epithelia and Fibromuscular Stroma to assign those identities to those clusters)
* sc_LineageSubClust (separate identified Epithelia, and Stroma + Lineage Unknown cells, and recluster them independently)
* sc_QuSAGE.EpiSubClust (Epithelial clusters correlated to DWS generated prostate population RNA-Seq DEGs of Basal, Luminal, and "Other" Epithelia, Lung Map generated Epithelial, C2 KEGG, and C5 GO BP genesets, to assign those identities to those clusters)
* sc_QuSAGE.StSubClust (Stromal and Lineage Unknown clusters correlated to external DEGs of Endothelial Cells, Smooth Muscle Cells, and Fibroblasts, Lung Map generated Epithelial, C2 KEGG, and C5 GO BP genesets, to assign those identities to those clusters)
* sc_MergeSubClust (merge identities of Epithelial and non-Epithelial clusters from QuSAGE)
* sc_PC.Score.NE (PCA analysis of Neurocrine Epithelia protein marker signature, projection to PC1 used as "NE Score", highest scoring cells identified as Neuroendocrine Cells)
* sc_DEG (generate DEG lists between important cell types and predict surface and nuclear markers from them)
* sc_Table (produce tables of population differences between samples)
* **Pipeline:**
* Link cellranger count/aggr output to analysis
* Create demultiplex file to add custom sample groups
* Load R packages
* Create analysis folders
* Load analysis parameters (from default or overwrite from command line)
* Load cellranger data into R/Seurat
* Label cells based on their cell cycle state using a Seurat-based method
* QC and filter cells/genes
* *If combining multiple experiments:* Align experiments using canonical correlation analysis (CCA)
* Perform principal component analysis (PCA) using the most highly variable genes (HVG) for downstream clustering, etc.
* Perform initial "over" clustering
* Identify "highly stressed" cells using custom PCA based analysis, remove stressed clusters/cells, and re-cluster
* Correlate cluster gene expression using Quantitative Set Analysis for Gene Expression (QuSAGE) on lineage genesets for identification (epithelia, and stroma)
* Subset epithelia from stroma for additional analysis
* Re-cluster cell types separately
* Correlate cluster gene expression using QuSAGE on epithelial subtype genesets for identification (basal, luminal and "other")
* Correlate cluster gene expression using QuSAGE on stromal subtype genesets for identification (fibroblasts, smooth muscle, endothelia and leukocyte)
* *Optional:* Correlate cluster gene expression using QuSAGE on additional genesets for analysis
* Merge epithelial and stromal cells
* Identify neuroendocrine cells from epithelial cells using custom PCA based analysis
* Tabulate population cell numbers
* Generate differentially expressed genelists (DEGs) of populations
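
The "custom PCA based analysis" steps above score each cell by its projection onto the first principal component (PC1) of a gene signature. The following is a minimal, language-agnostic sketch of that idea in plain Python, not the pipeline's R code. The function names `pc1_scores` and `flag_high_scoring`, the power-iteration solver, the sign convention, and the cutoff are all assumptions made for illustration.

```python
import math


def pc1_scores(matrix, n_iter=200):
    """Project each cell (row) onto PC1 of the signature genes (columns).

    Hypothetical helper: `matrix` is a list of cells, each a list of
    expression values for the same signature genes.
    """
    n_cells, n_genes = len(matrix), len(matrix[0])
    # Center every gene at zero mean, as PCA requires.
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]
    # Power iteration on X^T X approximates the PC1 loadings.
    # The start vector is an arbitrary non-uniform choice (assumption).
    vec = [float(j + 1) for j in range(n_genes)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(n_iter):
        proj = [sum(c * v for c, v in zip(row, vec)) for row in centered]
        new = [sum(centered[i][j] * proj[i] for i in range(n_cells))
               for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:  # no variance in the data, nothing to project onto
            break
        vec = [v / norm for v in new]
    # PCA signs are arbitrary; orient PC1 so the loadings sum is non-negative
    # (assumption: high signature expression should give a high score).
    if sum(vec) < 0:
        vec = [-v for v in vec]
    return [sum(c * v for c, v in zip(row, vec)) for row in centered]


def flag_high_scoring(scores, cutoff):
    """Return indices of cells whose score exceeds a user-chosen cutoff."""
    return [i for i, s in enumerate(scores) if s > cutoff]


# Toy data: 4 cells x 3 signature genes (made-up values).
toy = [[0, 0, 0], [1, 1, 1], [9, 9, 9], [10, 10, 10]]
print(flag_high_scoring(pc1_scores(toy), cutoff=0.0))
```

How the cutoff is chosen, and whether whole clusters or single cells are removed, is decided by the pipeline and is not shown here.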
* **Genesets**:
* "regev\_lab\_cell\_cycle\_genes.txt" G2M and S phase genes from [*Genome Res. 2015 Dec; 25(12): 1860–1872*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665007/)
* "DEG\_C2.CGP.M10970.txt" MSigDB C2 Chemical and Genetic Perturbations M10970 [**CHUANG\_OXIDATIVE\_STRESS\_RESPONSE\_UP**](http://software.broadinstitute.org/gsea/msigdb/cards/CHUANG_OXIDATIVE_STRESS_RESPONSE_UP.html)
* "DEG\_Epi_2FC.txt" DWS generated DEGs of Epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_FMSt_2FC.txt" DWS generated DEGs of Fibromuscular Stroma from FACS population (bulk) RNA-sequencing
* "DEG\_BE_2FC.txt" DWS generated DEGs of Basal Epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_LE_2FC.txt" DWS generated DEGs of Luminal Epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_OE_2FC.txt" DWS generated DEGs of "Other" Epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_C5.BP.M11704.txt" MSigDB C5 GO Biological Processes M11704 [**GO\_ENDOTHELIAL\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_ENDOTHELIAL_CELL_DIFFERENTIATION.html)
* "DEG\_C5.BP.M10794.txt" MSigDB C5 GO Biological Processes M10794 [**GO\_SMOOTH\_MUSCLE\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_SMOOTH_MUSCLE_CELL_DIFFERENTIATION.html)
* "DEG\_C5.BP.M13024.txt" MSigDB C5 GO Biological Processes M13024 [**GO\_REGULATION\_OF\_FIBROBLAST\_PROLIFERATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_REGULATION_OF_FIBROBLAST_PROLIFERATION.html)
* "EurUrol.2005.NE.txt" Neuroendocrine markers from Table 1 of [*Eur Urol. 2005 Feb;47(2):147-55*](https://www.ncbi.nlm.nih.gov/pubmed/15661408)
* "Basal cells-signature-genes.csv" scRNA-Sequencing Lung Map generated top 20 DEGs for human lung Basal Cells
* "Normal AT2 cells-signature-genes.csv" scRNA-Sequencing Lung Map generated top 20 DEGs for human lung Alveolar Type 2 Cells
* "Club\_Goblet cells-signature-genes.csv" scRNA-Sequencing Lung Map generated top 20 DEGs for human lung Club/Goblet Cells
* "journal.pcbi.1004575.s026.XLSX" scRNA-Sequencing Lung Map generated DEGs for E16.5 mouse lung stromal subtypes (genes converted to human orthologs with Ensembl):
* Proliferative Fibroblasts
* Myofibroblast/Smooth Muscle-like Cells
* Pericytes, Matrix Fibroblasts
* Endothelial Cells
* Myeloid/Immune Cells
* "c2.all.v6.1.symbols.gmt" MSigDB C2 Curated Gene Sets [**MSigDB C2**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2)
* "c2.cp.kegg.v6.1.symbols" MSigDB C2 KEGG Gene Subsets [**KEGG**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:KEGG)
* "c5.all.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Gene Sets [**MSigDB C5**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C5)
* "c5.bp.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Biological Processes Gene Subsets [**GO BP**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=BP)
* ["DWS.scStress.txt"](genesets/DWS.scStress.txt) DWS generated DEGs of Stressed Cells from scRNA-Sequencing
* ["DWS.scNE.txt"](genesets/DWS.scNE.txt) DWS generated DEGs of Neuroendocrine Cells from scRNA-Sequencing
* **Genesets:**
* Cell cycle:
* "regev\_lab\_cell\_cycle\_genes.txt" G2M and S phase genes from [*Genome Res. 2015 Dec; 25(12): 1860–1872*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665007/)
* Stress:
* "DEG\_C2.CGP.M10970.txt" MSigDB C2 Chemical and Genetic Perturbations M10970 [**CHUANG\_OXIDATIVE\_STRESS\_RESPONSE\_UP**](http://software.broadinstitute.org/gsea/msigdb/cards/CHUANG_OXIDATIVE_STRESS_RESPONSE_UP.html)
* "genes.deg.Stress.csv" DWS generated DEGs of stressed cells from scRNA-Seq of patient D17 only
* "" DWS generated DEGs of stressed cells from scRNA-Seq of an aggregation of patient D17 and D27
* Lineage:
* "DEG\_Epi_5FC.txt" DWS generated DEGs of epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_FMSt_5FC.txt" DWS generated DEGs of fibromuscular stroma from FACS population (bulk) RNA-sequencing
* "genes.deg.Epi.csv" DWS generated DEGs of epithelial cells from scRNA-Seq of patient D17 only
* "genes.deg.St.csv" DWS generated DEGs of stromal cells from scRNA-Seq of patient D17 only
* "" DWS generated DEGs of epithelial cells from scRNA-Seq of an aggregation of patient D17 and D27
* "" DWS generated DEGs of stromal cells from scRNA-Seq of an aggregation of patient D17 and D27
* Epithelia:
* "DEG\_BE_5FC.txt" DWS generated DEGs of basal epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_LE_5FC.txt" DWS generated DEGs of luminal epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_OE_5FC.txt" DWS generated DEGs of "other" epithelia from FACS population (bulk) RNA-sequencing
* "genes.deg.BE.csv" DWS generated DEGs of basal epithelial cells from scRNA-Seq of patient D17 only
* "genes.deg.LE.csv" DWS generated DEGs of luminal epithelial cells from scRNA-Seq of patient D17 only
* "genes.deg.OE1.csv" DWS generated DEGs of "other" epithelia cluster 1 cells from scRNA-Seq of patient D17 only
* "genes.deg.OE2.csv" DWS generated DEGs of "other" epithelia cluster 2 cells from scRNA-Seq of patient D17 only
* "" DWS generated DEGs of basal epithelial cells from scRNA-Seq of an aggregation of patient D17 and D27
* "" DWS generated DEGs of luminal epithelial cells from scRNA-Seq of an aggregation of patient D17 and D27
* "" DWS generated DEGs of "other" epithelia cluster 1 cells from scRNA-Seq of an aggregation of patient D17 and D27
* "" DWS generated DEGs of "other" epithelia cluster 2 cells from scRNA-Seq of an aggregation of patient D17 and D27
* Stroma:
* "DEG\_C5.BP.M11704.txt" MSigDB C5 GO Biological Processes M11704 [**GO\_ENDOTHELIAL\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_ENDOTHELIAL_CELL_DIFFERENTIATION.html)
* "DEG\_C5.BP.M10794.txt" MSigDB C5 GO Biological Processes M10794 [**GO\_SMOOTH\_MUSCLE\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_SMOOTH_MUSCLE_CELL_DIFFERENTIATION.html)
* "DEG\_C5.BP.M13024.txt" MSigDB C5 GO Biological Processes M13024 [**GO\_REGULATION\_OF\_FIBROBLAST\_PROLIFERATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_REGULATION_OF_FIBROBLAST_PROLIFERATION.html)
* "DEG\_C5.BP.M10124.txt" MSigDB C5 GO Biological Processes M10124 [**GO\_LEUKOCYTE\_ACTIVATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_LEUKOCYTE_ACTIVATION.html)
* Neuroendocrine:
* "EurUrol.2005.NE.txt" Neuroendocrine markers from Table 1 of [*Eur Urol. 2005 Feb;47(2):147-55*](https://www.ncbi.nlm.nih.gov/pubmed/15661408)
* "genes.deg.NE.csv" DWS generated DEGs of neuroendocrine epithelial cells from scRNA-Seq of patient D17 only
* "" DWS generated DEGs of neuroendocrine epithelial cells from scRNA-Seq of patient D17 only
* Lung epithelia from [Lung Gene Expression Analysis (LGEA) Web Portal](https://research.cchmc.org/pbge/lunggens/mainportal.html):
* "Basal cells-signature-genes.csv" scRNA-Seq LGEA generated top 20 DEGs for [human lung Basal Cells](https://research.cchmc.org/pbge/lunggens/lungDisease/celltype_IPF.html?cid=3)
* "Normal AT2 cells-signature-genes.csv" scRNA-Sequencing LGEA generated top 20 DEGs for [human lung Alveolar Type 2 Cells](https://research.cchmc.org/pbge/lunggens/lungDisease/celltype_IPF.html?cid=1)
* "Club\_Goblet cells-signature-genes.csv" scRNA-Sequencing LGEA generated top 20 DEGs for [human lung Club/Goblet Cells](https://research.cchmc.org/pbge/lunggens/lungDisease/celltype_IPF.html?cid=4)
* General MSigDB:
* "c2.all.v6.1.symbols.gmt" MSigDB C2 Curated Gene Sets [**MSigDB C2**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2)
* "c2.cp.kegg.v6.1.symbols" MSigDB C2 KEGG Gene Subsets [**KEGG**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:KEGG)
* "c5.all.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Gene Sets [**MSigDB C5**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C5)
* "c5.bp.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Biological Processes Gene Subsets [**GO BP**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=BP)
\ No newline at end of file
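
The MSigDB `.gmt` files listed above are tab-delimited text: each line holds a gene set name, a description field, and then the member gene symbols. The following is a minimal sketch in plain Python of reading such a file into a dictionary. It is illustrative only and not part of the R pipeline. The function name `parse_gmt` and the toy set and gene names are made up.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {set_name: [gene, ...]}.

    Hypothetical helper: lines with fewer than three tab-separated
    fields (name, description, at least one gene) are skipped.
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        # Drop empty strings left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Toy input with made-up set and gene names.
toy_gmt = "TOY_SET_1\tna\tGENE1\tGENE2\nTOY_SET_2\tna\tGENE3\n"
for set_name, members in parse_gmt(toy_gmt).items():
    print(set_name, len(members))
```

To read a real file, pass its contents, for example `parse_gmt(open(path).read())`, where `path` points at one of the `.gmt` files.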