Commit 641f5c1f authored by Gervaise Henry's avatar Gervaise Henry :cowboy:

Update readme.md

parent 7008fdcf
@@ -14,52 +14,88 @@ Determining cellular heterogeneity in the human prostate with single-cell RNA se
Data Analysis Data Analysis
------------- -------------
* **Requirements**: * **Requirements:**
* 10x CellRanger analyzed data: * 10x CellRanger analyzed data:
* "filtered\_gene\_bc\_matrices\_mex" folder * "filtered\_gene\_bc\_matrices\_mex" folder
* csv file cellranger aggr used for aggregation * csv file cellranger aggr used for aggregation
* demultiplex csv file to define subsets of samples * demultiplex csv file to define subsets of samples
* R (v3.4.1) packages:
* methods (v3.4.1)
* optparse (v1.4.4)
* Seurat (v2.3.1)
* readr (v1.1.1)
* fBasics (v3042.89)
* pastecs (v1.3.21)
* qusage (v2.10.0)
* RColorBrewer (v1.1-2)
* *and all dependencies*
* **Pipeline** (r.scripts): * **Pipeline:**
* sc_Demultiplex (import 10x Cell Ranger data into R and subset samples based on user input) * Link cellranger count/aggr output to analysis
* sc_D-SampleReorder (reorder sample names) * Create demultiplex file to add custom sample groups
* sc_SeuratScore.CellCycle (identify cell cycle state) * Load R packages
* sc_QC (filter cells, scale and remove variation associated with: UMI; % mitochondrial genes; S; G2M phase score, identify most variable genes and run PCA on them) * Create analysis folders
* sc_Cluster (perform tSNE and cluster using graph-based approach) * Load analysis parameters (from default or overwrite from command line)
* sc_PC.Score.Stress (PCA analysis of stress gene signature, projection to PC1 used as "Stress Score", stressed cells and clusters removed, cells re-clustered) * Load cellranger data into R/Seurat
* sc_QuSAGE.Lineage (clusters correlated to prostate population RNA-Seq DEGs of Epithelia and Fibromuscular Stroma to assign those identities to those clusters) * Label cells based on their cell cycle state using Seurat-based method
* sc_LineageSubClust (separate identified Epithelia, and Stroma + Lineage Unknown cells, and recluster them independently) * QC and filter cells/genes
* sc_QuSAGE.EpiSubClust (Epithelial clusters correlated to DWS generated prostate population RNA-Seq DEGs of Basal, Luminal, and "Other" Epithelia, Lung Map generated Epithelial, C2 KEGG, and C5 GO BP genesets, to assign those identities to those clusters) * *If combining multiple experiments:* Align experiments using canonical correlation analysis (CCA)
* sc_QuSAGE.StSubClust (Stromal and Lineage Unknown clusters correlated to external DEGs of Endothelial Cells, Smooth Muscle Cells, and Fibroblasts, Lung Map generated Epithelial, C2 KEGG, and C5 GO BP genesets, to assign those identities to those clusters) * Perform principal component analysis (PCA) using most highly variable genes (HVG) for downstream clustering etc.
* sc_MergeSubClust (merge identities of Epithelial and non-Epithelial clusters from QuSAGE) * Perform initial "over" clustering
* sc_PC.Score.NE (PCA analysis of Neuroendocrine Epithelia protein marker signature, projection to PC1 used as "NE Score", highest scoring cells identified as Neuroendocrine Cells) * Identify "highly stressed" cells using custom PCA-based analysis, remove stressed clusters/cells, and re-cluster
* sc_DEG (generate DEG lists between important cell types and predict surface and nuclear markers from them) * Correlate cluster gene expression using Quantitative Set Analysis for Gene Expression (QuSAGE) on lineage genesets for identification (epithelia, and stroma)
* sc_Table (produce tables of population differences between samples) * Subset epithelia from stroma for additional analysis
* Re-cluster cell types separately
* Correlate cluster gene expression using QuSAGE on epithelial subtype genesets for identification (basal, luminal and "other")
* Correlate cluster gene expression using QuSAGE on stromal subtype genesets for identification (fibroblasts, smooth muscle, endothelia and leukocyte)
* *Optional:* Correlate cluster gene expression using QuSAGE on additional genesets for analysis
* Merge epithelial and stromal cells
* Identify neuroendocrine cells from epithelial cells using custom PCA based analysis
* Tabulate population cell numbers
* Generate differentially expressed genelists (DEGs) of populations
* **Genesets**:
* "regev\_lab\_cell\_cycle\_genes.txt" G2M and S phase genes from [*Genome Res. 2015 Dec; 25(12): 1860–1872*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665007/) * **Genesets:**
* "DEG\_C2.CGP.M10970.txt" MSigDB C2 Chemical and Genetic Perturbations M10970 [**CHUANG\_OXIDATIVE\_STRESS\_RESPONSE\_UP**](http://software.broadinstitute.org/gsea/msigdb/cards/CHUANG_OXIDATIVE_STRESS_RESPONSE_UP.html) * Cell cycle:
* "DEG\_Epi_2FC.txt" DWS generated DEGs of Epithelia from FACS population (bulk) RNA-sequencing * "regev\_lab\_cell\_cycle\_genes.txt" G2M and S phase genes from [*Genome Res. 2015 Dec; 25(12): 1860–1872*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4665007/)
* "DEG\_FMSt_2FC.txt" DWS generated DEGs of Fibromuscular Stroma from FACS population (bulk) RNA-sequencing * Stress:
* "DEG\_BE_2FC.txt" DWS generated DEGs of Basal Epithelia from FACS population (bulk) RNA-sequencing * "DEG\_C2.CGP.M10970.txt" MSigDB C2 Chemical and Genetic Perturbations M10970 [**CHUANG\_OXIDATIVE\_STRESS\_RESPONSE\_UP**](http://software.broadinstitute.org/gsea/msigdb/cards/CHUANG_OXIDATIVE_STRESS_RESPONSE_UP.html)
* "DEG\_LE_2FC.txt" DWS generated DEGs of Luminal Epithelia from FACS population (bulk) RNA-sequencing * "genes.deg.Stress.csv" DWS generated DEGs of stressed cells from scRNA-Seq of patient D17 only
* "DEG\_OE_2FC.txt" DWS generated DEGs of "Other" Epithelia from FACS population (bulk) RNA-sequencing * "" DWS generated DEGs of stressed cells from scRNA-Seq of an aggregation of patient D17 and D27
* "DEG\_C5.BP.M11704.txt" MSigDB C5 GO Biological Processes M11704 [**GO\_ENDOTHELIAL\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_ENDOTHELIAL_CELL_DIFFERENTIATION.html) * Lineage:
* "DEG\_C5.BP.M10794.txt" MSigDB C5 GO Biological Processes M10794 [**GO\_SMOOTH\_MUSCLE\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_SMOOTH_MUSCLE_CELL_DIFFERENTIATION.html) * "DEG\_Epi_5FC.txt" DWS generated DEGs of epithelia from FACS population (bulk) RNA-sequencing
* "DEG\_C5.BP.M13024.txt" MSigDB C5 GO Biological Processes M13024 [**GO\_REGULATION\_OF\_FIBROBLAST\_PROLIFERATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_REGULATION_OF_FIBROBLAST_PROLIFERATION.html) * "DEG\_FMSt_5FC.txt" DWS generated DEGs of fibromuscular stroma from FACS population (bulk) RNA-sequencing
* "EurUrol.2005.NE.txt" Neuroendocrine markers from Table 1 of [*Eur Urol. 2005 Feb;47(2):147-55*](https://www.ncbi.nlm.nih.gov/pubmed/15661408) * "genes.deg.Epi.csv" DWS generated DEGs of epithelial cells from scRNA-Seq of patient D17 only
* "Basal cells-signature-genes.csv" scRNA-Sequencing Lung Map generated top 20 DEGs for human lung Basal Cells * "genes.deg.St.csv" DWS generated DEGs of stromal cells from scRNA-Seq of patient D17 only
* "Normal AT2 cells-signature-genes.csv" scRNA-Sequencing Lung Map generated top 20 DEGs for human lung Alveolar Type 2 Cells * "" DWS generated DEGs of epithelial cells from scRNA-Seq of an aggregation of patient D17 and D27
* "Club\_Goblet cells-signature-genes.csv" scRNA-Sequencing Lung Map generated top 20 DEGs for human lung Club/Goblet Cells * "" DWS generated DEGs of stromal cells from scRNA-Seq of an aggregation of patient D17 and D27
* "journal.pcbi.1004575.s026.XLSX" scRNA-Sequencing Lung Map generated DEGs for E16.5 mouse lung stromal subtypes (genes converted to human orthologs with Ensembl): * Epithelia:
* Proliferative Fibroblasts * "DEG\_BE_5FC.txt" DWS generated DEGs of basal epithelia from FACS population (bulk) RNA-sequencing
* Myofibroblast/Smooth Muscle-like Cells * "DEG\_LE_5FC.txt" DWS generated DEGs of luminal epithelia from FACS population (bulk) RNA-sequencing
* Pericytes, Matrix Fibroblasts * "DEG\_OE_5FC.txt" DWS generated DEGs of "other" epithelia from FACS population (bulk) RNA-sequencing
* Endothelial Cells * "genes.deg.BE.csv" DWS generated DEGs of basal epithelial cells from scRNA-Seq of patient D17 only
* Myeloid/Immune Cells * "genes.deg.LE.csv" DWS generated DEGs of luminal epithelial cells from scRNA-Seq of patient D17 only
* "c2.all.v6.1.symbols.gmt" MSigDB C2 Curated Gene Sets [**MSigDB C2**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2) * "genes.deg.OE1.csv" DWS generated DEGs of "other" epithelia cluster 1 cells from scRNA-Seq of patient D17 only
* "c2.cp.kegg.v6.1.symbols" MSigDB C2 KEGG Gene Subsets [**KEGG**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:KEGG) * "genes.deg.OE2.csv" DWS generated DEGs of "other" epithelia cluster 2 cells from scRNA-Seq of patient D17 only
* "c5.all.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Gene Sets [**MSigDB C5**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C5) * "" DWS generated DEGs of basal epithelial cells from scRNA-Seq of an aggregation of patient D17 and D27
* "c5.bp.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Biological Processes Gene Subsets [**GO BP**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=BP) * "" DWS generated DEGs of luminal epithelial cells from scRNA-Seq of an aggregation of patient D17 and D27
* ["DWS.scStress.txt"](genesets/DWS.scStress.txt) DWS generated DEGs of Stressed Cells from scRNA-Sequencing * "" DWS generated DEGs of "other" epithelia cluster 1 cells from scRNA-Seq of an aggregation of patient D17 and D27
* ["DWS.scNE.txt"](genesets/DWS.scNE.txt) DWS generated DEGs of Neuroendocrine Cells from scRNA-Sequencing * "" DWS generated DEGs of "other" epithelia cluster 2 cells from scRNA-Seq of an aggregation of patient D17 and D27
* Stroma:
* "DEG\_C5.BP.M11704.txt" MSigDB C5 GO Biological Processes M11704 [**GO\_ENDOTHELIAL\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_ENDOTHELIAL_CELL_DIFFERENTIATION.html)
* "DEG\_C5.BP.M10794.txt" MSigDB C5 GO Biological Processes M10794 [**GO\_SMOOTH\_MUSCLE\_CELL\_DIFFERENTIATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_SMOOTH_MUSCLE_CELL_DIFFERENTIATION.html)
* "DEG\_C5.BP.M13024.txt" MSigDB C5 GO Biological Processes M13024 [**GO\_REGULATION\_OF\_FIBROBLAST\_PROLIFERATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_REGULATION_OF_FIBROBLAST_PROLIFERATION.html)
* "DEG\_C5.BP.M10124.txt" MSigDB C5 GO Biological Processes M10124 [**GO\_LEUKOCYTE\_ACTIVATION**](http://software.broadinstitute.org/gsea/msigdb/cards/GO_LEUKOCYTE_ACTIVATION.html)
* Neuroendocrine:
* "EurUrol.2005.NE.txt" Neuroendocrine markers from Table 1 of [*Eur Urol. 2005 Feb;47(2):147-55*](https://www.ncbi.nlm.nih.gov/pubmed/15661408)
* "genes.deg.NE.csv" DWS generated DEGs of neuroendocrine epithelial cells from scRNA-Seq of patient D17 only
* "" DWS generated DEGs of neuroendocrine epithelial cells from scRNA-Seq of patient D17 only
* Lung epithelia from [Lung Gene Expression Analysis (LGEA) Web Portal](https://research.cchmc.org/pbge/lunggens/mainportal.html):
* "Basal cells-signature-genes.csv" scRNA-Seq LGEA generated top 20 DEGs for [human lung Basal Cells](https://research.cchmc.org/pbge/lunggens/lungDisease/celltype_IPF.html?cid=3)
* "Normal AT2 cells-signature-genes.csv" scRNA-Sequencing LGEA generated top 20 DEGs for [human lung Alveolar Type 2 Cells](https://research.cchmc.org/pbge/lunggens/lungDisease/celltype_IPF.html?cid=1)
* "Club\_Goblet cells-signature-genes.csv" scRNA-Sequencing LGEA generated top 20 DEGs for [human lung Club/Goblet Cells](https://research.cchmc.org/pbge/lunggens/lungDisease/celltype_IPF.html?cid=4)
* General MSigDB:
* "c2.all.v6.1.symbols.gmt" MSigDB C2 Curated Gene Sets [**MSigDB C2**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2)
* "c2.cp.kegg.v6.1.symbols" MSigDB C2 KEGG Gene Subsets [**KEGG**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:KEGG)
* "c5.all.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Gene Sets [**MSigDB C5**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C5)
* "c5.bp.v6.1.symbols.gmt" MSigDB C5 Gene Ontology Biological Processes Gene Subsets [**GO BP**](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=BP)
\ No newline at end of file
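
The pipeline entries above describe a PCA-based score: run PCA on a gene signature and use each cell's projection onto PC1 as its "Stress Score" (the same idea is described for the "NE Score"). The following is a minimal sketch of that idea only, not the repository's implementation, which is written in R with Seurat. The function name `pc1_score`, the toy matrix, the placeholder gene names, and the cutoff of 0 are all assumptions made for illustration.

```python
import numpy as np


def pc1_score(expr, gene_names, signature):
    """Score each cell by its projection onto PC1 of the signature genes.

    expr: (cells x genes) array of normalized expression values.
    gene_names: gene symbols, one per column of expr.
    signature: gene symbols making up the signature (e.g. a stress geneset).
    """
    wanted = set(signature)
    cols = [i for i, gene in enumerate(gene_names) if gene in wanted]
    if len(cols) < 2:
        raise ValueError("need at least two signature genes present in the data")

    sub = np.asarray(expr, dtype=float)[:, cols]
    centered = sub - sub.mean(axis=0)

    # Rows of vt are the principal axes; vt[0] is PC1.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]

    # The sign of a principal axis is arbitrary; orient it so that a higher
    # score means higher overall signature expression.
    if pc1.sum() < 0:
        pc1 = -pc1
    return centered @ pc1


# Toy data (hypothetical): 6 cells x 5 genes, gene names are placeholders.
gene_names = ["SIG_A", "SIG_B", "SIG_C", "OTHER_1", "OTHER_2"]
signature = ["SIG_A", "SIG_B", "SIG_C"]
expr = np.array([
    [5.0, 4.0, 6.0, 1.0, 0.0],  # cells 0-2: high signature expression
    [6.0, 5.0, 5.0, 0.0, 1.0],
    [4.0, 6.0, 5.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],  # cells 3-5: low signature expression
    [1.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
])

scores = pc1_score(expr, gene_names, signature)

# Hypothetical cutoff: flag cells scoring above 0 as "highly stressed".
stressed = scores > 0.0
```

In this sketch the flagged cells would be the ones dropped before re-clustering; how the actual pipeline chooses its cutoff and handles whole stressed clusters is not specified here.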