632f0f38
--- a/ES_D7_H3K27ac_filtered_peaks.bed
+++ b/ES_D7_H3K27ac_filtered_peaks.bed
--- a/ES_D7_Histone_enhancers.bed
+++ b/ES_D7_Histone_enhancers.bed
--- a/ES_D7_Histone_enhancers_1kb.bed
+++ b/ES_D7_Histone_enhancers_1kb.bed
--- a/ES_D7_gro-seq_enhancers.bed
+++ b/ES_D7_gro-seq_enhancers.bed
--- a/ES_D7_gro-seq_enhancers_1kb.bed
+++ b/ES_D7_gro-seq_enhancers_1kb.bed
--- a/Gro_percentages.png
+++ b/Gro_percentages.png
--- a/H3K27ac_all_filtered_peaks.bed
+++ b/H3K27ac_all_filtered_peaks.bed
--- a/H3K27ac_distribution.png
+++ b/H3K27ac_distribution.png
--- a/H3K27ac_filtered_peaks.bed
+++ b/H3K27ac_filtered_peaks.bed
--- a/H3K4me1_all_filtered_peaks.bed
+++ b/H3K4me1_all_filtered_peaks.bed
--- a/H3K4me1_distribution.png
+++ b/H3K4me1_distribution.png
--- a/H3K4me3_3kb_flanking.bed
+++ b/H3K4me3_3kb_flanking.bed
--- a/H3K4me3_all_filtered_peaks.bed
+++ b/H3K4me3_all_filtered_peaks.bed
--- a/H3K4me3_all_filtered_peaks.tsv
+++ b/H3K4me3_all_filtered_peaks.tsv
--- a/H3K4me3_distribution.png
+++ b/H3K4me3_distribution.png
--- a/H3K4me3_filtered_peaks.bed
+++ b/H3K4me3_filtered_peaks.bed
--- a/H3K4me3_filtered_peaks.tsv
+++ b/H3K4me3_filtered_peaks.tsv
--- a/Histone_h3k27ac_filtered_peaks.tsv
+++ b/Histone_h3k27ac_filtered_peaks.tsv
--- a/Histone_percentages.png
+++ b/Histone_percentages.png
--- a/README.md
+++ b/README.md
+# Total Functional Score of Enhancer Elements Identifies Lineage-Specific Enhancers that Drive Differentiation of Pancreatic Cells

+This directory contains the scripts for identifying the transcription factors (TFs) that maintain multipotency of endodermal stem cells during differentiation into pancreatic lineages, using Total Functional Score of Enhancer Elements (TFSEE).

-Random Forest of Enhancer Prediction
-http://enhancer.ucsd.edu/renlab/RFECS_enhancer_prediction/

+## Dependencies for TFSEE

+This code requires Python 2.7+ to run.

+The Python scripts require the following Python packages:
+- biopython-1.70
+- pandas-0.20.1
+- numpy-1.12.1
+- scikit-learn-0.18.1
+- matplotlib-2.0.2
+- seaborn-0.8.1
+- scipy-0.19.0

-# Methods:

-1. Call Peaks H3K27ac, H3K4me1 using threshold of 1e-2
-2. Call Peaks H3K4me3 using threshold of 1e-5
-3. Call Transcript units per cell using GRO-seq and GRO-HMM
-4. Merge H3K27ac, H3K4me1, H3K4me3 Peaks within 500bp and make universe filter for at least 1 RPKM in a cell
-5. Merge GRO-seq transcripts and filter for
-    a. +/- 3kb from TSS of protein coding genes Gencode and H3K4me3 pekas
-    b. Merge overlaping transcripts
-    c. Filter for <=9kb Short-Short-Paired and Short-Unpaired
-    d. make universe filter for at least 1 > RPKM SSP and > 2 RPM SUP in a cell
-6. Make Enhancer regions for motif search
-    a. Short-Short-Paired +/- 500 bp center-overlap
-    b. Short-Unpaired +/- bp TSS of transcript
-    c. Histone data +/- bp 500 bp center of mark
-7. De novo motif analyses were performed using the command-line version of MEME (Bailey et al., 2009). The
-following parameters were used for motif prediction: (1) zero or one occurrence per sequence (-
-mod zoops); (2) number of motifs (-nmotifs 15); (3) minimum, maximum width of the motif (-
-minw 8, -maxw 15); and (4) search for motif in given strand and reverse complement strand (-
-revcomp). The predicted motifs from MEME were matched to known motifs using TOMTOM
-(Gupta et al., 2007).
+Install the dependencies.
+
+```sh
+pip install -r requirements.txt
+```
+
+
+#### Pipeline Description
+
+##### Pre-processing Steps
+
+1. De novo identification of enhancers using GRO-seq and [groHMM](http://www.bioconductor.org/packages/release/bioc/html/groHMM.html)
+or ChIP-seq (H3K4me1 and H3K27ac)
+
+2. Normalize Enhancer Expression using GRO-seq: For each cell line, quantify the GRO-seq reads (as RPKM) that fall within a 1 kb region around the center of the overlap for paired enhancer transcripts or from the 5′ end of unpaired enhancer transcripts
+
+3. Normalize Enhancer Expression using ChIP-seq: For each cell line, quantify the ChIP-seq reads (as RPKM) from H3K4me1, H3K27ac, and input for each enhancer within the universe of GRO-seq-defined enhancers
+
+4. Motif Predictions: De novo motif analyses on a 1 kb region of expressed enhancers for each cell line using [MEME](http://meme-suite.org/), with the predicted motifs matched to known motifs using TOMTOM and [JASPAR](http://jaspar.genereg.net/)
+
+5. Normalize Transcription Factor Expression using RNA-seq: For each cell line, quantify the RNA-seq reads (as FPKM) for each transcription factor predicted to bind the matched motifs
+* RNA-seq analysis: RNA-seq_star.sh
+* FPKM processing RNA-seq: rnaseq_processing.sh
+
+6. Calculate the TFSEE score to determine cell-type-specific enhancer activity, generating:
+* unsupervised hierarchical clustering
+* t-SNE representation
+* boxplot representations
+* rank order TF plots
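Steps 2 and 3 above quantify reads as RPKM (reads per kilobase per million mapped reads). As a minimal sketch of that normalization, assuming the read counts and library sizes have already been obtained elsewhere, the calculation looks like this. The function name `rpkm` and the example numbers are hypothetical and are not taken from the repository's scripts.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads.

    read_count: reads falling inside the region (e.g. a 1 kb enhancer window)
    region_length_bp: region length in base pairs
    total_mapped_reads: total mapped reads in the library
    """
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (float(region_length_bp) * total_mapped_reads)


# Hypothetical example: 50 reads in a 1 kb window, 20 million mapped reads.
example_rpkm = rpkm(50, 1000, 20000000)  # 2.5
```

FPKM in step 5 follows the same arithmetic with fragments counted in place of reads.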
+
+#### Data Source
+
+All data are available from NCBI’s [Gene Expression Omnibus](https://www.ncbi.nlm.nih.gov/geo/) or EMBL-EBI’s [ArrayExpress](http://www.ebi.ac.uk/arrayexpress/) repositories under the accession numbers listed:
+
+| Assay | Accessions |
+| :--------------------: | :------------------------------------------------------------: |
+| GRO-seq | GSM1316306, GSM1316313, GSM1316320, GSM1316327, GSM1316334 |
+| H3K4me3 ChIP-seq | ERR208008, ERR208014, ERR207998, ERR20798, ERR207999 |
+| H3K4me1 ChIP-seq | GSM1316302, GSM1316303, GSM1316309, GSM1316316, GSM1316317, GSM1316310, GSM1316323, GSM1316324, GSM1316330, GSM1316331 |
+| H3K27ac ChIP-seq | GSM1316300, GSM1316301, GSM1316307, GSM1316308, GSM1316314, GSM1316315, GSM1316321, GSM1316322, GSM1316328, GSM1316329 |
+| Input ChIP-seq | ERR208001, ERR208012, ERR207984, ERR208011, ERR207986, GSM1316304, GSM1316305, GSM1316311, GSM1316312, GSM1316318, GSM1316319, GSM1316325, GSM1316326, GSM1316332, GSM1316333 |
+| RNA-seq | ERR266333, ERR266335, ERR266337, ERR266338, ERR266341, ERR266342, ERR266344, ERR266346, ERR266349, ERR266351 |
+
+
+### Main Scripts
+
+- The scripts that compute TFSEE to identify cognate transcription factors are under 'analysis'
+ * Applicable to either enhancer method:
+    * Get H3K4me3 peaks: h3k4me3_processing.sh
+    * Get H3K27ac peaks: h3k27ac_processing.sh
+    * Get H3K4me1 peaks: h3k4me1_processing.sh
+    * Exclude regions based on H3K4me3 and promoters: excluded_regions_processing.sh
+    * RNA-seq analysis: RNA-seq_star.sh
+    * FPKM processing RNA-seq: rnaseq_processing.sh
+ * TFSEE using GRO-seq:
+      * Tune groHMM: tune-hmm.sh
+      * Call Transcripts: call-transcripts.sh
+      * Make universe of Enhancers: groseq_processing.sh
+      * GRO_seq_TFSEE:
+         * TFSEE pre-processing: tfsee_processing.sh
+         * TFSEE score integration: matrix_analysis.py
+         * Rank order TF clusters: rank_order.py
+ * TFSEE using histone modifications ChIP-seq:
+      * Make universe of Enhancers: histone_centered_processing.sh
+      * Histone_TFSEE:
+         * TFSEE pre-processing: tfsee_processing.sh
+         * TFSEE score integration: matrix_analysis.py
+         * Rank order TF clusters: rank_order.py
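To illustrate what the score integration and rank-order steps combine, here is a minimal sketch. It assumes that TFSEE multiplies a cell-by-enhancer activity matrix by an enhancer-by-TF motif matrix and then weights the result element-wise by TF expression. This formulation, the function names, and the toy values are assumptions for illustration only; they are not taken from `matrix_analysis.py` or `rank_order.py`, which may scale and normalize the inputs differently. Plain Python lists are used to keep the sketch dependency-free, where the actual scripts would more likely use numpy or pandas.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def tfsee_scores(enhancer_activity, motif_scores, tf_expression):
    """Assumed formulation: (activity x motifs), weighted by TF expression.

    enhancer_activity: cells x enhancers (e.g. normalized RPKM)
    motif_scores:      enhancers x TFs   (e.g. scaled motif match strength)
    tf_expression:     cells x TFs       (e.g. normalized FPKM)
    Returns a cells x TFs matrix of scores.
    """
    weighted = matmul(enhancer_activity, motif_scores)
    return [[w * e for w, e in zip(w_row, e_row)]
            for w_row, e_row in zip(weighted, tf_expression)]


def rank_tfs(cell_scores, tf_names):
    """Return (TF name, score) pairs for one cell type, highest score first."""
    return sorted(zip(tf_names, cell_scores), key=lambda p: p[1], reverse=True)


# Toy data: 2 cell types, 2 enhancers, 2 hypothetical TFs.
activity = [[1.0, 0.0], [0.0, 2.0]]
motifs = [[1.0, 0.5], [0.0, 1.0]]
expression = [[2.0, 1.0], [1.0, 3.0]]

scores = tfsee_scores(activity, motifs, expression)
# scores == [[2.0, 0.5], [0.0, 6.0]]
top_for_second_cell = rank_tfs(scores[1], ["TF_A", "TF_B"])
# top_for_second_cell == [("TF_B", 6.0), ("TF_A", 0.0)]
```

The resulting cell-by-TF matrix is the kind of input that clustering, t-SNE, and rank-order plots would consume.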