Total Functional Score of Enhancer Elements Identifies Lineage-Specific Enhancers that Drive Differentiation of Pancreatic Cells
This directory contains the scripts that use TFSEE to identify the transcription factors (TFs) that maintain the multipotency of endodermal stem cells during differentiation into pancreatic lineages.

Dependencies for TFSEE
This code requires Python 2.7+ to run.
The Python scripts require the following Python packages:

biopython-1.70
pandas-0.20.1
numpy-1.12.1
scikit-learn-0.18.1
matplotlib-2.0.2
seaborn-0.8.1
scipy-0.19.0

Install the dependencies.

pip install -r requirements.txt


Pipeline Description

Pre-processing Steps


De novo identification of enhancers using GRO-seq and groHMM, or ChIP-seq (H3K4me1 and H3K27ac)


Normalize Enhancer Expression using GRO-seq: For each cell line, quantify the GRO-seq reads (as RPKM) that fall within a 1 kb region around the center of the overlap for paired enhancer transcripts, or from the 5′ end of unpaired enhancer transcripts


Normalize Enhancer Expression using ChIP-seq: For each cell line, quantify the ChIP-seq reads (as RPKM) from H3K4me1, H3K27ac, and input for each enhancer within the universe of GRO-seq-defined enhancers


Motif Predictions: For each cell line, run de novo motif analysis with MEME on a 1 kb region of each expressed enhancer, then match the discovered motifs to known JASPAR motifs using TOMTOM


Normalize Transcription Factor Expression using RNA-seq: For each cell line, quantify the RNA-seq reads (as FPKM) for each transcription factor matched to the predicted motifs


RNA-seq analysis: RNA-seq_star.sh
FPKM processing RNA-seq: rnaseq_processing.sh
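The RPKM normalization used in the pre-processing steps above can be sketched as follows. This is a minimal illustration, not the code in the repository's scripts: the function names `enhancer_window` and `rpkm` are hypothetical, and the read count and library size in the example are made-up numbers. It only shows the standard RPKM formula (reads per kilobase of region per million mapped reads) applied to a 1 kb window.

```python
def enhancer_window(center, width=1000):
    """Return (start, end) of a `width`-bp window centered on `center`.

    Hypothetical helper: the start is clipped at 0 so the window never
    extends before the beginning of a chromosome.
    """
    half = width // 2
    return max(0, center - half), center + half


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    # Equivalent to: reads / (length in kb) / (library size in millions)
    return read_count * 1e9 / (float(region_length_bp) * total_mapped_reads)


# Illustrative numbers only: 50 reads in a 1 kb window, 20 million mapped reads.
start, end = enhancer_window(10000)
value = rpkm(read_count=50, region_length_bp=end - start,
             total_mapped_reads=20000000)
```

FPKM for RNA-seq follows the same arithmetic, counting fragments instead of reads.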


Calculate the TFSEE score to determine cell-type-specific enhancer activity, generating:


unsupervised hierarchical clustering
tSNE representation
boxplot representations
rank order TF plots
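As a rough guide to how the three normalized inputs (enhancer signal, motif predictions, and TF expression) might be combined into a score, here is a minimal sketch. It rests on assumptions that are not stated in this README: that enhancer activity is a cell-type-by-enhancer matrix, that motif predictions form an enhancer-by-TF matrix, and that the score is their matrix product weighted element-wise by TF expression. The function name `tfsee_sketch` and the toy values are hypothetical; matrix_analysis.py holds the actual scoring and scaling.

```python
import numpy as np


def tfsee_sketch(enhancer_activity, motif_scores, tf_expression):
    """Combine three matrices into a cell-type x TF score (assumed form).

    enhancer_activity: cell types x enhancers (e.g. normalized RPKM)
    motif_scores:      enhancers x TFs (motif prediction strength)
    tf_expression:     cell types x TFs (e.g. normalized FPKM)
    """
    A = np.asarray(enhancer_activity, dtype=float)
    M = np.asarray(motif_scores, dtype=float)
    E = np.asarray(tf_expression, dtype=float)
    if A.ndim != 2 or M.ndim != 2 or E.ndim != 2:
        raise ValueError("all inputs must be 2-D matrices")
    if A.shape[1] != M.shape[0] or E.shape != (A.shape[0], M.shape[1]):
        raise ValueError("matrix dimensions do not line up")
    # Sum motif-weighted enhancer activity per TF, then weight by expression.
    return A.dot(M) * E


# Toy example: 2 cell types, 2 enhancers, 2 TFs (values are illustrative).
activity = [[1.0, 0.0],
            [0.0, 2.0]]
motifs = [[0.5, 0.0],
          [0.0, 1.0]]
expression = [[2.0, 1.0],
              [1.0, 3.0]]
scores = tfsee_sketch(activity, motifs, expression)
```

The resulting cell-type-by-TF matrix is the kind of input that clustering, tSNE, and rank-order plots would then operate on.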


Data Source
All data are available from NCBI’s Gene Expression Omnibus [@url:https://www.ncbi.nlm.nih.gov/geo/] or EMBL-EBI’s ArrayExpress [@url:http://www.ebi.ac.uk/arrayexpress/] repositories using the accession numbers listed:


Assay: Accessions
GRO-seq: GSM1316306, GSM1316313, GSM1316320, GSM1316327, GSM1316334
H3K4me3 ChIP-seq: ERR208008, ERR208014, ERR207998, ERR20798, ERR207999
H3K4me1 ChIP-seq: GSM1316302, GSM1316303, GSM1316309, GSM1316316, GSM1316317, GSM1316310, GSM1316323, GSM1316324, GSM1316330, GSM1316331
H3K27ac ChIP-seq: GSM1316300, GSM1316301, GSM1316307, GSM1316308, GSM1316314, GSM1316315, GSM1316321, GSM1316322, GSM1316328, GSM1316329
Input ChIP-seq: ERR208001, ERR208012, ERR207984, ERR208011, ERR207986, GSM1316304, GSM1316305, GSM1316311, GSM1316312, GSM1316318, GSM1316319, GSM1316325, GSM1316326, GSM1316332, GSM1316333
RNA-seq: ERR266333, ERR266335, ERR266337, ERR266338, ERR266341, ERR266342, ERR266344, ERR266346, ERR266349, ERR266351


Main Scripts

The scripts that compute TFSEE to identify cognate transcription factors are under 'analysis'.


Applicable to either enhancer method:

Get H3K4me3 peaks: h3k4me3_processing.sh
Get H3K27ac peaks: h3k27ac_processing.sh
Get H3K4me1 peaks: h3k4me1_processing.sh
Exclude regions based on H3K4me3 and promoters: excluded_regions_processing.sh
RNA-seq analysis: RNA-seq_star.sh
FPKM processing RNA-seq: rnaseq_processing.sh


TFSEE using GRO-seq:

Tune GroHMM: tune-hmm.sh
Call Transcripts: call-transcripts.sh
Make universe of Enhancers: groseq_processing.sh
GRO_seq_TFSEE:

TFSEE pre-processing: tfsee_processing.sh
TFSEE score integration: matrix_analysis.py
Rank-order TF clusters: rank_order.py


TFSEE using histone modifications ChIP-seq:

Make universe of Enhancers: histone_centered_processing.sh
Histone_TFSEE:

TFSEE pre-processing: tfsee_processing.sh
TFSEE score integration: matrix_analysis.py
Rank-order TF clusters: rank_order.py