# Total Functional Score of Enhancer Elements Identifies Lineage-Specific Enhancers that Drive Differentiation of Pancreatic Cells
This directory contains scripts for identifying the TFs that maintain multipotency of endodermal stem cells during differentiation into pancreatic lineages, using TFSEE.
Random Forest of Enhancer Prediction
http://enhancer.ucsd.edu/renlab/RFECS_enhancer_prediction/
## Dependencies for TFSEE
This code requires Python 2.7+ to run.
The Python scripts require the following Python packages:
- biopython-1.70
- pandas-0.20.1
- numpy-1.12.1
- scikit-learn-0.18.1
- matplotlib-2.0.2
- seaborn-0.8.1
- scipy-0.19.0
# Methods:
1. Call H3K27ac and H3K4me1 peaks using a threshold of 1e-2
2. Call H3K4me3 peaks using a threshold of 1e-5
3. Call transcript units per cell using GRO-seq and groHMM
4. Merge H3K27ac, H3K4me1, and H3K4me3 peaks within 500 bp and make the universe by filtering for at least 1 RPKM in a cell
5. Merge GRO-seq transcripts and filter for:
    a. +/- 3 kb from the TSS of protein-coding genes (Gencode) and H3K4me3 peaks
    b. Merge overlapping transcripts
    c. Filter for <= 9 kb Short-Short-Paired and Short-Unpaired
    d. Make the universe by filtering for > 1 RPKM (Short-Short-Paired) and > 2 RPM (Short-Unpaired) in a cell
6. Make enhancer regions for motif search
    a. Short-Short-Paired: +/- 500 bp from the center of the overlap
    b. Short-Unpaired: +/- bp from the TSS of the transcript
    c. Histone data: +/- 500 bp from the center of the mark
7. De novo motif analyses were performed using the command-line version of MEME (Bailey et al., 2009). The following parameters were used for motif prediction: (1) zero or one occurrence per sequence (`-mod zoops`); (2) number of motifs (`-nmotifs 15`); (3) minimum and maximum width of the motif (`-minw 8`, `-maxw 15`); and (4) search for motifs on the given strand and the reverse-complement strand (`-revcomp`). The predicted motifs from MEME were matched to known motifs using TOMTOM (Gupta et al., 2007).
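Step 4 of the methods above (merge peaks within 500 bp, then keep regions with at least 1 RPKM) can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, coordinates are assumed to be BED-style half-open intervals, and "in a cell" is assumed to mean "in at least one cell type".

```python
def merge_peaks(peaks, max_gap=500):
    """Merge (chrom, start, end) intervals that overlap or lie within max_gap bp."""
    merged = []
    for chrom, start, end in sorted(peaks):
        same_chrom = merged and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def filter_universe(regions, counts_by_cell, library_sizes, min_rpkm=1.0):
    """Keep regions that reach min_rpkm in at least one cell type (assumption).

    counts_by_cell maps a cell name to a list of read counts, one per region.
    library_sizes maps a cell name to its total mapped read count.
    """
    kept = []
    for i, (chrom, start, end) in enumerate(regions):
        length = end - start
        if any(rpkm(counts[i], length, library_sizes[cell]) >= min_rpkm
               for cell, counts in counts_by_cell.items()):
            kept.append((chrom, start, end))
    return kept
```

For example, peaks at `chr1:100-200` and `chr1:600-700` are 400 bp apart and collapse into `chr1:100-700`, while a peak at `chr1:1300-1400` stays separate because the gap is 600 bp.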
Install the dependencies.
```sh
pip install -r requirements.txt
```
#### Pipeline Description
##### Pre-processing Steps
1. De novo identification of enhancers using GRO-seq and [groHMM](http://www.bioconductor.org/packages/release/bioc/html/groHMM.html)
or ChIP-seq (H3K4me1 and H3K27ac)
2. Normalize Enhancer Expression using GRO-seq: For each cell line, quantify the GRO-seq reads, RPKM, that fall within a 1 kb region around the center of the overlap for paired enhancer transcripts or from the 5′ end of unpaired enhancer transcripts
3. Normalize Enhancer Expression using ChIP-seq: For each cell line, quantify the ChIP-seq reads, RPKM, from H3K4me1, H3K27ac, and input for each enhancer within the universe of GRO-seq-defined enhancers
4. Motif Predictions: De novo motif analyses on a 1 kb region of expressed enhancers for each cell line using [MEME](http://meme-suite.org/) and matched to known motifs using TOMTOM and [JASPAR](http://jaspar.genereg.net/)
5. Normalize Transcription Factor Expression using RNA-seq: For each cell line, quantify the RNA-seq reads, FPKM, for each transcription factor that is a binding target for the motifs
    * RNA-seq analysis: RNA-seq_star.sh
    * FPKM processing RNA-seq: rnaseq_processing.sh
6. Calculate the TFSEE score to determine cell-type-specific enhancer activity, generating:
    * unsupervised hierarchical clustering
    * tSNE representation
    * boxplot representations
    * rank order TF plots
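The motif prediction step above can be scripted by assembling the MEME and TOMTOM command lines in Python. The sketch below uses the MEME parameters listed in the methods (`-mod zoops`, `-nmotifs 15`, `-minw 8`, `-maxw 15`, `-revcomp`). The `-dna` and `-oc` options, the helper names, and all file paths are assumptions added for illustration and are not taken from this repository.

```python
import os
import subprocess


def meme_command(fasta_path, out_dir):
    """Build the MEME argument list for de novo motif discovery."""
    return [
        "meme", fasta_path,
        "-dna",             # assumption: enhancer sequences are DNA
        "-oc", out_dir,     # assumption: overwrite this output directory
        "-mod", "zoops",    # zero or one occurrence per sequence
        "-nmotifs", "15",   # number of motifs to report
        "-minw", "8",       # minimum motif width
        "-maxw", "15",      # maximum motif width
        "-revcomp",         # also search the reverse-complement strand
    ]


def tomtom_command(meme_output, motif_db, out_dir):
    """Build the TOMTOM argument list to match motifs against a database."""
    return ["tomtom", "-oc", out_dir, meme_output, motif_db]


def run_motif_search(fasta_path, motif_db, work_dir):
    """Run MEME, then TOMTOM; requires both tools on the PATH."""
    meme_dir = os.path.join(work_dir, "meme_out")
    tomtom_dir = os.path.join(work_dir, "tomtom_out")
    subprocess.check_call(meme_command(fasta_path, meme_dir))
    # MEME writes its text report as meme.txt inside the output directory.
    meme_txt = os.path.join(meme_dir, "meme.txt")
    subprocess.check_call(tomtom_command(meme_txt, motif_db, tomtom_dir))
```

A call such as `run_motif_search("enhancers.fa", "jaspar.meme", "motifs")` would run both tools in sequence; both file names are placeholders, and the motif database must be in MEME format.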
#### Data Source
All data are available from NCBI's [Gene Expression Omnibus](https://www.ncbi.nlm.nih.gov/geo/) or EMBL-EBI's [ArrayExpress](http://www.ebi.ac.uk/arrayexpress/) repositories using the accession numbers listed:
| Assay | Accessions |
| :--------------------: | :------------------------------------------------------------: |
| GRO-seq | GSM1316306, GSM1316313, GSM1316320, GSM1316327, GSM1316334 |
| H3K4me3 ChIP-seq | ERR208008, ERR208014, ERR207998, ERR20798, ERR207999 |
| H3K4me1 ChIP-seq | GSM1316302, GSM1316303, GSM1316309, GSM1316316, GSM1316317, GSM1316310, GSM1316323, GSM1316324, GSM1316330, GSM1316331 |
| H3K27ac ChIP-seq | GSM1316300, GSM1316301, GSM1316307, GSM1316308, GSM1316314, GSM1316315, GSM1316321, GSM1316322, GSM1316328, GSM1316329 |
| Input ChIP-seq | ERR208001, ERR208012, ERR207984, ERR208011, ERR207986, GSM1316304, GSM1316305, GSM1316311, GSM1316312, GSM1316318, GSM1316319, GSM1316325, GSM1316326, GSM1316332, GSM1316333 |
| RNA-seq | ERR266333, ERR266335, ERR266337, ERR266338, ERR266341, ERR266342, ERR266344, ERR266346, ERR266349, ERR266351 |
### Main Scripts
- Scripts to compute TFSEE and identify cognate transcription factors are under 'analysis'
    * Applicable to either enhancer method:
        * Get H3K4me3 peaks: h3k4me3_processing.sh
        * Get H3K27ac peaks: h3k27ac_processing.sh
        * Get H3K4me1 peaks: h3k4me1_processing.sh
        * Exclude regions based on H3K4me3 and promoters: excluded_regions_processing.sh
        * RNA-seq analysis: RNA-seq_star.sh
        * FPKM processing RNA-seq: rnaseq_processing.sh
    * TFSEE using GRO-seq:
        * Tune groHMM: tune-hmm.sh
        * Call transcripts: call-transcripts.sh
        * Make universe of enhancers: groseq_processing.sh
        * GRO_seq_TFSEE:
            * TFSEE pre-processing: tfsee_processing.sh
            * TFSEE score integration: matrix_analysis.py
            * Rank order TF clusters: rank_order.py
    * TFSEE using histone modifications ChIP-seq:
        * Make universe of enhancers: histone_centered_processing.sh
        * Histone_TFSEE:
            * TFSEE pre-processing: tfsee_processing.sh
            * TFSEE score integration: matrix_analysis.py
            * Rank order TF clusters: rank_order.py
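To make the score integration step concrete, here is one plausible reading of how the three inputs could be combined. It is a sketch under stated assumptions, not the contents of matrix_analysis.py: it assumes a cells x enhancers activity matrix, an enhancers x TFs motif matrix, and a cells x TFs expression matrix, combined by a matrix product followed by an element-wise product. The function name is hypothetical, and any scaling or normalization the real script applies is omitted.

```python
import numpy as np


def tfsee_scores(enhancer_activity, motif_scores, tf_expression):
    """Combine enhancer activity, motif predictions, and TF expression.

    enhancer_activity: cells x enhancers (e.g. normalized GRO-seq or ChIP-seq signal)
    motif_scores:      enhancers x TFs   (motif prediction strength per enhancer)
    tf_expression:     cells x TFs       (e.g. RNA-seq FPKM)
    Returns a cells x TFs score matrix (assumed layout).
    """
    activity = np.asarray(enhancer_activity, dtype=float)
    motifs = np.asarray(motif_scores, dtype=float)
    expression = np.asarray(tf_expression, dtype=float)
    # Sum motif evidence over enhancers, weighted by enhancer activity:
    # (cells x enhancers) . (enhancers x TFs) -> cells x TFs
    motif_weighted = activity.dot(motifs)
    # Weight each TF by its expression in the same cell (element-wise).
    return motif_weighted * expression
```

The resulting cells x TFs matrix is the kind of input that hierarchical clustering, tSNE, and rank-order plots could then consume.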