


Random Forest for Enhancer Prediction
http://enhancer.ucsd.edu/renlab/RFECS_enhancer_prediction/


# Methods

1. Call peaks for H3K27ac and H3K4me1 using a significance threshold of 1e-2
2. Call peaks for H3K4me3 using a significance threshold of 1e-5
3. Call transcription units in each cell type from GRO-seq data using GRO-HMM
4. Merge H3K27ac, H3K4me1, and H3K4me3 peaks lying within 500 bp of one another into a universe of regions, and keep regions with at least 1 RPKM in at least one cell type
5. Merge GRO-seq transcripts and filter them as follows:
    a. Filter on a +/- 3 kb window around the TSSs of GENCODE protein-coding genes and around H3K4me3 peaks
    b. Merge overlapping transcripts
    c. Keep Short-Short-Paired (SSP) and Short-Unpaired (SUP) transcripts of <= 9 kb
    d. Build a universe filter requiring at least 1 RPKM for SSP transcripts and more than 2 RPM for SUP transcripts, in at least one cell type
6. Define enhancer regions for motif search:
    a. Short-Short-Paired: +/- 500 bp around the center of the overlap
    b. Short-Unpaired: +/- bp around the TSS of the transcript
    c. Histone data: +/- 500 bp around the center of the mark
7. De novo motif analyses were performed using the command-line version of MEME (Bailey et al., 2009)
   with the following parameters: (1) zero or one occurrence per sequence (-mod zoops); (2) number of
   motifs (-nmotifs 15); (3) minimum and maximum motif width (-minw 8, -maxw 15); and (4) search for
   motifs on both the given strand and its reverse complement (-revcomp). The predicted motifs were
   matched to known motifs using TOMTOM (Gupta et al., 2007).
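Steps 1 and 2 apply mark-specific significance cutoffs. The sketch below shows one way to apply them after peak calling, assuming the cutoffs are p-values and that peaks are stored in ENCODE narrowPeak format (column 8 holds -log10 p-value). The `THRESHOLDS` table and the function names are hypothetical; in practice the cutoff is usually passed to the peak caller itself, so treat this only as an illustration of the thresholds.

```python
import math

# Cutoffs from steps 1-2. Assumption: these are p-values, not q-values.
THRESHOLDS = {"H3K27ac": 1e-2, "H3K4me1": 1e-2, "H3K4me3": 1e-5}


def passes_threshold(narrowpeak_line: str, mark: str) -> bool:
    """True if the peak's p-value is at or below the cutoff for `mark`.

    Assumes narrowPeak layout: 0-based column 7 is -log10(p-value).
    """
    fields = narrowpeak_line.rstrip("\n").split("\t")
    neg_log10_p = float(fields[7])
    # p <= cutoff  <=>  -log10(p) >= -log10(cutoff)
    return neg_log10_p >= -math.log10(THRESHOLDS[mark])


def filter_peaks(lines, mark):
    """Keep only the narrowPeak lines that pass the mark-specific cutoff."""
    return [line for line in lines if passes_threshold(line, mark)]
```

A peak with -log10 p = 3 passes the H3K27ac/H3K4me1 cutoff (needs >= 2) but not the stricter H3K4me3 cutoff (needs >= 5).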
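Step 3 relies on GRO-HMM, which segments GRO-seq signal with a hidden Markov model. As a toy illustration of that idea only (this is not GRO-HMM), the sketch below decodes binned read counts from one strand with a two-state Viterbi pass, then collapses runs of the "transcribed" state into units. The Poisson emissions, `lam_off`, `lam_on`, `p_stay`, and `bin_size` are hypothetical, hand-picked values.

```python
import math


def viterbi_two_state(counts, lam_off=0.5, lam_on=5.0, p_stay=0.99):
    """Most likely 0/1 (untranscribed/transcribed) state path for binned counts."""
    def log_poisson(k, lam):
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    lams = (lam_off, lam_on)
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)
    # score[s]: best log-probability of any path ending in state s
    score = [math.log(0.5) + log_poisson(counts[0], lam) for lam in lams]
    back = []  # back[t][s]: best previous state for state s at bin t + 1
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            ptr.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + log_poisson(k, lams[s]))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def transcription_units(counts, bin_size=50, **params):
    """Collapse runs of state 1 into half-open (start, end) coordinates in bp."""
    path = viterbi_two_state(counts, **params)
    units, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            units.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        units.append((start * bin_size, len(path) * bin_size))
    return units
```

For example, `transcription_units([0, 0, 0, 8, 7, 9, 6, 0, 0, 0])` returns `[(150, 350)]`: the four high-count bins form one unit.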
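Step 4 can be sketched as an interval merge followed by an expression filter. This assumes "within 500 bp" means a gap of at most 500 bp between peak edges (the same semantics as `bedtools merge -d 500`), half-open coordinates, and that per-cell read counts and total mapped reads are already available. `merge_within`, `rpkm`, and `universe_filter` are hypothetical names.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def merge_within(peaks: List[Interval], max_gap: int = 500) -> List[Interval]:
    """Merge peaks from all marks whose edge-to-edge gap is <= max_gap bp."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def rpkm(reads: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return reads / (length_bp / 1e3) / (total_mapped / 1e6)


def universe_filter(regions: List[Interval],
                    counts: Dict[str, List[int]],
                    totals: Dict[str, int],
                    min_rpkm: float = 1.0) -> List[Interval]:
    """Keep regions with RPKM >= min_rpkm in at least one cell type.

    counts[cell][i] is the read count of regions[i] in that cell type.
    """
    keep = []
    for i, (chrom, start, end) in enumerate(regions):
        if any(rpkm(counts[cell][i], end - start, totals[cell]) >= min_rpkm
               for cell in counts):
            keep.append((chrom, start, end))
    return keep
```

Peaks at chr1:100-300 and chr1:700-900 are 400 bp apart and therefore merge into chr1:100-900.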
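A sketch of the transcript filters in step 5 follows. The README does not say whether step 5a keeps or removes transcripts near promoters; this sketch assumes they are removed, and it represents each TSS or H3K4me3 peak by a single position. The SSP/SUP label is assumed to be precomputed, and merging overlapping transcripts (5b) is omitted. `Transcript` and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Transcript:
    chrom: str
    start: int
    end: int
    tx_class: str  # hypothetical precomputed label: "SSP" or "SUP"
    reads: Dict[str, int] = field(default_factory=dict)  # counts per cell type


def near_any(tx: Transcript, sites: List[Tuple[str, int]], flank: int = 3000) -> bool:
    """True if the transcript overlaps +/- flank bp of any (chrom, position) site."""
    return any(c == tx.chrom and tx.start < pos + flank and tx.end > pos - flank
               for c, pos in sites)


def passes_expression(tx: Transcript, totals: Dict[str, int]) -> bool:
    """Step 5d: >= 1 RPKM (SSP) or > 2 RPM (SUP) in at least one cell type."""
    for cell, n in tx.reads.items():
        per_million = totals[cell] / 1e6
        if tx.tx_class == "SSP":
            if n / ((tx.end - tx.start) / 1e3) / per_million >= 1.0:  # RPKM
                return True
        elif tx.tx_class == "SUP":
            if n / per_million > 2.0:  # RPM
                return True
    return False


def filter_transcripts(txs, tss_sites, k4me3_sites, totals, max_len=9000):
    kept = []
    for tx in txs:
        # Assumption: 5a removes promoter-proximal transcripts.
        if near_any(tx, tss_sites) or near_any(tx, k4me3_sites):
            continue
        # 5c: length cap and class restriction.
        if tx.end - tx.start > max_len or tx.tx_class not in ("SSP", "SUP"):
            continue
        if passes_expression(tx, totals):
            kept.append(tx)
    return kept
```

The RPKM/RPM split mirrors step 5d: paired transcripts are length-normalized, unpaired ones are not.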
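The windows in step 6 reduce to centering a fixed-width interval on a point. In this sketch (hypothetical helper names, half-open coordinates), `center_window` serves both the SSP overlap region and histone peaks, while `tss_window` handles SUP transcripts; because the README leaves the SUP flank unspecified, `flank` has no default there. `to_bed` emits BED lines that a tool such as `bedtools getfasta` could turn into sequences for MEME.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open


def center_window(chrom: str, start: int, end: int, flank: int = 500) -> Region:
    """+/- flank bp around the midpoint of [start, end), clipped at 0.

    Used for the SSP overlap region (6a) and histone peaks (6c).
    """
    center = (start + end) // 2
    return chrom, max(0, center - flank), center + flank


def tss_window(chrom: str, start: int, end: int, strand: str, flank: int) -> Region:
    """+/- flank bp around the TSS of a SUP transcript (6b).

    The TSS is `start` on the + strand and `end - 1` on the - strand.
    """
    tss = start if strand == "+" else end - 1
    return chrom, max(0, tss - flank), tss + flank


def to_bed(regions: Iterable[Region]) -> str:
    """Format regions as tab-separated BED3 lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions)
```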
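The MEME and TOMTOM calls in step 7 can be assembled as argument lists, which avoids shell-quoting issues. The motif flags come from the text; `-dna` and `-oc` are standard MEME options added here for completeness, and the file names (`enhancers.fa`, the motif database) are hypothetical. MEME writes `meme.txt` into its output directory, and that file is used here as the TOMTOM query.

```python
import shutil
import subprocess
from typing import List


def meme_command(fasta: str, outdir: str) -> List[str]:
    """MEME invocation with the parameters listed in step 7."""
    return ["meme", fasta, "-dna", "-oc", outdir,
            "-mod", "zoops",               # zero or one occurrence per sequence
            "-nmotifs", "15",              # number of motifs
            "-minw", "8", "-maxw", "15",   # motif width range
            "-revcomp"]                    # search both strands


def tomtom_command(meme_txt: str, motif_db: str, outdir: str) -> List[str]:
    """Match MEME motifs against a database of known motifs (MEME format)."""
    return ["tomtom", "-oc", outdir, meme_txt, motif_db]


def run(cmd: List[str]) -> None:
    """Run a command, failing early if the executable is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

Usage, with hypothetical paths: `run(meme_command("enhancers.fa", "meme_out"))` followed by `run(tomtom_command("meme_out/meme.txt", "known_motifs.meme", "tomtom_out"))`.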