Random Forest for Enhancer Prediction (RFECS): http://enhancer.ucsd.edu/renlab/RFECS_enhancer_prediction/
Methods:
- Call H3K27ac and H3K4me1 peaks using a threshold of 1e-2
- Call H3K4me3 peaks using a threshold of 1e-5
- Call transcript units per cell using GRO-seq and GRO-HMM
- Merge H3K27ac, H3K4me1, and H3K4me3 peaks within 500 bp of each other to make the universe; filter for at least 1 RPKM in a cell
- Merge GRO-seq transcripts and filter: a. filter for +/- 3 kb from the TSS of protein-coding genes (Gencode) and from H3K4me3 peaks; b. merge overlapping transcripts; c. filter for Short-Short-Paired (SSP) and Short-Unpaired (SUP) transcripts <= 9 kb; d. make the universe and filter for > 1 RPKM (SSP) and > 2 RPM (SUP) in a cell
- Make enhancer regions for motif search: a. Short-Short-Paired: +/- 500 bp from the center of the overlap; b. Short-Unpaired: +/- bp from the TSS of the transcript; c. Histone data: +/- 500 bp from the center of the mark
- De novo motif analyses were performed using the command-line version of MEME (Bailey et al., 2009). The following parameters were used for motif prediction: (1) zero or one occurrence per sequence (-mod zoops); (2) number of motifs (-nmotifs 15); (3) minimum and maximum width of the motif (-minw 8, -maxw 15); and (4) search for motifs on both the given strand and its reverse-complement strand (-revcomp). The predicted motifs from MEME were matched to known motifs using TOMTOM (Gupta et al., 2007).
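
The MEME parameters listed above could be assembled into command lines roughly as follows. This is a minimal sketch, not the exact invocation used here: the file names (enhancers.fa, known_motifs.meme), the output directories, and the -dna and -oc options are assumptions added for illustration; only -mod, -nmotifs, -minw, -maxw, and -revcomp come from the notes. The script only builds and prints the commands; it does not run MEME or TOMTOM.

```python
import shlex


def meme_command(fasta, out_dir):
    """Build a MEME argument list using the motif parameters from the notes."""
    return [
        "meme", fasta,
        "-dna",            # assumption: input sequences are DNA
        "-oc", out_dir,    # assumption: write results here, overwriting old output
        "-mod", "zoops",   # zero or one occurrence per sequence
        "-nmotifs", "15",  # number of motifs to report
        "-minw", "8",      # minimum motif width
        "-maxw", "15",     # maximum motif width
        "-revcomp",        # also search the reverse-complement strand
    ]


def tomtom_command(query_motifs, motif_db, out_dir):
    """Build a TOMTOM argument list: query motifs first, then the target database."""
    return ["tomtom", "-oc", out_dir, query_motifs, motif_db]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(meme_command("enhancers.fa", "meme_out")))
    print(shlex.join(tomtom_command("meme_out/meme.txt",
                                    "known_motifs.meme", "tomtom_out")))
```

Keeping the arguments as a list (rather than one string) means the same list can be passed to subprocess.run without shell quoting problems.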
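
The peak-merging, RPKM-filtering, and +/- 500 bp window steps in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's actual code: the function names are hypothetical, intervals are (chrom, start, end) tuples, "within 500 bp" is read as a gap of at most 500 bp between neighbouring peaks, and RPKM uses the standard definition (reads per kilobase of region per million mapped reads).

```python
def merge_within(intervals, max_gap=500):
    """Merge (chrom, start, end) intervals separated by at most max_gap bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        same_chrom = merged and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= max_gap:
            last = merged[-1]
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def center_window(chrom, start, end, flank=500):
    """Return the +/- flank bp window around the interval midpoint."""
    center = (start + end) // 2
    return (chrom, max(0, center - flank), center + flank)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    peaks = [("chr1", 100, 300), ("chr1", 700, 900),
             ("chr1", 2000, 2200), ("chr2", 50, 150)]
    universe = merge_within(peaks)
    print(universe)
    print([center_window(*region) for region in universe])
```

A region would then be kept when rpkm(...) for that region meets the threshold (for example, at least 1) in a given cell.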