@@ -25,12 +25,12 @@ Report issues to the Bioinformatic Core Facility [BICF](mailto:BICF@UTSouthweste
## Workflow Parameters
1. One or more input FASTQ files from a ChIP-seq experiment and a design file linking each file name to its sample id (required) - Choose all ChIP-seq fastq files for analysis.
2. In single-end sequencing, the sequencer reads a fragment from only one end, generating the sequence of base pairs. In paired-end sequencing, it reads from one end up to the specified read length and then starts another round of reading from the opposite end of the fragment. (Paired-end: True, Single-end: False) (required)
3. A design file listing sample id, fastq files, corresponding control id and additional information about the sample.
3. A design file listing sample id, fastq files, corresponding control id and additional information about the sample.
genome - Choose a genomic reference (genome).
4. Reference species and genome used for alignment and subsequent analysis. (required)
5. Run differential peak analysis (required). Must have at least 2 replicates per experiment and at least 2 experiments.
6. Run motif calling (required). Top 600 peaks sorted by p-value.
7. Ensure configuraton for astrocyte. (required; always true)
7. Ensure configuration for astrocyte. (required; always true)
## Design file
...
...
@@ -46,8 +46,8 @@ Report issues to the Bioinformatic Core Facility [BICF](mailto:BICF@UTSouthweste
9. fastq_read2 name of fastq file 2 for PE data
+ See [HERE](test_data/design_ENCSR729LGA_PE.txt) for an example design file, paired-end
+ See [HERE](test_data/design_ENCSR238SGC_SE.txt) for an example design file, single-end
+ See [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/test_data/design_ENCSR729LGA_PE.txt) for an example design file, paired-end
+ See [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/test_data/design_ENCSR238SGC_SE.txt) for an example design file, single-end
## Output Files
...
...
@@ -95,7 +95,7 @@ diffPeaks | *_diffbind.csv | Use only for replicated samples; CSV file of peaks
+ This is the list of files that should be reviewed before continuing with the ChIP-seq experiment. If your experiment fails any of these metrics, you should pause and re-evaluate whether the data should remain in the study.
1. multiqcReport/multiqc_report.html: follow the ChIP-seq standards [HERE](https://www.encodeproject.org/chip-seq/).
2. experimentQC/*_fingerprint.pdf: make sure the plot information is correct for your antibody/input. See [HERE](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html) for more details.
3. crossReads/*cc.plot.pdf: make sure your sample data has the correct signal intensity and location. See [HERE](https://hbctraining.github.io/Intro-to-ChIPseq/lessons/06_QC_cross_correlation.html) for more details.
3. crossReads/*cc.plot.pdf: make sure your sample data has the correct signal intensity and location. See [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/phantompeaks.md) for more details.
4. crossReads/*.cc.qc: Column 9 (NSC) should be > 1.1 for experiment and < 1.1 for input. Column 10 (RSC) should be > 0.8 for experiment and < 0.8 for input. See [HERE](https://genome.ucsc.edu/encode/qualityMetrics.html) for more details.
5. experimentQC/coverage.pdf, experimentQC/heatmeap_SpearmanCorr.pdf, experimentQC/heatmeap_PearsonCorr.pdf: See [HERE](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) for more details.
...
...
@@ -108,18 +108,18 @@ Please cite in publications: Pipeline was developed by BICF from funding provide
Phantompeakqualtools plots the strand cross-correlation of aligned reads for each sample. In a strand cross-correlation plot, reads are shifted in the direction of the strand they map to by an increasing number of base pairs and the Pearson correlation between the per-position read count vectors for each strand is calculated. Two cross-correlation peaks are usually observed in a ChIP experiment, one corresponding to the read length ("phantom" peak) and one to the average fragment length of the library. The absolute and relative height of the two peaks are useful determinants of the success of a ChIP-seq experiment. A high-quality IP is characterized by a ChIP peak that is much higher than the "phantom" peak, while often very small or no such peak is seen in failed experiments.
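
The shifting-and-correlating step described above can be illustrated with a minimal sketch. This is not the phantompeakqualtools implementation; the function names, the toy per-position count vectors, and the shift range are all assumptions made for illustration. It shifts the forward-strand counts toward the reverse strand by an increasing number of base pairs and computes the Pearson correlation at each shift.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(plus_counts, minus_counts, max_shift):
    """Return {shift: correlation} for shifts 0..max_shift (hypothetical helper).

    plus_counts[i] and minus_counts[i] are per-position read counts on the
    forward and reverse strand of one toy region.
    """
    profile = {}
    for shift in range(max_shift + 1):
        # Compare forward-strand position i with reverse-strand position i + shift.
        n = len(plus_counts) - shift
        profile[shift] = pearson(plus_counts[:n], minus_counts[shift:shift + n])
    return profile


# Toy data (assumed): three binding sites, reverse-strand reads offset by 7 bp.
plus = [0] * 70
minus = [0] * 70
for pos, height in ((10, 5), (30, 3), (50, 4)):
    plus[pos] = height
    minus[pos + 7] = height

profile = strand_cross_correlation(plus, minus, max_shift=15)
best_shift = max(profile, key=profile.get)  # shift with the highest correlation
```

In this toy example the correlation peaks at the 7 bp offset built into the data, which plays the role of the fragment-length peak. Real data would also show the read-length ("phantom") peak, which this sketch does not model.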
Normalized strand coefficient (NSC) is the normalized ratio between the fragment-length cross-correlation peak and the background cross-correlation. NSC values range from a minimum of 1 to larger positive numbers. 1.1 is the critical threshold. Datasets with NSC values much less than 1.1 (< 1.05) tend to have low signal to noise or few peaks (this could be biological, e.g. a factor that truly binds only a few sites in a particular tissue type, or it could be due to poor quality). ENCODE cutoff: NSC > 1.05.
Relative strand correlation (RSC) is the ratio between the fragment-length peak and the read-length peak. RSC values range from 0 to larger positive values. 1 is the critical threshold. RSC values significantly lower than 1 (< 0.8) tend to have low signal to noise. The low scores can be due to failed and poor quality ChIP, low read sequence quality and hence lots of mismappings, shallow sequencing depth (significantly below saturation) or a combination of these. Like the NSC, datasets with few binding sites (< 200), which is biologically justifiable, also show low RSC scores. ENCODE cutoff: RSC > 0.8.
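
As a rough illustration of the two metrics, the sketch below computes NSC and RSC from three cross-correlation values and checks them against the ENCODE cutoffs quoted above. The function names and the example line are hypothetical. The RSC formula assumes the background-subtracted form, in which the minimum cross-correlation is subtracted from both peaks before taking the ratio; that detail is not stated in the text above and should be checked against the phantompeakqualtools documentation. The parser assumes a tab-separated `.cc.qc` line with NSC in column 9 and RSC in column 10.

```python
def nsc_rsc(cc_fragment, cc_phantom, cc_min):
    """Compute (NSC, RSC) from three cross-correlation values.

    cc_fragment: correlation at the fragment-length peak
    cc_phantom:  correlation at the read-length ("phantom") peak
    cc_min:      minimum (background) correlation
    """
    nsc = cc_fragment / cc_min
    # Assumption: background is subtracted from both peaks before the ratio.
    rsc = (cc_fragment - cc_min) / (cc_phantom - cc_min)
    return nsc, rsc


def parse_cc_qc_line(line):
    """Read NSC (column 9) and RSC (column 10) from one tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    return float(fields[8]), float(fields[9])


def passes_encode_cutoffs(nsc, rsc, nsc_cutoff=1.05, rsc_cutoff=0.8):
    """True if both metrics exceed the ENCODE cutoffs quoted in the text."""
    return nsc > nsc_cutoff and rsc > rsc_cutoff


# Hypothetical values chosen for illustration only.
nsc, rsc = nsc_rsc(cc_fragment=0.5, cc_phantom=0.375, cc_min=0.25)

# Hypothetical .cc.qc line; only columns 9 and 10 are used here.
example_line = "sample.tagAlign\t1000000\t150\t0.5\t40\t0.375\t1500\t0.25\t2.0\t2.0\t2\n"
qc_nsc, qc_rsc = parse_cc_qc_line(example_line)
sample_ok = passes_encode_cutoffs(qc_nsc, qc_rsc)
```

For an input (control) sample the expectation is reversed, so a check like `passes_encode_cutoffs` would apply only to experiment samples.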