**ChIP-seq Analysis** is a bioinformatics best-practice analysis pipeline used for chromatin immunoprecipitation (ChIP-seq) data analysis.
...
@@ -7,16 +7,16 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
### Pipeline Steps
1) Trim adaptors with TrimGalore!
2) Align with BWA
3) Filter reads with Sambamba
4) Quality control with DeepTools
5) Calculate cross-correlation using SPP and PhantomPeakQualTools
6) Signal profiling using MACS2
7) Call consensus peaks
8) Annotate all peaks using ChIPseeker
9) Use MEME-ChIP to find motifs in original peaks
10) Find differentially bound peaks using DiffBind (if more than one experiment)
## Workflow Parameters
...
@@ -25,41 +25,35 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
pairedEnd - Choose True/False if data is paired-end
design - Choose the file with the experiment design information (TSV format)
genome - Choose a genomic reference (genome).
skipDiff - Choose True/False to indicate whether to skip Differential Peaks analysis
skipMotif - Choose True/False to indicate whether to skip Motif Calling
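
As a rough illustration of how the parameters above might be passed to Nextflow, the sketch below assembles a `nextflow run` argument list from a dictionary. The workflow path `workflow/main.nf`, the file name `design.txt`, and the genome name `GRCh38` are placeholders that do not come from this README, and writing booleans as lowercase `true`/`false` is an assumption about how the pipeline reads them.

```python
import shlex


def build_nextflow_command(workflow, params):
    """Render a params dict as a `nextflow run` argument list."""
    cmd = ["nextflow", "run", workflow]
    for name, value in params.items():
        # Assumption: booleans are written as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        # Nextflow takes pipeline parameters as `--name value` pairs.
        cmd += [f"--{name}", str(value)]
    return cmd


params = {
    "design": "design.txt",  # hypothetical design file name
    "genome": "GRCh38",      # hypothetical reference name
    "pairedEnd": False,
    "skipDiff": True,
    "skipMotif": False,
}

# "workflow/main.nf" is a placeholder path, not taken from the README.
cmd = build_nextflow_command("workflow/main.nf", params)
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps file names with spaces intact if the list is later handed to `subprocess.run`.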
## Design file
The following columns are required and must be named as in the template. A design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/bchen4/chipseq_analysis/raw/master/docs/design_example.csv)
The following columns are required and must be named as in the template. A design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/design_example.txt)
SampleID
sample_id
The ID of the sample. This will be the header in output files; please make sure it is concise.
The ID of the sample. This will be the name used in output files; please make sure it is concise and informative.
Tissue
experiment_id
Tissue of the sample
The id of the experiment. Used for grouping replicates.
Factor
biosample
Factor of the experiment
The name of the biological sample.
Condition
factor
This is the group that will be used for pairwise differential expression analysis
Factor of the experiment.
Replicate
treatment
Replicate id
Treatment used in experiment.
Peaks
replicate
The file name of the peak file for this sample
Replicate number.
bamReads
control_id
The file name of the IP BAM for this sample
The sample_id of the control used for this sample.
bamControl
fastq_read1
The file name of the control BAM for this sample
File name of fastq file, if paired-end this is read1.
ContorlID
fastq_read2
The id of the control sample
File name of read2 (for paired-end), not needed for single-end data.
PeakCaller
The peak caller used
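
Before launching a run, it can help to check a design file against the lowercase column names listed above (`sample_id` through `fastq_read2`). The following is a minimal sketch of such a check, not part of the pipeline itself: the function name `check_design`, the example rows, and the rule that an empty `control_id` is allowed are all assumptions made for illustration.

```python
import csv
import io

# Column names taken from the design file description above.
REQUIRED_COLUMNS = [
    "sample_id", "experiment_id", "biosample", "factor",
    "treatment", "replicate", "control_id", "fastq_read1",
]


def check_design(handle, paired_end=False):
    """Return a list of problems found in a tab-separated design file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    # fastq_read2 is only needed for paired-end data.
    required = REQUIRED_COLUMNS + (["fastq_read2"] if paired_end else [])
    problems = [f"missing column: {col}" for col in required if col not in header]
    if problems:
        return problems
    rows = list(reader)
    sample_ids = {row["sample_id"] for row in rows}
    for row in rows:
        control = row["control_id"]
        # Assumption: an empty control_id is tolerated; otherwise it must
        # match the sample_id of some row in the same file.
        if control and control not in sample_ids:
            problems.append(f"{row['sample_id']}: unknown control_id {control!r}")
    return problems


# Made-up example rows, for illustration only.
example = (
    "sample_id\texperiment_id\tbiosample\tfactor\ttreatment\t"
    "replicate\tcontrol_id\tfastq_read1\n"
    "A_1\texpA\tcellX\tfactorY\tNone\t1\tB_1\tA_1.fastq.gz\n"
    "B_1\texpB\tcellX\tInput\tNone\t1\tB_1\tB_1.fastq.gz\n"
)
problems = check_design(io.StringIO(example))
print(problems)
```

Calling `check_design` with `paired_end=True` on the same example reports the absent `fastq_read2` column.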
### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatic Core Facility (BICF), Department of Bioinformatics.
This workflow was developed jointly with the [Bioinformatic Core Facility (BICF), Department of Bioinformatics](http://www.utsouthwestern.edu/labs/bioinformatics/).
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354 ](https://dx.doi.org/10.1093/bioinformatics/btw354)
17. **BICF ChIP-seq Analysis Workflow**:
* Venkat S. Malladi and Beibei Chen. (2019). BICF ChIP-seq Analysis Workflow (publish_1.0.0). Zenodo. doi:[10.5281/zenodo.2648845](https://doi.org/10.5281/zenodo.2648845)
Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.