Jeremy Mathews authored b36522ca
CHIPseq Manual
Version 1.0.6
May 31, 2019
BICF ChIP-seq Pipeline
Introduction
BICF ChIPseq is a best-practice bioinformatics pipeline for ChIP-seq (chromatin immunoprecipitation sequencing) data analysis, developed at the BICF in the UT Southwestern Department of Bioinformatics.
The pipeline uses Nextflow, a bioinformatics workflow tool. It pre-processes raw FASTQ data, aligns the reads, and performs extensive quality control on the results.
This pipeline is primarily run on the BioHPC cluster, which uses the SLURM scheduler. However, it should run on any system that supports Nextflow.
The pipeline is also designed to work with the Astrocyte Workflow System through a simple web interface.
The current version of the software and the issue tracker are available at https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis
To download the current version of the software:
$ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/chipseq_analysis.git
Input files
1) Fastq Files
- You will need the full path to the files for the bash script
2) Design File
The design file is a tab-delimited file with 8 columns for single-end (SE) data and 9 columns for paired-end (PE) data. Names may contain letters, numbers, and underscores, but must begin with a letter. The columns must be as follows:
- sample_id a short, unique, and concise name used to label output files; will be used as a control_id if it is the control sample
- experiment_id biosample_treatment_factor; the same name is given to all replicates of a treatment. Used for the consensus header.
- biosample symbol for tissue type or cell line
- factor symbol for antibody target
- treatment symbol of treatment applied
- replicate a number, usually from 1 to 3 (e.g. 1)
- control_id sample_id name that is the control for this sample
- fastq_read1 name of fastq file 1 for SE or PE data
- fastq_read2 name of fastq file 2 for PE data
See HERE for an example design file, paired-end
See HERE for an example design file, single-end
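Before submitting a job, it can help to check the design file against the rules above. The shell function below is a minimal sketch and not the pipeline's own validation. It assumes the first line is a header naming the columns in the order listed, applies the naming rule to sample_id and experiment_id, and requires every control_id to match a sample_id in the same file. The function name check_design and the sample names in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a design-file sanity check (hypothetical helper, not part of the pipeline).
# Assumptions: line 1 is a header with the column names in the documented order;
# the "letters, numbers, underscores, starting with a letter" rule applies to
# sample_id and experiment_id.
check_design() {
  local design="$1" paired="${2:-false}"
  local expected="sample_id experiment_id biosample factor treatment replicate control_id fastq_read1"
  if [ "$paired" = "true" ]; then
    expected="$expected fastq_read2"   # PE designs carry a 9th column
  fi

  awk -F'\t' -v expected="$expected" '
    BEGIN { ncol = split(expected, want, " ") }
    { sub(/\r$/, "") }                  # tolerate Windows line endings
    NF == 0 { next }                    # ignore blank lines
    NR == 1 {                           # header: expected names, in order
      for (i = 1; i <= ncol; i++)
        if ($i != want[i]) { printf "header column %d is \"%s\", expected \"%s\"\n", i, $i, want[i]; bad = 1 }
      if (NF != ncol) { printf "header has %d columns, expected %d\n", NF, ncol; bad = 1 }
      next
    }
    {
      if (NF != ncol) { printf "line %d: %d columns, expected %d\n", NR, NF, ncol; bad = 1 }
      if ($1 !~ /^[A-Za-z][A-Za-z0-9_]*$/) { printf "line %d: bad sample_id \"%s\"\n", NR, $1; bad = 1 }
      if ($2 !~ /^[A-Za-z][A-Za-z0-9_]*$/) { printf "line %d: bad experiment_id \"%s\"\n", NR, $2; bad = 1 }
      if ($6 !~ /^[0-9]+$/) { printf "line %d: replicate \"%s\" is not a number\n", NR, $6; bad = 1 }
      sample[$1] = 1
      control[NR] = $7
    }
    END {
      # Every control_id must name a sample_id somewhere in the file
      for (n in control)
        if (!(control[n] in sample)) { printf "line %d: control_id \"%s\" is not a sample_id\n", n, control[n]; bad = 1 }
      exit bad
    }' "$design"
}

# Hypothetical single-end design: one ChIP sample and its input control
design=$(mktemp)
{
  printf 'sample_id\texperiment_id\tbiosample\tfactor\ttreatment\treplicate\tcontrol_id\tfastq_read1\n'
  printf 'chipA_rep1\tcellA_none_TF1\tcellA\tTF1\tnone\t1\tinputA_rep1\tchipA_rep1.fastq.gz\n'
  printf 'inputA_rep1\tcellA_none_Input\tcellA\tInput\tnone\t1\tinputA_rep1\tinputA_rep1.fastq.gz\n'
} > "$design"

check_design "$design" false && echo "design OK"   # prints: design OK
```

The function prints one message per problem and returns a non-zero status if any check fails, so it can guard a submission, for example `check_design design.txt true && sbatch chipseq.sh` for a paired-end design.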
3) Bash Script
- You will need to create a bash script to run the ChIP-seq pipeline on BioHPC
- The pipeline has already been optimized for the appropriate partition
- See HERE for an example bash script
- The parameters that must be specified are:
- --reads '/path/to/files/name.fastq.gz'
- --designFile '/path/to/file/design.txt'
- --genome 'GRCm38', 'GRCh38', or 'GRCh37' (to use another genome, contact the BICF)
- --pairedEnd 'true' or 'false' (where 'true' is PE and 'false' is SE; default 'false')
- --outDir (optional) path and folder name of the output data, for example /home2/s000000/Desktop/Chipseq_output (if not specified, output is written under workflow/output/)
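A minimal sketch of such a script follows, written out with a heredoc so it can be saved in one step. Only the five pipeline parameters come from the list above; everything else is an assumption to check against the example script and your own allocation: the SLURM directives (the `super` partition and the two-day time limit), the `module load nextflow` line, the entry point `workflow/main.nf` inside the cloned repository, and the use of a glob for `--reads`.

```shell
# Write a SLURM submission script for the pipeline (sketch; adjust all paths).
cat > chipseq.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=chipseq
#SBATCH --partition=super          # assumed BioHPC partition
#SBATCH --nodes=1
#SBATCH --time=2-00:00:00          # assumed time limit
#SBATCH --output=chipseq.%j.out
#SBATCH --error=chipseq.%j.err

# Assumed module name; check `module avail nextflow` on your system
module load nextflow

# Hypothetical paths: replace with full paths to your clone and your data.
# Single quotes stop the shell from expanding the glob, so Nextflow sees the pattern.
nextflow run /path/to/chipseq_analysis/workflow/main.nf \
  --reads '/path/to/files/*.fastq.gz' \
  --designFile '/path/to/file/design.txt' \
  --genome 'GRCh38' \
  --pairedEnd 'false' \
  --outDir '/path/to/Chipseq_output'
EOF
```

Submit the generated script with `sbatch chipseq.sh`. SLURM reads the `#SBATCH` lines as job options, and Nextflow maps each `--name value` argument after the script path onto the pipeline parameter of the same name.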