


CHIPseq Manual


Version 1.0.6

May 31, 2019

BICF ChIP-seq Pipeline


Introduction
BICF ChIPseq is a best-practice bioinformatics pipeline for ChIP-seq (chromatin immunoprecipitation sequencing) data analysis, used at the BICF in the UT Southwestern Department of Bioinformatics.
The pipeline is built with Nextflow, a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads, and performs extensive quality control on the results.
The pipeline is primarily run under SLURM on the BioHPC cluster, but it should run on any system that supports Nextflow.
It is also designed to work with the Astrocyte Workflow System through a simple web interface.
The current version of the software and the issue tracker are at:
https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis
To download the current version of the software:

$ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/chipseq_analysis.git


Input files

1) Fastq Files

You will need the full path to the files for the Bash script.
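
One way to collect those full paths is sketched below. It assumes the files end in `.fastq.gz` and sit directly in one directory; the helper name `list_fastq_paths` and the `FASTQ_DIR` variable are illustrative and not part of the pipeline.

```shell
# Hypothetical helper: print the absolute path of every *.fastq.gz file
# found directly inside the given directory, one per line, sorted.
list_fastq_paths() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && find "$PWD" -maxdepth 1 -type f -name '*.fastq.gz' | sort )
}

# FASTQ_DIR is a placeholder; it defaults to the current directory.
list_fastq_paths "${FASTQ_DIR:-.}"
```

The printed paths can then be copied into the bash script.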


2) Design File


The design file is a tab-delimited file with 8 columns for single-end (SE) data and 9 columns for paired-end (PE) data. Names may contain letters, numbers, and underscores, but must begin with a letter. The columns must be as follows:

sample_id          a short, unique, and concise name used to label output files; will be used as a control_id if it is the control sample
experiment_id    biosample_treatment_factor; same name given for all replicates of treatment. Will be used for the consensus header.
biosample          symbol for tissue type or cell line
factor                 symbol for antibody target
treatment           symbol of treatment applied
replicate             a number, usually from 1-3 (e.g. 1)
control_id          sample_id name that is the control for this sample
fastq_read1        name of fastq file 1 for SE or PE data
fastq_read2        name of fastq file 2 for PE data


See HERE for an example design file, paired-end


See HERE for an example design file, single-end
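
As a rough illustration of the layout described above, the sketch below writes a small single-end design file and runs a basic consistency check on it. Everything in it is an assumption made for illustration: the sample names, the FASTQ file names, the presence of a header row, the control sample listing itself as its own `control_id`, and the choice to apply the naming rule to `sample_id` only. The `check_design` function is a hypothetical helper, not part of the pipeline.

```shell
# Write a hypothetical single-end design file (tab-delimited, 8 columns).
# All sample names and FASTQ file names are placeholders.
design="${TMPDIR:-/tmp}/design_example.txt"
{
    printf 'sample_id\texperiment_id\tbiosample\tfactor\ttreatment\treplicate\tcontrol_id\tfastq_read1\n'
    printf 'K27ac_1\tLiver_None_H3K27ac\tLiver\tH3K27ac\tNone\t1\tInput_1\tK27ac_1.fastq.gz\n'
    printf 'K27ac_2\tLiver_None_H3K27ac\tLiver\tH3K27ac\tNone\t2\tInput_1\tK27ac_2.fastq.gz\n'
    printf 'Input_1\tLiver_None_Input\tLiver\tInput\tNone\t1\tInput_1\tInput_1.fastq.gz\n'
} > "$design"

# check_design FILE EXPECTED_COLUMNS
# Returns non-zero if any line has the wrong number of tab-separated fields,
# or if a sample_id (column 1, header skipped) does not start with a letter
# followed only by letters, digits, or underscores.
check_design() {
    awk -F '\t' -v want="$2" '
        NF != want {
            printf "line %d: expected %d columns, found %d\n", NR, want, NF
            bad = 1
        }
        NR > 1 && $1 !~ /^[A-Za-z][A-Za-z0-9_]*$/ {
            printf "line %d: invalid sample_id \"%s\"\n", NR, $1
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Use 8 for single-end data and 9 for paired-end data.
check_design "$design" 8 && echo "design file looks consistent"
```

A check like this only catches formatting slips such as a missing tab or a name starting with a digit; it does not confirm that the listed FASTQ files exist.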


3) Bash Script

You will need to create a bash script to run the CHIPseq pipeline on BioHPC

This pipeline has been optimized for the correct partition
See HERE for an example bash script
The parameters that must be specified are:

--reads '/path/to/files/name.fastq.gz'
--designFile '/path/to/file/design.txt'
--genome 'GRCm38', 'GRCh38', or 'GRCh37' (if you need to use another genome contact the BICF)
--pairedEnd 'true' or 'false' (where 'true' is PE and 'false' is SE; default 'false')
--outDir (optional) path and folder name for the output data, for example: /home2/s000000/Desktop/Chipseq_output (if not specified, output is written under workflow/output/)
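
The parameters above might be combined in a launch script along the following lines. This is a minimal sketch, not the example script referenced earlier: the entry point `workflow/main.nf` is an assumption about the repository layout, all paths are placeholders, the `launch` function and `RUN` variable are illustrative, and the site-specific SLURM directives and module loading are left out.

```shell
#!/bin/bash
# Placeholder values; replace each with your own.
reads='/path/to/files/name.fastq.gz'
design_file='/path/to/file/design.txt'
genome='GRCh38'          # 'GRCm38', 'GRCh38', or 'GRCh37'
paired_end='false'       # 'true' for PE, 'false' for SE
out_dir='/path/to/output'

launch() {
    # Assumption: the pipeline entry point is workflow/main.nf
    # inside the cloned repository.
    set -- nextflow run workflow/main.nf \
        --reads "$reads" \
        --designFile "$design_file" \
        --genome "$genome" \
        --pairedEnd "$paired_end" \
        --outDir "$out_dir"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"          # execute the pipeline
    else
        echo "$*"     # dry run: print the command for review
    fi
}

launch
```

By default the script only prints the assembled command so it can be reviewed; setting `RUN=1` in the environment executes it.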