Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • BICF/Astrocyte/chipseq_analysis
  • s190984/chipseq_analysis
  • astrocyte/workflows/bicf/chipseq_analysis
  • s219741/chipseq-analysis-containerized
Show changes
Commits on Source (269)
Showing
with 231 additions and 130 deletions
......@@ -100,3 +100,12 @@ ENV/
# Mac OS
.DS_Store
# nextflow analysis folders/files
pipeline_trace*.txt*
.nextflow*.log*
report*.html*
timeline*.html*
/workflow/output/*
/work/*
/.nextflow/*
before_script:
- module add python/3.6.1-2-anaconda
- pip install --user pytest-pythonpath==0.7.1 pytest-cov==2.5.1
- module load nextflow/0.31.0
- module load nextflow/0.31.0
- ln -s /project/shared/bicf_workflow_ref/workflow_testdata/chipseq/*fastq.gz test_data/
stages:
......@@ -10,32 +10,37 @@ stages:
- single
- multiple
- skip
- cleanup
user_configuration:
stage: unit
script:
- pytest -m unit
- pytest -m unit --cov=./workflow/scripts
bash_tests:
stage: unit
script:
- module load singularity/3.0.2
- module load deeptools/2.5.0.1
- singularity run docker://bats/bats:v1.1.0 --tap workflow/tests/plot_profile.bats
astrocyte:
stage: astrocyte
script:
- module load astrocyte/0.1.0
- module load astrocyte/0.3.1
- module unload nextflow
- cd ..
- astrocyte_cli validate chipseq_analysis
artifacts:
expire_in: 2 days
after_script:
- rm -rf work/
single_end_mouse:
stage: single
only:
- master
script:
- nextflow run workflow/main.nf --astrocyte true -resume
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --astrocyte true --ci true --dev true
- pytest -m singleend
artifacts:
expire_in: 2 days
paired_end_human:
stage: single
......@@ -44,39 +49,55 @@ paired_end_human:
except:
- master
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR729LGA_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -resume
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR729LGA_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false --ci true --dev true
- pytest -m pairedend
artifacts:
expire_in: 2 days
single_end_diff:
stage: multiple
single_end_single_control:
stage: single
only:
- branches
except:
- master
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --astrocyte false -resume
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_single_contol_SE.txt" --genome 'GRCh38' --pairedEnd false --astrocyte false --ci true --dev true
- pytest -m singlecontrol
single_end_diff:
stage: multiple
only:
- master
script:
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --astrocyte false --ci true --dev true
- pytest -m singleend
- pytest -m singlediff
artifacts:
expire_in: 2 days
paired_end_diff:
only:
- branches
except:
- master
stage: multiple
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -resume
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -with-dag flowchart.pdf --ci true --dev true
- pytest -m pairedend
- pytest -m paireddiff
artifacts:
expire_in: 2 days
name: "$CI_JOB_NAME"
when: always
paths:
- flowchart.pdf
expire_in: 7 days
single_end_skip:
stage: skip
only:
- master
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --skipDiff true --skipMotif true --astrocyte false -resume
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --skipDiff true --skipMotif true --skipPlotProfile true --astrocyte false --ci true --dev true
- pytest -m singleskip_true
artifacts:
expire_in: 2 days
cleanup_job:
stage: cleanup
script:
- cd $CI_BUILDS_DIR/$CI_RUNNER_SHORT_TOKEN/$CI_PROJECT_NAME
- rm -fr $CI_PIPELINE_ID/
Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).
## PR checklist
- [ ] This comment contains a description of changes (with reason)
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] Documentation in `docs` is updated
- [ ] Replace dag.png with the most recent CI pipleine integrated_pe artifact
- [ ] `CHANGELOG.md` is updated
- [ ] `README.md` is updated
- [ ] `LICENSE.md` is updated with new contributors
# Changelog
All notable changes to this project will be documented in this file.
## [publish_1.1.3 ] - 2020-08-16
### Updated
- Updated astrocyte to 0.3.1
### Fixed
- Fixed missing gene names in annotation
## [publish_1.1.2 ] - 2020-06-22
- Add pipeline tracking
## [publish_1.1.1 ] - 2020-04-23
### Added
- Add Nextflow to references.md
- Add test data for test_pool_and_pseudoreplicate
- Add PlotProfile Option
- Add Python version to MultiQC
- Add and Update tests
- Use GTF files instead of TxDb and org libraries in Annotate Peaks
- Make gtf and geneName files as param inputs
- Add test data for single control and single replicate
### Fixed
- Fix references.md link in citation of README.md
- Fix pool_and_psuedoreplicate.py to run single experiment
- Fix xcor to increase file size for --random-source
- Fix skip diff test for paired-end data
- Fix xcor to get lowest non zero value above 50
- Fix references to display in Multiqc report
- Update astrocyte testing to 0.2.0
## [publish_1.0.6 ] - 2019-05-31
### Added
- MIT LICENSE
- Check for correct genome input
- Check for correct output for motif serach
### Fixed
- Path to reference fasta file
## [publish_1.0.5 ] - 2019-05-16
### Fixed
- Retitled documentation for skipDiff and skipMotif to be more clear
- Add missing python module for motif search
- Updated links for phantompeaktools and references
- Fix callPeaks for single control experiments
## [publish_1.0.3 ] - 2019-04-23
### Fixed
- Variable name for design file for Astrocyte
- File path for design file to work for Astrocyte
## [publish_1.0.0 ] - 2019-04-23
Initial release of pipeline
MIT License
Copyright (c) 2019, University of Texas Southwestern Medical Center.
All rights reserved.
Contributors: Spencer D. Barnes, Holly Ruess, Jeremy A. Mathews, Beibei Chen, Venkat S. Malladi
Department: Bioinformatic Core Facility, Department of Bioinformatics
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# **CHIPseq Manual**
## Version 1.0.0
## May 2, 2019
# **ChIP-seq Manual**
## Version 1.1.2
## June 21, 2020
# BICF ChIP-seq Pipeline
[![Build Status](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/master/build.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/commits/master)
[![Coverage Report](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/master/coverage.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/commits/master)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.24.0-brightgreen.svg
)](https://www.nextflow.io/)
[![Astrocyte](https://img.shields.io/badge/astrocyte-%E2%89%A50.1.0-blue.svg)](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2648845.svg)](https://doi.org/10.5281/zenodo.2648845)
|*master*|*dev*|
|:-:|:-:|
|[![pipeline status](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/master/pipeline.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/commits/master)|[![pipeline status](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/dev/pipeline.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/commits/dev)|
|[![coverage report](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/master/coverage.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/commits/master)|[![coverage report](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/dev/coverage.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/commits/dev)|
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.31.0-brightgreen)](https://www.nextflow.io/)
[![Astrocyte](https://img.shields.io/badge/astrocyte-%E2%89%A50.3.1-blue)](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2648844.svg)](https://doi.org/10.5281/zenodo.2648844)
## Introduction
BICF ChIPseq is a bioinformatics best-practice analysis pipeline used for ChIP-seq (chromatin immunoprecipitation sequencing) data analysis at [BICF](http://www.utsouthwestern.edu/labs/bioinformatics/) at [UT Southwestern Department of Bioinformatics](http://www.utsouthwestern.edu/departments/bioinformatics/).
BICF ChIP-seq is a bioinformatics best-practice analysis pipeline used for ChIP-seq (chromatin immunoprecipitation sequencing) data analysis at [BICF](http://www.utsouthwestern.edu/labs/bioinformatics/) at [UT Southwestern Department of Bioinformatics](http://www.utsouthwestern.edu/departments/bioinformatics/).
The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results.
This pipeline is primarily used with a SLURM cluster on the [BioHPC Cluster](https://portal.biohpc.swmed.edu/content/). However, the pipeline should be able to run on any system that supports Nextflow.
Additionally, the pipeline is designed to work with [Astrocyte Workflow System](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html) using a simple web interface.
Additionally, the pipeline is designed to work with [Astrocyte Workflow System](https://astrocyte.biohpc.swmed.edu/static/docs/index.html) using a simple web interface.
Current version of the software and issue reports are at
https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis
......@@ -58,7 +60,10 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/chipseq_analysis.git
- --designFile '/path/to/file/design.txt',
- --genome 'GRCm38', 'GRCh38', or 'GRCh37' (if you need to use another genome contact the [BICF](mailto:BICF@UTSouthwestern.edu))
- --pairedEnd 'true' or 'false' (where 'true' is PE and 'false' is SE; default 'false')
- --outDir (optional) path and folder name of the output data, example: /home2/s000000/Desktop/Chipseq_output (if not specficied will be under workflow/output/)
- --skipDiff 'true' or 'false' (where 'true' is skip differential peak and 'false' is do peak differential peak calling; default 'false')
- --skipMotif 'true' or 'false' (where 'true' is skip motif calling and 'false' is do motif calling; default 'false')
- --skipPlotProfile 'true' or 'false' (where 'true' is skip metageneplot for TSS and 'false' is do metageneplot for TSS; default 'false')
- --outDir (optional) path and folder name of the output data, example: /home2/s000000/Desktop/Chipseq_output (if not specified will be under workflow/output/)
## Pipeline
+ There are 11 steps to the pipeline
......@@ -73,6 +78,7 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/chipseq_analysis.git
9. Annotate all peaks using ChipSeeker
10. Calculate Differential Binding Activity with DiffBind (If more than 1 rep in more than 1 experiment)
11. Use MEME-ChIP to find motifs in original peaks
12. Plot enrichment of signal around TSS
See [FLOWCHART](docs/flowchart.pdf)
......@@ -116,6 +122,8 @@ diffPeaks | normcount_peaksets.txt | Use only for replicated samples; peak set v
diffPeaks | pca.pdf | Use only for replicated samples; PCA of peak location and peak intensity
diffPeaks | *_diffbind.bed | Use only for replicated samples; bed file of peak locations between replicates
diffPeaks | *_diffbind.csv | Use only for replicated samples; CSV file of peaks between replicates
plotProfile | plotProfile.png | Plot profile of the TSS region
plotProfile | computeMatrix.gz | Compute Matrix from deeptools to create custom plots other than plotProfile
## Common Quality Control Metrics
+ These are the list of files that should be reviewed before continuing on with the CHIPseq experiment. If your experiment fails any of these metrics, you should pause and re-evaluate whether the data should remain in the study.
......@@ -130,25 +138,25 @@ diffPeaks | *_diffbind.csv | Use only for replicated samples; CSV file of peaks
If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know and we will add it here.
## Citation
Please cite individual programs and versions used [HERE](docs/references.txt). Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596). Also, please look out for our pipeline to be published in the future [HERE](https://zenodo.org/).
Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi:[10.5281/zenodo.2648844](https://doi.org/10.5281/zenodo.2648844). Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).
## Programs and Versions
+ python/3.6.1-2-anaconda [website](https://www.anaconda.com/download/#linux) [citation](docs/references.txt)
+ trimgalore/0.4.1 [website](https://github.com/FelixKrueger/TrimGalore) [citation](docs/references.txt)
+ cutadapt/1.9.1 [website](https://cutadapt.readthedocs.io/en/stable/index.html) [citation](docs/references.txt)
+ bwa/intel/0.7.12 [website](http://bio-bwa.sourceforge.net/) [citation](docs/references.txt)
+ samtools/1.6 [website](http://samtools.sourceforge.net/) [citation](docs/references.txt)
+ sambamba/0.6.6 [website](http://lomereiter.github.io/sambamba/) [citation](docs/references.txt)
+ bedtools/2.26.0 [website](https://bedtools.readthedocs.io/en/latest/) [citation](docs/references.txt)
+ deeptools/2.5.0.1 [website](https://deeptools.readthedocs.io/en/develop/) [citation](docs/references.txt)
+ phantompeakqualtools/1.2 [website](https://github.com/kundajelab/phantompeakqualtools) [citation](docs/references.txt)
+ macs/2.1.0-20151222 [website](http://liulab.dfci.harvard.edu/MACS/) [citation](docs/references.txt)
+ UCSC_userApps/v317 [website](https://genome.ucsc.edu/util.html) [citation](docs/references.txt)
+ R/3.4.1 [website](https://www.r-project.org/) [citation](docs/references.txt)
+ python/3.6.1-2-anaconda [website](https://www.anaconda.com/download/#linux) [citation](docs/references.md)
+ trimgalore/0.4.1 [website](https://github.com/FelixKrueger/TrimGalore) [citation](docs/references.md)
+ cutadapt/1.9.1 [website](https://cutadapt.readthedocs.io/en/stable/index.html) [citation](docs/references.md)
+ bwa/intel/0.7.12 [website](http://bio-bwa.sourceforge.net/) [citation](docs/references.md)
+ samtools/1.6 [website](http://samtools.sourceforge.net/) [citation](docs/references.md)
+ sambamba/0.6.6 [website](http://lomereiter.github.io/sambamba/) [citation](docs/references.md)
+ bedtools/2.26.0 [website](https://bedtools.readthedocs.io/en/latest/) [citation](docs/references.md)
+ deeptools/2.5.0.1 [website](https://deeptools.readthedocs.io/en/develop/) [citation](docs/references.md)
+ phantompeakqualtools/1.2 [website](https://github.com/kundajelab/phantompeakqualtools) [citation](docs/references.md)
+ macs/2.1.0-20151222 [website](http://liulab.dfci.harvard.edu/MACS/) [citation](docs/references.md)
+ UCSC_userApps/v317 [website](https://genome.ucsc.edu/util.html) [citation](docs/references.md)
+ R/3.4.1 [website](https://www.r-project.org/) [citation](docs/references.md)
+ SPP/1.14
+ meme/4.11.1-gcc-openmpi [website](http://meme-suite.org/doc/install.html?man_type=web) [citation](docs/references.txt)
+ ChIPseeker [website](https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html) [citation](docs/references.txt)
+ DiffBind [website](https://bioconductor.org/packages/release/bioc/html/DiffBind.html) [citation](docs/references.txt)
+ meme/4.11.1-gcc-openmpi [website](http://meme-suite.org/doc/install.html?man_type=web) [citation](docs/references.md)
+ ChIPseeker [website](https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html) [citation](docs/references.md)
+ DiffBind [website](https://bioconductor.org/packages/release/bioc/html/DiffBind.html) [citation](docs/references.md)
......
......@@ -9,7 +9,7 @@
# A unique identifier for the workflow package, text/underscores only
name: 'chipseq_analysis_bicf'
# Who wrote this?
author: 'Beibei Chen and Venkat Malladi'
author: 'Holly Ruess, Spencer D. Barnes, Jeremy A. Mathews, Beibei Chen and Venkat Malladi'
# A contact email address for questions
email: 'bicf@utsouthwestern.edu'
# A more informative title for the workflow package
......@@ -52,6 +52,7 @@ workflow_modules:
- 'R/3.3.2-gccmkl'
- 'meme/4.11.1-gcc-openmpi'
- 'pandoc/2.7'
- 'singularity/3.0.2'
# A list of parameters used by the workflow, defining how to present them,
......@@ -104,13 +105,9 @@ workflow_parameters:
- [ 'true', 'true']
- [ 'false', 'false']
description: |
In single-end sequencing, the sequencer reads a fragment from only one
end to the other, generating the sequence of base pairs. In paired-end
reading it starts at one read, finishes this direction at the specified
read length, and then starts another round of reading from the opposite
end of the fragment. (Paired-end: True, Single-end: False)
If Paired-end: True, if Single-end: False.
- id: design
- id: designFile
type: file
required: true
description: |
......@@ -135,7 +132,7 @@ workflow_parameters:
- [ 'true', 'true']
- [ 'false', 'false']
description: |
Run differential peak analysis
Skip differential peak analysis
- id: skipMotif
type: select
......@@ -144,7 +141,16 @@ workflow_parameters:
- [ 'true', 'true']
- [ 'false', 'false']
description: |
Run motif calling
Skip motif calling
- id: skipPlotProfile
type: select
required: true
choices:
- [ 'true', 'true']
- [ 'false', 'false']
description: |
Skip Plot Profile Analysis
- id: astrocyte
type: select
......
No preview for this file type
docs/images/phantompeakqualtools.png

247 KiB

......@@ -20,17 +20,18 @@ Report issues to the Bioinformatic Core Facility [BICF](mailto:BICF@UTSouthweste
9. Annotate all peaks using ChipSeeker
10. Calculate Differential Binding Activity with DiffBind (If more than 1 rep in more than 1 experiment)
11. Use MEME-ChIP to find motifs in original peaks
12. Plot enrichment of signal around TSS
## Workflow Parameters
1. One or more input FASTQ files from a ChIP-seq expereiment and a design file with the link bewetwen the same file name and sample id (required) - Choose all ChIP-seq fastq files for analysis.
2. In single-end sequencing, the sequencer reads a fragment from only one end to the other, generating the sequence of base pairs. In paired-end reading it starts at one read, finishes this direction at the specified read length, and then starts another round of reading from the opposite end of the fragment. (Paired-end: True, Single-end: False) (required)
3. A design file listing sample id, fastq files, corresponding control id and additional information about the sample.
3. A design file listing sample id, fastq files, corresponding control id and additional information about the sample.
genome - Choose a genomic reference (genome).
4. Reference species and genome used for alignment and subsequent analysis. (required)
5. Run differential peak analysis (required). Must have at least 2 replicates per experiment and at least 2 experiments.
6. Run motif calling (required). Top 600 peaks sorted by p-value.
7. Ensure configuraton for astrocyte. (required; always true)
7. Ensure configuration for astrocyte. (required; always true)
## Design file
......@@ -46,8 +47,8 @@ Report issues to the Bioinformatic Core Facility [BICF](mailto:BICF@UTSouthweste
9. fastq_read2        name of fastq file 2 for PE data
+ See [HERE](test_data/design_ENCSR729LGA_PE.txt) for an example design file, paired-end
+ See [HERE](test_data/design_ENCSR238SGC_SE.txt) for an example design file, single-end
+ See [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/test_data/design_ENCSR729LGA_PE.txt) for an example design file, paired-end
+ See [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/test_data/design_ENCSR238SGC_SE.txt) for an example design file, single-end
## Output Files
......@@ -72,7 +73,7 @@ experimentQC | heatmeap_SpearmanCorr.pdf | plot of Spearman correlation between
experimentQC | heatmeap_PearsonCorr.pdf | plot of Pearson correlation between samples
experimentQC | sample_mbs.npz | array of multiple BAM summaries
crossReads | *.cc.plot.pdf | Plot of cross-correlation to assess signal-to-noise ratios
crossReads | *.cc.qc | cross-correlation metrics. File [HEADER](docs/xcor_header.txt)
crossReads | *.cc.qc | cross-correlation metrics. File [HEADER](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/xcor_header.txt)
callPeaksMACS | pooled/*pooled.fc_signal.bw | bigwig data file; raw fold enrichment of sample/control
callPeaksMACS | pooled/*pooled_peaks.xls | Excel file of peaks
callPeaksMACS | pooled/*.pvalue_signal.bw | bigwig data file; sample/control signal adjusted for pvalue significance
......@@ -90,12 +91,14 @@ diffPeaks | normcount_peaksets.txt | Use only for replicated samples; peak set v
diffPeaks | pca.pdf | Use only for replicated samples; PCA of peak location and peak intensity
diffPeaks | *_diffbind.bed | Use only for replicated samples; bed file of peak locations between replicates
diffPeaks | *_diffbind.csv | Use only for replicated samples; CSV file of peaks between replicates
plotProfile | plotProfile.png | Plot profile of the TSS region
plotProfile | computeMatrix.gz | Compute Matrix from deeptools to create custom plots other than plotProfile
## Common Quality Control Metrics
+ These are the list of files that should be reviewed before continuing on with the CHIPseq experiment. If your experiment fails any of these metrics, you should pause and re-evaluate whether the data should remain in the study.
1. multiqcReport/multiqc_report.html: follow the ChiP-seq standards [HERE](https://www.encodeproject.org/chip-seq/);
2. experimentQC/*_fingerprint.pdf: make sure the plots information is correct for your antibody/input. See [HERE](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html) for more details.
3. crossReads/*cc.plot.pdf: make sure your sample data has the correct signal intensity and location. See [HERE](https://hbctraining.github.io/Intro-to-ChIPseq/lessons/06_QC_cross_correlation.html) for more details.
3. crossReads/*cc.plot.pdf: make sure your sample data has the correct signal intensity and location. See [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/phantompeaks.md) for more details.
4. crossReads/*.cc.qc: Column 9 (NSC) should be > 1.1 for experiment and < 1.1 for input. Column 10 (RSC) should be > 0.8 for experiment and < 0.8 for input. See [HERE](https://genome.ucsc.edu/encode/qualityMetrics.html) for more details.
5. experimentQC/coverage.pdf, experimentQC/heatmeap_SpearmanCorr.pdf, experimentQC/heatmeap_PearsonCorr.pdf: See [HERE](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) for more details.
......@@ -108,18 +111,21 @@ Please cite in publications: Pipeline was developed by BICF from funding provide
### References
+ python/3.6.1-2-anaconda [website](https://www.anaconda.com/download/#linux) [citation](docs/references.txt)
+ trimgalore/0.4.1 [website](https://github.com/FelixKrueger/TrimGalore) [citation](docs/references.txt)
+ cutadapt/1.9.1 [website](https://cutadapt.readthedocs.io/en/stable/index.html) [citation](docs/references.txt)
+ bwa/intel/0.7.12 [website](http://bio-bwa.sourceforge.net/) [citation](docs/references.txt)
+ samtools/1.6 [website](http://samtools.sourceforge.net/) [citation](docs/references.txt)
+ sambamba/0.6.6 [website](http://lomereiter.github.io/sambamba/) [citation](docs/references.txt)
+ bedtools/2.26.0 [website](https://bedtools.readthedocs.io/en/latest/) [citation](docs/references.txt)
+ deeptools/2.5.0.1 [website](https://deeptools.readthedocs.io/en/develop/) [citation](docs/references.txt)
+ phantompeakqualtools/1.2 [website](https://github.com/kundajelab/phantompeakqualtools) [citation](docs/references.txt)
+ macs/2.1.0-20151222 [website](http://liulab.dfci.harvard.edu/MACS/) [citation](docs/references.txt)
+ UCSC_userApps/v317 [website](https://genome.ucsc.edu/util.html) [citation](docs/references.txt)
+ R/3.3.2-gccmkl [website](https://www.r-project.org/) [citation](docs/references.txt)
+ meme/4.11.1-gcc-openmpi [website](http://meme-suite.org/doc/install.html?man_type=web) [citation](docs/references.txt)
+ ChIPseeker [website](https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html) [citation](docs/references.txt)
+ DiffBind [website](https://bioconductor.org/packages/release/bioc/html/DiffBind.html) [citation](docs/references.txt)
+ python/3.6.1-2-anaconda [website](https://www.anaconda.com/download/#linux) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ trimgalore/0.4.1 [website](https://github.com/FelixKrueger/TrimGalore) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ cutadapt/1.9.1 [website](https://cutadapt.readthedocs.io/en/stable/index.html) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ bwa/intel/0.7.12 [website](http://bio-bwa.sourceforge.net/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ samtools/1.6 [website](http://samtools.sourceforge.net/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ sambamba/0.6.6 [website](http://lomereiter.github.io/sambamba/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ bedtools/2.26.0 [website](https://bedtools.readthedocs.io/en/latest/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ deeptools/2.5.0.1 [website](https://deeptools.readthedocs.io/en/develop/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ phantompeakqualtools/1.2 [website](https://github.com/kundajelab/phantompeakqualtools) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ macs/2.1.0-20151222 [website](http://liulab.dfci.harvard.edu/MACS/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ UCSC_userApps/v317 [website](https://genome.ucsc.edu/util.html) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ R/3.3.2-gccmkl [website](https://www.r-project.org/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ meme/4.11.1-gcc-openmpi [website](http://meme-suite.org/doc/install.html?man_type=web) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ ChIPseeker [website](https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ DiffBind [website](https://bioconductor.org/packages/release/bioc/html/DiffBind.html) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ MultiQC [website](https://multiqc.info/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ BICFChip-seqAnalysisWorkflow [website](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
+ Nextflow [website](https://www.nextflow.io/) [citation](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/references.md)
## Phantompeakqualtools
Phantompeakqualtools plots the strand cross-correlation of aligned reads for each sample. In a strand cross-correlation plot, reads are shifted in the direction of the strand they map to by an increasing number of base pairs and the Pearson correlation between the per-position read count vectors for each strand is calculated. Two cross-correlation peaks are usually observed in a ChIP experiment, one corresponding to the read length ("phantom" peak) and one to the average fragment length of the library. The absolute and relative height of the two peaks are useful determinants of the success of a ChIP-seq experiment. A high-quality IP is characterized by a ChIP peak that is much higher than the "phantom" peak, while often very small or no such peak is seen in failed experiments.
![Phantompeakqualtools](images/phantompeakqualtools.png)
*Source: Landt SG et al, Genome Research (2012)*
Normalized strand coefficient (NSC) is the normalized ratio between the fragment-length cross-correlation peak and the background cross-correlation. NSC values range from a minimum of 1 to larger positive numbers. 1.1 is the critical threshold. Datasets with NSC values much less than 1.1 (< 1.05) tend to have low signal to noise or few peaks (this could be biological eg. a factor that truly binds only a few sites in a particular tissue type OR it could be due to poor quality). ENCODE cutoff: NSC > 1.05.
Relative strand correlation (RSC) is the ratio between the fragment-length peak and the read-length peak. RSC values range from 0 to larger positive values. 1 is the critical threshold. RSC values significantly lower than 1 (< 0.8) tend to have low signal to noise. The low scores can be due to failed and poor quality ChIP, low read sequence quality and hence lots of mismappings, shallow sequencing depth (significantly below saturation) or a combination of these. Like the NSC, datasets with few binding sites (< 200), which is biologically justifiable, also show low RSC scores. ENCODE cutoff: RSC > 0.8.
......@@ -49,9 +49,12 @@
* Ross-Innes C. S., R. Stark, A. E. Teschendorff, K. A. Holmes, H. R. Ali, M. J. Dunning, G. D. Brown, O. Gojis, I. O. Ellis, A. R. Green, S. Ali, S. Chin, C. Palmieri, C. Caldas, and J. S. Carroll. 2012. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481: 389-393. doi:[10.1038/nature10730](https://dx.doi.org/10.1038/nature10730)
16. **MultiQc**:
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354 ](https://dx.doi.org/10.1093/bioinformatics/btw354)
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354)
17. **BICF ChIP-seq Analysis Workflow**:
* Venkat S. Malladi and Beibei Chen. (2019). BICF ChIP-seq Analysis Workflow (publish_1.0.0). Zenodo. doi:[10.5281/zenodo.2648845](https://doi.org/10.5281/zenodo.2648845)
* Spencer D. Barnes, Holly Ruess, Jeremy A. Mathews, Beibei Chen, and Venkat S. Malladi. 2020. BICF ChIP-seq Analysis Workflow (publish_1.1.3). Zenodo. doi:[10.5281/zenodo.3986942](https://doi.org/10.5281/zenodo.3986942)
18. **Nextflow**:
* Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., and Notredame, C. 2017. Nextflow enables reproducible computational workflows. Nature biotechnology, 35(4), 316.
Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
python/3.6.1-2-anaconda:
Anaconda (Anaconda Software Distribution, https://anaconda.com)
trimgalore/0.4.1:
trimgalore/0.4.1 (https://github.com/FelixKrueger/TrimGalore)
cutadapt/1.9.1:
Marcel, M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17(1):10-12. DOI: http://dx.doi.org/10.14806/ej.17.1.200
bwa/intel/0.7.12:
Li H., and R. Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25: 1754-60.
samtools/1.6:
Li H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25: 2078-9.
sambamba/0.6.6:
Tarasov, A., A. J. Vilella, E. Cuppen, I. J. Nijman, and P. Prins. 2015 Sambamba: fast processing of NGS alignment formats. Bioinformatics 31(12): 2032-2034. doi:10.1093/bioinformatics/btv098.
bedtools/2.26.0:
Quinlan, A. R., and I. M. Hall. 2010. BEDTools: a flexible suite of utilities for comparing genomic feautures. Bioinformatics 26(6): 841-842. doi:10.1093/bioinformatics/btq033
deeptools/2.5.0.1:
Ramírez, F., D. P. Ryan, B. Grüning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, F. Dündar, and T. Manke. 2016. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44: W160-165. doi: 10.1093/nar/gkw257.
phantompeakqualtools/1.2:
Landt S. G., G. K. Marinov, A. Kundaje, et al. 2012. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 9: 1813-31. doi: 10.1101/gr.136184.111.
Kharchenko P. K., M. Y. Tolstorukov, and P. J. Park. 2008. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26(12): 1351-1359.
macs/2.1.0-20151222:
Zhang Y., T. Liu, C. A. Meyer, J. Eeckhoute, D. S. Johnson, B. E. Bernstein, C. Nusbaum, R. M. Myers, M. Brown, W. Li, and X. S. Liu. 2008. Model-based Analysis of ChIP-Seq (MACS). Genome Biol 9: R137.
UCSC_userApps/v317
Kent W. J., A. S. Zweig, G. Barber, A. S. Hinrichs, and D. Karolchik. BigWig and BigBed: enabling browsing of large distributed data sets. Bioinformatics 26(17): 2204-2207.
R/3.3.2-gccmkl:
R Core Team 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
meme/4.11.1-gcc-openmpi:
Bailey T. L., M. Bodén, F. A. Buske, M. Frith, C. E. Grant, L. Clementi, J. Ren, W. W. Li, and W. S. Noble. 2009. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Research 37: W202-W208.
Machanick P., and T. L. Bailey. 2011. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27(12): 1696-1697.
R ChIPseeker:
Yu G., L. Wang, and Q. He. 2015. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31(14): 2382-2383. doi: 10.1093/bioinformatics/btv145.
R DiffBind:
Stark R., and G. Brown. 2011. DiffBind: differential binding analysis of ChIP-Seq peak data. http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf.
Ross-Innes C. S., R. Stark, A. E. Teschendorff, K. A. Holmes, H. R. Ali, M. J. Dunning, G. D. Brown, O. Gojis, I. O. Ellis, A. R. Green, S. Ali, S. Chin, C. Palmieri, C. Caldas, and J. S. Carroll. 2012. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481: 389-393. http://www.nature.com/nature/journal/v481/n7381/full/nature10730.html.
File added
File added
File added
File added
File added
File added
sample_id experiment_id biosample factor treatment replicate control_id fastq_read1
ENCLB497XZB ENCSR000DXB Panc1 H3K4me3 None 1 ENCLB304SBJ ENCFF001GBW.fastq.gz
ENCLB304SBJ ENCSR000DXC Panc1 Control None 1 ENCLB304SBJ ENCFF001HWJ.fastq.gz