Commit 0eeb6337 authored by Holly Ruess

added merge request templates

parent e89e1ca9
1 merge request: !2 Resolve "Remove unnecessary columns from design file"
Pipeline #5226 canceled with stages
in 1 minute and 9 seconds
Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).
## PR checklist
- [ ] This comment contains a description of changes (with reason)
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] Documentation in `docs` is updated
- [ ] `CHANGELOG.md` is updated
- [ ] `README.md` is updated
- [ ] `LICENSE.md` is updated with new contributors
# Changelog
All notable changes to this project will be documented in this file.
## [Unreleased]
### Fixed
- Removed biosample, factor, treatment from design file
- Updated documentation
### Added
- Changelog
- Merge request template
## [publish_1.0.0] - 2019-12-03
Initial release of pipeline
License (coming soon)
Copyright (coming soon)
All rights reserved.
Contributors: Venkat S. Malladi, Holly Ruess, Spencer D. Barnes
Department: Bioinformatic Core Facility, Department of Bioinformatics
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# BICF ATAC-seq Pipeline
# Astrocyte ATAC-seq analysis Workflow Package
[![Build Status](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/master/build.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/commits/master)
[![Coverage Report](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/badges/master/coverage.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/commits/master)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.24.0-brightgreen.svg)](https://www.nextflow.io/)
[![Astrocyte](https://img.shields.io/badge/astrocyte-%E2%89%A50.1.0-blue.svg)](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html)
This SOP describes the pipeline for downstream analysis of ChIP-seq sequencing data. The pipeline includes (1) quality control using Deeptools, (2) peak annotation, (3) differential peak analysis, and (4) motif analysis. BAM files and SORTED peak BED files are selected as input. For each sample, this workflow:
1) Annotates all peaks using ChipSeeker
2) Performs quality control and signal profiling with Deeptools
3) Finds differentially expressed peaks using DiffBind
4) Annotates all differentially expressed peaks
5) Uses MEME-ChIP for motif finding in both the original peaks and the differentially expressed peaks
## Introduction
BICF ATAC-seq is a bioinformatics best-practice analysis pipeline used for ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data analysis at [BICF](http://www.utsouthwestern.edu/labs/bioinformatics/) at [UT Southwestern Dept. of Bioinformatics](http://www.utsouthwestern.edu/departments/bioinformatics/).
The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads, and performs extensive quality control on the results.
This pipeline is primarily used with a SLURM cluster on the [BioHPC Cluster](https://biohpc.swmed.edu/). However, the pipeline should be able to run on any system that Nextflow supports.
## Annotations used in the pipeline
ChipSeeker - Known gene from Bioconductor [TxDb annotation](https://bioconductor.org/packages/release/BiocViews.html#___TxDb)
Deeptools - RefGene downloaded from UCSC Table browser
## Workflow Parameters
bam - Choose all ChIP-seq alignment files for analysis.
genome - Choose a genomic reference (genome).
peaks - Choose all the peak files for analysis. All peaks should be sorted by the user.
design - Choose the file with the experiment design information. CSV format.
toppeak - The number of top peaks used for motif analysis. Default is all.
## Design file
+ The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letters, numbers, and underscores can be used in the names. However, the names must begin with a letter. Columns must be as follows:
sample_id
The id of the sample. This will be the header in output files, so please make sure it is concise.
experiment_id
Same name given for all replicates of treatment. Will be used for the consensus header.
replicate
Replicate number
fastq_read1
Name of fastq file 1 for SE or PE data
fastq_read2
Name of fastq file 2 for PE data
+ See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end
+ See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end
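
The design file rules above can be checked before launching a run. The following is a minimal sketch in Python, not part of the pipeline itself: the function name `validate_design` is hypothetical, and it assumes that "names" refers to the `sample_id` and `experiment_id` columns and that `replicate` is a whole number, neither of which the README states explicitly.

```python
import csv
import io
import re

# Column layout as described in the README for single-end (SE) and paired-end (PE) data.
SE_COLUMNS = ["sample_id", "experiment_id", "replicate", "fastq_read1"]
PE_COLUMNS = SE_COLUMNS + ["fastq_read2"]

# Names must begin with a letter and may contain letters, numbers, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def validate_design(text):
    """Return a list of problems found in a tab-delimited design file (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["design file is empty"]
    header = rows[0]
    if header not in (SE_COLUMNS, PE_COLUMNS):
        return [f"unexpected header: {header}"]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
            continue
        record = dict(zip(header, row))
        # Assumption: the naming rule applies to sample_id and experiment_id.
        for key in ("sample_id", "experiment_id"):
            if not NAME_PATTERN.fullmatch(record[key]):
                problems.append(f"line {line_no}: invalid {key} {record[key]!r}")
        # Assumption: the replicate number is a non-negative integer.
        if not record["replicate"].isdigit():
            problems.append(
                f"line {line_no}: replicate {record['replicate']!r} is not a number"
            )
    return problems


example = (
    "sample_id\texperiment_id\treplicate\tfastq_read1\n"
    "ENCLB170KFO\tENCSR265ZXX\t1\tENCFF115PAE.fastq.gz\n"
    "ENCLB622FZX\tENCSR265ZXX\t2\tENCFF610JYD.fastq.gz\n"
)
print(validate_design(example))
```

An empty list means no problems were found. The check only looks at the layout of the file; it does not confirm that the listed fastq files exist.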
## Common Errors
If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know and we will add it here.
## Citation
Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi: coming soon. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).
### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/).
Additionally, the pipeline is designed to work with [Astrocyte Workflow System](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html) using a simple web interface.
sample_id experiment_id replicate fastq_read1
ENCLB170KFO ENCSR265ZXX 1 ENCFF115PAE.fastq.gz
ENCLB622FZX ENCSR265ZXX 2 ENCFF610JYD.fastq.gz
ENCLB969KTX ENCSR265ZXX 3 ENCFF124LBK.fastq.gz
sample_id experiment_id replicate fastq_read1 fastq_read2
ENCLB749GLW ENCSR451NAE 1 ENCFF655OFT.fastq.gz ENCFF999SZR.fastq.gz
ENCLB122XDP ENCSR451NAE 2 ENCFF913PMS.fastq.gz ENCFF483MKX.fastq.gz
SampleID,Tissue,Factor,Condition,Replicate,Peaks,bamReads,bamControl,ControlID,PeakCaller
A_1,A,H3K27AC,A,1,A_1.broadPeak,A_1.bam,A_1_input.bam,A_1_input,bed
A_2,A,H3K27AC,A,2,A_2.broadPeak,A_2.bam,A_2_input.bam,A_2_input,bed
B_1,B,H3K27AC,B,1,B_1.broadPeak,B_1.bam,B_1_input.bam,B_1_input,bed
B_2,B,H3K27AC,B,2,B_2.broadPeak,B_2.bam,B_2_input.bam,B_2_input,bed
C_1,C,H3K27AC,C,1,C_1.broadPeak,C_1.bam,C_1_input.bam,C_1_input,bed
C_2,C,H3K27AC,C,2,C_2.broadPeak,C_2.bam,C_2_input.bam,C_2_input,bed
@@ -29,38 +29,28 @@ This SOP describes the analysis pipeline of downstream analysis of ChIP-seq sequ
## Design file
+ The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letters, numbers, and underscores can be used in the names. However, the names must begin with a letter. Columns must be as follows:
The following columns are required and must be named as in the template. A design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/bchen4/chipseq_analysis/raw/master/docs/design_example.csv)
SampleID
sample_id
The id of the sample. This will be the header in output files, so please make sure it is concise.
Tissue
Tissue of the sample
Factor
Factor of the experiment
Condition
This is the group that will be used for pairwise differential expression analysis
Replicate
Replicate id
Peaks
The file name of the peak file for this sample
bamReads
The file name of the IP BAM for this sample
bamControl
The file name of the control BAM for this sample
ControlID
The id of the control sample
PeakCaller
The peak caller used
experiment_id
Same name given for all replicates of treatment. Will be used for the consensus header.
replicate
Replicate number
fastq_read1
Name of fastq file 1 for SE or PE data
fastq_read2
Name of fastq file 2 for PE data
+ See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end
+ See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end
## Common Errors
If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know and we will add it here.
## Citation
Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi: coming soon. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).
### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatic Core Facility (BICF), Department of Bioinformatics
### References
This example workflow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/).
* ChipSeeker: http://bioconductor.org/packages/release/bioc/html/ChIPseeker.html
* DiffBind: http://bioconductor.org/packages/release/bioc/html/DiffBind.html
* Deeptools: https://deeptools.github.io/
* MEME-ChIP: http://meme-suite.org/doc/meme-chip.html
### References
1. **python**:
* Anaconda (Anaconda Software Distribution, [https://anaconda.com](https://anaconda.com))