From 0eeb6337e7b5ac2e47d3d877ab4041ce3457c92e Mon Sep 17 00:00:00 2001
From: Holly Ruess <s185797@biohpcwsc012.biohpc.swmed.edu>
Date: Wed, 4 Dec 2019 08:39:52 -0600
Subject: [PATCH] added merge request templates

---
 .../merge_request_templates/merge_request.md | 10 +++
 CHANGELOG.md                                 | 16 +++++
 LICENSE.md                                   | 16 +++++
 README.md                                    | 61 +++++++++++++++----
 docs/design_ENCSR265ZXX_SE.txt               |  4 ++
 docs/design_ENCSR451NAE_PE.txt               |  3 +
 docs/design_example.csv                      |  7 ---
 docs/index.md                                | 50 ++++++---------
 docs/references.md                           |  4 ++
 9 files changed, 123 insertions(+), 48 deletions(-)
 create mode 100644 .gitlab/merge_request_templates/merge_request.md
 create mode 100644 CHANGELOG.md
 create mode 100644 LICENSE.md
 create mode 100644 docs/design_ENCSR265ZXX_SE.txt
 create mode 100644 docs/design_ENCSR451NAE_PE.txt
 delete mode 100644 docs/design_example.csv
 create mode 100644 docs/references.md

diff --git a/.gitlab/merge_request_templates/merge_request.md b/.gitlab/merge_request_templates/merge_request.md
new file mode 100644
index 0000000..949d264
--- /dev/null
+++ b/.gitlab/merge_request_templates/merge_request.md
@@ -0,0 +1,10 @@
+Please fill in the appropriate checklist below (delete whatever is not relevant).
+These are the most common things requested on pull requests (PRs).
+
+## PR checklist
+ - [ ] This comment contains a description of changes (with reason)
+ - [ ] If you've fixed a bug or added code that should be tested, add tests!
+ - [ ] Documentation in `docs` is updated
+ - [ ] `CHANGELOG.md` is updated
+ - [ ] `README.md` is updated
+ - [ ] `LICENSE.md` is updated with new contributors
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..64d1bc8
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,16 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+## [Unreleased]
+### Fixed
+ - Removed biosample, factor, treatment from design file
+ - Updated documentation
+
+### Added
+ - Changelog
+ - Merge request template
+
+## [publish_1.0.0 ] - 2019-12-03
+Initial release of pipeline
+
diff --git a/LICENSE.md b/LICENSE.md
new file mode 100644
index 0000000..5d9d53d
--- /dev/null
+++ b/LICENSE.md
@@ -0,0 +1,16 @@
+License (coming soon)
+
+Copyright (coming soon)
+
+All rights reserved.
+
+Contributors: Venkat S. Malladi, Holly Ruess, Spencer D. Barnes
+
+Department: Bioinformatic Core Facility, Department of Bioinformatics
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ diff --git a/README.md b/README.md index acd89b8..7f191ba 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,56 @@ -# BICF ATAC-seq Pipeline +# Astrocyte ATAC-seq analysis Workflow Package -[](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/commits/master) -[](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/commits/master) -[](https://www.nextflow.io/) -[](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html) +This SOP describes the analysis pipeline of downstream analysis of ChIP-seq sequencing data. This pipeline includes (1) Quality control using Deeptools, (2) Peak annotation, (3) Differential peak analysis, and (4) motif analysis. BAM files and SORTED peak BED files selected as input. For each sample this workflow: + 1) Annotate all peaks using ChipSeeker + 2) Qulity control and signal profiling with Deeptools + 3) Find differential expressed peaks using DiffBind + 4) Annotate all differentially expressed peaks + 5) Using MEME-ChIP in motif finding for both original peaks and differently expressed peaks -## Introduction -BICF ATAC-seq is a bioinformatics best-practice analysis pipeline used for ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data analysis at [BICF](http://www.utsouthwestern.edu/labs/bioinformatics/) at [UT Southwestern Dept. of Bioinformatics](http://www.utsouthwestern.edu/departments/bioinformatics/). -The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results. -This pipeline is primarily used with a SLURM cluster on the [BioHPC Cluster](https://biohpc.swmed.edu/). However, the pipeline should be able to run on any system that Nextflow supports. 
+## Annotations used in the pipeline + + ChipSeeker - Known gene from Bioconductor [TxDb annotation](https://bioconductor.org/packages/release/BiocViews.html#___TxDb) + Deeptools - RefGene downloaded from UCSC Table browser + + + + +## Workflow Parameters + + bam - Choose all ChIP-seq alignment files for analysis. + genome - Choose a genomic reference (genome). + peaks - Choose all the peak files for analysis. All peaks should be sorted by the user + design - Choose the file with the experiment design information. CSV format + toppeak - The number of top peaks used for motif analysis. Default is all + + + +## Design file + + The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letter, numbers, and underlines can be used in the names. However, the names must begin with a letter. Columns must be as follows: + + sample_id + The id of the sample. This will be the header in output files, please make sure it is concise + experiment_id + Same name given for all replicates of treatment. Will be used for the consensus header. + replicate + Replicate number + fastq_read1 + Name of fastq file 1 for SE or PE data + fastq_read2 + Name of fastq file 2 for PE data + + + See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end + + See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end + +## Common Errors +If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know and we will add it here. + +## Citation +Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi: coming soon. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596). 
+ +### Credits +This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/). -Additionally, the pipeline is designed to work with [Astrocyte Workflow System](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html) using a simple web interface. diff --git a/docs/design_ENCSR265ZXX_SE.txt b/docs/design_ENCSR265ZXX_SE.txt new file mode 100644 index 0000000..b41063a --- /dev/null +++ b/docs/design_ENCSR265ZXX_SE.txt @@ -0,0 +1,4 @@ +sample_id experiment_id replicate fastq_read1 +ENCLB170KFO ENCSR265ZXX 1 ENCFF115PAE.fastq.gz +ENCLB622FZX ENCSR265ZXX 2 ENCFF610JYD.fastq.gz +ENCLB969KTX ENCSR265ZXX 3 ENCFF124LBK.fastq.gz diff --git a/docs/design_ENCSR451NAE_PE.txt b/docs/design_ENCSR451NAE_PE.txt new file mode 100644 index 0000000..a8cb601 --- /dev/null +++ b/docs/design_ENCSR451NAE_PE.txt @@ -0,0 +1,3 @@ +sample_id experiment_id replicate fastq_read1 fastq_read2 +ENCLB749GLW ENCSR451NAE 1 ENCFF655OFT.fastq.gz ENCFF999SZR.fastq.gz +ENCLB122XDP ENCSR451NAE 2 ENCFF913PMS.fastq.gz ENCFF483MKX.fastq.gz diff --git a/docs/design_example.csv b/docs/design_example.csv deleted file mode 100644 index 6fa48f3..0000000 --- a/docs/design_example.csv +++ /dev/null @@ -1,7 +0,0 @@ -SampleID,Tissue,Factor,Condition,Replicate,Peaks,bamReads,bamControl,ControlID,PeakCaller -A_1,A,H3K27AC,A,1,A_1.broadPeak,A_1.bam,A_1_input.bam,A_1_input,bed -A_2,A,H3K27AC,A,2,A_2.broadPeak,A_2.bam,A_2_input.bam,A_2_input,bed -B_1,B,H3K27AC,B,1,B_1.broadPeak,B_1.bam,B_1_input.bam,B_1_input,bed -B_2,B,H3K27AC,B,2,B_2.broadPeak,B_2.bam,B_2_input.bam,B_2_input,bed -C_1,C,H3K27AC,C,1,C_1.broadPeak,C_1.bam,C_1_input.bam,C_1_input,bed -C_2,C,H3K27AC,C,2,C_2.broadPeak,C_2.bam,C_2_input.bam,C_2_input,bed diff --git a/docs/index.md b/docs/index.md index 22809cd..7f191ba 100644 --- a/docs/index.md +++ 
b/docs/index.md @@ -29,38 +29,28 @@ This SOP describes the analysis pipeline of downstream analysis of ChIP-seq sequ ## Design file + + The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letter, numbers, and underlines can be used in the names. However, the names must begin with a letter. Columns must be as follows: - The following columns are necessary, must be named as in template. An design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/bchen4/chipseq_analysis/raw/master/docs/design_example.csv) - - SampleID + sample_id The id of the sample. This will be the header in output files, please make sure it is concise - Tissue - Tissue of the sample - Factor - Factor of the experiment - Condition - This is the group that will be used for pairwise differential expression analysis - Replicate - Replicate id - Peaks - The file name of the peak file for this sample - bamReads - The file name of the IP BAM for this sample - bamControl - The file name of the control BAM for this sample - ContorlID - The id of the control sample - PeakCaller - The peak caller used - - + experiment_id + Same name given for all replicates of treatment. Will be used for the consensus header. + replicate + Replicate number + fastq_read1 + Name of fastq file 1 for SE or PE data + fastq_read2 + Name of fastq file 2 for PE data + + + See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end + + See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end + +## Common Errors +If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know and we will add it here. + +## Citation +Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi: coming soon. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596). 
### Credits -This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility (BICF), Department of Bioinformatics - -### References +This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/). -* ChipSeeker: http://bioconductor.org/packages/release/bioc/html/ChIPseeker.html -* DiffBind: http://bioconductor.org/packages/release/bioc/html/DiffBind.html -* Deeptools: https://deeptools.github.io/ -* MEME-ChIP: http://meme-suite.org/doc/meme-chip.html diff --git a/docs/references.md b/docs/references.md new file mode 100644 index 0000000..a32e4c2 --- /dev/null +++ b/docs/references.md @@ -0,0 +1,4 @@ +### References + +1. **python**: + * Anaconda (Anaconda Software Distribution, [https://anaconda.com](https://anaconda.com)) -- GitLab
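
The design-file rules stated in the README and `docs/index.md` hunks of this patch (tab-delimited, 4 columns for single-end, 5 for paired-end, names made of letters, numbers, and underscores that begin with a letter) could be checked before a run. The following is a minimal Python sketch under stated assumptions, not code from the pipeline: the function name `validate_design`, the regular expression, applying the naming rule to `sample_id` and `experiment_id` only, and requiring `replicate` to be an integer are all illustrative choices.

```python
import csv
import io
import re

# Column names as listed in the patch's design-file section.
SE_COLUMNS = ["sample_id", "experiment_id", "replicate", "fastq_read1"]
PE_COLUMNS = SE_COLUMNS + ["fastq_read2"]

# Assumption: "names" means sample_id and experiment_id; they may contain
# letters, digits, and underscores, and must start with a letter.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def validate_design(text):
    """Return a list of human-readable problems found in a design file."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        return ["design file is empty"]

    header, records = rows[0], rows[1:]
    if header not in (SE_COLUMNS, PE_COLUMNS):
        return [f"unexpected header: {header}"]

    problems = []
    for line_no, row in enumerate(records, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, found {len(row)}"
            )
            continue
        record = dict(zip(header, row))
        for column in ("sample_id", "experiment_id"):
            if not NAME_PATTERN.match(record[column]):
                problems.append(
                    f"line {line_no}: {column} '{record[column]}' must begin "
                    "with a letter and use only letters, numbers, underscores"
                )
        # Assumption: the replicate number is a non-negative integer.
        if not record["replicate"].isdigit():
            problems.append(f"line {line_no}: replicate '{record['replicate']}' is not an integer")
    return problems


if __name__ == "__main__":
    # First data row of docs/design_ENCSR265ZXX_SE.txt from this patch.
    example = (
        "sample_id\texperiment_id\treplicate\tfastq_read1\n"
        "ENCLB170KFO\tENCSR265ZXX\t1\tENCFF115PAE.fastq.gz\n"
    )
    print(validate_design(example))
```

Returning a list of problems rather than raising on the first one lets a user fix every malformed row in a single pass.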