Commit 81843f5e authored by Venkat Malladi

Merge branch '36-astrocyte-documentation' into 'master'

Resolve "Update Astrocyte documentation and errors"

Closes #36

See merge request !30
parents dd5b3b59 38d03ac0
Pipeline #3600 canceled with stages
in 2 minutes and 22 seconds
@@ -32,7 +32,7 @@ single_end_mouse:
 only:
 - master
 script:
-- nextflow run workflow/main.nf --astrocyte 'true' -resume
+- nextflow run workflow/main.nf --astrocyte true -resume
 - pytest -m singleend
 artifacts:
 expire_in: 2 days
@@ -44,7 +44,7 @@ paired_end_human:
 except:
 - master
 script:
-- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR729LGA_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte 'false' -resume
+- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR729LGA_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -resume
 - pytest -m pairedend
 artifacts:
 expire_in: 2 days
@@ -56,7 +56,7 @@ single_end_diff:
 except:
 - master
 script:
-- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --astrocyte 'false' -resume
+- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --astrocyte false -resume
 - pytest -m singlediff
 artifacts:
 expire_in: 2 days
@@ -66,7 +66,7 @@ paired_end_diff:
 - master
 stage: multiple
 script:
-- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte 'false' -resume
+- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -resume
 - pytest -m paireddiff
 artifacts:
 expire_in: 2 days
@@ -76,7 +76,7 @@ single_end_skip:
 only:
 - master
 script:
-- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --skipDiff true --skipMotif true --astrocyte 'false' -resume
+- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --skipDiff true --skipMotif true --astrocyte false -resume
 - pytest -m singleskip_true
 artifacts:
 expire_in: 2 days
@@ -5,6 +5,7 @@
 [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.24.0-brightgreen.svg
 )](https://www.nextflow.io/)
 [![Astrocyte](https://img.shields.io/badge/astrocyte-%E2%89%A50.1.0-blue.svg)](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2648845.svg)](https://doi.org/10.5281/zenodo.2648845)
 ## Introduction
......
@@ -101,8 +101,8 @@ workflow_parameters:
 type: select
 required: true
 choices:
-- [ 'true', 'True']
-- [ 'false', 'False']
+- [ 'true', 'true']
+- [ 'false', 'false']
 description: |
 In single-end sequencing, the sequencer reads a fragment from only one
 end to the other, generating the sequence of base pairs. In paired-end
@@ -128,6 +128,24 @@ workflow_parameters:
 description: |
 Reference species and genome used for alignment and subsequent analysis.
+- id: skipDiff
+type: select
+required: true
+choices:
+- [ 'true', 'true']
+- [ 'false', 'false']
+description: |
+Run differential peak analysis
+- id: skipMotif
+type: select
+required: true
+choices:
+- [ 'true', 'true']
+- [ 'false', 'false']
+description: |
+Run motif calling
 - id: astrocyte
 type: select
 choices:
......
SampleID,Tissue,Factor,Condition,Replicate,Peaks,bamReads,bamControl,ControlID,PeakCaller
A_1,A,H3K27AC,A,1,A_1.broadPeak,A_1.bam,A_1_input.bam,A_1_input,bed
A_2,A,H3K27AC,A,2,A_2.broadPeak,A_2.bam,A_2_input.bam,A_2_input,bed
B_1,B,H3K27AC,B,1,B_1.broadPeak,B_1.bam,B_1_input.bam,B_1_input,bed
B_2,B,H3K27AC,B,2,B_2.broadPeak,B_2.bam,B_2_input.bam,B_2_input,bed
C_1,C,H3K27AC,C,1,C_1.broadPeak,C_1.bam,C_1_input.bam,C_1_input,bed
C_2,C,H3K27AC,C,2,C_2.broadPeak,C_2.bam,C_2_input.bam,C_2_input,bed
sample_id experiment_id biosample factor treatment replicate control_id fastq_read1
A1 A tissueA H3K27AC None 1 B1 A1.fastq.gz
A2 A tissueA H3K27AC None 2 B2 A2.fastq.gz
B1 B tissueB Input None 1 B1 B1.fastq.gz
B2 A tissueB Input None 2 B2 B2.fastq.gz
-# Astrocyte ChIPseq analysis Workflow Package
+# BICF ChIP-seq Analysis Workflow
 ## Introduction
 **ChIP-seq Analysis** is a bioinformatics best-practice analysis pipeline used for chromatin immunoprecipitation (ChIP-seq) data analysis.
@@ -7,16 +7,16 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
 ### Pipeline Steps
 1) Trim adaptors TrimGalore!
 2) Align with BWA
 3) Filter reads with Sambamba S
 4) Quality control with DeepTools
 5) Calculate Cross-correlation using SPP and PhantomPeakQualTools
 6) Signal profiling using MACS2
 7) Call consenus peaks
 8) Annotate all peaks using ChipSeeker
 9) Use MEME-ChIP to find motifs in original peaks
 10) Find differential expressed peaks using DiffBind (If more than 1 experiment)
 ## Workflow Parameters
@@ -25,41 +25,35 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
 pairedEnd - Choose True/False if data is paired-end
 design - Choose the file with the experiment design information. TSV format
 genome - Choose a genomic reference (genome).
+skipDiff - Choose True/False if data if you want to run Differential Peaks
+skipMotif - Choose True/False if data if you want to run Motif Calling
 ## Design file
-The following columns are necessary, must be named as in template. An design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/bchen4/chipseq_analysis/raw/master/docs/design_example.csv)
+The following columns are necessary, must be named as in template. An design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/design_example.txt)
-SampleID
-The id of the sample. This will be the header in output files, please make sure it is concise
-Tissue
-Tissue of the sample
-Factor
-Factor of the experiment
-Condition
-This is the group that will be used for pairwise differential expression analysis
-Replicate
-Replicate id
-Peaks
-The file name of the peak file for this sample
-bamReads
-The file name of the IP BAM for this sample
-bamControl
-The file name of the control BAM for this sample
-ContorlID
-The id of the control sample
-PeakCaller
-The peak caller used
+sample_id
+The id of the sample. This will be the name used in output files, please make sure it is concise and informative.
+experiment_id
+The id of the experiment. Used for grouping replicates.
+biosample
+The name of the biological sample.
+factor
+Factor of the experiment.
+treatment
+Treatment used in experiment.
+replicate
+Replicate number.
+control_id
+The sample_id of the control used for this sample.
+fastq_read1
+File name of fastq file, if paired-end this is read1.
+fastq_read2
+File name of read2 (for paired-end), not needed for single-end data.
 ### Credits
-This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility (BICF), Department of Bioinformatics
+This worklow is was developed jointly with the [Bioinformatic Core Facility (BICF), Department of Bioinformatics](http://www.utsouthwestern.edu/labs/bioinformatics/)
-### References
-* ChipSeeker: http://bioconductor.org/packages/release/bioc/html/ChIPseeker.html
-* DiffBind: http://bioconductor.org/packages/release/bioc/html/DiffBind.html
-* Deeptools: https://deeptools.github.io/
-* MEME-ChIP: http://meme-suite.org/doc/meme-chip.html
+Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
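Reviewer note: the documentation hunk above renames the design-file columns and makes `control_id` refer back to a `sample_id`. A minimal Python sketch of how those two rules could be checked is given here. It is an illustration only, not the pipeline's own `checkDesignFile` process; the function name `check_design` and the `paired_end` argument are hypothetical, and the sample rows are copied from the example design file shown earlier in this commit.

```python
import csv
import io

# Column names taken from the updated documentation in this commit.
REQUIRED_COLUMNS = [
    "sample_id", "experiment_id", "biosample", "factor",
    "treatment", "replicate", "control_id", "fastq_read1",
]


def check_design(text, paired_end=False):
    """Return a list of problems found in a tab-separated design file.

    Hypothetical helper: checks only the header and the control_id links.
    """
    required = REQUIRED_COLUMNS + (["fastq_read2"] if paired_end else [])
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems

    rows = list(reader)
    sample_ids = {row["sample_id"] for row in rows}
    for row in rows:
        # The docs state control_id is the sample_id of the control sample.
        if row["control_id"] not in sample_ids:
            problems.append(
                f"{row['sample_id']}: unknown control_id {row['control_id']}"
            )
    return problems


# Rows copied from the example design file in this commit.
EXAMPLE = "\n".join([
    "\t".join(REQUIRED_COLUMNS),
    "\t".join(["A1", "A", "tissueA", "H3K27AC", "None", "1", "B1", "A1.fastq.gz"]),
    "\t".join(["A2", "A", "tissueA", "H3K27AC", "None", "2", "B2", "A2.fastq.gz"]),
    "\t".join(["B1", "B", "tissueB", "Input", "None", "1", "B1", "B1.fastq.gz"]),
    "\t".join(["B2", "A", "tissueB", "Input", "None", "2", "B2", "B2.fastq.gz"]),
])

if __name__ == "__main__":
    print(check_design(EXAMPLE))
    print(check_design(EXAMPLE, paired_end=True))
```

Calling `check_design(EXAMPLE, paired_end=True)` reports the absent `fastq_read2` column, which matches the documented rule that read2 is only needed for paired-end data.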
@@ -50,3 +50,8 @@
 16. **MultiQc**:
 * Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354 ](https://dx.doi.org/10.1093/bioinformatics/btw354)
+17. **BICF ChIP-seq Analysis Workflow**:
+* Venkat S. Malladi and Beibei Chen. (2019). BICF ChIP-seq Analysis Workflow (publish_1.0.0). Zenodo. doi:[10.5281/zenodo.2648845](https://doi.org/10.5281/zenodo.2648845)
+Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
@@ -5,14 +5,14 @@
 // Define Input variables
 params.reads = "$baseDir/../test_data/*.fastq.gz"
-params.pairedEnd = 'false'
+params.pairedEnd = false
 params.designFile = "$baseDir/../test_data/design_ENCSR238SGC_SE.txt"
 params.genome = 'GRCm38'
 params.cutoffRatio = 1.2
 params.outDir= "$baseDir/output"
 params.extendReadsLen = 100
 params.topPeakCount = 600
-params.astrocyte = 'false'
+params.astrocyte = false
 params.skipDiff = false
 params.skipMotif = false
 params.references = "$baseDir/../docs/references.md"
@@ -56,6 +56,7 @@ readsList = Channel
 .collectFile( name: 'fileList.tsv', newLine: true )
 // Define regular variables
+pairedEnd = params.pairedEnd
 designFile = params.designFile
 genomeSize = params.genomeSize
 genome = params.genome
@@ -70,12 +71,6 @@ skipMotif = params.skipMotif
 references = params.references
 multiqc = params.multiqc
-if (params.pairedEnd == 'false'){
-pairedEnd = false
-} else {
-pairedEnd = true
-}
 // Check design file for errors
 process checkDesignFile {
......
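Reviewer note: the `main.nf` hunks replace the quoted defaults `'false'` with the boolean `false` and drop the block that compared `params.pairedEnd` against the string `'false'`. The reason this matters is that a non-empty string is truthy in Groovy, so a flag holding the text `'false'` cannot be used directly in a condition. Python has the same rule, which allows a small analogue. The helper `to_bool` below is hypothetical and is not part of the pipeline; it only sketches the kind of normalisation the removed block was doing by hand.

```python
def to_bool(value):
    """Normalise a flag that may arrive as a bool or as the text 'true'/'false'."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"expected true/false, got {value!r}")


if __name__ == "__main__":
    # A non-empty string is truthy even when it spells "false".
    print(bool("false"))
    # Normalising first gives the intended value.
    print(to_bool("false"))
```

With real booleans as defaults, as in the new `params.pairedEnd = false`, no such conversion step is needed and the value can be assigned straight through, which is what the added `pairedEnd = params.pairedEnd` line does.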