Commit 81843f5e authored by Venkat Malladi

Merge branch '36-astrocyte-documentation' into 'master'

Resolve "Update Astrocyte documentation and errors"

Closes #36

See merge request !30
parents dd5b3b59 38d03ac0
Pipeline #3600 canceled in 2 minutes and 22 seconds
@@ -32,7 +32,7 @@ single_end_mouse:
only:
- master
script:
- nextflow run workflow/main.nf --astrocyte 'true' -resume
- nextflow run workflow/main.nf --astrocyte true -resume
- pytest -m singleend
artifacts:
expire_in: 2 days
@@ -44,7 +44,7 @@ paired_end_human:
except:
- master
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR729LGA_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte 'false' -resume
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR729LGA_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -resume
- pytest -m pairedend
artifacts:
expire_in: 2 days
@@ -56,7 +56,7 @@ single_end_diff:
except:
- master
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --astrocyte 'false' -resume
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --astrocyte false -resume
- pytest -m singlediff
artifacts:
expire_in: 2 days
@@ -66,7 +66,7 @@ paired_end_diff:
- master
stage: multiple
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte 'false' -resume
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_PE.txt" --genome 'GRCh38' --pairedEnd true --astrocyte false -resume
- pytest -m paireddiff
artifacts:
expire_in: 2 days
@@ -76,7 +76,7 @@ single_end_skip:
only:
- master
script:
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --skipDiff true --skipMotif true --astrocyte 'false' -resume
- nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_diff_SE.txt" --genome 'GRCm38' --skipDiff true --skipMotif true --astrocyte false -resume
- pytest -m singleskip_true
artifacts:
expire_in: 2 days
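The CI jobs above differ only in the options passed to `nextflow run` and in the pytest marker. In a POSIX shell, `--astrocyte 'true'` and `--astrocyte true` reach Nextflow as the same argument after quote removal, so this change is for consistency and does not alter behavior. Below is a minimal sketch, assuming Python 3.8+. It uses a hypothetical helper `build_nextflow_cmd`, which is not part of the repository, to assemble the argument list with bare `true`/`false` values.

```python
import shlex
from typing import List, Optional


def build_nextflow_cmd(
    design_file: Optional[str] = None,
    genome: Optional[str] = None,
    paired_end: bool = False,
    skip_diff: bool = False,
    skip_motif: bool = False,
    astrocyte: bool = False,
) -> List[str]:
    """Return the argv for a CI-style `nextflow run` call (hypothetical helper)."""
    cmd = ["nextflow", "run", "workflow/main.nf"]
    if design_file is not None:
        cmd += ["--designFile", design_file]
    if genome is not None:
        cmd += ["--genome", genome]
    # pairedEnd, skipDiff and skipMotif default to false in main.nf,
    # so they are only emitted when set to true.
    if paired_end:
        cmd += ["--pairedEnd", "true"]
    if skip_diff:
        cmd += ["--skipDiff", "true"]
    if skip_motif:
        cmd += ["--skipMotif", "true"]
    # The CI jobs always pass --astrocyte explicitly, as a bare true/false.
    cmd += ["--astrocyte", "true" if astrocyte else "false", "-resume"]
    return cmd


if __name__ == "__main__":
    # Mirrors the single_end_skip job (design path shortened for readability).
    print(shlex.join(build_nextflow_cmd(
        "test_data/design_diff_SE.txt", "GRCm38", skip_diff=True, skip_motif=True)))
```

You can pass the list directly to `subprocess.run` and avoid shell quoting altogether. Here `shlex.join` is used only to display the equivalent command line.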
@@ -5,6 +5,7 @@
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.24.0-brightgreen.svg)](https://www.nextflow.io/)
[![Astrocyte](https://img.shields.io/badge/astrocyte-%E2%89%A50.1.0-blue.svg)](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2648845.svg)](https://doi.org/10.5281/zenodo.2648845)
## Introduction
......
@@ -101,8 +101,8 @@ workflow_parameters:
type: select
required: true
choices:
- [ 'true', 'True']
- [ 'false', 'False']
- [ 'true', 'true']
- [ 'false', 'false']
description: |
In single-end sequencing, the sequencer reads a fragment from only one
end to the other, generating the sequence of base pairs. In paired-end
@@ -128,6 +128,24 @@ workflow_parameters:
description: |
Reference species and genome used for alignment and subsequent analysis.
- id: skipDiff
type: select
required: true
choices:
- [ 'true', 'true']
- [ 'false', 'false']
description: |
Skip differential peak analysis (true) or run it (false).
- id: skipMotif
type: select
required: true
choices:
- [ 'true', 'true']
- [ 'false', 'false']
description: |
Skip motif calling (true) or run it (false).
- id: astrocyte
type: select
choices:
......
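The `select` parameters above list each choice as a two-element entry. This commit changes them from `[ 'true', 'True']` to `[ 'true', 'true']`. Below is a minimal sketch of checking a submitted value against such a list. It assumes that the first element is the value handed to the workflow and the second is the label shown to the user; the Astrocyte documentation quoted here does not confirm this. `validate_select` is a hypothetical name.

```python
from typing import List, Sequence

# Choices as written in the package file for skipDiff and skipMotif.
BOOL_CHOICES = [["true", "true"], ["false", "false"]]


def validate_select(param_id: str, value: str, choices: Sequence[Sequence[str]]) -> str:
    """Return `value` if it matches a choice value, else raise ValueError.

    Assumes choice[0] is the submitted value and choice[1] is a display label.
    """
    allowed: List[str] = [choice[0] for choice in choices]
    if value not in allowed:
        raise ValueError(f"{param_id}: {value!r} is not one of {allowed}")
    return value
```

Matching is case-sensitive here, so a stray `True` is rejected instead of being silently accepted.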
SampleID,Tissue,Factor,Condition,Replicate,Peaks,bamReads,bamControl,ControlID,PeakCaller
A_1,A,H3K27AC,A,1,A_1.broadPeak,A_1.bam,A_1_input.bam,A_1_input,bed
A_2,A,H3K27AC,A,2,A_2.broadPeak,A_2.bam,A_2_input.bam,A_2_input,bed
B_1,B,H3K27AC,B,1,B_1.broadPeak,B_1.bam,B_1_input.bam,B_1_input,bed
B_2,B,H3K27AC,B,2,B_2.broadPeak,B_2.bam,B_2_input.bam,B_2_input,bed
C_1,C,H3K27AC,C,1,C_1.broadPeak,C_1.bam,C_1_input.bam,C_1_input,bed
C_2,C,H3K27AC,C,2,C_2.broadPeak,C_2.bam,C_2_input.bam,C_2_input,bed
sample_id experiment_id biosample factor treatment replicate control_id fastq_read1
A1 A tissueA H3K27AC None 1 B1 A1.fastq.gz
A2 A tissueA H3K27AC None 2 B2 A2.fastq.gz
B1 B tissueB Input None 1 B1 B1.fastq.gz
B2 A tissueB Input None 2 B2 B2.fastq.gz
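In the new tab-separated template, each row names its control through `control_id`, and the input rows (B1, B2) list themselves as their own control. Below is a minimal sketch that pairs every ChIP sample with its control. It assumes that self-reference is how control rows are marked, which is inferred from the example above rather than a documented rule. `pair_controls` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Tuple


def pair_controls(design_tsv: str) -> List[Tuple[str, str]]:
    """Return (sample_id, control_id) pairs for non-control rows."""
    rows = list(csv.DictReader(io.StringIO(design_tsv), delimiter="\t"))
    by_id: Dict[str, dict] = {row["sample_id"]: row for row in rows}
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        control = row["control_id"]
        # Every control_id must refer to a sample listed in the same file.
        if control not in by_id:
            raise ValueError(f"{row['sample_id']}: control {control!r} not found")
        # Assumption: a row whose control_id is itself is a control/input row.
        if control != row["sample_id"]:
            pairs.append((row["sample_id"], control))
    return pairs
```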
# Astrocyte ChIPseq analysis Workflow Package
# BICF ChIP-seq Analysis Workflow
## Introduction
**ChIP-seq Analysis** is a bioinformatics best-practice analysis pipeline used for chromatin immunoprecipitation (ChIP-seq) data analysis.
@@ -7,16 +7,16 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
### Pipeline Steps
1) Trim adapters with TrimGalore!
2) Align with BWA
3) Filter reads with Sambamba S
4) Quality control with DeepTools
5) Calculate cross-correlation using SPP and PhantomPeakQualTools
6) Signal profiling using MACS2
7) Call consensus peaks
8) Annotate all peaks using ChipSeeker
9) Use MEME-ChIP to find motifs in original peaks
10) Find differentially bound peaks using DiffBind (if more than 1 experiment)
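The last two steps are optional. `skipMotif` and `skipDiff` turn them off, and DiffBind only applies when the design has more than one experiment. Below is a minimal sketch of that gating. The short step keys are hypothetical labels, not the pipeline's process names.

```python
from typing import List

# Steps 1-8 from the list above always run (hypothetical short keys).
CORE_STEPS = [
    "trim", "align", "filter", "qc", "xcor", "macs2", "consensus", "annotate",
]


def planned_steps(skip_motif: bool, skip_diff: bool, n_experiments: int) -> List[str]:
    """Return the steps that would run for the given options."""
    steps = list(CORE_STEPS)  # copy so CORE_STEPS is never mutated
    if not skip_motif:
        steps.append("meme_chip")
    # DiffBind needs at least two experiments to compare.
    if not skip_diff and n_experiments > 1:
        steps.append("diffbind")
    return steps
```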
## Workflow Parameters
@@ -25,41 +25,35 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
pairedEnd - Choose true/false if data is paired-end
design - Choose the file with the experiment design information, in TSV format
genome - Choose a genomic reference (genome).
skipDiff - Choose true/false to skip differential peak analysis
skipMotif - Choose true/false to skip motif calling
## Design file
The following columns are required and must be named as in the template. A design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/bchen4/chipseq_analysis/raw/master/docs/design_example.csv)
SampleID
The ID of the sample. This will be the header in output files, so please keep it concise.
Tissue
Tissue of the sample
Factor
Factor of the experiment
Condition
This is the group that will be used for pairwise differential expression analysis
Replicate
Replicate id
Peaks
The file name of the peak file for this sample
bamReads
The file name of the IP BAM for this sample
bamControl
The file name of the control BAM for this sample
ControlID
The id of the control sample
PeakCaller
The peak caller used
The following columns are required and must be named as in the template. A design file template can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/chipseq_analysis/blob/master/docs/design_example.txt)
sample_id
The ID of the sample. This will be the name used in output files, so please keep it concise and informative.
experiment_id
The id of the experiment. Used for grouping replicates.
biosample
The name of the biological sample.
factor
Factor of the experiment.
treatment
Treatment used in experiment.
replicate
Replicate number.
control_id
The sample_id of the control used for this sample.
fastq_read1
File name of the FASTQ file; for paired-end data, this is read 1.
fastq_read2
File name of read 2 for paired-end data; not needed for single-end data.
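The design file is tab-separated and its column names must match the template, so a header check catches most mistakes before any process runs. Below is a minimal sketch. It assumes the column list above is complete and that `fastq_read2` is required only for paired-end data. `check_design_columns` is a hypothetical helper, not the pipeline's own `checkDesignFile` process.

```python
import csv
import io
from typing import List

# Columns listed above for every design file.
REQUIRED = [
    "sample_id", "experiment_id", "biosample", "factor",
    "treatment", "replicate", "control_id", "fastq_read1",
]


def check_design_columns(design_tsv: str, paired_end: bool) -> List[dict]:
    """Raise ValueError on missing columns, otherwise return the parsed rows."""
    reader = csv.DictReader(io.StringIO(design_tsv), delimiter="\t")
    header = reader.fieldnames or []  # None for an empty file
    required = REQUIRED + (["fastq_read2"] if paired_end else [])
    missing = [col for col in required if col not in header]
    if missing:
        raise ValueError(f"design file is missing columns: {', '.join(missing)}")
    return list(reader)
```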
### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatic Core Facility (BICF), Department of Bioinformatics.
### References
This workflow was developed jointly with the [Bioinformatic Core Facility (BICF), Department of Bioinformatics](http://www.utsouthwestern.edu/labs/bioinformatics/)
* ChipSeeker: http://bioconductor.org/packages/release/bioc/html/ChIPseeker.html
* DiffBind: http://bioconductor.org/packages/release/bioc/html/DiffBind.html
* Deeptools: https://deeptools.github.io/
* MEME-ChIP: http://meme-suite.org/doc/meme-chip.html
Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
@@ -50,3 +50,8 @@
16. **MultiQC**:
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354)
17. **BICF ChIP-seq Analysis Workflow**:
* Venkat S. Malladi and Beibei Chen. (2019). BICF ChIP-seq Analysis Workflow (publish_1.0.0). Zenodo. doi:[10.5281/zenodo.2648845](https://doi.org/10.5281/zenodo.2648845)
Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
@@ -5,14 +5,14 @@
// Define Input variables
params.reads = "$baseDir/../test_data/*.fastq.gz"
params.pairedEnd = 'false'
params.pairedEnd = false
params.designFile = "$baseDir/../test_data/design_ENCSR238SGC_SE.txt"
params.genome = 'GRCm38'
params.cutoffRatio = 1.2
params.outDir= "$baseDir/output"
params.extendReadsLen = 100
params.topPeakCount = 600
params.astrocyte = 'false'
params.astrocyte = false
params.skipDiff = false
params.skipMotif = false
params.references = "$baseDir/../docs/references.md"
@@ -56,6 +56,7 @@ readsList = Channel
.collectFile( name: 'fileList.tsv', newLine: true )
// Define regular variables
pairedEnd = params.pairedEnd
designFile = params.designFile
genomeSize = params.genomeSize
genome = params.genome
@@ -70,12 +71,6 @@ skipMotif = params.skipMotif
references = params.references
multiqc = params.multiqc
if (params.pairedEnd == 'false'){
pairedEnd = false
} else {
pairedEnd = true
}
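The removed `if` block existed because `pairedEnd` used to default to the string `'false'`. A non-empty string is truthy in Groovy, as it is in Python, so a plain `if (params.pairedEnd)` would have treated `'false'` as true. Now that the defaults are declared as real booleans, the block is no longer needed. Where a flag may still arrive as either a boolean or a string, a strict parser avoids the pitfall. Below is a minimal Python sketch with a hypothetical `parse_bool` helper.

```python
from typing import Union


def parse_bool(value: Union[bool, str]) -> bool:
    """Coerce a true/false flag that may arrive as a bool or a string."""
    if isinstance(value, bool):
        return value
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    # Reject anything else rather than guessing (e.g. "yes", "1", "").
    raise ValueError(f"expected true or false, got {value!r}")


# The pitfall: bool("false") is True because the string is non-empty.
```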
// Check design file for errors
process checkDesignFile {
......