Skip to content
Snippets Groups Projects
Commit 54529fc9 authored by Holly Ruess's avatar Holly Ruess
Browse files

Merge branch '39-astrocyte_cli' into 'master'

Resolve "Tests for astrocyte"

Closes #39

See merge request !27
parents ad75fe9a c34d2612
1 merge request!27Resolve "Tests for astrocyte"
Pipeline #7289 passed with stages
in 9 seconds
...@@ -6,21 +6,31 @@ before_script: ...@@ -6,21 +6,31 @@ before_script:
stages: stages:
- unit - unit
- astrocyte
- integration - integration
user_configuration: user_configuration:
stage: unit stage: unit
script: script:
- pytest -m unit
- pytest -m unit --cov=./workflow/scripts - pytest -m unit --cov=./workflow/scripts
astrocyte:
stage: astrocyte
script:
- module load astrocyte/0.2.0
- module unload nextflow
- cd ..
- astrocyte_cli validate atacseq_analysis
single_end_human: single_end_human:
stage: integration stage: integration
only: only:
- branches - branches
- master - master
script: script:
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf - NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --ci true --dev true
- pytest -m singleend_human - pytest -m singleend_human
...@@ -30,5 +40,5 @@ paired_end_mouse: ...@@ -30,5 +40,5 @@ paired_end_mouse:
- branches - branches
- master - master
script: script:
- NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR451NAE_PE.txt" --genome 'GRCm38' --pairedEnd true --blacklist true - NXF_OPTS="-Dleveldb.mmap=false" nextflow run workflow/main.nf --designFile "$CI_PROJECT_DIR/test_data/design_ENCSR451NAE_PE.txt" --genome 'GRCm38' --pairedEnd true --blacklist true --astrocyte true --ci true --dev true
- pytest -m pairedend_mouse - pytest -m pairedend_mouse
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
All notable changes to this project will be documented in this file. All notable changes to this project will be documented in this file.
## [Unreleased] ## [publish_2.0.0 ] - 2020-06-12
### Fixed ### Fixed
- Removed biosample, factor, treatment from design file - Removed biosample, factor, treatment from design file
- Updated documentation - Updated documentation
...@@ -28,4 +28,3 @@ All notable changes to this project will be documented in this file. ...@@ -28,4 +28,3 @@ All notable changes to this project will be documented in this file.
## [publish_1.0.0 ] - 2019-12-03 ## [publish_1.0.0 ] - 2019-12-03
Initial release of pipeline Initial release of pipeline
...@@ -25,11 +25,11 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git ...@@ -25,11 +25,11 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git
``` ```
## Input files ## Input files
##### 1) Fastq Files ### Fastq Files
+ You will need the full path to the files for the Bash Scipt + You will need the full path to the files for the Bash Scipt
## Design file ### Design file
+ The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letter, numbers, and underlines can be used in the names. However, the names must begin with a letter. Columns must be as follows: + The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letter, numbers, and underlines can be used in the names. However, the names must begin with a letter. Columns must be as follows:
1. sample_id - The id of the sample. This will be the header in output files, please make sure it is concise 1. sample_id - The id of the sample. This will be the header in output files, please make sure it is concise
...@@ -37,7 +37,7 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git ...@@ -37,7 +37,7 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git
3. replicate - Replicate number 3. replicate - Replicate number
4. fastq_read1 - Name of fastq file 1 for SE or PE data 4. fastq_read1 - Name of fastq file 1 for SE or PE data
5. fastq_read2 - Name of fastq file 2 for PE data 5. fastq_read2 - Name of fastq file 2 for PE data
+ See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end + See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end
+ See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end + See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end
...@@ -112,9 +112,8 @@ If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know ...@@ -112,9 +112,8 @@ If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know
## Citation ## Citation
Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi: coming soon. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596). Please cite individual programs and versions of pipeline used [HERE](docs/references.md), and the overall pipeline doi: 10.5281/zenodo.3526149. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).
### Credits ### Credits
This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/). This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/).
...@@ -46,12 +46,12 @@ workflow_modules: ...@@ -46,12 +46,12 @@ workflow_modules:
- 'bwa/intel/0.7.12' - 'bwa/intel/0.7.12'
- 'samtools/1.4.1' - 'samtools/1.4.1'
- 'sambamba/0.6.6' - 'sambamba/0.6.6'
- 'bedtools/2.26.0' - 'bedtools/2.25.0'
- 'deeptools/2.5.0.1' - 'deeptools/2.5.0.1'
- 'phantompeakqualtools/1.2' - 'phantompeakqualtools/1.2'
- 'macs/2.1.0-20151222' - 'macs/2.1.0-20151222'
- 'UCSC_userApps/v317' - 'UCSC_userApps/v317'
- 'singularity/2.6.1' - 'R/3.3.2-gccmkl'
- 'pandoc/2.7' - 'pandoc/2.7'
- 'singularity/3.0.2' - 'singularity/3.0.2'
...@@ -95,7 +95,7 @@ workflow_parameters: ...@@ -95,7 +95,7 @@ workflow_parameters:
description: | description: |
One or more input FASTQ files from a ATAC-seq expereiment and a design One or more input FASTQ files from a ATAC-seq expereiment and a design
file with the link bewetwen the same file name and sample id file with the link bewetwen the same file name and sample id
regex: ".*(fastq|fq)*" regex: ".*(fastq|fq)*gz"
- id: pairedEnd - id: pairedEnd
type: select type: select
...@@ -117,7 +117,7 @@ workflow_parameters: ...@@ -117,7 +117,7 @@ workflow_parameters:
- [ 'true', 'True'] - [ 'true', 'True']
- [ 'false', 'False'] - [ 'false', 'False']
description: | description: |
The Blacklisted Regions aim to identify a comprehensive set of regions The Blacklisted Regions aim to identify a comprehensive set of regions
that have anomalous, unstructured, high signal/read counts in next gen that have anomalous, unstructured, high signal/read counts in next gen
sequencing experiments independent of cell line and type of experiment. sequencing experiments independent of cell line and type of experiment.
If you would like these regions excluded from replicated peaks, select If you would like these regions excluded from replicated peaks, select
...@@ -134,12 +134,22 @@ workflow_parameters: ...@@ -134,12 +134,22 @@ workflow_parameters:
type: select type: select
choices: choices:
- [ 'GRCh38', 'Human GRCh38'] - [ 'GRCh38', 'Human GRCh38']
- [ 'GRCh38', 'Mouse GRCh38'] - [ 'GRCm38', 'Mouse GRCm38']
required: true required: true
description: | description: |
Reference species and genome used for alignment and subsequent analysis. Reference species and genome used for alignment and subsequent analysis.
- id: astrocyte
type: select
choices:
- [ 'true', 'true' ]
required: true
default: 'true'
description: |
Ensure configuraton for astrocyte.
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# SHINY APP CONFIGURATION # SHINY APP CONFIGURATION
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
...@@ -148,7 +158,7 @@ workflow_parameters: ...@@ -148,7 +158,7 @@ workflow_parameters:
# The workflow must publish all final output into $baseDir # The workflow must publish all final output into $baseDir
# Name of the R module that the vizapp will run against # Name of the R module that the vizapp will run against
vizapp_r_module: 'R/3.2.1-intel' vizapp_r_module: 'R/3.4.1-gccmkl'
# List of any CRAN packages, not provided by the modules, that must be made # List of any CRAN packages, not provided by the modules, that must be made
# available to the vizapp # available to the vizapp
...@@ -158,8 +168,4 @@ vizapp_cran_packages: ...@@ -158,8 +168,4 @@ vizapp_cran_packages:
# List of any Bioconductor packages, not provided by the modules, # List of any Bioconductor packages, not provided by the modules,
# that must be made available to the vizapp # that must be made available to the vizapp
vizapp_bioc_packages: vizapp_bioc_packages: []
# - qusage
# - ballgown
vizapp_github_packages:
- js229/Vennerable
# Astrocyte ATAC-seq analysis Workflow Package # BICF ATAC-seq Analysis Workflow
[![pipeline Status](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/badges/master/pipeline.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/commits/master)
[![Coverage Report](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/badges/master/coverage.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis/commits/master)
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A50.24.0-brightgreen.svg
)](https://www.nextflow.io/)
[![Astrocyte](https://img.shields.io/badge/astrocyte-%E2%89%A50.1.0-blue.svg)](https://astrocyte-test.biohpc.swmed.edu/static/docs/index.html)
## Introduction ## Introduction
...@@ -12,22 +6,10 @@ BICF ATAC-seq is a bioinformatics best-practice analysis pipeline used for ATAC- ...@@ -12,22 +6,10 @@ BICF ATAC-seq is a bioinformatics best-practice analysis pipeline used for ATAC-
The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results. The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results.
This pipeline is primarily used with a SLURM cluster on the [BioHPC Cluster](https://portal.biohpc.swmed.edu/content/). However, the pipeline should be able to run on any system that supports Nextflow.
Additionally, the pipeline is designed to work with [Astrocyte Workflow System](https://astrocyte.biohpc.swmed.edu/static/docs/index.html) using a simple web interface.
Current version of the software and issue reports are at
https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis
To download the current (working not tagged) version of the software
```bash
$ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git
```
## Input files ## Input files
##### 1) Fastq Files ##### 1) Fastq Files
+ You will need the full path to the files for the Bash Scipt + One or more input FASTQ files from a ATAC-seq experiment
## Design file ## Design file
+ The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letter, numbers, and underlines can be used in the names. However, the names must begin with a letter. Columns must be as follows: + The Design file is a tab-delimited file with 4 columns for Single-End and 5 columns for Paired-End. Letter, numbers, and underlines can be used in the names. However, the names must begin with a letter. Columns must be as follows:
...@@ -37,7 +19,7 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git ...@@ -37,7 +19,7 @@ $ git clone git@git.biohpc.swmed.edu:BICF/Astrocyte/atacseq_analysis.git
3. replicate - Replicate number 3. replicate - Replicate number
4. fastq_read1 - Name of fastq file 1 for SE or PE data 4. fastq_read1 - Name of fastq file 1 for SE or PE data
5. fastq_read2 - Name of fastq file 2 for PE data 5. fastq_read2 - Name of fastq file 2 for PE data
+ See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end + See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example design file, paired-end
+ See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end + See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example design file, single-end
...@@ -112,9 +94,8 @@ If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know ...@@ -112,9 +94,8 @@ If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know
## Citation ## Citation
Please cite individual programs and versions used [HERE](docs/references.md), and the pipeline doi: coming soon. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596). Please cite individual programs and versions of pipeline used [HERE](docs/references.md), and the overall pipeline doi: 10.5281/zenodo.3526149. Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).
### Credits ### Credits
This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/). This example worklow is derived from original scripts kindly contributed by the Bioinformatic Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/).
...@@ -40,7 +40,11 @@ ...@@ -40,7 +40,11 @@
13. **MultiQc**: 13. **MultiQc**:
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354) * Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354)
14. **Nextflow**:
14. **BICF ChIP-seq Analysis Workflow**:
* Holly Ruess, Spencer D. Barnes and Venkat S. Malladi. 2020. BICF ATAC-seq Analysis Workflow (publish_2.0.0). Zenodo. doi:[10.5281/zenodo.3891417](https://doi.org/10.5281/zenodo.3891417)
15. **Nextflow**:
* Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., and Notredame, C. 2017. Nextflow enables reproducible computational workflows. Nature biotechnology, 35(4), 316. * Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., and Notredame, C. 2017. Nextflow enables reproducible computational workflows. Nature biotechnology, 35(4), 316.
Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**. Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
...@@ -50,7 +50,7 @@ process { ...@@ -50,7 +50,7 @@ process {
executor = 'local' executor = 'local'
} }
withName: experimentQC { withName: experimentQC {
module = ['python/3.6.1-2-anaconda', 'deeptools/2.5.0.1', 'samtools/1.4.1', 'bedtools/2.25.0', 'singularity/2.6.1'] module = ['python/3.6.1-2-anaconda', 'deeptools/2.5.0.1', 'samtools/1.4.1', 'bedtools/2.25.0', 'singularity/3.0.2']
queue = '128GB,256GB,256GBv1' queue = '128GB,256GB,256GBv1'
} }
withName: multiqcReport { withName: multiqcReport {
...@@ -81,19 +81,3 @@ params { ...@@ -81,19 +81,3 @@ params {
} }
} }
} }
trace {
enabled = true
file = 'pipeline_trace.txt'
fields = 'task_id,native_id,process,name,status,exit,submit,start,complete,duration,realtime,%cpu,%mem,rss'
}
timeline {
enabled = true
file = 'timeline.html'
}
report {
enabled = true
file = 'report.html'
}
...@@ -20,6 +20,8 @@ params.astrocyte = false ...@@ -20,6 +20,8 @@ params.astrocyte = false
params.outDir= "${baseDir}/output" params.outDir= "${baseDir}/output"
params.references = "${baseDir}/../docs/references.md" params.references = "${baseDir}/../docs/references.md"
params.multiqc = "${baseDir}/conf/multiqc_config.yaml" params.multiqc = "${baseDir}/conf/multiqc_config.yaml"
params.ci = false
params.dev = false
// Check inputs // Check inputs
if(params.bwaIndex) { if(params.bwaIndex) {
...@@ -50,6 +52,33 @@ gtfFile = params.gtfFile ...@@ -50,6 +52,33 @@ gtfFile = params.gtfFile
blacklistFile = params.blacklistFile blacklistFile = params.blacklistFile
geneNames = params.geneNames geneNames = params.geneNames
/*
* trackStart: track start of pipeline
*/
process trackStart {
script:
"""
hostname
ulimit -a
export https_proxy=\${http_proxy}
curl -H 'Content-Type: application/json' -X PUT -d '{ \
"sessionId": "${workflow.sessionId}", \
"pipeline": "atacseq_analysis", \
"start": "${workflow.start}", \
"astrocyte": ${params.astrocyte}, \
"status": "started", \
"nextflowVersion": "${workflow.nextflow.version}", \
"pipelineVersion": "2.0.0", \
"ci": ${params.ci}, \
"dev": ${params.dev}}' \
"https://xku43pcwnf.execute-api.us-east-1.amazonaws.com/ProdDeploy/pipeline-tracking"
"""
}
process checkDesignFile { process checkDesignFile {
publishDir "${outDir}/design", mode: 'copy' publishDir "${outDir}/design", mode: 'copy'
...@@ -97,7 +126,7 @@ if (pairedEnd) { ...@@ -97,7 +126,7 @@ if (pairedEnd) {
rawReads = designFilePaths rawReads = designFilePaths
.splitCsv(sep: '\t', header: true) .splitCsv(sep: '\t', header: true)
.map { row -> [ row.sample_id, [row.fastq_read1, row.fastq_read2], row.experiment_id, row.replicate, row.fq_length ] } .map { row -> [ row.sample_id, [row.fastq_read1, row.fastq_read2], row.experiment_id, row.replicate, row.fq_length ] }
} }
else { else {
rawReads = designFilePaths rawReads = designFilePaths
.splitCsv(sep: '\t', header: true) .splitCsv(sep: '\t', header: true)
...@@ -518,14 +547,14 @@ process consensusPeaks { ...@@ -518,14 +547,14 @@ process consensusPeaks {
if (blacklist) { if (blacklist) {
""" """
module load python/3.6.1-2-anaconda module load python/3.6.1-2-anaconda
module load bedtools/2.26.0 module load bedtools/2.25.0
python3 ${baseDir}/scripts/overlap_peaks.py -d ${peaksDesign} -f ${preDiffDesign} -b ${blacklistFile} python3 ${baseDir}/scripts/overlap_peaks.py -d ${peaksDesign} -f ${preDiffDesign} -b ${blacklistFile}
""" """
} }
else { else {
""" """
module load python/3.6.1-2-anaconda module load python/3.6.1-2-anaconda
module load bedtools/2.26.0 module load bedtools/2.25.0
python3 ${baseDir}/scripts/overlap_peaks.py -d ${peaksDesign} -f ${preDiffDesign} python3 ${baseDir}/scripts/overlap_peaks.py -d ${peaksDesign} -f ${preDiffDesign}
""" """
} }
...@@ -602,7 +631,7 @@ process experimentQC { ...@@ -602,7 +631,7 @@ process experimentQC {
module load bedtools/2.26.0 module load bedtools/2.26.0
python3 ${baseDir}/scripts/experiment_qc.py -d ${designExperimentQC} -p python3 ${baseDir}/scripts/experiment_qc.py -d ${designExperimentQC} -p
bash ${baseDir}/scripts/make_tss.sh ${gtfFile} bash ${baseDir}/scripts/make_tss.sh ${gtfFile}
module load singularity/2.6.1 module load singularity/3.0.2
singularity run /project/shared/bicf_workflow_ref/singularity_images/metaseq.simg ${baseDir}/scripts/atac_qc.py -d ${designExperimentQC} -l ${fqlengthDesign} -t gencode.tss -c ${chromSizes} singularity run /project/shared/bicf_workflow_ref/singularity_images/metaseq.simg ${baseDir}/scripts/atac_qc.py -d ${designExperimentQC} -l ${fqlengthDesign} -t gencode.tss -c ${chromSizes}
""" """
} }
...@@ -614,7 +643,7 @@ process experimentQC { ...@@ -614,7 +643,7 @@ process experimentQC {
module load bedtools/2.26.0 module load bedtools/2.26.0
python3 ${baseDir}/scripts/experiment_qc.py -d ${designExperimentQC} python3 ${baseDir}/scripts/experiment_qc.py -d ${designExperimentQC}
bash ${baseDir}/scripts/make_tss.sh ${gtfFile} bash ${baseDir}/scripts/make_tss.sh ${gtfFile}
module load singularity/2.6.1 module load singularity/3.0.2
singularity run /project/shared/bicf_workflow_ref/singularity_images/metaseq.simg ${baseDir}/scripts/atac_qc.py -d ${designExperimentQC} -l ${fqlengthDesign} -t gencode.tss -c ${chromSizes} singularity run /project/shared/bicf_workflow_ref/singularity_images/metaseq.simg ${baseDir}/scripts/atac_qc.py -d ${designExperimentQC} -l ${fqlengthDesign} -t gencode.tss -c ${chromSizes}
""" """
} }
...@@ -693,4 +722,3 @@ process multiqcReport { ...@@ -693,4 +722,3 @@ process multiqcReport {
} }
} }
...@@ -3,3 +3,29 @@ profiles { ...@@ -3,3 +3,29 @@ profiles {
includeConfig 'conf/biohpc.config' includeConfig 'conf/biohpc.config'
} }
} }
trace {
enabled = true
file = 'pipeline_trace.txt'
fields = 'task_id,native_id,process,name,status,exit,submit,start,complete,duration,realtime,%cpu,%mem,rss'
}
timeline {
enabled = true
file = 'timeline.html'
}
report {
enabled = true
file = 'report.html'
}
manifest {
name = 'atacseq_analysis'
description = 'BICF ATAC-seq Analysis Workflow.'
homePage = 'https://git.biohpc.swmed.edu/BICF/Astrocyte/atacseq_analysis'
version = '2.0.0'
mainScript = 'main.nf'
nextflowVersion = '>=0.31.0'
}
...@@ -40,8 +40,7 @@ def get_args(): ...@@ -40,8 +40,7 @@ def get_args():
parser.add_argument('-b', '--blacklist', parser.add_argument('-b', '--blacklist',
help="Bed file of blacklisted regions to remove", help="Bed file of blacklisted regions to remove",
required=False, required=False)
default="None")
args = parser.parse_args() args = parser.parse_args()
return args return args
...@@ -240,18 +239,19 @@ def main(): ...@@ -240,18 +239,19 @@ def main():
design_anno = pd.DataFrame(columns=anno_cols) design_anno = pd.DataFrame(columns=anno_cols)
# Find consenus overlap peaks for each experiment # Find consenus overlap peaks for each experiment
for experiment, df_experiment in design_peaks_df.groupby('experiment_id'): for experiment, df_experiment in design_peaks_df.groupby('experiment_id'):
replicated_peak, chr_peak = overlap(experiment, df_experiment) replicated_peak, chr_peak = overlap(experiment, df_experiment)
design_diff.loc[design_diff.experiment_id == experiment, "peak"] = replicated_peak design_diff.loc[design_diff.experiment_id == experiment, "peak"] = replicated_peak
design_anno.loc[experiment] = [experiment, chr_peak] design_anno.loc[experiment] = [experiment, chr_peak]
# Remove blacklist regions; if blacklist = True # Remove blacklist regions; if blacklist = True
if os.path.exists(blacklist): if blacklist and os.path.exists(blacklist):
for experiment, df_experiment in design_peaks_df.groupby('experiment_id'):
bl_peaks, bl_chr_peak = blacklist_peaks(experiment, blacklist) bl_peaks, bl_chr_peak = blacklist_peaks(experiment, blacklist)
design_diff.loc[design_diff.experiment_id == experiment, "peak"] = bl_peaks design_diff.loc[design_diff.experiment_id == experiment, "peak"] = bl_peaks
design_anno.loc[design_anno.Condition == experiment, "Peaks"] = bl_chr_peak design_anno.loc[design_anno.Condition == experiment, "Peaks"] = bl_chr_peak
# Write out file # Write out file
design_diff.columns = ['SampleID', design_diff.columns = ['SampleID',
'bamReads', 'bamReads',
......
...@@ -17,7 +17,7 @@ def test_software_references_singleend_human(): ...@@ -17,7 +17,7 @@ def test_software_references_singleend_human():
software_references = os.path.join(test_output_path, 'software_references_mqc.yaml') software_references = os.path.join(test_output_path, 'software_references_mqc.yaml')
with open(software_references, 'r') as stream: with open(software_references, 'r') as stream:
data_loaded = yaml.load(stream) data_loaded = yaml.load(stream)
assert len(data_loaded['data'].split('<ul>')) == 15 assert len(data_loaded['data'].split('<ul>')) == 16
multiqc_report = os.path.join(test_output_path, 'multiqc_report.html') multiqc_report = os.path.join(test_output_path, 'multiqc_report.html')
html_file = open(multiqc_report, 'r') html_file = open(multiqc_report, 'r')
...@@ -25,7 +25,7 @@ def test_software_references_singleend_human(): ...@@ -25,7 +25,7 @@ def test_software_references_singleend_human():
multiqc_html = BeautifulSoup(source_code, 'html.parser') multiqc_html = BeautifulSoup(source_code, 'html.parser')
references = multiqc_html.find(id="mqc-module-section-Software_References") references = multiqc_html.find(id="mqc-module-section-Software_References")
assert references is not None assert references is not None
assert len(references.find_all('ul')) == 14 assert len(references.find_all('ul')) == 15
...@@ -35,7 +35,7 @@ def test_software_references_pairedend_mouse(): ...@@ -35,7 +35,7 @@ def test_software_references_pairedend_mouse():
software_references = os.path.join(test_output_path, 'software_references_mqc.yaml') software_references = os.path.join(test_output_path, 'software_references_mqc.yaml')
with open(software_references, 'r') as stream: with open(software_references, 'r') as stream:
data_loaded = yaml.load(stream) data_loaded = yaml.load(stream)
assert len(data_loaded['data'].split('<ul>')) == 15 assert len(data_loaded['data'].split('<ul>')) == 16
multiqc_report = os.path.join(test_output_path, 'multiqc_report.html') multiqc_report = os.path.join(test_output_path, 'multiqc_report.html')
html_file = open(multiqc_report, 'r') html_file = open(multiqc_report, 'r')
...@@ -43,5 +43,4 @@ def test_software_references_pairedend_mouse(): ...@@ -43,5 +43,4 @@ def test_software_references_pairedend_mouse():
multiqc_html = BeautifulSoup(source_code, 'html.parser') multiqc_html = BeautifulSoup(source_code, 'html.parser')
references = multiqc_html.find(id="mqc-module-section-Software_References") references = multiqc_html.find(id="mqc-module-section-Software_References")
assert references is not None assert references is not None
assert len(references.find_all('ul')) == 14 assert len(references.find_all('ul')) == 15
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment