Commit 8391da95 authored by Jeremy Mathews

Add MultiQC to output. Add version and reference material. fix README

parent 87ca7c11
4 merge requests: !53 Develop, !47 Resolve "Add MultiQC", !46 Resolve "Add MultiQC", !44 Add MultiQC to output. Add version and reference material. fix README
Pipeline #4173 passed with stages in 42 minutes and 36 seconds
......@@ -27,7 +27,7 @@ To Run:
* path to the fastq location
* R1 and R2 only necessary but can include I2
* only fastq's in designFile (see below) are used, not present will be ignored
- * eg: **--fastq '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/v3s2r100k/\*.fastq.gz'**
+ * eg: **--fastq '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r100k/\*.fastq.gz'**
* **--designFile**
* path to design file (csv format) location
* column 1 = "Sample"
......@@ -35,7 +35,7 @@ To Run:
* column 3 = "fastq_R2"
* can have repeated "Sample" if there are multiple fastq R1/R2 pairs for the samples
* can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/blob/master/docs/design.csv)
- * eg: **--designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/v3s2r100k/design.csv'**
+ * eg: **--designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r100k/design.csv'**
* **--genome**
* reference genome
* requires workflow/conf/biohpc.config to work
......@@ -44,8 +44,8 @@ To Run:
* *'GRCh38-1.2.0'* = Human GRCh38 release 84
* *'hg19-3.0.0'* = Human GRCh37 (hg19) release 87
* *'hg19-1.2.0'* = Human GRCh37 (hg19) release 84
- * *'mm10-3.0.0'* = Human GRCm38 (mm10) release 93
- * *'mm10-3.0.0'* = Human GRCm38 (mm10) release 84
+ * *'mm10-3.0.0'* = Mouse GRCm38 (mm10) release 93
+ * *'mm10-3.0.0'* = Mouse GRCm38 (mm10) release 84
* *'hg19_and_mm10-3.0.0'* = Human GRCh37 (hg19) + Mouse GRCm38 (mm19) release 93
* *'hg19_and_mm10-1.2.0'* = Human GRCh37 (hg19) + Mouse GRCm38 (mm19) release 84
* *'ercc92-1.2.0'* = ERCC.92 Spike-In
......@@ -92,7 +92,7 @@ To Run:
* eg: **--outDir 'test'**
* FULL EXAMPLE:
- **nextflow main.nf --fastq '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/v3s2r100k/\*.fastq.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/v3s2r100k/design.csv' --genome 'GRCh38-3.0.0' --kitVersion 'three' --version '3.0.2' --outDir 'test'**
+ **nextflow run workflow/main.nf --fastq '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r100k/\*.fastq.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r100k/design.csv' --genome 'GRCh38-3.0.0' --kitVersion 'three' --version '3.0.2' --outDir 'test'**
* Design example:
......
### References
1. **python**:
* Anaconda (Anaconda Software Distribution, [https://anaconda.com](https://anaconda.com))
2. **cellranger**
* Cellranger mkfastq [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq)
3. **MultiQc**:
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354)
workflow/conf/bicf_logo.png (24.3 KiB)

# Custom Logo
custom_logo: 'bicf_logo.png'
custom_logo_url: 'https://www.utsouthwestern.edu/labs/bioinformatics/'
custom_logo_title: 'Bioinformatics Core Facility'
report_header_info:
- Contact E-mail: 'bicf@utsouthwestern.edu'
- Application Type: 'CellRanger_Count'
- Department: 'Bioinformatic Core Facility, Department of Bioinformatics'
# Title to use for the report.
title: BICF CellRanger Count Analysis Report
report_comment: >
  This report has been generated by the <a href="https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count"
  target="_blank">BICF/cellranger_count</a> pipeline.
custom_data:
  metrics_summary:
    file_format: 'tsv'
    id: 'metrics_summary'
    contents: 'Estimated Number of Cells Mean Reads per Cell Median Genes per Cell Number of Reads Valid Barcodes Sequencing Saturation Q30 Bases in Barcode Q30 Bases in RNA Read Q30 Bases in UMI Reads Mapped to Genome Reads Mapped Confidently to Genome Reads Mapped Confidently to Intergenic Regions Reads Mapped Confidently to Intronic Regions Reads Mapped Confidently to Exonic Regions Reads Mapped Confidently to Transcriptome Reads Mapped Antisense to Gene Fraction Reads in Cells Total Genes Detected Median UMI Counts per Cell'
    section_name: 'Metrics Summary'
    plot_type: 'generalstats'
sp:
  metrics_summary:
    fn: 'metrics_summary_mqc.tsv'
table_columns_placement:
  metrics_summary:
    Estimated Number of Cells: 1
    Mean Reads per Cell: 2
    Median Genes per Cell: 3
    Number of Reads: 4
    Sequencing Saturation: 5
    Reads Mapped Confidently to Genome: 6
    Reads Mapped Confidently to Transcriptome: 7
    Fraction Reads in Cells: 8
    Total Genes Detected: 9
    Median UMI Counts per Cell: 10
    Valid Barcodes: 1100
    Reads Mapped Antisense to Gene: 1200
table_columns_visible:
  metrics_summary:
    Q30 Bases in Barcode: False
    Q30 Bases in RNA Read: False
    Q30 Bases in UMI: False
    Reads Mapped to Genome: False
    Reads Mapped Confidently to Intergenic Regions: False
    Reads Mapped Confidently to Intronic Regions: False
    Reads Mapped Confidently to Exonic Regions: False
thousandsSep_format: ''
......@@ -14,6 +14,8 @@ params.kitVersion = 'three'
params.version = '3.0.2'
params.astrocyte = false
params.outDir = "$baseDir/output"
params.multiqc = "$baseDir/conf/multiqc_config.yaml"
params.references = "$baseDir/../docs/references.md"
// Assign variables if astrocyte
if (params.astrocyte) {
......@@ -54,6 +56,8 @@ forceCells = params.forceCells
chemistryParam = params.chemistryParam
version = params.version
outDir = params.outDir
multiqc = params.multiqc
references = params.references
process checkDesignFile {
......@@ -121,6 +125,7 @@ process count211 {
output:
file("**/outs/**") into outPaths211
file("*_metrics_summary.tsv") into metricsSummary211
when:
version == '2.1.1'
......@@ -132,6 +137,7 @@ process count211 {
ulimit -a
bash "$baseDir/scripts/filename_check.sh" -r "$ref"
cellranger count --id="$sample" --transcriptome="./$ref" --fastqs=. --sample="$sample" --expect-cells=$expectCells211
sed -E 's/("([^"]*)")?,/\\2\t/g' ${sample}/outs/metrics_summary.csv | tr -d "," | sed "s/^/${sample}\t/" > ${sample}_metrics_summary.tsv
"""
} else {
"""
......@@ -139,6 +145,7 @@ process count211 {
ulimit -a
bash "$baseDir/scripts/filename_check.sh" -r "$ref"
cellranger count --id="$sample" --transcriptome="./$ref" --fastqs=. --sample="$sample" --force-cells=$forceCells211
sed -E 's/("([^"]*)")?,/\\2\t/g' ${sample}/outs/metrics_summary.csv | tr -d "," | sed "s/^/${sample}\t/" > ${sample}_metrics_summary.tsv
"""
}
}
......@@ -160,6 +167,7 @@ process count301 {
output:
file("**/outs/**") into outPaths301
file("*_metrics_summary.tsv") into metricsSummary301
when:
version == '3.0.1'
......@@ -171,6 +179,7 @@ process count301 {
ulimit -a
bash "$baseDir/scripts/filename_check.sh" -r "$ref"
cellranger count --id="$sample" --transcriptome="./$ref" --fastqs=. --sample="$sample" --expect-cells=$expectCells301 --chemistry="$chemistryParam301"
sed -E 's/("([^"]*)")?,/\\2\t/g' ${sample}/outs/metrics_summary.csv | tr -d "," | sed "s/^/${sample}\t/" > ${sample}_metrics_summary.tsv
"""
} else {
"""
......@@ -178,6 +187,7 @@ process count301 {
ulimit -a
bash "$baseDir/scripts/filename_check.sh" -r "$ref"
cellranger count --id="$sample" --transcriptome="./$ref" --fastqs=. --sample="$sample" --force-cells=$forceCells301 --chemistry="$chemistryParam301"
sed -E 's/("([^"]*)")?,/\\2\t/g' ${sample}/outs/metrics_summary.csv | tr -d "," | sed "s/^/${sample}\t/" > ${sample}_metrics_summary.tsv
"""
}
}
......@@ -199,6 +209,7 @@ process count302 {
output:
file("**/outs/**") into outPaths302
file("*_metrics_summary.tsv") into metricsSummary302
when:
version == '3.0.2'
......@@ -210,6 +221,7 @@ process count302 {
ulimit -a
bash "$baseDir/scripts/filename_check.sh" -r "$ref"
cellranger count --id="$sample" --transcriptome="./$ref" --fastqs=. --sample="$sample" --expect-cells=$expectCells302 --chemistry="$chemistryParam302"
sed -E 's/("([^"]*)")?,/\\2\t/g' ${sample}/outs/metrics_summary.csv | tr -d "," | sed "s/^/${sample}\t/" > ${sample}_metrics_summary.tsv
"""
} else {
"""
......@@ -217,6 +229,57 @@ process count302 {
ulimit -a
bash "$baseDir/scripts/filename_check.sh" -r "$ref"
cellranger count --id="$sample" --transcriptome="./$ref" --fastqs=. --sample="$sample" --force-cells=$forceCells302 --chemistry="$chemistryParam302"
sed -E 's/("([^"]*)")?,/\\2\t/g' ${sample}/outs/metrics_summary.csv | tr -d "," | sed "s/^/${sample}\t/" > ${sample}_metrics_summary.tsv
"""
}
}
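Review note: the `sed`/`tr` line added to each `count*` process turns cellranger's `metrics_summary.csv` into a TSV with the sample name prefixed to every row, header included. Below is a minimal Python sketch of the same transformation, assuming the CSV quotes any value that contains a thousands separator. The function name `metrics_csv_to_tsv` and the sample values are hypothetical and not part of the pipeline. A CSV parser also copes with a quoted value in the last column, where no comma follows the closing quote and the regex does not appear to match the field as a unit.

```python
import csv
import io


def metrics_csv_to_tsv(csv_text, sample):
    """Convert metrics_summary.csv text to TSV rows prefixed with the sample name."""
    out_lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # Drop thousands separators inside values, e.g. "1,234" -> 1234.
        cleaned = [field.replace(",", "") for field in row]
        out_lines.append("\t".join([sample] + cleaned))
    return "\n".join(out_lines) + "\n"


# Hypothetical two-column excerpt of a metrics summary.
example = 'Estimated Number of Cells,Number of Reads\n"1,234","5,678,901"\n'
print(metrics_csv_to_tsv(example, "sampleA"), end="")
```

The header row keeps the sample prefix here, as in the shell version; the `multiqcReport` process later rewrites that first column name to `Sample`.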
process versions {
tag "$name"
publishDir "$outDir/${task.process}", mode: 'copy'
module 'python/3.6.1-2-anaconda:pandoc/2.7:multiqc/1.7'
input:
output:
file("*.yaml") into yamlPaths
script:
"""
hostname
ulimit -a
echo $workflow.nextflow.version > version_nextflow.txt
echo $version > version_cellranger.txt
multiqc --version | tr -d 'multiqc, version ' > version_multiqc.txt
python3 "$baseDir/scripts/generate_versions.py" -f version_*.txt -o versions
python3 "$baseDir/scripts/generate_references.py" -r "$references" -o references
"""
}
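Review note: in the `versions` process, `tr -d 'multiqc, version '` deletes every character that occurs in that set rather than the literal prefix. For `1.7` the result is the same, but a version string containing any of those letters would be altered. The sketch below contrasts the two approaches, assuming `multiqc --version` prints a line of the form `multiqc, version X`; the helper names and the `1.8.dev0` string are hypothetical illustrations.

```python
import re


def parse_multiqc_version(text):
    """Return the token that follows 'version', or None if it is absent."""
    match = re.search(r"version\s+(\S+)", text)
    return match.group(1) if match else None


def delete_chars(text, chars):
    """Mimic `tr -d chars`: remove every character that occurs in chars."""
    return text.translate({ord(c): None for c in chars})


line = "multiqc, version 1.8.dev0"  # hypothetical version string
print(parse_multiqc_version(line))
print(delete_chars(line, "multiqc, version "))
```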
metricsSummary = metricsSummary211.mix(metricsSummary301, metricsSummary302)
// Generate MultiQC Report
process multiqcReport {
publishDir "$outDir/${task.process}", mode: 'copy'
input:
file ('*') from metricsSummary.collect()
file yamlPaths
output:
file "multiqc_report.html" into multiqcReport
script:
"""
awk 'FNR==1 && NR!=1{next;}{print}' *.tsv > metrics_summary_mqc.tsv
sed -i '1s/^.*\tE/Sample\tE/' metrics_summary_mqc.tsv
module load multiqc/1.7
multiqc -c $multiqc .
"""
}
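Review note: the `awk`/`sed` pair in `multiqcReport` concatenates the per-sample TSV files, keeps only the first header row, and renames the first header cell to `Sample`. The following is a rough Python equivalent operating on strings, offered as a sketch; `merge_metrics` and the sample rows are hypothetical names and data.

```python
def merge_metrics(tables):
    """Concatenate per-sample TSV texts, keeping only the first header row."""
    merged = []
    for text in tables:
        lines = text.splitlines()
        if not lines:
            continue  # ignore empty inputs
        header, rows = lines[0], lines[1:]
        if not merged:
            # The header is prefixed with a sample name; relabel that column.
            columns = header.split("\t")
            columns[0] = "Sample"
            merged.append("\t".join(columns))
        merged.extend(rows)
    return "\n".join(merged) + "\n"


first = "s1\tEstimated Number of Cells\tNumber of Reads\ns1\t10\t200\n"
second = "s2\tEstimated Number of Cells\tNumber of Reads\ns2\t20\t300\n"
print(merge_metrics([first, second]), end="")
```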
#
# * --------------------------------------------------------------------------
# * Licensed under MIT (https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/LICENSE.md)
# * --------------------------------------------------------------------------
#
'''Make header for HTML of references.'''
import argparse
import subprocess
import shlex
import logging
EPILOG = '''
For more details:
%(prog)s --help
'''
# SETTINGS
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.INFO)
def get_args():
    '''Define arguments.'''
    parser = argparse.ArgumentParser(
        description=__doc__, epilog=EPILOG,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('-r', '--reference',
                        help="The reference file (markdown format).",
                        required=True)
    parser.add_argument('-o', '--output',
                        help="The out file name.",
                        default='references')
    args = parser.parse_args()
    return args

def main():
    args = get_args()
    reference = args.reference
    output = args.output
    out_filename = output + '_mqc.yaml'
    # Header for HTML
    print('''
id: 'Software References'
section_name: 'Software References'
description: 'This section describes references for the tools used.'
plot_type: 'html'
data: |
'''
        , file = open(out_filename, "w")
    )
    # Turn Markdown into HTML
    references_html = 'bash -c "pandoc -p {} | sed \'s/^/ /\' >> {}"'
    references_html = references_html.format(reference, out_filename)
    subprocess.check_call(shlex.split(references_html))

if __name__ == '__main__':
    main()
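Review note: the script above writes the YAML header through an unclosed `open(...)` handle and then appends to the same file from a shell pipeline, which relies on the header being flushed first. One possible alternative, sketched under assumptions, captures pandoc's output in Python and writes header and body through a single context manager. The function names, the four-space indent, and the use of `textwrap.indent` in place of `sed` are choices made for this sketch, not the author's implementation.

```python
import subprocess
import textwrap

HEADER = """\
id: 'Software References'
section_name: 'Software References'
description: 'This section describes references for the tools used.'
plot_type: 'html'
data: |
"""


def render_markdown(reference):
    """Convert a markdown file to HTML by calling pandoc (must be on PATH)."""
    return subprocess.check_output(["pandoc", "-p", reference],
                                   universal_newlines=True)


def write_references_section(out_filename, html, indent="    "):
    """Write the MultiQC section header followed by the indented HTML body."""
    with open(out_filename, "w") as handle:
        handle.write(HEADER)
        # Indent every non-blank line so it belongs to the `data: |` block.
        handle.write(textwrap.indent(html, indent))
```

A caller would combine the two, for example `write_references_section('references_mqc.yaml', render_markdown(reference))`.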
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''Make YAML of software versions.'''
from __future__ import print_function
from collections import OrderedDict
import re
import logging
import argparse
import numpy as np
EPILOG = '''
For more details:
%(prog)s --help
'''
# SETTINGS
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.INFO)
SOFTWARE_REGEX = {
    'Nextflow': ['version_nextflow.txt', r"(\S+)"],
    'Cellranger Count': ['version_cellranger.txt', r"(\S+)"],
    'MultiQC': ['version_multiqc.txt', r"(\S+)"],
}

def get_args():
    '''Define arguments.'''
    parser = argparse.ArgumentParser(
        description=__doc__, epilog=EPILOG,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('-f', '--files',
                        help="The version files.",
                        required=True,
                        nargs='*')
    parser.add_argument('-o', '--output',
                        help="The out file name.",
                        required=True)
    args = parser.parse_args()
    return args

def check_files(files):
    '''Check if version files are found.'''
    logger.info("Running file check.")
    software_files = np.array(list(SOFTWARE_REGEX.values()))[:,0]
    extra_files = set(files) - set(software_files)
    if len(extra_files) > 0:
        logger.error('Missing regex: %s', list(extra_files))
        raise Exception("Missing regex: %s" % list(extra_files))

def main():
    args = get_args()
    files = args.files
    output = args.output
    out_filename = output + '_mqc.yaml'
    results = OrderedDict()
    results['Nextflow'] = '<span style="color:#999999;\">N/A</span>'
    results['Cellranger Count'] = '<span style="color:#999999;\">N/A</span>'
    results['MultiQC'] = '<span style="color:#999999;\">N/A</span>'
    # Check for version files:
    check_files(files)
    # Search each file using its regex
    for k, v in SOFTWARE_REGEX.items():
        with open(v[0]) as x:
            versions = x.read()
            match = re.search(v[1], versions)
            if match:
                results[k] = "v{}".format(match.group(1))
    # Dump to YAML
    print(
        '''
id: 'software_versions'
section_name: 'Software Versions'
section_href: 'https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/'
plot_type: 'html'
description: 'are collected at run time from the software output.'
data: |
 <dl class="dl-horizontal">
'''
        , file = open(out_filename, "w"))
    for k, v in results.items():
        print(" <dt>{}</dt><dd>{}</dd>".format(k, v), file = open(out_filename, "a"))
    print(" </dl>", file = open(out_filename, "a"))

if __name__ == '__main__':
    main()
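Review note: `check_files` imports numpy only to pull the first element out of each `SOFTWARE_REGEX` entry. A set comprehension gives the same file names with the standard library alone. The sketch below assumes the same table layout as the script; `find_unexpected_files` is a hypothetical helper name, and it returns the offending files instead of raising so the caller can decide how to report them.

```python
SOFTWARE_REGEX = {
    'Nextflow': ['version_nextflow.txt', r"(\S+)"],
    'Cellranger Count': ['version_cellranger.txt', r"(\S+)"],
    'MultiQC': ['version_multiqc.txt', r"(\S+)"],
}


def find_unexpected_files(files, software_regex=SOFTWARE_REGEX):
    """Return the input files that have no entry in the regex table, sorted."""
    known = {entry[0] for entry in software_regex.values()}
    return sorted(set(files) - known)


# 'version_foo.txt' is a made-up file name with no matching regex entry.
print(find_unexpected_files(['version_nextflow.txt', 'version_foo.txt']))
```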