nf-core Methylseq
This workflow includes a runner for nf-core/methylseq (ver. 2.6.0).
The Workflow
nf-core/methylseq is a bioinformatics analysis pipeline used for Methylation (Bisulfite) sequencing data. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results.
The pipeline allows you to choose between running either Bismark or bwa-meth / MethylDackel.
Choose between workflows by using --aligner bismark (default, uses bowtie2 for alignment), --aligner bismark_hisat or --aligner bwameth.
Step | Bismark workflow | bwa-meth workflow |
---|---|---|
Generate Reference Genome Index (optional) | Bismark | bwa-meth |
Merge re-sequenced FastQ files | cat | cat |
Raw data QC | FastQC | FastQC |
Adapter sequence trimming | Trim Galore! | Trim Galore! |
Align Reads | Bismark | bwa-meth |
Deduplicate Alignments | Bismark | Picard MarkDuplicates |
Extract methylation calls | Bismark | MethylDackel |
Sample report | Bismark | - |
Summary Report | Bismark | - |
Alignment QC | Qualimap | Qualimap |
Sample complexity | Preseq | Preseq |
Project Report | MultiQC | MultiQC |
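A mistyped --aligner value is easy to catch before launching. The following is a minimal sketch, not part of the pipeline: the function names valid_aligner and run_methylseq are hypothetical, and the only values accepted are the three listed above. The output directory and profile are passed in by the caller, mirroring the placeholders in the usage example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only the three --aligner values named in this README.
valid_aligner() {
  case "$1" in
    bismark|bismark_hisat|bwameth) return 0 ;;
    *) echo "unknown aligner: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: refuses to start Nextflow when the aligner is not recognised.
# Arguments: <aligner> <outdir> <profile>; the aligner defaults to bismark.
run_methylseq() {
  local aligner="${1:-bismark}"
  valid_aligner "$aligner" || return 1
  nextflow run nf-core/methylseq \
    --input samplesheet.csv \
    --outdir "$2" \
    --genome GRCh37 \
    --aligner "$aligner" \
    -profile "$3"
}
```

For example, run_methylseq bwameth results docker would select the bwa-meth / MethylDackel column of the table, assuming Nextflow and Docker are installed.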
Parameters
All parameters are set up as described in the nf-core documentation.
Example Usage
Running in Astrocyte interface
First, prepare a samplesheet with your input data that looks as follows, and make sure to upload all relevant files to the Astrocyte project:
sample,fastq_1,fastq_2,genome
SRR389222_sub1,SRR389222_sub1.fastq.gz,,
SRR389222_sub2,SRR389222_sub2.fastq.gz,,
SRR389222_sub2,SRR389222_sub3.fastq.gz,,
Most Astrocyte parameters follow the nf-core parameters. Check the parameters as needed.
If the input files listed in the design file are not public URLs, please make sure to include them in the input reads box.
If you have other input files (for the parameters known_splices / bamqc_regions_file), please make sure to include them in the input reads box as well.
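To see which files need to be uploaded, it can help to list the samplesheet entries that are not URLs. The sketch below is only an illustration: the function name list_local_inputs is hypothetical, and it assumes that a "public URL" is anything starting with http://, https:// or ftp://, and that the FastQ paths sit in the second and third columns as in the samplesheet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fastq_1 / fastq_2 entries that are not URLs,
# i.e. the files that would have to be added to the input reads box.
list_local_inputs() {
  awk -F',' '
    NR == 1 { next }                         # skip the header row
    {
      for (i = 2; i <= 3; i++)               # columns fastq_1 and fastq_2
        if ($i != "" && $i !~ /^(https?|ftp):\/\//)
          print $i
    }
  ' "$1"
}
```

Running list_local_inputs on the Astrocyte samplesheet shown above would print the three SRR389222 file names, one per line.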
Running in command line interface with Nextflow
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
SRR389222_sub1,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz
SRR389222_sub2,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz
SRR389222_sub2,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub3.fastq.gz
Ecoli_10K_methylated,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R2.fastq.gz
Each row represents a fastq file (single-end) or a pair of fastq files (paired-end).
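The row layout described above can be checked with a few lines of awk before starting a run. This is a rough sketch rather than the pipeline's own validation: the function name summarise_samplesheet is hypothetical, and it assumes a comma-separated file with a header row and the columns sample, fastq_1 and an optional fastq_2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each row as single-end or paired-end and
# fail if a row lacks a sample name or a first FastQ file.
summarise_samplesheet() {
  awk -F',' '
    NR == 1 { next }                         # skip the header row
    /^[[:space:]]*$/ { next }                # ignore blank lines
    NF < 2 || $1 == "" || $2 == "" {
      printf "line %d: missing sample or fastq_1\n", NR > "/dev/stderr"
      bad = 1
      next
    }
    {
      layout = (NF >= 3 && $3 != "") ? "paired-end" : "single-end"
      printf "%s\t%s\n", $1, layout
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Calling summarise_samplesheet samplesheet.csv on the example above would label the three SRR389222 rows as single-end and the Ecoli_10K_methylated row as paired-end. A sample name that appears on more than one row, such as SRR389222_sub2, is presumably what the "Merge re-sequenced FastQ files" step in the table is meant to handle.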
Now, you can run the pipeline using:
nextflow run nf-core/methylseq --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Credits
These scripts were originally written for use at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden.
- Main author:
  - Phil Ewels (@ewels)
- Maintainers:
  - Felix Krueger (@FelixKrueger)
  - Sateesh Peri (@Sateesh_Peri)
  - Edmund Miller (@EMiller88)
- Contributors:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #methylseq channel (you can join with this invite).
Citations
If you use nf-core/methylseq for your analysis, please cite it using the following doi: 10.5281/zenodo.1343417
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.