10x Genomics scRNA-Seq (cellranger/spaceranger) mkfastq Pipeline
Introduction
This pipeline is a wrapper for the cellranger or spaceranger mkfastq tool from 10x Genomics (which uses Illumina's bcl2fastq). It demultiplexes samples from 10x Genomics Single Cell Gene Expression libraries into FASTQ files.
FastQC is run on the resulting FASTQ files, and the FastQC and bcl2fastq reports are collated with the MultiQC tool.
The pipeline uses Nextflow, a bioinformatics workflow tool.
This pipeline is primarily used with a SLURM cluster on the BioHPC Cluster. However, the pipeline should be able to run on any system that Nextflow supports.
Additionally, the pipeline is designed to work with the Astrocyte Workflow System using a simple web interface.
Cloud Compatibility
This pipeline is also capable of being run on AWS.
NOTE: This pipeline has been reverted to a non-containerized version to work on Astrocyte. Use the tag containerized for the last working containerized version, which is compatible with AWS.
To run on AWS:
- Build an AWS Batch queue and environment, either manually or with aws-cloudformation
- In the aws configs in workflow/configs/:
  - Replace workDir with the S3 bucket generated
  - Change region if different
- In the ondemand and spot configs in workflow/config/:
  - Change queue to the AWS Batch queues generated
- The user must have awscli configured with appropriate authentication (with aws configure and access keys) in the environment in which nextflow will be run
- Add -profile with aws and the queue config that was customized
  - eg. nextflow run workflow/main.nf -profile aws,ondemand
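Before launching on AWS, it can help to confirm that the tools named in the steps above are available in the current environment. The following is a minimal sketch in POSIX shell, not part of the pipeline; the helper name require_cmd is a hypothetical name chosen for illustration, and the check only looks for the commands on the PATH (it does not verify credentials).

```shell
#!/bin/sh
# Hypothetical pre-flight helper (an assumption for illustration, not part of the pipeline).
# Returns 0 if the named command is found on PATH, non-zero otherwise.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Collect the names of all missing tools instead of stopping at the first one.
missing=""
for tool in aws nextflow; do
    if ! require_cmd "$tool"; then
        missing="$missing $tool"
    fi
done

if [ -n "$missing" ]; then
    echo "Missing tools:$missing"
else
    echo "Found aws and nextflow; next step: nextflow run workflow/main.nf -profile aws,ondemand"
fi
```

If awscli is present, running aws sts get-caller-identity is one common way to confirm that the credentials set up with aws configure are actually being picked up.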
To Run:
- Available parameters:
  - -profile
    - what environments to run on, available: biohpc, local, cluster, aws, ondemand, spot
    - eg: -profile biohpc,cluster to run on BioHPC in cluster mode
    - eg: -profile aws,ondemand to run on AWS on an on-demand queue
  - --name
    - run name, puts outputs in a directory with this name
    - eg: --name 'test'
  - --ranger
    - select the 10x ranger being run
    - eg: --ranger 'cellranger' to run cellranger mkfastq
    - eg: --ranger 'spaceranger' to run spaceranger mkfastq
  - --bcl
    - base call files from the sequencing of a 10x single-cell experiment, tarballed (.tar) and optionally gzipped (.tar.gz); pigz parallelization is supported
    - there can be multiple base call files, but they will all be demultiplexed with the same design file
    - eg: --bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-1_2_0.tar.gz'
  - --designFile
    - path to the design file (csv format)
    - column 1 = "Lane" (lane of the flowcell to demultiplex, * for all lanes)
    - column 2 = "Sample" (sample name)
    - column 3 = "Index" (10x sample index barcode, eg SI-GA-A1)
      - the last part of the Index (eg A1) refers to the well position on the 96-well plate in which the sample barcode kit is sold
      - Current sample barcode IDs
    - can have repeated "Sample" if there are multiple fastq R1/R2 pairs for the samples
    - can be downloaded HERE
    - eg: --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.csv'
  - --mask
    - add a base mask if the sequencing strategy doesn't match the requirements of 10x Genomics
    - eg: --mask '--ignore-dual-index --use-bases-mask=Y\*,I8n\*,n\*,Y\*'
  - --outDir
    - optional output directory for run
    - eg: --outDir 'test'
- FULL EXAMPLE:
nextflow run workflow/main.nf -profile biohpc,cluster --name 'test' --ranger 'cellranger' --bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-1_2_0.tar.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-simple-1_2_0.csv' --outDir 'test'
- Design example:
Lane | Sample | Index |
---|---|---|
* | test_sample | SI-GA-C9 |
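The design file itself is a plain csv with the three columns described under --designFile. The following is a minimal sketch in POSIX shell that writes the example above as csv and runs a few basic sanity checks on it. The helper name check_design, the temporary directory, and the specific checks (exact header, three non-empty fields per row, at least one data row) are assumptions for illustration; they are not the pipeline's own validation.

```shell
#!/bin/sh
# Sketch: write a design file and run basic sanity checks on it.
# The paths and the check_design helper are illustrative assumptions.

workdir=$(mktemp -d)
design="$workdir/design.csv"

# Header plus one row, mirroring the design example above.
cat > "$design" <<'EOF'
Lane,Sample,Index
*,test_sample,SI-GA-C9
EOF

# check_design FILE: succeeds only if the header is exactly Lane,Sample,Index,
# every following row has three non-empty comma-separated fields,
# and there is at least one data row.
check_design() {
    awk -F',' '
        NR == 1 { if ($0 != "Lane,Sample,Index") bad = 1; next }
        NF != 3 || $1 == "" || $2 == "" || $3 == "" { bad = 1 }
        { rows++ }
        END { if (bad || rows == 0) exit 1 }
    ' "$1"
}

if check_design "$design"; then
    echo "design file looks OK: $design"
else
    echo "design file has problems: $design" >&2
fi
```

The resulting path can then be passed to the pipeline with --designFile "$design".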
Credits
This workflow was developed jointly with the Bioinformatic Core Facility (BICF), Department of Bioinformatics.
Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).