10x Genomics scRNA-Seq (cellranger/spaceranger) mkfastq Pipeline

Introduction

This pipeline is a wrapper for the cellranger or spaceranger mkfastq tool from 10x Genomics (which uses Illumina's bcl2fastq). It demultiplexes samples from 10x Genomics Single Cell Gene Expression libraries into FASTQ files.

FastQC is run on the resulting FASTQ files, and the FastQC and bcl2fastq reports are collated with the MultiQC tool.

The pipeline uses Nextflow, a bioinformatics workflow tool.

This pipeline is primarily used on the SLURM-managed BioHPC cluster. However, it should be able to run on any system that Nextflow supports.

Additionally, the pipeline is designed to work with the Astrocyte Workflow System through a simple web interface.

Cloud Compatibility

This pipeline can also be run on AWS.
NOTE: This pipeline has been reverted to a non-containerized version to work on Astrocyte. Use the tag containerized for the last working containerized version, which is compatible with AWS.

To run on AWS:

  • Build an AWS Batch queue and environment either manually or with aws-cloudformation
  • In the aws configs in workflow/configs/:
    • Replace workDir with the S3 bucket generated
    • Change region if different
  • In the ondemand and spot configs in workflow/config/:
    • Change queue to the aws batch queues generated
  • The user must have awscli configured with appropriate authentication (with aws configure and access keys) in the environment in which nextflow will be run
  • Add -profile with aws and the queue config that was customized
    • eg. nextflow run workflow/main.nf -profile aws,ondemand
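
For scripted runs, the invocation above can be assembled programmatically. The following is a minimal Python sketch and is not part of the pipeline: the function name nextflow_command is hypothetical, and the set of accepted profiles is simply the list this README gives for -profile (biohpc, local, cluster, aws, ondemand, spot).

```python
import shlex

# Profile names taken from this README's -profile documentation.
AVAILABLE_PROFILES = {"biohpc", "local", "cluster", "aws", "ondemand", "spot"}


def nextflow_command(profiles, **params):
    """Build the argument list for `nextflow run workflow/main.nf`.

    `profiles` is a sequence of profile names; `params` are pipeline
    parameters (name, ranger, bcl, designFile, mask, outDir) passed as
    `--key value`. This helper is a hypothetical convenience wrapper.
    """
    unknown = [p for p in profiles if p not in AVAILABLE_PROFILES]
    if unknown:
        raise ValueError(f"unknown profile(s): {', '.join(unknown)}")
    cmd = ["nextflow", "run", "workflow/main.nf", "-profile", ",".join(profiles)]
    for key, value in params.items():
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(nextflow_command(["aws", "ondemand"])))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths or base masks.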

To Run:

  • Available parameters:
    • -profile
      • what environments to run on, available: biohpc, local, cluster, aws, ondemand, spot
      • eg: -profile biohpc,cluster to run on BioHPC in cluster mode
      • eg: -profile aws,ondemand to run on AWS on an on-demand queue
    • --name
      • run name, puts outputs in a directory with this name
      • eg: --name 'test'
    • --ranger
      • select the 10x ranger being run.
      • eg: --ranger 'cellranger' to run cellranger mkfastq
      • eg: --ranger 'spaceranger' to run spaceranger mkfastq
    • --bcl
      • base call files from the sequencing of a 10x single-cell experiment, tarballed [.tar] and optionally gzipped [.tar.gz] (supports pigz parallelization)
      • there can be multiple basecall files, but they all will be demultiplexed by the same design file
      • eg: --bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-1_2_0.tar.gz'
    • --designFile
      • path to the design file (csv format)
      • column 1 = "Lane" (lane number to demultiplex, * for all lanes)
      • column 2 = "Sample" (sample name)
      • column 3 = "Index" (10x sample index barcode, eg SI-GA-A1)
      • the last character set in Index (eg A1) refers to the well position in the 96-well plate in which the sample barcode kit is sold
      • Current sample barcode IDs
      • can have repeated "Sample" if there are multiple fastq R1/R2 pairs for the samples
      • can be downloaded HERE
      • eg: --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.csv'
    • --mask
      • add a base mask if the sequencing strategy doesn't match the requirements of 10x Genomics
      • eg: --mask '--use-bases-mask=Y\*,I8n\*,n\*,Y\*'
    • --outDir
      • optional output directory for run
      • eg: --outDir 'test'
  • FULL EXAMPLE:
    nextflow run workflow/main.nf -profile biohpc,cluster --name 'test' --ranger 'cellranger' --bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-1_2_0.tar.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-simple-1_2_0.csv' --outDir 'test'
  • Design example:
    Lane,Sample,Index
    *,test_sample,SI-GA-C9
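
Before launching a run, it can help to sanity-check the design file against the column rules described above. The sketch below is an illustration only and is not the validation the pipeline itself performs. The function name check_design is hypothetical, and the accepted patterns are assumptions: a lane is "*", a single number, or a range such as 1-2, and an index looks like SI-GA-C9, ending in a 96-well position (rows A to H, columns 1 to 12).

```python
import csv
import io
import re

EXPECTED_HEADER = ["Lane", "Sample", "Index"]
# Assumption: a lane is "*", a single number, or a range such as "1-2".
LANE_RE = re.compile(r"^(\*|\d+(-\d+)?)$")
# Assumption: the index ends in a 96-well position (rows A-H, columns 1-12).
INDEX_RE = re.compile(r"^SI-[A-Za-z0-9]+-[A-H](1[0-2]|[1-9])$")


def check_design(text):
    """Return a list of problems found in design-file text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    if not rows or [cell.strip() for cell in rows[0]] != EXPECTED_HEADER:
        return ["header must be Lane,Sample,Index"]
    problems = []
    # Data rows start on line 2 of the file.
    for number, row in enumerate(rows[1:], start=2):
        cells = [cell.strip() for cell in row]
        if len(cells) != 3:
            problems.append(f"line {number}: expected 3 columns, found {len(cells)}")
            continue
        lane, sample, index = cells
        if not LANE_RE.match(lane):
            problems.append(f"line {number}: unexpected Lane value '{lane}'")
        if not sample:
            problems.append(f"line {number}: Sample is empty")
        if not INDEX_RE.match(index):
            problems.append(f"line {number}: unexpected Index value '{index}'")
    return problems


if __name__ == "__main__":
    example = "Lane,Sample,Index\n*,test_sample,SI-GA-C9\n"
    for problem in check_design(example):
        print(problem)
```

Repeated "Sample" values are deliberately not reported, since the design file allows them when a sample has multiple fastq R1/R2 pairs.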

CHANGELOG

Credits

This workflow was developed jointly with the Bioinformatics Core Facility (BICF), Department of Bioinformatics.

Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).