
Cellranger count Astrocyte workflow

This workflow runs cellranger count on data generated from 10x Genomics Chromium single cell gene expression workflows. It accepts fastq or fastq.gz files as input, which can be generated with the "BICF CellRanger mkfastq" Astrocyte workflow. It will also run fastqc on the submitted fastq(.gz) files. Finally, multiqc is used to generate an aggregate report collecting the QC information for both cellranger count and fastqc.

Version 1.x of this workflow runs cellranger-count version 7 with the 2020-A references. Version 2.x of this workflow runs cellranger-count version 8 with the 2024-A references.

Note: this workflow runs cellranger count in a container, and is neither provided nor supported by 10x Genomics.

Parameters

To run this workflow, you must supply the following parameters:

  • Fastq: The fastq files to be analyzed.

  • sample_sheet: A CSV-formatted file with the following named columns:

    • sample: The name of the sample. This must match the prefix of the associated fastq files.
    • reference: Which reference genome to use. This workflow currently supports the values "hg38", "mm10", or "barnyard" (hg38 and mm10 combined).
    • expectCells: The number of cells expected from this sample. Set to "auto" for auto-detection (recommended).
    • chemistry: The chemistry used to generate libraries. Set to "auto" for auto-detection (recommended). Note that you must set this explicitly if the chemistry is 3' v1 or if you are analyzing only the GEX data from a multiome experiment. Possible values:
      • "auto": auto detect
      • "SC3Pv1": Single cell 3' v1
      • "SC3Pv2": Single cell 3' v2
      • "SC3Pv3": Single cell 3' v3
      • "SC3Pv3LT": Single cell 3' v3 LT
      • "SC3Pv3HT": Single cell 3' v3 HT
      • "SC5P-PE": Single cell 5' paired-end
      • "SC5P-R2": Single cell 5' R2-only
      • "ARC-v1": GEX only from multiome
    • introns: true/false. Whether to count intronic reads.
    • noBam: true/false. Whether to skip bam file generation. This will save some time and space, but bam files may be required for downstream analysis and/or deposition into public databases.
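The requirement that each sample name match the prefix of its fastq files can be checked before submitting a run. The following is a minimal sketch, not part of the workflow itself: the function name match_fastqs is hypothetical, and it assumes the sample name is followed by an underscore in the file name (as in the usual `<sample>_S1_L001_R1_001.fastq.gz` naming), which may be stricter than what the workflow actually requires.

```python
from pathlib import Path

# File extensions the workflow accepts as input.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def match_fastqs(sample, filenames):
    """Return the fastq file names that start with `sample` followed by '_'.

    Assumption: the sample prefix is terminated by an underscore, so that
    "hgmm_100" does not accidentally match files belonging to "hgmm_1000".
    """
    matches = []
    for name in filenames:
        base = Path(name).name
        if base.startswith(sample + "_") and base.endswith(FASTQ_SUFFIXES):
            matches.append(name)
    return sorted(matches)
```

A sample that returns an empty list here has no fastq files with a matching prefix, which is worth resolving before the run is submitted.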

Here's an example sample sheet:

sample             reference  expectCells  chemistry  introns  noBam
Brain_Tumor_3p_LT  hg38       0            auto       true     true
hgmm_100           mm10       0            auto       true     true
hgmm_100           barnyard   0            auto       true     true
hgmm_100           barnyard   100          auto       true     false
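The column rules described above can be checked with a short script before a sample sheet is uploaded. This is a rough sketch under stated assumptions, not the workflow's own validation: the function name validate_sample_sheet is hypothetical, the sheet is assumed to be comma-separated with a header row, and expectCells is assumed to accept either "auto" or a non-negative integer.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "reference", "expectCells", "chemistry", "introns", "noBam"]
REFERENCES = {"hg38", "mm10", "barnyard"}
CHEMISTRIES = {
    "auto", "SC3Pv1", "SC3Pv2", "SC3Pv3", "SC3Pv3LT", "SC3Pv3HT",
    "SC5P-PE", "SC5P-R2", "ARC-v1",
}


def validate_sample_sheet(text):
    """Return a list of human-readable problems; an empty list means no issue found."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        value = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        if not value["sample"]:
            problems.append(f"line {line_no}: empty sample name")
        if value["reference"] not in REFERENCES:
            problems.append(f"line {line_no}: unknown reference {value['reference']!r}")
        if value["chemistry"] not in CHEMISTRIES:
            problems.append(f"line {line_no}: unknown chemistry {value['chemistry']!r}")
        # Assumption: "auto" or a non-negative integer is acceptable here.
        if value["expectCells"] != "auto" and not value["expectCells"].isdigit():
            problems.append(f"line {line_no}: invalid expectCells {value['expectCells']!r}")
        for col in ("introns", "noBam"):
            if value[col] not in {"true", "false"}:
                problems.append(f"line {line_no}: {col} must be true or false")
    return problems
```

Calling validate_sample_sheet(open("sample_sheet.csv").read()) on a well-formed sheet should return an empty list; the file name here is only a placeholder.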

Output

Cell Ranger count

Cell Ranger count output is located in the count directory. Each sample has its own subdirectory containing the following:

  • analysis: a directory containing data regarding differential expression, clustering, etc. The files in this directory can be consumed by downstream analysis tools.
  • raw_feature_bc_matrix(.h5): the raw (all droplets included) counts matrix in MTX (or .h5) format.
  • filtered_feature_bc_matrix(.h5): the filtered (empty droplets removed) counts matrix in MTX (or .h5) format.
  • web_summary.html: an HTML report providing a summary of the run.
  • metrics_summary.csv: a summary of the metrics reported in web_summary.html.
  • molecule_info.h5: a file containing per-molecule information for all high-quality and assigned reads.
  • cloupe.cloupe: a file that can be read by 10x Genomics Loupe Cell Browser for interactive visualization.
  • possorted_genome_bam.bam(.bai): a BAM file and its index containing the aligned reads. Not generated if noBam is set to "true".
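For a quick comparison across samples, the per-sample metrics_summary.csv files can be gathered into one dictionary. The sketch below is illustrative only: the function name collect_metrics is hypothetical, it assumes each file sits directly in count/<sample>/ as described above, and it assumes each file is a CSV with one header row followed by one row of values. Metric names are taken from the file rather than hard-coded.

```python
import csv
from pathlib import Path


def collect_metrics(count_dir):
    """Map each sample name to the first data row of its metrics_summary.csv."""
    metrics = {}
    # Assumption: the layout is <count_dir>/<sample>/metrics_summary.csv.
    for path in sorted(Path(count_dir).glob("*/metrics_summary.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        if rows:
            # The sample name is taken from the subdirectory name.
            metrics[path.parent.name] = rows[0]
    return metrics
```

Samples whose metrics file is missing or empty are simply skipped, so comparing the returned keys against the sample sheet is one way to spot incomplete runs.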

FastQC

FastQC output is located in the fastqc directory. These results are summarized together in the MultiQC output.

MultiQC

An aggregate report summarizing the results of the previous steps is saved in the root output directory as multiqc_report.html. This report summarizes the results of Cell Ranger count and FastQC together for all samples.

Test data

The test_data directory contains scripts to download small datasets from 10x Genomics to test the pipeline.

Credits

This workflow was adapted and updated from the cellranger_count BICF workflow with the guidance of experts at BioHPC.

Cell Ranger is software developed by 10x Genomics. The containerized version of Cell Ranger utilized in this pipeline is packaged by nf-core, and is neither provided nor supported by 10x Genomics.

FastQC is software developed at the Babraham Institute. The containerized version used in this workflow is packaged by BioContainers.

MultiQC is software developed at Seqera Labs. The containerized version used in this workflow is also released by Seqera Labs.