Cellranger count Astrocyte workflow
This workflow runs cellranger count on data generated from 10x Genomics Chromium single-cell gene expression workflows. It accepts fastq or fastq.gz files as input, which can be generated with the "BICF CellRanger mkfastq" Astrocyte workflow. It also runs FastQC on the submitted fastq(.gz) files. Finally, MultiQC generates an aggregate report that collects the QC information from both cellranger count and FastQC.
Version 1.x of this workflow runs Cell Ranger version 7 with the 2020-A references. Version 2.x runs Cell Ranger version 8 with the 2024-A references.
Note: this workflow runs cellranger count in a container. The workflow is neither provided nor supported by 10x Genomics.

Parameters
To run this workflow, you must supply the following parameters:


Fastq: The fastq files to be analyzed.


sample_sheet: A CSV-formatted file with the following named columns:


sample: The name of the sample. This must match the prefix of the associated fastq files.

reference: Which reference genome to use. This workflow currently supports the values "hg38", "mm10", or "barnyard" (hg38 and mm10 combined).

expectCells: The number of cells expected from this sample. Set to "auto" for auto-detection (recommended).

chemistry: The chemistry used to generate libraries. Set to "auto" for auto-detection (recommended). Note that you must set this explicitly if the chemistry is 3' v1 or if you are analyzing only the GEX data from a multiome library. Possible values:

"auto": auto detect
"SC3Pv1": Single cell 3' v1
"SC3Pv2": Single cell 3' v2
"SC3Pv3": Single cell 3' v3
"SC3Pv3LT": Single cell 3' v3 LT
"SC3Pv3HT": Single cell 3' v3 HT
"SC5P-PE": Single cell 5' paired-end
"SC5P-R2": Single cell 5' R2-only
"ARC-v1": GEX only from multiome


introns: true/false. Whether to count intronic reads.

noBam: true/false. Whether to skip bam file generation. This will save some time and space, but bam files may be required for downstream analysis and/or deposition into public databases.
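The requirement that each sample name match the prefix of its fastq files can be checked before submitting a run. The sketch below is not part of the workflow: `match_fastqs` is a hypothetical helper, and it assumes the usual mkfastq/bcl2fastq naming convention in which the sample name is followed by an underscore (for example `hgmm_100_S1_L001_R1_001.fastq.gz`).

```python
from pathlib import Path


def match_fastqs(samples, fastq_names):
    """Map each sample name to the fastq file names that start with it.

    Assumption: a file belongs to a sample when its base name starts with
    "<sample>_" and ends with .fastq or .fastq.gz.
    """
    matches = {sample: [] for sample in samples}
    for name in fastq_names:
        base = Path(name).name
        if not base.endswith((".fastq", ".fastq.gz")):
            continue  # ignore anything that is not a fastq file
        for sample in samples:
            if base.startswith(sample + "_"):
                matches[sample].append(base)
    return matches


def samples_without_fastqs(samples, fastq_names):
    """Return the sample names that have no matching fastq file."""
    matches = match_fastqs(samples, fastq_names)
    return [sample for sample in samples if not matches[sample]]
```

Any sample returned by `samples_without_fastqs` most likely has a misspelled name in the sample sheet or is missing its fastq files.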


Here's an example sample sheet:


sample,reference,expectCells,chemistry,introns,noBam
Brain_Tumor_3p_LT,hg38,0,auto,true,true
hgmm_100,mm10,0,auto,true,true
hgmm_100,barnyard,0,auto,true,true
hgmm_100,barnyard,100,auto,true,false
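A sample sheet can be checked against the column rules described above before it is submitted. The following is a minimal sketch and not part of the workflow itself: `validate_sample_sheet` is a hypothetical helper, and it assumes that expectCells accepts either "auto" or a non-negative integer, as the example values suggest.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "reference", "expectCells", "chemistry", "introns", "noBam"]
REFERENCES = {"hg38", "mm10", "barnyard"}
CHEMISTRIES = {
    "auto", "SC3Pv1", "SC3Pv2", "SC3Pv3", "SC3Pv3LT", "SC3Pv3HT",
    "SC5P-PE", "SC5P-R2", "ARC-v1",
}
BOOLEANS = {"true", "false"}


def validate_sample_sheet(text):
    """Return a list of human-readable problems found in sample sheet text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        value = {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}
        if not value["sample"]:
            problems.append(f"line {line_no}: sample is empty")
        if value["reference"] not in REFERENCES:
            problems.append(f"line {line_no}: unknown reference {value['reference']!r}")
        # Assumption: "auto" or a non-negative integer is acceptable here.
        if value["expectCells"] != "auto" and not value["expectCells"].isdigit():
            problems.append(f"line {line_no}: bad expectCells {value['expectCells']!r}")
        if value["chemistry"] not in CHEMISTRIES:
            problems.append(f"line {line_no}: unknown chemistry {value['chemistry']!r}")
        for col in ("introns", "noBam"):
            if value[col] not in BOOLEANS:
                problems.append(f"line {line_no}: {col} must be true or false")
    return problems
```

An empty list means the sheet passed these checks. It does not guarantee that the workflow will accept the sheet.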


Output

Cell Ranger count
Cell Ranger count output is located in the count directory. Each sample has its own subdirectory containing the following:

analysis: a directory containing data regarding differential expression, clustering, etc. The files in this directory can be consumed by downstream analysis tools.
raw_feature_bc_matrix(.h5): the raw (all droplets included) counts matrix in MTX (or .h5) format.
filtered_feature_bc_matrix(.h5): the filtered (empty droplets removed) counts matrix in MTX (or .h5) format.
web_summary.html: an HTML report providing a summary of the run.
metrics_summary.csv: a summary of the metrics reported in web_summary.html.
molecule_info.h5: a file containing per-molecule information for all high-quality and assigned reads.
cloupe.cloupe: a file that can be read by 10x Genomics Loupe Cell Browser for interactive visualization.
possorted_genome_bam.bam(.bai): a bam file and its index containing the alignments. Not generated if noBam is set to "true".
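Because every sample writes its own metrics_summary.csv under the count directory, the per-sample metrics can be gathered into one structure for comparison. The sketch below is illustrative only: `collect_metrics` is a hypothetical helper, and it assumes each metrics_summary.csv consists of one header row followed by one row of values.

```python
import csv
from pathlib import Path


def collect_metrics(output_dir):
    """Return {sample: {metric: value}} read from count/<sample>/metrics_summary.csv.

    Values are kept as the strings found in the file (they may contain
    thousands separators or percent signs), so no numeric parsing is done.
    """
    results = {}
    for path in sorted(Path(output_dir).glob("count/*/metrics_summary.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        if rows:
            # Assumption: the first data row holds the metrics for the sample.
            results[path.parent.name] = rows[0]
    return results
```

Calling `collect_metrics` on the root output directory of a run returns one entry per sample subdirectory that contains a metrics file.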


FastQC
FastQC output is located in the fastqc directory. These results are summarized together in the MultiQC output.

MultiQC
An aggregate report summarizing the results of the previous steps is saved in the root output directory as multiqc_report.html. This report summarizes the results of Cell Ranger count and FastQC together for all samples.

Test data
The test_data directory contains scripts to download small datasets from 10x Genomics to test the pipeline.

Brain_Tumor_LT_3p: 200 Sorted Cells from Human Glioblastoma Multiforme, 3’ LT v3.1, https://www.10xgenomics.com/resources/datasets/200-sorted-cells-from-human-glioblastoma-multiforme-3-lt-v-3-1-3-1-low-6-0-0

hgmm_100: 100 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells, https://www.10xgenomics.com/resources/datasets/100-1-1-mixture-of-fresh-frozen-human-hek-293-t-and-mouse-nih-3-t-3-cells-2-standard-2-1-0

pbmc1k: 1k PBMCs from a Healthy Donor (v3 chemistry), https://www.10xgenomics.com/resources/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0


Credits
This workflow was adapted and updated from the cellranger_count BICF workflow with the guidance of experts at BioHPC.
Cell Ranger is software developed by 10x Genomics. The containerized version of Cell Ranger used in this workflow is packaged by nf-core, and is neither provided nor supported by 10x Genomics.
FastQC is software developed at the Babraham Institute. The containerized version used in this workflow is packaged by BioContainers.
MultiQC is software developed at Seqera Labs. The containerized version used in this workflow is also released by Seqera Labs.