Cellranger count Astrocyte workflow
This workflow runs cellranger count on data generated from 10x Genomics Chromium single cell gene expression workflows. It accepts fastq or fastq.gz files as input, which can be generated with the "BICF CellRanger mkfastq" Astrocyte workflow. It will also run fastqc on the submitted fastq(.gz) files. Finally, multiqc is used to generate an aggregate report collecting the QC information for both cellranger count and fastqc.
Version 1.x of this workflow runs Cell Ranger version 7 with the 2020-A references. Version 2.x of this workflow runs Cell Ranger version 8 with the 2024-A references.
Note: this workflow runs cellranger count in a container, and is neither provided nor supported by 10x Genomics.
Parameters
To run this workflow, you must supply the following parameters:
- Fastq: The fastq files to be analyzed.
- sample_sheet: A CSV-formatted file with the following named columns:
  - sample: The name of the sample. This must match the prefix of the associated fastq files.
  - reference: Which reference genome to use. This workflow currently supports the values "hg38", "mm10", or "barnyard" (hg38 and mm10 combined).
  - expectCells: The number of cells expected from this sample. Set to "auto" for auto-detection (recommended).
  - chemistry: The chemistry used to generate libraries. Set to "auto" for auto-detection (recommended). Note that if the chemistry is 3' v1, or if you are analyzing only the GEX data from a multiome experiment, you must set this explicitly. Possible values:
    - "auto": auto detect
    - "SC3Pv1": Single cell 3' v1
    - "SC3Pv2": Single cell 3' v2
    - "SC3Pv3": Single cell 3' v3
    - "SC3Pv3LT": Single cell 3' v3 LT
    - "SC3Pv3HT": Single cell 3' v3 HT
    - "SC5P-PE": Single cell 5' paired-end
    - "SC5P-R2": Single cell 5' R2-only
    - "ARC-v1": GEX only from multiome
  - introns: true/false. Whether to count intronic reads.
  - noBam: true/false. Whether to skip bam file generation. Skipping saves time and disk space, but bam files may be required for downstream analysis and/or deposition into public databases.
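Because each sample name must match the prefix of its fastq files, it can help to check this before submitting. The following is a minimal sketch, not part of the workflow. It assumes the fastq files follow the usual Illumina-style naming (`<sample>_S1_L001_R1_001.fastq.gz`); the helper names `group_fastqs_by_sample` and `samples_without_fastqs` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: files are named <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq(.gz).
# Files named differently are ignored by this sketch.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>[RI][12])_001\.fastq(\.gz)?$"
)


def group_fastqs_by_sample(filenames):
    """Map each sample prefix to the sorted list of fastq files carrying it."""
    groups = defaultdict(list)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:
            groups[match.group("sample")].append(name)
    return {sample: sorted(files) for sample, files in groups.items()}


def samples_without_fastqs(sample_names, filenames):
    """Return the sample sheet names that have no fastq file with that prefix."""
    groups = group_fastqs_by_sample(filenames)
    return [name for name in sample_names if name not in groups]


files = [
    "hgmm_100_S1_L001_R1_001.fastq.gz",
    "hgmm_100_S1_L001_R2_001.fastq.gz",
    "notes.txt",
]
print(samples_without_fastqs(["hgmm_100", "pbmc1k"], files))  # prints ['pbmc1k']
```

Any sample reported by `samples_without_fastqs` either has a typo in the sample sheet or is missing its fastq files from the submission.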
Here's an example sample sheet:
sample | reference | expectCells | chemistry | introns | noBam |
---|---|---|---|---|---|
Brain_Tumor_3p_LT | hg38 | 0 | auto | true | true |
hgmm_100 | mm10 | 0 | auto | true | true |
hgmm_100 | barnyard | 0 | auto | true | true |
hgmm_100 | barnyard | 100 | auto | true | false |
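The column rules above can be checked before a run with a short script. This is a minimal sketch under stated assumptions, not the workflow's own validation: it treats expectCells as valid when it is "auto" or a non-negative integer (the example rows use 0 and 100), and it requires lowercase true/false for introns and noBam. The function name `validate_sample_sheet` is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "reference", "expectCells", "chemistry", "introns", "noBam"]
REFERENCES = {"hg38", "mm10", "barnyard"}
CHEMISTRIES = {
    "auto", "SC3Pv1", "SC3Pv2", "SC3Pv3", "SC3Pv3LT", "SC3Pv3HT",
    "SC5P-PE", "SC5P-R2", "ARC-v1",
}


def validate_sample_sheet(handle):
    """Return a list of human-readable problems found in a CSV sample sheet."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # Row numbers count the header as row 1.
    for row_no, raw in enumerate(reader, start=2):
        row = {col: (raw[col] or "").strip() for col in REQUIRED_COLUMNS}
        if not row["sample"]:
            problems.append(f"row {row_no}: empty sample name")
        if row["reference"] not in REFERENCES:
            problems.append(f"row {row_no}: unknown reference {row['reference']!r}")
        # Assumption: "auto" or a non-negative integer is acceptable here.
        if row["expectCells"] != "auto" and not row["expectCells"].isdigit():
            problems.append(f"row {row_no}: bad expectCells {row['expectCells']!r}")
        if row["chemistry"] not in CHEMISTRIES:
            problems.append(f"row {row_no}: unknown chemistry {row['chemistry']!r}")
        for col in ("introns", "noBam"):
            if row[col] not in ("true", "false"):
                problems.append(f"row {row_no}: {col} must be true or false")
    return problems


EXAMPLE = """sample,reference,expectCells,chemistry,introns,noBam
Brain_Tumor_3p_LT,hg38,0,auto,true,true
hgmm_100,barnyard,100,auto,true,false
"""
print(validate_sample_sheet(io.StringIO(EXAMPLE)))  # prints []
```

An empty list means no problems were found under these assumptions; the workflow itself may accept or reject values differently.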
Output
Cell Ranger count
Cell Ranger count output is located in the count directory. Each sample has its own subdirectory containing the following:
- analysis: a directory containing data regarding differential expression, clustering, etc. The files in this directory can be consumed by downstream analysis tools.
- raw_feature_bc_matrix(.h5): the raw (all droplets included) counts matrix in MTX (or .h5) format.
- filtered_feature_bc_matrix(.h5): the filtered (empty droplets removed) counts matrix in MTX (or .h5) format.
- web_summary.html: an HTML report providing a summary of the run.
- metrics_summary.csv: a summary of the metrics reported in web_summary.html.
- molecule_info.h5: a file containing per-molecule information for all high-quality and assigned reads.
- cloupe.cloupe: a file that can be read by 10x Genomics Loupe Cell Browser for interactive visualization.
- possorted_genome_bam.bam(.bai): a bam/index file containing the alignments. Not generated if noBam is set to "true".
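To compare samples after a run, the per-sample metrics_summary.csv files can be gathered into one structure. The sketch below assumes the layout described above, with each sample's metrics_summary.csv sitting directly inside its subdirectory of count. It makes no assumption about column names and returns all values as strings. The function name `collect_metrics` is hypothetical.

```python
import csv
from pathlib import Path


def collect_metrics(count_dir):
    """Read metrics_summary.csv from every sample subdirectory of count_dir.

    Returns a dict mapping sample name to a dict of metric name -> value,
    with values kept as the strings found in the file.
    """
    metrics = {}
    for path in sorted(Path(count_dir).glob("*/metrics_summary.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        if rows:
            # The sample name is taken from the subdirectory name.
            metrics[path.parent.name] = rows[0]
    return metrics


if __name__ == "__main__":
    # "count" is the output directory named in this README.
    for sample, values in collect_metrics("count").items():
        print(sample, values)
```

Samples whose subdirectory lacks a metrics_summary.csv are skipped, so a missing key in the result is a quick signal that a run did not finish.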
FastQC
FastQC output is located in the fastqc directory. These results are summarized together in the MultiQC output.
MultiQC
An aggregate report summarizing the results of the previous steps is saved in the root output directory as multiqc_report.html. This report summarizes the results of Cell Ranger count and FastQC together for all samples.
Test data
The test_data directory contains scripts to download small datasets from 10x Genomics to test the pipeline.
- Brain_Tumor_LT_3p: 200 Sorted Cells from Human Glioblastoma Multiforme, 3’ LT v3.1, https://www.10xgenomics.com/resources/datasets/200-sorted-cells-from-human-glioblastoma-multiforme-3-lt-v-3-1-3-1-low-6-0-0
- hgmm_100: 100 1:1 Mixture of Fresh Frozen Human (HEK293T) and Mouse (NIH3T3) Cells, https://www.10xgenomics.com/resources/datasets/100-1-1-mixture-of-fresh-frozen-human-hek-293-t-and-mouse-nih-3-t-3-cells-2-standard-2-1-0
- pbmc1k: 1k PBMCs from a Healthy Donor (v3 chemistry), https://www.10xgenomics.com/resources/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0
Credits
This workflow was adapted and updated from the cellranger_count BICF workflow with the guidance of experts at BioHPC.
Cell Ranger is software developed by 10x Genomics. The containerized version of Cell Ranger utilized in this pipeline is packaged by nf-core, and is neither provided nor supported by 10x Genomics.
FastQC is software developed at the Babraham Institute. The containerized version used in this workflow is packaged by BioContainers.
MultiQC is software developed at Seqera Labs. The containerized version used in this workflow is also released by Seqera Labs.