|*master*|*develop*|
|:-:|:-:|
|[![pipeline status](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/badges/master/pipeline.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/commits/master)|[![pipeline status](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/badges/develop/pipeline.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/commits/develop)|

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2652622.svg)](https://doi.org/10.5281/zenodo.2652622)


10x Genomics scRNA-Seq (cellranger) count Pipeline
==================================================

Introduction
------------

This pipeline is a wrapper for the cellranger count tool from 10x Genomics. It takes fastq files from 10x Genomics Single Cell Gene Expression libraries and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate gene-barcode matrices, determine clusters, and perform gene expression analysis.

The pipeline uses Nextflow, a bioinformatics workflow tool.

This pipeline is primarily used with a SLURM cluster on the BioHPC Cluster. However, the pipeline should be able to run on any system that Nextflow supports.

Additionally, the pipeline is designed to work with the Astrocyte Workflow System using a simple web interface.

Cloud Compatibility
-------------------
This pipeline can also be run on AWS.
**NOTE: This pipeline has been reverted to a non-containerized version to work on Astrocyte. Use the tag `containerized` for the last working containerized version, which will be compatible with AWS.**

To do so:
* Build an AWS Batch queue and environment, either manually or with [aws-cloudformation](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=Nextflow&templateURL=https://s3.amazonaws.com/aws-genomics-workflows/templates/nextflow/nextflow-aio.template.yaml)
* Edit one of the aws configs in workflow/config/
  * Replace workDir with the S3 bucket generated
  * Change region if different
  * Change queue to the aws batch queue generated
* The user must have awscli configured with appropriate authentication (with ```aws configure``` and access keys) in the environment in which nextflow will be run
* Add ```-profile``` with the name of the aws config which was customized
  * eg. ```nextflow run workflow/main.nf -profile aws_ondemand```
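
The last two steps can be checked from the shell before launching. The following is a minimal sketch, not part of the pipeline: the function names `missing_tools` and `run_on_aws` are hypothetical, and the default profile name `aws_ondemand` is simply taken from the example above. It assumes that `aws sts get-caller-identity` failing means `aws configure` has not been completed with valid access keys.

```shell
#!/usr/bin/env bash
# Sketch only: pre-flight checks before running the pipeline on AWS.
# Function names are hypothetical helpers, not part of this repository.

# Print the name of every tool given as an argument that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Launch the pipeline with a customized AWS profile (default: aws_ondemand).
run_on_aws() {
  local profile="${1:-aws_ondemand}"
  local missing
  missing="$(missing_tools aws nextflow)"
  if [ -n "$missing" ]; then
    echo "Missing required tools: $missing" >&2
    return 1
  fi
  # Assumption: this call fails when no valid credentials are configured.
  aws sts get-caller-identity >/dev/null || return 1
  nextflow run workflow/main.nf -profile "$profile"
}
```

After sourcing the snippet, `run_on_aws` (or `run_on_aws <profile>` with the name of the customized config) would stop early with a message if `aws` or `nextflow` is missing, instead of failing partway through a run.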

To Run:
-------

* Available parameters:
  * **-profile**
    * what environments to run on, available: `biohpc`, `local`, `cluster`, `aws`, `ondemand`, `spot`
    * eg: **-profile biohpc,cluster** to run on BioHPC in cluster mode
    * eg: **-profile aws,ondemand** to run on AWS on an on-demand queue
  * **--fastq**
    * path to the fastq location
    * only R1 and R2 are necessary, but I2 can be included
    * only fastqs listed in the designFile (see below) are used; files not listed are ignored
    * eg: **--fastq '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r10k/\*.fastq.gz'**
  * **--designFile**
    * path to design file (csv format) location
    * column 1 = "Sample"
    * column 2 = "fastq_R1"
    * column 3 = "fastq_R2"
    * can have repeated "Sample" if there are multiple fastq R1/R2 pairs for a sample
    * can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/blob/master/docs/design.csv)
    * eg: **--designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r10k/design.csv'**
  * **--genome**
    * reference genome
    * requires workflow/conf/biohpc.config to work
    * name of available 10x Genomics premade reference genomes:
        * *'GRCh38-3.0.0'* = Human GRCh38 release 93
        * *'GRCh38-1.2.0'* = Human GRCh38 release 84
        * *'hg19-3.0.0'* = Human GRCh37 (hg19) release 87
        * *'hg19-1.2.0'* = Human GRCh37 (hg19) release 84
        * *'mm10-3.0.0'* = Mouse GRCm38 (mm10) release 93
        * *'mm10-1.2.0'* = Mouse GRCm38 (mm10) release 84
        * *'GRCh38_and_mm10-3.1.0'* = Human GRCh38 + Mouse GRCm38 (mm10) release 93
        * *'hg19_and_mm10-3.0.0'* = Human GRCh37 (hg19) + Mouse GRCm38 (mm10) release 93
        * *'hg19_and_mm10-1.2.0'* = Human GRCh37 (hg19) + Mouse GRCm38 (mm10) release 84
        * *'ercc92-1.2.0'* = ERCC.92 Spike-In
    * if --genome is used then --genomeLocationFull is not necessary
    * eg: **--genome 'GRCh38-3.0.0'**
  * **--genomeLocationFull**
    * path to a custom genome
    * if --genomeLocationFull is used --genome is not necessary and is ignored
    * eg. **--genomeLocationFull '/project/apps_database/cellranger/refdata-cellranger-GRCh38-3.0.0'**
  * **--expectCells**
    * expected number of cells to be detected
    * guides cellranger in its cutoff for background/low-quality cells
    * as a guide, it doesn't have to be exact
    * 0-10000
    * if --expectCells is used then --forceCells is not necessary
    * only used if --forceCells is not entered or set to 0
    * eg: **--expectCells 10000**
  * **--forceCells**
    * forces filtering of the top number of cells matching this parameter
    * 0-10000
    * if --forceCells is used then --expectCells is not necessary and is ignored
    * eg: **--forceCells 10000**
  * **--kitVersion**
    * the library chemistry version number for the 10x Genomics Gene Expression kit
    * setting to auto will attempt to autodetect from the detected sequencing strategy in the fastqs
    * version numbers are spelled out
    * --kitVersion is only used if --version (cellranger version) is > 2
    * --version (cellranger version) 2.1.1 can only read --kitVersion of '3GEXv2'
    * options:
        * *'auto'*
        * *'3GEX'*
        * *'3GEXv3'*
        * *'3GEXv2'*
        * *'5GEX'*
        * *'5GEXPE'*
        * *'5GEXR2'*
    * eg: **--kitVersion '3GEXv3'**
  * **--version**
    * cellranger version
    * --version (cellranger version) 2.1.1 can only read --kitVersion of 3GEXv2
    * options:
        * *'3.1.0'*
        * *'3.0.2'*
        * *'2.1.1'*
    * eg: **--version '3.1.0'**
  * **--vizFiles**
    * create objects which can be used for downstream visualization and analysis of each sample's outputs; currently creates:
      * Seurat R-objects
    * true/false
    * eg: **--vizFiles true**
  * **--outDir**
    * optional output directory for run
    * eg: **--outDir 'test'**
* FULL EXAMPLE:
  ```
  nextflow run workflow/main.nf -profile biohpc,cluster --fastq '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r10k/*.fastq.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_count/hu.v3s2r10k/design.csv' --genome 'GRCh38-3.0.0' --kitVersion '3GEXv3' --version '3.1.0' --vizFiles true --outDir 'test'
  ```
* Design example:

| Sample | fastq_R1 | fastq_R2 |
|--------|----------|----------|
| sample1 | pbmc_1k_v2_S1_L001_R1_001.fastq.gz | pbmc_1k_v2_S1_L001_R2_001.fastq.gz |
| sample2 | pbmc_1k_v2_S2_L001_R1_001.fastq.gz | pbmc_1k_v2_S2_L001_R2_001.fastq.gz |
| sample2 | pbmc_1k_v2_S2_L002_R1_001.fastq.gz | pbmc_1k_v2_S2_L002_R2_001.fastq.gz |
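
Before a run, it can help to confirm that a design file matches the layout described under **--designFile**. The sketch below is illustrative only and is not part of the pipeline. The helper names `check_design` and `missing_fastqs` are hypothetical, and it assumes the csv is comma-separated with the exact header `Sample,fastq_R1,fastq_R2`, without quoting or extra whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: sanity checks for a design file. Helper names are hypothetical.

# check_design FILE
# Returns non-zero (and prints the problem) if the header is not
# "Sample,fastq_R1,fastq_R2" or a row lacks three non-empty columns.
check_design() {
  awk -F',' '
    BEGIN { bad = 0 }
    NR == 1 {
      if ($0 != "Sample,fastq_R1,fastq_R2") {
        print "line 1: unexpected header: " $0
        bad = 1
      }
      next
    }
    $0 == "" { next }   # skip blank lines
    NF != 3 || $1 == "" || $2 == "" || $3 == "" {
      print "line " NR ": expected 3 non-empty columns"
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# missing_fastqs FILE DIR
# Prints each fastq named in columns 2 and 3 that does not exist in DIR.
missing_fastqs() {
  local design="$1" dir="$2" f
  tail -n +2 "$design" | cut -d',' -f2,3 | tr ',' '\n' |
    while IFS= read -r f || [ -n "$f" ]; do
      [ -z "$f" ] && continue
      [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
}
```

For example, `check_design design.csv && missing_fastqs design.csv /path/to/fastqs` prints nothing when the file is well-formed and every listed fastq is present in the directory. Here `/path/to/fastqs` is a placeholder for the directory given to **--fastq**.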

[**CHANGELOG**](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/blob/develop/CHANGELOG.md)

Credits
-------
This workflow was developed jointly with the [Bioinformatic Core Facility (BICF), Department of Bioinformatics](http://www.utsouthwestern.edu/labs/bioinformatics/)


Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.