## Overview

This is `celseq2`, a Python framework for generating the UMI count matrix
from CEL-Seq2 [\*] sequencing data. We believe data digestion should be
automated, and that it should be done in a manner that is not just
computationally efficient, but also user-friendly and developer-friendly.

## Install `celseq2`

``` bash
git clone git@github.com:yanailab/celseq2.git
cd celseq2
pip install ./
```

## Quick Start

Running the `celseq2` pipeline is as easy as 1-2-3. As an example, below is a
visualization of the same experiment design as the
[sample sheet](https://github.com/yanailab/CEL-Seq-pipeline/blob/133912cd4ceb20af0c67627ab883dfce8b9668df/sample_sheet_example.txt)
used in the previous generation of the pipeline ([CEL-Seq-pipeline](https://github.com/yanailab/CEL-Seq-pipeline)).

![experiment-old-pipeline-visualize](https://i.imgur.com/ntJVTYM.gif)

The user had two biological samples, which could come from two different
experiments, two time points, two tissue types, or even two labs. They are
denoted as squares and circles, respectively. Each sample had 9 cells.

In principle, the final output the user would expect is one UMI count matrix
for each sample, which means two UMI count matrices in total in this example.

During the CEL-Seq2 experiment, all cells were placed in one 96-well plate.
They were labeled with the same sequencing barcodes (shown as the orange plate),
but each cell was labeled with its own CEL-Seq2 cell barcode, so that all of them
could be sequenced together without losing their identities. In detail, the
nine cells from Experiment-1 were labeled with CEL-Seq2 cell barcodes 1 to 9,
while the nine cells from Experiment-2 were labeled with cell barcodes 10 to 18.

Finally, the library was distributed across two lanes (purple and dark gray
bars) of a sequencer and sequenced, which resulted in two sets of CEL-Seq2 data
(one per lane per sequencing barcode).

What the `celseq2` pipeline does for the user is generate one UMI count
matrix per experiment, taking the two sets of CEL-Seq2 data as input.

### Step-1: Specify Global Configuration of Workflow

Run the `new-configuration-file` command to initiate a configuration file (YAML
format), which specifies the details of the CEL-Seq2 technique the user
performed, e.g. the cell barcode sequence dictionary and the transcriptome
annotation information for quantifying UMIs.

This configuration can be shared and reused as long as the user is
running the pipeline on the same species.

``` bash
new-configuration-file -o /path/to/wonderful_CEL-Seq2_config.yaml
```

An example configuration file is [here](https://github.com/yanailab/celseq2/blob/master/example/config.yaml).

An example CEL-Seq2 cell barcode sequence dictionary is [here](https://github.com/yanailab/celseq2/blob/master/example/barcodes_cel-seq_umis96.tab).

Read ["Setup Configuration"](https://yanailab.github.io/celseq2/user_guide/setup_config/)
for full instructions.

### Step-2: Define Experiment Table

Run the `new-experiment-table` command to initiate a table (space- or
tab-separated file) specifying the experiment layout.

``` bash
new-experiment-table -o /path/to/wonderful_experiment_table.txt
```

Fill in the generated experiment table file row by row.

The content of the experiment table in this example could be:

| SAMPLE_NAME           | CELL_BARCODES_INDEX | R1                        | R2                        |
|-----------------------|---------------------|---------------------------|---------------------------|
| wonderful_experiment1 | 1-9                 | path/to/lane1-R1.fastq.gz | path/to/lane1-R2.fastq.gz |
| wonderful_experiment2 | 10-18               | path/to/lane1-R1.fastq.gz | path/to/lane1-R2.fastq.gz |
| wonderful_experiment1 | 1-9                 | path/to/lane2-R1.fastq.gz | path/to/lane2-R2.fastq.gz |
| wonderful_experiment2 | 10-18               | path/to/lane2-R1.fastq.gz | path/to/lane2-R2.fastq.gz |

Read ["Experiment Table Specification"](https://yanailab.github.io/celseq2/user_guide/experiment_table/)
for full instructions when more complex experiment designs are involved.
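
To make the layout of the example table concrete, the sketch below reads the four example rows and groups them by sample. It is only an illustration in Python: `expand_index_range` and `parse_experiment_table` are hypothetical helpers, not part of the `celseq2` API, and only the simple `start-end` form of `CELL_BARCODES_INDEX` used in this example is handled. How `celseq2` itself parses the table may differ.

```python
from collections import defaultdict

# Copy of the example experiment table above, whitespace separated.
TABLE_TEXT = """\
SAMPLE_NAME CELL_BARCODES_INDEX R1 R2
wonderful_experiment1 1-9 path/to/lane1-R1.fastq.gz path/to/lane1-R2.fastq.gz
wonderful_experiment2 10-18 path/to/lane1-R1.fastq.gz path/to/lane1-R2.fastq.gz
wonderful_experiment1 1-9 path/to/lane2-R1.fastq.gz path/to/lane2-R2.fastq.gz
wonderful_experiment2 10-18 path/to/lane2-R1.fastq.gz path/to/lane2-R2.fastq.gz
"""


def expand_index_range(spec):
    """Expand a simple 'start-end' range such as '1-9' into a list of ints.

    Assumption: only the single-range form shown in the example is supported.
    """
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))


def parse_experiment_table(text):
    """Group the rows of a whitespace-separated experiment table by sample."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    samples = defaultdict(list)
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        samples[row["SAMPLE_NAME"]].append({
            "cell_indices": expand_index_range(row["CELL_BARCODES_INDEX"]),
            "R1": row["R1"],
            "R2": row["R2"],
        })
    return dict(samples)


layout = parse_experiment_table(TABLE_TEXT)
for name, rows in layout.items():
    # Each sample appears once per lane, always with the same cell indices.
    print(name, "rows:", len(rows), "cells:", rows[0]["cell_indices"])
```

Each sample ends up with two rows, one per lane, which matches the design described in the Quick Start: the same cell barcodes are read from both lanes.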

### Step-3: Run Pipeline of `celseq2`

Launch the pipeline on a computing node, performing 10 tasks in parallel.

``` bash
celseq2 --config-file /path/to/wonderful_CEL-Seq2_config.yaml \
    --experiment-table /path/to/wonderful_experiment_table.txt \
    --output-dir /path/to/result_dir \
    -j 10
```

Read ["Launch Pipeline"](https://yanailab.github.io/celseq2/user_guide/launch_pipeline/)
for full instructions on how to submit jobs to a cluster, or to preview how many
tasks are going to be scheduled.
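
If the pipeline is launched from a Python script rather than a shell, the same command can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in the command above; `build_celseq2_command` is a hypothetical helper, not something `celseq2` provides, and the paths are the placeholders from this README.

```python
import shlex


def build_celseq2_command(config_file, experiment_table, output_dir, jobs=10):
    """Return the celseq2 command line as a list of arguments.

    Only the flags shown in this README are used.
    """
    return [
        "celseq2",
        "--config-file", str(config_file),
        "--experiment-table", str(experiment_table),
        "--output-dir", str(output_dir),
        "-j", str(jobs),
    ]


cmd = build_celseq2_command(
    "/path/to/wonderful_CEL-Seq2_config.yaml",
    "/path/to/wonderful_experiment_table.txt",
    "/path/to/result_dir",
)

# Print the command for inspection; to actually run it, pass the list to
# subprocess.run(cmd, check=True) once celseq2 is installed.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.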

### Results

All results are saved under <kbd>/path/to/result_dir</kbd>, the output
directory the user specified, which has the following folder structure:

```
├── annotation
├── expr                  # <== All the UMI count matrices are saved here
├── input
├── small_diagnose
├── small_fq
├── small_log
├── small_sam
├── small_umi_count
└── small_umi_set
```

In particular, the **UMI count matrix** for each experiment is
saved in both CSV and HDF5 format and exported to the <kbd>expr/</kbd> folder.

```
expr/
├── wonderful_experiment1
│   ├── expr.csv          # <== UMI count matrix for cells denoted as squares
│   ├── expr.h5
│   ├── item-1
│   │   ├── expr.csv
│   │   └── expr.h5
│   └── item-3
│       ├── expr.csv
│       └── expr.h5
└── wonderful_experiment2
    ├── expr.csv          # <== UMI count matrix for cells denoted as circles
    ├── expr.h5
    ├── item-2
    │   ├── expr.csv
    │   └── expr.h5
    └── item-4
        ├── expr.csv
        └── expr.h5
```

Results of <kbd>item-X</kbd> are useful for assessing technical variation when
FASTQ files from multiple lanes, or technical/biological replicates, are present.
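
As a small illustration of navigating this layout, the sketch below collects the sample-level and per-item `expr.csv` paths under <kbd>expr/</kbd>. It relies only on the folder structure shown above; `collect_count_matrices` is a hypothetical helper, and the demo builds an empty mock directory in a temporary location rather than reading real pipeline output.

```python
import tempfile
from pathlib import Path


def collect_count_matrices(result_dir):
    """Map each sample to its sample-level and per-item expr.csv paths."""
    expr_dir = Path(result_dir) / "expr"
    found = {}
    for sample_dir in sorted(p for p in expr_dir.iterdir() if p.is_dir()):
        items = {
            item_dir.name: item_dir / "expr.csv"
            for item_dir in sorted(sample_dir.glob("item-*"))
            if (item_dir / "expr.csv").is_file()
        }
        found[sample_dir.name] = {
            "sample": sample_dir / "expr.csv",
            "items": items,
        }
    return found


# Demo on a mock result directory that mirrors the tree shown above.
MOCK_ITEMS = (
    "wonderful_experiment1/item-1",
    "wonderful_experiment1/item-3",
    "wonderful_experiment2/item-2",
    "wonderful_experiment2/item-4",
)

with tempfile.TemporaryDirectory() as tmp:
    for rel in MOCK_ITEMS:
        item_dir = Path(tmp) / "expr" / rel
        item_dir.mkdir(parents=True)
        (item_dir / "expr.csv").touch()         # per-item matrix (empty mock)
        (item_dir.parent / "expr.csv").touch()  # sample-level matrix (empty mock)
    matrices = collect_count_matrices(tmp)
    summary = {name: sorted(entry["items"]) for name, entry in matrices.items()}

print(summary)
```

The returned paths can then be handed to whatever CSV reader is preferred to compare the per-item matrices of a sample.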

## About

Authors: See <https://github.com/yanailab/celseq2/blob/master/AUTHORS>

License: See <https://github.com/yanailab/celseq2/blob/master/LICENSE>

---

[\*] Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed
single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
<https://doi.org/10.1186/s13059-016-0938-8>