# BICF ATAC-seq Analysis Workflow


## Introduction
BICF ATAC-seq is a bioinformatics best-practice analysis pipeline for ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data, used at the [BICF](http://www.utsouthwestern.edu/labs/bioinformatics/) in the [UT Southwestern Department of Bioinformatics](http://www.utsouthwestern.edu/departments/bioinformatics/).

The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw FastQ input, aligns the reads, calls and annotates peaks, and performs extensive quality control on the results.


## Input files
##### 1) FASTQ files
  + One or more input FASTQ files from an ATAC-seq experiment

##### 2) Design file
  + The design file is a tab-delimited file with 4 columns for single-end (SE) data and 5 columns for paired-end (PE) data. Letters, numbers, and underscores can be used in the names; however, the names must begin with a letter. The columns must be as follows:

    1. sample_id - The ID of the sample. This is used as the header in output files, so keep it concise.
    2. experiment_id - The same name given to all replicates of a treatment; used for the consensus header.
    3. replicate - Replicate number
    4. fastq_read1 - Name of FASTQ file 1 for SE or PE data
    5. fastq_read2 - Name of FASTQ file 2 for PE data

  + See [HERE](/docs/design_ENCSR451NAE_PE.txt) for an example paired-end design file
  + See [HERE](/docs/design_ENCSR265ZXX_SE.txt) for an example single-end design file
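The column and naming rules above can be checked before launching a run. The following is a minimal, hypothetical Python sketch, not the pipeline's own "Check input files" step. It assumes the first row of the design file is a header whose column names match the list above exactly, and that the naming rule applies to `sample_id` and `experiment_id`; the function name `check_design` is illustrative only.

```python
import csv
import io
import re

# Assumed header layouts, taken from the column list above.
SE_COLUMNS = ["sample_id", "experiment_id", "replicate", "fastq_read1"]
PE_COLUMNS = SE_COLUMNS + ["fastq_read2"]

# Letters, numbers, and underscores only; must begin with a letter.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_design(text):
    """Return a list of problems found in a tab-delimited design file (empty if none)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return ["design file is empty"]
    header, records = rows[0], rows[1:]
    if header not in (SE_COLUMNS, PE_COLUMNS):
        return [f"unexpected header: {header}"]
    problems = []
    for line_no, row in enumerate(records, start=2):
        # 4 columns for single-end, 5 for paired-end, as set by the header.
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} columns, found {len(row)}")
            continue
        record = dict(zip(header, row))
        # Assumption: the naming rule applies to these two columns.
        for field in ("sample_id", "experiment_id"):
            if not NAME_PATTERN.match(record[field]):
                problems.append(f"line {line_no}: invalid {field} {record[field]!r}")
        if not record["replicate"].isdigit():
            problems.append(f"line {line_no}: replicate must be a number")
    return problems


if __name__ == "__main__":
    example = (
        "sample_id\texperiment_id\treplicate\tfastq_read1\n"
        "rep1_sample\tmyExperiment\t1\trep1.fastq.gz\n"
        "2nd_sample\tmyExperiment\t2\trep2.fastq.gz\n"
    )
    for problem in check_design(example):
        print(problem)
```

Running the example reports a single problem on line 3, because `2nd_sample` begins with a digit rather than a letter.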


## Pipeline
  + The pipeline has 10 steps:
    1. Check input files
    2. Trim adapters with Trim Galore!
    3. Map reads with BWA, filter with SAMtools, and sort with Sambamba
    4. Mark duplicates with Sambamba, filter reads with SAMtools, calculate the percentage of reads in mitochondria, and calculate library complexity with SAMtools and bedtools
    5. Calculate cross-correlation using PhantomPeakQualTools
    6. Call peaks with MACS2 from overlaps of pooled replicates
    7. Call consensus peaks and optionally remove blacklisted peaks
    8. Annotate peaks (chr only; either blacklist-removed or replicated peaks)
    9. QC metrics
    10. MultiQC report
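Step 4 reports library complexity and the mitochondrial read fraction. Below is a minimal Python sketch of the ENCODE-style definitions, not the pipeline's SAMtools/bedtools implementation: NRF is distinct read positions divided by total reads, PBC1 is positions with exactly one read divided by distinct positions, and PBC2 is positions with exactly one read divided by positions with exactly two. It assumes each read is summarized as a `(chrom, start, end, strand)` tuple. The percent-mitochondria helper parses `samtools idxstats` output (name, length, mapped, unmapped) and assumes the mitochondrial contig is named `chrM`. Both function names are hypothetical.

```python
from collections import Counter


def library_complexity(positions):
    """Return (NRF, PBC1, PBC2) for an iterable of (chrom, start, end, strand) tuples."""
    reads_per_position = Counter(positions)
    total = sum(reads_per_position.values())         # all reads
    if total == 0:
        raise ValueError("no reads supplied")
    distinct = len(reads_per_position)                # distinct positions
    histogram = Counter(reads_per_position.values())  # reads per position -> number of positions
    one, two = histogram[1], histogram[2]             # positions hit exactly once / exactly twice
    nrf = distinct / total
    pbc1 = one / distinct
    # Assumption: report infinity when no position has exactly two reads.
    pbc2 = one / two if two else float("inf")
    return nrf, pbc1, pbc2


def percent_mito(idxstats_text, mito_name="chrM"):
    """Percentage of mapped reads on the mitochondrial contig, from `samtools idxstats` text."""
    mapped_total = 0
    mapped_mito = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        mapped_total += int(mapped)
        if name == mito_name:
            mapped_mito += int(mapped)
    return 100.0 * mapped_mito / mapped_total if mapped_total else 0.0
```

For example, seven reads stacked as 3, 2, 1, and 1 per position give NRF = 4/7, PBC1 = 2/4 = 0.5, and PBC2 = 2/1 = 2.0.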


## Output Files
Folder | File | Description
--- | --- | ---
design | N/A | Inputs used for analysis; can be ignored
trimReads | *_trimming_report.txt | Report detailing how many reads were trimmed
trimReads | *_trimmed.fq.gz | Trimmed FASTQ files used for analysis
alignReads | *.flagstat.qc | QC metrics from the mapping process
alignReads | *.bam | Sorted BAM file
filterReads | *.dedup.qc | QC metrics from duplicate marking (Sambamba)
filterReads | *.dedup.bam | Filtered BAM file with duplicate reads removed
filterReads | *.dedup.bam.bai | Index of the filtered BAM file
filterReads | *.dedup.flagstat.qc | QC metrics of the filtered BAM file (mapping stats, SAMtools)
filterReads | *.dedup.pbc.qc | QC metrics of library complexity
filterReads | *.pctmito.tsv | Percentage of reads mapping to mitochondria
convertReads | *.filt.nodup.bedse.gz | BED alignment in BEDPE format
convertReads | *.tagAlign.gz | BED alignment in BEDPE or BEDSE format
crossReads | *.cc.plot.pdf | Plot of cross-correlation to assess signal-to-noise ratios
crossReads | *.cc.qc | Cross-correlation metrics; see [HEADER](docs/xcor_header.txt) for the column names
callPeaksMACS | pooled/*pooled.fc_signal.bw | bigWig file; raw fold enrichment of sample/control
callPeaksMACS | pooled/*pooled_peaks.xls | Tab-delimited table of peaks (opens in Excel)
callPeaksMACS | pooled/*.pvalue_signal.bw | bigWig file; sample/control signal adjusted for p-value significance
callPeaksMACS | pooled/*_pooled.narrowPeak | Peak file; see [HERE](https://genome.ucsc.edu/FAQ/FAQformat.html#format12) for the ENCODE narrowPeak format
consensusPeaks | *.rejected.narrowPeak | Peaks not supported by multiple testing (replicates and pseudo-replicates)
consensusPeaks | *.replicated.narrowPeak | Peaks supported by multiple testing (replicates and pseudo-replicates)
consensusPeaks | *.replicated_noblacklist.narrowPeak | Peaks supported by multiple testing (replicates and pseudo-replicates) with blacklist regions removed
peakAnnotation | *.chipseeker_annotation.tsv | ChIPseeker annotation of peaks on chromosomes (either blacklist-removed or replicated peaks)
peakAnnotation | *.chipseeker_upsetplot.pdf | UpSet plot showing how many peaks overlap each combination of annotated locations
peakAnnotation | *.chipseeker_pie.pdf | Pie chart of the annotated locations where narrow peaks occur
experimentQC | coverage.pdf | Plot to assess the sequencing depth of each sample
experimentQC | heatmeap_SpearmanCorr.pdf | Plot of Spearman correlation between samples
experimentQC | heatmeap_PearsonCorr.pdf | Plot of Pearson correlation between samples
experimentQC | sample_mbs.npz | Array of read-coverage summaries across multiple BAM files
experimentQC | *.FRiPscore.tsv | File containing the FRiP (fraction of reads in peaks) score
experimentQC | *.TSSenrichment.tsv | File containing the TSS enrichment
experimentQC | *_large_tss-enrich.pdf | TSS enrichment heatmap and metagene plot
experimentQC | *_tss-enrich.pdf | TSS enrichment metagene plot
experimentQC | *.fragment_length_linear.pdf | Paired-end only; fragment/insert size densities, linear scale
experimentQC | *.fragment_length_linear.pdf | Paired-end only; fragment/insert size densities, log10 scale
experimentQC | *.fragment_length_count.txt | Paired-end only; raw counts per fragment length
multiqcReport | multiqc_report.html | Quality control report of percent mitochondria, NRF, PBC1, PBC2, NSC, and RSC; also lists software versions and references to cite

## Common Quality Control Metrics
  + Review these files before continuing with the ATAC-seq experiment. If your experiment fails any of these metrics, pause and re-evaluate whether the data should remain in the study.
    1. multiqcReport/multiqc_report.html: compare the metrics against the ENCODE ATAC-seq standards [HERE](https://www.encodeproject.org/atac-seq/).
    2. crossReads/*cc.plot.pdf: check that your sample shows the expected signal intensity and cross-correlation peak location. See [HERE](https://ccg.epfl.ch//var/sib_april15/cases/landt12/strand_correlation.html) for more details.
    3. filterReads/sample/*.pbc.qc: column 6 (NRF) > 0.9, column 7 (PBC1) > 0.9, and column 8 (PBC2) > 3.
    4. experimentQC/coverage.pdf, experimentQC/heatmeap_SpearmanCorr.pdf, experimentQC/heatmeap_PearsonCorr.pdf: see [HERE](https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html) for details on these deepTools plots.
    5. experimentQC/: common ATAC-seq quality controls, including the FRiP score, TSS enrichment, and fragment/insert length densities (paired-end only).
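The library-complexity thresholds in item 3 and the FRiP score in item 5 can be sketched in Python as follows. These are hypothetical helpers, not part of the pipeline. `check_pbc_line` reads one tab-delimited line of a `*.pbc.qc` file using the 1-based column positions stated above (6 = NRF, 7 = PBC1, 8 = PBC2). `frip` computes FRiP naively as the fraction of reads overlapping at least one peak, assuming reads and peaks are half-open `(chrom, start, end)` intervals; for real data an interval tool such as bedtools is far faster.

```python
def check_pbc_line(line):
    """Check one *.pbc.qc line against the thresholds listed above.

    Uses the 1-based columns given in this document:
    column 6 = NRF, column 7 = PBC1, column 8 = PBC2.
    """
    fields = line.rstrip("\n").split("\t")
    nrf, pbc1, pbc2 = (float(fields[i]) for i in (5, 6, 7))  # 0-based indices
    return {
        "NRF > 0.9": nrf > 0.9,
        "PBC1 > 0.9": pbc1 > 0.9,
        "PBC2 > 3": pbc2 > 3,
    }


def frip(reads, peaks):
    """Fraction of reads overlapping any peak; intervals are (chrom, start, end), half-open."""
    if not reads:
        return 0.0
    in_peaks = sum(
        1
        for r_chrom, r_start, r_end in reads
        if any(
            r_chrom == p_chrom and r_start < p_end and r_end > p_start
            for p_chrom, p_start, p_end in peaks
        )
    )
    return in_peaks / len(reads)
```

The nested loop keeps the overlap rule easy to read; it scales as reads × peaks, so it is only suitable for small illustrative inputs.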


## Common Errors
If you find an error, please let the [BICF](mailto:BICF@UTSouthwestern.edu) know and we will add it here.

## Citation
Please cite the individual programs and versions used by the pipeline (listed [HERE](docs/references.md)), as well as the overall pipeline (DOI: 10.5281/zenodo.3526149). In publications, please also state: "Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596)."

### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatics Core Facility ([BICF](https://www.utsouthwestern.edu/labs/bioinformatics/)), in the [Department of Bioinformatics](https://www.utsouthwestern.edu/departments/bioinformatics/).