Achisha Saikia authored 5c8f8159
ATAC-seq Astrocyte Workflow
This workflow is a best-practice bioinformatics pipeline for analyzing ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data. It is built on the ENCODE ATAC workflow, which is written in the Workflow Description Language (WDL), and it uses Nextflow to run on the Astrocyte platform.
Requirements
- The ATAC-seq Source workflow, [astrocyte-atac-source](https://git.biohpc.swmed.edu/s219741/astrocyte-atac-source). This repo wraps the existing ATAC-seq pipeline listed below (the Runner) so that it can be run on the Astrocyte platform.
- The ATAC-seq Runner workflow, [astrocyte-atac-runner](https://git.biohpc.swmed.edu/s219741/astrocyte-atac-runner). This repo contains the original ATAC-seq pipeline developed by the ENCODE team.
The ATAC-seq Runner workflow
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq data. It can be run from raw FASTQ files all the way to peak calling and signal track generation with a single caper submit command. It can also be started from an intermediate stage (for example, with alignment files as input). The pipeline supports both single-end and paired-end data, and both replicated and non-replicated datasets. Its outputs include 1) formatted HTML reports with quality control measures designed specifically for ATAC-seq and DNase-seq data, 2) an analysis of reproducibility, 3) stringent and relaxed thresholding of peaks, and 4) fold-enrichment and p-value signal tracks.
On HPC, make sure that Caper's configuration file, ~/.caper/default.conf, is set up correctly for your cluster. The following command submits Caper as a leader job to SLURM with Singularity:
caper hpc submit atac.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
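The submit command above fails late if INPUT_JSON is unset or points to a missing file, so a small pre-check can save a queued job. The following is a minimal sketch: check_input_json is a hypothetical helper (not part of Caper), and the path and leader job name in the usage comment are placeholders.

```shell
# Sketch: verify the input JSON before handing it to Caper.
# check_input_json is a hypothetical helper, not a Caper command.
check_input_json() {
    local input_json="$1"
    if [ -z "$input_json" ]; then
        echo "INPUT_JSON is not set" >&2
        return 1
    fi
    if [ ! -r "$input_json" ]; then
        echo "Cannot read input JSON: $input_json" >&2
        return 1
    fi
    return 0
}

# Usage (commented out because it needs Caper on a SLURM cluster;
# the path and job name below are placeholders):
# INPUT_JSON=/path/to/input.json
# check_input_json "$INPUT_JSON" && \
#     caper hpc submit atac.wdl -i "$INPUT_JSON" --singularity \
#         --leader-job-name atac_leader
```

The helper only checks that the file is present and readable. It does not validate the JSON contents against the pipeline's input schema.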
The ATAC-seq Source workflow
The Source workflow interacts with both the Astrocyte platform and the Runner workflow. It contains the basic components that the Astrocyte platform requires. Once it is connected to the Runner workflow, it is imported into the Astrocyte platform as an independent Astrocyte workflow.
Critical files in the base workflow
.gitmodules and workflow/external_repo/
These are the places where the magic happens. Once .gitmodules is configured correctly, the Runner repo is cloned into workflow/external_repo/ as a submodule.
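To confirm that .gitmodules points the Runner at the expected location, the declared submodule paths can be read back with git config. This is a sketch under assumptions: list_submodule_paths and check_runner_submodule are hypothetical helper names, and the subdirectory name astrocyte-atac-runner in the sample entry is an assumption (the text above only states that the Runner lives under workflow/external_repo/).

```shell
# A .gitmodules entry for the Runner might look like this
# (the subdirectory name is an assumption):
#
#   [submodule "workflow/external_repo/astrocyte-atac-runner"]
#       path = workflow/external_repo/astrocyte-atac-runner
#       url = https://git.biohpc.swmed.edu/s219741/astrocyte-atac-runner

# Print the path of every submodule declared in a .gitmodules file.
# Paths containing spaces are not handled by this sketch.
list_submodule_paths() {
    local gitmodules="${1:-.gitmodules}"
    git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' \
        | awk '{ print $2 }'
}

# Succeed only if at least one submodule lives under workflow/external_repo/.
check_runner_submodule() {
    local gitmodules="${1:-.gitmodules}"
    list_submodule_paths "$gitmodules" | grep -q '^workflow/external_repo/'
}
```

Running check_runner_submodule in the base repo returns a non-zero status if no submodule is declared under workflow/external_repo/, which usually means .gitmodules still needs editing.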
Run the following commands in the base repo to update the submodule and check its status every time you change the remote repo or the .gitmodules file.
# register the submodule
git submodule init
# update changes from the remote repo to the submodule
git submodule update --remote
# check the status of the submodule
git submodule status
docs/index.md
This is the workflow's documentation file, which you are reading now. Its content is displayed on the website once the workflow is imported into the Astrocyte platform.
test_data/
This folder holds test data only. When you run astrocyte_cli test YOUR_WORKFLOW_FOLDER, it checks this folder for the test data defined in main.nf or astrocyte_pkg.yml.
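Before running the test command, it can help to confirm that the files you expect are actually present in test_data/. The sketch below is a generic pre-check and not part of astrocyte_cli: missing_test_files is a hypothetical helper, and the file names in the usage comment are placeholders rather than names taken from main.nf or astrocyte_pkg.yml.

```shell
# Print every expected file that is missing from a test-data directory.
# missing_test_files is a hypothetical helper, not part of astrocyte_cli.
missing_test_files() {
    local dir="$1"
    shift
    local name
    for name in "$@"; do
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage (file names are placeholders):
# missing_test_files test_data rep1_R1.fastq.gz rep1_R2.fastq.gz
```

An empty result means every listed file was found. Any names printed are the ones to add to test_data/ before testing.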
vizapp/
This folder contains the downstream visualization files for the Astrocyte workflow. Currently, only R Shiny is supported.