Commit 7e7587e9 authored by Felix Perez

Continue building up the workflow script using the container approach.

parent 9fbdd656
1 merge request: !5 Complete workflow using the conda env approach.
......@@ -13,12 +13,17 @@ author: 'Felix Perez, Achisha Saikia, Peng Lian'
# A contact email address for questions
email: 'felix.perez@utsouthwestern.edu, achisha.saikia@utsouthwestern.edu, biohpc-help@utsouthwestern.edu'
# A more informative title for the workflow package
title: 'ATAC-seq Source Workflow'
description: |
# TODO: Please describe the workflow. (AS)
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq.
The pipeline can be run end-to-end, starting from raw FASTQ files all the way to peak calling
and signal track generation using a single caper submit command. One can also start the pipeline from intermediate
stages (for example, using alignment files as input). The pipeline supports both single-end and paired-end data as
well as replicated or non-replicated datasets. The outputs produced by the pipeline include 1) formatted HTML reports
that include quality control measures specifically designed for ATAC-seq and DNase-seq data, 2) analysis of
reproducibility, 3) stringent and relaxed thresholding of peaks, 4) fold-enrichment and p-value signal tracks.
#### New Features in Astrocyte 0.4.0 and above ####
citation: |
......
singularity {
enabled = true
runOptions = '\
--bind /cm/shared/apps/slurm/16.05.8,/etc/slurm,/cm/shared/apps/slurm/var/etc/,/usr/lib64/libreadline.so.6 \
--bind /usr/lib64/libhistory.so.6,/usr/lib64/libtinfo.so.5,/var/run/munge,/usr/lib64/libmunge.so.2 \
--bind /usr/lib64/libmunge.so.2.0.0,/cm/shared/apps/slurm/16.05.8/lib64/slurm/ \
--bind /cm/shared/apps/java/oracle/jdk1.8.0_231'
// Please do NOT use "--disable-cache" in this runOptions.
// Starting from version 2.0.0, the astrocyte_cli will clean up the cache automatically.
// runOptions = '--bind /vagrant:/vagrant' // Use this one for vagrant development env only
......
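Each `--bind` entry in `runOptions` names a host path that must exist on whichever node runs the container; Singularity typically fails at container start when a bind source is missing. The helper below is a hypothetical pre-flight check that is not part of this commit. It is a sketch that assumes comma-separated `src` or `src:dest[:opts]` entries, the same form used in the `--bind` lists above, and it only reports missing host-side paths.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of this commit: report any host-side
# path in a comma-separated Singularity --bind list that is missing on this node.
# Entries may be "src" or "src:dest[:opts]"; only "src" is checked.
check_bind_sources() {
  local spec="$1" entry src missing=0
  local -a entries
  IFS=',' read -ra entries <<< "$spec"
  for entry in "${entries[@]}"; do
    src="${entry%%:*}"            # drop ":dest[:opts]" if present
    [[ -z "$src" ]] && continue   # skip empty entries (e.g. a trailing comma)
    if [[ ! -e "$src" ]]; then
      echo "missing bind source: $src" >&2
      missing=$((missing + 1))
    fi
  done
  (( missing == 0 ))              # succeed only if every source exists
}

# Example: a few of the Slurm/Munge paths bound in runOptions.
check_bind_sources "/etc/slurm,/var/run/munge,/usr/lib64/libmunge.so.2" \
  || echo "some bind sources are absent on ${HOSTNAME:-this node}" >&2
```

Running it on a compute node before submitting gives a quick signal when a path such as the Slurm 16.05.8 install directory has moved. This is easier to diagnose than a container that fails partway through the Nextflow process.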
/*
* Copyright (c) 2024. The University of Texas Southwestern Medical Center
*
* TODO: (AC) Brief description of ATAC-seq (DONE)
* ATAC-seq is a molecular biology technique that assesses chromatin accessibility in a genome.
* It uses a hyperactive Tn5 transposase to insert sequencing adapters into open chromatin regions,
* allowing researchers to identify and sequence these accessible genomic regions. ATAC-seq is widely
* used to study gene regulation, identify enhancers and promoters, and gain insights into chromatin structure.
*
* @authors
* Felix Perez, Achisha Saikia
*
......@@ -23,10 +26,35 @@ process runSource {
output:
file '*'
shell:
'''
# Allow for the container to use the libraries & paths of Slurm on BioHPC.
export LD_LIBRARY_PATH=/atac/jdk-12/lib:/usr/lib64:/lib:$LD_LIBRARY_PATH
export PATH=/atac/jdk-12:/atac/jdk-12/bin:/bin:/cm/shared/apps/slurm/16.05.8/bin:$PATH
# Give the container the SlurmUser (user and group) entries used on Nucleus.
echo "slurm:x:450:450::/cm/local/apps/slurm:/bin/bash" >> /etc/passwd
echo "slurm:x:450:" >> /etc/group
# Source the container's entrypoint script to have access to the caper
# commands to run the ATAC-seq pipeline in the runner.
source /atac/entrypoint.sh
# Record the relevant software versions.
java -version 2> java_version.txt
sinfo -V > slurm_version.txt
caper --version > caper_version.txt
# Launch the ATAC-seq leader job.
submit=$(caper hpc submit !{baseDir}/external_repo/astrocyte-atac-runner/atac.wdl -i !{inputJson} --singularity --leader-job-name atac-source)
# Monitor the state of the leader job; once it enters the COMPLETED, FAILED, or CANCELLED state, finish the workflow process.
state=$(bash !{baseDir}/scripts/checkJobState.sh "${submit}")
echo "Lead Job state check $(date) - State: $state" >> lead_job_check.txt
while [[ "$state" != *"COMPLETED"* ]] && [[ "$state" != *"FAILED"* ]] && [[ "$state" != *"CANCELLED"* ]]; do
sleep 15
state=$(bash !{baseDir}/scripts/checkJobState.sh "${submit}")
echo "Lead Job state check $(date) - State: $state" >> lead_job_check.txt
done
'''
}
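The monitoring loop in `runSource` exits only on COMPLETED, FAILED, or CANCELLED. A leader job that ends in another terminal Slurm state, such as TIMEOUT, would keep the process polling indefinitely. The function below is a sketch of the same loop, not the commit's code. It adds a few more terminal states and a poll limit. The function name, the extra state list, and the `LEAD_JOB_LOG` override are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only, not the commit's code: poll a state-reporting command until it
# prints a terminal Slurm state, giving up after a fixed number of polls.
# Usage: wait_for_terminal_state <interval-seconds> <max-polls> <command...>
# The extra terminal states (TIMEOUT, OUT_OF_MEMORY, NODE_FAIL, BOOT_FAIL,
# DEADLINE) are an assumption beyond the three states the commit checks.
wait_for_terminal_state() {
  local interval="$1" max_polls="$2"
  shift 2
  local log="${LEAD_JOB_LOG:-lead_job_check.txt}"  # hypothetical override
  local state="" polls=0
  while (( polls < max_polls )); do
    state=$("$@")
    polls=$((polls + 1))
    echo "Lead Job state check $(date) - State: $state" >> "$log"
    case "$state" in
      *COMPLETED*|*FAILED*|*CANCELLED*|*TIMEOUT*|*OUT_OF_MEMORY*|*NODE_FAIL*|*BOOT_FAIL*|*DEADLINE*)
        echo "$state"     # report the terminal state on stdout
        return 0 ;;
    esac
    sleep "$interval"
  done
  echo "gave up after $polls polls (last state: $state)" >&2
  return 1
}
```

Inside the `shell:` block it could be called as `wait_for_terminal_state 15 5760 bash !{baseDir}/scripts/checkJobState.sh "${submit}"`. The cap of 5760 polls at 15 s each is an arbitrary 24-hour limit. Passing the state check as a command keeps the loop independent of how the state is obtained.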
#!/bin/bash
# Get the jobID of the caper lead job from the submission message passed as the
# first argument (the fourth word, as in sbatch's "Submitted batch job <jobID>").
read -ra line <<< "$1"
jobID=${line[3]}
# Query Slurm for the state of the caper lead job.
jobq=$(sacct --format State -j $jobID)
# Return the caper lead job state.
IFS=$'\n' read -rd '' -a jobstate <<< "$jobq"
echo $(echo "${jobstate[2]}" | xargs)
\ No newline at end of file
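`checkJobState.sh` makes two positional assumptions. The job ID is the fourth word of the submission message, and the job's state is the third line of default `sacct` output, after the header and the dashed separator. The sketch below splits those two steps into hypothetical standalone functions so they can be tested without a Slurm cluster. As an untested alternative, passing `--noheader --allocations` to `sacct` should drop the header lines and the per-step rows, which would move the state to the first line.

```shell
#!/usr/bin/env bash
# Sketch: the two parsing steps of checkJobState.sh as standalone functions so
# they can be exercised without a Slurm cluster. Function names are hypothetical.

# Print the fourth whitespace-separated word of the submission message, which
# is the job ID in sbatch's "Submitted batch job <id>" line.
job_id_from_submit_msg() {
  local -a words
  read -ra words <<< "$1"
  echo "${words[3]}"
}

# Read default `sacct --format State` output on stdin (header, dashed
# separator, then one row per job or step) and print the first data row,
# with surrounding whitespace removed.
first_state_from_sacct() {
  local -a rows
  local state
  IFS=$'\n' read -rd '' -a rows || true   # read returns 1 at EOF; ignore it
  state="${rows[2]}"
  state="${state#"${state%%[![:space:]]*}"}"   # trim leading whitespace
  state="${state%"${state##*[![:space:]]}"}"   # trim trailing whitespace
  echo "$state"
}
```

On a cluster the two would compose as `sacct --format State -j "$(job_id_from_submit_msg "$msg")" | first_state_from_sacct`. This assumes caper relays the sbatch message unchanged, which the original script's `${line[3]}` lookup already relies on.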