
RNA-Seq Analytic Pipeline for GUDMAP/RBK

Introduction

This pipeline was created to be a standard mRNA-sequencing analysis pipeline that integrates with the GUDMAP and RBK consortium data-hub. It is designed to run on the HPC cluster (BioHPC) at UT Southwestern Medical Center, in conjunction with the standard Nextflow profile (config: biohpc.config).

Authentication:

The consortium server used must be authenticated with the deriva authentication client, and must remain authenticated until the end of the pipeline run. Closing the client prematurely invalidates the tokens and may cause the pipeline to fail. Support for long-lived "globus" tokens is on the roadmap.
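
Because a run can fail when the deriva-auth files are missing, a quick pre-flight check before launching may save a wasted job. The sketch below is a hypothetical helper (`missing_deriva_files` is not part of the pipeline); it assumes the `credential.json` and `cookies.txt` files listed under the parameters live in `~/.deriva/`, and it only checks that they exist, not that the tokens inside are still valid.

```python
from pathlib import Path

# File names written by deriva-auth, as listed in this README's parameters.
DERIVA_FILES = ("credential.json", "cookies.txt")


def missing_deriva_files(deriva_dir: Path = Path.home() / ".deriva") -> list[str]:
    """Return the expected deriva-auth files that are absent from deriva_dir.

    Hypothetical pre-flight helper: an empty list means both files exist,
    but says nothing about whether the tokens have expired.
    """
    return [name for name in DERIVA_FILES if not (deriva_dir / name).is_file()]
```

For example, `missing_deriva_files()` returning `['cookies.txt']` would suggest re-running deriva-auth before passing `--deriva` and `--bdbag`.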

To Run:

  • Available parameters:

    • --deriva active credential.json file from deriva-auth
    • --bdbag active cookies.txt file from deriva-auth
    • --repRID mRNA-seq replicate RID
    • --source consortium server source
    • --refMoVersion mouse reference version (optional, default = 38.p6.vM25)
    • --refHuVersion human reference version (optional, default = 38.p13.v36)
    • --refERCCVersion ERCC spike-in reference version (optional, default = 92)
    • --upload option to upload outputs back to the data-hub (optional, default = false)
      • true = upload outputs to the data-hub
      • false = do NOT upload outputs to the data-hub
    • -profile config profile to use (optional):
      • default = process on BioHPC cluster
      • biohpc = process on BioHPC cluster
      • biohpc_max = process on high-power BioHPC cluster nodes (>= 128 GB nodes), for resource testing
      • aws_ondemand = AWS Batch on-demand instance requests
      • aws_spot = AWS Batch spot instance requests
    • --email email address(es) to send failure notification (comma separated) (optional):
      • e.g.: --email 'Venkat.Malladi@utsouthwestern.edu,Gervaise.Henry@UTSouthwestern.edu'
  • NOTES:

    • once deriva-auth is run and authenticated, the two files above (credential.json and cookies.txt) are saved in ~/.deriva/ (see the official deriva documentation for the lifetime of the credentials)
    • the reference version consists of the Genome Reference Consortium version, the patch release and the GENCODE annotation release number (leaving the params blank uses the default version tied to the pipeline version)
      • current mouse 38.p6.vM25 = GRCm38.p6 with GENCODE annotation release M25
      • current human 38.p13.v36 = GRCh38.p13 with GENCODE annotation release 36
  • Optional input overrides

    • --refSource source for pulling references
      • biohpc = source references from BICF_Core gudmap reference local location (workflow must be run on BioHPC system)
      • datahub = source references from GUDMAP/RBK reference_table location (currently uses dev.gudmap.org)
    • --inputBagForce utilizes a local replicate inputBag instead of downloading from the data-hub (still requires accurate repRID input)
      • e.g.: --inputBagForce test_data/bag/Q-Y5F6_inputBag_xxxxxxxx.zip (must have the expected bag structure; this example will not work because it is a test bag)
    • --fastqsForce utilizes local fastq's instead of downloading from the data-hub (still requires accurate repRID input)
      • e.g.: --fastqsForce 'test_data/fastq/small/Q-Y5F6_1M.R{1,2}.fastq.gz' (note the quotes around the fastq path; the files must be named in the standard form [*.R1.fastq.gz and/or *.R2.fastq.gz] and listed in the correct order; also consider using --endsForce if the endness doesn't match the submitted value)
    • --speciesForce forces the species to be "Mus musculus" or "Homo sapiens"; it bypasses a metadata mismatch or an ambiguous species error
      • e.g.: --speciesForce 'Mus musculus'
    • --endsForce forces the endness to be "se" or "pe"; it bypasses a metadata mismatch error
      • e.g.: --endsForce 'pe'
    • --strandedForce forces the strandedness to be "forward", "reverse" or "unstranded"; it bypasses a metadata mismatch error
      • e.g.: --strandedForce 'unstranded'
    • --spikeForce forces the spike-in to be "false" or "true"; it bypasses a metadata mismatch error
      • e.g.: --spikeForce 'true'
  • Tracking parameters (Tracking Site):

    • --ci boolean (default = false)
    • --dev boolean (default = true)
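
The reference-version convention described in the notes can be made concrete with a short parser. This is a sketch that assumes every version string has the same `<GRC version>.p<patch>.v<GENCODE release>` shape as the two defaults; the names `parse_ref_version` and `RefVersion` are hypothetical and not part of the pipeline. It also relies on mouse GENCODE releases carrying an `M` prefix (e.g. M25) while human releases do not.

```python
import re
from typing import NamedTuple

# Assumed shape of --refMoVersion / --refHuVersion values, e.g. "38.p6.vM25".
_REF_VERSION = re.compile(r"^(?P<grc>\d+)\.(?P<patch>p\d+)\.v(?P<gencode>M?\d+)$")
_GRC_PREFIX = {"Mus musculus": "GRCm", "Homo sapiens": "GRCh"}


class RefVersion(NamedTuple):
    assembly: str  # Genome Reference Consortium assembly + patch, e.g. "GRCm38.p6"
    gencode: str   # GENCODE annotation release, e.g. "M25"


def parse_ref_version(version: str, species: str) -> RefVersion:
    """Expand a reference version string into assembly and GENCODE release."""
    match = _REF_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised reference version: {version!r}")
    if species not in _GRC_PREFIX:
        raise ValueError(f"unsupported species: {species!r}")
    gencode = match["gencode"]
    # Mouse GENCODE releases are prefixed with "M"; human releases are not.
    if gencode.startswith("M") != (species == "Mus musculus"):
        raise ValueError(f"GENCODE release {gencode} does not match {species}")
    assembly = f"{_GRC_PREFIX[species]}{match['grc']}.{match['patch']}"
    return RefVersion(assembly, gencode)
```

With the current defaults, `parse_ref_version("38.p6.vM25", "Mus musculus")` gives `RefVersion(assembly='GRCm38.p6', gencode='M25')`, matching the mouse note above.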

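The override rules above (fixed value sets for the `--*Force` parameters, plus the `*.R1.fastq.gz` / `*.R2.fastq.gz` naming and ordering for `--fastqsForce`) can be checked before a job is submitted. The sketch below is a hypothetical wrapper-side check, not something the pipeline provides; `override_args` and `check_fastq_names` are illustrative names, and it assumes a single-end run supplies only the R1 file.

```python
# Allowed values for the --*Force overrides, as listed in this README.
FORCE_CHOICES: dict[str, set[str]] = {
    "speciesForce": {"Mus musculus", "Homo sapiens"},
    "endsForce": {"se", "pe"},
    "strandedForce": {"forward", "reverse", "unstranded"},
    "spikeForce": {"true", "false"},
}


def override_args(**overrides: str) -> list[str]:
    """Validate keyword overrides and return them as command-line params."""
    args: list[str] = []
    for name, value in overrides.items():
        choices = FORCE_CHOICES.get(name)
        if choices is None:
            raise ValueError(f"unknown override: --{name}")
        if value not in choices:
            allowed = ", ".join(sorted(choices))
            raise ValueError(f"--{name} must be one of: {allowed}")
        args += [f"--{name}", value]
    return args


def check_fastq_names(paths: list[str]) -> None:
    """Check fastq naming and order: R1 first, then R2 (paired-end only)."""
    if not 1 <= len(paths) <= 2:
        raise ValueError("expected one (se) or two (pe) fastq files")
    # zip stops at the shorter sequence, so a single file is checked against R1 only.
    for path, suffix in zip(paths, (".R1.fastq.gz", ".R2.fastq.gz")):
        if not path.endswith(suffix):
            raise ValueError(f"{path!r} should end with {suffix!r}")
```

The list returned by `override_args(endsForce="pe", spikeForce="true")` could then be appended to whatever `nextflow run` invocation you already use; rejecting bad values locally avoids a failed cluster job.
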
FULL EXAMPLE: