RNA-Seq Analytic Pipeline for GUDMAP/RBK
Introduction
This pipeline was created to be a standard mRNA-sequencing analysis pipeline which integrates with the GUDMAP and RBK consortium data-hub.
Cloud Compatibility:
This pipeline is also capable of being run on AWS. To do so:
- Build an AWS Batch queue and environment, either manually or with aws-cloudformation
- Edit one of the aws configs in workflow/config/
  - Replace workDir with the S3 bucket generated
  - Change region if different
  - Change queue to the AWS Batch queue generated
- The user must have awscli configured with appropriate authentication (with `aws configure` and access keys) in the environment in which nextflow will be run
- Add `-profile` with the name of the aws config which was customized
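The last two steps above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function names `check_aws_ready` and `aws_profile_args` are hypothetical, `aws sts get-caller-identity` is used here as one way to confirm that the awscli credentials resolve, and the two profile names are assumed to be the AWS profiles listed under the available parameters.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; they are not part of the pipeline.

# Succeeds only if awscli is installed and its credentials resolve.
check_aws_ready() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "awscli not found on PATH" >&2
    return 1
  fi
  # This call fails when no valid credentials are configured.
  if ! aws sts get-caller-identity >/dev/null 2>&1; then
    echo "awscli is not authenticated; run 'aws configure' first" >&2
    return 1
  fi
}

# Prints the -profile argument for one of the two AWS profiles
# (assumed names: aws_ondemand, aws_spot); defaults to aws_ondemand.
aws_profile_args() {
  local profile="${1:-aws_ondemand}"
  case "$profile" in
    aws_ondemand|aws_spot) echo "-profile ${profile}" ;;
    *) echo "not an AWS profile: ${profile}" >&2; return 1 ;;
  esac
}
```

For example, `check_aws_ready && nextflow run workflow/rna-seq.nf --deriva ./data/credential.json --bdbag ./data/cookies.txt --repRID Q-Y5JA $(aws_profile_args aws_spot)` would launch the run only when the credentials check passes.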
To Run:
- Available parameters:
  - `--deriva` active credential.json file from deriva-auth
  - `--bdbag` active cookies.txt file from deriva-auth
  - `--repRID` mRNA-seq replicate RID
  - `--refMoVersion` mouse reference version (optional)
  - `--refHuVersion` human reference version (optional)
  - `-profile` config profile to use (optional): standard = local processes on BioHPC (default), biohpc = BioHPC cluster, aws_ondemand = AWS Batch on-demand instance requests, aws_spot = AWS Batch spot instance requests
- NOTES:
  - once deriva-auth is run and authenticated, the two files above are saved in `~/.deriva/` (see official documents from deriva on the lifetime of the credentials)
  - reference version consists of Genome Reference Consortium version, patch release and GENCODE annotation release number (leaving the params blank will use the default version tied to the pipeline version)
    - current mouse 38.p6.vM22 = GRCm38.p6 with GENCODE annotation release M22
    - current human 38.p12.v31 = GRCh38.p12 with GENCODE annotation release 31
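The reference version string described above can be unpacked mechanically. The sketch below is illustrative only: `describe_ref_version` is a hypothetical helper, and it assumes the string always has the form `<assembly>.p<patch>.v<GENCODE release>`, as in the mouse example.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the pipeline.
# Usage: describe_ref_version <mouse|human> <version>
describe_ref_version() {
  local species="$1" version="$2"
  local prefix assembly rest patch annotation

  case "$species" in
    mouse) prefix="GRCm" ;;
    human) prefix="GRCh" ;;
    *) echo "unknown species: ${species}" >&2; return 1 ;;
  esac

  # Require the assumed <assembly>.p<patch>.v<release> shape.
  case "$version" in
    *.p*.v*) ;;
    *) echo "unexpected version format: ${version}" >&2; return 1 ;;
  esac

  assembly="${version%%.*}"   # e.g. 38
  rest="${version#*.}"        # e.g. p6.vM22
  patch="${rest%%.*}"         # e.g. p6
  annotation="${rest#*.}"     # e.g. vM22

  echo "${prefix}${assembly}.${patch} with GENCODE annotation release ${annotation#v}"
}

describe_ref_version mouse 38.p6.vM22
# prints: GRCm38.p6 with GENCODE annotation release M22
```

The same string is what would be passed on the command line, for example `--refMoVersion 38.p6.vM22`.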
FULL EXAMPLE:
nextflow run workflow/rna-seq.nf --deriva ./data/credential.json --bdbag ./data/cookies.txt --repRID Q-Y5JA
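A small wrapper can assemble this command from the files that deriva-auth saves and add the optional reference versions only when they are given. This is a sketch under stated assumptions: `build_run_command` and the `DERIVA_DIR` variable are hypothetical names, and the file names `credential.json` and `cookies.txt` inside `~/.deriva/` are assumed from the parameter descriptions above. The function prints the command instead of running it, so it can be inspected first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; not part of the pipeline.
# Usage: build_run_command <repRID> [refMoVersion] [refHuVersion]
build_run_command() {
  local rep_rid="${1:-}" ref_mo="${2:-}" ref_hu="${3:-}"
  # DERIVA_DIR is an assumed override; the default follows the NOTES above.
  local deriva_dir="${DERIVA_DIR:-$HOME/.deriva}"
  local f cmd

  if [ -z "$rep_rid" ]; then
    echo "usage: build_run_command <repRID> [refMoVersion] [refHuVersion]" >&2
    return 1
  fi

  # Both credential files must exist before the pipeline can authenticate.
  for f in credential.json cookies.txt; do
    if [ ! -f "${deriva_dir}/${f}" ]; then
      echo "missing ${deriva_dir}/${f}; run deriva-auth first" >&2
      return 1
    fi
  done

  cmd="nextflow run workflow/rna-seq.nf --deriva ${deriva_dir}/credential.json --bdbag ${deriva_dir}/cookies.txt --repRID ${rep_rid}"
  # Optional parameters are appended only when supplied, so the pipeline defaults apply otherwise.
  if [ -n "$ref_mo" ]; then cmd="${cmd} --refMoVersion ${ref_mo}"; fi
  if [ -n "$ref_hu" ]; then cmd="${cmd} --refHuVersion ${ref_hu}"; fi

  echo "$cmd"
}
```

Calling `build_run_command Q-Y5JA` prints the command for the replicate used in the example above, with the credential paths pointing at `~/.deriva/`.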
Credits
This workflow was developed by the Bioinformatics Core Facility (BICF), Department of Bioinformatics.
PI
Venkat S. Malladi
Faculty Associate & Director
Bioinformatics Core Facility
UT Southwestern Medical Center
orcid.org/0000-0002-0144-0564
venkat.malladi@utsouthwestern.edu
Developers
Gervaise H. Henry
Computational Biologist
Department of Urology
UT Southwestern Medical Center
orcid.org/0000-0001-7772-9578
gervaise.henry@utsouthwestern.edu
Jonathan Gesell
Computational Biologist
Bioinformatics Core Facility
UT Southwestern Medical Center
orcid.org/0000-0001-5902-3299
johnathan.gesell@utsouthwestern.edu
Jeremy A. Mathews
Computational Intern
Bioinformatics Core Facility
UT Southwestern Medical Center
orcid.org/0000-0002-2931-1430
jeremy.mathews@utsouthwestern.edu
Please cite in publications: Pipeline was developed by BICF from funding provided by Cancer Prevention and Research Institute of Texas (RP150596).