Commit f8e44f43 authored by Gervaise Henry

Merge branch 'seqwho' into 'develop'

Seqwho

Closes #49 and #122

See merge request !71
parents 4d7e5c98 eba289dc
2 merge requests: !76 Develop, !71 Seqwho
Pipeline #9470 passed with stages in 7 minutes and 26 seconds
Showing
with 620 additions and 1249 deletions
......@@ -4,7 +4,11 @@
* Strandedness metadata "yes"/"no" changed to boolean "t"/"f" in data-hub, pipeline updated to handle (#70) ("yes"/"no" still acceptable for backwards compatibility)
* Upload empty mRNA_QC entry if data error (#111)
* Allow forcing of strandedness and spike (#100)
* Add seqwho
* Add seqwho results to multiqc report
* Modify repository structure to allow for use with XPACK-DNANEXUS
* Add override for endness
* Add seqtk to references
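
As a rough illustration of the backwards-compatible strandedness handling noted in this list, the sketch below accepts both the new "t"/"f" values and the legacy "yes"/"no" values. The function name `parse_stranded_flag` is hypothetical and is not taken from the pipeline; it only shows one way such a mapping could be written.

```python
def parse_stranded_flag(value):
    """Map data-hub strandedness metadata to a boolean.

    Hypothetical helper: accepts the newer "t"/"f" values and the
    legacy "yes"/"no" values, ignoring case and surrounding whitespace.
    """
    normalized = value.strip().lower()
    if normalized in ("t", "yes"):
        return True
    if normalized in ("f", "no"):
        return False
    raise ValueError(f"unrecognized strandedness value: {value!r}")
```
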
**Background**
* Add memory limit (75%) per thread for samtools sort (#108)
......@@ -27,11 +31,18 @@
* Add new CI py tests for override and integration
* Fix fastq file and species error status detail bug (#118)
* Make compatible with XPACK-DNANEXUS
* Don't download fastq's if fastq override present
* Override fastq count to override counts
* Change ambiguous species ci to wrong species
*Known Bugs*
* Override params (inputBag, fastq, species) aren't checked for integrity
* Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)
* Check for outputBag in hatrac doesn't check for any uploaded by chaise
* CI container cache will fail if cache folder is not owned by CI runner user
* CI container cache will not error if container failed to pull
* CI (container cache, version collection, and unit tests) will not work correctly if containers referred to in nextflow.config aren't prefixed perfectly with: "container = "
* also, it is assumed that the containers are on dockerhub and don't have the "docker://" prefix
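
The prefix requirement in the last known bug can be illustrated with a small sketch. It assumes, hypothetically, that the CI scripts find images by scanning nextflow.config for lines that begin with `container = `; the function name and the parsing approach are illustrative only and are not the pipeline's actual CI code.

```python
import re

# Matches lines of the form: container = 'name:tag' (single or double quotes).
CONTAINER_LINE = re.compile(r"""^container = (['"])(?P<image>[^'"]+)\1$""")


def list_containers(config_text):
    """Return container images named on lines prefixed exactly with 'container = '."""
    images = []
    for line in config_text.splitlines():
        match = CONTAINER_LINE.match(line.strip())
        if match:
            images.append(match.group("image"))
    return images


example = """
withName:getBag {
  container = 'gudmaprbk/deriva1.4:1.0.0'
}
withName:fastqc {
  container='gudmaprbk/fastqc0.11.9:1.0.0'
}
"""
print(list_containers(example))
```

In this sketch the second entry is skipped because it lacks the spaces around `=`, which is the kind of silent miss a strict prefix match would produce.
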
<hr>
......
......@@ -56,12 +56,14 @@ To Run:
* `--inputBagForce` utilizes a local replicate inputBag instead of downloading from the data-hub (still requires accurate repRID input)
* eg: `--inputBagForce test_data/bag/Q-Y5F6_inputBag_xxxxxxxx.zip` (must be the expected bag structure, this example will not work because it is a test bag)
* `--fastqsForce` utilizes local fastq's instead of downloading from the data-hub (still requires accurate repRID input)
* eg: `--fastqsForce 'test_data/fastq/small/Q-Y5F6_1M.R{1,2}.fastq.gz'` (note the quotes around fastq's which must be named in the correct standard [*\*.R1.fastq.gz and/or \*.R2.fastq.gz*] and in the correct order)
* eg: `--fastqsForce 'test_data/fastq/small/Q-Y5F6_1M.R{1,2}.fastq.gz'` (note the quotes around fastq's which must be named in the correct standard [*\*.R1.fastq.gz and/or \*.R2.fastq.gz*] and in the correct order, also consider using `endsForce` if the endness doesn't match submitted value)
* `--speciesForce` forces the species to be "Mus musculus" or "Homo sapiens", it bypasses a metadata mismatch or an ambiguous species error
* eg: `--speciesForce 'Mus musculus'`
* `--endsForce` forces the endness to be "se", or "pe", it bypasses a metadata mismatch error
* eg: `--endsForce 'pe'`
* `--strandedForce` forces the strandedness to be "forward", "reverse" or "unstranded", it bypasses a metadata mismatch error
* eg: `--strandedForce 'unstranded'`
* `--spikeForce` forces the spike-in to be "false" or "true", it bypasses a metadata mismatch error
* `--spikeForce` forces the spike-in to be "false", or "true", it bypasses a metadata mismatch error
* eg: `--spikeForce 'true'`
* Tracking parameters ([Tracking Site](http://bicf.pipeline.tracker.s3-website-us-east-1.amazonaws.com/)):
* `--ci` boolean (default = false)
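
Because the override parameters are not checked for integrity by the pipeline (see the known bugs in the changelog), it may help to validate them before launching a run. The sketch below is an assumption-laden helper, not part of the pipeline: the function name and dictionary layout are hypothetical, while the accepted values are the ones documented for each flag.

```python
# Accepted values as documented for each override flag.
ALLOWED_OVERRIDES = {
    "speciesForce": {"Mus musculus", "Homo sapiens"},
    "endsForce": {"se", "pe"},
    "strandedForce": {"forward", "reverse", "unstranded"},
    "spikeForce": {"false", "true"},
}


def invalid_overrides(overrides):
    """Return the names of overrides whose value is not documented (sorted)."""
    return sorted(
        name
        for name, value in overrides.items()
        if name in ALLOWED_OVERRIDES and value not in ALLOWED_OVERRIDES[name]
    )
```

For example, `invalid_overrides({"endsForce": "pe", "spikeForce": "yes"})` flags only `spikeForce`.
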
......@@ -86,6 +88,15 @@ This pipeline is also capable of being run on AWS and DNAnexus. To do so:
* Add `-profile` with the name aws config which was customized
### DNAnexus (utilizes the [DNAnexus extension package for Nextflow (XPACK-DNANEXUS)](https://github.com/seqeralabs/xpack-dnanexus))
* Follow the instructions from [XPACK-DNANEXUS](https://github.com/seqeralabs/xpack-dnanexus) about installing and authenticating (a valid license must be available for the extension package from Seqera Labs, as well as a subscription with DNAnexus)
* The nf-dxapp needs to be built with a custom scm config to allow nextflow to pull the pipeline from the UTSW self-hosted GitLab server (git.biohpc.swmed.edu)
```
providers {
bicf {
server = 'https://git.biohpc.swmed.edu'
platform = 'gitlab'
}
}
```
* Follow the instructions from [XPACK-DNANEXUS](https://github.com/seqeralabs/xpack-dnanexus) about launching runs. A template *json* file has been included ([dnanexusExample.json](docs/dnanexusExample.json))
* `[version]` should be replaced with the pipeline version required (eg: `v2.0.0`)
* `[credential.json]` should be replaced with the location of the credential file output by authentication with Deriva
......@@ -110,7 +121,11 @@ Error reported back to the data-hub are (they aren't thrown on the command line
|**Number of fastqs detected does not match submitted endness**|Single-end sequenced replicates can only have one fastq, while paired\-end can only have two (see above).|
|**Number of reads do not match for R1 and R2**|For paired\-end sequenced studies the number of reads in read\-1 fastq must match that of read\-2. This error is usually indicative of uploading corrupted, truncated, or wrong fastq files.|
|**There is an error with the structure of the fastq**|The fastq's fail a test of their structure. This error is usually indicative of uploading corrupted, truncated, or wrong fastq files.|
|**Inference of species returns an ambiguous result**|The species of the replicate is inferred by aligning a random subset of 1 million reads from the data to both the human and mouse reference genomes. If there isn't a clear difference between the alignment rates (`>=40%` of one species, but `<40%` of the other), then this error is detected.|
|**Infered species does not match for R1 and R2**|The species inferred from each read does not match. This error is usually indicative of uploading of wrong fastq files.|
|**Infered species confidence is low**|The confidence of the species inference call is low. This is usually indicative of very low quality samples.|
|**Infered sequencing type is not mRNA-seq**|The sequence type inferred is not mRNA-seq. This is usually indicative of uploading wrong fastq files.|
|**Infered sequencing type does not match for R1 and R2**|The sequencing type inferred from each read does not match. This error is usually indicative of uploading of wrong fastq files.|
|**Infered species confidence is low**|The confidence of the species inference call is low AND 3 sets of a random sampling of the fastq's do not match. This is usually indicative of very low quality samples.|
|**Submitted metadata does not match inferred**|All required metadata for analysis of the data is internally inferred by the pipeline. If any of those do not match the submitted metadata, this error is detected to notify of a potential error. The mismatched metadata will be listed.|
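
The fastq-count and read-count checks in the table can be sketched as follows. This is only an illustration under stated assumptions: the function name, arguments, and returned messages are hypothetical and do not reproduce the pipeline's implementation.

```python
def check_fastqs(ends, read_counts):
    """Check fastq count against endness and, for paired-end, R1/R2 read counts.

    ends: "se" or "pe"; read_counts: reads per fastq, in R1, R2 order.
    Returns a list of problem descriptions (empty when both checks pass).
    """
    expected = {"se": 1, "pe": 2}[ends]
    problems = []
    if len(read_counts) != expected:
        problems.append("number of fastqs does not match endness")
    elif ends == "pe" and read_counts[0] != read_counts[1]:
        problems.append("number of reads does not match for R1 and R2")
    return problems
```
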
<hr>
......
{
"bag": {
"bag_name": "Execution_Run_{rid}",
"bag_algorithms": [
"md5"
],
"bag_archiver": "zip",
"bag_metadata": {}
},
"catalog": {
"catalog_id": "2",
"query_processors": [
{
"processor": "csv",
"processor_params": {
"output_path": "Execution_Run",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/RID,Replicate_RID:=Replicate,Workflow_RID:=Workflow,Reference_Genone_RID:=Reference_Genome,Input_Bag_RID:=Input_Bag,Notes,Execution_Status,Execution_Status_Detail,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Workflow",
"query_path": "/entity/M:=RNASeq:Execution_Run/RID=17-BPAG/RNASeq:Workflow?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Reference_Genome",
"query_path": "/entity/M:=RNASeq:Execution_Run/RID=17-BPAG/RNASeq:Reference_Genome?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Input_Bag",
"query_path": "/entity/M:=RNASeq:Execution_Run/RID=17-BPAG/RNASeq:Input_Bag?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "mRNA_QC",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/(RID)=(RNASeq:mRNA_QC:Execution_Run)/RID,Execution_Run_RID:=Execution_Run,Replicate_RID:=Replicate,Paired_End,Strandedness,Median_Read_Length,Raw_Count,Final_Count,Notes,RCT,RMT?limit=none"
}
},
{
"processor": "fetch",
"processor_params": {
"output_path": "assets/Study/{Study_RID}/Experiment/{Experiment_RID}/Replicate/{Replicate_RID}/Execution_Run/{Execution_Run_RID}/Output_Files",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/R:=RNASeq:Replicate/$M/(RID)=(RNASeq:Processed_File:Execution_Run)/url:=File_URL,length:=File_Bytes,filename:=File_Name,md5:=File_MD5,Execution_Run_RID:=M:RID,Study_RID:=R:Study_RID,Experiment_RID:=R:Experiment_RID,Replicate_RID:=R:RID?limit=none"
}
},
{
"processor": "fetch",
"processor_params": {
"output_path": "assets/Study/{Study_RID}/Experiment/{Experiment_RID}/Replicate/{Replicate_RID}/Execution_Run/{Execution_Run_RID}/Input_Bag",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/R:=RNASeq:Replicate/$M/RNASeq:Input_Bag/url:=File_URL,length:=File_Bytes,filename:=File_Name,md5:=File_MD5,Execution_Run_RID:=M:RID,Study_RID:=R:Study_RID,Experiment_RID:=R:Experiment_RID,Replicate_RID:=R:RID?limit=none"
}
}
]
}
}
\ No newline at end of file
{
"bag": {
"bag_name": "{rid}_inputBag",
"bag_algorithms": [
"md5"
],
"bag_archiver": "zip"
},
"catalog": {
"query_processors": [
{
"processor": "csv",
"processor_params": {
"output_path": "Study",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Study_RID)=(RNASeq:Study:RID)/Study_RID:=RID,Internal_ID,Title,Summary,Overall_Design,GEO_Series_Accession_ID,GEO_Platform_Accession_ID,Funding,Pubmed_ID,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment:RID)/Experiment_RID:=RID,Study_RID,Internal_ID,Name,Description,Experiment_Method,Experiment_Type,Species,Specimen_Type,Molecule_Type,Pooled_Sample,Pool_Size,Markers,Cell_Count,Treatment_Protocol,Treatment_Protocol_Reference,Isolation_Protocol,Isolation_Protocol_Reference,Growth_Protocol,Growth_Protocol_Reference,Label_Protocol,Label_Protocol_Reference,Hybridization_Protocol,Hybridization_Protocol_Reference,Scan_Protocol,Scan_Protocol_Reference,Data_Processing,Value_Definition,Notes,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment Antibodies",
"query_path": "/entity/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment_Antibodies:Experiment_RID)?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment Custom Metadata",
"query_path": "/entity/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment_Custom_Metadata:Experiment_RID)?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment Settings",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment_Settings:Experiment_RID)/RID,Experiment_RID,Alignment_Format,Aligner,Aligner_Version,Reference_Genome,Sequence_Trimming,Duplicate_Removal,Pre-alignment_Sequence_Removal,Junction_Reads,Library_Type,Protocol_Reference,Library_Selection,Quantification_Format,Quantification_Software,Expression_Metric,Transcriptome_Model,Sequencing_Platform,Paired_End,Read_Length,Strandedness,Used_Spike_Ins,Spike_Ins_Amount,Visualization_Format,Visualization_Software,Visualization_Version,Visualization_Setting,Notes,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Replicate",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/RID,Study_RID,Experiment_RID,Biological_Replicate_Number,Technical_Replicate_Number,Specimen_RID,Collection_Date,Mapped_Reads,GEO_Sample_Accession_ID,Notes,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Specimen",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/S:=(Specimen_RID)=(Gene_Expression:Specimen:RID)/T:=left(Stage_ID)=(Vocabulary:Developmental_Stage:ID)/$S/RID,Title,Species,Stage_ID,Stage_Name:=T:Name,Stage_Detail,Assay_Type,Strain,Wild_Type,Sex,Passage,Phenotype,Cell_Line,Parent_Specimen,Upload_Notes,Preparation,Fixation,Embedding,Internal_ID,Principal_Investigator,Consortium,Release_Date,RCT,RMT,GUDMAP2_Accession_ID?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Specimen_Anatomical_Source",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Specimen_RID)=(Gene_Expression:Specimen:RID)/(RID)=(Gene_Expression:Specimen_Tissue:Specimen_RID)/RID,Specimen_RID,Tissue,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Specimen_Cell_Types",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Specimen_RID)=(Gene_Expression:Specimen:RID)/(RID)=(Gene_Expression:Specimen_Cell_Type:Specimen)/RID,Specimen_RID:=Specimen,Cell_Type,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Single Cell Metrics",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(RID)=(RNASeq:Single_Cell_Metrics:Replicate_RID)/RID,Study_RID,Experiment_RID,Replicate_RID,Reads_%28Millions%29,Reads%2FCell,Detected_Gene_Count,Genes%2FCell,UMI%2FCell,Estimated_Cell_Count,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "File",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(RID)=(RNASeq:File:Replicate_RID)/RID,Study_RID,Experiment_RID,Replicate_RID,Caption,File_Type,File_Name,URI,File_size,MD5,GEO_Archival_URL,dbGaP_Accession_ID,Processed,Notes,Principal_Investigator,Consortium,Release_Date,RCT,RMT,Legacy_File_RID,GUDMAP_NGF_OID,GUDMAP_NGS_OID?limit=none"
}
},
{
"processor": "fetch",
"processor_params": {
"output_path": "assets/Study/{Study_RID}/Experiment/{Experiment_RID}/Replicate/{Replicate_RID}",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(RID)=(RNASeq:File:Replicate_RID)/File_Type=FastQ/File_Name::ciregexp::%5B_.%5DR%5B12%5D%5C.fastq%5C.gz/url:=URI,length:=File_size,filename:=File_Name,md5:=MD5,Study_RID,Experiment_RID,Replicate_RID?limit=none"
}
}
]
}
}
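
The last `fetch` query in the input bag configuration selects fastq files with a percent-encoded, case-insensitive regular expression (`ciregexp`). As a quick sanity check of what that pattern accepts, the sketch below decodes it with Python's standard library and tests file names against it. The helper name is made up for illustration, and unanchored (search-style) matching is an assumption about how the data-hub applies the filter.

```python
import re
from urllib.parse import unquote

# Percent-encoded pattern copied from the query_path of the fetch processor.
ENCODED_PATTERN = "%5B_.%5DR%5B12%5D%5C.fastq%5C.gz"
FASTQ_PATTERN = re.compile(unquote(ENCODED_PATTERN), re.IGNORECASE)


def is_pipeline_fastq(file_name):
    """True if the name contains '_R1'/'.R1' (or R2) followed by '.fastq.gz'."""
    return FASTQ_PATTERN.search(file_name) is not None


print(unquote(ENCODED_PATTERN))  # [_.]R[12]\.fastq\.gz
```

Under these assumptions, a name such as `Q-Y5F6_1M.R1.fastq.gz` is accepted, while a name without an `R1`/`R2` token before `.fastq.gz` is not.
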
params {
refSource = "aws"
}
workDir = 's3://gudmap-rbk.output/work'
aws.client.storageEncryption = 'AES256'
aws {
region = 'us-east-2'
batch {
cliPath = '/home/ec2-user/miniconda/bin/aws'
}
}
process {
executor = 'awsbatch'
cpus = 1
memory = '1 GB'
withName:trackStart {
cpus = 1
memory = '1 GB'
}
withName:getBag {
cpus = 1
memory = '1 GB'
}
withName:getData {
cpus = 1
memory = '1 GB'
}
withName:parseMetadata {
cpus = 15
memory = '1 GB'
}
withName:trimData {
cpus = 20
memory = '2 GB'
}
withName:getRefInfer {
cpus = 1
memory = '1 GB'
}
withName:downsampleData {
cpus = 1
memory = '1 GB'
}
withName:alignSampleData {
cpus = 50
memory = '5 GB'
}
withName:inferMetadata {
cpus = 5
memory = '1 GB'
}
withName:checkMetadata {
cpus = 1
memory = '1 GB'
}
withName:getRef {
cpus = 1
memory = '1 GB'
}
withName:alignData {
cpus = 50
memory = '10 GB'
}
withName:dedupData {
cpus = 5
memory = '20 GB'
}
withName:countData {
cpus = 2
memory = '5 GB'
}
withName:makeBigWig {
cpus = 15
memory = '5 GB'
}
withName:fastqc {
cpus = 1
memory = '1 GB'
}
withName:dataQC {
cpus = 15
memory = '2 GB'
}
withName:aggrQC {
cpus = 2
memory = '1 GB'
}
withName:uploadInputBag {
cpus = 1
memory = '1 GB'
}
withName:uploadExecutionRun {
cpus = 1
memory = '1 GB'
}
withName:uploadQC {
cpus = 1
memory = '1 GB'
}
withName:uploadProcessedFile {
cpus = 1
memory = '1 GB'
}
withName:uploadOutputBag {
cpus = 1
memory = '1 GB'
}
withName:finalizeExecutionRun {
cpus = 1
memory = '1 GB'
}
withName:failPreExecutionRun {
cpus = 1
memory = '1 GB'
}
withName:failExecutionRun {
cpus = 1
memory = '1 GB'
}
withName:uploadQC_fail {
cpus = 1
memory = '1 GB'
}
}
params {
refSource = "biohpc"
}
process {
executor = 'slurm'
queue = 'super'
clusterOptions = '--hold'
time = '4h'
errorStrategy = 'retry'
maxRetries = 1
withName:trackStart {
executor = 'local'
}
withName:getBag {
executor = 'local'
}
withName:getData {
queue = 'super'
}
withName:parseMetadata {
executor = 'local'
}
withName:trimData {
queue = 'super'
}
withName:getRefInfer {
queue = 'super'
}
withName:downsampleData {
executor = 'local'
}
withName:alignSampleData {
queue = '128GB,256GB,256GBv1,384GB'
}
withName:inferMetadata {
queue = 'super'
}
withName:checkMetadata {
executor = 'local'
}
withName:getRef {
queue = 'super'
}
withName:alignData {
queue = '256GB,256GBv1'
}
withName:dedupData {
queue = 'super'
}
withName:countData {
queue = 'super'
}
withName:makeBigWig {
queue = 'super'
}
withName:fastqc {
queue = 'super'
}
withName:dataQC {
queue = 'super'
}
withName:aggrQC {
executor = 'local'
}
withName:uploadInputBag {
executor = 'local'
}
withName:uploadExecutionRun {
executor = 'local'
}
withName:uploadQC {
executor = 'local'
}
withName:uploadProcessedFile {
executor = 'local'
}
withName:uploadOutputBag {
executor = 'local'
}
withName:finalizeExecutionRun {
executor = 'local'
}
withName:failPreExecutionRun {
executor = 'local'
}
withName:failExecutionRun {
executor = 'local'
}
withName:uploadQC_fail {
executor = 'local'
}
}
singularity {
enabled = true
cacheDir = '/project/BICF/BICF_Core/shared/gudmap/singularity_cache/'
}
env {
http_proxy = 'http://proxy.swmed.edu:3128'
https_proxy = 'http://proxy.swmed.edu:3128'
all_proxy = 'http://proxy.swmed.edu:3128'
}
process {
executor = 'slurm'
queue = '256GB,256GBv1,384GB,128GB'
clusterOptions = '--hold'
}
singularity {
enabled = true
cacheDir = '/project/BICF/BICF_Core/shared/gudmap/singularity_cache/'
}
env {
http_proxy = 'http://proxy.swmed.edu:3128'
https_proxy = 'http://proxy.swmed.edu:3128'
all_proxy = 'http://proxy.swmed.edu:3128'
}
custom_logo: './bicf_logo.png'
custom_logo_url: 'https://utsouthwestern.edu/labs/bioinformatics/'
custom_logo_title: 'Bioinformatics Core Facility'
report_header_info:
- Contact Email: 'bicf@utsouthwestern.edu'
- Application Type: 'RNA-Seq Analytic Pipeline for GUDMAP/RBK'
- Department: 'Bioinformatics Core Facility, Department of Bioinformatics, University of Texas Southwestern Medical Center'
title: RNA-Seq Analytic Pipeline for GUDMAP/RBK
report_comment: >
This report has been generated by the <a href="https://doi.org/10.5281/zenodo.3625056">GUDMAP/RBK RNA-Seq Pipeline</a>
top_modules:
- fastqc:
name: 'Raw'
info: 'Replicate Raw fastq QC Results'
- cutadapt:
name: 'Trim'
info: 'Replicate Trim Adapter QC Results'
- hisat2:
name: 'Align'
info: 'Replicate Alignment QC Results'
path_filters:
- '*alignSummary*'
- picard:
name: 'Dedup'
info: 'Replicate Alignment Deduplication QC Results'
- rseqc:
name: 'Inner Distance'
info: 'Replicate Paired End Inner Distance Distribution Results'
path_filters:
- '*insertSize*'
- custom_content
- featureCounts:
name: 'Count'
info: 'Replicate Feature Count QC Results'
- hisat2:
name: 'Inference: Align'
info: 'Inference Alignment (1M downsampled reads) QC Results'
path_filters:
- '*alignSampleSummary*'
- rseqc:
name: 'Inference: Stranded'
info: '1M Downsampled Reads Strandedness Inference Results'
path_filters:
- '*infer_experiment*'
report_section_order:
run:
order: 4000
rid:
order: 3000
meta:
order: 2000
ref:
order: 1000
software_versions:
order: -1000
software_references:
order: -2000
skip_generalstats: true
custom_data:
run:
file_format: 'tsv'
section_name: 'Run'
description: 'This is the run information'
plot_type: 'table'
pconfig:
id: 'run'
scale: false
format: '{}'
headers:
Session:
description: ''
Session ID:
description: 'Nextflow session ID'
Pipeline Version:
description: 'BICF pipeline version'
Input:
description: 'Input overrides'
rid:
file_format: 'tsv'
section_name: 'RID'
description: 'These are the identifying RIDs'
plot_type: 'table'
pconfig:
id: 'rid'
scale: false
format: '{}'
headers:
Replicate:
description: ''
Replicate RID:
description: 'Replicate RID'
Experiment RID:
description: 'Experiment RID'
Study RID:
description: 'Study RID'
meta:
file_format: 'tsv'
section_name: 'Metadata'
description: 'This is the comparison of inferred, submitter-provided, and calculated metadata'
plot_type: 'table'
pconfig:
id: 'meta'
scale: false
format: '{:,.0f}'
headers:
Source:
description: 'Metadata source'
Species:
description: 'Species'
Ends:
description: 'Single or paired end sequencing'
Stranded:
description: 'Stranded (forward/reverse) or unstranded library prep'
Spike-in:
description: 'ERCC spike in'
Raw Reads:
description: 'Number of reads from the sequencer'
Assigned Reads:
description: 'Final reads after filtering'
Median Read Length:
description: 'Median read length'
Median TIN:
description: 'Median transcript integrity number'
ref:
file_format: 'tsv'
section_name: 'Reference'
description: 'This is the reference version information'
plot_type: 'table'
pconfig:
id: 'ref'
scale: false
format: '{}'
headers:
Species:
description: 'Reference species'
Genome Reference Consortium Build:
description: 'Reference source build'
Genome Reference Consortium Patch:
description: 'Reference source patch version'
GENCODE Annotation Release:
description: 'Annotation release version'
tin:
file_format: 'tsv'
section_name: 'TIN'
description: 'This is the distribution of TIN values calculated by the tool RSeQC'
plot_type: 'bargraph'
pconfig:
id: 'tin'
headers:
chrom
1 - 10
11 - 20
21 - 30
31 - 40
41 - 50
51 - 60
61 - 70
71 - 80
81 - 90
91 - 100
sp:
run:
fn: "run.tsv"
rid:
fn: 'rid.tsv'
meta:
fn: 'metadata.tsv'
ref:
fn: 'reference.tsv'
tin:
fn: '*_tin.hist.tsv'
process {
queue = 'highpriority-0ef8afb0-c7ad-11ea-b907-06c94a3c6390'
}
process {
queue = 'default-0ef8afb0-c7ad-11ea-b907-06c94a3c6390'
}
......@@ -9,35 +9,41 @@
3. **BDBag**:
* D'Arcy, M., Chard, K., Foster, I., Kesselman, C., Madduri, R., Saint, N., & Wagner, R.. 2019. Big Data Bags: A Scalable Packaging Format for Science. Zenodo. doi:[10.5281/zenodo.3338725](http://doi.org/10.5281/zenodo.3338725).
4. **RSeQC**:
* Wang, L., Wang, S., Li, W. 2012 RSeQC: quality control of RNA-seq experiments. Bioinformatics. Aug 15;28(16):2184-5. doi:[10.1093/bioinformatics/bts356](https://doi.org/10.1093/bioinformatics/bts356).
5. **trimgalore**:
4. **trimgalore**:
* trimgalore [https://github.com/FelixKrueger/TrimGalore](https://github.com/FelixKrueger/TrimGalore)
6. **hisat2**:
5. **hisat2**:
* Kim, D., Paggi, J.M., Park, C., Bennett, C., Salzberg, S.L. 2019 Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. Aug;37(8):907-915. doi:[10.1038/s41587-019-0201-4](https://doi.org/10.1038/s41587-019-0201-4).
7. **samtools**:
6. **samtools**:
* Li H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25: 2078-9. doi:[10.1093/bioinformatics/btp352](http://dx.doi.org/10.1093/bioinformatics/btp352)
8. **picard**:
7. **picard**:
* “Picard Toolkit.” 2019. Broad Institute, GitHub Repository. [http://broadinstitute.github.io/picard/](http://broadinstitute.github.io/picard/); Broad Institute
9. **featureCounts**:
8. **featureCounts**:
* Liao, Y., Smyth, G.K., Shi, W. 2014 featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. Apr 1;30(7):923-30. doi:[10.1093/bioinformatics/btt656](https://doi.org/10.1093/bioinformatics/btt656).
10. **R**:
* R Core Team 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL:[http://www.R-project.org/](http://www.R-project.org/).
11. **deeptools**:
9. **deeptools**:
* Ramírez, F., D. P. Ryan, B. Grüning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, F. Dündar, and T. Manke. 2016. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44: W160-165. doi:[10.1093/nar/gkw257](http://dx.doi.org/10.1093/nar/gkw257)
10. **Seqtk**:
* Seqtk [https://github.com/lh3/seqtk](https://github.com/lh3/seqtk)
11. **R**:
* R Core Team 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL:[http://www.R-project.org/](http://www.R-project.org/).
12. **FastQC**
* FastQC [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
13. **MultiQC**:
13. **SeqWho**
* SeqWho [https://git.biohpc.swmed.edu/s181649/seqwho](https://git.biohpc.swmed.edu/s181649/seqwho)
14. **RSeQC**:
* Wang, L., Wang, S., Li, W. 2012 RSeQC: quality control of RNA-seq experiments. Bioinformatics. Aug 15;28(16):2184-5. doi:[10.1093/bioinformatics/bts356](https://doi.org/10.1093/bioinformatics/bts356).
15. **MultiQC**:
* Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354)
14. **Nextflow**:
* Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., and Notredame, C. 2017. Nextflow enables reproducible computational workflows. Nature biotechnology, 35(4), 316.
16. **Nextflow**:
* Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., and Notredame, C. 2017. Nextflow enables reproducible computational workflows. Nature biotechnology, 35(4), 316.
\ No newline at end of file
......@@ -25,12 +25,6 @@
<ul>
<li>D'Arcy, M., Chard, K., Foster, I., Kesselman, C., Madduri, R., Saint, N., &amp; Wagner, R.. 2019. Big Data Bags: A Scalable Packaging Format for Science. Zenodo. doi:<a href="http://doi.org/10.5281/zenodo.3338725">10.5281/zenodo.3338725</a>.</li>
</ul>
<ol start="4" style="list-style-type: decimal">
<li><strong>RSeQC</strong>:</li>
</ol>
<ul>
<li>Wang, L., Wang, S., Li, W. 2012 RSeQC: quality control of RNA-seq experiments. Bioinformatics. Aug 15;28(16):2184-5. doi:<a href="https://doi.org/10.1093/bioinformatics/bts356">10.1093/bioinformatics/bts356</a>.</li>
</ul>
<ol start="5" style="list-style-type: decimal">
<li><strong>trimgalore</strong>:</li>
</ol>
......@@ -61,17 +55,23 @@
<ul>
<li>Liao, Y., Smyth, G.K., Shi, W. 2014 featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. Apr 1;30(7):923-30. doi:<a href="https://doi.org/10.1093/bioinformatics/btt656">10.1093/bioinformatics/btt656</a>.</li>
</ul>
<ol start="10" style="list-style-type: decimal">
<li><strong>R</strong>:</li>
<ol start="11" style="list-style-type: decimal">
<li><strong>deeptools</strong>:</li>
</ol>
<ul>
<li>R Core Team 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL:<a href="http://www.R-project.org/" class="uri">http://www.R-project.org/</a>.</li>
<li>Ramírez, F., D. P. Ryan, B. Grüning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, F. Dündar, and T. Manke. 2016. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44: W160-165. doi:<a href="http://dx.doi.org/10.1093/nar/gkw257">10.1093/nar/gkw257</a></li>
</ul>
<ol start="11" style="list-style-type: decimal">
<li><strong>deeptools</strong>:</li>
<li><strong>Seqtk</strong>:</li>
</ol>
<ul>
<li>Ramírez, F., D. P. Ryan, B. Grüning, V. Bhardwaj, F. Kilpert, A. S. Richter, S. Heyne, F. Dündar, and T. Manke. 2016. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research 44: W160-165. doi:<a href="http://dx.doi.org/10.1093/nar/gkw257">10.1093/nar/gkw257</a></li>
<li>Seqtk <a href="https://github.com/lh3/seqtk" class="uri">https://github.com/lh3/seqtk</a></li>
</ul>
<ol start="10" style="list-style-type: decimal">
<li><strong>R</strong>:</li>
</ol>
<ul>
<li>R Core Team 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL:<a href="http://www.R-project.org/" class="uri">http://www.R-project.org/</a>.</li>
</ul>
<ol start="12" style="list-style-type: decimal">
<li><strong>FastQC</strong></li>
......@@ -79,6 +79,18 @@
<ul>
<li>FastQC <a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/" class="uri">https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</a></li>
</ul>
<ol start="12" style="list-style-type: decimal">
<li><strong>SeqWho</strong></li>
</ol>
<ul>
<li>SeqWho <a href="https://git.biohpc.swmed.edu/s181649/seqwho" class="uri">https://git.biohpc.swmed.edu/s181649/seqwho</a></li>
</ul>
<ol start="4" style="list-style-type: decimal">
<li><strong>RSeQC</strong>:</li>
</ol>
<ul>
<li>Wang, L., Wang, S., Li, W. 2012 RSeQC: quality control of RNA-seq experiments. Bioinformatics. Aug 15;28(16):2184-5. doi:<a href="https://doi.org/10.1093/bioinformatics/bts356">10.1093/bioinformatics/bts356</a>.</li>
</ul>
<ol start="13" style="list-style-type: decimal">
<li><strong>MultiQC</strong>:</li>
</ol>
......
id: 'software_versions'
section_name: 'Software Versions'
section_href: 'https://git.biohpc.swmed.edu/gudmap_rbk/rna-seq/-/blob/78-tool_version/docs/RNA-Seq%20Pipeline%20Design%20Process%20Table.pdf'
section_href: 'https://git.biohpc.swmed.edu/gudmap_rbk/rna-seq/-/wikis/Pipeline/Tool-Versions'
plot_type: 'html'
description: 'are collected for pipeline version.'
data: |
......@@ -10,15 +10,17 @@
<dt>Python</dt><dd>v3.8.3</dd>
<dt>DERIVA</dt><dd>v1.4.3</dd>
<dt>BDBag</dt><dd>v1.5.6</dd>
<dt>RSeQC</dt><dd>v4.0.0</dd>
<dt>Trim Galore!</dt><dd>v0.6.4_dev</dd>
<dt>HISAT2</dt><dd>v2.2.1</dd>
<dt>Samtools</dt><dd>v1.11</dd>
<dt>picard (MarkDuplicates)</dt><dd>v2.23.9</dd>
<dt>featureCounts</dt><dd>v2.0.1</dd>
<dt>R</dt><dd>v4.0.3</dd>
<dt>deepTools</dt><dd>v3.5.0</dd>
<dt>Seqtk</dt><dd>v1.3-r106</dd>
<dt>R</dt><dd>v4.0.3</dd>
<dt>FastQC</dt><dd>v0.11.9</dd>
<dt>SeqWho</dt><dd>vBeta-1.0.0</dd>
<dt>RSeQC</dt><dd>v4.0.0</dd>
<dt>MultiQC</dt><dd>v1.9</dd>
<dt>Pipeline Version</dt><dd>v1.0.2</dd>
<dt>Pipeline Version</dt><dd>v2.0.0rc01</dd>
</dl>
......@@ -22,6 +22,9 @@ profiles {
}
process {
withName:trackStart {
container = 'gudmaprbk/gudmap-rbk_base:1.0.0'
}
withName:getBag {
container = 'gudmaprbk/deriva1.4:1.0.0'
}
......@@ -31,15 +34,27 @@ process {
withName:parseMetadata {
container = 'gudmaprbk/python3:1.0.0'
}
withName:trimData {
container = 'gudmaprbk/trimgalore0.6.5:1.0.0'
withName:getRefERCC {
container = 'gudmaprbk/deriva1.4:1.0.0'
}
withName:getRefInfer {
withName:getRef {
container = 'gudmaprbk/deriva1.4:1.0.0'
}
withName:fastqc {
container = 'gudmaprbk/fastqc0.11.9:1.0.0'
}
withName:seqwho {
container = 'gudmaprbk/seqwho0.0.1:1.0.0'
}
withName:trimData {
container = 'gudmaprbk/trimgalore0.6.5:1.0.0'
}
withName:downsampleData {
container = 'gudmaprbk/seqtk1.3:1.0.0'
}
withName:alignSampleDataERCC {
container = 'gudmaprbk/hisat2.2.1:1.0.0'
}
withName:alignSampleData {
container = 'gudmaprbk/hisat2.2.1:1.0.0'
}
......@@ -49,9 +64,6 @@ process {
withName:checkMetadata {
container = 'gudmaprbk/gudmap-rbk_base:1.0.0'
}
withName:getRef {
container = 'gudmaprbk/deriva1.4:1.0.0'
}
withName:alignData {
container = 'gudmaprbk/hisat2.2.1:1.0.0'
}
......@@ -64,9 +76,6 @@ process {
withName:makeBigWig {
container = 'gudmaprbk/deeptools3.5.0:1.0.0'
}
withName:fastqc {
container = 'gudmaprbk/fastqc0.11.9:1.0.0'
}
withName:dataQC {
container = 'gudmaprbk/rseqc4.0.0:1.0.0'
}
......
{
"bag": {
"bag_name": "Execution_Run_{rid}",
"bag_algorithms": [
"md5"
],
"bag_archiver": "zip",
"bag_metadata": {}
},
"catalog": {
"catalog_id": "2",
"query_processors": [
{
"processor": "csv",
"processor_params": {
"output_path": "Execution_Run",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/RID,Replicate_RID:=Replicate,Workflow_RID:=Workflow,Reference_Genone_RID:=Reference_Genome,Input_Bag_RID:=Input_Bag,Notes,Execution_Status,Execution_Status_Detail,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Workflow",
"query_path": "/entity/M:=RNASeq:Execution_Run/RID=17-BPAG/RNASeq:Workflow?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Reference_Genome",
"query_path": "/entity/M:=RNASeq:Execution_Run/RID=17-BPAG/RNASeq:Reference_Genome?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Input_Bag",
"query_path": "/entity/M:=RNASeq:Execution_Run/RID=17-BPAG/RNASeq:Input_Bag?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "mRNA_QC",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/(RID)=(RNASeq:mRNA_QC:Execution_Run)/RID,Execution_Run_RID:=Execution_Run,Replicate_RID:=Replicate,Paired_End,Strandedness,Median_Read_Length,Raw_Count,Final_Count,Notes,RCT,RMT?limit=none"
}
},
{
"processor": "fetch",
"processor_params": {
"output_path": "assets/Study/{Study_RID}/Experiment/{Experiment_RID}/Replicate/{Replicate_RID}/Execution_Run/{Execution_Run_RID}/Output_Files",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/R:=RNASeq:Replicate/$M/(RID)=(RNASeq:Processed_File:Execution_Run)/url:=File_URL,length:=File_Bytes,filename:=File_Name,md5:=File_MD5,Execution_Run_RID:=M:RID,Study_RID:=R:Study_RID,Experiment_RID:=R:Experiment_RID,Replicate_RID:=R:RID?limit=none"
}
},
{
"processor": "fetch",
"processor_params": {
"output_path": "assets/Study/{Study_RID}/Experiment/{Experiment_RID}/Replicate/{Replicate_RID}/Execution_Run/{Execution_Run_RID}/Input_Bag",
"query_path": "/attribute/M:=RNASeq:Execution_Run/RID=17-BPAG/R:=RNASeq:Replicate/$M/RNASeq:Input_Bag/url:=File_URL,length:=File_Bytes,filename:=File_Name,md5:=File_MD5,Execution_Run_RID:=M:RID,Study_RID:=R:Study_RID,Experiment_RID:=R:Experiment_RID,Replicate_RID:=R:RID?limit=none"
}
}
]
}
}
\ No newline at end of file
{
"bag": {
"bag_name": "{rid}_inputBag",
"bag_algorithms": [
"md5"
],
"bag_archiver": "zip"
},
"catalog": {
"query_processors": [
{
"processor": "csv",
"processor_params": {
"output_path": "Study",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Study_RID)=(RNASeq:Study:RID)/Study_RID:=RID,Internal_ID,Title,Summary,Overall_Design,GEO_Series_Accession_ID,GEO_Platform_Accession_ID,Funding,Pubmed_ID,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment:RID)/Experiment_RID:=RID,Study_RID,Internal_ID,Name,Description,Experiment_Method,Experiment_Type,Species,Specimen_Type,Molecule_Type,Pooled_Sample,Pool_Size,Markers,Cell_Count,Treatment_Protocol,Treatment_Protocol_Reference,Isolation_Protocol,Isolation_Protocol_Reference,Growth_Protocol,Growth_Protocol_Reference,Label_Protocol,Label_Protocol_Reference,Hybridization_Protocol,Hybridization_Protocol_Reference,Scan_Protocol,Scan_Protocol_Reference,Data_Processing,Value_Definition,Notes,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment Antibodies",
"query_path": "/entity/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment_Antibodies:Experiment_RID)?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment Custom Metadata",
"query_path": "/entity/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment_Custom_Metadata:Experiment_RID)?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Experiment Settings",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Experiment_RID)=(RNASeq:Experiment_Settings:Experiment_RID)/RID,Experiment_RID,Alignment_Format,Aligner,Aligner_Version,Reference_Genome,Sequence_Trimming,Duplicate_Removal,Pre-alignment_Sequence_Removal,Junction_Reads,Library_Type,Protocol_Reference,Library_Selection,Quantification_Format,Quantification_Software,Expression_Metric,Transcriptome_Model,Sequencing_Platform,Paired_End,Read_Length,Strandedness,Used_Spike_Ins,Spike_Ins_Amount,Visualization_Format,Visualization_Software,Visualization_Version,Visualization_Setting,Notes,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Replicate",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/RID,Study_RID,Experiment_RID,Biological_Replicate_Number,Technical_Replicate_Number,Specimen_RID,Collection_Date,Mapped_Reads,GEO_Sample_Accession_ID,Notes,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Specimen",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/S:=(Specimen_RID)=(Gene_Expression:Specimen:RID)/T:=left(Stage_ID)=(Vocabulary:Developmental_Stage:ID)/$S/RID,Title,Species,Stage_ID,Stage_Name:=T:Name,Stage_Detail,Assay_Type,Strain,Wild_Type,Sex,Passage,Phenotype,Cell_Line,Parent_Specimen,Upload_Notes,Preparation,Fixation,Embedding,Internal_ID,Principal_Investigator,Consortium,Release_Date,RCT,RMT,GUDMAP2_Accession_ID?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Specimen_Anatomical_Source",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Specimen_RID)=(Gene_Expression:Specimen:RID)/(RID)=(Gene_Expression:Specimen_Tissue:Specimen_RID)/RID,Specimen_RID,Tissue,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Specimen_Cell_Types",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(Specimen_RID)=(Gene_Expression:Specimen:RID)/(RID)=(Gene_Expression:Specimen_Cell_Type:Specimen)/RID,Specimen_RID:=Specimen,Cell_Type,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "Single Cell Metrics",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(RID)=(RNASeq:Single_Cell_Metrics:Replicate_RID)/RID,Study_RID,Experiment_RID,Replicate_RID,Reads_%28Millions%29,Reads%2FCell,Detected_Gene_Count,Genes%2FCell,UMI%2FCell,Estimated_Cell_Count,Principal_Investigator,Consortium,Release_Date,RCT,RMT?limit=none"
}
},
{
"processor": "csv",
"processor_params": {
"output_path": "File",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(RID)=(RNASeq:File:Replicate_RID)/RID,Study_RID,Experiment_RID,Replicate_RID,Caption,File_Type,File_Name,URI,File_size,MD5,GEO_Archival_URL,dbGaP_Accession_ID,Processed,Notes,Principal_Investigator,Consortium,Release_Date,RCT,RMT,Legacy_File_RID,GUDMAP_NGF_OID,GUDMAP_NGS_OID?limit=none"
}
},
{
"processor": "fetch",
"processor_params": {
"output_path": "assets/Study/{Study_RID}/Experiment/{Experiment_RID}/Replicate/{Replicate_RID}",
"query_path": "/attribute/M:=RNASeq:Replicate/RID={rid}/(RID)=(RNASeq:File:Replicate_RID)/File_Type=FastQ/File_Name::ciregexp::%5B_.%5DR%5B12%5D%5C.fastq%5C.gz/url:=URI,length:=File_size,filename:=File_Name,md5:=MD5,Study_RID,Experiment_RID,Replicate_RID?limit=none"
}
}
]
}
}
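The `fetch` processor above filters file names with a percent-encoded, case-insensitive regular expression (`File_Name::ciregexp::...`). The sketch below decodes that pattern and checks a few file names against it. It assumes Python's `re.search` with `re.IGNORECASE` is a close enough stand-in for the server-side matching, and the file names are hypothetical.

```python
import re
from urllib.parse import unquote

# Percent-encoded pattern copied from the fetch processor's query_path.
encoded = "%5B_.%5DR%5B12%5D%5C.fastq%5C.gz"
pattern = unquote(encoded)  # decodes to: [_.]R[12]\.fastq\.gz

def is_selected(file_name):
    """Approximate the case-insensitive file-name filter with Python's re."""
    return re.search(pattern, file_name, flags=re.IGNORECASE) is not None

# Hypothetical file names, for illustration only.
for name in [
    "sample_R1.fastq.gz",
    "sample.R2.fastq.gz",
    "sample_R3.fastq.gz",
    "sampleR1.fastq.gz",
]:
    print(name, is_selected(name))
```

Under this reading, only files whose names contain `_R1`, `.R1`, `_R2`, or `.R2` immediately before `.fastq.gz` are fetched into the input bag.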
{
"fetch_config": {
"http": {
"http_cookies": {
"file_names": [
"*cookies.txt"
],
"scan_for_cookie_files": true,
"search_paths": [
"."
],
"search_paths_filter": "*cookies.txt"
}
},
"https": {
"http_cookies": {
"file_names": [
"*cookies.txt"
],
"scan_for_cookie_files": true,
"search_paths": [
"."
],
"search_paths_filter": "*cookies.txt"
}
}
}
}