CHANGELOG.md



To find the state of this project's repository at the time of any of these versions, check out the tags.


v2.0.0
User Facing

Endness metadata "Single Read" changed to "Single End" in data-hub, pipeline updated to handle (#110) ("Single Read" still acceptable for backwards compatibility)
Strandedness metadata "yes"/"no" changed to boolean "t"/"f" in data-hub, pipeline updated to handle (#70) ("yes"/"no" still acceptable for backwards compatibility)
Upload empty mRNA_QC entry if data error (#111)
Allow forcing of strandedness and spike (#100)
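
The backwards compatibility noted in the two metadata entries above can be illustrated with a minimal sketch. It assumes a simple lookup that rewrites legacy values to the new ones and passes everything else through unchanged; the function and constant names are hypothetical and are not taken from the pipeline.

```python
# Hypothetical helpers: map legacy data-hub values to their v2.0.0 equivalents.
LEGACY_ENDNESS = {"Single Read": "Single End"}
LEGACY_STRANDEDNESS = {"yes": "t", "no": "f"}


def normalize_endness(value: str) -> str:
    """Return the current endness label, accepting the legacy 'Single Read'."""
    return LEGACY_ENDNESS.get(value, value)


def normalize_strandedness(value: str) -> str:
    """Return the current strandedness flag, accepting legacy 'yes'/'no'."""
    return LEGACY_STRANDEDNESS.get(value, value)
```

Values that are already in the new form, or that are not recognized, are returned as-is so that later checks can decide how to handle them.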

Background

Add memory limit (75%) per thread for samtools sort (#108)
Remove parsing restrictions for submitted stranded/spike/species (#105, #106)
Pass unidentified ends through instead of overwriting them as unknown
Move fastqc process before trim to catch fastq errors (#107)
Only use fastq's that match *[_.]R[1-2].fastq.gz naming convention (#107)
Add error output for no fastq's
Update input bag export config to only fetch fastq's that match *[_.]R[1-2].fastq.gz naming convention
Remove check for multiple fastqs in parse metadata (redundant and no longer valid)
Handle blank submitted endness better
Don't use file.csv from inputBag to parse manual endness, use counted from getData
Detect malformed fastq's (#107)
Restrict sampled alignment process to use >32GB nodes on BioHPC (#108)
Use nproc-1 for alignment processes (#108)
Data-hub column title change from "Sequencing_Type" to "Experiment_Type" (#114)
Data-hub column title change from "Has_Strand_Specific_Information" to "Strandedness" (#115)
Merge data error pre-inference execution run upload/finalize to 1 process
Change uploadOutputBag logic to reuse the hatrac file if it already exists (re-uses the Output_Bag entry by reassigning the Execution_Run RID) (#112)
Add new CI py tests for override and integration
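
The fastq naming convention referenced in the entries above (#107) is a shell-style glob. A minimal sketch of filtering file names against it, assuming Python's standard `fnmatch` module is an acceptable stand-in for whatever matching the pipeline and export config actually perform (`select_fastqs` is a hypothetical name):

```python
from fnmatch import fnmatchcase

# Glob taken from the changelog entries; the helper itself is illustrative only.
FASTQ_PATTERN = "*[_.]R[1-2].fastq.gz"


def select_fastqs(filenames):
    """Keep only names that follow the *[_.]R[1-2].fastq.gz convention."""
    return [name for name in filenames if fnmatchcase(name, FASTQ_PATTERN)]


candidates = [
    "rep_R1.fastq.gz",   # matches: "_R1"
    "rep.R2.fastq.gz",   # matches: ".R2"
    "rep_R3.fastq.gz",   # rejected: read number outside 1-2
    "rep_1.fastq.gz",    # rejected: no "R" before the read number
    "notes.txt",         # rejected: not a fastq
]
matched = select_fastqs(candidates)
```

An empty result would correspond to the "no fastq's" error case listed above.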

Known Bugs

Override params (inputBag, fastq, species) aren't checked for integrity
Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)
Check for outputBag in hatrac doesn't check for any uploaded by chaise


v1.0.2
User Facing
Background

Fix spelling in config file for process of failed fastq to upload error message (#104)

Known Bugs

Override params (inputBag, fastq, species) aren't checked for integrity
Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)


v1.0.1
User Facing
Background

Split non-metadata mismatch error handling process into 2: one to handle fastq errors and one to handle species errors (BUG FIX #101)
Add known errors to integration CI tests (ambiguous species, truncated fastq, R1/R2 mismatch) (#103)
Fix failure to upload the execution run RID to the tracking site when the pre-execution run fails (#96, #97)
Change CI replicate count badge to count all execution runs that match major version

Known Bugs

Override params (inputBag, fastq, species) aren't checked for integrity
Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)


v1.0.0
User Facing

Add link to reference builder script
Output median TIN to mRNA_QC table

Background

Change consistency test to check if +/- 5% of standard
Change tool version checker for badges to use latest tag
Utilize pipeline tracking and qc AWS tables
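
As an illustration of the +/- 5% consistency check listed above, the following is a minimal sketch assuming the comparison is a relative tolerance around a stored standard value; `within_tolerance` and its parameters are hypothetical and not the pipeline's actual test code.

```python
def within_tolerance(observed: float, standard: float, tolerance: float = 0.05) -> bool:
    """Return True if observed lies within +/- tolerance (as a fraction) of standard."""
    return abs(observed - standard) <= tolerance * abs(standard)
```

For example, with a standard of 100, an observed value of 104 passes and 106 fails.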

Known Bugs

Override params (inputBag, fastq, species) aren't checked for integrity
Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)


v0.1.0
User Facing

Add option to pull references from datahub
Add option to send email on workflow error, with pipeline error message
Add versions and paper references of software used to report
Upload input bag
Upload execution run
Upload mRNA QC
Create and upload output bag
Add option to not upload
Update references to use bags
Update to newer references (GRCh38.p13.v36 and GRCm38.p6.vM25)
Use production server for data-hub reference call
Error pipeline if submitted does not match inferred
Update execution run with "Success" or "Error"
Error if fastq count error (>2, if pe != 2, if se != 1)
Error if pe and line count of R1 != R2
Error if ambiguous species inference
Remove non fastq from inputBag from the export bag config level
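
The fastq error conditions listed above can be sketched as a single check. This assumes the counts refer to the number of fastq files per replicate and that line counts are available for both reads; the function name, arguments, and messages are hypothetical.

```python
def fastq_errors(ends, fastq_count, r1_lines=None, r2_lines=None):
    """Collect error messages for the fastq conditions named in the changelog.

    ends is "pe" (paired end) or "se" (single end).
    """
    errors = []
    if fastq_count > 2:
        errors.append("more than 2 fastqs found")
    elif ends == "pe" and fastq_count != 2:
        errors.append("paired-end replicate requires exactly 2 fastqs")
    elif ends == "se" and fastq_count != 1:
        errors.append("single-end replicate requires exactly 1 fastq")

    # The R1/R2 comparison applies only to paired-end data with known line counts.
    if ends == "pe" and r1_lines is not None and r2_lines is not None:
        if r1_lines != r2_lines:
            errors.append("R1 and R2 line counts differ")
    return errors
```

An empty list means none of the listed conditions were triggered.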

Background

Remove (comment out) option to pull references from S3
Make pull references from BioHPC default (including in biohpc.config)
Start using new gudmaprbk dockerhub (images autobuilt)
Moved consistency checks to be fully python
Changed order of steps so that fastqc is done after the trim step
Change docker images to production
Add automated version badges
Only calculate/report tin values on regular chromosomes (from gtf)
Change inputBag fetch to manifest then validate (on failure, fetch missing files and revalidate, up to 3 times)
Retry getData and trimData processes up to once
Make inputBag export config create an inputBag with only a small txt file for the CI unit test of getData (and update test)

Known Bugs

Override params (inputBag, fastq, species) aren't checked for integrity


v0.0.3
User Facing

TPM table:

Add Ensembl Gene ID
Rename columns: GENCODE_Gene_Symbol, Ensembl_GeneID, NCBI_GeneID


MultiQC output custom tables (html+JSON):

Run table: Session ID and Pipeline Version

Reference Table: Species, Genome Reference Consortium Build, Genome Reference Consortium Patch, GENCODE Annotation Release (outputs both human and mouse versions)


Add inputBag override param (inputBagForce) [*.zip]

Uses provided inputBag instead of downloading from data-hub
Still requires matching repRID input param


Add fastq override param (fastqsForce) [R1,R2]

Uses provided fastq instead of downloading from data-hub
Still requires matching repRID input param and will pull inputBag from data-hub to access submitted metadata for reporting


Add species override param (speciesForce) [Mus musculus or Homo sapiens]

Forces the use of the provided species
Ignores inferred ambiguous species


Background

Add GeneSymbol/EnsemblID/EntrezID translation files to references

Known Bugs

outputBag does not contain fetch for processed data
Does not include automatic data upload
Override params (inputBag, fastq, species) aren't checked for integrity


v0.0.2
User Facing

Output:

inputBag
outputBag


Remove gene details from tpm table
Add EntrezID translation to tpm table (from version specific reference)

Background

Add GeneSymbol/EnsemblID/EntrezID translation files to references

Known Bugs

outputBag does not contain fetch for processed data
Does not include automatic data upload


v0.0.1
INITIAL BETA VERSION

Does not include automatic data upload

This version is for initial upload of test data to GUDMAP/RBK data-hub for internal integration