Gervaise Henry authored ecd4e76f
v2.0.0
User Facing
- Endness metadata "Single Read" changed to "Single End" in data-hub, pipeline updated to handle (#110) ("Single Read" still acceptable for backwards compatibility)
- Strandedness metadata "yes"/"no" changed to boolean "t"/"f" in data-hub, pipeline updated to handle (#70) ("yes"/"no" still acceptable for backwards compatibility)
- Upload empty mRNA_QC entry if data error (#111)
- Allow forcing of strandedness and spike (#100)
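The backwards-compatible handling of legacy endness and strandedness values described above could be sketched as a small normalization step. This is a minimal Python sketch that assumes submitted values arrive as plain strings. The function names are hypothetical and are not taken from the pipeline source. Values that are not recognized pass through unchanged rather than being overwritten.

```python
# Hypothetical helpers: names and behaviour are illustrative, not the pipeline's own code.

def normalize_endness(value: str) -> str:
    """Map the legacy "Single Read" term to the current "Single End" term (#110)."""
    v = value.strip()
    if v == "Single Read":  # legacy value, still accepted for backwards compatibility
        return "Single End"
    return v  # anything else (e.g. "Paired End") is passed through unchanged


def normalize_strandedness(value: str) -> str:
    """Map legacy "yes"/"no" to the boolean "t"/"f" used in data-hub (#70)."""
    v = value.strip().lower()
    legacy = {"yes": "t", "no": "f"}
    return legacy.get(v, v)  # "t"/"f" and unknown values pass through


if __name__ == "__main__":
    print(normalize_endness("Single Read"))  # Single End
    print(normalize_strandedness("yes"))     # t
```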
Background
- Add memory limit (75%) per thread for samtools sort (#108)
- Remove parsing restrictions for submitted stranded/spike/species (#105, #106)
- Pass through unidentified ends instead of overwriting them as unknown
- Move fastqc process before trim to catch fastq errors (#107)
- Only use fastqs that match the *[_.]R[1-2].fastq.gz naming convention (#107)
- Add error output when no fastqs are found
- Update input bag export config to only fetch fastqs that match the *[_.]R[1-2].fastq.gz naming convention
- Remove check for multiple fastqs in parse metadata (redundant and no longer valid)
- Handle blank submitted endness better
- Don't use file.csv from inputBag to parse manual endness; use the count from getData instead
- Detect malformed fastqs (#107)
- Restrict sampled alignment process to use >32GB nodes on BioHPC (#108)
- Use nproc - 1 for alignment processes (#108)
- Data-hub column title change from "Sequencing_Type" to "Experiment_Type" (#114)
- Data-hub column title change from "Has_Strand_Specific_Information" to "Strandedness" (#115)
- Merge the data-error pre-inference execution run upload/finalize steps into 1 process
- Change uploadOutputBag logic to reuse the hatrac file if it already exists (re-uses the Output_Bag entry by reassigning the Execution_Run RID) (#112)
- Add new CI py tests for override and integration
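Below is a minimal sketch of filtering candidate files against the *[_.]R[1-2].fastq.gz naming convention. It uses Python's fnmatch module, which is an assumption: the pipeline may select files by another mechanism. The function name and error message are hypothetical. fnmatchcase is used so that matching stays case-sensitive on every platform.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Naming convention quoted in the changelog entry (#107).
FASTQ_PATTERN = "*[_.]R[1-2].fastq.gz"


def select_fastqs(filenames: Iterable[str]) -> List[str]:
    """Return filenames matching FASTQ_PATTERN, sorted (so R1 precedes R2 for one sample)."""
    matched = sorted(f for f in filenames if fnmatchcase(f, FASTQ_PATTERN))
    if not matched:
        # Hypothetical message standing in for the "no fastqs" error output.
        raise ValueError(f"no files match {FASTQ_PATTERN}")
    return matched


if __name__ == "__main__":
    files = ["s1_R2.fastq.gz", "s1_R1.fastq.gz", "s1R1.fastq.gz",
             "s1_R3.fastq.gz", "s1_R1.fastq", "File.csv"]
    print(select_fastqs(files))  # ['s1_R1.fastq.gz', 's1_R2.fastq.gz']
```

Names such as s1R1.fastq.gz (no "_" or "." before R) and s1_R3.fastq.gz are rejected by the pattern.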
Known Bugs
- Override params (inputBag, fastq, species) aren't checked for integrity
- Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)
- The check for an existing outputBag in hatrac doesn't detect any uploaded by chaise
v1.0.2
User Facing
Background
- Fix spelling in the config file for the process that uploads the failed-fastq error message (#104)
Known Bugs
- Override params (inputBag, fastq, species) aren't checked for integrity
- Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)
v1.0.1
User Facing
Background
- Split non-metadata mismatch error handling process into 2: one to handle fastq errors and one for species errors (BUG FIX #101)
- Add known errors to integration CI tests (ambiguous species, truncated fastq, R1/R2 mismatch) (#103)
- Fix pre-execution run failures uploading the execution run RID to the tracking site (#96, #97)
- Change CI replicate count badge to count all execution runs that match the major version
Known Bugs
- Override params (inputBag, fastq, species) aren't checked for integrity
- Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)
v1.0.0
User Facing
- Add link to reference builder script
- Output median TIN to mRNA_QC table
Background
- Change consistency test to check if results are within +/- 5% of the standard
- Change tool version checker for badges to use latest tag
- Utilize pipeline tracking and QC AWS tables
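A consistency check of the "+/- 5% of standard" kind can be expressed as a relative-tolerance comparison. This is a minimal Python sketch that assumes each metric and its standard are plain floats keyed by name. The metric names in the demo are hypothetical, and this is not the pipeline's actual test code.

```python
from typing import Dict, List


def within_tolerance(observed: float, standard: float, tol: float = 0.05) -> bool:
    """True if observed lies within +/- tol (as a fraction) of the standard value."""
    return abs(observed - standard) <= tol * abs(standard)


def failing_metrics(observed: Dict[str, float],
                    standard: Dict[str, float],
                    tol: float = 0.05) -> List[str]:
    """Sorted names of metrics whose observed value falls outside the tolerance band."""
    return sorted(name for name, expected in standard.items()
                  if not within_tolerance(observed[name], expected, tol))


if __name__ == "__main__":
    # Hypothetical metric names and values, for illustration only.
    standard = {"median_tin": 80.0, "assigned_reads": 1_000_000}
    observed = {"median_tin": 82.0, "assigned_reads": 900_000}
    print(failing_metrics(observed, standard))  # ['assigned_reads']
```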
Known Bugs
- Override params (inputBag, fastq, species) aren't checked for integrity
- Authentication files and tokens must be active (active auth client) for the duration of the pipeline run (until long-lived token utilization included)
v0.1.0
User Facing
- Add option to pull references from datahub
- Add option to send email on workflow error, with pipeline error message
- Add versions and paper references of software used to report
- Upload input bag
- Upload execution run
- Upload mRNA QC
- Create and upload output bag
- Add option to not upload
- Update references to use bags
- Update to newer references (GRCh38.p13.v36 and GRCm38.p6.vM25)
- Use production server for data-hub reference call
- Error pipeline if submitted metadata does not match inferred
- Update execution run with "Success" or "Error"
- Error if the fastq count is wrong (> 2, pe != 2, or se != 1)
- Error if pe and line count of R1 != R2
- Error if ambiguous species inference
- Remove non-fastq files from inputBag at the export bag config level
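The fastq count and R1/R2 line-count checks listed above could be sketched as follows. This is a minimal Python sketch that assumes endness is given as "Paired End" or "Single End" and that the fastqs are gzip-compressed. The function names are hypothetical. The checks return error messages instead of raising, so a caller can report them, for example in an upload error message.

```python
import gzip
from typing import List


def count_gz_lines(path: str) -> int:
    """Count the lines in a gzip-compressed fastq file."""
    with gzip.open(path, "rt") as fh:
        return sum(1 for _ in fh)


def fastq_errors(endness: str, line_counts: List[int]) -> List[str]:
    """Return error messages (empty list means OK).

    line_counts holds one line count per fastq, R1 first.
    """
    errors = []
    n = len(line_counts)
    if n > 2:
        errors.append(f"too many fastqs ({n})")
    elif endness == "Paired End" and n != 2:
        errors.append(f"paired-end data needs 2 fastqs, got {n}")
    elif endness == "Single End" and n != 1:
        errors.append(f"single-end data needs 1 fastq, got {n}")
    # Paired-end reads must come in matching pairs, so R1 and R2 line counts must agree.
    if endness == "Paired End" and n == 2 and line_counts[0] != line_counts[1]:
        errors.append(f"R1/R2 line counts differ ({line_counts[0]} vs {line_counts[1]})")
    return errors
```

Typical use would be `fastq_errors(endness, [count_gz_lines(p) for p in fastq_paths])`, where `fastq_paths` is a hypothetical list of the selected files.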
Background
- Remove (comment out) option to pull references from S3
- Make pull references from BioHPC default (including in biohpc.config)
- Start using new gudmaprbk dockerhub (images autobuilt)
- Move consistency checks fully to Python
- Change order of steps so that fastqc is done after the trim step
- Change docker images to production
- Add automated version badges
- Only calculate/report TIN values on regular chromosomes (from gtf)
- Change inputBag fetch to fetch the manifest, then validate (if validation fails, fetch missing files and revalidate, up to 3 times)
- Retry getData and trimData processes once on failure
- Make inputBag export config create an inputBag with only a small txt file for the CI unit test of getData (and update the test)
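The "fetch, validate, then fetch missing and revalidate up to 3 times" logic can be sketched as a bounded retry loop. This is a minimal Python sketch. The fetch and validate steps are passed in as callables because the actual tooling is not specified here; in practice they might wrap a BDBag resolve/validate call, but that is an assumption. The function name is hypothetical.

```python
from typing import Callable


def fetch_and_validate(fetch_all: Callable[[], None],
                       fetch_missing: Callable[[], None],
                       validate: Callable[[], bool],
                       max_retries: int = 3) -> bool:
    """Fetch everything in the manifest, then validate.

    If validation fails, fetch only the missing files and revalidate,
    up to max_retries times. Returns True as soon as validation passes.
    """
    fetch_all()
    if validate():
        return True
    for _ in range(max_retries):
        fetch_missing()
        if validate():
            return True
    return False
```

Keeping the retry count as a parameter makes the "up to 3 times" limit explicit and easy to test with stub callables.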
Known Bugs
- Override params (inputBag, fastq, species) aren't checked for integrity
v0.0.3
User Facing
- TPM table:
  - Add Ensembl Gene ID
  - Rename columns: GENCODE_Gene_Symbol, Ensembl_GeneID, NCBI_GeneID
- MultiQC output custom tables (html+JSON):
  - Run table: Session ID and Pipeline Version
  - Reference Table: Species, Genome Reference Consortium Build, Genome Reference Consortium Patch, GENCODE Annotation Release (outputs both human and mouse versions)
- Add inputBag override param (inputBagForce) [*.zip]
  - Uses provided inputBag instead of downloading from data-hub
  - Still requires matching repRID input param
- Add fastq override param (fastqsForce) [R1,R2]
  - Uses provided fastq instead of downloading from data-hub
  - Still requires matching repRID input param and will pull inputBag from data-hub to access submitted metadata for reporting
- Add species override param (speciesForce) [Mus musculus or Homo sapiens]
  - Forces the use of the provided species
  - Ignores inferred ambiguous species
Background
- Add GeneSymbol/EnsemblID/EntrezID translation files to references
Known Bugs
- outputBag does not contain fetch for processed data
- Does not include automatic data upload
- Override params (inputBag, fastq, species) aren't checked for integrity
v0.0.2
User Facing
- Output:
  - inputBag
  - outputBag
- Remove gene details from tpm table
- Add EntrezID translation to tpm table (from version specific reference)
Background
- Add GeneSymbol/EnsemblID/EntrezID translation files to references
Known Bugs
- outputBag does not contain fetch for processed data
- Does not include automatic data upload
v0.0.1
INITIAL BETA VERSION
Does not include automatic data upload
This version is for initial upload of test data to GUDMAP/RBK data-hub for internal integration