title: 'BICF RNASeq Variant Analysis Workflow'
This is a workflow package for the BICF RNASeq Germline Variant workflow system.
It implements a simple germline variant analysis workflow using TrimGalore, HISAT,
SpeedSeq, GATK, Samtools and featureCounts. SNPs and indels are integrated using BAYSIC,
then annotated using snpEff and SnpSift.
THIS WORKFLOW IS OBSOLETE! The main BICF workflow now includes variant analysis and differential expression analysis in a single, easy-to-use workflow.
# Astrocyte Germline Variant Calling Workflow Package
This workflow carries out a Germline Exome Analysis pipeline, including the integration of variants from various callers and basic annotation.
1) RNA alignments are recalibrated and realigned using GATK3 (DePristo et al. 2011; McKenna et al. 2010).
2) To detect germline variants, GATK3 (DePristo et al. 2011; McKenna et al. 2010), Platypus (Rimmer et al. 2014), Samtools version 1.3 and FreeBayes version 0.9.7 (Garrison and Marth 2012) are used.
3) Integration of predicted SNPs and indels from these algorithms is performed using BAYSIC (Cantarel et al. 2014).
4) The effect of SNPs and indels on genes is predicted using snpEff (Cingolani et al. 2012) with the Gencode gene annotations. For GRCh38 only: allele frequency in the general population is determined by comparison to ExAC (The ExAC Consortium 2015). Additionally, for this build, discovered variants are annotated using SnpSift (Cingolani et al. 2012) with the dbSNP, COSMIC (Forbes et al. 2009), ClinVar (Landrum et al. 2014), GWAS Catalog (Welter et al. 2014) and dbNSFP (Liu et al. 2011) databases.
5) Features (genes, transcripts and exons) are counted using featureCounts (Liao et al. 2014) with the Gencode feature table (Harrow et al. 2012).
## Workflow Parameters
rnabam - Choose the alignments of your RNASeq data (generated by the RNASeq Differential Expression Pipeline).
dnabam - Choose the BAM files from genomic data that should be used for gene fusion analysis
genome - Choose a genomic reference (genome).
pairs - Choose whether the sequences are paired-end or single-end
incdna - Choose whether GeneFusion analysis should include evidence from genomic data from the same sample
design - This file matches the fastq files to data about the sample
The following columns are necessary, must be named as in the template, and can be in any order:
This ID should match the name in the fastq file, e.g. for S0001.R1.fastq.gz the sample ID is S0001
This ID can be the identifier of the researcher or clinician
Used in order to link samples from the same patient
2 = Case or Disease Phenotype, 1 = Healthy Control
1=male, 2=female
Name of the fastq file R1
Name of the fastq file R2
There are some optional columns that might help with the analysis:
GeneticFeature (WT or KO)
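
The sample ID convention and the numeric codes described for the design file can be sketched in Python. This is a minimal illustration, not part of the workflow itself: the function names and dictionaries are hypothetical, and the rule "the sample ID is the text before the first dot" is an assumption inferred from the single `S0001.R1.fastq.gz` example above.

```python
from pathlib import PurePath

# Code tables taken from the column descriptions above.
# The dictionary names are hypothetical; the workflow does not define them.
PHENOTYPE_CODES = {1: "Healthy Control", 2: "Case or Disease Phenotype"}
SEX_CODES = {1: "male", 2: "female"}


def sample_id_from_fastq(filename: str) -> str:
    """Return the text before the first '.' of the file name.

    Assumption: the sample ID never contains a dot, as in S0001.R1.fastq.gz.
    """
    return PurePath(filename).name.split(".", 1)[0]


def decode(code, table):
    """Translate a numeric code into its label, rejecting unknown codes."""
    try:
        return table[int(code)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"unknown code: {code!r}") from None


if __name__ == "__main__":
    print(sample_id_from_fastq("S0001.R1.fastq.gz"))
    print(decode(2, PHENOTYPE_CODES))
    print(decode("1", SEX_CODES))
```

A check like this can be run on a design file before submission to catch sample IDs that do not match their fastq file names or codes outside the documented values.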
### Test Data
### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatics Core Facility (BICF), Department of Bioinformatics and Clinical Sciences.
