# Astrocyte Germline Variant Calling Workflow Package

This workflow carries out a Germline Exome Analysis pipeline, including the integration of variants from various callers and basic annotation.

    1) RNA Alignments are recalibrated and realigned using GATK3 (DePristo et al 2011; McKenna et al 2010).

    2) To detect genome germline variants, GATK3 (DePristo et al 2011, McKenna et al 2010), Platypus (Rimmer et al 2014), Samtools version 1.3 and FreeBayes version 0.9.7 (Garrison and Marth 2012) are used. 

    3) Integration of predicted SNPs and INDELs from these algorithms is performed using BAYSIC (Cantarel et al 2014).

    4) The effect of SNPs and INDELs on genes is predicted using snpEff (Cingolani et al 2012) with the Gencode gene annotations. For GRCh38 only: allele frequency in the general population is determined by comparison to ExAC (The ExAC Consortium 2015). Additionally, for this build, discovered variants are annotated using SnpSift (Cingolani et al 2012) with the dbSNP, COSMIC (Forbes et al. 2009), ClinVar (Landrum et al 2014), GWAS Catalog (Welter et al 2014) and dbNSFP (Liu et al 2011) databases.

    5) Features (genes, transcripts and exons) are counted using featureCounts (Liao et al 2014) with the Gencode feature table (Harrow et al. 2012).

## Workflow Parameters

    rnabam - Choose the alignments of your RNASeq data (generated by the RNASeq Differential Expression Pipeline).
    dnabam - Choose the BAM files from genomic data that should be used for gene fusion

    genome - Choose a genomic reference (genome).

    pairs - Choose whether the sequences are paired-end or single-end

    incdna - Choose whether GeneFusion analysis should include evidence from genomic data from the same sample

    design - This file matches the fastq files to data about the sample

 The following columns are required, must be named as in the template, and can be in any order:

        This ID should match the name in the fastq file, i.e. for S0001.R1.fastq.gz the sample ID is S0001
        This ID can be the identifier of the researcher or clinician
        Used in order to link samples from the same patient
        2 = Case or Disease Phenotype, 1 = Healthy Control
        1 = male, 2 = female
        Name of the fastq file R1
        Name of the fastq file R2

There are some optional columns that might help with the analysis:
      GeneticFeature (WT or KO)
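
The column rules above can be checked before a run. The following is a minimal sketch, not part of the workflow itself: the function names and argument order are hypothetical, and it assumes the sample ID is the text before the first dot in the R1 file name, as in the `S0001.R1.fastq.gz` example. Sample IDs that themselves contain dots would need a different rule.

```python
from pathlib import PurePath

# Code tables copied from the column descriptions above.
PHENOTYPE_CODES = {"1": "healthy control", "2": "case or disease phenotype"}
SEX_CODES = {"1": "male", "2": "female"}


def sample_id_from_fastq(fastq_name):
    """Return the part of the file name before the first dot (assumed rule)."""
    return PurePath(fastq_name).name.split(".", 1)[0]


def check_design_row(sample_id, phenotype, sex, fastq_r1):
    """Return a list of problems for one design-file row (empty if none found)."""
    problems = []
    if sample_id_from_fastq(fastq_r1) != sample_id:
        problems.append(f"sample ID {sample_id!r} does not match {fastq_r1!r}")
    if str(phenotype) not in PHENOTYPE_CODES:
        problems.append(f"unknown phenotype code {phenotype!r}")
    if str(sex) not in SEX_CODES:
        problems.append(f"unknown sex code {sex!r}")
    return problems


if __name__ == "__main__":
    # A consistent row yields an empty list of problems.
    print(check_design_row("S0001", 2, 1, "S0001.R1.fastq.gz"))
```

Returning a list of problems rather than raising on the first one lets every row of a design file be reported in a single pass.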

### Test Data

### Credits
This example workflow is derived from original scripts kindly contributed by the Bioinformatic Core Facility (BICF), Department of Bioinformatics and Clinical Sciences.

### References

    Andy Rimmer, Hang Phan, Iain Mathieson, Zamin Iqbal, Stephen R. F. Twigg, WGS500 Consortium, Andrew O. M. Wilkie, Gil McVean, Gerton Lunter. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics (2014) doi:10.1038/ng.3036
    Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., & Snyder, M. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74. doi:10.1038/nature11247
    Cantarel, B. L., Weaver, D., McNeill, N., Zhang, J., Mackey, A. J., & Reese, J. (2014). BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity. BMC Bioinformatics, 15, 104. doi:10.1186/1471-2105-15-104
    Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., … Ruden, D. M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. doi:10.4161/fly.19695
    Cingolani, P., Patel, V. M., Coon, M., Nguyen, T., Land, S. J., Ruden, D. M., & Lu, X. (2012). Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift. Frontiers in Genetics. doi:10.3389/fgene.2012.00035
    Challis, D., Yu, J., Evani, U. S., Jackson, A. R., Paithankar, S., Coarfa, C., … Yu, F. (2012). An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. doi:10.1186/1471-2105-13-8
    DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V, Maguire, J. R., Hartl, C., … Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43, 491–498. doi:10.1038/ng.806
    Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (URL: [01 (01, 2016) accessed].
    Forbes, S. A., Tang, G., Bindal, N., Bamford, S., Dawson, E., Cole, C., … Futreal, P. A. (2009). COSMIC (the Catalogue of Somatic Mutations In Cancer): A resource to investigate acquired mutations in human cancer. Nucleic Acids Research, 38. doi:10.1093/nar/gkp995
    Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012
    Hansen NF, Gartner JJ, Mei L, Samuels Y, Mullikin JC. Shimmer: detection of genetic alterations in tumors using next-generation sequence data. Bioinformatics. 2013 Jun 15;29(12):1498-503. doi: 10.1093/bioinformatics/btt183. Epub 2013 Apr 24. PubMed PMID: 23620360; PubMed Central PMCID: PMC3673219.
    Kim S, Jeong K, Bhutani K, Lee J, Patel A, Scott E, Nam H, Lee H, Gleeson JG, Bafna V. Virmid: accurate detection of somatic mutations with sample impurity inference. Genome Biol. 2013 Aug 29;14(8):R90. doi: 10.1186/gb-2013-14-8-r90. PubMed PMID: 23987214; PubMed Central PMCID: PMC4054681.
    Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012 Mar;22(3):568-76. doi: 10.1101/gr.129684.111. Epub 2012 Feb 2. PubMed PMID: 22300766; PubMed Central PMCID: PMC3290792.
    Landrum, M. J., Lee, J. M., Riley, G. R., Jang, W., Rubinstein, W. S., Church, D. M., & Maglott, D. R. (2014). ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research, 42. doi:10.1093/nar/gkt1113
    Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 [q-bio.GN]
    Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat. 2011 Aug;32(8):894-9. doi: 10.1002/humu.21517. PubMed PMID: 21520341
    McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., … DePristo, M. A. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20, 1297–1303. doi:10.1101/gr.107524.110
    The 1000 Genome Consortium. An integrated map of genetic variation from 1,092 human genomes. (2012). Nature, 491(7422), 56–65. Retrieved from
    Saunders CT, Wong WS, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012 Jul 15;28(14):1811-7. doi: 10.1093/bioinformatics/bts271. Epub 2012 May 10. PubMed PMID: 22581179.
    Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, and Parkinson H. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research, 2014, Vol. 42 (Database issue): D1001-D1006.