Merge branch 'develop' into 'master'

Develop

See merge request !41

Commit 33e60071 authored 5 years ago by Gervaise Henry
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
 before_script:
  - module load astrocyte
  - module load python/3.6.1-2-anaconda
+  - pip install --user pytest-pythonpath==0.7.1 pytest-cov==2.5.1
  - module load nextflow/0.31.1_Ignite
  - mkdir -p test_data/simple1
  - mkdir -p test_data/simple2
@@ -14,14 +15,44 @@ stages:
 astrocyte_check:
  stage: astrocyte
  script:
-  - astrocyte_cli check ../cellranger_mkfastq
+    - astrocyte_cli check ../cellranger_mkfastq
+  artifacts:
+    expire_in: 2 days
+  retry:
+    max: 1
+    when:
+      - always

 simple_1FC:
  stage: simple
+  except:
+    - tags
  script:
-  - nextflow run workflow/main.nf --bcl "test_data/simple1/*.tar.gz" --designFile "test_data/simple1/cellranger-tiny-bcl-simple-1_2_0.csv"
+    - nextflow run workflow/main.nf --bcl "test_data/simple1/*.tar.gz" --designFile "test_data/simple1/cellranger-tiny-bcl-simple-1_2_0.csv"
+    - pytest -m simple1
+  artifacts:
+    name: "$CI_JOB_NAME"
+    when: always
+    paths:
+      - .nextflow.log
+    expire_in: 2 days
+  retry:
+    max: 1
+    when:
+      - always

 simple_2FC:
  stage: simple
  script:
-  - nextflow run workflow/main.nf --bcl "test_data/simple2/*.tar.gz" --designFile "test_data/simple2/cellranger-tiny-bcl-simple-1_2_0.csv"
+    - nextflow run workflow/main.nf --bcl "test_data/simple2/*.tar.gz" --designFile "test_data/simple2/cellranger-tiny-bcl-simple-1_2_0.csv"
+    - pytest -m simple2
+  artifacts:
+    name: "$CI_JOB_NAME"
+    when: always
+    paths:
+      - .nextflow.log
+    expire_in: 2 days
+  retry:
+    max: 1
+    when:
+      - always
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
+# v1.2.0 (in development)
+**User Facing**
+* Add references of tools to MultiQC report
+* Add BICF details to MultiQC report
+* Create cellranger_count design file (if only 1 flowcell is input)
+
+**Background**
+* Add DOI (develop branch)
+* Add changelog as link to astrocyte docs (master branch)
+* Update example design file link in astrocyte docs (master branch)
+* Check tarballed bcl directory for spaces and exit if it contains any, since cellranger mkfastq cannot handle spaces (develop branch)
+* Move untar (including space check) to bash script
+* Add Jeremy Mathews to author list
+* Apply style guide
+* Add pytests for outputs
+
+*Known Bugs*
+* cellranger mkfastq will not accept spaces in path for run param even if quoted, issue raised on 10XGenomics/cellranger github issue [#31](https://github.com/10XGenomics/cellranger/issues/31)
+    * note: 10x doesn't check github issues, emailed instead
+    * note: pipeline checks for spaces and exits prematurely if found
+* If multiple flowcell (tar'd) files are input, there will be multiple fastqs with the same name; resolving that name conflict is currently not tractable
+    * note: if multiple bcl files are detected then cellranger_count design file is not created
+
 # v1.1.4
-### User Facing
+**User Facing**
 * Fix design file not visible in Astrocyte
 * Fix handling of multiple flowcells in 1 submission
-### Background
+
+**Background**
 * Move multiqc config to conf folder
 * Add CI test for multiple flowcells
 * Add changelog
 * Quote design/tarball/$baseDir path in processes in case of spaces
-### *Known Bugs*
+
+*Known Bugs*
 * cellranger mkfastq will not accept spaces in path for run param even if quoted, issue raised on 10XGenomics/cellranger github issue [#31](https://github.com/10XGenomics/cellranger/issues/31)
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 |:-:|:-:|
 |[![Build Status](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/badges/master/build.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/commits/master)|[![Build Status](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/badges/develop/build.svg)](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/commits/develop)|

+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2652611.svg)](https://doi.org/10.5281/zenodo.2652611)
+
 10x Genomics scRNA-Seq (cellranger) mkfastq Pipeline
 ==================================================

@@ -23,28 +25,39 @@ To Run:

 * Available parameters:
  * **--name**
-        * run name, puts outputs in a directory with this name
-        * eg: **--name 'test'**
+    * run name, puts outputs in a directory with this name
+    * eg: **--name 'test'**
  * **--bcl**
-        * Base call files (tarballed [*.tar] +/- gunzipping [*.tar.gz] from a sequencing of 10x single-cell expereiment, supports pigr parallelization).
-        * There can be multiple basecall files, but they all will be demultiplexed by the same design file.
-        * eg: **--bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.tar.gz'**
+    * Base call files (tarballed [*.tar] +/- gzipped [*.tar.gz]) from sequencing of a 10x single-cell experiment; supports pigz parallelization.
+    * There can be multiple basecall files, but they will all be demultiplexed with the same design file.
+    * eg: **--bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-1_2_0.tar.gz'**
  * **--designFile**
-        * path to design file (csv format) location
-        * column 1 = "Lane" (number of lanes to demultiplex, */** for all lanes)
-        * column 2 = "Sample" (sample name)
-        * column 3 = "Index" (10x sample index barcode, eg SI-GA-A1)
-        * can have repeated "Sample" if there are multiple fastq R1/R2 pairs for the samples
-        * eg: **--designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.csv'**
-    * **--outDir**
-        * optional output directory for run
-        * eg: **--outDir 'test'**
-    * FULL EXAMPLE:
-
-**nextflow run workflow/main.nf --name 'test' --bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.tar.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.csv' --outDir 'test'**
-
+    * path to design file (csv format) location
+    * column 1 = "Lane" (number of lanes to demultiplex, */** for all lanes)
+    * column 2 = "Sample" (sample name)
+    * column 3 = "Index" (10x sample index barcode, eg SI-GA-A1)
+    * can have repeated "Sample" if there are multiple fastq R1/R2 pairs for the samples
+    * can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/blob/master/docs/design.csv)
+    * eg: **--designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple/cellranger-tiny-bcl-simple-1_2_0.csv'**
+  * **--outDir**
+    * optional output directory for run
+    * eg: **--outDir 'test'**
+* FULL EXAMPLE:
+  ```
+  nextflow run workflow/main.nf --name 'test' --bcl '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-1_2_0.tar.gz' --designFile '/project/shared/bicf_workflow_ref/workflow_testdata/cellranger/cellranger_mkfastq/simple1/cellranger-tiny-bcl-simple-1_2_0.csv' --outDir 'test'
+  ```
 * Design example:

 | Lane | Sample      | Index     |
 |------|-------------|-----------|
 | *    | test_sample | SI-P03-C9 |
+
+
+[**CHANGELOG**](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/blob/develop/CHANGELOG.md)
+
+Credits
+-------
+This workflow was developed jointly with the [Bioinformatic Core Facility (BICF), Department of Bioinformatics](http://www.utsouthwestern.edu/labs/bioinformatics/).
+
+
+Please cite in publications: Pipeline was developed by BICF from funding provided by **Cancer Prevention and Research Institute of Texas (RP150596)**.
--- a/astrocyte_pkg.yml
+++ b/astrocyte_pkg.yml
@@ -9,7 +9,7 @@
 # A unique identifier for the workflow package, text/underscores only
 name: 'cellranger_mkfastq'
 # Who wrote this?
-author: 'Gervaise H. Henry, Venkat Malladi, and Jon Gesell'
+author: 'Gervaise H. Henry, Jon Gesell, Jeremy Mathews, and Venkat Malladi'
 # A contact email address for questions
 email: 'bicf@utsouthwestern.edu'
 # A more informative title for the workflow package
@@ -85,13 +85,13 @@ workflow_parameters:
    required: true
    description: |
      One or more input tarball (+/- gzip) basecall files (bcl) from a 10x single-cell sequencing experiment (can be .tar or .tar.gz).
-    regex: ".*tar*"
+    regex: ".*\\.tar*"
    min: 1

  - id: designFile
    type: file
    required: true
-    regex: ".*csv"
+    regex: ".*\\.csv"
    description: |
      A design file listing lane, sample, corresponding index.


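The two `regex` values changed above behave differently depending on whether Astrocyte applies them as a full match or as a search, which this diff does not say. As a hedged illustration, the Python sketch below (standard `re` module; the helper `check` and the sample file names are hypothetical) tests both readings. Under a full match, `.*\.tar*` accepts `run.tar` but rejects `run.tar.gz`, because the trailing `*` repeats only the final `r`.

```python
import re

# Patterns as they read after YAML unescaping of ".*\\.tar*" and ".*\\.csv"
BCL_REGEX = r".*\.tar*"
DESIGN_REGEX = r".*\.csv"


def check(pattern, name):
    """Return (full_match, search_match) for one file name (hypothetical helper)."""
    return (re.fullmatch(pattern, name) is not None,
            re.search(pattern, name) is not None)


if __name__ == "__main__":
    # Illustrative names only; not taken from the pipeline's test data
    for name in ["run.tar", "run.tar.gz", "run.ta", "design.csv", "design.csv.bak"]:
        pattern = DESIGN_REGEX if "csv" in name else BCL_REGEX
        print(name, check(pattern, name))
```

If Astrocyte does use full matching, a pattern along the lines of `.*\.tar(\.gz)?` would accept both extensions; that is a suggestion, not something the CI jobs exercise.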
--- a/docs/index.md
+++ b/docs/index.md
@@ -15,16 +15,16 @@ To Run:

 * Workflow parameters:
  * **bcl**
-        * Base call files (tarballed [*.tar] +/- gunzipping [*.tar.gz] from a sequencing of 10x single-cell expereiment, supports pigr parallelization).
-        * There can be multiple basecall files, but they all will be demultiplexed by the same design file.
-        * REQUIRED
+    * Base call files (tarballed [*.tar] +/- gzipped [*.tar.gz]) from sequencing of a 10x single-cell experiment; supports pigz parallelization.
+    * There can be multiple basecall files, but they will all be demultiplexed with the same design file.
+    * REQUIRED
  * **design file**
-        * A design file listing lane, sample, corresponding sample barcode. There can be multiple rows with the same sample name, if there are multiple fastq's for that sample.
-        * REQUIRED
-        * column 1 = "Lane" (number of lanes to demultiplex, */** for all lanes)
-        * column 2 = "Sample" (sample name)
-        * column 3 = "Index" (10x sample index barcode, eg SI-GA-A1)
-        * eg: can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/docs/design.csv)
+    * A design file listing lane, sample, and corresponding sample barcode. There can be multiple rows with the same sample name if there are multiple fastqs for that sample.
+    * REQUIRED
+    * column 1 = "Lane" (number of lanes to demultiplex, */** for all lanes)
+    * column 2 = "Sample" (sample name)
+    * column 3 = "Index" (10x sample index barcode, eg SI-GA-A1)
+    * eg: can be downloaded [HERE](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/blob/master/docs/design.csv)


 * Design example:
@@ -35,6 +35,8 @@ To Run:
    


+[**CHANGELOG**](https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq/blob/master/CHANGELOG.md)
+
 Credits
 -------
 This workflow was developed jointly with the [Bioinformatic Core Facility (BICF), Department of Bioinformatics](http://www.utsouthwestern.edu/labs/bioinformatics/).

--- a/docs/references.md
+++ b/docs/references.md
+### References
+
+1. **python**:
+  * Anaconda (Anaconda Software Distribution, [https://anaconda.com](https://anaconda.com))
+
+2. **pigz**:
+  * Parallel implementation of gzip [https://zlib.net/pigz/](https://zlib.net/pigz/)
+
+3. **bcl2fastq**:
+  * Illumina's bcl2fastq [https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html](https://support.illumina.com/sequencing/sequencing_software/bcl2fastq-conversion-software.html)
+
+4. **cellranger**:
+  * 10x Genomics cellranger mkfastq [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq)
+
+5. **fastqc**:
+  * fastqc [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
+
+6. **MultiQC**:
+  * Ewels P., Magnusson M., Lundin S. and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19): 3047–3048. doi:[10.1093/bioinformatics/btw354](https://dx.doi.org/10.1093/bioinformatics/btw354)
+
+7. **Nextflow**:
+  * Di Tommaso P., Chatzou M., Floden E. W., Barja P. P., Palumbo E., and Notredame C. 2017. Nextflow enables reproducible computational workflows. Nature Biotechnology 35(4): 316. doi:[10.1038/nbt.3820](https://doi.org/10.1038/nbt.3820)
--- a/workflow/conf/bicf_logo.png
+++ b/workflow/conf/bicf_logo.png
--- a/workflow/conf/biohpc.config
+++ b/workflow/conf/biohpc.config
@@ -25,7 +25,7 @@ process {
  }
  withLabel:multiqc {
    module = ['multiqc/1.7']
-    executor = 'super'
+    executor = 'local'
  }
 }


--- a/workflow/conf/multiqc_config.yaml
+++ b/workflow/conf/multiqc_config.yaml
+# Custom Logo
+custom_logo: 'bicf_logo.png'
+custom_logo_url: 'https://www.utsouthwestern.edu/labs/bioinformatics/'
+custom_logo_title: 'Bioinformatics Core Facility'
+
+report_header_info:
+    - Contact E-mail: 'bicf@utsouthwestern.edu'
+    - Application Type: 'cellranger_mkfastq'
+    - Department: 'Bioinformatic Core Facility, Department of Bioinformatics'
+
+
+# Title to use for the report.
+title: BICF CellRanger MKfastq Analysis Report
+
+report_comment: >
+  This report has been generated by the <a href="https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_mkfastq"
+  target="_blank">BICF/cellranger_mkfastq</a> pipeline.
+
 module_order:
    - bcl2fastq
    - fastqc:
@@ -19,3 +37,9 @@ module_order:
        path_filters:
            - '*_R2_*fastqc.zip'
    - custom_content
+
+report_section_order:
+    software_versions:
+      order: -1100
+    software_references:
+      order: -1200
--- a/workflow/main.nf
+++ b/workflow/main.nf
@@ -3,14 +3,23 @@
 // Path to an input file, or a pattern for multiple inputs
 // Note - $baseDir is the location of this workflow file main.nf

+
 // Define Input variables
 params.name = "run"
-params.bcl = "$baseDir/../test_data/*.tar.gz"
-params.designFile = "$baseDir/../test_data/design.csv"
-params.outDir = "$baseDir/output"
+params.bcl = "${baseDir}/../test_data/*.tar.gz"
+params.designFile = "${baseDir}/../test_data/design.csv"
+params.outDir = "${baseDir}/output"
+params.multiqcConf = "${baseDir}/conf/multiqc_config.yaml"
+params.references = "${baseDir}/../docs/references.md"
+

 // Define List of Files
-tarList = Channel.fromPath( params.bcl )
+tarList = Channel
+  .fromPath( params.bcl )
+bclCount = Channel
+  .fromPath( params.bcl )
+  .count()
+

 // Define regular variables
 name = params.name
@@ -18,153 +27,176 @@ designLocation = Channel
  .fromPath(params.designFile)
  .ifEmpty { exit 1, "design file not found: ${params.designFile}" }
 outDir = params.outDir
+multiqcConf = params.multiqcConf
+references = params.references
+

 process checkDesignFile {
-  tag "$name"
-  publishDir "$outDir/misc/${task.process}/$name", mode: 'copy'
+
+  tag "${name}"
+  publishDir "${outDir}/misc/${task.process}/${name}", mode: 'copy'
  module 'python/3.6.1-2-anaconda'

  input:
-
-  file designLocation
+    file designLocation

  output:
-
-  file("design.checked.csv") into designPaths
+    file("design.checked.csv") into designPaths
+    file("design.checked.csv") into designCount

  script:
+    """
+    hostname
+    ulimit -a
+    python3 ${baseDir}/scripts/check_design.py -d ${designLocation}
+    """

-  """
-  hostname
-  ulimit -a
-  python3 "$baseDir/scripts/check_design.py" -d "$designLocation"
-  """
 }


 process untarBCL {
-  tag "$tar"
-  publishDir "$outDir/${task.process}", mode: 'copy'
+
+  tag "${tar}"
+  publishDir "${outDir}/${task.process}", mode: 'copy'
  module 'pigz/2.4'

  input:
-
-  file tar from tarList
+    file tar from tarList

  output:
-
-  file("*") into bclPaths mode flatten
+    file("*") into bclPaths mode flatten

  script:
+    """
+    hostname
+    ulimit -a
+    bash ${baseDir}/scripts/untarBCL.sh -t ${tar}
+    """

-  """
-  hostname
-  ulimit -a
-  name=`echo ${tar} | rev | cut -f1 -d '.' | rev`;
-  if [ "\${name}" == "gz" ];
-  then tar -xvf "$tar" -I pigz;
-  else tar -xvf "$tar";
-  fi;
-  """
 }


 process mkfastq {
+
  tag "${bcl.baseName}"
  queue '128GB,256GB,256GBv1,384GB'
-  publishDir "$outDir/${task.process}", mode: 'copy', pattern: "{*/outs/**/*.fastq.gz}"
+  publishDir "${outDir}/${task.process}", mode: 'copy', pattern: "{*/outs/**/*.fastq.gz}"
  module 'cellranger/3.0.2:bcl2fastq/2.19.1'

  input:
-
-  each bcl from bclPaths.collect()
-  file design from designPaths
+    each bcl from bclPaths.collect()
+    file design from designPaths

  output:
-
-  file("**/outs/**/*.fastq.gz") into fastqPaths
-  file("**/outs/fastq_path/Stats/Stats.json") into bqcPaths
-  val "${bcl.baseName}" into bclName
+    file("**/outs/**/*.fastq.gz") into fastqPaths
+    file("**/outs/**/*.fastq.gz") into cellrangerCount
+    file("**/outs/fastq_path/Stats/Stats.json") into bqcPaths
+    val "${bcl.baseName}" into bclName

  script:
+    """
+    hostname
+    ulimit -a  
+    cellranger mkfastq --id=${bcl.baseName} --run=${bcl} --csv=${design} -r \$SLURM_CPUS_ON_NODE  -p \$SLURM_CPUS_ON_NODE  -w \$SLURM_CPUS_ON_NODE 
+    """
+
+}
+
+
+if (bclCount.value == 1) {
+
+  process countDesign {
+
+    tag "${name}"
+    publishDir "${outDir}/misc/${task.process}/${name}", mode: 'copy'
+
+    input:
+      file fastqs from cellrangerCount.collect()
+      file design from designCount
+
+    output:
+      file("Cellranger_Count_Design.csv") into CountDesign
+
+    script:
+      """
+      bash ${baseDir}/scripts/countDesign.sh
+      """
+
+  }

-  """
-  hostname
-  ulimit -a
-  cellranger mkfastq --id="${bcl.baseName}" --run="$bcl" --csv=$design -r \$SLURM_CPUS_ON_NODE  -p \$SLURM_CPUS_ON_NODE  -w \$SLURM_CPUS_ON_NODE 
-  """
 }

+
 process fastqc {
-  tag "$bclName"
+
+  tag "${bclName}"
  queue 'super'
-  publishDir "$outDir/misc/${task.process}/$name/$bclName", mode: 'copy', pattern: "{*fastqc.zip}"
+  publishDir "${outDir}/misc/${task.process}/${name}/${bclName}", mode: 'copy', pattern: "{*fastqc.zip}"
  module 'fastqc/0.11.5:parallel'

  input:
-  file fastqPaths
-  val bclName
+    file fastqPaths
+    val bclName

  output:
-
-  file("*fastqc.zip") into fqcPaths
+    file("*fastqc.zip") into fqcPaths

  script:
+    """
+    hostname
+    ulimit -a
+    find *.fastq.gz -exec mv {} ${bclName}.{} \\;
+    bash ${baseDir}/scripts/fastqc.sh
+    """

-  """
-  hostname
-  ulimit -a
-  find *.fastq.gz -exec mv {} $bclName.{} \\;
-  bash "$baseDir/scripts/fastqc.sh"
-  
-  """
 }


 process versions {
-  tag "$name"
-  publishDir "$outDir/misc/${task.process}/$name", mode: 'copy'
-  module 'python/3.6.1-2-anaconda:cellranger/3.0.2:bcl2fastq/2.19.1:fastqc/0.11.5'
+
+  tag "${name}"
+  publishDir "${outDir}/misc/${task.process}/${name}", mode: 'copy'
+  module 'python/3.6.1-2-anaconda:cellranger/3.0.2:bcl2fastq/2.19.1:fastqc/0.11.5:pandoc/2.7'

  input:

  output:
-
-  file("*.yaml") into yamlPaths
+    file("*.yaml") into yamlPaths

  script:
+    """
+    hostname
+    ulimit -a
+    echo ${workflow.nextflow.version} > version_nextflow.txt
+    bash ${baseDir}/scripts/versions_mkfastq.sh
+    bash ${baseDir}/scripts/versions_fastqc.sh
+    python3 ${baseDir}/scripts/generate_versions.py -f version_*.txt -o versions
+    python3 ${baseDir}/scripts/generate_references.py -r ${references} -o references
+    """

-  """
-  hostname
-  ulimit -a
-  echo $workflow.nextflow.version > version_nextflow.txt
-  bash "$baseDir/scripts/versions_mkfastq.sh"
-  bash "$baseDir/scripts/versions_fastqc.sh"
-  python3 "$baseDir/scripts/generate_versions.py" -f version_*.txt -o versions
-  """
 }

+
 process multiqc {
-  tag "$name"
+
+  tag "${name}"
  queue 'super'
-  publishDir "$outDir/${task.process}/$name", mode: 'copy', pattern: "{multiqc*}"
+  publishDir "${outDir}/${task.process}/${name}", mode: 'copy', pattern: "{multiqc*}"
  module 'multiqc/1.7'

  input:
-
-  file bqc name "bqc/?/*" from bqcPaths.collect()
-  file fqc name "fqc/*" from fqcPaths.collect()
-  file yamlPaths
+    file bqc name "bqc/?/*" from bqcPaths.collect()
+    file fqc name "fqc/*" from fqcPaths.collect()
+    file yamlPaths

  output:
-
-  file("*") into mqcPaths
+    file("multiqc_report.html") into mqcPaths

  script:
+    """
+    hostname
+    ulimit -a
+    multiqc -c ${multiqcConf} .
+    """

-  """
-  hostname
-  ulimit -a
-  multiqc . -c "$baseDir/conf/multiqc_config.yaml"
-  """
 }
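Inside the fastqc process, `find *.fastq.gz -exec mv {} ${bclName}.{} \\;` prefixes each FASTQ with its flowcell name before FastQC runs, so identically named FASTQs from different flowcells produce distinct reports. The Python sketch below only models that renaming on plain strings; `prefixed_names` and `collisions` are hypothetical helpers, and the file names are illustrative.

```python
from collections import Counter


def prefixed_names(fastqs_by_bcl):
    """Map each (bcl, fastq) pair to '<bcl>.<fastq>', mirroring the mv in the fastqc process."""
    return {
        (bcl, fastq): "{}.{}".format(bcl, fastq)
        for bcl, fastqs in fastqs_by_bcl.items()
        for fastq in fastqs
    }


def collisions(names):
    """Return, sorted, every name that appears more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


if __name__ == "__main__":
    # Two flowcells demultiplexed with the same design file (illustrative names)
    runs = {
        "cellranger-tiny-bcl-1_2_0-1": ["test_sample_S1_L001_R1_001.fastq.gz"],
        "cellranger-tiny-bcl-1_2_0-2": ["test_sample_S1_L001_R1_001.fastq.gz"],
    }
    raw = [fastq for fastqs in runs.values() for fastq in fastqs]
    print(collisions(raw))                            # ['test_sample_S1_L001_R1_001.fastq.gz']
    print(collisions(prefixed_names(runs).values()))  # []
```

Prefixing only helps where files are renamed; per the CHANGELOG, the cellranger_count design file is still skipped when more than one flowcell is input.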
--- a/workflow/scripts/check_design.py
+++ b/workflow/scripts/check_design.py
@@ -35,7 +35,7 @@ def get_args():


 def check_design_headers(design):
-    '''Check if design file conforms to sequencing type.'''
+    '''Check if design file has correct headers.'''

    # Default headers
    design_template = [

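The body of `check_design_headers` is truncated in this diff. As an illustration only, the sketch below checks a design CSV against the three columns the README documents (`Lane`, `Sample`, `Index`); the function name `missing_design_headers` and the use of the standard `csv` module, rather than whatever `check_design.py` actually uses, are assumptions.

```python
import csv
import io

# Columns documented in the README; the real template in check_design.py is not shown here.
DESIGN_TEMPLATE = ["Lane", "Sample", "Index"]


def missing_design_headers(csv_text):
    """Return the documented headers that are absent from the CSV header row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # Strip stray whitespace so " Sample" still counts as "Sample"
    present = {column.strip() for column in header}
    return [column for column in DESIGN_TEMPLATE if column not in present]


if __name__ == "__main__":
    good = "Lane,Sample,Index\n*,test_sample,SI-P03-C9\n"
    bad = "Lane,Sample\n*,test_sample\n"
    print(missing_design_headers(good))  # []
    print(missing_design_headers(bad))   # ['Index']
```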
--- a/workflow/scripts/countDesign.sh
+++ b/workflow/scripts/countDesign.sh
+#!/bin/bash
+#countDesign.sh
+
+fastqs=$(ls *.fastq.gz)
+design=$(ls *.csv)
+sample=$(cat ${design} | tail -n +2 | cut -d ',' -f2)
+
+for i in ${fastqs};
+do
+  if [[ ${i} == *_S0_* ]]; then
+    continue
+  elif [[ ${i} == *_I* ]]; then
+    continue
+  else
+    good+=("${i}")
+  fi
+done
+
+echo "Sample,fastq_R1,fastq_R2" > Cellranger_Count_Design.csv;
+echo "${sample},${good[0]},${good[1]}" >> Cellranger_Count_Design.csv;
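countDesign.sh skips undetermined reads (`*_S0_*`) and anything matching `*_I*` (intended for index reads), then writes the sample name with the first two remaining FASTQs as R1 and R2. The glob `*_I*` also matches a sample name that happens to contain `_I`, and `${sample}` holds several lines if the design file lists more than one row. A hedged Python sketch of the same idea is below, assuming bcl2fastq-style names such as `test_sample_S1_L001_R1_001.fastq.gz`; `count_design_row` is a hypothetical helper, and unlike the script it picks R1 and R2 by read tag rather than by position.

```python
import re

# Assumed bcl2fastq-style naming: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
READ_TAG = re.compile(r"_(R[12]|I[12])_\d{3}\.fastq\.gz$")


def count_design_row(sample, fastqs):
    """Build a 'Sample,fastq_R1,fastq_R2' CSV with one sample row (hypothetical helper)."""
    reads = {}
    for name in sorted(fastqs):
        if "_S0_" in name:
            continue  # undetermined reads
        match = READ_TAG.search(name)
        if match is None or match.group(1).startswith("I"):
            continue  # index reads or unrecognised names
        reads.setdefault(match.group(1), name)  # keep the first R1 and the first R2
    return "Sample,fastq_R1,fastq_R2\n{},{},{}\n".format(
        sample, reads.get("R1", ""), reads.get("R2", ""))
```

For example, given an I1, an R1, an R2 and an `Undetermined_S0_...` file for `test_sample`, only the R1 and R2 names end up in the row.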
--- a/workflow/scripts/fastqc.sh
+++ b/workflow/scripts/fastqc.sh
 #!/bin/bash

-find . -name '*.fastq.gz' | awk '{printf("fastqc \"%s\"\n", $0)}' | parallel -j `grep -c ^processor /proc/cpuinfo` --verbose
+find . -name '*.fastq.gz' | awk '{printf("fastqc \"%s\"\n", $0)}' | parallel -j $(grep -c ^processor /proc/cpuinfo) --verbose
 #find . -name '*fastqc.*' | xargs -I '{}' mv '{}' ./ 
 #for i in `ls *.fastq.gz`;
 #do echo "fastqc ${i}";

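fastqc.sh finds every `*.fastq.gz`, turns each path into a quoted `fastqc` command with awk, and hands the commands to GNU parallel with one job per processor listed in `/proc/cpuinfo`. A standard-library Python sketch of the same fan-out follows; `fastqc_commands` and `run_all` are hypothetical names, and actually running the commands assumes `fastqc` is on `PATH` (for example after `module load fastqc/0.11.5`).

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def fastqc_commands(root="."):
    """One argv list per *.fastq.gz under root, like `find . -name '*.fastq.gz'`."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".fastq.gz"):
                commands.append(["fastqc", os.path.join(dirpath, name)])
    return commands


def run_all(commands, jobs=None):
    """Run commands concurrently; jobs defaults to the CPU count, like `parallel -j`."""
    jobs = jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # check=True raises CalledProcessError if any command fails
        results = pool.map(lambda argv: subprocess.run(argv, check=True), commands)
        return [result.returncode for result in results]
```

Passing argument lists instead of shell strings sidesteps the quoting the awk step adds, and threads suffice because each worker only waits on a child process. `os.cpu_count()` reports logical CPUs, which on Linux normally matches the `grep -c ^processor` count.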
--- a/workflow/scripts/generate_references.py
+++ b/workflow/scripts/generate_references.py
+#!/usr/bin/env python
+
+#
+# * --------------------------------------------------------------------------
+# * Licensed under MIT (https://git.biohpc.swmed.edu/BICF/Astrocyte/cellranger_count/LICENSE.md)
+# * --------------------------------------------------------------------------
+#
+
+'''Make header for HTML of references.'''
+
+import argparse
+import subprocess
+import shlex
+import logging
+
+EPILOG = '''
+For more details:
+	%(prog)s --help
+'''
+
+# SETTINGS
+
+logger = logging.getLogger(__name__)
+logger.addHandler(logging.NullHandler())
+logger.propagate = False
+logger.setLevel(logging.INFO)
+
+
+def get_args():
+    '''Define arguments.'''
+
+    parser = argparse.ArgumentParser(
+        description=__doc__, epilog=EPILOG,
+        formatter_class=argparse.RawDescriptionHelpFormatter)
+
+    parser.add_argument('-r', '--reference',
+                        help="The reference file (markdown format).",
+                        required=True)
+
+    parser.add_argument('-o', '--output',
+                        help="The out file name.",
+                        default='references')
+
+    args = parser.parse_args()
+    return args
+
+
+def main():
+    args = get_args()
+    reference = args.reference
+    output = args.output
+
+    out_filename = output + '_mqc.yaml'
+
+    # Header for HTML; the with-block closes (and flushes) the file
+    # before pandoc appends to it below
+    with open(out_filename, "w") as out_file:
+        print('''
+        id: 'software_references'
+        section_name: 'Software References'
+        description: 'This section describes references for the tools used.'
+        plot_type: 'html'
+        data: |
+        ''', file=out_file)
+
+    # Turn Markdown into HTML
+    references_html = 'bash -c "pandoc -p {} | sed \'s/^/                /\' >> {}"'
+    references_html = references_html.format(reference, out_filename)
+    subprocess.check_call(shlex.split(references_html))
+
+
+if __name__ == '__main__':
+    main()
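generate_references.py writes a MultiQC custom-content header and then appends pandoc's HTML, indented so that it sits inside the `data: |` block scalar. The sketch below shows the same layout with `textwrap.indent`, taking HTML that has already been produced, so pandoc is not needed; `references_mqc` is a hypothetical name, and it drops the uniform 8-space indentation the original header carries.

```python
import textwrap

# Same keys as the header printed by generate_references.py
HEADER = """\
id: 'software_references'
section_name: 'Software References'
description: 'This section describes references for the tools used.'
plot_type: 'html'
data: |
"""


def references_mqc(html):
    """Return *_mqc.yaml text with the HTML nested under `data: |` (hypothetical helper)."""
    # Every non-blank HTML line must be indented deeper than `data:` to stay in the block.
    return HEADER + textwrap.indent(html, "    ")
```

Usage: `references_mqc(html)` where `html` is, for instance, the output of converting `docs/references.md`; the result would be saved as `references_mqc.yaml`.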
--- a/workflow/scripts/generate_versions.py
+++ b/workflow/scripts/generate_versions.py
@@ -57,7 +57,7 @@ def check_files(files):

    software_files = np.array(list(SOFTWARE_REGEX.values()))[:,0]

-    extra_files =  set(files) - set(software_files)
+    extra_files = set(files) - set(software_files)

    if len(extra_files) > 0:
            logger.error('Missing regex: %s', list(extra_files))

--- a/workflow/scripts/untarBCL.sh
+++ b/workflow/scripts/untarBCL.sh
+#!/bin/bash
+#untarBCL.sh
+
+usage() {
+  echo "-t  --tar file"
+  exit 1
+}
+OPTIND=1
+while getopts :t: opt
+do
+  case ${opt} in
+	t) tar=${OPTARG};;
+  esac
+done
+
+shift $((${OPTIND} -1))
+
+folder=$(tar -tf ${tar} | grep -o "^[^/]*/\$")
+folder1=$(echo "$folder" | tr -d ' ')
+
+if [ "${folder}" != "${folder1}" ]; then
+  echo "Error: Spaces found in BCL Directory Path"
+  echo ${folder}
+  exit 21
+fi
+
+name=$(echo ${tar} | rev | cut -f1 -d '.' | rev)
+
+if [ "${name}" == "gz" ]; then
+  tar -xvf ${tar} -I pigz
+else
+  tar -xvf ${tar}
+fi
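untarBCL.sh lists the archive with `tar -tf`, takes the top-level directory entry, and exits with status 21 if that name contains a space, before extracting with or without pigz. A Python sketch of the same check using the standard `tarfile` module, which detects gzip compression on its own, is below; `top_level_spaces` is a hypothetical helper, and unlike the script's `grep` it looks at the first path component of every member rather than only entries listed as `dir/`.

```python
import sys
import tarfile


def top_level_spaces(tar_path):
    """Return sorted top-level names in the archive that contain a space (hypothetical helper)."""
    # Mode "r" (same as "r:*") transparently handles both .tar and .tar.gz
    with tarfile.open(tar_path, "r") as archive:
        tops = {member.name.split("/", 1)[0] for member in archive.getmembers()}
    return sorted(name for name in tops if " " in name)


if __name__ == "__main__":
    offending = top_level_spaces(sys.argv[1])
    if offending:
        print("Error: Spaces found in BCL Directory Path")
        print("\n".join(offending))
        sys.exit(21)  # same exit status as untarBCL.sh
```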
--- a/workflow/tests/test_check_design.py
+++ b/workflow/tests/test_check_design.py
+#!/usr/bin/env python3
+
+import pytest
+import pandas as pd
+from io import StringIO
+import os
+
+test_output_path = os.path.dirname(os.path.abspath(__file__)) + \
+                '/../output/misc/checkDesignFile/run/'
+
+@pytest.mark.simple1
+def test_simple1_design():
+    assert os.path.exists(os.path.join(test_output_path, 'design.checked.csv'))
+
+@pytest.mark.simple2
+def test_simple2_design():
+    assert os.path.exists(os.path.join(test_output_path, 'design.checked.csv'))    
--- a/workflow/tests/test_fastqc.py
+++ b/workflow/tests/test_fastqc.py
+#!/usr/bin/env python3
+
+import pytest
+import pandas as pd
+from io import StringIO
+import os
+
+test_output_path = os.path.dirname(os.path.abspath(__file__)) + \
+                '/../output/misc/fastqc/run/'
+
+@pytest.mark.simple1
+def test_simple1_fastqc():
+    assert os.path.exists(os.path.join(test_output_path, 'cellranger-tiny-bcl-1_2_0'))
+
+@pytest.mark.simple2
+def test_simple2_fastqc():
+    assert os.path.exists(os.path.join(test_output_path, 'cellranger-tiny-bcl-1_2_0-1'))
+    assert os.path.exists(os.path.join(test_output_path, 'cellranger-tiny-bcl-1_2_0-2'))
--- a/workflow/tests/test_mkfastq.py
+++ b/workflow/tests/test_mkfastq.py
+#!/usr/bin/env python3
+
+import pytest
+import pandas as pd
+from io import StringIO
+import os
+
+test_output_path = os.path.dirname(os.path.abspath(__file__)) + \
+		'/../output/mkfastq/'
+
+@pytest.mark.simple1
+def test_simple1_mkfastq():
+    assert os.path.exists(os.path.join(test_output_path, 'cellranger-tiny-bcl-1_2_0', 'outs'))
+
+@pytest.mark.simple2
+def test_simple2_mkfastq():
+    assert os.path.exists(os.path.join(test_output_path, 'cellranger-tiny-bcl-1_2_0-1', 'outs'))
+    assert os.path.exists(os.path.join(test_output_path, 'cellranger-tiny-bcl-1_2_0-2', 'outs'))
--- a/workflow/tests/test_multiqc.py
+++ b/workflow/tests/test_multiqc.py
+#!/usr/bin/env python3
+
+import pytest
+import pandas as pd
+from io import StringIO
+import os
+
+test_output_path = os.path.dirname(os.path.abspath(__file__)) + \
+                '/../output/multiqc/run/'
+
+@pytest.mark.simple1
+def test_simple1_multiqc():
+    assert os.path.exists(os.path.join(test_output_path, 'multiqc_report.html'))
+
+@pytest.mark.simple2
+def test_simple2_multiqc():
+    assert os.path.exists(os.path.join(test_output_path, 'multiqc_report.html'))