David Trudgian · 0a68f01a
--- a/README.md 100644 → 100755

+ 300

− 300
+++ b/README.md 100644 → 100755

-# Astrocyte Example Workflow Package
+DEMOOOOO
-This is an example workflow package for the BioHPC Astrocyte workflow engine.
-Astrocyte is a system allowing workflows to be run easily from the web in a
+# Astrocyte Example Workflow Package
-push-button manner, taking advantage of the BioHPC compute cluster. Astrocyte
-allows users to access this workflow package using a simple web interface,
+This is an example workflow package for the BioHPC Astrocyte workflow engine.
-created automatically from the definitions in this package.
+Astrocyte is a system allowing workflows to be run easily from the web in a
+push-button manner, taking advantage of the BioHPC compute cluster. Astrocyte
-## This Example Package
+allows users to access this workflow package using a simple web interface,
+created automatically from the definitions in this package.
-This workflow package provides:
+## This Example Package
-  1) A sample ChIP-Seq data analysis workflow, which uses BWA to align reads to
-   a reference genome, and MACS to call peaks. The workflow is written in the
+This workflow package provides:
-   [*Nextflow*](http://www.nextflow.io) workflow language. *Nextflow* is a
-   simple yet powerful workflow scripting language based on the *Groovy*
+  1) A sample ChIP-Seq data analysis workflow, which uses BWA to align reads to
-   scripting language. It supports advanced features such as implicit
+   a reference genome, and MACS to call peaks. The workflow is written in the
-   parallelization on the cluster - Nextflow will launch concurrent jobs for
+   [*Nextflow*](http://www.nextflow.io) workflow language. *Nextflow* is a
-   each input file. 
+   simple yet powerful workflow scripting language based on the *Groovy*
+   scripting language. It supports advanced features such as implicit
-  2) A sample *Shiny* visualization app, which provides a web-based tool for
+   parallelization on the cluster - Nextflow will launch concurrent jobs for
-  visualizing results. *Shiny* is a framework to provide web interfaces to
+   each input file. 
-  data and analysis implemented in the *R* statistical language. *R* is a
-  powerful language for manipulating and interrogating data, and *Shiny* allows
+  2) A sample *Shiny* visualization app, which provides a web-based tool for
-  analysis in R to be presented simply and easily as a web application.
+  visualizing results. *Shiny* is a framework to provide web interfaces to
+  data and analysis implemented in the *R* statistical language. *R* is a
-  3) Meta-data describing the workflow, its inputs, outputs, etc. The Astrocyte
+  powerful language for manipulating and interrogating data, and *Shiny* allows
-  web application and command-line runner use this meta-data to understand the
+  analysis in R to be presented simply and easily as a web application.
-  workflow, what input it needs, how the documentation is arranged etc.
+  3) Meta-data describing the workflow, its inputs, outputs, etc. The Astrocyte
-  4) User-focused documentation, in *markdown* format, that will be displayed to
+  web application and command-line runner use this meta-data to understand the
-  users in the Astrocyte web interface. Markdown is a simple plain-text based
+  workflow, what input it needs, how the documentation is arranged etc.
-  syntax which is especially suited for writing documentation that will be
-  displayed on the web.
+  4) User-focused documentation, in *markdown* format, that will be displayed to
+  users in the Astrocyte web interface. Markdown is a simple plain-text based
-  5) Developer-focused documentation, in this file - `README.md`. This
+  syntax which is especially suited for writing documentation that will be
-  documentation should summarize features of the workflow package that are of
+  displayed on the web.
-  interest to anyone who would want to extend it, or use it as a template for
-  their own work.
+  5) Developer-focused documentation, in this file - `README.md`. This
+  documentation should summarize features of the workflow package that are of
-## Workflow Package Layout
+  interest to anyone who would want to extend it, or use it as a template for
+  their own work.
-Workflow packages for Astrocyte are Git repositories, and have a common layout
-which must be followed so that Astrocyte understands how to present them to
+## Workflow Package Layout
-users. The folder structure, and names of key files listed below should not be
-changed. Although a workflow package with a modified structure may work, it is
+Workflow packages for Astrocyte are Git repositories, and have a common layout
-not guaranteed to be accepted by future versions of Astrocyte.
+which must be followed so that Astrocyte understands how to present them to
+users. The folder structure, and names of key files listed below should not be
-The following structure of files and directories is always present:
+changed. Although a workflow package with a modified structure may work, it is
+not guaranteed to be accepted by future versions of Astrocyte.
-```
+The following structure of files and directories is always present:
-   - docs/
-       index.md
+```
-   - test_data/ 
-   - vizapp/
+   - docs/
-       server.R
+       index.md
-       ui.R
+   - test_data/ 
-   - workflow/
+   - vizapp/
-       - lib/
+       server.R
-       - output/
+       ui.R
-       - scripts/
+   - workflow/
-       main.nf
+       - lib/
-   astrocyte_pkg.yml
+       - output/
-   CHANGES.md
+       - scripts/
-   LICENSE.md
+       main.nf
-   README.md  
+   astrocyte_pkg.yml
+   CHANGES.md
-```
+   LICENSE.md
+   README.md  
-### Meta-Data
+```
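The layout listed above can be checked mechanically before a package is imported. The following is a minimal sketch in bash and is not part of Astrocyte: the function name `check_pkg_layout` is hypothetical, and the list of entries is taken from the tree above, leaving out `vizapp/` and `LICENSE.md` because the README marks them as optional. It is not a substitute for `astrocyte_cli check`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report entries from the documented layout that are
# missing from a package directory. Returns non-zero if anything is missing.
check_pkg_layout() {
    local pkg="${1:-.}"
    local missing=0
    local entry

    # Directories named in the layout (vizapp/ is optional, so not checked).
    for entry in docs test_data workflow workflow/lib workflow/output workflow/scripts; do
        if [ ! -d "$pkg/$entry" ]; then
            echo "missing directory: $entry"
            missing=1
        fi
    done

    # Files named in the layout (LICENSE.md is optional, so not checked).
    for entry in docs/index.md workflow/main.nf astrocyte_pkg.yml CHANGES.md README.md; do
        if [ ! -f "$pkg/$entry" ]; then
            echo "missing file: $entry"
            missing=1
        fi
    done

    return "$missing"
}
```

Calling `check_pkg_layout astrocyte_example_chipseq` prints one line per missing entry. Git does not track empty directories, so directories such as `workflow/output` may need a placeholder file to survive a clone.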
-  * `astrocyte_pkg.yml` - A file in the root directory of the package, which 
-  contains the metadata describing the workflow in human & machine readable text
+### Meta-Data
-  format called *YAML*. This includes information about the workflow package
-  such as its name, synopsis, input parameters, outputs, etc.
+  * `astrocyte_pkg.yml` - A file in the root directory of the package, which 
+  contains the metadata describing the workflow in human & machine readable text
-  See the documentation inside the example `astrocyte_pkg.yml` file for a
+  format called *YAML*. This includes information about the workflow package
-  guide to specifying Astrocyte metadata.
+  such as its name, synopsis, input parameters, outputs, etc.
+  See the documentation inside the example `astrocyte_pkg.yml` file for a
-### The Workflow
+  guide to specifying Astrocyte metadata.
-  * `workflow/main.nf` - A *Nextflow* workflow file, which will be run by
-  Astrocyte using parameters provided by the user.
+### The Workflow
-  * `workflow/scripts` - A directory for any scripts (e.g. bash, python, 
-  ruby scripts) that the `main.nf` workflow will call. This might be empty if
+  * `workflow/main.nf` - A *Nextflow* workflow file, which will be run by
-  the workflow is implemented entirely in nextflow. You should *not* include
+  Astrocyte using parameters provided by the user.
-  large pieces of software here. Workflows should be designed to use *modules*
+  * `workflow/scripts` - A directory for any scripts (e.g. bash, python, 
-  available on the BioHPC cluster. The modules a workflow needs will be defined
+  ruby scripts) that the `main.nf` workflow will call. This might be empty if
-  in the `astrocyte_pkg.yml` metadata file.
+  the workflow is implemented entirely in nextflow. You should *not* include
-  * `workflow/lib` - A directory for any Nextflow/Groovy libraries that might be
+  large pieces of software here. Workflows should be designed to use *modules*
-  included by workflows using advanced features. Usually empty for simpler
+  available on the BioHPC cluster. The modules a workflow needs will be defined
-  workflows.
+  in the `astrocyte_pkg.yml` metadata file.
-  * `workflow/output` - An empty directory, into which any final output of the
+  * `workflow/lib` - A directory for any Nextflow/Groovy libraries that might be
-  workflow should be published using the `publishDir "$baseDir/output", mode: 'copy'`
+  included by workflows using advanced features. Usually empty for simpler
-  directive inside a process.
+  workflows.
+  * `workflow/output` - An empty directory, into which any final output of the
-  To learn about the *Nextflow* language, take a look at this and other example
+  workflow should be published using the `publishDir "$baseDir/output", mode: 'copy'`
-  workflows, and refer to the [nextflow.io](http://www.nextflow.io) website.
+  directive inside a process.
-  Nextflow workflows used in an Astrocyte package must be written in a certain
+  To learn about the *Nextflow* language, take a look at this and other example
-  way, with specific rules so that Astrocyte can run them successfully on the
+  workflows, and refer to the [nextflow.io](http://www.nextflow.io) website.
-  cluster. See the *Workflow Requirements* section below for details.
+  Nextflow workflows used in an Astrocyte package must be written in a certain
+  way, with specific rules so that Astrocyte can run them successfully on the
-### The Visualization App *(Optional)*
+  cluster. See the *Workflow Requirements* section below for details.
-  * `vizapp/` - A directory that will contain an *R Shiny* visualization app, if
-   required. The visualization app will be made available to the user via the
+### The Visualization App *(Optional)*
-   Astrocyte web interface. At minimum the directory requires the standard Shiny
-  `ui.R` and `server.R` files. The exact Shiny app structure is not 
+  * `vizapp/` - A directory that will contain an *R Shiny* visualization app, if
-  prescribed. Any R packages required by the Shiny app will be listed in the
+   required. The visualization app will be made available to the user via the
-  `astrocyte_pkg.yml` metadata.
+   Astrocyte web interface. At minimum the directory requires the standard Shiny
+  `ui.R` and `server.R` files. The exact Shiny app structure is not 
-  Shiny apps used in an Astrocyte package must be written in a certain
+  prescribed. Any R packages required by the Shiny app will be listed in the
-  way, with specific rules so that Astrocyte can run them successfully, and find
+  `astrocyte_pkg.yml` metadata.
-  data files to visualize. See the *Vizapp Requirements* section below for
-  details.
+  Shiny apps used in an Astrocyte package must be written in a certain
+  way, with specific rules so that Astrocyte can run them successfully, and find
+  data files to visualize. See the *Vizapp Requirements* section below for
-### User Documentation 
+  details.
-  * `docs/index.md` - The first page of user documentation, in *markdown*
-  format. Astrocyte will display this documentation to users of the workflow
+### User Documentation 
-  package.
+  * `docs/index.md` - The first page of user documentation, in *markdown*
-  * `docs/...` - Any other documentation files. *Markdown* `.md` files will be
+  format. Astrocyte will display this documentation to users of the workflow
-  rendered for display on the web. Any images used in the documentation should
+  package.
-  also be placed here.
+  * `docs/...` - Any other documentation files. *Markdown* `.md` files will be
+  rendered for display on the web. Any images used in the documentation should
-### Developer Documentation
+  also be placed here.
-  * `README.md` - Documentation for developers of the workflow giving a brief
-  overview and any important notes that are not for workflow users.
+### Developer Documentation
-  * `LICENSE.md` *(Optional)* - The license applied to the workflow package.
-  * `CHANGES.md` - A brief summary of changes made through time to the workflow.
+  * `README.md` - Documentation for developers of the workflow giving a brief
+  overview and any important notes that are not for workflow users.
-### Testing
+  * `LICENSE.md` *(Optional)* - The license applied to the workflow package.
+  * `CHANGES.md` - A brief summary of changes made through time to the workflow.
-  * `test_data/` - Every workflow package should include a minimal set of test
-  data that allows the workflow to be run, testing its features. The
+### Testing
-  `test_data/` directory is a location for test data files. Test data should be
-  kept as small as possible. If large datasets (over 20MB total) are unavoidable,
+  * `test_data/` - Every workflow package should include a minimal set of test
-  provide a `fetch_test_data.sh` bash script which obtains the data from an
+  data that allows the workflow to be run, testing its features. The
-  external source.
+  `test_data/` directory is a location for test data files. Test data should be
-  * `test_data/fetch_test_data.sh` *Optional* - A bash script that fetches large
+  kept as small as possible. If large datasets (over 20MB total) are unavoidable,
-  test data from an external source, placing it into the `test_data/` directory.
+  provide a `fetch_test_data.sh` bash script which obtains the data from an
+  external source.
+  * `test_data/fetch_test_data.sh` *Optional* - A bash script that fetches large
-# Workflow Requirements & Testing
+  test data from an external source, placing it into the `test_data/` directory.
-So that Astrocyte can successfully run a workflow for any BioHPC user, and
-make efficient use of the Nucleus compute cluster, the Nextflow workflow must
+# Workflow Requirements & Testing
-be written according to some rules.
+So that Astrocyte can successfully run a workflow for any BioHPC user, and
-## Astrocyte / Nextflow Basics
+make efficient use of the Nucleus compute cluster, the Nextflow workflow must
+be written according to some rules.
-A Nextflow workflow run by Astrocyte must be in a file named `workflow/main.nf`
-within the workflow package. Do not use other filenames for your workflow.
+## Astrocyte / Nextflow Basics
-When a workflow runs on the Astrocyte platform the work area will be created
+A Nextflow workflow run by Astrocyte must be in a file named `workflow/main.nf`
-dynamically and its path will not be known in advance. The `$baseDir`
+within the workflow package. Do not use other filenames for your workflow.
-variable can be used inside the Nextflow workflow to refer to this path.
+When a workflow runs on the Astrocyte platform the work area will be created
-Data files for analysis will be uploaded or linked into Astrocyte by users. 
+dynamically and its path will not be known in advance. The `$baseDir`
-Workflows cannot access input data directly. **All input file names must be
+variable can be used inside the Nextflow workflow to refer to this path.
-accepted as workflow parameters**. Astrocyte will allow users to select input
-files, and pass the parameter values to your workflow. 
+Data files for analysis will be uploaded or linked into Astrocyte by users. 
+Workflows cannot access input data directly. **All input file names must be
-Output files for the user should be published to `$baseDir/output` using the nextflow
+accepted as workflow parameters**. Astrocyte will allow users to select input
-directive `publishDir "$baseDir/output", mode: 'copy'` in a process block.
+files, and pass the parameter values to your workflow. 
-You can create a directory structure inside `$baseDir/output` if required to
-organize the output files. Note that we use the 'copy' mode so that Nextflow's
+Output files for the user should be published to `$baseDir/output` using the nextflow
-work directories can be cleaned up by Astrocyte. By default Nextflow will link
+directive `publishDir "$baseDir/output", mode: 'copy'` in a process block.
-published files from work directories, so output would be lost on cleanup if
+You can create a directory structure inside `$baseDir/output` if required to
-`mode: copy` is not used.
+organize the output files. Note that we use the 'copy' mode so that Nextflow's
+work directories can be cleaned up by Astrocyte. By default Nextflow will link
-Reference data paths, even in permanent locations such as 
+published files from work directories, so output would be lost on cleanup if
-`/project/apps_database/iGenomes` should not be hard-coded into workflows.
+`mode: copy` is not used.
-Provide parameters in your workflow for specifying reference files, and then
-specify possible choices for those parameters in the `astrocyte_pkg.yml` 
+Reference data paths, even in permanent locations such as 
-metadata file. If you need to make reference data available for a workflow,
+`/project/apps_database/iGenomes` should not be hard-coded into workflows.
-please ask the BioHPC team to place it in the central `/project/apps_database`
+Provide parameters in your workflow for specifying reference files, and then
-area.
+specify possible choices for those parameters in the `astrocyte_pkg.yml` 
+metadata file. If you need to make reference data available for a workflow,
-The `workflow/scripts` directory in a package is reserved for any scripts, 
+please ask the BioHPC team to place it in the central `/project/apps_database`
-e.g. Perl, Python, Bash scripts which implement processing that you don't want
+area.
-to write or re-write in Nextflow. They should be called from `main.nf` as a
-Nextflow *process*. The path to the scripts directory will be `$baseDir/scripts` 
+The `workflow/scripts` directory in a package is reserved for any scripts, 
-and this should be used instead of other relative or absolute paths within your
+e.g. Perl, Python, Bash scripts which implement processing that you don't want
-workflow. Don't put large, or even small applications here. Please use cluster
+to write or re-write in Nextflow. They should be called from `main.nf` as a
-modules within your workflow, and ask BioHPC to install software as a module if
+Nextflow *process*. The path to the scripts directory will be `$baseDir/scripts` 
-you need it.
+and this should be used instead of other relative or absolute paths within your
+workflow. Don't put large, or even small applications here. Please use cluster
+modules within your workflow, and ask BioHPC to install software as a module if
-## Optimizations
+you need it.
-To make sure that your workflow can be run efficiently on the BioHPC cluster
-please:
+## Optimizations
-  * **Do** specify cpu and memory requirements for Nextflow processes using the
+To make sure that your workflow can be run efficiently on the BioHPC cluster
-  `cpus` and `memory` directives. Nextflow will use this information to schedule
+please:
-  tasks and complete a job as quickly as possible.
+  * **Do** specify cpu and memory requirements for Nextflow processes using the
-  * **Do** split complex sections of a workflow into multiple Nextflow processes
+  `cpus` and `memory` directives. Nextflow will use this information to schedule
-  so that they can be parallelized by the system.
+  tasks and complete a job as quickly as possible.
-  * **Do** use software modules from the cluster, and specify exact versions for
+  * **Do** split complex sections of a workflow into multiple Nextflow processes
-  modules when loading them in your workflow.
+  so that they can be parallelized by the system.
-  * **Do** thoroughly check the execution of your workflow using the command line
+  * **Do** use software modules from the cluster, and specify exact versions for
-  runner, before attempting to bring it into the Astrocyte web application.
+  modules when loading them in your workflow.
-  * **Don't** use absolute paths for any files - see above.
+  * **Do** thoroughly check the execution of your workflow using the command line
+  runner, before attempting to bring it into the Astrocyte web application.
-  * **Don't** work with any files outside of `$baseDir`, except permanent
-  reference data administered by BioHPC.
+  * **Don't** use absolute paths for any files - see above.
+  * **Don't** work with any files outside of `$baseDir`, except permanent
-# Vizapp / Shiny Requirements
+  reference data administered by BioHPC.
-The visualization app will have access to any final output that was published
-to the `$baseDir/output` location in the nextflow workflow. This path will be
+# Vizapp / Shiny Requirements
-accessible as `Sys.getenv('outputDir')`.
+The visualization app will have access to any final output that was published
-Parameters specified when the workflow was launched will also be available as
+to the `$baseDir/output` location in the nextflow workflow. This path will be
-environment variables - their name prefixed by `param-` e.g. the workflow
+accessible as `Sys.getenv('outputDir')`.
-parameter `fastqs` will be available in R as `Sys.getenv('param-fastqs')`.
+Parameters specified when the workflow was launched will also be available as
-  * **Don't** try to access any files outside of `outputDir` except permanent
+environment variables - their name prefixed by `param-` e.g. the workflow
-  reference data administered by BioHPC.
+parameter `fastqs` will be available in R as `Sys.getenv('param-fastqs')`.
-  * **Do** list any CRAN or Bioconductor packages needed by the vizapp in the
+  * **Don't** try to access any files outside of `outputDir` except permanent
-  `astrocyte_pkg.yml` metadata file.
+  reference data administered by BioHPC.
-  * **Don't** do heavy processing in the vizapp. Shiny apps in Astrocyte share
+  * **Do** list any CRAN or Bioconductor packages needed by the vizapp in the
-  resources, and are intended for basic visualization. You may wish to provide
+  `astrocyte_pkg.yml` metadata file.
-  instructions to users in your workflow package documentation directing them
-  to RStudio for follow-up work. Moderate I/O (e.g. scanning a large reference
+  * **Don't** do heavy processing in the vizapp. Shiny apps in Astrocyte share
-  file) is acceptable, as BioHPC systems have access to high performance
+  resources, and are intended for basic visualization. You may wish to provide
-  storage.
+  instructions to users in your workflow package documentation directing them
+  to RStudio for follow-up work. Moderate I/O (e.g. scanning a large reference
+  file) is acceptable, as BioHPC systems have access to high performance
-# Testing/Running the Workflow with the Astrocyte CLI
+  storage.
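When trying a vizapp outside Astrocyte, the environment described above can be approximated by hand. One wrinkle is that names such as `param-fastqs` contain a hyphen, so they are not valid shell variable names and cannot be set with `export`; the `env` utility can pass them to a child process instead. A minimal sketch, assuming a shell with `env` available; the function name `run_with_param` and the example values are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with outputDir and a single
# param-<name> variable set, mimicking what the vizapp section describes.
# Usage: run_with_param <output_dir> <param_name> <param_value> <command> [args...]
run_with_param() {
    local output_dir="$1"
    local name="$2"
    local value="$3"
    shift 3

    # env accepts names that the shell itself would reject in an assignment.
    env "outputDir=$output_dir" "param-$name=$value" "$@"
}
```

For example, `run_with_param workflow/output fastqs reads.fastq.gz printenv param-fastqs` prints the value the child process would see. The same wrapper could launch R in place of `printenv`, at which point `Sys.getenv('param-fastqs')` should return that value.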
-**Work in Progress - CLI not yet available on the cluster**
+# Testing/Running the Workflow with the Astrocyte CLI
-Workflows will usually be run from the Astrocyte web interface, by importing the
-workflow package repository and making it available to users. During development
+**Work in Progress - CLI not yet available on the cluster**
-you can use the Astrocyte CLI scripts to check, test, and run your workflow
-against non-test data.
+Workflows will usually be run from the Astrocyte web interface, by importing the
+workflow package repository and making it available to users. During development
-First load the `astrocyte` module on a BioHPC system.
+you can use the Astrocyte CLI scripts to check, test, and run your workflow
+against non-test data.
-To check the structure and syntax of the workflow package in the directory
-`astrocyte_example_chipseq`:
+First load the `astrocyte` module on a BioHPC system.
-```bash
+To check the structure and syntax of the workflow package in the directory
-$ astrocyte_cli check astrocyte_example_chipseq
+`astrocyte_example_chipseq`:
-```
+```bash
-To launch the workflow's defined tests against the included test data:
+$ astrocyte_cli check astrocyte_example_chipseq
+```
-```bash
-$ astrocyte_cli test astrocyte_example_chipseq
+To launch the workflow's defined tests against the included test data:
-```
+```bash
-To run the workflow using specific data and parameters, use the `run` subcommand. A working directory will
+$ astrocyte_cli test astrocyte_example_chipseq
-be created.
+```
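The `check` and `test` commands above can be chained so that the tests only run once the structural check has passed. This is a minimal sketch rather than a documented Astrocyte feature: the function name `validate_pkg` and the `ASTROCYTE_CLI` override variable are hypothetical, and it assumes `astrocyte_cli` signals failure through a non-zero exit status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the structural check, then the package tests,
# stopping at the first step that fails.
validate_pkg() {
    local pkg="$1"
    # ASTROCYTE_CLI is an assumed override, useful while the CLI is unavailable.
    local cli="${ASTROCYTE_CLI:-astrocyte_cli}"

    # "&&" skips the test step when the check step returns non-zero.
    "$cli" check "$pkg" && "$cli" test "$pkg"
}
```

It would be called as `validate_pkg astrocyte_example_chipseq`.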
-```bash
+To run the workflow using specific data and parameters, use the `run` subcommand. A working directory will
-$ astrocyte_cli run astrocyte_example_chipseq --parameter1 "value1" --parameter2 "value2"...
+be created.
-```
+```bash
-To run the Shiny visualization app against `test_data`:
+$ astrocyte_cli run astrocyte_example_chipseq --parameter1 "value1" --parameter2 "value2"...
+```
-```bash
-$ astrocyte_cli shinytest astrocyte_example_chipseq
+To run the Shiny visualization app against `test_data`:
-```
+```bash
-To run the Shiny visualization app against output from `astrocyte_cli run`,
+$ astrocyte_cli shinytest astrocyte_example_chipseq
-which will be in the work directory created by `run`:
+```
-```bash
+To run the Shiny visualization app against output from `astrocyte_cli run`,
-$ astrocyte_cli shiny astrocyte_example_chipseq
+which will be in the work directory created by `run`:
-```
+```bash
-To generate the user-facing documentation for the workflow and display it in a
+$ astrocyte_cli shiny astrocyte_example_chipseq
-web browser:
+```
-```bash
+To generate the user-facing documentation for the workflow and display it in a
-$ astrocyte_cli docs astrocyte_example_chipseq
+web browser:
-```
+```bash
+$ astrocyte_cli docs astrocyte_example_chipseq
+```
+\ No newline at end of file