Merge request !1: Master (astrocyte_example_chipseq)
Closed. David Trudgian requested to merge david.trudgian/astrocyte_example_chipseq:master into master 8 years ago.
Please merge readme change
Compare master (base) and latest version 0a68f01a: 1 commit, 8 years ago, +300 −300
README.md (mode 100644 → 100755, +300 −300)
# Astrocyte Example Workflow Package

This is an example workflow package for the BioHPC Astrocyte workflow engine.
Astrocyte is a system allowing workflows to be run easily from the web in a
push-button manner, taking advantage of the BioHPC compute cluster. Astrocyte
allows users to access this workflow package using a simple web interface,
created automatically from the definitions in this package.

## This Example Package

This workflow package provides:

1) A sample ChIP-Seq data analysis workflow, which uses BWA to align reads to
a reference genome, and MACS to call peaks. The workflow is written in the
[*Nextflow*](http://www.nextflow.io) workflow language. *Nextflow* is a
simple yet powerful workflow scripting language based on the *Groovy*
scripting language. It supports advanced features such as implicit
parallelization on the cluster - Nextflow will launch concurrent jobs for
each input file.

2) A sample *Shiny* visualization app, which provides a web-based tool for
visualizing results. *Shiny* is a framework to provide web interfaces to
data and analysis implemented in the *R* statistical language. *R* is a
powerful language for manipulating and interrogating data, and *Shiny* allows
analysis in R to be presented simply and easily as a web application.

3) Meta-data describing the workflow, its inputs, outputs, etc. The Astrocyte
web application and command-line runner use this meta-data to understand the
workflow, what input it needs, how the documentation is arranged, etc.

4) User-focused documentation, in *markdown* format, that will be displayed to
users in the Astrocyte web interface. Markdown is a simple plain-text based
syntax which is especially suited for writing documentation that will be
displayed on the web.

5) Developer-focused documentation, in this file - `README.md`. This
documentation should summarize features of the workflow package that are of
interest to anyone who would want to extend it, or use it as a template for
their own work.

## Workflow Package Layout

Workflow packages for Astrocyte are Git repositories, and have a common layout
which must be followed so that Astrocyte understands how to present them to
users. The folder structure, and names of key files listed below should not be
changed. Although a workflow package with a modified structure may work, it is
not guaranteed to be accepted by future versions of Astrocyte.

The following structure of files and directories is always present:

```
- docs/
    index.md
- test_data/
- vizapp/
    server.R
    ui.R
- workflow/
    - lib/
    - output/
    - scripts/
    main.nf
astrocyte_pkg.yml
CHANGES.md
LICENSE.md
README.md
```
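
As an illustration only, the layout above can be checked with a small shell function. This is a hypothetical sketch, not the Astrocyte checker: the function name `check_pkg_layout` is an assumption, and `vizapp/` and `LICENSE.md` are skipped because the sections below describe them as optional.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report missing pieces of an Astrocyte package layout.
# Not part of Astrocyte; the name and behaviour are illustrative assumptions.

check_pkg_layout() {
    local pkg="$1" missing=0 path

    # Directories from the layout listing (vizapp/ is optional, so it is skipped).
    for path in docs test_data workflow workflow/lib workflow/output workflow/scripts; do
        if [ ! -d "$pkg/$path" ]; then
            echo "missing directory: $path"
            missing=1
        fi
    done

    # Key files from the layout listing (LICENSE.md is optional, so it is skipped).
    for path in docs/index.md workflow/main.nf astrocyte_pkg.yml CHANGES.md README.md; do
        if [ ! -f "$pkg/$path" ]; then
            echo "missing file: $path"
            missing=1
        fi
    done

    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

It could be called as `check_pkg_layout astrocyte_example_chipseq && echo "layout looks complete"`. Note that Git does not track empty directories, so a fresh clone may need placeholder files in directories such as `workflow/output` for a check like this to pass.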

### Meta-Data

* `astrocyte_pkg.yml` - A file in the root directory of the package, which
contains the metadata describing the workflow in a human- and machine-readable
text format called *YAML*. This includes information about the workflow package
such as its name, synopsis, input parameters, outputs etc.

See the documentation inside the example `astrocyte_pkg.yml` file for a
guide to specifying Astrocyte metadata.

### The Workflow

* `workflow/main.nf` - A *Nextflow* workflow file, which will be run by
Astrocyte using parameters provided by the user.
* `workflow/scripts` - A directory for any scripts (e.g. bash, python,
ruby scripts) that the `main.nf` workflow will call. This might be empty if
the workflow is implemented entirely in Nextflow. You should *not* include
large pieces of software here. Workflows should be designed to use *modules*
available on the BioHPC cluster. The modules a workflow needs will be defined
in the `astrocyte_pkg.yml` metadata file.
* `workflow/lib` - A directory for any Nextflow/Groovy libraries that might be
included by workflows using advanced features. Usually empty for simpler
workflows.
* `workflow/output` - An empty directory, into which any final output of the
workflow should be published using the `publishDir "$baseDir/output", mode: 'copy'`
directive inside a process.

To learn about the *Nextflow* language, take a look at this and other example
workflows, and refer to the [nextflow.io](http://www.nextflow.io) website.

Nextflow workflows used in an Astrocyte package must be written in a certain
way, with specific rules so that Astrocyte can run them successfully on the
cluster. See the *Workflow Requirements & Testing* section below for details.

### The Visualization App *(Optional)*

* `vizapp/` - A directory that will contain an *R Shiny* visualization app, if
required. The visualization app will be made available to the user via the
Astrocyte web interface. At minimum the directory requires the standard Shiny
`ui.R` and `server.R` files. The exact Shiny app structure is not
prescribed. Any R packages required by the Shiny app will be listed in the
`astrocyte_pkg.yml` metadata.

Shiny apps used in an Astrocyte package must be written in a certain
way, with specific rules so that Astrocyte can run them successfully, and find
data files to visualize. See the *Vizapp / Shiny Requirements* section below for
details.

### User Documentation

* `docs/index.md` - The first page of user documentation, in *markdown*
format. Astrocyte will display this documentation to users of the workflow
package.
* `docs/...` - Any other documentation files. *Markdown* `.md` files will be
rendered for display on the web. Any images used in the documentation should
also be placed here.

### Developer Documentation

* `README.md` - Documentation for developers of the workflow giving a brief
overview and any important notes that are not for workflow users.
* `LICENSE.md` *(Optional)* - The license applied to the workflow package.
* `CHANGES.md` - A brief summary of changes made through time to the workflow.

### Testing

* `test_data/` - Every workflow package should include a minimal set of test
data that allows the workflow to be run, testing its features. The
`test_data/` directory is a location for test data files. Test data should be
kept as small as possible. If large datasets (over 20MB total) are unavoidable,
provide a `fetch_test_data.sh` bash script which obtains the data from an
external source.
* `test_data/fetch_test_data.sh` *(Optional)* - A bash script that fetches large
test data from an external source, placing it into the `test_data/` directory.
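
A minimal sketch of what a `fetch_test_data.sh` could look like follows. It is an assumption, not the script shipped with any package: the function name `fetch_into` is hypothetical, and the source URLs are taken as arguments only so that the sketch does not invent addresses. A real script would more likely list its own data sources.

```bash
#!/usr/bin/env bash
# Hypothetical test_data/fetch_test_data.sh; not part of the example package.
# Usage: bash fetch_test_data.sh URL [URL...]
set -e

# Download one URL into dest_dir, skipping files that are already present.
fetch_into() {
    local url="$1" dest_dir="$2" name
    name="$(basename -- "$url")"
    if [ -s "$dest_dir/$name" ]; then
        echo "already present: $name"
        return 0
    fi
    curl --fail --silent --show-error --location --output "$dest_dir/$name" "$url"
    echo "fetched: $name"
}

# Place downloads next to this script, i.e. inside test_data/.
here="$(cd -- "$(dirname -- "$0")" && pwd)"
for url in "$@"; do
    fetch_into "$url" "$here"
done
```

Skipping files that already exist keeps repeated test runs from downloading the same large dataset again.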

# Workflow Requirements & Testing

So that Astrocyte can successfully run a workflow for any BioHPC user, and
make efficient use of the Nucleus compute cluster, the Nextflow workflow must
be written according to some rules.

## Astrocyte / Nextflow Basics

A Nextflow workflow run by Astrocyte must be in a file named `workflow/main.nf`
within the workflow package. Do not use other filenames for your workflow.

When a workflow runs on the Astrocyte platform the work area will be created
dynamically and its path will not be known in advance. The `$baseDir`
variable can be used inside the Nextflow workflow to refer to this path.

Data files for analysis will be uploaded or linked into Astrocyte by users.
Workflows cannot access input data directly. **All input file names must be
accepted as workflow parameters**. Astrocyte will allow users to select input
files, and pass the parameter values to your workflow.

Output files for the user should be published to `$baseDir/output` using the Nextflow
directive `publishDir "$baseDir/output", mode: 'copy'` in a process block.
You can create a directory structure inside `$baseDir/output` if required to
organize the output files. Note that we use the 'copy' mode so that Nextflow's
work directories can be cleaned up by Astrocyte. By default Nextflow will link
published files from work directories, so output would be lost on cleanup if
`mode: 'copy'` is not used.

Reference data paths, even in permanent locations such as
`/project/apps_database/iGenomes`, should not be hard-coded into workflows.
Provide parameters in your workflow for specifying reference files, and then
specify possible choices for those parameters in the `astrocyte_pkg.yml`
metadata file. If you need to make reference data available for a workflow,
please ask the BioHPC team to place it in the central `/project/apps_database`
area.

The `workflow/scripts` directory in a package is reserved for any scripts,
e.g. Perl, Python, Bash scripts, which implement processing that you don't want
to write or re-write in Nextflow. They should be called from `main.nf` as a
Nextflow *process*. The path to the scripts directory will be `$baseDir/scripts`
and this should be used instead of other relative or absolute paths within your
workflow. Don't put large, or even small applications here. Please use cluster
modules within your workflow, and ask BioHPC to install software as a module if
you need it.
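
To illustrate the kind of small helper that belongs in `workflow/scripts`, here is a hedged sketch. The file name `count_reads.sh` and the function `count_fastq_reads` are hypothetical and do not exist in the example package. The point is that the input path arrives as an argument, which a process in `main.nf` would fill from a workflow parameter, so nothing about the location of the user's data is hard-coded.

```bash
#!/usr/bin/env bash
# Hypothetical workflow/scripts/count_reads.sh (not part of the example package).
# Counts the records in an uncompressed FASTQ file named on the command line.

count_fastq_reads() {
    local fastq="$1" lines
    if [ ! -r "$fastq" ]; then
        echo "cannot read input file: $fastq" >&2
        return 1
    fi
    # A FASTQ record is four lines long.
    lines="$(wc -l < "$fastq")"
    echo $(( lines / 4 ))
}

# The input path is supplied by the caller; no data location is assumed here.
if [ "$#" -ge 1 ]; then
    count_fastq_reads "$1"
fi
```

A process would locate such a script through `$baseDir/scripts` and run it inside its own task directory, leaving the publishing of results to `publishDir`.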

## Optimizations

To make sure that your workflow can be run efficiently on the BioHPC cluster
please:

* **Do** specify CPU and memory requirements for Nextflow processes using the
`cpus` and `memory` directives. Nextflow will use this information to schedule
tasks and complete a job as quickly as possible.
* **Do** split complex sections of a workflow into multiple Nextflow processes
so that they can be parallelized by the system.
* **Do** use software modules from the cluster, and specify exact versions for
modules when loading them in your workflow.
* **Do** thoroughly check the execution of your workflow using the command line
runner, before attempting to bring it into the Astrocyte web application.
* **Don't** use absolute paths for any files - see above.
* **Don't** work with any files outside of `$baseDir`, except permanent
reference data administered by BioHPC.
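
The advice to load modules with exact versions can be enforced with a small guard. The sketch below is an assumption rather than anything provided by BioHPC or Astrocyte: `has_exact_version` and `load_pinned_module` are hypothetical names, and specs such as `bwa/0.7.5` are placeholders, not versions known to be installed on the cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: refuse to load a module unless an exact version is named.
# Module names and versions used with it are placeholders.

# Succeeds only for specs of the form name/version, e.g. "bwa/0.7.5".
has_exact_version() {
    case "$1" in
        */)  return 1 ;;   # "name/"    - version missing
        /*)  return 1 ;;   # "/version" - name missing
        */*) return 0 ;;   # "name/version"
        *)   return 1 ;;   # bare name  - would load whatever the default is
    esac
}

load_pinned_module() {
    if ! has_exact_version "$1"; then
        echo "refusing to load '$1': give an exact version (name/version)" >&2
        return 1
    fi
    # `module` is the environment modules command available on the cluster.
    module load "$1"
}
```

Pinning the version means that a later change to the cluster's default module cannot silently change the workflow's results.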

# Vizapp / Shiny Requirements

The visualization app will have access to any final output that was published
to the `$baseDir/output` location in the Nextflow workflow. This path will be
accessible as `Sys.getenv('outputDir')`.

Parameters specified when the workflow was launched will also be available as
environment variables - their name prefixed by `param-` e.g. the workflow
parameter `fastqs` will be available in R as `Sys.getenv('param-fastqs')`.

* **Don't** try to access any files outside of `outputDir` except permanent
reference data administered by BioHPC.
* **Do** list any CRAN or Bioconductor packages needed by the vizapp in the
`astrocyte_pkg.yml` metadata file.
* **Don't** do heavy processing in the vizapp. Shiny apps in Astrocyte share
resources, and are intended for basic visualization. You may wish to provide
instructions to users in your workflow package documentation directing them
to RStudio for follow-up work. Moderate I/O (e.g. scanning a large reference
file) is acceptable, as BioHPC systems have access to high performance
storage.
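
When trying a vizapp outside Astrocyte, the same environment variables have to be supplied by hand. The following is a sketch under assumptions: `with_vizapp_env` is a hypothetical helper, the paths and values are placeholders, and how Astrocyte itself launches Shiny apps is not described here. Because `param-fastqs` contains a hyphen it is not a valid shell variable name, so `export` cannot set it, but `env` can place it in the environment of a command.

```bash
#!/usr/bin/env bash
# Hypothetical local launcher helper for the vizapp. Paths and parameter
# values passed to it are placeholders.

# Run a command with the environment variables the vizapp reads:
#   outputDir     - directory holding the published workflow output
#   param-fastqs  - value of the workflow parameter "fastqs"
with_vizapp_env() {
    local output_dir="$1" fastqs="$2"
    shift 2
    env "outputDir=$output_dir" "param-fastqs=$fastqs" "$@"
}

# Example (requires R with the shiny package installed):
#   with_vizapp_env workflow/output reads.fastq.gz Rscript -e 'shiny::runApp("vizapp")'
```

Inside the app, `Sys.getenv('outputDir')` and `Sys.getenv('param-fastqs')` would then return the values given on the command line.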

# Testing/Running the Workflow with the Astrocyte CLI

**Work in Progress - CLI not yet available on the cluster**

Workflows will usually be run from the Astrocyte web interface, by importing the
workflow package repository and making it available to users. During development
you can use the Astrocyte CLI scripts to check, test, and run your workflow
against non-test data.

First load the `astrocyte` module on a BioHPC system.

To check the structure and syntax of the workflow package in the directory
`astrocyte_example_chipseq`:

```bash
$ astrocyte_cli check astrocyte_example_chipseq
```

To launch the workflow's defined tests, against included test data:

```bash
$ astrocyte_cli test astrocyte_example_chipseq
```

To run the workflow using specific data and parameters (a working directory will
be created):

```bash
$ astrocyte_cli run astrocyte_example_chipseq --parameter1 "value1" --parameter2 "value2"...
```

To run the Shiny visualization app against test_data:

```bash
$ astrocyte_cli shinytest astrocyte_example_chipseq
```

To run the Shiny visualization app against output from `astrocyte_cli run`,
which will be in the work directory created by `run`:

```bash
$ astrocyte_cli shiny astrocyte_example_chipseq
```

To generate the user-facing documentation for the workflow and display it in a
web browser:

```bash
$ astrocyte_cli docs astrocyte_example_chipseq
```