### STATUS - Alpha version, testing with initial users.
## Introduction
The BioHPC `param_runner` is a command-line tool that runs a command multiple
times, exploring a defined parameter space and summarizing the results.
This tool uses a simple YAML configuration file to define a parameter space to
exhaustively search, and runs tasks in parallel by distributing them over a set
of allocated nodes on the BioHPC Nucleus cluster. Supported parameter spaces are:
* Arithmetic progressions of integers or real numbers.
* Geometric progressions of integers or real numbers.
* Defined lists of strings or other values.
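The three kinds of parameter space listed above, and what an exhaustive search over them means, can be sketched in a few lines of Python. This is only an illustration of the idea, not `param_runner`'s implementation: the helper names `arithmetic` and `geometric` and the parameter names `hidden`, `rate`, and `activation` are hypothetical.

```python
import math
from itertools import product


def arithmetic(start, stop, step):
    """Values start, start + step, ... up to and including stop."""
    # The small tolerance guards against floating point rounding.
    count = int(math.floor((stop - start) / step + 1e-9)) + 1
    return [start + i * step for i in range(count)]


def geometric(start, stop, ratio):
    """Values start, start * ratio, ... up to and including stop."""
    values = []
    value = start
    while value <= stop * (1 + 1e-9):
        values.append(value)
        value *= ratio
    return values


# Hypothetical parameter space: one entry per explored parameter.
space = {
    "hidden": arithmetic(1, 5, 2),       # arithmetic progression
    "rate": geometric(1, 100, 10),       # geometric progression
    "activation": ["step", "sigmoid"],   # defined list of values
}

# An exhaustive search visits every combination of values.
combos = [dict(zip(space, values)) for values in product(*space.values())]
print(len(combos))
```

The number of tasks is the product of the number of values for each parameter, so the search grows quickly as parameters are added.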
The output of a command run with a given parameter combination can be captured
and summarized in tabular format by defining regular expressions that are
matched against the output of the command.
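As a rough sketch of how regular expressions can turn command output into a summary table, the Python below applies one pattern per column to each task's output and writes the rows as CSV. The column names, patterns, and sample outputs are invented for illustration, and this is not the tool's own summary code.

```python
import csv
import io
import re

# Hypothetical summary columns: column name -> regex with one capture group.
patterns = {
    "accuracy": r"Accuracy:\s*([0-9.]+)",
    "epochs": r"Epochs:\s*(\d+)",
}


def summarize(output, patterns):
    """Return one table row: the first captured match for each pattern."""
    row = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, output)
        row[name] = match.group(1) if match else ""
    return row


# Invented standard output from two tasks, keyed by task index.
outputs = {
    0: "Epochs: 10\nAccuracy: 0.91\n",
    1: "Epochs: 20\nAccuracy: 0.94\n",
}

rows = [dict(index=i, **summarize(text, patterns))
        for i, text in sorted(outputs.items())]

# Write the table as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["index", *patterns])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

A pattern that does not match leaves the cell empty rather than failing, so one incomplete run does not prevent the rest of the table from being written.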
## Using the Parameter Runner on the Nucleus Cluster
...
...
3. Check the parameter .yml file: `param_runner check myexperiment.yml`
4. Submit to the cluster: `param_runner submit myexperiment.yml`
Output from the individual commands that are run will be stored in a 'param_output/' subdirectory.
Summary information from the commands executed will be tabulated according to
your summary configuration, into the file 'param_output/param_summary.csv'.
When `param_runner submit` is run it will submit a job to the cluster to perform
the parameter optimization. The job number will be reported on the command line:
```
...
INFO Submitted batch job 290006
...
```
You can monitor the status of the job using the `squeue` command, or the BioHPC
web portal. When the job starts to run it will create a log file in the
directory it was launched from, named according to the job ID that was reported
above.
```
param_runner_290006.out
```
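If you script around the submission, the log file name can be derived from the reported job number. The sketch below assumes only what is described above: a line containing `Submitted batch job <id>` and a log file named `param_runner_<id>.out`. The function name `log_file_for` is hypothetical.

```python
import re


def log_file_for(submit_output):
    """Derive the log file name from the text printed by the submit step."""
    match = re.search(r"Submitted batch job (\d+)", submit_output)
    if match is None:
        return None  # no job number found in the output
    return f"param_runner_{match.group(1)}.out"


print(log_file_for("INFO Submitted batch job 290006"))
```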
You can examine this file with commands such as `cat` or `less`. Look for a line
stating the output directory for the optimization:
```
...
- Trace file and output will be at: test/test_data/param_runner_20170111-12131484158381
...
```
Inside this directory you will find the master `trace.txt` file that summarizes
the parameter exploration. There are also many xxx.out and xxx.err files, which
contain the full standard output and standard error from each individual command
that was run throughout the parameter exploration, e.g.:
```
...
100.err
100.out
101.err
101.out
102.err
102.out
103.err
103.out
104.err
104.out
105.err
105.out
...
```
The number in the filename corresponds to the index column in the master summary
`trace.txt`.
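Given an index from `trace.txt`, the matching output files can be located as in the short sketch below. It assumes the file names are the plain index followed by `.out` or `.err`, as in the listing above. The directory name used here is a placeholder, and `task_files` is a hypothetical helper.

```python
from pathlib import Path


def task_files(output_dir, index):
    """Return the paths of the standard output and standard error files for one task."""
    base = Path(output_dir)
    return base / f"{index}.out", base / f"{index}.err"


# Placeholder directory name; use the output directory reported in your log file.
out_path, err_path = task_files("param_runner_output", 101)
print(out_path, err_path)
```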
## Parameter File Format
The parameter file that defines the parameter exploration to be run is written
in a simple text-based format called YAML. A short introduction to YAML can be
found at the link below, but it should be easy to work from the example
parameter file given later.
https://learnxinyminutes.com/docs/yaml/
Whenever you write a parameter file, be sure to run `param_runner check` on it
before you attempt to submit to the cluster. The check will list any problems
with the parameter file that you should correct before submitting.
The example below includes documentation of each parameter, and can be
used as a starting point for your own parameter files:
```yaml
# The command to run, including any arguments that always stay the same
# and will not be explored by the runner.
# e.g. in this example we run an imaginary program called `train_ann`,
# which could be a program to train an artificial neural network. We
# want to run it with various different training parameters, but the
# input and crossk arguments are always the same. This example is
# typical of a machine learning experiment.
command: train_ann --input train.set --crossk 10
# The standard output from a command will be collected into a file named: