Commit 869c8573 authored by David Trudgian

Update README.md

parent 55623222
......@@ -3,7 +3,10 @@
[![build status](https://git.biohpc.swmed.edu/biohpc/param_runner/badges/master/build.svg)](https://git.biohpc.swmed.edu/biohpc/param_runner/commits/master)
[![coverage report](https://git.biohpc.swmed.edu/biohpc/param_runner/badges/master/coverage.svg)](https://git.biohpc.swmed.edu/biohpc/param_runner/commits/master)
STATUS - Development (not yet functional)
### STATUS - Development (not yet functional)
## Introduction
The BioHPC `param_runner` is a command-line tool that runs a command multiple times, exploring a defined parameter space and summarizing the results.
......@@ -14,3 +17,118 @@ This tool uses a simple YAML configuration file to define a parameter space to e
* Defined lists of strings or other values.
The output of a command run with a certain parameter combination can be captured for summary in tabular format by defining regular expressions matching output.
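As a rough illustration of regex-based capture, the Python sketch below pulls one value per summary entry out of a command's standard output. The sample output text and the `extract_summary` helper are hypothetical and are not taken from the `param_runner` source; the patterns are modelled on the parameter file example, with `+` added so that more than one decimal digit is captured.

```python
import re

# Hypothetical standard output from a single run of the command.
sample_stdout = """Training complete
TPF: 0.93
FPF: 0.07
"""

# Summary definitions: an id (column name) and a regex with one capture group.
summary = [
    {"id": "True_Pos_Fraction", "regex": r"TPF: ([-+]?[0-9]*\.?[0-9]+)"},
    {"id": "False_pos_Fraction", "regex": r"FPF: ([-+]?[0-9]*\.?[0-9]+)"},
]


def extract_summary(stdout, summary_defs):
    """Return {id: first captured group}, or None where nothing matched."""
    row = {}
    for item in summary_defs:
        match = re.search(item["regex"], stdout)
        row[item["id"]] = match.group(1) if match else None
    return row


row = extract_summary(sample_stdout, summary)
print(row)
```

Captured values are kept as strings here; whether the real tool converts them to numbers is not stated in this README.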
## Using the Parameter Runner on the Nucleus Cluster
1. Arrange your data and programs on the cluster.
2. Create a parameter .yml file (see below)
3. Check the parameter .yml file: `param_runner check myexperiment.yml`
4. Submit to the cluster: `param_runner submit myexperiment.yml`
Output from each command run is stored in a 'param_output/' subdirectory.
Summary information from the commands executed is tabulated according to
your summary configuration, into the file 'param_output/param_summary.csv'.
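To give a feel for what a tabulated summary might look like, here is a minimal Python sketch that writes one CSV row per run. The column layout (parameter values followed by captured summary fields), the sample rows, and the `write_summary` helper are all assumptions made for illustration; the actual layout of `param_summary.csv` is not documented here.

```python
import csv
import io

# Hypothetical per-run results: parameter values plus captured summary fields.
rows = [
    {"hidden": 2, "activation": "step", "True_Pos_Fraction": "0.91"},
    {"hidden": 4, "activation": "sigmoid", "True_Pos_Fraction": "0.93"},
]


def write_summary(rows, handle):
    """Write a header line and then one CSV line per run."""
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer so the sketch does not touch the filesystem.
buffer = io.StringIO()
write_summary(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```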
## Parameter File Format
```yaml
# The command to run, including any arguments that will not be explored by
# the runner.
command: train_ann --input train.set --crossk 10
# The standard output from a command will be collected into a file named:
# out_<param1 val>_<param2 val>_<param3_val>.....out
# The standard error from a command will be collected into a file named:
# out_<param1 val>_<param2 val>_<param3_val>.....err
# If summary is specified, we will create a summary file listing, in tabular
# format, the value of each parameter and the standard output of the task.
# To extract only part of the standard output, specify a regular expression here.
# Any capture groups in parentheses will be collected as columns in the summary
# file.
summary:
  - id: True_Pos_Fraction
    regex: 'TPF: ([-+]?[0-9]*\.?[0-9]+)'
  - id: False_pos_Fraction
    regex: 'FPF: ([-+]?[0-9]*\.?[0-9]+)'
# Cluster partition to use
partition: 256GB
# Total number of nodes to use
nodes: 4
# Number of CPUs required by each task
cpus_per_task: 4
# Time limit
time_limit: 3d:00:00:00
# The list of parameters to explore
#
# For each parameter the following properties are required:
#
# type: 'int_range' A range of integers -or-
# 'real_range' A range of real numbers -or-
# 'choice' A list of string options
#
#
# The following properties are optional:
#
# flag: '--example' A flag which should precede the value of the parameter
# optional: true If true, we also consider combinations excluding the parameter
#
# int_range and real_range types take the following properties:
#
# min: 10 Minimum value for the parameter
# max: 1e4 Maximum value for the parameter
# step: 20 Amount to add at each step through the range
# scale: 10 Amount to multiply the value by at each step through the range
#
# e.g. for an arithmetic progression of 10 - 100 in steps of 5:
# min: 10
# max: 100
# step: 5
#
# e.g. for a geometric progression of 1, 10, 100, 1,000, 10,000 (scale factor 10)
# min: 1
# max: 10000
# scale: 10
parameters:
  - id: 'hidden'
    type: "int_range"
    flag: '--hidden'
    min: 0
    max: 32
    step: 2
    description: "Number of hidden nodes"
  - id: 'beta'
    type: real_range
    flag: '-b'
    optional: true
    min: 0.1
    max: 100
    scale: 10
    description: Regularization beta parameter
  - id: 'activation'
    type: choice
    flag: '--activation'
    values:
      - step
      - sigmoid
    description: Activation function
```
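
The parameter space defined above can be sketched in Python to check how many runs a file will generate. This is only an illustration, not the `param_runner` implementation: the `expand_range`, `options` and `command_lines` helpers are hypothetical, ranges are assumed to include `max` (as the progression examples in the comments suggest), and the way flags and values are joined into a command line is an assumption.

```python
import itertools


def expand_range(min_value, max_value, step=None, scale=None):
    """Expand a range by adding `step` or multiplying by `scale` each time."""
    if (step is None) == (scale is None):
        raise ValueError("specify exactly one of step or scale")
    if step is not None and step <= 0:
        raise ValueError("step must be positive")
    if scale is not None and (scale <= 1 or min_value <= 0):
        raise ValueError("scale must be > 1 and min must be > 0")
    values = []
    i = 0
    while True:
        if step is not None:
            value = min_value + i * step
        else:
            value = min_value * scale ** i
        # Small tolerance so floating point error does not drop the last value.
        if value > max_value + 1e-9:
            break
        values.append(value)
        i += 1
    return values


# The three parameters from the example file, with ranges already expanded.
parameters = [
    {"flag": "--hidden", "values": expand_range(0, 32, step=2)},
    {"flag": "-b", "optional": True, "values": expand_range(0.1, 100, scale=10)},
    {"flag": "--activation", "values": ["step", "sigmoid"]},
]


def options(param):
    """Argument fragments for one parameter; [] means the parameter is omitted."""
    opts = [[param["flag"], str(value)] for value in param["values"]]
    if param.get("optional"):
        opts.append([])
    return opts


def command_lines(command, params):
    """Yield one command line per combination of parameter options."""
    for combo in itertools.product(*(options(p) for p in params)):
        args = [arg for fragment in combo for arg in fragment]
        yield " ".join([command] + args)


commands = list(command_lines("train_ann --input train.set --crossk 10", parameters))
print(len(commands))
print(commands[0])
```

Under these assumptions the example file yields 17 values for `hidden`, 4 values for `beta` plus the case where it is omitted, and 2 values for `activation`, giving 17 × 5 × 2 = 170 command lines.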