AutoML: taking the human expert out of the loop
AutoML.org is no longer updated. We moved all information over to ml4aad.org/automl.
After having successfully installed the basic HPOlib, you can download more benchmarks or create your own. Each benchmark comes with an algorithm and (if necessary) a wrapper and data. If you want to use one of the benchmarks listed here, follow these steps:
Let's say you want to run the logistic regression:
wget www.automl.org/logistic.tar.gz
tar -xf logistic.tar.gz
After unpacking, you will find a script wrappingLogistic.py, three directories with the names of the optimizers (plus one directory for random search), two other directories named cv and nocv, and a script theano_teardown.py.
Now choose whether to run the experiment with or without crossvalidation. Change into the cv directory if you want to use crossvalidation; otherwise, change into the nocv directory. There you will find a config.cfg, which contains information about how long to run the experiment, how many crossvalidation folds to use, etc. The benchmark uses scikit-data to download data when it is called for the first time. Then run:
HPOlib-run -o /path/to/optimizers/<tpe/hyperopt|smac|spearmint|tpe/random> [-s seed] [-t title]
from inside the cv or nocv folder to run one optimizer for as many evaluations as stated in config.cfg (100 in this example). More information about config.cfg can be found in the section on configuring the HPOlib.
NOTE: Since calculations are done with the THEANO library, you can also run this benchmark on an NVIDIA GPU. This is switched off by default, but you can change it with the THEANO flags. You will find them in config.cfg; information on how to set the THEANO flags is given in the section on configuring theano.
To run your own benchmark you basically need the software for the benchmark and a search space description for the optimizers smac, spearmint and tpe. In order to work with HPOlib you must put these files into a special directory structure. It is the same directory structure as for the benchmarks which you can download on this website and is explained in the list below. The following lines will guide you through the creation of such a benchmark. Here is a rough guide on what files you need:
- One directory per optimizer, each holding the search space for that optimizer: hyperopt_august2013_mod, random_hyperopt2013_mod, smac_2_06_01-dev and spearmint_april2013_mod. HPOlib-convert converts search spaces from and to all three different optimizers.
- A configuration file config.cfg. See the section on configuring the HPOlib for details.
First, create a directory myBenchmark inside the HPOlib/benchmarks folder.
The executable HPOlib/benchmarks/myBenchmark/myAlgo.py with the target algorithm can be as easy as
import math
import time

import HPOlib.benchmark_util as benchmark_util


def myAlgo(params, **kwargs):
    # params is a dict that contains the hyperparameters.
    # The values are forwarded as strings, so convert and check them.
    if 'x' not in params:
        raise ValueError("x is missing from params")

    x = float(params["x"])
    if x < 0 or x > 3.5:
        raise ValueError("x not between 0 and 3.5: %s" % x)

    # **kwargs contains further information, e.g. for crossvalidation:
    #   kwargs['folds'] is 1 when no cv is used
    #   kwargs['fold'] is the current fold (the index is zero-based)

    # Run your algorithm and return the result you want to minimize
    result = -math.sin(x)
    return result


if __name__ == "__main__":
    starttime = time.time()
    # Use a library function which parses the command line call
    args, params = benchmark_util.parse_cli()
    result = myAlgo(params, **args)
    duration = time.time() - starttime
    # The parenthesized form works with both Python 2 and Python 3
    print("Result for ParamILS: %s, %f, 1, %f, %d, %s" %
          ("SAT", abs(duration), result, -1, str(__file__)))
As you can see, the script parses command line arguments, calls the target function which is implemented in myAlgo, measures the runtime of the target algorithm and prints a return string to the command line. This relevant information is extracted by the HPOlib. If you write a new algorithm/wrapper script, you must parse the following call:
target_algorithm_executable --fold 0 --folds 1 --params [ [ -param1 value1 ] ]
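If you prefer not to rely on benchmark_util.parse_cli(), the call can also be parsed by hand. The following is a minimal sketch under two assumptions: everything before --params is a --key value pair, and everything after it is a -name value pair. The function name parse_target_call is hypothetical and not part of the HPOlib.

```python
def parse_target_call(argv):
    """Split a target-algorithm call into (args, params) dictionaries.

    Hypothetical helper, not part of HPOlib. All values stay strings,
    mirroring the way they arrive on the command line.
    """
    args, params = {}, {}
    target = args      # options before --params are stored in args
    prefix = "--"
    i = 0
    while i < len(argv):
        token = argv[i]
        if token == "--params":
            # Everything after this marker is a hyperparameter
            target, prefix = params, "-"
            i += 1
            continue
        if not token.startswith(prefix) or i + 1 >= len(argv):
            raise ValueError("unexpected token: %r" % token)
        target[token[len(prefix):]] = argv[i + 1]
        i += 2
    return args, params


args, params = parse_target_call(
    ["--fold", "0", "--folds", "1", "--params", "-x", "1.5"])
print(args)    # {'fold': '0', 'folds': '1'}
print(params)  # {'x': '1.5'}
```

Because each name consumes the token that follows it as its value, negative values such as -0.5 are handled without special treatment.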
The return string must take the following form:
Result for ParamILS: SAT, <duration>, 1, <result>, -1, <additional information>
This return string is far from optimal and contains unnecessary and confusing parts. It is therefore subject to change in one of the next versions of the HPOlib.
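To make the layout of the return string concrete, here is a small sketch that builds such a line and reads it back. The helper names and the regular expression are assumptions made for illustration; the way the HPOlib itself extracts the values may differ.

```python
import re


def format_result(duration, result, info=""):
    """Build the return string in the form shown above."""
    return "Result for ParamILS: SAT, %f, 1, %f, -1, %s" % (
        abs(duration), result, info)


# Assumed pattern for reading the line back; HPOlib's own parser may differ.
RESULT_PATTERN = re.compile(
    r"Result for ParamILS: (\w+), ([^,]+), ([^,]+), ([^,]+), ([^,]+), ?(.*)")


def parse_result(line):
    """Return (status, duration, result) from a result line."""
    match = RESULT_PATTERN.match(line)
    if match is None:
        raise ValueError("not a result line: %r" % line)
    status, duration, _, result, _, _ = match.groups()
    return status, float(duration), float(result)


line = format_result(0.25, -0.5, "myAlgo.py")
print(line)                # Result for ParamILS: SAT, 0.250000, 1, -0.500000, -1, myAlgo.py
print(parse_result(line))  # ('SAT', 0.25, -0.5)
```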
Next, create HPOlib/benchmarks/myBenchmark/config.cfg, which is the configuration file. It tells the HPOlib what to do and looks like this:
[TPE]
space = mySpace.py
[HPOLIB]
function = python ../myAlgo.py
number_of_jobs = 200
# worst possible result
result_on_terminate = 0
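The file uses the standard INI syntax, so it can be inspected with Python's own config parser. The sketch below is written for Python 3 (the module is called ConfigParser in Python 2) and reads the example from a string instead of a file; it only illustrates the file format and is not how the HPOlib loads its configuration.

```python
import configparser  # named ConfigParser in Python 2

CONFIG_TEXT = """
[TPE]
space = mySpace.py

[HPOLIB]
function = python ../myAlgo.py
number_of_jobs = 200
# worst possible result
result_on_terminate = 0
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

print(config.sections())                                 # ['TPE', 'HPOLIB']
print(config.get("HPOLIB", "function"))                  # python ../myAlgo.py
print(config.getint("HPOLIB", "number_of_jobs"))         # 200
print(config.getfloat("HPOLIB", "result_on_terminate"))  # 0.0
```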
Since the hyperparameter optimization algorithm must know about the variables and their possible values for your target algorithm, the next step is to specify these in a so-called search space. Create a new directory hyperopt_august2013_mod inside the HPOlib/benchmarks/myBenchmark directory and save the two lines of Python shown below in a file called mySpace.py. The config.cfg created above already refers to this newly created search space. As problems get more complex, you may want to specify more complex search spaces. It is recommended to do this in the TPE format, then translate it into the SMAC format, which can in turn be translated into the spearmint format. More information on how to write search spaces in the TPE format can be found in the hyperopt documentation.
from hyperopt import hp
space = {'x': hp.uniform('x', 0, 3.5)}
Now you can run your benchmark with tpe. The command (which has to be executed from HPOlib/benchmarks/myBenchmark) is
HPOlib-run -o ../../optimizers/tpe/hyperopt_august2013_mod
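Before starting a full experiment, it can help to know what result to expect. The sketch below is a plain random search over the same interval, written without any HPOlib or hyperopt dependency; it is only a sanity check for the toy objective, not a stand-in for the optimizers. The names my_algo and random_search are made up for this example.

```python
import math
import random


def my_algo(x):
    """Same objective as myAlgo above: minimize -sin(x)."""
    return -math.sin(x)


def random_search(n_evals, seed=0):
    """Draw x uniformly from [0, 3.5] and keep the lowest loss seen."""
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(n_evals):
        x = rng.uniform(0, 3.5)
        loss = my_algo(x)
        if loss < best_loss:
            best_x, best_loss = x, loss
    return best_x, best_loss


best_x, best_loss = random_search(200)
print("best x = %.3f, loss = %.4f" % (best_x, best_loss))
```

The true minimum lies at x = pi/2 (about 1.571) with a loss of -1, so an optimizer given 200 evaluations should report a value close to that.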
Further you can run your benchmark with the other optimizers:
mkdir smac
python path/to/hpolib/format_converter/TpeSMAC.py tpe/mySpace.py >> smac/params.pcs
python path/to/wrapping.py smac
mkdir spearmint
python path/to/hpolib/format_converter/SMACSpearmint.py >> spearmint/config.pb
python path/to/wrapping.py spearmint
The config.cfg is a file which contains the necessary settings for your experiment. It is designed such that as little information as possible needs to be given. This means all values for the optimizers and the wrapping software are set to their default values unless you change them. Default values are stored in a file called config_parser/generalDefault.cfg. The file is divided into sections. You only need to fill in values for the HPOLIB section. The following table describes the values you must provide:
Parameter | Description |
---|---|
function | The executable for the target algorithm. The path can be either absolute or relative to an optimizer directory in your benchmark folder (if the executable is not found, you can try to prepend the parent directory to the path) |
number_of_jobs | Number of evaluations that are performed by the optimizers. NOTE: When using k-fold crossvalidation, SMAC will use k * number_of_jobs evaluations |
result_on_terminate | If your algorithm crashes, is killed, takes too long, etc., this result is given to the optimizer. It should be the worst possible, but realistic, result for a problem |
An example can be found in the section adding your own benchmark. The following parameters can be specified:
Section | Parameter | Default value | Description |
---|---|---|---|
HPOLIB | number_cv_folds | 1 | Number of folds for a crossvalidation |
HPOLIB | max_crash_per_cv | 3 | If some runs of the crossvalidation fail, stop the crossvalidation for this configuration after max_crash_per_cv failed folds. |
HPOLIB | remove_target_algorithm_output | True | By default, the target algorithm output is deleted. Set to False to keep the output. This is useful for debugging. |
HPOLIB | console_output_delay | 1.0 | HPOlib reads the experiment pickle periodically to print the current status to the command line interface. Doing this often can degrade the performance of your hard drive (especially if you run a lot of HPOlib experiments in parallel), so you might want to increase this number if you experience delays when accessing your hard drive. |
HPOLIB | runsolver_time_limit, memory_limit, cpu_limit | | Enforce resource limits on a target algorithm run. If these limits are exceeded, the target algorithm will be killed by the runsolver. This can be used, e.g., to enforce a runtime per algorithm or to make sure an algorithm does not use too much space on a computing cluster. |
HPOLIB | total_time_limit | | Enforce a total time limit on the hyperparameter optimization. |
HPOLIB | leading_runsolver_info | | Important when using THEANO and CUDA, see the section on configuring theano. |
HPOLIB | use_own_time_measurement | True | When set to True (the default), the runsolver time measurement is saved. Otherwise, the time measurement of the target algorithm is saved. |
HPOLIB | number_of_concurrent_jobs | 1 | WARNING: this only works for spearmint and SMAC and is not tested! |
HPOLIB | function_setup | | An executable which is called before the first target algorithm call. This can, for example, check whether everything is installed properly. |
HPOLIB | function_teardown | | An executable which is called after the last target algorithm call. This can, for example, delete temporary directories. |
HPOLIB | experiment_directory_prefix | | Adds a prefix to the automatically generated experiment directory. Can be useful if one experiment is run several times with different parameter settings. |
HPOLIB | handles_cv | | This flag determines whether cv.py or runsolver_wrapper.py is the proxy which a hyperparameter optimization package optimizes. This is only set to 1 for SMAC and must only be used by optimization algorithm developers. |
Hyperparameter optimization package parameters:
Section | Parameter | Default value | Description |
---|---|---|---|
TPE | space | space.py | Name of the search space for tpe |
TPE | path_to_optimizer | ./hyperopt_august2013_mod_src | |
SMAC | p | smac/params.pcs | Path to the search space for SMAC, relative to the benchmark directory |
SMAC | run_obj | QUALITY | Please consult the SMAC documentation. |
SMAC | intra_instance_obj | MEAN | Please consult the SMAC documentation. |
SMAC | rf_full_tree_bootstrap | False | Please consult the SMAC documentation. |
SMAC | rf_split_min | 10 | Please consult the SMAC documentation. |
SMAC | adaptive_capping | false | Please consult the SMAC documentation. |
SMAC | max_incumbent_runs | 2000 | Please consult the SMAC documentation. |
SMAC | num_iterations | 2147483647 | Please consult the SMAC documentation. |
SMAC | deterministic | True | Please consult the SMAC documentation. |
SMAC | retry_target_algorithm_run_count | 0 | Please consult the SMAC documentation. |
SMAC | intensification_percentage | 0 | Please consult the SMAC documentation. |
SMAC | validation | false | Please consult the SMAC documentation. |
SMAC | path_to_optimizer | ./smac_2_06_01-dev_src | |
SPEARMINT | config | config.pb | Name of the spearmint grid_seed. For syntax details, please look at the examples in the spearmint package. |
SPEARMINT | method | GPEIOptChooser | The spearmint chooser to be used. Please consult the spearmint documentation for possible choices. WARNING: Only the GPEIOptChooser is tested! |
SPEARMINT | method_args | | Pass arguments to the chooser method. Please consult the spearmint documentation for possible choices. |
SPEARMINT | grid_size | 20000 | Length of the Sobol sequence spearmint uses to optimize the Expected Improvement. |
SPEARMINT | spearmint_polling_time | 3.0 | Spearmint reads its experiment pickle and checks for finished jobs periodically to find out whether a new job has to be started. For very short function evaluations, this value can be decreased. Bear in mind that this puts load on your hard drive and can slow down your system if the experiment pickle becomes large (e.g. for the AutoWeka benchmark) or you run a lot of parallel jobs (>100). |
SPEARMINT | path_to_optimizer | ./spearmint_april2013_mod_src | |
The config parameters can also be set via the command line. A use case for this feature is to run the same experiment multiple times, but with different parameters. The syntax is:
HPOlib-run -o spearmint/spearmint_april2013_mod --SECTION:argument value
To set for example the spearmint grid size to 40000, use the following call
HPOlib-run -o spearmint/spearmint_april2013_mod --SPEARMINT:grid_size 40000
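The --SECTION:argument value syntax maps directly onto the sections and options of the config file. The sketch below shows one way such overrides could be applied to a config parser object. The function apply_overrides is hypothetical and written for Python 3; it illustrates the syntax only and is not the HPOlib's own implementation.

```python
import configparser  # named ConfigParser in Python 2


def apply_overrides(config, argv):
    """Apply '--SECTION:argument value' pairs from argv to config.

    Hypothetical helper that only illustrates the syntax; it is not
    HPOlib's own implementation.
    """
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("--") and ":" in token and i + 1 < len(argv):
            section, option = token[2:].split(":", 1)
            if not config.has_section(section):
                config.add_section(section)
            config.set(section, option, argv[i + 1])
            i += 2
        else:
            i += 1  # not an override, e.g. '-o <optimizer>'
    return config


config = configparser.ConfigParser()
config.read_string("[SPEARMINT]\ngrid_size = 20000\n")
apply_overrides(config, ["-o", "spearmint/spearmint_april2013_mod",
                         "--SPEARMINT:grid_size", "40000"])
print(config.get("SPEARMINT", "grid_size"))  # 40000
```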
If your target algorithm is a python script, you can also load the config file from within your target algorithm. This allows you to specify extra parameters for your target algorithm in the config file. Simply import HPOlib.wrapping_util in your python script and call HPOlib.wrapping_util.load_experiment_config_file(). The return value is a python config parser object.
The theano-based benchmarks can be sped up by either running them on an NVIDIA GPU or using an optimized BLAS library. Theano is configured either with theano flags, by changing the value of a variable in the target program (not recommended, as you have to change source code), or by using a .theanorc file. The .theanorc file is good for global configurations, and you can find more information on how to use it on the theano config page. For more fine-grained control of theano, you have to use theano flags.
Unfortunately, setting them in the shell before invoking HPOlib-run does not work; these parameters therefore have to be set via the config variable leading_runsolver_info. This is already set to a reasonable default for the respective benchmarks but has to be changed in order to speed up calculations.
For openBlas, change the paths in the following line and use it as the value of the config variable leading_runsolver_info. In case you want to change more of the theano behaviour (e.g. the compile directory), you must append these flags to the config variable.
OPENBLAS_NUM_THREADS=2 LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/the/openBLAS/lib LIBRARY_PATH=$LIBRARY_PATH:/path/to/the/openBLAS/lib THEANO_FLAGS=floatX=float32,device=cpu,blas.ldflags=-lopenblas
If you want to use CUDA on your NVIDIA GPU, you have to change device=cpu to device=gpu and add cuda.root=/usr/local/cuda to the THEANO flags. If you did not install CUDA to the default location, replace /usr/local/cuda with the path to your CUDA installation.
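Since the flag string is easy to mistype, it can be convenient to assemble it from a dictionary. The following sketch builds the CPU and GPU variants of the THEANO_FLAGS assignment described above; the helper join_theano_flags is made up for this example, and the resulting string still has to be placed into leading_runsolver_info by hand.

```python
def join_theano_flags(flags):
    """Turn a dict of Theano settings into a THEANO_FLAGS assignment."""
    return "THEANO_FLAGS=" + ",".join(
        "%s=%s" % (key, value) for key, value in flags.items())


cpu_flags = {"floatX": "float32", "device": "cpu",
             "blas.ldflags": "-lopenblas"}

# GPU variant: switch the device and add the CUDA root, as described above
gpu_flags = dict(cpu_flags, device="gpu")
gpu_flags["cuda.root"] = "/usr/local/cuda"

print(join_theano_flags(cpu_flags))
# THEANO_FLAGS=floatX=float32,device=cpu,blas.ldflags=-lopenblas
print(join_theano_flags(gpu_flags))
# THEANO_FLAGS=floatX=float32,device=gpu,blas.ldflags=-lopenblas,cuda.root=/usr/local/cuda
```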
The interface to include your own optimizer is straightforward. Let's assume that you have written a hyperparameter optimization package called BayesOpt2. You tell the HPOlib to use your software with the command line argument -o or --optimizer. A call to HPOlib-run -o /path/to/BayesOpt2 should then run an experiment with your newly written software.
But so far, the HPOlib does not know how to call your software. To let the HPOlib know about the interface to your optimizer, you need to create the following three files (replace BayesOpt2 if your optimization package has a different name):
- a script bayesopt2.py, whose main function sets up and starts the optimizer
- a parser file, which manipulates the config file
- a configuration file for your optimization package
The rest of this section explains the interface these files must provide and the functionality they must implement.
To run BayesOpt2, HPOlib will call the main function of the script bayesopt2.py. The function signature is as follows:
(call_string, directory) = optimizer_module.main(config=config, options=args, experiment_dir=experiment_dir, experiment_directory_prefix=experiment_directory_prefix)
The argument config is of type ConfigParser, options is of type ArgumentParser, and experiment_dir is a string containing the path to the experiment directory. The return value is a tuple (call_string, directory). call_string must be a valid (bash) shell command which calls your hyperparameter optimization package in the way you intend. You can construct the call string based on the information in the config and the options you are provided with. directory must be a new directory in which all experiment output will be stored. HPOlib-run will then change into the output directory which your function returned and execute the call string. Your script must therefore do the following in the main function:
Set up an experiment directory and return the path to the experiment directory. It is highly recommended to create a directory with the following name:
<experiment_directory_prefix><bayesopt2><time_string>
Return a valid bash shell command, which will be used to call your optimizer from the command line interface. The target algorithm you want to optimize is mostly called cv.py, except for SMAC, which handles crossvalidation on its own. Calling cv.py allows optimizer-independent bookkeeping. The actual function call is then invoked by the HPOlib. Its interface is
python cv.py -param_name1 'param_value' -x '5' -y '3.0'
etc. The function simply prints the loss to the command line. If your hyperparameter optimization package is written in python, you can also directly call the method doForTPE(params), where the params argument is a dictionary with all parameter values (both key and value being strings).
Have a look at the bundled scripts smac_2_06_01-dev.py, spearmint_april2013_mod.py and hyperopt_august2013_mod.py to get an idea what can/must be done.
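The two tasks of the main function can be sketched as follows. This is an illustration under several assumptions rather than a working optimizer interface: the time format, the underscore in the directory name and the BayesOpt2 command line are all made up, and the example call passes None where the HPOlib would hand over real config and options objects.

```python
import os
import tempfile
import time


def main(config, options, experiment_dir, experiment_directory_prefix,
         **kwargs):
    """Create the output directory and build the call string for BayesOpt2."""
    # Assumed time format; any string that makes the name unique will do.
    time_string = time.strftime("%Y-%m-%d--%H-%M-%S")
    directory = os.path.join(
        experiment_dir,
        "%sbayesopt2_%s" % (experiment_directory_prefix, time_string))
    os.makedirs(directory)

    # Hypothetical command line of BayesOpt2; cv.py is the target it optimizes.
    call_string = "python /path/to/BayesOpt2/run.py --target 'python cv.py'"
    return call_string, directory


# Example call with a temporary directory standing in for the experiment dir
base = tempfile.mkdtemp()
call_string, directory = main(config=None, options=None,
                              experiment_dir=base,
                              experiment_directory_prefix="test_")
print(call_string)
print(os.path.isdir(directory))  # True
```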
The parser file implements a simple interface which only allows the manipulation of the config file:
config = manipulate_config(config)
See the python documentation of the ConfigParser module for details on the config object. A common use of manipulate_config is to check whether mandatory arguments are provided. This is also the recommended place to convert values from the HPOLIB section to the appropriate values of the optimization package.
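A minimal sketch of such a parser function is shown below, written for Python 3. It checks the three mandatory HPOLIB options and copies number_of_jobs into a section for the optimizer. The section name BAYESOPT2 and the option max_evaluations are invented for this example; a real package would use its own names.

```python
import configparser  # named ConfigParser in Python 2


def manipulate_config(config):
    """Check mandatory options and derive BayesOpt2 settings from HPOLIB."""
    for option in ("function", "number_of_jobs", "result_on_terminate"):
        if not config.has_option("HPOLIB", option):
            raise ValueError("HPOLIB option %r is missing" % option)

    # 'BAYESOPT2' and 'max_evaluations' are made-up names for this sketch.
    if not config.has_section("BAYESOPT2"):
        config.add_section("BAYESOPT2")
    config.set("BAYESOPT2", "max_evaluations",
               config.get("HPOLIB", "number_of_jobs"))
    return config


config = configparser.ConfigParser()
config.read_string("""
[HPOLIB]
function = python ../myAlgo.py
number_of_jobs = 200
result_on_terminate = 0
""")
config = manipulate_config(config)
print(config.get("BAYESOPT2", "max_evaluations"))  # 200
```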
A configuration file for your optimization package as described in the configuration section.