Maintained by Steven Adriaensen.
The following list collects papers related to dynamic algorithm configuration. It is by no means complete. If you notice a paper missing from the list, please let me know.
Please note that dynamic configuration has been studied in many different communities (under many different names), and each community has developed a slightly different focus and evaluation criteria. Our criteria for maintaining this literature list are as follows:
- Does the presented work change (hyper-)parameters on the fly (i.e., during the run of a target algorithm)?
- Is this done in an automated fashion (e.g., via a learned update policy)?
- Does it have a meta-learning component (i.e., can the learned configuration policies be transferred to problems they have not been trained on)?
2023
Sabbioni, Luca; Corda, Francesco; Restelli, Marcello
Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes Unpublished
2023.
@unpublished{sabbioni-arxiv23a,
title = {Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes},
author = {Luca Sabbioni and Francesco Corda and Marcello Restelli},
url = {https://arxiv.org/abs/2306.07741},
year = {2023},
date = {2023-06-13},
abstract = {Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance, and tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the selection of the step size has a crucial impact on their ability to learn a highly performing policy, affecting the speed and the stability of the training process, and often being the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, by introducing a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound to the difference of performance in different tasks, we adopt the proposed framework to train a batch RL algorithm to dynamically recommend the most adequate step size for different policies and tasks. In conclusion, we present an experimental campaign to show the advantages of selecting an adaptive learning rate in heterogeneous environments.},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
Chen, Deyao; Buzdalov, Maxim; Doerr, Carola; Dang, Nguyen
Using Automated Algorithm Configuration for Parameter Control Conference
Proceedings of the 17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, 2023.
@conference{chen2023using,
title = {Using Automated Algorithm Configuration for Parameter Control},
author = {Deyao Chen and Maxim Buzdalov and Carola Doerr and Nguyen Dang},
url = {https://arxiv.org/abs/2302.12334},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Proceedings of the 17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms},
journal = {arXiv preprint arXiv:2302.12334},
pages = {38-49},
abstract = {Dynamic Algorithm Configuration (DAC) tackles the question of how to automatically learn policies to control parameters of algorithms in a data-driven fashion. This question has received considerable attention from the evolutionary community in recent years. Having a good benchmark collection to gain structural understanding on the effectiveness and limitations of different solution methods for DAC is therefore strongly desirable. Following recent work on proposing DAC benchmarks with well-understood theoretical properties and ground truth information, in this work, we suggest as a new DAC benchmark the controlling of the key parameter λ in the (1+(λ,λ)) Genetic Algorithm for solving OneMax problems. We conduct a study on how to solve the DAC problem via the use of (static) automated algorithm configuration on the benchmark, and propose techniques to significantly improve the performance of the approach. Our approach is able to consistently outperform the default parameter control policy of the benchmark derived from previous theoretical work on sufficiently large problem sizes. We also present new findings on the landscape of the parameter-control search policies and propose methods to compute stronger baselines for the benchmark via numerical approximations of the true optimal policies.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
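The entry above turns a static configurator into a parameter-control method by tuning the few numbers that define a fitness-dependent policy. A minimal sketch of that idea, assuming a simple fitness-binned policy and a simplified (1+1)-style EA on OneMax as stand-ins (the paper controls λ in the (1+(λ,λ)) GA; the names, binning, and loop below are illustrative only):

```python
import random

N = 100  # OneMax problem size (illustrative)

def onemax(bits):
    return sum(bits)

# A fitness-dependent control policy represented by a handful of static numbers:
# one parameter value per fitness bin. Tuning these numbers with a *static*
# configurator (irace, SMAC, ...) yields a *dynamic* control policy.
def policy(fitness, bin_values):
    bin_idx = min(int(fitness / N * len(bin_values)), len(bin_values) - 1)
    return bin_values[bin_idx]

def run_ea(bin_values, max_evals=10_000):
    """Simplified (1+1)-style EA whose mutation strength follows the policy.
    (The paper controls lambda in the (1+(lambda,lambda)) GA; this loop is
    only a stand-in to show how the tuned numbers act as a policy.)"""
    x = [random.randint(0, 1) for _ in range(N)]
    evals = 0
    while onemax(x) < N and evals < max_evals:
        strength = policy(onemax(x), bin_values)
        y = [1 - b if random.random() < strength / N else b for b in x]
        evals += 1
        if onemax(y) >= onemax(x):
            x = y
    return evals

# Evaluate one candidate policy; the static configurator would search over these values.
print(run_ea(bin_values=[4, 3, 2, 1]))
```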
2022
Adriaensen, Steven; Biedenkapp, André; Shala, Gresa; Awad, Noor; Eimer, Theresa; Lindauer, Marius; Hutter, Frank
Automated Dynamic Algorithm Configuration Journal Article
In: Journal of Artificial Intelligence Research (JAIR), vol. 75, pp. 1633-1699, 2022.
@article{adriaens-arxiv22a,
title = {Automated Dynamic Algorithm Configuration},
author = {Steven Adriaensen and André Biedenkapp and Gresa Shala and Noor Awad and Theresa Eimer and Marius Lindauer and Frank Hutter},
url = {https://doi.org/10.1613/jair.1.13922
https://github.com/automl/2022_JAIR_DAC_experiments, code},
year = {2022},
date = {2022-12-30},
urldate = {2022-05-30},
journal = {Journal of Artificial Intelligence Research (JAIR)},
volume = {75},
pages = {1633-1699},
abstract = {The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. However, it has been shown that some algorithm parameters are best adjusted dynamically during execution. Thus far, this is most commonly achieved through hand-crafted heuristics. A promising recent alternative is to automatically learn such dynamic parameter adaptation policies from data. In this article, we give the first comprehensive account of this new field of automated dynamic algorithm configuration (DAC), present a series of recent advances, and provide a solid foundation for future research in this field. Specifically, we (i) situate DAC in the broader historical context of AI research; (ii) formalize DAC as a computational problem; (iii) identify the methods used in prior art to tackle this problem; and (iv) conduct empirical case studies for using DAC in evolutionary optimization, AI planning, and machine learning.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
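The article formalizes DAC as learning a single configuration policy for a contextual MDP: at every step of the target algorithm's run, the policy observes state features and the instance (context) and sets the parameters for the next step. A self-contained sketch of this interaction loop, assuming a toy target algorithm (gradient descent on f(x) = x²) and a hand-crafted policy purely for illustration:

```python
import random

# Toy "target algorithm": gradient descent on f(x) = x^2, whose only
# dynamic parameter is the step size chosen anew at every iteration.
def run_target_algorithm(policy, instance, steps=50):
    x = instance                      # the instance (context) is the starting point
    for t in range(steps):
        state = (t, abs(x))           # state features exposed to the DAC policy
        step_size = policy(state, instance)
        x = x - step_size * 2 * x     # one gradient step on f(x) = x^2
    return abs(x)                     # final error = cost of this run

# A hand-crafted dynamic policy: large steps early, small steps late.
def decaying_policy(state, instance):
    t, _ = state
    return 0.5 / (1 + t)

# DAC evaluates one policy across a whole set of instances (contexts).
instances = [random.uniform(-10.0, 10.0) for _ in range(20)]
costs = [run_target_algorithm(decaying_policy, inst) for inst in instances]
print(f"mean final error: {sum(costs) / len(costs):.4f}")
```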
Xue, Ke; Xu, Jiacheng; Yuan, Lei; Li, Miqing; Qian, Chao; Zhang, Zongzhang; Yu, Yang
Multi-agent Dynamic Algorithm Configuration Proceedings Article
In: Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS'22), 2022.
@inproceedings{xue-neurips22a,
title = {Multi-agent Dynamic Algorithm Configuration},
author = {Ke Xue and Jiacheng Xu and Lei Yuan and Miqing Li and Chao Qian and Zongzhang Zhang and Yang Yu},
url = {https://arxiv.org/abs/2210.06835v1, arxiv},
year = {2022},
date = {2022-11-28},
urldate = {2022-11-28},
booktitle = {Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS'22)},
abstract = {Automated algorithm configuration relieves users from tedious, trial-and-error tuning tasks. A popular algorithm configuration tuning paradigm is dynamic algorithm configuration (DAC), in which an agent learns dynamic configuration policies across instances by reinforcement learning (RL). However, in many complex algorithms, there may exist different types of configuration hyperparameters, and such heterogeneity may bring difficulties for classic DAC which uses a single-agent RL policy. In this paper, we aim to address this issue and propose multi-agent DAC (MA-DAC), with one agent working for one type of configuration hyperparameter. MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it by a cooperative multi-agent RL (MARL) algorithm. To instantiate, we apply MA-DAC to a well-known optimization algorithm for multi-objective optimization problems. Experimental results show the effectiveness of MA-DAC in not only achieving superior performance compared with other configuration tuning approaches based on heuristic rules, multi-armed bandits, and single-agent RL, but also being capable of generalizing to different problem classes. Furthermore, we release the environments in this paper as a benchmark for testing MARL algorithms, with the hope of facilitating the application of MARL.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Biedenkapp, André
Dynamic Algorithm Configuration by Reinforcement Learning PhD Thesis
2022.
@phdthesis{biedenkapp22,
title = {Dynamic Algorithm Configuration by Reinforcement Learning},
author = {André Biedenkapp},
url = {https://freidok.uni-freiburg.de/data/230869, UniFreiburg
https://ml.informatik.uni-freiburg.de/wp-content/uploads/2022/11/2022_Dissertation_Andre_Biedenkapp.pdf, pdf},
year = {2022},
date = {2022-10-14},
abstract = {The performance of algorithms, be it in the domain of machine learning, hard combinatorial problem solving or AI in general depends on their many parameters. Tuning an algorithm manually, however, is error-prone and very time-consuming. Many, if not most, algorithms are iterative in nature. Thus, they traverse a potentially diverse solution space, which might require different parameter settings at different stages to behave optimally. Further, algorithms are often used for solving a diverse set of problem instances, which by themselves might require different parameters. Taking all of this into account is infeasible for a human designer. Automated methods have therefore been proposed to mitigate human errors and minimize manual efforts. While such meta-algorithmic methods have shown large successes, there is still a lot of untapped potentials as prior approaches typically only consider configurations that do not change during an algorithm’s run or do not adapt to the problem instance.
In this dissertation, we present the first framework that is capable of dynamically configuring algorithms, in other words, capable of adapting configurations to the problem instance at hand during an algorithm’s solving process. To this end, we model the dynamic algorithm configuration (DAC) problem as a contextual Markov decision process. This enables us to learn dynamic configuration policies in a data-driven way by means of reinforcement learning.
We empirically demonstrate the effectiveness of our framework on a diverse set of problem settings consisting of artificial benchmarks, evolutionary algorithms, AI planning systems, as well as deep learning. We show that DAC outperforms previous meta-algorithmic approaches. Building on these successes, we formulate the first standardized interface for dynamic configuration and an extensive benchmark to facilitate reproducibility and lower the barrier of entry for new researchers into this novel research field. Lastly, our work on DAC feeds back into the reinforcement learning paradigm. Through the lens of DAC, we identify shortcomings in current state-of-the-art approaches and demonstrate how to solve these. In particular, intending to learn general policies for DAC, our work pushes the boundaries of generalization in reinforcement learning. We demonstrate how to efficiently incorporate domain knowledge when training general agents and propose to move from a reactive way of doing reinforcement learning to a proactive way by learning when to make new decisions.},
howpublished = {https://freidok.uni-freiburg.de/data/230869},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Tessari, Michele; Iacca, Giovanni
Reinforcement learning based adaptive metaheuristics Workshop
Genetic and Evolutionary Computation Conference (GECCO) 2022, Companion Proceedings, 2022.
@workshop{tessari-geccocompanion22a,
title = {Reinforcement learning based adaptive metaheuristics},
author = {Michele Tessari and Giovanni Iacca},
url = {https://arxiv.org/abs/2206.12233, arxiv},
year = {2022},
date = {2022-07-10},
urldate = {2022-07-10},
booktitle = {Genetic and Evolutionary Computation Conference (GECCO) 2022, Companion Proceedings},
abstract = {Parameter adaptation, that is the capability to automatically adjust an algorithm's hyperparameters depending on the problem being faced, is one of the main trends in evolutionary computation applied to numerical optimization. While several handcrafted adaptation policies have been proposed over the years to address this problem, only few attempts have been done so far at applying machine learning to learn such policies. Here, we introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms. We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE), for which we learn, respectively, adaptation policies for the step-size (for CMA-ES), and the scale factor and crossover rate (for DE). We train these policies on a set of 46 benchmark functions at different dimensionalities, with various inputs to the policies, in two settings: one policy per function, and one global policy for all functions. Compared, respectively, to the Cumulative Step-size Adaptation (CSA) policy and to two well-known adaptive DE variants (iDE and jDE), our policies are able to produce competitive results in the majority of cases, especially in the case of DE. },
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
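In this framework the metaheuristic is exposed as an environment in which, once per generation, an adaptation policy observes search statistics and sets parameters such as DE's scale factor F and crossover rate CR. A sketch of that plumbing with a static baseline policy plugged in (the DE variant, observation, and reward choice here are assumptions, not the authors' implementation):

```python
import random

def sphere(x):
    return sum(xi * xi for xi in x)

def de_generation(pop, F, CR, fn):
    """One generation of DE/rand/1/bin using the externally supplied F and CR."""
    new_pop = []
    for i, target in enumerate(pop):
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        trial = [ai + F * (bi - ci) if random.random() < CR else ti
                 for ti, ai, bi, ci in zip(target, a, b, c)]
        new_pop.append(trial if fn(trial) < fn(target) else target)
    return new_pop

def run(policy, dim=10, pop_size=20, generations=100):
    """Environment-style loop: per generation the policy observes simple
    statistics (here just the generation counter and best fitness) and
    returns (F, CR) for the next generation."""
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    best = min(map(sphere, pop))
    for g in range(generations):
        F, CR = policy(g, best)
        pop = de_generation(pop, F, CR, sphere)
        best = min(map(sphere, pop))
    return best

# A static baseline policy; an RL agent would replace this function.
print(run(lambda g, best: (0.5, 0.9)))
```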
Biedenkapp, André; Dang, Nguyen; Krejca, Martin S.; Hutter, Frank; Doerr, Carola
Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration Proceedings Article
In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'22), 2022.
@inproceedings{biedenkapp-gecco22a,
title = {Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration},
author = {André Biedenkapp and Nguyen Dang and Martin S. Krejca and Frank Hutter and Carola Doerr},
url = {https://arxiv.org/abs/2202.03259},
year = {2022},
date = {2022-07-09},
urldate = {2022-07-09},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'22)},
journal = {arXiv:2202.03259 [cs.NE]},
abstract = {It has long been observed that the performance of evolutionary algorithms and other randomized search heuristics can benefit from a non-static choice of the parameters that steer their optimization behavior. Mechanisms that identify suitable configurations on the fly ("parameter control") or via a dedicated training process ("dynamic algorithm configuration") are therefore an important component of modern evolutionary computation frameworks. Several approaches to address the dynamic parameter setting problem exist, but we barely understand which ones to prefer for which applications. As in classical benchmarking, problem collections with a known ground truth can offer very meaningful insights in this context. Unfortunately, settings with well-understood control policies are very rare.
One of the few exceptions for which we know which parameter settings minimize the expected runtime is the LeadingOnes problem. We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values. This also allows us to compute optimal parameter portfolios of a given size. We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Biedenkapp, André; Speck, David; Sievers, Silvan; Hutter, Frank; Lindauer, Marius; Seipp, Jendrik
Learning Domain-Independent Policies for Open List Selection Workshop
Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL @ ICAPS'22), 2022.
@workshop{biedenkapp-prl22a,
title = {Learning Domain-Independent Policies for Open List Selection},
author = {André Biedenkapp and David Speck and Silvan Sievers and Frank Hutter and Marius Lindauer and Jendrik Seipp},
url = {http://ml.informatik.uni-freiburg.de/wp-content/uploads/2022/06/22-PRL-DAC4AIPlanning.pdf, pdf},
year = {2022},
date = {2022-06-13},
urldate = {2022-06-13},
booktitle = {Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL @ ICAPS'22)},
abstract = {Since its proposal over a decade ago, LAMA has been considered one of the best-performing satisficing classical planners. Its key component is heuristic search with multiple open lists, each using a different heuristic function to order states. Even with a very simple, ad-hoc policy for open list selection, LAMA achieves state-of-the-art results. In this paper, we propose to use dynamic algorithm configuration to learn such policies in a principled and data-driven manner. On the learning side, we show how to train a reinforcement learning agent over several heterogeneous environments, aiming at zero-shot generalization to new related domains. On the planning side, our experimental results show that the trained policies often reach the performance of LAMA, and sometimes even perform better. Furthermore, our analysis of different policies shows that prioritizing states reached via preferred operators is crucial, explaining the strong performance of LAMA.},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
Bhatia, Abhinav; Svegliato, Justin; Nashed, Samer B.; Zilberstein, Shlomo
Tuning the Hyperparameters of Anytime Planning: A Metareasoning Approach with Deep Reinforcement Learning Proceedings Article
In: Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS'22), 2022.
@inproceedings{bathia-icaps22a,
title = {Tuning the Hyperparameters of Anytime Planning: A Metareasoning Approach with Deep Reinforcement Learning},
author = {Abhinav Bhatia and Justin Svegliato and Samer B. Nashed and Shlomo Zilberstein},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/19842/19601},
year = {2022},
date = {2022-06-13},
urldate = {2022-06-13},
booktitle = {Proceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS'22)},
abstract = {Anytime planning algorithms often have hyperparameters that can be tuned at runtime to optimize their performance. While work on metareasoning has focused on when to interrupt an anytime planner and act on the current plan, the scope of metareasoning can be expanded to tuning the hyperparameters of the anytime planner at runtime. This paper introduces a general, decision-theoretic metareasoning approach that optimizes both the stopping point and hyperparameters of anytime planning. We begin by proposing a generalization of the standard meta-level control problem for anytime algorithms. We then offer a meta-level control technique that monitors and controls an anytime algorithm using deep reinforcement learning. Finally, we show that our approach boosts performance on a common benchmark domain that uses anytime weighted A* to solve a range of heuristic search problems and a mobile robot application that uses RRT* to solve motion planning problems. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Mandhane, Amol; Zhernov, Anton; Rauh, Maribeth; Gu, Chenjie; Wang, Miaosen; Xue, Flora; Shang, Wendy; Pang, Derek; Claus, Rene; Chiang, Ching-Han; et al.
MuZero with Self-competition for Rate Control in VP9 Video Compression Unpublished
2022.
@unpublished{mandhane-arxiv22a,
title = {MuZero with Self-competition for Rate Control in VP9 Video Compression},
author = {Amol Mandhane and Anton Zhernov and Maribeth Rauh and Chenjie Gu and Miaosen Wang and Flora Xue and Wendy Shang and Derek Pang and Rene Claus and Ching-Han Chiang and others},
url = {https://arxiv.org/abs/2202.06626},
year = {2022},
date = {2022-02-14},
urldate = {2023-06-13},
abstract = {Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce energy use and costs overall. In this paper, we present an application of the MuZero algorithm to the challenge of video compression. Specifically, we target the problem of learning a rate control policy to select the quantization parameters (QP) in the encoding process of libvpx, an open source VP9 video compression library widely used by popular video-on-demand (VOD) services. We treat this as a sequential decision making problem to maximize the video quality with an episodic constraint imposed by the target bitrate. Notably, we introduce a novel self-competition based reward mechanism to solve constrained RL with variable constraint satisfaction difficulty, which is challenging for existing constrained RL methods. We demonstrate that the MuZero-based rate control achieves an average 6.28% reduction in size of the compressed videos for the same delivered video quality level (measured as PSNR BD-rate) compared to libvpx's two-pass VBR rate control policy, while having better constraint satisfaction behavior.},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
2021
Getzelman, Grant; Balaprakash, Prasanna
Learning to Switch Optimizers for Quadratic Programming Proceedings Article
In: Balasubramanian, Vineeth N.; Tsang, Ivor (Ed.): Proceedings of The 13th Asian Conference on Machine Learning, pp. 1553–1568, PMLR, 2021.
@inproceedings{getzelman-acml21a,
title = {Learning to Switch Optimizers for Quadratic Programming},
author = {Getzelman, Grant and Balaprakash, Prasanna},
editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
url = {https://proceedings.mlr.press/v157/getzelman21a.html},
year = {2021},
date = {2021-11-17},
booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
volume = {157},
pages = {1553--1568},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
abstract = {Quadratic programming (QP) seeks to solve optimization problems involving quadratic functions that can include complex boundary constraints. QP in the unrestricted form is $\mathcal{NP}$-hard; but when restricted to the convex case, it becomes tractable. Active set and interior point methods are used to solve convex problems, and in the nonconvex case various heuristics or relaxations are used to produce high-quality solutions in finite time. Learning to optimize (L2O) is an emerging approach to design solvers for optimization problems. We develop an L2O approach that uses reinforcement learning to learn a stochastic policy to switch between pre-existing optimization algorithms to solve QP problem instances. In particular, our agent switches between three simple optimizers: Adam, gradient descent, and random search. Our experiments show that the learned optimizer minimizes quadratic functions faster and finds better-quality solutions in the long term than do any of the possible optimizers switched between. We also compare our solver with the standard QP algorithms in MATLAB and find better performance in fewer function evaluations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
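The idea here is to treat the choice of optimizer itself as the controlled parameter: at each iteration a policy decides which optimizer performs the next step on the quadratic objective. A small sketch with a hand-written switching rule standing in for the learned stochastic policy (only gradient descent and random search are included, and the switching criterion is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)          # symmetric positive definite -> convex quadratic
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x + b @ x

def grad(x):
    return A @ x + b

def gd_step(x, lr=1e-3):
    return x - lr * grad(x)

def random_search_step(x, sigma=0.1):
    cand = x + sigma * rng.standard_normal(n)
    return cand if f(cand) < f(x) else x

x = rng.standard_normal(n)
for t in range(500):
    # Hand-written stand-in for the learned switching policy: follow the
    # gradient while it is informative, fall back to random search otherwise.
    step = gd_step if np.linalg.norm(grad(x)) > 1e-3 else random_search_step
    x = step(x)
print("final objective:", f(x))
```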
Malashin, Roman Olegovich
Sparsely Ensembled Convolutional Neural Network Classifiers via Reinforcement Learning Proceedings Article
In: 2021 6th International Conference on Machine Learning Technologies, pp. 102–110, 2021, ISBN: 9781450389402.
@inproceedings{malashin-acm21,
title = {Sparsely Ensembled Convolutional Neural Network Classifiers via Reinforcement Learning},
author = {Roman Olegovich Malashin},
url = {https://dl.acm.org/doi/10.1145/3468891.3468906
https://arxiv.org/abs/2102.03921},
doi = {10.1145/3468891.3468906},
isbn = {9781450389402},
year = {2021},
date = {2021-09-06},
booktitle = {2021 6th International Conference on Machine Learning Technologies},
pages = {102–110},
series = {ICMLT 2021},
abstract = {We consider convolutional neural network (CNN) ensemble learning with the objective function inspired by the least action principle; it includes resource consumption component. We teach an agent to perceive images through the set of pre-trained classifiers and want the resulting dynamically configured system to unfold the computational graph with the trajectory that refers to the minimal number of operations and maximal expected accuracy. The proposed agent's architecture implicitly approximates the required classifier selection function with the help of reinforcement learning. Our experimental results prove, that if the agent exploits the dynamic (and context-dependent) structure of computations, it outperforms conventional ensemble learning.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Nguyen, Manh Hung; Grinsztajn, Nathan; Guyon, Isabelle; Sun-Hosoya, Lisheng
MetaREVEAL: RL-based Meta-learning from Learning Curves Proceedings Article
In: Workshop on Interactive Adaptive Learning co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021), 2021.
@inproceedings{nguyen-ial21a,
title = {MetaREVEAL: RL-based Meta-learning from Learning Curves},
author = {Manh Hung Nguyen and Nathan Grinsztajn and Isabelle Guyon and Lisheng Sun-Hosoya},
url = {https://hal.inria.fr/hal-03502358v2/document},
year = {2021},
date = {2021-09-01},
urldate = {2021-09-01},
booktitle = {Workshop on Interactive Adaptive Learning co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)},
abstract = {This paper addresses a cornerstone of Automated Machine Learning: the problem of rapidly uncovering which machine learning algorithm performs best on a new dataset. Our approach leverages performances of such algorithms on datasets to which they have been previously exposed, i.e., implementing a form of meta-learning. More specifically, the problem is cast as a REVEAL Reinforcement Learning (RL) game: the meta-learning problem is wrapped into a RL environment in which an agent can start, pause, or resume training various machine learning algorithms to progressively “reveal” their learning curves. The learned policy is then applied to quickly uncover the best algorithm on a new dataset. While other similar approaches, such as Freeze-Thaw, were proposed in the past, using Bayesian optimization, our methodology is, to the best of our knowledge, the first that trains a RL agent to do this task on previous datasets. Using real and artificial data, we show that our new RL-based meta-learning paradigm outperforms Free-Thaw and other baseline methods, with respect to the Area under the Learning curve metric, a form of evaluation of Any-time learning (i.e., the capability of interrupting the algorithm at any time while obtaining good performance).},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Ichnowski, Jeffrey; Jain, Paras; Stellato, Bartolomeo; Banjac, Goran; Luo, Michael; Borrelli, Francesco; Gonzalez, Joseph E.; Stoica, Ion; Goldberg, Ken
Accelerating Quadratic Optimization with Reinforcement Learning Unpublished
2021.
@unpublished{ichnowski-arxiv21a,
title = {Accelerating Quadratic Optimization with Reinforcement Learning},
author = {Jeffrey Ichnowski and Paras Jain and Bartolomeo Stellato and Goran Banjac and Michael Luo and Francesco Borrelli and Joseph E. Gonzalez and Ion Stoica and Ken Goldberg},
url = {https://arxiv.org/abs/2107.10847},
year = {2021},
date = {2021-07-22},
journal = {arXiv},
abstract = {First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Meszaros problems. Code for RLQP is available online.},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
Speck, D; Biedenkapp, A; Hutter, F; Mattmüller, R; Lindauer, M
Learning Heuristic Selection with Dynamic Algorithm Configuration Proceedings Article
In: Zhuo, H H; Yang, Q; Do, M; Goldman, R; Biundo, S; Katz, M (Ed.): Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS'21), pp. 597–605, AAAI, 2021.
@inproceedings{speck-icaps21b,
title = {Learning Heuristic Selection with Dynamic Algorithm Configuration},
author = {D Speck and A Biedenkapp and F Hutter and R Mattmüller and M Lindauer},
editor = {H H Zhuo and Q Yang and M Do and R Goldman and S Biundo and M Katz},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/16008},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS'21)},
pages = {597--605},
publisher = {AAAI},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Eimer, T; Biedenkapp, A; Reimer, M; Adriaensen, S; Hutter, F; Lindauer, M
DACBench: A Benchmark Library for Dynamic Algorithm Configuration Proceedings Article
In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI'21), ijcai.org, 2021.
@inproceedings{eimer-ijcai21,
title = {DACBench: A Benchmark Library for Dynamic Algorithm Configuration},
author = {T Eimer and A Biedenkapp and M Reimer and S Adriaensen and F Hutter and M Lindauer},
url = {https://ml.informatik.uni-freiburg.de/papers/21-IJCAI-DACBench.pdf},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI'21)},
publisher = {ijcai.org},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
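DACBench exposes its benchmarks behind a common gym-style interface. The sketch below follows the library's documented usage pattern as I recall it; the benchmark class, import path, and the exact reset/step signatures are assumptions that should be checked against the installed version:

```python
# Assumed API, modeled on DACBench's gym-style interface; verify the import
# path and the reset/step signatures against the version you have installed.
from dacbench.benchmarks import SigmoidBenchmark

bench = SigmoidBenchmark()
env = bench.get_environment()

state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # a random configuration policy
    state, reward, done, info = env.step(action)
    total_reward += reward
print("episode return of the random policy:", total_reward)
```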
Almeida, Diogo; Winter, Clemens; Tang, Jie; Zaremba, Wojciech
A Generalizable Approach to Learning Optimizers Unpublished
2021.
@unpublished{almeida-arxiv21,
title = {A Generalizable Approach to Learning Optimizers},
author = {Diogo Almeida and Clemens Winter and Jie Tang and Wojciech Zaremba},
url = {https://arxiv.org/pdf/2106.00958.pdf},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
journal = {arXiv preprint arXiv:2106.00958},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
Bhatia, Abhinav; Svegliato, Justin; Zilberstein, Shlomo
Tuning the Hyperparameters of Anytime Planning: A Deep Reinforcement Learning Approach Proceedings Article
In: ICAPS 2021 Workshop on Heuristics and Search for Domain-independent Planning, 2021.
@inproceedings{bhatia-icaps21,
title = {Tuning the Hyperparameters of Anytime Planning: A Deep Reinforcement Learning Approach},
author = {Abhinav Bhatia and Justin Svegliato and Shlomo Zilberstein},
url = {https://openreview.net/forum?id=c7hpFp_eRCo},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {ICAPS 2021 Workshop on Heuristics and Search for Domain-independent Planning},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2020
Biedenkapp, André; Bozkurt, H. Furkan; Eimer, Theresa; Hutter, Frank; Lindauer, Marius
Dynamic Algorithm Configuration: Foundation of a New Meta-Algorithmic Framework Conference
Proceedings of the Twenty-fourth European Conference on Artificial Intelligence (ECAI'20), 2020.
@conference{biedenka-ecai,
title = {Dynamic Algorithm Configuration: Foundation of a New Meta-Algorithmic Framework},
author = {André Biedenkapp and H. Furkan Bozkurt and Theresa Eimer and Frank Hutter and Marius Lindauer},
url = {http://ecai2020.eu/papers/1237_paper.pdf},
year = {2020},
date = {2020-08-29},
booktitle = {Proceedings of the Twenty-fourth European Conference on Artificial Intelligence (ECAI'20)},
abstract = {The performance of many algorithms in the fields of hard combinatorial problem solving, machine learning or AI in general depends on parameter tuning. Automated methods have been proposed to alleviate users from the tedious and error-prone task of manually searching for performance-optimized configurations across a set of problem instances. However, there is still a lot of untapped potential through adjusting an algorithm’s parameters online since different parameter values can be optimal at different stages of the algorithm. Prior work showed that reinforcement learning is an effective approach to learn policies for online adjustments of algorithm parameters in a data-driven way. We extend that approach by formulating the resulting dynamic algorithm configuration as a contextual MDP, such that RL not only learns a policy for a single instance, but across a set of instances. To lay the foundation for studying dynamic algorithm configuration with RL in a controlled setting, we propose white-box benchmarks covering major aspects that make dynamic algorithm configuration a hard problem in practice and study the performance of various types of configuration strategies for them. On these white-box benchmarks, we show that (i) RL is a robust candidate for learning configuration policies, outperforming standard parameter optimization approaches, such as classical algorithm configuration; (ii) based on function approximation, RL agents can learn to generalize to new types of instances; and (iii) self-paced learning can substantially improve the performance by selecting a useful sequence of training instances automatically.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Gomoluch, Pawel; Alrajeh, Dalal; Russo, Alessandra; Bucchiarone, Antonio
Learning Neural Search Policies for Classical Planning Proceedings Article
In: Proceedings of the International Conference on Automated Planning and Scheduling, pp. 522–530, 2020.
@inproceedings{gomoluch-icaps20b,
title = {Learning Neural Search Policies for Classical Planning},
author = {Pawel Gomoluch and Dalal Alrajeh and Alessandra Russo and Antonio Bucchiarone},
url = {https://icaps20.icaps-conference.org/paper191.html},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
booktitle = {Proceedings of the International Conference on Automated Planning and Scheduling},
volume = {30},
pages = {522--530},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Shala, G; Biedenkapp, A; Awad, N; Adriaensen, S; Lindauer, M; Hutter, F
Learning Step-Size Adaptation in CMA-ES Proceedings Article
In: Proceedings of the Sixteenth International Conference on Parallel Problem Solving from Nature (PPSN'20), pp. 691–706, Springer, 2020.
@inproceedings{shala-ppsn20b,
title = {Learning Step-Size Adaptation in CMA-ES},
author = {G Shala and A Biedenkapp and N Awad and S Adriaensen and M Lindauer and F Hutter},
url = {https://ml.informatik.uni-freiburg.de/papers/20-PPSN-LTO-CMA.pdf},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
booktitle = {Proceedings of the Sixteenth International Conference on Parallel Problem Solving from Nature (PPSN'20)},
pages = {691--706},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Sae-Dan, Weerapan; Kessaci, Marie-Eléonore; Veerapen, Nadarajen; Jourdan, Laetitia
Time-Dependent Automatic Parameter Configuration of a Local Search Algorithm Proceedings Article
In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, pp. 1898–1905, Association for Computing Machinery, Cancún, Mexico, 2020, ISBN: 9781450371278.
@inproceedings{sae-dan-gecco20b,
title = {Time-Dependent Automatic Parameter Configuration of a Local Search Algorithm},
author = {Weerapan Sae-Dan and Marie-Eléonore Kessaci and Nadarajen Veerapen and Laetitia Jourdan},
url = {https://doi.org/10.1145/3377929.3398107},
doi = {10.1145/3377929.3398107},
isbn = {9781450371278},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion},
pages = {1898–1905},
publisher = {Association for Computing Machinery},
address = {Cancún, Mexico},
series = {GECCO '20},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2019
Xu, Z; Dai, A M; Kemp, J; Metz, L
Learning an Adaptive Learning Rate Schedule Unpublished
2019, (arXiv:1909.09712 [cs.LG]).
@unpublished{xu-arxiv19b,
title = {Learning an Adaptive Learning Rate Schedule},
author = {Z Xu and A M Dai and J Kemp and L Metz},
url = {https://arxiv.org/abs/1909.09712},
year = {2019},
date = {2019-01-01},
urldate = {2019-01-01},
note = {arXiv:1909.09712 [cs.LG]},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
Sharma, Mudita; Komninos, Alexandros; López-Ibáñez, Manuel; Kazakov, Dimitar
Deep reinforcement learning based parameter control in differential evolution Proceedings Article
In: Auger, A; Stützle, T (Ed.): Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'19), pp. 709–717, ACM, 2019.
@inproceedings{sharma-gecco19b,
title = {Deep reinforcement learning based parameter control in differential evolution},
author = {Mudita Sharma and Alexandros Komninos and Manuel López-Ibáñez and Dimitar Kazakov},
editor = {A Auger and T Stützle},
url = {https://dl.acm.org/doi/10.1145/3321707.3321813},
year = {2019},
date = {2019-01-01},
urldate = {2019-01-01},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'19)},
pages = {709--717},
publisher = {ACM},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gomoluch, Paweł; Alrajeh, Dalal; Russo, Alessandra
Learning classical planning strategies with policy gradient Proceedings Article
In: Proceedings of the International Conference on Automated Planning and Scheduling, pp. 637–645, 2019.
@inproceedings{gomoluch-icaps19b,
title = {Learning classical planning strategies with policy gradient},
author = {Pawe{ł} Gomoluch and Dalal Alrajeh and Alessandra Russo},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/3531},
year = {2019},
date = {2019-01-01},
urldate = {2019-01-01},
booktitle = {Proceedings of the International Conference on Automated Planning and Scheduling},
volume = {29},
pages = {637--645},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2017
Ansótegui, Carlos; Pon, Josep; Sellmann, Meinolf; Tierney, Kevin
Reactive Dialectic Search Portfolios for MaxSAT Proceedings Article
In: Singh, S.; Markovitch, S. (Ed.): Proceedings of the Conference on Artificial Intelligence (AAAI'17), pp. 765–772, AAAI Press, 2017.
@inproceedings{ansotegui-aaai17b,
title = {Reactive Dialectic Search Portfolios for MaxSAT},
author = {Carlos Ansótegui and Josep Pon and Meinolf Sellmann and Kevin Tierney},
editor = {S. Singh and S. Markovitch},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/10660},
year = {2017},
date = {2017-01-01},
urldate = {2017-01-01},
booktitle = {Proceedings of the Conference on Artificial Intelligence (AAAI'17)},
pages = {765--772},
publisher = {AAAI Press},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Xu, Chang; Qin, Tao; Wang, Gang; Liu, Tie-Yan
Reinforcement learning for learning rate control Journal Article
In: arXiv preprint arXiv:1705.11159, 2017.
@article{xu-arxiv17b,
title = {Reinforcement learning for learning rate control},
author = {Chang Xu and Tao Qin and Gang Wang and Tie-Yan Liu},
url = {https://arxiv.org/abs/1705.11159},
year = {2017},
date = {2017-01-01},
urldate = {2017-01-01},
journal = {arXiv preprint arXiv:1705.11159},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Kadioglu, S; Sellmann, M; Wagner, M
Learning a reactive restart strategy to improve stochastic search Proceedings Article
In: International Conference on Learning and Intelligent Optimization, pp. 109–123, Springer 2017.
@inproceedings{kadioglu-lion17b,
title = {Learning a reactive restart strategy to improve stochastic search},
author = {S Kadioglu and M Sellmann and M Wagner},
url = {https://cs.adelaide.edu.au/~markus/pub/2017lion-reactiveRestarts.pdf},
year = {2017},
date = {2017-01-01},
urldate = {2017-01-01},
booktitle = {International Conference on Learning and Intelligent Optimization},
pages = {109--123},
organization = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2016
Adriaensen, S; Nowé, A
Towards a White Box Approach to Automated Algorithm Design Proceedings Article
In: Kambhampati, S (Ed.): Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'16), pp. 554–560, 2016.
@inproceedings{adriaensen-ijcai16b,
title = {Towards a White Box Approach to Automated Algorithm Design},
author = {S Adriaensen and A Nowé},
editor = {S Kambhampati},
url = {https://www.ijcai.org/Proceedings/16/Papers/085.pdf},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'16)},
pages = {554--560},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hansen, Samantha
Using Deep Q-Learning to Control Optimization Hyperparameters Journal Article
In: arXiv preprint arXiv:1602.04062, 2016.
@article{hansen-arxiv16b,
title = {Using Deep Q-Learning to Control Optimization Hyperparameters},
author = {Samantha Hansen},
url = {https://arxiv.org/pdf/1602.04062.pdf},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
journal = {arXiv preprint arXiv:1602.04062},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Andersson, Martin; Bandaru, Sunith; Ng, Amos HC
Tuning of Multiple Parameter Sets in Evolutionary Algorithms Proceedings Article
In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 533–540, 2016.
@inproceedings{andersson-gecco16b,
title = {Tuning of Multiple Parameter Sets in Evolutionary Algorithms},
author = {Martin Andersson and Sunith Bandaru and Amos HC Ng},
url = {https://dl.acm.org/doi/10.1145/2908812.2908899},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},
pages = {533--540},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Daniel, C; Taylor, J; Nowozin, S
Learning Step Size Controllers for Robust Neural Network Training Proceedings Article
In: Schuurmans, D; Wellman, M (Ed.): Proceedings of the Thirtieth National Conference on Artificial Intelligence (AAAI'16), AAAI Press, 2016.
@inproceedings{daniel-aaai16b,
title = {Learning Step Size Controllers for Robust Neural Network Training},
author = {C Daniel and J Taylor and S Nowozin},
editor = {D Schuurmans and M Wellman},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/10187},
year = {2016},
date = {2016-01-01},
urldate = {2016-01-01},
booktitle = {Proceedings of the Thirtieth National Conference on Artificial Intelligence (AAAI'16)},
publisher = {AAAI Press},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2014
López-Ibáñez, Manuel; Stützle, Thomas
Automatically Improving the Anytime Behaviour of Optimisation Algorithms Journal Article
In: European Journal of Operational Research, vol. 235, no. 3, pp. 569–582, 2014.
@article{lopez-ejor14b,
title = {Automatically Improving the Anytime Behaviour of Optimisation Algorithms},
author = {Manuel López-Ibáñez and Thomas Stützle},
url = {https://www.sciencedirect.com/science/article/abs/pii/S0377221713008667?via%3Dihub},
year = {2014},
date = {2014-01-01},
urldate = {2014-01-01},
journal = {European Journal of Operational Research},
volume = {235},
number = {3},
pages = {569--582},
publisher = {Elsevier},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2012
Battiti, R; Campigotto, P
An Investigation of Reinforcement Learning for Reactive Search Optimization Book Section
In: Hamadi, Y; Monfroy, E; Saubion, F (Ed.): Autonomous Search, pp. 131–160, Springer, 2012.
@incollection{battiti-as12ab,
title = {An Investigation of Reinforcement Learning for Reactive Search Optimization},
author = {R Battiti and P Campigotto},
editor = {Y Hamadi and E Monfroy and F Saubion},
url = {https://link.springer.com/chapter/10.1007%2F978-3-642-21434-9_6},
year = {2012},
date = {2012-01-01},
urldate = {2012-01-01},
booktitle = {Autonomous Search},
pages = {131--160},
publisher = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
2010
Xu, Yuehua; Fern, Alan; Yoon, Sungwook
Iterative Learning of Weighted Rule Sets for Greedy Search Conference
Proceedings of the 20th International Conference on Automated Planning and Scheduling (ICAPS'10), 2010.
@conference{xu-icaps10a,
title = {Iterative Learning of Weighted Rule Sets for Greedy Search},
author = {Yuehua Xu and Alan Fern and Sungwook Yoon},
url = {https://www.aaai.org/ocs/index.php/ICAPS/ICAPS10/paper/view/1444},
year = {2010},
date = {2010-04-20},
booktitle = {Proceedings of the 20th International Conference on Automated Planning and Scheduling (ICAPS'10)},
abstract = {Greedy search is commonly used in an attempt to generate solutions quickly at the expense of completeness and optimality. In this work, we consider learning sets of weighted action-selection rules for guiding greedy search with application to automated planning. We make two primary contributions over prior work on learning for greedy search. First, we introduce weighted sets of action-selection rules as a new form of control knowledge for greedy search. Prior work has shown the utility of action-selection rules for greedy search, but has treated the rules as hard constraints, resulting in brittleness. Our weighted rule sets allow multiple rules to vote, helping to improve robustness to noisy rules. Second, we give a new iterative learning algorithm for learning weighted rule sets based on RankBoost, an efficient boosting algorithm for ranking. Each iteration considers the actual performance of the current rule set and directs learning based on the observed search errors. This is in contrast to most prior approaches, which learn control knowledge independently of the search process. Our empirical results have shown significant promise for this approach in a number of domains.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Sakurai, Y; Takada, K; Kawabe, T; Tsuruta, S
A Method to Control Parameters of Evolutionary Algorithms by Using Reinforcement Learning Proceedings Article
In: Yétongnon, K; Dipanda, A; Chbeir, R (Ed.): Proceedings of Sixth International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), pp. 74–79, IEEE Computer Society, 2010.
@inproceedings{sakurai-sitis10ab,
title = {A Method to Control Parameters of Evolutionary Algorithms by Using Reinforcement Learning},
author = {Y Sakurai and K Takada and T Kawabe and S Tsuruta},
editor = {K Yétongnon and A Dipanda and R Chbeir},
url = {https://ieeexplore.ieee.org/document/5714532},
year = {2010},
date = {2010-01-01},
urldate = {2010-01-01},
booktitle = {Proceedings of Sixth International Conference on Signal-Image Technology and Internet-Based Systems (SITIS)},
pages = {74--79},
publisher = {IEEE Computer Society},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Fialho, Alvaro; Costa, Luis Da; Schoenauer, Marc; Sebag, Michele
Analyzing Bandit-Based Adaptive Operator Selection Mechanisms Journal Article
In: Annals of Mathematics and Artificial Intelligence, vol. 60, no. 1, pp. 25–64, 2010.
@article{fialho-amai10b,
title = {Analyzing Bandit-Based Adaptive Operator Selection Mechanisms},
author = {Alvaro Fialho and Luis Da Costa and Marc Schoenauer and Michele Sebag},
url = {https://link.springer.com/article/10.1007%2Fs10472-010-9213-y},
year = {2010},
date = {2010-01-01},
urldate = {2010-01-01},
journal = {Annals of Mathematics and Artificial Intelligence},
volume = {60},
number = {1},
pages = {25--64},
publisher = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2008
Aine, Sandip; Kumar, Rajeev; Chakrabarti, P. P.
Adaptive parameter control of evolutionary algorithms to improve quality-time trade-off Journal Article
In: Applied Soft Computing, vol. 9, no. 2, pp. 527-540, 2008.
@article{aine-jasoc08a,
title = {Adaptive parameter control of evolutionary algorithms to improve quality-time trade-off},
author = {Sandip Aine and Rajeev Kumar and P.P. Chakrabarti},
url = {https://doi.org/10.1016/j.asoc.2008.07.001},
year = {2008},
date = {2008-08-05},
urldate = {2008-08-05},
journal = {Applied Soft Computing},
volume = {9},
number = {2},
pages = {527-540},
abstract = {Parameter control of evolutionary algorithms (EAs) poses special challenges as EA uses a population and requires many parameters to be controlled for an effective search. Quality improvement is dependent on several factors, such as, fitness estimation, population diversity and convergence rate. A widely practiced approach to identify a good set of parameters for a particular class of problem is through experimentation. Ideally, the parameter selection should depend on the resource availability, and thus, a rigid choice may not be suitable. In this work, we propose an automated framework for parameter selection, which can adapt according to the constraints specified. To condition the parameter choice through resource constraint/utilization, we consider two typical scenarios, one where maximum available run-time is pre-specified and the other in which a utility function modeling the quality-time trade-off is used instead of a rigid deadline. We present static and dynamic parameter selection strategies based on a probabilistic profiling method. Experiments performed with traveling salesman problem (TSP) and standard cell placement problem show that an informed adaptive parameter control mechanism can yield better results than a static selection.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2002
Pettinger, J; Everson, R
Controlling Genetic Algorithms with Reinforcement Learning Proceedings Article
In: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, pp. 692–692, 2002.
@inproceedings{pettinger-gecco02b,
title = {Controlling Genetic Algorithms with Reinforcement Learning},
author = {J Pettinger and R Everson},
url = {https://dl.acm.org/doi/10.5555/2955491.2955607},
year = {2002},
date = {2002-01-01},
urldate = {2002-01-01},
booktitle = {Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation},
pages = {692--692},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2001
Lagoudakis, M; Littman, M
Learning to Select Branching Rules in the DPLL Procedure for Satisfiability Journal Article
In: Electronic Notes in Discrete Mathematics, vol. 9, pp. 344–359, 2001.
@article{lagoudakis-endm01ab,
title = {Learning to Select Branching Rules in the DPLL Procedure for Satisfiability},
author = {M Lagoudakis and M Littman},
url = {https://www.sciencedirect.com/science/article/abs/pii/S1571065304003324?via%3Dihub},
year = {2001},
date = {2001-01-01},
urldate = {2001-01-01},
journal = {Electronic Notes in Discrete Mathematics},
volume = {9},
pages = {344--359},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2000
Lagoudakis, Michail G.; Littman, Michael L.
Algorithm Selection using Reinforcement Learning Conference
Proceedings of the 17th International Conference on Machine Learning (ICML 2000), 2000.
@conference{lagoudakis-icml2000a,
title = {Algorithm Selection using Reinforcement Learning},
author = {Michail G. Lagoudakis and Michael L. Littman},
url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.472.7494},
year = {2000},
date = {2000-07-01},
booktitle = {Proceedings of the 17th International Conference on Machine Learning (ICML 2000)},
abstract = {Many computational problems can be solved by multiple algorithms, with different algorithms fastest for different problem sizes, input distributions, and hardware characteristics. We consider the problem of algorithm selection: dynamically choose an algorithm to attack an instance of a problem with the goal of minimizing the overall execution time. We formulate the problem as a kind of Markov decision process (MDP), and use ideas from reinforcement learning to solve it. This paper introduces a kind of MDP that models the algorithm selection problem by allowing multiple state transitions. The well known Q-learning algorithm is adapted for this case in a way that combines both Monte-Carlo and Temporal Difference methods. Also, this work uses, and extends in a way to control problems, the Least-Squares Temporal Difference algorithm (LSTD(0)) of Boyan. The experimental study focuses on the classic problems of order statistic selection and sorting. The encouraging results reveal the potential of applying learning methods to traditional computational problems.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
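The paper models recursive algorithm selection, e.g. for sorting and order statistics, as an MDP in which the size of each (sub)problem defines the state and the action picks which algorithm to run on it. A toy illustration with a fixed size threshold standing in for the policy that the paper learns with Q-learning/LSTD (the threshold and the two candidate algorithms are illustrative choices):

```python
import random

THRESHOLD = 32   # stands in for the learned selection policy

def insertion_sort(a):
    # Fast on small inputs despite quadratic worst case.
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def adaptive_sort(a):
    # State = subproblem size; action = which algorithm handles it.
    if len(a) <= THRESHOLD:
        return insertion_sort(a)
    pivot = a[len(a) // 2]
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return adaptive_sort(left) + mid + adaptive_sort(right)

data = [random.randint(0, 10_000) for _ in range(1_000)]
assert adaptive_sort(data) == sorted(data)
```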