In CMA-ES, the step size controls how fast or slowly the population moves through the search space. Large steps allow it to quickly skip over uninteresting areas (exploration), whereas small steps allow a more focused traversal of interesting areas (exploitation). Handcrafted heuristics usually trade off small and large steps based on some measure of progress. Directly learning which step size is preferable in which situation would allow us to act more flexibly than a hand-crafted heuristic could. One approach to learning such dynamic configuration policies is dynamic algorithm configuration (DAC) via reinforcement learning.
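To make this framing concrete, below is a minimal, hypothetical sketch of how step-size control can be cast as a reinforcement learning environment: the state summarizes optimization progress, the action is the step size for the next generation, and the reward is the improvement of the best objective value found so far. For brevity it uses a simplified isotropic (1, λ)-ES on the sphere function rather than full CMA-ES, and names such as `StepSizeEnv` as well as the state and reward definitions are illustrative assumptions, not the setup used in the paper.

```python
import numpy as np

class StepSizeEnv:
    """Toy RL environment: the action is the step size of a simple
    isotropic (1, lambda)-ES on the sphere function. Illustrative only;
    the paper uses full CMA-ES and a richer state/reward design."""

    def __init__(self, dim=10, lam=10, horizon=50, seed=0):
        self.dim, self.lam, self.horizon = dim, lam, horizon
        self.rng = np.random.default_rng(seed)

    def _objective(self, x):
        return float(np.sum(x ** 2))  # sphere function as a stand-in benchmark

    def reset(self):
        self.mean = self.rng.uniform(-5, 5, self.dim)  # mean of the search distribution
        self.t = 0
        self.best = self._objective(self.mean)
        return np.array([self.best, 0.0])  # state: best value so far, normalized time

    def step(self, sigma):
        # Sample one generation with the externally chosen step size sigma.
        offspring = self.mean + sigma * self.rng.standard_normal((self.lam, self.dim))
        fitness = np.array([self._objective(x) for x in offspring])
        self.mean = offspring[np.argmin(fitness)]      # (1, lambda)-selection
        new_best = min(self.best, float(fitness.min()))
        reward = self.best - new_best                  # reward: improvement of best-so-far
        self.best = new_best
        self.t += 1
        state = np.array([self.best, self.t / self.horizon])
        return state, reward, self.t >= self.horizon

# Random-policy rollout, standing in for a learned step-size policy.
env = StepSizeEnv()
state, done = env.reset(), False
while not done:
    sigma = 10 ** np.random.uniform(-2, 0)  # a trained policy would map state -> step size
    state, reward, done = env.step(sigma)
print("best objective value:", env.best)
```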
Using DAC, we could learn policies that are capable of generalizing beyond the training setting. DAC policies could transfer the learned step-size adaptation to higher-dimensional functions, to longer CMA-ES optimization trajectories, and to other function classes.
References
- G. Shala, A. Biedenkapp, N. Awad, S. Adriaensen, M. Lindauer and F. Hutter
Learning Step-Size Adaptation in CMA-ES
In: Proceedings of the Sixteenth International Conference on Parallel Problem Solving from Nature (PPSN’20)
Link to source code, data, and trained policies, as well as the accompanying blog post.
Link to the video poster presentation at PPSN’20.