2009
DOI: 10.1007/978-3-642-11169-3_13

Dynamic Multi-Armed Bandits and Extreme Value-Based Rewards for Adaptive Operator Selection in Evolutionary Algorithms

Abstract: The performance of many efficient algorithms critically depends on the tuning of their parameters, which in turn depends on the problem at hand. For example, the performance of Evolutionary Algorithms critically depends on the judicious setting of the operator rates. The Adaptive Operator Selection (AOS) heuristic that is proposed here rewards each operator based on the extreme value of the fitness improvement lately incurred by this operator, and uses a Multi-Armed Bandit (MAB) selection process bas…
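The abstract pairs an extreme-value credit assignment with a bandit-based operator selector. The Python sketch below shows one way such an AOS component could look; the class name, window size, and UCB-style exploration term are illustrative assumptions, not the exact scheme of the paper.

```python
import math
from collections import deque

class ExtremeValueAOS:
    """Sketch of Adaptive Operator Selection: extreme-value credit
    assignment combined with a UCB-style multi-armed bandit."""

    def __init__(self, n_ops, window=50, scaling=2.0):
        self.n_ops = n_ops
        self.windows = [deque(maxlen=window) for _ in range(n_ops)]  # recent fitness improvements
        self.counts = [0] * n_ops      # number of times each operator was applied
        self.rewards = [0.0] * n_ops   # current credit = best improvement in the window
        self.scaling = scaling         # exploration constant of the UCB term

    def select(self):
        # Try every operator once before trusting the bandit scores.
        for op in range(self.n_ops):
            if self.counts[op] == 0:
                return op
        total = sum(self.counts)
        # UCB1-style score: empirical reward plus an exploration bonus.
        scores = [
            self.rewards[op] + self.scaling * math.sqrt(2 * math.log(total) / self.counts[op])
            for op in range(self.n_ops)
        ]
        return max(range(self.n_ops), key=lambda op: scores[op])

    def update(self, op, fitness_improvement):
        # Extreme-value credit: reward is the largest improvement recently caused by this operator.
        self.counts[op] += 1
        self.windows[op].append(max(0.0, fitness_improvement))
        self.rewards[op] = max(self.windows[op])
```

In an EA loop, `select()` would be called before each variation step and `update()` afterwards with the resulting fitness gain.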

Citing publications span 2010–2024.
Cited by 43 publications (62 citation statements). References 24 publications.
“…On the one hand, the average reward of every operator tends to decrease as evolution goes on (diminishing returns). In the One-Max problem, for instance, the best mutation operator is the 5-bit mutation when the population is far away from the optimum; but the reward of the 5-bit mutation gracefully decreases as the population goes to more fit regions, and at some point the 3-bit mutation operator catches up (more details on this can be found in [15]). This suggests that when a good operator has been identified, there is no need for exploration as long as this operator remains sufficiently good.…”
Section: Discussion (mentioning, confidence: 99%)
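The crossover between the 5-bit and 3-bit mutation rewards on One-Max can be checked with a small standalone estimate. The snippet below is illustrative only: it ignores the selection scheme used in [15] and simply estimates the expected positive One-Max gain of flipping exactly k random bits at different distances from the optimum.

```python
import random

def expected_positive_gain(n, ones, k, trials=100_000):
    """Monte Carlo estimate of E[max(0, fitness gain)] when flipping exactly k
    distinct random bits of an n-bit string that currently has `ones` one-bits
    (One-Max fitness = number of ones). W.l.o.g. the one-bits occupy positions 0..ones-1."""
    gain_sum = 0
    for _ in range(trials):
        positions = random.sample(range(n), k)
        flipped_ones = sum(1 for p in positions if p < ones)  # 1 -> 0 flips
        flipped_zeros = k - flipped_ones                      # 0 -> 1 flips
        gain_sum += max(0, flipped_zeros - flipped_ones)
    return gain_sum / trials

n = 1000
for frac in (0.5, 0.7, 0.9, 0.97):
    ones = int(n * frac)
    print(f"{frac:.2f} ones: 3-bit gain ~ {expected_positive_gain(n, ones, 3):.4f}, "
          f"5-bit gain ~ {expected_positive_gain(n, ones, 5):.4f}")
```

Far from the optimum the 5-bit mutation yields the larger expected positive gain; close to the optimum the ordering reverses, which is the diminishing-returns effect described in the quote.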
“…Fed by the Extreme value of fitness improvements, it was later assessed on some EA binary benchmark problems [14][15][16], and also on some SAT instances [31].…”
Section: Multi-Armed Bandit (mentioning, confidence: 99%)
“…MAENS is a memetic algorithm which makes use of a crossover operator, a local search combining three local move operators and a novel long move operator called MergeSplit, and a ranking selection procedure called stochastic ranking (SR) (Runarsson and Yao 2000). The major differences between MAENS and MAENS* are: (a) MAENS uses a single crossover operator, whereas MAENS* uses a set of crossover operators, (b) a dynamic MAB mechanism (dMAB) (Fialho et al 2009) is adopted as an AOS rule, (c) a novel CA mechanism assigns a reward to the operators which is proportional to the number of solutions generated by each operator that "survived" the ranking phase, named proportional reward, (d) the stochastic ranking is improved considering also the diversity of the solutions (dSR) using a (e) novel diversity measure for the CARP search space.…”
Section: MAENS* (mentioning, confidence: 99%)
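As a rough illustration of the "proportional reward" credit assignment described in this quote, an operator could be credited with the fraction of surviving offspring it produced. The function name and normalization below are assumptions for the sake of the example, not taken from the MAENS* implementation.

```python
def proportional_rewards(applied_ops, survivor_indices, n_ops):
    """Credit each operator in proportion to how many of the offspring it
    generated survived the ranking phase.

    applied_ops[i]   -- index of the operator that produced offspring i
    survivor_indices -- indices of the offspring kept after (stochastic) ranking
    """
    survived = [0] * n_ops
    for i in survivor_indices:
        survived[applied_ops[i]] += 1
    total = sum(survived)
    return [s / total if total else 0.0 for s in survived]
```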
“…The dMAB (Fialho et al 2009) approach, adopted in this work, combines the UCB1 algorithm (Auer et al 2002) with the Page-Hinckley (PH) statistical test (Hinkley 1971) to detect changes in the environment. When the PH test is triggered, the MAB system is restarted and the information gathered in the previous generations is discarded.…”
Section: MAENS* (mentioning, confidence: 99%)
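One common formulation of the Page-Hinkley test for detecting a drop in the mean of a reward stream is sketched below; the tolerance `delta` and threshold `lam` values are illustrative, not the ones tuned by Fialho et al.

```python
class PageHinkley:
    """Page-Hinkley change-detection test (decrease-detection variant)."""

    def __init__(self, delta=0.15, lam=10.0):
        self.delta = delta  # tolerated per-step deviation
        self.lam = lam      # detection threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.m = 0.0      # cumulative (signed) deviation from the running mean
        self.m_max = 0.0  # running maximum of that deviation

    def update(self, reward):
        """Feed one observed reward; return True when a change is detected."""
        self.n += 1
        self.mean += (reward - self.mean) / self.n
        self.m += reward - self.mean + self.delta
        self.m_max = max(self.m_max, self.m)
        return (self.m_max - self.m) > self.lam
```

Per the quoted description, when the test fires the dMAB simply resets the bandit (operator counts and empirical rewards), so that exploration restarts from scratch.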