The various @aa("methods", href := explore.file) available in OpenMOLE make extensive use of genetic algorithms (GAs).
This is the case, for instance, for the @aa("model calibration method", href := calibration.file), which is an optimization problem, and for the search for output diversity with the @a("PSE method", href:= pse.file), which boils down to a GA with a novelty incentive.
@br
GAs can be smartly distributed on grid environments using an @aa("island scheme", href := island.file), and are able to deal with @aa("stochastic models", href:=stochasticityManagement.file).
@h2{About Calibration and GA}
OpenMOLE provides advanced methods to help you calibrate your model.
These methods automatically generate workflows to explore the parameter space of your model towards the "best" parameter set, according to a previously defined @b{criterion} or @b{objective}.
This is commonly addressed in the literature as a calibration, or optimization, problem.
@br
The different calibration methods in OpenMOLE use GAs to explore the parameter space of a simulation model, looking for parameter sets that will produce outputs reaching one or several given objectives.
@b{Objective functions}, also called @b{fitness functions}, compute quantities from the model outputs that have to be minimized or maximized.
They are a quantification of the @i{optimal model output} you are looking for.
@br
A common optimization problem is data fitting.
@br
In this particular case, the objective function could compute the distance between simulation results and data points; a classical example is the squared error function.
If you want your model to reproduce several characteristics (sometimes called stylised facts), you need several objectives, each of them quantifying the similarity between your model outputs and the characteristics you want to reproduce.
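As an illustration only, the following plain-Scala sketch (not the OpenMOLE API) shows two things. The first is a squared error objective for data fitting. The second is a vector of objectives, one per stylised fact. The functions @code{squaredError} and @code{objectives} are hypothetical, and so is the choice of the mean and the maximum of a series as stylised facts.

```scala
object ObjectiveSketch {
  // Squared error between a simulated series and observed data points (to be minimized).
  def squaredError(simulated: Seq[Double], observed: Seq[Double]): Double = {
    require(simulated.length == observed.length, "series must have the same length")
    simulated.zip(observed).map { case (s, o) => (s - o) * (s - o) }.sum
  }

  // One objective per stylised fact (hypothetical choice: the mean and the maximum of the series).
  // Each entry measures how far the model output is from one target characteristic.
  def objectives(simulated: Seq[Double], targetMean: Double, targetMax: Double): Vector[Double] = {
    val mean = simulated.sum / simulated.length
    Vector(math.abs(mean - targetMean), math.abs(simulated.max - targetMax))
  }
}
```

Each entry of the vector returned by @code{objectives} would be one objective to minimize.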
@br
To calibrate your model, you need to define:
@ul
...
...
@h2{Dummy Model Optimization Example}
This workflow optimizes a dummy model using the generational NSGA-II multi-objective algorithm.
You can replace the instances of @code{model} with your own model, and adapt the variation ranges of the input variables.
If you are not familiar with parameter tuning using GAs, you should first consult the @aa("tutorial", href := netLogoGA.file) explaining how to calibrate a NetLogo model with a GA.
@hl.openmole(s"""
$model
// Construction of the workflow orchestrating the genetic algorithm
// genome is the inputs prototype and their variation ranges
// objective sets the objectives to minimize
...
...
// Construction of the complete workflow with the execution environment, and the hook.
// A hook is attached to save the population of solutions to workDirectory / "evolution" on the local machine running OpenMOLE
// Here the generated workflow will run using 4 threads of the local machine.
evolution hook (workDirectory / "evolution") on LocalEnvironment(4)
""", name = "nsga2 example")
@br
Note that the objectives are given as a sequence of model output variables to be @b{minimized}.
If you want to reach specific target values, like Pi and 42, you can use the @code{delta} keyword:
@br@br
@hl.openmole(s"""
$model
...
...
termination = 100
) hook (workDirectory / "evolution")""", name = "nsga2 delta example")
@br
NB: in this case, the objective values in the saved file are the differences between the model outputs and your target values.
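The idea behind @code{delta} can be sketched in plain Scala. The sketch assumes that the distance to the target is the absolute difference; check the OpenMOLE sources for the exact definition. The helper @code{delta} below is hypothetical and is not the OpenMOLE keyword. Minimizing this distance drives each output toward its target.

```scala
object DeltaSketch {
  // Hypothetical helper: distance between a model output and its target value, to be minimized.
  // Assumption: the distance is the absolute difference.
  def delta(output: Double, target: Double): Double = math.abs(output - target)

  // Objectives for outputs o1 and o2 with the targets Pi and 42 used in the example above.
  def objectives(o1: Double, o2: Double): Vector[Double] =
    Vector(delta(o1, math.Pi), delta(o2, 42.0))
}
```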
@br
Maximization problems are handled by minimizing the opposite of the variables to be maximized.
You may use the @code{-} keyword to minimize the opposite of o1 (@i{i.e.} maximize o1).
@br@br
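The following plain-Scala sketch shows why negation works: minimizing the opposite of a value selects the candidate with the largest value. The helper @code{argMinBy} and the sample values are hypothetical.

```scala
object MaximizeSketch {
  // Hypothetical helper: the candidate with the lowest objective value.
  def argMinBy[A](candidates: Seq[A])(objective: A => Double): A = candidates.minBy(objective)

  val o1Values: Seq[Double] = Seq(1.0, 7.0, 3.0)

  // Minimizing -o1 selects the largest o1, i.e. it maximizes o1.
  val best: Double = argMinBy(o1Values)(o1 => -o1)
}
```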
As an output, the method produces a population file for each generation, in the directory provided to the hook, named with the generation number as @code{populationN.csv}.
Each CSV file contains a column with the generation number, the parameter values, the median value of each objective at each point, and, in the variable @code{evolution$samples}, the number of model runs used for the evaluation (in the case of stochastic models).
@h2{Real-world Example}
This @aa("tutorial", href:=netLogoGA.file) explains how to use genetic algorithms to perform optimization on a NetLogo model.
val model = ScalaTask("val o1 = x; val o2 = y") set (
inputs += (x, y),
outputs += (o1, o2)
)
"""
@h2{Distribution scheme}
For distributed environments, the island distribution scheme of evolutionary algorithms is especially well adapted.
Islands of the population evolve for a while on a remote node before being merged back into the global population.
A new island is then generated, until the termination criterion, @i{i.e.} the maximum total number of individual evaluations, has been reached.
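The following plain-Scala sketch illustrates the scheme on a toy problem (minimize x²). It is not OpenMOLE's implementation. The fitness, the mutation operator, @code{evolveIsland} and @code{run} are all hypothetical. The islands run one after the other instead of concurrently on remote nodes, and only offspring evaluations count toward the budget.

```scala
import scala.util.Random

object IslandSketch {
  // Toy fitness to minimize; it stands in for a model evaluation (hypothetical).
  def fitness(x: Double): Double = x * x

  // Evolve one island for `evaluations` offspring evaluations: each offspring is a Gaussian
  // mutation of a random island member, and it replaces the worst member if it is better.
  def evolveIsland(start: Vector[Double], evaluations: Int, rng: Random): Vector[Double] = {
    var pop = start
    for (_ <- 1 to evaluations) {
      val parent = pop(rng.nextInt(pop.size))
      val child = parent + 0.1 * rng.nextGaussian()
      val worst = pop.maxBy(fitness)
      if (fitness(child) < fitness(worst)) pop = pop.updated(pop.indexOf(worst), child)
    }
    pop
  }

  // Start islands from samples of the global population, evolve them and merge their final
  // populations back, until the evaluation budget is spent.
  // Returns the global population (sorted by fitness) and the number of evaluations used.
  def run(mu: Int, islandSize: Int, evaluationsPerIsland: Int, budget: Int, seed: Long): (Vector[Double], Int) = {
    val rng = new Random(seed)
    var global = Vector.fill(mu)(rng.nextDouble() * 10 - 5)
    var used = 0
    while (used < budget) {
      val evals = math.min(evaluationsPerIsland, budget - used)
      val island = Vector.fill(islandSize)(global(rng.nextInt(global.size)))
      val evolved = evolveIsland(island, evals, rng)
      used += evals
      // Merge: keep the mu best individuals among the global and island populations.
      global = (global ++ evolved).sortBy(fitness).take(mu)
    }
    (global, used)
  }
}
```

The termination criterion is the total number of evaluations over all islands, not a number of generations.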
@br
The island scheme is enabled using the @code{by Island} syntax.
For instance:
@hl.openmole("""
// Generate a workflow that orchestrates 100 concurrent islands.
...
...
// Construction of the complete mole with the execution environment, and the hook.
// Here the generated workflow will run using 4 threads of the local machine.
(evolution on LocalEnvironment(4))""", header = model)
@h2{The problem of stochasticity in model calibration}
GAs don’t cope well with stochasticity.
This is especially the case for algorithms with evolution strategies of type "µ + λ" (such as NSGA-II, the GA used in OpenMOLE), which preserve the best solutions (individuals) from one generation to the next.
In that kind of optimization, the quality of a solution is only @b{estimated}.
Since it varies from one replication to another, the quality can be either overvalued or undervalued, @i{i.e.} estimated at a significantly greater or lower value than the one that would be obtained with an infinite number of replications.
@br
Undervalued solutions are not that problematic.
They might be discarded instead of being kept, but the algorithm has a chance to retry a very similar solution later on.
On the other hand, the overvalued solutions are very problematic: the GA will keep them in the population of good solutions because they have been (wrongly) evaluated as such, and will generate new offspring solutions from them.
@br@br
This behaviour can greatly slow down the convergence of the calibration algorithm, and can even make it converge toward sets of parameters with very unstable dynamics, which are very likely to yield falsely good solutions.
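A small simulation shows how selection favours lucky evaluations. This plain-Scala sketch makes simple, hypothetical assumptions. All individuals share the same true fitness (0.0, to be minimized). A single replication adds Gaussian noise of standard deviation @code{sigma}. Keeping the best single evaluation among many individuals picks a lucky one, whose estimate is much better, here lower, than its true value: it is overvalued.

```scala
import scala.util.Random

object OvervaluationSketch {
  // One noisy replication of the fitness: true value plus Gaussian noise (assumed noise model).
  def noisyEvaluation(trueFitness: Double, sigma: Double, rng: Random): Double =
    trueFitness + sigma * rng.nextGaussian()

  // Best (lowest) estimated fitness among n individuals, each evaluated once,
  // all of them having the same true fitness 0.0.
  def bestEstimate(n: Int, sigma: Double, seed: Long): Double = {
    val rng = new Random(seed)
    Seq.fill(n)(noisyEvaluation(0.0, sigma, rng)).min
  }
}
```

With 1000 individuals and unit noise, @code{bestEstimate} lands well below 0.0, although no individual is truly better than another.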
@h3{Existing solutions}
To reduce the influence of the fitness fluctuations, the simplest approach is "resampling".
It consists of replicating the fitness evaluation of individuals.
The computed "quality" of an individual is then an estimation (@i{e.g.} mean or median) based on a @i{finite} number of replications of the fitness computation.
@br
This number results from a compromise between the computation time needed to evaluate one set of parameters (an individual) and an acceptable level of noise in the computed quality.
Still, with any finite number of replications, even a very high one, some solutions will be overvalued with a non-negligible probability, given that the fitness function is evaluated millions of times.
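Resampling and its residual bias can be sketched in a toy setting. The sketch is plain Scala with hypothetical names. The noise is Gaussian with standard deviation 1 around a true fitness of 0.0. The quality of an individual is the median of @code{replications} noisy evaluations. More replications shrink the optimistic bias of the best estimate. With a finite number of replications, however, the best estimate still stays below the true value.

```scala
import scala.util.Random

object ResamplingSketch {
  def median(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "median of an empty sequence")
    val s = xs.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  // Quality of an individual estimated as the median of `replications` noisy evaluations.
  def resampledQuality(trueFitness: Double, sigma: Double, replications: Int, rng: Random): Double =
    median(Seq.fill(replications)(trueFitness + sigma * rng.nextGaussian()))

  // Best (lowest) estimated quality among n individuals that all share the true fitness 0.0.
  def bestEstimate(n: Int, replications: Int, seed: Long): Double = {
    val rng = new Random(seed)
    Seq.fill(n)(resampledQuality(0.0, 1.0, replications, rng)).min
  }
}
```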
@br
Other @i{ad hoc} methods from the literature rely on assumptions that are hard or impossible to verify (such as the invariance of the noise distribution over the fitness space), and add parameters to the algorithm that are difficult to tune finely.
See @aa("this paper", href:=Resource.literature.rakshit2016.file) for an extensive review of noisy fitness management in Evolutionary Computation.
...
...
@h3{OpenMOLE's solution}
To overcome these limitations, OpenMOLE uses an auto-adaptive strategy called "stochastic resampling".
@br
The idea is to evaluate individuals with only one replication and, at the next generation, to keep and re-inject a sample of the individuals of the current population into the newly created population.
@br
For instance, at each generation, 90% of the offspring genomes are @b{new genomes}, obtained by the classical mutation/crossover steps of genetic algorithms. The other 10% of the offspring genomes are drawn randomly from the current population, @i{i.e.} they are @b{already evaluated genomes}, for which the algorithm computes one additional replication.
Replicated evaluations are stored for each individual in a vector of replications.
The global fitness of an individual is then computed, for instance, as the median of the fitness values stored in its replication vector.
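The replication vector and the offspring composition can be sketched in plain Scala. This is a hypothetical illustration, not MGO's implementation. The genome is a single @code{Double}. Each offspring is a resampled individual with probability @code{resampleRate}, so @code{resampleRate = 0.1} gives about 10% of resampled offspring. @code{newGenome} and @code{evaluate} are placeholder functions supplied by the caller.

```scala
import scala.util.Random

object StochasticResamplingSketch {
  // An individual keeps every fitness replication computed for its genome.
  final case class Individual(genome: Double, replications: Vector[Double]) {
    // Global fitness: median of the stored replications.
    def fitness: Double = {
      val s = replications.sorted
      val n = s.length
      if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
    }
  }

  // Offspring of one generation: with probability resampleRate, an already evaluated
  // individual is drawn from the population and gets one more replication;
  // otherwise a new genome is created and evaluated once.
  def offspring(population: Vector[Individual], size: Int, resampleRate: Double,
                newGenome: Random => Double, evaluate: (Double, Random) => Double,
                rng: Random): Vector[Individual] =
    Vector.fill(size) {
      if (rng.nextDouble() < resampleRate) {
        val old = population(rng.nextInt(population.size))
        old.copy(replications = old.replications :+ evaluate(old.genome, rng))
      } else {
        val g = newGenome(rng)
        Individual(g, Vector(evaluate(g, rng)))
      }
    }
}
```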
@br
This evolution strategy aims to let the best individuals survive several generations. Since each individual has a fixed chance of being resampled at each generation, the longest-lived individuals are the most likely to be resampled.
However, this fixed probability of resampling is not sufficient on its own, since well-evaluated solutions are likely to be replaced by overvalued solutions (new solutions with a few "lucky" replications).
@br
To compensate for this bias, we add a technical objective to NSGA-II: maximize the number of evaluations of a solution.
@br@br
The optimization problem of model calibration thus becomes a @b{multi-objective optimisation problem} (if it was not already one!): the algorithm has to optimize the objectives of the model @b{and} the technical objective.
Therefore, the number of replications is taken into account in the Pareto-based elitism of NSGA-II: solutions with many replications are kept, even if other solutions are better on the model objectives but have been evaluated fewer times.
By doing so, we let the multi-objective optimization algorithm handle the compromise between the quality of the solutions and their robustness.
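The effect of the technical objective on Pareto dominance can be sketched in plain Scala. This is a simplified, hypothetical illustration: it only computes the first Pareto front, whereas NSGA-II also ranks further fronts and uses a crowding distance. Maximizing the number of replications is encoded as minimizing its opposite, as for any maximized objective.

```scala
object ParetoSketch {
  // A candidate solution: model objectives (to minimize) and number of replications.
  final case class Solution(objectives: Vector[Double], replications: Int) {
    // Technical objective: maximize replications, i.e. minimize their opposite.
    def allObjectives: Vector[Double] = objectives :+ (-replications.toDouble)
  }

  // a dominates b if it is no worse on every objective and strictly better on at least one.
  def dominates(a: Solution, b: Solution): Boolean = {
    val pairs = a.allObjectives.zip(b.allObjectives)
    pairs.forall { case (x, y) => x <= y } && pairs.exists { case (x, y) => x < y }
  }

  // Non-dominated solutions (first Pareto front).
  def paretoFront(solutions: Seq[Solution]): Seq[Solution] =
    solutions.filter(s => !solutions.exists(o => dominates(o, s)))
}
```

With these definitions, a solution with a very good fitness estimated from a single replication does not dominate a slightly worse solution evaluated 20 times, so both remain on the front.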
@br
This method adds only two new parameters:
@ol{
@li{@code{reevaluate}, the probability of resampling an individual at each generation}
@li{@code{sample}, the maximum number of evaluation values stored for an individual, which limits the memory used to store it.}
}
See the line @code{stochastic = Stochastic(seed = mySeed, reevaluate = 0.2, sample = 100)} in the example.
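The role of the two parameters can be sketched in plain Scala with hypothetical helper names. The sketch makes two assumptions. First, @code{reevaluate} is applied independently to each individual. Second, when an individual already holds @code{sample} values, the oldest value is dropped. The actual policy in MGO may differ.

```scala
import scala.util.Random

object StochasticParametersSketch {
  // Hypothetical: add one replication while keeping at most `sample` values.
  // Assumption: the oldest values are dropped first when the cap is reached.
  def addReplication(replications: Vector[Double], value: Double, sample: Int): Vector[Double] =
    (replications :+ value).takeRight(sample)

  // Each individual is independently selected for one more replication with probability `reevaluate`.
  def selectForReevaluation[A](population: Vector[A], reevaluate: Double, rng: Random): Vector[A] =
    population.filter(_ => rng.nextDouble() < reevaluate)
}
```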
@br
This method has been implemented in MGO, a library for evolutionary computing, and is currently being benchmarked against other stochasticity management methods (see @aa("the repository", href:=shared.link.repo.mgobench)).