The full set of models used in the experiments is available here: models.zip. We expect to produce a repeatability package shortly.

Note that the MILP models are omitted at the request of an industrial partner.

We report the median and the 25th percentile of the expected cost of the strategies over 50 repeated experiments.
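As a rough illustration of how these two summary statistics could be computed from a list of per-run expected costs, here is a minimal Python sketch. The function name `summarize_costs` is hypothetical, and the percentile interpolation rule (`method="inclusive"`) is an assumption; the interpolation actually used for the reported numbers is not stated here.

```python
import statistics


def summarize_costs(costs):
    """Return (median, 25th percentile) of a list of expected costs.

    Assumption: the 25th percentile is taken as the first quartile with
    inclusive linear interpolation; other conventions give slightly
    different values on small samples.
    """
    median = statistics.median(costs)
    # quantiles(..., n=4) returns the three quartile cut points;
    # the first one is the 25th percentile.
    q25 = statistics.quantiles(costs, n=4, method="inclusive")[0]
    return median, q25
```

In this setting, `costs` would hold the 50 expected-cost values obtained from the repeated experiments for one method and one model.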

Percentages in parentheses mark the change from the original implementation to the optimized implementation (the reported numbers are those of the optimized implementation).
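One plausible reading of these percentages is the relative change measured against the original implementation. The sketch below assumes exactly that convention; the function name `percent_change` and the choice of the original value as the baseline are assumptions, not something stated in the text.

```python
def percent_change(original, optimized):
    """Relative change from `original` to `optimized`, in percent.

    Assumption: the original implementation is the baseline, so a
    negative result means the optimized implementation uses less
    (time, memory, or cost) than the original one.
    """
    if original == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (optimized - original) / original * 100.0
```

Under this convention, a running time that drops from 200 seconds to 150 seconds would be shown as a change of -25%.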

Memory is reported in MB and time in seconds. Both are mean values.

We denote by D"X" the method of David et al., where X is the value of the option "--learning-method" (Q-learning and M-learning correspond to "--learning-method 4" and "--learning-method 6", respectively).

We note that the **Highway** experiments are not representative in terms of running time and memory, because the number of samples used for learning depends on the quality of the learned strategy as it develops (better strategies imply more samples).