Experiment 20230120-MTOA

Experiment design

Date: 2023-01-20 (Andreas Kalaitzakis)

Hypotheses

Agents will benefit from undertaking a limited set of tasks.

Experimental setting

18 agents; 3 tasks; 6 features (2 independent features per task); 4 decision classes; 20 runs; 80000 games

Each agent initially trains on all tasks, then carries out a limited set of tasks. When two agents disagree, the following takes place:

(a) The agent with the lower income adapts its knowledge accordingly. If its memory capacity limit is reached, the agent tries to generalize when this is possible. For generalization to be an option, the decision for all undertaken tasks must be the same. Generalization takes place recursively, i.e., after a first generalization, if generalization is also possible one level higher, the nodes that have now become leaves are merged again.

(b) The agent with the higher income decides for both agents.
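
To make rules (a) and (b) concrete, here is a minimal Python sketch of one disagreement interaction. The Agent class is a hypothetical stand-in exposing only what the protocol needs; it does not correspond to the actual Lazy Lavender classes.

# Hypothetical sketch of one game under rules (a) and (b).
from dataclasses import dataclass, field

@dataclass
class Agent:
    income: float
    knowledge: dict = field(default_factory=dict)  # object type -> decision

    def decide(self, obj):
        # stand-in decision procedure; the real agents decide via their ontology
        return self.knowledge.get(obj, "reject")

    def adapt(self, obj, decision):
        # rule (a): revise knowledge; in the real setting this may trigger
        # the recursive generalization step when memory is full
        self.knowledge[obj] = decision

def interact(a: Agent, b: Agent, obj):
    da, db = a.decide(obj), b.decide(obj)
    if da == db:
        return da                                    # agreement: no adaptation
    low, high = (a, b) if a.income < b.income else (b, a)
    decision = high.decide(obj)                      # rule (b): higher income decides
    low.adapt(obj, decision)                         # rule (a): lower income adapts
    return decision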

Agents undertake 1-3 tasks while having limited memory, sufficient for fully learning only a single task. A sketch of the generalization step follows.
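
The recursive generalization of rule (a) can be sketched as follows, assuming the agent's ontology is a tree whose leaves each carry one decision per undertaken task; node and function names are illustrative, not the actual implementation.

# Hypothetical sketch of the recursive generalization step of rule (a).
from dataclasses import dataclass, field

@dataclass
class Node:
    decisions: dict | None = None             # task -> decision (leaves only)
    children: list = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children

def generalize(node, tasks):
    # Merge sibling leaves whose decisions agree on every undertaken task;
    # a merge may enable another merge one level up, hence the recursion.
    if node.is_leaf:
        return node
    node.children = [generalize(c, tasks) for c in node.children]
    if all(c.is_leaf for c in node.children):
        profiles = [{t: c.decisions.get(t) for t in tasks} for c in node.children]
        if all(p == profiles[0] for p in profiles):
            return Node(decisions=profiles[0])  # parent becomes a leaf
    return node

# Toy usage: all leaves agree on the single undertaken task, so the whole
# tree collapses into one leaf, freeing memory.
leaves = [Node(decisions={"t1": "accept"}) for _ in range(2)]
root = Node(children=[Node(children=leaves), Node(decisions={"t1": "accept"})])
print(generalize(root, ["t1"]).is_leaf)       # True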

Variables

independent variables: ['maxAdaptingRank']

dependent variables: ['avg_min_accuracy', 'avg_accuracy', 'avg_max_accuracy', 'success_rate', 'correct_decision_rate', 'delegation_rate']

Experiment

Date: 2023-01-20 (Andreas Kalaitzakis)

Computer: Dell Precision-5540 (CC: 12 * Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz with 16GB RAM; OS: Linux 5.4.0-92-generic)

Duration: 120 minutes

Lazy lavender hash: ceb1c5d1ca8109373d293b687fc55953fce5241d

Parameter file: params.sh

Executed command (script.sh):

#!/bin/bash

. params.sh

CURRDIR=$(pwd)
OUTPUT=${CURRDIR}/${DIRPREF}
# cd ${LLPATH}
cd lazylav
# This sample runs ExperimentalPlan; it can be replaced with Monitor if parameters are not varied.
bash scripts/runexp.sh -p ${CURRDIR} -d ${DIRPREF} \
  java -Dlog.level=INFO -cp ${JPATH} \
  fr.inria.exmo.lazylavender.engine.ExperimentalPlan \
  -Dexperiment=fr.inria.exmo.lazylavender.decisiontaking.multitask.SelectiveAcceptanceSpecializationExperiment \
  ${OPT} -DresultDir=${OUTPUT}

Analysis

Raw data

Full results can be found at:

Zenodo DOI

Table 1: Final success rate values

Table 1 reports the final average success rate values, i.e., the average success rate after the last iteration. Each column corresponds to a different number of adapting tasks, while each row corresponds to a different run with the same scope size.

Out[6]:
0 1 2
0 0.660125 0.476875 0.359300
1 0.808213 0.451375 0.474712
2 0.804300 0.418287 0.354988
3 0.785225 0.431125 0.388613
4 0.793063 0.443138 0.424563
5 0.929250 0.412175 0.464288
6 0.795963 0.443325 0.364575
7 0.706875 0.428725 0.421013
8 0.875212 0.383975 0.363038
9 0.686612 0.489150 0.383250
10 0.747513 0.428362 0.360563
11 0.780050 0.494450 0.354262
12 0.836850 0.443763 0.427850
13 0.922512 0.539725 0.399962
14 0.788687 0.467525 0.390950
15 0.936150 0.463612 0.393012
16 0.788650 0.453725 0.384325
17 0.861300 0.420762 0.405788
18 0.772125 0.414712 0.346662
19 0.812338 0.408888 0.409300

Table 2: Final average worst task accuracy values

Table 2 reports the final average worst task accuracy values, i.e., the accuracy after the last iteration on the task for which the agent scores the lowest accuracy. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[7]:
0 1 2
0 0.246528 0.246528 0.246528
1 0.253472 0.243056 0.251736
2 0.263889 0.239583 0.251736
3 0.246528 0.241319 0.246528
4 0.243056 0.258681 0.258681
5 0.246528 0.234375 0.217014
6 0.232639 0.223958 0.244792
7 0.222222 0.234375 0.243056
8 0.250000 0.239583 0.225694
9 0.243056 0.243056 0.256944
10 0.244792 0.229167 0.250000
11 0.222222 0.237847 0.229167
12 0.243056 0.229167 0.255208
13 0.250000 0.250000 0.255208
14 0.243056 0.236111 0.232639
15 0.239583 0.223958 0.248264
16 0.236111 0.243056 0.237847
17 0.236111 0.246528 0.269097
18 0.236111 0.255208 0.253472
19 0.243056 0.230903 0.256944

Table 3: Final average accuracy values

Table 3 reports the final average ontology accuracy with respect to all tasks, i.e., the accuracy after the last iteration, averaged over all tasks and agents. Each column corresponds to a different number of adapting tasks, while each row corresponds to a different run with the same scope size.

Out[8]:
0 1 2
0 0.376157 0.395255 0.392361
1 0.362269 0.403356 0.407986
2 0.389468 0.350694 0.389468
3 0.394676 0.366898 0.380787
4 0.385417 0.385417 0.394097
5 0.392361 0.349537 0.355903
6 0.368634 0.382523 0.386574
7 0.394097 0.358218 0.391782
8 0.394676 0.380787 0.364583
9 0.395833 0.378472 0.401620
10 0.387153 0.375579 0.371528
11 0.422454 0.373264 0.367477
12 0.359954 0.357639 0.407986
13 0.392361 0.379630 0.374421
14 0.398148 0.376157 0.367477
15 0.378472 0.377315 0.395833
16 0.332176 0.396991 0.385417
17 0.396991 0.385417 0.379051
18 0.410880 0.380208 0.381366
19 0.372685 0.371528 0.379051

Table 4: Final average best task accuracy values

Table 4 reports the final average best task accuracy values, i.e., the accuracy after the last iteration on the task for which the agent scores the highest accuracy. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[9]:
0 1 2
0 0.576389 0.618056 0.565972
1 0.527778 0.611111 0.565972
2 0.531250 0.506944 0.562500
3 0.600694 0.531250 0.562500
4 0.597222 0.538194 0.590278
5 0.555556 0.496528 0.529514
6 0.548611 0.597222 0.541667
7 0.579861 0.515625 0.607639
8 0.555556 0.597222 0.546875
9 0.628472 0.548611 0.583333
10 0.586806 0.586806 0.524306
11 0.666667 0.543403 0.519097
12 0.531250 0.519097 0.593750
13 0.552083 0.534722 0.543403
14 0.576389 0.583333 0.527778
15 0.493056 0.548611 0.572917
16 0.440972 0.626736 0.579861
17 0.625000 0.576389 0.517361
18 0.621528 0.553819 0.552083
19 0.559028 0.545139 0.539931

Table 5: Final average scope accuracy values

Table 5 reports the final average scope accuracy values, i.e., the accuracy after the last iteration, averaged over the undertaken tasks. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[10]:
0 1 2
0 0.517361 0.426215 0.392361
1 0.500000 0.436632 0.407986
2 0.501736 0.383681 0.389468
3 0.569444 0.375000 0.380787
4 0.572917 0.404514 0.394097
5 0.548611 0.366319 0.355903
6 0.519097 0.424479 0.386574
7 0.550347 0.374132 0.391782
8 0.538194 0.411458 0.364583
9 0.618056 0.425347 0.401620
10 0.536458 0.410590 0.371528
11 0.642361 0.407118 0.367477
12 0.531250 0.394097 0.407986
13 0.552083 0.411458 0.374421
14 0.569444 0.427083 0.367477
15 0.486111 0.388889 0.395833
16 0.406250 0.442708 0.385417
17 0.607639 0.418403 0.379051
18 0.621528 0.400174 0.381366
19 0.520833 0.413194 0.379051

Table 6: Final correct decision rate values

Table 6 reports the final correct decision rate values, i.e., the average correct decision rate after the last iteration. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[11]:
0 1 2
0 0.602525 0.505487 0.477875
1 0.551075 0.534450 0.488425
2 0.552250 0.475363 0.473225
3 0.632100 0.451263 0.451400
4 0.636650 0.481425 0.479175
5 0.576950 0.440650 0.438262
6 0.571462 0.522563 0.464687
7 0.621500 0.461275 0.478737
8 0.571275 0.508550 0.450975
9 0.696237 0.518338 0.488938
10 0.611300 0.517962 0.441625
11 0.711612 0.485388 0.444263
12 0.556863 0.485050 0.487312
13 0.573013 0.488463 0.450087
14 0.627750 0.516312 0.450375
15 0.498412 0.470712 0.482687
16 0.469037 0.536075 0.469650
17 0.660300 0.507188 0.447738
18 0.680512 0.475237 0.464325
19 0.570300 0.504413 0.455475

Table 7: Final delegation rate values

Table 7 reports the final delegation rate values, i.e., the average delegation rate after the last iteration. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[12]:
0 1 2
0 0.339875 0.523125 0.640700
1 0.191787 0.548625 0.525288
2 0.195700 0.581712 0.645012
3 0.214775 0.568875 0.611387
4 0.206937 0.556863 0.575438
5 0.070750 0.587825 0.535713
6 0.204038 0.556675 0.635425
7 0.293125 0.571275 0.578987
8 0.124787 0.616025 0.636962
9 0.313387 0.510850 0.616750
10 0.252487 0.571638 0.639437
11 0.219950 0.505550 0.645737
12 0.163150 0.556238 0.572150
13 0.077488 0.460275 0.600038
14 0.211312 0.532475 0.609050
15 0.063850 0.536388 0.606988
16 0.211350 0.546275 0.615675
17 0.138700 0.579237 0.594213
18 0.227875 0.585287 0.653338
19 0.187663 0.591113 0.590700

Table 8: Final population's total compensation values

Table 8 reports the final total compensation values, i.e., the average total compensation of a population of agents after the last iteration. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[13]:
0 1 2
0 964040.0 808780.0 764600.0
1 881720.0 855120.0 781480.0
2 883600.0 760580.0 757160.0
3 1011360.0 722020.0 722240.0
4 1018640.0 770280.0 766680.0
5 923120.0 705040.0 701220.0
6 914340.0 836100.0 743500.0
7 994400.0 738040.0 765980.0
8 914040.0 813680.0 721560.0
9 1113980.0 829340.0 782300.0
10 978080.0 828740.0 706600.0
11 1138580.0 776620.0 710820.0
12 890980.0 776080.0 779700.0
13 916820.0 781540.0 720140.0
14 1004400.0 826100.0 720600.0
15 797460.0 753140.0 772300.0
16 750460.0 857720.0 751440.0
17 1056480.0 811500.0 716380.0
18 1088820.0 760380.0 742920.0
19 912480.0 807060.0 728760.0

Table 9: Final P90/10 decile ratio of average agent compensation values

Table 9 reports the final P90/10 decile ratio of the average agent compensation values after the last iteration. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[14]:
0 1 2
0 0.215354 0.398070 0.572523
1 0.268938 0.260332 0.493932
2 0.328246 0.261966 0.409956
3 0.371674 0.298938 0.385745
4 0.341650 0.505780 0.477201
5 0.488399 0.270855 0.553469
6 0.332720 0.335669 0.488647
7 0.294254 0.282589 0.467322
8 0.445312 0.259347 0.401654
9 0.323990 0.397664 0.637325
10 0.201671 0.224958 0.455328
11 0.449689 0.415556 0.624671
12 0.432298 0.234973 0.543225
13 0.524888 0.376320 0.472098
14 0.463215 0.315820 0.415965
15 0.755494 0.254365 0.446329
16 0.244409 0.236407 0.470583
17 0.438891 0.272911 0.635369
18 0.383494 0.336800 0.513614
19 0.440307 0.291924 0.505743
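
As a reference for Tables 9 and 10, a P90/10 decile ratio is conventionally the 90th percentile divided by the 10th; the sketch below assumes that definition over the per-agent average compensations of one run. (The reported values are below 1, so the experiment's script may compute this statistic differently, e.g., with the deciles inverted.)

# Minimal sketch of a P90/10 decile ratio, assuming the conventional
# definition (90th percentile / 10th percentile) over per-agent values.
import numpy as np

def p90_p10_ratio(values):
    p90, p10 = np.percentile(values, [90, 10])
    return p90 / p10

rng = np.random.default_rng(0)
compensations = rng.uniform(30_000, 70_000, size=18)  # 18 agents, one run
print(p90_p10_ratio(compensations))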

Table 10: Final P90/10 decile ratio of average scope accuracy values

Table 10 reports the final P90/10 decile ratio of the average scope accuracy values after the last iteration. Each column corresponds to a different number of undertaken tasks, while each row corresponds to a different run with the same scope size.

Out[15]:
0 1 2
0 0.333333 0.527778 0.681818
1 0.375000 0.447368 0.638298
2 0.437500 0.441176 0.555556
3 0.458333 0.477612 0.543478
4 0.416667 0.647059 0.622222
5 0.583333 0.455882 0.662500
6 0.395833 0.486486 0.600000
7 0.437500 0.444444 0.572917
8 0.541667 0.400000 0.489362
9 0.428571 0.542857 0.750000
10 0.270833 0.394737 0.613636
11 0.583333 0.544118 0.738095
12 0.500000 0.375000 0.663043
13 0.625000 0.527778 0.642857
14 0.541667 0.472222 0.576087
15 0.812500 0.315789 0.622222
16 0.125000 0.392857 0.588889
17 0.500000 0.432432 0.735632
18 0.500000 0.531250 0.651163
19 0.333333 0.472973 0.619048

Figures

Figure 1: Success rate

Figure 1 displays the evolution of the average success rate (y-axis) as the number of iterations increases (x-axis), depending on the maximum adapting rank |maxAdaptingRank| = {0,1,2}.

Here it is shown that the success rate stabilizes but does not converge to 1. This indicates that either the agents continue to adapt their ontologies or their final ontologies do not allow them to agree on all decisions. We assume that a sufficiently restricted ontology can contain only the properties required to be accurate on a single task. It is therefore expected that agents interacting on a number of tasks requiring more memory will not be able to agree on all decisions. The results show that this holds even for agents interacting on a single task. However, we observe that the smaller the agents' scope, the higher their success rate.

Due to the initial ontology induction algorithm, the ontology of an agent undertaking task t may contain properties related to other tasks, and there is no guarantee that the agent will succeed in replacing these properties with properties related to task t. Two distinct cases can be considered. In the first case, an agent progressively eliminates all properties that are not related to the task it is undertaking. This agent potentially learns an ontology that allows it to make correct decisions with respect to all types of objects encountered. In the second case, an agent is not able to eliminate all properties that are not related to the task it is undertaking. Agents falling into this case repeatedly replace properties related to t with different properties that are also related to t. Consequently, these agents make correct decisions for different subsets of the object types existing at any given time.

Figure 2: Average worst task accuracy

Figure 2 portrays the evolution of the average worst task accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average minimum accuracy over all tackled tasks at the n-th iteration of each run.

Figure 3: Average accuracy

Figure 3 portrays the evolution of the average accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average accuracy over all tasks existing in the environment at the n-th iteration of each run.

Here it is shown that the size of the agents' scope has a minimal impact on the agents' average accuracy. Once again, this is due to the fact that our agent ontologies are limited to a maximum of 4 leaf classes. Two cases can be distinguished. In the first case, an agent becomes very accurate on one task and much less accurate on the others. In the second case, an agent learns an ontology allowing it to become moderately accurate on several tasks. This results in agents whose average accuracies are statistically indistinguishable from each other, regardless of their scope size.

Figure 4: Average best task accuracy

Figure 4 portrays the evolution of the average maximum accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average maximum accuracy over all existing tasks at the n-th iteration of each run.

Here it is shown that memory-limited agents specialize. However, this specialization is not related to the size of their scope, but rather to the memory limits. Agents become most accurate on one task, whether they tackle a single task or several. In other words, the fact that agents can refrain from all interactions except those related to the one task they undertake does not allow them to improve the accuracy on their best task.

Figure 5: Average scope accuracy

Figure 5 portrays the evolution of the average scope accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average accuracy over all tackled tasks at the n-th iteration of each run.

This figure shows that the smaller the scope of the agents, the higher their average accuracy on the undertaken tasks. This result is expected for the following reason. In the examined configuration, an agent can become very accurate on at most one task, while the larger its scope, the more tasks enter the average. For example, an agent scoring 0.9 on its best task and 0.25 on the two others (chance level with 4 decision classes) has a scope accuracy of 0.9 when undertaking one task, but (0.9 + 0.25 + 0.25)/3 ≈ 0.47 when undertaking all three. As a result, agents that tackle a single task exhibit a higher average accuracy on the tasks they perform.

Figure 6: Average correct decision rate

Figure 6 displays the evolution of the average correct decision rate (y-axis) as the number of iterations increases (x-axis), depending on the maximum adapting rank |maxAdaptingRank| = {0,1,2}.

The results show that the smaller the scope, the higher the rate of correct decisions for a population of agents. In other words, while changing the size of the agents' scope does not make them more or less specialized, it does allow the agents to take decisions on the tasks at which they are more accurate.

Figure 7: Average delegation rate

Figure 7 displays the evolution of the average delegation rate (y-axis) as the number of iterations increases (x-axis), depending on the maximum adapting rank |maxAdaptingRank| = {0,1,2}.

Figure 8: Average population's total compensation

Figure 9: Average P90/10 of average agent compensation

Figure 10: Average P90/10 of average scope accuracy

Analysis of variance (ANOVA)

We perform a one-way ANOVA, testing whether the independent variable 'maxAdaptingRank' has a statistically significant effect on the different dependent variables.
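
For reference, the F and p values below can be reproduced with scipy's f_oneway applied to the three columns of the corresponding table; the snippet uses only the first rows of Table 1 for illustration, whereas the reported tests use all 20 rows per column.

# One-way ANOVA over the columns of a final-values table (Table 1, truncated).
from scipy.stats import f_oneway

col0 = [0.660125, 0.808213, 0.804300]
col1 = [0.476875, 0.451375, 0.418287]
col2 = [0.359300, 0.474712, 0.354988]

F, p = f_oneway(col0, col1, col2)
print(f"F : {F}")
print(f"p : {p}")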

One-way ANOVA on Table 1: Effect on final success rate

One-way ANOVA on final success rate values
F : 367.7205414374662
p : 2.641644534984272e-33

One-way ANOVA on Table 2: Effect on final average worst task accuracy values

One-way ANOVA on final average worst task accuracy values
F : 2.3062341260678836
p : 0.10886244305569112

One-way ANOVA on Table 3: Effect on final average accuracy

One-way ANOVA on final average accuracy values
F : 1.7397578779843235
p : 0.18475520732832224

One-way ANOVA on Table 4: Effect on final average best task accuracy values

One-way ANOVA on final average best task accuracy values
F : 0.4479908829703096
p : 0.6411406833428723

One-way ANOVA on Table 5: Effect on final average scope accuracy values

One-way ANOVA on final average scope accuracy values
F : 127.89302142181441
p : 8.470001046805064e-22

One-way ANOVA on Table 6: Effect on final correct decision rate values

One-way ANOVA on final average correct decision rate values
F : 61.13826799465153
p : 6.561679319482983e-15

One-way ANOVA on Table 7: Effect on final average delegation rate values

One-way ANOVA on final average delegation rate values
F : 367.7205414374641
p : 2.641644534984664e-33

One-way ANOVA on Table 8: Effect on final total population compensation values

One-way ANOVA on final total population compensation values
F : 61.13826799465157
p : 6.561679319482917e-15

One-way ANOVA on Table 9: Effect on final P90/10 decile ratio of average agent compensation values

One-way ANOVA on final P90/10 decile ratio of average agent compensation values
F : 19.477411246987696
p : 3.577086192710465e-07

One-way ANOVA on Table 10: Effect on final P90/10 decile ratio of average scope accuracy values

One-way ANOVA on final P90/10 decile ratio of average scope accuracy values
F : 17.70656229727322
p : 1.0447785163447364e-06

Conclusions