Experiment 20230110-MTOA

Experiment design


Date: 2023-01-10 (Andreas Kalaitzakis)

Hypotheses

Agents will benefit from undertaking a limited set of tasks.

Experimental setting

18 agents; 3 tasks; 6 features (2 independent features per task); 4 decision classes; 20 runs; 80000 games

Each agent initially trains on all tasks, then carries out a limited set of tasks. When two agents disagree, the following takes place:

(a) The agent with the lower income adapts its knowledge accordingly. If its memory capacity limit is reached, the agent tries to generalize when this is possible. For generalization to be an option, the decision for all undertaken tasks must be the same. Generalization takes place recursively: after the first generalization, if generalization becomes possible at the previously higher level, the now-leaf nodes are re-merged (see the sketch below).

(b) The agent with the higher income decides for both agents.

Agents undertake 1 to 3 tasks and have unlimited memory, sufficient for learning all tasks.
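
To make the generalization step in (a) concrete, here is a minimal sketch of the recursive leaf-merging; the Node structure and names are illustrative assumptions, not the actual Lazy Lavender classes:

# Minimal sketch of the recursive generalization described in (a).
# The Node structure and names are illustrative assumptions, not the
# actual Lazy Lavender implementation.
from dataclasses import dataclass, field

@dataclass
class Node:
    decisions: dict = field(default_factory=dict)  # task -> decision class (set on leaves)
    children: list = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def generalize(node: Node, undertaken_tasks: list) -> None:
    # Post-order traversal: merging leaves at one level may enable a
    # further merge at the previously higher level, hence the recursion.
    for child in node.children:
        generalize(child, undertaken_tasks)
    if node.children and all(c.is_leaf() for c in node.children):
        for task in undertaken_tasks:
            if len({c.decisions.get(task) for c in node.children}) != 1:
                return  # decisions differ on some undertaken task: keep the leaves
        # Same decision for all undertaken tasks: merge the leaves into the parent.
        node.decisions = dict(node.children[0].decisions)
        node.children = []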

Variables

independent variables: ['maxAdaptingRank']

dependent variables: ['avg_min_accuracy', 'avg_accuracy', 'avg_max_accuracy', 'success_rate', 'correct_decision_rate', 'delegation_rate']

Experiment

Date: 2023-01-01 (Andreas Kalaitzakis)

Computer: Dell Precision-5540 (CPU: 12 × Intel(R) Core(TM) i7-9850H @ 2.60GHz; RAM: 16GB; OS: Linux 5.4.0-92-generic)

Duration: 120 minutes

Lazy Lavender hash: ceb1c5d1ca8109373d293b687fc55953fce5241d

Parameter file: params.sh
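
For reference, a hypothetical sketch of the shape of params.sh; the variable names are those referenced by script.sh below, but the values are illustrative assumptions, not the actual parameters:

#!/bin/bash
# Hypothetical params.sh: variable names are those used by script.sh;
# the values shown here are illustrative assumptions.
DIRPREF=20230110-MTOA        # name of the result directory
LLPATH=${HOME}/lazylav       # path to the Lazy Lavender installation
JPATH=${LLPATH}/classes      # Java classpath (assumed layout)
OPT=""                       # additional -D options passed to the experiment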

Executed command (script.sh):

#!/bin/bash

. params.sh

CURRDIR=$(pwd)
OUTPUT=${CURRDIR}/${DIRPREF}
# cd ${LLPATH}
cd lazylav
# This sample runs ExperimentalPlan. It can be replaced with Monitor if parameters are not varied.
bash scripts/runexp.sh -p ${CURRDIR} -d ${DIRPREF} \
     java -Dlog.level=INFO -cp ${JPATH} \
          fr.inria.exmo.lazylavender.engine.ExperimentalPlan \
          -Dexperiment=fr.inria.exmo.lazylavender.decisiontaking.multitask.SelectiveAcceptanceSpecializationExperiment \
          ${OPT} -DresultDir=${OUTPUT}

Analysis

Raw data

Full results can be found at:

Zenodo DOI

Table 1: Final success rate values

Table 1 consists of the final average success rate values, i.e., the average success rate after the last iteration. Each column corresponds to a different number of adapting tasks; each row corresponds to a different run with the same scope size.

run 0 1 2
0 0.971300 0.869088 0.829700
1 0.952213 0.862387 0.800987
2 0.953363 0.881838 0.827213
3 0.954963 0.910450 0.814488
4 0.961762 0.871300 0.815875
5 0.965725 0.877500 0.820200
6 0.942000 0.886800 0.813250
7 0.955825 0.875687 0.830213
8 0.967525 0.886125 0.827800
9 0.956612 0.882025 0.828388
10 0.941438 0.870788 0.817612
11 0.965075 0.881062 0.807550
12 0.965050 0.886437 0.816688
13 0.961363 0.863463 0.817412
14 0.955213 0.866950 0.822525
15 0.947550 0.876537 0.830538
16 0.960800 0.874575 0.807987
17 0.955862 0.883988 0.818250
18 0.953887 0.872075 0.838075
19 0.965850 0.877775 0.812612
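
As a reading aid, the per-column statistics of such a table can be obtained with pandas; a minimal sketch, assuming the table has been exported to a CSV file (the file name is hypothetical, the raw data being available from the Zenodo deposit):

import pandas as pd

# Hypothetical file name; the raw data is available from the Zenodo deposit.
df = pd.read_csv("table1_final_success_rate.csv", index_col=0)
print(df.mean())  # mean final success rate per column, over the 20 runs
print(df.std())   # run-to-run standard deviation per column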

Table 2: Final average worst task accuracy values

Table 2 consists of the final average worst task accuracy values, i.e., the accuracy after the last iteration on the task for which the agent scores the lowest accuracy. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same scope size.

run 0 1 2
0 0.276910 0.387153 0.937500
1 0.263889 0.388889 0.875000
2 0.284722 0.324653 0.781250
3 0.324653 0.324653 0.859375
4 0.272569 0.397569 0.875000
5 0.284722 0.358507 0.873264
6 0.307292 0.446181 0.905382
7 0.293403 0.401910 0.889757
8 0.304688 0.329861 0.953125
9 0.304688 0.359375 0.843750
10 0.289062 0.354167 0.843750
11 0.286458 0.310764 0.906250
12 0.275174 0.398438 0.906250
13 0.300347 0.376736 0.936632
14 0.282118 0.368924 0.858507
15 0.281250 0.364583 0.906250
16 0.314236 0.357639 0.874132
17 0.258681 0.358507 0.900174
18 0.322917 0.318576 0.825521
19 0.326389 0.354167 0.916667

Table 3: Final average accuracy values

Table 3 consists of the final average ontology accuracy with respect to all tasks, i.e., the accuracy after the last iteration, averaged over all tasks and agents. Each column corresponds to a different number of adapting tasks; each row corresponds to a different run with the same scope size.

run 0 1 2
0 0.499421 0.684606 0.968461
1 0.462963 0.695602 0.937500
2 0.496238 0.708912 0.901042
3 0.467593 0.732928 0.932292
4 0.460648 0.726273 0.911458
5 0.506366 0.713252 0.921296
6 0.527778 0.728588 0.936632
7 0.538194 0.727720 0.942419
8 0.556713 0.731481 0.967882
9 0.478009 0.722512 0.905961
10 0.501447 0.680556 0.937500
11 0.481771 0.714699 0.921875
12 0.499132 0.750868 0.937500
13 0.541667 0.715856 0.951968
14 0.540220 0.695891 0.937211
15 0.530671 0.732639 0.947917
16 0.537326 0.716435 0.915799
17 0.545718 0.741030 0.940683
18 0.497106 0.710359 0.910590
19 0.578704 0.690972 0.935764

Table 4: Final average best task accuracy values

Table 4 consists of the final average best task accuracy values, i.e., the accuracy after the last iteration on the task for which the agent scores the highest accuracy. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same scope size.

run 0 1 2
0 0.822917 0.911458 0.999132
1 0.682292 0.890625 1.000000
2 0.770833 0.947917 0.968750
3 0.651042 1.000000 0.984375
4 0.682292 0.911458 0.937500
5 0.815972 0.958333 1.000000
6 0.791667 0.932292 0.968750
7 0.854167 0.927083 1.000000
8 0.906250 0.953125 0.984375
9 0.697049 0.973958 0.953125
10 0.703993 0.890625 1.000000
11 0.770833 0.968750 0.953125
12 0.732639 0.937500 0.984375
13 0.875000 0.916667 0.966146
14 0.840278 0.927083 1.000000
15 0.859375 0.947917 0.984375
16 0.854167 0.947917 0.937500
17 0.947917 0.963542 0.968750
18 0.713542 0.979167 0.953125
19 0.916667 0.880208 0.968750

Table 5: Final average scope accuracy values

Table 5 consists of the final average scope accuracy values. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same number of undertaken tasks.

run 0 1 2
0 0.822917 0.833333 0.968461
1 0.635417 0.848958 0.937500
2 0.770833 0.901042 0.901042
3 0.598958 0.937066 0.932292
4 0.609375 0.890625 0.911458
5 0.796875 0.890625 0.921296
6 0.786458 0.869792 0.936632
7 0.854167 0.890625 0.942419
8 0.906250 0.932292 0.967882
9 0.692708 0.901042 0.905961
10 0.703125 0.843750 0.937500
11 0.770833 0.916667 0.921875
12 0.723958 0.927083 0.937500
13 0.875000 0.885417 0.951968
14 0.833333 0.859375 0.937211
15 0.859375 0.916667 0.947917
16 0.854167 0.895833 0.915799
17 0.947917 0.932292 0.940683
18 0.713542 0.906250 0.910590
19 0.916667 0.859375 0.935764

Table 6: Final correct decision rate values

Table 6 consists of the final correct decision rate values, i.e., the average correct decision rate after the last iteration. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same scope size.

run 0 1 2
0 0.810000 0.765513 0.861163
1 0.618537 0.770913 0.803937
2 0.740725 0.830488 0.805275
3 0.584600 0.884737 0.805475
4 0.596425 0.809050 0.797488
5 0.778850 0.824638 0.817063
6 0.752613 0.803837 0.812913
7 0.831975 0.806262 0.834475
8 0.891625 0.860263 0.852950
9 0.678525 0.824913 0.795612
10 0.677325 0.775463 0.821350
11 0.754850 0.837462 0.800925
12 0.703488 0.856712 0.831025
13 0.843638 0.805975 0.836650
14 0.808187 0.781312 0.836350
15 0.832762 0.835025 0.844225
16 0.833087 0.817825 0.794188
17 0.920900 0.860237 0.825337
18 0.688600 0.823688 0.823412
19 0.894925 0.793138 0.816087

Table 7: Final delegation rate values

Table 7 consists of the final delegation rate values, i.e., the average delegation rate after the last iteration. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same number of undertaken tasks.

run 0 1 2
0 0.028700 0.130912 0.170300
1 0.047787 0.137612 0.199013
2 0.046637 0.118163 0.172788
3 0.045038 0.089550 0.185512
4 0.038238 0.128700 0.184125
5 0.034275 0.122500 0.179800
6 0.058000 0.113200 0.186750
7 0.044175 0.124313 0.169788
8 0.032475 0.113875 0.172200
9 0.043388 0.117975 0.171613
10 0.058563 0.129213 0.182388
11 0.034925 0.118938 0.192450
12 0.034950 0.113562 0.183312
13 0.038637 0.136538 0.182588
14 0.044788 0.133050 0.177475
15 0.052450 0.123463 0.169462
16 0.039200 0.125425 0.192013
17 0.044138 0.116013 0.181750
18 0.046113 0.127925 0.161925
19 0.034150 0.122225 0.187388

Table 8: Final population's total compensation values

Table 8 consists of the final total compensation values, i.e., the average total compensation of the agent population after the last iteration. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same number of undertaken tasks.

run 0 1 2
0 1296000.0 1224820.0 1377860.0
1 989660.0 1233460.0 1286300.0
2 1185160.0 1328780.0 1288440.0
3 935360.0 1415580.0 1288760.0
4 954280.0 1294480.0 1275980.0
5 1246160.0 1319420.0 1307300.0
6 1204180.0 1286140.0 1300660.0
7 1331160.0 1290020.0 1335160.0
8 1426600.0 1376420.0 1364720.0
9 1085640.0 1319860.0 1272980.0
10 1083720.0 1240740.0 1314160.0
11 1207760.0 1339940.0 1281480.0
12 1125580.0 1370740.0 1329640.0
13 1349820.0 1289560.0 1338640.0
14 1293100.0 1250100.0 1338160.0
15 1332420.0 1336040.0 1350760.0
16 1332940.0 1308520.0 1270700.0
17 1473440.0 1376380.0 1320540.0
18 1101760.0 1317900.0 1317460.0
19 1431880.0 1269020.0 1305740.0

Table 9: Final P90/10 decile ratio of average agent compensation values

Table 9 consists of the final P90/10 decile ratio of average agent compensation values after the last iteration. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same number of undertaken tasks.

run 0 1 2
0 0.671662 0.822004 0.917597
1 0.669009 0.886940 0.946083
2 0.951306 0.899609 0.929718
3 0.652029 0.834270 0.928437
4 0.648642 0.914170 0.963125
5 0.651799 0.840651 0.942143
6 0.707684 0.833992 0.919271
7 0.805074 0.892150 0.951303
8 0.684471 0.882510 0.902686
9 0.812040 0.855304 0.943747
10 0.661392 0.880670 0.921447
11 0.818590 0.837420 0.936705
12 0.620505 0.896914 0.935177
13 0.681861 0.876005 0.927264
14 0.742174 0.849585 0.930684
15 0.770398 0.927016 0.928502
16 0.804982 0.863828 0.933635
17 0.836609 0.930015 0.920143
18 0.807382 0.822132 0.946538
19 0.842254 0.891342 0.907572
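
For reference, the decile ratio compares the upper and lower deciles of the per-agent values. Since the reported values are at or below 1, the ratio is assumed here to be taken as P10/P90 (lower decile over upper decile); a minimal sketch under that assumption, with illustrative data:

import numpy as np

# Illustrative per-agent compensation values (hypothetical data).
agent_values = np.array([0.79, 0.83, 0.88, 0.91, 0.92, 0.95])
p10, p90 = np.percentile(agent_values, [10, 90])
print(p10 / p90)  # a ratio close to 1 indicates a homogeneous population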

Table 10: Final P90/10 decile ratio of average scope accuracy values

Table 10 consists of the final P90/10 decile ratio of average scope accuracy values after the last iteration. Each column corresponds to a different number of undertaken tasks; each row corresponds to a different run with the same number of undertaken tasks.

run 0 1 2
0 0.718750 0.866071 0.997312
1 0.708333 0.929204 1.000000
2 0.960000 0.925620 1.000000
3 0.687500 0.902344 1.000000
4 0.673913 0.965517 1.000000
5 0.661290 0.891667 0.980337
6 0.736842 0.898305 0.994444
7 0.827586 0.940171 0.997238
8 0.718750 0.966942 0.994624
9 0.875000 0.887097 0.997126
10 0.709091 0.921053 1.000000
11 0.851852 0.918033 1.000000
12 0.625000 0.983333 1.000000
13 0.750000 0.948718 0.994536
14 0.758621 0.887931 0.997222
15 0.786885 0.950413 1.000000
16 0.850000 0.915966 0.994318
17 0.859375 0.950820 0.991713
18 0.854167 0.887097 0.994286
19 0.875000 0.964286 0.991667

Figures

Figure 1: Success rate

Figure 1 displays the evolution of the average success rate (y-axis) as the number of iterations increases (x-axis), depending on the maximum adapting rank |maxAdaptingRank| = {0,1,2}.

This figure shows that a population of interacting agents learns to agree on all of its decisions, regardless of scope size. This is explained by the fact that the agents represented here face no memory limitations: they gradually adopt the same properties in their ontologies, even if their leaf classes do not coincide. These observations confirm the results previously presented by Bourahla. Moreover, they indicate that the scope size has an impact on the obtained success rate: the larger the scope, the more interactions are needed to agree on everything, and thus the lower the average success rate at convergence.

Figure 2: Average worst task accuracy

Figure 2 portrays the evolution of the average worst task accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average minimum accuracy over all tackled tasks at the x^th interaction of each run.

Figure 3: Average accuracy

Figure 3 portrays the evolution of the average accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average accuracy over all tasks existing in the environment at the x^th interaction of each run.

We assumed that agents interacting on a limited scope of tasks would be more accurate on some tasks than on others, at the expense of their average accuracy. This is not what we observe here: the figure shows that undertaking additional tasks significantly improves the agents' average accuracy.

Figure 4: Average best task accuracy

Figure 4 portrays the evolution of the average maximum accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average maximum accuracy over all existing tasks at the x^th interaction of each run.

Two observations can be drawn. The first is that the average accuracy is always lower than the average accuracy of the same agents on their best task. These agents are therefore able to specialize by restricting the scope of their tasks. The second is that agents tackling fewer tasks have a lower average accuracy on their best task than agents tackling all tasks; more precisely, the smaller the scope of the agents, the lower this accuracy. Thus, we can conclude that specializing agents with unlimited memory does not provide any advantage in terms of accuracy. In the examined configuration, each task depends on different properties; our observation is therefore not related to the transferability of knowledge from one task to another, since learning the decision for one task is unrelated to learning the decision for another. It is rather explained by the fact that agents tackling all tasks build more complete ontologies, and therefore associate the learned decisions with a more detailed classification.

Figure 5: Average scope accuracy

Figure 5 portrays the evolution of the average scope accuracy (y-axis), depending on the number of undertaken tasks. Each point (x, y) corresponds to the average accuracy over all tackled tasks at the x^th interaction of each run.

Figure 6: Average correct decision rate

Figure 6 displays the evolution of the average correct decision rate (y-axis) as the number of iterations increases (x-axis), depending on the maximum adapting rank |maxAdaptingRank| = {0,1,2}.

Figure 7: Average delegation rate

Figure 7 displays the evolution of the average delegation rate (y-axis) as the number of iterations increases (x-axis), depending on the maximum adapting rank |maxAdaptingRank| = {0,1,2}.

Figure 8: Average population's total compensation

Figure 9: Average P90/10 of average agent compensation

Figure 10: Average P90/10 of average scope accuracy

Analysis of variance (ANOVA)

We perform a one-way ANOVA, testing whether the independent variable 'maxAdaptingRank' has a statistically significant effect on the different dependent variables.
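
As a reading aid, here is a minimal sketch of such a test using scipy.stats.f_oneway, applied to the first five runs of each column of Table 1 (the results reported below are computed over all 20 runs per column):

from scipy.stats import f_oneway

# First five runs of each Table 1 column, for brevity; the F and p
# values reported below are computed over all 20 runs per column.
col0 = [0.971300, 0.952213, 0.953363, 0.954963, 0.961762]
col1 = [0.869088, 0.862387, 0.881838, 0.910450, 0.871300]
col2 = [0.829700, 0.800987, 0.827213, 0.814488, 0.815875]

f_stat, p_value = f_oneway(col0, col1, col2)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")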

One-way ANOVA on Table 1: Effect on final success rate

One-way ANOVA on final success rate values
F : 1076.1522506806443
p : 5.3845502017442e-46

One-way ANOVA on Table 2: Effect on final average worst task accuracy values

One-way ANOVA on final average worst task accuracy values
F : 1931.8047940808706
p : 4.284063038848824e-53

One-way ANOVA on Table 3: Effect on final average accuracy

One-way ANOVA on final average accuracy values
F : 1456.1482340138953
p : 1.1797958887570817e-49

One-way ANOVA on Table 4: Effect on final average best task accuracy values

One-way ANOVA on final average best task accuracy values
F : 59.62689623329024
p : 1.0653369175630308e-14

One-way ANOVA on Table 5: Effect on final average scope accuracy values

One-way ANOVA on final average scope accuracy values
F : 30.492259971861277
p : 9.895096568892882e-10

One-way ANOVA on Table 6: Effect on final correct decision rate values

One-way ANOVA on final average correct decision rate values
F : 5.8243141944754
p : 0.0049937322357773315

One-way ANOVA on Table 7: Effect on final average delegation rate values

One-way ANOVA on final average delegation rate values
F : 1076.1522506806507
p : 5.3845502017433124e-46

One-way ANOVA on Table 8: Effect on final total population compensation values

One-way ANOVA on final total population compensation values
F : 5.824314194475404
p : 0.004993732235777293

One-way ANOVA on Table 9: Effect on final P90/10 decile ratio of average agent compensation values

One-way ANOVA on final P90/10 decile ratio of average agent compensation values
F : 60.519597134558595
p : 7.993522680350689e-15

One-way ANOVA on Table 10: Effect on final P90/10 decile ratio of average scope accuracy values

One-way ANOVA on final P90/10 decile ratio of average scope accuracy values
F : 84.0956505140236
p : 9.878988975251212e-18