We noticed that the aggregated Sobol indices are not provided in the interface. It could be useful to have them. The question is how to implement this:
- add the aggregated output to the list of all outputs, or
- add another tab, only visible when several outputs are defined, dedicated to the aggregated indices.
What do you think?
It is very useful to get the aggregated Sobol indices when we have a multi-output model. I think the second choice for the implementation is better.
If the function G has a single output, the aggregated Sobol’ indices are equal to the classical Sobol’ indices, is this true? I suppose that this is the main motivation for not showing the aggregated indices when there is only one output: the information they provide would duplicate what is already presented in the GUI.
In the context of the validation of a multivariate function G, is there an “aggregated Q2” predictivity coefficient defined in the bibliography? I ask this question because the current GUI provides a separate Q2 for each marginal output, not an aggregated one. An aggregated Q2 would greatly simplify the comparison between different metamodels.
For your first question: yes, the aggregated Sobol indices are equal to the classical ones if there is only one output. The aggregated indices are also interesting when the output is a field: we might want to know the influence globally as well as along the mesh axis. This could be a functionality to add to Persalys.
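To illustrate the single-output case, here is a minimal sketch. It assumes the usual definition of the aggregated first-order index, i.e. the sum over outputs of the partial variances divided by the sum of the output variances. The function name `aggregate_sobol` and the numbers are made up for illustration; this is plain NumPy, not the Persalys or OpenTURNS API.

```python
import numpy as np

def aggregate_sobol(indices, output_variances):
    """Aggregate per-output first-order Sobol indices (hypothetical helper).

    indices: array of shape (n_outputs, n_inputs), one row per output.
    output_variances: array of shape (n_outputs,), Var(Y_k) for each output.
    """
    indices = np.atleast_2d(np.asarray(indices, dtype=float))
    variances = np.asarray(output_variances, dtype=float).reshape(-1)
    # Partial variance of input i on output k: V_i^k = S_i^k * Var(Y_k)
    partial_variances = indices * variances[:, None]
    # Aggregated index: sum_k V_i^k / sum_k Var(Y_k)
    return partial_variances.sum(axis=0) / variances.sum()

# One output: the aggregated indices are the classical ones.
single = aggregate_sobol([[0.6, 0.3]], [2.0])

# Two outputs with different variances: the high-variance output dominates.
multi = aggregate_sobol([[0.6, 0.3], [0.1, 0.8]], [1.0, 3.0])

print(single)
print(multi)
```

With a single output the variance cancels out, which is why a dedicated tab would only add information when several outputs are defined.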
For the aggregated Q2, I have not read anything like this in the bibliography, but I assume we could sum the residuals over all outputs. However, I still prefer having one Q2 per output, in order to choose the best metamodel for each output.
If the multi-output function is a field, adding the residuals might make sense. If the multi-output function has outputs with different orders of magnitude (e.g. a Young modulus and a strain), the summed residuals would be dominated by the largest output, so this might not give a meaningful score. Would an averaged Q2 (i.e. the sum of the Q2 of all outputs divided by the number of outputs) do the trick in a manner that is statistically consistent?
I agree that for outputs with different orders of magnitude it might not perform well, especially if the output variances are different. I tried to find some papers on it and finally ended up in the scikit-learn documentation, where the score is returned either for each output separately, as the mean over all outputs, or as the mean weighted by the variance of each output.
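For reference, the three options can be sketched in a few lines of NumPy. This is only an illustration, under the assumption that Q2 is computed with the usual formula 1 - SS_res / SS_tot on a validation sample. The mode names mirror the `multioutput` options of scikit-learn's `r2_score` (`raw_values`, `uniform_average`, `variance_weighted`); the function names and the data are hypothetical.

```python
import numpy as np

def q2_per_output(y_true, y_pred):
    """Q2 of each output column, computed on a validation sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def aggregate_q2(y_true, y_pred, mode="raw_values"):
    """Return all Q2 values, their plain mean, or their variance-weighted mean."""
    q2 = q2_per_output(y_true, y_pred)
    if mode == "raw_values":
        return q2
    if mode == "uniform_average":
        return q2.mean()
    if mode == "variance_weighted":
        weights = np.var(np.asarray(y_true, dtype=float), axis=0)
        return np.average(q2, weights=weights)
    raise ValueError(f"unknown mode: {mode}")

# Made-up validation data: two outputs with very different magnitudes.
y_true = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
y_pred = np.array([[1.0, 110.0], [2.0, 190.0], [4.0, 310.0], [4.0, 390.0]])

raw = aggregate_q2(y_true, y_pred, "raw_values")
uniform = aggregate_q2(y_true, y_pred, "uniform_average")
weighted = aggregate_q2(y_true, y_pred, "variance_weighted")
print(raw, uniform, weighted)
```

Note that the variance-weighted mean is algebraically the same as summing the residuals over all outputs (1 - sum of SS_res / sum of SS_tot), so it is dominated by the output with the largest variance, whereas the plain mean treats all outputs equally regardless of scale.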