Using Persalys as a data visualization tool: questions and suggestions

Hi all,

I am a fairly new Persalys user (v12.0.1), and so far I have mainly used it as a “data visualization and inference tool”: I import .csv data files and mostly build data models. First of all, I would like to thank all the members of the development team for such a great job! The tool is pretty awesome!

Typically, the .csv files I upload have around 2000 rows and 30 columns (some have more than 100 columns). My main goals are usually analyzing data, building metamodels and estimating sensitivity indices.

The purpose of this post is therefore to ask some questions, share some remarks and submit a few suggestions to enhance the software (if they are relevant enough). Some are minor questions; others go deeper into the analysis.

  • In the data analysis part:
    • Why is the ‘PDF/CDF’ button not split into two buttons (one ‘PDF’ and one ‘CDF’)? Having to switch between them in the bottom-left panel is a little awkward;
    • Moreover, why is the dimension “d” of the problem (i.e., the number of inputs) not explicitly displayed?
    • Would it be possible to add a feature that automatically detects identical columns? When the number of columns is large, it is almost impossible to spot them visually. This could also be detected as an output of the analysis as soon as Spearman’s rho equals 1: such a “perfect” correlation clearly indicates multicollinearity. A warning could then tell the user that some columns seem to be “spurious redundant columns”.
    • Would it be possible to add Principal Component Analysis (PCA) as a new feature in this part?
  • In the marginals’ inference part:
    • Would it be possible to offer another test, e.g., Shapiro-Wilk, which tests normality better than Kolmogorov-Smirnov?
    • What happens to an input if none of the candidate distributions is accepted according to the KS test, especially if a metamodel (e.g., a PCE) is built afterwards?
  • In the metamodel part:
    • Why is it not possible, at this stage, to make another selection of the inputs X and of the samples n? It could be useful to select only some of the inputs without having to build another data model.
    • When using PCE for the first time, it is a little confusing that the user must choose the polynomial degree without any recommendation (the documentation only says “default: 2”). Do you have a rule of thumb for choosing the degree depending on the dimension and the amount of data? This question seems crucial to me, because a large dimension (>10) makes the metamodel building phase very time-consuming, and the app can freeze if you try to stop the building process by clicking the “Stop” button.
    • Would it be possible to add, as a future feature, a standard linear regression algorithm (together with its importance measures, i.e., the SRC^2) as a first, primary metamodel? Such a model could be used first, before deploying more advanced tools such as PCE and GP regression.
    • When looking at the Sobol’ indices obtained through the PCE metamodel, would it be possible to draw a grid on the graph to make the values easier to read (especially when the number of inputs is large)?
    • What does the “Interactions” value represent? It is somewhat confusing.
    • Last question: why do you provide only a single value for the “Residuals” and not a full histogram? And why do you provide the relative error (1-R^2) as a metric, but not R^2 itself, which most people find easier to read because they are used to it (as with the Q^2 predictivity coefficient)?
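Regarding the redundant-column warning suggested above, here is a minimal sketch of how such a check could work, assuming the data are already loaded as a dict of equally long numeric columns. The function names and toy data are hypothetical, not Persalys features. It reports exact duplicates separately, and flags |rho| = 1 rather than only rho = 1, since a perfectly decreasing relationship is just as redundant. Note that Spearman’s rho equals ±1 for any strictly monotone relationship, not only for identical columns.

```python
from itertools import combinations


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def redundant_columns(columns, tol=1e-12):
    """Return (identical pairs, other pairs with |Spearman rho| == 1)."""
    identical, monotone = [], []
    for a, b in combinations(columns, 2):
        if columns[a] == columns[b]:
            identical.append((a, b))
        elif abs(abs(spearman(columns[a], columns[b])) - 1.0) <= tol:
            monotone.append((a, b))
    return identical, monotone


# Hypothetical toy data: x2 duplicates x1, x3 is a monotone transform of x1.
data = {
    "x1": [0.1, 0.5, 0.3, 0.9],
    "x2": [0.1, 0.5, 0.3, 0.9],
    "x3": [1.0, 5.0, 3.0, 9.0],
    "x4": [4.0, 1.0, 3.0, 2.0],
}
print(redundant_columns(data))
```

The pairwise loop is quadratic in the number of columns, which stays cheap for a few hundred columns.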
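To make the PCA suggestion concrete, a minimal sketch of the first principal component, computed by power iteration on the sample covariance matrix, could look like the code below. This is only an illustration under stated assumptions: the leading eigenvalue is strictly dominant, the all-ones start vector is not orthogonal to its eigenvector, and only the first component is returned (further components would need deflation). When columns have different units, they would usually be standardized first, so that PCA works on the correlation matrix.

```python
def covariance_matrix(cols):
    """Sample covariance matrix of a list of equally long columns."""
    n, k = len(cols[0]), len(cols)
    means = [sum(c) / n for c in cols]
    return [
        [
            sum((cols[i][t] - means[i]) * (cols[j][t] - means[j]) for t in range(n)) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def leading_component(cov, iters=1000, tol=1e-12):
    """First principal component (eigenvalue, unit eigenvector) by power iteration.

    Assumes a strictly dominant leading eigenvalue and a start vector (all ones)
    that is not orthogonal to its eigenvector.
    """
    v = [1.0] * len(cov)
    eigval = 0.0
    for _ in range(iters):
        w = matvec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # start vector lies in the null space; give up
        v = [x / norm for x in w]
        new_eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
        if abs(new_eigval - eigval) <= tol * max(1.0, abs(new_eigval)):
            eigval = new_eigval
            break
        eigval = new_eigval
    return eigval, v


cov = covariance_matrix([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]])
eigval, direction = leading_component(cov)
explained = eigval / sum(cov[i][i] for i in range(len(cov)))  # share of total variance
print(eigval, direction, explained)
```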
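On the question of choosing the PCE degree, one quantity that makes the cost visible is the size of a full total-degree basis: in d inputs, the polynomials of total degree at most p number C(d+p, p). The sketch below turns this into a rule of thumb, assuming a full (non-sparse) least-squares fit and a hypothetical “oversampling” factor of samples per coefficient. Neither the factor nor the function names come from Persalys, and sparse methods may tolerate larger bases.

```python
from math import comb


def n_terms_total_degree(d, p):
    """Number of multivariate polynomials of total degree <= p in d variables."""
    return comb(d + p, p)


def max_degree(d, n, oversampling=2.0, p_max=20):
    """Largest total degree p whose full basis has at most n / oversampling terms.

    `oversampling` is a hypothetical safety factor (samples per coefficient),
    not a Persalys setting. Returns 0 if even the degree-1 basis is too large.
    """
    best = 0
    for p in range(1, p_max + 1):
        if n_terms_total_degree(d, p) <= n / oversampling:
            best = p
        else:
            break
    return best


# Data sizes similar to those in the post: 2000 samples, 10 / 30 / 100 inputs.
for d in (10, 30, 100):
    print(d, max_degree(d, 2000), n_terms_total_degree(d, 2))
```

With 2000 samples, this heuristic allows degree 3 for d = 10, degree 2 for d = 30 (496 terms) and only degree 1 for d = 100, which illustrates why high dimension makes the fit so expensive.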
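For the linear-regression suggestion, the SRC importance measures are the least-squares slopes obtained after standardizing every input and the output, which is equivalent to SRC_i = beta_i * std(X_i) / std(Y). The sketch below is a plain-Python illustration of that definition, not Persalys’s implementation; the helper names and the noise-free toy data are hypothetical. The SRC^2 values add up to R^2 only when the inputs are uncorrelated. In this toy example x1 and x2 are correlated, so they do not sum to 1 even though the fit is exact.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardize(v):
    """Center and scale a column to zero mean and unit sample standard deviation."""
    mu, sd = mean(v), stdev(v)
    return [(a - mu) / sd for a in v]


def src(x_cols, y):
    """Standardized regression coefficients: OLS slopes fitted on standardized data.

    No intercept is needed because every standardized column has zero mean.
    """
    z = [standardize(c) for c in x_cols]
    zy = standardize(y)
    k = len(z)
    # Normal equations (Z^T Z) b = Z^T zy.
    gram = [[sum(p * q for p, q in zip(z[i], z[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(p * q for p, q in zip(z[i], zy)) for i in range(k)]
    return solve(gram, rhs)


# Hypothetical noise-free example: y = 3*x1 - x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3 * a - b for a, b in zip(x1, x2)]
coefs = src([x1, x2], y)
print([c * c for c in coefs])  # SRC^2 importance measures
```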
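Regarding the last question, the two metrics are directly related: R^2 = 1 - SS_res / SS_tot, so the relative error as described in the post is simply SS_res / SS_tot. Evaluated on validation data that were not used for fitting, the same formula gives the Q^2 predictivity coefficient. The sketch below, with hypothetical helper names and toy values, computes R^2 and an equal-width histogram of the residuals, which is the kind of summary the post asks for instead of a single value.

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    m = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - m) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def histogram(values, bins=5):
    """Equal-width histogram: returns (lower edge, bin width, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # all values equal: use a dummy width
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    return lo, width, counts


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
residuals = [a - b for a, b in zip(y, y_hat)]
print(f"R^2 = {r2:.3f}, 1 - R^2 = {1 - r2:.3f}")
print(histogram(residuals, bins=4))
```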

Again, none of these questions, remarks or suggestions are meant as criticism, and they should not be misinterpreted. Thanks a lot for the tremendous work achieved.

All the best,
Vincent

PS : if you need more information about the applications I am working on, I can easily provide more insights and details. Please, don’t hesitate to ask.

Hi,

  • We added linear regression in version 13.0.

Hi Julien,

Thanks for your answer. I noticed it when I downloaded and started using v13.0 a couple of days ago.
I also noticed that the software calls it “linear regression”. However, if I’m right, it should rather be called “polynomial regression”, since it allows polynomials up to order 2 as well as interactions. It therefore goes beyond standard multiple linear regression. Again, this is just a suggestion to make it clearer for users, whether experienced or beginners.
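To illustrate the naming question, the sketch below expands one input vector into degree-2 regressors. It assumes that “order 2 with interactions” means an intercept, the linear terms, the squares and all pairwise products; the exact term set Persalys uses is not confirmed here, and the function name is hypothetical. The coefficients of these features still enter the model linearly, so ordinary least squares on them remains linear regression in the statistical sense. “Polynomial regression” describes the feature set rather than a different fitting method.

```python
from itertools import combinations


def quadratic_features(x):
    """Expand x = (x1, ..., xd) into [1, x_i, x_i^2, x_i*x_j for i < j]."""
    d = len(x)
    feats = [1.0]                       # intercept
    feats += list(x)                    # linear terms
    feats += [xi * xi for xi in x]      # pure quadratic terms
    feats += [x[i] * x[j] for i, j in combinations(range(d), 2)]  # interactions
    return feats


print(quadratic_features([2.0, 3.0]))
```

For d inputs, this gives 1 + 2d + d(d-1)/2 regressors, e.g. 496 for 30 inputs, so the number of coefficients grows quickly with the dimension.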

As for the other topics, maybe I did not post them in the right place?
Let me know if you would prefer to discuss these points elsewhere.

Have a good day,
Vincent

Hi Vincent,
Thank you for your post; it gives nice, detailed feedback on your use of the tool and will help us fix bugs and improve the usability of Persalys. Some of your comments are already open issues on GitLab, and we will add some of your ideas to our list.

Antoine

Hello
Another suggestion, if you do not mind…

To visualize Sobol’ index results differently, I draw pie charts;
from my point of view, they highlight the results better…
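A pie chart needs slices that partition a whole. One way to obtain them is to use the first-order Sobol’ indices plus a remainder slice 1 - Σ S_i, which, for independent inputs, is the variance share not explained by any input alone. The sketch below assumes that construction; the “interactions” label is hypothetical and not confirmed to match the “Interactions” value shown in Persalys. Total-order indices overlap (each interaction is counted once per input involved), so they are not suited to a pie chart. The resulting fractions can then be passed to any plotting tool, e.g. matplotlib’s `pyplot.pie`.

```python
def pie_slices(first_order, names):
    """Convert first-order Sobol' indices into pie-chart fractions.

    Negative estimates are clipped to 0. An extra 'interactions' slice holds
    1 - sum(S_i), clipped to 0. Fractions are renormalized so they sum to 1
    even when the estimated S_i sum slightly above 1.
    """
    values = [max(0.0, s) for s in first_order]
    values.append(max(0.0, 1.0 - sum(values)))
    labels = list(names) + ["interactions"]
    total = sum(values)
    return [(label, value / total) for label, value in zip(labels, values)]


slices = pie_slices([0.5, 0.3], ["x1", "x2"])
for label, frac in slices:
    print(f"{label}: {frac:.2f} ({360 * frac:.0f} degrees)")
```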

Best regards
Flore