https://pdfs.semanticscholar.org/9128/feeb157945b632d1beddd588e0c00a1d65fb.pdf

… Here we present some investigations into various aspects of the ensemble’s behaviour. In particular, we explain why the multi-model mean is always better than the ensemble members on average, and we also identify the properties of the distribution which control how likely it is to out-perform a single model. …
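The claim in the excerpt has a simple algebraic core: under squared error, the error of the multi-model mean equals the average member error minus the ensemble spread, so the mean can never do worse than the members on average. A minimal sketch with synthetic data (the "truth" and "models" below are invented purely for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

truth = np.sin(np.linspace(0, 2 * np.pi, 50))        # hypothetical observations
models = truth + rng.normal(0, 0.5, size=(10, 50))   # 10 imperfect "models"

def mse(pred):
    return np.mean((pred - truth) ** 2)

mean_of_mses = np.mean([mse(m) for m in models])     # average member error
mse_of_mean = mse(models.mean(axis=0))               # error of multi-model mean
spread = np.mean(models.var(axis=0))                 # ensemble spread

# Identity: MSE(ensemble mean) = mean member MSE - ensemble spread,
# hence MSE(ensemble mean) <= mean member MSE, always.
assert np.isclose(mse_of_mean, mean_of_mses - spread)
assert mse_of_mean <= mean_of_mses
```

The identity is just the bias–variance decomposition applied pointwise, so it holds for any ensemble, not only this synthetic one.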

I have often found myself returning to Box’s famous verdict, which unfortunately offers little practical guidance: refrain from all modeling? Of course not. But is the “utility” of a model something different from its “truth”? Can I come to the right conclusions using a false model?

Being myself rather fond of the Bayesian take on statistics, I would of course like to compare models and decide which one to prefer given observed data and prior knowledge. So I would like to rank models according to their posterior probability given the observed data.
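One common (approximate) route to such a ranking is BIC: with equal prior model probabilities, exp(-BIC/2) approximates each model’s marginal likelihood, and normalising gives approximate posterior model probabilities. A hedged sketch with made-up data, comparing a linear and a cubic least-squares fit (everything here is an illustrative assumption, not from the comment):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = 2.0 * x + rng.normal(0, 0.1, x.size)   # data actually generated by a line

def bic(degree):
    """BIC of a least-squares polynomial fit, assuming Gaussian noise."""
    n = x.size
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    sigma2 = np.mean(resid ** 2)
    k = degree + 2                         # coefficients + noise variance
    return n * np.log(sigma2) + k * np.log(n)

bics = np.array([bic(1), bic(3)])
# Equal priors: normalised exp(-BIC/2) approximates posterior probabilities.
post = np.exp(-(bics - bics.min()) / 2)
post /= post.sum()
```

Here the linear model wins: the cubic fit lowers the residual variance only slightly, and BIC’s complexity penalty outweighs that gain.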

Nowadays, model averaging is also en vogue: simply use a lot of models and average over their answers (with equal or specific weights?).
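As a sketch of the “equal or specific weights” question, here is one ad hoc but common choice: weights proportional to inverse mean squared error against held-out data. The observations and model predictions below are invented for illustration:

```python
import numpy as np

obs = np.array([1.1, 2.0, 2.9, 4.2])                 # hypothetical held-out data
preds = np.array([[1.0, 2.1, 3.0, 4.0],              # model A's predictions
                  [1.5, 1.7, 3.4, 4.6]])             # model B's predictions

equal_avg = preds.mean(axis=0)                       # equal weights

# "Specific" weights: proportional to inverse MSE on the held-out data,
# so the more skilful model gets the larger weight.
mse = np.mean((preds - obs) ** 2, axis=1)
w = (1 / mse) / (1 / mse).sum()
weighted_avg = w @ preds
```

Bayesian model averaging would instead use posterior model probabilities as weights; the inverse-MSE choice above is just one pragmatic stand-in.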

Maybe the following article is a nice take on “All models are wrong…” from a statistical point of view:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1016.444&rep=rep1&type=pdf

It seems that there are, as of yet, no easy answers.

Thanks for referring to my paper.

Each software package has its own implementation.

Users need to check whether their intent is expressed correctly.

But I have found that many SD software users skip this step.

Undoubtedly, using a tool correctly is essential for getting meaningful results.

However, some software is too “friendly” to encourage checking anything deeply.

We always need to check if our software expresses our models precisely, even if the software reports no errors!