Model comparison

Once you have two fits, compare(r1, r2) tells you which one the data prefers, with a standard error on the difference and a self-diagnostic that flags when the comparison is unreliable.

LOO and WAIC

The loo and waic functions return ELPD estimates for a single fit:

r = post[1]    # first BayesianCurveFitResult from a bayesfit call

w = waic(r)
# (elpd = 230.52, p_eff = 3.46, n_obs = 97)

l = loo(r)
# (elpd = 230.52, se = 16.98, n_obs = 97, pareto_k_max = 0.624, elpd_pointwise = [...])

elpd is the expected log pointwise predictive density — a Bayesian analogue of out-of-sample log-likelihood. Higher is better. loo uses Pareto-smoothed importance sampling (via PSIS.jl) to estimate it from the posterior at no additional sampling cost.
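
Concretely, given the n_samples × n_obs matrix of pointwise log-likelihoods (see pointwise_loglik below), the naive Monte Carlo ELPD estimate is a log-mean-exp over posterior draws, summed over observations. A minimal sketch, assuming ll came from pointwise_loglik(r):

using Statistics

ll  = pointwise_loglik(r)              # n_samples × n_obs
lpd = map(eachcol(ll)) do col
    m = maximum(col)                   # stabilise the exponentials
    m + log(mean(exp.(col .- m)))      # log-mean-exp over posterior draws
end
naive_elpd = sum(lpd)                  # biased towards the in-sample fit; loo corrects this with PSIS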

The Pareto-k diagnostic

pareto_k_max is the largest Pareto-k value across observations. The standard interpretation:

k            Interpretation
< 0.5        LOO estimate is reliable.
0.5 – 0.7    LOO is approximately reliable.
0.7 – 1.0    Importance sampling is unreliable for at least one observation; treat the ELPD with caution.
> 1.0        LOO cannot be trusted on this data; the model is likely badly misspecified.

The diagnostic fires on its own; no separate calibration check is needed. In our own validation, fitting Gompertz to aHPM-generated data produces pareto_k_max ≈ 1.19, correctly signalling that this model is the wrong one.
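
Because the thresholds are numeric, the check is easy to automate in a pipeline. A small guard, using only the loo result from above:

l = loo(r)
if l.pareto_k_max > 0.7
    @warn "PSIS-LOO looks unreliable (pareto_k_max = $(l.pareto_k_max)); treat the ELPD with caution"
end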

Comparing two models

The compare function computes the ELPD difference between two fits:

r_correct  = bayesfit(data, BayesianModelSpec([MODEL_REGISTRY["aHPM"]]))[1]
r_wrong    = bayesfit(data, BayesianModelSpec([MODEL_REGISTRY["NL_Gompertz"]]))[1]

cmp = compare(r_correct, r_wrong)
# (elpd_diff = +117.6, se_diff = 10.0, favours = "aHPM (>2 SE)")

elpd_diff is elpd(r1) - elpd(r2), reported with its standard error se_diff. The conventional reading: if |elpd_diff| > 2 * se_diff, the difference is meaningful. The favours field applies that threshold and reports the verdict in plain text.
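
For reference, the conventional paired estimate of the difference and its standard error can be computed by hand from the pointwise ELPD vectors, assuming both fits score the same observations in the same order. A sketch, not necessarily compare's exact internals:

using Statistics

l1, l2 = loo(r_correct), loo(r_wrong)
d = l1.elpd_pointwise .- l2.elpd_pointwise   # per-observation ELPD differences
elpd_diff = sum(d)
se_diff   = sqrt(length(d)) * std(d)         # SE of the summed difference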

Pointwise log-likelihoods

For users who want to do their own ELPD analysis or write their own diagnostics, pointwise_loglik returns an n_samples × n_obs matrix. You can pipe this into any standard model-comparison machinery.
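
For example, the matrix feeds straight into PSIS.jl: the leave-one-out importance ratios are the reciprocals of the pointwise likelihoods, so the log ratios are just the negated log-likelihoods. A sketch, assuming PSIS.jl's psis accepts a draws × observations array:

using PSIS

ll  = pointwise_loglik(r)    # n_samples × n_obs
res = psis(-ll)              # smooth the leave-one-out importance ratios on the log scale
res.pareto_shape             # per-observation Pareto-k values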

What this gives you that Kinbiont's AICc doesn't

Kinbiont                                     BayesBiont
AICc per model                               Full ELPD posterior
No diagnostic on whether AICc is reliable    Pareto-k flags when LOO is unreliable
Picks the best AICc model                    Reports ELPD difference + SE so you can judge significance
Asymptotic; needs N ≫ params                 Bayesian; works for small N too

If your goal is "which of these N candidate models should I report?", Kinbiont's AICc is faster. If your goal is "is the difference between these two fits statistically meaningful?", compare() answers that with an uncertainty estimate.