Now that describing and understanding posterior distributions of linear regressions has no secrets for you, let’s go back and study some simpler models: correlations and t-tests.
But before we do that, let us take a moment to remind ourselves and appreciate the fact that all basic statistical procedures, such as correlations, t-tests, ANOVAs or Chi-squared tests, are linear regressions (we strongly recommend this excellent demonstration). Still, these simple models will be the occasion to introduce a more complex index: the Bayes factor.
Let us start, again, with a frequentist correlation between two continuous variables: the width and the length of the sepals of some flowers. The data is available in R as the iris dataset (the same that was used in the previous tutorial).
Let's compute a Pearson's correlation test, store the results in an object called result, then display it:
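The exact code chunk is not reproduced here, but a minimal sketch using base R's cor.test() would look like this:

result <- cor.test(iris$Sepal.Width, iris$Sepal.Length)  # Pearson's correlation by default
result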
>
> Pearson's product-moment correlation
>
> data: iris$Sepal.Width and iris$Sepal.Length
> t = -1, df = 148, p-value = 0.2
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
> -0.273 0.044
> sample estimates:
> cor
> -0.12
As you can see in the output, the test that we did actually compared two hypotheses: the null hypothesis (no correlation) with the alternative hypothesis (a non-null correlation). Based on the p-value, the null hypothesis cannot be rejected: the correlation between the two variables is negative but not significant (r = -.12, p > .05).
To compute a Bayesian correlation test, we will need the BayesFactor package (you can install it by running install.packages("BayesFactor")). We will then load this package, compute the correlation using the correlationBF() function, and store the results in a similar fashion.
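A minimal sketch of these two steps (assuming the default prior scale of correlationBF()):

library(BayesFactor)

result <- correlationBF(iris$Sepal.Width, iris$Sepal.Length)  # Bayesian correlation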
Let us run our describe_posterior() function on it:
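A minimal sketch of that call, assuming result still holds the correlationBF() output from above:

library(bayestestR)

describe_posterior(result)  # posterior description, including the Bayes factor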
> Parameter Median CI CI_low CI_high pd ROPE_CI ROPE_low ROPE_high
> 1 rho -0.11 89 -0.24 0.0079 92 89 -0.1 0.1
> ROPE_Percentage BF Prior_Distribution Prior_Location Prior_Scale
> 1 42 0.51 cauchy 0 0.33
We see again many things here, but the important index for now is the median of the posterior distribution of the correlation, -0.11. This is (again) quite close to the frequentist correlation. We could, as previously, describe the credible interval, the pd or the ROPE percentage, but we will focus here on another index provided by the Bayesian framework: the Bayes factor.
We said that a correlation actually compares two hypotheses: a null one (absence of effect) and an alternative one (presence of an effect). The Bayes factor (BF) allows the same comparison and determines under which of two models the observed data are more probable: a model with the effect of interest, and a null model without the effect of interest. We can use bayesfactor() to specifically compute the Bayes factor comparing those models (and many more):
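A minimal sketch of that call on our stored result:

bayesfactor(result)  # compares the alternative model against the null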
> Bayes factor analysis
> ---------------------
> [2] Alt., r=0.333 0.51
>
> Against denominator:
> [1] Null, rho = 0
> ---
> Bayes factor type: JZS (BayesFactor)
We got a BF of 0.51. What does it mean?
Bayes factors are continuous measures of relative evidence, with a Bayes factor greater than 1 giving evidence in favour of one of the models (often referred to as the numerator), and a Bayes factor smaller than 1 giving evidence in favour of the other model (the denominator).
Yes, you heard that right: evidence in favour of the null!
That's one of the reasons why the Bayesian framework is sometimes considered superior to the frequentist framework. Remember from your stats lessons that the p-value can only be used to reject H0, but not to accept it. With the Bayes factor, you can measure evidence against, and in favour of, the null.
BFs representing evidence for the alternative against the null can be reversed using \(BF_{01} = 1/BF_{10}\) to provide evidence for the null against the alternative. This improves human readability in cases where the BF of the alternative against the null is smaller than 1 (i.e., in support of the null).
In our case, BF = 1/0.51 = 2 indicates that the data are 2 times more probable under the null than under the alternative hypothesis, which, though favouring the null, is considered only anecdotal evidence in favour of the null.
We can thus conclude that there is anecdotal evidence in favour of an absence of correlation between the two variables (median r = -0.11, BF = 0.51), which is a much more informative statement than what we can do with frequentist statistics.
And that’s not all!
A hypothesis for which one uses a t-test can also be tested using a logistic model. Indeed, one can reformulate the hypothesis "there is an important difference in this variable between my two groups" as "this variable is able to discriminate (or classify) between the two groups" (see the sketch below).
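The following sketch is only an illustration (it is not part of this vignette): it uses two species of the iris dataset as hypothetical groups to show the two equivalent formulations side by side.

# Hypothetical example: keep two groups only, versicolor vs. virginica
data <- subset(iris, Species %in% c("versicolor", "virginica"))
data$Species <- droplevels(data$Species)

# Frequentist t-test: is Sepal.Width different between the two groups?
t.test(Sepal.Width ~ Species, data = data)

# Logistic reformulation: can Sepal.Width discriminate between the groups?
model <- glm(Species ~ Sepal.Width, data = data, family = "binomial")
summary(model)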
If you want to keep exploring, other vignettes cover diagnostic indices such as Rhat and ESS, as well as priors.