A Bayesian t-test: Bayes factors as a special case of estimation

The purpose of this web app is twofold:

  1. To allow you to run a Bayesian version of a two-sample t-test from the comfort of your own browser.
  2. To allow you to contrast the Bayes factor perspective with the Bayesian estimation perspective.
To use it, simply enter some data below and hit "Click to start!".

This web app was made as part of the paper

Williams, M. N., Bååth, R., & Philipp, M. C. (2017). Using Bayes Factors to Test Hypotheses in Developmental Research. Research in Human Development, 1-17. Retrieved from http://dx.doi.org/10.1080/15427609.2017.1370964
Please see that paper for a discussion of Bayes factors and Bayesian estimation.


This web app implements a version of the Bayesian t-test described in Rouder et al. (2012). It assumes that the data from both groups follow normal distributions with the same standard deviation (σ), but where the group means (µ₁ and µ₂) are either equal (the H0 model) or allowed to differ (the H1 model). For the H1 model, the prior on the difference is a Cauchy distribution over the effect size, that is, the difference between the group means scaled by the standard deviation: (µ₂ - µ₁) / σ. (Note: Although we refer to different “models” here for pedagogical reasons, this app actually estimates a single statistical model. The model includes both an estimated effect size and a switching parameter that determines whether or not the effect size is exactly zero.)
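
To make this concrete, here is a minimal sketch in Python (using NumPy and SciPy, not the app's actual bayes.js code) of the unnormalized log-posterior for one parameter setting. The function name is illustrative, the default scale of 0.707 is just a placeholder for whichever prior scale is selected, and the vague priors on µ₁ and σ are omitted for brevity.

import numpy as np
from scipy import stats

def log_posterior(y1, y2, mu1, sigma, delta, z, cauchy_scale=0.707):
    """Unnormalized log-posterior for one parameter setting.
    z = 0 forces the effect size to exactly zero (the 'H0 model');
    z = 1 keeps the estimated effect size delta (the 'H1 model')."""
    effect = delta if z == 1 else 0.0
    mu2 = mu1 + effect * sigma                                 # (mu2 - mu1) / sigma = effect size
    logp = stats.norm.logpdf(y1, loc=mu1, scale=sigma).sum()   # group 1 likelihood
    logp += stats.norm.logpdf(y2, loc=mu2, scale=sigma).sum()  # group 2 likelihood, same sigma
    logp += stats.cauchy.logpdf(delta, loc=0, scale=cauchy_scale)  # Cauchy prior on the effect size
    logp += np.log(0.5)                                        # 50/50 prior on the switching parameter z
    # (vague priors on mu1 and sigma omitted for brevity)
    return logp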

From a Bayes factor perspective you compare the probability that H0 would generate the observed data with the corresponding probability under H1. The result is a Bayes factor: the marginal likelihood of the data under H1 divided by that under H0 (or vice versa). If you assume that H0 and H1 were equally probable to begin with, then the Bayes factor can be interpreted as the posterior odds in favor of H1 (or H0).
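
As a small worked example (Python, with made-up numbers and an illustrative function name), this is how a Bayes factor combines with the prior odds to give posterior odds and a posterior probability for H1:

def posterior_prob_h1(bf10, prior_prob_h1=0.5):
    """Posterior P(H1 | data) from the Bayes factor BF10 = p(data | H1) / p(data | H0)."""
    prior_odds = prior_prob_h1 / (1 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds              # Bayes' rule in odds form
    return posterior_odds / (1 + posterior_odds)

print(posterior_prob_h1(bf10=3.0))                      # 0.75 when H0 and H1 start at 50/50
print(posterior_prob_h1(bf10=3.0, prior_prob_h1=0.2))   # about 0.43 with a lower prior on H1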

From a Bayesian estimation perspective you can get numerically identical results by assuming a prior over the effect size that puts 50% probability on an exactly zero difference and 50% on the Cauchy prior from H1. After fitting the model, the result is a posterior distribution over the effect size. Now, the posterior probability of a non-zero effect size divided by the posterior probability of a zero effect size gives you the odds in favor of H1, which will be the same as the Bayes factor. So for this specific model, the Bayes factor perspective can be seen as a special case of the estimation perspective. But the estimation perspective gives you some more flexibility; for example, you can put any prior probability on a zero effect size, not just 50%.
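
Here is a hedged sketch of the estimation-side calculation (Python, with illustrative variable names rather than the app's internals): given MCMC draws of the switching indicator z (1 = non-zero effect, 0 = effect exactly zero), the posterior odds of H1 over H0 are just a ratio of counts.

import numpy as np

def odds_h1_vs_h0(z_samples):
    """Posterior odds P(effect != 0 | data) / P(effect == 0 | data)."""
    z = np.asarray(z_samples)
    return np.mean(z == 1) / np.mean(z == 0)

# With a 50/50 prior on a zero effect these odds equal the Bayes factor;
# with any other prior they equal the Bayes factor times the prior odds.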

You are also free to summarize the posterior in other ways, not just as Bayes factors. For example, it could be reasonable to define a region of practical equivalence (ROPE) where the effect size is so small that it's not practically relevant (see Kruschke, 2013). You can then sum up the probability that falls within the ROPE (in favor of no relevant difference), below the ROPE (in favor of group 1 having the higher mean), and above the ROPE (in favor of group 2 having the higher mean). In the limit where the ROPE shrinks to the single point 0.0, the probabilities within and outside the ROPE equal the posterior probabilities of H0 and H1, respectively. The ROPE approach may be particularly useful when the prior probability of an exactly zero effect is very low or even zero, but a researcher wishes to determine whether an effect size is large enough to be practically significant.
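
A minimal sketch of the ROPE summary (illustrative Python, not the app's code): count the proportion of posterior draws of the effect size that fall below, inside, and above the ROPE.

import numpy as np

def rope_summary(effect_size_draws, rope=(-0.2, 0.2)):
    """Proportion of posterior draws below, inside, and above the ROPE."""
    d = np.asarray(effect_size_draws)
    below = np.mean(d < rope[0])                          # favors group 1 having the higher mean
    inside = np.mean((d >= rope[0]) & (d <= rope[1]))     # favors no practically relevant difference
    above = np.mean(d > rope[1])                          # favors group 2 having the higher mean
    return below, inside, above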

So, in summary, for certain models, calculating a Bayes factor and estimating the effect size can give numerically identical results. The difference lies in how those results are interpreted and presented. The Bayes factor approach contrasts two different models and gives you the probability of each (assuming they were equally probable to begin with). The estimation approach gives you a probability distribution over what the effect size could be using one model, and you are free to summarize this distribution in any way you want, for example, by looking at the probability that the effect size is close to zero.

Data - Group 1
Prior prob. of no difference: 0.5
Scale of effect size prior
Nbr of burn-in samples
Nbr of samples


Data - Group 2
Lower ROPE: -0.2
Upper ROPE: 0.2
Prior - Effect size
The prior on the effect size defined as (µ₂ - µ₁) / σ

Some things to try out

  • Change the prior probability of a zero difference. For example, set the prior to a very low probability if you want to deemphasize the special status of a zero difference. Note that when this prior is 0.5, the posterior odds and the Bayes factor in favor of H1 are exactly the same.
  • Explore what happens when you change the width of the region of practical equivalence (ROPE). What happens when the ROPE is narrowed down to just the point 0.0? How does that relate to the Bayes factor?
  • The default data set comes from a study by Schroeder and Epley (2015), who looked at (among other things) how intelligent job applicants were perceived when their pitch was presented either as transcribed text (group 1) or as an audio recording (group 2). Try changing the data and see how that affects the outcome of the model.
  • In this web app, you are given the choice of three different scales for the prior on the effect size, corresponding to the prior expectation that the effect size could be medium, large (wide), or very large (ultra wide). What happens to the Bayes factor and the posterior probability within and outside the ROPE when you change this prior?
  • To fit the model this web app uses Markov chain Monte Carlo (MCMC), and the options Nbr of burn-in samples and Nbr of samples determine the accuracy of the MCMC estimate (larger is better). What's the effect of making the number of MCMC samples really small or really large? A minimal sketch of what these settings control follows below.
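
The sketch below is a toy random-walk Metropolis sampler in Python, not the app's bayes.js sampler; it only illustrates what the two settings control: the first draws are discarded as burn-in, and the Monte Carlo error of whatever you compute from the remaining draws shrinks as the number of kept samples grows.

import numpy as np

def metropolis(log_post, init, n_burn=1000, n_samples=10000, step=0.5, seed=1):
    """Toy random-walk Metropolis sampler for a one-dimensional target."""
    rng = np.random.default_rng(seed)
    x, draws = init, []
    for i in range(n_burn + n_samples):
        proposal = x + rng.normal(0.0, step)                  # random-walk proposal
        if np.log(rng.uniform()) < log_post(proposal) - log_post(x):
            x = proposal                                      # accept
        if i >= n_burn:                                       # discard the burn-in draws
            draws.append(x)
    return np.array(draws)

# Example target: a standard normal. With only 200 kept draws the posterior
# mean estimate is noticeably noisy; with 10,000 it is close to 0.
print(metropolis(lambda x: -0.5 * x**2, init=0.0, n_samples=200).mean())
print(metropolis(lambda x: -0.5 * x**2, init=0.0, n_samples=10000).mean())
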
Traceplot - Effect size
A traceplot as a sanity check that the MCMC estimation did not go wrong.
Posterior - Effect size
The posterior distribution of the effect size after having seen the data.
Post. Prob. of Interest regions
The amount of probability that is lower than, inside and higher than the ROPE.
Posterior Prob of H0/H1
This gives odds of ... in favor of H1. Compare this with the BF of ....

The rest of the parameters

Even though estimating the effect size is the main focus here, we also get estimates of the rest of the parameters in the model. All the plots show a 95% highest density interval (HDI), the shortest interval that contains 95% of the probability, and the posterior mean (in green), which can be interpreted as a "best guess" for the parameter value.
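
For reference, here is a hedged sketch (Python, not the app's code) of how a 95% HDI can be computed from MCMC draws: it is the shortest interval that contains 95% of the posterior draws.

import numpy as np

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    draws = np.sort(np.asarray(samples))
    n_in = int(np.ceil(mass * len(draws)))          # number of draws the interval must cover
    widths = draws[n_in - 1:] - draws[:len(draws) - n_in + 1]
    i = int(np.argmin(widths))                      # start of the shortest such interval
    return draws[i], draws[i + n_in - 1]

print(hdi(np.random.default_rng(0).normal(size=20000)))   # roughly (-1.96, 1.96) for a standard normal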

Trace plot - Mean Group 1
Posterior - Mean Group 1
Trace plot - Difference in means
Posterior - Difference in means
Trace plot - Mean Group 2
Posterior - Mean Group 2
Trace plot - Standard deviation
Posterior - Standard deviation


 
 

About. This web app was created by me, Rasmus Bååth. Libraries used: jStat for some statistical functions, Flot for plotting, and jQuery for this and that. For CSS styling I used the Square Grid framework. For MCMC I used my homegrown bayes.js framework. If you have any suggestions for improvements, feel free to drop me a message. A word of caution: this web app should be considered a demo, and if you want to include a Bayesian version of a t-test in a publication I recommend that you use, for example, BEST or the BayesFactor package.