Lecture 5: Bayesian data analysis
R101: A practical guide to making R your everyday statistical tool (PSY532)

Programme
• Bayes' Rule
• Simple "real-world" Bayesian problems: a demonstration of the rule's rationality
• Bayesian data analysis and its rationality
• Bayesian ANOVA example: described in detail in the readings from the Kruschke textbook
• Demonstration: a Bayesian approach to our most recent analyses
• The logistic regression and multilevel modelling demonstrations will make use of a different data set

Bayes' Rule
Reading: Navarro lecture 4; Kruschke Ch. 4
• Derivable from the laws of probability
• Premised on the idea that probability is a degree of belief inside a learner's head

P(h \mid d) = \frac{P(d \mid h) \, P(h)}{P(d)}

• P(h \mid d), the posterior: the probability that hypothesis h is true, given that we (the learners) have observed data d; i.e., our degree of belief in h after seeing the data. The conditioning on d marks the fact that we are evaluating this expression after having seen the data.
• P(h), the prior: our degree of belief in h before seeing the data.
• P(d \mid h), the likelihood: our degree of belief in seeing the data if h is true (given our assumptions about the generating mechanism behind all possible data sets, i.e., the sample space).
• P(d), the marginal likelihood: the likelihood of the data under hypothesis h and all other possible hypotheses in the hypothesis space H. It can also be expressed as

P(d) = \sum_{h' \in H} P(d \mid h') \, P(h')

or, for a continuous hypothesis space,

P(d) = \int_H P(d \mid h') \, P(h') \, dh'

In-lecture example: a cough (the data) and cancer (the hypothesis); alternatives: lung cancer and stomach flu.

Bayesian inference: rational in the real world
Reading: Navarro lecture 4
• The taxicab problem illustrates that, in the real world, the prior matters!
• The game show problem illustrates that, in the real world, the likelihood matters!
• The slides describing the problems and their solutions are from Daniel Navarro's 2011 lectures on Computational Cognitive Science at The University of Adelaide. This is the link to the full course, and this is the link to the specific lecture used (Lecture 4).

The taxicab problem
[Slides reproduced from Navarro, Lecture 4]

The game show problem
[Slides reproduced from Navarro, Lecture 4]

Bayesian inference: argued to also be more rational in data analysis
• In the Bayesian version of the analyses we have been doing, the linear model and its assumptions are expressed in the likelihood function, which takes the form of a normal distribution (or a t-distribution in some cases).
• For each predictor, there is a prior distribution (usually normal in form), expressing the learner's beliefs about the strength of the relationship between the predictor and the outcome variable.
• The marginal likelihood is usually not computed.

All four components of Bayes' Rule are probability distributions; that is:
• Each has a certain shape, expressed in a function (i.e., a formula).
• For probability density functions (continuous variables), the area under the curve (i.e., the integral of the function) equals 1. For probability mass functions (categorical variables), the categories' probabilities sum to 1.
Reading: Kruschke Ch. 11 (not provided)

Bayesian data analysis logic continued
• The marginal likelihood is the joint probability of the data points given the model, summed (or integrated) over all parameter values. The reason we usually need specialised techniques to estimate this value is that it takes the form of a complex integral. For example, for a model with three parameters \phi, \theta_1, \theta_2 (say, \phi = the probability of a respondent being human, \theta_1 = the number of survey statements agreed to by humans, \theta_2 = the number agreed to by robots):

P(d) = \iiint P(d \mid \phi, \theta_1, \theta_2) \, P(\phi, \theta_1, \theta_2) \, d\phi \, d\theta_1 \, d\theta_2

• It follows from Bayes' Rule that the posterior is a weighted average of the prior mean and the data, with the weighting corresponding to the relative precisions of the prior and the likelihood (see the sketch below). Precision is the reciprocal of the squared standard deviation and reflects the learner's (i.e., the analyst's) uncertainty in prior beliefs or, alternatively, in model predictions. The precisions are estimated from the data through the use of hyperpriors. That is:
– if the prior is fairly vague and the data are numerous, the posterior will be near the parameter values that maximise the likelihood of the data;
– if the prior is precise relative to the data, "natural shrinkage" towards the prior mean occurs, similar to the shrinkage in multilevel modelling.
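To make the precision weighting concrete, here is a minimal sketch (base R only; all numbers are made up for illustration and are not from the slides) of the conjugate normal-normal update with a known error SD:

# Precision-weighted updating: the posterior mean is a weighted average of
# the prior mean and the sample mean (hypothetical values throughout)
prior_mean <- 0                       # prior belief about the parameter
prior_sd   <- 2                       # a fairly vague prior
prior_prec <- 1 / prior_sd^2          # precision = 1 / variance

y     <- c(4.1, 5.3, 4.7, 5.0, 4.4)   # hypothetical data
sigma <- 1                            # error SD, assumed known for simplicity
data_prec <- length(y) / sigma^2      # precision contributed by the data

# Weighted average of prior mean and sample mean, weighted by precision
post_mean <- (prior_prec * prior_mean + data_prec * mean(y)) /
             (prior_prec + data_prec)
post_sd   <- sqrt(1 / (prior_prec + data_prec))
c(posterior_mean = post_mean, posterior_sd = post_sd)

Re-running the sketch with a smaller prior_sd pulls post_mean towards prior_mean: the "natural shrinkage" described above.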
Bayesian data analysis: Typical steps
1. Determine the priors, the likelihood, and the hyperpriors (or get R to do this for you).
2. Estimate the posterior (and possibly the marginal likelihood) through a Markov chain Monte Carlo (MCMC) process (e.g., Metropolis-Hastings or Gibbs sampling).
– Since there is a prior for each predictor, there is a posterior for each parameter.
3. Determine whether the parameter value predicted by the null hypothesis (e.g., a slope value of zero) falls within the 95% Highest Density Interval (HDI) of the posterior for the parameter. It does in the figure below!

[Figure: default BEST plot of a posterior distribution over a regression slope value, with its 95% HDI marked; http://www.wcsmalaysia.org/analysis/stat_pics/BEST_default_plot.jpeg]

Bayesian data analysis: Advantages over the standard "frequentist" approach
1. The use of priors allows us to incorporate findings from previous studies into the data analysis. This is particularly useful when there is a large number of preceding studies but the current study's data set is small.
2. The HDI is more interpretable than the frequentist "confidence interval": if a parameter value falls within the 95% HDI, it is among the most believable parameter values. A frequentist 95% confidence interval guarantees only that the procedure used to construct it captures the true parameter value in 95% of repeated samples; it licenses no probability statement about the parameter itself.
3. A subtle but philosophically important point discussed at length by Kruschke: the Bayesian analyst's model assumptions are transparent and not dependent on sample size (degrees of freedom).

Bayesian ANOVA example
Reading: Kruschke Ch. 18, 19

Likelihood function:                Normal
Prior over slope parameters:        Normal
(Hyper)prior over prior precisions: Folded-t
Prior over error (likelihood-function precision): Uniform

A Bayesian approach to our most recent analyses
Reading: Gelman et al. (2008)
• Demonstration: the bayesglm function in the arm package, for generalized linear modelling using the Bayesian approach (a sketch follows below).
• For mixture models (discussed in Lecture 4), you could explore the package MCMCglmm and the function of the same name within it.

Likelihood function:                Normal
Prior over slope parameters:        Cauchy (t)
(Hyper)prior over prior precisions: None; instead, the predictors are standardized and the prior SD (which is related to the precision) is fixed at 0.5 or 1
Prior over error (likelihood-function precision): Uniform
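A minimal sketch of the bayesglm demonstration. The data frame and variable names below are hypothetical placeholders (the course data set is not reproduced here); the prior arguments shown are bayesglm's defaults for logistic regression.

# install.packages("arm")   # if not already installed
library(arm)

# Hypothetical example data (placeholder names, not the course data set)
set.seed(1)
dat <- data.frame(
  condition = factor(rep(c("control", "treatment"), each = 50)),
  trial     = rep(1:50, times = 2)
)
dat$correct <- rbinom(100, 1, plogis(-0.5 + 0.8 * (dat$condition == "treatment")))

# bayesglm() mirrors glm(); prior.scale = 2.5 with prior.df = 1 gives the
# Cauchy(0, 2.5) prior on standardized coefficients of Gelman et al. (2008)
fit <- bayesglm(correct ~ condition + trial,
                family = binomial(link = "logit"),
                data   = dat,
                prior.scale = 2.5,
                prior.df    = 1)

display(fit)  # coefficient estimates and standard errors

Because bayesglm() deliberately mirrors glm(), a Bayesian version of our earlier analyses mostly amounts to swapping the function name and, where appropriate, adjusting the prior arguments.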
Reading
• Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2, 1360-1383. Available online: http://www.stat.columbia.edu/~gelman/research/published/priors11.pdf
• Kruschke, J. K. (2011). Doing Bayesian Data Analysis. Elsevier: Oxford. Chapter 4 ("Bayes' Rule"), Chapter 18 ("Bayesian Oneway ANOVA"), and Chapter 19 ("Metric predicted variable with multiple nominal predictors") will be provided online.