varied not just on dose of alcohol but also on their tolerance of alcohol (the systematic variation created by their past experience with alcohol cannot be separated from the effect of the experimental manipulation). The best way to reduce this eventuality is to randomly allocate participants to conditions: by doing so you minimize the risk that groups differ on variables other than the one you want to manipulate. Why is randomization important? 1.8 Analysing data The final stage of the research process is to analyse the data you have collected. When the data are quantitative this involves both looking at your data graphically (Chapter 5) to see what the general trends in the data are, and also fitting statistical models to the data (all other chapters). Given that the rest of the book is dedicated to this process, we’ll begin here with a few fairly basic ways to inspect and summarize the data you have collected. 1.8.1 Frequency distributions Once you’ve collected some data, a very useful thing to do is to plot a graph of how many times each score occurs. This is known as a frequency distribution, or histogram, which is a graph plotting values of observations on the horizontal axis, with a bar showing how many times each value occurred in the data set. Frequency distributions can be very useful for assessing properties of the distribution of scores. We will find out how to create these types of charts in Chapter 5. Frequency distributions come in many different shapes and sizes, so it is important to have some general descriptions for common types of distribution. In an ideal world our data would be distributed symmetrically around the centre of all scores. As such, if we drew a vertical line through the centre of the distribution then it should look the same on both sides. This is known as a normal distribution and is characterized by the bell-shaped curve with which you might already be familiar. This shape implies that the majority of scores lie around the centre of the distribution (so the largest bars on the histogram are around the central value). Also, as we get further away from the centre, the bars get smaller, implying that as scores start to deviate from the centre their frequency decreases. As we move still further away from the centre our scores become very infrequent (the bars are very short). Many naturally occurring things have this shape of distribution. For example, most men in the UK are around 175 cm tall;16 some are a bit taller or shorter, but most cluster around this value. There will be very few men who are really tall (i.e., above 205 cm) or really short (i.e., under 145 cm). An example of a normal distribution is shown in Figure 1.3. 16 I am exactly 180 cm tall. In my home country this makes me smugly above average. However, I often visit the Netherlands, where the average male height is 185 cm (a little over 6 ft, and a massive 10 cm taller than the UK average), and where I feel like a bit of a dwarf. Figure 1.3 A ‘normal’ distribution (the curve shows the idealized shape) Figure 1.4 A positively (left) and negatively (right) skewed distribution
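To make the idea of a frequency distribution concrete, here is a minimal Python sketch (the book itself does everything in SPSS, so this is purely illustrative). It simulates the height example and prints a crude text histogram; the mean of 175 cm comes from the text, while the standard deviation of 7 cm and the sample size are assumptions chosen only for illustration.

```python
import random
from collections import Counter

random.seed(1)
# Simulate 1000 heights (cm) from a normal distribution roughly matching the
# UK example: mean 175 cm; the SD of 7 cm is an assumption for illustration.
heights = [random.gauss(175, 7) for _ in range(1000)]

# Bin the heights into 5 cm bands and count them: a frequency distribution.
bins = Counter(5 * round(h / 5) for h in heights)
for band in sorted(bins):
    print(f'{band:3d} cm | {"#" * (bins[band] // 10)}')  # bar length ~ frequency
```

The tallest bars sit around 175 cm and shrink towards the extremes, which is exactly the bell shape of Figure 1.3.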
There are two main ways in which a distribution can deviate from normal: (1) lack of symmetry (called skew) and (2) pointiness (called kurtosis). Skewed distributions are not symmetrical; instead, the most frequent scores (the tall bars on the graph) are clustered at one end of the scale. So, the typical pattern is a cluster of frequent scores at one end of the scale and the frequency of scores tailing off towards the other end of the scale. A skewed distribution can be either positively skewed (the frequent scores are clustered at the lower end and the tail points towards the higher or more positive scores) or negatively skewed (the frequent scores are clustered at the higher end and the tail points towards the lower or more negative scores). Figure 1.4 shows examples of these distributions. Distributions also vary in their kurtosis. Kurtosis, despite sounding like some kind of exotic disease, refers to the degree to which scores cluster at the ends of the distribution (known as the tails), and this tends to express itself in how pointy a distribution is (but there are other factors that can affect how pointy the distribution looks – see Jane Superbrain Box 1.5). A distribution with positive kurtosis has many scores in the tails (a so-called heavy-tailed distribution) and is pointy. This is known as a leptokurtic distribution. In contrast, a distribution with negative kurtosis is relatively thin in the tails (has light tails) and tends to be flatter than normal. This distribution is called platykurtic. Ideally, we want our data to be normally distributed (i.e., not too skewed, and not too many or too few scores at the extremes). For everything there is to know about kurtosis, read DeCarlo (1997). Figure 1.5 Distributions with positive kurtosis (leptokurtic, left) and negative kurtosis (platykurtic, right) In a normal distribution the values of skew and kurtosis are 0 (i.e., the tails of the distribution are as they should be).17 If a distribution has values of skew or kurtosis above or below 0 then this indicates a deviation from normal: Figure 1.5 shows distributions with kurtosis values of +2.6 (left panel) and −0.09 (right panel). 17 Sometimes no kurtosis is expressed as 3 rather than 0, but SPSS uses 0 to denote no excess kurtosis. 1.8.2 The mode We can calculate where the centre of a frequency distribution lies (known as the central tendency) using three commonly used measures: the mean, the mode and the median. Other methods exist, but these three are the ones you’re most likely to come across. The mode is the score that occurs most frequently in the data set. This is easy to spot in a frequency distribution because it will be the tallest bar. To calculate the mode, place the data in ascending order (to make life easier), count how many times each score occurs, and the score that occurs most often is the mode. One problem with the mode is that it can take on several values. For example, Figure 1.6 shows a distribution with two modes (there are two bars that are the highest), which is said to be bimodal, and one with three modes (data sets with more than two modes are multimodal). Also, if the frequencies of certain scores are very similar, then the mode can be influenced by only a small number of cases. Figure 1.6 Examples of bimodal (left) and multimodal (right) distributions
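As a small illustration (not from the book), Python’s statistics.multimode returns every score that ties for the highest frequency, which makes the bimodal and multimodal cases above easy to see:

```python
from statistics import multimode

# multimode returns all scores that tie for the highest frequency.
print(multimode([2, 3, 3, 4]))           # [3]: a single mode
print(multimode([2, 2, 3, 3, 4]))        # [2, 3]: bimodal
print(multimode([1, 1, 2, 2, 3, 3, 4]))  # [1, 2, 3]: multimodal
```

(multimode requires Python 3.8 or later.)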
1.8.3 The median Another way to quantify the centre of a distribution is to look for the middle score when scores are ranked in order of magnitude. This is called the median. Imagine we looked at the number of friends that 11 users of the social networking website Facebook had. Figure 1.7 shows the number of friends for each of the 11 users: 57, 40, 103, 234, 93, 53, 116, 98, 108, 121, 22. To calculate the median, we first arrange these scores into ascending order: 22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 234. Next, we find the position of the middle score by counting the number of scores we have collected (n), adding 1 to this value, and then dividing by 2. With 11 scores, this gives us (n + 1)/2 = (11 + 1)/2 = 12/2 = 6. Then, we find the score that is positioned at the location we have just calculated. So, in this example, we find the sixth score (see Figure 1.7). This process works very nicely when we have an odd number of scores (as in this example), but when we have an even number of scores there won’t be a middle value. Let’s imagine that we decided that because the highest score was so big (almost twice as large as the next biggest number), we would ignore it. (For one thing, this person is far too popular and we hate them.) We have only 10 scores now. Figure 1.8 shows this situation. As before, we rank-order these scores: 22, 40, 53, 57, 93, 98, 103, 108, 116, 121. We then calculate the position of the middle score, but this time it is (n + 1)/2 = 11/2 = 5.5, which means that the median is halfway between the fifth and sixth scores. To get the median we add these two scores and divide by 2. In this example, the fifth score in the ordered list was 93 and the sixth score was 98. We add these together (93 + 98 = 191) and then divide this value by 2 (191/2 = 95.5). The median number of friends was, therefore, 95.5. Figure 1.7 The median is simply the middle score when you order the data Figure 1.8 When the data contain an even number of scores, the median is the average of the middle two values The median is relatively unaffected by extreme scores at either end of the distribution: the median changed only from 98 to 95.5 when we removed the extreme score of 234. The median is also relatively unaffected by skewed distributions and can be used with ordinal, interval and ratio data (it cannot, however, be used with nominal data because these data have no numerical order). 1.8.4 The mean The mean is the measure of central tendency that you are most likely to have heard of because it is the average score, and the media love an average score.18 To calculate the mean we add up all of the scores and then divide by the total number of scores we have. We can write this in equation form as:

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]

18 I wrote this on 15 February, and to prove my point, the BBC website ran a headline today about how PayPal estimates that Britons will spend an average of £71.25 each on Valentine’s Day gifts. However, uSwitch.com said that the average spend would be only £22.69. Always remember that the media is full of lies and contradictions. This equation may look complicated, but the top half simply means ‘add up all of the scores’ (the \(x_i\) means ‘the score of a particular person’; we could replace the letter i with each person’s name instead), and the bottom bit means ‘divide this total by the number of scores you have got (n)’. Let’s calculate the mean for the Facebook data. First, we add up all the scores:

\[ \sum_{i=1}^{n} x_i = 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 + 234 = 1045 \]

We then divide by the number of scores (in this case 11) as in equation (1.3):

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1045}{11} = 95 \tag{1.3} \]

The mean is 95 friends, which is not a value we observed in our actual data. In this sense the mean is a statistical model – more on this in the next chapter. Compute the mean, excluding the score of 234. If you calculate the mean without our most popular person (i.e., excluding the value 234), the mean drops to 81.1 friends.
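The calculations above are easy to verify. Below is a small Python sketch (an illustration, not the book’s own material) that implements the (n + 1)/2 position rule for the median and the mean formula, using the Facebook data:

```python
def median(scores):
    """Median via the (n + 1)/2 position rule described in the text."""
    ordered = sorted(scores)
    n = len(ordered)
    pos = (n + 1) / 2               # position of the middle score
    if pos.is_integer():            # odd n: a single middle score exists
        return ordered[int(pos) - 1]
    lower = ordered[int(pos) - 1]   # e.g., pos 5.5 -> the 5th score...
    upper = ordered[int(pos)]       # ...and the 6th score
    return (lower + upper) / 2

friends = [57, 40, 103, 234, 93, 53, 116, 98, 108, 121, 22]
trimmed = [x for x in friends if x != 234]    # drop the extreme score

print(median(friends))                        # 98 (the 6th of 11 ordered scores)
print(median(trimmed))                        # 95.5 (average of 93 and 98)
print(sum(friends) / len(friends))            # mean: 1045 / 11 = 95.0
print(sum(trimmed) / len(trimmed))            # mean without 234: 811 / 10 = 81.1
```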
This reduction illustrates one disadvantage of the mean: it can be influenced by extreme scores. In this case, the person with 234 friends on Facebook increased the mean by about 14 friends; compare this difference with that of the median. Remember that the median changed very little (from 98 to 95.5) when we excluded the score of 234, which illustrates how the median is typically less affected by extreme scores than the mean. While we’re being negative about the mean, it is also affected by skewed distributions and can be used only with interval or ratio data. If the mean is so lousy then why do we use it so often? One very important reason is that it uses every score (the mode and median ignore most of the scores in a data set). Also, the mean tends to be stable across different samples (more on that later too). Cramming Sam’s Tips Central tendency The mean is the sum of all scores divided by the number of scores. The value of the mean can be influenced quite heavily by extreme scores. The median is the middle score when the scores are placed in ascending order. It is not as influenced by extreme scores as the mean. The mode is the score that occurs most frequently. 1.8.5 The dispersion in a distribution It can also be interesting to quantify the spread, or dispersion, of scores. The easiest way to look at dispersion is to take the largest score and subtract from it the smallest score. This is known as the range of scores. For our Facebook data we saw that if we order the scores we get 22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 234. The highest score is 234 and the lowest is 22; therefore, the range is 234 − 22 = 212. One problem with the range is that because it uses only the highest and lowest scores, it is affected dramatically by extreme scores. Compute the range, excluding the score of 234. If you have done the self-test task you’ll see that without the extreme score the range drops from 212 to 99 – less than half the size. One way around this problem is to calculate the range excluding values at the extremes of the distribution. One convention is to cut off the top and bottom 25% of scores and calculate the range of the middle 50% of scores – known as the interquartile range. Let’s do this with the Facebook data. First, we need to calculate what are called quartiles. Quartiles are the three values that split the sorted data into four equal parts. First we calculate the median, which is also called the second quartile, and which splits our data into two equal parts. We already know that the median for these data is 98. The lower quartile is the median of the lower half of the data and the upper quartile is the median of the upper half of the data. As a rule of thumb the median is not included in the two halves when they are split (this is convenient if you have an odd number of values), but you can include it (although which half you put it in is another question). Figure 1.9 shows how we would calculate these values for the Facebook data. Like the median, if each half of the data had an even number of values in it, then the upper and lower quartiles would be the average of two values in the data set (therefore, the upper and lower quartiles need not be values that actually appear in the data). Once we have worked out the values of the quartiles, we can calculate the interquartile range, which is the difference between the upper and lower quartile. For the Facebook data this value would be 116 − 53 = 63.
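Here is a sketch of the quartile rule just described (median of each half, excluding the median itself when n is odd). Note that statistics software often uses slightly different interpolation rules for quartiles, so other packages may give slightly different values:

```python
from statistics import median  # averages the two middle scores when n is even

def quartiles(scores):
    """Q1, Q2, Q3 via the rule of thumb in the text: the median (Q2) is
    excluded from the two halves when n is odd."""
    ordered = sorted(scores)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # scores below the median
    upper_half = ordered[(n + 1) // 2 :]  # scores above the median
    return median(lower_half), median(ordered), median(upper_half)

friends = [57, 40, 103, 234, 93, 53, 116, 98, 108, 121, 22]
q1, q2, q3 = quartiles(friends)
print(q1, q2, q3)                    # 53 98 116 for the Facebook data
print(q3 - q1)                       # interquartile range: 116 - 53 = 63
print(max(friends) - min(friends))   # range: 234 - 22 = 212
```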
The advantage of the interquartile range is that it isn’t affected by extreme scores at either end of the distribution. However, the problem with it is that you lose a lot of data (half of it, in fact). It’s worth noting here that quartiles are special cases of things called quantiles. Quantiles are values that split a data set into equal portions. Quartiles are quantiles that split the data into four equal parts, but there are other quantiles such as percentiles (points that split the data into 100 equal parts), noniles (points that split the data into nine equal parts) and so on. Figure 1.9 Calculating quartiles and the interquartile range Twenty-one heavy smokers were put on a treadmill at the fastest setting. The time in seconds was measured until they fell off from exhaustion: 18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57 Compute the mode, median, mean, upper and lower quartiles, range and interquartile range. If we want to use all the data rather than half of it, we can calculate the spread of scores by looking at how different each score is from the centre of the distribution. If we use the mean as a measure of the centre of a distribution, then we can calculate the difference between each score and the mean, which is known as the deviance (Eq. 1.4):

\[ \text{deviance} = x_i - \bar{X} \tag{1.4} \]

If we want to know the total deviance then we could add up the deviances for each data point. In equation form, this would be:

\[ \text{total deviance} = \sum_{i=1}^{n} (x_i - \bar{X}) \]

The sigma symbol (∑) means ‘add up all of what comes after’, and the ‘what comes after’ in this case is the deviances. So, this equation simply means ‘add up all of the deviances’. Let’s try this with the Facebook data. Table 1.2 shows the number of friends for each person in the Facebook data, the mean, and the difference between the two. Note that because the mean is at the centre of the distribution, some of the deviations are positive (scores greater than the mean) and some are negative (scores smaller than the mean). Consequently, when we add the scores up, the total is zero. Therefore, the ‘total spread’ is nothing. This conclusion is as silly as a tapeworm thinking they can have a coffee with the Queen of England if they don a bowler hat and pretend to be human. Everyone knows that the Queen drinks tea. To overcome this problem, we could ignore the minus signs when we add the deviations up. There’s nothing wrong with doing this, but people tend to square the deviations, which has a similar effect (because a negative number multiplied by another negative number becomes positive). The final column of Table 1.2 shows these squared deviances. We can add these squared deviances up to get the sum of squared errors, SS (often just called the sum of squares); unless your scores are all exactly the same, the resulting value will be bigger than zero, indicating that there is some deviance from the mean. As an equation (1.6), in which the sigma symbol means ‘add up all of the things that follow’ and what follows is the squared deviances (or squared errors as they’re more commonly known), we would write:

\[ SS = \sum_{i=1}^{n} (x_i - \bar{X})^2 \tag{1.6} \]

We can use the sum of squares as an indicator of the total dispersion, or total deviance of scores from the mean. The problem with using the total is that its size depends on how many scores there are in the data. The sum of squares for the Facebook data is 32,246, but if we added another 11 scores that value would increase (other things being equal, it would more or less double in size).
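A quick Python check of the deviance calculations (again, an illustration rather than anything from the book):

```python
friends = [57, 40, 103, 234, 93, 53, 116, 98, 108, 121, 22]
mean = sum(friends) / len(friends)       # 95.0

deviances = [x - mean for x in friends]  # deviance = score minus the mean
print(sum(deviances))                    # 0.0: deviances always sum to zero

ss = sum(d ** 2 for d in deviances)      # sum of squared errors (equation 1.6)
print(ss)                                # 32246.0 for the Facebook data
```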
The total dispersion is a bit of a nuisance, then, because we can’t compare it across samples that differ in size. Therefore, it can be useful to work not with the total dispersion but with the average dispersion, which is also known as the variance. We have seen that an average is the total of scores divided by the number of scores; therefore, the variance is simply the sum of squares divided by the number of observations (N). Actually, we normally divide the SS by the number of observations minus 1, as in equation (1.7) (the reason why is explained in the next chapter and Jane Superbrain Box 2.2):

\[ s^2 = \frac{SS}{N-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{N-1} = \frac{32{,}246}{10} = 3224.6 \tag{1.7} \]

As we have seen, the variance is the average error between the mean and the observations made. There is one problem with the variance as a measure: it gives us a measure in units squared (because we squared each error in the calculation). In our example we would have to say that the average error in our data was 3224.6 friends squared. It makes very little sense to talk about friends squared, so we often take the square root of the variance (which ensures that the measure of average error is in the same units as the original measure). This measure is known as the standard deviation and is the square root of the variance (Eq. 1.8):

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{N-1}} = \sqrt{3224.6} = 56.79 \tag{1.8} \]

The sum of squares, variance and standard deviation are all measures of the dispersion or spread of data around the mean. A small standard deviation (relative to the value of the mean itself) indicates that the data points are close to the mean. A large standard deviation (relative to the mean) indicates that the data points are distant from the mean. A standard deviation of 0 would mean that all the scores were the same. Figure 1.10 shows the overall ratings (on a 5-point scale) of two lecturers after each of five different lectures. Both lecturers had an average rating of 2.6 out of 5 across the lectures. However, the first lecturer had a standard deviation of 0.55 (relatively small compared to the mean). It should be clear from the left-hand graph that ratings for this lecturer were consistently close to the mean rating. There was a small fluctuation, but generally her lectures did not vary in popularity. Put another way, the scores are not spread too widely around the mean. The second lecturer, however, had a standard deviation of 1.82 (relatively high compared to the mean). The ratings for this second lecturer are more spread from the mean than the first: for some lectures she received very high ratings, and for others her ratings were appalling. Figure 1.10 Graphs illustrating data that have the same mean but different standard deviations
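A sketch of equations (1.7) and (1.8) in Python. The Facebook numbers come from the text; the two sets of lecturer ratings are hypothetical values chosen only because they reproduce the stated mean of 2.6 and standard deviations of roughly 0.55 and 1.82 (the book does not give the raw ratings):

```python
from math import sqrt

def variance(scores):
    """Sample variance: sum of squares divided by N - 1 (equation 1.7)."""
    mean = sum(scores) / len(scores)
    ss = sum((x - mean) ** 2 for x in scores)
    return ss / (len(scores) - 1)

friends = [57, 40, 103, 234, 93, 53, 116, 98, 108, 121, 22]
print(variance(friends))           # 3224.6 friends squared
print(sqrt(variance(friends)))     # standard deviation: about 56.79 friends

lecturer_1 = [2, 3, 3, 2, 3]       # hypothetical ratings, mean 2.6
lecturer_2 = [1, 1, 2, 4, 5]       # hypothetical ratings, mean 2.6
print(sqrt(variance(lecturer_1)))  # about 0.55: scores hug the mean
print(sqrt(variance(lecturer_2)))  # about 1.82: scores spread widely
```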
1.8.6 Using a frequency distribution to go beyond the data Another way to think about frequency distributions is not in terms of how often scores actually occurred, but how likely it is that a score would occur (i.e., probability). The word ‘probability’ causes most people’s brains to overheat (myself included) so it seems fitting that we use an example about throwing buckets of ice over our heads. Internet memes tend to follow the shape of a normal distribution, which we discussed a while back. A good example of this is the ice bucket challenge from 2014. You can check Wikipedia for the full story, but it all started (arguably) with golfer Chris Kennedy tipping a bucket of iced water on his head to raise awareness of the disease amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig’s disease).19 The idea is that you are challenged and have 24 hours to post a video of you having a bucket of iced water poured over your head; in this video you also challenge at least three other people. If you fail to complete the challenge your forfeit is to donate to charity (in this case, ALS). In reality many people completed the challenge and made donations. 19 Chris Kennedy did not invent the challenge, but he’s believed to be the first to link it to ALS. There are earlier reports of people doing things with ice-cold water in the name of charity, but I’m focusing on the ALS challenge because it is the one that spread as a meme. Jane Superbrain 1.5 The standard deviation and the shape of the distribution The variance and standard deviation tell us about the shape of the distribution of scores. If the mean represents the data well then most of the scores will cluster close to the mean and the resulting standard deviation is small relative to the mean. When the mean is a worse representation of the data, the scores cluster more widely around the mean and the standard deviation is larger. Figure 1.11 shows two distributions that have the same mean (50) but different standard deviations. One has a large standard deviation relative to the mean (SD = 25) and this results in a flatter distribution that is more spread out, whereas the other has a small standard deviation relative to the mean (SD = 15), resulting in a pointier distribution in which scores close to the mean are very frequent but scores further from the mean become increasingly infrequent. The message is that as the standard deviation gets larger, the distribution gets fatter. This can make distributions look platykurtic or leptokurtic when, in fact, they are not. Figure 1.11 Two distributions with the same mean, but large and small standard deviations The ice bucket challenge is a good example of a meme: it ended up generating something like 2.4 million videos on Facebook and 2.3 million on YouTube. I mentioned that memes often follow a normal distribution, and Figure 1.12 shows this: the insert shows the ‘interest’ score from Google Trends for the phrase ‘ice bucket challenge’ from August to September 2014.20 The ‘interest’ score that Google calculates is a bit hard to unpick but essentially reflects the relative number of times that the term ‘ice bucket challenge’ was searched for on Google. It’s not the total number of searches, but the relative number. In a sense it shows the trend of the popularity of searching for ‘ice bucket challenge’. Compare the line with the perfect normal distribution in Figure 1.3: they look fairly similar, don’t they? Once it got going (about 2–3 weeks after the first video) it went viral, and popularity increased rapidly, reaching a peak at around 21 August (about 36 days after Chris Kennedy got the ball rolling). After this peak, popularity rapidly declined as people tired of the meme. 20 You can generate the insert graph for yourself by going to Google Trends, entering the search term ‘ice bucket challenge’ and restricting the dates shown to August 2014 to September 2014. Labcoat Leni’s Real Research 1.1 Is Friday 13th unlucky? Scanlon, T. J., et al. (1993). British Medical Journal, 307, 1584–1586.
Many of us are superstitious, and a common superstition is that Friday the 13th is unlucky. Most of us don’t literally think that someone in a hockey mask is going to kill us, but some people are wary. Scanlon and colleagues, in a tongue-in-cheek study (Scanlon, Luben, Scanlon, & Singleton, 1993), looked at accident statistics at hospitals in the south-west Thames region of the UK. They took statistics both for Friday the 13th and Friday the 6th (the week before) in different months in 1989, 1990, 1991 and 1992. They looked at emergency admissions for accidents and poisoning, and also at transport accidents. Calculate the mean, median, standard deviation and interquartile range for each type of accident and on each date. Answers are on the companion website. Cramming Sam’s Tips Dispersion The deviance or error is the distance of each score from the mean. The sum of squared errors is the total amount of error in the mean. The errors/deviances are squared before adding them up. The variance is the average squared distance of scores from the mean. It is the sum of squares divided by the number of scores minus 1. It tells us about how widely dispersed scores are around the mean. The standard deviation is the square root of the variance. It is the variance converted back to the original units of measurement of the scores used to compute it. Large standard deviations relative to the mean suggest data are widely spread around the mean, whereas small standard deviations suggest data are closely packed around the mean. The range is the distance between the highest and lowest score. The interquartile range is the range of the middle 50% of the scores. The main histogram in Figure 1.12 shows the same pattern but reflects something a bit more tangible than ‘interest scores’. It shows the number of videos posted on YouTube relating to the ice bucket challenge on each day after Chris Kennedy’s initial challenge. There were 2323 thousand in total (2.32 million) during the period shown. In a sense it shows approximately how many people took up the challenge each day.21 You can see that nothing much happened for 20 days, and early on relatively few people took up the challenge. By about 30 days after the initial challenge things are hotting up (well, cooling down, really) as the number of videos rapidly accelerates from 29,000 on day 30 to 196,000 on day 35. At day 36, the challenge hits its peak (204,000 videos posted), after which the decline sets in as it becomes ‘yesterday’s news’. By day 50, it’s only people like me (and statistics lecturers more generally), who don’t check Facebook for 50 days, who suddenly become aware of the meme and want to get in on the action to prove how down with the kids we are. It’s too late, though: people at that end of the curve are uncool, and the trendsetters who posted videos on day 25 call us lame and look at us dismissively. It’s OK though, because we can plot sick histograms like the one in Figure 1.12; take that, hipster scum! 21 Very, very approximately indeed. I have converted the Google interest data into videos posted on YouTube by using the fact that I know that 2.33 million videos were posted during this period and by making the (not unreasonable) assumption that behaviour on YouTube will have followed the same pattern over time as the Google interest score for the challenge.
Figure 1.12 Frequency distribution showing the number of ice bucket challenge videos on YouTube by day since the first video (the insert shows the actual Google Trends data on which this example is based) I digress. We can think of frequency distributions in terms of probability. To explain this, imagine that someone asked you ‘How likely is it that a person posted an ice bucket video after 60 days?’ What would your answer be? Remember that the height of the bars on the histogram reflects how many videos were posted. Therefore, if you looked at the frequency distribution before answering the question you might respond ‘not very likely’ because the bars are very short after 60 days (i.e., relatively few videos were posted). What if someone asked you ‘How likely is it that a video was posted 35 days after the challenge started?’ Using the histogram, you might say ‘It’s relatively likely’ because the bar is very high on day 35 (so quite a few videos were posted). Your inquisitive friend is on a roll and asks ‘How likely is it that someone posted a video 35 to 40 days after the challenge started?’ The bars representing these days are shaded orange in Figure 1.12. The question about the likelihood of a video being posted 35–40 days into the challenge is really asking ‘How big is the orange area of Figure 1.12 compared to the total size of all bars?’ We can find out the size of the orange area by adding the values of the bars (196 + 204 + 196 + 174 + 164 + 141 = 1075); therefore, the orange area represents 1075 thousand videos. The total size of all bars is the total number of videos posted (i.e., 2323 thousand). If the orange area represents 1075 thousand videos, and the total area represents 2323 thousand videos, then if we compare the orange area to the total area we get 1075/2323 = 0.46. This proportion can be converted to a percentage by multiplying by 100, which gives us 46%. Therefore, our answer might be ‘It’s quite likely that someone posted a video 35–40 days into the challenge because 46% of all videos were posted during those 6 days’. A very important point here is that the size of the bars relates directly to the probability of an event occurring. Hopefully these illustrations show that we can use the frequencies of different scores, and the area of a frequency distribution, to estimate the probability that a particular score will occur. A probability value can range from 0 (there’s no chance whatsoever of the event happening) to 1 (the event will definitely happen). So, for example, when I talk to my publishers I tell them there’s a probability of 1 that I will have completed the revisions to this book by July. However, when I talk to anyone else, I might, more realistically, tell them that there’s a 0.10 probability of me finishing the revisions on time (or, put another way, a 10% chance, or 1 in 10 chance, that I’ll complete the book in time). In reality, the probability of my meeting the deadline is 0 (not a chance in hell). If probabilities don’t make sense to you then you’re not alone; just ignore the decimal point and think of them as percentages instead (i.e., a 0.10 probability that something will happen is a 10% chance that something will happen) or read the chapter on probability in my other excellent textbook (Field, 2016).
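The arithmetic behind that 46% is worth seeing in one place; here is a tiny sketch using the bar heights (in thousands of videos) read off Figure 1.12:

```python
# Videos posted (thousands) on days 35-40: the orange bars of Figure 1.12.
days_35_to_40 = [196, 204, 196, 174, 164, 141]
total_videos = 2323                          # total videos posted (thousands)

orange_area = sum(days_35_to_40)
print(orange_area)                           # 1075 thousand videos
print(round(orange_area / total_videos, 2))  # 0.46, i.e., about a 46% chance
```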
Figure 1.13 The normal probability distribution I’ve talked in vague terms about how frequency distributions can be used to get a rough idea of the probability of a score occurring. However, we can be precise. For any distribution of scores we could, in theory, calculate the probability of obtaining a score of a certain size – it would be incredibly tedious and complex to do it, but we could. To spare our sanity, statisticians have identified several common distributions. For each one they have worked out mathematical formulae (known as probability density functions, PDFs) that specify idealized versions of these distributions. We could draw such a function by plotting the value of the variable (x) against the probability of it occurring (y).22 The resulting curve is known as a probability distribution; for a normal distribution (Section 1.8.1) it would look like Figure 1.13, which has the characteristic bell shape that we saw already in Figure 1.3. 22 Actually we usually plot something called the density, which is closely related to the probability. A probability distribution is just like a histogram except that the lumps and bumps have been smoothed out so that we see a nice smooth curve. However, like a frequency distribution, the area under this curve tells us something about the probability of a value occurring. Just as we did in our ice bucket example, we could use the area under the curve between two values to tell us how likely it is that a score fell within a particular range. For example, the blue shaded region in Figure 1.13 corresponds to the probability of a score being z or greater. The normal distribution is not the only distribution that has been precisely specified by people with enormous brains. There are many distributions that have characteristic shapes and have been specified with a probability density function. We’ll encounter some of these other distributions throughout the book, for example the t-distribution, chi-square (χ²) distribution and F-distribution. For now, the important thing to remember is that all of these distributions have something in common: they are all defined by an equation that enables us to calculate precisely the probability of obtaining a given score. As we have seen, distributions can have different means and standard deviations. This isn’t a problem for the probability density function – it will still give us the probability of a given value occurring – but it is a problem for us because probability density functions are difficult enough to spell, let alone use to compute probabilities. Therefore, to avoid a brain meltdown we often use a normal distribution with a mean of 0 and a standard deviation of 1 as a standard. This has the advantage that we can pretend that the probability density function doesn’t exist and use tabulated probabilities (as in the Appendix) instead. The obvious problem is that not all of the data we collect will have a mean of 0 and a standard deviation of 1. For example, for the ice bucket data the mean is 39.68 and the standard deviation is 7.74. However, any data set can be converted into one that has a mean of 0 and a standard deviation of 1. First, to centre the data around zero, we take each score (X) and subtract from it the mean of all scores (X̄). To ensure the data have a standard deviation of 1, we divide the resulting score by the standard deviation (s), which we recently encountered. The resulting scores are denoted by the letter z and are known as z-scores. In equation form, the conversion that I’ve just described is:

\[ z = \frac{X - \bar{X}}{s} \]
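To see that this transformation really does produce a mean of 0 and a standard deviation of 1, here is a short sketch (illustrative only) applying the conversion to the Facebook data:

```python
from statistics import mean, stdev  # stdev divides by n - 1, as in equation (1.7)

scores = [57, 40, 103, 234, 93, 53, 116, 98, 108, 121, 22]
m, s = mean(scores), stdev(scores)
z = [(x - m) / s for x in scores]   # the z-score conversion applied to every score

print(round(mean(z), 10))           # 0.0 (to rounding): the z-scores centre on zero
print(round(stdev(z), 10))          # 1.0: and have a standard deviation of 1
```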
The table of probability values that have been calculated for the standard normal distribution is shown in the Appendix. Why is this table important? Well, if we look at our ice bucket data, we can answer the question ‘What’s the probability that someone posted a video on day 60 or later?’ First, we convert 60 into a z-score. We saw that the mean was 39.68 and the standard deviation was 7.74, so our score of 60 expressed as a z-score is 2.63 (Eq. 1.10):

\[ z = \frac{60 - 39.68}{7.74} = 2.63 \tag{1.10} \]

We can now use this value, rather than the original value of 60, to compute an answer to our question. Figure 1.14 shows (an edited version of) the tabulated values of the standard normal distribution from the Appendix of this book. This table gives us a list of values of z, and the density (y) for each value of z, but, most important, it splits the distribution at the value of z and tells us the sizes of the two areas under the curve that this division creates. For example, when z is 0, we are at the mean or centre of the distribution, so it splits the area under the curve exactly in half. Consequently, both areas have a size of 0.5 (or 50%). However, any value of z that is not zero will create different-sized areas, and the table tells us the size of the larger and smaller portions. For example, if we look up our z-score of 2.63, we find that the smaller portion (i.e., the area above this value, or the blue area in Figure 1.14) is 0.0043, or only 0.43%. I explained before that these areas relate to probabilities, so in this case we could say that there is only a 0.43% chance that a video was posted 60 days or more after the challenge started. By looking at the larger portion (the area below 2.63) we get 0.9957; put another way, there’s a 99.57% chance that an ice bucket video was posted on YouTube within 60 days of the challenge starting. Note that these two proportions add up to 1 (or 100%), so the total area under the curve is 1. Another useful thing we can do (you’ll find out just how useful in due course) is to work out the limits within which a certain percentage of scores fall. With our ice bucket example, we looked at how likely it was that a video was posted between 35 and 40 days after the challenge started; we could ask a similar question such as ‘What is the range of days between which the middle 95% of videos were posted?’ To answer this question we need to use the table the opposite way around. We know that the total area under the curve is 1 (or 100%), so to discover the limits within which 95% of scores fall we’re asking ‘What is the value of z that cuts off 5% of the scores?’ It’s not quite as simple as that, because if we want the middle 95% then we need to cut off scores from both ends. Given that the distribution is symmetrical, if we want to cut off 5% of scores overall but take some from both extremes, then the percentage of scores we cut from each end will be 5%/2 = 2.5% (or 0.025 as a proportion). If we cut off 2.5% of scores from each end then in total we’ll have cut off 5% of scores, leaving us with the middle 95% (or 0.95 as a proportion) – see Figure 1.15. To find out what value of z cuts off the top area of 0.025, we look down the column ‘smaller portion’ until we reach 0.025, and then read off the corresponding value of z. This value is 1.96 (see Figure 1.14) and, because the distribution is symmetrical around zero, the value that cuts off the bottom 0.025 will be the same but negative (−1.96). Therefore, the middle 95% of z-scores fall between −1.96 and 1.96.
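If you would rather not squint at the printed table, these areas can be computed directly; here is a sketch assuming SciPy is installed (scipy.stats.norm is the standard normal distribution):

```python
from scipy.stats import norm

print(norm.sf(2.63))    # smaller portion (area above z = 2.63): about 0.0043
print(norm.cdf(2.63))   # larger portion (area below z = 2.63): about 0.9957
print(norm.ppf(0.975))  # z that cuts off the top 2.5%: about 1.96
```

The same ppf call with 0.995 and 0.9995 recovers the 2.58 and 3.29 cut-offs derived from the table next.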
If we wanted to know the limits between which the middle 99% of scores would fall, we could do the same: now we would want to cut off 1% of scores, or 0.5% from each end. This equates to a proportion of 0.005. We look up 0.005 in the ‘smaller portion’ part of the table and the nearest value we find is 0.00494, which equates to a z-score of 2.58 (see Figure 1.14). This tells us that 99% of z-scores lie between −2.58 and 2.58. Similarly (have a go), you can show that 99.9% of them lie between −3.29 and 3.29. Remember these values (1.96, 2.58 and 3.29) because they’ll crop up time and time again. Figure 1.14 Using tabulated values of the standard normal distribution Figure 1.15 The probability density function of a normal distribution Assuming the same mean and standard deviation for the ice bucket example above, what’s the probability that someone posted a video within the first 30 days of the challenge? Cramming Sam’s Tips Distributions and z-scores A frequency distribution can be either a table or a chart that shows each possible score on a scale of measurement along with the number of times that score occurred in the data. Scores are sometimes expressed in a standard form known as z-scores. To transform a score into a z-score you subtract from it the mean of all scores and divide the result by the standard deviation of all scores. The sign of the z-score tells us whether the original score was above or below the mean; the value of the z-score tells us how far the score was from the mean in standard deviation units. 1.8.7 Fitting statistical models to the data Having looked at your data (and there is a lot more information on different ways to do this in Chapter 5), the next step of the research process is to fit a statistical model to the data. That is to go where eagles dare, and no one should fly where eagles dare; but to become scientists we have to, so the rest of this book attempts to guide you through the various models that you can fit to the data. 1.9 Reporting data 1.9.1 Dissemination of research Having established a theory and collected and started to summarize data, you might want to tell other people what you have found. This sharing of information is a fundamental part of being a scientist. As discoverers of knowledge, we have a duty of care to the world to present what we find in a clear and unambiguous way, and with enough information that others can challenge our conclusions. It is good practice, for example, to make your data available to others and to be open with the resources you used. Initiatives such as the Open Science Framework (https://osf.io) make this easy to do. Tempting as it may be to cover up the more unsavoury aspects of our results, science is about truth, openness and willingness to debate your work. Scientists tell the world about our findings by presenting them at conferences and in articles published in scientific journals. A scientific journal is a collection of articles written by scientists on a vaguely similar topic. A bit like a magazine, but more tedious. These articles can describe new research, review existing research, or might put forward a new theory. Just like you have magazines such as Modern Drummer, which is about drumming, or Vogue, which is about