7CHAPTER 1 Why is my evil lecturer forcing me to learn statistics?
1.5. Data collection 1: what to measure
We have seen already that data collection is vital for testing theories. When we collect data
we need to decide on two things: (1) what to measure, (2) how to measure it. This section
looks at the first of these issues.
1.5.1. Variables
1.5.1.1. Independent and dependent variables
To test hypotheses we need to measure variables. Variables are just things that can change (or
vary); they might vary between people (e.g. IQ, behaviour) or locations (e.g. unemployment)
or even time (e.g. mood, profit, number of cancerous cells). Most hypotheses can be expressed
in terms of two variables: a proposed cause and a proposed outcome. For example, if we take
the scientific statement ‘Coca-Cola is an effective spermicide’,[7] then the proposed cause is ‘Coca-Cola’
and the proposed effect is dead sperm. Both the cause and the outcome are variables: for
the cause we could vary the type of drink, and for the outcome, these drinks will kill different
amounts of sperm. The key to testing such statements is to measure these two variables.
A variable that we think is a cause is known as an independent variable (because its value
does not depend on any other variables). A variable that we think is an effect is called a
dependent variable because the value of this variable depends on the cause (independent
variable). These terms are very closely tied to experimental methods in which the cause
is actually manipulated by the experimenter (as we will see in section 1.6.2). In cross-sectional
research we don’t manipulate any variables and we cannot make causal statements
about the relationships between variables, so it doesn’t make sense to talk of dependent and
independent variables because all variables are dependent variables in a sense. One possibility
is to abandon the terms dependent and independent variable and use the terms predictor
variable and outcome variable. In experimental work the cause, or independent variable, is
a predictor, and the effect, or dependent variable, is simply an outcome. This terminology
also suits cross-sectional work where, statistically at least, we can use one or more variables
to make predictions about the other(s) without needing to imply causality.
[7] Actually, there is a long-standing urban myth that a post-coital douche with the contents of a bottle of Coke is
an effective contraceptive. Unbelievably, this hypothesis has been tested and Coke does affect sperm motility, and
different types of Coke are more or less effective – Diet Coke is best apparently (Umpierre, Hill, & Anderson,
1985). Nevertheless, a Coke douche is ineffective at preventing pregnancy.
CRAMMING SAM’s Tips Some Important Terms
When doing research there are some important generic terms for variables that you will encounter:
· Independent variable: A variable thought to be the cause of some effect. This term is usually used in experimental research to denote a variable that the experimenter has manipulated.
· Dependent variable: A variable thought to be affected by changes in an independent variable. You can think of this variable as an outcome.
· Predictor variable: A variable thought to predict an outcome variable. This is basically another term for independent variable (although some people won’t like me saying that; I think life would be easier if we talked only about predictors and outcomes).
· Outcome variable: A variable thought to change as a function of changes in a predictor variable. This term could be synonymous with ‘dependent variable’ for the sake of an easy life.
8 DISCOVERING STATISTICS USING SAS
1.5.1.2. Levels of measurement
As we have seen in the examples so far, variables can take on many different forms and levels
of sophistication. The relationship between what is being measured and the numbers that
represent what is being measured is known as the level of measurement. Broadly speaking,
variables can be categorical or continuous, and can have different levels of measurement.
A categorical variable is made up of categories. A categorical variable that you should
be familiar with already is your species (e.g. human, domestic cat, fruit bat, etc.). You are
a human or a cat or a fruit bat: you cannot be a bit of a cat and a bit of a bat, and neither
a batman nor (despite many fantasies to the contrary) a catwoman (not even one in a nice
PVC suit) exists. A categorical variable is one that names distinct entities. In its simplest
form it names just two distinct types of things, for example male or female. This is known
as a binary variable. Other examples of binary variables are being alive or dead, pregnant
or not, and responding ‘yes’ or ‘no’ to a question. In all cases there are just two categories
and an entity can be placed into only one of the two categories.
When two things that are equivalent in some sense are given the same name (or number),
but there are more than two possibilities, the variable is said to be a nominal variable. It should
be obvious that if the variable is made up of names it is pointless to do arithmetic on them
(if you multiply a human by a cat, you do not get a hat). However, sometimes numbers are
used to denote categories. Consider, for example, the numbers worn by players in a rugby or football
(soccer) team. In rugby, shirt numbers denote specific field positions, so the number
10 is always worn by the fly-half (e.g. England’s Jonny Wilkinson),[8] and the number 2 is
always the hooker (the ugly-looking player at the front of the scrum). These numbers do not
tell us anything other than what position the player plays. We could equally have shirts with
FH and H instead of 10 and 2. A number 10 player is not necessarily better than a number 2
(most managers would not want their fly-half stuck in the front of the scrum!). It is equally
daft to try to do arithmetic with nominal scales where the categories are denoted by numbers:
the number 10 takes penalty kicks, and if the England coach found that Jonny Wilkinson (his
number 10) was injured he would not get his number 4 to give number 6 a piggy-back and
then take the kick. The only way that nominal data can be used is to consider frequencies. For
example, we could look at how frequently number 10s score tries compared to number 4s.
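The Python sketch below shows what ‘considering frequencies’ looks like in practice. The list of try-scorers is invented for illustration. The shirt numbers are stored as strings to stress that they are labels, not quantities, so the only thing we do with them is count them.

```python
from collections import Counter

# Hypothetical list of try-scorers, recorded by shirt number.
# Shirt numbers are nominal labels, so they are stored as strings and only counted.
try_scorers = ["10", "4", "10", "14", "10", "4", "11", "10"]

frequencies = Counter(try_scorers)
print(frequencies["10"], frequencies["4"])  # 4 2
```

Taking the mean of these shirt numbers would produce a number, but it would be as meaningless as the piggy-backing number 4.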
[8] Unlike, for example, NFL American football where a quarterback could wear any number from 1 to 19.
JANE SUPERBRAIN 1.2 Self-report data
A lot of self-report data are ordinal. Imagine if two judges at our beauty pageant were asked
to rate Billie’s beauty on a 10-point scale. We might be confident that a judge who gives a
rating of 10 found Billie more beautiful than one who gave a rating of 2, but can we be certain
that the first judge found her five times more beautiful than the second? What about if both
judges gave a rating of 8, could we be sure they found her equally beautiful? Probably not:
their ratings will depend on their subjective feelings about what constitutes beauty. For these
reasons, in any situation in which we ask people to rate something subjective (e.g. rate their
preference for a product, their confidence about an answer, how much they have understood some
medical instructions) we should probably regard these data as ordinal, although many scientists do not.
So far the categorical variables we have considered have been unordered (e.g. different
brands of Coke with which you’re trying to kill sperm), but they can be ordered too (e.g.
increasing concentrations of Coke with which you’re trying to kill sperm). When categories
are ordered, the variable is known as an ordinal variable. Ordinal data tell us not only
that things have occurred, but also the order in which they occurred. However, these data
tell us nothing about the differences between values. Imagine we went to a beauty pageant
in which the three winners were Billie, Freema and Elizabeth. The names of the winners
don’t provide any information about where they came in the contest; however, labelling them
according to their performance does – first, second and third. These categories are ordered. In
using ordered categories we now know that the woman who won was better than the women
who came second and third. We still know nothing about the differences between categories,
though. We don’t, for example, know how much better the winner was than the runners-up:
Billie might have been an easy victor, getting much higher ratings from the judges than Freema
and Elizabeth, or it might have been a very close contest that she won by only a point. Ordinal
data, therefore, tell us more than nominal data (they tell us the order in which things happened)
but they still do not tell us about the differences between points on a scale.
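A small Python sketch makes the point that ordinal data hide the size of the gaps. The judges’ total scores below are hypothetical. In one contest Billie wins easily, and in the other she wins by a single point, yet the placings (the ordinal information) come out identical.

```python
# Hypothetical total scores from the judges in two imaginary contests.
landslide = {"Billie": 95, "Freema": 60, "Elizabeth": 55}
close_call = {"Billie": 71, "Freema": 70, "Elizabeth": 69}

def placings(scores):
    """Return names ordered from first to last place: order only, no distances."""
    return sorted(scores, key=scores.get, reverse=True)

print(placings(landslide))   # ['Billie', 'Freema', 'Elizabeth']
print(placings(close_call))  # ['Billie', 'Freema', 'Elizabeth']
```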
The next level of measurement moves us away from categorical variables and into continuous
variables. A continuous variable is one that gives us a score for each person and can
take on any value on the measurement scale that we are using. The first type of continuous
variable that you might encounter is an interval variable. Interval data are considerably
more useful than ordinal data and most of the statistical tests in this book rely on
having data measured at this level. To say that data are interval, we must be certain that
equal intervals on the scale represent equal differences in the property being measured. For
example, on www.ratemyprofessors.com students are encouraged to rate their lecturers on
several dimensions (some of the lecturers’ rebuttals of their negative evaluations are worth
a look). Each dimension (i.e. helpfulness, clarity, etc.) is evaluated using a 5-point scale.
For this scale to be interval it must be the case that the difference between helpfulness ratings
of 1 and 2 is the same as the difference between, say, 3 and 4, or 4 and 5. Similarly, the
difference in helpfulness between ratings of 1 and 3 should be identical to the difference
between ratings of 3 and 5. Variables like this that look interval (and are treated as interval)
are often ordinal – see Jane Superbrain Box 1.2.
Ratio variables go a step further than interval data by requiring that in addition to the
measurement scale meeting the requirements of an interval variable, the ratios of values
along the scale should be meaningful. For this to be true, the scale must have a true and
meaningful zero point. In our lecturer ratings this would mean that a lecturer rated as 4
would be twice as helpful as a lecturer rated with a 2 (who would also be twice as helpful
as a lecturer rated as 1!). The time to respond to something is a good example of a ratio
variable. When we measure a reaction time, not only is it true that, say, the difference
between 300 and 350 ms (a difference of 50 ms) is the same as the difference between 210
and 260 ms or 422 and 472 ms, but also it is true that distances along the scale are divisible:
a reaction time of 200 ms is twice as long as a reaction time of 100 ms and half as long
as a reaction time of 400 ms.
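Here is a short Python sketch of the two properties in the reaction-time example, using the values from the text:

- equal differences along the scale (the interval property);
- meaningful ratios (the ratio property), which depend on 0 ms being a true zero.

```python
# Reaction times in milliseconds, taken from the example in the text.
pairs = [(300, 350), (210, 260), (422, 472)]

# Interval property: the same difference means the same amount of time anywhere on the scale.
differences = [later - earlier for earlier, later in pairs]
print(differences)  # [50, 50, 50]

# Ratio property: because 0 ms is a true zero, ratios are meaningful too.
ratios = (200 / 100, 200 / 400)
print(ratios)  # (2.0, 0.5): twice as long as 100 ms, half as long as 400 ms
```

The same arithmetic on the 5-point lecturer ratings would run without complaint, which is exactly the danger. The code cannot tell you whether a scale has equal intervals or a true zero; only knowledge of the measure can.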
Continuous variables can be, well, continuous (obviously) but also discrete. This is quite
a tricky distinction (Jane Superbrain Box 1.3). A truly continuous variable can be measured
to any level of precision, whereas a discrete variable can take on only certain values (usually
whole numbers) on the scale. What does this actually mean? Well, our example above
of rating lecturers on a 5-point scale is an example of a discrete variable. The range of the
scale is 1–5, but you can enter only values of 1, 2, 3, 4 or 5; you cannot enter a value of
4.32 or 2.18. Although a continuum exists underneath the scale (i.e. a rating of 3.24 makes
sense), the actual values that the variable takes on are limited. A continuous variable would
be something like age, which can be measured at an infinite level of precision (you could
be 34 years, 7 months, 21 days, 10 hours, 55 minutes, 10 seconds, 100 milliseconds, 63
microseconds, 1 nanosecond old).
1.5.2. Measurement error
We have seen that to test hypotheses we need to measure variables. Obviously, it’s also
important that we measure these variables accurately. Ideally we want our measure to be
calibrated such that values have the same meaning over time and across situations. Weight
is one example: we would expect to weigh the same amount regardless of who weighs
us, or where we take the measurement (assuming it’s on Earth and not in an anti-gravity
chamber). Sometimes variables can be directly measured (profit, weight, height) but in
other cases we are forced to use indirect measures such as self-report, questionnaires and
computerized tasks (to name a few).
JANE SUPERBRAIN 1.3 Continuous and discrete variables
The distinction between discrete and continuous variables can be very blurred. For one thing,
continuous variables can be measured in discrete terms; for example, when we measure age we
rarely use nanoseconds but use years (or possibly years and months). In doing so we turn a
continuous variable into a discrete one (the only acceptable values are years). Also, we often
treat discrete variables as if they were continuous. For example, the number of
boyfriends/girlfriends that you have had is a discrete variable (it will be, in all but the very
weird cases, a whole number). However, you might read a magazine that says ‘the average number
of boyfriends that women in their 20s have has increased from 4.6 to 8.9’. This assumes that the
variable is continuous, and of course these averages are meaningless: no one in their sample
actually had 8.9 boyfriends.
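Jane Superbrain’s two points can be sketched in Python. The age and the partner counts are made-up values. Truncating a precise age to whole years turns a continuous variable into a discrete one. Averaging whole-number counts can produce a value that no individual actually has.

```python
import math

# Age is continuous; recording it in whole years makes it discrete.
age_in_years = 34.64                     # hypothetical, precisely measured age
recorded_age = math.floor(age_in_years)  # only whole years are kept

# Number of partners is discrete (whole numbers), but the mean need not be.
partners = [3, 5, 9, 12, 15]             # hypothetical counts for five people
mean_partners = sum(partners) / len(partners)

print(recorded_age, mean_partners)  # 34 8.8
```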
CRAMMING SAM’s tips Levels of Measurement
Variables can be split into categorical and continuous, and within these types there are different levels of measurement:
· Categorical (entities are divided into distinct categories):
   ◦ Binary variable: There are only two categories (e.g. dead or alive).
   ◦ Nominal variable: There are more than two categories (e.g. whether someone is an omnivore, vegetarian, vegan, or fruitarian).
   ◦ Ordinal variable: The same as a nominal variable but the categories have a logical order (e.g. whether people got a fail, a pass, a merit or a distinction in their exam).
· Continuous (entities get a distinct score):
   ◦ Interval variable: Equal intervals on the variable represent equal differences in the property being measured (e.g. the difference between 6 and 8 is equivalent to the difference between 13 and 15).
   ◦ Ratio variable: The same as an interval variable, but the ratios of scores on the scale must also make sense (e.g. a score of 16 on an anxiety scale means that the person is, in reality, twice as anxious as someone scoring 8).
Let’s go back to our Coke as a spermicide example. Imagine we took some Coke and some
water and added them to two test tubes of sperm. After several minutes, we measured the
motility (movement) of the sperm in the two samples and discovered no difference. A few
years passed and another scientist, Dr Jack Q. Late, replicated the study but found that sperm
motility was worse in the Coke sample. There are two measurement-related issues that could
explain his success and our failure: (1) Dr Late might have used more Coke in the test tubes
(sperm might need a critical mass of Coke before they are affected); (2) Dr Late measured the
outcome (motility) differently to us.
The former point explains why chemists and physicists have devoted many hours to developing
standard units of measurement. If you had reported that you’d used 100 ml of Coke and
5 ml of sperm, then Dr Late could have ensured that he had used the same amount – because
millilitres are a standard unit of measurement we would know that Dr Late used exactly the
same amount of Coke that we used. Direct measurements such as the millilitre provide an
objective standard: 100 ml of a liquid is twice as much as 50 ml.
The second reason for the difference in results between the studies could have been to
do with how sperm motility was measured. Perhaps in our original study we measured
motility using absorption spectrophotometry, whereas Dr Late used laser light-scattering
techniques.[9] Perhaps his measure is more sensitive than ours.
There will often be a discrepancy between the numbers we use to represent the thing
we’re measuring and the actual value of the thing we’re measuring (i.e. the value we would
get if we could measure it directly). This discrepancy is known as measurement error. For
example, imagine that you know as an absolute truth that you weigh 83 kg. One day you
step on the bathroom scales and it says 80 kg. There is a difference of 3 kg between your
actual weight and the weight given by your measurement tool (the scales): there is a measurement
error of 3 kg. Although properly calibrated bathroom scales should produce only
very small measurement errors (despite what we might want to believe when it says we
have gained 3 kg), self-report measures do produce measurement error because factors other
than the one you’re trying to measure will influence how people respond to our measures.
Imagine you were completing a questionnaire that asked you whether you had stolen from
a shop. If you had, would you admit it, or might you be tempted to conceal this fact?
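Here is the bathroom-scales arithmetic as a tiny Python sketch. The function name and the sign convention (measured value minus true value) are assumptions made for illustration; the text only reports the size of the error, 3 kg.

```python
def measurement_error(true_value, measured_value):
    """Signed gap between what the instrument reports and the true value
    (hypothetical helper; negative means the instrument reads too low)."""
    return measured_value - true_value

error = measurement_error(true_value=83, measured_value=80)
print(error, abs(error))  # -3 3
```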
1.5.3. Validity and reliability
One way to try to ensure that measurement error is kept to a minimum is to determine
properties of the measure that give us confidence that it is doing its job properly. The first
property is validity, which is whether an instrument actually measures what it sets out to
measure. The second is reliability, which is whether an instrument can be interpreted consistently
across different situations.
Validity refers to whether an instrument measures what it was designed to measure;
a device for measuring sperm motility that actually measures sperm count is not valid.
Things like reaction times and physiological measures are valid in the sense that a reaction
time does in fact measure the time taken to react and skin conductance does measure the
conductivity of your skin. However, if we’re using these things to infer other things (e.g.
using skin conductance to measure anxiety), then they will be valid only if there are no
factors other than the one we’re interested in that can influence them.
Criterion validity is whether the instrument is measuring what it claims to measure (does
your lecturer’s helpfulness rating scale actually measure lecturers’ helpfulness?). In an ideal
world, you could assess this by relating scores on your measure to real-world observations.
For example, we could take an objective measure of how helpful lecturers were and
compare these observations to students’ ratings on ratemyprofessors.com. This is often
impractical and, of course, with attitudes you might not be interested in the reality so much
as the person’s perception of reality (you might not care whether they are a psychopath
but whether they think they are a psychopath). With self-report measures/questionnaires
we can also assess the degree to which individual items represent the construct being measured,
and cover the full range of the construct (content validity).
[9] In the course of writing this chapter I have discovered more than I think is healthy about the measurement of
sperm.
Validity is a necessary but not sufficient condition of a measure. A second consideration
is reliability, which is the ability of the measure to produce the same results under the same
conditions. To be valid the instrument must first be reliable. The easiest way to assess reliability
is to test the same group of people twice: a reliable instrument will produce similar
scores at both points in time (test–retest reliability). Sometimes, however, you will want to
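measure something that does vary over time (e.g. moods, blood-sugar levels, productivity), in which
case a difference between the two testing sessions could reflect genuine change rather than an
unreliable instrument, so test–retest reliability is of limited use.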
Statistical methods can also be used to determine reliability (we will discover these in
Chapter 17).
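Test–retest reliability is often quantified by correlating the scores from the two occasions. The Python sketch below computes a Pearson correlation by hand for five hypothetical people tested twice. It illustrates that one approach and is not a preview of the methods in Chapter 17.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people on two occasions.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

print(round(pearson_r(time_1, time_2), 2))  # 0.97
```

A value close to 1 means that people keep roughly the same relative positions across the two sessions. This logic only works for things that are not expected to change between sessions.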
SELF-TEST What is the difference between reliability
and validity?
1.6. Data collection 2: how to measure
1.6.1. Correlational research methods
So far we’ve learnt that scientists want to answer questions, and that to do this they have
to generate data (be they numbers or words), and to generate good data they need to use
accurate measures. We move on now to look briefly at how the data are collected. If we
simplify things quite a lot then there are two ways to test a hypothesis: either by observing
what naturally happens, or by manipulating some aspect of the environment and observing
the effect it has on the variable that interests us.
The main distinction between what we could call correlational or cross-sectional
research (where we observe what naturally goes on in the world without directly interfering
with it) and experimental research (where we manipulate one variable to see its
effect on another) is that experimentation involves the direct manipulation of variables.
In correlational research we do things like observe natural events or we take a snapshot of
many variables at a single point in time. As some examples, we might measure pollution
levels in a stream and the numbers of certain types of fish living there; lifestyle variables
(smoking, exercise, food intake) and disease (cancer, diabetes); workers’ job satisfaction
under different managers; or children’s school performance across regions with different
demographics. Correlational research provides a very natural view of the question we’re
researching because we are not influencing what happens and the measures of the variables
should not be biased by the researcher being there (this is an important aspect of
ecological validity).
At the risk of sounding like I’m absolutely obsessed with using Coke as a contraceptive
(I’m not, but my discovery that people in the 1950s and 1960s actually tried this has,
I admit, intrigued me), let’s return to that example. If we wanted to answer the question