Bottom-Up and Top-Down Processing in Perception
Demonstration: Perceiving a Picture
Recognizing Letters and Objects
Template Matching
Interactive Activation Model
Method: Word Superiority Effect
Feature Integration Theory (FIT)
Recognition-by-Components Theory
Test Yourself 3.1
Perceptual Organization: Putting Together an Organized World
The Gestalt Laws of Perceptual Organization
Demonstration: Finding Faces in a Landscape
The Gestalt Laws Provide “Best Guess” Predictions About What Is Out There
Why Computers Have Trouble Perceiving Objects
The Stimulus on the Receptors Is Ambiguous
Objects Need to Be Distinguished From Their Surroundings and From Each Other
Objects Can Be Hidden or Blurred
The Reasons for Changes of Lightness and Darkness Can Be Unclear
How Experience and Knowledge Create “Perceptual Intelligence”
Heuristics for Perceiving
Demonstration: Shape From Shading
Knowledge Helps Us Perceive Words in Conversational Speech
Demonstration: Organizing Strings of Sounds
Neurons Contain Information About the Environment
Something to Consider: Perception Depends on Attention
Demonstration: Change Detection
Test Yourself 3.2
Chapter Summary
Think About It
If You Want to Know More
Key Terms
CogLab: Change Detection; Apparent Motion;
Blind Spot; Metacontrast Masking; Muller-Lyer Illusion;
Signal Detection; Visual Search; Garner Interference
Perception 3
Some Questions We Will Consider
• Why does something that is so easy, like looking at a scene and seeing what is out
there, become so complicated when we look at the mechanisms involved?
• Why is recognizing an object so easy for humans, but so difficult for computers?
• How is our knowledge of the world, which we use for perceiving, stored in the brain?
For more Cengage Learning textbooks, visit www.cengagebrain.co.uk
56 Chapter 3
Because of the ease with which we perceive, many people don’t see the feats achieved
by our senses as complex or amazing. “After all,” the skeptic might say, “for vision, a
picture of the environment is focused on the back of my eye, and that picture provides
all the information my brain needs to duplicate the environment in my consciousness.”
But the erroneous idea that perception is not that complex is exactly what misled computer
scientists in the 1950s and 1960s into proposing that it would take only about a
decade or so to create “perceiving machines” that could negotiate the environment with
humanlike ease. As it turned out, it took over 50 years to create computer-controlled
robots capable of finding their way through the environment, and even these computers
fall far short of humans’ ability to perceive (Sinha, 2002).
In this chapter, we will explain why perception is so complex and why people still
outperform computers by a wide margin. We begin by describing how the process of
perception depends both on the incoming stimulation and the knowledge we bring to
the situation. Following this introduction, we will devote the rest of the chapter to answering
the question, “How do we perceive objects?” As we do this, we will see that
one reason humans are better than computers at perceiving objects is that humans use
perceptual intelligence: knowledge they have gained from their experience in perceiving
(Figure 3.1).
One reason we will focus on object perception is that perceiving objects is central to
our everyday experience. Consider, for example, what you would say if you were asked to
look up and describe what you are perceiving right now. Your answer would, of course,
depend on where you are, but it is likely that a large part of your answer would include
naming the objects that you see. (“I see a book. There’s a chair against the wall. . . .”)
We also focus on object perception in this chapter because concentrating on one
aspect of perception provides more in-depth understanding of the basic principles of
perception than we could achieve by covering a number of different types of perception
more superficially. After describing a number of mechanisms of object perception, we
will consider “perceptual intelligence,” the idea that the knowledge we bring to a situation
plays an important role in perception.
■ Figure 3.1 Flow diagram for this chapter.
Boxes: The process of perception; Perceiving objects; Perceptual intelligence.
• How does perception depend on incoming stimulation and existing knowledge?
• How are objects analyzed into features early in the process of perception?
• How are elements in a scene organized into objects?
• How do humans use perceptual intelligence to perceive objects?
Bottom-Up and Top-Down Processing in Perception
Although perception seems to just “happen,” it is actually the end result of a complex
process. We can appreciate the complexity involved in seemingly simple behaviors by
returning to our example of Juan and the alarm clock from the beginning of Chapter 2.
We saw that one way to describe Juan’s situation was to consider how neurons in his ear
and brain respond to the ringing of his alarm. But we also saw that things become more
complicated when we consider that Juan’s response to his alarm (hitting the snooze button
and going back to sleep) is determined by knowledge that he brings to the situation.
His behavior is determined both by the stimulation provided by the ringing alarm clock
and his knowledge that he can sleep longer and still get to class on time. We will now
consider how behavior is determined both by the energy reaching a person’s receptors
and by the knowledge the person brings to a situation.
To illustrate this cooperation between stimulus energy and knowledge, we will
consider Ellen, who is taking a walk in the woods. As she walks along the trail she is
confronted with a large number of stimuli (Figure 3.2a). When she looks at a particularly
distinctive tree off to the right, she doesn’t notice the interesting pattern on the
tree trunk at first, but then realizes that what she had at first taken to be a patch of moss
was actually a moth (Figure 3.2b).
■ Figure 3.2 (a) Ellen taking a walk in the woods, which contains a large number of stimuli;
(b) the moth, which she sees and then recognizes, using a combination of bottom-up and
top-down processing.
Let’s stop for a moment to consider what has happened. Ellen perceived the moth
because light reflected from the moth created an image in her eye (Figure 3.3a). This
image triggered the process of transduction we discussed in Chapter 2 (page 31) and
resulted in electrical signals, which traveled from the eye to Ellen’s brain. This sequence
of events, which started with stimulation of the receptors, is called bottom-up
processing. Bottom-up processing—processing that begins with stimulation of the receptors—is
crucial for determining Ellen’s experience because if her receptors aren’t
stimulated, she won’t see anything.
But bottom-up processing is not the whole story, because perception involves more
than just registering energy on the receptors. We can appreciate this by considering
Ellen’s problem. Looking at the moth creates a pattern of light and dark on her retina,
but it may not be obvious which of the light and dark areas belong to the moth and
which belong to the textures of the tree trunk. To sort this out, Ellen uses her
knowledge of moths, not only to detect the moth’s presence on the tree, but also to determine
that it is a moth, not a butterfly, and to identify what kind of moth it is. Knowledge that
Ellen brings to bear on the perceptual problem of seeing and recognizing the moth
represents top-down processing: processing that involves a person’s knowledge (Figure
3.3b). Knowledge doesn’t have to be involved in perception but, as we will see, it often
is, with bottom-up and top-down processing collaborating to result in perception
(Figure 3.4).
■ Figure 3.3 Ellen’s perception of the moth is determined by a combination of (a) incoming data
(bottom-up information) and (b) existing knowledge (top-down information). Diagram labels:
Light; Moth; Image of moth; Electrical signals.
In our example, Ellen uses knowledge about moths she had learned much earlier.
The following demonstration illustrates that incoming data can be affected by knowledge
that has been provided just moments earlier.
Demonstration
Perceiving a Picture
After looking at the drawing in Figure 3.5, close your eyes, then turn to the next page in
the book without looking at the page. Then open and shut your eyes to briefly expose the
picture in Figure 3.6 at the top of the page. Decide what the picture is based on this brief
exposure. Do this now, before reading further.
■ Figure 3.4 Both bottom-up and top-down processing combine to determine perception.
Diagram labels: Pattern of light entering eye; Incoming data (bottom-up); Expectations and
existing knowledge (top-down); Perception of moth.
■ Figure 3.5 Picture for “perceiving a picture” demonstration.
(Adapted from “The Role of Frequency in Developing Perceptual
Sets,” by B. R. Bugelski and D. A. Alampay, 1961, Canadian Journal
of Psychology, 15, pp. 205–211, Copyright © 1961 by the Canadian
Psychological Association.)
What did you see when you looked at Figure 3.6 above? Did it look like a rat (or a
mouse)? If it did, you were influenced by the clearly rat- or mouselike figure you saw in
Figure 3.5. But people who first observe Figure 3.10 (on page 63) usually identify Figure
3.6 as a man. (Try this demonstration on someone else.) This demonstration, which
is called the rat–man demonstration, shows how recently acquired knowledge (“that
pattern is a rat”) can influence perception.
Another example of an effect of top-down processing is provided by an experiment
by Stephen Palmer (1975), in which he presented a context scene such as the one on the
left of Figure 3.7 and then briefly flashed one of the target pictures on the right. One of
the targets was appropriate to the scene (the loaf of bread), one was inappropriate (the
drum), and one was misleading (the mailbox, which was shaped like the loaf of bread).
When the participants reported what the target picture was, they were correct 83 percent
of the time for the appropriate object, 50 percent for the inappropriate object, and
40 percent for the misleading object. This experiment shows how a person’s knowledge
of the context provided by a particular scene can influence perception.
■ Figure 3.7 Stimuli like those used in Palmer’s (1975) experiment, which showed how context can
influence perception. Target pictures are labeled (a), (b), and (c). (Reprinted from “The Effects of
Contextual Scenes on the Identification of Objects,” by S. E. Palmer, 1975, Memory and Cognition, 3,
pp. 519–526, Copyright © 1975 with permission of the author and the Psychonomic Society Publishers.)
■ Figure 3.6 (Adapted from “The Role of Frequency in Developing
Perceptual Sets,” by B. R. Bugelski et al., 1961, Canadian Journal of
Psychology, 15, pp. 205–211, Copyright © 1961 by the Canadian
Psychological Association.)
As you will see in later chapters, there are numerous situations in which incoming
data interacts with a person’s knowledge. This occurs for attention, memory, language,
and most of the other types of cognition we will be discussing. In this chapter, we will
focus on perception by looking at what cognitive psychologists have discovered about
how both bottom-up and top-down processes operate as we perceive objects. We start
by describing how incoming stimuli are analyzed by the visual system. This analysis occurs
rapidly and without our awareness and provides an example of how bottom-up and
top-down processing can interact.
Recognizing Letters and Objects
As a first step in determining how we perceive objects, we will follow the lead of early
cognitive psychologists, who focused on the simple case of perceiving letters of the alphabet.
We begin with an idea called template matching, which turned out to be too
simple to explain how we perceive letters, but which led to the idea of perception based
on features, which is part of present-day explanations of object perception.
Template Matching
We begin with a simple example: how we recognize the letter K in Figure 3.8. One way
the perceptual system could achieve this would be to compare the pattern K to a model
or template of the letter K that is stored in the system. According to this idea, when
the pattern matches the template, the perceiver recognizes the letter as a K. But this
idea runs into problems when we consider what happens when the K is tilted, as in Figure
3.8b. Tilting the K poses no problem for a perceiver, who can still recognize it.
However, template-matching theory would require a template for every orientation of
the K. People also have no trouble identifying different forms of the same letter, like
the K’s in Figure 3.8c. It is apparent that the template-matching model won’t work, because
a huge number of different templates would be needed just to recognize one letter.
When we multiply this by how many objects there are in the environment, the number
becomes astronomical. To deal with this problem, psychologists developed models of
letter perception based on the idea that letters can be broken down into features.
■ Figure 3.8 According to the idea of template matching, we can identify an object when it
matches a template. Thus, in (a), in which the stimulus matches the template, the perceiver
identifies it as a K. A problem arises, however, when the stimulus is tilted, as in (b), because
then it no longer matches the template, and so the perceiver would not be able to identify it.
(c) Each of these K’s would require different templates, but because they share features, they
can be identified by a mechanism that takes these features into account. Panel labels:
(a) Match; (b) No match; (c) Different kinds of K’s that share features.
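The limitation of template matching described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the 5 × 5 grid, the names `K_TEMPLATE`, `to_grid`, `rotate_90`, and `matches_template` are all hypothetical, and a 90-degree rotation stands in for the milder tilt of Figure 3.8b.

```python
# Hypothetical 5x5 binary "image" of a K: "1" marks ink, "." marks background.
K_TEMPLATE = (
    "1..1.",
    "1.1..",
    "11...",
    "1.1..",
    "1..1.",
)

def to_grid(rows):
    """Convert the text picture into a grid of 0s and 1s."""
    return [[1 if ch == "1" else 0 for ch in row] for row in rows]

def rotate_90(grid):
    """Rotate the grid a quarter turn clockwise (an exaggerated tilt)."""
    return [list(row) for row in zip(*grid[::-1])]

def matches_template(stimulus, template):
    """Strict template matching: every cell must agree."""
    return stimulus == template

template = to_grid(K_TEMPLATE)
upright_k = to_grid(K_TEMPLATE)
tilted_k = rotate_90(upright_k)

print(matches_template(upright_k, template))  # True
print(matches_template(tilted_k, template))   # False
```

A strict matcher of this kind would need a separate stored grid for each orientation and letter form it is expected to handle, which is the multiplication of templates the text objects to.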
Interactive Activation Model
We saw in Chapter 2 that there are cortical neurons called feature detectors that respond
to oriented lines (Hubel & Wiesel, 1965). The discovery of feature detectors in
the 1960s suggested that perhaps the perceptual system constructs letters and other
objects in the environment from simple features, like oriented lines. Features help solve
some of the problems associated with template matching, because although letters like
the ones in Figure 3.8c look different, they all have features in common, such as vertical
and slanted lines.
This idea led James McClelland and David Rumelhart (1981; also Rumelhart &
McClelland, 1982) to propose the model of letter recognition shown in Figure 3.9. This
model, which is called the interactive activation model, proposes that activation is
sent through three levels: The feature level contains feature units, mainly straight
and curved lines; the letter level contains letter units, one for each letter in the alphabet;
and the word level contains word units, all the words a person knows. The
simplified model in Figure 3.9 contains 6 feature units and 4 letter units. The complete
model has 12 feature units and 26 letter units. We will use our simplified model to demonstrate
how interactive activation handles the following three situations: (1) recognizing
a single letter; (2) recognizing a single word; and (3) recognizing a letter within
a word.
■ Figure 3.9 Diagram of McClelland and Rumelhart’s (1981) interactive activation model of word
recognition. This diagram indicates that feature units at the feature level are activated by the
letter K, and that these feature units send activation to letter units in the letter level. Color and
radiating lines indicate activation. Diagram labels: Word level (Fork, Roof); Letter level (F, O, R, K);
Feature level; Stimulus (K).
■ Figure 3.10 The “man” stimulus for the rat–man demonstration.
(Adapted from “The Role of Frequency in Developing Perceptual
Sets,” by B. R. Bugelski et al., 1961, Canadian Journal of Psychology,
15, pp. 205–211, Copyright © 1961 by the Canadian Psychological
Association.)
Recognizing a Single Letter Presenting the letter K activates feature units for K’s features:
a straight line and two slanted lines (Figure 3.9). These feature units then send activation
to each letter unit that contains these features: the F, K, and R in our example
(in the full model, with all 26 letters, other letters would also be activated). According
to the interactive activation model, the letter unit that is activated the most indicates
which letter was presented. In our example, the K is activated the most, indicating that
the K was, in fact, the letter that was presented.
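The single-letter case can be sketched in a few lines. This is a toy illustration, not the published model: the six feature names and the feature sets assigned to each letter are assumptions made for the example (the chapter's figure does not list them), and activation is simply the number of shared features.

```python
# Hypothetical feature sets for the four letter units of the simplified model.
# Six assumed features in total: vertical, horizontal_top, horizontal_mid,
# curve, slant_up, slant_down.
LETTER_FEATURES = {
    "F": {"vertical", "horizontal_top", "horizontal_mid"},
    "O": {"curve"},
    "R": {"vertical", "curve", "slant_down"},
    "K": {"vertical", "slant_up", "slant_down"},
}

def letter_activations(stimulus_features):
    """Each letter unit is activated once for every stimulus feature it contains."""
    return {
        letter: len(features & stimulus_features)
        for letter, features in LETTER_FEATURES.items()
    }

def recognize_letter(stimulus_features):
    """The most highly activated letter unit indicates the letter presented."""
    activations = letter_activations(stimulus_features)
    return max(activations, key=activations.get)

# The stimulus K: a straight line and two slanted lines.
k_features = {"vertical", "slant_up", "slant_down"}

print(letter_activations(k_features))  # {'F': 1, 'O': 0, 'R': 2, 'K': 3}
print(recognize_letter(k_features))    # K
```

Under these assumed feature sets, letters that share some of K's features receive partial activation, but the K unit receives the most.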
Recognizing a Word We now consider how the model responds to the word FORK. In Figure
3.11 we have added a characteristic to the model that enables it to deal with words.
Now, in the letter stage there is a letter unit for each letter’s position in a word. For
example, presenting FORK activates the features for the F, and each of these features
sends activation to letter unit F1, which is for F in the first position in a word. Similarly,
the features for O activate the O2 letter unit (the O in the second position), R’s features
activate the R3 unit, and K’s features activate the K4 unit.
■ Figure 3.11 How the word FORK activates the components of the interactive activation model.
See text for explanation. Diagram labels: word units Fork and Roof; letter units F, O, R, and K
for positions 1 through 4; stimulus FORK.
These letter units then send activation to all words that contain letters in the correct
positions. In our example, the word FORK receives signals from the F1, O2, R3,
and K4 letter units. Notice that the word ROOF receives signals only from the O2 letter
unit. In this word, the R and the F are in the wrong position to receive activation
from the R and F letter units that are activated by FORK. Because FORK is more highly
activated than ROOF, the model recognizes FORK as the word that was presented. Of
course, in the full model, many more words would be involved, but the general result is
that the word that is presented causes the most activation.
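The position-specific letter units can be sketched as (letter, position) pairs. The following is a minimal illustration under simplifying assumptions: the lexicon holds only the two words from the example, each matching letter unit contributes one unit of activation, and the names `LEXICON`, `letter_units`, and `word_activations` are hypothetical.

```python
# Toy lexicon containing only the two word units from the example.
LEXICON = ["FORK", "ROOF"]

def letter_units(string):
    """Position-specific letter units, e.g. FORK -> F1, O2, R3, K4."""
    return {(letter, position) for position, letter in enumerate(string, start=1)}

def word_activations(stimulus):
    """Each word unit receives one signal per letter unit in the correct position."""
    active_units = letter_units(stimulus)
    return {word: len(letter_units(word) & active_units) for word in LEXICON}

def recognize_word(stimulus):
    """The most highly activated word unit is taken as the word presented."""
    activations = word_activations(stimulus)
    return max(activations, key=activations.get)

print(word_activations("FORK"))  # {'FORK': 4, 'ROOF': 1}
print(recognize_word("FORK"))    # FORK
```

ROOF receives a signal only from the O2 unit, because its R and F occupy positions 1 and 4 rather than 3 and 1.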
The Word Superiority Effect Next we consider how the model deals with recognizing a letter
that is contained in a word, but first we will describe the word superiority effect—
letters are easier to recognize when they are contained in a word, compared to when
they appear alone or are contained in a nonword. This effect was first demonstrated by
G. M. Reicher in 1969 using the following procedure.
Method
Word Superiority Effect
A stimulus that is either (a) a word, like FORK; (b) a single letter, like K; or (c) a nonword,
such as RFOK, is flashed briefly and is followed immediately by a masking stimulus, indicated
in Figure 3.12 by XXXX, that stops further processing of the original stimulus. Following
the mask, two letters are briefly presented, one that appeared in the original stimulus, and
another that did not. The participants’ task is to pick the letter that was presented in the
original stimulus. In the example in Figure 3.12a, the word FORK was presented, so K would
be the correct answer. K would also be the correct answer if the K were originally presented
alone (Figure 3.12b), or if it were presented in a nonword like RFOK (Figure 3.12c).
■ Figure 3.12 Procedure for experiment that demonstrates the word-superiority effect. First
the stimulus is presented, then the XXXX’s, then the letters. Three types of stimuli are shown:
(a) word condition; (b) letter condition; and (c) nonword condition.
(a) FORK, then XXXX, then K or M
(b) K, then XXXX, then K or M
(c) RFOK, then XXXX, then K or M
When Reicher’s participants were asked to choose which of the two letters they
saw in the original stimulus, they did so more quickly and accurately when the letter
was part of the original word, as in Figure 3.12a, than when the letter was presented
alone, as in Figure 3.12b, or was part of a nonword, as in Figure 3.12c. This more rapid
processing of letters when in a word—the word superiority effect—means that letters in
words are not processed letter by letter but that each letter is affected by its surroundings.
With this experimental finding in hand, let’s consider how the interactive activation
model would explain the recognition of a letter within a word.
Recognizing a Letter Within a Word Figure 3.13 shows the letter level and word level from
Figure 3.11, but with one added feature—feedback activation, indicated by the dashed
arrows that extend from the word units back to the letter units. Feedback activation is
activation that is sent from word units back to each of the letter units for that word. For
example, the unit for FORK sends activation back to the K4 letter unit. This enhances
the activation of the K4 unit.
The enhanced activity of the letter units caused by feedback activation explains
the word superiority effect, because feedback activation does not occur when a letter is
presented alone (note that the activation for K4 is greater than the activation for the K
in Figure 3.9). Notice that some feedback activation would occur when a nonword such
as RFOK is presented (because the K4 letter unit is activated and sends its activation to
the FORK unit), but much less than for when FORK is presented. Thus, the letter K and
each of the other letters in FORK are more highly activated when they appear in the
word than when they appear alone or in a nonword.
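The three conditions can be compared in a small sketch of feedback activation. It is an illustration only: the gain value, the rule that feedback is proportional to the fraction of the word's letter units that are active, and the names `FEEDBACK_GAIN`, `letter_units`, and `k4_activation` are all assumptions rather than parameters of the published model.

```python
FEEDBACK_GAIN = 0.5  # hypothetical strength of word-to-letter feedback

def letter_units(string):
    """Position-specific letter units, e.g. FORK -> F1, O2, R3, K4."""
    return {(letter, position) for position, letter in enumerate(string, start=1)}

def k4_activation(stimulus, known_word="FORK"):
    """Activation of the K unit: bottom-up input plus feedback from the word unit."""
    bottom_up = 1.0  # K's features drive the K unit in every condition
    if len(stimulus) == 1:
        # Letter presented alone: no word unit is engaged, so no feedback.
        return bottom_up
    # Assumed rule: the word unit's activation is the fraction of its letter
    # units that the stimulus activates, and it feeds that back to the K unit.
    shared = letter_units(stimulus) & letter_units(known_word)
    word_support = len(shared) / len(known_word)
    return bottom_up + FEEDBACK_GAIN * word_support

for stimulus in ("FORK", "RFOK", "K"):
    print(stimulus, k4_activation(stimulus))
# FORK 1.5
# RFOK 1.125
# K 1.0
```

The ordering (word, then nonword, then letter alone) follows the text: the nonword RFOK activates the FORK unit only through K4, so it earns a little feedback, while the lone letter earns none.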
The model in Figure 3.11 is important for a number of reasons. First, it proposes
a mechanism that is consistent with what we know about neural firing. Excitation is
sent from one level to another in the model, just as excitation is sent from one neuron
to another in the nervous system. The model also contains another characteristic that
corresponds to neural firing. It proposes a role for inhibition, which is sent between the
letter units and between the word units. We didn’t include inhibition in our example
above, but the net effect of inhibition is to enhance the activation of units
corresponding to stimulus letters or words, compared to units that
correspond to other letters or words.
■ Figure 3.13 The letter and word levels of the interactive activation model, showing how
feedback activation from the word level to the letter level (dashed lines) increases activation
of the letter units. Stimulus = Fork.
The model is also important because it takes top-down processing
into account. Remember that bottom-up processing is initiated by
stimulation of the receptors, and top-down processing occurs when a
person’s knowledge affects processing. Thus, in this model, bottom-up
processing occurs when letter or word stimuli activate the receptors,
which then activate feature units. Top-down processing is also involved
because the existence of word units is based on the person’s knowledge
of which strings of letters form words, and the feedback activation that is
sent back from the words to the letter units reflects top-down processing
(Figure 3.14).
This is an early version of a type of model called a connectionist model.
Connectionist models involve networks that look like the ones in Figures
3.11 and 3.13. As we will see in Chapter 8, networks like this have
been used to explain not only how we recognize letters and words, but
how we learn to recognize stimuli we have never experienced before.
Considering how letters are recognized provides a good way to show
how bottom-up and top-down processing interact with one another. But
we are interested not just in how we recognize letters, but in how we recognize
other types of objects as well. This step in our story takes us to
Anne Treisman’s feature integration theory of perception.
Feature Integration Theory (FIT)
Figure 3.15 shows the basic idea behind feature integration theory (FIT; Treisman,
1986). According to this theory, the first stage of perception is the preattentive stage,
so named because it happens automatically and doesn’t require any effort or attention
by the perceiver. In this stage, an object is analyzed into its features.
The idea that an object is automatically broken into features may seem counterintuitive
because when we look at an object, we see the whole object, not an object that
has been divided into its individual features. The reason we aren’t aware of this process
of feature analysis is that it occurs early in the perceptual process, before we have become
conscious of the object. Thus, when you see this book, you are conscious of its
rectangular shape, but you are not aware that before you saw this rectangular shape,
your perceptual system analyzed the book into individual features such as lines with
different orientations.
■ Figure 3.14 Summary of how activation flows in the interactive activation model. Activation
flowing from the feature units toward the word units represents bottom-up processing. Activation
flowing from the word units to the letter units represents top-down processing.
■ Figure 3.15 Flow diagram for Treisman’s (1986) feature integration theory (FIT). According to this
theory, objects are first analyzed into features in the preattentive stage, and then these features are
combined into an object that can be perceived in the focused attention stage. Diagram labels:
Object; Preattentive stage (analyze into features); Focused attention stage (combine features);
Perception.
To provide some perceptual evidence that objects are, in fact, analyzed into features,
Treisman and H. Schmidt (1982) did an ingenious experiment to show that early
in the perceptual process, features may exist independently of one another. Treisman
and Schmidt’s display consisted of four objects flanked by two black numbers (• Color
Plate 3.1). They flashed this display onto a screen for one-fifth of a second, followed by a
random-dot masking field designed to eliminate any residual perception that might remain
after the stimuli were turned off. Participants were told to report the black numbers
first and then to report what they saw at each of the four locations where the shapes
had been.
In 18 percent of the trials, participants reported seeing objects that were made up
of a combination of features from two different stimuli. For example, after being presented
with the display in • Color Plate 3.1, in which the small triangle was red and
the small circle was green, they might report seeing a small red circle and a small green
triangle. These combinations of features from different stimuli are called illusory conjunctions.
Illusory conjunctions can occur even if the stimuli differ greatly in shape
and size. For example, a small blue circle and a large green square might be seen as a
large blue square and a small green circle.
According to Treisman, these illusory conjunctions occur because at the beginning
of the perceptual process each feature exists independently of the others. That is, features
such as “redness,” “curvature,” or “tilted line” are, at this early stage of processing,
not associated with a specific object (Figure 3.16). They are, in Treisman’s (1986) words,
“free floating” and can therefore be incorrectly combined in laboratory situations when
briefly flashed stimuli are followed by a masking field.
■ Figure 3.16 The results of the illusory conjunction experiment suggest that very early in
the perceptual process, features that make up an object are “free floating.” This is symbolized here
by showing some of the features of a cell phone as existing separately from one another at
the beginning of the perceptual process. Feature labels: Tilted line; Curvature; Red; Tilted line.
You can think about these features as components of a visual “alphabet.” At the very
beginning of the process, perceptions of each of these components exist independently
of one another, just as the individual letter tiles in a game of Scrabble exist as individual
units when the tiles are scattered at the beginning of the game. However, just as the
individual Scrabble tiles are combined to form words, the individual features combine
to form perceptions of whole objects. According to Treisman’s model, these features are
combined in the second stage, which is called the focused attention stage. Once the
features have been combined in this stage, we perceive the object.
During the focused attention stage, the observer’s attention plays an important
role in combining the features to create the perception of whole objects. To illustrate
the importance of attention for combining the features, Treisman repeated the illusory
conjunction experiment using the stimuli in • Color Plate 3.1, but she instructed her
participants to ignore the black numbers and to focus all of their attention on the four
target items. This focusing of attention eliminated illusory conjunctions so that all of
the shapes were paired with their correct colors.
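The two-stage account can be illustrated with a small simulation. It is a sketch under loud assumptions, not Treisman's procedure: the display's third and fourth objects are invented, the 18 percent swap probability is simply borrowed from the reported result rather than derived from any mechanism, and the names `DISPLAY`, `perceive`, and `conjunction_rate` are hypothetical.

```python
import random

# Hypothetical display: the first two objects follow the text's example,
# the last two are invented so that every color is distinct.
DISPLAY = [("triangle", "red"), ("circle", "green"),
           ("square", "blue"), ("diamond", "yellow")]

def perceive(display, focused_attention, rng, swap_probability=0.18):
    """Return the perceived (shape, color) pairs for one flashed display."""
    percept = list(display)
    # Without focused attention, features are "free floating", so on some
    # trials the colors of two objects are exchanged (an illusory conjunction).
    if not focused_attention and rng.random() < swap_probability:
        i, j = rng.sample(range(len(percept)), 2)
        (shape_i, color_i), (shape_j, color_j) = percept[i], percept[j]
        percept[i], percept[j] = (shape_i, color_j), (shape_j, color_i)
    return percept

def conjunction_rate(focused_attention, trials=2000, seed=0):
    """Fraction of trials on which the percept differs from the display."""
    rng = random.Random(seed)
    errors = sum(perceive(DISPLAY, focused_attention, rng) != DISPLAY
                 for _ in range(trials))
    return errors / trials

print("divided attention:", conjunction_rate(focused_attention=False))
print("focused attention:", conjunction_rate(focused_attention=True))
```

With focused attention the features are always bound to their own objects, so the simulated error rate is zero, mirroring the elimination of illusory conjunctions reported in the text.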
When I describe this process in class, some students aren’t convinced. One student
said, “I think that when people look at an object, they don’t break it into parts. They
just see what they see.” To convince this student (and the many others who, at the beginning
of the course, are still not comfortable with the idea that cognition sometimes
involves rapid processes we aren’t aware of ), I describe the case of R.M., a patient who
had parietal lobe damage that resulted in a condition called Balint’s syndrome. The
crucial characteristic of Balint’s syndrome is an inability to focus attention on individual
objects.
According to feature integration theory, lack of focused attention would make
it difficult for R.M. to combine features correctly, and this is exactly what happened.
When R.M. was presented with two different letters of different colors, such as a red T
and a blue O, he reported illusory conjunctions such as “blue T” on 23 percent of
the trials, even when he was able to view the letters for as long as 10 seconds (Friedman-Hill
et al., 1995; Robertson et al., 1997). The case of R.M. illustrates how a breakdown
in the brain can reveal processes that are not obvious when the brain is functioning
normally.
The feature analysis approach involves mostly bottom-up processing because
knowledge is usually not involved. In some situations, however, top-down processing
can come into play. For example, when Treisman did an illusory conjunction experiment
using stimuli such as the ones in • Color Plate 3.2 and asked participants to identify
the objects, the usual illusory conjunctions occurred, so the orange triangle would, for
example, sometimes be perceived to be black. However, when she told participants that
they were being shown a carrot, a lake, and a tire, illusory conjunctions were less likely
to occur, so participants were more likely to perceive the triangular “carrot” as being
orange. Thus, in this situation, the participants’ knowledge of the usual colors of objects
influenced their ability to correctly combine the features of each object. In our everyday
experience, in which we are often perceiving familiar objects, top-down processing
combines with feature analysis to help us perceive things accurately.
The features in Treisman’s model are things like lines, curves, and colors. But
these types of features don’t explain how we perceive the three-dimensional objects
we routinely encounter in our environment. Another feature-based theory, called
recognition-by-components theory, proposes three-dimensional features to deal with this
situation.
Recognition-by-Components Theory
In the recognition-by-components (RBC) theory of perception, the features are not
lines, curves, or colors, but are three-dimensional volumes called geons. Figure 3.17a
shows a number of geons, which are shapes such as cylinders, rectangular solids, and
pyramids. Irving Biederman (1987), who developed the recognition-by-components
theory, has proposed that there are 36 different geons, which is enough to construct a
large proportion of the objects that exist in the environment. Figure 3.17b shows a few
objects that have been constructed from geons.
An important property of geons is that they can be identified when viewed from
different angles. This property, which is called view invariance, occurs because geons
contain view-invariant properties, such as the three parallel edges of the
rectangular solid in Figure 3.17 that remain visible even when the geon is viewed from
many different angles.
Text not available due to copyright restrictions
You can test the view-invariant properties of a rectangular solid yourself by picking
up a book and moving it around, so you are looking at it from many different viewpoints.
As you do this, notice what percentage of the time you are seeing the three parallel
edges. Also notice that occasionally, as when you look at the book end-on, you do not
see all three edges (Figure 3.18c). However, these situations occur only rarely, and when
they do occur, it becomes more difficult to recognize the object. For example, when we
view the object in Figure 3.19a from the unusual perspective in Figure
3.19b, we see fewer basic geons and therefore have difficulty identifying it.
Two other properties of geons are discriminability and resistance to visual noise. Discriminability
means that each geon can be distinguished from the others from almost
all viewpoints. Resistance to visual noise means we can still perceive geons under
“noisy” conditions, such as low light or fog. For example,
look at Figure 3.20. You can identify this object (what is it?), even
though over half of its contour is obscured, because you can still identify its geons.
■ Figure 3.18 A view-invariant property of a rectangular object is demonstrated by the fact that three
parallel edges are present even when we change our viewpoint of the book, as in (a) and (b). In rare
cases, such as (c), when the book is viewed end-on, this invariant property is not perceived.
Bruce Goldstein
■ Figure 3.19 (a) A familiar object; (b) the same object seen from a viewpoint that obscures most of
its geons. This makes it harder to recognize the object.
Bruce Goldstein
However, in Figure 3.21, in which the visual noise is arranged so the geons cannot be
identified, it becomes impossible to recognize that the object is a flashlight.
The basic message of recognition-by-components theory is that if enough information
is available to enable us to identify an object’s basic geons, we will be able to identify
the object (also see Biederman, 2001; Biederman & Cooper, 1991; Biederman et al.,
1993). A strength of Biederman’s theory is that it shows that we can recognize objects
based on a relatively small number of basic shapes. For example, we easily recognize
Figure 3.22a, which has nine geons, as an airplane, but even when only three geons are
present, as in Figure 3.22b, we can still identify an airplane.
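The idea that an object can be identified from only some of its geons can be illustrated with a toy sketch. The Python code below is not Biederman's model. The geon labels, the small object inventory, and the matching rule (pick the stored object that shares the most geons with what is visible, provided at least two match) are all hypothetical and were chosen only to show how partial geon information can still single out one object.

```python
# Toy illustration only: the geon labels and object definitions are invented.
OBJECTS = {
    "airplane": {"fuselage_cylinder", "left_wing_wedge", "right_wing_wedge",
                 "tail_fin_wedge", "nose_cone"},
    "flashlight": {"handle_cylinder", "head_cone", "switch_brick"},
    "mug": {"body_cylinder", "handle_curved_tube"},
}


def recognize(visible_geons, objects=OBJECTS, min_matches=2):
    """Return the stored object sharing the most geons with the input.

    Returns None when no stored object shares at least min_matches geons,
    which stands in for the case where too few geons can be identified.
    """
    visible = set(visible_geons)
    best_name, best_count = None, 0
    for name, geons in objects.items():
        count = len(geons & visible)  # number of geons in common
        if count > best_count:
            best_name, best_count = name, count
    return best_name if best_count >= min_matches else None


# Three of the five airplane geons are enough to pick out the airplane.
partial_view = {"fuselage_cylinder", "left_wing_wedge", "right_wing_wedge"}
print(recognize(partial_view))         # airplane

# A single visible geon falls below the threshold, so nothing is recognized.
print(recognize({"handle_cylinder"}))  # None
```

In this sketch, obscuring an object's geons corresponds to shrinking the set of visible geons, which is why recognition fails when too few remain.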
■ Figure 3.20 What is the object behind the mask?
(Adapted from “Recognition-by-Components: A Theory
of Human Image Understanding,” by I. Biederman, 1987,
Psychological Review, 94, 2, pp. 115–147, Figure 26,
Copyright © 1987 with permission from the author and
the American Psychological Association.)
■ Figure 3.21 The same object as in Figure 3.20 (a flashlight)
with the geons obscured. (Adapted from “Recognition-by-Components:
A Theory of Human Image Understanding,”
by I. Biederman, 1987, Psychological Review,
94, 2, pp. 115–147, Figure 25, Copyright © 1987 with
permission from the author and the American Psychological
Association.)
■ Figure 3.22 An airplane, as represented by (a) nine geons; (b) three geons. (Adapted from “Recognition-by-Components:
A Theory of Human Image Understanding,” by I. Biederman, 1987, Psychological
Review, 94, 2, pp. 115–147, Figure 13, Copyright © 1987 with permission from the author and the
American Psychological Association.)
Both feature integration theory and recognition-by-components theory are based
on the idea of early analysis of objects into parts. These two theories explain different
facets of object perception. Feature integration theory is more concerned with very
basic features like lines, curves, and colors, and with how attention is involved in combining
them, whereas recognition-by-components theory is more about how we perceive
three-dimensional shapes. Thus, both theories explain how objects are analyzed into
parts early in the perceptual process.
There is, however, more to perceiving objects than analyzing them into parts. We
will now consider another aspect of object perception, which focuses not on analysis
that occurs early in the perceptual process, but on how we organize elements of the
environment into separate objects.
Test Yourself 3.1
1. Describe the role of bottom-up and top-down processing as applied to Ellen seeing
the moth on the tree, to the rat–man demonstration, and to Palmer’s kitchen
experiment.
2. What is the basic idea behind the feature analysis approach to perception? Describe
the interactive activation model for recognizing letters. How do parts of this model
relate to what we know about physiology? How do the word units help explain the
word superiority effect?
3. Describe Treisman’s feature integration theory. How do her experiments on illusory
conjunctions support the idea that features are “free floating” in the preattentive
stage? What is the focused attention stage, and what is the evidence that attention is
important for combining the features?
4. Describe Biederman’s recognition-by-components theory. How is it similar to Treisman’s
theory, and how is it different?
Perceptual Organization: Putting Together an Organized World
What do you see in Figure 3.23? Take a moment and decide before reading further.
If you have never seen this picture before, you may just see a bunch of black
splotches on a white background. However, if you look closely you can see that the picture
is a Dalmatian facing to the left, with its nose to the ground. Once you have seen
the Dalmatian, it is hard not to see it. Your mind has achieved perceptual organization (the
organization of elements of the environment into objects) and has perceptually
organized the black areas into a Dalmatian. But what is behind this process? The
first psychologists to study this question were the Gestalt psychologists, who were
active in Europe beginning in the 1920s.
In Chapter 1, we described how, early in the 1900s, perception was explained by
an approach called structuralism, which involved adding up small, elementary units
called sensations. According to this idea, we see the two glasses in Figure 3.24a because
hundreds of tiny sensations, indicated by the dots in Figure 3.24b, add up to create our
perception of the glasses. But the Gestalt psychologists took a different approach. Instead
of looking at the glasses as a collection of tiny sensations, they considered the
overall pattern created by the glasses. According to the Gestalt approach, the pattern in
Figure 3.24a can potentially be perceived as representing a number of different objects,
as shown in Figures 3.25a, b, and c. But even though many different objects could have
created the pattern in Figure 3.24a, the fact that we automatically see the picture as two
separate glasses, as in Figure 3.25a, caused the Gestalt psychologists to ask what causes
us to organize our perception in this way. They answered this question by proposing
that the mind groups patterns according to rules that they called the laws of perceptual
organization.
Image not available due to copyright restrictions
The Gestalt Laws of Perceptual Organization
The laws of perceptual organization are a series of rules that specify how we perceptually
organize parts into wholes. Let’s look at six of the Gestalt laws.
■ Figure 3.24 (a) Two overlapping wine glasses; (b) each dot represents a sensation. According to
the structuralist approach, these individual sensations are combined to result in our
perception of the glasses.
■ Figure 3.25 Each of the objects in (a), (b), and (c) could have resulted in the perception in Figure
3.24a if arranged appropriately in relation to one another. The Gestalt psychologists pointed out
that we see the pattern as two glasses, as in (a), and proposed “laws of perceptual organization” to
explain why certain perceptions are more likely than others.
Pragnanz Pragnanz, roughly translated from the German, means “good figure.” The
law of Pragnanz, the central law of Gestalt psychology, which is also called the law
of good figure or the law of simplicity, states: Every stimulus pattern is seen in such a
way that the resulting structure is as simple as possible. The familiar Olympic symbol in
Figure 3.26a is an example of the law of simplicity at work. We see this display as five
circles and not as other, more complicated shapes such as the ones in Figure 3.26b. We
can also apply this law to the wine glasses in Figure 3.25. Seeing the pattern as two
glasses as in Figure 3.25a is much simpler than seeing it as the more complex objects in
Figures 3.25b and c.
Similarity Most people perceive Figure 3.27a as either horizontal rows of circles, vertical
columns of circles, or both. But when we change some of the circles to squares, as in
Figure 3.27b, most people perceive vertical columns of squares and circles. This perception
illustrates the law of similarity: Similar things appear to be grouped together. This law
causes the circles to be grouped with other circles and the squares to be grouped with
other squares. Grouping can also occur because of similarity of lightness (Figure 3.27c),
hue, size, or orientation.
■ Figure 3.26 Law of simplicity. We see five circles, as in (a), not the more complex array of nine
objects, as in (b).
■ Figure 3.27 Law of similarity. (a) This display can be perceived as either vertical columns or horizontal
rows; (b) this is more likely perceived as columns of squares alternating with columns of circles,
due to similarity of shape; (c) this is perceived as columns because of similarity of lightness.
Good Continuation We see the wire starting at A in Figure 3.28 as flowing smoothly to B. It
does not go to C or D because that path would involve making sharp turns and would
violate the law of good continuation: Points that, when connected, result in straight or
smoothly curving lines, are seen as belonging together, and the lines tend to be seen as following
the smoothest path. Another effect of good continuation is shown in the Celtic knot
pattern in Figure 3.29. In this case, good continuation ensures that we see a continuous
interwoven pattern that does not appear to be broken into little pieces every time
one strand overlaps another strand. Good continuation also helped us to perceive the
smoothly curving Olympic circles in Figure 3.26.
■ Figure 3.28 Good continuation helps us perceive two separate wires, even though they overlap.
(Points in the figure are labeled A, B, C, and D.)
Bruce Goldstein
■ Figure 3.29 We perceive this pattern as continuous interwoven strands because
of good continuation.
Proximity or Nearness Figure 3.30a is the pattern from Figure 3.27a that can be seen as either
horizontal rows or vertical columns or both. By moving the circles closer together,
as in Figure 3.30b, we increase the likelihood that the circles will be seen in horizontal
rows. This illustrates the law of proximity or law of nearness: Things that are near to
each other appear to be grouped together.
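The law of proximity can be expressed as a simple grouping rule. The Python sketch below is an illustration, not a model of the visual system. It assumes dots are given as (x, y) coordinates and uses a hypothetical `max_gap` threshold: any two dots no farther apart than `max_gap` are linked, and linked dots form a group. With dots spaced closely within rows and widely between rows, the groups that emerge are the horizontal rows.

```python
from math import dist


def group_by_proximity(points, max_gap):
    """Link points whose distance is at most max_gap; return the linked groups.

    Each group is a sorted list of (x, y) tuples. max_gap is a hypothetical
    threshold chosen for illustration, not an empirically derived value.
    """
    unvisited = set(points)
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = {start}, [start]
        while frontier:
            p = frontier.pop()
            # Collect every not-yet-grouped point that is close enough to p.
            near = {q for q in unvisited if dist(p, q) <= max_gap}
            unvisited -= near
            group |= near
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups)


# Three rows of four dots: 1 unit apart horizontally, 3 units apart vertically.
dots = [(x, y) for y in (0, 3, 6) for x in range(4)]
rows = group_by_proximity(dots, max_gap=1.5)
for row in rows:
    print(row)
```

Each printed group contains dots that share the same y value, so the three groups correspond to the three horizontal rows. If the horizontal and vertical spacings are made equal, every dot links to its neighbors in both directions and the rule no longer favors rows over columns.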
Common Fate The law of common fate states: Things that are moving in the same direction
appear to be grouped together. Thus, when you see a flock of hundreds of birds all flying
together, you tend to see the flock as a unit, and if some birds start flying in another
direction, this creates a new unit (Figure 3.31).
■ Figure 3.30 Grouping by
nearness. The pattern in (a) is
perceived as vertical columns
or horizontal rows, but when the
dots are near each other, as in
(b), the perception changes to
horizontal rows.
■ Figure 3.31 A flock of birds that
are moving in the same direction are
seen as grouped together. When a portion
of the flock changes direction, their
movement creates a new group. This
illustrates the law of common fate.
Familiarity According to the law of familiarity, things are more likely to form groups if
the groups appear familiar or meaningful (Helson, 1933; Hochberg, 1971). You can appreciate
how meaningfulness determines perceptual organization by doing the following
demonstration.
Demonstration
Finding Faces in a Landscape
Consider the picture in •Color Plate 3.3. At first glance this scene appears to contain mainly
trees, rocks, and water. But on closer inspection you can see some faces in the trees in the
background, and if you look more closely, you can see that a number of faces are formed by
various groups of rocks. See if you can find all 12 faces that are hidden in this picture.
In this demonstration some people find it difficult to perceive the faces at first, but
then suddenly they succeed. (Hint: The group of rocks at the bottom of the picture, just
slightly to the right of center, forms a face.) The change in perception from “rocks in a
stream” or “trees in a forest” into “faces” is a change in the perceptual organization of
the rocks and the trees. The two shapes that you at first perceive as two separate rocks
in the stream become perceptually grouped together when they become the left and
right eyes of a face. In fact, once you perceive a particular grouping of rocks as a face, it
is often difficult not to perceive them in this way—they have become permanently organized
into a face. This effect of meaning on perceptual organization is an example of
the operation of top-down processing in perception.
The Gestalt Laws Provide “Best Guess” Predictions
About What Is Out There
The purpose of perception is to provide accurate information about the properties of
the environment. The Gestalt laws help provide this information because they reflect
things we know from long experience in our environment and because we are using
them unconsciously all the time. For example, the law of good continuation reflects our
understanding that many objects in the environment have straight or smoothly curving
contours, so when we see smoothly curving contours, such as the wires in Figure 3.28,
we correctly perceive the two wires.
The Gestalt laws usually result in accurate perceptions of the environment, but not
always. We can illustrate a situation in which the Gestalt laws might cause an incorrect
perception by imagining the following: As you are hiking in the woods, you stop cold
in your tracks because, not too far ahead, you see what appears to be an animal lurking
behind a tree (Figure 3.32a). The Gestalt laws of organization play a role in creating this
perception. You see the two dark shapes to the left and right of the tree as a single object
because of the Gestalt law of similarity (because both shapes are dark, it is likely that
they are part of the same object). Also, good continuation links
these two parts into one, because the line along the top of the
object extends smoothly from one side of the tree to another.
Finally, the image resembles animals you’ve seen before. For all
of these reasons, it is not surprising that you perceive the two
dark objects as part of one animal.
Because you fear that the animal might be dangerous, you
take a different path, and as your detour takes you around the
tree, you notice that the dark shapes aren’t an animal after all,
but are two oddly shaped tree stumps (Figure 3.32b). So in this
case, the Gestalt laws have misled you.
Because the Gestalt laws do not always result in accurate
perceptions of the environment, it is more correct to call them
heuristics rather than laws. A heuristic is a “rule of thumb” that
provides a best-guess solution to a problem. Another way of
solving a problem, an algorithm, is a procedure guaranteed to
solve a problem. Examples of algorithms are the procedures
we learn for addition, subtraction, and long division. If we apply
these procedures correctly, we get the right answer every
time. In contrast, a heuristic may not result in a correct solution
every time.
To illustrate the difference between a heuristic and an algorithm,
let’s consider two different ways of finding a cat hiding
somewhere in the house. An algorithm for doing this would be
to systematically search every room in the house (being careful
not to let the cat sneak past you!). If you do this, you will eventually
find the cat, although it may take a while. A heuristic for
finding the cat would be to first look in the places where the cat
likes to hide. So you check under the bed and in the hall closet.
This may not always lead to finding the cat, but if it does, it has
the advantage of being faster than the algorithm.
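The cat example can be written out as two short search procedures. The Python sketch below is only an illustration. The layout of the house, the list of favorite hiding spots, and the function names are hypothetical, and the cat is assumed to stay put during the search. Each function returns where the cat was found (or `None`) together with the number of places checked.

```python
def exhaustive_search(house, target="cat"):
    """Algorithm: check every place in every room, in order.

    Finds the target if it is anywhere in the house, but may check many places.
    """
    checked = 0
    for room, places in house.items():
        for place, contents in places.items():
            checked += 1
            if contents == target:
                return (room, place), checked
    return None, checked


def heuristic_search(house, favorite_spots, target="cat"):
    """Heuristic: check only the spots where the target usually hides.

    Usually faster, but it can fail when the target is somewhere else.
    """
    checked = 0
    for room, place in favorite_spots:
        checked += 1
        if house.get(room, {}).get(place) == target:
            return (room, place), checked
    return None, checked


house = {
    "kitchen": {"cupboard": None, "table": None},
    "bedroom": {"under bed": None, "wardrobe": None},
    "hall": {"closet": "cat", "shelf": None},
}
favorites = [("bedroom", "under bed"), ("hall", "closet")]

print(exhaustive_search(house))            # (('hall', 'closet'), 5)
print(heuristic_search(house, favorites))  # (('hall', 'closet'), 2)

# If the cat hides somewhere unusual, the heuristic comes up empty.
house["hall"]["closet"] = None
house["kitchen"]["cupboard"] = "cat"
print(heuristic_search(house, favorites))  # (None, 2)
```

In the first case the heuristic finds the cat after two checks instead of five. In the second case it fails, while the exhaustive search would still succeed.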
The fact that heuristics are usually faster than algorithms helps explain why the
perceptual system is designed in a way that sometimes produces errors. Consider, for
example, what the algorithm would be for determining what the shape in Figure 3.32a
really is. The algorithm would involve walking around the tree so you can see the shape
from different angles, perhaps taking a closer look at the objects behind the
tree and maybe even poking them to see if they move. Although this may result in an
accurate determination of what the shapes are, it is potentially risky (what if the shape
actually is a dangerous animal?), and slow. The advantage of our Gestalt-based heuristics
is that they are fast and are correct most of the time.
■ Figure 3.32 (a) What lurks behind the tree? (b) It is two strangely shaped tree
stumps, not an animal!
The influence of knowledge and the top-down processing that accompanies knowledge
means that it is accurate to describe perception as being “intelligent.” This intelligence
becomes apparent when we bring our knowledge of faces to bear on the creation
of faces in the rocks and trees of •Color Plate 3.3. However, we could argue that there
is a certain intelligence behind even simpler processes, such as grouping by similarity
and nearness. The idea that these simple grouping processes could involve intelligence
is perhaps not obvious because they seem so automatic. In fact, people often react to
some of the Gestalt laws as if they are simply common sense. Our skeptic from the
beginning of the chapter, who thought perception was simple, might say, “Of course
things that are close to each other will become grouped. I don’t think there’s much intelligence
involved in that.”
It is easy to understand why someone might say this because these groupings usually
happen so easily and naturally that it doesn’t appear that much of anything is going
on. It is a case of perception appearing to just “happen.” But in reality there is a lot going
on, because the Gestalt laws are based on characteristics of our environment. Grouping
is easy because our perceptual system is tuned to respond to these characteristics, so when we encounter things
that commonly occur in the environment, we will be likely to perceive them accurately.
The need for perceptual intelligence becomes more obvious when we consider some of
the problems both computers and humans must solve in order to perceive objects.
Why Computers Have Trouble Perceiving Objects
At the beginning of the chapter, we noted that in the 1960s computer scientists predicted
the problem of perception would be easily solved. As it turned out, it wasn’t easy
at all, and although a chess-playing computer beat the world chess champion in 1997, it
wasn’t until 2005 that computer-controlled vehicles were able to successfully navigate
a course that involved avoiding obstacles while traveling over varied types of terrain
(Figure 3.33).
Image not available due to copyright restrictions