1
The Active Body
1.1 A Walk on the Wild Side
Honda’s Asimo (see ﬁg. 1.1) is billed, perhaps rightly, as the world’s
most advanced humanoid robot. Boasting a daunting 26 degrees of
freedom (2 on the neck, 6 on each arm, and 6 on each leg), Asimo is
able to navigate the real world, reach, grip, walk reasonably smoothly,
climb stairs, and recognize faces and voices. The name Asimo stands (a
little clumsily perhaps) for Advanced Step in Innovative Mobility. And
certainly, Asimo is an incredible feat of engineering, still relatively short
on brainpower but high on mobility and maneuverability.
As a walking robot, however, Asimo is far from energy efﬁcient. For
a walking agent, one way to measure energy efﬁciency is by the so-called
speciﬁc cost of transport (Tucker 1975)—namely, “the amount of energy
required to carry a unit weight a unit distance.”1
The lower the number,
the less energy is required to shift a unit of weight a unit of distance.
Asimo rumbles in with a speciﬁc cost of transport of about 3.2, whereas
we humans display a speciﬁc metabolic cost of transport of about 0.2.
What accounts for this massive difference in energetic expenditure?
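Tucker's measure is a dimensionless ratio: energy divided by the product of weight and distance. A minimal worked sketch follows, using the standard form of the definition. The 70 kg mass and 1 km distance are illustrative assumptions, not figures from the studies cited.

```latex
% Specific cost of transport: E = energy used, m = mass, g = 9.81 m/s^2,
% d = distance travelled, \bar{P} = mean power, \bar{v} = mean speed
c_t = \frac{E}{m g d} = \frac{\bar{P}}{m g \bar{v}}

% Ratio of the two values quoted above
\frac{c_t^{\mathrm{Asimo}}}{c_t^{\mathrm{human}}} \approx \frac{3.2}{0.2} = 16

% Illustrative (assumed) walker: m = 70 kg, d = 1 km
E_{\mathrm{human}} = 0.2 \times 70\,\mathrm{kg} \times 9.81\,\mathrm{m/s^2} \times 1000\,\mathrm{m} \approx 1.4 \times 10^{5}\,\mathrm{J}
E_{c_t = 3.2} = 16 \times E_{\mathrm{human}} \approx 2.2 \times 10^{6}\,\mathrm{J}
```

On these assumptions, moving the same body the same distance at Asimo's specific cost would take roughly sixteen times the energy.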
Whereas robots like Asimo walk by means of very precise, and
energy-intensive, joint-angle control systems, biological walking agents
make maximal use of the mass properties and biomechanical couplings
4 from embodiment to cognitive extension
present in the overall musculoskeletal system and walking apparatus
itself. Wild walkers thus make canny use of so-called passive dynamics,
the kinematics and organization inhering in the physical device alone
(McGeer 1990). Pure passive-dynamic walkers are simple devices that
boast no power source apart from gravity and no control system apart
from some simple mechanical linkages such as a mechanical knee and
the pairing of inner and outer legs to prevent the device from keeling
over sideways. Yet despite (or perhaps because of) this simplicity, such
devices are capable, if set on a slight slope, of walking smoothly and
with a very realistic gait. The ancestors of these devices are, as Collins,
Wisse, and Ruina (2001) nicely document, not sophisticated robots but
children’s toys, some dating back to the late 19th century. These toys
stroll, walk, or waddle down ramps or when pulled by string (see ﬁg.
1.2). Such toys have minimal actuation and no control system. Their
walking is a consequence not of complex joint-movement planning and
actuating but of basic morphology (the shape of the body, the distribution
of linkages and weights of components, etc.). Behind the passive-dynamic
approach thus lies the compelling thought that
locomotion is mostly a natural motion of legged mechanisms,
just as swinging is a natural motion of pendulums. Stiff-legged
walking toys naturally generate their comical walking motions.
This suggests that human-like motions might come naturally to
human-like mechanisms. (Collins, Wisse, and Ruina 2001, 608)
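The idea that a steady gait can fall out of morphology plus gravity, with no controller at all, can be sketched with the "rimless wheel," a standard textbook simplification of passive-dynamic walking (a point mass on a hub with rigid spokes, rolling down a slope). It is swapped in here for illustration and is not the Collins, Wisse, and Ruina device. The sketch assumes a point-mass hub, instantaneous inelastic spoke impacts, and made-up values for spoke spacing and slope.

```python
import math

def rimless_wheel_steps(omega0, alpha, gamma, g=9.81, leg=1.0, n_steps=30):
    """Iterate the step-to-step map of a rimless wheel rolling down a slope.

    alpha: half the angle between adjacent spokes (rad)
    gamma: slope angle (rad)
    omega0: angular speed just after a spoke strikes the ground (rad/s)
    Returns the post-impact angular speed at each step. No control is applied.
    """
    speeds = [omega0]
    omega = omega0
    for _ in range(n_steps):
        # If gamma < alpha the hub starts behind the stance foot and must be
        # carried over the top; with too little speed the wheel rocks back.
        if gamma < alpha and omega ** 2 < 2 * (g / leg) * (1 - math.cos(gamma - alpha)):
            break
        # Stance: energy is conserved as the hub drops from
        # leg*cos(gamma - alpha) to leg*cos(gamma + alpha).
        omega_end_sq = omega ** 2 + 2 * (g / leg) * (
            math.cos(gamma - alpha) - math.cos(gamma + alpha))
        # Impact: the next spoke lands and the speed is scaled by cos(2*alpha).
        omega = math.cos(2 * alpha) * math.sqrt(omega_end_sq)
        speeds.append(omega)
    return speeds

def steady_speed(alpha, gamma, g=9.81, leg=1.0):
    """Closed-form fixed point of the step-to-step map above."""
    return math.sqrt(4 * (g / leg) * math.sin(alpha) * math.sin(gamma)) / math.tan(2 * alpha)

for start in (1.0, 2.0, 3.0):
    # Each run settles near steady_speed(math.pi / 8, 0.08), whatever the start.
    print(rimless_wheel_steps(start, math.pi / 8, 0.08)[-1])
```

Different starting speeds converge to the same gait: the energy gained going downhill balances the energy lost at each impact. The steady "walk" is an attractor of the body's own dynamics, not the output of a plan.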
FIGURE 1.1 Honda’s Asimo robot.
(http://asimo.honda.com/gallery.aspx;
by permission of Honda Corporation)
Collins, Wisse, and Ruina (2001) built the ﬁrst such device to mimic
humanlike walking by adding curved feet, a compliant heel, and
mechanically linked arms to the basic design pioneered by McGeer
(1990). In action (see ﬁg. 1.3), the device exhibits good, steady motion
and is described by its creators as “pleasing to watch” (Collins, Wisse, and Ruina
2001, 613). By contrast, robots that make extensive use of powered operations
and joint-angle control tend to suffer from “a kind of rigor mortis
[because] joints encumbered by motors and high-reduction gear
trains...make joint movement inefﬁcient when the actuators are on
and nearly impossible when they are off” (607).
What, then, of powered locomotion? Once the body itself is
“equipped” with the right kind of passive dynamics, powered walking
can be brought about in a remarkably elegant and energy-efﬁcient way.
In essence, the tasks of actuation and control have now been massively
reconﬁgured so that powered, directed locomotion can come about
by systematically pushing, damping, and tweaking a system in which
passive-dynamic effects still play a major role. The control design is
delicately geared to utilize all the natural dynamics of the passive baseline,
and the actuation is consequently efﬁcient and ﬂuid.
Some of the core ﬂavor of such a solution is captured by the broader
notion of “ecological control,”2
where an ecological control system is
one in which goals are not achieved by micromanaging every detail
of the desired action or response but by making the most of robust,
FIGURE 1.2 Fallis’s (1888) clever
implementation of counterswinging
arms. The entire toy is made from two
pieces of wire. Each wire makes up
a leg, a bearing, an axle, and an arm.
One wire also has a head and the other
a body of sorts. (S. Collins, M. Wisse
and A. Ruina, “A Three-dimensional
Passive-dynamic Walking Robot with
Two Legs and Knees,” The International
Journal of Robotics Research 20, no. 7
[July 2001]: 607–615, © 2001 Sage
Publications, by permission)
reliable sources of relevant order in the bodily or worldly environment
of the controller. In such cases,
part of the “processing” is taken over by the dynamics of the
agent-environment interaction, and only sparse neural control
needs to be exerted when the self-regulating and stabilizing
properties of the natural dynamics can be exploited. (Pfeifer
et al. 2006, 7)
A nice example is the use of sparse, well-timed control signals
to support the “rolling and rising” motion (see ﬁg. 1.4) of a robot
that must raise itself up from a prone position (Kuniyoshi et al.
2004). Another is Iida and Pfeifer’s (2004) work on the running robot
Puppy. Puppy has springs (roughly mimicking some of the special
properties of a muscle-tendon system) connecting the lower and
upper parts of each leg, has pressure sensors on each foot, and beneﬁts
from just a few built-in powered oscillatory movements. These
simple inbuilt oscillatory movements nonetheless lead, in the special
context provided by the sprung body, to ﬂuent running and scampering
behavior. Even the simple fact that Puppy has aluminum legs
and feet plays an “adaptive” role, for it leads to small amounts of
slippage on most surfaces. This might seem like a bad thing, but
reducing the slippage by adding rubber pads to the feet caused the
robot to begin to fall over: The subtle slippage was actually playing
a stabilizing role, effectively enabling the robot to rapidly search
for a stable way to proceed (see Pfeifer and Bongard 2007, 96–100,
125–128, for discussion).
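The flavor of "sparse, well-timed control" can be shown with a toy that is not drawn from Kuniyoshi et al. or from Puppy: a damped pendulum (think of a child's swing) that gets one small push each time it passes the bottom of its arc. All parameter values are made up. The point is only that a handful of correctly timed nudges, riding on the system's own dynamics, can sustain motion that would otherwise die away.

```python
import math

def swing(kick, theta0=0.3, damping=0.1, g=9.81, length=1.0, dt=1e-3, t_end=60.0):
    """Damped pendulum; if kick > 0, add a small push each time it passes the bottom.

    Returns (peak angle over the final 10 s, number of pushes, number of time steps).
    """
    theta, omega = theta0, 0.0
    steps = int(round(t_end / dt))
    pushes = 0
    late_peak = 0.0
    for i in range(steps):
        # Semi-implicit Euler: update velocity first, then position.
        omega += (-(g / length) * math.sin(theta) - damping * omega) * dt
        new_theta = theta + omega * dt
        if kick > 0 and theta * new_theta < 0:   # just swung through the bottom
            omega += math.copysign(kick, omega)  # push along the direction of motion
            pushes += 1
        theta = new_theta
        if (i + 1) * dt > t_end - 10.0:
            late_peak = max(late_peak, abs(theta))
    return late_peak, pushes, steps

with_pushes = swing(kick=0.05)
without_pushes = swing(kick=0.0)
print(with_pushes, without_pushes)
```

With pushes, the swing keeps a sizable amplitude while the controller acts on only about one time step in a thousand; without them, the same body settles toward rest.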
In subsequent chapters, we shall encounter ecological control
style solutions for problems ranging all the way from perceptuomotor
FIGURE 1.3 Pure passive dynamic walker in action. (S. Collins, M. Wisse,
and A. Ruina, “A Three-dimensional Passive-dynamic Walking Robot with
Two Legs and Knees,” The International Journal of Robotics Research 20, no. 7
[July 2001]: 607–615, © 2001 Sage Publications, by permission)
response to reﬂection, recall, and deliberation. To capture such effects,
Pfeifer and Bongard (2007) invoke the Principle of Ecological Balance.3
This principle states
ﬁrst...that given a certain task environment there has to be a
match between the complexities of the agent’s sensory, motor,
and neural systems...second....that there is a certain balance
or task-distribution between morphology, materials, control,
and environment. (123)
The “matching” of sensors, morphology, motor system, materials,
controller, and ecological niche yields a spread of responsibility for efﬁcient
adaptive response in which “not all the processing is performed
by the brain, but certain aspects of it are taken over by the morphology,
materials, and environment [yielding] a ‘balance’ or task-distribution
between the different aspects of an embodied agent” (see Pfeifer et al.
2006). In such cases, the details of embodiment may take over some
of the work that would otherwise need to be done by the brain or the
neural network controller, an effect that Pfeifer and Bongard (2007, 100)
aptly describe as “morphological computation.”
The exploitation of passive-dynamic effects exempliﬁes one of several
key characteristics of the embodied, embedded approach that we
will encounter as the chapter progresses. This ﬁrst characteristic has
been called nontrivial causal spread. Nontrivial causal spread (see Clark
1998b; Wheeler and Clark 1999; Wheeler 2005) occurs whenever something
we might have expected to be achieved by a certain well-demarcated
system turns out to involve the exploitation of more far-ﬂung factors and forces.4
For the Mississippi alligator, the temperature of the rotting vegetation
in which it lays its eggs determines the sex of its offspring. This is an
example of nontrivial causal spread. When the passive dynamics of
the actual legs and body take care of many of the demands that we
FIGURE 1.4 Sparse but well-timed control signals enable ﬂuent,
energy-efﬁcient roll and rise motion. (Work by Kuniyoshi et al. [2004];
ﬁgure from Y. Ohmura, by permission)
might otherwise have ceded to an energy-hungry joint-angle control
system, we likewise encounter nontrivial causal spread. One of the big
lessons of contemporary robotics is that the coevolution of morphology
(which can include sensor placement, body plan, and even the choice of
basic building materials, etc.) and control yields a truly golden opportunity
to spread the problem-solving load between brain, body, and
world.5
Robotics thus rediscovers many ideas explicit in the continuing
tradition of J. J. Gibson and of “ecological psychology.”6
Thus, William
Warren, commenting on a quote from Gibson (1979), suggests that
biology capitalizes on the regularities of the entire system as
a means of ordering behavior. Speciﬁcally, the structure and
physics of the environment, the biomechanics of the body, perceptual
information about the state of the agent-environment
system, and the demands of the task all serve to constrain the
behavioral outcome. (2006, 358)
Such causal spread may be wholly evolved or engineered, wholly
learned, or some combination of the two. For example, some control
systems are able to actively learn strategies that make the most of
passive-dynamic opportunities. An example is the Toddler robot, a
walking robot that learns (using so-called actor-critic reinforcement
learning) a control policy that exploits the passive dynamics of the body
(fig. 1.5). The Toddler robot, which features among the pack of passive-dynamics-based
robots described in Collins et al. (2005), can learn to
change speeds, go forward and backward, and adapt on the go to different
terrains, including bricks, wooden tiles, carpet, and even a variable
speed treadmill. And as you’d expect, the use of passive dynamics
FIGURE 1.5 The Toddler robot, by Russ
Tedrake, Teresa Zhang, and H. Sebastian
Seung. The robot learns a control policy
that exploits the passive dynamics of its
own body. (Photo by Teresa Zhang, by
permission)
cuts power consumption to about one-tenth that of a standard robot
like Asimo. The passive-dynamics-based robot described in Collins and
Ruina (2005) similarly achieved a speciﬁc cost of transport of around
0.20, again around an order of magnitude lower than Asimo and quite
comparable to the human case. The discrepancy here is thought not
to be signiﬁcantly reducible by further technological advance using
Asimo-style control strategies (i.e., ones that do not exploit passive-dynamic
effects). An apt comparison, Collins and Ruina suggest, is
with the energy consumption of a helicopter versus airplane or glider.
The helicopter, however well designed it may be, will still consume
vastly more energy per unit distance traveled.
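For readers unfamiliar with the actor-critic idea mentioned in connection with the Toddler robot, here is a deliberately tiny sketch. It is not Tedrake, Zhang, and Seung's learner, which works on the robot's full state. Instead it learns a single hypothetical "push strength" under a made-up reward. The actor is a Gaussian policy; the critic keeps a running estimate of expected reward, and the difference between actual and expected reward (the TD error) drives both updates. With one state and one-step episodes this is equivalent to a policy gradient with a learned baseline.

```python
import random

def actor_critic(target=0.7, episodes=5000, sigma=0.2,
                 actor_lr=0.01, critic_lr=0.1, seed=0):
    """One-state actor-critic: learn the mean of a Gaussian policy over a push strength."""
    rng = random.Random(seed)
    mu = 0.0      # actor: mean push strength (hypothetical quantity)
    value = 0.0   # critic: running estimate of expected reward
    for _ in range(episodes):
        action = rng.gauss(mu, sigma)              # try a push near the current mean
        reward = -(action - target) ** 2           # made-up reward: closeness to an ideal push
        td_error = reward - value                  # one-step episodes: no bootstrapped next-state term
        value += critic_lr * td_error              # critic update
        # Actor update: TD error times the gradient of log N(action; mu, sigma) w.r.t. mu.
        mu += actor_lr * td_error * (action - mu) / sigma ** 2
    return mu, value

print(actor_critic())
```

The learned mean drifts toward the push strength that the (assumed) reward favors, without any model of why that push works.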
1.2 Inhabited Interaction
Let’s switch gears, brieﬂy, to ask what it might be like to be an agent
embodied according to these very different sets of principles. What
would it feel like to be an intelligent, conscious version of Asimo and,
contrariwise, to be an intelligent, conscious version of a fully trained
Toddler robot? In the latter case, might it not feel (all other things
being equal) as if, with little effort and a simple act of will, directed
bodily motion is achieved? In the former, the efforts are large and the
body may perhaps be encountered as a complex, resistant object in
need of much ongoing energetic micromanagement. Over time, perhaps,
control can be streamlined, though energy consumption (as in
the case of the helicopter) will still remain high. Nonetheless, the successful
exploitation of passive-dynamic effects may well be a major
contributing element to what Dourish (2001) nicely calls “inhabited
interaction,” a way of being in the world that is contrasted with “disconnected
control.” Here is how Dourish describes the difference,
using present-day (i.e., still fairly clunky) virtual-reality systems as a
point of comparison:
Even in an immersive virtual-reality environment, users are
disconnected observers of a world they do not inhabit directly.
They peer out at it, ﬁgure out what’s going on, decide on some
course of action, and enact it through the narrow interface of
the keyboard or the data-glove, carefully monitoring the result
to see if it turns out the way they expected. Our experience in
the everyday world is not of that sort. There is no homunculus
sitting inside our heads, staring out at the world through our
eyes, enacting some plan of action by manipulating our hands,
and checking carefully to make sure we don’t overshoot when
reaching for the coffee cup. We inhabit our bodies and they in
turn inhabit the world, with seamless connections back and
forth. (2001, 102)
It seems unlikely that immersive virtual reality (VR) is by its very
nature disconnected in this sense. Rather, it is just one more domain
in which a skilled agent may act and perceive. But skill matters, and
most of us are as yet unskilled in such situations. Moreover, the modes
of sensing and interaction supported by current technologies often
remain limited and clumsy, and this turns the user experience into that
of a kind of alert game player rather than that of an agent genuinely
located inside the virtual world.
It is worth noticing, however, that to the young human infant, the
physical body itself may often share some of this problematic character.
The infant, like the VR-exploring adult, must learn how to use
initially unresponsive hands, arms, and legs to obtain its goals (for
some detailed studies, see Thelen and Smith 1994). In so doing, the
infant, like the Toddler robot, learns to make the most of the complex
evolved morphology and passive dynamics of its own body.
These have been selected so as to dramatically reduce the “gap” that
needs to be bridged by the addition of energy and the imposition of
control.
With time and practice, enough bodily ﬂuency is achieved to
make the wider world itself directly available as a kind of unmediated
arena for embodied action. At this point, the extrabodily world
becomes poised to present itself to the user not just as a problem space
(though it is clearly that) but also as a problem-solving resource. For
(as we’ll see in more detail in chaps. 2–4) the world, especially when
encountered via inhabited interaction, is a place in which we can act
ﬂuently in ways that simplify or transform the problems that we want
to solve. At such moments, the body has become “transparent equipment”
(Heidegger 1927/1961): equipment (the classic example is the
hammer in the hands of the skilled carpenter) that is not the focus of
attention in use. Instead, the user “sees through” the equipment to the
task in hand. When you sign your name, the pen is not normally your
focus (unless it is out of ink etc.). The pen in use is no more the focus
of your attention than is the hand that grips it. Both are transparent
equipment.7
Doubtless, transparency of this kind may be achieved, with practice,
without the large-scale exploitation of passive-dynamic effects.8
But one way in which evolved agents truly inhabit, rather than simply
control, their bodies may be usefully understood in terms of a profound
fit between morphology and control. This kind of fit is exhibited
by the wild walking systems devised by biological evolution and,
in compelling microcosm, by autonomous, passive-dynamics-based
walking robots.
1.3 Active Sensing
Suppose you were asked to solve the puzzle shown in ﬁgure 1.6. In this
task (Ballard et al. 1997), you are given a model pattern of colored blocks
that you are asked to copy by moving similar blocks from a reserve area
to a new workspace. Using the spare blocks in the reserve area, your
task is to re-create the pattern by moving one block at a time from the
reserve to the new version you are busy creating. The task is performed
using mouse clicks and drags on a computer screen. As you perform,
eye-tracker technology is monitoring exactly where and when you are
looking at different bits of the puzzle.
What problem-solving strategy do you think you would use?
One neat strategy might be to look at the target, decide on the color
FIGURE 1.6 Copying a single block within the task (panels: Model, Workspace,
Resource; eye and hand traces). The eye-position
trace is shown by the cross and the dotted line. The cursor trace is shown
by the arrow and the dark line. The numbers indicate corresponding
points in time for the eye and hand traces. (From Ballard et al. 2001, by
permission)
and position of the next block to be added, and then execute the plan
by moving a block from the reserve area. This is, for example, pretty
much the kind of strategy you’d expect of a classical artiﬁcial intelligence
planning system (e.g., STRIPS—the Stanford Research Institute
Problem Solver) as used by the early mobile robot Shakey; see Nilsson
(1984) for a thorough retrospective review.
When asked how we would solve the problem, many of us pay lip
service to this neat and simple strategy. But the lips tell one story while
the hands and eyes tell another. For this is emphatically not the strategy
used by most human subjects. What Ballard et al. found was that
repeated rapid saccades (spontaneous scanning eye movements) to the
model were used in the performance of the task, and many more than
you might expect. For example, the model is consulted both before and
after picking up a block, suggesting that when glancing at the model,
the subject stores only one piece of information: either the color or the
position of the next block to be copied.
To test this hypothesis, Ballard et al. used a computer program to
alter the color of a block while the subject was looking elsewhere. For
most of these interventions, subjects did not notice the changes even
for blocks and locations that had been visited many times before or that
were the focus of the current action. This conﬁrmed that when glancing
at the model, the subject stores only one piece of information: either the
color or the position of the next block to be copied (not both). In other
words, even when repeated saccades are made to the same site, very
minimal information is retained. Instead, repeated ﬁxations provide
speciﬁc items of information “just in time” for use. The experimenters
conclude that
in the block-copying paradigm...ﬁxation appears to be tightly
linked to the underlying processes by marking the location
at which information (e.g., color, relative location) is to be
acquired, or the location that speciﬁes the target of the hand
movement (picking up, putting down). Thus ﬁxation can be
seen as binding the value of the variable currently relevant for
the task. (Ballard et al. 1997, 734)
Two morals matter for the story at hand. The ﬁrst is that visual
ﬁxation is here playing an identiﬁable computational role. As Ballard
et al. (1997) comment, “Changing gaze is analogous to changing the
memory reference in a silicon computer” (725). (These uses of ﬁxation
are thus described using the term “deictic pointers.”) The second is
that repeated saccades to the physical model thus allow the subject to
deploy what Ballard et al. dub “minimal memory strategies” to solve
the problem. The idea is that the brain creates its programs so as to
minimize the amount of working memory that is required and that
eye motions are here recruited to place a new piece of information into
memory. Indeed, by altering the task demands, Ballard et al. were also
able to systematically alter the particular mixes of biological memory
and active, embodied retrieval recruited to solve different versions
of the problem. They conclude that, in this kind of task at least, “eye
movements, head movements, and memory load trade off against each
other in a ﬂexible way” (732).
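The just-in-time strategy Ballard et al. describe can be sketched as follows. This is a simplification for illustration, not their model. Each read of `model[slot]` stands in for a saccade to the physical model, and the list `working_memory` stands in for what the subject holds internally. The model is a list of hypothetical `(color, position)` pairs, and `reserve` lists available block colors.

```python
def copy_pattern(model, reserve):
    """Copy a block pattern while holding at most one item in working memory.

    model: list of (color, position) pairs that stays out in the world
    reserve: list of available block colors (mutated as blocks are taken)
    Returns (workspace, number of model fixations, peak working-memory load).
    """
    workspace = {}
    fixations = 0
    peak = 0
    for slot in range(len(model)):
        working_memory = []
        working_memory.append(model[slot][0])     # fixate model: bind the color only
        fixations += 1
        peak = max(peak, len(working_memory))
        # Fixate reserve and pick up a matching block; the hand now carries the color.
        held = reserve.pop(reserve.index(working_memory.pop()))
        working_memory.append(model[slot][1])     # fixate model again: bind the position only
        fixations += 1
        peak = max(peak, len(working_memory))
        workspace[working_memory.pop()] = held    # fixate workspace and drop the block
    return workspace, fixations, peak

model = [("red", (0, 0)), ("blue", (0, 1)), ("red", (1, 0))]
print(copy_pattern(model, ["blue", "red", "green", "red"]))
```

The trade-off is visible in the return values: two fixations per block buy a memory load that never exceeds a single item.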
This is our ﬁrst example of another important characteristic of
embodied, embedded cognition, one that may be called the Principle
of Ecological Assembly (PEA). According to the PEA, the canny cognizer
tends to recruit, on the spot, whatever mix of problem-solving resources will
yield an acceptable result with a minimum of effort. The PEA deliberately
echoes Pfeifer and Scheier’s Principle of Ecological Balance (see sec.
1.1). Pfeifer and Scheier are, however, most interested in the slowly
evolved match among sensory, motor, and neural capabilities and hence
between the organismic bundle and its ecological niche. The PEA, by
contrast, tracks a kind of near-instantaneous version of such overall
balance: the balanced use of a set of potentially highly heterogeneous
resources assembled on the spot to solve a given problem. Ecological
balance of this latter kind is what a ﬂexible ecological control system
seeks to achieve (sec. 1.1).
It is important that, according to the PEA, the recruitment process
marks no special distinction among neural, bodily, and environmental
resources except insofar as these somehow affect the total effort
involved. Though the principle itself seems obvious enough, it is actually
far from obvious how best to unpack the notion of effort so as to
make sense of the idea of trading off one kind of effort (e.g., recall from
biological memory) against another very different kind of effort, such as
the production of a head or eye motion that (let’s assume) retrieves the
very same information. As our discussion progresses, we will encounter
various attempts (see especially chaps. 7 and 9) to make quantitative
sense of this important but elusive notion of trade-offs among multiple
heterogeneous sources of information and order.
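Purely to make the shape of the PEA concrete, the sketch below assumes that "effort" is an additive sum of a per-fixation cost and a per-item memory cost. That assumption is exactly what the text flags as problematic, and the strategy names and cost numbers are hypothetical. Given such costs, the agent simply recruits whichever mix is cheapest.

```python
def cheapest_strategy(n_blocks, look_cost, memory_cost):
    """Pick the block-copying strategy with the lowest (assumed, additive) effort."""
    strategies = {
        # name: (model fixations, peak items held in working memory)
        "one item per glance": (2 * n_blocks, 1),
        "color and position per glance": (n_blocks, 2),
        "memorize whole model": (1, 2 * n_blocks),
    }

    def effort(name):
        fixations, memory = strategies[name]
        return fixations * look_cost + memory * memory_cost

    return min(strategies, key=effort)

# As looking gets more expensive (say, the model is moved farther away),
# the cheapest mix shifts from repeated glances toward internal memory.
for look_cost in (1, 5, 50):
    print(look_cost, cheapest_strategy(8, look_cost, memory_cost=10))
```

Nothing in the selection rule distinguishes neural from bodily resources; only the assumed costs matter, which is the feature of the PEA stressed above.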
1.4 Distributed Functional Decomposition
The Ballard et al. model is also our ﬁrst example of an explanatory
strategy that may usefully be called distributed functional decomposition
(DFD). Distributed functional decomposition is a way of understanding
the capacities of supersized mechanisms (ones created by the interactions
of biological brains with bodies and aspects of the local environment)
in terms of the ﬂow and transformation of energy, information,
control, and where applicable, representations.9
The use of the term
functional in distributed functional decomposition is meant to remind
us that even in these larger systems, it is the roles played by various
elements, and not the speciﬁc ways those elements are realized, that
do the explanatory work. (This should not be contentious: Even in the
case of Puppy’s aluminum legs, it is not the material itself that matters
as much as the slippage and give that it provides; sec. 1.1.) The
goal, familiar enough from traditional internalist approaches, is thus
to display some target performance as the outcome of an interacting
multitude of unintelligent (“mechanical”) interactions and effects but
to do so relative to a larger organizational whole. (Imagine, to take a
maximally simple case, an algorithm for addition that uses the agent’s
actual ﬁnger positions as a temporary storage buffer for key intermediate
results.) Such approaches recognize the important contributions
that embodiment and environmental embedding can make to the solution
of a problem and then seek to understand those contributions by
identifying the role of speciﬁc operations (perhaps some gross bodily,
some environment involving, and some neural) in real-time performance
of the task.
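The "maximally simple case" of an addition routine that uses finger positions as a storage buffer might be sketched like this. The `Fingers` class and the counting-on procedure are illustrative assumptions, not a model from the literature. The inner routine only ever says "the next number," while the running tally of how much has been added lives on the hand and is read off by looking.

```python
class Fingers:
    """Ten fingers used as an external buffer (hypothetical model)."""

    def __init__(self):
        self.up = [False] * 10

    def raise_one(self):
        self.up[self.up.index(False)] = True   # raise the next lowered finger

    def look(self):
        return sum(self.up)                    # read the buffer by glancing at the hand


def count_on(start, addend, fingers):
    """Add a small number by counting on, tracking progress on the fingers."""
    if not 0 <= addend <= len(fingers.up):
        raise ValueError("addend must fit on the fingers")
    total = start
    while fingers.look() < addend:   # how many have I added so far? check the hand
        total += 1                   # inner step: say the next number
        fingers.raise_one()          # outer step: record progress externally
    return total

hand = Fingers()
print(count_on(7, 5, hand), hand.look())
```

In DFD terms, the explanatory unit is the whole loop: the functional role of "counter of steps taken" is played by the hand, not by any neural register.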
Ballard et al. explicitly recognize this element in their approach,
commenting that their model “strongly suggests a functional view of
visual computation where different operations are applied at different
stages during a complex task” (1997, 735). As a result, a Ballard-style
approach is able
to combine the concept that looking is a form of doing with the
claim that vision is computation [integrating the two points by]
introducing the idea that eye movements constitute a form of
deictic coding...that allow perceivers to exploit the world as a
kind of external storage device. (Wilson 2004, 176–177)
Bodily actions here appear as among the means by which certain
(in this case, quite familiar) computational and representational operations
are implemented. The difference is just that the operations are
realized not in the neural system alone but in the whole embodied system
located in the world.
Ballard et al. (1997) suggest using the term “the embodiment
level” to indicate the level at which functionally critical operations
occur at timescales of around one-third second. This corresponds,
nonaccidentally, to the observed frequency of saccades and is, the
authors claim, the timescale at which “the natural sequentiality of
body movements can be matched to the natural computational economies
of sequential decision systems through a system of implicit
reference (called deictic) in which pointing movements are used to
bind objects in the world to cognitive programs” (723). Although
this time frame is doubtlessly important, especially for the speciﬁc
kinds of tasks the authors investigate, I here avoid the identiﬁcation
of (what’s computationally crucial about) embodiment with any speciﬁc
temporal or spatial window. As we shall see later in the text,
body and world play varied and crucial roles at many (often interacting)
timescales.
1.5 Sensing for Coupling
Finally, it is worth pausing to reﬂect on the role of sensing in the Ballard
et al. block-copying scenario. For sensing here plays an importantly different
role to the one associated with classical planning and reasoning.
In the classical model, the role of sensing is to get as much information
into the system as is needed to solve the problem. For example, a planning
agent might scan the environment to build up a problem-sufﬁcient
model of what’s out there and where it is located, at which point the
reasoning engine can effectively throw away the world and operate
instead upon the inner model, planning and then executing a response
(perhaps checking now and then during execution to be sure that nothing
has changed). In the block-copying scenario, by contrast, the agent
does not use sensing to build up a rich inner model sufﬁcient to solve
the problem. Rather, sensing is used repeatedly, with the external scene
functioning as an information store to be called upon just in time for
the task fragment at hand. During all this, the external, screen-based
model acts as “its own best model” (to adapt the famous usage from
roboticist Rodney Brooks; see, e.g., Brooks 1991). Sensing here acts as a
constantly available channel that productively couples agent and environment
rather than as a kind of “veil of transduction” whereby world-originating
signals must be converted into a persisting inner model of
the external scene.
For an even more dramatic illustration of this possibility, consider
the now-classic example of running to catch a ﬂy ball in baseball. Giving
perception its standard role, we might assume that the job of the visual
system is to transduce information about the current position of the ball
so as to allow a reasoning system to project its future trajectory. Here,
too, however, nature looks to have found a more elegant and efﬁcient
solution: You simply run so that the optical image of the ball appears
to present a straight-line constant speed trajectory against the visual
background (McBeath, Shaffer, and Kaiser 1995). This solution (the so-called
LOT, for Linear Optical Trajectory, model) exploits a powerful
invariant in the optic ﬂow, discussed in Lee and Reddish (1981). There
is, however, now some debate concerning the precise nature of the
simple invariant we lock onto in solving this kind of problem.10
Thus,
McLeod, Reed, and Dienes (2001, 2002) reported data that conﬂict with
the predictions of the simple LOT model and that seem better predicted
by an Optical Acceleration Cancellation (OAC) model ﬁrst suggested
by Chapman (1968). Shaffer et al. (2003) offer a mixed model combining
uses of both strategies. For present purposes, however, the point
is simply that the canny use of data available in the optic ﬂow enables
the catcher to sidestep the need to create a rich inner model to calculate
the forward trajectory of the ball. In more recent work, multiple uses
of the LOT approach seem to offer a better account of how dogs catch
Frisbees, a more demanding task due to occasional dramatic ﬂuctuations
in the ﬂight path (see Shaffer et al. 2004).
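The optical cue behind the OAC account can be checked with a short sketch. It assumes a drag-free ball launched from ground level toward a stationary fielder, with hypothetical launch speeds. For a fielder standing exactly at the landing spot, the tangent of the ball's elevation angle rises linearly in time, so its "optical acceleration" is zero; standing too far out makes it decelerate, and standing too close makes it accelerate. The fielder needs no estimate of where the ball will land, only the sign of this optical quantity.

```python
G = 9.81  # gravitational acceleration, m/s^2

def tan_elevation(t, fielder_x, vx, vy):
    """tan of the ball's elevation angle seen from a fielder standing at fielder_x.

    The ball leaves x = 0 at height 0 at t = 0 with velocity (vx, vy); no air drag.
    """
    ball_x = vx * t
    ball_y = vy * t - 0.5 * G * t * t
    return ball_y / (fielder_x - ball_x)

def optical_acceleration(t, fielder_x, vx, vy, dt=0.01):
    """Central second difference of tan(elevation) with respect to time."""
    f = lambda s: tan_elevation(s, fielder_x, vx, vy)
    return (f(t + dt) - 2.0 * f(t) + f(t - dt)) / dt ** 2

def oac_advice(t, fielder_x, vx, vy, tol=1e-3):
    """Rule of thumb: move so as to cancel optical acceleration."""
    a = optical_acceleration(t, fielder_x, vx, vy)
    if a > tol:
        return "back up"   # image speeding up: ball will carry over the fielder
    if a < -tol:
        return "run in"    # image slowing down: ball will drop short
    return "hold"          # tan(elevation) rising linearly: fielder is at the landing spot

vx, vy = 15.0, 15.0
landing = vx * 2 * vy / G
print(oac_advice(1.5, landing, vx, vy),
      oac_advice(1.5, landing + 10, vx, vy),
      oac_advice(1.0, landing - 10, vx, vy))
```

This sketch implements only the OAC cue attributed to Chapman (1968); it does not model the LOT strategy or the mixed model mentioned above.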
Important for present purposes, such strategies suggest (see also
Maturana 1980) a very different role for the perceptual coupling
itself. Instead of using sensing to get enough information inside, past
the visual bottleneck, so as to allow the reasoning system to “throw
away the world” and solve the problem wholly internally, they use
the sensor as an open conduit allowing environmental magnitudes to exert
a constant inﬂuence on behavior. Sensing is here depicted as the opening
of a channel, with successful whole-system behavior emerging
when activity in this channel is kept within a certain range. What
is created is thus a kind of new, task-speciﬁc agent-world circuit. In
such cases, as Randall Beer puts it, “the focus shifts from accurately
representing an environment to continuously engaging that environment
with a body so as to stabilize appropriate co-ordinated patterns
of behavior” (2000, 97).
Interestingly, human subjects are typically unaware of their own
deployment of such strategies. Shaffer and McBeath (2005) show that
most people, including expert baseball ﬁelders, think that they accurately
perceive where the ball is located in physical space at each point
in the unfolding trajectory, whereas the strategy actually used is unable,
under most conditions, to reveal accurate ball-position information of
this kind. That is, “observers seem to confuse or substitute their reasonably
accurate semantic knowledge of the physical ﬂight of the ball
with the information that is optically available during projectile tracking
tasks” (Shaffer and McBeath 2005, 1500).
Summing up the present section, we seem to confront what is really
a whole spectrum of cases, ranging from the classical extreme (the use
of perception to create a rich inner model sufﬁcient to solve the problem)
to many intermediate cases (e.g., the blocks-copying task where
perception and ongoing bodily engagement are used repeatedly to
retrieve and bind fragments of information just in time for use) to the
(subjectively unobvious) nonclassical extreme (where perception opens
a channel such that minimizing energetic variation within some ﬁxed
range can directly solve a problem). A third (partially overlapping)
characteristic of embodied cognition can thus be added to our list: The
embodied agent is empowered to use active sensing and perceptual coupling in
ways that simplify neural problem solving by making the most of environmental
opportunities and information freely available in the optic array.
1.6 Information Self-structuring
Embodied agents are also able to act on their worlds in ways that actively
generate cognitively and computationally potent time-locked patterns of
sensory stimulation. In this vein, Fitzpatrick et al. (2003; see also Metta and
Fitzpatrick 2003), using both the COG and BABYBOT (ﬁg. 1.7) platforms,
show how active object manipulation (pushing and touching objects in
view) can help generate information about object boundaries. The robot
learns about the boundaries by poking and shoving. It uses motion detection
to see its own hand–arm moving, but when the hand encounters and
pushes an object, there is a sudden spread of motion activity. This cheap
signature picks out the object from the rest of the environment.
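The poking strategy can be illustrated with a short sketch. This is not the COG or BABYBOT implementation. It assumes grayscale frames stored as nested lists, a simple frame-differencing motion detector, and a hypothetical `hand_region` (the pixels the robot's own hand occupies, assumed known from proprioception). A sudden jump in the number of moving pixels is treated as contact, and the moving pixels outside the hand are read off as the object.

```python
from typing import List, Optional, Set, Tuple

Frame = List[List[int]]   # assumed: grayscale intensities, row-major
Pixel = Tuple[int, int]   # (row, column)


def motion_mask(prev: Frame, curr: Frame, threshold: int = 10) -> Set[Pixel]:
    """Pixels whose intensity changed by more than `threshold` between two frames."""
    return {
        (r, c)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if abs(value - prev[r][c]) > threshold
    }


def find_contact(frames: List[Frame], spread_factor: float = 2.0,
                 threshold: int = 10) -> Optional[Tuple[int, Set[Pixel]]]:
    """Return (frame index, motion mask) for the first frame whose motion area
    exceeds `spread_factor` times the mean motion area of all earlier frame
    pairs: the sudden spread of motion when the hand starts pushing something.
    Returns None if no such jump occurs."""
    areas: List[int] = []
    for t in range(1, len(frames)):
        mask = motion_mask(frames[t - 1], frames[t], threshold)
        if areas:
            baseline = sum(areas) / len(areas)
            if baseline > 0 and len(mask) > spread_factor * baseline:
                return t, mask
        areas.append(len(mask))
    return None


def segment_object(mask: Set[Pixel], hand_region: Set[Pixel]) -> Set[Pixel]:
    """Moving pixels not accounted for by the robot's own hand."""
    return mask - hand_region


# Synthetic demo (made-up data): a 2x2 "hand" slides right along rows 2-3
# and, on the last step, pushes a textured 2x3 "object" one pixel right.
def make_frame(hand_col: int, obj_col: int) -> Frame:
    frame = [[0] * 12 for _ in range(6)]
    for r in (2, 3):
        for c in (hand_col, hand_col + 1):
            frame[r][c] = 100
        for i, c in enumerate(range(obj_col, obj_col + 3)):
            frame[r][c] = 30 * (i + 1)  # texture: 30, 60, 90
    return frame


frames = [make_frame(h, 5) for h in range(4)] + [make_frame(4, 6)]
contact = find_contact(frames)
hand_region = {(r, c) for r in (2, 3) for c in (3, 4, 5)}  # hand before and after the push
```

While the hand merely moves, only a thin band of pixels changes; once it pushes the textured object, the object's pixels change too, and that extra motion is what `segment_object` isolates. The texture matters in this toy: a uniformly colored object sliding sideways changes only at its edges.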
In human infants, grasping, poking, pulling, sucking, and shoving
create a rich ﬂow of time-locked multimodal sensory stimulation.
Such multimodal input streams have been shown (Lungarella, Sporns,
and Kuniyoshi 2008; Lungarella and Sporns 2005) to aid category learning
and concept formation. The key to such capabilities is the robot’s or
infant’s capacity to maintain coordinated sensorimotor engagement with
its environment. Self-generated motor activity, such work suggests, acts
as a “complement to neural information-processing” in that
the agent’s control architecture (e.g. nervous system) attends to
and processes streams of sensory stimulation, and ultimately
generates sequences of motor actions which in turn guide the
further production and selection of sensory information. [In this
way] “information structuring” by motor activity and “information
processing” by the neural system are continuously
linked to each other through sensorimotor loops. (Lungarella
and Sporns 2005, 25)
An important implication of this focus on the active self-structuring
of information ﬂows is that timing (and especially, the time-locked
unfolding of multimodal data streams) plays a major functional role
in supporting learning and adaptive response. In work implemented
on the famous COG robot (Brooks et al. 1999), Fitzpatrick and Arsenio
(2004) show that the cross-modal binding of incoming signals that display
common rhythmic signatures can aid a robot in learning about
objects and, by including proprioception as a modality, about the nature
of its own body. The robot ﬁrst detects rhythmic patterns in the individual
modalities (sight, hearing, and proprioception) and then deploys
a binding algorithm to associate signals that display the same kind of
periodicity. Courtesy of such bindings, COG can learn about its own
body parts by binding visual, auditory, and proprioceptive signals.
Unlike our own arms, COG’s arm is noisy in action, so when a human grabs
the robot’s arm and moves it out of the robot’s field of vision, COG can still bind sound
and proprioceptive information. With the arm in view, binding occurs
across three modalities. Thus equipped, COG can even learn to identify
its own arm with the moving image seen in a mirror.
FIGURE 1.7 BABYBOT learns about object properties and
affordances by poking and shoving. (From Metta and Fitzpatrick
2003, by permission)
Summarizing this work, the authors write that
our work is an attempt to build a perceptual system which,
from the ground up, focuses on timing just as much as content.
This is powerful because timing is truly cross-modal, and
leaves its mark on all the robot’s senses no matter how they are
processed and transformed. (Fitzpatrick and Arsenio 2004, 65)
Here, then, is a nice example of an approach that combines a bedrock
computational and information-processing perspective with a potent
functional role for timing and environmentally coupled action. We will
meet this combination repeatedly in the chapters that follow. Such work
depicts intelligent response as grounded in processes of information
extraction, transformation, and use, while recognizing the key roles, in
those very processes, played by timing, action, and coupled unfolding.
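The binding step in the COG work can be illustrated with a toy sketch. It is not Fitzpatrick and Arsenio's algorithm. It simply assumes that each modality delivers a uniformly sampled one-dimensional signal, estimates each signal's dominant period by autocorrelation, and binds any two modalities whose periods agree within a tolerance. The signal names and data are made up for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def autocorrelation(x: List[float], lag: int) -> float:
    """Unnormalized autocorrelation of a mean-centered signal at `lag`."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def dominant_period(signal: List[float]) -> Optional[int]:
    """Estimate the period (in samples) as the highest autocorrelation peak
    after the first zero crossing; None if no crossing is found."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    lag = 1
    # Skip the central peak around lag 0.
    while lag < n // 2 and autocorrelation(x, lag) > 0:
        lag += 1
    if lag >= n // 2:
        return None
    return max(range(lag, n // 2), key=lambda k: autocorrelation(x, k))


def bind_by_period(signals: Dict[str, List[float]],
                   tolerance: int = 1) -> List[Tuple[str, str]]:
    """Pairs of modalities whose dominant periods agree within `tolerance` samples."""
    periods = {name: dominant_period(s) for name, s in signals.items()}
    pairs = []
    for a, b in combinations(sorted(signals), 2):
        pa, pb = periods[a], periods[b]
        if pa is not None and pb is not None and abs(pa - pb) <= tolerance:
            pairs.append((a, b))
    return pairs


def wave(period: float, phase: float = 0.0, amp: float = 1.0, n: int = 200) -> List[float]:
    return [amp * math.sin(2 * math.pi * i / period + phase) for i in range(n)]


# Made-up data: an arm waving with a 20-sample period is seen, heard, and felt
# (with different phases and amplitudes); unrelated background motion has a
# 33-sample period.
signals = {
    "vision": wave(20, phase=0.5),
    "sound": wave(20, phase=1.0, amp=0.3),
    "proprioception": wave(20, phase=2.0),
    "background": wave(33),
}
bound = bind_by_period(signals)
```

Because only the period is compared, the three arm signals are bound despite differing in phase and amplitude, while the background motion is left unbound. That is one simple reading of the sense in which timing is "truly cross-modal".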
Information self-structuring may also play a key role in continuous
self-modeling of the kind necessary to regain behavioral competence
following bodily injury or change. Bongard, Zykov, and Lipson (2006)
describe an algorithm (ﬁg. 1.8) by which a robot continuously learns
about its own bodily structure (morphology) by the ongoing generation
of competing internal models that are tested by self-generated
actions. In brief, as the robot acts, it records the resulting sensory data
and then generates a set (15, in the test case of a four-legged physical
robot) of candidate models of its own morphology, models that
would be broadly consistent with those data. It next (and this is the
important part) finds an action (actuation pattern) that, when executed,
will yield the greatest disagreement across the projected sensory
consequences of the 15 candidate models. It then performs this
action as part of an iterated cycle in which the robot learns about the
possibly changing nature of its own body, for example, adapting to
damage such as the loss of a limb or change such as the grasping of
a tool (for more on this, see chap. 2). The key element in this process
is, of course, the robot’s ability to actively produce the kinds of action
that will yield the greatest information: a clear case of information
self-structuring.
FIGURE 1.8 Outline of the algorithm. (From Josh
Bongard, by permission)
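The cycle just described can be sketched in toy form. This rests on strong simplifying assumptions and is not the published algorithm. The "body" is a hypothetical vector of four leg lengths (0.0 for a lost leg), the sensor reading is simply the summed length of the actuated legs, and the candidate models are enumerated and filtered rather than synthesized by optimization as in Bongard, Zykov, and Lipson's work. What the sketch does preserve is the key step: choosing the action on which the surviving candidate models disagree most, measured here as the variance of their predictions.

```python
import itertools
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple

Model = Tuple[float, ...]   # hypothetical: one effective length per leg (0.0 = lost)
Action = Tuple[int, ...]    # hypothetical: 1 = actuate this leg, 0 = keep it still


def predict(model: Model, action: Action) -> float:
    """Toy forward model: sensed reading = summed length of the actuated legs."""
    return sum(length * a for length, a in zip(model, action))


def consistent(model: Model, history: Sequence[Tuple[Action, float]],
               tol: float = 1e-9) -> bool:
    """Does the model reproduce every (action, observation) pair recorded so far?"""
    return all(abs(predict(model, a) - obs) <= tol for a, obs in history)


def disagreement(models: Sequence[Model], action: Action) -> float:
    """Spread of the candidate models' predictions for one action."""
    return pvariance([predict(m, action) for m in models])


def self_model(real_body: Callable[[Action], float], candidates: Sequence[Model],
               actions: Sequence[Action], max_steps: int = 10):
    """Alternate between pruning the candidate self-models against the data and
    executing the action that best discriminates among the survivors."""
    history: List[Tuple[Action, float]] = []
    models = list(candidates)
    for _ in range(max_steps):
        models = [m for m in models if consistent(m, history)]
        if len(models) <= 1:
            break
        action = max(actions, key=lambda a: disagreement(models, a))
        if disagreement(models, action) == 0:
            break  # no available action can tell the survivors apart
        history.append((action, real_body(action)))  # act, then record the sensed result
    models = [m for m in models if consistent(m, history)]
    return models, history


# Demo: the real robot has lost its third leg.
true_body: Model = (1.0, 1.0, 0.0, 1.0)
candidates = list(itertools.product((0.0, 1.0), repeat=4))
actions = [a for a in itertools.product((0, 1), repeat=4) if 1 <= sum(a) <= 2]
models, history = self_model(lambda a: predict(true_body, a), candidates, actions)
```

In this toy run, three self-chosen actions are enough to single out the damaged-leg model from sixteen candidates. Picking actions at random would typically need more, because many actions yield readings on which the surviving models already agree.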
Finally, the active structuring of an information ﬂow is also a potent
between-agent tool, as demonstrated in striking studies by Yu, Ballard,
and Aslin (2005). In these studies, a subject, fitted with an eye tracker, a head-mounted
camera, a microphone, and hand and body trackers, describes,
as if to a child (slowly, with clear enunciation), their current actions (see
fig. 1.9). The verbal descriptions, along with the time-locked stream of
multimodal training data recorded by the eye, head, hand, and body
trackers, are fed to an artiﬁcial neural network. The task of the network
is to learn visually grounded “meanings” for words for some actions
solely by exposure to the time-locked stream of multimodal training
data created by the active “caregiver.” In the presence of this critical
active structuring, the net can learn image–sound associations using
“raw” visual and auditory data (an unsegmented sound stream and an
un-preprocessed video stream) and without the beneﬁt of any inbuilt
“language model.” The demonstration is compelling to watch as, from
this raw but correlated data, the net learns generalizable image–sound
pairings (e.g., it learns to produce phonetic strings such as “sta-pling”
when shown new video recordings of the same action type). The net
has simultaneously learned speech segmentation into meaningful
units and “visually grounded meanings” for the units themselves. Key
to this success is the information carried by the caregiver’s “embodied
intentions,” that is, their use of eye and body movement to track
and isolate salient aspects of the scene (the ones currently being verbally
described) from the mass of co-occurring visual data. The added
informational punch created by this active structuring of the training
data transforms a daunting learning problem into one that is visibly
tractable without massive prestructuring or much in the way of prior
knowledge.
In many ways, this is simply the ﬂip side of the work on deictic
pointing discussed in the previous section. Deictic pointing allows an
agent to exploit the world as external storage. This work allows the
learner to exploit another agent’s use of deictic pointers (by tracking
those very same eye ﬁxations) as a kind of “gating mechanism that
determines whether co-occurring data are relevant or not” (Yu, Ballard,
and Aslin 2005, 994). As a result, social knowledge transmission is
here supported by the very same kinds of embodied strategy (deictic
uses of eye, head and body motions, and the active generation of time-locked
data ﬂows) that allow the individual learner to simplify her own
problem solving and to learn about the world.
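The gating idea can be illustrated with a deliberately crude co-occurrence counter. This is not the Yu, Ballard, and Aslin model, which learns from raw speech and video. The sketch assumes that words and objects have already been segmented, and that each (made-up) scene records the words heard, the objects in view, and the object the speaker is fixating. With gating on, only the fixated object counts as a candidate referent for the words heard at that moment.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical scene format: (words heard, objects in view, object the speaker fixates)
Scene = Tuple[List[str], List[str], Optional[str]]


def cooccurrence(scenes: List[Scene], gated: bool) -> Dict[str, Counter]:
    """Count word-object co-occurrences. If `gated`, only the speaker's fixated
    object counts as a candidate referent; otherwise every visible object does."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for words, visible, fixated in scenes:
        candidates = [fixated] if gated and fixated is not None else visible
        for word in words:
            counts[word].update(candidates)
    return counts


def top_referent(counts: Dict[str, Counter], word: str) -> Tuple[str, float]:
    """Most frequent co-occurring object for `word` and its share of all counts."""
    obj, n = counts[word].most_common(1)[0]
    return obj, n / sum(counts[word].values())


scenes: List[Scene] = [
    (["stapler"], ["cup", "stapler", "paper"], "stapler"),
    (["paper"], ["cup", "paper"], "paper"),
    (["stapler"], ["cup", "stapler"], "stapler"),
    (["cup"], ["cup", "paper"], "cup"),
]
gated = cooccurrence(scenes, gated=True)
ungated = cooccurrence(scenes, gated=False)
```

In this toy data the cup is in view whenever "stapler" is said, so ungated counting spreads the word's associations over the cup, the stapler, and the paper (no object gets more than 40 percent of the counts). Gating by the speaker's gaze assigns the word to the stapler alone.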
Here, then, is another way embodiment seems to matter to human
cognition. It matters because the presence of an active, self-controlled, sensing
body allows an agent to create or elicit appropriate inputs, generating good
data (for oneself and for others) by actively conjuring ﬂows of multimodal,
correlated, time-locked stimulation. This trick promotes learning, bodily
self-modeling, and categorization and may even (deep breath) hold out
hope for grounded knowledge acquisition.
FIGURE 1.9 The associate training the computational model is wearing
an ASL eye tracker, a CCD camera, a microphone, and position sensors. The
computational model thus shares multisensory information like a human
language learner. This allows the association of coincident signals in
different modalities. (From Yu, Ballard, and Aslin 2005, by permission)
1.7 Perceptual Experience and Sensorimotor Dependencies
The appeal to action and to active sensing also lies at the heart of a
recent, ambitious, and highly inﬂuential attempt to give an account
of perception and of perceptual experience that centers upon what
the agent (implicitly) knows about how sensory stimulation will vary
as a result of change or movement.11
The account is couched in terms of our (implicit,
nonconscious) knowledge or expectations concerning the many complex
ways perceptual stimulations will morph and alter as we move
our eyes, heads, and bodies. Such knowledge is dubbed (O’Regan
and Noë 2001) “knowledge of sensorimotor dependencies” or of
“sensorimotor contingencies”: It is knowledge of the relations between
movement or change and resulting patterns of sensory stimulation.
Though superﬁcially similar, this story about perception and perceptual
experience goes (as we shall see in much more detail in chap.
8) well beyond the claims made by Ballard et al. (1997) or by most
other proponents of so-called active perception (e.g., Churchland,
Ramachandran, and Sejnowski 1994). For where the latter depict the
active use of bodily motion and just-in-time retrieval as ploys that productively
reconﬁgure the tasks to be performed by the brain and central
nervous system, Noë (along with Hurley in press, and others) depicts
the sensorimotor-expectation laden cycles as strongly constitutive of
the perceptual experiences themselves. By strongly constitutive, I mean
they assert a kind of identity such that sameness of active bodies of
sensorimotor knowledge (knowledge of sensorimotor dependencies) is
required for sameness of perceptual experience.
The central claim is thus that differences in what we perceptually
experience correspond to differences in sensorimotor signatures
(patterns of association between movements and the sensory effects
of movement). If two things look different, they do so because, as we
engage them in space and time, we bring to bear (rightly or wrongly)
different sets of sensorimotor expectations. As our encounter proceeds,
these expectations may or may not be validated. Crucially, it is this whole
cycle of (implicit) expecting and subsequent sensory stimulation that is
said to determine the content and character of any given perceptual
experience. The expectations we have must differ as between, for example,
a soccer ball and a rugby ball or an American football. Such differences
underwrite the difference in experienced look. But despite such
differences, for all visually presented objects, there will be some parts of
the sensorimotor signatures in common. It is these commonalities that
are said to make the experiences visual rather than, say, auditory. For
example, vision (unlike audition or touch) only samples the front or
facing sides of objects and so on. The visual attributes of sensed objects
are thus that subset of the signature sensorimotor contingencies that
pertain to the distinctive ways that the visual sense can sample the real
properties of objects. Thus, the very same real property (e.g., size) may
be apprehended by vision or sometimes (for small objects) by touch.
But the mode of sampling varies dramatically and with it the associated
sensorimotor contingencies.
To visually perceive a square object, then, is to bring to bear a body
of diverse practical knowledge concerning how movement of the eyes,
head, or body would produce sensory change (new sensory inputs) as
we inspect or interact with the object. An example is the way a leftward
saccade would bring a certain (left-facing) shape of corner into central
vision, while a rightward saccade would bring a different (right-facing)
shape of corner into central vision. A rich body of such knowledge is
said to constitute our visual perception of the square object. One upshot
of all this, or so it is claimed, is that “what determines phenomenology
is not neural activity set up by stimulation as such, but the way the neural
activity is embedded in a sensorimotor dynamic” (Noë 2004, 227).
For it is arguably the shape of a whole batch of sensorimotor loops that
now determines the nature of the visual experience.
We can now formulate the next feature of recent work that I want
to highlight: attention to the possibility that the substrate (the “vehicles”)
of speciﬁc perceptual experiences may involve whole cycles of world-engaging
activity.
1.8 Time and Mind
Approaches that foreground embodiment, active sensing, and temporally
coupled unfoldings are sometimes rather starkly contrasted with
(any or all of) functional, computational, information-processing, and
information-theoretic approaches to the study of mind and cognition.12
The proper explanatory tools, when confronted with apparently
intrinsically embodied and richly temporal phenomena, are
instead said to be the geometric constructs and differential equations
of Dynamical Systems Theory (DST). This polarization (between dynamical
approaches on the one hand and computational and information-theoretic approaches on the other)
is, I think, one of the less happy fruits of recent attempts to put brain,
body, and world together again. I shall largely refrain (but see chap.
9) in the treatment that follows from re-rehearsing my rather liberal
views on the notions of representation, computation, and dynamical
explanation. These views are quite well represented in previous work
(especially Clark 1997a, 1997b, and 2001a). Instead, in a more positive
vein, the various demonstrations, examples, and thought experiments
that populate this book aim to reveal computational, representational,
information-theoretic, and dynamical approaches as deeply complementary
elements in a mature science of the mind. This emerging complementarity
is the ﬁnal feature of recent work that I want to highlight. But to
very brieﬂy motivate this more accommodating perspective, it may
be worth just pausing to say a few words concerning time, dynamics,
and computation (for a much more detailed treatment of these issues,
see Clark 1997b).
One challenge that temporal considerations seem to pose to traditional
forms of explanation and analysis is to account for cases of
what I elsewhere (Clark 1997b) term continuous reciprocal causation.
Continuous reciprocal causation (CRC) occurs when some system
S is both continuously affecting and simultaneously being affected by
activity in some other system O. Internally, we may well confront such
causal complexity in the brain since many neural areas are linked by
both feedback and feedforward pathways (e.g., Van Essen and Gallant
1994). On a larger canvas, we often find processes of CRC that crisscross
brain, body, and local environment. Think of a dancer, whose
bodily orientation is continuously affecting and being affected by her
neural states, and whose movements are also inﬂuencing those of her
partner, to whom she is continuously responding! Or imagine playing
improvised jazz in a small combo. Each musician’s playing is inﬂuencing
and being inﬂuenced by everyone else. CRC looks, in fact, to
pervade the ﬁeld of natural adaptive intelligence. The delicate dance
of predator and prey or of mating animals exhibits the same complex
causal structure.
Enter Dynamical Systems Theory. DST is a powerful framework
for describing and understanding the temporal evolution of complex
systems.13
In a typical explanation, the theorist speciﬁes a set of parameters
whose collective evolution is governed by a set of differential
equations. Such equations always involve a temporal element, and in
this way, timing is factored into the heart of the approach. Moreover,
such explanations are easily able to span organism and environment.
In such cases, the two components are treated as a coupled system in a
speciﬁc technical sense; that is, the equation describing the evolution of
each component contains a term that factors in the other system’s current
state (technically, the state variables of the ﬁrst system are also the
parameters of the second, and vice versa).
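Schematically, and with symbols introduced here purely for illustration, two coupled systems with state vectors x (say, the organism) and y (say, the environment) might be written as follows. The functions f and g are left unspecified; the only point is that each equation takes the other system's current state as an input.

```latex
\begin{aligned}
\frac{d\mathbf{x}}{dt} &= f\bigl(\mathbf{x};\, \mathbf{y}(t)\bigr),\\
\frac{d\mathbf{y}}{dt} &= g\bigl(\mathbf{y};\, \mathbf{x}(t)\bigr).
\end{aligned}
```

Read this way, the state variables of each system enter the other's equation as parameters, which is the technical sense of coupling just described.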
Thus, consider two wall-mounted pendulums placed in close
proximity on a single wall. The two pendulums will tend (courtesy
of vibrations running along the wall) to synchronize their swings
over time. This process admits of an elegant dynamical explanation
in which the two pendulums are analyzed as a single coupled system
with the motion equation for each one including a term representing
the inﬂuence of the other’s current state (see Salzman and Newsome
1994). A useful way to think of this is by imagining two coevolving
state spaces. Each pendulum traces a course through a space of spatial
and temporal conﬁgurations. But the shape of this space is determined,
in part, by the ongoing activity of the other pendulum, which is itself
behaving in ways continuously modiﬁed by the action of its neighbor.
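A minimal numerical sketch of such mutual coupling follows. It uses a standard simplification rather than full pendulum mechanics. Each pendulum is reduced to a phase oscillator with its own natural frequency, and the wall's influence is modeled as a sine coupling term of strength K (a Kuramoto-style model). The parameter values are arbitrary. With the coupling switched on, the phase difference settles where sin φ = Δω / 2K (possible whenever |Δω| ≤ 2K). With it switched off, the two phases simply drift apart.

```python
import math
from typing import Tuple


def simulate(omega1: float, omega2: float, coupling: float,
             theta1: float = 0.0, theta2: float = 2.0,
             dt: float = 0.001, steps: int = 50_000) -> Tuple[float, float]:
    """Euler-integrate two phase oscillators for steps * dt seconds.
    Each oscillator's rate of change contains a term that depends on the
    *other* oscillator's current phase; that term is the coupling."""
    for _ in range(steps):
        rate1 = omega1 + coupling * math.sin(theta2 - theta1)
        rate2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1, theta2 = theta1 + dt * rate1, theta2 + dt * rate2
    return theta1, theta2


def phase_gap(theta1: float, theta2: float) -> float:
    """Phase difference theta2 - theta1, wrapped into [-pi, pi]."""
    return math.atan2(math.sin(theta2 - theta1), math.cos(theta2 - theta1))


omega1, omega2, K = 2 * math.pi * 1.00, 2 * math.pi * 1.05, 0.5  # arbitrary values
locked_gap = phase_gap(*simulate(omega1, omega2, K))
drifting_gap = phase_gap(*simulate(omega1, omega2, 0.0))
predicted_gap = math.asin((omega2 - omega1) / (2 * K))  # fixed point of the gap equation
```

Subtracting the two rate equations gives dφ/dt = Δω − 2K sin φ for the gap φ = θ2 − θ1. This single equation is the "total state" description of the pair, and its fixed point is the synchronized regime.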
The crucial upshot of the emphasis on constant mutual interaction
is a corresponding emphasis on what Van Gelder and Port (1995, 14)
usefully term total state. Because we assume that there is widespread
and complex interanimation among multiple systemic factors (x inﬂuences
y and z, and x is itself inﬂuenced by y, which also inﬂuences z,
etc.), the dynamicist chooses to focus on changes in total system state
over time. The various geometric devices used to put intuitive ﬂesh on
the models (trajectories through state spaces populated by attractors,
repellors, etc.; see Clark 2001a, chap. 7, for a brief introduction) thus
reﬂect motion in a space of possible overall system states, with routes
and distances deﬁned relative to points each of which assigns a value to
all the systemic variables and parameters. This emphasis on total state
marks one of the deepest contrasts between (the purest of) dynamical
and standard computationalist approaches, and it is both a boon and
a burden. It is a boon insofar as it allows the dynamicist to respect
the burgeoning complexity of causal webs in which everything (both
inner and outer) is continuously inﬂuencing everything else. Relative
to such cases, the mathematics of a system of interlocking differential
equations can (at least in simple cases) accurately capture the way
two or more systems engage in a continuous, real-time, and effectively
instantaneous dance of mutual codetermining interaction.14
But it is a
burden insofar as it threatens to obscure the specifically intelligence-based
route to evolutionary success. That route involves the ability to
become apprised of information concerning our surroundings and to
use that information as a guide to present and future action. As soon
as we embrace the notion of the brain as the principal (though not
the only) seat of information-processing activity, we are already seeing
it as fundamentally different from, say, the ﬂow of a river or the activity
of a volcano. And this difference needs to be reﬂected in our scientiﬁc
analysis—a difference that typically is reﬂected when we pursue
the kind of information-processing model associated with computational
approaches, but which threatens to be lost if we treat the brain,
or any other systemic element engaged in information-based problem-solving
activity, in exactly the same terms as the beating of a heart or
the unfolding of a basic chemical reaction.15
The question, in short, is how to do justice to the idea that there
is a principled distinction between knowledge-based and merely
physical-causal systems. It does not seem likely that the dynamicist
will deny that there is a difference (though hints of such a denial are
occasionally found).16
But rather than responding by embracing a different
vocabulary for the understanding and analysis of brain events
(at least as they pertain to cognition), the dynamicist recasts the issue as
the explanation of distinctive kinds of behavioral ﬂexibility and hopes
to explain that ﬂexibility using the very same apparatus that works for
other physical systems. Such an apparatus, however, may not be intrinsically
well suited to explaining the particular way certain neural, and
sometimes bodily and extrabodily, processes contribute to behavioral
ﬂexibility. This is because (a) it is unclear how it can do justice to the
fundamental idea of information-guided choice, and (b) the emphasis
on total state may obscure the kinds of rich structural variation especially
characteristic of information-guided control systems.
Total state explanations do not fare well as a means of understanding
systems in which complex information ﬂow plays a key role. This is
because such systems, as Sloman points out, typically depend on multiple,
“independently variable, causally interacting sub-states” (1993,
80).17
Such systems support great behavioral ﬂexibility by being able
cheaply to alter the inner ﬂow of information in a wide variety of ways.
To understand the operation of a standard computational device, for
example, we may appeal to multiple databases, procedures, and operations.
The real power of the device consists in its ability to rapidly and
cheaply reconﬁgure the way these components interact. Informationbased
control systems thus tend to exhibit a kind of complex articulation
in which what matters most is the extent to which component
processes may be rapidly decoupled and reorganized. This kind of
articulation has been depicted as a pervasive and powerful feature of
real neural processing.18
The fundamental idea is that large amounts of
neural machinery are devoted not to the direct control of action but to
the trafﬁcking and routing of information within the brain. The point,
for present purposes, is that to the extent that neural control systems
exhibit such complex and information-based articulation (into multiple
independently variable information-sensitive subsystems), the sole use
of total state explanations would tend to obscure explanatorily important
details, such as the various ways in which substate x may vary
independently of substate y and so on.
1.9 Dynamics and “Soft” Computation
The dynamicist should, at this point, reply that the dynamical framework
really leaves plenty of room for the understanding of such variability.
After all, any location in state space can be speciﬁed as a vector comprising
multiple elements, and we may then observe how some elements change
while others remain ﬁxed and so on. This is true. But notice the difference
between this kind of dynamical approach and the radical, total state vision
introduced in section 1.8. If the dynamicist is forced (a) to give an information-based
reading of various systemic substates and processes and (b) to
attend as much to the details of the inner ﬂow of information as to the evolution
of total state over time, then it is unclear that we still confront a radical
alternative to the computational story. Instead, what we seem to end up
with is a very powerful and interesting hybrid: a kind of “dynamical computationalism”
in which the details of the ﬂow of information are every
bit as important as the larger scale dynamics and in which some dynamical
features lead a double life as elements in an information-processing
economy. Indeed, we have already met one such case. The Ballard et al.
model of the role of deictic pointing in the blocks-copying task analyzed
a cognitive task in part by using recognizable computational and
information-processing concepts. But it also made coupling and ﬁne temporal
coordination crucial and thus applied those familiar computational
and information-processing concepts to a larger, essentially embodied
dynamic whole.19
Such work aims to display the speciﬁc contributions that
embodiment and environmental embedding make by identifying what
might be termed the dynamic functional role of speciﬁc bodily and worldly
operations in the real-time performance of some task.20
This kind of dynamical “soft” computationalism is surely attractive.21
Indeed, it is already the norm in many treatments that combine the use of
dynamical tools with notions of internal representation and/or of neural
computation (see, e.g., Spencer and Schöner 2003; Elman 1995, 2005). Thus,
consider once again those complex loops of reciprocal causal inﬂuence. Let
us assume for now that some such loop is fully internal and involves some
relation of continuous reciprocal causal inﬂuence binding the activity of
two elements. From this, it does not follow that we could not assign representational
and (more broadly) information-processing roles either to
the elements or to their coupled unfolding. It might be, for example, that
the two elements are still best understood as trading in different kinds of
encoding or information, kinds that nonetheless mutually and continuously
modify each other in some useful manner. We shall explore a concrete
example of this involving a neural-bodily loop in chapter 6. There
we examine a recent account of the role of physical gesture in the unfolding
of thought and reason. According to that account, gesture and verbal
thinking differ quite radically in the kinds of information they encode, but
the gestural and verbal systems are nonetheless depicted as coupled in
precisely the manner described earlier.22
In such cases, we need to understand
both the distinctive individual contributions of the various coupled
elements and the powerful effects that ﬂow from their coupled unfolding.
It should be admitted, however, that the issues concerning continuous
reciprocal causation, and the potential threat it poses to representationalist
and computationalist modes of understanding, are complex ones. For
some forms of CRC may indeed threaten such understandings. This will
be so where the nature of the contributions being made by the “parts” is
itself changing radically over time as a result of the multiple inﬂuences
from elsewhere in the system.23
At the extreme limit, such variability
may undermine attempts to gloss stable types of systemic events as the
bearers or vehicles of speciﬁc contents. It is an empirical question where,
on this continuum of possibilities, biological information-processing lies
(for some discussion, see Clark 1997a, 1997b; Wheeler 2005).
Short of this extreme limit, however, considerations concerning the
importance of time and continuous reciprocal causation mandate not
an outright rejection of the computational/representational vision24
but
rather the addition of a potent and irreducibly dynamical dimension.
Such a dimension may manifest itself in several ways, including the
use of dynamical tools to recover potential information-bearing states
and processes from highly complex (and sometimes bodily and environmentally
extended) webs of causal exchange; the recognition that
intrinsically dynamical and temporal features may sometimes themselves
play identiﬁable representational and/or computational roles;
the (consequent) extension of standard computational ideas to include
analog systems that change continuously in time and that exploit continuous
state; and the recognition (sec. 1.6) of the importance of information
self-structuring (e.g., via the active creation of time-locked ﬂows
of multimodal input) in learning and reasoning.
1.10 Out from the Bedrock
We have now scouted some of the most fundamental ways in which
appeals to the body, to the environment, and to embodied action may
inform our vision and understanding of mind. Firm bedrock is provided
by the wide suite of beneﬁts enabled by the coevolution of morphology,
materials, and control. Moving into the time frame of lifetime learning,
we glimpsed related strategies of “ecological assembly” in which
embodied agents exploit the opportunities provided by dynamic loops,
active sensing, and iterated bouts of environmental exploitation and
intervention. The next three chapters ramp up the complexity, exploring
ﬁrst the surprising lability and negotiability of human sensing and
embodiment, then the transformative potential of material artifacts,
language, and symbolic culture, and leading ﬁnally to the suggestion
that mind itself leaches into body and world.
2
The Negotiable Body
2.1 Fear and Loathing
In a short article in the May 2004 edition of WIRED magazine (revealingly
subtitled “Fear and Loathing on the Human–Machine Frontier”),
the futurist and science ﬁction writer Bruce Sterling sounds an increasingly
familiar alarm. After warning us of the imminent dangers of
“brain augmentation,” he adds:
Another troubling frontier is physical, as opposed to mental,
augmentation. Japan has a rapidly growing elderly population
and a serious shortage of caretakers. So Japanese roboticists...envision
walking wheelchairs and mobile arms that
manipulate and fetch.
But there’s ethical hell at the interfaces. The peripherals may
be dizzyingly clever gizmos...but the CPU is a human being:
old, weak, vulnerable, pitifully limited, possibly senile. (116)
But such fears are rooted in a fundamentally misconceived vision
of our own humanity: a vision that depicts us as “locked-in agents”—
as beings whose minds and physical abilities are ﬁxed quantities,
apt (at best) for mere support and scaffolding by their best tools and
technologies. In contrast to this view, I believe that human minds and
bodies are essentially open to episodes of deep and transformative
restructuring in which new equipment (both physical and “mental”)
can become quite literally incorporated into the thinking and acting
systems that we identify as our minds and bodies (see, e.g., Clark 1997a,
2001b, 2003). In this chapter, I pursue this theme with special attention
to the negotiability of our own embodiment.
It helps to start with the commonplace. Sensing and moving are
the spots where the rubber of embodied agency meets the road of the
wider world—the world outside the agent’s organismic boundaries.
The typical human agent, circa 2008, feels herself to be a bounded
physical entity in contact with the world through a variety of standard
sensory channels, including touch, vision, smell, and hearing. It is a
common observation, however, that the use of simple tools can lead
to alterations in that local sense of embodiment. Fluently using a stick,
we feel as if we are touching the world at the end of the stick, not (once
we are indeed ﬂuent in our use) as if we are touching the stick with our
hand. The stick, it has sometimes been suggested, is in some way incorporated,
and the overall effect seems more like bringing a temporary
whole new agent-world circuit into being rather than simply exploiting
the stick as a helpful prop or tool (see Merleau-Ponty 1945/1962
and Gibson 1979; for some more recent explorations of this theme, see
Burton 1993; Reed 1996; Peck et al. 1996; Smitsman 1997; Hirose 2002;
Maravita and Iriki 2004; Wheeler 2005).
In thinking about the case of stick-augmented perception, there
would seem to be two key interfaces at play: the place where the stick
meets the hand and the place where the extended system “biological
agent + stick” meets the rest of the world. When we read about new
forms of human–machine interface, we are again confronted by a
similar duality and an accompanying tension. What makes such interfaces
appropriate as mechanisms for human enhancement is, it seems,
precisely their potential role in creating whole new agent-world circuits.
But insofar as they succeed at this task, the new agent-tool interface
itself fades from view, and the proper picture is one of an extended or
enhanced agent confronting the (wider) world.
A good place to start, then, is with the notion of an interface itself.
2.2 What’s in an Interface?
Haugeland (1998) is, in part, an extended philosophical meditation
on the very idea of an interface. The goal is to uncover the underlying
principles “for dividing systems into distinct subsystems along
nonarbitrary lines” (211). According to Haugeland, the notions of
component, system, and interface are all interdefined and interdefining.
Components are those parts of a larger whole that interact through
interfaces. Systems are “relatively independent and self-contained”
composites of such interfaced components. And an interface itself is
“a point of interactive ‘contact’ between components such that the
relevant interactions are well-defined, reliable and relatively simple”
(Haugeland 1998, 213).
Haugeland is right to point to the nature of interactions as the key
to the location of an interface. We discern an interface where we discern
a kind of regimented, often deliberately designed, point of contact
between two or more independently tunable or replaceable parts.
It does not seem correct, however, to insist that flow across the interface
be simple. The idea here seems to be that we find genuine interfaces
only where we find energetic or informational bottlenecks, as if an
interface must be a narrow channel yielding what Haugeland describes
as “low bandwidth” coupling. This is important for Haugeland’s argumentative
purpose because he means to show that human sensing
typically yields very task-variable, high-bandwidth forms of agent-environment
coupling and thus to argue that no genuine interface or
interfaces separate agent and world. Instead (and see also the longer
version of this claim already presented in the Introduction), there is said
to be “intimate intermingling of mind, body and world” (Haugeland
1998, 224).
But although I agree with Haugeland that sensing is at least sometimes
best understood in terms of direct agent-environment couplings
(as we saw in the previous chapter), his own conclusion that no genuine
interfaces then link agent and world seems premature. Haugeland
depicts these kinds of “open-channel” solutions as involving “tightly
coupled high-bandwidth interaction” (223) and hence as inimical to
the very idea of an agent-world interface.1
But it seems intuitive that
there can be genuine interfaces that support extremely high-bandwidth
forms of coupling. Think, for example, of multiple computers linked
into a network by means of superfast, very high-bandwidth “grid
technologies.”2
There is really no doubt that we here confront a web of distinct
intercommunicating component machines. Yet that web, in action,
can sometimes function as a single unified resource. Nonetheless, we
still think of it as a web of distinct but interfaced devices. And we do so
not because the point of each machine’s contact with the grid is narrow
(it isn’t) but because there exist, for each machine on the grid, very well-defined
points of potential detachment and reengagement. We discern
interfaces at the points at which one machine can be easily disengaged
and another engaged instead, allowing the first to join another grid or
to operate in a stand-alone fashion. Grush (2003, 79) calls this the “plug
points criterion” according to which “components are entities that
can be plugged into, or unplugged from, other components and/or the
system at large.”
An interface, I conclude, is indeed a point of contact between two
items across which the types of performance-relevant interaction are
reliable and well defined. But there is no requirement that such interfaces
be narrow-bandwidth bottlenecks. The way to argue for cognitive
extensions and blurrings of the mind-world boundary is not by casting
doubt on the presence of genuine interfaces (there are plenty of these
within the brain, too, and that doesn’t stop us from distinguishing parts
and roles) but by displaying special features of the flow of information
across those interfaces and by stressing the novel properties of the new
systemic wholes that result. It is to these tasks that we now turn.
2.3 New Systemic Wholes
Biological systems, from lampreys to primates, display remarkable powers
of bodily and sensory adaptability (see Mussa-Ivaldi and Miller 2003;
Bach y Rita and Kercel 2003; Clark 2003). The Australian performance
artist Stelarc routinely deploys a “third hand,” a mechanical actuator
controlled by Stelarc’s brain through commands to muscle sites on his
legs and abdomen.3
Activity at these sites is monitored by electrodes that
transmit signals (via a computer) to the artificial hand. Stelarc reports
that, after some years of practice and performance, he no longer feels
as if he has to actively control the third hand to achieve his goals. It has
become “transparent equipment” (recall chap. 1), something through
which Stelarc (the agent) can act on the world without first willing an
action on anything else. In this respect, it now functions much as his
biological hands and arms, serving his goals without (generally) being
itself an object of conscious thought or effortful control.
Recent experimental work reveals more about the kinds of mechanisms
that may be at work in such cases. A much publicized example
is the work by Miguel Nicolelis and colleagues on a brain-machine
interface (BMI) that allows a macaque monkey to use thought control
to move a robot arm. In the most recent version of this work, Carmena
et al. (2003) implanted 320 electrodes in the frontal and parietal lobes
of a monkey. The electrodes allowed a monitoring computer to record
neural activity across multiple cortical ensembles while the monkey
learned to use a joystick to move a cursor across a computer screen
for rewards. As in previous work, the computer was able to extract
the neural activity patterns corresponding to different movements,
including direction and grip. Next, the joystick was disconnected, and
the monkey learned to use its neural activity, interpreted through the
intervening computer, to control the cursor directly for rewards.
Finally, these commands were diverted to a robot arm whose actual
motions were then translated into on-screen cursor movements,
including an on-screen equivalent of forceful gripping.
This closes the loop. Instead of the monkey merely moving an
unseen robot arm by thought control alone, the movement of the distant
unseen arm now yields visual feedback in the form of on-screen
cursor motion.
When the robot arm was inserted into the control loop, the monkey
displayed a striking degradation of behavior. It took two full days of
practice to reestablish fluent thought control over the on-screen cursor.
The reason was that the monkey’s brain now had to learn to factor in
the mechanical and temporal “friction” created by the new physical
equipment: It had to factor in the mechanical and dynamical properties
of the robot arm and the time delays (which were substantial, in the
60–90 millisecond range) caused by interposing the motion of the arm
between neural command and on-screen feedback. By the time full fluency
was achieved, it is reasonable to conjecture that these properties
of the still unseen distant arm were in some sense incorporated into the
monkey’s own body schema. In support of this, the experimenters were
able to track real long-term physiological changes in the response profiles
of frontoparietal neurons following use of the BMI, leading them
to comment that
the dynamics of the robot arm (reflected by the cursor movements)
become incorporated into multiple cortical representations...we
propose that the gradual increase in behavioral
performance...emerged as a consequence of a plastic reorganization
whose main outcome was the assimilation of the
dynamics of an artificial actuator into the physiological properties
of fronto-parietal neurons. (Carmena et al. 2003, 205)
Creatures capable of this kind of deep incorporation of new bodily
(and as we’ll later see, also sensory and cognitive) structure are examples
of what I shall call “profoundly embodied agents.” Such agents are able
constantly to negotiate and renegotiate the agent-world boundary itself.
Although our own capacity for such renegotiation is, I believe,
vastly underappreciated, it really should come as no great surprise,
given the facts of biological bodily growth and change. The human
infant must learn (by self-exploration) which neural commands bring
about which bodily effects and must then practice until skilled enough
to issue those commands without conscious effort. This process has
been dubbed “body babbling” (Meltzoff and Moore 1997) and continues
until the infant body becomes transparent equipment (see sec. 1.6).
Because bodily growth and change continue, it is simply good design
not to permanently lock in knowledge of any particular configuration
but instead to deploy plastic neural resources and an ongoing regime
of monitoring and recalibration (for some excellent discussion, see
Ramachandran and Blakeslee 1998).
2.4 Substitutes
As a second class of examples of recalibration and renegotiation, consider
the plasticity revealed by work in sensory substitution. Pioneered
in the ’60s and ’70s by Paul Bach y Rita and colleagues, the earliest such
systems were grids of blunt “nails” fitted to the backs of blind subjects
and taking input from a head-mounted camera. In response to the camera
input, specific regions of the grid became active, gently stimulating
the skin under the grid. At first, subjects report only a vague tingling
sensation. But after wearing the grid while engaged in various kinds of
goal-driven activity (walking, eating, etc.), the reports change dramatically.
Subjects stop feeling the tingling on the back and start to report
rough, quasi-visual experiences of looming objects and so forth. After
a while, a ball thrown at the head causes instinctive and appropriate
ducking. The causal chain is “deviant”: It runs via the systematic input
to the back. But the nature of the information carried, and the way it
supports the control of action, is suggestive of the visual modality.
Performance using such devices can be quite impressive. In a recent
article, Bach y Rita, Tyler, and Kaczmarek (2003) note that Tactile-Visual
Substitution Systems (TVSS) have
been sufficient to perform complex perception and “eye”-hand
co-ordination tasks. These have included face recognition,
accurate judgment of speed and direction of a rolling ball with
over 95% accuracy in batting the ball as it rolls over a table
edge, and complex inspection-assembly tasks. (287)
The key to such effective sensory substitution is goal-driven motor
engagement. It was crucial that the head-mounted camera be under the
subject’s intentional motor control. This meant that the brain could, in
effect, experiment through the motor system, giving commands that
systematically varied the input so as to begin to form hypotheses about
what information the tactile signals might be carrying. Such training
yields quite a flexible new agent-world circuit. Once trained in the use
of the head-mounted camera, the motor system operating the camera
could be changed (e.g., to a hand-held camera) with no loss of acuity.
The touch pad, too, could be moved to new bodily sites, and there was
no tactile–visual confusion: An itch scratched under the grid caused no
“visual” effects (for these results, see Bach y Rita and Kercel 2003).
Such technologies, though still experimental, are now increasingly
advanced. The back-mounted grid is often replaced by a tongue-mounted,
coin-sized array, and the approach has been extended to other sensory modalities.
Bach y Rita and Kercel (2003) give the nice example of a touch-sensor-rich
glove that allows leprosy patients to begin to feel again using their
hands. The patient is fitted with a glove that transmits signals to a
forehead-mounted tactile disc array and rapidly reports feeling sensations
of touch at the fingertips. This is presumably because the motor
control over the sensors runs via commands to the hand, so the sensation
is subsequently projected to that site. (See also the discussion of the
auditory visual-substitution system known as The Voice in sec. 8.3.)
As an aside, it is worth noticing that the line between these kinds
of rehabilitative strategy and wholly new forms of bodily and sensory
enhancement is already thin to the point of nonexistence. There
is advanced work on night-vision versions of sensory substitution, and
at the more dramatic end of this spectrum, it is possible to bypass the
existing sensory peripheries, feeding all manner of signals (including
commercial TV!) directly to the cortex (see Bach y Rita and Kercel 2003,
and the discussion in Clark 2003, 125). Even without penetrating the
existing surface of skin and skull, sensory enhancement and bodily
extension are pervasive possibilities. One striking example (see Schrope
2001) is a U.S. Navy innovation known as a tactile flight suit. The suit
(a kind of vest worn by the pilot) allows even inexperienced helicopter
pilots to perform difficult tasks such as holding the helicopter in a stationary
hover in the air. It works by generating bodily sensations (via
safe puffs of air) inside the suit. If the craft is tilting to the right or left
or forward or backward, the pilot feels a puff-induced vibrating sensation
on that side of the body. The pilot’s own responses (moving in
the opposite direction to correct the vibrations) can even be monitored
by the suit to control the helicopter. The suit is so good at transmitting
and delivering information in a natural and easy way that military
pilots can use it to fly blindfolded. While the pilot wears the suit, the
helicopter behaves very much like an extended body for the pilot: It
rapidly links the pilot to the aircraft in the same kind of closed-loop
interaction that linked Stelarc and the third hand, the monkey and the
robot arm, or the blind person and the TVSS system. What matters, in
each case, is the provision of closed-loop signaling so that motor commands
affect sensory input. What varies is the amount of training (and
hence the extent of deeper neural changes) required to fully exploit the
new agent-world circuits thus created.
It is important, in all these cases, that the new agent-world circuits be
trained and calibrated in the context of a whole agent engaged in world-directed
(goal-driven) activity. One sign of successful calibration is, as we
noted earlier, that once fluency is achieved, the specific details of the (old
or new) circuitry by which the world is engaged fall “transparent” in use.
The conscious agent is then aware of the oncoming ball, not (usually) of
seeing the ball or (by the same token) of using a tactile substitution channel
to detect the ball. In just this way, the tactile-vest-wearing pilot becomes
aware of the aircraft’s tilt and slant, not of the puffs of air.
In all these diverse ways, humans and other primates are revealed
as constantly negotiable bodily platforms of sense, experience, and
(as we’ll see in later chapters) reasoning, too. Such platforms are biologically
primed so as to fluidly incorporate new bodily and sensory
kit, creating brand new systemic wholes. This is just what one would
expect of creatures built to engage in what we earlier (sec. 1.1) called
“ecological control”: systems evolved so as to constantly search for
opportunities to make the most of the reliable properties and dynamic
potentialities of body and world.
2.5 Incorporation Versus Use
A very natural doubt to raise, at about this point, would be the following:
Critic: “You are making quite a song and a dance out of this,
what with talk of brand new systemic wholes and so on. But we
all know we can use tools and that we can learn to use them fluently
and transparently. Why talk here of new systemic wholes,
of extended bodies and reconfigured users, rather than just the
same old user in command of a new tool?”
This is the right question to ask. We have already begun to see a hint
of the answer in the quoted comments of Carmena et al. concerning the
“assimilation of the dynamics of an artificial actuator into the physiological
properties of fronto-parietal neurons.” To bring the key idea
into focus, it helps next to consider a closely related body of research on
tool use by primates.
Recent years have seen the discovery, in primate brains, of a variety
of so-called bimodal neurons. These are “pre-motor, parietal and
putaminal neurons that respond both to somatosensory information
from a given body region (i.e., the somatosensory Receptive Field; sRF)
and to visual information from the space (visual Receptive Field; vRF)
adjacent to it” (Maravita and Iriki 2004, 79).
For example, some neurons respond to somatosensory stimuli
(light touches) at the hand and to visually presented stimuli near the
hand so as to yield an action-relevant coding of visual space. In a series
of experiments, recordings were taken from bimodal neurons in the
intraparietal cortex of Japanese macaques while the macaques learned
to reach for food using a rake. The experimenters found that after just
five minutes of rake use, the responses of some bimodal neurons whose
original vRFs picked out stimuli near the hand had expanded to include
the entire length of the tool, “as if the rake was part of the arm and forearm”
(Maravita and Iriki 2004, 79). Similarly, other bimodal neurons,
which previously responded to visual stimuli within the space reachable
by the arm, now had vRFs that covered the space accessible by
the arm-rake combination.4
After surveying a number of other related
findings, including some fascinating work in which similar effects are
observed after experience of reaching with a virtual arm in an on-screen
display, Maravita and Iriki conclude: “Such vRF expansions may constitute
the neural substrate of use-dependent assimilation of the tool
into the body-schema, suggested by classical neurology” (2004, 80).
In human subjects suffering from unilateral neglect (in which stimuli
from within a certain region of egocentrically coded space are selectively
ignored), it has been shown that the use of a stick as a tool for
reaching actually extends the area of visual neglect to encompass the
space now reachable using the tool (see Berti and Frassinetti 2000). Berti
and Frassinetti conclude that
the brain makes a distinction between “far space” (the space
beyond reaching distance) and “near space” (the space within
reaching distance) [and that]...simply holding a stick causes a
remapping of far space to near space. In effect the brain, at least
for some purposes, treats the stick as though it were a part of
the body. (2000, 415)
The plastic neural changes reported by Carmena et al., and now
further emphasized by Maravita and Iriki and by Berti and Frassinetti,
suggest a real (philosophically important and scientifically well-grounded)
distinction between true incorporation into the body schema
and mere use. The body schema, it is important to note, is not the same
as the body image, though the two can sometimes be related. As I shall
use the terms (see Gallagher 1998), the body image is a conscious construct
able to inform thought and reasoning about the body. The body
schema, by contrast, names a suite of neural settings that implicitly (and
nonconsciously) define a body in terms of its capabilities for action, for
example, by defining the extent of “near space” for action programs.5
We can certainly imagine tool users (perhaps even fluent tool users?)
whose brains were not engineered so as to adapt the body schema in
these ways. Such beings would always use tools the way we typically
begin to use them: by roughly representing the tool and its features and
powers (e.g., its length) and calculating effective uses accordingly. We
can probably even imagine beings who were so fast and good at these
calculations as to deploy the tools with the same skill and efficacy as an
expert human agent. The contrast that would remain, even in the latter
kind of case, would be between (a) the skilled agent’s first explicitly
representing the shape, dimensions, and powers of the tool and then
inferring (consciously or otherwise) that she can now reach such and
such and do such and such and (b) agents whose brains were so constituted
that experience with the tool results in, for example, a suite of
altered vRFs such that objects within tool-augmented reaching range
are now automatically treated as falling within near space. These are
surely distinct strategies. The latter strategy might be especially recommended
for beings whose bodies (like our own) are naturally subject to
growth and change, as it seems designed to support genuine episodes
of integration across change: cases that can now be defined as cases in
which plastic neural resources become recalibrated (in the context of
goal-directed whole agent activity) so as to automatically take account
of new bodily and sensory opportunities. In this way, to paraphrase
Varela, Thompson, and Rosch (1991), our own embodied activity enacts
or brings forth new systemic wholes.
2.6 Toward Cognitive Extension
Could anything like this notion of incorporation (rather than mere use)
and the consequent emergence of new systemic wholes get a grip in
the more ethereal domain of mind and cognition? Could human minds
be genuinely extended and augmented by cultural and technological
tweaks, or is it (as many evolutionary psychologists, such as Pinker 1997,
would have us believe) just the same old mind with a shiny new tool?
Here, the story is murkier by far. My own view, as will become increasingly
clear, is that external and nonbiological information-processing
resources are also apt for temporary or long-term recruitment and incorporation
rather than simply knowledge-based use (see Clark 1997a, 2003;
Clark and Chalmers 1998). To whatever extent this holds, we are not just
bodily and sensorily but also cognitively permeable agents. But whereas
we can now begin to point, in the case of basic tool use, to the distinctive
kinds of visible neural changes that accompany the genuine assimilation
of tools or of new bodily structure, it is harder to know just what to look
for in the case of mental and cognitive routines. For the present, we may
look for some preliminary hints from the more basic case of physical and
sensory augmentation and incorporation.
It may be helpful first to display the bare logical possibility of such
cognitive extension. For even the bare possibility, some might feel, is
ruled out by a simple argument to the effect that, as an anonymous
journal referee once put it, “cognitive enhancement requires that the
cognitive operations of the resource be intelligible to the agent.” If this
were so, cognitive enhancement would always be in some clear sense
superficial: It would provide tools while leaving the user fundamentally
untouched. But the argument is flawed because the cognitive
operations of much of my own brain (even those elements that mature
later during development) are not thus intelligible to me, the conscious
agent. Yet those operations surely help make me the cognitive agent
I am. It also helps to reflect that biological brains must sometimes
change and evolve by coordinating old activities and processes with
new ones made available (e.g., by maturation and growth) courtesy of
new or subtly altered structures. To insist that such change requires the
literal intelligibility of the operations of the new by the old, rather than
simply the emergence of appropriate integration and coordination, is to
miss the potential for new wholes that are then themselves the determiners
of what is and is not intelligible to the agent. It must thus be possible,
at least in principle, for new nonbiological tools and structures to
likewise become sufficiently well integrated into our problem-solving
activity as to yield new agent-constituting wholes. What might such
integration (genuine cognitive incorporation) require?
Consider the case when some existing neural system or systems
learn a complex problem-solving routine that makes a variety of deep
implicit commitments to the robust bioexternal availability of certain
operations and/or bodies of information. This is the cognitive equivalent,
I suggest, of the implicit commitments to details of bodily shape
and potentials for action made (in the case of the rake) by rapidly retuning
the receptive fields of key bimodal neurons and (in the case of the
robot arm) by retuning key cortical representations (specifically, populations
of frontoparietal neurons).
A quick (though frequently misused; see the critical discussion
in sec. 7.3) illustration is provided by recent work on so-called
change blindness. In this work (see Simons and Rensink 2005, for
a balanced review), simple experimental manipulations, involving
the masking of motion transients while various changes are made
to a visually presented scene, reveal the surprising sparseness of the
change-specifying information easily available to conscious reflection.
Subjects seldom spot quite large and important changes, even
when the changes are made in focal vision. Subjects are frequently
amazed when they realize just how much has changed without their
noticing it. How should we reconcile the limitations of such conscious
change spotting with our strong sense of rich visual contact
with our surroundings? Part of the answer (and see chaps. 7 and 8 for
more discussion) may be that the strong feeling of rich visual contact
is really a reflection of something implicit in the larger overall
problem-solving organization in which moment-by-moment vision
merely participates. That larger organization “assumes” the (ecologically
normal) ability to retrieve, via saccades or head and body
movements, more detailed information as and when needed. Given
such “availability on demand,” we feel (correctly, in an important
sense) that we (qua agents engaged in knowledge-based interactions
with the world) are fully in command of the detail (for this idea, see
O’Regan and Noë 2001; Clark 2002).
Or recall the use of visual fixation for binding in the block-copying
task described in section 1.3. Here, the brain deploys a problem-solving
routine that directly factors in the availability of certain types of information
by certain types of embodied action. It is in just this way that
nonbiological informational resources can become—either temporarily
or more or less permanently—deeply incorporated into a subpersonally
defined problem-solving whole. In such cases, a problem-solving
routine is delicately geared to automatically exploit, on pretty much
an equal footing, both internal and (bio)external forms of information
storage.6
Rather than drawing a firm line around the inner encodings,
we thus expand the relevant forms of storage and retrieval to include
inner biological resources, environmental structure, and the data (and
operations) made available by cognitive artifacts such as notebooks
and laptops. As we move toward an era of wearable computing and
ubiquitous information access, the robust, reliable information fields
to which our brains delicately adapt their inner cognitive routines will
surely become increasingly dense and powerful, perhaps further blurring
the boundaries between the cognitive agent and his or her best
tools, props and artifacts.7
2.7 Three Grades of Embodiment
We can now distinguish three grades of embodiment. Let’s call them
(simply if unimaginatively) mere embodiment, basic embodiment, and
profound embodiment. A merely embodied creature or robot is one
equipped with a body and sensors, able to engage in closed-loop interactions
with its world, but for whom the body is nothing but a highly
controllable means to implement practical solutions arrived at by pure
reason. A basically embodied creature or robot would then be one (we
saw several in chap. 1) for whom the body is not just another problem
space, requiring constant micromanaged control, but is rather a resource
whose own features and dynamics (of sensor placement, of linked tendons
and muscle groups, etc.) could be actively exploited, allowing for
increasingly ﬂuent forms of action selection and control. Much (though
by no means all) work in contemporary robotics has explored this middle
ground of modest embodiment. Such systems are, however, congenitally
unable to learn new kinds of body-exploiting solution “on the
fly,” in response to damage, growth, or change. By contrast, as we have
seen, biological systems (and especially we primates) seem to be specifically
designed to constantly search for opportunities to make the
most of body and world, checking for what is available, and then (at
various timescales and with varying degrees of difficulty) integrating
new resources very deeply, creating whole new agent-world circuits
in the process. A profoundly embodied creature or robot is thus one
that is highly engineered to be able to learn to make maximal problem-simplifying
use of an open-ended variety of internal, bodily, or external
sources of order.
Why describe this as profound embodiment rather than as a return
to the outdated (or so many of us believe; see Clark 1997a, for a review)
image of mind as a truly disembodied organ of control? The answer is
that these kinds of minds are not in the least disembodied. Rather, they
are promiscuously body-and-world exploiting. They are forever testing
and exploring the possibilities for incorporating new resources and structures
deep into their embodied acting and problem-solving regimes.
They are, to use the jargon of Clark (2003), the minds of “natural-born
cyborgs”—of systems continuously renegotiating their own limits, components,
data stores, and interfaces. On this account, the body is both
critically important and constantly negotiable. It is critically important
as a key player on the problem-solving stage. It is not simply the point at
which processes of transduction pass the real problems (now rendered
in rich internal representational formats) to an inner engine of disembodied
reason. Instead, much of our successful performance depends
on constant and subtle trade-offs among morphology, real-world action
and opportunities, and neural control strategies. But this empowering
body is constantly negotiable, constructed moment by moment from
the flux of willed action and resulting sensory stimulation.
Those first waves of fear and loathing now give way to something
more rewarding. Sterling (sec. 2.1) saw frightening scenes of a
merely superficially augmented agent within whom “the CPU is a human
being: old, weak, vulnerable, pitifully limited, possibly senile.” Such
fears play upon a deeply misguided image of who and what we already
are. They play upon an image of the human agent as doubly locked
in: as a fixed mind (one constituted solely by a given biological brain)
and as a fixed bodily presence in a wider world. Fortunately for us,
human minds are not old-fashioned CPUs trapped in immutable and
increasingly feeble corporeal shells. Instead, they are the surprisingly
plastic minds of profoundly embodied agents: agents whose boundaries
and components are forever negotiable and for whom body, sensing,
thinking, and reasoning are all woven flexibly and repeatedly from the
accommodating weave of situated, intentional action.