1.2 Undmtanding Com#kx IY5formdtion-ProcessingSystems 19
The other piece of work was Horn's (1975) analysis of shape from
shading,which was the first in what was to become a distinguishedseries
of articles on the formation of images. By carefully analyzing the way in
which the illumination, surface geometry, surface reflectance, and viewpoint
conspiredto createthe measured intensityvalues in an image,Horn
formulated a differential equation that related the image intensity values
to the surface geomeuy. If the surface reflectance and illumination are
known,one can solvefor the surfacegeometry(see alsoHorn, 1977).Thus
from shading one can derive shape.
The messagewas plain.There must exist an additionallevel of understandingatwhich
the characterof the information-processingtasks carried
out during perception are analyzed and understood in a way that is independent
of the particularmechanismsand structuresthat implementthem
in our heads. This was what was missing-the analysis of the problem as
an information-processingtask. Such analysis does not usurp an understanding
at the other levels-of neurons or of computer programs-but
it is a necessary complementto them,sincewithout it there can be no real
understanding of the function of all those neurons.
This realizationwas arrived at independentlyand formulatedtogether
by Tomaso Poggio in Tiibingen and myself (Marr and Poggio, 1977;Marr,
197%). Itwas not evenquitenew-Leon D. Harmonwas sayingsomething
similar at about the sametime, and others had paid lip service to a similar
distinction.But the important point is that if the notion of different types
of understanding is taken very seriously,it allows the study of the information-processing
basis of perception to be made n&rous. It becomes
possible,by separating explanations into different levels, to make explicit
statements about what is being computed and why and to constructtheories
stating that what is being computed is optimal in some sense or is
guaranteed to function correctly The ad hoc element is removed, and
heuristic computer programs are replaced by solid foundationson which
a real subject can be built. This realization-the formulation of what was
missing,together with a clear idea of how to supplyit-formed the basic
foundation for a new integrated approach,which it is the purpose of this
book to describe.
1.2 UNDERSTANDING COMPLEX
INFORMATION-PROCESSING SYSTEMS
Almost never can a complexsystem of anykind be understood as a simple
extrapolationfromthe propertiesof its elementarycomponents.Consider,
forexample,somegas inabottle.Adescriptionof thermodynamiceffects-
20 B e Philosophy and theApproach
temperature, pressure, density and the relationships among these factors-is
not formulated by using a large set of equations,one for each of
the particles involved.Sucheffects are describedat their own level,that of
an enormous collection of particles;the effort is to show that in principle
the microscopic and macroscopic descriptions are consistent with one
another. If one hopes to achieve a full understanding of a system as complicated
as a nervous system, a developing embryo, a set of metabolic
pathways,abottleof gas,or evena largecomputerprogram,then one must
be prepared to contemplatedifferent kindsof explanation at different levelsof
descriptionthat are linked,atleastinprinciple,intoacohesivewhole,
even if linking the levels in complete detail is impractical.For the specific
case of a system that solves an information-processingproblem, there are
in addition the twin strands of process and representation, and both these
ideas need some discussion.
Representation and Description
A representation is a formal system for making explicit certain entities or
types of information,together with a specificationof how the system does
this.And I shallcallthe result of using a representation to describe agiven
entityadamption of the entityin that representation (Marr and Nishihara,
1978).
For example,the Arabic, Roman, and binary numeral systems are all
formal systems for representingnumbers. The Arabic representation consists
of a string of symbols drawn from the set (0, 1,2, 3, 4, 5, 6, 7, 8, 9),
and the rule for constructing the description of a particular integer n is
that one decomposesn into a sum of multiplesof powers of 10and unites
these multiples into a string with the largest powers on the left and the
smallest on the right. Thus,thirty-sevenequals 3 x 10' + 7 x lo0,which
becomes 37,the Arabic numeral system's description of the number.What
this descriptionmakes explicitis the number'sdecompositionintopowers
of 10.The binary numeral system'sdescription of the number thirty-seven
is 100101,and this descriptionmakesexplicitthe number's decomposition
intopowers of 2. In the Roman numeralsystem,thirty-sevenis represented
as XXXVII.
This definition of a representation is quite general. For example, a
representationfor shape would be a formal scheme for describing some
aspectsof shape,togetherwith rulesthatspecifyhowtheschemeisapplied
to any particular shape.A musical score provides a way of representinga
symphony;the alphabetallowsthe constructionof awritten representation
1.2 UnderstandingComplexInformation-ProcessingSystems 21
of words; and so forth.The phrase "formal scheme" is criticalto the definition,but
the reader shouldnot be frightened by it.The reason is simply
that we are dealing with information-processing machines, and the way
such machines work is by using symbolsto stand for things-to represent
things, in our terminology.Tosaythat somethingisaformalschememeans
only that it is a set of symbolswith rules for putting them together-no
more and no less.
A representation, therefore, is not a foreign idea at all-we all use
representations all the time. However, the notion that one can capture
some aspect of reality by making a description of it using a symbol and
that to do so can be useful seems to me a fascinating and powerful idea.
But even the simple examples we have discussed introduce some rather
generaland important issues that arisewhenever one chooses to use one
particular representation.For example,if one choosesthe Arabic numeral
representation, it is easy to discover whether a number is a power of 10
but difficultto discoverwhetherit isapower of 2. If one choosesthe binary
representation, the situation is reversed. Thus, there is a trade-off; any
particular representationmakescertaininformation explicitatthe expense
of information that is pushed into the background and may be quite hard
to recover.
This issue is important, because haw information is represented can
greatly affect how easy it is to do different things with it. This is evident
even from our numbers example:It is easyto add,to subtract,and even to
multiply if the Arabic or binary representations are used,but it is not at all
easyto do thesethings-especially multiplication-with Roman numerals.
This is a key reason why the Roman culturefailedto develop mathematics
in the way the earlierArabic cultures had.
An analogous problem faces computer engineers today. Electronic
technology is much more suited to a binary number system than to the
conventionalbase 10system,yet humans supplytheir data and require the
results in base 10.The design decision facing the engineer, therefore, is,
Should one pay the cost of conversion into base 2, carry out the arithmetic
in a binary representation,and then convert back into decimal numbers
on output;or should one sacrifice efficiencyof circuitryto carry out operations
directly in a decimal representation?On the whole, business computers
and pocket calculators take the secondapproach, and generalpurpose
computers take the first. But even though one is not restricted to
using just one representationsystem for a given type of information,the
choice of which to use is important and cannot be taken lightly It determineswhat
informationis made explicitand hence what is pushed further
into the background, and it has a far-reaching effect on the ease and
22 l%e Philosophy and theApproach
difficultywith which operations may subsequentlybe carried out on that
information.
Process
The termprocess is very broad. For example,addition is a process, and so
is takmg a Fourier transform. But so is making a cup of tea, or going
shopping.For the purposes of this book, I want to restrict our attentionto
the meanings associated with machines that are carrying out informationprocessingtasks.Soletus
examine in depththe notions behind onesimple
such device,a cash register at the checkout counter of a supermarket.
There are several levels at which one needs to understand such a
device, and it is perhaps most useful to think in terms of three of them.
The most abstract is the level of what the device does and why. What it
does is arithmetic, so our first task is to master the theory of addition.
Addition is a mapping,usually denoted by +,from pairs of numbers into
singlenumbers;for example, + maps the pair (3,4) to 7,and I shallwrite
this in the form (3 + 4) +7.Addition has a number of abstractproperties,
however. It is commutative:both (3 + 4) and (4 + 3) are equal to 7;and
associative: the sum of 3 + (4 + 5) is the same as the sum of (3 + 4)
+ 5. Then there is the unique distinguished element,zero, the adding of
which has no effect:(4 + 0) 4.Also,for every number there is a unique
"inverse,"written (-4) in the case of 4,which when added to the number
gives zero: [4 + (-4)]+= 0.
Notice that these properties are part of the fundamental theory of
addition.They are true no matter how the numbers arewritten-whether
in binary Arabic,or Roman representation-and no matter how the addition
is executed. Thus part of this first level is something that might be
characterizedas what is being computed.
The other half of this level of explanation has to do with the question
of why the cash register performs addition and not, for instance,multiplication
when combining the prices of the purchased items to arrive at a
final bill. The reason is that the rules we intuitively feel to be appropriate
for combining the individual prices in fact define the mathematical operation
of addition.These can be formulated as ~01?sdaint..in the following
way:
1. If you buy nothing,it shouldcostyou nothing;and buying nothing
and something should cost the same as buying just the something. (The
rules for zero.)
1.2 UmkntandingComplexInfxtnu%ion-Procesn'ngSystems 23
2. The order in which goods are presented to the cashier should not
affect the total. (Commutativity)
3. Arranging the goods into two piles and paying for each pile separately
should not affect the total mount you pay (Associativity;the basic
operationfor combining prices.)
4. If you buy an item and then return it for a refund,your total expenditure
should be zero. (Inverses.)
It is a mathematical theorem that these conditions define the operationof
addition,which is therefore the appropriatecomputation to use.
This whole argument is what I call the conlputational theory of the
cash register. Its important features are (1) that it contains separate arguments
aboutwhat is computed andwhyand (2) that the resultingoperation
is defineduniquelyby the constraintsit has to satisfy Inthe theoryof visual
processes,the underlyingtask is to reliably deriveproperties of the world
from images of it;the business of isolating constraints that are both powerful
enough to allow a process to be defined and generally true of the
world is a central theme of our inquiry.
In order that a process shall actually run, however,one has to realize
it in some way and therefore choose a representation for the entities that
the process manipulates. The second level of the analysis of a process,
therefore,involves choosing two things: (1) a representation for the input
and for the output of the process and (2) an algorithm by which the
transformation may actuallybe accomplished. For addition,of course,the
input and output representations can both be the same,because they both
consist of numbers. However this is not true in general. In the case of a
Fourier transform, for example,the input representation may be the time
domain, and the output, the frequency domain. If the first of our levels
specifies what and why, this second level specifieshow. For addition,we
might choose Arabic numerals for the representations, and for the algorithm
we could follow the usual rules about adding the least significant
digits first and "carrying" if the sum exceeds 9. Cash registers, whether
mechanicalor electronic, usually use this type of representation and algo-
rithm.
There are three important points here. First, there is usually a wide
choice of representation.Second,the choice of algorithm often depends
rather critically on the particular representation that is employed. And
third,evenfor agivenfixedrepresentation,there are often severalpossible
algorithms for carrying out the same process. Which one is chosen will
usually depend on anyparticularly desirableor undesirable characteristics
that the algorithms may have; for example, one algorithm may be much
24 The Philosophy and theApproach
more efficient than another, or another may be slightly less efficient but
more robust (that is, less sensitive to slight inaccuracies in the data on
which it must run). Or again, one algorithm may be parallel,and another,
serial.The choice,then,maydepend onthe type of hardware or machinery
in which the algorithm is to be embodied physically
Thisbrings us to the thirdlevel,thatof the deviceinwhichtheprocess
is to be realized physically The important point here is that, once again,
the same algorithm may be implemented in quite different technologies.
The childwho methodically adds two numbers from right to left,carrying
a digit when necessary, may be using the same algorithm that is implemented
by the wires and transistors of the cash register in the neighborhood
supermarket,but the physical realization of the algorithm is quite
different in these two cases.Another example: Many people have written
computerprogramsto play tic-tac-toe,and there is a more or lessstandard
algorithm that cannot lose. This algorithm has in fact been implemented
by W. D. Hillis and B. Silverman in a quite different technology, in a computer
made out of Tinkertoys,a children's wooden building set.Thewhole
monstrouslyungainly engine,which actuallyworks, currentlyresides in a
museum at the Universityof Missouri in St.Louis.
Somestylesof algorithmwill suitsomephysical substratesbetter than
others. For example, in conventional digital computers, the number of
connections is comparable to the number of gates, while in a brain, the
number of connections is much larger (x lo4)than the number of nerve
cells. The underlying reason is that wires are rather cheap in biological
architecture,because they can grow individuallyand in three dimensions.
In conventional technology, wire laying is more or less restricted to two
dimensions, which quite severely restricts the scope for using parallel
techniques and algorithms; the same operations are often better carried
out serially
The Three Levels
We can summarizeour discussion in somethinglike the manner shown in
Figure 1-4, which illustratesthe different levels at which an inforrnationprocessing
device must be understood before one can be said to have
understood it completely At one extreme, the top level, is the abstract
computationaltheory of the device,inwhichthe performanceof the device
is characterizedas a mapping from one kind of informationto another,the
abstract properties of this mapping are defined precisely, and its appropriateness
and adequacy for the task at hand are demonstrated. In the
center is the choice of representation for the input and output and the
1.2 UnderstandingComplexInfmtion-Processing Systems 25
Computational theory
What is the goal of the
computation,why is it
appropriate,and what
is the logic of the strategy
by which it can be
carried out?
Representationand Hardware
algorithm implementation
How can this computa- How can the representional
theory be imple- tation and algorithmbe
mented?In particular, realized physically?
what is the representation
for the input and
output, and what is the
algorithmfor the trans-
formation?
Figure 14. The three levels at which any machine carrying out an informationprocessing
task must be understood.
algorithm to be used to transform one into the other. And at the other
extreme are the details of how the algorithm and representationare realized
physically-the detailed computer architecture, so to speak. These
three levels are coupled, but only loosely The choice of an algorithm is
influencedfor example,by what it has to do and by the hardware inwhich
it must run. But there is a wide choice available at each level, and the
explication of each level involves issues that are rather independent of the
other two.
Eachof thethree levelsof descriptionwillhave itsplace inthe eventual
understanding of perceptual information processing, and of course they
are logically and causally related. But an important point to note is that
since the three levels are only rather loosely related, some phenomena
may be explained at only one or two of them. This means, for example,
that a correct explanation of some psychophysical observation must be
formulated at the appropriate level. In attempts to relate psychophysical
problems to physiology, too often there is confusion about the level at
which problems should be addressed. For instance, some are related
mainly to the physical mechanisms of vision-such as afterimages (for
example, the one you see after staring at a light bulb) or such as the fact
that any color can be matched by a suitablemixture of the three primaries
(a consequence principally of the fact that we humans have three types of
cones). On the other hand, the ambiguity of the Necker cube (Figure1-5)
seems to demand a different kind of explanation. To be sure, part of the
explanationof itsperceptual reversalmusthave to dowith abistable neural
network (that is, one with two distinct stable states)somewhereinside the
26 7bePhilosophy and theApproach
Figure 1-5. The so-called Necker illusion, named after L. A. Necker, the Swiss
naturalist who developed it in 1832. The essence of the matter is that the twodimensional
representation (a) has collapsed the depth out of a cube and that a
certain aspect of human vision is to recover this missing third dimension. The
depth of the cube can indeed be perceived, but two interpretationsare possible,
(b) and (c). A person's perception characteristicallyflips from one to the other.
brain, but fewwould feel satisfied by an account that failed to mention the
existence of two different but perfectly plausible three-dimensionalinterpretations
of this two-dimensionalimage.
For some phenomena, the type of explanation required is fairly
obvious.Neuroanatomy,for example,is clearlytied principallyto the third
level,the physical realization of the computation.The same holds for synaptic
mechanisms, action potentials, inhibitory interactions,and so forth.
Neurophysiology,too, is related mostly to this level,but it can also help us
to understand the type of representations being used, particularly if one
accepts something along the lines of Barlow's views that I quoted earlier.
But one has to exercise extreme caution in making inferencesfrom neurophysiological
findings about the algorithms and representations being
used, particularly until one has a clear idea aboutwhat information needs
to be represented and what processes need to be implemented.
Psychophysics,on the other hand, is related more directlyto the level
of algorithm and representation.Different algorithms tend to fail in radically
different ways as they are pushed to the limits of their performance
or are deprived of critical infornlation.As we shall see, primarily psychophysical
evidenceproved to Poggio and myself that our first stereo-matching
algorithm (Marr and Poggio, 1976)was not the one that is used by the
brain, and the best evidence that our second algorithm (Marr and Poggio,
1979) 13roughly the one that is used also comes from psychophysics. Of
course, the underlying computationaltheory remained the same in both
cases, only the algorithms were different.