Depth Perception
We view the world from one vantage point: ourselves.
The outside world projects images on our retinas.
How do we see how far away something is?
The retinal image is only two-dimensional, similar to a photograph.
There is no depth in a photographic print.
The ability to see and place objects at various distances
from us is called depth perception.
We have two kinds of depth perception.
The first is unambiguous and based on geometrical principles
familiar to a surveyor.
The second is ambiguous; the eye/mind sees phenomena
characteristically related to distance and goes one step further
to map them as distance.
Flat media artists must make use of these ambiguous cues to simulate distance.
Unambiguous geometrical depth
Accommodation
Accommodation is the muscular action changing the focal length of the
eye lens so as to place a focused image on the fovea of the retina.
Both the muscular action and the lack of focus of adjacent depths
provide information to the brain that can be used to sense depth.
Image sharpness/fuzziness is an ambiguous depth cue, as we have
already seen in the discussion of
depth of field in cameras.
However, by changing the focused plane (looking closer and/or
further than the primary object), the ambiguities are resolved.
A photograph will not change its relative focus; the three-dimensional
world does.
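Why accommodation carries less information at larger distances can be sketched with the thin-lens equation. This is only an illustration, not a model of the real eye: it assumes a single thin lens and a fixed lens-to-retina distance of 17 mm (an assumed round figure), and the function names are hypothetical.

```python
def required_focal_length_mm(object_distance_m, image_distance_mm=17.0):
    """Focal length (mm) a thin lens needs to focus an object on the retina.

    Uses the thin-lens equation 1/f = 1/d_object + 1/d_image.
    image_distance_mm = 17.0 is an assumed lens-to-retina distance.
    """
    if object_distance_m <= 0:
        raise ValueError("object distance must be positive")
    d_object_mm = object_distance_m * 1000.0
    return 1.0 / (1.0 / d_object_mm + 1.0 / image_distance_mm)


def accommodation_diopters(object_distance_m):
    """Extra lens power (diopters) needed compared with focusing at infinity."""
    if object_distance_m <= 0:
        raise ValueError("object distance must be positive")
    return 1.0 / object_distance_m


if __name__ == "__main__":
    for d in (0.25, 1.0, 10.0, 100.0):
        print(f"{d:7.2f} m  f = {required_focal_length_mm(d):.3f} mm  "
              f"extra power = {accommodation_diopters(d):.2f} D")
```

In this sketch the lens power changes by 4 diopters between infinity and 25 cm, but by only 0.1 diopter between infinity and 10 m, so the muscular signal becomes very small for distant objects.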
Convergence
In order to see an object close to you and fuse the image on both
retinas into one object, you must converge the optical
axes of both eyes on the object. This convergence makes the
observer more or less cross-eyed depending on how far away the
point of convergence is.

Cat converges on camera...
The muscular action of convergence also provides unambiguous depth
information.
Accommodation and convergence play a role for distances less than
about 10 meters (30 feet).
Beyond 10 meters, accommodation and convergence cannot distinguish
any fixed depth from infinity.
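The surveyor's geometry behind convergence can be sketched as follows. The sketch assumes the object lies straight ahead, midway between the eyes, and treats the eye separation as a parameter; the default of 0.065 m is an assumed typical adult value, and the function name is hypothetical.

```python
import math


def convergence_angle_deg(distance_m, eye_separation_m=0.065):
    """Angle (degrees) between the two optical axes when both eyes
    fixate a point straight ahead at distance_m.

    eye_separation_m = 0.065 is an assumed typical value.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    half_angle = math.atan(eye_separation_m / (2.0 * distance_m))
    return math.degrees(2.0 * half_angle)


if __name__ == "__main__":
    for d in (0.25, 1.0, 10.0, 100.0):
        print(f"{d:7.2f} m -> {convergence_angle_deg(d):.3f} degrees")
```

Under these assumptions the eyes turn inward by about 14.8 degrees for an object at 25 cm, but by only about 0.37 degrees at 10 m, which is why the muscular cue fades with distance.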
Binocular disparity
Related to convergence is the fact that because our two eyes are
located in slightly different positions, separated by about
6.5 centimeters, they form slightly different images on each retina.
This difference is called binocular disparity.
It is the basis of binocular or stereoscopic vision.
The image disparity takes two related forms.
a: Nearby objects appear at slightly different positions against
a more distant background through each eye.
This image shift is called parallax.
If you can, wink your eyes alternately to see the parallax shift of
nearby objects.
If you cannot easily wink each eye, put one hand in front of your face
and move it back and forth, blocking each eye in succession.
b: Nearby objects are themselves seen from a slightly different angle
by each eye, allowing the left eye to see a bit more 'around' the left side
of the object than the right eye, and vice versa.
This relative twist of the object is really just the parallax shift of the
front of the object relative to the back.
Since each retina sees only one image, the fusion of the two different
images into one three-dimensional world image must occur in the brain.
The layering of the cells in the
lateral geniculate nuclei may begin the analysis.
Because the individual retinal images are only two-dimensional, we can
simulate the three-dimensional world by sending different 2-D pictures to each eye.
There are many ways to do it.
One of the more familiar is the anaglyph viewed through red and cyan
glasses.

Dan Shelley Cypress Swamp, Florida
Notice that the red and cyan images are nearly coincident at the beginning of the
boardwalk trail, but diverge from each other with distance.
This relative shift puts the trailhead in the same plane as the terminal screen,
and the deep swamp further back.
It is curious that the brain does not mind very much seeing one color in one eye
and another color in the other eye.
The result is more-or-less correctly fused into the 'proper' colors of the scene,
which suggests that at least some, if not all, of our color sense is produced in
the brain and not in the retina.
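How an anaglyph packs two views into one picture can be sketched in a few lines. The sketch assumes the common convention of a red filter over the left eye and a cyan filter over the right eye (the text above does not say which eye gets which color), and represents images as plain lists of rows of (r, g, b) tuples. The function name is hypothetical.

```python
def make_anaglyph(left, right):
    """Combine two same-sized RGB images into one red/cyan anaglyph.

    Each image is a list of rows; each pixel is an (r, g, b) tuple.
    Assumed convention: the red channel comes from the left-eye image,
    the green and blue (cyan) channels from the right-eye image.
    """
    if len(left) != len(right) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left, right)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


if __name__ == "__main__":
    left_view = [[(200, 10, 10), (0, 0, 0)]]
    right_view = [[(0, 0, 0), (10, 180, 190)]]
    print(make_anaglyph(left_view, right_view))
```

Each colored filter then passes only its own eye's channels, so each eye receives a different 2-D picture from the same print or screen.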
Another way to present different images to each eye is to use controlled convergence.
Here are two photographs of the same scene taken from slightly different positions.

Highcliff State Park, Wisconsin 2003
The left picture is to be viewed by the right eye and the right picture by the left
eye.
If you do not know how to fuse these pictures, do the following.
Hold your finger or a pencil in front of the space between the pictures, about one
third of the way from your face to the screen.
Converge your eyes to look at your finger/pencil.
You should notice that the pictures behind the finger/pencil have split into two
overlapping images somewhat like Figure 1 below:
Now tip your head back and forth to align the two background images,
and move the finger/pencil forward or back,
always looking at the finger/pencil until the pictures exactly overlap
into three equal rectangles (Figure 2):
Now comes the tricky part that requires some practice.
You must adjust your accommodation (focus) from the finger/pencil back to the
images, without changing the convergence.
Let your eyes focus on the picture scene in the middle, keeping the two "unused"
pictures on either side.
It should jump into 3-D.
When it does, remove the finger/pencil.
The above Highcliff Park pictures are arranged so that the viewer must
cross her/his eyes slightly to see the 3-D image. This arrangement works
well for larger images.
Most websites present smaller pictures in the reverse order, which one must view
by converging beyond the screen.
Most people cannot diverge their eyes beyond parallel (infinity), so each of these
pictures must be narrower than the distance between the viewer's eyes.
There are many pictures on this site arranged both ways for you to practice on.
Any web search on stereo images will bring up thousands more...
Perhaps you would like to go for a walk on
Mars.
You can make your own stereo photographs by taking one picture, moving over a few
inches, and taking another picture.
You can make distant scenes like the Grand Canyon look like little models by
increasing the distance you move between the first and second picture.
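The "little model" effect can be estimated with a simple rule of thumb: if the two pictures are taken a distance B apart and viewed with eyes a distance e apart, the scene looks roughly as if it had been shrunk by the factor e/B. The sketch below assumes this simple geometric model (it ignores lens focal length and display size), an assumed eye separation of 0.065 m, and hypothetical function names.

```python
def apparent_scale(camera_baseline_m, eye_separation_m=0.065):
    """Rough scale at which a scene appears when photographed with the
    given baseline and viewed with the given eye separation."""
    if camera_baseline_m <= 0 or eye_separation_m <= 0:
        raise ValueError("baseline and eye separation must be positive")
    return eye_separation_m / camera_baseline_m


def apparent_distance_m(true_distance_m, camera_baseline_m,
                        eye_separation_m=0.065):
    """Distance at which an object seems to sit under the same model."""
    return true_distance_m * apparent_scale(camera_baseline_m,
                                            eye_separation_m)


if __name__ == "__main__":
    # Moving the camera 6.5 m between shots of a cliff 1 km away.
    print(apparent_scale(6.5))
    print(apparent_distance_m(1000.0, 6.5))
```

With a 6.5 m baseline this model gives a scale of 1/100, so a canyon wall a kilometer away is presented to the eyes as if it were a model about 10 m away.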
Lenticular screens
Some novelty postcards and other 3-D pictures make use of a lenticular screen,
a transparent plastic screen cut into long cylindrical lenses.
The diagram below shows how they work.
Cross section view of a lenticular screen
The two pictures of a stereographic pair are cut into long narrow strips and
placed alternately behind the lens screen as shown.
The lenses tend to send scattered light from the strips in different directions;
light from the left strip tends to go right while light from the right strip tends
to go left.
Therefore the left eye will see mostly light from right strips and vice versa.
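The strip layout described above can be sketched as a simple column interleave. The sketch assumes that each lens covers exactly two strips and that, as stated above, the right-hand strip under each lens is the one seen by the left eye. Images are plain lists of rows, and the function and parameter names are hypothetical.

```python
def interleave_strips(left_eye_image, right_eye_image, strip_width=1):
    """Interleave two same-sized images in vertical strips.

    Under each (assumed) lens the left strip carries the right-eye
    picture and the right strip carries the left-eye picture, because
    the lens sends light from each strip toward the opposite side.
    """
    if strip_width < 1:
        raise ValueError("strip_width must be at least 1")
    if len(left_eye_image) != len(right_eye_image):
        raise ValueError("images must have the same height")
    result = []
    for lrow, rrow in zip(left_eye_image, right_eye_image):
        if len(lrow) != len(rrow):
            raise ValueError("images must have the same width")
        row = []
        for x in range(len(lrow)):
            strip_index = x // strip_width
            # Even strips sit on the left under a lens, odd strips on the right.
            row.append(rrow[x] if strip_index % 2 == 0 else lrow[x])
        result.append(row)
    return result


if __name__ == "__main__":
    left = [["L0", "L1", "L2", "L3"]]
    right = [["R0", "R1", "R2", "R3"]]
    print(interleave_strips(left, right))
```

Real lenticular prints also have to match the strip width to the lens pitch and viewing distance; that step is not modeled here.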
Motion parallax
Stationary binocular disparity only works well for objects less than 20 meters or so
away (the parallax shift against a very distant background may be noticeable out
to about 200 meters if the background has fine detail).
However, we rarely stand still.
Motion continually changes the viewing angle over time.
Move your head back and forth.
Note how everything in the room you are in shifts back and forth relative to
everything else.
Such motion parallax greatly extends depth perception, giving the brain
a much larger model world to place ourselves in.
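The gain from moving can be estimated with the same triangle geometry a surveyor would use. The sketch below computes how far a near object appears to shift against a far background when the viewpoint moves sideways by a given baseline. It assumes both objects lie straight ahead of the midpoint of the movement; the 0.065 m eye separation and 0.5 m head movement are assumed example values, and the function name is hypothetical.

```python
import math


def parallax_shift_deg(near_m, far_m, baseline_m):
    """Angular shift (degrees) of a near object against a far background
    when the viewpoint moves sideways by baseline_m."""
    if near_m <= 0 or far_m <= 0 or baseline_m <= 0:
        raise ValueError("distances and baseline must be positive")
    near_angle = 2.0 * math.atan(baseline_m / (2.0 * near_m))
    far_angle = 2.0 * math.atan(baseline_m / (2.0 * far_m))
    return math.degrees(near_angle - far_angle)


if __name__ == "__main__":
    eyes_only = parallax_shift_deg(20.0, 200.0, 0.065)  # standing still
    head_move = parallax_shift_deg(20.0, 200.0, 0.5)    # swaying 0.5 m
    print(f"eyes only: {eyes_only:.3f} deg, head movement: {head_move:.3f} deg")
```

For small angles the shift grows in proportion to the baseline, so a sideways movement several times larger than the eye separation produces a shift several times larger for the same object.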

Jim Gasperini Old Stone Gate
It is interesting to note that the above animation consists of only two pictures
flashing back and forth every 0.15 seconds.
Our motion perception fills in
between the frames and makes the motion look much smoother.
Ambiguous depth cues
Many aspects of nature change with distance in predictable ways.
These changes give distance cues for objects they affect.
However, while distance may always be accompanied by these changes,
sometimes the changes occur locally as well.
Thus, these cues for distance are ambiguous; they are not
absolute cues.
Any part of reality that is reproduced in a 2-D photograph is
ambiguous, since the photograph is not the real object and does not
have 'depth'.
We have already seen that the sharpness of an image is an ambiguous
depth cue; the object might have naturally fuzzy edges.
We can broadly classify many other ambiguous cues.
Overlap

Dead dummies, Marseille 2004
When one object 'covers up' part or all of another object, it is a
pretty good bet that it is the closer of the two.
Our brains are designed to trace boundaries of objects to determine
visible overlaps.
However, it might be that what we interpret as a partially
covered distant object is really a closer object with a clever notch
exactly the shape of something more distant.
A little motion will resolve the ambiguity, except it might be that
the objects change shape exactly as we move.
Shadows and shading
Any art program will have a class on proper shading to instruct the
student how to show form in a simple charcoal drawing. If the student
is careful to have all the shading and shadows consistent with
real light sources like the Sun, the sky, or a fire, perceived depth is
assured.
This
Ball-in-a-Box movie shows the importance of shadows in locating objects
in space.
The ball moves across a checkerboard once with its shadow attached,
and once with its shadow separating from the path.
In the former case the ball appears to be rolling on the board.
In the latter the ball appears to be drifting up into the air.
Because our brains are so attuned to the relation between shading and form,
many artists and philosophers consider form more fundamentally 'real'
than color.
It is certainly true that a black-and-white photograph tells a convincing
story.
But, black, gray, and white are visual experiences of the same kind as
color, and form itself is a mental construction of questionable reality.
Geometrical perspective
We are all familiar with the idea that objects appear smaller when they are
far away.
This observation is called geometrical (also linear) perspective.
Around 1415 the great Florentine architect/engineer
Filippo Brunelleschi worked out and demonstrated the 'rules' for using
perspective in flat pictures.
Prior to Brunelleschi, artists and architects had a basic observational awareness
of the concepts, but did not have techniques to assure accuracy.
For example, Iktinos, the architect of the Parthenon (447-432 BCE), understood that by making
the otherwise heavy columns taper slightly toward the top he would achieve a grander feeling of
height and airiness.
Even more subtly he made them non-tapering at eye-level, so the brain is further 'deceived'
into thinking that the columns are straight all the way to the top.

The Parthenon
The main rule of geometrical perspective is that lines parallel in space
appear to converge on a distant vanishing point.
The vanishing point for horizontal lines is on the horizon, which
is at the same height as the viewer's eyes.
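The vanishing-point rule follows from simple pinhole projection, which can be sketched as below. The sketch assumes an idealized pinhole eye at the origin looking along the z axis, with image coordinates measured from the point straight ahead at eye level. The function names, the 1.6 m eye height, and the rail spacing are hypothetical example values.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3-D point (x, y, z) onto the image plane.

    The eye is at the origin looking along +z; y = 0 is eye level.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer")
    return (focal_length * x / z, focal_length * y / z)


def rail_images(offset_x, eye_height, depths, focal_length=1.0):
    """Image positions of points along a line on the ground that runs
    parallel to the line of sight, offset sideways by offset_x."""
    return [project((offset_x, -eye_height, z), focal_length) for z in depths]


if __name__ == "__main__":
    depths = [1.0, 10.0, 100.0, 1000.0]
    left_rail = rail_images(-1.0, 1.6, depths)
    right_rail = rail_images(1.0, 1.6, depths)
    for z, l, r in zip(depths, left_rail, right_rail):
        print(f"z = {z:6.0f}  left = {l}  right = {r}")
```

As the depth grows, the images of both rails approach the image point (0, 0): the two parallel lines meet at a single vanishing point, and that point lies at eye level, which is where the horizon appears.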

Marseille Street 2004
Crepuscular rays are parallel beams of sunlight that appear to converge
on the Sun.
Aerial perspective
Objects in the far distance lose contrast and appear bluish, because more air
is in front of them.
Sunlight scatters off this air toward the viewer.
Essentially the sky is extending
down in front of those objects.
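This "sky in front of the object" effect can be sketched with a simple exponential haze model of the kind often used for fog in computer graphics. The model is an assumption brought in for illustration, not something stated above: the light from the object is attenuated by exp(-k d) over a distance d, and the remainder is replaced by sky-colored scattered light. The extinction value k and the function name are hypothetical.

```python
import math


def aerial_perspective(object_rgb, sky_rgb, distance_m, extinction_per_m=1e-4):
    """Apparent color of an object seen through distance_m of air.

    transmission = exp(-extinction_per_m * distance_m) is the fraction of
    the object's own light that survives; the rest is sky-colored light
    scattered toward the viewer. extinction_per_m is an assumed value.
    """
    if distance_m < 0:
        raise ValueError("distance must not be negative")
    transmission = math.exp(-extinction_per_m * distance_m)
    return tuple(
        o * transmission + s * (1.0 - transmission)
        for o, s in zip(object_rgb, sky_rgb)
    )


if __name__ == "__main__":
    dark_green_hill = (0.1, 0.3, 0.1)
    blue_sky = (0.6, 0.7, 0.9)
    for d in (0.0, 5_000.0, 20_000.0, 50_000.0):
        print(d, aerial_perspective(dark_green_hill, blue_sky, d))
```

In this model both effects described above appear together: the difference between dark and light objects shrinks with distance (loss of contrast), and every color drifts toward the color of the sky.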

The Chemin Valley, France 2004
Leonardo da Vinci was the first to describe and explain
aerial perspective.
If the sky is not blue, the aerial haze isn't either.

George Caleb Bingham,
Fur Traders Descending the Missouri, 1845.
Metropolitan Museum of Art
Pattern change
A regular pattern bent or draped over a surface reveals the underlying shape
of the surface.
The houses in the picture below are laid out on a straight rectangular grid,
which drapes over the undulating landscape of the San Francisco peninsula.

RockBandit
Sunset District, San Francisco
Color
Which of these colors appears closer and which appears more distant?
Some colors appear to stand forward, and some to recede.
The softening of aerial perspective may play a role, but it is not the
complete explanation.
In the original of this painting by van Gogh, the sky appears to stand
a good several inches behind the horizon.

van Gogh The Plain of Auvers, 1890.
Carnegie Museum of Art
My brother-in-law has spent the last several decades
trying to understand and control depth through color.

David Cantine
Still-life: Blue 2004
Previous experience
In some sense, all cues, unambiguous and ambiguous alike, are learned through
experience.
A baby has to learn how to look at things.
Binocular disparity may be hardwired into our genetics, but even
then some learning of how to use it must also take place.
Our past experience can lead to some funny misinterpretations.
Here is a page illustrating an
inverted shadow.
The interior of the mask in the picture appears to project inward, because we are used to
having light sources above us, projecting downward.
Plus, faces always have the nose pointing toward us, don't they?
Summary:
We construct for ourselves and walk around in a seemingly "ordinary"
three-dimensional world. How do we construct three dimensions? From
one vantage point (one eye), we should only be able to see two
dimensions: left-right and up-down. But, somehow we also know
near-far. How? The mind uses all sorts of information to do it.
We can divide the information into two classes: absolute and ambiguous.
Absolute depth information is geometrical and cannot be misinterpreted.
Ambiguous information can be misinterpreted, but rarely is. All depth
pictured in a 2-D painting must, by nature, be ambiguous.
Absolute Information:
Accommodation:
If we have to shorten
the focal length of our lenses to focus a clear image, the object must be
nearby.
Convergence and parallax:
If we have to point
our two eyes together, the object must be close by. Both accommodation and
convergence are muscular cues. Information reaches the brain from muscular
coordination, not directly from the image.
Binocular disparity:
There are slight
differences between the image seen by the right eye and that seen by
the left eye.
These differences include a shift against the background and a rotation
of the object, since we see it from two different directions.
The first comparison of the images occurs in the lateral geniculate
nucleus, where the image elements are overlaid on each other in adjacent
layers.
Motion parallax:
As one moves, the relative positions of objects and backgrounds change.
In addition, one sees the objects themselves from changing angles.
The brain converts these changes into distances and stable objects.
Ambiguous Information:
The mind also uses many so-called ambiguous cues for depth. They are
ambiguous in that they do not literally indicate depth, but are so
often related to depth in our natural environment that they are pretty
reliable. These cues include size and geometrical perspective (things
appear smaller when they are farther away), overlapping, shading and
shadows, aerial perspective (loss of contrast and "blueness" with
increasing distance due to scattering in the intervening air), pattern
changes as fixed patterns wrap around objects, color itself, and previous
experience.
These are the cues that an artist makes use of to give apparent depth
to a flat picture.
Sample questions for reflection
How does stereoscopic vision provide depth perception?
Where does the first comparison of the left and right eye images occur?
How does accommodation provide depth information?
What is meant by the word "parallax"?
How is depth information presented in an anaglyph (picture designed to
be viewed with red/cyan glasses)?
How does a lenticular screen work?
Why does motion of the observer provide visual depth information?
Be able to recognize examples of the ambiguous depth cues.
Why are "ambiguous depth cues" ambiguous?
Which ambiguous depth cues are available for the artist who wants to
indicate depth in a painting?
Which cues does van Gogh use in this painting?