AS138 Depth Perception
Lorenzo Ghiberti The Story of Isaac, Esau, and Jacob; Porta del Paradiso, 1425-52. Firenze

Depth Perception

We view the world from one vantage point: ourselves. The outside world projects images on our retinas. How do we see how far away something is? The retinal image is only two-dimensional, similar to a photograph. There is no depth in a photographic print. The ability to see and place objects at various distances from us is called depth perception.

We have two kinds of depth perception. The first is unambiguous and based on geometrical principles familiar to a surveyor. The second is ambiguous; the eye/mind sees phenomena characteristically related to distance and goes one step further to map them as distance. Flat media artists must make use of these ambiguous cues to simulate distance.

Unambiguous geometrical depth

Accommodation
Accommodation is the muscular action changing the focal length of the eye lens so as to place a focused image on the fovea of the retina. Both the muscular action and the lack of focus of adjacent depths provide information to the brain that can be used to sense depth. Image sharpness/fuzziness is an ambiguous depth cue, as we have already seen in the discussion of depth of field in cameras. However, by changing the focused plane (looking closer and/or further than the primary object), the ambiguities are resolved. A photograph will not change its relative focus; the three-dimensional world does.
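
The link between focus and distance can be sketched with the thin-lens equation, 1/f = 1/d_object + 1/d_image. The example below is only an illustration under stated assumptions: it treats the eye as a single thin lens with a fixed lens-to-retina distance of 17 mm (a common simplification, not a figure from these notes), and the function name is hypothetical.

```python
def required_focal_length_mm(object_distance_m, image_distance_mm=17.0):
    """Focal length a thin lens needs to focus an object onto a screen
    (the retina) fixed at image_distance_mm behind the lens.
    The 17 mm default is an assumed, simplified value."""
    object_distance_mm = object_distance_m * 1000.0
    # Thin-lens equation: 1/f = 1/d_object + 1/d_image
    return 1.0 / (1.0 / object_distance_mm + 1.0 / image_distance_mm)


if __name__ == "__main__":
    for d in (0.25, 1.0, 10.0, 100.0):
        f = required_focal_length_mm(d)
        print(f"object at {d:6.2f} m -> focal length {f:.3f} mm")
```

Under these assumptions the required focal length changes by about a millimeter between 0.25 m and 10 m, but hardly at all beyond that, so the focusing effort carries distance information mainly for nearby objects.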

Convergence
In order to see an object close to you and fuse the image on both retinas into one object, you must converge the optical axes of both eyes on the object. This convergence makes the observer more or less cross-eyed depending on how far away the point of convergence is.


Cat converges on camera...

The muscular action of convergence also provides unambiguous depth information. Accommodation and convergence play a role for distances less than about 10 meters (30 feet). Beyond 10 meters, neither accommodation (because of the eye's depth of field) nor convergence can distinguish any fixed depth from infinity.
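
To get a feeling for why convergence stops being useful at larger distances, one can compute the angle between the two optical axes for a point straight ahead. This is a minimal sketch, assuming an eye separation of 6.5 cm (a typical adult value, chosen here as an assumption) and a hypothetical function name.

```python
import math


def convergence_angle_deg(distance_m, eye_separation_m=0.065):
    """Angle between the two optical axes when both eyes fixate a point
    straight ahead at distance_m. The 6.5 cm separation is an assumed value."""
    return math.degrees(2.0 * math.atan(eye_separation_m / (2.0 * distance_m)))


if __name__ == "__main__":
    for d in (0.25, 1.0, 10.0, 100.0):
        print(f"{d:6.2f} m -> {convergence_angle_deg(d):7.3f} degrees")
```

With these numbers the angle falls from roughly 15 degrees at reading distance to well under half a degree at 10 meters, which leaves very little muscular difference between 10 meters and infinity.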

Binocular disparity
Related to convergence is the fact that our two eyes sit in slightly different locations, separated by about 6.5 centimeters, so they form slightly different images on each retina. This difference is called binocular disparity. It is the basis of binocular, or stereoscopic, vision.

The image disparity takes two related forms.
a: Nearby objects appear at slightly different positions against a more distant background through each eye. This image shift is called parallax. If you can, wink one eye and then the other, back and forth, to see the parallax shift of nearby objects. If you cannot easily wink each eye, put one hand in front of your face and move it back and forth, blocking each eye in succession.
b: Nearby objects are themselves seen from a slightly different angle by each eye, allowing the left eye to see a bit more 'around' the left side of the object than the right eye, and vice versa. This relative twist of the object is really just the parallax shift of the front of the object relative to the back.

Since each retina sees only one image, the fusion of the two different images into one three-dimensional world image must occur in the brain. The layering of the cells in the lateral geniculate nuclei may begin the analysis.

Because the individual retinal images are only two-dimensional, we can simulate the three-dimensional world by sending different 2-D pictures to each eye. There are many ways to do it. One of the more familiar is the anaglyph viewed through red and cyan glasses.


Dan Shelley Cypress Swamp, Florida

Notice that the red and cyan images are nearly coincident at the beginning of the boardwalk trail, but diverge from each other with distance. This relative shift puts the trailhead in the same plane as the terminal screen, and the deep swamp further back. It is curious that the brain does not mind very much seeing one color in one eye and another color in the other eye. The result is more-or-less correctly fused into the 'proper' colors of the scene, which suggests that at least some, if not all, of our color sense is produced in the brain and not in the retina.
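
As a rough illustration of how such an image can be assembled, the sketch below takes the red channel from the left-eye picture and the green and blue channels from the right-eye picture. It assumes the common convention of a red filter over the left eye and a cyan filter over the right. Images are plain nested lists of (r, g, b) tuples, and the function name is hypothetical. It is not a description of how the Cypress Swamp image was made.

```python
def make_anaglyph(left_image, right_image):
    """Combine two equal-sized RGB images (lists of rows of (r, g, b) tuples)
    into a red/cyan anaglyph.

    Assumed convention: the red filter covers the left eye, so the left-eye
    picture supplies the red channel and the right-eye picture supplies
    green and blue (together, cyan)."""
    if len(left_image) != len(right_image):
        raise ValueError("images must have the same number of rows")
    anaglyph = []
    for left_row, right_row in zip(left_image, right_image):
        if len(left_row) != len(right_row):
            raise ValueError("rows must have the same length")
        anaglyph.append(
            [(l[0], r[1], r[2]) for l, r in zip(left_row, right_row)]
        )
    return anaglyph


if __name__ == "__main__":
    left = [[(200, 10, 10), (50, 50, 50)]]
    right = [[(10, 120, 130), (60, 60, 60)]]
    print(make_anaglyph(left, right))
```

Where the two source pictures agree, the red and cyan parts coincide and the point appears in the plane of the screen; where they differ, the horizontal offset between the red and cyan parts is what the brain reads as depth.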

Another way to present different images to each eye is to use controlled convergence. Here are two photographs of the same scene taken from slightly different positions.


Highcliff State Park, Wisconsin 2003

The left picture is to be viewed by the right eye and the right picture by the left eye. If you do not know how to fuse these pictures, do the following. Hold your finger or a pencil in front of the space between the pictures, about one third of the way from your face to the screen. Converge your eyes to look at your finger/pencil. You should notice that the pictures behind the finger/pencil have split into two overlapping images somewhat like Figure 1 below:
Now tip your head back and forth to align the two background images, and move the finger/pencil forward or back, always looking at the finger/pencil until the pictures exactly overlap into three equal rectangles (Figure 2):
Now comes the tricky part that requires some practice. You must adjust your accommodation (focus) from the finger/pencil back to the images, without changing the convergence. Let your eyes focus on the picture scene in the middle, keeping the two "unused" pictures on either side. It should jump into 3-D. When it does, remove the finger/pencil.

The above Highcliff Park pictures are arranged left to right so the viewer must cross her/his eyes slightly to see the 3-D image. This arrangement works well for larger images. Most websites present smaller pictures in the reverse order, which one must view by converging beyond the screen. Most people cannot widen their convergence beyond parallel (infinity) so these pictures must be smaller than the distance between the viewer's eyes. There are many pictures on this site arranged both ways for you to practice on. Any web search on stereo images will bring up thousands more... Perhaps you would like to go for a walk on Mars. You can make your own stereo photographs by taking one picture, moving over a few inches, and taking another picture. You can make distant scenes like the Grand Canyon look like little models by increasing the distance you move between the first and second picture.

Lenticular screens
Some novelty postcards and other 3-D pictures make use of a lenticular screen, a transparent plastic screen cut into long cylindrical lenses. The diagram below shows how they work.
Cross section view of a lenticular screen
The two pictures of a stereographic pair are cut into long narrow strips and placed alternately behind the lens screen as shown. The lenses tend to send scattered light from the strips in different directions; light from the left strip tends to go right while light from the right strip tends to go left. Therefore the left eye will see mostly light from right strips and vice versa.
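
The strip arrangement described above can be sketched in a few lines. This is a simplified model, assuming each lens covers exactly two strips and that each strip is sampled from the same columns of its source picture; the function and parameter names are hypothetical.

```python
def interlace_strips(left_eye, right_eye, strip_width=1):
    """Interleave two equal-sized images (lists of rows) into vertical strips.

    Each lens is assumed to cover two strips. Following the text, the strip on
    the left under each lens carries the right-eye picture (its light is sent
    toward the right), and the strip on the right carries the left-eye picture."""
    if strip_width < 1:
        raise ValueError("strip_width must be at least 1")
    if len(left_eye) != len(right_eye):
        raise ValueError("images must have the same number of rows")
    interlaced = []
    for left_row, right_row in zip(left_eye, right_eye):
        if len(left_row) != len(right_row):
            raise ValueError("rows must have the same length")
        row = []
        for x in range(len(left_row)):
            strip_index = x // strip_width
            # Even strips sit on the left under a lens, odd strips on the right.
            source = right_row if strip_index % 2 == 0 else left_row
            row.append(source[x])
        interlaced.append(row)
    return interlaced


if __name__ == "__main__":
    left = [["L0", "L1", "L2", "L3"]]
    right = [["R0", "R1", "R2", "R3"]]
    print(interlace_strips(left, right, strip_width=1))
```

In this simple version each picture loses half of its columns, which is one reason lenticular prints look coarser horizontally than the original photographs.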

Motion parallax
Stationary binocular disparity only works well for objects less than 20 meters or so away (the parallax shift against a very distant background may be noticeable out to about 200 meters if the background has fine detail). However, we rarely stand still. Motion continually changes the viewing angle over time. Move your head back and forth. Note how everything in the room you are in shifts back and forth relative to everything else. Such motion parallax greatly extends depth perception, giving the brain a much larger model world to place ourselves in.
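
The advantage of moving can be made concrete by comparing the parallax shift produced by the fixed separation of the eyes with the shift produced by a modest sideways head movement. The sketch below is an illustration only: the 6.5 cm eye separation, the 30 cm head movement, and the example distances are assumed values, and the function name is hypothetical.

```python
import math


def parallax_shift_deg(baseline_m, near_m, far_m):
    """Angular shift of an object at near_m against a background at far_m
    when the viewpoint moves sideways by baseline_m. Both points are assumed
    to start straight ahead of the viewer."""
    near_shift = math.atan(baseline_m / near_m)
    far_shift = math.atan(baseline_m / far_m)
    return math.degrees(near_shift - far_shift)


if __name__ == "__main__":
    for label, baseline in (("eye separation", 0.065), ("head movement", 0.30)):
        shift = parallax_shift_deg(baseline, near_m=20.0, far_m=200.0)
        print(f"{label:15s} ({baseline:.3f} m): {shift:.3f} degrees")
```

For an object 20 meters away in front of a background at 200 meters, the larger baseline gives a shift more than four times as large, so the same depth difference is much easier to detect.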


Jim Gasperini Old Stone Gate

It is interesting to note that the above animation consists of only two pictures flashing back and forth every 0.15 seconds. Our motion perception fills in between the frames and makes the motion look much smoother.

Ambiguous depth cues

Many aspects of nature change with distance in predictable ways. These changes give distance cues for objects they affect. However, while distance may always be accompanied by these changes, sometimes the changes occur locally as well. Thus, these cues for distance are ambiguous; they are not absolute cues. Any part of reality that is reproduced in a 2-D photograph is ambiguous, since the photograph is not the real object and does not have 'depth'. We have already seen that the sharpness of an image is an ambiguous depth cue; the object might have naturally fuzzy edges. We can broadly classify many other ambiguous cues.

Overlap

Dead dummies, Marseille 2004
When one object 'covers up' part or all of another object, it is a pretty good bet that it is the closer of the two. Our brains are designed to trace boundaries of objects to determine visible overlaps. However, it might be that what we interpret as a partially covered distant object is really a closer object with a clever notch exactly the shape of something more distant. A little motion will resolve the ambiguity, except it might be that the objects change shape exactly as we move.

Shadows and shading
Any art program will have a class on proper shading to instruct the student how to show form in a simple charcoal drawing. If the student is careful to have all the shading and shadows consistent with real light sources like the Sun, the sky, or a fire, perceived depth is assured.

This Ball-in-a-Box movie shows the importance of shadows in locating objects in space. The ball moves across a checkerboard once with its shadow attached, and once with its shadow separating from the path. In the former case the ball appears to be rolling on the board. In the latter the ball appears to be drifting up into the air.

Because our brains are so attuned to the relation between shading and form, many artists and philosophers consider form more fundamentally 'real' than color. It is certainly true that a black-and-white photograph tells a convincing story. But, black, gray, and white are visual experiences of the same kind as color, and form itself is a mental construction of questionable reality.

Geometrical perspective
We are all familiar with the idea that objects appear smaller when they are far away. This observation is called geometrical (also linear) perspective. Around 1415 the great Florentine architect/engineer Filippo Brunelleschi worked out and demonstrated the 'rules' for using perspective in flat pictures; Leon Battista Alberti set them down in writing in 1435. Prior to Brunelleschi, artists and architects had a basic observational awareness of the concepts, but did not have techniques to assure accuracy. For example, Iktinos, the architect of the Parthenon (447-432 BCE), understood that by making the otherwise heavy columns taper slightly toward the top he would achieve a grander feeling of height and airiness. Even more subtly he made them non-tapering at eye-level, so the brain is further 'deceived' into thinking that the columns are straight all the way to the top.
The Parthenon

The main rule of geometrical perspective is that lines parallel in space appear to converge on a distant vanishing point. The vanishing point for horizontal lines is on the horizon, which is at the same height as the viewer's eyes.
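
Both effects, shrinking with distance and convergence toward a vanishing point, follow from a simple pinhole projection in which image coordinates are the real coordinates divided by the distance. The sketch below is a minimal illustration: the eye height of 1.6 m, the rail spacing, and all names are assumptions made for the example.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3-D point (x, y, z) onto the image plane.
    x is to the right, y is up (measured from eye level), and z is the
    distance in front of the eye (must be positive)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer")
    return (focal_length * x / z, focal_length * y / z)


EYE_HEIGHT = 1.6   # meters above the ground (assumed)
HALF_GAUGE = 0.75  # half the spacing between two parallel rails (assumed)

if __name__ == "__main__":
    for z in (2.0, 20.0, 200.0, 2000.0):
        left = project((-HALF_GAUGE, -EYE_HEIGHT, z))
        right = project((HALF_GAUGE, -EYE_HEIGHT, z))
        print(f"z = {z:7.1f} m  left rail {left}  right rail {right}")
```

As z grows, both rails approach the image point (0, 0), which lies at eye level: the two parallel lines on the ground meet at a vanishing point on the horizon. Doubling the distance to any object halves its projected size.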


Marseille Street 2004

Crepuscular rays are parallel beams of sunlight that appear to converge on the Sun.

Aerial perspective
Objects in the far distance lose contrast and appear bluish, because more air is in front of them. Sunlight scatters off this air toward the viewer. Essentially the sky is extending down in front of those objects.
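
This blending toward the sky color can be imitated with a simple exponential haze model, in which light from the object is attenuated with distance and replaced by light scattered from the intervening air. The sketch below is a simplification: the 20 km attenuation length and the color values are invented for illustration, and the function name is hypothetical.

```python
import math


def apparent_color(object_rgb, sky_rgb, distance_km, attenuation_length_km=20.0):
    """Blend an object's color toward the sky color with distance.

    transmission is the fraction of the object's own light that survives
    the trip through the air; the remainder is replaced by scattered sky
    light. The 20 km attenuation length is an assumed value."""
    transmission = math.exp(-distance_km / attenuation_length_km)
    return tuple(
        transmission * o + (1.0 - transmission) * s
        for o, s in zip(object_rgb, sky_rgb)
    )


if __name__ == "__main__":
    sky = (0.5, 0.7, 1.0)          # bluish sky (assumed)
    dark_hill = (0.1, 0.2, 0.1)    # dark green hillside (assumed)
    for d in (0.0, 5.0, 20.0, 80.0):
        r, g, b = apparent_color(dark_hill, sky, d)
        print(f"{d:5.1f} km -> ({r:.2f}, {g:.2f}, {b:.2f})")
```

In this model a dark object and a light object at the same distance move toward the same sky color, so their contrast shrinks with distance. If the sky color passed in is gray or orange rather than blue, the haze takes on that color instead.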


The Chemin Valley, France 2004

Leonardo da Vinci was the first to describe and explain aerial perspective. If the sky is not blue, the aerial haze isn't either.


George Caleb Bingham, Fur Traders Descending the Missouri, 1845. Metropolitan Museum of Art

Pattern change
A regular pattern bent or draped over a surface reveals the underlying shape of the surface. The houses in the picture below are laid out on a straight rectangular grid, which drapes over the undulating landscape of the San Francisco peninsula.


RockBandit Sunset District, San Francisco

Color
Which of these colors appears closer and which appears more distant?

Some colors appear to stand forward, and some to recede. The softening of aerial perspective may play a role, but it is not the complete explanation. In the original of this painting by van Gogh, the sky appears to stand a good several inches behind the horizon.


van Gogh The Plain of Auvers, 1890. Carnegie Museum of Art

My brother-in-law has spent the last several decades trying to understand and control depth through color.


David Cantine Still-life: Blue 2004

Previous experience
In some sense, all cues, unambiguous and ambiguous alike, are learned through experience. A baby has to learn how to look at things. Binocular disparity may be hardwired into our genetics, but even then some learning of how to use it must also take place. Our past experience can lead to some funny misinterpretations. Here is a page illustrating an inverted shadow. The hollow interior of the mask in the picture appears to bulge outward like an ordinary face, because we are used to having light sources above us, projecting downward. Plus, faces always have the nose pointing toward us, don't they?

Summary:
We construct for ourselves and walk around in a seemingly "ordinary" three-dimensional world. How do we construct three dimensions? From one vantage point (one eye), we should only be able to see two dimensions: left-right and up-down. But, somehow we also know near-far. How? The mind uses all sorts of information to do it. We can divide the information into two classes: absolute and ambiguous. Absolute depth information is geometrical and cannot be misinterpreted. Ambiguous information can be misinterpreted, but rarely is. All depth pictured in a 2-D painting must, by nature, be ambiguous.

Absolute Information:
Accommodation:
If we have to shorten the focal length of our lenses to focus a clear image, the object must be nearby.
Convergence and parallax:
If we have to point our two eyes together, the object must be close by. Both these cues are muscular.
Information reaches the brain from muscular coordination, not directly from the image.

Binocular disparity:
There are slight differences between the image seen by the right eye and that seen by the left eye. These differences include a shift against the background and a rotation of the object, since we see it from two different directions. The first comparison of the images occurs in the lateral geniculate nucleus, where the image elements are overlaid on each other in adjacent layers.

Motion parallax:
As one moves, the relative positions of objects and backgrounds change. In addition, one sees the objects themselves from changing angles. The brain converts these changes into distances and stable objects.

Ambiguous Information:
The mind also uses many so-called ambiguous cues for depth. They are ambiguous in that they do not literally indicate depth, but are so often related to depth in our natural environment that they are pretty reliable. These cues include size and geometrical perspective (things appear smaller when they are farther away), overlapping, shading and shadows, aerial perspective (loss of contrast and "blueness" with increasing distance due to scattering in the intervening air), pattern changes as fixed patterns wrap around objects, color itself, and previous experience. These are the cues that an artist makes use of to give apparent depth to a flat picture.


Sample questions for reflection

How does stereoscopic vision provide depth perception?

Where does the first comparison of the left and right eye images occur?

How does accommodation provide depth information?

What is meant by the word "parallax"?

How is depth information presented in an anaglyph (picture designed to be viewed with red/cyan glasses)?

How does a lenticular screen work?

Why does motion of the observer provide visual depth information?

Be able to recognize examples of the ambiguous depth cues.

Why are "ambiguous depth cues" ambiguous?

Which ambiguous depth cues are available for the artist who wants to indicate depth in a painting?

Which cues does van Gogh use in this painting?