Motion perception is the process of inferring the speed and direction of moving objects in a visual scene from visual input. While this process appears straightforward to most observers, it has proven to be a hard problem from a computational perspective, and extraordinarily difficult to explain in terms of neural processing.
Motion perception has connections to both psychology (in particular, visual perception) and computer science.
First-order motion perception
When an object (defined by a difference in luminance from its surroundings) moves, the motion can be detected by a relatively simple motion sensor that registers a change in luminance at one point on the retina and correlates it with a delayed change in luminance at a neighbouring point. Sensors that work this way have been referred to as Reichardt detectors (Reichardt, 1961), motion-energy sensors (Adelson & Bergen, 1985) or Elaborated Reichardt Detectors (van Santen & Sperling, 1985). These sensors detect motion by spatio-temporal correlation and are plausible models for how the visual system may detect motion. The exact nature of this process is still debated, and the question is unlikely to be settled soon. These 'first-order' (i.e. luminance-based) motion sensors suffer from the aperture problem: they can only detect motion perpendicular to the orientation of the moving contour. Further processing is required to disambiguate motion direction.
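The correlation scheme described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than any of the cited published models: it assumes a one-dimensional "retina" sampled at integer pixels and frames, a sinusoidal test grating, and an opponent stage that subtracts the mirror-symmetric correlation. The function names are illustrative.

```python
import math


def drifting_grating(x, t, speed, wavelength=8.0):
    """Luminance contrast of a sinusoidal grating at pixel x, frame t.

    The grating drifts at `speed` pixels per frame (positive = rightward).
    """
    return math.sin(2 * math.pi * (x - speed * t) / wavelength)


def reichardt_output(stimulus, n_pixels=32, n_frames=32, dx=1, dt=1):
    """Opponent correlation (Reichardt-type) detector summed over space and time.

    Each half-detector multiplies the signal at one point with the signal that
    arrives `dt` frames later at a neighbour `dx` pixels away. Subtracting the
    mirror-image half gives a direction-selective response:
    > 0 for rightward motion, < 0 for leftward motion, 0 for a static pattern.
    """
    total = 0.0
    for t in range(dt, n_frames):
        for x in range(n_pixels - dx):
            rightward = stimulus(x, t - dt) * stimulus(x + dx, t)
            leftward = stimulus(x + dx, t - dt) * stimulus(x, t)
            total += rightward - leftward
    return total


right = reichardt_output(lambda x, t: drifting_grating(x, t, speed=1.0))
left = reichardt_output(lambda x, t: drifting_grating(x, t, speed=-1.0))
print(right > 0, left < 0)  # True True
```

The sign of the summed output encodes direction; only the opponent subtraction makes the detector direction-selective, since either half alone also responds to a static pattern.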
The aperture problem
The aperture problem becomes apparent when considering a variety of simple motion stimuli. A well-known example is the barberpole illusion. When a diagonally striped pole rotates about its long axis, the stripes physically move along the pole's short axis, yet they appear to move along its long axis. Why this occurs is not well understood.
In addition to these problems of motion perception, a number of issues arise from the physiology of the brain. Each neuron in the visual system is sensitive to visual input in only a small part of the visual field, as if it were looking at the visual input through a small aperture. At the resolution of this aperture, visual cues can often be approximated by straight lines. The direction of motion of a straight line is fundamentally ambiguous, because the motion component parallel to the line cannot be inferred from the visual input.
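The ambiguity can be made concrete with a small Python sketch. The function name and the angle convention (contour orientation measured in degrees from the horizontal) are illustrative assumptions: for a straight contour seen through an aperture, only the velocity component along the contour's normal is recoverable, so different true velocities can produce identical local measurements.

```python
import math


def measurable_velocity(velocity, contour_angle_deg):
    """Velocity component that a small aperture can recover for a straight contour.

    Sliding a straight contour along itself leaves the image inside the aperture
    unchanged, so only the projection of the true velocity onto the contour's
    unit normal is available (the "normal flow").
    """
    theta = math.radians(contour_angle_deg)
    normal = (-math.sin(theta), math.cos(theta))  # unit vector perpendicular to the contour
    speed_along_normal = velocity[0] * normal[0] + velocity[1] * normal[1]
    return (speed_along_normal * normal[0], speed_along_normal * normal[1])


# A horizontal contour (0 degrees) moving up and to the right, and one moving mostly right:
a = measurable_velocity((1.0, 1.0), 0)
b = measurable_velocity((5.0, 1.0), 0)
# Both come out as approximately (0, 1): the horizontal component is invisible.
```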
In cases where motion cannot be determined from the visual input alone, the visual system is thought to rely on prior assumptions (von Helmholtz, 1924; Rock, 1983). In the second figure, the visual input and prior assumptions together make the stripes appear to move toward the bottom-right.
Individual neurons initially estimate motion locally within their receptive fields. Because each neuron suffers from the aperture problem, the estimates from many neurons are then integrated into a global motion estimate. This integration appears to occur in area MT/V5 of the primate visual cortex (Salzman et al., 1992).
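One textbook way to combine such local estimates is the intersection-of-constraints rule: each aperture-limited measurement restricts the true velocity to a line in velocity space, and two differently oriented contours pin it down. The Python sketch below illustrates that rule under idealized, noise-free assumptions; it is not a claim about how MT actually performs the integration, and the function name and input format (normal direction in degrees, speed along that normal) are hypothetical.

```python
import math


def intersect_constraints(measurements):
    """Recover a 2-D velocity from two aperture-limited measurements.

    Each measurement is (normal_angle_deg, normal_speed): the direction of the
    contour's unit normal and the speed measured along it. Every velocity v with
    v . n = s is consistent with one measurement (a constraint line); two
    non-parallel constraint lines intersect in a single velocity.
    """
    (a1, s1), (a2, s2) = measurements
    n1 = (math.cos(math.radians(a1)), math.sin(math.radians(a1)))
    n2 = (math.cos(math.radians(a2)), math.sin(math.radians(a2)))
    det = n1[0] * n2[1] - n1[1] * n2[0]
    if math.isclose(det, 0.0, abs_tol=1e-12):
        raise ValueError("parallel contours: velocity remains ambiguous")
    # Cramer's rule for the linear system  n1 . v = s1,  n2 . v = s2
    vx = (s1 * n2[1] - s2 * n1[1]) / det
    vy = (n1[0] * s2 - n2[0] * s1) / det
    return vx, vy


true_v = (3.0, 1.0)
angles = (30.0, 120.0)  # normal directions of two differently oriented contours
readings = [
    (a, true_v[0] * math.cos(math.radians(a)) + true_v[1] * math.sin(math.radians(a)))
    for a in angles
]
print(intersect_constraints(readings))  # approximately (3.0, 1.0)
```

Neither reading alone determines the velocity; the ValueError branch covers the case where both contours share an orientation and the aperture problem cannot be resolved.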
Motion in depth
As in other aspects of vision, the observer's visual input is generally insufficient to uniquely determine the 'true' nature of stimulus sources, in this case their velocity in a visual scene. In monocular vision, for example, the visual input is a 2D projection of a 3D scene, and the motion cues present in the 2D projection are by themselves insufficient to reconstruct the motion present in the 3D scene. Put differently, many 3D scenes are compatible with a single 2D projection. The problem of motion estimation generalizes to binocular vision when we consider occlusion or motion perception at relatively large distances, where binocular disparity is a poor cue to depth. This fundamental difficulty is referred to as the "inverse problem."
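The many-to-one nature of projection is easy to demonstrate with a pinhole camera model, used here as a simplifying assumption about the eye: scaling a point's 3D starting position and its 3D velocity by the same factor leaves its projected 2D trajectory unchanged. The function names and numbers below are illustrative.

```python
def project(point, focal_length=1.0):
    """Pinhole (perspective) projection of a 3-D point (X, Y, Z), Z > 0, onto the image plane."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)


def image_trajectory(start, velocity, times, focal_length=1.0):
    """Image positions over time of a point moving with constant 3-D velocity."""
    return [
        project(tuple(p + v * t for p, v in zip(start, velocity)), focal_length)
        for t in times
    ]


times = [0.0, 1.0, 2.0, 3.0, 4.0]
# A point relatively close to the eye, moving slowly ...
near = image_trajectory((1.0, 0.5, 5.0), (0.5, 0.0, -0.2), times)
# ... and a point three times as far away, moving three times as fast.
far = image_trajectory((3.0, 1.5, 15.0), (1.5, 0.0, -0.6), times)
# The two 2-D trajectories coincide (up to floating-point rounding).
```

Any positive scale factor works, so the image motion alone cannot distinguish between infinitely many 3D motions.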
Second-order motion perception
Motion stimuli are classified into first-order stimuli, in which the moving contour is defined by luminance, and second-order stimuli, in which the moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in motion energy in the Fourier spectrum of the stimulus (Chubb & Sperling, 1988; Cavanagh & Mather, 1989). There is much evidence to suggest that early processing of first- and second-order motion is carried out by separate pathways (Nishida et al., 1997). Second-order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies that they respond to. Second-order motion produces a weaker motion aftereffect unless tested with dynamically flickering stimuli (Ledgeway & Smith, 1994).
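Why second-order motion escapes a luminance-based correlator, and how a nonlinearity can recover it, can be sketched in Python. This is a simplified, hypothetical illustration loosely in the spirit of filter-rectify-filter accounts, with the filtering stages omitted. It assumes a one-dimensional contrast-modulated stimulus whose random carrier is redrawn on every frame, and it uses full-wave rectification (absolute value) as the nonlinearity. The stimulus parameters and function names are illustrative.

```python
import math
import random


def reichardt(signal, dx=1, dt=1):
    """Opponent correlation detector applied to a space-time array signal[t][x].

    Positive output signals rightward motion, negative output leftward motion.
    """
    total = 0.0
    for t in range(dt, len(signal)):
        for x in range(len(signal[t]) - dx):
            total += (signal[t - dt][x] * signal[t][x + dx]
                      - signal[t - dt][x + dx] * signal[t][x])
    return total


def contrast_modulated_noise(n_pixels=64, n_frames=64, speed=1, period=8, seed=0):
    """Contrast values of a dynamic random carrier under a drifting contrast envelope.

    Every pixel's sign is redrawn independently on every frame, so the luminance
    pattern carries no consistent spatio-temporal correlation; only the envelope,
    i.e. how strongly each region is modulated, drifts at `speed` pixels/frame.
    """
    rng = random.Random(seed)
    frames = []
    for t in range(n_frames):
        row = []
        for x in range(n_pixels):
            envelope = 0.5 * (1 + math.cos(2 * math.pi * (x - speed * t) / period))
            row.append(envelope * rng.choice((-1.0, 1.0)))
        frames.append(row)
    return frames


stimulus = contrast_modulated_noise()
raw_response = reichardt(stimulus)
# Full-wave rectification recovers the envelope, which the same correlator can track.
rectified_response = reichardt([[abs(c) for c in row] for row in stimulus])
print(raw_response, rectified_response)
```

With these settings the rectified response should come out clearly positive (rightward), while the raw luminance response only fluctuates around zero from seed to seed, because the carrier's frame-to-frame correlations average out.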
References
- Adelson, E.H., & Bergen, J.R. (1985). Spatiotemporal energy models for the perception of motion. J Opt Soc Am A, 2 (2), 284-299.
- Cavanagh, P., & Mather, G. (1989). Motion: The long and short of it. Spatial Vision, 4, 103-129.
- Chubb, C., & Sperling, G. (1988). Drift-balanced random stimuli: A general basis for studying non-Fourier motion perception. J Opt Soc Am A, 5, 1986-2007.
- Ledgeway, T., & Smith, A.T. (1994). The duration of the motion aftereffect following adaptation to first- and second-order motion. Perception, 23, 1211-1219.
- Nishida, S., Ledgeway, T., & Edwards, M. (1997). Dual multiple-scale processing for motion in the human visual system. Vision Research, 37, 2685-2698.
- Reichardt, W. (1961). Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. In W.A. Rosenblith (Ed.), Sensory communication (pp. 303-317). New York: MIT Press.
- Rock, I. (1983). The logic of perception. Cambridge, MA: MIT Press.
- van Santen, J.P., & Sperling, G. (1985). Elaborated Reichardt detectors. J Opt Soc Am A, 2 (2), 300-321.
- von Helmholtz, H. (1924). Helmholtz’s Treatise on Physiological Optics. Translated from the Third German Edition. Vol. 3. The Optical Society of America.
See also
- Vision
- Psychology
- Cognition
- Neuroscience