Empirical Theories of Vision

In this section, I consider some empirical theories of vision and discuss how they may bear on perceptual anti-individualism. 3D object perception involves very complex processing. Most vision scientists would maintain that we are still far from a full understanding of how it works. Currently, there are diverse accounts of the functions and mechanisms of visual perception. We will see that not all of them support Burge’s position.¹⁸

18 Burge regards the empirical account by Shepard (2001) and the Bayesian “ideal observer theory” by Geisler (2008) as supporting perceptual anti-individualism (Burge, 2010: 99). For critical evaluations of Shepard’s view, cf. Kubovy and Epstein (2001) and Hatfield (2003). For critical comments on the Bayesian approach to perception, cf. Purves and Lotto (2011: 13-14) and Purves et al. (2011: 15594).

(1) Let us first consider two major approaches to object recognition in cognitive science. The Recognition-by-Components Theory explains shape constancy by postulating 3D volumetric and viewpoint-independent units called geons (Biederman, 1987, 2007). Each object is analyzed and represented by the visual system as composed of a set of geons arranged in a certain way. The relations between geons are specified by a structural description using a viewpoint-invariant frame of reference, for example, a cylinder “on-top-of” a brick. A complete representation of an object is called a geon structural description, consisting of a set of geons together with a structural description of their relations. To perceive an object is for the visual system to generate a particular geon structural description for that object. It involves a series of processes: First, the edges of an object are extracted from the retinal images. Second, the non-accidental properties of the image are detected and parsed into regions. Based on this information, in the third stage, a particular set of geons is identified. Since geons are defined by non-accidental properties of retinal images, they explain why objects can be perceived as having the same shape regardless of changes in the retinal images. In the fourth stage, the geon structural description of an object is compared with object representations stored in memory. The object of perception is recognized when there is a match.¹⁹
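The fourth stage can be illustrated with a minimal sketch in Python. It is not Biederman’s implementation: it assumes that the first three stages have already delivered a set of geon labels and relations, and it treats recognition as an exact match against stored descriptions. The class names, the function names, and the stored objects ("lamp", "mug") are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A viewpoint-invariant relation between two geons."""
    first: str     # geon label, e.g. "cylinder"
    relation: str  # e.g. "on-top-of"
    second: str    # geon label, e.g. "brick"


@dataclass(frozen=True)
class GeonStructuralDescription:
    """A set of geons plus a structural description of their relations."""
    geons: frozenset
    relations: frozenset


def describe(geons, relations):
    """Build a description from geon labels and (first, relation, second) triples."""
    return GeonStructuralDescription(
        frozenset(geons),
        frozenset(Relation(*triple) for triple in relations),
    )


def recognize(description, memory):
    """Stage four: return the name of the stored object that matches, or None."""
    for name, stored in memory.items():
        if stored == description:
            return name
    return None


# Hypothetical object memory (an assumption for this example only).
MEMORY = {
    "lamp": describe({"cylinder", "brick"},
                     [("cylinder", "on-top-of", "brick")]),
    "mug": describe({"cylinder", "curved-handle"},
                    [("curved-handle", "side-attached-to", "cylinder")]),
}
```

Because the description records only geons and their viewpoint-invariant relations, the same object yields the same description, and hence the same match, whatever retinal image it was derived from.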

19 The Recognition-by-Components Theory predicts that performance in visually identifying objects will not be affected by the perspective from which the objects are perceived. Biederman and Gerhardstein (1993) report several experiments that support this theory. In one experiment, 48 subjects were presented with a set of familiar objects, one at a time for 200 milliseconds (ms), followed by a mask for 500 ms (see Biederman & Gerhardstein, 1993: Figure 5). Then the set of objects was shown again for 100 ms, with some presented at the same orientation and some at different orientations (0°, 67.5°, and 135°). Although their orientations might change, the same parts of the objects would still be visible. Subjects were asked to name the objects as quickly and accurately as possible. The mean reaction times and error percentages over many trials indicate that changing the orientations of the objects by rotation does not affect the subjects’ performance. This suggests that object recognition is viewpoint invariant. The proponents of this theory also try to find supporting evidence in cognitive neuroscience. It is widely known that neurons in the macaque’s inferior temporal cortex respond to shape properties (Tanaka, 1993). Hayworth and Biederman (2006) use an fMRI adaptation paradigm to argue that the lateral occipital complex is more sensitive to parts than to local features. They argue that these results fit nicely with the Recognition-by-Components Theory.

Another approach, the Multiple-View Theory, proposes a very different explanation of shape constancy, according to which object perception is essentially viewpoint-dependent and image-based. This theory consists of two main ideas. First, the visual system employs only 2D images to construct object perception. When one sees an object, a 2D image of the object is formed that depicts it from a particular viewpoint. Each 2D image represents various aspects of the object including shape, depth, color, texture, shading, etc., in a viewer-centered frame of reference (Tarr, 1995: 56). An ordinary object is represented by multiple 2D views that depict the object from various perspectives (Tarr & Bülthoff, 1995: 1495). Second, shape constancy is achieved by comparing 2D images on the retina with a collection of views already stored in memory. This is done by a set of transformation mechanisms (Bülthoff et al., 1994). When one perceives an object, the mechanisms bring the current retinal image into alignment with one of the stored views by a set of mental transformations, such as rotations, translations, dilations, reflections, etc. (cf. Palmer, 1999: 364-365). The object is recognized when there is a match.²⁰
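The alignment step can be sketched in the same hedged way. The Python example below is a toy under strong assumptions, not the model of Tarr or Bülthoff: a view is a list of 2D points with known point-to-point correspondence, the only transformation tried is in-plane rotation in fixed steps, and a match requires a near-zero alignment error. All function names, parameters, and the stored "L-shape" view are hypothetical.

```python
import math


def rotate(points, degrees):
    """Rotate 2D points counterclockwise about the origin."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def mismatch(image, view):
    """Mean distance between corresponding points of two views."""
    return sum(math.dist(p, q) for p, q in zip(image, view)) / len(image)


def recognize(image, stored_views, step=5, tolerance=1e-6):
    """Try to align the current image with each stored view.

    Returns (object name, rotation in degrees that aligns the image),
    or None when no stored view can be matched within the tolerance.
    """
    best = None
    for name, view in stored_views:
        for angle in range(0, 360, step):
            error = mismatch(rotate(image, angle), view)
            if best is None or error < best[2]:
                best = (name, angle, error)
    if best is not None and best[2] <= tolerance:
        return best[0], best[1]
    return None


# Hypothetical memory holding a single familiar view of one object.
STORED_VIEWS = [("L-shape", [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)])]

# The same object seen after a 40-degree rotation in the image plane.
CURRENT_IMAGE = rotate(STORED_VIEWS[0][1], 40)
```

Calling `recognize(CURRENT_IMAGE, STORED_VIEWS)` searches for the rotation that brings the current image back onto the familiar view. In this sketch, recognition is tied to the particular views held in memory, which is the sense in which the approach is viewpoint-dependent.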

For our purpose, the main point is that neither the Recognition-by-Components Theory nor the Multiple-View Theory lends obvious support to perceptual anti-individualism. As I argued above, a theory of vision can explain perceptual constancy without embracing Burge’s full-hearted notion of veridicality. This means that Burge can only seek support from empirical theories that treat the ambiguity problem, rather than the constancy problem, as the central problem of vision. As we can see, both the Recognition-by-Components Theory and the Multiple-View Theory are about shape constancy rather than ambiguity. They can be understood and evaluated without assuming Burge’s theory.

20 This approach predicts that object-recognition performance will vary with how different the current view is from a familiar view in memory. Tarr (1995) conducted a series of psychophysical experiments to support this theory. In one experiment, 12 subjects were presented with seven left/right and front/back asymmetrical objects. After some training trials of viewing standard versions of the objects, both the standard versions and the mirror-reversed versions (produced by rotations of 130° around the x-, y-, or z-axis) were shown to the subjects, one at a time. They were asked to decide as quickly and accurately as possible whether the object was the standard or the mirror-reversed version of one of the objects that they had seen in the training trials. After calculating the mean reaction time and error percentage of many trials, Tarr reported the following findings: (1) the subjects’ response time increased with the angular distance from the training viewpoint of seeing the standard versions of objects; (2) after some practice, performance became nearly equivalent at all familiar viewpoints; and (3) at unfamiliar viewpoints, response times increased with the angular distance from the nearest familiar viewpoint (Tarr, 1995: 64). Tarr argues that these findings strongly uphold the Multiple-View Theory. In addition, some neurophysiological data are considered as evidence for this theory. Logothetis et al. (1995) have reported that many neurons in the monkey’s inferior temporal cortex are sensitive to specific views of an object, and that different neurons encode different views. This, according to Tarr and Bülthoff (1998), provides evidence for multiple-view representation.

One possible defense of Burge’s view is to argue that perceptual constancy is not totally independent of the ambiguity problem. On this defense, shape constancy cannot be fully explained by either of the two approaches above unless a solution to the problem of shape ambiguity is assumed. I believe that empirical support is required in order for this defense to begin to work, and I suspect it would not be easy to find.²¹ Currently, as far as I know, empirical theories of object recognition explain only shape constancy. The defender of Burge’s view demands that theories of object recognition should address ambiguity as well. It is not obvious whether most vision scientists would accept this demand. As we will see later in this section, even for those theories that accord the ambiguity problem a central place in vision research, not all of them concur with perceptual anti-individualism.

(2) Donald Hoffman (2009) advocates what he calls the “interface theory of perception,” which employs the idea of a user interface from computer science. An icon on a computer screen has a particular color and shape and is associated with some stored file. But the icon’s color and shape do not represent or reconstruct the “true” color or shape of the file. Computer files do not have any color or shape, and, as a user interface, the icon does not reconstruct anything. Hoffman holds that a user interface is useful precisely because it is not a reconstruction. He says: “The user interface is there to facilitate our interactions with the computer by hiding its causal and structural complexity” (2009: 154). A user interface is a convenient tool for specific purposes and nothing more.

21 What this defense needs, I think, are empirical theories that (1) explain both shape constancy and shape ambiguity, (2) formulate the two issues as connected in the way suggested by Burge, and (3) justify why shape constancy and shape ambiguity cannot be explained by pattern-based representations alone. As I see it, it is not easy to fulfill all of these requirements.

Applying this idea to perception, Hoffman says that “Our perceptions are a species-specific user interface … to guide adaptive behavior in our niche; accuracy of reconstruction is irrelevant” (2009: 154-155). Hoffman rejects what he calls the “principle of faithful depiction,” the idea that the primary goal of perception is to provide veridical representation of the physical world (2009: 149). What perception does is not to represent certain properties or categories of the objective world (2009: 153). Rather, “it is construction of a niche-specific, problem-specific, fitness-enhancing interface” (2009: 156). On this view, the aim of a vision theory is not to explain how veridical representations are produced by the visual system.

Like other empirical theories of vision, Hoffman’s interface theory is controversial. It is not my goal to assess the explanatory power of the interface theory here. However, I will make two remarks with regard to our evaluations of Burge’s theory. First, the conflicts between perceptual anti-individualism and the interface theory are obvious and serious. Proponents of both theories share an equal burden of defenses and criticisms. Since Burge maintains that perceptual anti-individualism provides the only framework within which vision science can be understood, a defender of his theory would need to establish that the interface theory is in principle incapable of explaining how the visual system solves the inverse problem. This is not an easy task. Until this is done, the interface theory remains a competitor.

Second, the dispute between Burge’s view and the interface theory is very similar to the debate between scientific realism and instrumentalism. The former claims, but the latter rejects, that the aim of vision science is to explain veridical representation of the world. If most researchers in applied sciences unreflectively take a realist stance about the world, this by itself would not show that scientific realism is true. Likewise, employing the notion of veridicality in a philosophically naive way cannot be regarded as lending decisive support to perceptual anti-individualism. When vision scientists assume that we have veridical perceptions, it may just mean that they are naive realists. They do not, nor do they need to, consider whether their empirical research may help deal with certain philosophical issues. Therefore, in the practice of vision science, assuming that we often have veridical perceptions can just be a pragmatic or convenient choice, rather than a mandatory one. The interface theory illustrates that vision scientists do not have to presume perceptual anti-individualism to conduct empirical research and make sense of their work.

(3) Finally, let us consider a theory that is friendlier to Burge’s account. Purves and Lotto (2003, 2011) maintain that the inverse problem is a central problem in vision science. They construe it as the ambiguity problem, and agree that it is a problem of underdetermination. However, they do not think that it can be solved by positing some a priori constraints or formation principles. Rather, they propose a purely empirical-statistical theory, which can be summarized as follows.

[T]he visual system is not organized to generate a veridical representation of the physical world, but rather is a statistical reflection of visual history … By virtue of trial-and-error feedback over the eons about the success or failure of visually guided behavior in phylogeny and decades of ontogenetic experience, the visual brain simply responds to a stimulus with a pattern of neuronal activity whose form has been thus determined by the probability distribution of what it has turned out to be in the past (i.e., by the empirical significance of the stimulus). (2003: 227-228)

In contrast to Burge, Purves and Lotto think that “visual percepts (and the corresponding activity of visual neurons and circuits) do not vary systematically with the physical measurements of objects or light stimuli as such” (2003: 15). The relations between vision and the world are purely contingent and statistical such that the visual system is not guided by formation principles that mirror environmental regularities. The solution of the ambiguity problem is gradually accumulated from the past. The ways in which the visual system responds to stimuli are partly trial-and-error. Then the system gradually learns from feedback over a long evolutionary history such that it acquires the capability of anticipation. That is, percepts produced by the visual system are, so to speak, fallible “predictions” of what is going to happen in the environment. This not only solves the ambiguity problem in theory, but also explains why animals are able to cope with the environment rapidly and efficiently.²²
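The idea of a response shaped by accumulated feedback can be made concrete with a deliberately simple Python sketch. It is a toy illustration, not Purves and Lotto’s model: it assumes that stimuli and their sources can be given discrete labels, and it reduces “feedback” to a tally of which source a stimulus turned out to have. The class name, method names, and example labels are hypothetical.

```python
from collections import Counter, defaultdict


class EmpiricalPerceiver:
    """Responds to a stimulus according to the history of its past sources."""

    def __init__(self):
        # For each stimulus, how often each source turned out to be its cause.
        self.history = defaultdict(Counter)

    def feedback(self, stimulus, source):
        """Record one past outcome: this stimulus had this source."""
        self.history[stimulus][source] += 1

    def distribution(self, stimulus):
        """Relative frequency of the past sources of a stimulus."""
        counts = self.history.get(stimulus)
        if not counts:
            return {}
        total = sum(counts.values())
        return {source: n / total for source, n in counts.items()}

    def respond(self, stimulus):
        """Return the most frequent past source, or None for a novel stimulus."""
        counts = self.history.get(stimulus)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

For example, if an "elliptical image" was produced by a "tilted circle" three times and by an "ellipse" once, `respond("elliptical image")` returns "tilted circle". The response is fixed by past frequencies alone, so it can be wrong about the current source.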

According to Purves and Lotto’s account, the function of the visual system is not to produce veridical representations of the current physical environment. What is represented is the “probability distribution of the possible sources of the stimulus” (2003: 10). What animals see is whatever turned out to be the statistical majority of possible causes of visual stimuli in the past. Burge might think that this is compatible with his theory. But notice that the statistical majority does not correspond to any particular object but to a probability distribution. The feedback of trial-and-error and the statistical majority are understood not in terms of veridicality in Burge’s sense, but in terms of biological functions. Also, visual illusions, according to this empirical-statistical theory, “are neither anomalies nor evidence of biological limitations or constraints, but simply the universal signature of this strategy of vision” (2003: 10). That is, there is no need to appeal to the perceptual norms suggested by Burge in order to understand vision.

22 One of the anonymous reviewers suggested that I take the anticipatory nature of perception into consideration. I am thankful for this useful comment.

VI. Conclusion

I have argued in this paper that various aspects of vision science can be understood without positing singular representation or the full-hearted notion of veridicality. Also, empirical theories of vision do not uniquely support perceptual anti-individualism. I conclude that, pace Burge, perceptual anti-individualism is not the only framework within which vision science can be understood. Let me make a final remark. Burge’s perceptual anti-individualism has decisively elevated philosophical investigations of perception to a new and interdisciplinary level. I believe that what this theory has achieved is just a beginning, not the end. A lot more work can and must be done to deepen our understanding of the nature of perception.

References

Baars, J. & Gage, N. (eds.) (2010). Cognition, Brain, and Consciousness: Introduction to Cognitive Neuroscience, second edition. Oxford: Academic Press. doi: 10.1007/s00381-010-1178-y.

Biederman, I. (1987). “Recognition-by-Components: A Theory of Human Image Understanding.” Psychological Review, 94: 115-147. doi: 10.1037/0033-295X.94.2.115.

--- (2007). “Recent Psychophysical and Neural Research in Shape Recognition.” Osaka, N., Rentschler, I. & Biederman, I. (eds.). Object Recognition, Attention, and Action (Ch. 5, 71-88). Tokyo: Springer. doi: 10.1007/978-4-431-73019-4_6.

Biederman, I. & Gerhardstein, P. (1993). “Recognizing Depth-rotated Objects: Evidence and Conditions for 3D Viewpoint Invariance.” Journal of Experimental Psychology: Human Perception and Performance, 19: 1162-1182. doi: 10.1037/0096-1523.19.6.1162.

Bülthoff, H., Edelman, S. & Tarr, M. (1994). “How Are Three-dimensional Objects Represented in the Brain?” CogSci Memo, No. 5: 1-23. doi: 10.1093/cercor/5.3.247.

Burge, T. (2003). “Perceptual Entitlement.” Philosophy and Phenomenological Research, LXVII, 3, November: 503-548. doi: 10.1111/j.1933-1592.2003.tb00307.x.

--- (2005). “Disjunctivism and Perceptual Psychology.” Philosophical Topics, 33, 1: 1-78. doi: 10.5840/philtopics20053311.

--- (2009). “Perceptual Objectivity.” Philosophical Review, 118, 3: 285-324. doi: 10.1215/00318108-2009-001.

--- (2010). Origins of Objectivity. Oxford: Oxford University Press. doi: 10.1093/acprof:oso/9780199581405.001.0001.

Craig, A. D. (2003). “A New View of Pain as a Homeostatic Emotion.” Trends in Neurosciences, 26, 6: 303-307. doi: 10.1016/S0166-2236(03)00123-1.

Geisler, W. S. (2008). “Visual Perception and the Statistical Properties of Natural Scenes.” Annual Review of Psychology, 59: 167-192. doi: 10.1146/annurev.psych.58.110405.085632.

Hatfield, G. (2003). “Representation and Constraints: the Inverse Problem and the Structure of Visual Space.” Acta Psychologica, 114: 355-378. doi: 10.1016/j.actpsy.2003.07.003.

Hayworth, K. & Biederman, I. (2006). “Neural Evidence for Intermediate Representations in Object Recognition.” Vision Research, 46: 4024-4031. doi: 10.1016/j.visres.2006.07.015.

Hoffman, D. (2009). “The Interface Theory of Perception.” Dickinson, S., Leonardis, A., Schiele, B. & Tarr, M. (eds.). Object Categorization (148-166). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511635465.009.

Kubovy, M. & Epstein, W. (2001). “Internalization: a Metaphor We Can Live Without.” Behavioral and Brain Sciences, 24: 618-625. doi: 10.1017/S0140525X01760086.

Logothetis, N., Pauls, J. & Poggio, T. (1995). “Shape Representation in the Inferior Temporal Cortex of Monkeys.” Current Biology, 5: 552-563. doi: 10.1016/S0960-9822(95)00108-4.

Palmer, S. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.

Pylyshyn, Z. (2003). Seeing and Visualizing. Cambridge, MA: MIT Press.

--- (2007). Things and Places. Cambridge, MA: MIT Press.

Pizlo, Z. (2008). 3D Shape: Its Unique Place in Visual Perception. Cambridge, MA: MIT Press.

Poggio, T., Torre, V. & Koch, C. (1985). “Computational Vision and Regularization Theory.” Nature, 317, September: 314-319. doi: 10.1038/317314a0.

Purves, D. & Lotto, R. (2003). Why We See What We Do: An Empirical Theory of Vision. Sunderland, MA: Sinauer Associates, Inc. doi: 10.5860/CHOICE.40-5803.

--- (2011). Why We See What We Do Redux: A Wholly Empirical Theory of Vision. Sunderland, MA: Sinauer Associates, Inc. doi: 10.5860/CHOICE.48-6268.

Purves, D., Wojtach, W. & Lotto, R. (2011). “Understanding Vision in Wholly Empirical Terms.” PNAS, 108, suppl. 3: 15588-15595. doi: 10.1073/pnas.1012178108.

Shepard, R. (2001). “Perceptual-Cognitive Universals as Reflections of the World.” Behavioral and Brain Sciences, 24: 581-601. doi: 10.1017/S0140525X01000012.

Snowden, R., Thompson, P. & Troscianko, T. (2006). Basic Vision: An Introduction to Visual Perception. Oxford: Oxford University Press.

Tanaka, K. (1993). “Neural Mechanisms of Object Recognition.” Science, 262: 685-688. doi: 10.1126/science.8235589.

Tarr, M. (1995). “Rotating Objects to Recognize Them: A Case Study of the Role of Viewpoint Dependency in the Recognition of Three-dimensional Objects.” Psychonomic Bulletin and Review, 2: 55-82. doi: 10.3758/BF03214412.

Tarr, M. & Bülthoff, H. (1995). “Is Human Object Recognition Better Described by Geon-structural-descriptions or by Multiple-views?” Journal of Experimental Psychology: Human Perception and Performance, 21, 6: 1494-1505. doi: 10.1037/0096-1523.21.6.1494.

--- (1998). “Image-Based Object Recognition in Man, Monkey, and Machine.” Cognition, 67, July: 1-20. doi: 10.1016/S0010-0277(98)00026-2.
