DOI: 10.6276/NTUPR.2014.03.(47).04
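Perceptual Anti-Individualism and Vision Science
梁益堉 (Caleb Liang)
*Abstract
This paper examines the fundamental nature of vision from an interdisciplinary perspective, taking Tyler Burge’s theory of “perceptual anti-individualism” as its object of study. According to this theory, the nature of perceptual states is determined by the interactive relations between perceivers and the external environment. Burge argues that vision science presupposes this theory. This paper opposes that view and argues, on three fronts, that perceptual anti-individualism is not the only theoretical option for understanding vision. First, I discuss the concept of homeostasis and indicate how it may lead us to question the perceptual norms in Burge’s theory. Second, I argue that many phenomena studied by vision science can be explained without presupposing Burge’s notions of veridicality and singular representation. Third, I discuss some scientific theories of vision and argue that many views in vision science do not in fact support Burge’s theory. The paper concludes that perceptual anti-individualism is not the only available theoretical framework for understanding the nature of vision.
Keywords: perception, vision science, perceptual anti-individualism
* 梁益堉 (Caleb Liang), Associate Professor, Department of Philosophy, National Taiwan University. Submitted: August 18, 2013; revised: March 2, 2014; accepted for publication: March 3, 2014.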
Perceptual Anti-Individualism
and Vision Science
Caleb Liang
*Abstract
I discuss the nature of visual perception from an interdisciplinary perspective. The target of investigation is Tyler Burge’s theory of perceptual anti-individualism, according to which perceptual states constitutively depend on relations between perceivers and the external world. Burge argues that this theory is presupposed by vision science. My goal is to argue that perceptual anti-individualism is not the only theoretical choice. First, I consider the notion of homeostasis and suggest how it may cast doubt on the perceptual norms in Burge’s theory. Second, I argue that many phenomena studied by vision science can be explained without positing Burge’s notions of veridicality and singular representation. Third, I consider some empirical theories and argue that vision science does not uniquely favor Burge’s theory. I conclude that perceptual anti-individualism is not the only framework for understanding visual perception.
Keywords: perception, vision science, perceptual anti-individualism
I. Introduction
What is the most fundamental relationship between visual perception and the external world? One influential idea in vision science is that “visual perception is useful only if it is reasonably accurate” (Palmer, 1999). Tyler Burge has recently developed a philosophical theory that greatly articulates and substantiates this idea (2005, 2009, 2010). According to his theory of perceptual anti-individualism, the nature and individuation of perceptual states constitutively depend on relations, including causal relations, between perceivers and the environment. Drawing on diverse empirical resources, he argues for two important claims. First, perception delineates the lower border of the representational mind and exhibits the most basic form of objectivity (2010: 10, 12). As he puts it, “Perception, representation, and objectivity begin together” (2010: 11). Second, perceptual anti-individualism is presupposed by vision science (2010: 87, 98-101). Burge contends that […] understanding conditions under which perceptual representation is possible” (2005: 9). The goal of this paper is to argue against the second claim.1
To sharpen the focus, let me make two clarifications right away. First, my target is not the general form of anti-individualism2 but its application to visual perception. The former is an abstract view about the individuation of mental states that, Burge holds, can be established without appealing to empirical research. I will not argue against general anti-individualism, nor will I defend any form of individualism. My concern is the latter, perceptual anti-individualism, which is a philosophical theory of vision. Since Burge claims that it is presupposed by vision science, whether this is indeed the case is open to empirical investigation. Second, Burge says that even the thesis of perceptual anti-individualism itself, just stated above, is still abstract (2010: 87). As an abstract thesis, it can be compatible with many aspects of vision science. My concern is not with the abstract thesis. The issue to be addressed here is how well perceptual anti-individualism is filled in with details and supported by empirical studies. This matters because whether vision science really presupposes Burge’s theory ultimately depends on these considerations. As we will see, Burge articulates his theory with specific accounts of veridicality, perceptual norms, and the elements of visual representation. These are the places where I will take issue with him.
1 In Origins of Objectivity, Burge forcefully criticizes various versions of Individual Representationalism, all of which, according to him, over-intellectualize the constitutive requirements of perception (cf. 2010: 12-22 and Part II). But in this paper I focus on his positive theory of perception.
2 Burge: “In its general form, anti-individualism is the claim that (A) the natures of many mental states constitutively depend on relations between a subject matter beyond the individual and the individual that has the mental states, where relevant relations help determine specific natures of those states” (2010: 61).
The aim of this paper is not to argue that perceptual anti-individualism is false. Rather, I argue for a more modest position, namely, that this theory is not the only theoretical choice. Many aspects of vision science can be understood without assuming this theory. After presenting perceptual anti-individualism, I discuss three issues. First, I consider the notion of homeostasis and point out how it may cast some doubt on the perceptual norms in Burge’s theory. Second, I examine Burge’s view of veridicality. According to this view, in order for a perception to be veridical, both the general and the singular elements of representational content have to be veridical. I argue that many phenomena studied by vision science can be explained without positing Burge’s notion of singular representation. Third, I consider some empirical theories of vision and argue that none of them provides the sort of support that Burge requires; vision science does not uniquely favor his theory. I conclude that perceptual anti-individualism is not the only framework for understanding how vision works.
II. Perceptual Anti-Individualism
Burge’s theory is quite complex, but its core consists of two parts. The first is a set of a priori claims describing the constitutive nature of perception. The second is an overview of vision science meant to show how the science presupposes this theory. The a priori claims are the following.
(1) The constitutive nature of perceptual states depends on a systematic network of causal relations between instances of environmental attributes and the individual. Burge considers this claim a “necessary truth” (2010: 85).
(2) Perceptual representation has veridicality conditions. Having veridicality conditions, according to Burge, is part of “what it is to be a perceptual state” (2010: 535).3
(3) The representational function of perceptual states is to produce veridical representation. The success or failure of a perceptual state is to be evaluated according to whether it accurately represents the objective world.4
(4) The representational content of perception has two elements. The singular element “functions fallibly to single out (refer to) perceived particulars” and is context-dependent (2010: 83, 381). This is to capture the idea that properties or kinds are never perceived in the abstract. An individual always perceives particular objects. The general element “functions fallibly to group or categorize particulars by attributing some indicated kind, property, or relation to them” (2010: 83, 380). This is to capture the other idea that perception necessarily represents particular objects as being a certain way.5
(5) “Perception is a capacity constitutively attributable to individuals” (2010: 369, 373, 536). It is individuals who perceive, not subsystems in the brain. This leads to the view that perception has biological functions in addition to its representational function. The biological functions of perception contribute to survival, fitness, and reproduction (2010: 301, 303). Burge emphasizes that the representational and biological functions of perception are two different kinds of function; they are dissociable (2010: 302, 308, 411). A perception can be non-veridical yet biologically useful. As he puts it, “Evolution does not care about veridicality. It does not select for veridicality per se” (2010: 303). Therefore, the representational function cannot be reduced to biological functions. Perception and representation are distinctive psychological kinds.6
3 Burge: “a constitutively necessary condition of perceptual representation by an individual is that any such representation be associated with a background of some veridical perceptual representation” (2005: 1; cf. also 2010: 68).
4 As pointed out by an anonymous reviewer, (2) and (3) in Burge’s theory are distinct claims, and the link between them should not be taken for granted. For example, even if (2) is plausible, it does not follow that (3) is established. I do not address this potential issue in this paper. I thank the reviewer for this comment.
5 Burge also holds that it is a priori that the representational content of perception constitutes a fallible egocentric perspective on such attributes and particulars (2010: 84, 401, 536). An individual always perceives an object from a given perspective, and the same object can be perceived from different perspectives.
The second part of the theory is an overview of three key aspects of vision science. The first is the underdetermination problem of vision. The primary goal of vision science, Burge claims, “is to explain how perceptual states that are of and as of the environment are formed from the immediate effects of proximal stimulation” (2010: 89). Consider the neural processing of object perception. Patterns of light reflected from external three-dimensional (3D) objects strike the photoreceptors on the retina and form two-dimensional (2D) images of those objects. These patterns of light are converted into neural impulses, carried by retinal ganglion cells, which travel through the lateral geniculate nucleus and enter the visual cortex. The key point is that, in the projection from 3D objects to 2D images, depth information about the objects is lost. This creates a problem for the visual system. Different objects at different distances and orientations can project exactly the same 2D image onto the retina. In principle, any 2D retinal image has infinitely many possible distal causes. How does the visual system “figure out” which external object is the right one? This is called the “inverse problem” (Poggio et al., 1985; Palmer, 1999; Pizlo, 2008). Burge correctly characterizes this as a problem of underdetermination.7
6 Regarding these a priori claims, I will leave (1) and (5) aside, even though I disagree with their […]
Let me make a critical observation. Two different problems are actually involved here, and each needs to be explained. One is that the actual distal causes are underdetermined by the 2D retinal images. This is because, at suitable orientations and distances, different external 3D objects can project identical 2D images onto the retina. Some vision scientists call this the “ambiguity” problem (Purves & Lotto, 2003; Pizlo, 2008). The other is that the perception of an object can remain constant while proximal stimulation varies. For example, when I look at an object from different perspectives, my retinal images change, but I still perceive the object as having the same shape and size. This is the problem of “perceptual constancy” (Palmer, 1999; Pizlo, 2008). In the case of shape, shape ambiguity concerns how to identify the actual distal cause from indefinitely many potential causes on the basis of a single retinal shape; it does not involve changes in viewing perspective. Shape constancy, by contrast, essentially involves such changes: it concerns how to produce an invariant percept of shape despite the different 2D images that those changes produce. Since these are different problems, solving one does not imply solving the other.8
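The two problems can be separated with a toy calculation. The sketch below is only an illustration and is not taken from Burge or from the vision-science sources cited here. It rests on several assumptions: a simple pinhole eye with a nodal distance of about 17 mm, size measured along a single dimension, and object sizes and distances that are made up. The first comparison shows ambiguity, where two different distal objects produce one retinal image. The second shows what constancy must cope with, where one distal object produces two retinal images.

```python
from fractions import Fraction

# Assumption: a pinhole eye with a nodal distance of about 17 mm.
# The value is illustrative; only the ratio size/distance matters here.
NODAL_DISTANCE_M = Fraction(17, 1000)


def image_size(object_size_m, distance_m):
    """Size of the 2D projection of an object in a pinhole model."""
    return NODAL_DISTANCE_M * object_size_m / distance_m


# Ambiguity: two different distal objects, one proximal image.
small_near = image_size(Fraction(2), Fraction(10))  # 2 m object at 10 m
large_far = image_size(Fraction(4), Fraction(20))   # 4 m object at 20 m
ambiguous = small_near == large_far

# Constancy: one distal object (a 4 m car), two proximal images.
car_far = image_size(Fraction(4), Fraction(20))
car_near = image_size(Fraction(4), Fraction(10))
changes_with_viewpoint = car_far != car_near

print(ambiguous, changes_with_viewpoint)  # True True
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding. In this toy model the image depends only on the ratio of size to distance. Resolving the first comparison therefore needs information the image lacks. Handling the second needs the two different images to be treated as one object.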
7 Retinal information, Burge says, “significantly underdetermine[s] the distal causes of those registrations, hence the objects and properties that are represented in perception, hence representational content as of those objects and properties … The initial sensory registration of proximal stimulation in itself also underdetermines what perceptual representations the perceptual system will form” (2010: 90).
Facing the inverse problem (more precisely, the ambiguity problem), how does science explain 3D object perception? Burge argues that the mainstream of vision science decisively supports perceptual anti-individualism. The ways in which visual information is processed in the brain can be characterized as constrained or guided by what he calls formation principles.9 These principles “privilege” or “bias” the neural process such that the underdetermined retinal inputs trigger a unique perception that (often but not always) represents the actual external object. The content of a perception is then determined by the operations of the formation principles embedded in the visual processing.
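The idea of a principle that "privileges" one of the many causes consistent with an image can be sketched in the same toy setting. This is a hypothetical illustration, not Burge's formalism. The names `candidate_causes` and `familiar_size_principle` are invented, and the pinhole eye and the 4 m "typical car length" are assumptions. Familiar size is just one example of an environmental regularity that a visual system might exploit.

```python
from fractions import Fraction

# Assumption: the same toy pinhole eye (nodal distance of about 17 mm).
NODAL_DISTANCE_M = Fraction(17, 1000)


def candidate_causes(image_m, distances_m):
    """Every (object size, distance) pair that projects to this image."""
    return [(image_m * d / NODAL_DISTANCE_M, d) for d in distances_m]


def familiar_size_principle(candidates, typical_size_m):
    """Hypothetical formation principle: privilege the candidate whose size
    is closest to the size such objects typically have."""
    return min(candidates, key=lambda c: abs(c[0] - typical_size_m))


# A 4 m car seen at 20 m yields one retinal image ...
image = NODAL_DISTANCE_M * Fraction(4) / Fraction(20)
# ... that is consistent with many distal causes (sizes 1, 2, 4, 8 m here).
candidates = candidate_causes(image, [Fraction(d) for d in (5, 10, 20, 40)])
chosen = familiar_size_principle(candidates, typical_size_m=Fraction(4))

size, distance = chosen
print(f"{float(size)} m object at {float(distance)} m")
```

Suppose the same image came from an atypically sized object, such as a 1 m toy car at 5 m. The same bias would then select the wrong candidate. This shows, in miniature, how such principles can also yield misperception.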
Obviously, the formation principles themselves require explanations. Perceptual anti-individualism plays an important role here. Burge says,
In every case, formation principles … mirror basic facts in the broader physical environment. These are facts regarding spatial relations, natural forms of motion, the way light patterns tend to correlate with shadows and edges, the way surfaces tend to have unseen backsides, and so on. They mirror either environmental laws or deep environmental regularities that hold for the most part … So the natures of specific perceptual states are constitutively associated, via causal relations, with specific attributes, laws, and patterns in the environment. (2010: 98-99)
8 […] he thinks that both problems can be solved by what he later calls “formation principles” (2010: 92-94, 397-400).
9 Burge: “The dominant scheme in the psychology of vision … is to explain a series of unconscious, largely automatic transformational processes that lead from registration of the array and spectral properties of light striking the retina to the formation of perceptions as of specific aspects of the distal environment … The transformations operate under certain principles that describe psychological laws or law-like patterns. These laws or law-like processes serve to privilege certain among the possible environmental causes over others … I call psychological principles that describe, in an explanatory way, these laws or law-like patterns formation principles” (2010: 92).
The idea is that the physical environment is not chaotic; it has various regularities. The formation principles do not simply come from nowhere. Rather, they stem from perceivers’ long-term interactions with the world, and hence reflect those regularities of the environment. Burge holds that the formation principles can be explained “only by reference to the way in which patterns in the perceptual system’s natural environment have molded the nature of the perceptual system and its perceptual states” (2010: 100). He discusses a few cases, such as convergence, lightness constancy, etc., and concludes that vision science is committed to perceptual anti-individualism.
The second aspect is a research tenet in the practice of vision science that Burge calls the proximality principle (2005: 22). He says:
The formation of perceptual states depends causally, in any given instance, on registration of proximal stimulation. The same attributional kind of perceptual state, with the same attributional representational content, can be caused by the same type of registration of proximal stimulation, whether or not the perceptual state has perceptual representata - whether or not it is a perception of anything at all. (2010: 389)
According to this principle, the causal process that produces a perception depends exclusively on proximal stimulation and visual processing in the brain. Let me add that there is an empirical justification for this tenet. The visual system does not have unlimited capacities for processing innumerable distal objects. Rather, it responds to similar objects or situations with similar patterns of processing. Vision science does not aim to explain particular cases; it studies patterns of interaction with the environment. It is very probable that the proximality principle captures how the visual system in the brain actually works. Hence, it is reasonable that the causal explanations provided by vision science are constrained by this principle.
The third aspect is perceptual constancy. Given the solution to the inverse problem and the proximality principle, the notion of objectivity in vision science is explained by the distinction between the registration of sensory information and perceptual representation. Burge says,
In effect, the transformation patterns systematically distinguish the merely proximal from the probably environmental … Specification of mind-independent and constitutively non-perspectival physical entities is separated out from the individual’s sensory registration … Perceptual constancies are capacities for objectification … Objectivity is the product of separating what occurs on an individual’s sensory surfaces from the significance of those stimulations for specific attributes and particulars in the broader environment. In this way, perception is the product of objectification. (2010: 398-400)
When I move towards a car, the visual information registered on my retina changes systematically, and the 2D images gradually occupy a larger area of my visual field. However, the car does not appear to change in size; it does not look to me as if it is getting bigger. This is “size constancy.” When I look at the Eiffel Tower and walk around it, the proximal sensory stimulations received by my visual system vary with my pace, direction, and eye orientation. Yet the position of the Eiffel Tower does not seem to change at all; it appears to me to be located in the same place. This is “position constancy.”10 According to Burge, objectivity is embedded in the perceptual constancies studied by vision science. Perceptual constancy shows that what perception represents is a non-perspectival objective reality. For Burge, perceptual constancy is both necessary and sufficient for objective perceptual representation (2010: 413). Perceptual constancy draws the line between mere sensory information and perception, and it is this line that marks the beginning of perception, representation, and objectivity. Since perceptual constancy is also explained by formation principles, Burge concludes that it reveals how vision science supports his theory (2010: 346, 358, 365, 400).
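The car example can be given a simple geometric gloss. What follows is an illustrative sketch only. It uses the textbook visual-angle formula and the classic size-distance invariance hypothesis, which is one proposed account of size constancy and not Burge's own formulation. Here θ is the visual angle subtended by the car, S its physical size, D its distance, and hatted symbols stand for the values the visual system registers.

```latex
% Visual angle subtended by an object of physical size S at distance D
\theta = 2\arctan\!\left(\frac{S}{2D}\right) \approx \frac{S}{D} \qquad (S \ll D)

% Approaching the car from D to D/2 roughly doubles the retinal angle:
\theta\!\left(S, \tfrac{D}{2}\right) \approx \frac{2S}{D} \approx 2\,\theta(S, D)

% Size-distance invariance: scale the angle by the registered distance \hat{D}
\hat{S} = 2\hat{D}\tan\!\left(\frac{\theta}{2}\right), \qquad \hat{D} = D \;\Rightarrow\; \hat{S} = S
```

On this gloss, the retinal angle changes with every step. The represented size stays fixed as long as distance is registered correctly. If distance is misregistered, the same relation yields a misrepresented size.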
III. Homeostasis and Norms of Perception
According to Burge’s theory, the primary norm constitutively associated with perception is to “perceive things as they are - to form veridical perceptual representation” (2010: 312). To evaluate this claim, I would like to consider the question: how accurate must a perception be in order to be accurate enough? The rationale behind this question is that veridicality is a matter of degree. The notion of homeostasis will play a key role here. As researchers characterize it, homeostasis is “a dynamic and ongoing process comprising many integrated mechanisms that maintain an optimal balance in the physiological condition of the body, for the purpose of survival. In mammals, these include autonomic, neuroendocrine and behavioral mechanisms” (Craig, 2003). Animals engage in various types of actions, such as eating, mating, navigating, and preying, for the purpose of maintaining homeostasis. For animals, the ecological environment is essentially characterized by its homeostatic values. In this regard, perceptual systems (and their biological functions) can be understood as part of the homeostatic system.
10 Perceptual constancy is pervasive; other forms of perceptual constancy include “color constancy,” […]
To see how this relates to our evaluation of perceptual anti-individualism, consider an important feature of the human visual system: its neural-physiological structures are highly specific and contingent. To give just a few examples: (i) Not all light is visible; the wavelengths of visible light range roughly from 400 to 700 nanometers. (ii) The two kinds of photoreceptor cells, rods and cones, which have roughly opposite sensitivities to light intensity, are not distributed evenly on the retina. The density of cones is high in the vicinity of the fovea and low in the periphery, whereas rods show the opposite distribution. The three types of cone cells are not evenly distributed either. (iii) The receptive fields of retinal ganglion cells have a center-surround structure, which suggests that they are meant to represent contrasts in light intensity rather than the exact spectrum of a particular light pattern. (iv) Visual acuity is high only for what is fixated by the fovea; peripheral vision is actually blurred. This is compensated for by various patterns of saccades, but during an ongoing saccade the visual system does not take in any information at all. (v) During visual processing, much visual information is dropped by the visual system; only part of it gets into the range of visual attention and is stored in visual memory. Other contingent features include color contrast, filling-in, change blindness, etc. (Palmer, 1999; Snowden et al., 2006; Baars & Gage, 2010).
The point is that, given the role of homeostasis, it is not at all obvious that these particular features are meant to help the visual system “perceive things as they are.” The visual system does not process every piece of information about the world that it receives, and it does not produce precise, fully detailed representations of everything that stands before our eyes. It seems that perceptual states do not primarily aim to represent “things as they are” but to serve the needs of maintaining homeostasis. The visual system can function perfectly well to maintain homeostasis without fulfilling the primary norm proposed by Burge. The neural-physiological structures of vision do not clearly uphold the primary norm of Burge’s theory.
Burge is aware of this potential objection, and some other norms in his theory attempt to respond to it. For example, he says: “A second natural representational norm constitutively associated with perception is to perceive as well as the perceptual system can, given its natural limitations, its input, and its environmental circumstances” (2010: 312). The visual system has various limitations, well studied by vision science and recognized by his theory. He repeatedly emphasizes that the primary norm should be taken as an “idealization”: “perceptual representational contents when successful, are commonly only veridical within some range. So approximate veridicality is what is often at issue” (2010: 535).11 Adding this proviso makes perceptual anti-individualism flexible enough to accommodate various non-ideal perceptual situations, while the primary norm remains central to the constitutive account of perception. As Burge comments, although the visual system is limited in many ways, “Still, veridicality is at the center of the natures and laws or law-like transformations that are central to perception and the subject matter of perceptual psychology” (2010: 535).
I do not think that the worry is really relieved by this proviso. First of all, since each specific limitation of the visual system is utterly contingent, making the proviso constitutive of the nature of perception seems a bit ad hoc. Moreover, given the contingent structures of the visual system, it is not quite right to say that strict veridicality is seldom achieved by perceptual representation. Compared with Burge’s view, it seems more precise and adequate to say that there is never a perception that fully satisfies Burge’s primary norm. Taking veridicality as idealization is not the best way to characterize the contingent features of the visual system. They can be better accommodated without appealing to Burge’s primary norm and proviso.
An alternative way to consider the contingent features of the visual system, I suggest, is that veridicality is constrained by homeostasis: the question of how accurate is accurate enough is answered by homeostasis. I am not suggesting that veridicality should simply be replaced by homeostasis. I agree with Burge that the representational function of perception cannot be reduced to biological functions. My point is that the representational function of perception is not independent of biological functions; i.e., the former is constrained by the latter. A successful perceptual representation does not need to be more accurate than homeostasis requires. A veridical perception is certainly helpful in maintaining homeostasis. Still, given the homeostatic constraint on veridicality, we need not think that the representational content of perception must aim to enable animals to perceive things as they are. This is not because the goal of achieving strict veridicality is hindered by the various limitations of animals’ visual systems, but because animals and their visual systems do not have to pursue this goal. The primary and secondary norms proposed by Burge are not the only theoretical choice for understanding the nature of perception.
IV. Veridicality and Singular Representation
According to perceptual anti-individualism, it is constitutive of perceptual representation that it contains both general and singular elements. Perceptual states are produced by the visual system following formation principles, but the operations of formation principles are causally constrained by the proximality principle. Although what we perceive are particular objects, the visual system responds to similar objects or situations with similar patterns of processing. This is accommodated by the general element of perceptual representation. Or, as Burge puts it, each perception has a “pattern-based” representational content (2005: 35).
Due to the proximality principle, different perceptual contexts may trigger the same pattern of interactions and hence produce perceptions that are type-identical. For example, a subject may have a veridical perception of a particular object; on another occasion he may have a veridical perception of a duplicate of the same type of object; on a third occasion he may have a matching hallucination of the object. The perceiver may be unable to distinguish between these situations because the same pattern-based representational content is produced. This means that, when the visual system is responding to the distal environment, it is responding to patterns or types of input (Burge, 2005: 5-6, 23-24). “The response to the input characterizes the distal environment as being of a certain kind” (2005: 6). So, as Burge characterizes it, the causal explanations provided by vision science “do not primarily explain particular events. They explain patterns, tendencies, general abilities, and so on” (2005: 32).
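The structure of the three-case example can be made explicit with a toy model. The sketch is hypothetical throughout: `DistalObject`, `Situation`, and `perceive` are invented names, and a "percept" is reduced to a pair of labels. It shows only what the proximality principle, as stated above, entails. If the output depends on proximal stimulation alone, the veridical, duplicate, and hallucination cases yield type-identical percepts. Whatever distinguishes them lies in the distal cause, which the process never receives.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DistalObject:
    name: str   # which particular object this is (hypothetical label)
    shape: str
    color: str


@dataclass(frozen=True)
class Situation:
    cause: Optional[DistalObject]  # None stands for a hallucination
    stimulation: Tuple[str, str]   # (shape cue, color cue) at the retina


def perceive(stimulation: Tuple[str, str]) -> Tuple[str, str]:
    """Toy perceptual process obeying the proximality principle:
    the output is a function of the proximal stimulation alone."""
    shape_cue, color_cue = stimulation
    return (shape_cue, color_cue)  # pattern-based content only


flower = DistalObject("flower-1", "round", "red")
duplicate = DistalObject("flower-2", "round", "red")

situations = {
    "veridical": Situation(flower, ("round", "red")),
    "duplicate": Situation(duplicate, ("round", "red")),
    "hallucination": Situation(None, ("round", "red")),
}

percepts = {label: perceive(s.stimulation) for label, s in situations.items()}
# All three percepts are type-identical ...
print(len(set(percepts.values())))  # 1
# ... while the situations differ only in their distal cause.
print([s.cause.name if s.cause else None for s in situations.values()])
```

The second line prints `['flower-1', 'flower-2', None]`. Whether that difference must also be registered in perceptual content, as Burge's singular element, is the question the next paragraphs take up.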
For Burge, a theory of perception must provide an account of how the veridicality of a perception is to be evaluated. It must be able to distinguish between veridical and non-veridical perception, and between seeing a particular object and seeing a numerically distinct duplicate. The general element, as constrained by the proximality principle, cannot reflect these differences. So Burge argues that perceptual representation must also contain a singular element to meet this requirement (2010: 389-390). It is the singular element that marks the differences among perceptions that are subjectively indistinguishable but differ in veridicality. For Burge, in order for a perception to be veridical, both the general and the singular elements of representational content have to be veridical (2010: 383). Let me call this a “full-hearted” notion of veridicality.
The issue that I want to raise is this: must we assume the “full-hearted” notion of veridicality in order to explain how vision works? I do not think so. In my view, when a vision scientist says that “visual perception is useful only if it is reasonably accurate” (Palmer, 1999: 6), the notion of accuracy in this remark can be understood as involving only the general element, not the singular element, of perceptual content.12 If so, from the standpoint of perceptual anti-individualism, the notion of veridicality in vision science would be only “half-hearted” in the sense that it takes into consideration only the veridicality of pattern-based representation. This would be unacceptable to Burge because he believes that the veridicality conditions of the subjectively indistinguishable situations mentioned above cannot be properly distinguished without positing the singular element. In what follows, I argue that various phenomena that seem to support Burge’s theory can actually be explained by vision science without positing singular representation and the full-hearted notion of veridicality.
(1) Perceptual constancy. As Burge agrees, the visual processes that produce perceptual constancies are constrained by the proximality principle. He says: “exercises of the capacities [of perceptual constancies] are triggered even in cases where the proximal stimulations derive from no (environmental) objects of perception” (2010: 388). I fully agree with Burge on this point. A perceptual state can exhibit constancy without being veridical. A visual hallucination of an object can exhibit size or shape constancy, yet its singular element fails to refer to any object. But this implies that perceptual constancy and veridicality are not the same. An empirical theory can explain perceptual constancy without embracing Burge’s full-hearted notion of veridicality. As an essential feature of perception, perceptual constancy certainly reveals a type of objectivity. But this is a type of objectivity that can be understood without assuming Burge’s theory. The distinction between sensory registration and perception drawn by perceptual constancy does not substantiate the kind of objectivity that is defined by Burge’s notion of veridicality.
(2) Visual illusions. A defender of perceptual anti-individualism may contend that vision science has to posit the singular element in visual representation when it comes to explaining visual illusions. For Burge, the formation principles of perception explain not only how the underdetermination problem is solved, but also “conditions in which perceptual systems yield misperception” (2010: 384). He says that “Failures of approximate veridicality - illusions - are explained primarily in terms of abnormal environmental conditions’ producing proximal stimulations that would yield veridical representations under more normal conditions” (2010: 98). Since visual illusions are explained in terms of abnormal distal causes, the defender may conclude, Burge’s notion of singular representation plays an essential role in vision science.
I disagree. The fact that vision science appeals to distal causes to explain illusions does not show that singular representation must be posited. Of course, what animals perceive are particular objects. But vision science aims to study not specific particulars but the properties they share with others. To see this point, let us consider: what would remain if all the pattern-based representations were removed from a perceptual state? The pattern-based visual representations of, say, a red flower include representations of the color red, the shape and orientation of the flower, its distance from the perceiver, etc. Suppose all these representations are removed. What remains would be a pure bearer or point of reference, which amounts to the singular element in Burge’s theory. Does vision science appeal to such a pure reference to explain illusions? I do not think so. Consider the case of Ames’ Room, an interesting visual illusion in which the subject misperceives a trapezoidal room as rectangular, and misperceives an adult as absurdly smaller than a child in the room. The explanatory role of the distal causes, the room and the adult in it, is not to fix the pure references of what is seen. Rather, what explains the illusion is that, due to a manipulation of depth information, the properties of the distal causes, that is, the trapezoidal shape of the room and the actual height of the adult, are misrepresented by the pattern-based representations in the subject’s perceptual state. The singular element in Burge’s theory is unnecessary in this explanation.13
13 The point here applies to many other illusions studied by vision science, e.g., the Müller-Lyer illusion, the Ponzo illusion, the Ebbinghaus illusion, etc. (Palmer, 1999: 324, 326). What about the distinction between a veridical perception of a particular object and a veridical perception of a numerically distinct duplicate? If, owing to the same pattern-based representations, the subject mistakes the latter for the former, this can be regarded as an error of thought rather than a false perception. What about the distinction between a veridical perception and a subjectively indistinguishable illusion? It is true that the notion of distal cause is required to draw this distinction. Still, it does not follow that vision science aims to explain how the pure reference of a perceptual state is determined. At least in the cases discussed in this section, the explanatory power of distal causes lies in their sharable properties rather than in their serving as pure references. Determining whether a perception is veridical or an indistinguishable illusion does not properly characterize what vision scientists do in their research.
(3) Multiple object tracking. A defender of Burge’s theory may draw on Zenon Pylyshyn’s work on multiple object tracking (MOT) and his visual index theory (2003, 2007). MOT experiments show that normal subjects can visually track up to five randomly moving objects.14 Pylyshyn explains this by postulating a mechanism in the visual system called a “visual index.” A crucial feature of this mechanism is that it enables us to track visual objects without representing any of their properties. Pylyshyn says that “the early visual system possesses a mechanism for detecting and tracking what I will refer to as ‘primitive visual objects.’ It does so by keeping track of them as individuals rather than as ‘whatever is at location X’ or ‘whatever has property Y’ ” (2003: 180; cf. also 201, 206, 214; 2007: 38-39). In this sense, objects are “indexed directly” (2003: 202). A defender of Burge’s view might regard the visual index theory as strong evidence that vision science does posit singular representation in visual perception.
14 Pylyshyn: “In a typical MOT experiment, observers are shown a screen containing 8 simple identical figures (e.g., points, circles, squares, plus signs, figure eights) that move in unpredictable ways. … At the start of each trial, a subset of these objects is briefly rendered distinct (usually by flashing them on and off a few times). The observer’s task is to keep track of this subset of objects. At some later time in the tracking trial (say 5 to 10 seconds into the trial) all the objects stop moving and the subject has to indicate (using a mouse pointing device) which objects were the targets. A large number of experiments have shown clearly that observers can indeed track up to 5 independently moving identical objects (i.e., objects that are indistinguishable by any property other than their historical continuity with the initially distinct objects)” (2003: 223-224; cf. also 2007: 34-37).
Unfortunately for this defense, a closer look at Pylyshyn’s theory shows that visual indexes are not regarded as representations at all. He advocates a “conservative use of representations in theories,” according to which “we should not postulate representations if no explanatory advantage is gained by such a postulate” (2007: 78).15 Burge firmly embraces this view as well.16 It is true that, for Pylyshyn, no property is represented (or encoded) when an object is indexed by the visual system.17 But this provides no support for Burge’s view. Two crucial points in Pylyshyn’s theory are relevant here. First, as Pylyshyn says, “a sudden onset of a visual object may cause an index to be assigned without the assignment either being based on prior encoding of the event as an onset or itself carrying the information that an onset occurred” (2003: 218). That is, the assignment of a visual index, as a causal event, is not itself encoded or represented. Second, Pylyshyn says that “Indexes, unlike codes, pick out things in the world to which they are related by a causal event, and they do not encode these things as something or other; indeed they do not encode them at all” (2003: 219). On this point, even the indexed distal objects are not themselves encoded or represented. Hence, the visual index theory does not require singular representation. What one perceives are indeed particular objects, but it does not follow that singular representation must be posited to explain multiple object tracking.
15 Pylyshyn distinguishes between information registration and representation. The difference between the two is that the former does not allow for the possibility of misrepresentation (2007: 74-75). That something plays a causal role in visual processing does not imply that it plays this role by being represented (2007: 73).
16 See Burge’s criticisms of what he calls deflationary conceptions of representation (2010: 292-308, especially 292, 299, 301).
17 Pylyshyn: “no represented (or encoded) property is used in making the assignment of an index”
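To make the contrast concrete, here is a toy sketch in Python. It is not Pylyshyn’s model: purely for illustration, we assume that each index is rebound frame by frame to the nearest unclaimed item (a greedy nearest-neighbor rule that Pylyshyn does not propose), and the function name `update_indexes` and the sample coordinates are hypothetical. What the sketch shows is that the tracked items can be bare positions: no color, shape, or other property is encoded, and each index is carried along only by spatiotemporal continuity.

```python
import math

Point = tuple[float, float]


def update_indexes(indexes: dict[int, Point], frame: list[Point]) -> dict[int, Point]:
    """Rebind each visual index to the nearest unclaimed item in the new frame.

    Items are bare positions: no color, shape, or other property is consulted,
    only proximity to where the indexed item was in the previous frame.
    Assumes the frame contains at least as many items as there are indexes."""
    unclaimed = list(frame)
    updated: dict[int, Point] = {}
    for idx, last in indexes.items():
        nearest = min(unclaimed, key=lambda p: math.dist(p, last))
        unclaimed.remove(nearest)  # two indexes cannot grab the same item
        updated[idx] = nearest
    return updated


# Four identical items; items 0 and 3 of the first frame are the flashed targets.
frames = [
    [(0, 0), (10, 0), (0, 10), (10, 10)],
    [(1, 0), (9, 1), (0, 9), (11, 10)],
    [(2, 1), (8, 2), (1, 8), (12, 10)],
]
indexes = {0: frames[0][0], 1: frames[0][3]}
for frame in frames[1:]:
    indexes = update_indexes(indexes, frame)
print(indexes)  # {0: (2, 1), 1: (12, 10)}
```

Nothing in the sketch stores a description of what an index picks out; the targets are recovered at the end purely by continuity of position.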
I have argued in this section that at least three types of visual phenomena - perceptual constancy, some visual illusions, and multiple object tracking - can be explained without appealing to singular representation or the full-hearted notion of veridicality. These phenomena, of course, do not cover all the areas in vision science. But they do challenge the claim that perceptual anti-individualism is the only framework for understanding how vision works.
V. Empirical Theories of Vision
In this section, I consider some empirical theories of vision and discuss how they may bear on perceptual anti-individualism. 3D object perception involves very complex processing. Most vision scientists would maintain that
we are still far from a full understanding of how it works. Currently, there are diverse accounts of the functions and mechanisms of visual perception. We will see that not all of them support Burge’s position.18
18 Burge regards the empirical account by Shepard (2001) and the Bayesian “ideal observer theory” by Geisler (2008) as supporting perceptual anti-individualism (Burge, 2010: 99). For critical evaluations of Shepard’s view, cf. Kubovy and Epstein (2001) and Hatfield (2003). For critical comments on the Bayesian approach to perception, cf. Purves and Lotto (2011: 13-14) and Purves et al. (2011: 15594).
(1) Let us first consider two major approaches to object recognition in cognitive science. The Recognition-by-Components Theory explains shape constancy by postulating three-dimensional, volumetric, viewpoint-independent units called geons (Biederman, 1987, 2007). The visual system analyzes and represents each object as a set of geons arranged in a certain way. The relations between geons are specified by a structural description in a viewpoint-invariant frame of reference, for example, a cylinder “on-top-of” a brick. A complete representation of an object, called a geon structural description, consists of a set of geons together with a structural description of their relations. To perceive an object is for the visual system to generate a particular geon structural description for that object. This involves a series of stages. First, the edges of an object are extracted from the retinal image. Second, the non-accidental properties of the image are detected and the image is parsed into regions. Third, based on this information, a particular set of geons is identified. Since geons are defined by non-accidental properties of retinal images, they explain why objects can be perceived as having the same shape regardless of changes in the retinal images. Fourth, the geon structural description of the object is compared with object representations stored in memory. The object is recognized when there is a match.19
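The four stages just described can be illustrated with a deliberately simplified Python sketch. It is not Biederman’s implementation: the feature names (`straight_axis`, `parallel_sides`, `curved_cross_section`, `apparent_width`), the tiny geon vocabulary, and all helper functions are hypothetical, and edge extraction (the first stage) is assumed to have been done already. The sketch shows how a description built only from non-accidental properties comes out the same for different views, so that shape constancy falls out of pattern-based matching.

```python
from typing import Optional


def classify_geon(part: dict) -> str:
    """Stages 2-3: label a part from non-accidental properties only.

    'apparent_width' changes with viewpoint and is deliberately ignored."""
    if part["straight_axis"] and part["parallel_sides"]:
        return "cylinder" if part["curved_cross_section"] else "brick"
    if part["straight_axis"]:
        return "cone" if part["curved_cross_section"] else "wedge"
    return "curved_cylinder"


def structural_description(view: dict) -> frozenset:
    """Build a toy geon structural description: a set of (geon, relation, geon)."""
    geons = {name: classify_geon(feats) for name, feats in view["parts"].items()}
    return frozenset((geons[a], rel, geons[b]) for a, rel, b in view["relations"])


def recognize(view: dict, memory: dict) -> Optional[str]:
    """Stage 4: compare the description with stored ones; None if nothing matches."""
    description = structural_description(view)
    for name, stored in memory.items():
        if stored == description:
            return name
    return None


def lamp_view(width: float) -> dict:
    """A hypothetical view of a lamp; only the viewpoint-dependent width varies."""
    return {
        "parts": {
            "base": {"straight_axis": True, "parallel_sides": True,
                     "curved_cross_section": False, "apparent_width": width},
            "stem": {"straight_axis": True, "parallel_sides": True,
                     "curved_cross_section": True, "apparent_width": width / 4},
        },
        "relations": [("stem", "on-top-of", "base")],
    }


memory = {"lamp": frozenset({("cylinder", "on-top-of", "brick")})}
print(recognize(lamp_view(3.0), memory))  # lamp
print(recognize(lamp_view(1.2), memory))  # lamp
```

Because the viewpoint-dependent feature never enters the description, the two views are matched to the same stored object.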
19 The Recognition-by-Components Theory predicts that recognition performance will not be affected by the perspective from which an object is viewed. Biederman and Gerhardstein (1993) report several experiments that support this theory. In one experiment, 48 subjects were presented with a set of familiar objects, one at a time for 200 milliseconds (ms), each followed by a mask for 500 ms (see Biederman & Gerhardstein, 1993: Figure 5). The set of objects was then shown again for 100 ms, some at the same orientation and some at different orientations (0°, 67.5°, and 135°). Although the orientations might change, the same parts of the objects remained visible. Subjects were asked to name the objects as quickly and accurately as possible. The mean reaction times and error percentages over many trials indicate that changing the orientations of the objects by rotation does not affect the subjects’ performance. This suggests that object recognition is viewpoint invariant. Proponents of this theory have also sought supporting evidence in cognitive neuroscience. It is well known that neurons in the macaque inferior temporal cortex respond to shape properties (Tanaka, 1993). Hayworth and Biederman (2006) use an fMRI adaptation paradigm to argue that the lateral occipital complex is more sensitive to parts than to local features. They argue that these results fit nicely with the Recognition-by-Components Theory.
Another approach, the Multiple-View Theory, proposes a very different explanation of shape constancy, according to which object perception is essentially viewpoint-dependent and image-based. The theory consists of two main ideas. First, the visual system employs only 2D images to construct object percepts. When one sees an object, a 2D image of the object is formed that depicts it from a particular viewpoint. Each 2D image represents various aspects of the object, including shape, depth, color, texture, and shading, in a viewer-centered frame of reference (Tarr, 1995: 56). An ordinary object is represented by multiple 2D views that depict it from various perspectives (Tarr & Bülthoff, 1995: 1495). Second, shape constancy is achieved by comparing 2D images on the retina with a collection of views already stored in memory. This is done by a set of transformation mechanisms (Bülthoff et al., 1994): when one perceives an object, these mechanisms bring the current retinal image into alignment with one of the stored views by mental transformations such as rotation, translation, dilation, and reflection (cf. Palmer, 1999: 364-365). The object is recognized when there is a match.20
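A rough way to see what the Multiple-View Theory predicts is to treat recognition time as growing with the rotation needed to align the current view with the nearest stored view. The Python sketch below is only an illustration under stated assumptions: it uses a single rotation axis and a linear cost, and the function names and parameter values (`base_ms`, `ms_per_deg`) are hypothetical, not fitted to Tarr’s (1995) data.

```python
def angular_distance(a: float, b: float) -> float:
    """Smallest rotation, in degrees (0-180), taking orientation a to orientation b."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def predicted_rt(view_deg: float, stored_views: list[float],
                 base_ms: float = 500.0, ms_per_deg: float = 2.0) -> float:
    """Toy response time: a fixed cost plus a cost proportional to the rotation
    needed to align the current view with the nearest stored view."""
    nearest = min(angular_distance(view_deg, s) for s in stored_views)
    return base_ms + ms_per_deg * nearest


print(predicted_rt(40.0, [0.0]))          # 580.0
print(predicted_rt(130.0, [0.0]))         # 760.0
print(predicted_rt(130.0, [0.0, 130.0]))  # 500.0 (view now familiar)
print(predicted_rt(100.0, [0.0, 130.0]))  # 560.0 (nearest stored view is 130)
```

Under these assumptions, the last two lines mirror the pattern Tarr reports: familiar views become equally fast, and unfamiliar views are slowed in proportion to their distance from the nearest familiar one.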
20 This approach predicts that object-recognition performance will depend on how different the current view is from a familiar view stored in memory. Tarr (1995) conducted a series of psychophysical experiments to support this theory. In one experiment, 12 subjects were presented with seven objects that were asymmetrical both left/right and front/back. After training trials in which they viewed standard versions of the objects, the subjects were shown both the standard and the mirror-reversed versions, one at a time, at orientations produced by rotations of 130° around the x-, y-, or z-axis. They were asked to decide as quickly and accurately as possible whether each object was the standard or the mirror-reversed version of one of the objects seen in training. Based on the mean reaction times and error percentages over many trials, Tarr reported the following findings: (1) response time increased with angular distance from the training viewpoint at which the standard versions had been seen; (2) after some practice, performance became nearly equivalent at all familiar viewpoints; and (3) at unfamiliar viewpoints, response time increased with angular distance from the nearest familiar viewpoint (Tarr, 1995: 64). Tarr argues that these findings strongly support the Multiple-View Theory. In addition, some neurophysiological data are taken as evidence for this theory. Logothetis et al. (1995) reported that many neurons in the monkey inferior temporal cortex are sensitive to specific views of an object, and that different neurons encode different views. According to Tarr and Bülthoff (1998), this provides evidence for multiple-view representation.
For our purposes, the main point is that neither the Recognition-by-Components Theory nor the Multiple-View Theory lends obvious support to perceptual anti-individualism. As I argued above, a theory of vision can explain perceptual constancy without embracing Burge’s full-hearted notion of veridicality. This means that Burge can seek support only from empirical theories that treat the ambiguity problem, rather than the constancy problem, as the central problem of vision. Both the Recognition-by-Components Theory and the Multiple-View Theory are theories of shape constancy rather than of ambiguity. They can be understood and evaluated without assuming Burge’s theory.
One possible defense of Burge’s view is to argue that perceptual constancy is not wholly independent of the ambiguity problem: on this defense, shape constancy cannot be fully explained by either of the two approaches above unless a solution to the problem of shape ambiguity is assumed. I believe that this defense needs empirical support before it can begin to work, and I suspect such support would not be easy to find.21 Currently, as far as I know, empirical theories of object recognition address only shape constancy. The defender of Burge’s view demands that theories of object recognition address ambiguity as well. It is not obvious that most vision scientists would accept this demand. As we will see later in this section, even among theories that accord the ambiguity problem a central place in vision research, not all concur with perceptual anti-individualism.
21 What this defense needs here, I think, are empirical theories that (1) explain both shape constancy and shape ambiguity, (2) formulate the two issues as connected in the way Burge suggests, and (3) justify why shape constancy and shape ambiguity cannot be explained by pattern-based representations alone. As I see it, it is not easy to fulfill all of these requirements.
(2) Donald Hoffman (2009) advocates what he calls the “interface theory of perception,” which borrows the idea of a user interface from computer science. An icon on a computer screen has a particular color and shape and is associated with some stored file. But the icon’s color and shape do not represent or reconstruct the “true” color or shape of the file. Computer files do not have any color or shape, and, as part of a user interface, the icon does not reconstruct anything. Hoffman holds that a user interface is useful precisely because it is not a reconstruction. He says: “The user interface is there to facilitate our interactions with the computer by hiding its causal and structural complexity” (2009: 154). A user interface is a convenient tool for specific purposes and nothing more.
Applying this idea to perception, Hoffman says that “Our perceptions are a species-specific user interface … to guide adaptive behavior in our niche; accuracy of reconstruction is irrelevant” (2009: 154-155). Hoffman rejects what he calls the “principle of faithful depiction,” the idea that the primary goal of perception is to provide a veridical representation of the physical world (2009: 149). On his view, perception does not represent certain properties or categories of the objective world (2009: 153). Rather, “it is construction of a niche-specific, problem-specific, fitness-enhancing interface” (2009: 156). Accordingly, the aim of a theory of vision is not to explain how the visual system produces veridical representations.
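Hoffman’s contrast between an interface and a reconstruction can be illustrated with a toy Python example in which the “percept” encodes fitness rather than the quantity in the world. The payoff function, the threshold, and the names `fitness` and `icon` are hypothetical choices made for illustration; they are not taken from Hoffman (2009).

```python
def fitness(quantity: float) -> float:
    """Hypothetical payoff: too little or too much of a resource is bad."""
    return quantity * (100.0 - quantity)


def icon(quantity: float, threshold: float = 2000.0) -> str:
    """The toy 'percept' reports only whether the payoff is high, not the quantity."""
    return "green" if fitness(quantity) >= threshold else "red"


for q in (10.0, 30.0, 50.0, 70.0, 95.0):
    print(q, icon(q))
# 10.0 red / 30.0 green / 50.0 green / 70.0 green / 95.0 red
```

Quantities as different as 10 and 95 receive the same icon, and the icon’s ordering does not follow the ordering of the quantities: in this toy case the percept guides behavior without reconstructing the world.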
Like other empirical theories of vision, Hoffman’s interface theory is controversial. It is not my goal to assess its explanatory power here. However, I will make two remarks bearing on our evaluation of Burge’s theory. First, the conflicts between perceptual anti-individualism and the interface theory are obvious and serious, and proponents of each theory bear an equal burden of defending it against criticism. Since Burge maintains that perceptual anti-individualism provides the only framework within which vision science can be understood, a defender of his theory would need to establish that the interface theory is in principle incapable of explaining how the visual system solves the inverse problem. This is not an easy task. Until it is done, the interface theory remains a competitor.
Second, the dispute between Burge’s view and the interface theory closely resembles the debate between scientific realism and instrumentalism. Burge’s view holds, and the interface theory denies, that the aim of vision science is to explain veridical representation of the world. If most researchers in the applied sciences unreflectively take a realist stance toward the world, this by itself would not show that scientific realism is true. Likewise, employing the notion of veridicality in a philosophically naive way cannot be regarded as lending decisive support to perceptual anti-individualism. When vision scientists assume that we have veridical perceptions, this may mean only that they are naive realists. They do not, and need not, consider whether their empirical research bears on such philosophical issues. Therefore, in the practice of vision science, assuming that we often have veridical perceptions can be a pragmatic or convenient choice rather than a mandatory one. The interface theory illustrates that vision scientists need not presuppose perceptual anti-individualism to conduct empirical research and make sense of their work.
(3) Finally, let us consider a theory that is friendlier to Burge’s account. Purves and Lotto (2003, 2011) maintain that the inverse problem is a central problem in vision science. They construe it as the ambiguity problem, and agree that it is a problem of underdetermination. However, they do not think that it can be solved by positing some a priori constraints or formation principles. Rather, they propose a purely empirical-statistical theory, which can be summarized as follows.
[T]he visual system is not organized to generate a veridical representation of the physical world, but rather is a statistical reflection of visual history … By virtue of trial-and-error feedback over the eons about the success or failure of visually guided behavior in phylogeny and decades of ontogenetic experience, the visual brain simply responds to a stimulus with a pattern of neuronal activity whose form has been thus determined by the probability distribution of what it has turned out to be in the past (i.e., by the empirical significance of the stimulus). (2003: 227-228)
In contrast to Burge, Purves and Lotto hold that “visual percepts (and the corresponding activity of visual neurons and circuits) do not vary systematically with the physical measurements of objects or light stimuli as such” (2003: 15). On their view, the relations between vision and the world are purely contingent and statistical, so the visual system is not guided by formation principles that mirror environmental regularities. The solution to the ambiguity problem accumulates gradually from the past. The visual system’s responses to stimuli are partly a matter of trial and error, and the system gradually learns from feedback over a long evolutionary history, thereby acquiring a capacity for anticipation. That is, the percepts produced by the visual system are, so to speak, fallible “predictions” of what is going to happen in the environment. This not only solves the ambiguity problem in theory but also explains why animals are able to cope with the environment rapidly and efficiently.22
22 One of the anonymous reviewers suggested that I take the anticipatory nature of perception
According to Purves and Lotto’s account, the function of the visual system is not to produce veridical representations of the current physical environment. What is represented is the “probability distribution of the possible sources of the stimulus” (2003: 10). What animals see is whatever turned out, in the past, to be the statistical majority of the possible causes of a visual stimulus. Burge might think that this is compatible with his theory. But notice that the statistical majority corresponds not to any particular object but to a probability distribution. Trial-and-error feedback and the statistical majority are understood not in terms of veridicality in Burge’s sense, but in terms of biological function. Moreover, according to this empirical-statistical theory, visual illusions “are neither anomalies nor evidence of biological limitations or constraints, but simply the universal signature of this strategy of vision” (2003: 10). That is, there is no need to appeal to the perceptual norms proposed by Burge in order to understand vision.
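The empirical-statistical idea can be caricatured in a few lines of Python: the response to a stimulus is shaped by how often each possible source turned out to produce that stimulus in the past. This is only a toy, assuming discrete stimuli and sources with hypothetical labels and an explicit frequency table; the class name `EmpiricalVision` is likewise hypothetical. Purves and Lotto describe this history as embodied in neural circuitry shaped by evolution and development, not as a lookup table.

```python
from collections import Counter, defaultdict


class EmpiricalVision:
    """Toy sketch: responses reflect the past frequencies of a stimulus's sources."""

    def __init__(self) -> None:
        # stimulus -> how often each possible source turned out to produce it
        self.history: dict[str, Counter] = defaultdict(Counter)

    def feedback(self, stimulus: str, source: str) -> None:
        """Record the outcome of one (trial-and-error) encounter with a stimulus."""
        self.history[stimulus][source] += 1

    def respond(self, stimulus: str) -> dict[str, float]:
        """Return the empirical probability distribution over possible sources."""
        counts = self.history.get(stimulus, Counter())
        total = sum(counts.values())
        return {src: n / total for src, n in counts.items()} if total else {}


vision = EmpiricalVision()
for source in ["gray surface in light"] * 3 + ["white surface in shadow"]:
    vision.feedback("mid-gray patch", source)
print(vision.respond("mid-gray patch"))
# {'gray surface in light': 0.75, 'white surface in shadow': 0.25}
print(vision.respond("unseen patch"))  # {}
```

The response is a distribution over possible sources accumulated from past feedback, not a reference to the particular object currently in view.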
VI. Conclusion
I have argued in this paper that various aspects of vision science can be understood without positing singular representation or the full-hearted notion of veridicality. Also, empirical theories of vision do not uniquely support perceptual anti-individualism. I conclude that, pace Burge, perceptual anti-individualism is not the only framework within which vision science can be understood. Let me make a final remark. Burge’s perceptual anti-individualism has decisively elevated philosophical investigations of perception to a new and interdisciplinary level. I believe that what this theory has achieved is just a beginning, not the end. A lot more work can and must be done to deepen our understanding of the nature of perception.
References
Baars, B. & Gage, N. (eds.) (2010). Cognition, Brain, and Consciousness: Introduction to Cognitive Neuroscience, second edition. Oxford: Academic Press. doi: 10.1007/s00381-010-1178-y.
Biederman, I. (1987). “Recognition-by-Components: A Theory of Human Image Understanding.” Psychological Review, 94: 115-147. doi: 10.1037/0033-295X.94.2.115.
--- (2007). “Recent Psychophysical and Neural Research in Shape Recognition.” Osaka, N., Rentschler, I. & Biederman, I. (eds.). Object Recognition, Attention,
and Action (Ch. 5, 71-88). Tokyo: Springer. doi: 10.1007/978-4-431-73019-4_6.
Biederman, I. & Gerhardstein, P. (1993). “Recognizing Depth-rotated Objects: Evidence and Conditions for 3D Viewpoint Invariance.” Journal of Experimental Psychology: Human Perception and Performance, 19: 1162-1182. doi: 10.1037/0096-1523.19.6.1162.
Bülthoff, H., Edelman, S. & Tarr, M. (1994). “How Are Three-dimensional Objects Represented in the Brain?” CogSci Memo, No. 5: 1-23. doi: 10.1093/cercor/5.3.247.
Burge, T. (2003). “Perceptual Entitlement.” Philosophy and Phenomenological Research, LXVII, 3, November: 503-548. doi: 10.1111/j.1933-1592.2003.tb00307.x.
--- (2005). “Disjunctivism and Perceptual Psychology.” Philosophical Topics, 33, 1: 1-78. doi: 10.5840/philtopics20053311.
--- (2009). “Perceptual Objectivity.” Philosophical Review, 118, 3: 285-324. doi: 10.1215/00318108-2009-001.
--- (2010). Origins of Objectivity. Oxford: Oxford University Press. doi: 10.1093/acprof:oso/9780199581405.001.0001.
Craig, A. D. (2003). “A New View of Pain as a Homeostatic Emotion.” Trends in Neurosciences, 26, 6: 303-307. doi: 10.1016/S0166-2236(03)00123-1.
Geisler, W. S. (2008). “Visual Perception and the Statistical Properties of Natural Scenes.” Annual Review of Psychology, 59: 167-192. doi: 10.1146/annurev.psych.58.110405.085632.
Hatfield, G. (2003). “Representation and Constraints: the Inverse Problem and the Structure of Visual Space.” Acta Psychologica, 114: 355-378. doi: 10.1016/j.actpsy.2003.07.003.
Hayworth, K. & Biederman, I. (2006). “Neural Evidence for Intermediate Representations in Object Recognition.” Vision Research, 46: 4024-4031. doi: 10.1016/j.visres.2006.07.015.
Hoffman, D. (2009). “The Interface Theory of Perception.” Dickinson, S., Leonardis, A., Schiele, B. & Tarr, M. (eds.). Object Categorization (148-166). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511635465.009.
Kubovy, M. & Epstein, W. (2001). “Internalization: a Metaphor We Can Live Without.” Behavioral and Brain Sciences, 24: 618-625. doi: 10.1017/S0140525X01760086.
Logothetis, N., Pauls, J. & Poggio, T. (1995). “Shape Representation in the Inferior Temporal Cortex of Monkeys.” Current Biology, 5: 552-563. doi: 10.1016/S0960-9822(95)00108-4.
Palmer, S. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Pizlo, Z. (2008). 3D Shape: Its Unique Place in Visual Perception. Cambridge, MA: MIT Press.
Poggio, T., Torre, V. & Koch, C. (1985). “Computational Vision and Regularization Theory.” Nature, 317, September: 314-319. doi: 10.1038/317314a0.
Purves, D. & Lotto, R. (2003). Why We See What We Do: An Empirical Theory of Vision. Sunderland, MA: Sinauer Associates, Inc. doi: 10.5860/CHOICE.40-5803.
--- (2011). Why We See What We Do Redux: A Wholly Empirical Theory of Vision. Sunderland, MA: Sinauer Associates, Inc. doi: 10.5860/CHOICE.48-6268.
Purves, D., Wojtach, W. & Lotto, R. (2011). “Understanding Vision in Wholly Empirical Terms.” PNAS, 108, suppl. 3: 15588-15595. doi: 10.1073/pnas.1012178108.
Pylyshyn, Z. (2003). Seeing and Visualizing. Cambridge, MA: MIT Press.
--- (2007). Things and Places. Cambridge, MA: MIT Press.
Shepard, R. (2001). “Perceptual-Cognitive Universals as Reflections of the World.” Behavioral and Brain Sciences, 24: 581-601. doi: 10.1017/S0140525X01000012.
Snowden, R., Thompson, P. & Troscianko, T. (2006). Basic Vision: An Introduction
to Visual Perception. Oxford: Oxford University Press.
Tanaka, K. (1993). “Neural Mechanisms of Object Recognition.” Science, 262: 685-688. doi: 10.1126/science.8235589.
Tarr, M. (1995). “Rotating Objects to Recognize Them: A Case Study of the Role of Viewpoint Dependency in the Recognition of Three-dimensional Objects.” Psychonomic Bulletin and Review, 2: 55-82. doi: 10.3758/BF03214412.
Tarr, M. & Bülthoff, H. (1995). “Is Human Object Recognition Better Described by Geon-structural-descriptions or by Multiple-views?” Journal of Experimental
Psychology: Human Perception and Performance, 21, 6: 1494-1505. doi:
10.1037/0096-1523.21.6.1494
--- (1998). “Image-Based Object Recognition in Man, Monkey, and Machine.”