

6.4 The Interface Hypothesis account

The Interface Hypothesis proposed by Kita and Özyürek (2002) suggests that gestures originate from an interface representation between speaking and spatial thinking. This interface representation, according to their hypothesis, is a spatio-motoric representation that encodes action and spatial information in terms of actions tailored for the purpose of speaking. Gestures are generated from this general process, in which action and spatial information are organized into a plan for action in real and imagined space. In the model of speech and gesture production proposed by Kita and Özyürek, this interface representation, the general mechanism that generates actions and gestures, is the Action Generator. The Action Generator can interact on-line with the Message Generator, the mechanism in which a proposition to be verbally expressed is formulated, and can receive guidance and feedback from the Message Generator as to which spatial information should be selected from working memory or the environment. The two generators can constantly exchange information and can also initiate the organization of information independently. In other words, the interplay between the Action Generator and the Message Generator allows the gesture content to be shaped simultaneously by the spatial features of real and imagined space and by the linguistic formulation process. Thus, according to the Interface Hypothesis, the content of a gesture is determined both by the information encoded in the accompanying speech and by the spatial properties of the events that may not be linguistically encoded. A gesture, interacting on-line constantly with the Message Generator, can further structure information in a way that is relatively compatible with linguistic encoding possibilities for the purpose of speaking.

With regard to the current data, the Interface Hypothesis would then predict that for a given gesture one might observe simultaneously the influence of both the linguistic formulation possibilities and the spatio-motoric properties of the referent that may not be verbalized in the accompanying speech. The current data have shown that 69.8% of all iconic gestures encode only information that is also encoded in the accompanying speech. The other 30.2% of gestures encode not only the same information that is verbally expressed, but also reflect extra information concerning the third-person past events being described which is not conveyed in the accompanying speech. For these 30.2% of gestures, we have further suggested the types of information that can be additionally encoded through gestures, including trajectory and manner. From these gestures, we can see that certain features of a past event, when represented through gesture as imagery, can be specified and conveyed even though they are not mentioned in the accompanying speech. These gestures thus provide evidence for the Interface Hypothesis in that, within a given gesture, the gesture content is shaped simultaneously by the linguistic information and by the spatial properties of the past events that can be represented as imagery.

With the current data as evidence to support its prediction, the Interface Hypothesis also provides an explanation for the division of labor between linguistic and gestural viewpoints. From the quantitative study on the distribution of viewpoints in the linguistic and gestural channels in the current data, we can observe a division of labor whereby certain viewpoints tend to be expressed in their respective modalities. From the perspective of each viewpoint, observer viewpoint appears to be the most unmarked option: it is the most frequently adopted viewpoint in language (60.5% of all linguistic viewpoints) and is also regularly expressed in gesture (43.7% of all gestural viewpoints) in representing third-person past events. In representing observer viewpoint through language, speakers predominantly use plain statements to talk about past events in the role of an observer, simply and plainly describing the spatio-temporal information concerning the past events without attempting to walk into the scene and act as a character in the event.

With regard to character viewpoint, it is the viewpoint adopted most frequently in the gestural modality, but least often in language. This suggests that in talking about third-person past events, a speaker's attempt to enact the role of a character is more likely to be realized through gesture rather than speech. The information encoded in gesture often comes to represent the past event as if the speaker were the person involved in the original scene, whereas the information encoded in the accompanying speech rarely does. Conversely, speaker viewpoint is rarely seen in gesture, with only four cases in the current data. Information structured as a speaker's attitude towards or comment on past events, or as a speaker's attempt to interact with other co-conversationalists, both of which serve to indicate the speaker's current role as a speaker within the ongoing conversation, is rarely encoded in gesture content. Consequently, if speakers want to reveal their here-and-now status as the current speaker in order to maintain the ongoing conversation while talking about third-person past events, they generally do so through the linguistic rather than the gestural channel. The very different, nearly contradictory distributions of character viewpoint and speaker viewpoint in the linguistic and gestural modalities not only suggest that speakers' gestures performed in the descriptions of third-person past events do not always collaborate with the accompanying speech in expressing the same viewpoint, but also show that there is a division of labor between language and gesture in expressing viewpoints. The model proposed by Kita and Özyürek (2002) on the basis of the Interface Hypothesis provides a theoretical account of this situation.

According to the Interface Hypothesis, the division of labor between the two modalities is planned by the Communication Planner proposed in the model. In this model, the Conceptualizer, the planning process of speech and gesture production at the conceptual level, is split into the Communication Planner and the Message Generator (see Figure 1 in Chapter 2). The primary function of the Communication Planner is to generate the communicative intention, a rough specification of what needs to be communicated when, and to achieve what Levelt (1989) called "macro-planning": a rough decision on the information to be expressed and its ordering, and the selection of appropriate speech acts. More importantly, it determines which modalities of expression should be involved. Consequently, once the communicative intention is determined, it is passed to the Message Generator and the Action Generator. The two generators can then converge and exchange the information intended to be conveyed in either the linguistic or the gestural channel. Although the Communication Planner does not necessarily determine exactly what information is to be expressed in each modality, it is able to explicitly divide the labor between the two modalities to achieve different communicative goals. For example, Kita and Özyürek (2002) suggested that the division of labor might occur when an expression such as "like this" is the communicative intention. In this case, two coordinated but different goals might be planned in the Communication Planner: the gesture is aimed at iconically "demonstrating" the referent and the speech at "indexing" the gesture, despite the same communicative intention, namely the expression of "like this". The two goals would then be sent to the two generators respectively for the formulation process. In other words, when two different communicative goals are to be achieved in the two modalities with respect to the same communicative intention, the division of labor can be planned in the Communication Planner.

With respect to the current data, we can say that the communicative intention for speakers talking about third-person past events might generally look like, "My global intention is to talk about an event that has happened to a third person." Once the decision concerning the communicative intention has been made, the rough specifications of the information to be encoded then begin to be generated. However, speakers may also wish to achieve other communicative goals, such as making comments on the events or interacting with other co-conversationalists to maintain the ongoing conversation, or performing characters' parts in the original events, even though the global intention is to describe a third-person past event. In this case, the Communication Planner can explicitly divide the labor between the speech and gestural modalities to achieve these goals. The current data suggest that language rather than gesture often takes on the job of maintaining an ongoing conversation, by conveying the speaker's personal comments about or attitudes towards the past events, or by interacting with other co-conversationalists in ways that reveal the speaker's role as the current speaker. On the other hand, when the communicative goal to be achieved is the speaker's attempt to enact the role of a character in an event, the Communication Planner would separate this labor out to the gestural modality. With a Communication Planner that can divide the labor of representing certain viewpoints between the two modalities, the Interface Hypothesis allows us to explain why speaker viewpoint is rarely found in gesture and why character viewpoint is rarely represented in speech in the current data.

In the models proposed by other hypotheses, the division of labor between language and gesture in expressing viewpoints cannot be explained. Indeed, a speaker's division of labor between linguistic and gestural viewpoints in talking about third-person past events suggests that the gesture production system is not completely and independently separate from the speech production system, as suggested by the Free Imagery Hypothesis, according to which gesture content is not influenced by language. Nor is the gesture production system completely constrained by the speech production system, as claimed by the Lexical Semantics Hypothesis, according to which gesture content must be generated from the semantics of lexical items uttered in speech. Rather, the division of labor between linguistic and gestural viewpoints suggests that speech and gesture production neither work independently, as if language had no impact on gesture content, nor work completely in unity, as if gesture always encoded what is encoded in speech. The Interface Hypothesis provides an explanation for this fact, since the view it proposes highlights the real but not absolute influence of language on gesture.

In sum, in proposing that gestures are generated from an interface representation between speaking and spatial thinking, the Interface Hypothesis predicts that for a given gesture one can observe both the linguistic formulation possibilities and spatio-motoric features that are unlikely to be verbally expressed. This prediction is supported by the current data, in which 30.2% of all iconic gestures encode not only the same information that is also encoded in the accompanying speech, but also spatial properties of the past events that are not uttered. The Interface Hypothesis also provides a model of speech and gesture production that is able to explain the division of labor in viewpoint expressions between the speech and gestural modalities, which other hypotheses cannot explain. The Communication Planner in the proposed model, which generates the global communicative intention, can divide the labor between the speech and gesture modalities when different communicative goals resulting from different viewpoint expressions are to be achieved.

6.5 Summary

In this chapter, a comparison of the current study with McNeill's gestural study on viewpoints is first presented. Theoretical accounts of the collaborative expressions of viewpoints in language and gesture, which shed light on the processing of speech and gesture in human communication, are then provided.

The present study differs from McNeill's gestural study on viewpoints in two respects. First, concerning the distributions of O-VPT and C-VPT gestures, the current study finds that in gesture, C-VPT is more commonly expressed than O-VPT in speakers' descriptions of third-person past events within conversational contexts.

Second, in terms of the collaborative expressions of viewpoints in language and gesture, the current study suggests that while speakers' speech-accompanying gestures collaborate with the speech in expressing viewpoints when talking about third-person past events, gestures more often convey viewpoints that are different from those conveyed in language in the descriptions of the same event.

With respect to theoretical accounts of the collaborative expressions of viewpoints in language and gesture, two hypotheses of speech and gesture production, the Lexical Semantics Hypothesis and the Interface Hypothesis, are presented to explain the processes involved in gesture production and its relationship with speech production when gesture collaborates with the accompanying speech in expressing viewpoints.

The Lexical Semantics Hypothesis claims that gestures are generated from the semantics of lexical items in the accompanying speech. Concerning the current data, the Lexical Semantics Hypothesis therefore predicts that the content of a gesture corresponds to the semantics of a lexical item in the accompanying speech, and that gesture does not encode information that is not verbally expressed. The current data provide evidence against the Lexical Semantics Hypothesis by showing that 33.9% of all gestures produced have content that corresponds to the semantics of a phrase in the accompanying speech, suggesting that "word" might not be the only grammatical unit to be the source of the content of a gesture. The current data also show that 30.2% of all iconic gestures encode extra information that is not conveyed in the accompanying speech.

From the view of the Interface Hypothesis, gestures originate from an interface representation between speaking and spatial thinking. With regard to the current data, the Interface Hypothesis predicts that for a given gesture, one can observe both the linguistic formulation possibilities and the spatio-motoric features that are not verbally expressed. The current data support this prediction in that 30.2% of all iconic gestures encode not only the same information that is also encoded in the accompanying speech, but also spatial properties concerning the past events that are not verbally conveyed. In addition, the Interface Hypothesis provides a model of speech and gesture production that is able to explain the division of labor in the expression of viewpoints between the speech and gestural modalities, which other hypotheses cannot explain.

To conclude, the two hypotheses each provide a theoretical account to explain the collaborative expressions of viewpoints in language and gesture, leading us to explore the processing of gesture and its relationship with speech production. Each hypothesis is also supported by different pieces of evidence and by different proportions of the gestures found in the current data.


CHAPTER 7

CONCLUSION

7.1 Summary of the thesis

This thesis is an attempt to explore linguistic and gestural representations of viewpoints in the descriptions of third-person past events within Chinese conversational discourse. Following McNeill's contention that "one area of meaning where speech and gesture are coexpressive is the point of view" (1992:118), this study also investigates whether, in the joint expression of viewpoints, speech-accompanying gestures collaborate with language in expressing the same or different viewpoints within the description of an event.

Three types of viewpoints are identified in the framework of this study. Speaker viewpoint (S-VPT) is expressed when speakers, in talking about past events, are also concerned with the maintenance of the ongoing conversation, interacting with co-conversationalists or making comments or showing their attitudes to reveal their current status as a speaker. In conveying observer viewpoint (O-VPT), speakers describe the past events like an observer not involved in the event, refraining from either re-enacting the roles of the characters or attempting to maintain the ongoing conversation. Character viewpoint (C-VPT) is represented as a speaker's attempt to walk into the temporal and spatial frame of the past events by reconstructing the scene of the past events from the perspective of the different characters in the scene and enacting their thoughts, speech, and other deeds.

By focusing the analysis on conversational data, this study shows that various linguistic structures and paralinguistic devices, together with other embodied nonverbal resources that accompany the gesture, can serve to represent the three viewpoints. With respect to the representations of linguistic viewpoints, speakers make use of interrogative sentences, phrasal expressions such as speculative, suggestive, evaluative and emotive expressions, parenthetical remarks, and lexical items such as the impersonal use of the second-person pronoun to convey speaker viewpoint. In addition, speakers can also use paralinguistic devices such as discourse markers and laughter to signal their role as the current speaker within an ongoing conversation. For observer viewpoint, indirect reported speech and plain statements are linguistic representations that suggest speakers are talking about the past events as an outside-the-event observer. In representing character viewpoint through language, speakers make use of different modes of direct quotation, including direct speech, voiced direct reported speech, and inner speech, to enact characters' speech and thoughts in the past events.

With regard to the gestural representations of viewpoints, a gesture is synthetic in that the several features composing a single gesture interact with each other to represent a certain viewpoint. Five gestural features, namely gestural space, handedness, stroke duration, frequency of the stroke, and the involvement of other body parts, are identified as crucial and indicative criteria in representing the three viewpoints.

Quantitative study of linguistic and gestural viewpoints shows that speech-accompanying gestures in the descriptions of third-person past events within conversational contexts display patterns different from those found in language in the distribution of speaker viewpoint, observer viewpoint, and character viewpoint. The distribution of the three viewpoints in language suggests that observer viewpoint is the most common choice for speakers to talk about a past event in an ongoing conversation, while character viewpoint is the least frequently chosen.

Character viewpoint, in contrast, is the most frequently adopted viewpoint in the gestural channel. While observer viewpoint is also commonly expressed, speaker viewpoint is rarely seen in gesture, despite the fact that speakers talking about third-person past events are occasionally also concerned with the ongoing conversation and therefore represent speaker viewpoint in language. The discrepancy between the distributions of viewpoints in language and in gesture also implies the possibility of mismatching viewpoints in the joint expression of viewpoints in language and gesture in the description of the same event. Quantitative study of the collaborative expressions in both modalities further confirms this by showing that 64.7% of all gestures produced in the current data represent a viewpoint different from that conveyed in language in the description of the same event. Therefore, this study shows that while language and gesture are co-expressive in terms of viewpoint, gesture more often collaborates with the accompanying speech in representing viewpoints different from those conveyed in language.

The collaborative expressions of viewpoints in language and gesture suggest how