1 INTRODUCTION
1.2 Organization of the Study
In what follows, Chapter 2 reviews the literature on sarcasm definition, sarcasm processing, cross-linguistic sarcasm profiles, and the effect of age on human voice quality.
Chapter 3 introduces the research design of the present study, and Chapter 4 presents the results. Chapter 5 discusses these results. Finally, Chapter 6 concludes the thesis.
Chapter Two: Literature Review
This chapter reviews previous literature on the voice quality of sarcasm across languages. Section 2.1 defines sarcasm. Section 2.2 reviews how interlocutors process sarcasm in communication. Section 2.3 discusses the prosodic and acoustic features of sarcastic speech. Section 2.4 identifies gaps in previous research and outlines the direction of the present study.
2.1 Sarcasm Definition
Although sarcasm and verbal irony are often conflated, sarcasm is, in fact, a subtype of verbal irony (Capelli et al., 1990; Kreuz and Roberts, 1993). Generally speaking, verbal irony is an indirect speech act in which speakers intentionally make a statement whose literal meaning is opposite to their real intention (Haverkate, 1990). In other words, verbal irony reverses the surface semantic information and is considered the opposite of sincerity, in which the surface meaning does not differ from the intended meaning (Farias, 2017). Realized through different linguistic forms (e.g., sarcasm, overstatement, understatement, hyperbole, and rhetorical questions), verbal irony achieves a variety of social or communicative goals, including signaling solidarity, creating amusement, showing resentment, and teasing (Gibbs, 2000; Leggitt and Gibbs, 2010). With these communicative
the person or event discussed at the moment. Note that speakers’ intentions in verbal irony can be either positive or negative. Based on speakers’ intentions, Mauchand et al. (2018) categorized verbal irony into two subgroups: ironic compliments and ironic criticisms. Whereas ironic compliments convey speakers’ positive intentions through negative linguistic content, ironic criticisms mask speakers’ negative attitudes with positive linguistic content. For instance, saying ‘這個開車技術太可怕了。’ (‘Such dreadful driving skills.’) after watching a race car driver’s impressive performance is an ironic compliment, whereas saying ‘做得真好！’ (‘Great job!’) when your employee makes a mistake is an ironic criticism.
Under this categorization, sarcasm is regarded as a form of ironic criticism (Mauchand et al., 2018). Unlike verbal irony, which spans the attitudinal spectrum from positive to negative, sarcastic comments occupy specifically the negative end of this spectrum (Capelli et al., 1990). Despite its negative nature, sarcasm is pervasive in daily communication (Booth, 1974; Li et al., 2020; Muecke, 1969). The motivations behind the use of this indirect speech act vary (Cheang and Pell, 2008). For instance, sarcasm helps reinforce speakers’ true intentions (Colston, 1997) while also lessening the threat posed by unfavorable remarks (Dews et al., 1995).
2.2 Sarcasm Processing in Communication
semantic information. The mismatch between the literal meaning and the hidden message therefore requires more effort from listeners to decode the implications of what has been said (Kim, 2014).
This section first reviews approaches to sarcasm processing (Section 2.2.1), that is, how listeners perceive and grasp sarcastic intentions during communication. Section 2.2.2 then discusses the importance of prosody in sarcasm recognition.
2.2.1 Approaches to Sarcasm Processing
This section reviews three main approaches to sarcasm recognition: the literal-first account (Section 2.2.1.1), the interactive account (Section 2.2.1.2), and Relevance Theory (Section 2.2.1.3).
2.2.1.1 The Literal-first Account
The literal-first account posits that irony recognition involves a sequential process in which the literal meaning is obligatorily accessed and then rejected before the non-literal intention is derived.
To account for this sequential process, Giora (1997) proposed the graded salience hypothesis, which holds that when the linguistic input has more than one possible interpretation, the most salient interpretation surfaces earlier than less salient ones. In ironic statements, the literal meaning is considered the most salient. Hence, perceivers are assumed to process the literal meaning first when encountering an ironic statement. This initial interpretation is then found to be false, so it is rejected and a new interpretation is derived. Note that, according to this hypothesis, the literal meaning is rejected but not discarded. Instead, it is retained so that perceivers can compare the two opposing interpretations and decide which is the appropriate one.
Therefore, when the literal meaning of an ironic expression does not represent the speaker’s true intention, listeners need extra steps to compute the two possible interpretations (i.e., literal and non-literal) before arriving at the intended meaning. This mechanism thus predicts longer processing times for ironic intentions.
Some empirical evidence supports this account, including an online reading comprehension task conducted by Giora et al. (1998). In their experiment, each target utterance was paired with two versions of a context. In one version (1a), the context called for a literal reading of the target utterance, whereas in the other (1b), it called for a non-literal reading. The former represented the speaker’s sincere attitude, since the intended meaning was the same as the literal meaning. The latter, on the other hand, represented the speaker’s ironic attitude, because the intended meaning was opposite to the literal interpretation.
(1) a. Anna is a great student and very responsible. One day she called to tell me she did not know when she would [not] be able to show up for my lecture.
However, just as I was starting, she entered the classroom. I said to her: “You are just in time.”
b. Anna is a great student, but she is very absent-minded. One day while I was well through my lecture, she suddenly showed up in the classroom. I said to her: “You are just in time.”
Participants were required to read each context line by line before answering a comprehension question presented on a computer screen. Reading times for the target utterances were measured in both the sincere and the ironic conditions. The results showed longer reading times for ironic target utterances than for their literal counterparts. This finding supported the assumption that ironic statements take longer to process because they require extra effort to evaluate both the literal and the intended meanings during recognition.
Dews and Winner (1999) also used an online processing task in an attempt to support the literal-first account. Their target utterances were designed on the basis of two factors: the literal meaning of the target utterance and the speaker’s intention. Each factor had two levels, positive and negative. The resulting two-by-two crossed design generated four conditions (literal compliment, ironic compliment, literal criticism, and ironic criticism), as illustrated in Table 1.
Table 1 Dews and Winner’s (1999) experimental design

                                     Literal meaning
                                     Positive              Negative
Speaker’s intention    Positive      literal compliment    ironic compliment
                       Negative      ironic criticism      literal criticism
Examples of each condition from Dews and Winner (1999) are given in (2). When the target utterance had a positive literal meaning and the speaker’s intention was also positive, there was no hidden message, so the utterance was considered a literal compliment (see (2a)). When the speaker’s intention was negative, however, the literal meaning and the real intention diverged, making the utterance an ironic criticism (see (2b)). Conversely, when the target utterance had a negative literal meaning and the speaker’s intention was positive, the inconsistency between the literal and intended meanings made the utterance an ironic compliment (see (2d)). Finally, when the speaker’s intention was also negative, the literal and intended meanings coincided, so the utterance was viewed as a literal criticism (see (2c)).
(2) a. Literal compliment
Sally and her roommate were having a cookout. The weather was warm and sunny, so it was perfect for a cookout. Sally said:
‘We picked a great day.’
b. Ironic criticism
Sally and her roommate were having a cookout. Their guests had arrived and the food was cooking when a heavy rain began. Sally said:
‘We picked a great day.’
c. Literal criticism
Jeanne and her friend had postponed their holiday shopping until two days before Christmas. When they arrived, the shopping mall was packed with other last-minute shoppers. Jeanne said:
‘Dreadfully crowded, isn’t it?’
d. Ironic compliment
Jeanne and her friend had postponed their holiday shopping until two days before Christmas. Both expected the shopping mall to be packed with other last-minute shoppers, but it was nearly empty. Jeanne said:
‘Dreadfully crowded, isn’t it?’
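The logic of Dews and Winner’s (1999) two-by-two design can be stated compactly: an utterance is ironic when its literal valence differs from the speaker’s intended valence, and it counts as a compliment or a criticism according to the speaker’s intention, not its wording. The Python sketch below is a hypothetical illustration of that rule, not part of Dews and Winner’s materials; the names `Valence`, `classify`, and `examples` are invented for this example. Applied to the four examples in (2), it reproduces the cells of Table 1.

```python
from enum import Enum


class Valence(Enum):
    """Hypothetical two-level factor used for both literal meaning and speaker intention."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


def classify(literal: Valence, intention: Valence) -> str:
    """Return the condition label for one target utterance in the two-by-two design."""
    # An utterance is ironic when its literal valence and the intended valence differ.
    ironic = literal is not intention
    # Compliment vs. criticism follows the speaker's intention, not the literal wording.
    act = "compliment" if intention is Valence.POSITIVE else "criticism"
    return f"{'ironic' if ironic else 'literal'} {act}"


# The four examples in (2), as (literal meaning, speaker's intention).
examples = {
    "2a": (Valence.POSITIVE, Valence.POSITIVE),  # sunny cookout: 'We picked a great day.'
    "2b": (Valence.POSITIVE, Valence.NEGATIVE),  # rained-out cookout, same remark
    "2c": (Valence.NEGATIVE, Valence.NEGATIVE),  # packed mall: 'Dreadfully crowded, isn't it?'
    "2d": (Valence.NEGATIVE, Valence.POSITIVE),  # nearly empty mall, same remark
}

for label, (literal, intention) in examples.items():
    print(label, classify(literal, intention))
```

Running it prints one line per example, for instance `2b ironic criticism` and `2d ironic compliment`, which matches the off-diagonal (ironic) cells of Table 1.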
In Dews and Winner (1999), participants were instructed to judge the intended meaning of each target utterance. The experimenters then analyzed participants’ error rates and reaction times in answering the comprehension questions. Consistent with Giora et al.’s (1998) finding, target utterances with ironic intentions required longer reaction times and produced higher error rates. Interestingly, a closer look at the ironic conditions showed that ironic compliments produced higher error rates and longer reaction times than ironic criticisms. Since ironic compliments contain negative literal meanings whereas ironic criticisms contain positive ones, this finding could be explained by Clark’s (1973a) claim that negative notions take longer to process than positive ones. Read in light of Clark’s (1973a) claim, these results suggested that the literal content did influence irony processing. Importantly, the findings indicated that the literal meaning was processed during irony recognition.
In sum, the empirical findings above appear to support the literal-first account in that the literal meaning is processed and influences irony comprehension.
However, as Dews and Winner (1999) noted, the results could neither confirm that listeners process the literal meaning in full nor show that the literal meaning is processed earlier than the intended meaning. Although identifying ironic intentions took longer and produced more errors, such results merely indicate that the literal meaning plays a role in irony processing. These findings could only suggest that the
to comprehend. Whether the literal meaning is processed first remains unknown. Additionally, previous studies supporting this approach focused mainly on the written form of verbal irony, excluding other cues such as speakers’ facial expressions and prosody. It would therefore be problematic to conclude that the literal meaning takes precedence over all other cues during irony recognition.
2.2.1.2 The Interactive Account
Contrary to the literal-first account, in which the non-literal meaning is derived only after the literal meaning has been processed, the interactive account holds that literal and non-literal meanings are processed concurrently at an early stage of comprehension (Gibbs, 1986; Gildea and Glucksberg, 1983; Katz, 2005; Long and Graesser, 1988; Spotorno et al., 2013).
When ambiguity arises, rather than waiting for the literal meaning to be processed first, recipients integrate a wide variety of cues simultaneously to decode the real motive behind the utterance.
Within the interactive account, a well-accepted model is Katz’s (2005) parallel-constraint satisfaction approach. This model treats irony processing as a parallel structure in which multiple cues, including linguistic codes, contextual cues, and the speaker’s background, enter the listener’s speech recognition system as soon as a sarcastic utterance is delivered. These cues are integrated and computed to determine the speaker’s intended meaning. Meanwhile, all possible interpretations of the utterance become available, waiting to be activated. Once the integrated information provides sufficient clues, perceivers select the most coherent interpretation as the speaker’s real intention. Irony identification can therefore be completed even before the entire utterance has been processed.
Neurophysiological and psychological evidence from ERP studies (Spotorno et al., 2013) and online processing experiments (Pexman et al., 2000) has supported the parallel-constraint satisfaction approach. Further evidence comes from research on children’s irony comprehension. Climie and Pexman (2008) conducted an eye-tracking experiment in which they recorded children’s eye movements and measured their response latencies while the children watched puppet shows. The shows contained different types of remarks (literal compliments, ironic compliments, literal criticisms, or ironic criticisms). In each show, the puppet speaker was assigned a personality (funny or serious, to varying degrees) to represent the speaker’s traits. Before the experiment, children were instructed to judge whether the puppet speaker was nice or mean. Two tokens, a duck representing nice and a shark representing mean, were placed in front of the children during the shows. After each performance, children judged the puppet speaker’s intention by placing either the duck or the shark in the answer box. Their eye movements were recorded for later analysis.
The results showed that children’s response latencies were longer for ironic remarks. Such
frequently directed their first look toward the ironic intention in ironic scenarios. These eye-movement patterns suggested that the literal meaning was not necessarily considered before the ironic interpretation; rather, children accessed the ironic interpretation early in processing. Furthermore, because the experimental design incorporated multiple cues, including the speakers’ personalities, the congruity of the contexts, the tone of voice, and the linguistic content, children’s correct judgements of the intended meaning demonstrated their ability to integrate a combination of cues in order to derive the appropriate interpretation. Climie and Pexman (2008) therefore argued that it was the coordination of multiple cues, rather than an obligatory literal-first processing sequence, that led to longer reaction times in children’s irony comprehension.
In sum, Climie and Pexman’s (2008) findings support the parallel-constraint satisfaction approach in that literal and non-literal meanings are processed in parallel along with a combination of cues. One limitation of these data, however, is that they do not address the contribution of each cue to sarcasm processing. Although a collection of cues is thought to be processed simultaneously in the recognition system, which cue plays the dominant role during recognition remains unknown.
2.2.1.3 Relevance Theory
Another theoretical framework that facilitates the understanding of sarcasm processing is Relevance Theory (Sperber and Wilson, 1986). Unlike the literal-first and interactive accounts, which focus on the status of the literal meaning, Relevance Theory approaches irony comprehension from a cognitive perspective.
Like the interactive account, Relevance Theory holds that multiple cues are processed together during sarcasm processing. According to Sperber and Wilson (1986), when an input message has more than one possible interpretation, listeners examine a combination of cues (e.g., semantic cues, the ongoing context, prosodic cues, and the literal meaning of the utterance) and take the interpretation that requires the least effort to comprehend as the relevant one. This relevant interpretation is then taken to be the speaker’s intended meaning.
Following this approach, Yus (2000) constructed an irony-comprehension model that, from a cognitive perspective, identifies seven contextual sources interlocutors draw on to recognize irony and sarcasm: (i) factual information, (ii) physical setting, (iii) nonverbal communication, (iv) biographical data, (v) mutual knowledge, (vi) previous utterances, and (vii) linguistic cues.
In this model, sarcasm is recognized on the basis of one or more contextual sources. Consider the following situation:
(3) [It is raining heavily outside.]
Mary: What a lovely day!
In this scenario, if the listener were interacting with Mary face to face, the listener would have some general information about her, such as her age, gender, and ethnicity (biographical data). Moreover, if the listener were a close friend of Mary’s and knew that she adores rainy days (factual information and mutual knowledge), then, given the smile on her face (nonverbal communication) and the weather outside (physical setting), the listener would probably not consider Mary’s comment (linguistic cues) to be sarcastic. If, on the other hand, Mary happened to dislike heavy rain, the same scenario would lead the hearer to interpret her comment as sarcastic.
This interpretation can be inferred from the contrast between the literal meaning of Mary’s comment (‘What a lovely day!’) and her attitude toward rainy days (factual information). However, what if the hearer were not familiar with Mary and had no clues about the weather outside? In that case, the hearer might have difficulty understanding Mary’s true intention.
Accordingly, in his irony-comprehension model, Yus (2000) proposed that sarcasm is much easier to detect when at least two sources provide highly incompatible information. If the contextual sources offer no overt conflict, recipients need extra effort to grasp the speaker’s true intention, and the more effort listeners need, the more likely they are to misunderstand the real meaning of the utterance.
One piece of evidence supporting Relevance Theory comes from second language learners. Kim (2014) investigated Korean EFL learners’ understanding of English sarcasm. Twenty-eight Korean adults with advanced English proficiency were asked to identify sarcasm in five video clips from the American sitcom Friends. Participants were required to identify which lines in each clip carried sarcastic intent and to judge the speaker’s intended communicative goals. Afterwards, post-hoc interviews were conducted with each participant individually to understand the rationale behind their judgements.
The results showed that Korean EFL learners tended to rely on contextual information to recognize sarcasm. For instance, the sarcastic utterance in one video clip was successfully recognized by 42% of the participants. Post-hoc interviews revealed that these participants attributed their understanding to the obviously untrue remark made by the speaker (Kim, 2014, p. 199). Kim viewed this “obviously untrue remark” as a manifestation of the incongruity between the literal meaning and the overall context. These findings are consistent with Relevance Theory’s claim that incongruity created by contextual sources is important for sarcasm comprehension. Still, the study did not address which contextual cue plays the dominant role in the comprehension process.
2.2.2 The Role of Prosodic/Acoustic Cues in Sarcasm Processing
This section addresses the importance of prosody in sarcasm detection. Although context contributes greatly to the recognition of sarcastic intentions, listeners can also adopt other strategies to identify what speakers actually mean. Prosody, for instance, is a valuable resource, especially when the incongruity between the linguistic content and the overall context is not apparent.
It has been argued that when contextual cues are missing, prosody alone provides enough information for recipients to distinguish sarcastic from sincere utterances (Bryant and Fox Tree, 2005; Capelli et al., 1990; Laval and Bert-Erboul, 2005). Moreover, research on children’s irony comprehension has revealed that before children are able to use context as a source of comprehension, they rely mainly on prosodic cues to process sarcastic utterances.
Capelli et al. (1990) investigated children’s understanding of sarcasm. They recruited third- and sixth-graders and asked them to listen to eight stories. To examine whether contextual or prosodic cues contributed most to children’s sarcasm comprehension, the stories were manipulated on two factors: context and intonation. The context was the storyline of each story, designed to be either congruent or incongruent with the final remark of the story. Congruent contexts led to literal interpretations of the final remarks, whereas incongruent contexts led to ironic interpretations. Intonation, on the other hand, was encoded in the final remark of each story, which was designed to have either a sincere or a sarcastic tone. Therefore, four story types were constructed