
Chapter 5 Verbal Irony

5.2 Irony Corpus Construction

This section introduces a bootstrapping methodology used to construct an irony corpus and mine irony patterns. While Lukin and Walker (2013) also used a bootstrapping method to improve sarcasm and nastiness classifiers, this study, in contrast, focuses on irony pattern mining and corpus construction. The procedure is shown in Figure 5.1.

[Flowchart with the nodes Get Candidates, Choose Pattern, Extract, Review, the decision New Patterns? (Yes / No), and End]

Figure 5.1 The bootstrapping procedure.

5.2.1 Finding Irony from an Emotion-Tagged Corpus

Under the definition of irony given in Section 5.1, collecting ironic expressions requires texts annotated with their actual meaning, together with the literal meanings of the words in those texts. If the actual meaning and the literal meaning of a text disagree, the text may contain irony.

Nowadays, emoticons are used frequently on social media to express the feelings of the posters. These emotion icons specify a poster’s actual meanings in some sense.

Based on this idea, messages on Plurk, a microblogging platform described in Section 2.2, can be obtained for the irony processing purpose. Plurk lets users post messages limited to 140 characters and allows them to use graphical emoticons in their messages.

It is assumed that these emoticons represent the poster’s sentiments and can therefore be regarded as sentiment labels of the messages. Of the 35 emoticons, 23 are categorized as positive and the other 12 as negative, as shown in Figure 2.2. The collected messages were posted between Jun 21, 2008 and Nov 7, 2009, and all of them are in Traditional Chinese.

On the other hand, the literal meanings of the posted messages must also be determined. Among a variety of sentiment analysis algorithms (Liu, 2012), a lexicon-based approach is adopted for this study. The NTU Sentiment Dictionary, or NTUSD (Ku and Chen, 2007), is employed to determine the sentiment of a word. NTUSD contains 21,056 positive and 22,751 negative words. Most of these words are in Traditional Chinese.

Using microtexts such as Plurk messages has some benefits for this study of irony. These messages are limited in length and usually shorter than a regular article, yet they can still contain multiple sentences. This helps exclude most irrelevant information while still capturing discourse information.

5.2.2 Candidate Extraction

Possibly ironic messages are extracted from the Plurk corpus according to emoticons and NTUSD. Since the typical social function of irony is expressing negative meanings with positive words, as mentioned in Gibbs and Colston (2007), this study focuses on messages with negative emoticons and positive words. A total of 3,178,372 messages contain at least one negative emoticon. Among them, 304,754 messages also contain at least one positive word, and these form the irony candidate dataset.
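The candidate filter described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in this study: the emoticon names, the tiny word list, and the Message type are hypothetical stand-ins for the Plurk emoticon inventory of Figure 2.2 and for NTUSD, and plain substring lookup is assumed in place of proper word segmentation.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for illustration only: the real emoticon inventory
# is the one in Figure 2.2, and the real positive lexicon is NTUSD.
NEGATIVE_EMOTICONS = {"(angry)", "(tears)"}
POSITIVE_WORDS = {"幸運", "開心", "好"}


@dataclass
class Message:
    text: str
    emoticons: frozenset  # emoticon labels attached to the message


def has_negative_emoticon(msg):
    """True if the message carries at least one negative emoticon."""
    return bool(msg.emoticons & NEGATIVE_EMOTICONS)


def positive_words_in(text, lexicon=POSITIVE_WORDS):
    """Positive lexicon entries found in the text (plain substring lookup,
    an assumption; a real system would segment the text first)."""
    return [w for w in lexicon if w in text]


def extract_candidates(messages):
    """Keep messages with a negative emoticon and at least one positive word."""
    return [
        m for m in messages
        if has_negative_emoticon(m) and positive_words_in(m.text)
    ]
```

A message is kept only when both conditions hold, which mirrors the two counts reported above: first the messages with a negative emoticon, then the subset that also contains a positive word.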

A discourse relation determines how two discourse units cohere with each other. The sentiment transition between two clausal arguments can be identified from their discourse relation (Zhou et al., 2011; Wang et al., 2012; Huang et al., 2013). In the sentence “he is nice but not attractive,” the positive opinion at the beginning is turned into a negative one by the discourse connective “but.” In such a case, both the positive word “nice” and the negative phrase “not attractive” are used literally, and the sentence cannot be regarded as irony. For this reason, it is necessary to filter out messages containing such connectives. Because of the grammatical structure of Chinese, a message is removed only when the positive word occurs earlier than the disjunctive word. The disjunctive words used in this step include 但, 但是, 可是, 只是, 不過 (all the above are equivalent to the English word but), 然而 (however), 卻 (comparatively), 可惜 (unfortunately), 偏偏 (contrarily), 反而 (oppositely), 倒是 (on the contrary). A total of 254,836 messages remain after this process.
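The connective filter can be sketched as below. The list of disjunctive words is the one given above. The function names and the substring-based position test are assumptions made for illustration, and a real implementation would likely work on segmented text to avoid matching single characters inside unrelated words.

```python
# Disjunctive words listed in the text above.
DISJUNCTIVES = ["但是", "但", "可是", "只是", "不過", "然而",
                "卻", "可惜", "偏偏", "反而", "倒是"]


def is_literal_contrast(text, positive_words):
    """True if some positive word occurs earlier than some disjunctive word.

    Compares the earliest positive-word position with the latest
    disjunctive-word position, which is enough to decide whether any
    (positive, disjunctive) pair appears in that order.
    """
    pos_starts = [text.find(w) for w in positive_words if w in text]
    dis_starts = [text.rfind(d) for d in DISJUNCTIVES if d in text]
    if not pos_starts or not dis_starts:
        return False
    return min(pos_starts) < max(dis_starts)


def filter_candidates(texts, positive_words):
    """Drop messages whose positive word is followed by a disjunctive word."""
    return [t for t in texts if not is_literal_contrast(t, positive_words)]
```

Under this sketch, a message such as 他人很好但是不帥 is removed because 好 precedes 但是, while a message in which the disjunctive word comes first is kept.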

5.2.3 Pattern Mining

Although irony can be used without any customary linguistic patterns, some ironic expressions do exhibit specific forms of language use. Colston and O’Brien (2000) suggest that both irony and hyperbole create contrasts between expected and ensuing events. It is assumed that exaggerated expressions could be used with irony to strengthen the effects of the speech act. In the expression 我真是太幸運啦! (I am really and extremely lucky!), the adverbs really and extremely are used together to strengthen the ironic effect. Thus combinations of degree adverb phrases and a positive adjective are used as patterns in this study to find possible irony expressions automatically in the candidate dataset.

Not all degree adverbs in Chinese are used here because some of them are mostly used in formal texts and not frequently present in microblogs. The degree adverb phrases used here include the combinations of the adverbs “還” (hái), “也” (yĕ), “未免” (wèimĭan), “可” (kĕ) and “實在” (truly) and the degree adverbs “真” (really), “太” (extremely) and “非常” (very).
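One way to express the seed pattern is as a regular expression. The sketch below is only one possible reading: the text does not specify exactly how the two adverb groups combine, so an optional leading adverb followed by one or more degree adverbs is assumed here, and a plain word list stands in for the positive adjectives that NTUSD and a part-of-speech tagger would supply. The function names are hypothetical.

```python
import re

# Adverbs and degree adverbs listed in the text above.
PRE_ADVERBS = ["還", "也", "未免", "可", "實在"]
DEGREE_ADVERBS = ["真", "太", "非常"]


def _alternation(items):
    # Longest items first so that multi-character items are tried first.
    return "|".join(re.escape(x) for x in sorted(items, key=len, reverse=True))


def build_pattern(positive_adjectives):
    """Regex for [optional adverb] + [degree adverb(s)] + [positive adjective].

    The exact combination rule is an assumption made for this sketch.
    """
    return re.compile(
        f"(?:{_alternation(PRE_ADVERBS)})?"
        f"(?:{_alternation(DEGREE_ADVERBS)})+"
        f"(?:{_alternation(positive_adjectives)})"
    )


def find_pattern(text, pattern):
    """Return the first matching span of text, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None
```

With the adjectives 幸運 and 好, this sketch retrieves 太幸運 from 我真是太幸運啦 and 未免太好 from 你也未免太好了, while a message containing a positive adjective without any degree adverb is not retrieved.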

The following bootstrapping procedure is used to find more patterns.

(1) The patterns to be used are chosen. At the very beginning of the bootstrapping procedure, the [degree adverb + positive adjective] pattern mentioned above is used.

(2) Messages containing the patterns in step (1) are automatically retrieved from the candidates. NTUSD is used to determine sentiment polarity, and the CKIP parser is used to get parts of speech.

(3) Messages retrieved in step (2) are reviewed by the annotator to decide which of them are actually ironic.

(4) If the annotator finds new irony patterns in the reviewed messages, then the procedure starts again from step (1) and uses the patterns to repeat the process.
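Steps (1) to (4) can be sketched as a loop in which the human annotator is modeled as a callback. This is an illustrative sketch under stated assumptions: the names bootstrap, retrieve, and review are hypothetical, each later round is assumed to retrieve only with the newly found patterns, and already reviewed messages are assumed to be skipped.

```python
def bootstrap(candidates, seed_patterns, retrieve, review, max_rounds=10):
    """Iterate pattern selection, retrieval, and manual review.

    retrieve(pattern, message) -> bool   # does the message contain the pattern?
    review(messages) -> (ironic_messages, new_patterns)   # the annotator
    """
    patterns = list(seed_patterns)   # every pattern seen so far
    pending = list(seed_patterns)    # step (1): patterns used this round
    ironic, reviewed = [], set()
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        # Step (2): retrieve unreviewed messages that contain a pending pattern.
        batch = [m for m in candidates
                 if m not in reviewed and any(retrieve(p, m) for p in pending)]
        reviewed.update(batch)
        # Step (3): the annotator marks ironic messages and reports new patterns.
        confirmed, new_patterns = review(batch)
        ironic.extend(confirmed)
        # Step (4): continue only with patterns that have not been used yet.
        pending = []
        for p in new_patterns:
            if p not in patterns:
                patterns.append(p)
                pending.append(p)
    return patterns, ironic, rounds
```

The loop ends when a review round yields no new pattern, which corresponds to the stopping condition of the procedure; max_rounds is only a safety bound added for the sketch.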

This process was repeated four times. After the fourth iteration, no new patterns were found by the annotator. In total, 2,825 messages are found to contain at least one of the patterns, and 1,005 of them are confirmed to be ironic and make up the NTU Irony Corpus.9 These patterns and examples of the ironic messages are shown in Section 5.3.
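The counts reported in this section imply a few simple proportions, which can be checked with the short calculation below. The figures are taken from the text; the variable names and the share helper are introduced here for illustration only.

```python
# Counts reported in Sections 5.2.2 and 5.2.3.
negative_emoticon_messages = 3_178_372
with_positive_word = 304_754
after_connective_filter = 254_836
pattern_matches = 2_825
confirmed_ironic = 1_005


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Messages dropped by the disjunctive-word filter.
removed_by_filter = with_positive_word - after_connective_filter

print(f"candidates among negative-emoticon messages: "
      f"{share(with_positive_word, negative_emoticon_messages):.1f}%")
print(f"candidates kept after connective filter: "
      f"{share(after_connective_filter, with_positive_word):.1f}%")
print(f"pattern matches confirmed as ironic: "
      f"{share(confirmed_ironic, pattern_matches):.1f}%")
```

About 9.6% of the messages with a negative emoticon also contain a positive word, the connective filter removes 49,918 of those candidates and keeps about 83.6%, and about 35.6% of the pattern-matched messages are confirmed as ironic by the annotator.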