Results - Emotions in Microblogs - 意在言外？微文本中情緒、合法性與反諷之辨識與分析

Chapter 3 Emotions in Microblogs

3.4 Results

Classifiers were trained and tested with 10-fold cross-validation. In this section, the results of the models from the three types of perspectives are shown and discussed.
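The 10-fold protocol can be sketched as follows. This is a minimal illustration only: the function names, the shuffling seed, and the `train_fn`/`predict_fn` callables are assumptions introduced here, not the implementation used in the experiments.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed choice
    # Distribute the shuffled samples as evenly as possible over k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Return the per-fold accuracies of a classifier (hypothetical interface)."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies
```

The reported accuracy of a model would then be the mean of the ten per-fold values; keeping the per-fold values also makes a paired comparison between two models possible.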

3.4.1 Text Features (T)

Feature set   Model                   Accuracy
T             Reader model            80.67%
T             Writer model            88.75%
T             Reader + writer model   88.71%
S                                     82.78%
B-int                                 84.14%
B+int                                 86.25%
Bs                                    86.93%
R                                     81.53%

Table 3.1 Accuracies of different feature sets

Each feature set is first used on its own so that its performance can be compared with that of the others. The linguistic feature set (T) is used to model the replier’s emotion generation from three different perspectives. The reader model uses 3,000 bigrams from the poster’s messages, and the writer model uses 3,000 bigrams from the replier’s messages. The reader + writer model uses all the bigrams from both the reader and writer models, for a total of 6,000 features.
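The construction of the three textual feature spaces can be sketched as below. Several details are assumptions made for illustration: messages are taken to be pre-tokenized lists, bigrams are selected by raw frequency, and features are binary presence indicators. None of these choices is stated in this section, and all function names are hypothetical.

```python
from collections import Counter

def bigrams(tokens):
    """Return the adjacent token pairs of one message."""
    return list(zip(tokens, tokens[1:]))

def select_bigrams(messages, top_k=3000):
    """Pick the top_k most frequent bigrams (frequency is an assumed criterion)."""
    counts = Counter(bg for msg in messages for bg in bigrams(msg))
    return [bg for bg, _ in counts.most_common(top_k)]

def vectorize(tokens, vocabulary):
    """Binary presence vector of one message over a bigram vocabulary."""
    present = set(bigrams(tokens))
    return [1 if bg in present else 0 for bg in vocabulary]

def reader_writer_vector(poster_tokens, replier_tokens, reader_vocab, writer_vocab):
    """Concatenate reader features (poster text) and writer features (replier text)."""
    return vectorize(poster_tokens, reader_vocab) + vectorize(replier_tokens, writer_vocab)
```

With two vocabularies of 3,000 bigrams each, the concatenated vector has 6,000 dimensions, matching the feature count of the reader + writer model.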

Table 3.1 shows that the writer model and the reader + writer model outperform the reader model. The writer model scores slightly higher than the reader + writer model, but a t-test shows that the difference is not statistically significant. Both models exceed the baseline (84.23%), whereas the reader model falls below it.

The classifier with the interactive user behavior feature (B+int) outperformed the one with the non-interactive user behavior feature (B-int), achieving an accuracy (86.25%) above the baseline. With back-off smoothing applied, the interactive user behavior feature (Bs) achieved even higher accuracy (86.93%), the best among all non-linguistic feature sets.

The social relation (S) and relevance degree (R) features result in lower performance than the baseline. In summary, when each of the non-linguistic feature sets is used individually, Bs is the most effective: Bs > B+int > B-int > S > R. Back-off smoothing is useful for the behavior feature set. In addition, a replier’s behavior pattern in response to a specific poster is more useful than the pattern in response to all posters, suggesting that the affective interaction between two given users may follow a certain pattern.
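One way to read the relation between B-int, B+int, and Bs is sketched below. It is an interpretation under stated assumptions, not the exact scheme used in the experiments: the pair-specific emotion distribution stands in for B+int, the replier's overall distribution stands in for B-int, and the smoothed feature backs off from the former to the latter when the pair has too little history. The class name, the `min_pair_count` threshold, and the emotion labels are hypothetical.

```python
from collections import Counter, defaultdict

class BehaviorModel:
    """Emotion-response statistics of repliers (illustrative sketch)."""

    def __init__(self, min_pair_count=3):
        self.min_pair_count = min_pair_count        # assumed back-off threshold
        self.pair_counts = defaultdict(Counter)     # (replier, poster) -> emotion counts
        self.replier_counts = defaultdict(Counter)  # replier -> emotion counts

    def observe(self, replier, poster, emotion):
        """Record one reply with its emotion label."""
        self.pair_counts[(replier, poster)][emotion] += 1
        self.replier_counts[replier][emotion] += 1

    @staticmethod
    def _normalize(counts):
        total = sum(counts.values())
        return {e: c / total for e, c in counts.items()} if total else {}

    def interactive(self, replier, poster):
        """Distribution of the replier's emotions toward one specific poster."""
        return self._normalize(self.pair_counts.get((replier, poster), Counter()))

    def non_interactive(self, replier):
        """Distribution of the replier's emotions toward all posters."""
        return self._normalize(self.replier_counts.get(replier, Counter()))

    def smoothed(self, replier, poster):
        """Use the pair-specific distribution if supported, else back off."""
        pair = self.pair_counts.get((replier, poster), Counter())
        if sum(pair.values()) >= self.min_pair_count:
            return self._normalize(pair)
        return self.non_interactive(replier)
```

Under this reading, back-off helps because many replier-poster pairs have very few past interactions, so the pair-specific estimate alone would be unreliable or empty for them.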

3.4.2 Combination of Feature Sets

Several combinations of feature sets were also tested. Table 3.2 shows the results for these combinations from the reader, writer, and reader + writer perspectives. Writer models still outperform reader models and are slightly better than reader + writer models for every feature combination except T + Bs + S.

When combined with textual features, the behavioral feature set is still more powerful than social relation and relevance degree. However, all three non-textual feature sets are helpful: paired t-tests show that the differences between T and T + Bs, T and T + S, and T and T + R are significant (p < 0.05).
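A paired t-test of this kind can be sketched as follows, assuming that the pairing is over the ten cross-validation folds (the section does not state the pairing unit). The helper names are hypothetical; the constant 2.262 is the standard two-sided critical value of Student's t for α = 0.05 with 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Two-sided critical value of Student's t for alpha = 0.05 and
# 9 degrees of freedom (10 folds - 1).
T_CRIT_DF9 = 2.262

def paired_t_statistic(acc_a, acc_b):
    """t statistic of the paired per-fold differences between two models.

    Raises ZeroDivisionError if all differences are identical.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

def is_significant(acc_a, acc_b, t_crit=T_CRIT_DF9):
    """True if the two accuracy lists differ at the 0.05 level (10 folds assumed)."""
    return abs(paired_t_statistic(acc_a, acc_b)) > t_crit
```

Comparing against a fixed critical value only answers whether p < 0.05. Where SciPy is available, `scipy.stats.ttest_rel` returns the p-value itself.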

Feature set       Reader models   Writer models   Reader + writer models
T                 80.67%          88.75%          88.71%
T + S             83.42%          89.60%          89.26%
T + Bs            88.02%          91.42%          91.16%
T + R             82.73%          89.14%          88.93%
T + Bs + R        88.14%          91.48%          91.27%
T + Bs + S        88.42%          91.60%          91.61%
T + Bs + S + R    88.37%          91.53%          91.30%

Table 3.2 Accuracies of models with different feature combinations

Because Bs is the most useful feature set when combined with textual features, T + Bs, T + Bs + S, and T + Bs + R were compared to find out how much S and R can further improve performance. For the reader models, the difference between T + Bs and T + Bs + S is significant (p < 0.05), but the difference between T + Bs and T + Bs + R is not. This suggests that T + Bs + S is a more useful combination than T + Bs + R. For the writer and reader + writer models, T + Bs + S also outperforms T + Bs + R.

Although each of the three non-linguistic feature sets can improve performance, combining all of them (T + Bs + S + R) does not achieve the highest performance. The best performance is achieved by T + Bs + S regardless of which perspective is adopted. According to the paired t-test, the difference between T + Bs + S + R and T + Bs + S is not significant for the reader model or the writer model.

This suggests that although adding R to the combination does not decrease performance significantly, it does not help either. Possible reasons are the following. Both social relation and interactive behavior are related to the interaction between two specific users, so their effects may overlap. In addition, only 14.73% of the conversations have a relevance value higher than 0.5.

3.4.3 Different Perspectives

For all feature set combinations, the writer models and the reader + writer models achieve better performance than the reader models. These differences are significant according to the paired t-tests, which suggests that the message generated by the replier him- or herself contains more useful information than the message generated by the poster and then read by the replier.

When only the textual feature set is used, the performance of the reader model (80.67%) is much lower than that of the writer model (88.75%) and the reader + writer model (88.71%), a gap of about 8 percentage points. When T is combined with Bs and S, in contrast, the reader model reaches 88.42%, about 3 percentage points below the writer model (91.60%) and the reader + writer model (91.61%). This indicates that non-linguistic features play a more important role when modeling emotion generation on a social network.
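The narrowing of the gap between perspectives can be checked directly from the figures in Table 3.2. The snippet below only restates those accuracies; the dictionary and function names are introduced here for illustration.

```python
# Accuracies (%) copied from Table 3.2: (reader, writer, reader + writer).
TABLE_3_2 = {
    "T": (80.67, 88.75, 88.71),
    "T + S": (83.42, 89.60, 89.26),
    "T + Bs": (88.02, 91.42, 91.16),
    "T + R": (82.73, 89.14, 88.93),
    "T + Bs + R": (88.14, 91.48, 91.27),
    "T + Bs + S": (88.42, 91.60, 91.61),
    "T + Bs + S + R": (88.37, 91.53, 91.30),
}

def writer_reader_gap(combo):
    """Writer accuracy minus reader accuracy, in percentage points."""
    reader, writer, _ = TABLE_3_2[combo]
    return round(writer - reader, 2)

def best_combination(column):
    """Feature combination with the highest accuracy in one column (0, 1, or 2)."""
    return max(TABLE_3_2, key=lambda combo: TABLE_3_2[combo][column])
```

For T alone the writer model leads the reader model by 8.08 points; for T + Bs + S the lead is 3.18 points, and T + Bs + S is the best combination in all three columns.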

With the textual feature set alone, the writer model achieves 88.75%, slightly higher than the reader + writer model (88.71%). According to the paired t-test, the difference between them is not significant. For the T + Bs + S combination, the reader + writer model (91.61%) is slightly higher than the writer model (91.60%), though this difference is also not significant. It thus makes little difference in performance whether emotion generation is modeled from the writer’s perspective alone or from both the reader’s and the writer’s perspectives. In this series of experiments, 91.61% was the highest accuracy achieved.

3.4.4 Writer Model

As mentioned in Section 3.3, a writer model for posters also exists. In such a model, only the linguistic feature set can be used, and the classification accuracy is 89.19%.

The t-test shows that the difference between the posters’ and repliers’ writer models is not significant (p < 0.082). However, it is important to note that the dataset used for the posters’ writer model differs from the one used for the repliers’ writer model, so this comparison is for reference only.
