General Discussion - The Influence of Integrating Music and Singing Video on Emotional Judgment

Several important findings were obtained in this study. First, information relating to

the mode of the music is a strong factor affecting our perceived emotion when listening

to music. Experiment 1 first established this point by using music in major or minor

mode and showed robust relationships between positive emotions and major mode, and

between negative emotions and minor mode. Although this mode-emotion relationship

has already been shown for western participants (Gabrielsson & Juslin, 2002;

Gabrielsson & Lindstrom, 2001) and seems to be a commonly held view in western

society, no empirical evidence has so far been provided for Taiwanese participants. Our finding of

the mode-emotion relationship in Taiwan not only provides evidence consistent with the

literature in western societies, but also supports the view that even with the influence of

different cultural backgrounds, the mode of music conveys emotional valence, perhaps

across different cultures (Balkwill & Thompson, 1999; Hoshino, 1996).

Furthermore, when we asked the participants to directly evaluate the emotional

valence of each unimodal stimulus in Experiment 3A, although there were some stimuli

rated as neutral in the silent video, the mode information of the music could still help

the performer to express appropriate emotional intention which could also be detected

by the participant. Although the mode of the music was detected implicitly by listeners, it still

has the power to influence perceived emotion and form a medium supplying the performer

with the ability to communicate the emotional intention of the music to the audience

through visual performance, even without acoustic emotional cues.

The second important finding is that congruency in mode between the video image

and music mediates audio-visual integration in music. Based on the results of

Experiment 1, Experiment 2 investigated whether an incongruent pairing of music and

video (vs. a congruent pair) could modify the magnitude of perceived emotion (i.e., the mode

congruence effect) in perceiving musical performance. The mode congruence effect

demonstrated in Experiment 2 indicates that emotional information processed from one

modality can be affected by information processed from the other modality, and that

information from both modalities integrates to modify our emotional responses (de

Gelder & Bertelson, 2003; Shams & Seitz, 2008).

Viewed alongside other studies, the mode congruence effect suggests that

the musical component, mode, is not only important in musical perception but is also a

medium that conveys emotional connotation visually. A musical piece which lacks mode

information is thus likely to lose that medium with which to convey emotional intention.

Accordingly, the result of audio dominance in perceived tension in Vines et al. (2006)

might have resulted from the fact that the musical stimuli they used did not have clear

mode information.

The third important finding is that the combination of music and video gives us

stronger perceived emotion than listening to music alone. The results in Experiment 3B

and 3C indicate that emotionally congruent visual information could enhance the

perceived positive emotion of music, whereas emotionally incongruent visual

information could attenuate both positive and negative emotions. Judging from the

results of the emotional judgments of both music and video, the possibility of visual

dominance of perceived emotion can be excluded. Had the emotional enhancement

effect resulted from the visual aspect being dominant, how could the attenuation of

emotionally incongruent stimuli in visual judgment of music be explained? The

emotional modulation effect from audio-visual emotional integration has also been

found in other cross-modal studies on music (Baumgartner, Lutz, Schmidt, & Jancke,

2006; Shevy, 2007; Spreckelmeyer et al., 2006; Thompson et al., 2008; Vines et al.,

2005). However, most studies explore the effect by examining the influence of the

combination of music with a picture (or movie clip) but not videos of musical

performance (Baumgartner et al., 2006; Shevy, 2007; Spreckelmeyer et al., 2006).

Although Vines et al. (2005) found that, compared to listening to music alone, images of

exaggerated visual performance could enhance the perceived emotion, the effect might

have resulted from the artificial manipulation of visual cues which were unrelated to the

musical performance itself. In our study, we asked the performer to sing appropriately

with regard to the music heard, without any exaggerated or unrelated expression. Our

finding that an emotionally congruent video image could still enhance the perceived

emotional magnitude points to a strong connection between mode and emotion.

Thompson et al. (2008) investigated whether the emotional congruence between a

vocalist’s dynamic facial expression and vocal sound in singing a major third or minor

third interval could affect the emotional judgment of music. Their results indicated that

the congruent pairs had the extreme scores, with the incongruent pairs in between. The

results seemed to suggest that emotional judgment of music could be modified by visual

information. Despite the fact that they found an emotional congruence effect, the

question remains open whether audio-visual integration enhances emotional strength

(i.e., cross-modal enhancement) or attenuates it (i.e., audio-dominance), since they did

not compare participants’ judgments of the congruent conditions with a music-alone

condition, as we have done here. Also, the audio-visual stimuli in their experiments are

more similar to binding pictures and simple vocal sounds than to real musical

performance (their audio signals contained just two notes). This makes their results

difficult to generalize to real-life situations such as a musical concert. We have made an

effort in this direction by using stimuli more similar to a musical performance in this

study, and the emotional congruence effects found in our Experiments 2 and 3 indicate

the robust mode-emotion relationship even across different modalities.

Music has been said to be analogous to motion, because of the dynamic sound flow

in music that is associated with the motion generated in music production (Eitan &

Granot, 2006). Some researchers have proposed that musical experience is derived from

cross-modal processing between audition and visual motion. Through the

correspondence of the musical dynamic change and the intention of expressing motion

behind the auditory code, we can understand what connotations are conveyed by music

(Livingstone & Thompson, 2009; Molnar-Szakacs & Overy, 2006; Overy &

Molnar-Szakacs, 2009). Molnar-Szakacs and Overy (2006) reviewed many studies

about the relation between music and motion, and they came to the conclusion that the

acoustic components of music, such as amplitude variation, rhythm, and contour of

melody, were systematically synchronized with performers’ motions. Musical

experience might thus be generated from the corepresentation of motor programming

between audio signal and motion production. Facial expression and body movement

express visual emotional cues that are decoded in order to understand the emotional

intention behind the visual image (Thompson et al., 2008). Brain imaging studies show

that the functions of detecting emotion in music and understanding what others are

thinking are highly associated with the mirror neuron system, which is considered a part

of the motor system. Like music, facial expressions and body movements also

activate the mirror neuron system (Livingstone & Thompson, 2009; Livingstone et al.,

2009). Through sensory-motor transformation by the mirror neuron system, we

understand the intention behind the sensory inputs, including emotional connotation.

Accordingly, we suggest that the audio-visual integration we found in this study

might result from cross-modal information reorganized in the mirror neuron system.

Overy and Molnar-Szakacs (2009) propose the Shared Affective Motion Experience

(SAME) model to explain the mechanism of perceived emotion in music. According to

this model, musical signals enter the fronto-parietal mirror neuron system from the

temporal and occipital cortex to be decoded and generate motor programming. The

information flows to and is modified by anterior insula and is then transported to the

limbic system where emotional information is processed. Finally, musical information

processed by the neural network forms the musical experience and emotional perception.

Recent neuroimaging studies support this hypothesis by showing that musical

processing activates the mirror neuron system and limbic regions (Blood & Zatorre,

2001; Green, Baerentsen, Stodkilde-Jorgensen, Wallentin, Roepstorff, & Vuust, 2008;

Hasegawa et al., 2004; Molnar-Szakacs & Overy, 2006; Peretz & Zatorre, 2005).

Considering visual images as part of musical inputs as shown in this study, the

SAME model can explain the cross-modal effect in perceiving the emotion in music

performance. However, most brain imaging studies only focus on “music” perception

and neglect the close relationship between the music itself and the image of the

performance. Future studies can use fMRI and ERP techniques to investigate whether the

emotional enhancement effect of bimodal stimuli, compared to unimodal ones, reflects a

difference in brain activities in the neural network as predicted by the SAME model.

Brain regions such as the frontal-parietal lobe, anterior insula, and limbic system may

play an important role in audio-visual integration of emotional perception, an important

part of musical experience.
