User Studies - Experimental Results - Object- and Event-Based Video Content Adaptation Framework

3.5 Experimental Results

3.5.2 User Studies

To evaluate our approach, two user studies (TRIAL-I and TRIAL-II) are carried out separately. The objective of TRIAL-I is to determine whether the content changes that accompany our approach (e.g., enlarged UIOs) are visually acceptable to users, and whether users visually prefer our approach or the conventional approaches. In TRIAL-II, we investigate the effectiveness of our approach in practical usage: the viewing experience with our approach is compared with that of the conventional approaches on hand-held devices. For reference, the test conditions are listed in Table 3.4. The detailed methodology of each experiment is explained in the following.

TRIAL-I

For our purpose, two aspects of the results need to be investigated. One is whether UIOs are effectively emphasized, and the other is whether the recomposed videos look reasonable. Furthermore, the usefulness of our approach depends on whether it is actually preferred by viewers. Therefore, we apply a pair-comparison technique [CXF⁺03, LG05] to study viewers’ visual acceptability and preference.

That is, in each round an observer is shown two different adapted results of the same video and asked to decide subjectively which one is better with respect to a set of predefined questions. This reveals the relative advantages and disadvantages of our approach.

In this study, the testing clips use the resolution format of screen type 3, as described earlier. Twenty participants are randomly recruited on our campus. They are between 20 and 27 years old, all with Chinese as their native language. Before joining the study, they have no knowledge of our research work. Since the study is conducted via the web, every participant is assigned a 17-inch LCD at a viewing distance of 40 centimeters.

Figure 3.11: Example of the displayed web page for TRIAL-I. For realism, both testing clips of a pair are presented on a virtual cellular phone. (See Subsection 3.5.2 for details.)

Initially, the testing goal, process, and relevant details, such as descriptions of the predefined questions, are explained to the participants. For fair comparison, they are not told any details about our video adaptation algorithm, e.g., that UIOs are extracted and reintegrated with the background. In addition, they are asked to set aside personal interest in particular video clips, since the study focuses on the viewer’s visual experience rather than his/her emotional perception. After making sure that all participants understand the instructions clearly, we begin the experiment.

In each round, a pair of testing clips (one generated by our approach and the other by a conventional approach) is displayed side by side on a web page, cf. Figure 3.11. To be fair, the source videos are not presented to the participants in advance, and the names of the corresponding approaches are not shown. Both the order of presentation and the side (left or right) on which each clip appears are independently randomized for each participant. After browsing the pair of clips, the participants are asked to answer the following eight predefined questions (Q1-Q8):

• Q1: In terms of UIOs, in which clip are they easier to see and recognize?

• Q2: In terms of UIOs, which clip appears with better motion and shape continuity on the screen?

• Q3: In terms of UIOs, which clip would you visually prefer?

• Q4: According to the relative size of video objects, which clip looks more reasonable?

• Q5: According to the interactive behavior of video objects, which clip looks more natural?

• Q6: According to the scene composition, which clip do you find more visually pleasing?

• Q7: Generally, for content comprehension, which clip would be more informative?

• Q8: Generally, for browsing on your hand-held device, which clip would you prefer to receive?

For each question, participants choose one of three answers: “the left one is better”, “no difference”, or “the right one is better”. Note that the answering time is unrestricted and the pair of clips may be replayed. The same process continues until all possible combinations of clips have been tested for each participant.
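The presentation protocol above can be sketched in a few lines. This is only an illustrative sketch, not the software used in the study: the function names `build_schedule` and `normalize_answer`, the clip identifiers, and the per-participant seed are all hypothetical. It shuffles the order of the clip pairs, picks the side of each clip at random, and maps a left/right answer back to a judgment relative to our approach.

```python
import random

def build_schedule(pairs, participant_seed):
    """Return a randomized presentation schedule for one participant.

    pairs: list of (ours_clip, conventional_clip) identifiers (hypothetical).
    participant_seed: seed giving each participant an independent order.
    """
    rng = random.Random(participant_seed)
    order = list(pairs)
    rng.shuffle(order)  # randomize the order of presentation
    schedule = []
    for ours_clip, conventional_clip in order:
        ours_on_left = rng.random() < 0.5  # randomize the side independently
        if ours_on_left:
            left, right = ours_clip, conventional_clip
        else:
            left, right = conventional_clip, ours_clip
        schedule.append({"left": left, "right": right, "ours_on_left": ours_on_left})
    return schedule

def normalize_answer(answer, ours_on_left):
    """Map 'left' / 'none' / 'right' to a judgment about our approach."""
    if answer == "none":
        return "no difference"
    chose_ours = (answer == "left") == ours_on_left
    return "better" if chose_ours else "worse"

# Hypothetical clip identifiers, for illustration only.
example_pairs = [("B_ours", "B_direct"), ("H_ours", "H_direct")]
example_schedule = build_schedule(example_pairs, participant_seed=7)
```

Keeping the side assignment in the schedule is what allows the raw left/right answers to be converted later into the “better”, “no difference”, and “worse” counts.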

For our testing purpose, Q1 to Q3 concentrate on the UIO itself. Specifically, Q1, Q2, and Q3 examine whether our UIOs are visually emphasized, acceptable to viewers, and preferred by viewers, respectively. Here, acceptance refers to the user-perceived motion smoothness and shape consistency of UIOs, which is affected by the underlying algorithms, such as the UIO segmentation. Therefore, Q2 in some sense serves as a performance index of our system. Next, Q4 to Q6 relate to the user-perceived visual rationality of the whole recomposition. The static and dynamic visual perceptions are explored separately in Q4 and Q5. Further, Q7 explores the assistance in content comprehension and helps us understand the functional role of our approach. Finally, Q8 investigates whether viewers would like to receive recomposed videos in practical applications, which indicates the usefulness of our approach.

Table 3.5 shows the statistical results of our approach. According to the categories of clip subgroups and competing approaches, the results are further divided into three sub-tables (Tables 3.5(A)-(C)). Note that the fourth column, “worse”, gives the percentage of responses in which the competing approach is judged better by viewers. For reference, we compute the weighted value µ_RP as an index of the user’s relative preference (RP) for our approach, that is

$$\mu_{RP} = \frac{(+1)\cdot f_b + 0\cdot f_n + (-1)\cdot f_w}{100}, \qquad (3.23)$$

where f_b, f_n, and f_w are the “better”, “no difference”, and “worse” percentages for a specific question in a sub-table, respectively. Clearly, µ_RP is in the range of [−1, 1]. If the value is positive, our approach would be more preferred by users, otherwise the conventional approaches. The RP strength is measured by its absolute magnitude, i.e., |µ_RP|. Meanwhile, a corresponding RP variance σ_RP is estimated.
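As a worked illustration of Eq. (3.23), the following sketch computes µ_RP from the three percentages. The text does not state how σ_RP is estimated, so the variance here is an assumption: it treats each response as a score of +1, 0, or −1 and takes the variance of that score. The function names and the input percentages are hypothetical and are not taken from Table 3.5.

```python
def relative_preference(f_better, f_none, f_worse):
    """Eq. (3.23): weighted relative preference, in [-1, 1].

    Arguments are percentages and are expected to sum to about 100.
    """
    total = f_better + f_none + f_worse
    if abs(total - 100.0) > 0.1:  # allow for rounding in reported percentages
        raise ValueError("percentages should sum to 100, got %.2f" % total)
    return ((+1) * f_better + 0 * f_none + (-1) * f_worse) / 100.0

def rp_variance(f_better, f_none, f_worse):
    """ASSUMED estimator: variance of a per-response score in {+1, 0, -1}.

    The source does not define its variance estimate; this is one plausible choice.
    """
    mu = relative_preference(f_better, f_none, f_worse)
    second_moment = (f_better + f_worse) / 100.0  # E[x^2], since (+1)^2 = (-1)^2 = 1
    return second_moment - mu ** 2

# Hypothetical percentages, for illustration only.
mu_example = relative_preference(60.0, 25.0, 15.0)
var_example = rp_variance(60.0, 25.0, 15.0)
```

Under this assumed estimator, the variance is largest when opinions split between “better” and “worse” and shrinks as responses concentrate on a single answer.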

According to Q1’s statistics in Table 3.5, our approach clearly improves the visibility of UIOs for viewers. As shown in Q3’s statistics, most viewers also prefer this improvement, but there is a 10%-40% decrease in the “better” percentage and the RP variance is high. Based on our observations, there are two main reasons. First, the motion and shape continuity of the emphasized UIOs is not perfect in our approach, e.g., the shape inconsistency of the actress’s feather tail in clip D1, as described earlier. As shown in Q2’s statistics, some viewers are displeased by this kind of artifact (e.g., there is a 13.33% “worse” in Table 3.5(A)) and visually prefer the conventional approaches with smaller UIOs. Second, the effectiveness of UIO emphasis is content dependent.

For example, in Figure 3.10, it is useful to emphasize the boy of clip H to show viewers his important details, such as his facial expression. However, in Figure 3.9, emphasis is less meaningful for the car of clip B, since viewers can easily recognize its appearance even at a smaller size. In this case, our approach is not especially preferred by viewers. That is also the reason why we have a lower “better” percentage (53.33%) and a higher “no difference” percentage (30.00%) for Q3 in Table 3.5(A) than in Tables 3.5(B) and (C).

Further, according to the statistics of Q4 and Q5 in Table 3.5, the visual rationality of our approach is generally acceptable to viewers. One exception is Q4 in Table 3.5(C). One reason is the artificial nature of our approach: since we recompose videos with software-based techniques rather than by reshooting real video, the result cannot be as realistic as the original. Another reason is that the object distortion caused by the AR change is undesirable to viewers and makes the perceived relative size of video objects look unreasonable. Of the two, shape distortion appears to be the less tolerable. For example, compared with the Q4 statistics in Table 3.5(A), the “worse” percentage decreases to 2.50% in Table 3.5(B), but increases to 52.50% in Table 3.5(C). The more the video objects are distorted in AR, the more unreasonable their relative size looks.

An interesting phenomenon is that even though viewers are aware of these visual imperfections, on average they still prefer the scene composition of our approach, cf. Q6’s statistics in Table 3.5. Note that the terms “better” and “worse” in Table 3.5 do not mean absolute success or failure but the relative performance of the proposed and the conventional approaches. Therefore, we can say that although the visual rationality of the recomposed videos is not perfect, it does not fall far short of the original either. The recomposed visual quality seems good enough to be accepted by most viewers.

Finally, Q8’s statistics in Table 3.5 show that most of the participants are willing to receive our results in practical use. Furthermore, as shown in Q7’s statistics, our approach improves the viewer’s comprehension of video content; however, the corresponding RP variance (σ_RP) is high, as shown in Table 3.5(B). From the perspective of information delivery, this makes it plausible that viewers prefer our approach for its informative benefits, which are attributed to the enhanced visibility of important details. Overall, while the results of our preliminary experiments may be inconclusive, we find them encouraging. The proposed approach appears to improve the video experience for mobile users, and mobile users tend to prefer it for those improvements.

TRIAL-II

To further evaluate the effectiveness of our approach in practical use, we carefully design the experiment to assess viewers’ subjective satisfaction with the viewing experience on hand-held devices. Specifically, satisfaction refers to the degree to which a viewer is satisfied with his/her viewing experience of an adapted video on a hand-held device when compared with that of the original video on a standard display. For our purpose, the overall viewing experience is defined as the user-perceived detail visibility, visual rationality, and browsing ease of video content. Specifically, detail visibility indicates the user-perceived clarity of small objects or things in a scene; visual rationality relates to the user-perceived relative size and interactive behavior of video objects, cf. TRIAL-I; and browsing ease refers to whether the user can comfortably view a whole video. In this way, we are able to measure how effective our approach is in improving the viewing experience for mobile users.

In this study, the testing clips cover all resolution formats described in Table 3.2. To ensure validity, another twenty participants, none of whom took part in TRIAL-I, are randomly recruited. A 17-inch LCD and a Dopod 900 Smartphone (with a 3.6-inch LCD) are assigned to each participant as the testing platforms. They are set at viewing distances of 40 and 30 centimeters, and treated as the standard display and the hand-held device, respectively.

Initially, the testing purpose, process, and relevant details are explained to the participants, e.g., the definitions of viewing experience and subjective satisfaction. For fair comparison, they are not told any details about our video adaptation algorithm and are asked to set aside personal interest in particular video clips, as in TRIAL-I. Then, one of the source clips in its original format (cf. Table 3.3) is shown to the participants on the standard display. To avoid misunderstanding, the participants are asked to read the corresponding content description in Table 3.3. Playback is repeated until all of the participants have fully understood the video content. Next, all the testing clips derived from that source clip are presented, one at a time, on the hand-held device. For each testing clip, once playback finishes, the participants have one minute to give a subjective score in the range of 0 to 1 with at most two decimal places, e.g., 0.75.

The score is proportional to the viewer’s relative satisfaction with that clip, as defined above. For example, if the perceived viewing experience is about the same as that on the standard display, the viewer gives a large score; otherwise, a small one. Note that replay is not allowed and the answering time is restricted, since we believe the viewer’s first impression, without reconsideration, reveals his/her true satisfaction with the viewing experience. To avoid bias, testing clips of the same resolution format are shown consecutively and in random order. The same process is then conducted for all of the other source clips.

Figure 3.12: Comparison of the user study between our approach and the conventional approach (direct-resizing) for the clips of subgroup 1 at different resolution formats.

Figures 3.12 and 3.13 illustrate the statistical comparisons of the user studies between our approach and the conventional approaches for the clips of subgroup 1 and subgroup 2, respectively. In the figures, each point is obtained by averaging the participants’ satisfaction with the adapted results of one approach at a fixed resolution format. The symmetric error bar is two standard deviation units in length. In addition, hypothesis testing is applied to obtain the statistical significance (P-value) of our approach [Dev95].

Figure 3.13: Comparison of the user study between our approach and the conventional approaches (direct-resizing and linear-resizing) for the clips of subgroup 2 at different resolution formats.

Since the claim is that the viewer’s satisfaction with our approach is higher than that with the conventional approaches, the P-value gives the probability of observing a difference (i.e., improvement) at least as large as the measured one if it arose by chance alone [pva].

For each resolution format in Figure 3.12 and Figure 3.13, we compute a P-value between our approach and one conventional approach using the upper-tailed t-test with n − 2 degrees of freedom [Dev95], where n is the number of observed satisfaction scores. Specifically, for each resolution format, we obtain one P-value between our approach and direct-resizing in Figure 3.12. Similarly, we obtain two P-values (one between our approach and linear-resizing, the other between our approach and direct-resizing) in Figure 3.13. The results show that, except for the cases at resolution 240 × 180 with linear-resizing and at 168 × 126 with direct-resizing in Figure 3.13 (0.006 and 0.001, respectively), all P-values are far less than 0.001.
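The per-format statistics can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure from [Dev95] verbatim: it assumes a pooled (equal-variance) two-sample t-test, which yields n − 2 degrees of freedom when n counts the scores of both approaches together, and it reads the error bar as the mean plus or minus one standard deviation. The function names and the score lists are hypothetical, and the tail probability is obtained by numerical integration so that only the standard library is needed.

```python
import math
from statistics import mean, stdev, variance

def error_bar(scores):
    """Mean with a symmetric bar two standard deviations long (mean -/+ 1 SD)."""
    m, s = mean(scores), stdev(scores)
    return m - s, m, m + s

def pooled_t_statistic(ours, baseline):
    """Two-sample t statistic with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(ours), len(baseline)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(ours) + (n2 - 1) * variance(baseline)) / df
    t = (mean(ours) - mean(baseline)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=20000):
    """P(T >= t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Hypothetical satisfaction scores for one resolution format.
ours_scores = [0.80, 0.75, 0.85, 0.70, 0.90, 0.78]
direct_scores = [0.40, 0.55, 0.35, 0.50, 0.45, 0.38]
t_value, dof = pooled_t_statistic(ours_scores, direct_scores)
p_value = upper_tail_p(t_value, dof)
```

A small `p_value` supports the claim that satisfaction with our approach is higher. If the same participants score both approaches, a paired test would be an alternative design, which is not what the n − 2 degrees of freedom above suggest.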

Generally, according to the average satisfaction in Figure 3.12 and Figure 3.13, our approach outperforms the conventional approaches in all cases. The satisfaction with our approach remains high (above 0.7) throughout all resolution formats, whereas that with the conventional approaches drops rapidly to an unacceptable level as the screen size decreases. This indicates that important visual details (e.g., UIOs) have a dominant effect on the viewing experience, which confirms the statements given in [KMS05]. Figure 3.13 also shows that viewers prefer linear-resizing to direct-resizing when there is an AR mismatch between the source video and the target screen. As also indicated in TRIAL-I, it is interesting to find that although linear-resizing wastes a large amount of screen space, the UIO distortion seems more intolerable to viewers.

In summary, our approach helps maintain acceptable visual quality and produces results that viewers find comfortable to watch.
