Summary - Object- and Event-Based Video Content Adaptation Framework

This chapter reviews the relevant literature on semantic adaptation from several perspectives: semantic concept ontology, semantic concept analysis, and semantic content adaptation. A new research frontier is to integrate the analysis of content semantics with the development of adapting operations for efficient adaptation, which is recognized as a promising but challenging direction.

This motivates the work in this dissertation.

Semantic Object Based Video Adaptation

Browsing quality videos on small hand-held devices is a common scenario in pervasive media environments. In this chapter, we propose a novel framework for video adaptation based on content recomposition. Our objective is to provide effective small-size videos that emphasize the important aspects of a scene while faithfully retaining the background context. This is achieved by explicitly separating the manipulation of different video objects. A generic video attention model is developed to extract user-interest objects (UIOs), in which a high-level combination strategy is proposed for fusing three types of visual attention features: intensity, color, and motion. Based on the knowledge of media aesthetics, a set of aesthetic criteria is presented. Following these criteria, the objects are reintegrated with the directly resized background to optimally match specific screen sizes. Experimental results demonstrate the efficiency and effectiveness of our approach.

3.1 Introduction

On the Internet, multimedia content is widely used for sharing information among users. Transparent access to it from almost anywhere, at any time, and through all kinds of devices is desired and often required. To enable such universal multimedia access (UMA), one key technology is video adaptation [PB03, MSL99, BGP03, CV05]. In general, it is defined as the mechanism of transforming a video stream with one or more operations to meet specific application needs, such as device capabilities, network characteristics, and user preferences. At the user’s end, hand-held devices including cellular phones, smartphones, PDAs, and Pocket PCs are now in widespread use because of their mobility and portability. To compete with desktop computers on practical computing tasks, they are being developed with more powerful functionality and equipped with more storage capacity. The display, however, is an exception. Because hand-held devices must remain portable, the screen size stays fixed and is kept as small as possible. With the rapid growth of quality video sources (e.g., mobile TV, VCD/DVD on demand), this physical limitation seriously disrupts the user’s viewing experience [MSL99, LCC⁺01, CXF⁺03]. Thus, it is crucial to develop an efficient tool for facilitating video presentation on devices with limited display size.

The conventional schemes proposed for adapting videos to a small display can be divided into two categories: spatial transcoding and frame cropping [AWSZ05, XLS05]. The former subsamples each frame to preserve the video context intact, and the latter discards part of the surroundings to highlight specific user interests. Because each scheme is biased toward its own design purpose, an adaptation engine has to make a visual trade-off between subject readability and content completeness [KMS05, STG⁺04, LG05]. However, sacrificing either aspect is usually intolerable because both are important to the viewing experience. For example, when watching sports programs, player recognition and full-court variation are both important visual concerns [KMS05, LG05]. The difficulties with the conventional schemes arise because both passively attempt to adapt the plain frame rather than the actual content it contains. Consequently, the adapting process is forced to specify a desired area of the source frame (at most the whole frame) and fit it uniformly onto the target screen. Until we move away from that paradigm, the obtained performance will fall short of our expectations [STG⁺04, LG05].
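The trade-off between the two conventional schemes can be made concrete with a small geometric sketch. The code below is only an illustration under assumed numbers: the function names, the 720x480 source, the 176x144 target, and the interest point are hypothetical and do not come from the cited works. It computes the frame size produced by uniform subsampling and the window kept by cropping around a point of interest.

```python
def spatial_transcode_size(src_w, src_h, dst_w, dst_h):
    """Uniformly subsample the whole frame so it fits inside the target
    screen while keeping the source aspect ratio (hypothetical helper)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    return round(src_w * scale), round(src_h * scale)


def crop_window(src_w, src_h, dst_w, dst_h, cx, cy):
    """Return (x, y, w, h) of a target-sized window centred on the point
    of interest (cx, cy), clamped so it stays inside the source frame."""
    w, h = min(dst_w, src_w), min(dst_h, src_h)
    x = min(max(cx - w // 2, 0), src_w - w)
    y = min(max(cy - h // 2, 0), src_h - h)
    return x, y, w, h


# Assumed example: a 720x480 source shown on a 176x144 screen.
transcoded = spatial_transcode_size(720, 480, 176, 144)
window = crop_window(720, 480, 176, 144, cx=600, cy=50)

# Share of the source frame area that survives cropping.
kept_fraction = (window[2] * window[3]) / (720 * 480)

print(transcoded)  # (176, 117): the full scene, but every object shrinks about 4x
print(window)      # (512, 0, 176, 144): full detail, but most of the scene is lost
```

With these assumed numbers, transcoding keeps the whole scene at roughly a quarter of the original linear size, whereas cropping keeps full resolution for less than a tenth of the frame area, which is the readability versus completeness conflict described above.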

In this work, we propose a novel framework for video adaptation based on content recomposition. Our objective is to provide effective small-size videos that emphasize important aspects of the scene while retaining the background context for adaptive delivery. We focus on non-uniform processing of different video regions by giving more display resource (i.e., space) to the important ones and less to the other parts. Specifically, we use visual attention analysis to extract user-interest objects (UIOs) of a scene. Relative to the background, these objects are downsized only lightly and at a constant aspect ratio (AR). Then, according to the principles of media aesthetics [BT03, Zet98, DV01], they are reintegrated with the directly resized background to optimally match the various screen sizes of client devices (cf. Figure 3.1). Note that in this chapter the term video objects refers collectively to the UIOs and the background. The recomposition-based framework provides a number of advantages over the conventional schemes. First, it improves the visibility of the user’s interests while retaining faithful context information, e.g., the viewer can see not only who a person is but also where that person is in the video. Second, it allows multiple key objects to be emphasized at the same time, and their visual importance can easily be controlled by adjusting their relative sizes. Third, it is robust to the shape distortion of objects caused by changes in video aspect ratio, which gives a consistent content experience to different viewers.

Figure 3.1: Flowchart of the proposed framework for conducting video adaptation.
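The non-uniform treatment of the background and the UIOs can be sketched as a simple layout computation. This is a minimal sketch under stated assumptions, not the method of this chapter: the `Box` type, the `recompose_layout` function, and the `emphasis` parameter are hypothetical, and the placement rule (keeping the UIO centred on its rescaled original position) is only a placeholder for the aesthetic criteria that guide placement in the proposed framework.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float


def recompose_layout(src_size, dst_size, uio, emphasis=2.0):
    """Return the background scales and the placed UIO box on the target.

    Assumptions (not from the source): `emphasis` >= 1 sets how much
    lighter the UIO downsizing is than the background downsizing, and the
    UIO stays centred on its rescaled original position.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size

    # Background: direct resize to the screen; its aspect ratio may change.
    sx, sy = dst_w / src_w, dst_h / src_h

    # UIO: one scale for both axes (constant aspect ratio), never enlarged
    # and never bigger than the screen itself.
    s = min(1.0, emphasis * min(sx, sy), dst_w / uio.w, dst_h / uio.h)
    w, h = uio.w * s, uio.h * s

    # Placeholder placement: map the original centre, then clamp to screen.
    cx = (uio.x + uio.w / 2) * sx
    cy = (uio.y + uio.h / 2) * sy
    x = min(max(cx - w / 2, 0.0), dst_w - w)
    y = min(max(cy - h / 2, 0.0), dst_h - h)
    return (sx, sy), Box(x, y, w, h)


# Assumed example: 720x480 source, 176x144 screen, one 120x160 UIO.
uio = Box(x=300, y=200, w=120, h=160)
(bg_sx, bg_sy), placed = recompose_layout((720, 480), (176, 144), uio)
print(bg_sx, bg_sy)  # the two background scales differ, so the background AR changes
print(placed)        # the UIO keeps its 3:4 shape and is larger than a direct resize
```

Raising the hypothetical `emphasis` value for one object relative to another is one simple way to express the relative-size control over visual importance mentioned above.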

The main contributions of our work are twofold. First, a generic visual attention model is developed for finding user interests in video. The model is general because it exploits inherent video characteristics, such as object and camera motion. Specifically, a high-level feature combination strategy based on camera motion information is proposed. In addition, the motion feature model is integrated with confidence measures to improve its robustness and reliability.

Second, based on the knowledge of media aesthetics, a set of aesthetic criteria is presented for guiding the relevant decision-making during video object reintegration, such as choosing the background positions at which to place UIOs. Without requiring user intervention, video content is automatically recomposed into a visually reasonable result. We have conducted experiments on various kinds of video data to demonstrate the efficiency and effectiveness of our approach.

The rest of this chapter is organized as follows. After a discussion of related work, Section 3.3 presents a video attention model and the associated algorithms for user-interest analysis and determination. Content recomposition based on media aesthetics is described in Section 3.4. Section 3.5 shows experimental results, and Section 3.6 presents our concluding remarks and directions for future work.
