Graduate Institute of Networking and Multimedia
College of Electrical Engineering and Computer Science
National Taiwan University
Master Thesis
Time-lapse Panoramic Video Stabilization
Chao Shen (沈超)
Advisor: Yi-Ping Hung, Ph.D. (洪一平)
July 2015
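Acknowledgements
In completing this thesis, I would first like to thank my advisor, Professor Yi-Ping Hung (洪一平). Professor Hung could always identify the weaknesses of my research in the shortest time and point my thinking in the right direction, so that I could quickly find both the problems and their solutions. From this, he taught me how to look for research directions and how to think through and solve problems. I would also like to especially thank the members of my oral defense committee, Professor 歐陽明, Professor 莊永裕, and Professor 邱志義, for their valuable comments, which made my final research and thesis more complete. I also thank the classmates, seniors, and juniors in the lab for their company and support during the research; 大源 and 秉軒 for their help and support while I was looking for a topic; and 彥齊, 冠翔, 歆如, 冠捷, and others for their company and encouragement along the way. Finally, I am deeply grateful to my parents for their emotional and material support and encouragement. Thank you.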
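Chinese Abstract
In virtual reality user experiences, image-based rendered panoramic video is increasingly common, and watching long recorded videos as time-lapse is a common choice for users. However, time-lapse amplifies the shake that already exists in the video, and viewing shaky video immersively makes users dizzy and uncomfortable.
Our goal is to produce stable time-lapse panoramic video. We are the first to propose a method that can obtain stable time-lapse panoramic video. We first describe a basic video stabilization framework and how to improve it to mitigate the problems caused by parallax. We then propose methods for panoramic video stabilization, for time-lapse video stabilization, and for time-lapse panoramic video stabilization. Finally, we review the complete processing pipeline and verify its results. Using stabilized time-lapse panoramic video in virtual reality user experiences gives good results.
Keywords: time-lapse video, panoramic video, video stabilization, image warping, camera path, virtual reality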
Abstract
Image-based rendered panoramic videos are commonly used in virtual reality user experiences. When long videos have been recorded, speeding up viewing by creating a time-lapse video is a natural choice for users. However, time warping makes the videos shake even more, and immersive viewing of shaky videos makes users dizzy and uncomfortable.
Our goal is time-lapse panoramic video stabilization. This paper proposes the first method that performs time-lapse panoramic video stabilization well. We first present a traditional video stabilization framework and show how to use Laplacian mesh warping to correct the parallax error of the frames. We then optimize the proposed framework to fit panoramic videos. After that, we extend the basic framework with an optimized frame selection method for generating time-lapse videos. Finally, we review our approaches and evaluate them on several input videos.
Using stabilized panoramic videos in virtual reality user experiences gives good results.
Key words: Time-lapse video, Panoramic video, Video stabilization, Image warping, Camera path, Virtual reality
Contents
Oral Defense Committee Certification
Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Basic Video Stabilization Framework
3.1 Dynamic Scenes Handling
3.2 Parallax Handling
3.2.1 Laplacian Mesh Warping
4 Panoramic Processing
4.1 Panoramic Video Stabilization
4.2 Panoramic Processing Evaluation
4.2.1 Panoramic Analysis
4.2.2 Compare With VideoStitch
5 Time-lapse Processing
5.1 Optimal Frame Selection
5.1.1 Shakiness Cost
5.1.2 Velocity Cost
5.1.3 Appearance Cost
5.1.4 Optimal Frame Selection
5.2 Single Camera Path Optimization
5.3 Time-lapse Video Stabilization Evaluation
5.3.1 Single Camera Path Optimization Analysis
5.4 Time-lapse Panoramic Video Stabilization
6 Conclusions
Bibliography
List of Figures
3.1 An example of the original trajectory of image contents generated by original frame transforms
3.2 Smoothed trajectories by different parameters
4.1 Panoramic Video Stabilization Framework
4.2 Mapping between Original Frames and Panoramic Frame
5.1 Used directed hidden Markov chain model
5.2 Position
5.3 Velocity
5.4 Acceleration
5.5 Time-lapse Video Stabilization Framework
5.6 Gaussian Filter
5.7 Adaptive Gaussian Filter
5.8 Time-lapse Panoramic Video Stabilization Framework
5.9 Kolor Showing
5.10 An example of using stabilized time-lapse panoramic video in virtual reality user experience
List of Tables
5.1 Table of Costs Analysis
Chapter 1 Introduction
Virtual reality devices have improved considerably and give users an immersive experience. They allow the user to visit distant or hard-to-reach places such as the sea floor, the sky, and even outer space. They also allow the user to experience demanding activities such as skiing, climbing, and even flying. Virtual reality devices have become more and more popular in recent years and have a bright future.
Dynamic virtual reality scenes based on panoramic video are easy to create and give users a more realistic feeling than scenes based on 3D modelling. The first step of image-based rendering is to generate the panoramic videos. Fortunately, digital cameras are becoming cheaper, smaller, and easier to use, and it is convenient to combine multiple cameras into a system that captures panoramic video. Such a system can be attached to the outside of vehicles, computers, phones, and wearable devices. 360Heros¹ is a good example: it is hands-free and always-on, which lets users record videos in many special situations such as skiing, climbing, and even skydiving, and create panoramic videos by video stitching.
As more and more panoramic videos are recorded, watching them becomes a challenge for users. First, the recorded videos are long and tedious to watch. Second, shake caused by natural motion makes viewing uncomfortable. Speeding up is a natural choice for users who want to browse the videos quickly. A naive fast-forward method skips a fixed number of frames and shows the rest as a time-lapse video. However, the speed-up amplifies the shake and makes the time-lapse video even more disturbing to watch. Traditional video stabilization does not give good results on such videos. In addition, users reject immersive viewing of shaky videos because it causes intense dizziness and discomfort.
¹ Capture life in 360 video, http://www.360heros.com/
Traditional video stabilization methods [1–6] can be divided into three steps: motion estimation, motion compensation, and image composition. They stabilize general, undistorted egocentric videos well, but they are not suitable for stabilizing generated time-lapse videos. Existing time-lapse video stabilization approaches [7, 8] add a step before the three steps of the traditional framework that optimizes frame selection to generate the time-lapse video first. However, they only consider time warping of egocentric videos. The existing panoramic video stabilization method [9] follows the traditional stabilization steps and uses a different motion estimation method adapted to panoramic videos. It does not give good results for time-lapse videos.
No existing framework handles the time-lapse panoramic video stabilization problem.
Our goal is to create and stabilize time-lapse panoramic videos. This paper proposes the first method that performs time-lapse panoramic video stabilization well.
We propose an optimized frame selection method that considers camera motion averaging, uniform time scaling of the video, and stabilization of subdivision camera paths for panoramic videos. This approach measures the influence of local panoramic distortion on stabilization. In addition, it guarantees the global time compression ratio and maintains the total camera motion of the final time-lapse video. We propose a single camera path optimization method for time-lapse video based on adaptive Gaussian smoothing, which accounts for the changes in velocity and acceleration characteristics between the time-lapse video and the corresponding original video. We use 2D Laplacian mesh warping to generate subdivision camera paths for panoramic video stabilization. We summarize our frameworks for time-lapse, panoramic, and time-lapse panoramic video stabilization, and evaluate the effectiveness of our approaches on several input videos.
The rest of the paper is organized as follows. We survey related work in Chapter 2. In Chapter 3, we present a basic video stabilization framework and show how to use Laplacian mesh warping to correct the parallax of the frames. In Chapter 4, we present an optimized method for panoramic video stabilization. In Chapter 5, we propose a new time-lapse video stabilization method suitable for egocentric videos and extend it to panoramic videos. We summarize and evaluate our frameworks in each chapter.
Chapter 2
Related Work
Approaches to traditional video stabilization fall into two main kinds: hardware-based and software-based.
Hardware-based approaches use sensors to collect camera motion information.
Several works [10, 11] have shown how to use sensors to measure the camera motion directly during capture and how to use that motion for video stabilization and rolling shutter correction. Hardware-based approaches that use onboard gyroscopes can be quite successful when specialized hardware is available at capture time. However, heavy specialized hardware is inconvenient in many situations: it takes up too much of a quadcopter's payload and is a burden for people doing sports. In addition, it cannot be applied to existing videos.
Most software-based video stabilization methods [1–3, 5, 6, 12, 13] follow a framework with three steps: motion estimation, motion compensation, and image composition.
• Motion estimation computes the transformation from the previous frame to the current frame and accumulates these transformations to obtain the image motion trajectory. Software video stabilization techniques can be roughly divided into 2D and 3D methods according to the approach used in this step. 3D methods use structure from motion (SfM) to estimate camera paths in 3D space for stabilization [12]. They reproduce the real 3D camera paths. However, the motion model estimation is less robust to degenerate cases such as feature tracking failure, motion blur, and camera zooming. 2D methods commonly use a 2D transform matrix to represent the camera motion [5]. Their main advantage is that they are robust and fast, because only a linear transformation has to be computed. The method in [6] removes the influence of the foreground on the transform calculation by iteratively classifying the optical flow.
• Motion compensation smooths the trajectory generated above and uses the smoothed trajectory to generate a new set of previous-to-current transformations. Many methods have been proposed for the goal of this step, which is to remove high-frequency jitter from the estimated camera motion. For example, Litvin [14] constructs a physics-based state-space model of the interframe motion parameters and uses recursive Kalman filtering to estimate the stabilized camera position. Grundmann [13] presents an algorithm that automatically applies constrainable, L1-optimal camera paths to generate stabilized videos by removing undesired motions.
• In image composition, most approaches generate each stabilized frame by applying a new transformation to the corresponding original frame. This is too weak to handle parallax, because a full-frame warp cannot model the scene without depth information. Recent stabilization methods treat frame warping as a Laplacian mesh deformation so that the influence of each feature on the final result stays local. Liu [12] is the first technique that performs 3D video stabilization for dynamic scenes, using a weighted Laplacian mesh deformation to preserve content. Liu [5] presents a bundled camera paths model, which maintains multiple, spatially variant camera paths. Methods of this kind ease the parallax problem in the result.
Kamali [9] proposes the first and only omnidirectional video stabilization method. The main contribution of that paper is a structure from motion method that creates the camera path of an omnidirectional video. The frames are then stabilized by deforming a 3D spherical Laplacian triangle mesh according to local features. However, the single camera path generated by this image-stitching-based structure from motion is too coarse to represent local shaking.
This paper aims at robust, high-quality results for panoramic videos. We propose a subdivision camera paths motion model, similar to Liu [5], for panoramic video stabilization.
Each area of a panoramic frame has a different level of deformation, so we generate a camera path that fits each local area.
Time-lapse video stabilization techniques can also be divided into 2D and 3D methods.
Kopf [15] was the first to consider this problem and solved it with a sophisticated 3D method.
They first use structure from motion to reconstruct the 3D scene and obtain the camera path. They then propose a new camera path smoothing method to generate a new smooth camera path. Finally, they propose an algorithm that chooses multiple original frames and merges them into each new frame of the smooth time-lapse video. This approach works well when there is sufficient camera motion and parallax in a scene, but has difficulty when the camera motion is small or purely rotational, as the depth triangulation in the structure from motion step is not well constrained. Structure from motion also has difficulty producing results when the scene is dynamic. In addition, this approach has a very high computational cost of over one minute per frame.
The latest works use 2D methods for time-lapse video stabilization because they are cheap, fast, and robust. They add a frame selection step before the three steps of traditional video stabilization to create a time-lapse video first. Joshi [8] presents an optimization algorithm for the frame selection step that creates time-lapse videos in real time. Pole [7] also proposes a new optimized frame selection algorithm for the step before the traditional stabilization framework. Both optimize frame selection to create the time-lapse video and then choose an existing method to stabilize it. However, neither frame selection method strictly guarantees the compression ratio in time and space of the final time-lapse video. In addition, their methods are not suitable for time warping and stabilization of panoramic videos.
This paper proposes an optimized frame selection method to generate time-lapse videos and a camera path optimization method for the video stabilization steps. Our method handles videos with dynamic scenes well. We consider camera motion averaging and uniform time scaling of the video in time-lapse processing. We guarantee the global time compression ratio and maintain the total camera motion of the final time-lapse video.
Chapter 3
Basic Video Stabilization Framework
In this chapter, we first introduce the basic video stabilization framework for dynamic scenes. We then describe how to use Laplacian mesh warping to correct the parallax of the frames.
3.1 Dynamic Scenes Handling
In motion estimation, we obtain the transformation from the previous frame to the current frame by computing the optical flow of a sparse feature set with the iterative Lucas-Kanade method with pyramids.
The resulting matrix H represents only a 2D rigid transform, with no scaling and no shearing, which keeps the estimation fast and robust enough.
\[ H = \begin{bmatrix} R_{0,0} & R_{0,1} & T_0 \\ R_{1,0} & R_{1,1} & T_1 \end{bmatrix} \tag{3.1} \]
We define dx as the motion of the image contents along the x axis of the image plane, dy as the motion of the image contents along the y axis, and da as the rotation angle of the image contents in the image plane. We obtain the values of dx, dy, and da from the matrix H.
Figure 3.1: An example of the original trajectory of image contents generated by original frame transforms
\[ d_x = T_0 \tag{3.2} \]
\[ d_y = T_1 \tag{3.3} \]
\[ d_a = \arctan\frac{R_{1,0}}{R_{0,0}} \tag{3.4} \]
We first explain the smoothing of the image contents motion along the x and y axes, and discuss the smoothing of the image rotation at the end of this section. Figure ?? shows, as an example, the sequence of dx transformations from the previous frame to the current frame. The next step accumulates the dx and dy transformations to calculate the trajectory of the image contents motion along the x and y axes. We use (0, 0) as the starting position and apply the transformations one by one to obtain the trajectory along each axis and its intermediate position. The final trajectory is obtained by shifting this trajectory so that the intermediate position lies at (0, 0). Figure ?? shows the original x trajectory calculated from the original transformations shown in Figure ??.
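The steps from a frame transform to a centered trajectory can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function names are hypothetical, the rotation angle is recovered with atan2(R1,0, R0,0) on the assumption that H has the rotation layout of equation (3.7), and the "intermediate position" is assumed to be the position at the middle frame.

```python
import math

def motion_params(H):
    """Return (dx, dy, da) from a 2x3 rigid transform [[R00, R01, T0], [R10, R11, T1]]."""
    dx = H[0][2]
    dy = H[1][2]
    # Assumption: H = [[cos a, -sin a, T0], [sin a, cos a, T1]], so a = atan2(R10, R00).
    da = math.atan2(H[1][0], H[0][0])
    return dx, dy, da

def accumulate(deltas):
    """Apply per-frame deltas one by one, starting from position 0."""
    trajectory = []
    position = 0.0
    for d in deltas:
        position += d
        trajectory.append(position)
    return trajectory

def center_on_midpoint(trajectory):
    """Shift the trajectory so that the middle-frame position becomes 0 (assumed reading)."""
    mid = trajectory[len(trajectory) // 2]
    return [p - mid for p in trajectory]
```

The same two helpers would be applied independently to the dx, dy, and da sequences.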
The trajectory is a rather abstract quantity that does not represent the real camera motion directly: a 2D rigid transform in a plane cannot fully represent the camera motion. We do not calculate the 3D camera trajectory directly, because structure from motion approaches are too slow and too restrictive to be robust across many situations. Instead, we work in the opposite direction: the motion of the image contents captured by the camera also reflects the camera motion, so we use the 2D motion and rotation of the image contents in the image plane as a proxy for the camera motion in 3D space. From here on, the term trajectory always refers to these 2D motions and rotations of the image contents.
Figure 3.2: Smoothed trajectories by different parameters. (a) w = 60, σ = 1. (b) w = 60, σ = 5.
In motion compensation, our goal is to smooth the trajectory. Recent methods [9, 12] use a Gaussian filter to smooth the trajectory, and we also use Gaussian smoothing in this basic video stabilization framework. The Gaussian kernel size and sigma value need care. A large kernel size and sigma make the video smoother but introduce many more black areas into the result. In addition, a large kernel size and sigma cannot preserve the original image motion that comes from the user's intentional operations. It is difficult for users to set these parameters. Figure 3.2 shows the smoothed trajectory along the x axis with large and small kernel and sigma.
The original trajectory and the corresponding smoothed trajectory are used to generate the new set of previous-to-current transformations along the x and y axes, which are the ones finally used to transform the video frames. We use dsmooth−x and dsmooth−y to denote the smoothed image contents motion along the x and y axes. They are obtained by adding the difference between the smoothed trajectory and the original trajectory to the corresponding dx and dy.
\[ d_{\text{smooth-}x} = d_x + (\mathit{Traj}_{\text{smooth-}x} - \mathit{Traj}_x) \tag{3.5} \]
\[ d_{\text{smooth-}y} = d_y + (\mathit{Traj}_{\text{smooth-}y} - \mathit{Traj}_y) \tag{3.6} \]
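A minimal sketch of the Gaussian smoothing and of equations (3.5) and (3.6) is given below. The function names are hypothetical, w is assumed to be the total window width in frames, and samples beyond the ends of the sequence are assumed to be clamped to the boundary value; the thesis does not specify either choice.

```python
import math

def gaussian_kernel(w, sigma):
    """Normalized 1D Gaussian weights for offsets -w//2 .. w//2."""
    half = w // 2
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-half, half + 1)]
    total = sum(weights)
    return [x / total for x in weights]

def smooth(trajectory, w, sigma):
    """Gaussian-smooth a trajectory, clamping indices at both ends (assumption)."""
    kernel = gaussian_kernel(w, sigma)
    half = w // 2
    n = len(trajectory)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += weight * trajectory[j]
        out.append(acc)
    return out

def corrected_deltas(deltas, trajectory, smoothed):
    """Equations (3.5)/(3.6): d_smooth = d + (Traj_smooth - Traj), frame by frame."""
    return [d + (s - t) for d, s, t in zip(deltas, smoothed, trajectory)]
```

A larger sigma spreads the weights over more frames, which is what produces the smoother but more heavily cropped result described above.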
The smoothed homography matrix used for image warping is built from the smoothed image contents motion dsmooth−x and dsmooth−y along the x and y axes and the smoothed image rotation dsmooth−a. Applying the list of new transformations to the original video frames gives the stabilized video.
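\[ H_{\text{smooth}} = \begin{bmatrix} \cos d_{\text{smooth-}a} & -\sin d_{\text{smooth-}a} & d_{\text{smooth-}x} \\ \sin d_{\text{smooth-}a} & \cos d_{\text{smooth-}a} & d_{\text{smooth-}y} \end{bmatrix} \tag{3.7} \]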
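As an illustration of equation (3.7), the sketch below builds the 2×3 matrix from the three smoothed parameters and applies it to a single pixel coordinate. The function names are hypothetical and chosen only for this example.

```python
import math

def smooth_transform(dx, dy, da):
    """Build the 2x3 rigid transform of equation (3.7) from smoothed parameters."""
    return [[math.cos(da), -math.sin(da), dx],
            [math.sin(da),  math.cos(da), dy]]

def apply_transform(H, x, y):
    """Map the point (x, y) through a 2x3 transform: rotate, then translate."""
    return (H[0][0] * x + H[0][1] * y + H[0][2],
            H[1][0] * x + H[1][1] * y + H[1][2])
```

In practice a 2×3 matrix of this form is what an affine image warp routine, such as OpenCV's warpAffine, takes as input.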
We handle the image contents rotation da in the same way as the image contents motion along the x and y axes. We observe that the image contents rotation changes little in most video sequences, yet changes in rotation strongly influence how shaky the video feels to users. Therefore, a large Gaussian kernel and sigma, or even an L1 line fit or a fixed rotation value, gives good stabilization results.
Following all the steps above stabilizes egocentric video well. However, this basic framework does not work for panoramic or time-lapse videos. A panoramic video frame should not be given a single global image contents motion and rotation, because it represents all directions around the camera position. Each local area of a panoramic video has its own contents motion and rotation trajectory. We can therefore treat panoramic frame warping as a Laplacian mesh deformation during stabilization. Chapter 4 explains how to adapt this basic framework to panoramic video stabilization. Time-lapse videos amplify the shaking of frames because the contents motion and rotation in the video are sped up. Chapter 5 details how we update the basic framework to handle time-lapse videos well.
3.2 Parallax Handling
Following [5, 12], we first present the framework of warping-based subdivision camera path generation for parallax handling. We then describe the Laplacian mesh warping approach in detail.
Parallax causes the motion of features to differ within each frame. This affects the accuracy of the homography matrix calculation and the stabilization results. With camera subdivision, each sub-camera considers only the feature motion in its surrounding area, which makes the local homography calculation more accurate. Using locally accurate smoothed frame transformations reduces the influence of parallax on the stabilization result. Stabilization of each part of the frame and smoothing of each single camera path follow the basic video stabilization framework detailed above. The key improvement to the framework here is how to combine the multiple camera paths smoothly by Laplacian warping, as detailed in the following steps.
The first step is to create a mesh of subdivision camera paths. We split a frame into an n × n mesh whose size is the same as the video frame size. Each vertex of the mesh represents a local camera. Ideally, the local camera path at each mesh vertex would be obtained directly from the local features of that part of the video. In practice, however, the distinct features of a frame are not uniformly distributed. The camera paths generated directly in feature-rich regions are therefore used to generate the camera paths of feature-poor regions by Laplacian mesh warping.
After the subdivision camera path mesh is generated, each region of the input frame is warped to generate the output frame. Each region is a rectangle constrained by four camera positions, which are known both before and after warping. In image composition, we therefore find the homography of each region and combine all warped regions to generate the final warped frame. The following formula shows the homography H calculated from the four boundary points. For region i, the four boundary points are (x_i, y_i), (x_{i+1}, y_{i+1}), (x_{i+mcols}, y_{i+mcols}), and (x_{i+mcols+1}, y_{i+mcols+1}), where mcols is the number of mesh columns.
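\[ \begin{bmatrix} x'_i & x'_{i+1} & x'_{i+m_{cols}} & x'_{i+m_{cols}+1} \\ y'_i & y'_{i+1} & y'_{i+m_{cols}} & y'_{i+m_{cols}+1} \\ 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \cdot \begin{bmatrix} x_i & x_{i+1} & x_{i+m_{cols}} & x_{i+m_{cols}+1} \\ y_i & y_{i+1} & y_{i+m_{cols}} & y_{i+m_{cols}+1} \\ 1 & 1 & 1 & 1 \end{bmatrix} \tag{3.8} \]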
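One common way to recover H from the four corner correspondences of equation (3.8) is to fix h33 = 1 and solve the resulting 8×8 linear system. The sketch below does this with plain Gauss-Jordan elimination. It is an illustration under that normalization assumption, with hypothetical function names, and is not necessarily the solver used in the thesis.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def homography_from_quad(src, dst):
    """3x3 homography (h33 fixed to 1) mapping four src corners to four dst corners."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and similarly for v.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp_point(H, x, y):
    """Apply H to (x, y) and divide by the homogeneous coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Each mesh cell gets its own H, and every pixel inside the cell is mapped with warp_point (or its inverse when sampling the source image).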
3.2.1 Laplacian Mesh Warping
The subdivision camera paths can be represented as a 2D Laplacian mesh. Let M = (V, E) be a triangle mesh, where V is the set of vertices of the mesh and E is the set of edges of the mesh. Each vertex Vi ∈ V is represented by absolute Cartesian coordinates, denoted by Vi = (xi, yi, zi), with zi = 0 in the 2D Laplacian mesh used here.
Laplacian mesh deformation can be formulated in terms of vertices, edges, or faces. Following [5, 12, 16], which consider the deformation of triangles, we treat mesh warping as a multi-step optimization problem. The first step generates an intermediate result by minimizing an error metric that prevents shearing and non-uniform stretching but permits rotation and uniform scaling. The second step takes this result and adjusts the scale of each triangle.
• Scale Free Construction:
For a triangle (V0, V1, V2), we calculate the relative coordinates (x01, y01) that represent V2 in the local coordinate system defined by the vector from V0 to V1.
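\[ V_2 = V_0 + x_{01}\,\overrightarrow{V_0 V_1} + y_{01}\,R_{90}\,\overrightarrow{V_0 V_1}, \qquad R_{90} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \tag{3.9} \]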
We denote the deformed triangle by (V0′, V1′, V2′). The desired position V2desired can likewise be defined in the local coordinate system created by V0′ and V1′.
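\[ V_2^{\text{desired}} = V'_0 + x_{01}\,\overrightarrow{V'_0 V'_1} + y_{01}\,R_{90}\,\overrightarrow{V'_0 V'_1}, \qquad R_{90} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \tag{3.10} \]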
The error associated with the vertex V2 of this triangle is the difference between the desired vertex position given by the relative local triangle warping and the actual vertex position given by the globally optimized mesh deformation.
\[ E_{V_2} = \left\| V_2^{\text{desired}} - V'_2 \right\|^2 \tag{3.11} \]
The same construction applies to every vertex of the mesh through the triangles that contain it, and the total error can be expressed in matrix form.
\[ E_1(V') = V'^{T} M V', \qquad V' = (V'_{0x}, V'_{0y}, \ldots, V'_{nx}, V'_{ny})^{T} \tag{3.12} \]
According to the theory of Laplacian mesh deformation, minimizing E1 has multiple solutions because the matrix M is rank-deficient. The camera paths generated directly in feature-rich regions of the frames serve as the constraint points of the Laplacian mesh, which fix the corresponding rows and columns of M. The camera paths of feature-poor regions correspond to the free vertices of the mesh. With the constraint vertices in place, the minimization problem can be solved by setting the partial derivatives of E1(V′) with respect to the free vertices to zero.
This step does not consider changes in the scale of the mesh, which keeps the calculation fast.
We therefore need to adjust the scale of the mesh to obtain the final results.
• Scale Adjustment:
We handle position and rotation separately in this step. A minimization is first used to fit the triangle positions. After that, an edge-based error minimization is used to optimize the triangle rotations and obtain the final results.
The intermediate triangle mesh is used only as a reference in this step because of its large unwanted scaling. We fit the original triangles to the intermediate triangles with an error minimization approach. We define the fitted triangle (V1fit, V2fit, V3fit) for the original triangle (V1, V2, V3) and calculate its position by minimizing the following functional:
\[ E_2\!\left(V^{\text{fit}}_{\text{tri}}\right) = \sum_{i=1,2,3} \left\| V_i^{\text{fit}} - V'_i \right\|^2 \tag{3.13} \]
This extends to the whole mesh. We minimize E2 by setting the partial derivatives with respect to its free variables to zero. After that, the system calculates the final vertex positions: given the constraint vertices, it minimizes the difference between the resulting mesh and the previously fitted triangles. Given the original triangle vertices (V1, V2, V3) and the fitted vertices (V1fit, V2fit, V3fit) calculated in the steps above, we define the error to be minimized as:
\[ E_3(V''_i) = \sum_{(i,j)=(0,1),(1,2),(2,0)} \left\| \overrightarrow{V''_i V''_j} - \overrightarrow{V_i^{\text{fit}} V_j^{\text{fit}}} \right\|^2 \tag{3.14} \]
We evaluate the error of each edge rather than each vertex because this step considers the rotation of each triangle and ignores its position. We also minimize E3 by setting the partial derivatives with respect to its free vertices to zero.
In summary, the mesh vertices V are divided into free vertices and constraint vertices. For a free vertex, the camera motion at that position cannot be calculated directly from feature points; for a constraint vertex, it can. The mesh deformation approach uses the known vertex motions to calculate the unknown ones.
This 2D Laplacian mesh deformation is fast enough to generate the motion of all subdivision cameras. We fix the subdivision mesh size before running the algorithm. All the parameter matrices used in the error minimization equations above can therefore be computed in advance, because they depend only on the mesh definition.
When the constraint vertex positions change, we can obtain the deformed free vertices quickly, because only a simple matrix multiplication is needed.
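The scale-free construction of equations (3.9) to (3.11) can be illustrated for a single triangle. The sketch below computes the relative coordinates of V2, rebuilds the desired position after deformation, and evaluates the error. The function names are hypothetical, and the global solve over all triangles is omitted.

```python
def rot90(v):
    """Apply R90 = [[0, 1], [-1, 0]] from equation (3.9) to a 2D vector."""
    return (v[1], -v[0])

def relative_coords(v0, v1, v2):
    """Coordinates (x01, y01) of V2 in the frame spanned by V0V1 and R90 * V0V1."""
    e = (v1[0] - v0[0], v1[1] - v0[1])
    d = (v2[0] - v0[0], v2[1] - v0[1])
    r = rot90(e)
    n2 = e[0] * e[0] + e[1] * e[1]
    # e and r are orthogonal with equal length, so projection gives the coordinates.
    return ((d[0] * e[0] + d[1] * e[1]) / n2,
            (d[0] * r[0] + d[1] * r[1]) / n2)

def desired_v2(v0p, v1p, x01, y01):
    """Equation (3.10): where V2 should be, given the deformed V0' and V1'."""
    e = (v1p[0] - v0p[0], v1p[1] - v0p[1])
    r = rot90(e)
    return (v0p[0] + x01 * e[0] + y01 * r[0],
            v0p[1] + x01 * e[1] + y01 * r[1])

def triangle_error(v0p, v1p, v2p, x01, y01):
    """Equation (3.11): squared distance between desired and actual V2'."""
    want = desired_v2(v0p, v1p, x01, y01)
    return (want[0] - v2p[0]) ** 2 + (want[1] - v2p[1]) ** 2
```

The error is zero whenever the deformed triangle is a rotated, translated, and uniformly scaled copy of the original, which is why this first step permits rotation and uniform scaling but penalizes shear.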
Chapter 4
Panoramic Processing
This chapter proposes a panoramic video stabilization method based on Laplacian mesh warping. We first introduce the characteristics of panoramic video and the difficulties of panoramic video stabilization, and then describe our panoramic video stabilization method in detail.
Finally, we summarize our framework and evaluate it.
4.1 Panoramic Video Stabilization
The intuitive approach to panoramic video stabilization creates the 3D camera path by structure from motion. Traditional structure from motion has many restrictions. A panoramic frame can only be treated as six traditional photos, which does not help the stability of this step. Kamali [9] proposed a new structure from motion method, used in frame stitching, to create the 3D camera path for video stabilization. However, the 3D camera path calculated from frame features does not reflect the true camera motion, because the parallax problem is much more serious when generating camera paths for panoramic videos. Outdoor panoramic videos often include the ground plane and the sky at the same time. The difference between the camera-to-ground distance and the camera-to-sky distance is too large for a single camera path to fit the whole panoramic frame.
Traditional 2D video stabilization methods are stable and fast. However, they are not suitable for sequences of panoramic frames, because features in different parts of a panoramic frame necessarily move differently given how the frame contents relate to 3D space. [5, 12] use Laplacian mesh warping to reduce the effect of parallax on video stabilization. This gives good results on the premise that the feature motion is described accurately, which is easy when the video frames are not distorted. A panoramic frame, however, is a rectangle that can be regarded as a texture for a sphere in space, and this texture mapping distorts the visual contents of the frame. The feature motion in panoramic frames therefore combines the effects of shaking and of the mapping. In addition, the distortion of the panorama also affects feature extraction in this step.
Another approach to panoramic video stabilization is to stabilize each original video first and then use the stabilized original videos to generate the panoramic video. This requires calibrating every set of synchronized frames across the videos, and the flicker caused by calibration errors between frames is unacceptable in the panoramic video. That is why the traditional calibration method for generating panoramic video chooses one or several synchronized frames from each video to create a template, which is then used to calibrate all frames of the videos. It is fast and stable.
Our goal is to stabilize panoramic video robustly. We propose subdivision camera paths for panoramic video stabilization. The core idea is to separate out the influence of the image distortion caused by texture mapping.
Feature motion measured in the panoramic frame is affected by the distortion, whereas the feature motion caused by video shaking can be measured in the undistorted original videos. So, to obtain the correct shaking motion, we compute the feature motion between two frames by optical flow for each original video.
We need to know the mapping of each feature between the original videos and the panoramic video. Fortunately, we apply the traditional panoramic video calibration, which uses a fixed mapping template. This means that the features extracted in the original frames can easily be transferred to the panoramic frames. Note that, because of the panoramic frame deformation, this mapping works well around the center of each original video. We now have a panoramic frame with correct feature motion.
We have now separated out the influence of panoramic deformation and obtained the correct feature motion caused by shaking of the original videos. Subdivision camera paths are a good solution for panoramic video stabilization. We split the whole panoramic frame into multiple areas and generate a smooth camera path for each area. This fits the observation that different parts of a panoramic frame must have different pixel motions. For each subdivision camera path, we follow the basic framework for stabilization. We use the original feature motion results to calculate the homography matrix. The 2D motion trajectory that represents the subdivision camera motion is generated from the sequence of transformations in a feature-rich area of the panoramic frame. This result is not yet the final pixel motion at the camera position in the panoramic video: the intensity of the panoramic deformation at the subdivision camera position affects the final motion.
Mapping the deformed panoramic frame onto a sphere is called UV mapping. For any point P(dx, dy, dz) on the unit sphere, the normalized UV coordinates in the range (0, 1) are calculated as follows:
\[ u = 0.5 + \frac{\operatorname{arctan2}(d_z, d_x)}{2\pi}, \qquad v = 0.5 - \frac{\arcsin d_y}{\pi} \]
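The UV mapping above translates directly into code. The sketch below assumes that (dx, dy, dz) is a unit vector; the clamp on dy is an added safeguard against rounding error and is not part of the original formula.

```python
import math

def sphere_to_uv(dx, dy, dz):
    """Map a unit direction (dx, dy, dz) to normalized panorama coordinates (u, v)."""
    u = 0.5 + math.atan2(dz, dx) / (2.0 * math.pi)
    # Clamp guards against |dy| drifting slightly above 1 from rounding.
    v = 0.5 - math.asin(max(-1.0, min(1.0, dy))) / math.pi
    return u, v
```

Multiplying u by the panorama width and v by the panorama height gives the pixel position of the direction in the rectangular frame.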
We find that the deformation intensity of pixels in panoramic frames is the same for all pixels at the same latitude, and that it varies with latitude following a sine function. The relationship between this intensity and the latitude and longitude of each pixel follows the normalization above. Using this relationship between U and V to adjust the calculated camera motion gives the correct adjustment of the camera point positions in the panoramic frame. Let (x, y) be the original shaking motion vector of the camera position point P and (x′, y′) the adjusted result. The adjustment is as follows:
$$x' = x \cdot \alpha \cdot \sin\theta \tag{4.1}$$
$$y' = y \cdot \beta \cdot \sin\theta \tag{4.2}$$
Here α and β are the ratios between the original frame size and the panoramic frame size in the row and column directions, and the sin θ factor models the influence of latitude.
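The adjustment of Eqs. (4.1) and (4.2) can be sketched in a few lines of Python. The function name `adjust_motion` is hypothetical, and θ is assumed to be given in radians; how θ is derived from the pixel position is not specified here and is left to the caller.

```python
import math

def adjust_motion(x, y, alpha, beta, theta):
    """Scale a shake vector (x, y) from an original frame into the panorama.

    alpha, beta: size ratios between original and panoramic frame.
    theta: angle in radians whose sine models the latitude influence.
    """
    s = math.sin(theta)
    return x * alpha * s, y * beta * s

# Usage with illustrative values (not taken from the experiments).
adjusted = adjust_motion(2.0, 4.0, 0.5, 0.25, math.pi / 2)
```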
We now have the correct motions of the constraint points for the Laplacian mesh, and we use the Laplacian mesh warping approach to generate all subdivided camera trajectories. The challenge is to maintain the omnidirectional view during mesh warping. Parallax-reducing Laplacian mesh warping cannot maintain the rectangular shape of the frame, which reduces the field of view (FOV). This change in FOV is acceptable in traditional video stabilization, but panoramic video stabilization must maintain the full panoramic view at all times. We therefore propose an approach to generating subdivided camera paths that maintains the panoramic view under Laplacian mesh warping.
We reshape the mesh of subdivided camera paths according to the characteristics of the panoramic frame. The points on the left and right edges of the mesh correspond to each other: we treat each pair of left and right edge points at the same latitude as a single point and combine the influence of their neighbours in the calculation. The points on the top edge of the mesh all correspond to the same point on the sphere, so we treat all top edge points as a single point and combine the influence of all their neighbours in the mesh warping calculation. The vertices on the bottom edge of the mesh are handled in the same way as the top edge vertices.
We then obtain the new positions of all vertices. For image compensation, we define the filled area to be the same as the original panoramic frame area. Any content warped outside this area can be wrapped back into it through the edge correspondences defined above, which guarantees that the original panoramic pixels are sufficient to fill the area. The warped panoramic frame therefore has the same size as the original, which preserves the panoramic view.
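The edge handling described in this section can be sketched as a vertex identification step. The Python code below is only an illustration under stated assumptions: the mesh is taken to have 13 × 13 vertices, row 0 is the top edge, and the helper names `canonical_vertex` and `merged_neighbours` are hypothetical.

```python
def canonical_vertex(row, col, rows=13, cols=13):
    """Return a representative (row, col) for a mesh vertex after edge merging."""
    if row == 0:
        return (0, 0)              # all top-edge vertices are one point
    if row == rows - 1:
        return (rows - 1, 0)       # all bottom-edge vertices are one point
    if col == cols - 1:
        return (row, 0)            # right edge coincides with left edge
    return (row, col)

def merged_neighbours(row, col, rows=13, cols=13):
    """Collect the 4-neighbours of every vertex merged with (row, col)."""
    target = canonical_vertex(row, col, rows, cols)
    result = set()
    for r in range(rows):
        for c in range(cols):
            if canonical_vertex(r, c, rows, cols) != target:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    n = canonical_vertex(nr, nc, rows, cols)
                    if n != target:
                        result.add(n)
    return result

# Usage: a left-edge vertex also inherits the neighbours of its right-edge twin.
side = merged_neighbours(5, 0)
```

In this sketch a left-edge vertex gains the inner neighbour of its right-edge twin, and the merged top vertex is connected to every vertex of the second row, which is how the neighbour influences are combined.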
4.2 Panoramic Processing Evaluation
The previous section proposed a general panoramic video stabilization approach. Figure 4.1 shows the whole pipeline of this panoramic video stabilization.
We use a 360Heros rig to hold six GoPro Hero 4 cameras. Fixing the camera positions during capture is useful for video calibration: frames captured at the same time by the different cameras can all be calibrated with the same template because the camera positions are fixed. Once we obtain the videos, synchronization is very important in this pipeline. Incorrect synchronization affects the whole frame sequence and causes local motion in the panorama. When watching panoramic videos, viewers are sensitive to the junctions between the original videos, because incorrectly connected parts of the panoramic frame produce a strong visual illusion. We cannot tolerate a synchronization error of even one frame, and synchronization by audio is not suitable for our application. We use the motion estimation functions in VideoStitch to obtain highly accurate video synchronization. VideoStitch is a good tool that generates panoramic videos quickly and robustly. We use this software to generate a panoramic video without brightness adjustment and stabilization.
[Figure 4.1 block labels: 360Heros Videos, Synchronization, Video Calibration, Image Stitching, Panoramic Video, Feature Extracting, Motion Estimation, Motion Compensation, Image Composition, Post Processing, Panoramic Video]
Figure 4.1: Panoramic Video Stabilization Framework
Panoramic video generation is the first step in this framework. We apply our panoramic video stabilization method to the generated panoramic video. Our method also follows the three steps of the basic video stabilization framework. However, before these three steps we extract sparse features and calculate their motion by optical flow for each original video, and we also find the feature mapping between each original video and the panoramic video. We then adopt the idea of Liu [5]. A single camera motion cannot fit the content motion in a panoramic video, so camera subdivision is particularly suitable here: content motion in a panoramic video is only locally similar, and each sub-camera motion can represent the local content motion. We create the mesh of subdivided camera paths and use the feature motions calculated in the original videos to optimize all subdivided camera paths. Finally, we warp the panoramic frame using the adjusted key points that represent the camera positions.
4.2.1 Panoramic Analysis
The size of the subdivided camera path mesh influences the stabilization result. The larger the area around each vertex, the more the content motion within it varies. If the vertices are too sparse, the mesh may be too coarse to represent the motion of the surrounding areas. If the vertices are too dense, the result may also be poor: the area around a vertex may be too small to contain enough feature points to calculate its camera path directly, and such weakly constrained camera paths influence the calculation of the other camera paths. We define a 13 × 13 mesh for the panoramic frame. It gives a suitable cell size and a suitable sub-mesh to map to each original video frame. Figure 4.2 shows the mapping between the original frames and the panoramic frame. This correspondence is fixed once the template for panoramic video generation is chosen. Figure 4.2 also shows the correspondence of the subdivided camera paths between the original frames and the panoramic frame.
4.2.2 Comparison with VideoStitch
Kamali [9] is the only paper that proposes a stabilization method suitable for panoramic video, but its results are not shown on a website. VideoStitch provides a panoramic video stabilization function, so we compare the results of our method with those of the VideoStitch software.
VideoStitch can remove the high-frequency shaking of the content in a panoramic video, and so can our method. The difference between the two shows at the borders where the original frames are connected in the panoramic video. VideoStitch does not document the method it uses for panoramic video stabilization. We guess that it either stabilizes each original video or adjusts the calibration template, because we observed that panoramic videos stabilized by VideoStitch have different degrees of stability in the areas mapped from different original frames. This causes the connection borders to shake in the result.
Figure 4.2: Mapping between Original Frames and Panoramic Frame
Chapter 5
Time-lapse Processing
Our goal is to create time-lapse videos at any speed-up ratio with no constraints on the scene content or camera motion. In addition, the method should be suitable for multiple-camera systems. We first propose an optimal frame selection method that considers both the averaging of camera motion and the uniformity of time scaling. It guarantees the global time compression ratio and maintains the total camera motion in the final time-lapse video. Then, we propose a single camera path optimization method for time-lapse videos that takes into account the velocity and acceleration characteristics of the corresponding original video. This time-lapse video stabilization is suitable for egocentric videos. We summarize our framework and evaluate it. Finally, we extend the proposed time-lapse video stabilization approach to multiple-camera systems in order to generate panoramic videos.
5.1 Optimal Frame Selection
Given an input video represented as a frame sequence $F = \{1, 2, \dots, N\}$, we define a time-lapse video $F^c$ as a subsequence whose frames are all selected from $F$. The optimized content motion path is the content motion between consecutive selected frames.
We formulate time-lapse frame selection as a minimum-cost path problem. We first describe how the costs are measured.
5.1.1 Shakiness Cost
An optimal frame-to-frame transition is one where both frames align well and have significant overlap. The first criterion for the path cost is the total transformation between neighbouring selected frames. Given two frames $F_i$, $F_j$ in the original video, let $T(i, j)$ be the homography matrix that describes the transformation between them. We obtain sparse feature points by an optical flow method and apply standard RANSAC to these points to calculate $T$. The shakiness cost function is as follows:
$$S_{i,j} = \begin{cases} C_o(i, j) & C_r(i, j) < \tau_c \\ \gamma & C_r(i, j) \ge \tau_c \end{cases} \tag{5.1}$$
The $C_r$ term is the average 2D reprojection error over all measured features. We use it to check whether the shaking motion between the frames is too large; $\tau_c$ is the threshold on this average projection error. If $C_r$ exceeds the threshold, we assign a large cost $\gamma$, meaning that we prefer not to choose this frame in the path optimization unless no other frame is available. A large $C_r$ indicates that the frames overlap little, so the large cost steers the optimization away from this frame. If $C_r$ is in the acceptable range, we use $C_o$, the projection error of the geometric center point of the frame, as the final cost; it represents the rotation and position of the frame. The $C_o$ and $C_r$ terms are defined as follows:
$$C_r(i, j) = \frac{1}{n} \sum_{p=1}^{n} \left\| (x_p, y_p)_j^{T} - T(i, j)\,(x_p, y_p)_i^{T} \right\|_2 \tag{5.2}$$
$$C_o(i, j) = \left\| (x_0, y_0)^{T} - T(i, j)\,(x_0, y_0)^{T} \right\|_2 \tag{5.3}$$
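To make Eqs. (5.1) to (5.3) concrete, the following Python sketch evaluates the shakiness cost for one frame pair. It assumes the homography has already been estimated (for example by a RANSAC-based estimator) and is passed in as a 3 × 3 nested list. The function names and the default values of `tau_c` and `gamma` are placeholders, not the settings used in this work.

```python
import math

def apply_homography(T, x, y):
    """Apply a 3x3 homography T (nested lists) to the point (x, y)."""
    w = T[2][0] * x + T[2][1] * y + T[2][2]
    return ((T[0][0] * x + T[0][1] * y + T[0][2]) / w,
            (T[1][0] * x + T[1][1] * y + T[1][2]) / w)

def shakiness_cost(T, pts_i, pts_j, center, tau_c=2.0, gamma=1000.0):
    """Shakiness cost S_ij from a homography and matched feature points."""
    # C_r: mean reprojection error of the matched features (Eq. 5.2).
    errors = []
    for (xi, yi), (xj, yj) in zip(pts_i, pts_j):
        px, py = apply_homography(T, xi, yi)
        errors.append(math.hypot(xj - px, yj - py))
    c_r = sum(errors) / len(errors)
    if c_r >= tau_c:
        return gamma  # poorly aligned pair: discourage this transition
    # C_o: displacement of the frame centre under T (Eq. 5.3).
    cx, cy = center
    px, py = apply_homography(T, cx, cy)
    return math.hypot(cx - px, cy - py)

# Usage: a pure translation by (3, 4) pixels that the features follow exactly.
T = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
pts_i = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pts_j = [(3.0, 4.0), (13.0, 4.0), (3.0, 14.0)]
cost = shakiness_cost(T, pts_i, pts_j, center=(320.0, 240.0))
```

In this example the features agree with the homography, so the cost reduces to the centre displacement, which is the length of the translation vector.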
5.1.2 Velocity Cost
The matching cost measures the smoothness of the time-lapse frames. However, achieving the target speed-up ratio is more important for a time-lapse video. Recent methods generate the time-lapse video starting from the first frame of the original video and define this term through the velocity and acceleration of the frame serial numbers counted from that first frame. As a result, the first frame influences the choice of the optimal path. In addition, these methods cannot guarantee that the ratio by which the number of frames is reduced matches the target speed-up.
We allow the first frame of the time-lapse video to be any of the first v/2 original frames, where v is the speed-up ratio. Once the first frame is fixed, the cost of choosing each original frame for the time-lapse video can be computed from velocity and acceleration. The ideal choice under this cost term is $F^c_i = F^c_0 + kv$, and the cost for this term is defined as follows:
$$C_i = \begin{cases} \left\| i - (i_0 + kv) \right\|_2^2 & |i - (i_0 + kv)| < v \\ \tau_s & \text{otherwise} \end{cases} \tag{5.4}$$
Here $i_0$ is the first chosen frame in the original video, and $\tau_s$ is a threshold cost that restricts the selection to frames within distance v of the ideal choice. This term depends only on the position of the frame in the original frame sequence.
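Eq. (5.4) translates directly into code. The sketch below is illustrative only; the function name `velocity_cost` and the default value of `tau_s` are assumptions.

```python
def velocity_cost(i, i0, k, v, tau_s=1e6):
    """Cost of picking original frame i as the k-th time-lapse frame.

    i0: index of the first chosen frame, v: speed-up ratio,
    tau_s: cost assigned to frames at distance v or more from the ideal.
    """
    ideal = i0 + k * v
    offset = abs(i - ideal)
    return float(offset ** 2) if offset < v else float(tau_s)

# Usage: costs around the ideal frame 10 (i0 = 2, k = 1, v = 8).
costs = [velocity_cost(i, i0=2, k=1, v=8) for i in range(8, 13)]
```

The cost is zero at the ideal frame and grows quadratically with the offset, so nearby frames stay selectable when the other cost terms favour them.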
5.1.3 Appearance Cost
We propose this term to measure how uniform the camera motion is in space. Recorded videos may not have a uniform moving speed, and even slight variations in camera motion affect the smoothness of the time-lapse content. The global feature flow is a 2D representation of the camera motion. We approximate uniform 3D camera motion between two selected frames by making the 2D optical flow between them uniform. We define the total flow G(x, y) by accumulating the feature motions calculated by optical flow over all chosen frames:
$$G = \sum_{i=0}^{N-1} g_{i-1,i} \tag{5.5}$$
The cost of this term is:
$$V_{i,j} = K_{flow} - G/N \tag{5.6}$$
$K_{flow}$ represents the motion of the current frame. We prefer the motion of each frame to be similar to the average frame motion of the chosen frames. This term considers only the chosen frame sequence.
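A minimal sketch of Eqs. (5.5) and (5.6) follows. It assumes that each frame-to-frame flow has already been reduced to a scalar magnitude, which is a simplification made only for this example; the function name `appearance_cost` is hypothetical.

```python
def appearance_cost(k_flow, flows):
    """V = K_flow - G / N, with G the accumulated flow of the chosen frames.

    k_flow: flow magnitude of the current frame transition.
    flows:  flow magnitudes between consecutive chosen frames (length N).
    """
    g_total = sum(flows)                  # Eq. (5.5)
    return k_flow - g_total / len(flows)  # Eq. (5.6)

# Usage: the current transition moves 6 px while the average is 4 px.
cost = appearance_cost(6.0, [2.0, 4.0, 6.0])
```

The signed difference is returned as written in Eq. (5.6). An implementation might use its magnitude so that both faster and slower transitions are penalized; that choice is an assumption and is not stated in the text.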
5.1.4 Optimal Frame Selection
We solve the path optimization using a graphical model, minimizing functions of the form:
$$\hat{w}_{1 \dots N} = \underset{w_{1 \dots N}}{\operatorname{argmin}} \left( \sum_{n=1}^{N} U_n(w_n) + \sum_{n=2}^{N} P_n(w_n, w_{n-1}) \right) \tag{5.7}$$
We set up the costs for traversing the graph: each path from left to right is one possible configuration of world states. The unary term $U_n(w_n)$ is a cost that depends only on the state itself. We use the position-dependent cost $C_i$ of Eq. (5.4) as the unary term in this solution, since it is determined only by the position of the frame in the sequence. These costs are known in advance once the first frame and the speed-up rate are defined, so we can build the graphical model from the unary term.
Each state corresponds to one frame of the time-lapse video. The total number of states is fixed in advance, which ensures that our time-lapse method achieves the speed-up ratio. Each state has 2v units representing original frames, and we need to choose one frame from this sub-sequence for each state. The unary term of a unit depends only on its own frame. We use the cost $C_i$ as the unary probability:
$$Pr(x_n \mid w_n) = C_i \tag{5.8}$$
Figure 5.1: Used directed hidden Markov chain model
The velocity cost depends only on the position of a frame in the original frame sequence, so it can be split into sections that form the states of the graph. A high cost for a frame means a low probability that the frame is chosen for the time-lapse video.
The overall weight of the edge between the nodes for frame i and frame j in the neighbouring states $w_n$ and $w_{n-1}$ is given by:
$$Pr(w_n \mid w_{n-1}) = \alpha \cdot S_{i,j} + \beta \cdot V_{i,j} \tag{5.9}$$
Here α and β represent the importance of the respective cost terms in the overall edge weight. Because the sub-sequences of neighbouring states overlap, the same frame may appear as a node in neighbouring states of our graphical model. We add zero-weight edges to avoid choosing the same frame twice. We then use a directed hidden Markov chain model to compute the shortest path, which corresponds to the globally most probable sequence of time-lapse frames. Working through the graph, we compute the maximum probability of reaching each node. We continue until we reach the last column, find the maximum probability there, and trace back to recover the path. This gives the maximum-probability time-lapse frame sequence. Figure 5.1¹ illustrates this model.
¹ Computer Vision: Models, Learning, and Inference. Copyright 2011 Simon J.D. Prince.
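The chain optimization of this section can be sketched as a small dynamic program over the states. The Python code below is a generic minimum-cost (Viterbi-style) solver written for illustration, not the implementation used in this work. The names `select_frames`, `unary` and `pairwise` are hypothetical, and repeated frames are excluded here by requiring strictly increasing frame indices, which is one possible reading of the constraint described above.

```python
def select_frames(candidates, unary, pairwise):
    """Minimum-cost frame selection over a chain of states.

    candidates: list of lists; candidates[n] holds the original frame
                indices allowed for the n-th time-lapse frame.
    unary(f): cost of choosing frame f on its own.
    pairwise(p, f): cost of the transition from frame p to frame f.
    """
    inf = float("inf")
    best = [[unary(f) for f in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for n in range(1, len(candidates)):
        row, ptr = [], []
        for f in candidates[n]:
            cost, arg = inf, -1
            for a, p in enumerate(candidates[n - 1]):
                if p >= f:
                    continue  # keep frames strictly increasing (no repeats)
                c = best[n - 1][a] + pairwise(p, f)
                if c < cost:
                    cost, arg = c, a
            row.append(cost + unary(f))
            ptr.append(arg)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest node in the last column.
    a = min(range(len(best[-1])), key=lambda idx: best[-1][idx])
    if best[-1][a] == inf:
        raise ValueError("no feasible frame sequence")
    path = []
    for n in range(len(candidates) - 1, -1, -1):
        path.append(candidates[n][a])
        a = back[n][a]
    return path[::-1]

# Usage with toy costs: odd frames are free, and a step of 2 is ideal.
candidates = [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
chosen = select_frames(candidates,
                       unary=lambda f: 0.0 if f % 2 else 1.0,
                       pairwise=lambda p, f: abs((f - p) - 2))
```

In practice `unary` would wrap the cost of Eq. (5.4) and `pairwise` the weighted sum of Eq. (5.9). Minimizing the summed cost plays the role of maximizing the path probability in the text.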
5.2 Single Camera Path Optimization
We have proposed a general algorithm that generates a relatively stable time-lapse video. Here we propose a complete time-lapse video stabilization method. Time-lapse video amplifies the shaking of the original video. We first explain how to model this amplified shaking, and then propose a camera path optimization approach for stabilization.
First, we follow the theory of [13] to analyze the original video and its related time-lapse video. We model the camera path as composed of the following path segments. A constant path represents a static camera, which is smooth when $Dp(t) = 0$, where $D$ is the differential operator. A constant-velocity path represents a panning or dolly shot, which is smooth when $D^2 p(t) = 0$. A constant-acceleration path represents the ease-in and ease-out transitions between static and panning camera motion, which is smooth when $D^3 p(t) = 0$.
For a video frame sequence $I_1, I_2, I_3, \dots, I_n$, each pair of consecutive frames $(I_{t-1}, I_t)$ is associated with a linear motion model $H_t$. The camera path is then defined as follows:
$$F_t = F_{t-1} H_t \;\Rightarrow\; F_t = H_1 H_2 \cdots H_t \tag{5.10}$$
$F_t$ can be calculated iteratively by matrix multiplication. We generate the position trajectory, velocity and acceleration of the frames for both the original video and its related time-lapse video. Figures 5.2, 5.3 and 5.4 show an example of this analysis. The total movement distance of the time-lapse frames drops only slightly compared with the original frames, so the motion distance per frame increases significantly. In addition, we observe that the standard deviation of the frame motion increases considerably, especially where the original frame sequence contains large motion. This makes the velocity and acceleration of the time-lapse video sensitive to large motion in the original video. Such large motion is usually not camera shake, because shaky frames were filtered out in the frame selection step; it reflects intentional camera motion. It usually appears as motion that is continuous over multiple frames but sharp in the time-lapse frame sequence. We need to preserve this sharp camera motion in order to increase the overlap area in the stabilized video.
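The accumulation of Eq. (5.10) and the derived velocity and acceleration can be sketched as follows. This is a simplified illustration: the motion models are pure translations, the position is read from the translation component of $F_t$, and velocity and acceleration are taken as first and second finite differences. These choices and all function names are assumptions made for the example.

```python
def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def accumulate_path(motions):
    """Return [F_1, ..., F_n] with F_t = F_{t-1} H_t and F_0 = identity."""
    F = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    path = []
    for H in motions:
        F = matmul3(F, H)
        path.append(F)
    return path

def differences(values):
    """First-order finite differences of a 1D sequence."""
    return [b - a for a, b in zip(values, values[1:])]

def translation(tx, ty):
    """3x3 matrix of a pure translation by (tx, ty)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

path = accumulate_path([translation(1, 0), translation(2, 3), translation(4, 1)])
xs = [F[0][2] for F in path]           # horizontal position per frame
velocity = differences(xs)             # frame-to-frame change of position
acceleration = differences(velocity)   # frame-to-frame change of velocity
```

Running the same analysis on the original sequence and on the selected time-lapse frames gives the two sets of curves that are compared in the text.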
Figure 5.2: Position. (a) Original Position Trajectory. (b) Time-lapse Position Trajectory.
Figure 5.3: Velocity. (a) Original Velocity Trajectory. (b) Time-lapse Velocity Trajectory.
Figure 5.4: Acceleration. (a) Original Acceleration Trajectory. (b) Time-lapse Acceleration Trajectory.
We propose an adaptive Gaussian smoothing of the camera path to balance the overlap area against camera smoothness. The resulting smoothed camera trajectory follows the trend of the original camera trajectory much more closely. Gaussian smoothing is commonly used [8, 12] for camera path smoothing in video stabilization. A Gaussian smoothing kernel can be written as follows:
$$G_{1D}(x, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{x^2}{2\sigma^2}}$$
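For reference, a fixed-width version of this smoothing can be sketched in Python as below. The kernel is sampled at integer offsets and renormalized so that the weights sum to one, and the path is extended at its ends by repeating the border value. Both choices, as well as the function names, are assumptions of this sketch.

```python
import math

def gaussian_kernel(sigma, radius):
    """Discrete, normalized samples of the 1D Gaussian on [-radius, radius]."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) / norm
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_path(path, sigma, radius):
    """Smooth a 1D camera path; samples beyond the ends repeat the border value."""
    kernel = gaussian_kernel(sigma, radius)
    last = len(path) - 1
    out = []
    for t in range(len(path)):
        acc = 0.0
        for offset, w in zip(range(-radius, radius + 1), kernel):
            acc += w * path[min(max(t + offset, 0), last)]
        out.append(acc)
    return out

# Usage: a single spike is spread over its neighbours.
smoothed = smooth_path([0.0, 0.0, 10.0, 0.0, 0.0], sigma=1.0, radius=2)
```

A larger sigma spreads each sample over more neighbours and gives a smoother path that deviates further from the original trajectory.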
Here σ determines the width of the Gaussian kernel. The difference in velocity and acceleration between the original frame sequence and the time-lapse frame sequence can be used to define σ at frame t. The function $\sigma_t$, which gives σ at frame t, is
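$$\sigma_t = G_\alpha\left( k_1 \left( V_c - \frac{1}{\operatorname{num}(S)} \sum_{o \in S_t} V_o \right) + k_2 \left( A_c - \frac{1}{\operatorname{num}(S)} \sum_{o \in S_t} A_o \right) \right) \tag{5.11}$$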
$k_1$ and $k_2$ are the weights of the velocity and acceleration terms in σ. $V_c$ is the velocity of the frame being smoothed, and $S$ is the time-lapse frame sequence. We use the difference between the velocity of the current frame and the average velocity of the time-lapse frame sequence to define σ for the current frame, and we treat the acceleration $A_c$ in the same way. $G_\alpha$ is a conversion function that maps the final result into a valid range of σ values.
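The structure of Eq. (5.11) can be illustrated with the following Python sketch. The conversion function is not specified in closed form, so it is replaced here by a simple clamp to a range `[sigma_min, sigma_max]`. The use of absolute differences, the default parameter values and the name `adaptive_sigma` are likewise assumptions made only for this example.

```python
def adaptive_sigma(v_c, a_c, velocities, accelerations,
                   k1=1.0, k2=1.0, sigma_min=0.5, sigma_max=10.0):
    """Per-frame sigma from velocity and acceleration deviations.

    v_c, a_c: velocity and acceleration of the current frame.
    velocities, accelerations: values over the time-lapse sequence.
    The conversion function is assumed to be a clamp to [sigma_min, sigma_max].
    """
    mean_v = sum(velocities) / len(velocities)
    mean_a = sum(accelerations) / len(accelerations)
    # Absolute differences are an assumption; they keep the raw value non-negative.
    raw = k1 * abs(v_c - mean_v) + k2 * abs(a_c - mean_a)
    return min(max(raw, sigma_min), sigma_max)

# Usage with illustrative numbers.
sigma_t = adaptive_sigma(5.0, 1.0, [1.0, 3.0], [0.0, 2.0])
```

Other monotone mappings could be substituted for the clamp without changing the rest of the computation.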
We also consider the influence of the window of the discrete Gaussian. We follow the idea of the bilateral filter [5, 17] and design it with the following function:
$$w_t = G_\beta\left( \sum_{r \in \omega_t} G_1(\| r - t \|)\, G_2(\| C(r) - C(t) \|) \right) \tag{5.12}$$
$\| r - t \|$ is the distance between frame r and frame t in terms of their serial numbers in the time-lapse frame sequence. $\| C(r) - C(t) \|$ is the transformation distance between the two frames in the time-lapse video. $G_1$ and $G_2$ are Gaussian functions that smooth the contribution of the two terms. $\omega_t$ is the largest window used by this approach, and $G_\beta$ is a mapping function that maps the total weight into the proper range of