National Chiao Tung University
Department of Computer Science
Doctoral Dissertation
Parametric Representations and Tensor Approximation Algorithms for Real-Time Data-Driven Rendering
Student: Yu-Ting Tsai
Advisor: Dr. Zen-Chung Shih
May 2009
Parametric Representations and Tensor Approximation Algorithms for Real-Time Data-Driven Rendering
Student: Yu-Ting Tsai
Advisor: Dr. Zen-Chung Shih
Department of Computer Science, National Chiao Tung University
Doctoral Dissertation
A Dissertation
Submitted to the Department of Computer Science, College of Computer Science,
National Chiao Tung University, in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
in
Computer Science
May 2009
Hsinchu, Taiwan, Republic of China
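Parametric Representations and Tensor Approximation Algorithms for Real-Time Data-Driven Rendering
Student: Yu-Ting Tsai Advisor: Dr. Zen-Chung Shih
Ph.D. Program, Department of Computer Science, National Chiao Tung University
ABSTRACT (translated from the Chinese)
Over the past few decades, experts and scholars in computer graphics have worked to develop a wide variety of novel visual effects, hoping to use computers to synthesize images that look like real photographs. To achieve high-quality, realistic image output, many advanced rendering algorithms known as data-driven rendering either pre-process the input three-dimensional scene with complex procedures to obtain the necessary information, or capture image information from the real world in advance, so that the desired visual effects can be quickly reconstructed from this visual information at run-time. However, as the demand for image realism grows, the amount of pre-sampled data grows with it, which not only consumes a large amount of storage space but also increases data access time and rendering computation at run-time.
To solve this problem, we can describe the pre-sampled information efficiently with compact representations, and apply sophisticated compression methods to further reduce the amount of stored data. Previous research shows, however, that substantially reducing the amount of data while retaining real-time performance is a considerable challenge, because these two main goals often conflict. This dissertation therefore focuses on these issues and studies two major topics, data representations and approximation algorithms, and their development and application in real-time rendering of visual data. For data representations, this dissertation introduces two parametric representations, univariate and multivariate spherical radial basis functions, which are mainly used to describe the radiance and illumination functions common in visual data and which can be further extended to various kinds of data sampled on the surface of a sphere. For data compression, this dissertation proposes two novel tensor approximation algorithms, clustered and sparse clustered tensor approximation, which not only fully exploit the coherence in visual data to achieve both data reduction and fast real-time reconstruction, but can also be extended to the compression, analysis, or description of various large-scale, high-dimensional scientific data.
Finally, this dissertation investigates in depth how to approximate and reconstruct the pre-sampled information with the proposed methods, in order to reach the best balance among computational cost, compression error, and storage space, and selects several representative applications in computer graphics and vision as the basis for experiments. The experimental results validate the feasibility and flexibility of the proposed methods, which effectively preserve the visual features of the original data while easily achieving real-time rendering rates. As a result, high-quality photo-realistic image synthesis can be realized on today's ordinary personal computers and is no longer a privilege exclusive to supercomputers or film production.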
Parametric Representations and Tensor Approximation Algorithms for Real-Time Data-Driven Rendering
Student: Yu-Ting Tsai Advisor: Dr. Zen-Chung Shih
Department of Computer Science National Chiao Tung University
ABSTRACT
Over the last decades, computer graphics and vision researchers have focused on developing novel visual effects for computer-generated photo-realistic images. To achieve high-quality output, many state-of-the-art rendering algorithms, which are known as data-driven rendering, pre-process an input three-dimensional scene with complex procedures to obtain necessary data, or pre-capture the real world with a set of images. The desired visual effects are then reconstructed from the pre-sampled observations for efficient run-time rendering. Nevertheless, with the increasing demand for ever more photo-realistic image synthesis, the amount of pre-sampled data expands accordingly. It not only consumes a great deal of storage space, but also increases data access time and rendering costs at run-time.
To address this issue, we can adopt a compact representation to efficiently describe the pre-sampled observations and further apply sophisticated approximation methods to reduce the amount of data, but achieving real-time performance at the same time is frequently another challenging problem. In this dissertation, we thus focus on data representations and approximation algorithms for real-time rendering of visual data sets. Two novel parametric representations, univariate and multivariate spherical radial basis functions (SRBFs), and two sophisticated tensor approximation algorithms, clustered tensor approximation (CTA) and K-clustered tensor approximation (K-CTA), are proposed to exploit the coherence in visual data sets. The univariate and multivariate SRBFs are especially suitable for modeling radiance and illumination data sets that are common and ubiquitous in computer graphics and vision. Additionally, SRBFs can also be applied to represent various kinds of observations on the unit hyper-sphere. As for CTA and K-CTA, they are developed based on tensor approximation to simultaneously achieve high compression ratios and real-time rendering performance for multi-dimensional visual data sets. CTA and K-CTA are also general approximation algorithms that are not restricted to specific applications. They can be employed to compress, analyze, or represent other large-scale scientific data sets that intrinsically exhibit multi-dimensional structures.
Last but not least, we also investigate how to approximate and reconstruct the pre-sampled observations with a trade-off among computational costs, compression errors, and storage space. The proposed methods are further applied to various representative applications in computer graphics and vision. Our experiments show promising results in which important features of visual data sets are well preserved after approximation and rendered at real-time rates. As a result, high-quality photo-realistic image synthesis can be efficiently realized on modern personal computers and is no longer a privilege of supercomputers or film production.
Acknowledgements
First of all, I thank my advisor, Dr. Zen-Chung Shih, for his support and guidance during the hard times of my stay at the Department of Computer Science, National Chiao Tung University. I would also like to thank the members of the Computer Graphics Laboratory, especially Chih-Hao Chen for helping me with parts of the precomputed radiance transfer program, Shr-Ching Weng and Qing-Zhen Jiang for profound discussions on the proposed importance sampling algorithm, Yu-Pao Tsai for his advice on the applications of the proposed methods, Dr. Wen-Chieh Lin and Kuei-Li Fang for suggestions and implementation of the hierarchical fitting algorithm, and Jia-Yin Ji for preparing the model Cloth, to name a few.
Finally, special thanks are due to Ya-Mei Chen for her encouragement and assistance during the last few years of my Ph.D. program. I would also like to dedicate this dissertation to my parents for their selfless love and support. Without all of you, I would never have received the Ph.D. degree in computer science, and this dissertation would not have been written.
Contents
Abstract (in Chinese)
Abstract (in English)
Acknowledgements
Contents
List of Figures
List of Tables
List of Algorithms
List of Symbols
1 Introduction
1.1 Background
1.2 Motivations and Methods
1.3 Applications of Data-Driven Rendering
1.4 Contributions
1.5 Dissertation Organization
2 Literature Review
2.1 Data Representations
2.1.1 Functional Models
2.1.2 Spherical Radial Basis Functions
2.1.3 Parameterization
2.2 Dimensionality Reduction
2.2.1 Linear Models
2.2.2 Non-Linear Models
2.2.3 Multi-Linear Models
2.3 Data-Driven Rendering
2.3.1 Spatially-Varying Appearance Models
2.3.2 Radiance Transfer
3 Univariate Spherical Radial Basis Functions
3.1 Mathematical Formulation
3.2 Characteristics of Univariate SRBFs
3.2.1 Spherical Singular Integral
3.2.2 Spatial Localization
3.2.3 Univariate Gaussian SRBFs and Von Mises-Fisher Distributions
3.3 Uniform Univariate SRBF Representation
3.3.1 Ordinary Least-Squares Projection
3.3.2 Regularized Least-Squares Projection
3.4 Scattered Univariate SRBF Representation
3.4.1 Fitting Algorithm
3.4.2 Initial Guess
3.5 Mathematical Proofs
3.5.1 Convolution of Univariate Gaussian SRBFs
3.5.2 Relation between Univariate Gaussian SRBF and Von Mises-Fisher Distribution
4 Multivariate Spherical Radial Basis Functions
4.1 Mathematical Formulation
4.1.1 Basic Definitions
4.1.2 Example
4.2 Scattered Multivariate SRBF Representation
4.2.1 Fitting Algorithm
4.2.2 Initial Guess
4.3 Parameterized Multivariate SRBF Representation
4.3.1 Overview
4.3.2 Example
4.3.3 Fitting Algorithm
4.4 Hierarchical Fitting Algorithm
4.4.1 Upsampling Stage
4.4.2 Optimization Stage
5 Clustered Tensor Approximation
5.1 Preliminaries and Background
5.1.1 Basic Definitions
5.1.2 Tensor Approximation
5.2 Mathematical Formulation
5.3 Algorithm
5.3.1 Overview
5.3.2 Clustering Stage
5.3.3 Update Stage
5.4 Implementation Issues
5.4.1 Initial Guess
5.4.2 Global Basis Matrices
5.5 Mathematical Proofs
6 K-Clustered Tensor Approximation
6.1 Mathematical Formulation
6.2 Algorithm
6.2.1 Overview
6.2.2 Clustering Stage
6.2.3 Update Stage
6.3 Implementation Issues
6.3.1 Initial Guess
6.3.2 Degeneracy and Convergence
6.4 Mathematical Proofs
6.4.1 Greedy Search
6.4.2 Optimal Projection
7 Application I: Illumination and Reflectance Functions
7.1 High Dynamic Range Environment Maps
7.1.1 Problem Formulation
7.1.2 Experimental Results
7.2 Bidirectional Reflectance Distribution Functions
7.2.1 Univariate SRBFs
7.2.2 Multivariate SRBFs
7.2.3 Experimental Results
7.3 Importance Sampling of Direct Illumination
7.3.1 Problem Formulation
7.3.2 Importance Sampling Algorithm
7.3.3 Experimental Results
8 Application II: Spatially-Varying Appearance Models
8.1 Bidirectional Texture Functions
8.1.1 Tensor Approximation Algorithms
8.1.2 Multivariate SRBFs
8.1.3 Comparisons and Discussions
8.2 View-Dependent Occlusion Texture Functions
8.2.1 Problem Formulation
8.2.2 Rendering Issues
8.2.3 Experimental Results
9 Application III: Radiance Transfer
9.1 Precomputed Radiance Transfer
9.1.1 Problem Formulation
9.1.2 Algorithm
9.1.3 Experimental Results
9.1.4 Discussions
9.2 Bi-Scale Radiance Transfer
9.2.1 Problem Formulation
9.2.2 Algorithm
9.2.3 Experimental Results
10 Conclusions and Future Work
10.1 Conclusions
10.1.1 Spherical Radial Basis Functions
10.1.2 Tensor Approximation Algorithms
10.1.3 Summary
10.2 Future Work
10.2.1 Flexible Data Representations
10.2.2 Powerful Approximation Algorithms
10.2.3 General Data-Driven Applications
Bibliography
List of Figures
3.1 Two-dimensional and three-dimensional plots of univariate Gaussian spherical radial basis functions.
3.2 A univariate spherical function in univariate spherical radial basis function expansions.
4.1 Approximate a bidirectional texture function using the proposed hierarchical fitting algorithm.
5.1 The decomposed core tensor and basis matrices of a third order tensor based on tensor approximation.
5.2 Decompose a third order tensor using clustered tensor approximation.
6.1 Decompose a third order tensor using K-clustered tensor approximation.
7.1 Examples of high dynamic range environment maps.
7.2 Squared error ratio plots for high dynamic range environment maps.
7.3 Reconstructed images of the high dynamic range environment map St. Peter’s Basilica.
7.4 Reconstructed images of the high dynamic range environment maps Eucalyptus Grove, Grace Cathedral, and The Uffizi Gallery.
7.5 Examples of bidirectional reflectance distribution functions.
7.6 Squared error ratio plots for bidirectional reflectance distribution functions.
7.7 Reconstructed images of the bidirectional reflectance distribution function Blue Metallic Paint.
7.8 Reconstructed images of the bidirectional reflectance distribution function Krylon Blue.
7.9 Reconstructed images of the bidirectional reflectance distribution function Nickel.
7.10 Rendered images for bidirectional reflectance distribution functions.
7.11 Rendered images based on the proposed importance sampling algorithm for Bunny with the bidirectional reflectance distribution function Blue Metallic Paint.
7.12 Rendered images based on the proposed importance sampling algorithm for the model Buddha with the bidirectional reflectance distribution function Krylon Blue.
7.13 Rendered images based on the proposed importance sampling algorithm for the model Venus with the bidirectional reflectance distribution function Nickel.
8.1 An example of a bidirectional texture function.
8.2 Squared error ratio plots based on different tensor approximation algorithms for bidirectional texture functions.
8.3 Rendered images based on different tensor approximation algorithms for bidirectional texture functions.
8.4 Reconstructed images of the bidirectional texture function for the material Fiber based on different tensor approximation algorithms.
8.5 Reconstructed images of the bidirectional texture function for the material RoughHole based on different tensor approximation algorithms.
8.6 Reconstructed images of the bidirectional texture function for the material Sponge based on different tensor approximation algorithms.
8.7 Reconstructed images of the bidirectional texture function for the material Wool based on different tensor approximation algorithms.
8.8 Rendered images based on the parameterized multivariate spherical radial basis function representation for bidirectional texture functions.
8.9 Reconstructed images of the bidirectional texture function for the material Carpet based on the parameterized multivariate spherical radial basis function representation.
8.10 Reconstructed images of the bidirectional texture function for the material Impalla based on the parameterized multivariate spherical radial basis function representation.
8.11 Reconstructed images of the bidirectional texture function for the material Sponge based on the parameterized multivariate spherical radial basis function representation.
8.12 Reconstructed images of the bidirectional texture function for the material Wool based on the parameterized multivariate spherical radial basis function representation.
8.13 Examples of the proposed view-dependent occlusion texture functions.
8.14 Comparison between different encoding schemes for view-dependent occlusion texture functions.
8.15 Error texel ratio plots based on different encoding schemes for view-dependent occlusion texture functions.
8.16 Error texel ratio plots based on different tensor approximation algorithms for view-dependent occlusion texture functions.
8.17 Reconstructed images of the view-dependent occlusion texture function for the material Fiber.
8.18 Reconstructed images of the view-dependent occlusion texture function for the material RoughHole.
9.1 System diagram of the proposed all-frequency precomputed radiance transfer algorithm.
9.2 Tensor representation for radiance transfer matrices in the proposed all-frequency precomputed radiance transfer algorithm.
9.3 Rendered images based on the proposed all-frequency precomputed radiance transfer algorithm.
9.4 Rendered images with various configurations of clustered tensor approximation based on the proposed all-frequency precomputed radiance transfer algorithm.
9.5 Comparison between the proposed all-frequency precomputed radiance transfer algorithm and all-frequency clustered principal component analysis.
9.6 The proposed all-frequency bi-scale radiance transfer algorithm determines visibility values at the meso-scale.
9.7 System diagrams of the proposed all-frequency bi-scale radiance transfer algorithm.
9.8 Rendered images with different tensor approximation algorithms for meso-structures based on the proposed all-frequency bi-scale radiance transfer algorithm.
9.9 Rendered images with/without indirect illumination based on the proposed all-frequency bi-scale radiance transfer algorithm.
9.10 Rendered images based on the proposed all-frequency bi-scale radiance transfer algorithm.
List of Tables
7.1 Statistics and timing measurements for bidirectional reflectance distribution functions.
7.2 Rendering performance for bidirectional reflectance distribution functions.
8.1 Statistics and timing measurements of different tensor approximation algorithms for bidirectional texture functions.
8.2 Rendering performance of different tensor approximation algorithms for bidirectional texture functions.
8.3 Statistics and timing measurements of the parameterized multivariate spherical radial basis function representation for bidirectional texture functions.
8.4 Rendering performance of the parameterized multivariate spherical radial basis function representation for bidirectional texture functions.
8.5 Feature comparisons between tensor approximation algorithms and multivariate spherical radial basis functions for bidirectional texture functions.
8.6 Statistics and timing measurements of different tensor approximation algorithms for view-dependent occlusion texture functions.
9.1 Statistics and timing measurements of the proposed all-frequency precomputed radiance transfer algorithm.
9.2 Comparisons between the proposed all-frequency precomputed radiance transfer algorithm and all-frequency clustered principal component analysis.
9.3 Feature comparisons of the proposed all-frequency bi-scale radiance transfer algorithm with various meso-scale appearance models.
9.4 Statistics and timing measurements of the proposed all-frequency bi-scale radiance transfer algorithm.
List of Algorithms
3.1 Fitting algorithm for the scattered univariate spherical radial basis function representation.
3.2 Modified soft von Mises-Fisher clustering algorithm for preconditioning.
4.1 Fitting algorithm for the scattered multivariate spherical radial basis function representation.
4.2 Initial guess for the scattered multivariate spherical radial basis function representation.
4.3 Fitting algorithm for the parameterized multivariate spherical radial basis function representation.
4.4 Hierarchical fitting algorithm based on the scattered multivariate spherical radial basis function representation.
5.1 N-mode singular value decomposition.
5.2 Static and iterative clustered tensor approximation.
6.1 Iterative K-clustered tensor approximation.
List of Symbols
General
$\mathbb{N}$    set of natural numbers
$\mathbb{R}$    set of real numbers
$\mathbb{R}^m$    m-dimensional Euclidean space
$a, b, \ldots$    scalars (italic roman lowercase letters)
$\mathbf{a}, \mathbf{b}, \ldots$    column vectors (boldface roman lowercase letters)
$\mathbf{A}, \mathbf{B}, \ldots$    matrices (boldface roman capitals)
$c, i, j, k, n$    indices
$C, I, J, K, N$    maximum values of indices $c, i, j, k, n$
$e$    base of the natural logarithm
$\pi$    ratio of the circumference of a circle to its diameter
$|\cdot|$    absolute value of a real number or cardinality of a set
$\|\cdot\|_0$    $\ell_0$ norm of a vector
$\|\cdot\|_2$    $\ell_2$ norm of a vector
$\{\cdot\}$    a set
$\setminus$    set minus operator
$\times$    scalar multiplication operator or Cartesian product operator of sets
$\Gamma(\cdot)$    gamma function
$P_n(\cdot)$    normalized Legendre polynomial of degree $n$ so that $P_n(1) = 1$
$\ell^{(F)}_n$    degree-$n$ Legendre coefficient of a function $F(\cdot)$
$I_n(\cdot)$    order-$n$ modified Bessel function of the first kind
$\mathbf{A}^{-1}$    inverse of a square matrix $\mathbf{A}$
$\mathbf{A}^{+}$    Moore-Penrose pseudo-inverse of a matrix $\mathbf{A}$
$\mathbf{I}_n$    identity matrix of size $n \times n$
$\mathbf{A}^T$    transpose of a matrix $\mathbf{A}$
$(\mathbf{A})_{ij}$    entry in row $i$ and column $j$ of a matrix $\mathbf{A}$
$(\mathbf{A})_{i*}$    $i$-th row of a matrix $\mathbf{A}$
$(\mathbf{A})_{*j}$    $j$-th column of a matrix $\mathbf{A}$
Spherical Radial Basis Functions
$\mathbb{S}^m$    unit hyper-sphere embedded in $\mathbb{R}^{m+1}$ (or unit $m$-sphere)
$\phi$    geodesic distance on the unit hyper-sphere
$\omega$    a point on the unit hyper-sphere
$\beta$    basis coefficient of a spherical radial basis function
$\xi$    center of a spherical radial basis function
$\lambda$    bandwidth of a spherical radial basis function
$\theta$    a parameterization coefficient
$\omega \cdot \xi$    dot product of $\omega$ and $\xi$
$\psi(\cdot)$    a parameterization function
$\Omega$    a set of points on the unit hyper-sphere(s)
$\Xi$    a set of spherical radial basis function centers
$\Lambda$    a set of spherical radial basis function bandwidths
$\Theta$    a set of parameterization coefficients
$\Psi$    a set of parameterization functions
$F(\cdot)$    a univariate/multivariate spherical function
$G(\cdot)$    a spherical radial basis function
$\ast_m$    spherical singular integral operator over $\mathbb{S}^m$
$\Omega(m)$    total surface area of $\mathbb{S}^m$
$\tau_G$    center of gravity of a spherical radial basis function $G(\cdot)$
$\sigma_G$    variance of a spherical radial basis function $G(\cdot)$
Tensor Algebra
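$\boldsymbol{\mathcal{A}}, \boldsymbol{\mathcal{B}}, \ldots$    tensors (boldface calligraphic capitals)
$\|\cdot\|_F$    Frobenius norm of a tensor (or a matrix)
$(\boldsymbol{\mathcal{A}})_{i_1 i_2 \cdots i_N}$    entry of an $N$-th order tensor $\boldsymbol{\mathcal{A}}$ indexed by $i_1, i_2, \ldots, i_N$
$\boldsymbol{\mathcal{A}}_{\langle n_i \rangle}$    the $i$-th mode-$n$ sub-tensor of a tensor $\boldsymbol{\mathcal{A}}$
$\langle \boldsymbol{\mathcal{A}}, \boldsymbol{\mathcal{B}} \rangle$    scalar product of $\boldsymbol{\mathcal{A}}$ and $\boldsymbol{\mathcal{B}}$
$\mathrm{uf}_n(\cdot)$    mode-$n$ unfolded matrix of a tensor
$\times_n$    mode-$n$ product operator
$\otimes$    Kronecker product operator
$\bigtimes_n$    operator of a series of mode-$n$ products
$\bigotimes$    operator of a series of Kronecker products
$R_n$    a mode-$n$ reduced rank
$\boldsymbol{\mathcal{Z}}$    a core tensor
$\mathbf{U}_n$    a mode-$n$ basis matrix
$\mathbf{V}_n$    a dual mode-$n$ basis matrix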
Applications
$\omega_l$    an illumination direction on $\mathbb{S}^2$
$\omega_v$    a view direction on $\mathbb{S}^2$
$L(\omega_l)$    a high dynamic range environment map
$\rho(\omega_l, \omega_v)$    a bidirectional reflectance distribution function
$p$    a pixel or a point on an object surface
$\mathbf{n}_p$    surface normal at point $p$
$L_{o,p}(\omega_v)$    exitant radiance in view direction $\omega_v$ at point $p$
$L_{in}(\omega_l)$    incident radiance in illumination direction $\omega_l$
$V_p(\omega_l)$    visibility function at point $p$ in illumination direction $\omega_l$
$x$    horizontal coordinate in the two-dimensional Cartesian coordinate system
$y$    vertical coordinate in the two-dimensional Cartesian coordinate system
$\mathbf{t} = [x, y]^T$    two-dimensional spatial coordinates, $x$ and $y$, of a texel
$B(\omega_l, \omega_v, \mathbf{t})$    a bidirectional texture function
$O(\omega_v, \mathbf{t})$    a view-dependent occlusion texture function
$S$    a spatial coordinate texture for spatially-varying materials
Chapter 1 Introduction
1.1 Background
Synthesizing photo-realistic images is a significant and ambitious goal in computer graphics and vision. It has not only been extensively studied in the academic community, but has also become ubiquitous in our daily life. Special visual effects and vivid animations are nowadays essential to many computer applications as well as film production, and can even be found on various consumer electronic devices. In real-time rendering applications, visual realism has always been a more challenging problem, since realistic image synthesis is not the only concern: human-computer interaction is also of great importance. The difficulty is that interactivity usually dominates other objectives, so the desired visual effects must be realized with limited computational power.
Researchers have conventionally focused on developing analytic models and simulation-based algorithms to achieve photo-realistic image synthesis in real time. Nevertheless, real-world object shape, surface reflectance, micro-scale appearance, and natural illumination effects are frequently too complicated to be synthesized using analytic models or simple simulations.
Over the last decades, there have been tremendous advances in this field. First, the emergence of programmable graphics processing units (GPUs) [103, 141] and general-purpose computation on GPUs (GP-GPUs) [131, 157] introduced flexible programming models whose compiled code can be executed and accelerated by graphics hardware. The rapid growth in the computational power of GPUs has not only enabled many rendering methods that were previously impractical to operate in real time, but has also stimulated the development of highly parallel algorithms in many scientific communities beyond computer graphics and vision.
Second, while traditional rendering algorithms generate images using computationally expensive procedures, state-of-the-art methods, which are known as data-driven rendering or inverse rendering, handle the same problem from a different standpoint: if we cannot afford the computational costs of complex procedures at run-time, why not perform rendering from cached or pre-sampled data that represent the results of complex procedures or even real-world measurements? A common example is the bidirectional reflectance distribution function (BRDF) for local illumination reflection. In data-driven rendering, the whole process begins by measuring data samples from real-world object surfaces [113, 114]. The data collections are then modeled with an appropriate representation for fast run-time rendering from novel illumination and view directions. This simple idea reverses the conventional 'forward' rendering process, which develops a representation from theories or heuristics, and ushers in a new era for real-time rendering.
1.2 Motivations and Methods
Although data-driven rendering avoids computationally expensive procedures at run-time, it is usually burdened with cumbersome pre-sampled observations that consume a large amount of storage space and memory bandwidth. As the demand for ever more photo-realistic computer-generated images increases, this problem becomes even worse, since we have to record more information to account for more degrees of freedom and more detailed descriptions of the desired visual effects. Nowadays, the amount of pre-sampled data often exceeds tens or hundreds of gigabytes, and is likely to keep growing. Since memory bandwidth is a major bottleneck of GPUs in modern real-time graphics and vision applications, data-driven rendering may, in the worst case, become slower than directly employing complex procedures.
To solve this problem, researchers have devoted considerable effort to developing effective representations and approximation algorithms for handling the pre-sampled observations. While an expressive representation provides an efficient and meaningful description of data, a powerful compression method results in compact storage space and fast data reconstruction on GPUs. Tight cooperation between the two may be a remedy for real-time photo-realistic image synthesis. In this dissertation, we thus focus on data representations and approximation algorithms for real-time rendering of visual data sets. Related topics and methods are extensively surveyed and studied. We also discuss how to achieve an appropriate trade-off among computational costs, approximation errors, and storage space for real-time applications. Our investigation leads to the following methods:
Data Representations
We introduce two parametric representations for radiance and illumination functions, named univariate and multivariate spherical radial basis functions (SRBFs). A univariate SRBF is a circularly axis-symmetric and rotation-invariant function defined on the unit hyper-sphere. Radiance functions can thus be modeled in their intrinsic spherical domain, rather than in a cubic or planar domain as in previous articles [22, 125]. This avoids the false boundaries and distortions that may result from improper resampling strategies in a non-intrinsic domain. The spatial localization property of univariate SRBFs in particular allows high-frequency signals within local regions to be handled efficiently. A multivariate SRBF, on the other hand, is a function constructed as the product of multiple univariate SRBFs. This removes the limitation of univariate SRBFs, which account for only a single variable on the unit hyper-sphere. Since the surface appearance of a real-world object is frequently the combined effect of different physical factors, multivariate SRBFs are particularly suitable for modeling the complex behaviors of measured reflectance data. Moreover, we demonstrate that a linear combination of either univariate or multivariate SRBFs generalizes SRBFs into a powerful analysis tool for heterogeneous radiance functions.
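As a rough illustration of these definitions, the following Python sketch evaluates a univariate Gaussian SRBF, a linear combination of such functions, and a multivariate SRBF formed as a product. The kernel form exp(-λ(1 - ω·ξ)) is an assumption made here for concreteness (the formal definitions appear in Chapters 3 and 4), and all function names are hypothetical.

```python
import numpy as np

def unit(v):
    """Normalize a vector so that it lies on the unit sphere."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def gaussian_srbf(omega, xi, lam):
    """Univariate Gaussian SRBF (assumed form): exp(-lam * (1 - omega . xi)).

    The value depends only on the dot product omega . xi, so the function
    is axis-symmetric about its center xi and unchanged when omega and xi
    are rotated together.
    """
    return float(np.exp(-lam * (1.0 - np.dot(omega, xi))))

def univariate_expansion(omega, betas, centers, lams):
    """Linear combination sum_k beta_k * G(omega . xi_k; lam_k)."""
    return sum(b * gaussian_srbf(omega, xi, lam)
               for b, xi, lam in zip(betas, centers, lams))

def multivariate_srbf(omegas, xis, lams):
    """Product of univariate SRBFs, one factor per spherical variable."""
    return float(np.prod([gaussian_srbf(w, xi, lam)
                          for w, xi, lam in zip(omegas, xis, lams)]))

if __name__ == "__main__":
    z = unit([0, 0, 1])
    x = unit([1, 0, 0])
    print(gaussian_srbf(z, z, 10.0))                 # 1.0 at the center
    print(multivariate_srbf([z, x], [z, z], [10.0, 2.0]))
```

A larger bandwidth λ gives a narrower lobe around the center, which is one way to read the spatial localization property mentioned above.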
To obtain a compact representation, it is also well-known that transforming the parameters of a multivariate function into another parametric space, which we refer to as parameterization, can improve approximation efficiency [77, 90, 151, 213, 214]. However, previous articles have considered only heuristic or fixed transformation functions, and little attention has been paid to data-dependent parameterization methods. Therefore, we propose to learn a set of optimized parameterization functions for a given visual data set to overcome the main disadvantage of conventional fixed parameterization. By using a parametric representation to model the transformation functions, the parameterization process can be tightly integrated into the proposed multivariate SRBF representation. Previous heuristic parameterizations, such as the half-way vector for reflectance functions, thus become special cases in our general framework.
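To make the idea of a fixed parameterization concrete, the sketch below computes the half-way vector, the normalized sum of the illumination and view directions, and evaluates a single Gaussian lobe in that transformed variable. The lobe function and its kernel form are illustrative assumptions, not the dissertation's learned parameterization; the function names are hypothetical.

```python
import numpy as np

def halfway_vector(omega_l, omega_v):
    """Fixed parameterization: h = (omega_l + omega_v) / ||omega_l + omega_v||."""
    h = np.asarray(omega_l, dtype=float) + np.asarray(omega_v, dtype=float)
    norm = np.linalg.norm(h)
    if norm == 0.0:
        raise ValueError("half-way vector is undefined for opposite directions")
    return h / norm

def halfway_lobe(omega_l, omega_v, normal, lam):
    """A highlight expressed as one Gaussian lobe in the half-way vector.

    Assumed kernel form exp(-lam * (1 - h . n)); the lobe peaks when the
    view direction is the mirror reflection of the illumination direction.
    """
    h = halfway_vector(omega_l, omega_v)
    return float(np.exp(-lam * (1.0 - np.dot(h, normal))))

if __name__ == "__main__":
    s = 1.0 / np.sqrt(2.0)
    n = np.array([0.0, 0.0, 1.0])
    print(halfway_vector([s, 0.0, s], [-s, 0.0, s]))     # close to the normal
    print(halfway_lobe([s, 0.0, s], [-s, 0.0, s], n, 50.0))
```

The point of the example is that a function of two directions can collapse to a function of one transformed direction; a data-dependent parameterization would replace `halfway_vector` with a transformation fitted to the data.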
Approximation Algorithms
For data compression, we propose two dimensionality reduction algorithms, namely clustered tensor approximation (CTA) and K-clustered tensor approximation (K-CTA), for large-scale multi-dimensional visual data sets. A major drawback of traditional tensor approximation (also called multi-linear models or multi-way analysis) [33, 165] is that reconstruction costs may still be too high for real-time rendering applications when the data variations in the input tensor are large [185, 192]. The proposed CTA algorithm therefore aims at overcoming this drawback by classifying the input tensor into disjoint clusters, so that the member data within each cluster form another tensor with much lower variations and can be more efficiently decomposed using tensor approximation. An iterative technique for updating cluster members is also introduced to derive a locally optimal solution for CTA.
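The building blocks described above can be sketched in Python as follows. The code performs a truncated N-mode singular value decomposition (a standard form of tensor approximation) and then applies it separately to clusters of mode-0 slices. It is only a minimal sketch: the cluster labels are supplied by the caller, clustering along mode 0 is an assumption made here, and the dissertation's clustering and iterative update stages are not reproduced. All function names are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product; matrix has shape (J, I_n) and replaces I_n by J."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def truncated_hosvd(tensor, ranks):
    """Return a reduced core tensor and one orthonormal basis per mode."""
    bases = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(bases):
        core = mode_product(core, u.T, mode)
    return core, bases

def reconstruct(core, bases):
    """Expand the core tensor back to the original index ranges."""
    out = core
    for mode, u in enumerate(bases):
        out = mode_product(out, u, mode)
    return out

def static_cta(tensor, labels, ranks):
    """Decompose each cluster of mode-0 slices on its own (labels are given)."""
    parts = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        sub_ranks = (min(ranks[0], len(members)),) + tuple(ranks[1:])
        core, bases = truncated_hosvd(tensor[members], sub_ranks)
        parts[int(c)] = (members, core, bases)
    return parts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((6, 5, 4))
    core, bases = truncated_hosvd(data, (3, 2, 2))
    print(core.shape, np.linalg.norm(reconstruct(core, bases) - data))
```

When the slices within a cluster are similar, small per-cluster ranks can suffice, which is the motivation the paragraph above gives for clustering before decomposition.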
Based on CTA, we further develop K-CTA to classify the input tensor into 'overlapped' regions so that inter-cluster coherence can be exploited by mixing the decomposed results of more than one cluster. Since the proposed model bounds the maximum number of mixture clusters, K-CTA provides a sparse multi-linear representation in which the sparsity is totally under user control. This leads to predictable run-time reconstruction costs and an easy-to-optimize shader program on GPUs.
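The effect of bounding the number of mixed clusters can be illustrated with a simplified vector analogue in Python. In the sketch below, each cluster is represented by an orthonormal basis, and a data vector is approximated by greedily adding the contributions of at most k clusters. This is an illustrative stand-in chosen for brevity, not the K-CTA algorithm of Chapter 6; the function names and the greedy residual rule are assumptions.

```python
import numpy as np

def greedy_k_cluster_fit(x, cluster_bases, k):
    """Approximate x using contributions from at most k clusters.

    cluster_bases[c] is a matrix with orthonormal columns. At each step the
    unused cluster that explains the most residual energy is selected.
    """
    residual = np.array(x, dtype=float)
    chosen = []                      # (cluster index, coefficient vector)
    used = set()
    for _ in range(k):
        best, best_gain = None, 1e-12   # skip clusters that explain nothing
        for c, basis in enumerate(cluster_bases):
            if c in used:
                continue
            coeff = basis.T @ residual
            gain = float(coeff @ coeff)
            if gain > best_gain:
                best, best_gain = (c, coeff), gain
        if best is None:
            break
        used.add(best[0])
        chosen.append(best)
        residual = residual - cluster_bases[best[0]] @ best[1]
    return chosen, residual

def reconstruct_mixture(chosen, cluster_bases, dim):
    """Sum the selected clusters' contributions; cost grows with len(chosen)."""
    out = np.zeros(dim)
    for c, coeff in chosen:
        out += cluster_bases[c] @ coeff
    return out

if __name__ == "__main__":
    eye = np.eye(6)
    bases = [eye[:, 0:2], eye[:, 2:4], eye[:, 4:6]]
    x = np.array([3.0, 0.0, 0.0, 1.0, 0.0, 0.0])
    chosen, residual = greedy_k_cluster_fit(x, bases, 2)
    print([c for c, _ in chosen], np.linalg.norm(residual))
```

Because `reconstruct_mixture` never touches more than k clusters per item, its cost per item has a fixed upper bound, which mirrors the predictable reconstruction cost described above.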
1.3 Applications of Data-Driven Rendering
To demonstrate the effectiveness of our methods, extensive experiments have been conducted on various applications of data-driven rendering and visual data sets in computer graphics and vision. They are summarized as follows:
Illumination and Reflectance Functions
Lighting and shading effects are considered among the main contributing factors in human visual perception. A large number of attempts have been made to develop realistic illumination and reflectance models in computer graphics and vision. In our experiments, two common kinds of illumination and reflectance data, high dynamic range (HDR) environment maps [34, 36] and BRDFs, are modeled with univariate or multivariate SRBFs. One of the major advantages of SRBFs is their 'non-uniformity'. Specifically, SRBFs can be non-uniformly distributed based on the input observations to obtain a compact representation for the original function. This non-uniformity captures the irregularity and high-frequency signals of real-world illumination and reflectance functions, which are difficult to model efficiently using conventional data representations such as spherical harmonics and wavelets.
Furthermore, ray tracing is a promising framework for photo-realistic image synthesis. However, it frequently relies on time-consuming Monte Carlo sampling to simulate numerous illumination effects. We thus focus on the direct illumination computation in ray tracing, and demonstrate that the univariate SRBF representation for BRDFs can be seamlessly integrated with existing Monte Carlo sampling methods. The proposed importance-driven method exploits the non-uniformity of univariate SRBFs to efficiently generate sampling directions from the distributions of BRDFs.
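One way to picture sampling directions from a single Gaussian-shaped lobe on the sphere is to sample a von Mises-Fisher (vMF) distribution, whose density is proportional to exp(κ μ·ω). The Python sketch below uses the closed-form inverse CDF that exists on the 2-sphere. Treating an SRBF lobe as a vMF density is an assumption made for illustration, and the code is not the importance sampling algorithm of Chapter 7; function names are hypothetical.

```python
import numpy as np

def sample_vmf_s2(mu, kappa, n, rng):
    """Draw n unit vectors from a vMF distribution on the 2-sphere (kappa > 0)."""
    mu = np.array(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    u = rng.random(n)
    # Inverse CDF of w = omega . mu, whose density is proportional to exp(kappa * w).
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = 2.0 * np.pi * rng.random(n)
    s = np.sqrt(np.clip(1.0 - w * w, 0.0, None))
    local = np.stack([s * np.cos(phi), s * np.sin(phi), w], axis=1)
    # Orthonormal frame (t, b, mu) that maps the local z-axis onto mu.
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(helper, mu)
    t = t / np.linalg.norm(t)
    b = np.cross(mu, t)
    return local @ np.stack([t, b, mu])

def vmf_pdf_s2(omega, mu, kappa):
    """vMF density with respect to solid angle, written to avoid overflow."""
    norm = kappa / (2.0 * np.pi * (1.0 - np.exp(-2.0 * kappa)))
    return norm * np.exp(kappa * (np.dot(omega, mu) - 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([0.0, 0.0, 1.0])
    dirs = sample_vmf_s2(mu, 5.0, 4, rng)
    print(dirs)
    print([vmf_pdf_s2(d, mu, 5.0) for d in dirs])
```

In a Monte Carlo estimator each sampled direction would be weighted by the reciprocal of `vmf_pdf_s2`; a mixture of several lobes would first pick a lobe and then sample within it.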
Spatially-Varying Appearance Models
Meso-scale surface appearance is often difficult to capture faithfully using analytic models. The shape details and spatially-varying reflectance distributions owing to complex meso-structures may need millions of polygons to model. Traditional bidirectional texture functions (BTFs) [28] only consider meso-scale illumination effects, but completely ignore the micro-geometry of object surfaces. We thus introduce a new spatially-varying appearance model, named the view-dependent occlusion texture function (VOTF), to render view-dependent meso-scale occlusions from precomputed visibility information. The proposed VOTFs can be easily implemented on GPUs and further combined with existing appearance models for visualizing meso-scale silhouettes.
In addition, modeling spatially-varying surface appearance is a more challenging problem than approximating illumination and reflectance functions, since the large amount of appearance data frequently rules out sophisticated data representations and approximation algorithms. In this dissertation, we specifically demonstrate the experimental results of two different approximation approaches for spatially-varying appearance models: one is a non-parametric multi-linear model based on CTA or K-CTA; the other is a parametric representation based on multivariate SRBFs and optimized parameterization.
First, the multi-linear framework provides a computationally efficient approximation scheme for large-scale multi-dimensional appearance models. By organizing an appearance data set as a high-order tensor, both CTA and K-CTA allow a compact representation for meso-structures and efficient run-time reconstruction on GPUs with just a few low-order factors and reduced multi-dimensional core tensors. Second, by using the proposed multivariate SRBF representation, we show that BTFs can be accurately approximated and efficiently rendered on GPUs. A hierarchical fitting algorithm is developed to exploit the spatial coherence in BTFs and reduce the computational costs of non-linear optimization.
Radiance Transfer
Rendering global illumination effects is one of the most computationally expensive problems in computer graphics. The light transport is a highly coupled effect of scene geometry, lighting environments, and object materials, such that its real-time simulation is difficult to achieve.
Our experimental results demonstrate that, based on precomputed radiance transfer (PRT) [125, 161, 162] and bi-scale radiance transfer (BRT) [163], real-time global illumination effects for a static scene in dynamic lighting environments can be easily realized by combining the univariate SRBF representation and CTA (or K-CTA). Furthermore, all-frequency signals in the rendered images, such as sharp shadow boundaries and specular reflections, can also be well preserved at high compression ratios.
1.4 Contributions
In brief, this dissertation makes the following contributions to data representations and approximation algorithms for real-time data-driven rendering:
Data Representations
• A compact parametric representation for univariate spherical functions based on a linear combination of non-uniformly distributed univariate SRBFs is proposed. It particularly allows efficient spherical integrals and reconstruction on GPUs.
• A novel parametric representation based on a linear combination of multivariate SRBFs is introduced to efficiently model multivariate spherical functions that result from complex effects of various physical factors.
• A parameter transformation method based on parametric models is presented to automatically derive the optimal parameterization functions from a given visual data set. It can seamlessly cooperate with the proposed multivariate SRBF representation to improve approximation efficiency for multivariate spherical functions.
• The proposed univariate and multivariate SRBF representations as well as optimized parameterization are hardware-friendly and easy to implement on modern GPUs. Since they are all continuous functions, no additional interpolation or filtering techniques are required for smooth run-time rendering.
Approximation Algorithms
• A new algorithm for analyzing large-scale multi-dimensional visual data sets, namely CTA, is proposed. It iteratively re-classifies the input tensor into disjoint regions, so that the member data within each cluster can be more efficiently decomposed using traditional tensor approximation.
• A novel multi-linear model, which is called K-CTA, is introduced to permit accurate and compact approximation of large-scale multi-dimensional visual data sets. K-CTA not only extends CTA to exploit intra- and inter-cluster coherence at the same time, but also provides a sparse multi-linear representation in which the sparsity is totally under user control.
• Both CTA and K-CTA are efficient and GPU-friendly for run-time reconstruction. They can be further combined with other data representations and approximation algorithms to derive a more powerful model for real-time data-driven rendering.
Applications
• Novel approaches for modeling illumination and reflectance functions, including HDR environment maps and BRDFs, with non-uniform univariate or multivariate SRBFs are demonstrated. Although they are based on time-consuming non-linear optimization, GPUs can be employed to reduce the processing time from hours to minutes.
• The proposed univariate SRBF representation for reflectance functions is also promising for the importance sampling of direct illumination in ray tracing. By exploiting the rotational invariance of the univariate SRBF representation, the distribution of a BRDF can be efficiently analyzed to determine sampling directions for Monte Carlo integration.
• A new spatially-varying appearance model, namely the VOTF, is introduced to enable per-pixel rendering of view-dependent occlusions without actually modifying the geometric shape of objects at the meso-scale.
• A multi-linear framework for compressing spatially-varying appearance materials, such as BTFs and VOTFs, by using CTA or K-CTA is presented. It can be easily integrated into other data representations and approximation algorithms for efficient run-time rendering.
• A view-dependent signed-distance representation for VOTFs is introduced to preserve sharp features at the meso-scale even after approximation using multi-linear models.
• An extended texture synthesis method, based on the decomposed results of CTA or K-CTA, is demonstrated to account for the synthesis of view-dependent meso-structures over arbitrary surfaces.
• A hierarchical fitting algorithm for BTFs based on multivariate SRBFs and optimized parameterization is proposed to exploit the spatial coherence in visual data sets. It significantly accelerates the approximation process and is particularly suitable for multi-resolution data analysis and practical rendering applications owing to the inherent mipmap pyramid construction.
• A novel all-frequency PRT framework based on univariate SRBFs and CTA (or K-CTA) is introduced. It permits a compact data representation and real-time rendering of complex objects with global illumination effects in high-frequency lighting environments. The PRT application especially demonstrates the flexibility and potential of univariate SRBFs and CTA (or K-CTA). Since they do not conflict with each other, their combination can provide a more powerful analysis tool for visual data sets.
• An all-frequency BRT algorithm is proposed to allow real-time rendering of objects with spatially-varying surface appearance and various global illumination effects on GPUs.
1.5 Dissertation Organization
The remainder of this dissertation is organized as follows. Chapter 2 reviews the literature on data representations, dimensionality reduction, and data-driven rendering in computer graphics and vision. Next, Chapters 3 and 4 introduce two parametric data representations, univariate and multivariate spherical radial basis functions, to model spherical radiance and illumination information. Chapter 4 also describes the main concept of optimized parameterization that can be applied to improve approximation efficiency for modeling multivariate functions on the unit hyper-sphere. Chapters 5 and 6 then develop two novel dimensionality reduction algorithms, namely clustered tensor approximation and K-clustered tensor approximation, to compress large-scale multi-dimensional visual data sets.
In addition, applications and experimental results of the proposed methods are demonstrated in Chapters 7–9. We choose various representative applications and visual data sets in computer graphics and vision, including high dynamic range environment maps (Section 7.1), bidirectional reflectance distribution functions (Section 7.2), importance sampling of direct illumination in ray tracing (Section 7.3), bidirectional texture functions (Section 8.1), view-dependent occlusion texture functions (Section 8.2), precomputed radiance transfer (Section 9.1), and bi-scale radiance transfer (Section 9.2). Finally, Chapter 10 discusses the conclusions of this dissertation and sheds light on future research directions.
Chapter 2
Literature Review
2.1 Data Representations
The data representation is a fundamental element of an algorithm that cannot simply be ignored or undervalued. It significantly affects the way in which we handle problems. Since different representations have entirely different advantages and disadvantages, there is no general solution to all problems. To derive a satisfactory representation, we need to analyze and exploit the special attributes of the given problem. The following three sections thus briefly review contemporary functional models (Section 2.1.1), spherical radial basis functions (Section 2.1.2), and parameterization methods (Section 2.1.3) that have been widely adopted for data representations in computer graphics and vision.
2.1.1 Functional Models
Scientists frequently rely on finite representation sets to establish an effective basis for further data analysis or processing. In computer graphics and vision, common forms of the finite sets may include:
• Primitive elements: Points, piecewise elements, solid primitives, and so forth.
• Parametric splines: Parametric curves, surfaces, volumes, and so forth.
• Topological structures: Meshes, graphs, hierarchical structures, and so forth.
• Functional models: Harmonic functions, wavelets, radial basis functions, and so forth.
Functional models have recently received great attention due to their efficiency and flexibility. In this category, data sets are analyzed and transformed into the space spanned by a finite set of basis functions. The combination of these functions thus ‘implicitly’ describes or approximates the raw data. Since this function set is often compact and sparse, the computational complexity of data analysis can be greatly reduced. Furthermore, the flexibility to handle different problems with appropriate functions is also an appealing property. Applications of functional models in computer graphics and vision include image-based modeling and rendering [113, 119, 203, 213, 214], point-based graphics [122, 134, 136, 152], photo-realistic illumination computation [22, 49, 50, 125, 162], and multi-resolution techniques [39, 106, 168], to name a few.
Harmonic functions are important in the physics, applied mathematics, and engineering communities. In computer graphics and vision, one major harmonic function, spherical harmonics, has been employed to represent illumination and reflectance functions for efficient rendering, for example, scattering functions [74], reflectance distributions [143, 145, 146, 160, 201], and environment maps [144, 145]. Light transport data sets have also been modeled with spherical harmonics to render self-shadowing, self-inter-reflection, subsurface scattering, and caustics effects [99, 161, 162]. Similar to Fourier series, spherical harmonics form a complete set of orthonormal basis functions on the unit sphere, so that a square-integrable univariate spherical function can be decomposed as a linear combination of the spherical harmonic basis. In addition to spherical harmonics, Ren et al. [149] and Sloan et al. [164] further employed zonal harmonics, which are special cases of spherical harmonics, to account for the radiance transfer observations of dynamic scenes and deformable objects.
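For reference, the decomposition described above can be written in its standard textbook form. The symbols used here (f for the spherical function, Y_l^m for the spherical harmonic basis functions, f_l^m for the expansion coefficients, and L for the truncation order) are notation assumed for illustration and may differ from the notation adopted in later chapters.

```latex
f(\omega) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_l^m \, Y_l^m(\omega),
\qquad
f_l^m = \int_{S^2} f(\omega) \, \overline{Y_l^m(\omega)} \, d\omega,
\qquad
f(\omega) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} f_l^m \, Y_l^m(\omega).
```

Truncating the expansion at order L keeps (L+1)^2 coefficients, so a low-order expansion is compact but can only represent smooth, low-frequency variation over the sphere.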
Wavelets possess the characteristics of multi-resolution decompositions and hierarchical structures to provide an adaptive basis for data analysis. Although they originated in applied mathematics and signal processing, wavelets have had a great impact on several topics of computer graphics and vision [168], including image processing [12, 66], geometric modeling [39, 43, 106], global illumination [48], and so forth. As data-driven rendering becomes prevalent, wavelets have been applied to derive a compact representation for precomputed radiance transfer [125, 126, 170, 195, 196] and measured reflectance data sets [22, 87, 113].
Radial basis functions (RBFs) are also among the most popular basis functions for data representations, alongside harmonic functions and wavelets. An RBF is a kernel function that depends on the ‘distance’ with respect to its center. The simplicity and powerful capability of RBFs have led to many applications in computer graphics and vision, such as geometric modeling [17, 38, 182], point-based rendering [136, 152, 215], point-based animation [122, 134], and so forth. For data-driven rendering, spatially-varying reflectance functions [213, 214] and view-dependent light transport [49, 50] have also been approximated with RBFs.
In this dissertation, we seek to develop an appropriate data representation for modeling radiance and illumination functions. Since radiance data are highly related to directions on the unit hyper-sphere, it would be better to preserve their intrinsic nature in the spherical domain. Additionally, this representation should be compact, rotation-invariant, and efficient in computing the spherical integral of the rendering equation [73]. In contrast to previous approaches, we introduce univariate and multivariate SRBFs that satisfy the desired objectives.
2.1.2 Spherical Radial Basis Functions
Spherical radial basis functions (also called spherical basis functions in [123]) are special RBFs defined on the unit hyper-sphere. Their intrinsic nature in the spherical domain and other appealing properties, such as rotational invariance and positive definiteness, make them appropriate for modeling spherical functions without introducing any artificial boundaries or distortions. When combined with multi-resolution approaches, such as spherical wavelets [44, 123], SRBFs become a powerful mathematical tool for analyzing scattered data on the unit hyper-sphere, including information measured by satellites [41] and by observation stations across the entire globe [129].
In fact, computer graphics and vision researchers are not unfamiliar with SRBFs. The generalized cosine lobe [86] and the isotropic Gaussian kernel of the Ward model [198] are two special cases of SRBFs. In both articles, SRBFs were utilized to model bidirectional reflectance distribution functions (BRDFs), usually leading to a compact, expressive, and physically plausible representation for reflectance functions. Green et al. [49, 50] also employed spherical Gaussian functions, which can be viewed as a variant of SRBFs, to model all-frequency glossy and mirroring effects with self-occlusions.
In this dissertation, we apply traditional SRBFs, which are referred to as univariate SRBFs, to model radiance and illumination functions, and propose a novel category of SRBFs, namely multivariate SRBFs, to overcome the major drawbacks of univariate SRBFs. We also present practical fitting algorithms based on non-linear optimization to estimate the parameters of univariate/multivariate SRBF representations. Besides, it is worth noting that a special form of univariate SRBFs is highly related to the well-known von Mises-Fisher distribution [112] in statistics and various computer science applications, including document clustering [6], orientation distribution approximation [55, 117], and illumination estimation [56]. We will discuss this relation in more detail in Section 3.2.3.
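To make the idea of a univariate SRBF expansion concrete, the following is a minimal sketch that assumes a Gaussian-type kernel of the form exp(λ(ω·ξ − 1)), which is one common choice and not necessarily the exact kernel defined in Chapter 3. All function and parameter names are hypothetical. The sketch also evaluates the three-dimensional von Mises-Fisher density, so the proportionality between the two can be checked numerically.

```python
import math

def normalize(v):
    """Scale a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gaussian_srbf(direction, center, bandwidth):
    """Assumed Gaussian SRBF: depends only on the dot product of two unit
    vectors, so it is unchanged when both are rotated together."""
    return math.exp(bandwidth * (dot(direction, center) - 1.0))

def vmf_density(direction, center, kappa):
    """von Mises-Fisher density on the unit sphere in three dimensions."""
    norm = kappa / (4.0 * math.pi * math.sinh(kappa))
    return norm * math.exp(kappa * dot(direction, center))

def srbf_expansion(direction, centers, bandwidths, coefficients):
    """Linear combination of SRBFs with freely placed (non-uniform) centers."""
    return sum(
        c * gaussian_srbf(direction, xi, lam)
        for xi, lam, c in zip(centers, bandwidths, coefficients)
    )
```

With this kernel, the ratio gaussian_srbf / vmf_density is the same for every direction, namely 2π(1 − exp(−2κ))/κ when the bandwidth equals κ, which is the sense in which the two functions differ only by normalization.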
2.1.3 Parameterization
For data representations, identifying or deciding the parameters of a model, which is often called parameterization, is an essential pre-process. The identified parameters may be of particular interest for a specific application, or define a proper coordinate system for the given observations. An intuitive parameterization approach is to employ the same coordinate system as during data acquisition, where each parameter corresponds to a single acquisition condition. Nevertheless, there are two problems with this approach. First, we may not be able to identify and control all the conditions that influence the measured data. Second, even if all of them are controllable during acquisition, they may not form an effective coordinate system for the observations. A common solution to the above issues is to transform the acquisition conditions into another set of more powerful parameters. The overall effect is thus equivalent to representing the observations with a suitable coordinate system.
In computer graphics and vision, half-way [151] and reflected vector [145] parameterizations for BRDFs have been shown to be effective in modeling highly specular materials. Stark et al. [167] also proposed several physically-interpretable methods for isotropic BRDFs based on coordinate systems with triangular supports. In recent years, various heuristic parameterizations have been applied to approximate spatially-varying surface appearance [90, 213, 214] and radiance transfer functions [49], further demonstrating the promising potential of parameterization. In general, parameterization helps reduce the dimensionality of reflectance functions, which leads to a compact and low-dimensional representation for surface appearance. It also greatly increases data coherence that can be subsequently exploited by different approximation algorithms.
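As an illustration of why such transformations help, the sketch below computes the half-way vector and the reflected vector for a light/view pair using their standard definitions. The function names are hypothetical, and the code is not taken from the cited works. For a mirror-like configuration, the half-way vector coincides with the surface normal regardless of the incident angle, so a specular peak that moves around in the original (light, view) coordinates stays at a fixed location in the new ones.

```python
import math

def normalize(v):
    """Scale a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def halfway_vector(light, view):
    """Unit vector halfway between the light and view directions.
    Undefined when the two directions are exactly opposite."""
    return normalize(tuple(l + v for l, v in zip(light, view)))

def reflected_vector(view, normal):
    """Mirror reflection of the view direction about the surface normal."""
    d = dot(view, normal)
    return tuple(2.0 * d * n - v for n, v in zip(normal, view))

# Mirror configuration around the normal (0, 0, 1) at incident angle theta.
theta = 0.7
light = (math.sin(theta), 0.0, math.cos(theta))
view = (-math.sin(theta), 0.0, math.cos(theta))
h = halfway_vector(light, view)                   # aligned with the normal
r = reflected_vector(view, (0.0, 0.0, 1.0))       # aligned with the light
```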
Nevertheless, previous parameterization approaches are limited to fixed or heuristic transformation functions. Without intensive experiments, it is difficult to predict which parameter transformation performs the best for the given data sets. By contrast, we propose to learn the optimal parameterization functions from data, within restricted functional forms, by introducing additional parameters in the proposed multivariate SRBF representation. The resulting model thus successfully combines parameterization and multivariate SRBF approximation in a unified framework to fill the gap between these two problems that have been solved separately in conventional data representations.
2.2 Dimensionality Reduction
The ‘curse’ of high dimensionality has long driven advances in dimensionality reduction techniques. Scientists generally assume that low-dimensional manifolds are embedded in their high-dimensional observations, and could be estimated by linear or non-linear transformations. The result of this estimation should preserve the structures of the embedded low-dimensional manifolds, and therefore implies an appropriate approximation (or representation) of the high-dimensional observations. In the following sections, we briefly review three categories of dimensionality reduction algorithms: linear (Section 2.2.1), non-linear (Section 2.2.2), and multi-linear (Section 2.2.3) models, and summarize their applications in computer graphics and vision.
2.2.1 Linear Models
Dimensionality reduction algorithms have been widely adopted to analyze and compress various types of data sets. Perhaps the most popular approach is principal component analysis (PCA) [72], which is a linear model and often computed using singular value decomposition (SVD). In PCA, data samples are transformed from a high-dimensional space into another low-dimensional sub-space spanned by only a few principal components (PCs). The original samples thus can be approximated by projecting them onto these PCs. Variants of PCA have also been proposed to incorporate special considerations in practical applications, including non-negative matrix factorization [93, 94, 132] and non-negative sparse PCA [63, 210]. Nevertheless, a major drawback of PCA, together with its variants, is that observations must be re-arranged into a standard two-mode matrix before analysis. In real cases, data are frequently sampled under different conditions, and can be naturally classified into more than two modes. The original structures and important information of data sets may be lost after the re-arrangement.
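The PCA procedure described above can be sketched with NumPy as follows. This is a minimal illustration, not code from the cited works: the function names and the one-sample-per-row layout are assumptions made here.

```python
import numpy as np

def pca_svd(samples, num_components):
    """PCA of a (num_samples, num_features) matrix computed with an SVD.

    Returns the sample mean, the leading principal components (one per row),
    and the projection coefficients of every sample.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_components]
    coefficients = centered @ components.T
    return mean, components, coefficients

def pca_reconstruct(mean, components, coefficients):
    """Approximate the original samples from their low-dimensional coefficients."""
    return mean + coefficients @ components
```

A data set with more than two modes, for example an array indexed by texel, view direction, and light direction, has to be flattened first (for instance with data.reshape(num_texels, -1), where num_texels is a hypothetical name). That flattening is the re-arrangement into a two-mode matrix mentioned in the paragraph above.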
In addition to traditional PCA, some linear models, such as probabilistic PCA [174, 175] and Bayesian PCA [13], derive PCs and projection coefficients in the linear sub-space of observations from a probabilistic standpoint. These linear models thus can be made more tolerant to noise and outliers than conventional PCA, and easily integrated with other probabilistic methods, for example, the mixture of Gaussians [13, 174]. There are also other popular linear models, for instance, factor analysis [8], classical multi-dimensional scaling [26], and independent component analysis [64], to name a few. Comprehensive surveys of linear dimensionality reduction algorithms can be found in [7] and references therein.
In general, linear models are computationally efficient and easy to implement, but they are inadequate for analyzing data sets with non-linear structures. Due to the discrepancy between the model assumption and the intrinsic non-linear nature of some real-world data sets, we may obtain misleading results from linear analysis. In computer graphics and vision, applications of linear models include lighting information approximation [20, 67, 99, 113, 114], spatially-varying appearance models [153, 194], texture synthesis [96, 97], and motion tracking or estimation [104, 127].
2.2.2 Non-Linear Models
Apart from linear models, numerous dimensionality reduction algorithms have been proposed to explore the non-linear correlations among data. Although the structures of real-world observations are complicated and globally non-linear, local PCA [75] assumed that local correlations are approximately linear, so that the observations could be ‘locally’ transformed into low-dimensional linear sub-spaces without a significant loss of information. Locally linear embedding [150, 154] and Laplacian eigenmaps [9] aimed at preserving the local geometry of the embedded manifolds by transforming nearby data samples to nearby points in the low-dimensional sub-space. Isomap [173] instead attempted to preserve both local and global structures of the embedded manifolds, in terms of geodesic distance, when transforming the observations into the low-dimensional sub-space. Moreover, kernel PCA [155] took higher-order statistics among data samples into account by performing PCA in the reproducing kernel Hilbert space. If a proper set of kernel functions is employed, the non-linear structures of the embedded manifolds in the original space may be mapped into linear structures in the reproducing kernel Hilbert space. Generalized PCA [188] further extended this concept by estimating an unknown number of sub-spaces from the observations, where the identified sub-spaces could be modeled with a set of homogeneous polynomials whose degree is the number of sub-spaces. It thus overcomes one of the major drawbacks of kernel PCA in which learning appropriate kernels is still an open problem.
There are also other non-linear models, for example, auto-associative neural networks [82], generative topographic mapping [14], self-organizing maps [80], permuted PCA [68], and the Gaussian process latent variable model [92], to name a few. Their detailed descriptions and comparisons are beyond the scope of this dissertation. Interested readers may refer to [95] for a comprehensive review of non-linear dimensionality reduction algorithms.
Although non-linear models allow complex analysis on observations, not all of them are generative. Data reconstruction may not be possible from the derived results. This frequently prevents some of them from being applied in practice to data-driven rendering. Additionally, non-linear models are computationally more expensive than linear models, and sometimes may be intractable for large-scale data sets. Despite these disadvantages, applications of non-linear models are still prevalent in computer graphics and vision, for instance, precomputed radiance transfer [161], texture synthesis [97], and material modeling [113, 114, 193].
2.2.3 Multi-Linear Models
In recent years, multi-linear models (also called tensor approximation or multi-way analysis) [33, 165] have become widespread and attracted considerable attention. They can be regarded as a generalization of PCA, where data samples are processed in their intrinsic form as a multi-dimensional array, and separate reduction is allowed along each dimension. Unlike linear and non-linear models, tensor approximation relies on decomposing a high-dimensional space into multiple low-dimensional sub-spaces that are respectively associated with each mode of observations to mitigate the curse of dimensionality. The extracted low-order factors in each sub-space then can be combined to effectively model the original high-dimensional space. In this way, multi-linear models successfully preserve the intrinsic structures and important information of observations, and thus overcome one of the main disadvantages of previous dimensionality reduction techniques. Two primary categories of traditional multi-linear models are Tucker models [180, 181] and parallel factor analysis [57] (or canonical decomposition [18]). Both of them play important roles in chemometrics and psychometrics, and have recently become prevalent in signal processing [78] and computer science [1].
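The per-mode reduction of a Tucker model can be sketched with a truncated higher-order SVD, shown below in NumPy. This is a generic textbook construction with hypothetical function names, not the clustered algorithms proposed in this dissertation. Each mode is unfolded into a matrix, its leading left singular vectors become that mode's factor matrix, and the core tensor is obtained by projecting the data onto all factors.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: fibers along `mode` become the columns of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the mode to the last axis
    result = moved @ matrix.T               # last axis now has length J
    return np.moveaxis(result, -1, mode)

def truncated_hosvd(tensor, ranks):
    """Tucker factors from per-mode SVDs, truncated to the requested ranks."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each sub-space
    return core, factors

def tucker_reconstruct(core, factors):
    """Expand the core tensor back to the original size, one mode at a time."""
    tensor = core
    for mode, u in enumerate(factors):
        tensor = mode_product(tensor, u, mode)
    return tensor
```

Truncation by per-mode SVDs is simple but in general does not give the best least-squares approximation for the chosen ranks; it is commonly used as the starting point for iterative refinement. The storage cost is the size of the core plus the sizes of the factor matrices, which is where the compression comes from when the ranks are much smaller than the original mode sizes.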
In computer graphics and vision, multi-linear models have also been successfully extended [159, 204, 205, 207] and applied to various applications, such as data-driven rendering [171, 185, 192], human facial processing [183, 184, 189, 190, 191], and image analysis [59, 60]. Some matrix factorization methods [90, 124, 172, 208, 209] are even implicitly related to multi-linear models. Vasilescu and Terzopoulos [185] organized a bidirectional texture function as a high-order tensor and applied tensor approximation to decompose it. Wang et al. [192] introduced an out-of-core and block-wise tensor approximation technique based on N-mode SVD [33]. Xu et al. [205] instead focused on the least-squares reconstruction, from the same sub-spaces, for a set of objects represented as high-order tensors. Shashua and Hazan [159] extended non-negative matrix factorization to derive positive and sparse factors from a general multi-dimensional array. Sun et al. [171] proposed a tensor representation to model the light transport of inter-reflections for dynamic BRDFs. Yan et al. [207] suggested re-arranging each element of a tensor so that data coherence is maximized for tensor-based subspace learning. Furthermore, Wu et al. [204] presented a hierarchical tensor decomposition model to expose the multi-scale structures among multi-dimensional visual data sets. This successfully incorporated multi-resolution analysis into the existing framework of tensor approximation.
Although multi-linear models allow higher compression ratios than PCA, directly applying them to data-driven rendering would be inadequate for real-time applications. Even after applying N-mode SVD [33] to derive an optimal approximation of the input tensor, the amount of compressed data is still cumbersome. It is also difficult to achieve real-time performance for run-time rendering or analysis in computer graphics and vision applications. In this dissertation, we therefore propose two novel multi-linear models, clustered tensor approximation and K-clustered tensor approximation, to solve these problems. The proposed methods rely on combining the merits of clustering and tensor approximation to form new mathematical tools for data analysis, and can be regarded as integrating the concept of some non-linear models into multi-linear models.
2.3 Data-Driven Rendering
Rendering from data has a long history in computer graphics and vision. Perhaps the best-known method is texture mapping. Recently, data-driven rendering approaches have enhanced and extended this concept to synthesize photo-realistic images from large-scale measured or precomputed visual data sets. In this way, the output quality and rendering performance of data-driven rendering are independent of scene complexity, but depend on the acquisition, encoding, and decoding schemes of the pre-sampled data. Numerous variants and extensions of data-driven rendering have been proposed over the last decade. The light fields [101] and the lumigraphs [47] employed densely-sampled images to render novel views from arbitrary camera positions without additional information. Shade et al. [158] proposed to synthesize the requested view from a layered depth image set that stores depth information of a scene. Dana et al. [28] proposed bidirectional texture functions (BTFs) that combine BRDFs and texture maps to account for the illumination- and view-dependent variations in texels. The progress of sensor technologies has also stimulated the development of many data-driven rendering techniques, such as material modeling [21, 113, 114, 135], time-varying appearance [51, 58, 169, 193], and so forth.
Two primary challenges of data-driven rendering are the accurate measurement of natural phenomena and the efficient manipulation of observations. Nowadays, the acquisition, representation, and compression of visual data sets for complicated phenomena have become even more difficult to handle. The purpose of this dissertation is to overcome the latter challenge of data-driven rendering by applying effective data representations and powerful approximation algorithms to measured observations. Although there are numerous related articles, we only briefly review spatially-varying appearance models (Section 2.3.1) and radiance transfer (Section 2.3.2) techniques that are directly related to this dissertation.
2.3.1 Spatially-Varying Appearance Models
Appearance Models
For detailed surface appearance, BTFs [28] extended texture maps and BRDFs to describe spatially-varying local reflectance distributions owing to micro-geometry that may need millions of polygons to model. Thus, a BTF is a six-dimensional function that can be regarded as a two-dimensional texture in which each texel records the exitant radiance of different view directions with respect to incident illumination variations.¹ Koudelka et al. [81] and Sattler et al. [153] also discussed the acquisition, compression, and rendering issues of BTFs and made their measured BTF databases publicly available. Although BTFs effectively model complex meso-scale reflectance behaviors with multiple images, they are still restricted to local illumination responses.
Apart from BTFs, various appearance models have also been developed to render shadows and complex illumination effects from precomputed meso-scale data or special fields, such as visibility information [30, 61], view-dependent displacement mapping [194, 197], shell texture functions [21], and relief mapping [130, 138, 139]. However, even if these methods permit efficient representations for complicated surface appearance and meso-structures, inter-reflections cannot be rendered at real-time rates.
Approximation Algorithms
For the approximation of appearance models, we summarize three main categories of algo- rithms: functional linear models, non-parametric models, and probabilistic models.
¹ For a single texel, the actual intersection point on meso-structures may differ for each view direction.
Functional linear models are among the most popular representations for data-driven rendering. Their main concept is to approximate a complex appearance function as a linear combination of simple basis functions. In this category, the choice of an appropriate basis is one of the major research issues, since it significantly influences approximation efficiency and quality. Previous articles have proposed to model various appearance functions using parametric kernels [46, 86, 115, 198], polynomials [111, 118], radial basis functions [213, 214], spherical harmonics [145, 163], and wavelets [107, 156]. For all-frequency appearance data sets, parametric kernels and radial basis functions generally provide the best trade-off between rendering performance and image quality. Nevertheless, they usually rely on time-consuming non-linear optimization to derive model parameters, and thus are impractical for approximating spatially-varying materials.
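As a minimal illustration of the linear-combination idea, the following Python sketch fits the coefficients of a fixed basis to sampled function values by least squares. The quadratic polynomial basis, the sample data, and the helper names `fit_linear_model` and `evaluate` are assumptions chosen for illustration; they are not taken from any of the cited methods.

```python
def fit_linear_model(basis, xs, ys):
    """Return coefficients c minimizing sum_i (ys[i] - sum_k c[k] * basis[k](xs[i]))**2."""
    n = len(basis)
    # Design matrix: A[i][k] = basis[k](xs[i]).
    A = [[b(x) for b in basis] for x in xs]
    # Normal equations (A^T A) c = A^T y, stored as an augmented matrix.
    M = [
        [sum(row[j] * row[k] for row in A) for k in range(n)]
        + [sum(row[j] * y for row, y in zip(A, ys))]
        for j in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    return [M[j][n] / M[j][j] for j in range(n)]


def evaluate(basis, coeffs, x):
    """Evaluate the linear combination sum_k coeffs[k] * basis[k](x)."""
    return sum(c * b(x) for c, b in zip(coeffs, basis))


# Hypothetical example: a quadratic polynomial basis and noise-free samples.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [i / 10.0 for i in range(11)]
ys = [1.0 + 2.0 * x - 3.0 * x * x for x in xs]
coeffs = fit_linear_model(basis, xs, ys)
```

With a fixed basis, as here, the coefficient fit is a linear problem. The non-linear optimization mentioned above arises when the parameters of the basis functions themselves, such as kernel centers or widths, must also be estimated.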
Non-parametric models can be regarded as functional models that do not have pre-defined forms of basis functions. In this category, an appropriate basis is learnt from appearance data sets, rather than derived from prior information specified by researchers. The most popular approaches in computer graphics and vision include clustering and dimensionality reduction techniques, such as variants of principal component analysis [27, 77, 81, 121, 153], matrix factorization [89, 90, 116, 135, 172], tensor approximation [185, 192], and vector quantization [42, 100]. Although non-parametric models are entirely data-dependent methods that provide accurate and flexible representations for appearance functions, the amount of compressed data and run-time rendering costs are still high when compared to other categories of approximation algorithms. Additionally, special interpolation or estimation techniques are frequently required to synthesize surface appearance from novel illumination and view directions.
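To make the notion of a basis learnt from data concrete, the sketch below estimates the first principal component of a small set of sample vectors by power iteration and stores each sample as a single coefficient along that direction. It is only a toy instance of principal component analysis under stated assumptions: the three-dimensional samples and the function names `first_principal_component`, `compress`, and `reconstruct` are hypothetical, and real appearance data would need many more components.

```python
import math


def first_principal_component(samples, iterations=50):
    """Return (mean, axis), where axis is the dominant direction of the centered samples."""
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    centered = [[s[d] - mean[d] for d in range(dim)] for s in samples]
    axis = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit-length starting vector
    for _ in range(iterations):
        # Apply the scatter matrix X^T X to axis without forming the matrix.
        proj = [sum(x[d] * axis[d] for d in range(dim)) for x in centered]
        w = [sum(p * x[d] for p, x in zip(proj, centered)) for d in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # degenerate data: no variance along the current direction
        axis = [c / norm for c in w]
    return mean, axis


def compress(sample, mean, axis):
    """Project a sample onto the learnt axis, yielding one coefficient."""
    return sum((s - m) * a for s, m, a in zip(sample, mean, axis))


def reconstruct(coeff, mean, axis):
    """Rebuild an approximate sample from its coefficient."""
    return [m + coeff * a for m, a in zip(mean, axis)]


# Hypothetical data: samples lying exactly on a line in 3D.
samples = [[1.0 + 0.6 * t, 2.0 + 0.8 * t, 3.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean, axis = first_principal_component(samples)
coeffs = [compress(s, mean, axis) for s in samples]
max_error = max(
    abs(r - x)
    for s, c in zip(samples, coeffs)
    for r, x in zip(reconstruct(c, mean, axis), s)
)
```

Because the toy samples lie on a line, one learnt direction reconstructs them almost exactly. Measured data rarely has such low rank, which is consistent with the storage and run-time costs noted above.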
In probabilistic models, spatial correlations in appearance data are described with probability density functions, so that similar materials can be synthesized from the estimated parameters of distributions and noise maps [52, 54]. Recently, Haindl and Filip [53] proposed a multi-scale probabilistic BTF model based on the causal autoregressive random field, and further combined it with range maps to enhance the surface roughness of rendered objects. Although probabilistic models can achieve very high compression ratios, their main goal is efficient and seamless appearance synthesis, not an optimal reconstruction of the original appearance data. Additionally, the run-time rendering process is rather slow and currently not GPU-friendly.
2.3.2 Radiance Transfer
Precomputed Radiance Transfer
Recently, precomputed radiance transfer (PRT) has received growing interest owing to its ability to render complex illumination and shadowing effects, such as self-inter-reflections, sub-surface scattering, caustics, and self-shadows, in dynamic lighting environments at real-time rates. The key concept is to precompute and model light scattering between an object and its surroundings by representing both incident radiance and light transport functions in the spherical harmonic basis. Thus, run-time rendering of exitant radiance can be reduced to a simple dot product for a diffuse object, or a matrix-vector multiplication for a glossy one. Since the spherical harmonic basis is inadequate to approximate high-frequency signals, PRT methods based on spherical harmonics are also called low-frequency PRT [99, 161, 162].
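The run-time step described above can be sketched in a few lines of Python. The function names `shade_diffuse` and `transfer_glossy` and the four-coefficient vectors are made-up placeholders for illustration; in an actual PRT system the transfer data would come from the precomputation stage and the lighting coefficients from projecting the environment onto the spherical harmonic basis.

```python
def shade_diffuse(transfer_vector, light_coeffs):
    """Diffuse case: exitant radiance is a dot product of two coefficient vectors."""
    return sum(t * l for t, l in zip(transfer_vector, light_coeffs))


def transfer_glossy(transfer_matrix, light_coeffs):
    """Glossy case: a matrix-vector product yields transferred radiance coefficients."""
    return [sum(t * l for t, l in zip(row, light_coeffs)) for row in transfer_matrix]


# Hypothetical per-vertex data with four coefficients (values are placeholders).
light_coeffs = [1.0, 0.5, 0.0, -0.5]
transfer_vector = [0.8, 0.2, 0.1, 0.0]
transfer_matrix = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]

radiance = shade_diffuse(transfer_vector, light_coeffs)
transferred = transfer_glossy(transfer_matrix, light_coeffs)
```

In a full glossy pipeline the transferred coefficients would typically still need to be evaluated for the current view direction; that step is omitted from this sketch.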
Beyond low-frequency radiance transfer, the all-frequency PRT methods [105, 125, 126, 195, 196] pre-sampled high-resolution light transport data to accurately capture hard and soft shadows in all-frequency lighting environments. The densely-sampled PRT data were then compressed using sophisticated compression techniques, for instance, non-linear wavelet approxi-