Iterative K-clustered tensor approximation

Procedure: IterativeK-CTA(A, {R_n}^N_n=1, C, K_m)

Input: An N -th order tensor A ∈ R^I¹^×I²^×···×I^N, a set of reduced ranks {R_n}^N_n=1, the number of clusters C for the clustered mode m, and the number of mixture clusters Km.

Output: The core tensor and basis matrices of each clustern

Z_c,U_n,c N Initialize each entry of Um,cto zero

end

for i ← 1 to I_mdo // Greedy search

Obtain the first mixture cluster ci1 of Ahmiiby solving Equation 6.2 Update Um,c_i1

i∗as Equation 6.3 for k ← 2 to Kmdo

Obtain the k-th mixture cluster c_i_k of Ahm_iiby solving Equation 6.5 Update

j=1as Equation 6.6 // Optimal projection end

end

for c ← 1 to C do // Post-processing

Decompose U_m,c to obtain U⁰_m,c and W_c(Equation 6.10) U_m,c ← U⁰_m,c

Z_c← Z_c×_mW_c end

// Update stage for c ← 1 to C do

Compute R_cas Equation 6.12

R⁰_c ← R_c×_mM^T_c (refer to Equations 5.9 and 5.10 for M_c)

Then, the corresponding mixing coefficients for the first mixture cluster of a mode-m sub-tensor are given by the following corollary:

Corollary 6.1: The mixing coefficients for the first mixture cluster ci1 of the i-th mode-m sub-tensorA_hm_i_iare computed as

U_m,c_i1

where Z_c_i1 is a diagonal matrix whose j-th diagonal entry is the `² norm of the j-th row of ufm(Z_c_i1).

Interested readers may refer to Section 6.4.1 for the mathematical proof of Corollary 6.1.

From Theorem 5.3, we identify that correlated mode-m sub-tensors will likely be classified into the same cluster. Corollary 6.1 further indicates that the mixing coefficients for the first mixture cluster of the i-th mode-m sub-tensor is the projection coefficients of Ahmiionto the sub-spaces of cluster ci1. Equation 6.2 also implies that if the total number of mixture clusters is set to one, K-CTA will be identical to CTA in the clustering stage.

Remaining mixture clusters: After resolving the first mixture cluster of a mode-m sub-tensor, its remaining mixture clusters are then iteratively derived, one cluster at each iteration, from the results of previous iteration. We thus propose the following theorem to settle the k-th mixture cluster of a mode-m sub-tensor:

Theorem 6.2: The k-th mixture cluster c_i_k of thei-th mode-m sub-tensor Ahm_ii is resolved from the mixing coefficients of the previously selectedk−1 mixture clusters c_i₁, c_i₂, . . . , c_i_k−1 by minimizing the approximation error of the residual sub-tensor

R^(k)_hm

which is equivalent to solving the following constrained integer optimization problem:

maxc_ik

The mathematical proof of Theorem 6.2 can be found in Section 6.4.1. For each mode-m sub-tensor, the first term of the objective function in Equation 6.5 is actually the same as the

objective function in Equation 6.2, and the second term instead penalizes a cluster whose basis matrices are correlated to those of previously selected mixture clusters. Therefore, Theorem 6.2 implies that the k-th mixture cluster of a mode-m sub-tensor is determined by maximizing intra-cluster correlations and minimizing inter-intra-cluster correlations at the same time. This interesting result is similar to the optimized orthogonal matching pursuit approach [148], where the k-th atom is resolved by simultaneously minimizing its linear dependence with previously selected atoms and maximizing the projected norm of the residual.

Moreover, it is obvious that Equation 6.5 can be computed in the reduced tensor space¹ to significantly decrease computational costs. The first term of the objective function in Equa-tion 6.5 is the projecEqua-tion coefficients of Ahm_ii onto the basis matrices and the dual mode-m basis matrix of cluster c_i_k. It should be already computed when resolving the first mixture cluster of Ahmiiand remains unchanged during the whole clustering stage. The second term in-stead can be interpreted as transforming the projected Ahmiiin the sub-spaces of cluster cij, for j = 1, 2, . . . , k − 1, to the sub-spaces of cluster c_i_k, followed by the multiplication with V_m,c^T

to obtain projection coefficients. As a result, we can avoid computing the residual sub-tensor in the original tensor space (Equation 6.4), which needs to first reconstruct the corresponding mode-m sub-tensor from the results of previous iteration.

Optimal Projection

Since the spaces of different clusters may be correlated, each time when a mode-m sub-tensor joins a new mixture cluster, its mixing coefficients for previously selected mixture clus-ters should be updated to account for the change in the cluster membership. This guarantees an optimal projection of the mode-m sub-tensor onto the sub-spaces of all selected mixture clusters. We therefore introduce the following theorem to update the mixing coefficients of a mode-m sub-tensor:

Theorem 6.3: The mixing coefficients for the k selected mixture clusters ci1, c_i₂, . . . , c_i_k of thei-th mode-m sub-tensor A_hm_i_iare given by

u^(k)_m_i = Z^(k)+_m_i a^(k)_m_i, (6.6) where the superscript ’+’ specifies the Moore-Penrose pseudo-inverse of a matrix, and

u^(k)_m

i =h

U_m,c_i1

i∗ · · · U_m,c_ik

i∗

, (6.7)

In this dissertation, the reduced tensor space is referred to as the union of all decomposed cluster sub-spaces,

whose dimensionality is frequently much less than the original tensor space.

Z^(k)_m_i = Interested readers may refer to Section 6.4.2 for the mathematical proof of Theorem 6.3.

Intriguingly, Equation 6.6 resembles the least-squares solution to the projection coefficients of an observation onto a set of basis vectors, where Z^(k)mi can be regarded as the Gram matrix that accounts for the correlations between all available cluster sub-spaces. Note that the proposed K-CTA algorithm is indeed efficient since K_m is usually a small positive integer. Moreover, all of the operations during the clustering stage are performed in the reduced tensor space, and most of them only need to be computed once at the beginning of this stage.

Post-Processing

To satisfy the orthonormal constraints on basis matrices in Equation 6.1, we additionally de-compose the mode-m basis matrix of each cluster using principal component analysis to obtain

Um,c = U⁰_m,cWc, (6.10)

where U⁰_m,c ∈ R^I^m^×R^mis a basis matrix whose columns are orthonormal principal components of Um,c, and the columns of Wc∈ R^R^m^×R^mcontain the projection coefficients of each column of Um,c. After that, we replace the mode-m basis matrix of cluster c with U⁰_m,cand re-compute the core tensor of cluster c by the mode-m product Zc×_mW_c, so that the value of the objective function in Equation 6.1 is unchanged.

6.2.3 Update Stage

In this stage, the core tensor and basis matrices of each cluster are updated from the results of the clustering stage. While this problem for CTA can be easily solved by simultaneously applying tensor approximation to the member sub-tensors of each cluster, sub-space learning for all clusters at the same time would be difficult for K-CTA, since each mode-m sub-tensor now belongs to K_mdifferent clusters that may be correlated with each other. We thus alternately decompose one cluster at a time by tensor approximation, while fixing the results of other

clusters. To simplify the proposed algorithm, the cluster membership of each mode-m sub-tensor is not altered in this stage, leaving it to be updated only in the clustering stage.

At the c-th iteration, the core tensor Zcand basis matricesUn,c

n=1of cluster c are allowed to change, while those of other clusters are fixed. Equation 6.1 thus can be rewritten as the following constrained least-squares optimization problem:

min

where Rcis the residual tensor at the c-th iteration defined as

R_c= A −

By comparing the objective function in Equation 6.11 to Equations 4.1 and 4.2 in [33], we know that Equation 6.11 can be solved with the aid of tensor decomposition, and Um,c will be the decomposed mode-m basis matrix of cluster c. To enforce the sparse constraints on Um,c, the key idea is to only include cluster members in the tensor decomposition process and just update the non-zero entries of U_m,c, which is based on the same concept in the update stage of CTA (Section 5.3.3) and the codebook update stage of K-SVD [3, 4]. As a result, the membership of cluster c is fixed, and the zero entries of U_m,c remain zeros. However, the non-zero entries of U_m,c are allowed to change with the decomposed results of cluster c at the same time.

6.3 Implementation Issues

In this section, we discuss some practical issues of K-CTA, such as the initial guess of the core tensor and basis matrices of each cluster (Section 6.3.1) and the degeneracy and conver-gence problems of K-CTA (Section 6.3.2). We omit a detailed description of the extraction of global basis matrices, since they can be easily derived based on the same technique described in Section 5.4.2.

6.3.1 Initial Guess

Similar to CTA, one practical issue of K-CTA is the initial guess of the core tensor and basis matrices of each cluster. A simple and heuristic solution is to employ the decomposed results of

hard clustering. We can execute static CTA to obtain the initial core tensor and basis matrices of each cluster for further data analysis using K-CTA. Various general soft clustering techniques for determining the initial membership of each mode-m sub-tensor, such as mixture models [6, 37, 174] and sparse representation [3, 4, 177], also provide desirable solutions.

Interestingly, while finding a favorable initial guess is a significant issue for CTA and other iterative algorithms, K-CTA is less sensitive to the quality of an initial solution. As the num-ber of mixture clusters increases, the importance of an appropriate initial guess for K-CTA de-creases. This result is not surprising since the impact of inappropriate initial cluster membership can be compensated by additional mixture clusters. The compensation also can be regarded as an optimal interpolation scheme from other cluster sub-spaces in the least-squares sense, which particularly allows smooth transitions across different physical conditions in a practical appli-cation. In Section 9.2, we will further demonstrate the influence of this characteristic of K-CTA in the experimental results of bi-scale radiance transfer.

6.3.2 Degeneracy and Convergence

When the total number of clusters is large, we have observed that sometimes all the entries in a column of a mode-m basis matrix, say U_m,c, may become zeros at the end of the clustering stage, which implies that no mode-m sub-tensors are classified into cluster c. Although this degeneracy problem does not occur frequently in practice, an empty cluster actually consumes memory space without any contributions to final approximation errors². In the worst case, K-CTA may even be trapped in a poor locally optimal solution when there are too many empty clusters, as if the total number of clusters were set to a lower value. Our solution to this issue is to split the cluster with the largest total sum of approximation errors in half by sorting the approximation error of each mode-m sub-tensor within this cluster. If there are more than one empty cluster, we can sequentially split non-empty clusters using the above method until each empty cluster has been assigned at least one mode-m sub-tensor.

Another important question is whether the proposed iterative K-CTA algorithm always verges to a local optimum. Apparently, the answer is no. Similar to K-SVD [3, 4], the con-vergence of iterative K-CTA is not guaranteed, since only the approximate mixing coefficients (and also the cluster membership) of each mode-m sub-tensor are derived in the clustering stage.

Therefore, the total sum of approximation errors is not guaranteed to decrease when compared to the decomposed results of previous iteration. Fortunately, although iterative K-CTA theoreti-cally does not ensure convergence, we have found that it practitheoreti-cally converges within just a few iterations in the experiments. We currently have no idea about how to efficiently update mixing coefficients and cluster sub-spaces so that approximation errors are always improved. However,

When encountering the degeneracy problem, we can save sparse mode-m basis matrices in the compressed

column storage format [5] without memory space overhead. Nevertheless, it is better to employ the compressed

row storage format [5], so that the mixing coefficients of a mode-m sub-tensor are grouped together to reduce the

cache miss rates of CPUs (or GPUs) for efficient run-time reconstruction.

a simple and intuitive technique can be applied to prevent bad results due to the divergence problem. At the end of the update stage, if the total sum of approximation errors increases, we instead restore the decomposed results of previous iteration. As a result, the approxima-tion errors of an input tensor will never increase, and the convergence criterion of K-CTA in Algorithm 6.1 can be always reached.

6.4 Mathematical Proofs

In the following two sections, we present the mathematical proofs of the proposed greedy search (Section 6.4.1) and optimal projection (Section 6.4.2) methods in the clustering stage of K-CTA, including Corollary 6.1 for the first mixture cluster as well as Theorems 6.2 and 6.3 for the remaining mixture clusters.

6.4.1 Greedy Search

In the clustering stage of K-CTA, each mode-m sub-tensor of A is sequentially classified into K_m mixture clusters using a greedy approach, while the core tensor and basis matrices, ex-cept for the mode-m basis matrix, of each cluster are fixed. Since the cluster membership and mixing coefficients of each mode-m sub-tensor are independent, Equation 6.1, without the or-thonormal constraints on U_m,1, U_m,2, . . . , U_m,C, can be separated into I_m distinct constrained Equa-tion 6.1 can be enforced from the optimized results of EquaEqua-tion 6.13 using a post-processing approach as described in Section 6.2.2.

First Mixture Cluster

For the first mixture cluster and corresponding mixing coefficients of a mode-m sub-tensor, we propose to obtain them from Theorem 5.3 and Corollary 6.1. By following the same approach as in the proof of Theorem 5.3, it is quite easy to verify the correctness of Corollary 6.1.

Proof of Corollary 6.1: The first mixture cluster ci1 of the i-th mode-m sub-tensor Ahmii is selected as if Km = 1. Equation 6.13 thus can be further simplified into Equation 5.13. From

Equation 5.19, the least-squares solution to Equation 5.13 is the projection coefficients of Ahm_ii

onto the sub-spaces of cluster c_i₁ so that the approximation error of Ahm_ii is minimized. Equa-tion 6.3 is thus proved.

Remaining Mixture Clusters

After selecting the first mixture cluster of a mode-m sub-tensor, its remaining mixture clusters are then sequentially derived from its previously determined mixture clusters and mixing coef-ficients by applying Theorem 6.2. The mathematical proof of Theorem 6.2 is described in the following paragraph.

Proof of Theorem 6.2: For the k-th mixture cluster c_i_k of the i-th mode-m sub-tensor Ahm_ii

other than ci1, it is determined as if Km = k. When previously selected mixture clusters c_i₁, c_i₂, . . . , c_i_k−1 and the corresponding mixing coefficients are fixed, the objective function in Equation 6.13 becomes

ii is defined as Equation 6.4. By following the same approach as in the proofs of Theorem 5.3 and Corollary 6.1, the minimization of Equation 6.14 is equivalent to the maxi-mization of

Substituting Equation 6.4 for R^(k)_hm

iiin Equation 6.15 then yields the objective function in Equa-tion 6.5. Theorem 6.2 thus is proved by identifying that the cluster membership is implicitly specified in the solution to Equation 6.13.

6.4.2 Optimal Projection

In the clustering stage of K-CTA, each time when adding a new mixture cluster to a mode-m sub-tensor, its mixing coefficients for previously determined mixture clusters should be updated by applying the proposed optimal projection algorithm (Theorem 6.3). This actually guarantees an optimal solution of mixing coefficients with respect to all selected mixture clusters, since the sub-spaces of different clusters may have strong correlations. For completeness, we present the detailed mathematical proof of Theorem 6.3 as follows.

Proof of Theorem 6.3: Suppose that total k mixture clusters c_i₁, c_i₂, . . . , c_i_k of the i-th mode-m sub-tensor Ahm_ii have been selected as if K_m = k. Similar to Equations 5.14 and 5.15,

since the constraints in Equation 6.13 can be satisfied by setting U_m,c

i∗ to zeros for all c /∈ {c_i₁, c_i₂, . . . , c_i_k}, Equation 6.13 can be simplified into a standard unconstrained least-squares problem whose objective function is

By taking the first-order partial derivatives of Equation 6.16 with respect to each entry of U_m,c_i1

i∗and setting the resulting derivatives to zeros, we have the following linear equation:

The right side of Equation 6.17 can be rewritten as

ufm

Similarly, the left side of Equation 6.17 is equal to

By following Equations 6.17–6.19 with respect to each entry of U_m,c_ij

i∗ for all j, we have

the following set of linear equations: respectively defined in Equations 6.7–6.9. As a result, we can conclude that Equation 6.6 gives the least-squares solution of Um,c_i1

i∗, Um,c_i2

i∗, . . . , Um,c_ik

i∗ to Equation 6.16. Theorem 6.3 thus is proved.

Chapter 7 Application I: Illumination and Reflectance Functions

This chapter presents the applications of the proposed spherical radial basis function (SRBF) representations to important illumination and reflectance functions in computer graphics and vi-sion, such as high dynamic range environment maps (Section 7.1) and bidirectional reflectance distribution functions (Section 7.2). In the third part of this chapter, we also describe an im-portance sampling scheme for direct illumination estimation in ray tracing algorithms, where the materials of object surfaces are modeled with the scattered univariate SRBF representation (Section 7.3).

7.1 High Dynamic Range Environment Maps

7.1.1 Problem Formulation

The dynamic range of intensities between light and dark areas in a real-world scene is frequently much greater than traditional digital imaging techniques and display devices. Therefore, the high dynamic range(HDR) environment map [34, 36] particularly aims at capturing the incident radiance of object surfaces from all illumination directions in a natural scene with a wide range of intensity levels (Figure 7.1). When adopting a HDR environment map as the light source in a real-time rendering application, the visual fidelity of rendered images can be greatly enhanced to simulate the surface irradiance under realistic illumination.

It is obvious that a HDR environment map is a real-valued univariate spherical function L(ω_l) ∈ R that depends on the illumination direction ωl on the unit sphere S². To efficiently shade objects using a HDR environment map, we can therefore model the lighting environ-ment with the scattered univariate SRBF representation as described in Section 3.4. The entire process is formulated as an optimization problem, and handled by minimizing squared approx-imation errors.

Given a desired number of univariate SRBFs J for a HDR environment map L(ω_l), we

(a) Eucalyptus Grove (b) Grace Cathedral (c) St. Peter’s Basilica

Figure 7.1: Examples of HDR environment maps from the light probe image gallery [35].

Each image shows the spectral radiance (red, green, and blue light) from different illumination directions.

intend to learn three sets of SRBF parameters: the basis coefficient setβ_j^(L)∈ R J

j=1, the center setξ^(L)_j ∈ S² J

j=1, and the bandwidth setλ^(L)_j ∈ R J

j=1, such that the following unconstrained least-squares optimization problem is solved:

min

β_j^(L),ξ^(L)_j ,λ^(L)_j oJ

j=1

1 2

S²

L(ω_l) − ˆL(ω_l)2

dω_l, (7.1)

where

L(ωˆ _l) =

j=1

β_j^(L)G ω_l· ξ_j^(L)|λ^(L)_j . (7.2)

It is then straightforward to derive the SRBF parameters in Equation 7.1 by applying Algorithm 3.1.

7.1.2 Experimental Results

The experiments and simulation timings of HDR environment map approximation were con-ducted and measured on a workstation with an Intel Core 2 Extreme QX9650 CPU, an NVIDIA GeForce 9800 GX2 graphics card, and 8 gigabytes main memory under Microsoft Windows Vista operating system. Parts of the fitting process, including the initial guess of SRBF param-eters and gradient computation, were accelerated on GPUs using NVIDIA Compute Unified Device Architecture(CUDA) [128]. The univariate Gaussian SRBFs were adopted to represent the HDR environment maps in our experiments. In general, we have found no significant

dif-0%

150 300 450 600 750 900 1050 1200 1350 1500 1650 1800

Squared error ratio

150 300 450 600 750 900 1050 1200 1350 1500 1650 1800

Squared error ratio

Figure 7.2: Plots of the squared error ratio versus the number of coefficients/parameters based on different functional representations for HDR environment maps, including spherical har-monics(SH); area-weighted Haar wavelets (Haar) [125]; scattered univariate SRBFs (SRBF);

scattered univariate SRBFs with non-negative basis coefficients(N-SRBF).

ference between various types of univariate SRBFs for approximating HDR environment maps, but univariate Gaussian SRBFs are more locally compact than univariate Abel-Poisson SRBFs and handle most common cases very well. The SRBF center and bandwidth sets were con-strained to be the same for red, green, and blue channels of a HDR environment map, since separately fitting the data of each channel only slightly reduces approximation errors, but in-creases computational costs and storage space by a factor of 2 or more.

Figure 7.2 plots the squared error ratio versus the number of coefficients/parameters for dif-ferent functional representations, including spherical harmonics, area-weighted Haar wavelets [125], and the proposed scattered univariate SRBF representation (with/without non-negative basis coefficient constraints). Figures 7.3 and 7.4 further demonstrate the reconstructed results of various HDR environment maps. Note that we compare the functional representations based on the same total number of coefficients/parameters, namely the same storage space. For spher-ical harmonics and area-weighted Haar wavelets, the number of coefficients is equivalent to the number of basis functions, whereas the number of parameters for the scattered univariate SRBF representation is six times the number of basis functions. Typically, fitting a 6×256×256 HDR environment map with 300 univariate SRBFs (total 1800 parameters) takes about 25 minutes on average. Figures 7.2–7.4 obviously show that the proposed approach achieves the lowest

在文檔中即時資料驅動描繪之參數表示式與張量近似演算法 (頁 81-190)