Temporal Coherence Methods in Rendering

Chapter 2 Related Work

2.3 Temporal Coherence Methods in Rendering

Temporal coherence (TC) describes the correlation, or predictable relationship, between the contents of adjacent moments in time. It has long been exploited in computer graphics: by taking advantage of it, the computational load of rendering tasks is reduced and the quality of the images is improved. For example, the shading computation usually differs very little between two consecutive frames, so re-computing everything each frame is wasteful. The key to utilizing TC is how the previously computed information is stored and reused. Scherzer et al. [12] and Nehab et al. [8] independently proposed a technique called the Reverse Re-projection Cache. The idea is to store the previous shading results in a screen buffer and to project the current pixel into the prior frame. By comparing the stored depth with the current depth, the technique can decide whether the pixel was visible in the past frame (details in Section 3.1).

This framework has been used for a variety of rendering applications, such as anti-aliased hard shadows [8,12], real-time soft shadows [13,14], motion blur, and stereoscopic rendering [8]. Our approach is inspired by its concept of reusing data and spreading the computation over several frames.

In summary, our algorithm uses reflective shadow maps to simulate one-bounce indirect illumination and screen space ambient occlusion to enhance the realism of the scene. By adopting the idea of the reverse re-projection cache, we avoid redundant re-calculation and still provide high quality images.

Chapter 3 Algorithm

In this thesis, we propose a novel idea of reusing cached information from previously rendered frames to enhance screen space global illumination algorithms such as Reflective Shadow Maps [3] and standard screen space ambient occlusion [7]. Our algorithm is based on a deferred shading pipeline. By taking advantage of temporal coherence, we display high quality images that include single-bounce indirect illumination, ambient occlusion, and soft shadows in real time.

1. In Section 3.1, we describe in detail how to implement the reverse re-projection cache [8,12] to retrieve the available data stored in the history cache, and how to accumulate the cached information by taking its confidence into account.

2. In section 3.2, we depict the simple implementation of screen space ambient

4. In Section 3.4, we also show that we can combine Percentage Closer Filtering [10] with our caching strategy to increase the shadow accuracy and to render fake soft shadows.

3.1 Temporal Coherence

Various rendering applications exploit temporal coherence to achieve high quality images at a lower cost. When using temporal coherence, there are two important decisions. One is how the previously computed data are stored. The other is how the stored data are efficiently retrieved. As long as we resolve these problems, we can obtain the benefit from temporal coherence.

3.1.1 Reverse Re-projection Cache

In order to reuse available information, we follow the reverse re-projection cache [8,12]. We first define a buffer called the history cache, similar to Nehab et al. [8] and Scherzer et al. [12]. This buffer is viewport-sized and stores the required information at the surface points that were visible in the previous frame. On modern graphics hardware, we can represent the buffer as textures, as shown in Figure 1.2 (d), and store it in texture memory. The history cache is therefore efficiently maintained on the GPU.

In addition, we require some data for determining whether the current pixel was visible in the earlier frame. Unlike Nehab et al. [8] and Scherzer et al. [12], who store the non-linear depth of the new and the old frame in the buffer, we store the world-space positions in the history cache and do the re-projection in the fragment shader.

Formally, let 𝑊_𝑡 denote the world-space position of the pixel stored in the G-buffer at time step t, as shown in Figure 1.2 (a). Let 𝑃_𝑡−1 and 𝑉_𝑡−1 denote the projection matrix and the view matrix at time step t-1. For static geometry, we can then apply the following transformation to get the clip-space coordinates 𝐶_𝑡−1 at time step t-1:

𝐶_𝑡−1 = 𝑃_𝑡−1 ∗ 𝑉_𝑡−1 ∗ 𝑊_𝑡 (1)

We then perform the perspective division to get the normalized device coordinates 𝑁_𝑡−1. To obtain the correct texture coordinates 𝑡𝑒𝑥_𝑡−1, we scale and bias 𝑁_𝑡−1 from the range [−1, 1] to the range [0, 1]. Here, we define 𝐻_𝑡(𝑡𝑒𝑥_𝑡) as the buffer value of the texel at time step t. We can therefore fetch the corresponding world-space position at time step t-1 stored in the history cache:

𝑊_𝑡−1 = 𝐻_𝑡−1(𝑡𝑒𝑥_𝑡−1) (2)

Finally, we have to detect dis-occlusions to decide whether the pixel at time step t was visible at time step t-1. To do so, we compute the Euclidean distance between 𝑊_𝑡 and 𝑊_𝑡−1 and compare it with a threshold 𝜀:

‖𝑊_𝑡− 𝑊_𝑡−1‖ ≤ 𝜀 (3)

If the distance is greater than the threshold, the pixel was not visible in the last frame, and no previous data can be safely reused. Note that the lower the threshold is, the more pixels are classified as invisible (see Figure 3.1).

Figure 3.1: Invisible pixels are shown in red. The left image is before a translation, the middle image is after a translation to the left with 𝜀 = 0.1, and the right image is after a translation to the left with 𝜀 = 0.01. A larger number of invisible pixels impacts the performance.
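The lookup of Eqs. 1 to 3 can be sketched on the CPU as follows. This is only an illustrative NumPy sketch, not the fragment shader used in this thesis: the function names `reproject`, `fetch_history` and `was_visible`, the nearest-texel lookup, and the default threshold are all assumptions made for the example.

```python
import numpy as np

def reproject(world_pos, view_prev, proj_prev):
    """Map a world-space point W_t to texture coordinates tex_{t-1} (Eq. 1)."""
    w = np.append(np.asarray(world_pos, dtype=float), 1.0)  # homogeneous point
    clip = proj_prev @ view_prev @ w        # C_{t-1} = P_{t-1} * V_{t-1} * W_t
    if clip[3] <= 0.0:                      # behind the previous camera
        return None
    ndc = clip[:3] / clip[3]                # perspective division -> N_{t-1}
    return ndc[:2] * 0.5 + 0.5              # scale and bias [-1, 1] to [0, 1]

def fetch_history(history, tex):
    """Nearest-texel lookup H_{t-1}(tex_{t-1}) (Eq. 2); None outside the viewport."""
    if tex is None or not (0.0 <= tex[0] < 1.0 and 0.0 <= tex[1] < 1.0):
        return None
    height, width = history.shape[:2]
    return history[int(tex[1] * height), int(tex[0] * width)]

def was_visible(world_pos, history, view_prev, proj_prev, eps=0.1):
    """Dis-occlusion test of Eq. 3: True if cached data can be reused."""
    cached = fetch_history(history, reproject(world_pos, view_prev, proj_prev))
    if cached is None:
        return False
    return bool(np.linalg.norm(np.asarray(world_pos, dtype=float) - cached) <= eps)
```

A pixel that fails the test (or falls outside the previous viewport) has no reusable history and must be shaded from scratch.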

3.1.2 Accumulation with Confidence Value

Although they avoid executing on hidden surfaces, screen space global illumination techniques still have bottlenecks in their sampling strategy (e.g., Mittring [7]; Dachsbacher and Stamminger [3]). For example, to increase the frame rate they usually reduce the number of samples, which produces the banding artifacts shown in Figure 3.2 (a). The general way to alleviate this artifact is to use a different sampling pattern at each pixel, but this introduces a lot of noise into the picture, as shown in Figure 3.2 (b). A common method to remove the noise is to blur the image, but this loses the details of the scene, as shown in Figure 3.2 (c).

Figure 3.2: (a) SSAO with a single sampling pattern. (b) SSAO with the various sampling patterns. (c) SSAO with Gaussian filter.

The simple way to remove the above drawbacks is to use a large number of samples, but this is too expensive for interactive applications. Fortunately, since we can efficiently retrieve the previous data with the reverse re-projection technique described in Section 3.1.1, we can use this cached information to improve the quality of the image.

Consequently, our goal is to spread the cost of sampling over several frames. First, in each frame we rotate a sampling pattern that has a small number of samples and jitter the sampled positions, so that each frame stores a slightly different result in the history cache. Second, by recursively accumulating the results, we obtain a high quality image. The key is how to accumulate the cached information. We take the confidence of the previous solution into account, which can be motivated as follows: different sampling patterns may capture different information. We can estimate how confident we are in the value from the previous frame by calculating the difference between the cached result at time step t-1 and the current result as follows:

$\mathit{confidence} = 1 - \dfrac{\mathrm{saturate}\left(\lVert V_{t-1} - V_t \rVert\right)}{1 + \mathit{bias}}$ (4)

Where 𝑉_𝑡−1 is the per-pixel value obtained by the reverse re-projection technique, and saturate clamps the specified value to the range of 0 to 1. The bias is a user-specified value that prevents the confidence from reaching zero and can be adjusted per application to achieve the desired quality. In other words, the confidence value reflects how much information the cached result was missing. As we make up this difference over several frames, we expect to converge to a final result without variance. Thus, we apply the following operation:

𝑉_𝑝= (𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒)𝑉_𝑡−1 + (1 − 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒)𝑉_𝑡 (5)

We use the confidence value as a weighting factor that determines how much of each of the two consecutive frames' values we take, and 𝑉_𝑝 denotes the predicted value. Note that because the confidence value is computed per pixel and used as the weight, we avoid spending time searching for an acceptable fixed weight. Figure 3.3 depicts the flowchart of our amortized sampling method.

Figure 3.3: The top row shows various sampling patterns in each frame. The bottom row shows the corresponding results.
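The accumulation step of Eqs. 4 and 5 can be sketched as below. The sketch assumes the reading of Eq. 4 in which the saturated difference is divided by 1 + bias, so that the confidence never drops to zero for a positive bias; the function names and the default bias of 0.1 are hypothetical and chosen only for illustration.

```python
import numpy as np

def confidence(v_prev, v_curr, bias=0.1):
    """Eq. 4 (assumed reading): 1 - saturate(||V_{t-1} - V_t||) / (1 + bias)."""
    diff = np.atleast_1d(np.asarray(v_prev, dtype=float) - np.asarray(v_curr, dtype=float))
    saturated = min(max(float(np.linalg.norm(diff)), 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - saturated / (1.0 + bias)

def accumulate(v_prev, v_curr, bias=0.1):
    """Eq. 5: V_p = confidence * V_{t-1} + (1 - confidence) * V_t."""
    c = confidence(v_prev, v_curr, bias)
    return c * np.asarray(v_prev, dtype=float) + (1.0 - c) * np.asarray(v_curr, dtype=float)
```

When the cached and current values agree, the confidence is 1 and the cached value is kept unchanged; the larger the difference, the more weight the current frame receives.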

3.2 Ambient Occlusion

Ambient Occlusion (AO) is a popular method for approximating global illumination. It gives perceptual clues of depth and curvature, and it also enhances the geometric features of the scene. For economic reasons, we need a simpler implementation and higher performance. For this purpose, Mittring [7] introduced Screen Space Ambient Occlusion (SSAO). This algorithm uses only depth information and samples the 3D space around the visible point c, as shown in Figure 3.4. Because it ignores the geometric factor, it has some drawbacks that we have to overcome, as shown in Figure 3.5. Bavoil et al. [2] use depth and normal information. They treat the depth image as a height field. By comparing the horizon angle between the surface points and the samples, they implicitly take the geometric factor into account. They also use a quadratic falloff function to attenuate the occlusion with distance for all samples. Both Mittring [7] and Bavoil et al. [2] randomly rotate the sampling directions per pixel to remove the banding artifacts. Our SSAO algorithm is inspired by both Mittring [7] and Bavoil et al. [2].

Figure 3.4: Half of the samples are inside the wall and half outside (left). Three quarters of the samples are inside the wall (middle). A quarter of the samples are inside the wall (right).

Figure 3.5: The flat surfaces are gray (red block). The edges appear brighter (green block). White halos appear around the edges of the objects (blue block).

3.2.1 SSAO Generation

We first need two pieces of world-space information taken from the eye’s G-buffer: the world-space positions and the world-space normal vectors, as shown in Figure 3.6. We then make the following observation. For every visible point p, if the sampled point q on an occluding surface is close to it, fewer photons reach p, as shown in Figure 3.7. We also add an attenuation term, which is important for avoiding the common SSAO artifact on the border of an object. Consequently, the ambient occlusion AO at a point p with a surface normal 𝑛⃗ is:

$\mathrm{AO}(p, \vec{n}) = \dfrac{1}{k} \sum_{i=1}^{k} \dfrac{\max\left(\vec{n} \cdot \overrightarrow{pq_i},\, 0\right)}{D(d_i)}$ (6)

$D(d) = 1.0 + a\,d + b\,d^{2}$ (7)

Where 𝑞_𝑖 is the i-th sample around the surface point p, k is the number of samples, D denotes the attenuation function, the constant 1.0 is used to avoid a singularity, and 𝑑_𝑖 is the distance between 𝑞_𝑖 and p. The parameters a and b are user-specified and control how strongly the occlusion is attenuated with distance.
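As a rough illustration of Eqs. 6 and 7, the following sketch evaluates the occlusion of one surface point from a list of world-space samples. It assumes that the clamped dot product is divided by the attenuation D and that the vector from p to q is used without normalization, since the text does not state otherwise; the function names and the default values of a and b are hypothetical.

```python
import numpy as np

def attenuation(d, a=1.0, b=1.0):
    """Eq. 7: D(d) = 1.0 + a*d + b*d^2 (the constant avoids division by zero)."""
    return 1.0 + a * d + b * d * d

def ambient_occlusion(p, n, samples, a=1.0, b=1.0):
    """Eq. 6: average of max(n . pq_i, 0) / D(d_i) over the k samples q_i."""
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    total = 0.0
    for q in samples:
        pq = np.asarray(q, dtype=float) - p      # vector from p to the sample
        d = float(np.linalg.norm(pq))            # distance used for attenuation
        total += max(float(np.dot(n, pq)), 0.0) / attenuation(d, a, b)
    return total / len(samples)                  # assumes at least one sample
```

Samples lying in the tangent plane of p or behind it contribute nothing, while samples in front of the surface contribute less the farther away they are.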

Figure 3.6: The world-space positions in the G-buffer (left). The world-space normal vectors in the G-buffer (right).

Figure 3.7: In case (a), the sample q does not occlude the surface point p, because the dot product of the surface normal and the vector from p to q is zero. In case (b), the surface point p is occluded, because the dot product is not zero.

To achieve higher performance, we limit the number of samples and use regular sampling directions at every pixel, as shown in Figure 3.8 (a). Furthermore, to avoid the banding artifact, we perturb the sampling pattern per pixel in screen space by using a texture of random 2D vectors (the z component is discarded), as shown in Figure 3.8 (b). In general, samples that are close to the surface point p in world space are also close in screen space and contribute more occlusion to p, as shown in Figure 3.8 (c). For these reasons, we place the samples with the technique that Zhang et al. [17] proposed for partitioning the view frustum:

$S_i = \mathit{near} \left( \dfrac{\mathit{far}}{\mathit{near}} \right)^{i/m}$ (8)

Where 𝑆_𝑖 is the i-th sample around the surface point p along each direction, 𝑚 denotes the number of samples per direction, 𝑛𝑒𝑎𝑟 is the minimum distance from the surface point p in screen space, 𝑓𝑎𝑟 is the maximum distance from the surface point p in screen space, and |𝑓𝑎𝑟 – 𝑛𝑒𝑎𝑟| is the sampling range in screen space. According to Eq. 8, we can therefore construct a better sampling pattern.
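The spacing produced by Eq. 8 is easiest to see with concrete numbers. The short sketch below lists the sample distances along one direction; the function name `sample_distances` and the choice of letting i run from 1 to m are assumptions made for the example.

```python
def sample_distances(near, far, m):
    """Eq. 8: S_i = near * (far / near) ** (i / m) for i = 1..m (assumed range)."""
    return [near * (far / near) ** (i / m) for i in range(1, m + 1)]
```

For example, with near = 1, far = 16 and m = 4 the distances are approximately 2, 4, 8 and 16: each step doubles, so the samples are denser close to p, where they contribute more occlusion, and sparser toward the edge of the sampling range.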

Figure 3.8: Our initial sampling directions (a). Random vector texture for perturbing the sampling pattern per pixel (b). The sample q1 is close to the surface point p in world space as well as in screen space, whereas q2 is far from the surface point p in world space as well as in screen space (c).


Finally, like other image-space ambient occlusion techniques, we have to blur the image to reduce the noise, as shown in Figure 3.9. However, the blurring operation impacts the performance. In Section 3.2.2 we show that we can use the cached information to remove this redundancy.

Figure 3.9: Before blurring, there is a lot of noise in the left image. After blurring, the details of the objects are lost in the right image.

3.2.2 Exploiting Cached Information

To get better screen space ambient occlusion results we would need a large number of samples, which is an unwise choice for real-time applications. Instead, we adopt the idea of amortizing the cost of sampling (Section 3.1.2). First, we use the reverse re-projection technique (Section 3.1.1) to get the past value of ambient occlusion 𝐴𝑂_𝑡−1. Second, before perturbing the sampling pattern per pixel, we rotate the sampling directions by a fixed angle 𝜃 and slightly scale the sampling range. We then compute the current ambient occlusion 𝐴𝑂_𝑡. Third, we calculate the difference between the past and the current ambient occlusion according to Eq. 4. Finally, we use Eq. 5 to accumulate 𝐴𝑂_𝑡−1 and 𝐴𝑂_𝑡, using the value from Eq. 4 as the weighting factor. Note that we should specify a limit on the number of times the sampling pattern is rotated. This avoids unlimited accumulation and lets us determine whether the results have already converged. In our implementation, 𝜃 is ten degrees. Figure 3.10 shows the comparison of SSAO with and without employing cached information.
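The per-frame rotation can be sketched as follows. This is one possible reading of the rotation limit, in which the pattern stops rotating once the limit is reached; the function name `rotate_pattern` and the limit of 36 rotations are hypothetical, and the slight scaling of the sampling range is omitted for brevity.

```python
import math

def rotate_pattern(base_dirs, frame, theta_deg=10.0, max_rotations=36):
    """Rotate 2D sampling directions by theta per frame, up to max_rotations steps."""
    step = min(frame, max_rotations)          # stop rotating once the limit is reached
    angle = math.radians(step * theta_deg)
    c, s = math.cos(angle), math.sin(angle)
    # Standard counter-clockwise 2D rotation of every direction (x, y).
    return [(c * x - s * y, s * x + c * y) for x, y in base_dirs]
```

With a ten degree step, nine frames turn the direction (1, 0) into approximately (0, 1), so consecutive frames sample different neighbors of the same pixel before the results are blended with Eq. 5.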

Figure 3.10: Both the left and the right SSAO use 32 samples per pixel. In the left image, SSAO without accumulating cached information shows noise artifacts. In the right image, SSAO with accumulating cached information shows less noise and maintains the details of the objects.

3.3 Indirect Illumination

Figure 3.11 shows both direct and indirect illumination. Indirect illumination increases the scene luminance and thereby improves the image quality. Unfortunately, its computational cost is too high for real-time applications. In the general case, however, we can restrict ourselves to one-bounce indirect illumination. Reflective Shadow Maps [3] is a technique for efficiently generating one-bounce indirect illumination. In the following, we describe how to implement Reflective Shadow Maps and how to refine their results by taking the cached information into account.

Figure 3.11: Surface point p is illuminated by the actual light source (green arrow). Surface point q is illuminated by the reflected light (red arrow).

3.3.1 Reflective Shadow Maps

Reflective Shadow Maps (RSMs) [3] is a virtual point light (VPL) based method. It uses the observation that all one-bounce indirect illumination is caused by surfaces that are directly illuminated by the actual light source. Therefore, each pixel in the shadow map can be considered as a VPL that illuminates the scene. We need to extend a standard shadow map by storing more lighting information, such as world space positions, world space normal vectors, depth, and reflected radiant flux, as shown in Figure 3.12. We then use the following equation to calculate the contribution of a VPL q at a surface point p.

$I_p = \Phi_q \, \dfrac{\max(0,\, N_p \cdot D)\, \max(0,\, N_q \cdot (-D))}{\lVert q - p \rVert^{2}}$ (9)

$D = \dfrac{q - p}{\lVert q - p \rVert}$ (10)

Where 𝐼_𝑝 denotes the irradiance at a surface point p, 𝑁_𝑝 is the normal vector at p, 𝑁_𝑞 is the normal vector of VPL q, Φ_𝑞 is the radiant flux of q, and D is the unit vector pointing from p to q.
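Eqs. 9 and 10 translate directly into a few lines of code. The sketch below evaluates the contribution of a single VPL; the function name `vpl_irradiance` and the early return for coincident points are assumptions added for the example.

```python
import numpy as np

def vpl_irradiance(p, n_p, q, n_q, flux_q):
    """Eq. 9: irradiance at p caused by the VPL q with flux flux_q."""
    p, n_p, q, n_q = (np.asarray(v, dtype=float) for v in (p, n_p, q, n_q))
    diff = q - p
    dist2 = float(np.dot(diff, diff))            # ||q - p||^2
    if dist2 == 0.0:                             # p and q coincide: no contribution
        return 0.0
    d = diff / np.sqrt(dist2)                    # Eq. 10: unit vector from p to q
    cos_p = max(0.0, float(np.dot(n_p, d)))      # receiver faces the VPL
    cos_q = max(0.0, float(np.dot(n_q, -d)))     # VPL faces the receiver
    return flux_q * cos_p * cos_q / dist2
```

The two clamped cosines ensure that a VPL only contributes when it and the receiving surface face each other, and the squared distance in the denominator makes distant VPLs contribute less.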

Figure 3.12: From left to right: world space positions, world space normal vectors, depth information, and reflected radiant flux.

The remaining question is how to gather the VPLs to compute the indirect illumination. As in standard shadow mapping, we project the pixel p stored in the G-buffer into the reflective shadow map and sample around the projected position to gather the VPLs, as shown in Figure 3.13 (a). The original RSMs use a fixed sampling pattern, which can be generated by simple random sampling or Poisson disk sampling. Figure 3.13 (b) shows our sampling pattern. Note that we multiply the samples by their squared distance to compensate for the varying sampling density.

Figure 3.13: Project the pixel p into RSMs for gathering the VPLs (a). The visualization of the sampling pattern (b).

Since the original RSMs ignore occlusion for the indirect light sources, they can produce severely wrong results in some cases. To approximate the visibility of the VPLs, we use the world space positions and depth information stored in the G-buffer and the RSMs. Like [7], which samples visibility rays against the scene depth buffer, we first compute the unit vector D between the surface point p and the VPL q according to Eq. 10. Next, we apply the sampling strategy of Eq. 8 along the direction of D. Let 𝑠_𝑖 be the i-th tested sample and let 𝑍_𝑡(𝑠_𝑖) denote the depth value of the sample 𝑠_𝑖 in eye space. Finally, as in standard shadow mapping, we determine whether the VPL q contributes radiance to p by comparing the depth value 𝑍_𝑡 with 𝑍_𝐺, the corresponding depth value stored in the G-buffer. If 𝑍_𝑡 > 𝑍_𝐺, the VPL q is rejected, and its visibility weighting factor V(𝑞) is zero. Multiplying Eq. 9 by the weighting factor V(𝑞) yields enhanced results with better visual quality, as shown in Figure 3.14 and Figure 3.15. Note that this method is not an exact visibility computation, and it introduces a performance overhead. In our experiments, it approximately halves the frame rate.
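The visibility test can be sketched as a short march from p toward q. This is a simplified illustration under several assumptions: the samples are placed in world space rather than screen space, the end point q itself is not tested, and `eye_depth` and `gbuffer_depth` are hypothetical callables standing in for the eye-space depth of a sample and the G-buffer depth at the pixel it projects to.

```python
import numpy as np

def vpl_visibility(p, q, eye_depth, gbuffer_depth, m=8, near=0.1):
    """Return V(q): 0.0 if any sample between p and q is hidden, else 1.0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    far = float(np.linalg.norm(q - p))
    if far <= near:                       # VPL closer than the first sample
        return 1.0
    d = (q - p) / far                     # Eq. 10: unit vector from p to q
    for i in range(1, m):                 # i = m would be q itself, so it is skipped
        s = p + d * (near * (far / near) ** (i / m))   # spacing of Eq. 8
        if eye_depth(s) > gbuffer_depth(s):            # Z_t > Z_G: sample is hidden
            return 0.0
    return 1.0
```

The returned factor is multiplied with the result of Eq. 9, so a rejected VPL contributes nothing to the surface point.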


Figure 3.14: Top row: The view of the light source (left). The view of the camera (right). Note that the green points are VPLs which provide too much radiance toward the floor point (marked orange), and the red points are tested samples. Bottom row: The rendered scene without the visibility test (left). The rendered scene with the visibility test (right).

Figure 3.15: The middle wall is not present in the view of the light source. There are severe light-bleeding artifacts without the visibility test for VPLs (left). The light-bleeding artifacts disappear with the visibility test for VPLs (right).

3.3.2 Exploiting Cached Information

Since the computational cost of indirect illumination is very high, we need to limit the number of VPL samples. In our experiments, no more than 512 samples is suitable for real-time applications. However, if there are too few samples, banding artifacts appear, as shown in Figure 3.16.

Fortunately, we can remove this objectionable artifact by accumulating the cached information. First, we get the past value of one-bounce indirect illumination 𝐼_𝑡−1 by using reverse re-projection. Next, we rotate the sampling pattern in every frame to generate the current indirect illumination 𝐼_𝑡. We then calculate the difference between 𝐼_𝑡−1 and 𝐼_𝑡 as in Eq. 4. Finally, we follow Eq. 5 to combine 𝐼_𝑡−1 with 𝐼_𝑡, which gives the final converged results. Figure 3.17 shows the comparison of indirect illumination with and without accumulating the cached information.

Figure 3.16: Direct illumination with indirect illumination. From left to right: Indirect illumination with 512 and 1024 VPLs. The left image is darker than the right image, and the banding artifacts in the left image are more apparent than in the right image.

Figure 3.17: The comparison of indirect illumination with 512 VPLs. The banding artifacts are apparent without accumulating cached information (left). The result is smooth by accumulating cached information (right).

3.4 Fake Soft Shadow

Shadows are an important effect for realistic image synthesis. Nowadays, shadow mapping, which creates shadows by comparing the depth of the pixels in the eye’s view with the corresponding depth value in the shadow map, is a popular real-time algorithm for dynamic hard shadows. Unfortunately, it suffers from perspective aliasing and projective aliasing, which degrade the visual quality of the images. A feasible solution is to use Percentage Closer Filtering [10] (PCF). This technique takes more than one sample around each
