Chapter 1 Introduction
1.3 Thesis Organization
This thesis is structured as follows. Chapter 2 reviews related work on screen space ambient occlusion, VPL-based global illumination and temporal coherence techniques. Section 3.1 explains why we obtain converged results when we change the sampling pattern from frame to frame and use a confidence value to accumulate data between the current and the past frame. Sections 3.2 and 3.3 describe how to combine temporal coherence with the generation of screen space global illumination (e.g., SSAO, RSMs). Section 3.4 shows how the cached shadow values can be used to render anti-aliased shadows and soft shadows. Chapter 4 presents our implementation and the results.
Conclusion and future work are discussed in Chapter 5.
illumination that was originally developed by Landis [6]. Shadowing of ambient light is referred to as ambient occlusion. It gives perceptual clues of curvature and enhances the geometric features of the scene. Unfortunately, computing ambient occlusion requires casting a large number of rays from each surface point, which is too expensive for real-time applications. Bunnell [1] therefore computes ambient occlusion using disk-based occluders, as shown in Figure 2.1 (a), and produces high-quality results. However, the method needs a costly pre-processing step for dynamic objects, and its per-vertex occlusion reveals linear interpolation artifacts. Crytek [7] first developed a screen space ambient occlusion approach, used in the PC game Crysis. It sparsely samples visibility rays against the scene depth buffer and produces plausible results. The main drawback of this method is that flat surfaces incorrectly appear gray, as shown in Figure 2.1 (b). Shanmugam and Arikan [15] split the ambient occlusion problem into a high-frequency and a low-frequency part: one part uses an image-space method on nearby objects, and the other generates coarse AO using spherical proxies for distant occluders. The results of their algorithm are good, but performance suffers because of the two separate computation passes. Bavoil et al. [2] proposed Image-Space Horizon-Based Ambient Occlusion, which computes AO by comparing the horizon angles between the sampled points and the surface point. It works more effectively and yields good results.
Figure 2.1: (a) Disk-shaped elements, Bunnell [1]. (b) Grayish flat surfaces.
2.2 VPL Methods
Standard global illumination methods such as path tracing, radiosity and photon mapping can produce full and accurate global illumination, but they require minutes or hours to render a realistic scene. Keller [5] introduced Instant Radiosity, which approximates the indirect illumination of a scene using a set of virtual point lights (VPLs). This method is popular because it produces noise-free images and drastically reduces the computational cost. However, it is expensive to trace the light paths that determine the set of VPLs and to compute visibility by creating an individual shadow map for each VPL. Many VPL-based methods have been developed to resolve these problems. Dachsbacher and Stamminger [3] extend a standard shadow map to a reflective shadow map that stores lighting information. All pixels of the shadow map are considered indirect light sources. This idea efficiently addresses the generation of virtual point lights and approximates indirect illumination at interactive frame rates. To account for indirect shadows, Ritschel et al. [11] use a crude point-based representation of the geometry to accelerate shadow map generation for each VPL. Yang et al. [16] construct per-pixel linked lists to implement real-time indirect shadows on the GPU, but their method requires a large amount of memory to store all the fragments of the scene objects. Dong et al. [4] cluster the VPLs into a small number of virtual area lights (VALs), which reduces the number of shadow maps. These methods have demonstrated that accurate visibility between VPLs and surface points is not necessary for indirect illumination, because the human visual system is only slightly sensitive to the correctness of indirect shadows.
Although the above screen space global illumination algorithms (e.g., Mittring [7], Dachsbacher and Stamminger [3]) have several advantages, such as a cost that does not depend on scene complexity, no computation for hidden surfaces and a simple implementation, there is considerable potential for quality and speed improvements by exploiting temporal coherence. In the following section, we briefly introduce a variety of applications that use temporal coherence in rendering.
2.3 Temporal Coherence Methods in Rendering
Temporal coherence (TC) describes the correlation, or predictable relationship, between the contents of adjacent moments in time. It has long been used in computer graphics: by taking advantage of it, the rendering workload is reduced and the quality of the images is improved. For example, there is generally very little difference in the shading computation between two consecutive frames, so re-computing everything each frame is wasteful. The key to utilizing TC is how the previously computed information is stored and reused. Scherzer et al. [12] and Nehab et al. [8] independently proposed a technique called the Reverse Re-projection Cache. The idea is to store the previous shading results in a screen buffer and project the current pixel into the prior frame. By comparing the stored depth with the current depth, the technique can decide whether the pixel was visible in the past frame (see Section 3.1 for details).
This framework has been used for a variety of rendering applications, such as anti-aliased hard shadows [8,12], real-time soft shadows [13,14], motion blur and stereoscopic rendering [8]. Our approach is inspired by their concept of reusing data and spreading the computation over several frames.
In summary, our algorithm uses reflective shadow maps to simulate one-bounce indirect illumination and screen space ambient occlusion to enhance the realism of the scene. By adopting the idea of the reverse re-projection cache, we avoid redundant re-calculation and provide high-quality images.
Chapter 3 Algorithm
In this thesis, we propose a novel idea of reusing cached information from previously rendered frames to enhance screen space global illumination algorithms such as Reflective Shadow Maps [3] and standard screen space ambient occlusion [7]. Our algorithm is based on a deferred shading pipeline. By taking advantage of temporal coherence, we display high-quality images that include single-bounce indirect illumination, ambient occlusion and soft shadows in real time.
1. In Section 3.1, we describe in detail how to implement the reverse re-projection cache [8,12] to retrieve the available data stored in the history cache, and how to accumulate the cached information by taking its confidence into account.
2. In section 3.2, we depict the simple implementation of screen space ambient
4. In Section 3.4, we show that Percentage Closer Filtering [10] can be combined with our caching strategy to increase shadow accuracy and to render fake soft shadows.
3.1 Temporal Coherence
Various rendering applications exploit temporal coherence to achieve high-quality images at a lower cost. Using temporal coherence involves two important decisions: how the previously computed data are stored, and how the stored data are efficiently retrieved. Once these two problems are resolved, we can benefit from temporal coherence.
3.1.1 Reverse Re-projection Cache
To reuse available information, we follow the reverse re-projection cache [8,12]. We first define a buffer called the history cache, similar to Nehab et al. [8] and Scherzer et al. [12]. This buffer is viewport-sized and stores the required information at the surface points visible in the previous frame. On modern graphics hardware, we can represent the buffer as textures, as shown in Figure 1.2 (d), and store it in texture memory. The history cache is therefore efficiently maintained on the GPU.
In addition, we need data for determining whether the current pixel was visible in the earlier frame. Unlike Nehab et al. [8] and Scherzer et al. [12], who store the non-linear depth of the new and the old frame in the buffer, we store world-space positions in the history cache and perform the re-projection in the fragment shader.
Formally, let 𝑊𝑡 denote the world-space position of the pixel stored in G-buffer at time step t as shown in Figure 1.2 (a). Let 𝑃𝑡−1 and 𝑉𝑡−1 symbolize the projection matrix and the view matrix at time step t-1. Consequently, for static geometry, we can do the following transformation to get the clip-space coordinates 𝐶𝑡−1 at time step t-1:
$$C_{t-1} = P_{t-1} \, V_{t-1} \, W_t \tag{1}$$
We then perform the perspective division to obtain the normalized device coordinates 𝑁𝑡−1. To obtain the texture coordinates 𝑡𝑒𝑥𝑡−1, we scale and bias 𝑁𝑡−1 from the range [-1, 1] to the range [0, 1]. We define 𝐻𝑡(𝑡𝑒𝑥𝑡) as the buffer value of the texel 𝑡𝑒𝑥𝑡 at time step t. The corresponding world-space position at time step t-1 stored in the history cache is then:
$$W_{t-1} = H_{t-1}(\mathit{tex}_{t-1}) \tag{2}$$
Finally, we have to detect dis-occlusions to decide whether the pixel at time step t was visible at time step t-1. We compute the Euclidean distance between 𝑊𝑡 and 𝑊𝑡−1 and compare it with a threshold 𝜀:
$$\lVert W_t - W_{t-1} \rVert \le \varepsilon \tag{3}$$
If the distance is greater than the threshold, the pixel was not present in the last frame, and no previous data can be safely reused. Note that the lower the threshold, the more pixels are classified as invisible (see Figure 3.1).
Figure 3.1: Invisible pixels are shown in red. The left image is before a translation, the middle image is after a translation to the left with 𝜀 = 0.1, and the right image is after a translation to the left with 𝜀 = 0.01. A larger number of invisible pixels impacts the performance.
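The re-projection and dis-occlusion test of Eqs. 1 to 3 can be sketched on the CPU as follows. This is a minimal illustration in Python rather than the fragment shader used in this thesis. The helper names (mat_vec, reproject, fetch_history, was_visible), the list-of-rows matrix layout, the nearest-texel lookup and the rejection of points that fall outside the previous view are assumptions made for the example.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def reproject(world_pos, view_prev, proj_prev):
    """Map a world-space position W_t to texture coordinates of frame t-1.

    Returns None when the point lies outside the previous view (assumed policy).
    """
    w = [world_pos[0], world_pos[1], world_pos[2], 1.0]
    clip = mat_vec(proj_prev, mat_vec(view_prev, w))  # Eq. 1
    if clip[3] <= 0.0:
        return None  # behind the previous camera
    ndc_x = clip[0] / clip[3]  # perspective division
    ndc_y = clip[1] / clip[3]
    u = 0.5 * ndc_x + 0.5  # scale and bias [-1, 1] to [0, 1]
    v = 0.5 * ndc_y + 0.5
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None
    return (u, v)

def fetch_history(history, uv):
    """Eq. 2: nearest-texel lookup in a history cache stored as rows of positions."""
    height, width = len(history), len(history[0])
    x = min(int(uv[0] * width), width - 1)
    y = min(int(uv[1] * height), height - 1)
    return history[y][x]

def was_visible(world_pos, cached_world_pos, eps):
    """Eq. 3: the pixel counts as visible in frame t-1 if both positions agree."""
    return math.dist(world_pos, cached_world_pos) <= eps
```

With the same pair of positions, a threshold of 0.1 accepts a displacement of 0.05 while a threshold of 0.01 rejects it, which mirrors the growth of the red region in Figure 3.1.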
3.1.2 Accumulation with Confidence Value
Although they avoid computation on hidden surfaces, screen space global illumination techniques (e.g., Mittring [7]; Dachsbacher and Stamminger [3]) still have bottlenecks in their sampling strategy. For example, to increase the frame rate they usually reduce the number of samples, which causes banding artifacts, as shown in Figure 3.2 (a). The usual way to alleviate this artifact is to use a different sampling pattern at each pixel, but this introduces a lot of noise, as shown in Figure 3.2 (b). A common way to remove the noise is to blur the image, but this loses scene detail, as shown in Figure 3.2 (c).
Figure 3.2: (a) SSAO with a single sampling pattern. (b) SSAO with the various sampling patterns. (c) SSAO with Gaussian filter.
The simple way to remove the above drawbacks is to use a lot of samples, but this is too expensive for interactive applications. Fortunately, we can efficiently retrieve the previous data with the reverse re-projection technique described in Section 3.1.1, and we can use this cached information to improve the quality of the image.
Consequently, our goal is to spread the cost of sampling over several frames. First, in each frame we rotate a sampling pattern that has a small number of samples and jitter the sampled positions, so that each frame generates a slightly different result, which is stored in the history cache. Second, by recursively accumulating the results, we obtain a high-quality image. The key is how to accumulate the cached information. We take the confidence of the previous solution into account, which can be motivated as follows. Different sampling patterns may capture different information. We can estimate how confident we are in the value from the previous frame by calculating the difference between the cached result at time step t-1 and the current result as follows:
$$\mathit{confidence} = 1 - \frac{\mathrm{saturate}\left(\lVert V_{t-1} - V_t \rVert\right)}{1 + \mathit{bias}} \tag{4}$$
where 𝑉𝑡−1 is the per-pixel value obtained by the reverse re-projection technique, and saturate clamps its argument to the range 0 to 1. The value of bias is user-specified; it prevents the confidence from becoming zero and can be adjusted per application to achieve the desired quality. In other words, a low confidence indicates that the current result contains information the cached result did not have. As we compensate for this difference over several frames, we expect the final result to converge without variance. Thus, we apply the following operation:
$$V_p = \mathit{confidence} \cdot V_{t-1} + (1 - \mathit{confidence}) \cdot V_t \tag{5}$$
We use the confidence value as a weighting factor that determines what proportion of the values from two consecutive frames to take; 𝑉𝑝 denotes the predicted value. Because the confidence value is computed per pixel, we avoid spending time on finding an acceptable fixed weight. Figure 3.3 depicts the flowchart of our amortized sampling method.
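Eqs. 4 and 5 can be illustrated with a minimal Python sketch for a scalar per-pixel value such as ambient occlusion. The function names are chosen for the example, and returning the current value when no cached value is available (a dis-occluded pixel, represented here by None) is an assumption about how the two steps are combined.

```python
def saturate(x):
    """Clamp x to the range [0, 1]."""
    return max(0.0, min(1.0, x))

def confidence(v_prev, v_curr, bias):
    """Eq. 4: confidence in the cached value, given the current value."""
    return 1.0 - saturate(abs(v_prev - v_curr)) / (1.0 + bias)

def accumulate(v_prev, v_curr, bias):
    """Eq. 5: blend the cached and the current value by the confidence.

    Assumption: v_prev is None for a dis-occluded pixel, in which case
    only the current value is used.
    """
    if v_prev is None:
        return v_curr
    c = confidence(v_prev, v_curr, bias)
    return c * v_prev + (1.0 - c) * v_curr
```

With bias = 1.0 and a maximal difference, the confidence is 0.5 rather than 0, so the cached value still contributes. With bias = 0.0 the confidence can drop to zero, which is the case the bias is meant to avoid.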
Figure 3.3: The top row shows the various sampling patterns in each frame. The bottom row shows the corresponding results.
3.2 Ambient Occlusion
Ambient Occlusion (AO) is a popular method for approximating global illumination. It gives perceptual clues of depth and curvature and enhances the geometric features of the scene. For economic reasons, we need a simpler implementation and higher performance, a need addressed by Screen Space Ambient Occlusion (SSAO), published by Mittring [7]. This algorithm uses only depth information and samples the 3D space around the visible point c, as shown in Figure 3.4. Because it ignores the geometric factor, it has some drawbacks that we have to overcome, as shown in Figure 3.5. Bavoil et al. [2] use depth and normal information and treat the depth image as a height field. By comparing the horizon angle between the surface point and the samples, they implicitly take the geometric factor into account.
They also use a quadratic falloff function to attenuate the occlusion with distance for all samples. Both Mittring [7] and Bavoil et al. [2] randomly rotate the sampling directions per pixel to remove banding artifacts. Our SSAO algorithm is inspired by both Mittring [7] and Bavoil et al. [2].
Figure 3.4: Half of the samples are inside the wall and half outside (left). Three quarters of the samples are inside the wall (middle). A quarter of the samples are inside the wall (right).
Figure 3.5: Flat surfaces appear gray (red block). The edges appear brighter (green block). White halos surround the edges of the objects (blue block).
3.2.1 SSAO Generation
We first need two pieces of world-space information from the eye’s G-buffer: world-space positions and world-space normal vectors, as shown in Figure 3.6. We observe the following: for every visible point p, if a sampled point q on an occluding surface is close to it, fewer photons reach it, as shown in Figure 3.7. We also add an attenuation term, which is important for avoiding a common SSAO artifact at the border of an object. Consequently, the ambient occlusion AO at a point p with surface normal n is:
$$\mathrm{AO}(p, \vec{n}) = \frac{1}{k} \sum_{i=1}^{k} \frac{\max(\vec{n} \cdot \overrightarrow{pq_i},\, 0)}{D(d)} \tag{6}$$
$$D(d) = 1.0 + a\,d + b\,d^2 \tag{7}$$
where 𝑞𝑖 is the i-th sample around the surface point p, D denotes the attenuation function, the constant 1.0 avoids a singularity, and 𝑑 is the distance between 𝑞𝑖 and p. The user-specified parameters a and b control the strength of the attenuation with distance.
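A minimal Python sketch of Eqs. 6 and 7 is given below, assuming that the k samples are already available as world-space positions. The function names are illustrative, and the vector from p to each sample is used unnormalized, exactly as Eq. 6 is written; whether the shader normalizes this vector is not stated here.

```python
import math

def attenuation(d, a, b):
    """Eq. 7: quadratic attenuation; the constant 1.0 keeps D(d) away from zero."""
    return 1.0 + a * d + b * d * d

def ambient_occlusion(p, n, samples, a, b):
    """Eq. 6: average attenuated occlusion of the samples q_i around point p."""
    if not samples:
        raise ValueError("at least one sample is required")
    total = 0.0
    for q in samples:
        pq = (q[0] - p[0], q[1] - p[1], q[2] - p[2])  # vector from p to q_i
        d = math.sqrt(pq[0] ** 2 + pq[1] ** 2 + pq[2] ** 2)
        cos_term = max(n[0] * pq[0] + n[1] * pq[1] + n[2] * pq[2], 0.0)
        total += cos_term / attenuation(d, a, b)
    return total / len(samples)
```

A sample lying in the tangent plane of p gives a dot product of zero and contributes nothing, which corresponds to case (a) of Figure 3.7.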
Figure 3.6: The world-space positions in G-buffer (left). The world-space normal vectors in G-buffer (right).
Figure 3.7: The sample q does not occlude the surface point p, because the dot product of the surface normal and the vector 𝑝𝑞⃗⃗⃗⃗ is zero in case (a). The surface point p is occluded, because the dot product is not zero in case (b).
To achieve higher performance, we limit the number of samples and use regular sampling directions per pixel, as shown in Figure 3.8 (a). To avoid banding artifacts, we perturb the sampling pattern per pixel in screen space using a texture of random 2D vectors (the z component is discarded), as shown in Figure 3.8 (b). In general, samples that are close to the surface point p in world space are also close in screen space and contribute more occlusion to p, as shown in Figure 3.8 (c). For these reasons, we apply the technique proposed by Zhang et al. [17] for partitioning the view frustum, as follows:
$$S_i = \mathit{near} \left( \frac{\mathit{far}}{\mathit{near}} \right)^{i/m} \tag{8}$$
where 𝑆𝑖 is the i-th sample around the surface point p per direction, 𝑚 denotes the number of samples per direction, 𝑛𝑒𝑎𝑟 is the minimum distance from p in screen space, 𝑓𝑎𝑟 is the maximum distance from p in screen space, and |𝑓𝑎𝑟 − 𝑛𝑒𝑎𝑟| is the sampling range in screen space. Equation 8 places the samples more densely near p, which gives a better sampling pattern.
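The spacing produced by Eq. 8 can be checked with a few lines of Python. The function name is illustrative, and letting the index i run from 1 to m is an assumption, since the index range is not stated explicitly.

```python
def sample_distances(near, far, m):
    """Eq. 8: screen-space distances of the m samples along one direction.

    Assumption: i runs from 1 to m, so the last sample lies at distance far.
    """
    if near <= 0.0 or far < near or m <= 0:
        raise ValueError("expected 0 < near <= far and m > 0")
    return [near * (far / near) ** (i / m) for i in range(1, m + 1)]
```

For near = 1, far = 16 and m = 4 the distances are approximately 2, 4, 8 and 16: each step doubles the distance, so half of the samples fall within the first quarter of the range.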
Figure 3.8: Our initial sampling directions (a). Random vector texture for perturbing the sampling pattern per pixel (b). The sample q1 is close to surface point p in world space as well as in screen space, whereas q2 is far from surface point p in world space as well as in screen space (c).
Finally, like other image-space ambient occlusion techniques, we have to blur the image to reduce the noise, as shown in Figure 3.9. However, the blurring pass costs performance. In Section 3.2.2 we show that we can use the cached information to remove this redundancy.
Figure 3.9: Before blurring, the image contains a lot of noise (left). After blurring, the details of the objects are lost (right).
3.2.2 Exploiting Cached Information
To get better screen space ambient occlusion results, we would need a lot of samples, which is an unwise choice for real-time applications. We therefore adopt the idea of amortizing the cost of sampling (Section 3.1.2). First, we use the reverse re-projection technique (Section 3.1.1) to get the past ambient occlusion value 𝐴𝑂𝑡−1. Second, before perturbing the sampling pattern per pixel, we rotate the sampling directions by a fixed angle 𝜃 and slightly scale the sampling range, and then compute the current ambient occlusion 𝐴𝑂𝑡. Third, we calculate the confidence from the difference between the past and the current ambient occlusion according to Eq. 4. Finally, we use Eq. 5 to accumulate 𝐴𝑂𝑡−1 and 𝐴𝑂𝑡, using this value as the weighting factor. Note that we specify a limit on the number of times the sampling pattern is rotated. This avoids unlimited accumulation and lets us determine whether the results have already converged. In our implementation, 𝜃 is ten degrees. Figure 3.10 compares SSAO with and without cached information.
Figure 3.10: Both the left and the right SSAO use 32 samples per pixel. In the left image, SSAO without accumulating cached information shows noise artifacts. In the right image, SSAO with accumulating cached information shows fewer noise artifacts and maintains the details of the objects.
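The per-frame rotation of the sampling directions can be sketched as follows. This is an illustration only: the function name, the default limit of nine rotations and the choice to keep the last rotation once the limit is reached are assumptions, and the slight scaling of the sampling range is omitted.

```python
import math

def rotated_directions(base_dirs, frame_index, theta_deg=10.0, max_rotations=9):
    """Rotate 2D sampling directions by theta per frame, up to a fixed limit.

    Assumption: after max_rotations frames the pattern stops rotating.
    """
    steps = min(frame_index, max_rotations)
    angle = math.radians(steps * theta_deg)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in base_dirs]
```

In each frame, the rotated directions would be perturbed per pixel, used to compute the current ambient occlusion, and the result blended with the cached value according to Eqs. 4 and 5.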
3.3 Indirect Illumination
Figure 3.11 shows both direct and indirect illumination. Indirect illumination increases the scene luminance and improves the image quality. Unfortunately, its computational cost is too high for real-time applications. In general, however, it is sufficient to consider only one-bounce indirect illumination. Reflective Shadow Maps [3] is a technique for efficiently generating one-bounce indirect illumination. In the following, we describe how to implement Reflective Shadow Maps and how to refine their results by taking the cached information into account.
Figure 3.11: Surface point p is illuminated by the actual light source (green arrow). Surface point q is illuminated by the reflected light source (red arrow).
3.3.1 Reflective Shadow Maps
Reflective Shadow Maps (RSMs) [3] is a virtual point light (VPL) based method. It uses the observation that all one-bounce indirect illumination is caused by surfaces that are directly illuminated by the actual light source. Therefore, each pixel in the shadow map can be considered a VPL that illuminates the scene. We extend a standard shadow map by storing additional lighting information, namely world-space positions, world-space normal vectors, depth and reflected radiant flux, as shown in Figure 3.12. We then use the following equation to calculate the contribution of a VPL q at a surface point p:
$$I_p = \Phi_q \, \frac{\max(0,\, N_p \cdot D)\, \max(0,\, N_q \cdot (-D))}{\lVert q - p \rVert^2} \tag{9}$$
$$D = \frac{q - p}{\lVert q - p \rVert} \tag{10}$$
where 𝐼𝑝 denotes the irradiance at a surface point p, 𝑁𝑝 is the normal vector at p, 𝑁𝑞 is the normal vector of VPL q, Φ𝑞 is the radiant flux of q, and D is the unit vector from p to q.
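The contribution of a single VPL according to Eqs. 9 and 10 can be written as a short Python function. The function name and the convention of returning zero when p and q coincide are assumptions made for the example; normal vectors are expected to be unit length.

```python
import math

def vpl_irradiance(p, n_p, q, n_q, flux_q):
    """Eqs. 9 and 10: irradiance at surface point p caused by the VPL at q."""
    diff = (q[0] - p[0], q[1] - p[1], q[2] - p[2])
    dist = math.sqrt(diff[0] ** 2 + diff[1] ** 2 + diff[2] ** 2)
    if dist == 0.0:
        return 0.0  # assumption: a VPL located exactly at p contributes nothing
    d = (diff[0] / dist, diff[1] / dist, diff[2] / dist)  # Eq. 10
    cos_p = max(0.0, n_p[0] * d[0] + n_p[1] * d[1] + n_p[2] * d[2])
    cos_q = max(0.0, -(n_q[0] * d[0] + n_q[1] * d[1] + n_q[2] * d[2]))
    return flux_q * cos_p * cos_q / (dist * dist)  # Eq. 9
```

The two clamped cosine terms make the contribution vanish whenever the VPL lies behind the surface at p or faces away from p.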