Literature Review - Block Matching Accelerator for Computational Photography

2.1 Image Denoising

The search for efficient image denoising methods remains an open challenge at the intersection of functional analysis and statistics. Despite the sophistication of recently proposed methods, most algorithms have not yet attained a desirable level of applicability. They perform outstandingly when the image fits the assumptions of the method, but in general they fail, creating artifacts or removing fine image structures. As a result, they achieve good quality only on specific kinds of images.

doi:10.6342/NTU201700228

Image denoising aims to recover the original image signal from its noisy observation, which can be formulated as y = x + n, where x is the desired clean image, n represents the additive noise, and y is the corrupted observation. To solve this ill-posed problem, it is critical to exploit prior knowledge that characterizes the statistical features of images. Early regularization methods mainly use the local correlation among image pixels. A typical example of this kind is total variation (TV) regularization [1]. It captures the property that natural images are smooth in most regions, and can be seen as a kind of sparsity regularization in the gradient domain. Typical sparsity regularization models are based on the assumption that images can be sparsely represented in a transform domain, such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). These transforms are orthogonal or nearly orthogonal and use a fixed transform basis.

Since non-local means (NLM) denoising [2] exploited the repetitiveness of patch patterns in image signals, extensive research has been motivated to take advantage of non-local similarity for image restoration tasks, achieving superior performance over local regularization. Among these non-local similarity based schemes, the well-known benchmark block matching 3-D (BM3D) [3] is basically a combination of DCT coefficient thresholding and non-local block matching. It stacks the blocks similar to a reference patch into a three-dimensional (3D) group, applies a 3D transform to the group, and performs hard thresholding (in the first step) or Wiener filtering (in the second step). Together with “Non-local Sparse Models for Image Restoration” (LSSC) [4] and “From Learning Models of Natural Image Patches to Whole Image Restoration” (EPLL) [5], it forms one of three strong foundations in image denoising. LSSC utilizes both a sparse dictionary and image self-similarity, while EPLL makes use of an external dictionary to train a Gaussian mixture model (GMM) for image denoising.
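As a rough illustration of the grouping and hard-thresholding idea behind the first step of BM3D, the following Python sketch stacks the blocks most similar to a reference block and thresholds the 3D transform coefficients of the stack. It is a simplified sketch, not the implementation of [3]: the function names, the block and search sizes, the use of a separable DCT on all three axes, and the threshold value are assumptions made for illustration, and the aggregation of overlapping estimates and the Wiener filtering step are omitted.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def transform_3d(stack, inverse=False):
    """Apply a separable orthonormal DCT (or its inverse) along all three axes."""
    out = stack.astype(np.float64)
    for axis, size in enumerate(stack.shape):
        d = dct_matrix(size)
        if inverse:
            d = d.T  # the inverse of an orthonormal matrix is its transpose
        out = np.moveaxis(np.tensordot(d, out, axes=(1, axis)), 0, axis)
    return out

def group_similar_blocks(img, ref_yx, block=8, search=8, max_blocks=8):
    """Return top-left coordinates of the blocks closest (L2) to the reference block."""
    ry, rx = ref_yx
    ref = img[ry:ry + block, rx:rx + block]
    h, w = img.shape
    candidates = []
    for y in range(max(0, ry - search), min(h - block, ry + search) + 1):
        for x in range(max(0, rx - search), min(w - block, rx + search) + 1):
            d = np.sum((img[y:y + block, x:x + block] - ref) ** 2)
            candidates.append((d, y, x))
    candidates.sort(key=lambda c: c[0])
    return [(y, x) for _, y, x in candidates[:max_blocks]]

def denoise_group(img, coords, block=8, threshold=0.5):
    """Stack the selected blocks, hard-threshold the 3D coefficients, and invert."""
    stack = np.stack([img[y:y + block, x:x + block] for y, x in coords])
    coeffs = transform_3d(stack)
    coeffs[np.abs(coeffs) < threshold] = 0.0  # hard thresholding
    return transform_3d(coeffs, inverse=True)
```

In this sketch the threshold is left as a free parameter; in practice it is usually tied to the noise standard deviation.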

Natural images often contain many repetitive local patterns, and even a single local patch can have many similar patches across all scales of the whole image. This important statistical property, first exploited by non-local means, has stimulated the development of many image denoising algorithms, such as BM3D [3]; CSR [6], which gathers similar patches into clusters and approximates them by the cluster centroid; “Separating Signal from Noise using Patch Recurrence Across Scales” [7], which uses the fact that similar patches recur across different resolutions of an image; and WNNM [8], which performs block matching followed by weighted nuclear norm minimization.
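To make the use of patch self-similarity concrete, the following Python sketch implements a basic form of non-local means: each pixel is replaced by a weighted average of the pixels in a search window, with weights that decay with the distance between the surrounding patches. It is a minimal sketch under assumed settings; the function names, the Gaussian weight with bandwidth `h`, the window sizes, and the reflective padding are illustrative choices rather than the exact formulation of [2].

```python
import numpy as np

def nlm_pixel(img, padded, y, x, patch, search, h):
    """Non-local means estimate of pixel (y, x) from patches in a search window."""
    ref = padded[y:y + patch, x:x + patch]  # patch centered at (y, x)
    total, weight_sum = 0.0, 0.0
    for j in range(max(0, y - search), min(img.shape[0], y + search + 1)):
        for i in range(max(0, x - search), min(img.shape[1], x + search + 1)):
            cand = padded[j:j + patch, i:i + patch]
            d2 = np.mean((cand - ref) ** 2)   # mean squared patch difference
            w = np.exp(-d2 / (h * h))         # similar patches get larger weights
            total += w * img[j, i]
            weight_sum += w
    return total / weight_sum

def nlm_denoise(img, patch=3, search=4, h=0.1):
    """Apply the per-pixel estimate to the whole image (patch must be odd)."""
    img = img.astype(np.float64)
    padded = np.pad(img, patch // 2, mode="reflect")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = nlm_pixel(img, padded, y, x, patch, search, h)
    return out
```

The double loop is kept for readability; the cost of comparing every patch with every candidate in the window is what motivates the faster patch matching schemes reviewed in Section 2.3.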

2.2 Super Resolution

We consider only single image super-resolution (SISR), which aims to generate a high-resolution (HR) image from a low-resolution (LR) input image.

While domain-specific SISR algorithms focus on specific classes of images such as faces, scenes, and graphics artwork, general SISR algorithms are developed for all kinds of images, with priors typically based on primitive image properties such as edges and segments. To evaluate the performance of a SISR algorithm, human subject studies or ground truth images are used.

Because SISR methods rely on different types of priors, we categorize them into the following groups.

Prediction Models.

SISR algorithms in this category generate high-resolution images from low-resolution inputs through a predefined mathematical formula, without any training data.

Interpolation-based methods such as bilinear, bicubic, and Lanczos generate high-resolution pixel values by weighted averaging of neighboring low-resolution pixels. Since interpolated intensities are locally similar to neighboring pixels, these algorithms produce good smooth regions but insufficiently large gradients along edges and in high-frequency regions.
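The weighted averaging used by interpolation-based methods can be illustrated with a small bilinear upscaler in Python. This is a sketch under assumptions: the function name, the integer scale factor, and the pixel-center alignment with edge clamping are choices made here for illustration.

```python
import numpy as np

def bilinear_upscale(lr, scale):
    """Upscale a 2D image by an integer factor with bilinear interpolation."""
    lr = lr.astype(np.float64)
    h, w = lr.shape
    # Map each HR pixel center back to LR coordinates and clamp at the border.
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weight of the lower neighbor
    wx = (xs - x0)[None, :]  # horizontal weight of the right neighbor
    top = (1 - wx) * lr[y0][:, x0] + wx * lr[y0][:, x1]
    bottom = (1 - wx) * lr[y1][:, x0] + wx * lr[y1][:, x1]
    return (1 - wy) * top + wy * bottom
```

Applied to a row containing the step edge 0, 0, 1, 1 with a scale of 2, this sketch yields 0, 0, 0, 0.25, 0.75, 1, 1, 1: the unit jump is spread over several pixels, which is the loss of large gradients along edges noted above.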

Edge Based Methods.

Edges are important primitive image structures that play a primary role in visual perception. Some SISR algorithms have been proposed to learn priors from edge features for reconstructing high-resolution images. Various edge features have been proposed, such as the depth and width of an edge [9] or the parameters of a gradient profile [10]. Since the priors are primarily learned from edges, the reconstructed high-resolution images have high-quality edges with proper sharpness and limited artifacts. However, edge priors are less effective for modeling other high-frequency structures such as textures.

Image Statistical Methods.


Various image properties can be exploited as priors to predict HR images from LR images. The sparsity property of large gradients in generic images is exploited in [11] to reduce the computational load and in [12] to regularize the low-resolution input images.

Patch (Example) Based Methods.

Given pairs of low-resolution and high-resolution training images, patches can be cropped from the training data to learn mapping functions. The exemplar patches can be generated from external datasets [13,14], the input image itself [15,16], or combined sources [17]. Various methods for learning the mapping functions have been proposed, such as weighted averaging, kernel regression, support vector regression, Gaussian process regression, and sparse dictionary representation. In addition to equally averaging overlapped patches, several methods for blending overlapped pixels have been proposed, including weighted averaging [16,17].

2.3 Patch Matching

Patch matching is the problem of finding, for each patch in a target image X, the nearest patch in a source image Z under a matching criterion (L1 norm, L2 norm, etc.). In the past, it was customary to solve it by treating each patch as an independent sample and building approximate nearest-neighbor search structures such as Locality Sensitive Hashing (LSH) or KD-trees.
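As a reference point for the problem statement, an exhaustive search can be written in a few lines of Python. The sketch below is only a baseline for illustration: the function names and the default patch size are assumptions, and its cost grows with the product of the numbers of target and source patches, which is what the methods discussed in this section try to avoid.

```python
import numpy as np

def extract_patches(img, p):
    """Return all p x p patches of img as rows, in raster-scan order."""
    h, w = img.shape
    rows = [img[y:y + p, x:x + p].ravel()
            for y in range(h - p + 1) for x in range(w - p + 1)]
    return np.stack(rows).astype(np.float64)

def nearest_patches(target, source, p=3, norm="l2"):
    """For each target patch, return the (y, x) of its nearest source patch."""
    tp = extract_patches(target, p)
    sp = extract_patches(source, p)
    source_cols = source.shape[1] - p + 1
    nnf = np.empty((len(tp), 2), dtype=int)
    for i, patch in enumerate(tp):
        diff = sp - patch
        if norm == "l1":
            dist = np.abs(diff).sum(axis=1)
        else:
            dist = (diff ** 2).sum(axis=1)
        # Convert the flat index of the best match back to (y, x).
        nnf[i] = divmod(int(np.argmin(dist)), source_cols)
    return nnf.reshape(target.shape[0] - p + 1, target.shape[1] - p + 1, 2)
```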

Recently, a novel solution named PatchMatch [18,19] proved to outperform those methods by up to two orders of magnitude. This solution relies on the fact that if we find a pair of similar patches in two images, then their neighbors in the image plane are also likely to be similar. It uses random search to seed the patch matches and iterates a small number of times to propagate good matches. However, PatchMatch is not as accurate as LSH or KD-trees, and increasing its accuracy requires more iterations, which cost much more time. Stimulated by PatchMatch, a newer method, Coherency Sensitive Hashing (CSH) [20], replaces the random search step of PatchMatch with a hashing scheme similar to the one used in LSH. As a result, information is propagated to nearby patches in the image plane, as is done in PatchMatch, and also to similar patches that were hashed to the same value. In other words, it propagates information to patches that are either close in the image plane or similar in appearance.

In addition, [21] uses a partial sum reuse strategy to reduce the complexity from O(r²) to O(r) and extends it to higher dimensions so that it can be applied to video. It computes the next several search ranges in parallel to reuse the partial sums and applies the algorithm to image denoising, image editing, and so on. It is shown to be faster than some approximate methods when the patch size is not large. [22] exploits information at multiple resolutions: it performs patch matching at low resolution first and then propagates the result to high resolution to reduce the run time. [23] combines PatchMatch-based random search with edge-aware image filtering to achieve better accuracy and higher speed. [24] proposes a novel patch representation, Needle, which is built from an image pyramid of the original image, and obtains better performance when Needle patches are plugged into some state-of-the-art patch-based algorithms in place of ordinary patches.
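The random seeding, propagation, and random search steps of PatchMatch described above can be sketched in Python as follows. This is a minimal single-scale sketch written for illustration, not the implementation of [18,19]: the function names, the alternating scan order, the halving of the search radius, and the parameter defaults are assumptions.

```python
import numpy as np

def patch_dist(a, b, ay, ax, by, bx, p):
    """Sum of squared differences between a p x p patch of a and one of b."""
    d = a[ay:ay + p, ax:ax + p] - b[by:by + p, bx:bx + p]
    return float(np.sum(d * d))

def patchmatch(a, b, p=5, iters=4, seed=0):
    """Approximate nearest-neighbor field from patches of a to patches of b."""
    rng = np.random.default_rng(seed)
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    ah, aw = a.shape[0] - p + 1, a.shape[1] - p + 1
    bh, bw = b.shape[0] - p + 1, b.shape[1] - p + 1
    # Random initialization of the field.
    nnf = np.stack([rng.integers(0, bh, (ah, aw)),
                    rng.integers(0, bw, (ah, aw))], axis=-1)
    cost = np.array([[patch_dist(a, b, y, x, nnf[y, x, 0], nnf[y, x, 1], p)
                      for x in range(aw)] for y in range(ah)])

    def try_candidate(y, x, by, bx):
        # Keep the candidate only if it is valid and improves the match.
        if 0 <= by < bh and 0 <= bx < bw:
            d = patch_dist(a, b, y, x, by, bx, p)
            if d < cost[y, x]:
                cost[y, x] = d
                nnf[y, x] = (by, bx)

    for it in range(iters):
        step = 1 if it % 2 == 0 else -1  # alternate the scan direction
        ys = range(ah) if step == 1 else range(ah - 1, -1, -1)
        xs = range(aw) if step == 1 else range(aw - 1, -1, -1)
        for y in ys:
            for x in xs:
                # Propagation: shift the matches of already visited neighbors.
                if 0 <= y - step < ah:
                    try_candidate(y, x, nnf[y - step, x, 0] + step, nnf[y - step, x, 1])
                if 0 <= x - step < aw:
                    try_candidate(y, x, nnf[y, x - step, 0], nnf[y, x - step, 1] + step)
                # Random search around the current best, shrinking the radius.
                radius = max(bh, bw)
                while radius >= 1:
                    by = nnf[y, x, 0] + rng.integers(-radius, radius + 1)
                    bx = nnf[y, x, 1] + rng.integers(-radius, radius + 1)
                    try_candidate(y, x, by, bx)
                    radius //= 2
    return nnf, cost
```

Because a candidate is accepted only when it lowers the stored cost, each iteration can only keep or improve the match at every pixel, which is why a few iterations are usually enough for a usable field.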

2.4 Hardware Accelerators

In the past, a number of hardware architectures for block matching were proposed for motion estimation in video coding. [25] uses a 1D systolic array, which accepts sequential inputs but performs parallel processing. [26] uses a 2D modular systolic array to generate a motion vector for every reference block in raster scan order.
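For orientation, the computation that such block matching architectures accelerate can be written as a plain software loop. The Python sketch below is a full-search reference with the sum of absolute differences (SAD) as the matching criterion; it does not model the systolic arrays of [25] or [26], and the function name, the block size, and the search range are assumptions for illustration.

```python
import numpy as np

def full_search_motion(cur, prev, block=8, search=4):
    """Return one motion vector (dy, dx) per block of cur, in raster-scan order."""
    cur = cur.astype(np.int64)
    prev = prev.astype(np.int64)
    h, w = cur.shape
    vectors = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block]
            best = (None, 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate falls outside the previous frame
                    sad = int(np.abs(target - prev[y:y + block, x:x + block]).sum())
                    if best[0] is None or sad < best[0]:
                        best = (sad, dy, dx)
            vectors.append(best[1:])
    return np.array(vectors).reshape(h // block, w // block, 2)
```

Every candidate position in the search range evaluates the same SAD over a full block, so the work is regular and data can be shared between neighboring candidates, which is the property that array-based hardware exploits.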

Recently, [27] and [28] used partial sum reuse to accelerate the patch matching process in image inpainting and video upscaling, respectively. [27] uses cache-based memory to reuse data and arranges the data in a special layout to reduce bandwidth. [28] performs patch matching for the next four search ranges in parallel and uses shift registers to process macroblocks. [29] proposes a GPU-friendly method that combines tiling, hierarchical clustering using k-means, and querying within a single cluster. It divides the image into many non-overlapping tiles and sub-samples some patches from each tile to run k-means. Next, it searches for candidates only within one cluster rather than over the whole search range. Finally, it applies collaborative filtering to a subset of the overlapping patches to compute the results. [30] designs an energy-efficient k-nearest-neighbor (kNN) accelerator using an adaptive precision strategy. In other words, it processes 2 bits at a time rather than all 8 bits and uses the current minimum to eliminate impossible candidates, saving power and area.
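The idea of partial sum reuse mentioned above can be illustrated in software with a sliding window over per-pixel squared differences. The Python sketch below is a generic illustration, not the architecture of [27] or [28]: the function names are hypothetical, both images are assumed to have the same shape, and the patch size `p` plays the role of r in the O(r²) and O(r) costs quoted in Section 2.3.

```python
import numpy as np

def squared_diff_map(a, b, dy, dx):
    """Squared difference between a and b shifted by (dy, dx), on the valid overlap."""
    h, w = a.shape
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    diff = a[y0:y1, x0:x1].astype(np.float64) - b[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    return diff * diff

def patch_ssd_direct(sq, p):
    """Patch SSD recomputed from scratch: about p * p additions per patch."""
    h, w = sq.shape
    out = np.empty((h - p + 1, w - p + 1))
    for y in range(h - p + 1):
        for x in range(w - p + 1):
            out[y, x] = sq[y:y + p, x:x + p].sum()
    return out

def patch_ssd_reuse(sq, p):
    """Same result with reused column sums: about p additions per patch."""
    h, w = sq.shape
    out = np.empty((h - p + 1, w - p + 1))
    for y in range(h - p + 1):
        col = sq[y:y + p, :].sum(axis=0)  # one partial sum per column
        acc = col[:p].sum()
        out[y, 0] = acc
        for x in range(1, w - p + 1):
            # Slide right: add the entering column and drop the leaving one.
            acc += col[x + p - 1] - col[x - 1]
            out[y, x] = acc
    return out
```

Both functions return the same SSD map for a fixed displacement; the second shares each column sum among the p patches that overlap it, which is the kind of reuse that a hardware design can implement with local buffers.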
