DTLN: A Deep Two-branch Lightening Network with Saturation Adjustment for Low-light Enhancement

1 Yu-Wei Chen (陳昱瑋), 1 Soo-Chang Pei (貝蘇章), 2,*Chiou-Shann Fuh (傅楸善)

1 Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan

2Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

*E-mail: r09942066@ntu.edu.tw, peisc@ntu.edu.tw, fuh@csie.ntu.edu.tw

ABSTRACT

Single-image low-light enhancement has been widely studied for a long time but remains a challenging task. Although many prior works achieve excellent performance, they do not consider the noise in low-light images, which is common in real-world low-light photography. Another problem is the lack of data: some prior works synthesize low-light images from public datasets; however, since these datasets were not built specifically for low-light enhancement, they contain many unsuitable, under-exposed ground-truth images. In this work, we propose a novel Deep Two-branch Lightening Network (DTLN) to enhance low-light images and de-noise simultaneously in a data-driven approach. We also propose a novel criterion to remove unsuitable ground truth from public datasets when synthesizing a low-light dataset. We compare the proposed method with many other methods; experiments show that ours outperforms them under both subjective and objective metrics.

1. INTRODUCTION

Images captured in low-light environments are usually under-exposed, low-contrast, and noisy, which is troublesome for high-level vision tasks, e.g., object detection and 3D reconstruction. Although low-light enhancement has been studied for a long time, there is still plenty of room for improvement. Many approaches and theories have been developed to obtain clear images from low-light input; for example, Histogram Equalization (HE)-based methods and the Retinex theory are widely adopted. HE-based methods aim to increase contrast under a uniform-distribution assumption; well-known variants such as Adaptive Histogram Equalization (AHE) [1] and Contrast-Limited Adaptive Histogram Equalization (CLAHE) [2] were proposed one after another. The Retinex theory assumes an image can be decomposed into two components:

reflectance and illumination, which is analogous to the rod and cone cells of the human visual system. Retinex-based methods focus on estimating the illumination component to recover brightness and contrast. Some previous works also

Fig. 1. Unsuitable (under-exposed) ground-truth examples in the PASCAL VOC dataset, which may be low-illumination, nighttime, or backlit images.

perform de-noising on the reflectance image to obtain high-quality results, for example, LIME [3].

However, previous work still has limitations that make it hard to recover visually pleasing images captured in low-light, noisy environments; for example, the result may be under-enhanced or blurred. To handle real-world noise, there are mainly two branches of research. The first de-noises as thoroughly as possible using synthetic Additive White Gaussian Noise (AWGN); the second focuses on the realism of the synthesized noise [4]. Some previous works handle noise in low-light images by separating de-noising from enhancement but obtain blurry results; Lv et al. [5] argue that these two operations can be performed simultaneously to obtain blur-free results, and our work also follows this design. Another problem of some enhanced images is low saturation [6, 7]. In this work, we propose a novel saturation adjustment module, based on the simple but effective prior and framework of [8], to address this problem. Due to the difficulty of collecting data, the lack


of data makes data-driven low-light enhancement more difficult. Instead of using real-world data, a number of papers [5, 7] synthesize data from large public datasets, e.g., ImageNet and PASCAL Visual Object Classes (VOC); we also work in this fashion because of several advantages, i.e., adequate data and labels for advanced high-level tasks. However, since these datasets were not built specifically for low-light enhancement, they contain many under-exposed images, as shown in Fig. 1, which are unsuitable as ground truth when training a neural network in a supervised setting. In this paper, we remove these unsuitable examples with a novel approach.

To deal with the problem of low-light image enhancement with noise, we propose a novel network called the Deep Two-branch Lightening Network (DTLN), utilizing Back Projection (BP), which is popular in super-resolution; we note that the work of Wang et al. [7] is similar to ours. We also propose a novel saturation adjustment module for post-processing. Our contributions are as follows:

• We propose a novel low-light enhancement network that utilizes back projection in a two-branch manner.

• We propose a saturation adjustment module for post-processing, based on the framework and the simple but effective prior proposed by [8].

• We propose a novel approach to remove unsuitable examples from large public datasets when synthesizing low-light images.

The organization of this paper is as follows. Section 2 briefly reviews related work on low-light enhancement, image de-noising, and back projection. Section 3 introduces the synthetic dataset. Section 4 presents our low-light enhancement model and saturation adjustment module. Experimental details and results are elaborated in Section 5. Finally, Section 6 concludes the paper.

2. RELATED WORK

Single-image low-light enhancement and de-noising have been studied for a long time. In this section, we briefly review the most related methods.

2.1. Low-Light Enhancement

Low-light enhancement methods can be mainly divided into two directions: traditional methods and deep learning-based methods. Traditional methods follow two main branches of research: Histogram Equalization (HE)-based methods and Retinex-based methods. HE-based methods aim to increase contrast under a uniform-distribution assumption; however, when an image contains large white or black regions, HE tends to over-enhance contrast and produce noisy results. Some variants have been proposed to mitigate this drawback, such as AHE [1], CLAHE [2], and Brightness-Preserving Dynamic Histogram Equalization (BPDHE) [9]. In the other branch, Retinex methods decompose an image into reflectance and illumination components: the reflectance component describes scene detail and is illumination-invariant, while the illumination component describes scene lighting. Low-light enhancement then proceeds in three steps: decompose the image into these two components, enhance the illumination component, and finally reconstruct the image from the reflectance and the enhanced illumination to obtain a normal-light result. Many works adopt this approach, such as Single Scale Retinex (SSR) and its variants [10-11], LIME [3], and SRIE [12].

For deep learning, some works also follow the Retinex theory, using neural networks to decompose an image into reflectance and illumination components, for example, RetinexNet [13], DeepUPE [14], and TBEFN [15]. Another view treats low-light enhancement as an image restoration task. Lv et al. propose a multi-branch fusion network [16] to enhance low-light images; however, this network does not deal with noise, a common problem in low-light images, so the authors further proposed a network that uses attention and a similar multi-branch structure to de-noise and enhance simultaneously [5]. Motivated by image processing software, Guo et al. [17] propose an efficient network called Zero-DCE that iteratively estimates curves for adjusting low-light images. Some works utilize Generative Adversarial Networks (GANs), e.g., EnlightenGAN [18], which uses a local-global discriminator and the intensity image as an attention map to guide training. Wang et al. [7] borrow the concept of back projection from super-resolution to construct a Lightening Back Projection (LBP) block and enhance low-light images by stacking LBPs, obtaining marvelous results; however, they do not tackle noise, which is a common problem in low-light images.

2.2. De-noising

De-noising is also a long-studied fundamental problem in low-level vision, and can be divided into two categories: traditional and deep learning-based. Among traditional methods, BM3D combines inter-patch and intra-patch correlation and achieves state-of-the-art results. For deep learning, Noise2Noise [19] and Noise2Void [20] work on AWGN, exploiting its zero expected value to de-noise without clean images. Another branch of de-noising focuses on the realism of synthesized noise: Guo et al. [21] analyze where the noise comes from and synthesize noise considering the camera Image Signal Processing (ISP) pipeline. Similar work was done by Wei et al. [4].

2.3. Back Projection


Back Projection (BP) is a popular technique in super-resolution that restores a high-resolution image by applying back projection blocks iteratively. A BP block can be formulated as follows:

$\hat{Y}_{t+1} = \hat{Y}_t + \lambda U(X - D(\hat{Y}_t))$ (1)

where $\hat{Y}_{t+1}$ is the super-resolution image at step $t+1$, and $D(\cdot)$ and $U(\cdot)$ denote the down-sampling and up-sampling operations, respectively. Haris et al. [22] propose the Dense Projection Unit to compose Deep Back-Projection Networks (DBPN). In a Dense Projection Unit, the low-resolution image is first up-projected to high resolution, then down-projected back to low resolution; the residual of the reconstructed low-resolution image is fed into the next up-projection block, and the result is added back to the first high-resolution image as the output. We adopt this manner to design our De-noising Lightening Back Projection (DLBP) block, as the Lightening Back Projection (LBP) block [7] does.

The low-light enhancement problem can be formulated as below:

$Y = X + \gamma P(X) - n$ (2)

where $Y$ is the normal-light, noise-free image; $X$ is the low-light image; $P(\cdot)$ is the enhancement operator; $\gamma$ controls the lightening power; and $n$ denotes noise.

3. SYNTHETIC DATASET

Many prior works have constructed datasets of paired low-light and normal-light images. The LOL dataset [13] is constructed by capturing pictures with different exposures and ISO settings to generate normal-light images; low-light images are generated approximately via linear degradation. The SID dataset [23] uses Sony and Fuji cameras to capture raw images in extremely low-light environments, which differs from general low-light image enhancement research, i.e., enhancement of JPEG images.

Another method to prepare normal-light ground truth is expert retouching; DeepUPE [14] adopts this manner, and the constructed dataset contains 3,000 images. The SICE dataset [24] uses an exposure-fusion method to obtain ground-truth images from multi-exposure sequences. However, existing datasets still have limitations and imperfections; for example, they do not contain sufficient annotations for high-level vision tasks, e.g., object detection and image segmentation. To address this problem, prior work [7] trains on a synthetic dataset built from the public PASCAL VOC 2007 dataset.

Since the public dataset was not built specifically for low-light image enhancement, it contains many under-exposed images, which are unsuitable as ground truth, as shown in Fig. 1. In this work, we propose an under-exposed image removal method using natural image priors.

3.1. Remove Unsuitable Ground Truth

To remove under-exposed images, we construct three filtering criteria, obtained by analyzing the histogram, a natural image prior, and the expected appearance of well-exposed images.

1) Histogram:

The image histogram is a good tool for judging under-exposure; many classical low-light enhancement works, e.g., HE, are also built on it. An under-exposed image has a histogram skewed to the left. Under this observation, we filter out images whose 0.4 quantile of intensity is smaller than 0.157, for image values in the range [0, 1].

2) Natural Image Prior:

Prior work [25] shows that for natural-light, blur-free images, the Bright Channel Prior (BCP) is close to 1, for image values in the range [0, 1]. To obtain high-quality images, we utilize this prior and filter out images whose quantile of the BCP map is smaller than a threshold. We empirically set the quantile to 0.8, the threshold to 0.7, and the window size to 21.

3) Expectation to Well-exposure Images:

This criterion is motivated by [17], which proposed an unsupervised low-light enhancement network with an exposure control loss that restrains under-/over-exposure. The exposure control loss measures the distance between the average intensity of a local region and a target well-exposure level $E$. We turn this loss into a distance measurement, defined as:

$D_e = \frac{1}{M} \sum_{k=1}^{M} \| Y_k - E \|_1$ (3)

where $D_e$ is the exposure distance to the well-exposure level, $M$ is the number of non-overlapping $16 \times 16$ local regions, $Y_k$ is the average intensity of the $k$-th region, and $E$ is the well-exposure level, which we set to 0.6 following [17].

We filter out images whose exposure distance is larger than 0.3.
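As a concrete reference, the three criteria can be combined into a single filter. Below is a minimal Python sketch under the thresholds stated above; the function name keep_as_ground_truth is ours, and float RGB input in [0, 1] is an assumption rather than a detail from the paper.

```python
import numpy as np
import cv2

def keep_as_ground_truth(img):
    """Return True if `img` (float32 RGB in [0, 1]) passes all three
    criteria of Section 3.1. A sketch, not the authors' exact code."""
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # 1) Histogram: reject if the 0.4 quantile of intensity is below 0.157.
    if np.quantile(gray, 0.4) < 0.157:
        return False
    # 2) Bright Channel Prior: 21x21 local max of the per-pixel max channel;
    #    reject if its 0.8 quantile is below the 0.7 threshold.
    bright = cv2.dilate(img.max(axis=2), np.ones((21, 21), np.uint8))
    if np.quantile(bright, 0.8) < 0.7:
        return False
    # 3) Exposure distance, Eq. (3): mean L1 distance of 16x16 block means
    #    to the well-exposure level E = 0.6; reject if larger than 0.3.
    h, w = gray.shape
    blocks = gray[:h - h % 16, :w - w % 16].reshape(h // 16, 16, w // 16, 16)
    if np.abs(blocks.mean(axis=(1, 3)) - 0.6).mean() > 0.3:
        return False
    return True
```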

In this work, we use the training set of the public PASCAL VOC 2007 dataset as the synthesis source, randomly select 10% of it as the validation set, and use the evaluation subset of the LOL dataset [13] for testing.

3.2. Low-Light Image Synthesis

Two main properties of low-light images are low illumination/contrast and noise. Prior works [5, 7] discover that a combination of linear and gamma transformations can approximate the illumination and contrast degradation. The simulation process can be formulated as:

$I_{LL}^{i} = \beta \times (\alpha \times I_{NL}^{i})^{\gamma}, \quad i \in \{R, G, B\}$ (4)

where $I_{LL}$ is the output low-light image and $I_{NL}$ is the input normal-light image; $\alpha$ and $\beta$ are linear transformation factors sampled from uniform distributions, $\alpha \sim U(0.9, 1)$ and $\beta \sim U(0.5, 1)$; and $\gamma$ is a gamma transformation factor with $\gamma \sim U(1.5, 5)$. We adopt this process to synthesize low-light images.

Fig. 2. Our proposed low-light enhancement and de-noising model. The input image first enters the pre-processing module, formed by two filters, for a first de-noising step. Two-branch feature extraction is then performed and fed into the enhancement part, which stacks three of our proposed DLBP blocks. Finally, saturation adjustment is performed for a better visual result.
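For reference, the degradation of Eq. (4) is a few lines of Python; the sketch below uses the sampling ranges stated above, and the function name is ours.

```python
import numpy as np

def synthesize_low_light(img_nl, rng=None):
    """Apply Eq. (4) to a normal-light RGB image with values in [0, 1]."""
    rng = rng or np.random.default_rng()
    alpha = rng.uniform(0.9, 1.0)   # linear factor
    beta = rng.uniform(0.5, 1.0)    # linear factor
    gamma = rng.uniform(1.5, 5.0)   # gamma factor
    return beta * (alpha * img_nl) ** gamma
```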

For noise synthesis, we follow [5, 21, 26] to generate a Gaussian-Poisson mixed noise model that considers the in-camera image processing pipeline to synthesize real-world noise, formulated as:

$I_{noise} = f\left(M^{-1}\left(P\left(M(f^{-1}(I_{clear}))\right) + N_G\right)\right)$ (5)

where $P(\cdot)$ adds Poisson noise with variance $\sigma_p^2$, $N_G$ models AWGN with variance $\sigma_g^2$, $f(\cdot)$ represents the camera response function, and $M(\cdot)$ and $M^{-1}(\cdot)$ are the mosaicing and demosaicing functions. As in [5], we do not consider image compression in this work, and we follow the same parameter settings.
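A heavily simplified sketch of Eq. (5) is given below: a fixed gamma of 2.2 stands in for the camera response function $f$, the mosaic/demosaic pair $M$, $M^{-1}$ is omitted, and the noise levels sigma_p and sigma_g are placeholders, since the paper follows [5] for the actual parameter settings.

```python
import numpy as np

def add_camera_noise(clean, sigma_p=0.01, sigma_g=0.005, rng=None):
    """Gaussian-Poisson mixed noise in (approximate) linear camera space.
    Simplified from Eq. (5): gamma-2.2 stands in for f, and the
    mosaic/demosaic pair is omitted."""
    rng = rng or np.random.default_rng()
    lin = np.clip(clean, 0.0, 1.0) ** 2.2                 # f^{-1}(I_clear)
    shot = rng.poisson(lin / sigma_p ** 2) * sigma_p ** 2  # signal-dependent noise
    noisy = shot + rng.normal(0.0, sigma_g, lin.shape)     # + AWGN term N_G
    return np.clip(noisy, 0.0, 1.0) ** (1.0 / 2.2)        # f(.)
```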

4. PROPOSED METHOD

In the proposed low-light enhancement model, shown in Fig. 2, we first adopt a two-branch blurring filter as preprocessing to remove noise in a first step. One branch is a linear filter, motivated by the Gaussian filter, and the other is a bilateral filter; a learnable variable alpha then blends the two blurred images.

After pre-processing, the image is fed into the feature extraction block, which is also designed in a two-branch manner. We use the Feature Aggregation (FA) block, modified from [7], for feature selection, then iteratively enhance with the proposed De-noising Lightening Back Projection (DLBP) blocks, and finally use our proposed saturation adjustment module to obtain a visually pleasing result. Details of the model are described below.

4.1. Image Preprocessing

Image preprocessing aims at coarse de-noising as a first step. Noise in low-light images usually belongs to shot noise [5]. To handle it, we discover that first applying a blur filter, such as a bilateral, Gaussian, or median filter, roughly de-noises the image and benefits the enhancement process. In our experiments, alpha blending of bilateral filtering and Gaussian filtering achieves the best performance as preprocessing. Thus, we use two-branch blur filtering as image preprocessing: one branch is a linear filter and the other is a non-linear filter, i.e., a bilateral filter with two trainable parameters, and a learnable parameter alpha learns the best weighting to blend the two images. We empirically initialize the linear filter weights as a Gaussian filter, and initialize $\sigma_{space}$ and $\sigma_{color}$ of the bilateral filter to 0.117 and 0.098, respectively.
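A PyTorch sketch of this two-branch preprocessing is shown below. The depthwise convolution is initialized as a Gaussian kernel, the bilateral filter is written with unfold so that its two bandwidths stay trainable, and a sigmoid-squashed scalar blends the branches. The kernel size and the normalization of spatial distances are our assumptions; the sigma initializations are the values reported above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchPreprocess(nn.Module):
    """Sketch of the preprocessing module: learnable blend of a linear
    (Gaussian-initialized) filter and a trainable bilateral filter."""
    def __init__(self, k=5):
        super().__init__()
        self.k = k
        # Linear branch: depthwise conv initialized as a Gaussian blur.
        coords = torch.arange(k).float() - k // 2
        g = torch.exp(-coords ** 2 / 2.0)
        kern = torch.outer(g, g)
        kern = kern / kern.sum()
        self.linear = nn.Conv2d(3, 3, k, padding=k // 2, groups=3, bias=False)
        with torch.no_grad():
            self.linear.weight.copy_(kern.expand(3, 1, k, k))
        # Bilateral branch: two trainable bandwidths, paper's initial values.
        self.sigma_space = nn.Parameter(torch.tensor(0.117))
        self.sigma_color = nn.Parameter(torch.tensor(0.098))
        self.alpha = nn.Parameter(torch.zeros(1))  # blend weight (pre-sigmoid)

    def bilateral(self, x):
        b, c, h, w = x.shape
        k = self.k
        patches = F.unfold(x, k, padding=k // 2).view(b, c, k * k, h, w)
        center = x.unsqueeze(2)
        yy, xx = torch.meshgrid(torch.arange(k), torch.arange(k), indexing="ij")
        # Spatial offsets normalized by k (this scaling is our assumption).
        d2 = ((yy - k // 2) ** 2 + (xx - k // 2) ** 2).float() / k ** 2
        d2 = d2.view(1, 1, k * k, 1, 1).to(x)
        w_ = torch.exp(-d2 / (2 * self.sigma_space ** 2)
                       - (patches - center) ** 2 / (2 * self.sigma_color ** 2))
        return (w_ * patches).sum(2) / w_.sum(2)

    def forward(self, x):
        a = torch.sigmoid(self.alpha)
        return a * self.linear(x) + (1 - a) * self.bilateral(x)
```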

4.2. Feature Selection

After preprocessing, we extract features in a two-branch manner: one branch stacks two convolution layers, and the other uses two dilated convolution layers to enlarge the receptive field. We then apply the Feature Aggregation (FA) block, modified from [7], for feature selection; the modified FA block is shown in Fig. 3. We remove the first convolution layers of the original FA block [7], since that operation rebuilds the input features and loses the meaning of selecting among them. The input features are squeezed into a $1 \times 1 \times C$ vector, where $C$ is the number of input channels; two Fully Connected (FC) layers then encode the squeezed vector into a weighting vector. Finally, the weighting is expanded to the shape of the input features and multiplied element-wise with them to obtain weighted features. As in [7], a convolution layer aggregates the weighted features into the target number of output channels.

Fig. 3. Modified Feature Aggregation (FA) block structure, with four inputs as an example. We remove the first convolution layer of the original FA block [7].
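A minimal PyTorch sketch of the modified FA block follows. The reduction ratio of the two FC layers and the sigmoid gating are our assumptions; the paper only specifies the squeeze, two FC layers, expansion with element-wise product, and the final aggregation convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregation(nn.Module):
    """Modified FA block: squeeze -> two FC layers -> per-channel
    weighting -> 1x1 conv aggregation (first conv of [7] removed)."""
    def __init__(self, in_ch, out_ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_ch, in_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, in_ch),
            nn.Sigmoid(),
        )
        self.aggregate = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, feats):
        x = torch.cat(feats, dim=1)                          # concat branch features
        w = self.fc(F.adaptive_avg_pool2d(x, 1).flatten(1))  # 1x1xC squeeze -> weights
        x = x * w.unsqueeze(-1).unsqueeze(-1)                # expand + element-wise product
        return self.aggregate(x)                             # aggregate to target channels
```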

4.3. DLBP

Wang et al. [7] utilize back-projection theory to propose the Lightening Back Projection (LBP) block for low-light image enhancement, which contains two lighten blocks


Fig. 5. Our proposed lighten and darken blocks, which utilize a shared offset estimator to constrain the encodings to the same illumination latent space. We do not use an activation function in the offset estimator, so the output offset can be negative or positive.

and one darken block. The low-light image $X$ is first input into the first lighten block to estimate a normal-light image $\tilde{Y}$.

Next, the darken block predicts the low-light version $\tilde{X}$ of the enhanced normal-light image; finally, the difference between $X$ and $\tilde{X}$ is input into the second lighten block and added back to $\tilde{Y}$ as the LBP output. The procedure can be formulated as:

$\hat{Y} = \lambda_2 L_1(X) + L_2\big(D(L_1(X)) - \lambda_1 X\big)$ (6)

where $\lambda_1$ and $\lambda_2$ are two weights that balance the update. Each lighten block and darken block contains three parts: encoding, offset estimation, and decoding.

In a lighten or darken block, the input passes through the encoding part, and an offset latent code is estimated (from low-light toward normal-light, and vice versa). The authors point out that this latent code must be positive, since it passes through a PReLU layer. The darken block subtracts the offset and the lighten block adds it to obtain the adjusted latent code; the decoding part then transforms the adjusted latent code back into an image. Note that the problem formulation of [7] is:

$Y = X + \gamma P(X)$ (7)

where $Y$ is the normal-light image; $X$ is the low-light image; $P(\cdot)$ is the lightening operation; and $\gamma$ controls the lightening power. Thus, $P(\cdot)$ can be viewed as the lighten block of LBP.

We further extend LBP into the De-noising Lightening Back Projection (DLBP) block to de-noise and lighten simultaneously; our problem can be formulated as:

$Y = X + \gamma J(X)$ (8)

where $Y$ is the normal-light, noise-free image; $X$ is the low-light, noisy image; $J(\cdot)$ is the de-noising and lightening operation; and $\gamma$ controls the de-noising and lightening level. $J(\cdot)$ can likewise be viewed as the lighten block of our DLBP, and vice versa.

Fig. 4 illustrates our DLBP block, which likewise consists of encoding, offset estimation, and decoding parts. For encoding, we use one convolution and a PReLU to encode the input image into a latent code. For offset estimation, the darken and lighten blocks of LBP [7] each have their own offset estimator. However, we argue that this setting might encode into different latent spaces, making the offset operation ambiguous; the latent codes should lie in the same latent space.

Fig. 4. Structure of DLBP, designed with back projection; the structure is the same as that of the LBP proposed in [7]. Details of the lighten and darken blocks are elaborated in Fig. 5.

We use a shared offset estimator to constrain the lighten and darken blocks to encode into the same latent space; the decoder of the lighten block then aims to decode a noise-free, well-exposed latent code, while the darken decoder aims to decode a noisy, under-exposed latent code. We design the shared offset estimator in a two-branch manner: one branch estimates the exposure latent offset and the other estimates the noise latent offset.

We remove the PReLU from the shared offset estimator so that the offset can be negative or positive. Finally, the decoding part uses one convolution block and one PReLU layer to transform the latent code back into an image.
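Putting Eq. (6) and the shared offset estimator together, a DLBP block could be sketched in PyTorch as below. The channel width, kernel sizes, and fixed $\lambda_1 = \lambda_2 = 1$ are our assumptions, and the block is written in image space for readability; the actual model may operate on feature maps.

```python
import torch.nn as nn

class SharedOffsetEstimator(nn.Module):
    """Two-branch offset estimator shared by lighten and darken blocks;
    no PReLU, so the offsets may be negative or positive."""
    def __init__(self, ch):
        super().__init__()
        self.exposure = nn.Conv2d(ch, ch, 3, padding=1)  # exposure offset branch
        self.noise = nn.Conv2d(ch, ch, 3, padding=1)     # noise offset branch

    def forward(self, z):
        return self.exposure(z) + self.noise(z)

class AdjustBlock(nn.Module):
    """Lighten (sign=+1) or darken (sign=-1) block: encode, apply the
    shared offset, decode."""
    def __init__(self, ch, offset, sign):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.PReLU())
        self.decode = nn.Sequential(nn.Conv2d(ch, 3, 3, padding=1), nn.PReLU())
        self.offset, self.sign = offset, sign

    def forward(self, x):
        z = self.encode(x)
        return self.decode(z + self.sign * self.offset(z))

class DLBP(nn.Module):
    """Eq. (6): Y_hat = lam2 * L1(X) + L2(D(L1(X)) - lam1 * X)."""
    def __init__(self, ch=32, lam1=1.0, lam2=1.0):
        super().__init__()
        shared = SharedOffsetEstimator(ch)   # one estimator shared by all blocks
        self.lighten1 = AdjustBlock(ch, shared, +1.0)
        self.lighten2 = AdjustBlock(ch, shared, +1.0)
        self.darken = AdjustBlock(ch, shared, -1.0)
        self.lam1, self.lam2 = lam1, lam2

    def forward(self, x):
        y = self.lighten1(x)
        return self.lam2 * y + self.lighten2(self.darken(y) - self.lam1 * x)
```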

In the proposed model, we stack three DLBPs to iteratively enhance the input low-light image. After enhancement, an intensity-guided attention map, which is the inverse of the intensity map, is multiplied element-wise with the enhancement result, which is then added back to the original input. With


Fig. 6. Structure of our proposed saturation adjustment module, which utilizes the framework and prior knowledge proposed in [8].

this residual learning design, the model is more stable during training and obtains better performance.

4.4. Saturation Adjustment Module

Low saturation is also a common problem of low-light enhancement results. We utilize the simple yet effective prior and framework proposed by [8] to design a saturation adjustment module, with the original low-light, noisy image as an additional input. The authors of [8] show that color saturation is monotonically related to contrast and propose a method called Contrast-Dependent Color Saturation Adjustment (CDCSA). CDCSA first calculates the local contrast of the enhanced and original images according to the Michelson formula [27]:

$C(x, y) = \dfrac{L_{max}(x, y) - L_{min}(x, y)}{L_{max}(x, y) + L_{min}(x, y)}$ (9)

where $L_{max}$ and $L_{min}$ are the maximum and minimum intensities of the local region centered at $(x, y)$. We then obtain the ratio of the local contrast of the enhanced image, $C_e$, to that of the original image, $C_o$:

$R = \dfrac{C_e}{C_o}$ (10)

However, in smooth regions the local contrast can be close to zero, making $R$ unstable. To avoid this problem, [8] empirically sets a lower bound of 0.2 on $C_o(x, y)$ and modifies the ratio $R$ as:

$R(x, y) = \begin{cases} \dfrac{C_e(x, y)}{C_o(x, y)}, & \text{for } C_o(x, y) > 0.2 \\ 1 + \dfrac{C_e(x, y) - C_o(x, y)}{0.2}, & \text{for } C_o(x, y) \le 0.2 \end{cases}$ (11)

Finally, the saturation is adjusted as:

$S_e(x, y) = R(x, y) \cdot S_o(x, y)$ (12)

where $S_e$ is the saturation of the adjusted image and $S_o$ is the saturation of the original image. However, in our experiments this process introduces blocky artifacts and noisy results, for two reasons. First, computing local contrast with the Michelson formula can produce blocky artifacts: in some local blocks, $L_{max}$ and $L_{min}$ are identical, so the whole block receives the same local contrast value according to Eq. (9). To alleviate this, we replace the local contrast formula with RMS contrast [28]:

$C(x, y) = \sqrt{\dfrac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} (I_{ij} - \bar{I})^2}$ (13)

where $I_{ij}$ is the intensity of the $(i, j)$-th element in a local region of size $M \times N$, and $\bar{I}$ is the average intensity of that region. We empirically set $M = 5$ and $N = 5$. The second problem is the noisy saturation $S_o$ of the original image: since the original input is noisy, its saturation map is also noisy, and directly setting the saturation of the enhanced image $S_e$ to the ratio $R$ multiplied by $S_o$ yields noisy output. To avoid this problem, we use a simple statistics-based method to approximate the noise-free saturation $\hat{S}_o$ of the original image.

Since the enhanced image has low saturation but is noise-free, we first transform its saturation $S_e$ via feature standardization:

$S_e^{std} = \dfrac{S_e - \mu_{S_e}}{\sigma_{S_e}}$ (14)

where $\mu_{S_e}$ is the mean of $S_e$ and $\sigma_{S_e}$ is its standard deviation. We then approximate the noise-free saturation of the original image by inverse-transforming $S_e^{std}$ with the statistics of the original image:

$\hat{S}_o = S_e^{std} \cdot \sigma_{S_o} + \mu_{S_o}$ (15)

where $\hat{S}_o$ is the final approximated noise-free saturation map of the original image, and $\sigma_{S_o}$ and $\mu_{S_o}$ are the standard deviation and mean of $S_o$. Fig. 6 illustrates the full process of our saturation adjustment module.
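The whole post-processing step can be sketched with OpenCV as below, assuming float32 RGB images in [0, 1]. The window size $M = N = 5$ and the bound 0.2 are the values above; the small epsilons guarding the divisions are our additions.

```python
import numpy as np
import cv2

def rms_contrast(lum, m=5):
    """Eq. (13): local RMS contrast over an m x m window."""
    mean = cv2.blur(lum, (m, m))
    mean_sq = cv2.blur(lum * lum, (m, m))
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))

def saturation_adjust(enhanced, original, c_floor=0.2):
    """Eqs. (10)-(15): contrast-dependent saturation adjustment with a
    noise-free approximation of the original saturation."""
    hsv_e = cv2.cvtColor(enhanced, cv2.COLOR_RGB2HSV)
    c_e = rms_contrast(cv2.cvtColor(enhanced, cv2.COLOR_RGB2GRAY))
    c_o = rms_contrast(cv2.cvtColor(original, cv2.COLOR_RGB2GRAY))
    # Eq. (11): contrast ratio with a lower bound on C_o.
    r = np.where(c_o > c_floor,
                 c_e / np.maximum(c_o, 1e-6),
                 1.0 + (c_e - c_o) / c_floor)
    # Eqs. (14)-(15): standardize S_e, then rescale with S_o statistics
    # to approximate the noise-free original saturation S_o_hat.
    s_e = hsv_e[..., 1]
    s_o = cv2.cvtColor(original, cv2.COLOR_RGB2HSV)[..., 1]
    s_std = (s_e - s_e.mean()) / (s_e.std() + 1e-6)
    s_o_hat = s_std * s_o.std() + s_o.mean()
    hsv_e[..., 1] = np.clip(r * s_o_hat, 0.0, 1.0)   # Eq. (12) with S_o_hat
    return cv2.cvtColor(hsv_e, cv2.COLOR_HSV2RGB)
```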

4.5. Loss Function

We train our low-light enhancement network in a supervised learning setting; the loss function is shown below:

$\mathcal{L} = \alpha_e \mathcal{L}_e + \alpha_r \mathcal{L}_r$ (16)

where $\mathcal{L}_e$ is the loss of the first-step enhancement module, $\mathcal{L}_r$ is the loss of the saturation adjustment module, and $\alpha_e$ and $\alpha_r$ are the weights of each part, empirically set to 1 and 0.02, respectively.

1) Enhancement Module Loss:

The enhancement module loss is formed by three components, which are a structural loss, a total variation loss, and an MAE loss, and is defined as:


$\mathcal{L}_e = w_{ssim} \mathcal{L}_{ssim} + w_{tv} \mathcal{L}_{tv} + w_{l1} \mathcal{L}_{l1}$ (17)

where $w_{ssim}$, $w_{tv}$, and $w_{l1}$ are the weights of each component, empirically set to 2, 0.0001, and 0.01, respectively.

The structural loss is introduced to preserve image structure; prior works [5, 7] also point out that training with a structural loss avoids blurring and benefits edge preservation. We use the well-known image assessment metric SSIM [29] as the structural loss, defined as:

$\mathcal{L}_{ssim} = 1 - \dfrac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$ (18)

where $\mu_x$ and $\mu_y$ are pixel-value means, $\sigma_x^2$ and $\sigma_y^2$ are pixel-value variances, $\sigma_{xy}$ is the covariance, and $c_1$ and $c_2$ are constants to avoid division by zero, set to 0.0001 and 0.0009 in our experiments, respectively.

Although pixel losses, i.e., MAE and MSE, cannot capture image structure and usually produce blurry output, image restoration works [30, 31] suggest that jointly using a structural loss, i.e., SSIM, and a pixel loss benefits training. We adopt MAE as the pixel loss:

$\mathcal{L}_{l1} = \| Y - \hat{Y} \|_1$ (19)

where $Y$ is the ground-truth normal-light, noise-free image and $\hat{Y}$ is the enhanced image our model predicts from the input low-light image.

The total variation loss acts as a smoothness constraint in our loss design, minimizing the gradients of the predicted image to avoid noisy output, and can be expressed as:

$\mathcal{L}_{tv} = \sum_{i=1}^{H} \sum_{j=1}^{W} \sqrt{(I_{ij} - I_{i+1,j})^2 + (I_{ij} - I_{i,j+1})^2}$ (20)

where $I$ is the predicted image with height $H$ and width $W$, and $i$ and $j$ index its pixels.

2) Saturation Adjustment Module Loss:

We notice that if we only use the enhancement module loss as the total loss, the model is prone to producing low-saturation images, since such images have ambiguous hue (most low-saturation colors look similar) and still achieve relatively high PSNR. However, this causes slight color distortion, which is amplified after applying saturation adjustment. We therefore add the saturation adjustment module loss to guide training; although this module contains no trainable parameters, it provides good guidance toward correct colors and better performance. The saturation adjustment module loss is defined as:

$\mathcal{L}_r = \| Y - \hat{Y} \|_1$ (21)

where $Y$ is the ground-truth image and $\hat{Y}$ is the saturation-adjusted image. Since $\hat{Y}$ has high saturation, this loss describes the distance of hue and illumination to the ground truth more precisely and enforces more accurate pixel-level predictions.
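For reference, the full objective of Eqs. (16)-(21) can be written directly in PyTorch as below. The SSIM term follows the global-statistics form of Eq. (18) rather than the windowed SSIM of [29], and the weights are the values reported above.

```python
import torch
import torch.nn.functional as F

def ssim_loss(y_hat, y, c1=1e-4, c2=9e-4):
    """Eq. (18) with global image statistics and the paper's constants."""
    mu_x, mu_y = y_hat.mean(), y.mean()
    var_x, var_y = y_hat.var(), y.var()
    cov = ((y_hat - mu_x) * (y - mu_y)).mean()
    s = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
        ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - s

def tv_loss(img):
    """Eq. (20): total-variation smoothness penalty on a B x C x H x W image."""
    dh = img[..., 1:, :] - img[..., :-1, :]
    dw = img[..., :, 1:] - img[..., :, :-1]
    return torch.sqrt(dh[..., :, :-1] ** 2 + dw[..., :-1, :] ** 2 + 1e-8).sum()

def total_loss(y_enh, y_sat, y_gt,
               w_ssim=2.0, w_tv=1e-4, w_l1=0.01, alpha_e=1.0, alpha_r=0.02):
    """Eqs. (16), (17), (19), (21) with the reported weights; y_enh is the
    enhancement output, y_sat the saturation-adjusted output."""
    l_e = (w_ssim * ssim_loss(y_enh, y_gt)
           + w_tv * tv_loss(y_enh)
           + w_l1 * F.l1_loss(y_enh, y_gt))
    l_r = F.l1_loss(y_sat, y_gt)            # saturation adjustment loss
    return alpha_e * l_e + alpha_r * l_r
```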

5. EXPERIMENT

In this section, we compare our proposed method with many existing approaches. For traditional approaches, we select HE [1] and LIME [3]. For CNN-based methods, we compare with MBLLEN [16], Retinex-Net [13], and Zero-DCE [17], which is an unsupervised approach. For GAN-based methods, EnlightenGAN [18] is selected for comparison.

All of these are run with their publicly available code, trained with the authors' recommended hyper-parameters or using pretrained models. Evaluation is performed on synthetic data and a real-world dataset, i.e., the LOL dataset [13]. We manually select 14 high-quality images to form the synthetic evaluation dataset.

For quantitative evaluation, following prior works [1, 3, 16, 17, 18], we use Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), which are widely used in image restoration. For qualitative evaluation, we present visualization results.

5.1. Training Detail

We chose the 4,965 images of the PASCAL VOC 2007 training set to construct the synthetic low-light dataset. The images are resized with bicubic interpolation so that the shorter side is 384 pixels, keeping the original aspect ratio.

We use the Adam optimizer with momentum set to 0.9 and weight decay 0.0001. The learning rate is set to 0.00001. For data augmentation, we randomly crop 128 × 128 pairs from the input and ground-truth images. The batch size is 32, and the model is trained for 500 epochs. All experiments are conducted on two NVIDIA RTX 2080 Ti GPUs and an Intel i7-8700 CPU, and our code is implemented in PyTorch based on [7].
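The optimizer setup above corresponds to the following snippet, where the stand-in module replaces the assembled DTLN network and Adam's beta1 stands in for the stated momentum.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the DTLN network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5,
                             betas=(0.9, 0.999), weight_decay=1e-4)
```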

5.2. Results on Synthetic Images

Table I shows the quantitative comparison of several off-the-shelf low-light enhancement algorithms on our synthetic dataset. Our method outperforms all other methods by a large margin. The main reason is that many prior works do not consider the noise in low-light images and fail to obtain noise-free or noise-reduced results.

Table I: Evaluation on the synthetic dataset.

Method           PSNR    SSIM
HE               15.69   0.556
LIME             15.29   0.553
MBLLEN           15.22   0.618
Zero-DCE         16.87   0.619
EnlightenGAN     16.03   0.620
Retinex-Net      14.19   0.510
DTLN-1 (Ours)    21.62   0.783
DTLN-2 (Ours)    20.44   0.758


Fig. 7. Visual comparison of several low-light enhancement algorithms on the synthetic dataset; please zoom in for a better view.

Fig. 7 visualizes the enhancement results for comparison. The results of HE, LIME, MBLLEN, Retinex-Net, and Zero-DCE remain slightly under-exposed, whereas EnlightenGAN obtains a well-exposed result. Looking at the front wheel of the bicycle, all other methods leave the color distortion caused by noise; our DTLN-1 generates a visually pleasing result, while our DTLN-2 generates a slightly over-saturated result with some color distortion, which means the saturation adjustment is too strong and introduces artifacts.

5.3. Results on Real Images

For evaluation on real-world images, we choose the LOL dataset [13], which contains 500 image pairs, 485 for training and 15 for testing. Table II shows the quantitative comparison of several off-the-shelf low-light enhancement algorithms on the LOL dataset [13]. The proposed DTLN outperforms all compared methods on both metrics, PSNR and SSIM. The experiment shows that DTLN has an excellent ability to recover low-light images to normal-light images.

Fig. 8 presents the visual comparison of enhancement results. HE significantly enhances the extremely low-light image; however, as mentioned before, HE is prone to under-/over-enhancing local regions. LIME, Zero-DCE, EnlightenGAN, and Retinex-Net notably improve the visual quality of the low-light image but fail to de-noise; on the other hand, MBLLEN enhances successfully but over-smooths some local regions, which might be caused by noise.

Our DTLN-1 generates results of high visual quality but with relatively low saturation; after applying saturation adjustment, we obtain the high-quality, visually pleasing results of our DTLN-2, as shown in Fig. 8.

Table II: Evaluation on the real-world dataset.

Method           PSNR    SSIM
HE               14.54   0.387
LIME             16.57   0.475
MBLLEN           17.86   0.725
Zero-DCE         16.86   0.557
EnlightenGAN     17.48   0.652
Retinex-Net      16.77   0.510
DTLN-1 (Ours)    21.23   0.822
DTLN-2 (Ours)    20.76   0.806

5.4. Ablation Study

In this section, we evaluate the effectiveness of the different components of our proposed method. Table III shows the comparison results of our ablation study in PSNR and SSIM. All studies are conducted on the LOL dataset [13].

1) Preprocessing

In this study, we test the effectiveness of the proposed pre-processing module. After removing this module, both PSNR and SSIM drop. The result shows that pre-processing improves the enhancement result.

2) Saturation Adjustment Module

In this study, we remove the saturation adjustment module; in other words, there is no saturation adjustment module loss, so we increase the weighting of the enhancement module loss to balance training for a fair comparison. The results show that although the saturation-adjusted results have lower PSNR and SSIM than before adjustment, they present a more visually pleasing


Fig. 8. Visual comparison of several low-light enhancement algorithms on the real-world dataset; please zoom in for a better view.

result and provide a better distance estimation for the loss function than the original enhanced result.

3) Unsuitable Training Data Removal

We evaluate the proposed unsuitable-training-data removal method by training our model on two datasets: the original synthetic dataset and the synthetic dataset with our removal method applied.

The experimental results show that with our proposed method the model performs slightly better; in other words, our method does remove under-illuminated images from the training dataset. We also inspected the removed images and discovered that some well-illuminated images with uniform colors, e.g., sky or grass, were removed as well. Removing too many inliers, i.e., well-illuminated images, might be the reason our method does not improve the performance more dramatically.

4) Loss Function Weighting

We further test different weightings of the loss function. In our experiments, we discover that the best ratio of the structural loss (SSIM loss in our experiments) to the pixel loss (L1 loss in our experiments) is 6:4. If the weighting of the pixel loss is increased, PSNR rises but SSIM drops dramatically, and vice versa.

Table III: Ablation study of our proposed method and model structure, where "w/o" means without.

Variant                                  PSNR    SSIM
w/o preprocessing                        21.14   0.8195
w/o unsuitable ground-truth removal      21.09   0.821
w/o saturation adjustment module         21.23   0.822
w/o saturation adjustment module loss    20.54   0.8045

6. CONCLUSION

In this paper, we proposed a novel low-light enhancement and de-noising model that utilizes back projection and a saturation adjustment module to obtain high-saturation results.

We also proposed a novel method to remove under-illuminated images from a public dataset when constructing a synthetic low-light dataset. Our experiments show that the proposed methods contribute positively to performance. In future work, we will investigate a trainable saturation adjustment module that automatically regulates the intensity of the adjustment, and extend the target to low-illumination images in extreme weather, for example, nighttime rainy or hazy images. We will also explore applications in autonomous vehicle systems.

ACKNOWLEDGEMENT

This work is part of the master's thesis of Yu-Wei Chen, and we thank his advisor, Prof. Soo-Chang Pei, for his useful suggestions.

REFERENCES

[1] J. A. Stark, "Adaptive image contrast enhancement using generalizations of histogram equalization," IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 889-896, May 2000, doi: 10.1109/83.841534.

[2] K. Zuiderveld, "Contrast limited adaptive histogram equalization," in Graphics Gems IV, Academic Press Professional, Inc., USA, pp. 474-485, 1994.

[3] X. Guo, Y. Li, and H. Ling, "LIME: Low-Light Image Enhancement via Illumination Map Estimation," IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 982-993, Feb. 2017, doi: 10.1109/TIP.2016.2639450.

[4] K. Wei, Y. Fu, J. Yang, and H. Huang, "A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2755-2764, doi: 10.1109/CVPR42600.2020.00283.

[5] F. Lv, Y. Li, and F. Lu, "Attention Guided Low-light Image Enhancement with a Large Scale Low-light Simulation Dataset," arXiv preprint, arXiv:1908.00682, 2019.

[6] K. G. Lore, A. Akintayo, and S. Sarkar, "LLNet: A deep autoencoder approach to natural low-light image enhancement," Pattern Recognition, vol. 61, pp. 650-662, 2017.

[7] L. Wang, Z. Liu, W. Siu, and D. P. K. Lun, "Lightening Network for Low-Light Image Enhancement," IEEE Transactions on Image Processing, vol. 29, pp. 7984-7996, 2020, doi: 10.1109/TIP.2020.3008396.

[8] S. Wang, W. Cho, J. Jang, M. A. Abidi, and J. Paik, "Contrast-dependent saturation adjustment for outdoor image enhancement," Journal of the Optical Society of America A, vol. 34, pp. 7-17, 2017.

[9] H. Ibrahim and N. S. Pik Kong, "Brightness Preserving Dynamic Histogram Equalization for Image Contrast Enhancement," IEEE Transactions on Consumer Electronics, vol. 53, no. 4, pp. 1752-1758, Nov. 2007, doi: 10.1109/TCE.2007.4429280.

[10] D. J. Jobson, Z. Rahman, and G. A. Woodell, "Properties and performance of a center/surround retinex," IEEE Transactions on Image Processing, vol. 6, no. 3, pp. 451-462, Mar. 1997.

[11] Z. Rahman, D. J. Jobson, and G. A. Woodell, "Multi-scale retinex for color image enhancement," Proceedings of the 3rd IEEE International Conference on Image Processing, 1996, pp. 1003-1006, vol. 3, doi: 10.1109/ICIP.1996.560995.

[12] X. Fu, D. Zeng, Y. Huang, X. Zhang, and X. Ding, "A Weighted Variational Model for Simultaneous Reflectance and Illumination Estimation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2782-2790, doi: 10.1109/CVPR.2016.304.

[13] C. Wei, W. Wang, W. Yang, and J. Liu, "Deep retinex decomposition for low-light enhancement," British Machine Vision Conference (BMVC), 2018.

[14] R. Wang, Q. Zhang, C. Fu, X. Shen, W. Zheng, and J. Jia, "Underexposed Photo Enhancement Using Deep Illumination Estimation," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6842-6850, doi: 10.1109/CVPR.2019.00701.

[15] K. Lu and L. Zhang, "TBEFN: A two-branch exposure-fusion network for low-light image enhancement," IEEE Transactions on Multimedia, doi: 10.1109/TMM.2020.3037526.

[16] F. Lv, F. Lu, J. Wu, and C. Lim, "MBLLEN: Low-light Image/Video Enhancement Using CNNs," British Machine Vision Conference (BMVC), 2018.

[17] C. Guo et al., "Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1777-1786, doi: 10.1109/CVPR42600.2020.00185.

[18] Y. Jiang et al., "EnlightenGAN: Deep Light Enhancement Without Paired Supervision," IEEE Transactions on Image Processing, vol. 30, pp. 2340-2349, 2021, doi: 10.1109/TIP.2021.3051462.

[19] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, "Noise2Noise: Learning Image Restoration without Clean Data," Proceedings of the 35th International Conference on Machine Learning, 80:2965-2974, 2018.

[20] A. Krull, T. Buchholz, and F. Jug, "Noise2Void - Learning Denoising From Single Noisy Images," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2124-2132, doi: 10.1109/CVPR.2019.00223.

[21] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, "Toward Convolutional Blind Denoising of Real Photographs," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1712-1722, doi: 10.1109/CVPR.2019.00181.

[22] M. Haris, G. Shakhnarovich, and N. Ukita, "Deep Back-Projection Networks for Super-Resolution," IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 1664-1673, doi: 10.1109/CVPR.2018.00179.

[23] C. Chen, Q. Chen, J. Xu, and V. Koltun, "Learning to See in the Dark," IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 3291-3300, doi: 10.1109/CVPR.2018.00347.

[24] J. Cai, S. Gu, and L. Zhang, "Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049-2062, April 2018, doi: 10.1109/TIP.2018.2794218.

[25] Y. Yan, W. Ren, Y. Guo, R. Wang, and X. Cao, "Image Deblurring via Extreme Channels Prior," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6978-6986, doi: 10.1109/CVPR.2017.738.

[26] H. Yamashita, D. Sugimura, and T. Hamamoto, "Low-light color image enhancement via iterative noise reduction using RGB/NIR sensor," Journal of Electronic Imaging, vol. 26, no. 4, 043017, Aug. 2017.

[27] A. Michelson, Studies in Optics, U. of Chicago Press, 1927.

[28] E. Peli, "Contrast in Complex Images," Journal of the Optical Society of America A, vol. 7, no. 10, pp. 2032-2040, Oct. 1990, doi: 10.1364/JOSAA.7.002032.

[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004, doi: 10.1109/TIP.2003.819861.

[30] G. Seif and D. Androutsos, "Edge-Based Loss Function for Single Image Super-Resolution," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 1468-1472, doi: 10.1109/ICASSP.2018.8461664.

[31] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss Functions for Image Restoration With Neural Networks," IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47-57, March 2017, doi: 10.1109/TCI.2016.2644865.
