FastDeRain: A Novel Video Rain Streak Removal Method Using Directional Gradient Priors


Tai-Xiang Jiang , Ting-Zhu Huang, Xi-Le Zhao , Liang-Jian Deng, and Yao Wang

Abstract— Rain streak removal is an important issue in outdoor vision systems and has recently been investigated extensively.

In this paper, we propose a novel video rain streak removal approach, FastDeRain, which fully considers the discriminative characteristics of rain streaks and the clean video in the gradient domain. Specifically, on the one hand, rain streaks are sparse and smooth along the direction of the raindrops; on the other hand, clean videos exhibit piecewise smoothness along the rain-perpendicular direction and continuity along the temporal direction. This smoothness and continuity result in sparse distributions in the different directional gradient domains. Thus, we minimize: 1) the ℓ1 norm to enhance the sparsity of the underlying rain streaks; 2) two ℓ1 norms of unidirectional total variation regularizers to guarantee the anisotropic spatial smoothness; and 3) an ℓ1 norm of the time-directional difference operator to characterize the temporal continuity. An algorithm based on the split augmented Lagrangian shrinkage algorithm (SALSA) is designed to solve the proposed minimization model. Experiments conducted on synthetic and real data demonstrate the effectiveness and efficiency of the proposed method. According to comprehensive quantitative performance measures, our approach outperforms other state-of-the-art methods, especially in terms of running time. The code of FastDeRain can be downloaded at https://github.com/TaiXiangJiang/FastDeRain.

Index Terms— Video rain streak removal, unidirectional total variation, split augmented Lagrangian shrinkage algorithm (SALSA).

I. INTRODUCTION

OUTDOOR vision systems are frequently affected by bad weather conditions [1]–[5], one of which is rain. Raindrops usually introduce bright streaks into the acquired images or videos because of their scattering of light into complementary metal–oxide–semiconductor cameras and their high velocities. Moreover, rain streaks also interfere with nearby pixels because of their specular highlights, scattering, and blurring effects [1]. This undesirable interference degrades the performance of various computer vision algorithms [6], such as event detection [7], object detection [8], tracking [9], recognition [10], and scene analysis [11]. Therefore, the removal of rain streaks is an essential task [78], which has recently received considerable attention.

Manuscript received March 20, 2018; revised July 15, 2018 and October 24, 2018; accepted October 26, 2018. Date of publication November 12, 2018; date of current version December 19, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 61772003, Grant 61876203, Grant 61702083, and Grant 11501440, in part by the Fundamental Research Funds for the Central Universities under Grant ZYGX2016J132, Grant ZYGX2016J129, and Grant ZYGX2016KYQD142, in part by the Science Strength Promotion Programme of UESTC, and in part by the Key Research Program of Hunan Province, China, under Grant 2017GK2273. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xiao-Ping Zhang. (Corresponding authors: Ting-Zhu Huang; Xi-Le Zhao.)

T.-X. Jiang, T.-Z. Huang, X.-L. Zhao, and L.-J. Deng are with the School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]; [email protected]; xlzhao122003@163.com; [email protected]).

Y. Wang is with the School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2018.2880512

Fig. 1. A frame of a rainy video (left), the rain streak removal result by the proposed method FastDeRain (middle), and the extracted rain streaks (right). The pixel values of the rain streaks are scaled for better visualization.

Numerous methods have been proposed to improve the visibility of images/videos captured with rain streak interference [12]–[49]. They can be classified into two categories: multiple-image/video-based techniques and single-image-based approaches. Fig. 1 exhibits an example of video rain streak removal. For convenience, in this paper, we use “background” to denote the rain-free content of the data.

For the single-image de-raining task, Kang et al. [12] decomposed a rainy image into low-frequency (LF) and high-frequency (HF) components using a bilateral filter and then performed morphological component analysis (MCA)-based dictionary learning and sparse coding to separate the rain streaks in the HF component. To alleviate the loss of details when learning HF image bases, Sun et al. [13] exploited the structural similarity of the derived HF image bases. Chen and Hsu [14] considered the similar and repeated patterns of the rain streaks and the smoothness of the background. Sparse coding and dictionary learning were adopted in [16]–[18], and the details of the backgrounds were well preserved in their results. Meanwhile, Zhang and Patel [19] decomposed a rainy image into a clear background image and a rain streak image using a set of generic sparsity-based and low-rank representation-based convolutional filters. The recent work by Li et al. [1], [20] utilized Gaussian mixture model (GMM) patch priors for rain streak removal, with the ability to account for rain streaks of different orientations and scales.

Zhu et al. [21] proposed a joint bi-layer optimization method to progressively separate rain streaks from background details, in which the gradient statistics are analyzed. Meanwhile, the directional property of rain streaks received considerable attention in [24]–[26], and these methods achieved promising performance. Wang et al. [27] took advantage of image decomposition and dictionary learning. Recently developed deep learning techniques have also been applied to the single-image rain streak removal task, with excellent results [28]–[38].

For video rain streak removal, Garg and Nayar [39] first proposed a method based on a comprehensive analysis of the visual effects of rain on an imaging system. Since then, many approaches have been proposed for the video rain streak removal task and have obtained good performance on videos with different rain circumstances. Early video-based methods are comprehensively summarized in [40]. Chen and Chau [15] took highly dynamic scenes into account. Subsequently, Kim et al. [41] considered the temporal correlation of rain streaks and the low-rank nature of clean videos. Santhaseelan and Asari [42] detected and removed the rain streaks based on phase congruency features. You et al. [43] dealt with situations where raindrops adhere to the windscreen or window glass. In [44], a novel tensor-based video rain streak removal approach was proposed considering the directional property.

Ren et al. [45] handled the video desnowing and deraining task based on matrix decomposition. The rain streaks and the clean background were stochastically modeled as a mixture of Gaussians by Wei et al. [46], while Li et al. [47] learned multiscale convolutional filters from the rainy data. Both methods [46], [47] achieved excellent performance on surveillance videos. Deep learning based methods have also started to show their effectiveness for video rain streak removal [48]–[50].

In general, the observation model for a rainy image is formulated as $O = B + R$ [1], which can be generalized to the video case as $\mathcal{O} = \mathcal{B} + \mathcal{R}$, where $\mathcal{O}, \mathcal{B}, \mathcal{R} \in \mathbb{R}^{m\times n\times t}$ are three 3-mode tensors representing the observed rainy video, the unknown rain-free video, and the rain streaks, respectively. When noise or error is considered, the observation model becomes $\mathcal{O} = \mathcal{B} + \mathcal{R} + \mathcal{N}$, where $\mathcal{N}$ is the noise or error term. The goal of video rain streak removal is to separate the clean video $\mathcal{B}$ and the rain streaks $\mathcal{R}$ from an input rainy video $\mathcal{O}$. This is an ill-posed inverse problem, which can be handled by imposing prior information. From this point of view, the most significant issues are the rational extraction and sufficient utilization of prior knowledge that helps to remove the rain streaks and reconstruct the rain-free video. In this paper, we mainly focus on the discriminative characteristics of rain streaks and background in different directional gradient domains.

Fig. 2. From left to right: the histograms of the temporal gradient of the rainy video (a-1), the clean video (a-2), and the isolated rain streaks (a-3), respectively; several example frames from the rainy video, the clean video, and the isolated rain streaks; and the histograms of the vertical gradient (b-1,2,3) and the intensities along a row (c-1,2,3) in the rainy video, the clean video, and the isolated rain streaks, respectively.

From the temporal perspective, the clean video is continuous along the time direction, while the rain streaks do not share this property [41], [46], [51]. As observed in Fig. 2, the time-directional gradient of the rain-free video (a-2) exhibits a different histogram from those of the rainy video (a-1) and the rain streaks (a-3). The temporal gradient of the clean video is much sparser, which corresponds to the temporal continuity of the clean video. Therefore, we intend to minimize $\|\nabla_t\mathcal{B}\|_1$, where $\nabla_t$ is the temporal difference operator.
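To make the temporal-continuity argument concrete, the following toy sketch (not the experiment behind Fig. 2) builds a slowly varying synthetic "clean" video and a rain layer whose streaks move to different columns in consecutive frames, then compares the ℓ1 norms of their forward temporal differences. All sizes and intensities, and the helper name `temporal_tv`, are illustrative assumptions.

```python
import numpy as np

def temporal_tv(x):
    """l1 norm of the forward temporal difference of an m x n x t array (no wrap-around)."""
    return float(np.abs(np.diff(x, axis=2)).sum())

m, n, t = 32, 32, 10
cols = np.arange(n)

# Toy clean video: a horizontal intensity ramp that brightens slightly in every frame,
# so consecutive frames are nearly identical (temporal continuity).
clean = np.stack([np.tile((cols + 0.1 * k) / n, (m, 1)) for k in range(t)], axis=2)

# Toy rain layer: 8-pixel vertical segments in the columns c with c % 8 == (5 * k) % 8,
# so the rainy columns of consecutive frames never coincide.
rain = np.zeros((m, n, t))
for k in range(t):
    rain[4:12, cols % 8 == (5 * k) % 8, k] = 0.5

print(temporal_tv(clean), temporal_tv(rain))
```

On this toy input the temporal ℓ1 norm of the clean video (28.8) is an order of magnitude smaller than that of the rain layer (288), mirroring the sparsity gap visible in histograms (a-1,2,3).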

From the spatial perspective, it has been widely recognized that natural images are largely piecewise smooth and their gradient fields or the coefficients in the tight wavelet frame domain are typically sparse [52]–[54]. Many aforementioned de-rain methods take the spatial gradient into consideration and use the total variation (TV) to depict the property of the rain-free part [1], [14]. However, the effects of the rain streaks on the vertical gradient and horizontal gradient are different.

This phenomenon was likewise noticed in [24]–[26]. For convenience, we initially assume that rain streaks are approximately vertical. The impact of vertical rain streaks on the vertical gradient is limited: the subfigures (b-1,2,3) in Fig. 2 reveal that the vertical gradient of the rain streaks is much sparser than those of the clean video and the rainy video. Nonetheless, vertical rain streaks severely disrupt the horizontal piecewise smoothness. As exhibited in Fig. 2 (c-1,2,3), the pixel intensity is piecewise smooth only in (c-2), whereas burrs frequently appear in (c-1) and (c-3). Therefore, we intend to minimize $\|\nabla_1\mathcal{R}\|_1$ and $\|\nabla_2\mathcal{B}\|_1$, where $\nabla_1$ and $\nabla_2$ are the vertical difference (i.e., vertical unidirectional TV [55]–[57]) operator and the horizontal difference (i.e., horizontal unidirectional TV) operator, respectively.
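A minimal NumPy sketch of the two unidirectional difference operators follows. It assumes an m × n × t array with rows as mode 1 and periodic boundary conditions (a choice consistent with FFT-based solvers but not stated in this section); the function names are hypothetical. The toy rain layer contains perfectly vertical streaks, so its vertical ℓ1 norm only picks up the streak endpoints, while the horizontal one picks up every streak pixel twice.

```python
import numpy as np

def grad_vertical(x):
    """nabla_1: forward difference along mode 1 (rows), periodic boundary."""
    return np.roll(x, -1, axis=0) - x

def grad_horizontal(x):
    """nabla_2: forward difference along mode 2 (columns), periodic boundary."""
    return np.roll(x, -1, axis=1) - x

def l1(x):
    return float(np.abs(x).sum())

# Toy rain layer: 3 frames of 16 x 16 pixels with vertical streaks in columns 3, 8, 12
# spanning rows 2..13.
R = np.zeros((16, 16, 3))
R[2:14, [3, 8, 12], :] = 1.0

print(l1(grad_vertical(R)), l1(grad_horizontal(R)))
```

For this input the sketch gives 18 for $\|\nabla_1\mathcal{R}\|_1$ and 216 for $\|\nabla_2\mathcal{R}\|_1$, which is why the vertical gradient is penalized on $\mathcal{R}$ and the horizontal one on $\mathcal{B}$.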

In a real rainfall-affected scene without wind, raindrops generally fall from top to bottom. Meanwhile, when it is not very windy, the angles between the rain streaks and the vertical direction are usually small. Therefore, the rain streak direction can be approximated as the vertical direction, i.e., the mode-1 (column) direction of the video tensor. This assumption is reasonable for some, but not all, rainy scenes. For rain streaks that are oblique (i.e., far from vertical), directly utilizing the directional property is very difficult for digital video data, which are discrete arrays of pixels. To cope with this difficulty, in Sec. III-E, we design a shift strategy based on our automatic rain streak direction detection method.

The contributions of this paper include three aspects.

• We propose a video rain streaks removal model, which fully considers the discriminative prior knowledge of the rain streaks and the clean video.

• We design a split augmented Lagrangian shrinkage algorithm (SALSA) based algorithm to efficiently and effectively solve the proposed minimization model. The convergence of our algorithm is theoretically guaranteed. Meanwhile, the implementation on the graphics processing unit (GPU) further accelerates our method.

• To demonstrate the efficacy and superior performance of the proposed algorithm in comparison with state-of-the-art alternatives, extensive experiments on both synthetic data and real-world rainy videos are conducted.

This work is an extension of the material published in [44]. The new material is the following: a) the proposed rain streak removal model is improved and introduced here in more technical detail; b) we explicitly use the split augmented Lagrangian shrinkage algorithm to solve the proposed model; c) to make the proposed method more applicable, we provide the shift strategy to deal with oblique rain streaks; d) in our experiments, we re-simulate the rain streaks for the synthetic data using two different techniques, including rain streaks that are not strictly vertical; e) three recent state-of-the-art methods, namely, the methods in [31] and [47] and the method in our conference paper [44], are brought into comparison.

The paper is organized as follows. Section II gives the preliminaries on tensor notation. In Section III, the formulation of our model is presented along with a SALSA solver. Experimental results are reported in Section IV. Finally, we draw conclusions in Section V.

II. NOTATION AND PRELIMINARIES

Following [58]–[61], we use lower-case letters for vectors, e.g., $a$; upper-case letters for matrices, e.g., $A$; and calligraphic letters for tensors, e.g., $\mathcal{A}$. An $N$-mode tensor is defined as $\mathcal{X} \in \mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$, and $x_{i_1,i_2,\ldots,i_N}$ denotes its $(i_1,i_2,\ldots,i_N)$-th component.

TABLE I
TENSOR NOTATIONS

A fiber of a tensor is defined by fixing every index but one. A third-order tensor has column, row, and tube fibers, denoted by $x_{:jk}$, $x_{i:k}$, and $x_{ij:}$, respectively. When extracted from their tensors, fibers are always assumed to be oriented as column vectors.

A slice is a two-dimensional section of a tensor, defined by fixing all but two indices. The horizontal, lateral, and frontal slices of a third-order tensor $\mathcal{X}$ are denoted by $X_{i::}$, $X_{:j:}$, and $X_{::k}$, respectively. Alternatively, the $k$-th frontal slice of a third-order tensor, $X_{::k}$, may be denoted more compactly by $X_k$.

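The inner product of two same-sized tensors $\mathcal{X}$ and $\mathcal{Y}$ is defined as

$$\langle \mathcal{X}, \mathcal{Y}\rangle := \sum_{i_1,i_2,\ldots,i_N} x_{i_1 i_2\cdots i_N}\, y_{i_1 i_2\cdots i_N}.$$

The corresponding norm (Frobenius norm) is then defined as $\|\mathcal{X}\|_F := \sqrt{\langle \mathcal{X}, \mathcal{X}\rangle}$.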

Please refer to [62] for a more extensive overview.
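The notation above maps directly onto array indexing. The sketch below assumes NumPy and its 0-based indices (so the 1-based fiber $x_{:23}$ becomes `X[:, 1, 2]`); it extracts fibers and slices of a small third-order tensor and evaluates the inner product and the Frobenius norm.

```python
import numpy as np

X = np.arange(24, dtype=float).reshape(2, 3, 4)   # a 2 x 3 x 4 third-order tensor
Y = np.ones_like(X)

# Fibers: fix every index but one (NumPy indices are 0-based).
column_fiber = X[:, 1, 2]   # x_{:jk} with j = 2, k = 3 in 1-based notation
row_fiber = X[0, :, 2]      # x_{i:k}
tube_fiber = X[0, 1, :]     # x_{ij:}

# Slices: fix all indices but two.
horizontal = X[0, :, :]     # X_{i::}
lateral = X[:, 1, :]        # X_{:j:}
frontal = X[:, :, 2]        # X_{::k}, also written X_k

inner = float(np.sum(X * Y))          # <X, Y>
fro = float(np.sqrt(np.sum(X * X)))   # ||X||_F = sqrt(<X, X>)

# For N-D input with ord=None, np.linalg.norm returns the 2-norm of the flattened array.
assert np.isclose(fro, np.linalg.norm(X))
```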

III. MAIN RESULTS

A. Problem Formulation

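As mentioned before, a rainy video $\mathcal{O} \in \mathbb{R}^{m\times n\times t}$ can be modeled as a linear superposition:

$$\mathcal{O} = \mathcal{B} + \mathcal{R} + \mathcal{N}, \tag{1}$$

where $\mathcal{O}$, $\mathcal{B}$, $\mathcal{R}$, and $\mathcal{N} \in \mathbb{R}^{m\times n\times t}$ are four 3-mode tensors representing the observed rainy video, the unknown rain-free video, the rain streaks, and the noise (or error) term, respectively.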

Our goal is to separate the rain-free video $\mathcal{B}$ and the rain streaks $\mathcal{R}$ from an input rainy video $\mathcal{O}$. To solve this ill-posed inverse problem, we need to analyze the prior information for both $\mathcal{B}$ and $\mathcal{R}$ and then introduce corresponding regularizers, which are discussed in the next subsection.

B. Priors and Regularizers

In this subsection, we continue the discussion on the prior knowledge with the assumption that rain streaks are approximately vertical.

a) Sparsity of rain streaks: When the rain is light, the rain streaks can naturally be considered sparse. To boost the sparsity of the rain streaks, minimizing the ℓ1 norm of the rain streaks $\mathcal{R}$ is an ideal option. When the rain is very heavy, this regularization may seem inappropriate. However, when the rain is extremely heavy, it is very difficult or even impossible to recover the rain-free part because of the huge loss of reliable information. The rainy scenarios discussed in this paper are not that extreme, and we assume that the rain streaks always have lower energy than the clean background video. Therefore, when the rain streaks are dense, the ℓ1 norm can be viewed as restraining the magnitude of the rain streaks. Meanwhile, in our model, the other regularization terms also contribute to distinguishing the rain streaks. Thus, we can tackle heavy rain scenarios by tuning the parameter of the sparsity term so as to reduce its effect.

b) The horizontal direction: In Fig. 2, (c-1,2,3) show the pixel intensities along a fixed row of the rainy video, the clean video and the rain streaks, respectively. It is obvious that the variation of the pixel intensity is piecewise smooth only in (c-2), whereas burrs frequently appear in (c-1) and (c-3).

Therefore, a horizontal unidirectional TV regularizer is a suitable candidate for $\mathcal{B}$.

c) The vertical direction: It can be seen from Fig. 2 that (b-3), the histogram of the vertical gradient in a rain-streak frame, exhibits a distribution distinct from those in (b-1) and (b-2). The long-tailed distributions in (b-1) and (b-2) indicate that minimizing the ℓ1 norm of $\nabla_1\mathcal{R}$ would help to distinguish the rain streaks.

d) The temporal direction: From the first column of Fig. 2, it can be observed that clean videos exhibit continuity along the time axis. Subfigures (a-1,2,3), which present the histograms of the magnitudes of the temporal gradient, illustrate that the clean video's temporal gradient contains more zero values and smaller nonzero values, whereas those of the rainy video and the rain streaks tend to be long-tailed. Therefore, it is natural to minimize the ℓ1 norm of the temporal gradient of the clean video $\mathcal{B}$. Note that the low-rank regularization used in [44] is discarded, because the low-rank assumption is not reasonable for videos captured by dynamic cameras, and the rain streaks, which share repetitive patterns, can occasionally be more low-rank than the background along the spatial directions.

C. The Proposed Model

Generally, there is an angle between the vertical direction and the real falling direction of the raindrops. The rain streaks pictured in Fig. 2 are not strictly vertical; there is a 5° angle between the rain streaks and the y-axis. That is, the priors discussed above remain valid when this angle is small; large-angle cases are discussed in Sec. III-E. Therefore, the rain streak direction is referred to as the vertical direction, corresponding to the y-axis, whereas the rain-perpendicular direction is referred to as the horizontal direction, corresponding to the x-axis. Thus, summarizing the discussion of the priors and regularizers, our model can be compactly formulated as follows:

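$$\begin{aligned}\min_{\mathcal{B},\mathcal{R}}\ & \alpha_1\|\nabla_1\mathcal{R}\|_1 + \alpha_2\|\mathcal{R}\|_1 + \alpha_3\|\nabla_2\mathcal{B}\|_1 + \alpha_4\|\nabla_t\mathcal{B}\|_1 + \frac{1}{2}\|\mathcal{O}-(\mathcal{B}+\mathcal{R})\|_F^2\\ \text{s.t.}\ & \mathcal{O} \ge \mathcal{B} \ge 0,\quad \mathcal{O} \ge \mathcal{R} \ge 0,\end{aligned} \tag{2}$$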

where $\nabla_1$, $\nabla_2$, and $\nabla_t$ are the vertical, horizontal, and temporal difference operators, respectively; $\nabla_1$ and $\nabla_2$ are also written as $\nabla_y$ and $\nabla_x$ in [24] and [44]. An efficient algorithm to solve (2) is proposed in the following subsection.
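As a sanity check for any solver, it helps to be able to evaluate the objective of (2) and its box constraints directly. The sketch below does so with NumPy under two assumptions not fixed by the text: periodic boundary conditions for the three difference operators, and the hypothetical helper names `fastderain_energy` and `is_feasible`.

```python
import numpy as np

def d1(x): return np.roll(x, -1, axis=0) - x   # vertical difference (nabla_1)
def d2(x): return np.roll(x, -1, axis=1) - x   # horizontal difference (nabla_2)
def dt(x): return np.roll(x, -1, axis=2) - x   # temporal difference (nabla_t)

def fastderain_energy(O, B, R, alpha):
    """Objective of model (2) for m x n x t arrays; alpha = (a1, a2, a3, a4)."""
    a1, a2, a3, a4 = alpha
    data = 0.5 * np.sum((O - (B + R)) ** 2)
    return float(a1 * np.abs(d1(R)).sum() + a2 * np.abs(R).sum()
                 + a3 * np.abs(d2(B)).sum() + a4 * np.abs(dt(B)).sum() + data)

def is_feasible(O, B, R):
    """Check the constraints O >= B >= 0 and O >= R >= 0 entrywise."""
    return bool(np.all((B >= 0) & (B <= O) & (R >= 0) & (R <= O)))
```

For example, with a constant video `O`, the trivial split `B = O`, `R = 0` has zero energy, whereas attributing everything to rain (`B = 0`, `R = O`) is charged `alpha[1] * O.sum()` by the sparsity term.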

D. Optimization

Since the proposed model (2) is concise and convex, many state-of-the-art solvers are available to solve it. Here, we apply the alternating direction method of multipliers (ADMM) [63], which has proven to be an effective strategy for solving large-scale optimization problems [64]–[67]. More specifically, we adopt SALSA [68].

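After introducing four auxiliary tensors, the proposed model (2) is reformulated as the following equivalent constrained problem:

$$\begin{aligned}\min_{\mathcal{B},\mathcal{R},\mathcal{V}_i}\ & \alpha_1\|\mathcal{V}_1\|_1 + \alpha_2\|\mathcal{V}_2\|_1 + \alpha_3\|\mathcal{V}_3\|_1 + \alpha_4\|\mathcal{V}_4\|_1 + \frac{1}{2}\|\mathcal{O}-(\mathcal{B}+\mathcal{R})\|_F^2\\ \text{s.t.}\ & \mathcal{V}_1 = \nabla_1(\mathcal{R}),\ \mathcal{V}_2 = \mathcal{R},\ \mathcal{V}_3 = \nabla_2(\mathcal{B}),\ \mathcal{V}_4 = \nabla_t(\mathcal{B}),\\ & \mathcal{O} \ge \mathcal{B} \ge 0,\ \mathcal{O} \ge \mathcal{R} \ge 0,\end{aligned} \tag{3}$$

where $\mathcal{V}_i \in \mathbb{R}^{m\times n\times t}$ ($i = 1,2,3,4$).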

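Then, the augmented Lagrangian function of (3) is

$$\begin{aligned}\mathcal{L}_\mu(\mathcal{B},\mathcal{R},\mathcal{V}_i,\mathcal{D}_i) ={}& \frac{1}{2}\|\mathcal{O}-\mathcal{B}-\mathcal{R}\|_F^2 + \alpha_1\|\mathcal{V}_1\|_1 + \alpha_2\|\mathcal{V}_2\|_1 + \alpha_3\|\mathcal{V}_3\|_1 + \alpha_4\|\mathcal{V}_4\|_1\\ &+ \frac{\mu}{2}\|\nabla_1\mathcal{R}-\mathcal{V}_1-\mathcal{D}_1\|_F^2 + \frac{\mu}{2}\|\mathcal{R}-\mathcal{V}_2-\mathcal{D}_2\|_F^2\\ &+ \frac{\mu}{2}\|\nabla_2\mathcal{B}-\mathcal{V}_3-\mathcal{D}_3\|_F^2 + \frac{\mu}{2}\|\nabla_t\mathcal{B}-\mathcal{V}_4-\mathcal{D}_4\|_F^2,\end{aligned}$$

where the $\mathcal{D}_i$ ($i = 1,2,3,4$) are the scaled Lagrange multipliers and $\mu$ is a positive scalar.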

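e) $\mathcal{V}_i$ sub-problems: For $i = 1,2,3,4$, the $\mathcal{V}_i$ sub-problem can be written in the equivalent form

$$\mathcal{V}_i^{+} = \arg\min_{\mathcal{V}_i}\ \alpha_i\|\mathcal{V}_i\|_1 + \frac{\mu}{2}\|\mathcal{A}_i - \mathcal{V}_i\|_F^2,$$

where $\mathcal{A}_1 = \nabla_1\mathcal{R} - \mathcal{D}_1$, $\mathcal{A}_2 = \mathcal{R} - \mathcal{D}_2$, $\mathcal{A}_3 = \nabla_2\mathcal{B} - \mathcal{D}_3$, and $\mathcal{A}_4 = \nabla_t\mathcal{B} - \mathcal{D}_4$. Such a problem has a closed-form solution, obtained through soft thresholding:

$$\mathcal{V}_i^{+} = \mathcal{S}_{\alpha_i/\mu}(\mathcal{A}_i).$$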

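Here, the tensor non-negative soft-thresholding operator $\mathcal{S}_v(\cdot)$ is defined as $\mathcal{S}_v(\mathcal{A}) = \bar{\mathcal{A}}$ with

$$\bar{a}_{i_1 i_2\cdots i_N} = \begin{cases} a_{i_1 i_2\cdots i_N} - v, & a_{i_1 i_2\cdots i_N} > v,\\ 0, & \text{otherwise}.\end{cases}$$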
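Both thresholding variants can be sketched in a few lines of NumPy. `nonneg_soft_threshold` follows the definition above; for comparison, `soft_threshold` is the standard two-sided operator $\mathrm{sign}(a)\max(|a|-v,0)$, which is the exact entrywise minimizer of $v|x| + \frac{1}{2}(a-x)^2$. The one-sided version coincides with that minimizer when $x$ is additionally restricted to be nonnegative. Both names are hypothetical.

```python
import numpy as np

def nonneg_soft_threshold(A, v):
    """S_v(A) as defined above: a - v where a > v, and 0 otherwise."""
    return np.where(A > v, A - v, 0.0)

def soft_threshold(A, v):
    """Two-sided soft-thresholding: entrywise argmin_x v*|x| + 0.5*(a - x)**2."""
    return np.sign(A) * np.maximum(np.abs(A) - v, 0.0)

# In the V_i sub-problem, A is A_i and the threshold is v = alpha_i / mu.
A = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
print(nonneg_soft_threshold(A, 0.5))
print(soft_threshold(A, 0.5))
```

Both operators cost one pass over the entries, which matches the $O(mnt)$ complexity stated for the $\mathcal{V}_i$ updates.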

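Therefore, $\mathcal{V}_i$ ($i = 1,2,3,4$) can respectively be updated as follows:

$$\begin{cases}\mathcal{V}_1^{(t+1)} = \mathcal{S}_{\alpha_1/\mu}\left(\nabla_1\mathcal{R} - \mathcal{D}_1\right),\\ \mathcal{V}_2^{(t+1)} = \mathcal{S}_{\alpha_2/\mu}\left(\mathcal{R} - \mathcal{D}_2\right),\\ \mathcal{V}_3^{(t+1)} = \mathcal{S}_{\alpha_3/\mu}\left(\nabla_2\mathcal{B} - \mathcal{D}_3\right),\\ \mathcal{V}_4^{(t+1)} = \mathcal{S}_{\alpha_4/\mu}\left(\nabla_t\mathcal{B} - \mathcal{D}_4\right).\end{cases} \tag{4}$$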

The time complexity of each sub-problem above is O(mnt).

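f) $\mathcal{B}$ and $\mathcal{R}$ sub-problems: The $\mathcal{B}$ and $\mathcal{R}$ sub-problems are least-squares problems:

$$\mathcal{B}^{+} = \arg\min_{0\le\mathcal{B}\le\mathcal{O}}\ \frac{1}{2}\|\mathcal{O}-\mathcal{B}-\mathcal{R}\|_F^2 + \frac{\mu}{2}\|\nabla_2\mathcal{B}-\mathcal{V}_3-\mathcal{D}_3\|_F^2 + \frac{\mu}{2}\|\nabla_t\mathcal{B}-\mathcal{V}_4-\mathcal{D}_4\|_F^2,$$

$$\mathcal{R}^{+} = \arg\min_{0\le\mathcal{R}\le\mathcal{O}}\ \frac{1}{2}\|\mathcal{O}-\mathcal{B}-\mathcal{R}\|_F^2 + \frac{\mu}{2}\|\nabla_1\mathcal{R}-\mathcal{V}_1-\mathcal{D}_1\|_F^2 + \frac{\mu}{2}\|\mathcal{R}-\mathcal{V}_2-\mathcal{D}_2\|_F^2.$$

Setting the gradients of the unconstrained objectives to zero, we have

$$\begin{aligned}\mathcal{B}^{+} &= \left(\mathcal{I} + \mu\nabla_2^{\top}\nabla_2 + \mu\nabla_t^{\top}\nabla_t\right)^{-1}\left(\mathcal{O}-\mathcal{R}+\mu\nabla_2^{\top}(\mathcal{V}_3+\mathcal{D}_3)+\mu\nabla_t^{\top}(\mathcal{V}_4+\mathcal{D}_4)\right),\\ \mathcal{R}^{+} &= \left((1+\mu)\mathcal{I} + \mu\nabla_1^{\top}\nabla_1\right)^{-1}\left(\mathcal{O}-\mathcal{B}+\mu\nabla_1^{\top}(\mathcal{V}_1+\mathcal{D}_1)+\mu(\mathcal{V}_2+\mathcal{D}_2)\right),\end{aligned} \tag{5}$$

where $\mathcal{I}$ denotes the identity operator and $\nabla^{\top}$ the adjoint of $\nabla$.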

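We adopt the fast Fourier transform (FFT) to compute these inverses efficiently when updating $\mathcal{B}$ and $\mathcal{R}$ (under periodic boundary conditions, the difference operators are diagonalized by the FFT). Then, the elements of $\mathcal{B}^{(t+1)}$ and $\mathcal{R}^{(t+1)}$ that are smaller than 0 or larger than the corresponding elements of $\mathcal{O}$ are clipped to these bounds, i.e., projected onto the constraint set. The time complexity of updating $\mathcal{B}$ (or $\mathcal{R}$) is $O(mnt\log(mnt))$.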
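The FFT step can be sketched as follows. The sketch assumes periodic forward differences $(\nabla x)_i = x_{i+1} - x_i$ along one axis, whose adjoint is $(\nabla^{\top} y)_i = y_{i-1} - y_i$ and whose normal operator $\nabla^{\top}\nabla$ has Fourier eigenvalues $2 - 2\cos(2\pi k/N)$. The right-hand sides follow from setting to zero the gradients of the $\mathcal{B}$ and $\mathcal{R}$ objectives with the penalty terms $\frac{\mu}{2}\|\nabla\mathcal{B} - \mathcal{V} - \mathcal{D}\|_F^2$ exactly as they appear in the augmented Lagrangian, which gives $\mathcal{V} + \mathcal{D}$ inside $\nabla^{\top}$. The final `np.clip` is the projection onto $[0, \mathcal{O}]$. All function names are hypothetical.

```python
import numpy as np

def diff(x, axis):
    """Periodic forward difference along `axis` (the operator nabla)."""
    return np.roll(x, -1, axis=axis) - x

def diff_T(y, axis):
    """Adjoint of `diff`: y[i-1] - y[i] with periodic wrap-around."""
    return np.roll(y, 1, axis=axis) - y

def eig_DtD(shape, axis):
    """FFT eigenvalues 2 - 2cos(2*pi*k/N) of diff_T(diff(.)) along `axis`, shaped to broadcast."""
    N = shape[axis]
    lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(N) / N)
    view = [1] * len(shape)
    view[axis] = N
    return lam.reshape(view)

def solve_periodic(rhs, axes, mu, extra=0.0):
    """Solve (1 + extra + mu * sum_a nabla_a^T nabla_a) x = rhs with one FFT pair."""
    denom = 1.0 + extra
    for a in axes:
        denom = denom + mu * eig_DtD(rhs.shape, a)
    return np.real(np.fft.ifftn(np.fft.fftn(rhs) / denom))

def update_B(O, R, V3, D3, V4, D4, mu):
    rhs = O - R + mu * diff_T(V3 + D3, 1) + mu * diff_T(V4 + D4, 2)
    return np.clip(solve_periodic(rhs, (1, 2), mu), 0.0, O)   # projection onto [0, O]

def update_R(O, B, V1, D1, V2, D2, mu):
    rhs = O - B + mu * diff_T(V1 + D1, 0) + mu * (V2 + D2)
    return np.clip(solve_periodic(rhs, (0,), mu, extra=mu), 0.0, O)
```

Because the system matrix is diagonal in the Fourier domain, each update costs one forward and one inverse FFT, which is where the $O(mnt\log(mnt))$ term comes from.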

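g) Multipliers updating: Consistent with the $-\mathcal{D}_i$ terms in the augmented Lagrangian, the scaled Lagrange multipliers $\mathcal{D}_i$ ($i = 1,2,3,4$) are updated as follows:

$$\begin{cases}\mathcal{D}_1 \leftarrow \mathcal{D}_1 - \left(\nabla_1\mathcal{R} - \mathcal{V}_1\right),\\ \mathcal{D}_2 \leftarrow \mathcal{D}_2 - \left(\mathcal{R} - \mathcal{V}_2\right),\\ \mathcal{D}_3 \leftarrow \mathcal{D}_3 - \left(\nabla_2\mathcal{B} - \mathcal{V}_3\right),\\ \mathcal{D}_4 \leftarrow \mathcal{D}_4 - \left(\nabla_t\mathcal{B} - \mathcal{V}_4\right).\end{cases} \tag{6}$$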
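Putting the pieces together, a compact NumPy sketch of the iteration might look like the following. It is not the authors' released code. It assumes periodic boundaries, a fixed iteration count instead of a stopping criterion, illustrative (untuned) values for $\alpha_i$ and $\mu$, and two-sided soft-thresholding (the exact ℓ1 proximal map) for every $\mathcal{V}_i$. The sub-problems are visited in the order presented above, and the multiplier step is written as $\mathcal{D}_i \leftarrow \mathcal{D}_i - (\text{linear term} - \mathcal{V}_i)$, the sign implied by the $-\mathcal{D}_i$ terms in the augmented Lagrangian. The function name `fastderain_sketch` is hypothetical.

```python
import numpy as np

def diff(x, axis):
    """Periodic forward difference along `axis`."""
    return np.roll(x, -1, axis=axis) - x

def diff_T(y, axis):
    """Adjoint of `diff` under periodic boundaries."""
    return np.roll(y, 1, axis=axis) - y

def eig_DtD(shape, axis):
    """FFT eigenvalues of diff_T(diff(.)) along `axis`, shaped for broadcasting."""
    N = shape[axis]
    lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(N) / N)
    view = [1] * len(shape)
    view[axis] = N
    return lam.reshape(view)

def shrink(A, v):
    """Two-sided soft-thresholding, the exact proximal map of v * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - v, 0.0)

def fastderain_sketch(O, alpha=(1.0, 0.1, 0.2, 0.5), mu=1.0, iters=50):
    """Hypothetical SALSA/ADMM loop for model (3) on an m x n x t array O >= 0."""
    a1, a2, a3, a4 = alpha
    B, R = O.copy(), np.zeros_like(O)
    D1, D2, D3, D4 = (np.zeros_like(O) for _ in range(4))
    den_B = 1.0 + mu * eig_DtD(O.shape, 1) + mu * eig_DtD(O.shape, 2)
    den_R = 1.0 + mu + mu * eig_DtD(O.shape, 0)
    for _ in range(iters):
        # V_i updates (4): threshold the linear terms minus the multipliers.
        V1 = shrink(diff(R, 0) - D1, a1 / mu)
        V2 = shrink(R - D2, a2 / mu)
        V3 = shrink(diff(B, 1) - D3, a3 / mu)
        V4 = shrink(diff(B, 2) - D4, a4 / mu)
        # B and R updates (5): FFT-diagonalized least squares, then projection onto [0, O].
        rhs_B = O - R + mu * diff_T(V3 + D3, 1) + mu * diff_T(V4 + D4, 2)
        B = np.clip(np.real(np.fft.ifftn(np.fft.fftn(rhs_B) / den_B)), 0.0, O)
        rhs_R = O - B + mu * diff_T(V1 + D1, 0) + mu * (V2 + D2)
        R = np.clip(np.real(np.fft.ifftn(np.fft.fftn(rhs_R) / den_R)), 0.0, O)
        # Multiplier updates (6) for the -D_i convention.
        D1 = D1 - (diff(R, 0) - V1)
        D2 = D2 - (R - V2)
        D3 = D3 - (diff(B, 1) - V3)
        D4 = D4 - (diff(B, 2) - V4)
    return B, R
```

Usage: `B, R = fastderain_sketch(Y)` for a luminance video `Y` of shape m × n × t with values in [0, 1]; in practice the weights $\alpha_i$ would need to be tuned to the rain density.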

The proposed algorithm for video rain streak removal is denoted as “FastDeRain” and summarized in Algorithm 1. For a video of size m × n × t, the time complexity of each iteration of the proposed algorithm is $O(mnt\log(mnt))$, dominated by the FFT-based updates of $\mathcal{B}$ and $\mathcal{R}$.

E. The Shift Strategy for Oblique Rain Streaks

In a real rainfall-affected scene, the rain streaks are not always vertical. Thus, the directional property utilized in our model is a double-edged sword when dealing with digital videos. Fortunately, as shown in the experimental part, FastDeRain is robust to a range of angles, about −15° to 15° with respect to the vertical direction. We therefore divide the rainy situations into different cases.

Without loss of generality, we assume that the rain streaks share a similar direction, and the angle between the rain streaks and the vertical direction is denoted as θ. Generally, θ lies in (−90°, 90°). If θ ∈ (−90°, 0°), we can map it into (0°, 90°) by left-right flipping each frame. When θ ∈ (45°, 90°), we can map it into (0°, 45°) by transposing (i.e., interchanging the rows and columns of) each frame. Therefore, our goal becomes handling rain streaks with angles in [0°, 45°].

When the angle θ is close to zero (i.e., within the robust range above), we directly use FastDeRain; the other cases require more elaborate treatment. In this subsection, inspired by the shearing techniques used for cartoon-texture image decomposition [69], we propose the shift strategy to deal with rain streaks that are not vertical.

Algorithm 1 FastDeRain

Fig. 3. Illustrations of the shift I and the shift II operations. For better visualization, the rain streaks in the left part are roughly labeled in red, while the pixel values of the rain streak images on the right are scaled.

a) The shift operations: We first introduce two shift operations, as shown in Fig. 3. Different from the rotation operation recommended in [44], the core idea of the shift operations is to slide the rows of the rainy frames so that the rain streaks become approximately vertical, without any degradation caused by interpolation [70]. We remark that these shift operations do not affect the priors discussed in Sec. III-B. The two shift operations are detailed as follows:

Shift I: For each frame $\mathcal{O}_{::k}$, we slide the $i$-th row $(i-1)$ pixel(s) to the right.

Shift II: For each frame $\mathcal{O}_{::k}$, we slide the $i$-th row $\lfloor (i-1)/2 \rfloor$¹ pixel(s) to the right.

¹ $\lfloor x \rfloor$ denotes rounding $x$ to the nearest integer towards minus infinity.

Shift I is suitable for rain streaks with θ = 45°, while Shift II is ideal for θ ≈ 26.57°, since arctan 1 = 45° and arctan(1/2) ≈ 26.57°. Considering that the proposed FastDeRain is robust to a range of angles (see details in Sec. IV), our method with Shift I and Shift II is sufficient for the situations when θ ∈ [0°, 45°].
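The two shift operations and their inverses can be sketched as follows. Rows are 0-based in NumPy, so the per-row offsets become $i$ and $\lfloor i/2 \rfloor$ rather than $i-1$ and $\lfloor (i-1)/2 \rfloor$. The text does not say what happens to pixels that slide past the right border; this sketch assumes circular wrap-around (`np.roll`), which keeps the frame size fixed and makes the inverse exact. The function names are hypothetical.

```python
import numpy as np

def shift_amounts(m, kind):
    """Right-shift per 0-based row: Shift I -> i, Shift II -> floor(i / 2)."""
    if kind not in (1, 2):
        raise ValueError("kind must be 1 (Shift I) or 2 (Shift II)")
    rows = np.arange(m)
    return rows if kind == 1 else rows // 2

def apply_shift(video, kind, inverse=False):
    """Slide every row of each frame of an m x n x t array to the right
    (to the left when inverse=True), wrapping pixels around circularly."""
    out = np.empty_like(video)
    for i, s in enumerate(shift_amounts(video.shape[0], kind)):
        out[i] = np.roll(video[i], -s if inverse else s, axis=0)  # video[i] is n x t
    return out
```

For example, a one-pixel-wide streak that moves one column to the left per row (a 45° streak) becomes a single vertical column after `apply_shift(video, 1)`, and one that moves left every second row (about 26.57°) becomes vertical after `apply_shift(video, 2)`.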

Combined with the left-right flipping and transposing transformations described above, which map any θ ∈ (−90°, 90°) into [0°, 45°], our method with the shift operations is able to handle all the cases.
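If the angle θ were known, the reduction to [0°, 45°] and the choice among no shift, Shift II (about 26.57°), and Shift I (45°) could be expressed as below. This is a hypothetical illustration of the coverage argument only; the method itself selects the transformation automatically from the data rather than from a known θ, and the names `normalize_angle` and `choose_shift` are assumptions.

```python
import math

# Shift kind -> streak angle (degrees from vertical) that the shift straightens.
SHIFT_ANGLES = {None: 0.0, 2: math.degrees(math.atan(0.5)), 1: 45.0}

def normalize_angle(theta):
    """Map theta in (-90, 90) degrees to [0, 45] via left-right flip and/or transpose."""
    flip = theta < 0
    if flip:
        theta = -theta        # flipping each frame negates the angle
    transpose = theta > 45
    if transpose:
        theta = 90 - theta    # transposing measures the angle from the other axis
    return flip, transpose, theta

def choose_shift(theta):
    """Pick the shift whose straightening angle is closest to theta in [0, 45]."""
    return min(SHIFT_ANGLES, key=lambda k: abs(SHIFT_ANGLES[k] - theta))
```

With a robustness margin of roughly ±15° around 0°, 26.57°, and 45°, every angle in [0°, 45°] is within reach of one of the three options.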

b) The shift strategy: Having defined the shift operations and the left-right flipping and transposing transformations, the question becomes how to automatically decide which transformation and shift operation to apply. Based on our analysis of the prior knowledge in Sec. III-B, a practical and efficient strategy using these two shift operations follows naturally.

Then, for a rainy video $\mathcal{O} \in \mathbb{R}^{m\times n\times t}$, our strategy consists of three steps:

1) Filtering. Filter the horizontal slices of the rainy video with a 3 × 3 median filter, i.e., for $i = 1,2,\ldots,m$, $\bar{\mathcal{O}}(i,:,:) = \mathrm{med}(\mathcal{O}(i,:,:))$, and obtain $\mathcal{R}_0 = \mathcal{O} - \bar{\mathcal{O}}$.

2) Transforming and shifting. Left-right flip and transpose each frame of $\mathcal{R}_0$, and apply the Shift I and Shift II operations, respectively. This yields a set of tensors consisting of $\mathcal{R}_0$ and its variants.

3) Computing vertical gradients. For each tensor $\mathcal{R}_0^i$ in this set, compute $y_i = \|\nabla_1\mathcal{R}_0^i\|_1$, and select the transformation and shift operations corresponding to the minimal $y_i$ (rain streaks that have been made vertical have the sparsest vertical gradient, cf. Sec. III-B).

By these three steps, the transformation and shift operations are selected automatically. We input the transformed and shifted data into Algorithm 1 and finally apply the inverse shift and transformation to the output.
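A minimal sketch of the three steps is given below. The text does not pin down several details, so the following are assumptions: edge replication at the borders of the median filter, circular row shifts, `np.diff` along the first axis as $\nabla_1$ (no wrap-around), and a step 2 that enumerates all 12 combinations of {no flip, flip} × {no transpose, transpose} × {no shift, Shift I, Shift II}. All function names are hypothetical.

```python
import numpy as np

def median3x3_horizontal_slices(O):
    """3 x 3 median filter on every horizontal slice O[i, :, :] (an n x t column-time image)."""
    m, n, t = O.shape
    P = np.pad(O, ((0, 0), (1, 1), (1, 1)), mode="edge")
    windows = [P[:, 1 + dj:1 + dj + n, 1 + dk:1 + dk + t]
               for dj in (-1, 0, 1) for dk in (-1, 0, 1)]
    return np.median(np.stack(windows), axis=0)

def shift_rows(video, kind):
    """Shift I (kind=1) or Shift II (kind=2) with circular wrap; kind=0 means no shift."""
    if kind == 0:
        return video
    out = np.empty_like(video)
    for i in range(video.shape[0]):
        s = i if kind == 1 else i // 2
        out[i] = np.roll(video[i], s, axis=0)
    return out

def select_transform(O):
    """Return (flip, transpose, shift_kind) minimizing ||nabla_1 R0'||_1 over all variants."""
    R0 = O - median3x3_horizontal_slices(O)              # step 1: rough rain estimate
    best, best_score = None, np.inf
    for flip in (False, True):                           # step 2: enumerate variants
        for transpose in (False, True):
            V = R0[:, ::-1, :] if flip else R0
            V = V.transpose(1, 0, 2) if transpose else V
            for kind in (0, 1, 2):
                score = np.abs(np.diff(shift_rows(V, kind), axis=0)).sum()  # step 3
                if score < best_score:
                    best, best_score = (flip, transpose, kind), score
    return best
```

Because the comparison uses a strict inequality, ties are broken in favor of the first candidate enumerated, i.e., the variant with the fewest transformations.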

IV. EXPERIMENTAL RESULTS

In this section, we evaluate the performance of the proposed algorithm on synthetic data and real-world rainy videos.

a) Implementation details: Throughout our experiments, color videos with dimensions of m×n×3×t are transformed into the YUV format. YUV is a color space that is often used as part of a color image pipeline. Y stands for the luma component (the brightness), and U and V are the chrominance (color) components². We apply our method only to the Y channel with the dimension of m×n×t. The exhibited rain streaks are scaled for better visualization.
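The color handling can be sketched as follows. The text does not specify which YUV variant is used, so this sketch assumes the analog BT.601 definition ($Y = 0.299R + 0.587G + 0.114B$, $U = 0.492(B - Y)$, $V = 0.877(R - Y)$) and RGB values in [0, 1]; the function names are hypothetical.

```python
import numpy as np

def rgb_video_to_yuv(video):
    """video: m x n x 3 x t RGB array; returns Y, U, V, each of shape m x n x t."""
    R, G, B = video[:, :, 0, :], video[:, :, 1, :], video[:, :, 2, :]
    Y = 0.299 * R + 0.587 * G + 0.114 * B   # luma: the only channel that is de-rained
    U = 0.492 * (B - Y)                     # chrominance components are left untouched
    V = 0.877 * (R - Y)
    return Y, U, V

def yuv_to_rgb_video(Y, U, V):
    """Exact inverse of rgb_video_to_yuv; returns an m x n x 3 x t array."""
    B = Y + U / 0.492
    R = Y + V / 0.877
    G = (Y - 0.299 * R - 0.114 * B) / 0.587
    return np.stack([R, G, B], axis=2)
```

A de-raining pass would then replace `Y` with the recovered background and call `yuv_to_rgb_video` with the original `U` and `V`.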

Since graphics processing units (GPUs) can speed up large-scale computing, we implement our method with Matlab (R2017a) on Windows 10, on a machine with an Intel(R) Core(TM) i5-4590 CPU at 3.30 GHz, 16 GB RAM, and a GTX1080 GPU. The operations involved in Algorithm 1 are easy to implement on the GPU [71]. On the CPU, processing a video of size 240 × 320 × 3 × 100 takes about 23 seconds, compared with about 7 seconds on the GPU.

Meanwhile, Fu et al.’s method [31] can also be accelerated by the GPU, from 38 seconds on the CPU to 24 seconds on the GPU for a video of size 240 × 320 × 3 × 100. Thus, we only report the GPU running times of FastDeRain and Fu et al.’s method in this section.

² https://en.wikipedia.org/wiki/YUV

Fig. 4. The rainy frames, rain streak removal results, extracted rain streaks, and corresponding error images obtained by different methods on synthetic rain streaks in case 1. The videos from top to bottom are “foreman”, “bus”, “waterfall”, and “highway”. From left to right: the rainy data (or the color bar), the results by TCL [41], DDN [31], DIP [44], (MS-CSC [47],) FastDeRain, and the ground-truth (GT) clean video.

b) Compared methods: To validate the effectiveness and efficiency of the proposed method, we compare FastDeRain with recent state-of-the-art methods, including one single-image-based method, i.e., Fu et al.’s deep detail network (DDN) method³ [31], and three video-based methods, i.e., Kim et al.’s method using temporal correlation and low-rankness (TCL)⁴ [41], the method in our conference paper utilizing discriminative intrinsic priors (DIP) [44], and Li et al.’s multiscale convolutional sparse coding (MS-CSC) method⁵ [47]. Although DDN is a single-image-based rain streak removal method, its performance has already surpassed that of some video-based methods, and deep learning has shown great vitality and broad application prospects. Hence, the comparison with DDN is reasonable and challenging.

³ http://smartdsp.xmu.edu.cn/xyfu.html

⁴ http://mcl.korea.ac.kr/~jhkim/deraining/deraining_code_with_example.zip

A. Synthetic Data

a) Rain streak generation: Adding rain streaks to a video is a complex problem, since there is neither an existing algorithm nor free software that accomplishes it in one step. Meanwhile, as Starik and Werman [51] pointed out, rain streaks can be assumed to be temporally independent; thus, we can simulate rain streaks for each frame using the synthetic method adopted in many recently developed single-image rain streak removal approaches [12], [17], [30], i.e., using the Photoshop software with the tutorial documents [72].

The density of the rain streaks simulated by this method is mainly determined by the ratio of the number of dots (in [72, Step 8]) to the number of all pixels; for convenience, this ratio is denoted as r. Another way to synthesize rain streaks, proposed in [46] and [47], is to add rain streaks filmed by photographers against a black background⁶. Referring to [46], [47], and [72], we generate three types of rain streaks as follows:

Case 1: Rain streaks simulated following [72] with r ≤ 0.04. In a single frame, the rain streaks share the same angle, and this per-frame angle increases from −15° to 15° over time;

Case 2: Rain streaks simulated following [72] with r ≥ 0.05. In a single frame, the rain streaks have different angles, uniformly distributed in the range [−15°, 15°];

Case 3: Rain streaks simulated following [46].
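For readers without Photoshop, a rough stand-in for the dot-and-blur procedure of Cases 1 and 2 can be sketched as below. This is a hypothetical approximation, not the tutorial in [72]: it places round(r·m·n) random dots (the density ratio r), smears them along a line at θ degrees from vertical by averaging shifted copies (a simple motion blur with circular boundaries), and generates the Case 1 angle schedule. The streak length and intensity are arbitrary assumptions.

```python
import numpy as np

def synth_rain_frame(m, n, r, theta_deg, length=9, intensity=0.8, rng=None):
    """Random dots with density ratio r, smeared along theta_deg from vertical."""
    rng = np.random.default_rng() if rng is None else rng
    dots = np.zeros((m, n))
    idx = rng.choice(m * n, size=int(round(r * m * n)), replace=False)
    dots.flat[idx] = intensity
    th = np.deg2rad(theta_deg)
    streak = np.zeros_like(dots)
    for s in range(length):
        # Step s pixels along the streak direction (rows go downward).
        dy, dx = int(round(s * np.cos(th))), int(round(s * np.sin(th)))
        streak += np.roll(dots, (dy, dx), axis=(0, 1))
    return streak / length   # averaging preserves the total energy of the dots

def case1_angles(t):
    """Case 1: one angle per frame, increasing linearly from -15 to 15 degrees."""
    return np.linspace(-15.0, 15.0, t)
```

A Case 1 rain layer for t frames could then be stacked as `np.stack([synth_rain_frame(m, n, 0.03, a) for a in case1_angles(t)], axis=2)`; Case 2 would instead draw several angles per frame from the uniform distribution on [−15°, 15°].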

Four videos are selected as the clean backgrounds. Three videos⁷, namely “foreman” of size 144 × 176 × 3 × 160 and “bus” and “waterfall” of size 288 × 352 × 3 × 100, are captured by dynamic cameras, while the fourth⁸, “highway” of size 240 × 320 × 3 × 100, is recorded by a static camera.

MS-CSC [47] is designed mainly for videos captured by static cameras, and directly applying it to videos captured by dynamic cameras results in poor performance (see the gray values in Table II). Therefore, for a fair comparison, the compared methods are DDN [31], TCL [41], and DIP [44] for the synthetic rainy data generated from the videos “foreman”, “bus”, and “waterfall”; MS-CSC [47] is additionally brought into the comparison for the rainy data simulated with the video “highway”.

⁵ https://github.com/MinghanLi/MS-CSC-Rain-Streak-Removal

⁶ http://www.2gei.com/video/effect/1_rain/

⁷ http://trace.eas.asu.edu/yuv/

⁸ http://www.changedetection.net

TABLE II

QUANTITATIVE COMPARISONS OF THE RAIN STREAK REMOVAL RESULTS OF [41], [31], [46], [47] AND THE PROPOSED METHOD ON SYNTHETIC VIDEOS. THE BEST QUANTITATIVE VALUES ARE IN BOLDFACE

b) Quantitative comparisons: For quantitative assessment, we calculate the peak signal-to-noise ratio (PSNR) of the whole video, as well as the structural similarity (SSIM) [73], the feature similarity (FSIM) [74], the visual information fidelity (VIF) [75], the universal image quality index (UIQI) [76], and the gradient magnitude similarity deviation (GMSD, smaller is better) [77] of each frame. The PSNR, the corresponding mean values of SSIM, FSIM, VIF, and UIQI, and the running time are reported in Table II, in which the best quantitative values are in boldface.
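The difference between a PSNR computed over the whole video and indexes averaged frame by frame can be made explicit with a short Python/NumPy sketch. It assumes intensities scaled to [0, 1] (peak value 1), frames stored along the last axis as in the m×n×3×t sizes above, and that “PSNR of the whole video” means a single MSE over all entries; these are our assumptions. The `metric` argument stands for any per-frame index implementation (for SSIM, e.g., scikit-image’s `structural_similarity`), which is not reimplemented here.

```python
import numpy as np


def psnr_video(clean, estimate, peak=1.0):
    """PSNR of the whole video: one MSE over every entry of the m x n x c x t array."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((clean - estimate) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def mean_per_frame(metric, clean, estimate):
    """Average a per-frame index over the last (temporal) axis.

    `metric(frame_a, frame_b) -> float` is any per-frame implementation,
    e.g. of SSIM or FSIM; none is reimplemented here.
    """
    t = clean.shape[-1]
    return float(np.mean([metric(clean[..., k], estimate[..., k]) for k in range(t)]))
```

Computing one MSE over the whole video weights every pixel equally, so frames with larger errors dominate the PSNR, whereas the per-frame average treats each frame as one sample.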

As observed in Table II, our method considerably outperformed the other four state-of-the-art methods in terms of all the selected quality assessment indexes. Notably, in many cases, the single-image-based deep learning method DDN outperformed the video-based method TCL. This agrees with the aforementioned rationale for including single-image-based methods in the comparison.

The running time of our FastDeRain is extremely low. In particular, our method took less than 10 seconds on all the synthetic data. DIP and DDN are also comparatively fast. After removing the nuclear norm term, and thus avoiding the time-consuming singular value decomposition, our algorithm has closed-form solutions to its subproblems and a time complexity of approximately O(mnt log(mnt)) for an input video with m×n resolution and t frames, so it is expected to be efficient. In addition, the aforementioned GPU implementation also greatly accelerated our algorithm.
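A complexity of O(mnt log(mnt)) is typical of subproblems whose normal equations involve only difference operators: under periodic boundary conditions these operators are diagonalized by the multidimensional FFT, so the linear solve reduces to two FFTs and a pointwise division. The Python/NumPy sketch below illustrates this generic technique for (I + β Σ_k D_k^T D_k) x = b; the weight β, the choice of axes, the periodic boundary, and the function names are illustrative assumptions rather than the exact subproblems of FastDeRain.

```python
import numpy as np


def forward_diff(x, axis):
    """Periodic forward difference D x: x[i+1] - x[i] along `axis`."""
    return np.roll(x, -1, axis=axis) - x


def forward_diff_T(y, axis):
    """Adjoint D^T y: y[i-1] - y[i] along `axis` (periodic)."""
    return np.roll(y, 1, axis=axis) - y


def diff_eigs(shape, axis):
    """Eigenvalues of D^T D, i.e. |FFT of the difference kernel|^2 (axis length >= 2)."""
    kernel = np.zeros(shape)
    kernel[(0,) * len(shape)] = -1.0
    kernel[tuple(1 if a == axis else 0 for a in range(len(shape)))] = 1.0
    return np.abs(np.fft.fftn(kernel)) ** 2


def solve_periodic(b, beta, axes):
    """Solve (I + beta * sum_k D_k^T D_k) x = b with two FFTs: O(N log N), N = b.size."""
    denom = 1.0 + beta * sum(diff_eigs(b.shape, a) for a in axes)
    return np.real(np.fft.ifftn(np.fft.fftn(b) / denom))
```

For an m×n×t array `b`, applying the operator back to `x = solve_periodic(b, beta, axes)`, i.e. `x + beta * sum(forward_diff_T(forward_diff(x, a), a) for a in axes)`, should reproduce `b` up to rounding; no SVD is involved, which is where the saving over a nuclear-norm model comes from.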

c) Visual comparisons: Figs. 4, 5, and 6 show the results on videos with synthetic rain streaks in Cases 1, 2, and 3, respectively. In Fig. 4, since the angles of the rain streaks in Case 1 increase with time, we display frames at the beginning or the end. Only one frame is shown in Figs. 5 and 6 because the rain streaks in every frame have various directions.

In Fig. 4, all the methods removed almost all of the rain streaks, and the proposed method preserved the background best. DDN and TCL incorrectly extracted many background details into the rain streaks. The sixth row of Fig. 4, i.e., the error images of the results on the video “bus”, shows that only a few vertical patterns were mistakenly extracted as rain streaks by the proposed method.

Fig. 7. The mean SSIM, FSIM, and UIQI values with respect to different values of α1, α2, α3, α4, and μ. The solid lines correspond to the results of FastDeRain, while the dashed lines correspond to the results obtained by our method without the term N in Eq. (1).

The denser rain streaks in Case 2 make them more difficult to remove than those in Case 1. As mentioned in Sec. III-B, the low-rank assumption is not reasonable for videos with moving objects, and the performance of DIP on the video “highway” degraded accordingly. Fig. 5 shows that our method preserved the backgrounds well, whereas the other four methods erased background details.

In Fig. 6, the proposed method removed most of the rain streaks and largely preserved the background. The other methods tended to either over-derain or under-derain. In terms of the similarity between the extracted rain streaks and the ground-truth rain streaks, our FastDeRain held obvious advantages.

In summary, for these different types of synthetic data, our method can simultaneously remove almost all rain streaks while commendably preserving the details of the underlying clean videos.

d) Discussion of each component: There are four components in our model (2). To elucidate their distinct effects, we degrade our method by setting each αi (i = 1, 2, 3, 4) to 10⁻¹⁵ in turn. These degraded methods and FastDeRain are tested on the video “waterfall” with synthetic rain streaks in Case 1. The quantitative assessments are presented in Fig. 9 and the visual results in Fig. 8.

From Figs. 8 and 9, we can conclude that all four components contribute to the removal of rain streaks. Specifically, (a) when α1 = 10⁻¹⁵, the rain streaks tend to be intermittent along the vertical direction; (b) the rain streaks are thicker when the sparsity term contributes little; (c) some rain streaks remain in the background when the horizontal smoothness of the background is not sufficiently enforced; and (d) the temporal continuity seems overwhelmingly important, since without this regularization term our method nearly fails.
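The roles of the four weights in this ablation can be summarized as a weighted objective. The Python/NumPy sketch below is our reading of the discussion above and of the abstract, not a transcription of Eq. (2) or Eq. (3): α1 weights the vertical smoothness of the rain layer R, α2 its sparsity, α3 the horizontal smoothness of the background B, and α4 its temporal continuity. The squared-error data term (whose residual plays the role of N), the periodic forward differences, the axis order (vertical, horizontal, temporal), and the helper names are assumptions; `degraded` mimics the ablation by switching one weight to 10⁻¹⁵.

```python
import numpy as np


def l1(x):
    """Entrywise l1 norm."""
    return float(np.abs(x).sum())


def d(x, axis):
    """Periodic forward difference along `axis`."""
    return np.roll(x, -1, axis=axis) - x


def derain_objective(O, B, R, alphas):
    """Hypothetical four-component objective for an m x n x t video.

    Axis 0 is vertical (the rain direction), axis 1 horizontal, axis 2 temporal.
    """
    a1, a2, a3, a4 = alphas
    fit = 0.5 * float(np.sum((O - B - R) ** 2))  # residual stands in for N
    return (a1 * l1(d(R, 0))     # rain: smooth along the rain direction
            + a2 * l1(R)         # rain: sparse
            + a3 * l1(d(B, 1))   # background: smooth across the rain direction
            + a4 * l1(d(B, 2))   # background: continuous in time
            + fit)


def degraded(alphas, i, eps=1e-15):
    """Copy of `alphas` with the i-th weight (0-based) effectively switched off."""
    out = list(alphas)
    out[i] = eps
    return tuple(out)
```

For example, `[degraded(alphas, i) for i in range(4)]` enumerates the four degraded settings compared with the full model in Figs. 8 and 9.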

e) Parameters: To examine the performance of the proposed FastDeRain with respect to different parameters, we conduct a series of experiments on the synthetic video “waterfall” corrupted by the synthetic rain streaks in Case 1 and zero-mean Gaussian noise with standard deviation 0.02. The parameter analysis is presented in Fig. 7, where the mean SSIM, FSIM, and UIQI values are reported. Guided by Fig. 7, our tuning strategy is as follows:

(1) set α2 and α3 to 10⁻⁴, the other αi to 0.01, and μ = 1; (2) tune α1 and α4 until the results are barely satisfactory; (3) then fix α1 and α4 and enlarge α2 and α3 to further improve the performance. The tuning principle is as follows: when some texture or detail of the clean video is extracted into the estimated rain streaks, we increase α2 and α1 or decrease α4 and α3, and we do the opposite when rain streaks remain in the estimated rain-free content. Our recommended set of candidate values for α1 through α4 is {0.0001, 0.0003, 0.001, 0.003, 0.01}. The Lagrange parameter μ is suggested to be 1. In practice, the empirical tuning of the parameters takes little time.

Fig. 8. The top row shows the 80th frame of the rainy video, the results by FastDeRain and its degraded versions, in which the αi in Eq. (3) are set to 10⁻¹⁵ in turn, and the ground truth (GT) clean video, respectively. The middle row presents the rain streaks extracted by FastDeRain and its degraded versions and the ground-truth rain streaks, while the color bar and the corresponding error images are shown in the bottom row.

Fig. 9. The quantitative performances of the proposed method and its degraded versions, in which the αi in Eq. (3) are set to 10⁻¹⁵ in turn.
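The tuning steps (1)–(3) and the tuning principle described under e) Parameters can be encoded as a small helper that moves weights along the recommended candidate grid. This is a hypothetical Python sketch: the two boolean flags stand in for the visual inspection described in the text, and implementing “increase” or “decrease” as one step along the grid is our assumption.

```python
CANDIDATES = [0.0001, 0.0003, 0.001, 0.003, 0.01]  # recommended values for alpha1..alpha4


def step(value, direction):
    """Move `value` one position up (+1) or down (-1) the grid, clamped at the ends."""
    i = min(range(len(CANDIDATES)), key=lambda k: abs(CANDIDATES[k] - value))
    j = max(0, min(len(CANDIDATES) - 1, i + direction))
    return CANDIDATES[j]


def initial_params():
    """Step (1): alpha2 = alpha3 = 1e-4, the other alphas 0.01, and mu = 1."""
    return {"alpha1": 0.01, "alpha2": 0.0001, "alpha3": 0.0001, "alpha4": 0.01, "mu": 1.0}


def adjust(params, texture_in_rain, rain_in_background, names):
    """Apply the tuning principle to the weights listed in `names`.

    texture_in_rain: background detail leaked into the estimated rain, so the
        rain is penalized more (alpha1, alpha2 up) or the background less
        (alpha3, alpha4 down).
    rain_in_background: residual rain in the estimate, so do the opposite.
    """
    direction = {"alpha1": +1, "alpha2": +1, "alpha3": -1, "alpha4": -1}
    sign = 1 if texture_in_rain else -1 if rain_in_background else 0
    out = dict(params)
    for name in names:
        out[name] = step(out[name], sign * direction[name])
    return out
```

In step (2) one would call `adjust(p, ..., names=("alpha1", "alpha4"))` repeatedly, and in step (3) restrict `names` to `("alpha2", "alpha3")` with α1 and α4 held fixed.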

f) Discussion of the noise term N in Eq. (1): In this paper, the noise (or error) term (N in Eq. (1)) is included in the observation model. To illustrate its effects, we conduct a series of experiments in which Gaussian noise with different standard deviations is added to the video “waterfall” with synthetic rain streaks in Case 1.

The quantitative assessments of the results obtained by the proposed method with and without the noise (or error) term N taken into consideration (denoted as “w N” and “w/o N”, respectively) are reported in Table III. In addition, Fig. 7 also shows the effects of different parameters on the proposed method without N.

TABLE III

QUANTITATIVE COMPARISONS OF THE RAIN STREAK REMOVAL RESULTS OF THE PROPOSED FASTDERAIN WITH (W) AND WITHOUT (W/O) THE NOISE TERM TAKEN INTO CONSIDERATION ON SYNTHETIC VIDEO “WATERFALL” WITH THE SYNTHETIC RAIN STREAKS IN CASE 1. THE BEST QUANTITATIVE VALUES ARE IN BOLDFACE

From Table III, we can conclude that our method without N achieves better results when the rainy video is noise-free. However, when the video is simultaneously affected by rain streaks and noise, which is unavoidable in real data, our method with N obtains better results. Therefore, we adopt the term N in Eq. (3), which enhances the robustness of our method to noise. Meanwhile, the solid and dashed lines in Fig. 7 also demonstrate that taking the noise (or error) term N into account makes the proposed method more robust to the choice of parameters.

B. Real Data

In this subsection, four real-world rainy videos are chosen. The first one (denoted as “wall”), of size 288×368×3×171, was downloaded from the CAVE dataset⁹, and the second video¹⁰ (denoted as “yard”), of size 512×256×3×126, was recorded by one of the authors in his backyard on a rainy day. The background of the video “wall” consists of regular patterns, while the background of the video “yard” is more complex. The third video is clipped from the well-known film “the Matrix”. The scene in this clip changes quickly, which makes this video more difficult to handle. The last video, of size 480×640×3×108, is denoted as “crossing”¹¹ and was captured at a crossing with complex traffic conditions. Meanwhile, to further illustrate the effect of the noise term N in Eq. (1) on real-world rainy data, we also show the results obtained by our method without the noise (or error) term N (denoted as “w/o N”).

⁹ http://www.cs.columbia.edu/CAVE/projects/camera_rain/

¹⁰ https://github.com/TaiXiangJiang/FastDeRain/blob/master/yard.mp4

¹¹ https://github.com/hotndy/SPAC-SupplementaryMaterials/blob/master/Dataset_Testing_RealRain/ra4_Rain.rar

Fig. 10. Rain streak removal performance of different methods obtained on the video “wall”. From top to bottom, two adjacent frames of the deraining results and corresponding extracted rain streaks are illustrated. From left to right are: the rainy data, results by different methods, and the ground truth.

Fig. 11. Rain streak removal results on the video “yard”. From left to right are the frames of the rainy video, and the rain streak removal results and corresponding extracted rain streaks by different methods, respectively. From left to right are: the rainy data, results by different methods, and the ground truth.

Fig. 10 shows two adjacent frames of the results obtained on the video “wall”. There are many vertical line patterns in the background of this video; thus, showing two adjacent frames further helps to distinguish the rain streaks from the background. The zoomed-in red blocks show that the bright rain streak is not handled properly by DDN and MS-CSC. Our methods, DIP and FastDeRain, remove almost all the rain streaks and preserve the background better than the other three methods. Moreover, the rain streaks extracted by FastDeRain are smoother along the vertical direction than those obtained by our method without N. Since there is little texture or structure similar to rain streaks in the video “yard”, only one frame is shown in Fig. 11. DDN failed to separate most of the rain streaks, especially in the zoomed-in red blocks. Although TCL and MS-CSC separated the majority of the rain streaks, residual rain streaks can still be observed in the zoomed-in area. The deraining results obtained by DIP and FastDeRain are similarly clean, whereas our FastDeRain without N incorrectly extracted some background content into the rain streaks.

In Fig. 12, two adjacent frames of the rainy video “the Matrix” and the deraining results by different methods are shown. The two adjacent rainy frames reveal how rapidly the scene, and particularly the luminance, changes. For this video, DIP showed its limitation, leaving rain streaks in the deraining result. Once again, our FastDeRain obtained the best result, especially in removing the obvious rain streak on Neo’s face. FastDeRain also outperformed its variant without N in preserving Neo’s face.

Fig. 12. Rain streak removal performance of different methods obtained on the clips of movie “the Matrix”. From top to bottom, two adjacent frames of the rainy video/deraining results and corresponding extracted rain streaks are illustrated. From left to right are: the rainy data, results by different methods, and the ground truth.

Fig. 13. Rain streak removal performance of different methods obtained on the video “crossing”. From left to right are: the rainy data, deraining results or extracted rain streaks by different methods, and the ground truth.

The results on the rainy video “crossing” are shown in Fig. 13. The zoomed-in areas in the first row show that TCL and our FastDeRain (with and without N) obtained the cleanest backgrounds, while DDN, MS-CSC, and DIP left rain streaks in the background to varying degrees. The extracted rain streaks in the second row show that TCL extracted some of the curb-line structure into the rain streaks, while DDN tended to remove all textures with line patterns. The rain streaks extracted by the proposed FastDeRain were visually the best among all the results.

The scenarios in these four videos differ greatly, yet our method obtains the best results, both in removing rain streaks and in retaining spatial details. In addition, the running time of our method is clearly shorter than that of the other methods, especially the video-based methods.

TABLE IV

QUANTITATIVE COMPARISONS OF THE RAIN STREAK REMOVAL RESULTS OF [41], [30], [46], [47] AND THE PROPOSED METHOD WITH THE SHIFT STRATEGY WHEN RAIN STREAKS ARE FAR AWAY FROM BEING VERTICAL. THE BEST QUANTITATIVE VALUES ARE IN BOLDFACE

C. Oblique Rain Streaks

In this subsection, we examine the performance of our method with the shift strategy, and of the other four methods, when the rain streaks are far from vertical. We simulated two rainy videos: one by adding rain streaks with angles varying in [15°, 35°] to the video “waterfall” (captured by a dynamic camera), and the other by adding rain streaks with angles varying in [35°, 55°] to the video “highway” (captured by a static camera). As shown in Table IV and Fig. 14, the shift strategy helped our method obtain the best results when dealing with oblique rain streaks. The superiority of the proposed FastDeRain is obvious both quantitatively and visually.

Fig. 14. From top to bottom are the rain streak removal results, extracted rain streaks, and corresponding error images by different methods on the videos “highway1” (top 3 rows) and “highway2” (bottom 3 rows), respectively. From left to right are: the rainy data, results by TCL [41], DDN [31], MS-CSC [47], DIP [44], and FastDeRain with the shift strategy, and the ground truth.
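The shift strategy itself is defined earlier in the paper and is not restated in this subsection. Purely as an illustration, the Python/NumPy sketch below assumes a row-wise shear: row i of every frame is circularly shifted by round(i·tan θ) pixels, so that streaks at angle θ from the vertical become approximately vertical before a vertical-rain model is applied, and the inverse shift is applied afterwards. The function names, the circular boundary, and the angle convention are our assumptions, not the authors’ implementation.

```python
import numpy as np


def row_shifts(m, angle_deg):
    """Integer shift for each of the m rows for streaks at `angle_deg` from vertical."""
    return np.round(np.arange(m) * np.tan(np.deg2rad(angle_deg))).astype(int)


def shear_rows(video, angle_deg, inverse=False):
    """Circularly shift row i of every frame left by s_i (right when inverse=True).

    `video` has shape (m, n, ...): axis 0 is vertical, axis 1 horizontal, and any
    trailing axes (color, time) are shifted together with their row.
    """
    out = np.empty_like(video)
    for i, s in enumerate(row_shifts(video.shape[0], angle_deg)):
        out[i] = np.roll(video[i], s if inverse else -s, axis=0)
    return out
```

A hypothetical pipeline would then read `shear_rows(derain(shear_rows(rainy, theta)), theta, inverse=True)`, where `derain` stands for any method that assumes near-vertical streaks. Because a shear is not a rotation, a streak at angle φ ends up at roughly arctan(tan φ − tan θ) from the vertical rather than φ − θ.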

V. CONCLUSION

We have proposed a novel video rain streak removal approach, FastDeRain. The proposed method, based on directional gradient priors combined with sparsity, outperforms a series of state-of-the-art methods both visually and quantitatively. We attribute the superior performance of FastDeRain to our in-depth analysis of the characteristic priors of rainy videos, clean videos, and rain streaks. Besides, it is notable that our method is markedly faster than the compared methods, including even a very fast single-image-based method. Nevertheless, our method is not without limitations. Natural rainy scenes are sometimes mixed with haze, and how to handle residual rain artifacts remains an open problem. These issues will be addressed in future work.

ACKNOWLEDGMENT

The authors would like to express their sincere thanks to the editor and referees for their many valuable comments and suggestions for revising this paper. They would also like to thank Dr. Xueyang Fu and Dr. Minghan Li for generously sharing their code.

REFERENCES

[1] Y. Li, R. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2736–2744.

[2] Y.-F. Liu, D.-W. Jaw, S.-C. Huang, and J.-N. Hwang, “DesnowNet: Context-aware deep network for snow removal,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 3064–3073, Jun. 2018.

[3] W. Ren et al., “Deep video dehazing with semantic segmentation,” IEEE Trans. Image Process., to be published, doi:10.1109/TIP.2018.2876178.

[4] B. Li et al., “Benchmarking single-image dehazing and beyond,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 492–505, Jan. 2019.

[5] R. Liu, X. Fan, M. Hou, Z. Jiang, Z. Luo, and L. Zhang, “Learning aggregated transmission propagation networks for haze removal and beyond,” IEEE Trans. Neural Netw. Learn. Syst., to be published, doi:10.1109/TNNLS.2018.2862631.

[6] T. Bouwmans, “Traditional and recent approaches in background modeling for foreground detection: An overview,” Comput. Sci. Rev., vol. 11, pp. 31–66, May 2014.

[7] M. S. Shehata et al., “Video-based automatic incident detection for smart roads: the outdoor environmental challenges regarding false alarms,” IEEE Trans. Intell. Transp. Syst., vol. 9, no. 2, pp. 349–360, Jun. 2008.

[8] X. Zhang, C. Zhu, S. Wang, Y. Liu, and M. Ye, “A Bayesian approach to camouflaged moving object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 9, pp. 2001–2013, Sep. 2017.

[9] C. Ma, Z. Miao, X.-P. Zhang, and M. Li, “A saliency prior context model for real-time object tracking,” IEEE Trans. Multimedia, vol. 19, no. 11, pp. 2415–2424, Nov. 2017.

[10] K. Garg and S. K. Nayar, “Vision and rain,” Int. J. Comput. Vis., vol. 75, no. 1, pp. 3–27, 2007.

[11] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.

[12] L.-W. Kang, C.-W. Lin, and Y.-H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1742–1755, Apr. 2012.

[13] S.-H. Sun, S.-P. Fan, and Y.-C. F. Wang, “Exploiting image structural similarity for single image rain removal,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2014, pp. 4482–4486.

[14] Y.-L. Chen and C.-T. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 1968–1975.

[15] J. Chen and L.-P. Chau, “A rain pixel recovery algorithm for videos with highly dynamic scenes,” IEEE Trans. Image Process., vol. 23, no. 3, pp. 1097–1104, Mar. 2014.

[16] D.-Y. Chen, C.-C. Chen, and L.-W. Kang, “Visual depth guided color image rain streaks removal using sparse coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 8, pp. 1430–1455, Aug. 2014.

[17] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 3397–3405.

[18] C.-H. Son and X.-P. Zhang, “Rain removal via shrinkage of sparse codes and learned rain dictionary,” in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2016, pp. 1–6.

[19] H. Zhang and V. M. Patel, “Convolutional sparse and low-rank coding-based rain streak removal,” in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2017, pp. 1259–1267.

[20] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Single image rain streak decomposition using layer priors,” IEEE Trans. Image Process., vol. 26, no. 8, pp. 3874–3885, Aug. 2017.

[21] L. Zhu, C.-W. Fu, D. Lischinski, and P.-A. Heng, “Joint bi-layer optimization for single-image rain streak removal,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2545–2553.

[22] B.-H. Chen, S.-C. Huang, and S.-Y. Kuo, “Error-optimized sparse representation for single image rain removal,” IEEE Trans. Ind. Electron., vol. 64, no. 8, pp. 6573–6581, Aug. 2017.

[23] S. Gu, D. Meng, W. Zuo, and L. Zhang, “Joint convolutional analysis and synthesis sparse representation for single image layer separation,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1717–1725.

[24] Y. Chang, L. Yan, and S. Zhong, “Transformed low-rank model for line pattern noise removal,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Oct. 2017, pp. 1735–1743.

[25] L.-J. Deng, T.-Z. Huang, X.-L. Zhao, and T.-X. Jiang, “A directional global sparse model for single image rain removal,” Appl. Math. Model., vol. 59, pp. 662–679, Jul. 2018.

[26] S. Du, Y. Liu, M. Ye, Z. Xu, J. Li, and J. Liu, “Single image deraining via decorrelating the rain streaks and background scene in gradient domain,” Pattern Recognit., vol. 79, pp. 303–317, Jul. 2018.