Laplacian Pyramid Networks: A New Approach for Multispectral Pansharpening

(1)

Laplacian Pyramid Networks: A New Approach for Multispectral Pansharpening

Cheng Jin^a, Liang-Jian Deng^b,∗, Ting-Zhu Huang^b, Gemine Vivone^c

aSchool of Optoelectronic Science and Engineering, University of Electronic Science and Engineering of China, Chengdu, 611731, China

bSchool of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, 611731, China

cInstitute of Methodologies for Environmental Analysis, CNR-IMAA, 85050 Tito Scalo, Italy.

Abstract

Pansharpening is about fusing a high spatial resolution panchromatic image with a simultaneously acquired multispectral image with lower spatial resolution. In this paper, we propose a Laplacian pyramid pansharpening network architecture for accurately fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image, aiming at getting a higher spatial resolution multispectral image. The proposed architecture considers three aspects. First, we use the Laplacian pyramid method whose blur kernels are designed according to the sensors’ modulation transfer functions to separate the images into multiple scales for fully exploiting the crucial spatial information at different spatial scales. Second, we develop a fusion convolutional neural network (FCNN) for each scale, combining them to form the final multi-scale network architecture. Specifically, we use recursive layers for the FCNN to share parameters across and within pyramid levels, thus significantly reducing the network parameters.

Third, a total loss consisting of multiple across-scale loss functions is employed for training, yielding higher accuracy.

Extensive experimental results based on quantitative and qualitative assessments exploiting benchmarking datasets demonstrate that the proposed architecture outperforms state-of-the-art pansharpening methods.

Keywords: Laplacian Pyramid, Modulation Transfer Function, Convolutional Neural Network, Pansharpening, Image Fusion, Machine Learning, Remote Sensing.

1. Introduction

Thanks to a wide range of applications, such as medicine [1], sociology [2], ecology [3], and other fields, pansharpening has drawn loads of attention from the scientific community. This can be corroborated by the organization of the data fusion contest from the IEEE Geoscience and Remote Sensing Society in 2006 [4] and a huge recent literature about this topic [5, 6, 7]. Pansharpening is about merging a high resolution panchromatic (PAN) and a low resolution multispectral (LRMS) images, which can be easily obtained by several satellite sensors like WorldView-3, Quick- Bird, and GaoFen. As Fig. 1 illustrates, the goal of pansharpening is to yield a high resolution multispectral (HRMS) image. Besides, pansharpening algorithms have gained much interest from commercial en- terprises. For instance, famous software, such as ENVI

∗Corresponding author: Liang-Jian Deng

Email addresses:[email protected](Cheng Jin),[email protected](Liang-Jian Deng), [email protected](Ting-Zhu Huang),

[email protected](Gemine Vivone)

and ERDAS, utilizes fusion techniques to deal with image enhancement problems [8]. Moreover, pansharpening has been considered a preliminary step for several image processing tasks, e.g., change detection [9], which demonstrates the important role of pansharpening in practical applications.

Regarding pansharpening algorithms, we may cat- egorize them into four classes, i.e., component substitution (CS) approaches, multi-resolution analysis (MRA) methods, variational optimization-based (VO) approaches, and deep learning-based (DL) techniques [7]. In this work, we mainly focus on developing a new deep learning network to address the pansharpening problem.

The first two categories are CS and MRA methods which obtain the final pansharpened outcome from the perspective of details injection, see, e.g., [10, 12, 13, 14, 15]. The former approach is based on the substitution of a component of the spectral transformed multispectral (MS) image with the PAN image, see, e.g., the band- dependent spatial-detail with local parameter estimation (BDSD) [10], the robust band-dependent spatial-

(2)

Figure 1: First row: the schematic diagram of pansharpening, where MS∈R^m×n×b, PAN∈R^M×N×1and GT∈R^M×N×b, M =4m,N=4n,bis the number of bands of the MS image.

Second row: the pansharpened results by a state-of-the-art traditional method, BDSD [10], a deep learning approach, DMD- Net [11], and the proposed method. Third row: the residual maps between BDSD, DMDNet, the proposed approach, and the ground-truth (GT). It is clear that our method obtains a better residual map, thus showing a better visual quality.

detail (BDSD-PC) [16], and the PRACS approach [12].

The MRA class relies upon the injection of the spatial details of the PAN image into the upsampled LRMS image to obtain a high spatial resolution MS image, see, e.g., the smoothing filter-based intensity modulation (SFIM) [17], the generalized Laplacian pyramid (GLP) [18, 19, 20], the GLP with full-scale regression (GLP-Reg) [21], the deconvolution based methods [22, 23], and the bilinear filtering [24]. Hybrid CS/MRA approaches can also be found in the related literature [14, 25]. In general, the two types of methods could get quite fast computation and promising outcomes, but, sometimes, may lead to slight spatial and spectral distortions.

The third category, the so-called VO class, is based on the formulation of models with proper regularizers to address the pansharpening problem, then reconstructing the high-resolution outcomes by designing algorithms for the given models. The whole process can be seen from a mathematical point of view as the reconstruction of incomplete complementary observations of multi- channel data. In general, it mainly contains Bayesian-

, Sparse Reconstruction (SR)-, and Model-based Opti- mization (MBO) techniques, whose generally utilizes the regularization-based technique to address this chal- lenge, see, e.g., [26, 27, 28, 29, 30]. In [31], Ballester et al. proposed the P+XS method, which obtained the spectral information for the fused image under the as- sumption that the PAN image can be approximated as a linear combination of the high resolution multispectral bands. In [32], Yokoya et al. employed the coupled nonnegative matrix factorization (CNMF) unmixing to deal with the pansharpening problem, which could pro- duce high quality fusion results, both spectrally and spa- tially. In [33], Deng et al. proposed a reproducible kernel Hilbert space and Heaviside based framework to address the task of pansharpening. Recently, Wu et al. in [34] proposed a new DL-based VO scheme that could benefit from both traditional VO methods and the outcome of DL-based approaches, thus getting both competitive results and generalization ability. Especially, some recent hyperspectral and multispectral image fusion approaches can be also applied to the task of pansharpening, see e.g., [35, 36]. More recently, this field of pansharpening has evolved into more specific applications, i.e., in cloud-contaminated circumstances [37], where X. Meng et al. proposed a variational-based integrated pansharpening model specifying in cloud con- tamination scenarios. Although VO-based techniques can obtain competitive results, they are sometimes limited, e.g., the use of many hyperparameters, the insuf- ficient feature representation and extraction that could result in spatial and spectral distortions.

The fourth category is represented by the DL-based methods, which are mainly based on a feature extraction/representation phase, the training of network parameters, and the testing on real-world data. These methods are often far from a theoretical development and require enormous datasets to train the underlying parameters, but they have been widespread used (including but not limited to image super-resolution, pat- tern recognition, and image classification), often showing state-of-the-art performance. In particular, many satisfactory results have been obtained by deep learning networks for pansharpening [38, 39, 40] [41, 42]. In [43] , Masi et al. employed first a convolutional neural network (CNN) with three layers to deal with the task of multispectral image pansharpening, obtaining excellent pansharpened results comparing with traditional state- of-the-art approaches. In [44], Fu et al. developed a deep CNN architecture with a high-pass filtering technique to fuse the PAN and the LRMS images, which obtains state-of-the-art pansharpening outcomes. Besides, He et al. [45] followed physical concepts to design a

(3)

CNN based method for pansharpening relied upon the spatial detail injection framework. In [11], Fu et al.

elaborated their high-pass filtering network design with a grouped multiscale network structure. In [46], Deng et al. proposed two network architectures, named CS- Net and MRA-Net, respectively, according to the traditional CS and MRA fusion equations. To limit the drawbacks of the two networks, they further proposed a simple but effective fusion network, called FusionNet, to yield state-of-the-art pansharpening outcomes. More recently, Generative Adversarial Networks (GAN) and unsupervised training have teemed with the pansharpening field thanks to their novelty and powerfulness.

Literature like [47, 48] demonstrates the promising feature of these method. For improving the effectiveness of this type of methods, the solutions mainly focus on two broad aspects. One is to make more generalized datasets for network training, e.g. [49]. The other mainly fo- cuses on the amelioration of the network architecture.

Although these methods have achieved excellent performance, there is still room for improvement by con- sidering the following points. First of all, the multi- scale property plays an important role in resolution enhancement applications, see, e.g., the image super- resolution issue [50]. It is important to design more effective multi-scale network architectures based on widely used multi-scale structures, see, e.g., the state- of-the-art Laplacian pyramids. Second, for the specific pansharpening problem, the Modulation Transfer Func- tion (MTF) is a reasonable way to describe the physical procedure of image capture, thus its consideration in the datasets simulation is expected to get better performance.

Motivated by the above-mentioned points, we propose, in this work, a new network architecture that is able to deal with these aspects. Hence, a novel Lapla- cian pyramid pansharpening network (LPPN) architecture is considered. The given network accounts for both the image multi-scale information and the sensors’ MTFs. To exploit the multi-scale information, we use the Laplacian pyramid to decompose the original PAN image and the upsampled LRMS image in several scales, then designing the corresponding sub-networks for each image scale and incorporating them into a fusion convolutional neural network (FCNN). Due to the use of multi-scale sub-networks, the final loss function is represented by a combination of multipleℓ₂loss functions. Moreover, we also introduce recursive blocks into our sub-networks, aiming to reduce the number of network parameters. Additionally, unlike the classical Laplacian pyramid that uses a fixed Gaussian kernel for all the image channels, our approach employs the spe-

cific sensors’ MTFs, which vary along the spectral dimension (i.e., different Gaussian kernels for different spectral bands). Extensive experiments on reduced and full resolution datasets demonstrate that the proposed method gets the best quantitative and qualitative performance compared with state-of-the-art pansharpening approaches.

The contributions of this paper can be summarized as follows:

• To the best of our knowledge, this is the first work merging both the recursive sub-network and Lapla- cian pyramids to address the pansharpening task.

Unlike the classical Laplacian pyramid using a fixed Gaussian kernel for all the spectral channels, the Laplacian pyramid used in our work takes ad- vantage of the specific sensors’ MTFs for image scale decomposition, which can yield significant improvements.

• The exploitation of Laplacian pyramids allow us to develop multi-scale structured sub-networks improving the capability of managing different spatial details at the feature extraction stage. Besides, we also build multiple loss functions to describe the information loss for each scale, which can aid the image details recovery at different scales.

• The recursive block is used within each sub- network to effectively decrease the number of network parameters. Furthermore, its use can increase the depth of the sub-networks. Thus, the proposed approach can be seen as a lightweight network for pansharpening. For example, our method only in- volves about 50,000 network parameters to achieve the state-of-the-art performance on WorldView-3 datasets, which is significantly far away from the compared approaches.

The remaining of the paper is as follows. Firstly, the related works and motivations under the development of the proposed method are reported in Sect. 2. Sect. 3 is devoted to the presentation of the proposed method, including the network architecture, the loss function, the training details, and so forth. Sect. 4 shows the quantitative and qualitative outcomes. Finally, conclusions are drawn in Sect. 5.

2. Related Works and Motivations

Laplacian pyramid decomposition (LPD) and its multi-scale structure, which are suitable for image resolution enhancement tasks. Recently, some pansharpening algorithms are proposed from the perspective of

(4)

constructing multi-scale structures, see, e.g., [51], obtaining promising outcomes. For instance, Yuan et al.

[52] formulated a novel pansharpening approach by em- ploying the multi-scale convolutional kernels, which could fully widen the network and effectively extract the image features on different scales. In [53], although the authors mentioned the words “Laplacian pyramid”

in their work, they did not really exploit the multi-scale LPD for the task of pansharpening, which limited the final pansharpening performance. Different from the introduced multi-scale methods, here we accordingly re- sort to the classical LPD with some special operations in pansharpening, e.g., LPD with MTF, to formulate an effective and lightweight deep CNN with a multi-scale structure, aiming to yield state-of-the-art pansharpening results. In this section, we will introduce first the Gaussian pyramid decomposition (GPD) and the related LPD. Afterwards, we will point out the motivations under the development of the proposed method.

2.1. GPD and LPD

Proposed by H. Olkkonen et al. [54], the GPD is about the reconstruction of the image by the Gaussian expand wavelet transform for multi-resolution analysis of images. It aims to use Gaussian kernels to create a series of images at different scales. The general GPD equation is as follows:

G₁(I)=I,

G_i(I)=(k⊗G_i−1(I))↓²,i=2,· · ·,S, (1) where G_i stands for the i-th layer of the Gaussian pyramid, Iis the original image,⊗denotes the convolution operation,↓² indicates the downsampling with a scale factor of 2,krepresents the Gaussian kernel, andS is the total number of layers. It is worth to be remarked thatkis fixed for each channel of an MS image.

The LPD is a bandpass image decomposition derived from the GPD. It is originally proposed by Burt and Adelson [55] before multiresolution wavelet analysis was introduced. More in detail, the LPD technique is a multi-resolution image representation obtained through a recursive reduction of the set of data.

The general LPD processing equation is as follows:

L_S(I)=G_S(I),

L_i(I)=G_i(I)−G_i+1(I)↑²,i=S −1,· · ·,1, (2) whereL_istands for thei-th layer of the LPD,G_iis the outcome of thei-th layer of the GPD, and↑²indicates the upsampling with a scale factor of 2. Please, refer to Fig. 2 for a better understanding.

2.2. Motivations

Since conventional deep learning networks do not often exploit multi-scale information, the proposed work is motivated by the introduction of this kind of information. In particular, the pyramid structure achieves a multi-scale representation in image processing to por- tray global and local information in a better way. This property inspired us to utilize the multi-scale spatial representation structure of the LPD to design an end-to-end deep neural network structure capable of progressively restoring more image information to get a better performance.

It is important to note that conventional GPD methods mainly utilize a fixed Gaussian kernel for each channel of an MS image (e.g., an RGB image) to proceed with the convolution operation. However, this approach fails to meet the physics principles of pansharpening since the MS sensors’ point spread functions, which model the spatial responses of each spectral channel (also called band), are not the same along the spectral dimension. In other words, the Gaussian kernels related to the spectral bands of an MS image should be different from each other. They are about a special function, i.e., the MTF. Thus, we exploit the MTF to generate the specific kernels for every band of the MS image in order to implement the GPD. For more details, the interesting readers can refer to Sect. 3.1.1.

3. The Proposed Method

In this section, we describe the design of the proposed LPPN. The whole procedure is depicted in Fig. 3. The architecture of the network, the adopted loss function, and the training details are introduced in the remaining of this section. For simplicity, we denote the PAN image as Pand the MS image asM.

3.1. Network Architecture

This section is devoted to the detailed description of the proposed network architecture.

3.1.1. GPD with MTF

The Modulation Transfer Function (MTF) is used to model the magnitude response of the optical system at different spatial frequencies. In pansharpening, it has widespread used to design filters for image degradation. The sensors’ MTFs are usually band-dependent, thus generating filters that have different Gaussian kernels for each spectral band. In this work, our goal is to address a resolution enhancement issue involving natural images, thus motivating us to implement Laplacian

(5)

Figure 2: Schematic diagram of the Gaussian pyramid decomposition (GPD) and the Laplacian pyramid decomposition (LPD) for S=5.

pyramids using the MTFa prioriinformation. For more details, the interesting readers can refer to [6, 56].

3.1.2. LPD with MTF

Multi-scale information is essential for image resolution enhancement applications. Thus, we want to utilize the LPD, a well-established technique for producing multi-scale images, to recover more image details. In what follows, we will introduce our MTF LPD, which performs first the MTF GPD into several scales of the original PAN image, the upsampled LRMS image, and the ground-truth (GT) multispectral image, thus getting the corresponding LPD components using (2).

To match the spatial resolution, we employ the trans- posed convolution to upsampleM with a factor 4 (denoted asM) to the PAN,e P, scale. Then, we separately decomposePandMe into theirS-layer pyramids, which is processed by the following procedure:

a) We generate convolution kernels according to the MTF information. P and Me have corresponding MTF functions (readers can find the kernels by having a look at the codes MTF-PAN and MTF, which are available in the pansharpening toolbox¹). In particular, the MTF kernel for P is denoted as k_P. Regarding to theb-th band of M, we separately indicate each band’s kernel ase k₁,k₂,· · ·,k_b, respectively.

b) Compute the GPD with MTF for P andM. Lete G_i(P), G_i(M) denote the MTF-decomposed PANe

1http://openremotesensing.net/knowledgebase/

a-critical-comparison-among-pansharpening-algorithms/

and upsampled LRMS images of thei-th Gaussian pyramid layer (i=1,2,· · ·,S), and setG₁(P)=P, G₁(M)e =M. Afterwards, we respectively computee the output image at each GPD layer for P

G₁(P)=P,

G_i(P)=(k_P⊗G_i−1(P))↓²,i=2,· · ·,S, (3) and forMe

G₁(M)e =M,e

G_i(Mej)=(k_j⊗G_i−1(Mej))↓²,i=2,· · ·,S. (4) where ⊗ denotes the convolution operation, ↓² stands for the downsampling operation with a scale factor of 2, and j=1,2,· · ·,bis the band index.

c) Compute LPD with MTF forPandMebased on the obtained GPD in (3) and (4), i.e.

L_S(P)=G_S(P),

L_i(P)=G_i(P)−G_i+1(P)↑²,i=S −1,· · ·,1, (5) and

L_S(M)e =G_S(M)e ,

L_i(Me_j)=G_i(Me_j)−G_i+1(Me_j)↑²,i=S −1,· · ·,1, (6) where↑²stands for the upsampling operation with a scale factor of 2, and j =1,2,· · ·,bis the band index.

More details about Laplacian pyramids can be found in [55].

(6)

Figure 3: The network architecture of the proposed Laplacian pyramid pansharpening network (LPPN) for a resolution ratio between PAN and MS equal to 4.

In summary, the whole procedure of generating of our LPD with MTF forPandMecan be found in (5) and (6).

Please, refer also to Fig. 4 for the LPD and the GPD for PandM.e

3.1.3. Fusion Convolutional Neural Network (FCNN) After getting the multi-scale LPD components of P andM, we consider them into the designed fusion con-e volutional neural network (FCNN, denoted asNe t⁽ⁱ⁾(θ) in Fig. 3) for each scale to extract features. Thus, our final architecture is the combination of multiple sub- networks.

More in detail, we concatenate first the L_i(P) and L_i(M) at thee i-th layer of LPD, after taking them into account at the corresponding sub-networkNe t⁽ⁱ⁾(θ) and producing the output. Furthermore, the output is op- erated by the ReLU activation function and upsampled

exploiting a factor of 2. Finally, it is transmitted to the output of the next layer.

For each sub-networkNe t⁽ⁱ⁾(θ) (i.e., each FCNN), we mainly employ the residual learning [57] and recursive blocks [58] for enhancing the performance of the network, which separately play the role of improving accuracy and reducing the number of parameters. Please, see Fig. 5 for the structure of the FCNN. More details on theNe t⁽ⁱ⁾(θ) sub-network are pointed out as follows.

a) Feature Extraction: The goal of this step is to ac- quire the feature maps of the corresponding inputs.

For higher layers, we take fewer kernels. In this work, we apply 2^S⁻ⁱ⁺¹kernels with a 3×3 size for convolution operations to the concatenated input.

b) Recursive Blocks:To achieve the parameter reduction and improve the accuracy, we employ several

(7)

Figure 4: Detailed illustration of the generation of the i-th Laplacian pyramid layer of Me and P (exploiting an exemplary WorldView-3 data withS = 5 and resolution ratio between the products to be fused equal to 4). G_i represents thei-th layer of the Gaussian pyramid. Note that we do not use the classical Gaussian pyramid, rather we use MTF-based filters to generate Laplacian pyramid layers.

recursive blocks. As the gradient vanishing and exploding in training of deep models, the shared- source skip-connection residual learning is incor- porated into our sub-network, whose accuracy is proven in [50]. The goal of our recursive block setting is to share the network parameters across each recursive block and between each convolutional layer. Therefore, a single set of parameters is capable of building multi-layer sub-networks.

c) Output Reconstruction:After recursive blocks, the FCNN output for each layer is integrated to reconstruct the multiple outputs, which lead to multiple across-scale loss functions for training. In the testing phase, we get multiple outputs and the sole output at the first layer, i.e.,Out⁽¹⁾(θ), represents the outcome of the proposed approach.

Fig. 3 sums up the whole proposed architecture. It mainly includes several sub-networks to progressively fuse and enhance the spatial details. Please, see Fig.

6 to get more insights about how the proposed network architecture works and its effectiveness on an exemplary real-world test case.

3.2. Loss Function

Due to the usage of the LPD, we will have multi-scale output images. For each image, we build one FCNN to train the network parameters. Therefore, the final loss function is the combination of the different loss functions. The final loss function is defined as follows:

L(θ⁽ⁱ⁾)= XS

i=1

kOut⁽ⁱ⁾(θ)−G_i(GT)k²_F, (7) whereOut⁽ⁱ⁾(θ) is obtained byF_θ(i)

L_i(P),L_i(M)e and its subsequent operations (please see Fig. 3 for more details). In particular,iindicates thei-th layer of LPD and θstands for the network parameters. Besides,G_i(GT) is thei-th layer of the GT image obtained via the GPD and k · kF is the Frobenius norm. In this work, we empirically setS equal to 5. It is worth to be pointed out that

(8)

Figure 5: The flowchart of the fusion convolution neural network (FCNN). In which, the output channels are set as 2^S⁻ⁱ⁺¹, where Srepresents the total number of LPD layers, and theiis the corresponding LPD layer number.

Figure 6: The outputs of all theS layers of our LPPN, i.e., Out⁽ⁱ⁾(θ),i=1,2,· · ·,S (from right to left) withS =5 on an exemplary real-world test case.

in order to have a robust layer-by-layer injection procedure, the use of multiple losses to constrain the distance between the output image of a layer and the corresponding target (GT) image is advisable and commonly used in the related literature. If the loss function is only about the layer 1 (i.e., the final layer), the control of each layer of the network could be not easy, making the training a hard task. Moreover, although all the loss functions in (7) have equal weights, the more the training time the higher the impact of the first layer on the final solution.

3.3. Generation of Training Data

For the generation of training samples, we simu- lated three datasets acquired from the WorldView-3, the QuickBird, and the Gaofen-2 sensors. The simulation way for the three datasets is the same. We consider here the WorldView-3 datasets as an exemplary case. The WorldView-3 satellite datasets can be freely downloaded². The same way as in [46] is exploited

2https://www.maxar.com/product-samples/

to simulate the training/validation/testing datasets getting 12,580 PAN/LRMS/GT patch pairs of size 64×64, 16 × 16 × 8, and 64 × 64 × 8, respectively. Af- ter that, these datasets are divided into the 70/20/10%

for training (8,806 examples)/validation (2,516 examples)/testing (1,258 examples). We simulate the LRMS, the PAN, and the GT images according to Wald’s protocol [59] due to the unavailability of the GT image.

The upsampled LRMS image is obtained via a polyno- mial kernel with 23 coefficients [18], called EXP from hereon. Please, find more details on the implementation of Wald’s protocol in [46].

4. Experimental Results

In this section, we compare the proposed LPPN method with some recent state-of-the-art pansharpening approaches belonging to CS, MRA, VO, and DL categories. The employed sensors, the benchmarking methods, and the adopted quality indexes are described first.

Afterwards, the experimental analysis both at reduced and full resolutions is reported.

4.1. datasets

Several datasets have been acquired by the WorldView-3 sensor, which simultaneously cap- tures a high resolution PAN channel and eight MS bands. Four standard colors (red, green, blue, and near-infrared 1) and four new bands (coastal, yellow, red edge, and near-infrared 2) are acquired. The images are distributed with a pixel size of 0.3m and 1.2m for PAN and MS, respectively. The spatial resolution ratio is equal to 4. The radiometric resolution is 11 bits.

Different from WorldView-3, the images obtained by QuickBird and GaoFen-2 sensors consist of four MS

(9)

Table 1: Details of experimental datasets

Satellite Spatial Spectral Radiometric Themiatic Original data

Sensors Dimension Dimension Resolution Scenes Volume

WorldView-3 PAN 0.3m One band

11 bits 6

MS 1.2m Eight bands mixed

QuickBird PAN 0.6m One band

11 bits (urban, vegetation,

MS 2.4m Four bands water scenario) 1

GaoFen-2 PAN 1m One band

10 bits 1

MS 4m Four bands

bands and one PAN channel. The spatial resolution ratio is equal to 4, again. In particular, QuickBird and GaoFen-2 data have a spatial resolution of about 0.6m and 1m for the PAN channel, respectively. Moreover, they have a radiometric resolution of 11 bits and 10 bits, respectively. Please refer to Tab. 1 for the summary of the experimental datasets.

4.2. Benchmark

The benchmark consists of one representative CS based method (i.e., BDSD [10]), one representative MRA based methods (i.e., MTF GLP CBD [56], denoted as GLP CBD from hereon for saving space in the tables), two regularization-based (VO) methods (i.e., CNMF [32] and CVPR19 [60]), and three state-of-the- art DL methods (i.e., PanNet[44], DiCNN [45], and DMDNet [11])³.

For fair comparison, all the compared DL-based methods (i.e., LPPN, DiCNN, PanNet, and DMDNet) are trained on the same training data using Python 3.7.4 with Tensorflow 1.14.0 on a desktop computer equipped with a Linux operating system and a GPU NVIDIA GeForce GTX 2080Ti with 11GB.

4.3. Quality Assessment Indexes

For quantitative evaluation, we adopt the spectral an- gle mapper (SAM) [61] to evaluate the spectral quality, the erreur relative globale adimensionnelle de synth`ese (ERGAS) index [62] as an extension of the root mean square error for multidimensional arrays, the spatial cor- relation coefficient (SCC) [63] in order to assess the

3The source codes of BDSD, GLP CBD and CNMF can be downloaded at the website http://openremotesensing.net/kb/codes/

pansharpening/. Additionally, the source codes of CVPR19 and Pan- Net can be downloaded at the website https://xueyangfu.github.io/.

Instead, the source codes of DiCNN and DMDNet are not available online, thus we re-implemented them by ourselves using the default parameters indicated in the related papers to ensure their best performance.

spatial quality, and two universal image quality indexes [64], Qavg (an average version of the Q index along the spectral bands) and Q2ⁿ (Q4 and Q8 for four and eight bands datasets, respectively) representing the multidimensional extension of the Q index [65, 66]. These indexes can be used when a reference (GT) image is available (i.e., at reduced resolution). Instead, when we need to assess the performance at full resolution, quality without reference indexes should be used [67, 68, 69].

In this paper, the quality with no reference (QNR) [67]

index is exploited. It is obtained by the combination of the spatial distortion index,DS, and the spectral distortion index, Dλ. The ideal values for SCC, QNR, Qave, and Q4/Q8 are 1. Instead, for ERGAS, SAM,Dλ, and DS are 0. Furthermore, we exhibit the average running time for each fusion method, denoted as A.T. in seconds in the results tables.

4.4. Parameters Tuning

In our LPPN network, we empirically employ the Adam [70] approach with a learning rate equal to 0.003 in order to minimize the loss function in (7). The number of iterations for the training step is 1×10⁵and the batch size is 32. Additionally, we set the kernel size of all filters as 3×3. In particular, for the setting of the other compared methods, we use the default asset pointing out in the related papers or source codes. In the DiCNN case, the batch size is set to 64 and the number of iterations is 3×10⁵. Instead, for PanNet and DMD- Net, the batch size is set to 32 and the number of iterations is 255,000. Under the use of the above-mentioned settings, the three methods can achieve their best performance.

4.5. Reduced Resolution Assessment

In this section, we assess the qualitative and quantitative performance of the compared methods on the reduced resolution datasets. The process for simulating the testing data is the same as that of the training data

(10)

Figure 7: The visual comparisons on a reduced resolution WorldView-3 case (depicted bands: 1, 3 and 5). First two rows: The fusion results by means of BDSD, CNMF, GLP CBD, CVPR19, DiCNN, PanNet, DMDNet, and Proposed LPPN. Third and fourth rows: The corresponding residual maps using the GT image as reference. To aid the visual inspection, we display the residual maps obtained by the analysis of the third spectral band.

Table 2: Quality assessment at reduced resolution on 1258 WorldView-3 test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.

Method SAM ERGAS SCC Q8 Qave A.T.

EXP 5.85±1.99 7.04±2.93 0.660±0.106 0.627±0.142 0.640±0.157 0.016 BDSD 6.90±2.73 5.15±2.27 0.878±0.075 0.817±0.118 0.823±0.119 0.019 CNMF 5.53±1.88 4.62±1.93 0.888±0.068 0.822±0.123 0.825±0.125 0.035 GLP CBD 5.29±1.96 4.16±1.77 0.890±0.070 0.854±0.114 0.849±0.123 0.033 CVPR19 5.21±1.87 5.14±2.12 0.867±0.604 0.793±0.123 0.788±0.130 1.731 DiCNN 4.25±1.35 3.05±1.06 0.945±0.047 0.893±0.118 0.908±0.115 0.838 PanNet 4.10±1.30 2.96±1.00 0.949±0.046 0.896±0.116 0.910±0.116 0.863 DMDNet 3.97±1.25 2.86±0.97 0.953±0.045 0.900±0.114 0.912±0.115 0.951 LPPN 3.90±1.29 2.64±0.96 0.955±0.045 0.913±0.111 0.913±0.114 0.977

(see Sect. 3.3 for details). More in detail, the spatial size of the testing PAN/LRMS/GT patch is 64×64, 16×16, and 64×64 for WorldView-3 datasets in Tab. 2. For Figs. 7-9, the testing images for WorldView-3, Quick- Bird and GaoFen-2 cases are with the spatial size of 256×256, 64×64, and 256×256, respectively. We tested the compared approaches on a large dataset consisting of 1258 test cases extracted from WorldView-3 data,

Table 3: Quality assessment at reduced resolution on 25 QuickBird test cases . The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.

and Tab. 2 exhibits the average performance and the corresponding standard deviations for all the compared methods. From Tab. 2, it can be readily got that our method obtains high performance for all the quality indexes. We depict a typical result in Fig. 7. It can be seen that BDSD and CNMF show greater spectral distortion.

GLP CBD and CVPR19 have various blurring defects, especially visible in the building area at the lower right

(11)

Figure 8: The visual comparisons on a reduced resolution QuickBird case (depicted bands: 1, 2 and 3). First two rows: The fusion results by means of BDSD, CNMF, GLP CBD, CVPR19, DiCNN, PanNet, DMDNet, and Proposed LPPN. Third and fourth rows:

The corresponding residual maps using the GT image as reference. To aid the visual inspection, we display the residual maps obtained by the analysis of the fourth spectral band.

Table 4: Quality assessment at reduced resolution on 25 GaoFen-2 test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.

corner. Furthermore, the other three DL-based methods, i.e., PanNet, DMDNet and DiCNN, exhibit competitive visual performance. However, they still fail to outperform our method. To highlight the differences, we depicted several magnified sub-regions among the compared methods. It is clear that our method obtains the best visual performance, closer to the GT image in spatial aspects, including sharper edges and clearer ob-

jects. This can be observed in the lower right area of the sub-regions.(e.g., see the orange roofs of the close-ups in Fig. 7). Moreover, we compute the residual maps between the comparison candidates and the ground-truth (GT) image. It is clear that the fusion image of the proposed approach is closer to GT image, getting a residual map close to zero almost everywhere.

In order to corroborate the results obtained on the WorldView-3 test cases, we assessed the performance of the compared approaches on data acquired by the QuickBird sensor (Indianapolis datasets) and GaoFen- 2 sensor (Beijing and Guangzhou datasets⁴). For the best performance, we adjusted its output channels in the head of each FCNN to 2^S⁻ⁱ⁺² for both datasets.

The deep learning-based methods are properly trained on training data acquired by these sensors. Similar to WorldView-3 testing cases, we also compare the approaches on larger datasets of 256×256 spatial size consisting of 25 and 25 test cases extracted from Quick- Bird and GaoFen-2 datasets, respectively. Tabs. 3 and

4Datasets from: http://www.rscloudmart.com/dataProduct/sample

(12)

Figure 9: The visual comparisons on a GaoFen-2 case (depicted bands: 1, 2 and 3). First two rows: The fusion results by means of BDSD, CNMF, GLP CBD, CVPR19, DiCNN, PanNet, DMDNet, and Proposed LPPN. Third and fourth rows: The corresponding residual maps using the GT image as reference. To aid the visual inspection, we display the residual maps obtained by the analysis of the fourth spectral band.

4 exhibit the average performance and the corresponding standard deviations for all the compared methods, and Figs. 8 and 9 show the typical corresponding performance of the compared methods. In the scenario of the QuickBird sensor, BDSD and CNMF demonstrate acceptable visual outcomes. However, GLP CBD and CVPR19 still exhibit blurring defects on the white roof- like object, thus decreasing the spatial performance.

For DL-based methods, the outcomes are still competitive. With the aid of the residual maps, we can see our method outperforms the other DL-based compared methods for its close degree of proximity to the GT image. In the GaoFen-2 test case, BDSD and CNMF show acceptable visual results, again. GLP CBD and CVPR19 still suffer from spatial distortion. However, in this case, the CNMF method demonstrates an evident spectral distortion. For DL-based methods, we can ob- serve that our method yields better outcomes with the aid of the residual maps. In both the figures, the proposed LPPN clearly shows its spatial advantages thanks to lower image residuals (see the close-ups in the related figures), thus very high performance of the proposed LPPN method can be easily observed.

Table 5: Quality assessment at full resolution on 200 WorldView-3 test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.

Method QNR Dλ DS A.T.

EXP 0.913±0.031 0.000±0.000 0.086±0.030 0.018 BDSD 0.893±0.032 0.033±0.013 0.077±0.027 0.035 CNMF 0.896±0.072 0.040±0.037 0.067±0.047 0.416 GLP CBD 0.920±0.050 0.028±0.024 0.055±0.032 0.079 CVPR19 0.932±0.023 0.012±0.006 0.057±0.019 8.964 DiCNN 0.953±0.036 0.018±0.020 0.030±0.020 0.861 PanNet 0.961±0.021 0.019±0.009 0.020±0.012 0.886 DMDNet 0.960±0.020 0.019±0.010 0.021±0.012 1.038 LPPN 0.963±0.023 0.018±0.012 1.089±0.013 1.082

4.6. Full Resolution Assessment

We also compare our LPPN approach with recent state-of-the-art pansharpening approaches on full resolution WorldView-3, QuickBird and GaoFen-2 data, whose PAN/LRMS patch is of spatial size 256×256 and 64×64, respectively. Due to the lack of a reference (GT) image, the QNR index is used instead of the quality indexes in Sect. 4.5. Tabs. 5-7 report the outcomes re-

(13)

Figure 10: The visual comparisons on a full resolution WorldView-3 case (depicted bands: 1, 3, and 5).

Table 6: Quality assessment at full resolution on 20 QuickBird test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.

spectively on 200 real WorldView-3, 20 real QuickBird and 20 GaoFen-2 examples, synthesizing them using the mean and the standard deviation operators. It is worth to be pointed out that the proposed method gets the best results related to the overall quality index at full resolution, QNR. The other two indexes, i.e., D_λ andD_S, are also close to the approaches that get the best performance. Moreover, we also show the visual comparison on a full resolution WorldView-3 datasets in Fig. 10, in which the LPPN approach yields more image details and sharper image edges than the compared methods.

4.7. Discussions

In this section, we will deeply discuss about the proposed architecture (i.e., the number of Laplacian layers, the effect of recursive blocks, the use of kernels, and loss functions) after comparing the convergence of different fusion methods. Note that for the discussion of this part, we take a WorldView-3 datasets (spatial size 256×256) as the test example for the sake of brevity.

Table 7: Quality assessment at full resolution on 20 GaoFen test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.

Figure 11: Convergence of the proposed network.

4.7.1. Convergence Analysis

We utilize the mean squared error (MSE) loss on WorldView-3’s 8,806 training data to compare the dif-

(14)

Figure 12: Altered ResNet block structure of the FCNN.

Table 8: Performance assessment by varying the layer number of the proposed Laplacian pyramid-based approach.

Layers SAM ERGAS SCC Q8 Qave

1 4.81 3.29 0.9584 0.9467 0.9480

2 4.54 3.18 0.9624 0.9496 0.9515

3 4.41 3.12 0.9627 0.9507 0.9520

4 4.40 3.11 0.9640 0.9509 0.9522

5 4.30 3.10 0.9657 0.9518 0.9536

6 4.52 3.16 0.9621 0.9506 0.9519

Table 9: Performance assessment by varying the FCNN structure.

Methods SAM ERGAS SCC Q8 Qave

Without recursive 4.83 3.54 0.9442 0.942 0.943 Fixed Gaussian kernel 4.66 3.21 0.9616 0.947 0.950

ℓ₁loss 4.51 3.18 0.9617 0.948 0.949

Proposed 4.30 3.10 0.9657 0.952 0.954

ferent network convergence among all the compared DL-based method, i.e., DiCNN, PanNet, DMDNet and our LPPN architecture. As shown in Fig. 11, our network has demonstrated a lower training error with less iterations.

4.7.2. The Influence of the Laplacian Pyramid Layer Number

A key parameter of Laplacian pyramids is the layer number. Since the maximum spatial size of our training images is 64×64,S can only vary from 1 to 6. Using the datasets depicted in Fig. 7, the results of our method by varying the layer number are reported in Tab. 8. It is easy to show that the best performance can be obtained by the 5 layers configuration, i.e. the one adopted in this paper.

4.7.3. FCNN Structure Discussion

After determining the network levels, we look at the sub-network level, that is the FCNN. We studied several configurations of the FCNN. In particular, we have:

a) With recursive Vs. without recursive blocks: In order to investigate the effects of the recursive blocks, we substituted the recursive blocks with ResNet [57] blocks in each FCNN, as depicted in Fig 12. Having a look at the first and fourth rows in Tab. 9, it is straightforward that the proposed architecture with recursive blocks generates better results than that of the one without recursive blocks.

Moreover, the amount of parameters related to the compared architectures is 174086 (without recursive) against 50706 (with recursive), and the average usage of GPU RAM is 1485 MB (without recursive) against 1453 MB (with recursive), consid- ering the test case depicted in Fig. 7, thus demon- strating the reduction of the number of parameters under the use of recursive blocks.

b) Fixed Gaussian Vs. MTF kernels: To verify our statement in Sect. 3.1.1, we change the shape of the filters by using classical fixed Gaussian kernel instead of MTF kernels to deal with the generation of the Laplacian pyramid. Having a look at the second and fourth rows Tab. 9, it is clear that the proposed architecture with MTF-based kernels used for convolution obtains the best quantitative performance.

c) ℓ₁Vs. ℓ₂loss functions: We also consider the performance of our network architecture withℓ₁ and ℓ₂ loss functions. The results shown in the third and fourth rows demonstrate that the proposedℓ₂ loss function is advisable.

4.7.4. Generalization Ability Assessment

We also evaluate the generation ability of our network architecture among the aforementioned fusion methods

(15)

Table 10: Comparison of the number of parameters (NoPs), total training time in seconds for all the CNN-based methods.

DiCNN PanNet DMDNet LPPN NoPs 1.5×10⁵ 2.5×10⁵ 3.1×10⁵ 0.5×10⁵ Time 3.7×10³ 4.8×10³ 1.2×10⁴ 7.4×10³

Table 11: Generalization ability assessment on the WorldView-2 dataset.

Method SAM ERGAS SCC Q4 Qave

EXP 8.24 8.88 0.4958 0.6468 0.6667

BDSD 8.42 6.30 0.7989 0.8400 0.8479

CNMF 7.41 6.29 0.8453 0.8298 0.8365

GLP CBD 7.77 6.31 0.8046 0.8370 0.8390

CVPR19 7.30 6.89 0.8144 0.7882 0.7976

DiCNN 6.80 5.54 0.8680 0.8493 0.8658

PanNet 6.52 5.40 0.8588 0.8643 0.8785

DMDNet 6.37 5.22 0.8648 0.8707 0.8824

LPPN 6.50 5.38 0.8742 0.8641 0.8725

by utilizing WorldView-2 datasets, whose number of spectral bands are equal to the WorldView-3 ones. The datasets can be freely downloaded from the same link as for the WorldView-3 datasets. More in detail, the testing WorldView-2 datasets have the PAN image of spatial size of 256×256 with a 4 PAN/MS spatial ratio. The outcomes are reported in Tab. 11. From the table, we can see that albeit the performance of our method does not outstrip the DMDNet, they are very close to the best.

4.7.5. Number of Parameters and Total Training Time Assessment

Lastly, we compare the number of parameters and total training time of the four DL-based algorithms. As a consequence of the use of the recursive structure in each FCNN, our complexity significantly outperforms the one of the other methods. Tab. 10 shows the comparison of the number of parameters (NoPs) for all the CNN methods on the WorldView-3 datasets. From this table, it is clear that the proposed LPPN method has only 0.5×10⁵ parameters, which is significantly less than the NoPs of DiCNN (1.8×10⁵), PanNet (2.5×10⁵), and DMDNet (3.2×10⁵). To the best of our knowledge, the proposed LPPN is one of the best lightweight network for pansharpening showing a small NoPs and also getting state-of-the-art performance. As for total training time, DiCNN method yields the shortest. We be-

lieved the reason for this is the relatively simple network structure, which only contains 3 convolutional blocks and a skip-connection composition.

5. Conclusions

In this paper, we proposed an efficient deep pyramid network architecture for pansharpening. The network architecture consists of three parts. The first one is the decomposition of the input image set into Lapla- cian pyramids using MTF-based kernels. Afterwards, these pyramids are fused by the fusion convolutional neural network. Finally, we reconstruct the multi-layer outputs comparing them with the reference (GT) data exploiting an ℓ₂ loss function in order to train the network. A broad experimental analysis demonstrates that the proposed approach outperforms the compared state- of-the-art pansharpening methods. Furthermore, some discussions about the network convergence, the number of Laplacian pyramid layers, the influence of the loss function, the use of recursive blocks, the generalization ability and so forth, are provided to the readers. Finally, an analysis on the number of parameters and total training time of the network pointed out that our LPPN is a lightweight pansharpening network getting state-of-the- art performance.

6. Acknowledgments

The work is supported by National Natural Science Foundation of China grants 61702083 and 61772003, and Key Projects of Applied Basic Research in Sichuan Province (Grant No. 2020YJ0216), and National Key Research and Development Program of China (Grant No. 2020YFA0714001).

References

[1] V. Søren and M. Joav, Meta-analysis of Positive Effects, Side Effects and Adverse Events of Holistic Mind-body Medicine (Clinical Holistic Medicine): Experience from Denmark, Swe- den, United Kingdom and Germany, International Journal of Adolescent Medicine and Health 21 (4) (2009) 441–456.

[2] Z. Yong, Recent Developments in Technology and Language Learning: A Literature Review and Meta-analysis, CALICO Journal (2003) 7–27.

[3] W. Xu, W. Yuan, W. Dong, J. Xia, D. Liu and Y. Chen, A Meta-analysis of the Response of Soil Moisture to Experimental Warming, Environmental Research Letters 8 (4) (2013) 044027.

[4] D. Mauro, P. Saurabh, P. Fabio, G. Paulo, C. Jocelyn and B.

Atli, Challenges and Opportunities of Multimodality and Data Fusion in Remote Sensing, Proceedings of the IEEE 103 (9) (2015) 1585–1601.

(16)

[5] T. Claire, R. Thierry, W. Lucien and C. Jocelyn, Synthesis of Multispectral Images to High Spatial Resolution: A Critical Re- view of Fusion Methods Based on Remote Sensing Physics, IEEE Transactions on Geoscience and Remote Sensing 46 (5) (2008) 1301–1312.

[6] G. Vivone, L. Alparone, J. Chanussot, M. Dalla Mura, A.

Garzelli, G. A. Licciardi, R. Restaino and L. Wald, A Critical Comparison Among Pansharpening Algorithms, IEEE Transac- tions on Geoscience and Remote Sensing 53 (5) (2014) 2565–

2586.

[7] G. Vivone, M. Mura, A. Garzelli, R. Restaino, G. Scarpa, M. Ul- farsson, L. Alparone and J. Chanussot, A New Benchmark Based on Recent Advances in Multispectral Pansharpening: Re- visiting Pansharpening with Classical and Emerging Pansharp- ening Methods, IEEE Geoscience and Remote Sensing Maga- zine (2020, DOI: 10.1109/MGRS.2020.3019315).

[8] D. Di and W. Di, Comparisons of ERDAS and ENVI in The- matic Mapping, IEEE International Conference on Communi- cation Software and Networks (ICCSN) (2011) 517–520.

[9] S. Carlos, F. Laurel, S. Moreira and R. Dar, Mapping Forest Degradation in the Eastern Amazon from SPOT 4 through Spec- tral Mixture Models, Remote Sensing of Environment 87 (4) (2003) 494–506.

[10] A. Garzelli, F. Nencini and L. Capobianco, Optimal MMSE Pan Sharpening of Very High Resolution Multispectral Images, IEEE Transactions on Geoscience and Remote Sensing 46 (1) (2008) 228–236.

[11] X. Fu, W. Wang, Y. Huang, X. Ding and J. Paisley, Deep Multi- Scale Detail Networks for Multi-Band Spectral Image Sharpen- ing, IEEE Transactions on Neural Networks and Learning Sys- tems (2020, DOI: 10.1109/TNNLS.2020.2996498).

[12] J. Choi, K. Yu and Y. Kim, A New Adaptive Component- substitution Based Satellite Image Fusion by Using Partial Re- placement, IEEE Transactions on Geoscience and Remote Sens- ing 49 (2011) 295–309.

[13] A. Laben and V. Brower, Process for Enhancing the Spatial Resolution of Multispectral Imagery Using Pan-sharpening, US Patent 6011875 (2000).

[14] X. Otazu, M. González-Audicana, O. Fors and J. N úñez, Intro- duction of Sensor Spectral Response into Image Fusion Meth- ods. Application to Wavelet-based Methods, IEEE Transactions on Geoscience and Remote Sensing 43 (2005) 2376–2385.

[15] Z. Zhang, T. -Z. Huang and L. -J. Deng, Pansharpening via RoG-based Filtering, IEEE International Geoscience and Re- mote Sensing Symposium (IGARSS) (2019).

[16] G. Vivone, Robust Band-Dependent Spatial-Detail Approaches for Panchromatic Sharpening, IEEE Transactions on Geoscience and Remote Sensing 57 (9) (2019) 6421–6433.

[17] J. Liu, Smoothing Filter-based Intensity Modulation: A Spectral Preserve Image Fusion Technique for Improving Spatial Details, International Journal of Remote Sensing 21 (18) (2000) 3461–

3472.

[18] B. Aiazzi, L. Alparone, S. Baronti and A. Garzelli, Context- driven Fusion of High Spatial and Spectral Resolution Images Based on Oversampled Multiresolution Analysis, IEEE Trans- actions on Geoscience and Remote Sensing 40 (2002) 2300–

2312.

[19] R. Restaino, G. Vivone, P. Addesso and J. Chanussot, A Pan- sharpening Approach Based on Multiple Linear Regression Es- timation of Injection Coefficients, IEEE Geoscience and Remote Sensing Letters 17 (1) (2020) 102–106.

[20] G. Vivone, S. Marano and J. Chanussot, Pansharpening:

Context-Based Generalized Laplacian Pyramids by Robust Re- gression, IEEE Transactions on Geoscience and Remote Sens- ing 58 (9) (2020) 6152–6167.

[21] G. Vivone, R. Restaino and J. Chanussot, Full Scale Regression- based Injection Coefficients for Panchromatic Aharpening, IEEE Transactions on Image Processing 27 (7) (2018) 3418–

3431.

[22] G. Vivone, M. Sim˜oes, M. Dalla Mura, R. Restaino, J.

M. Bioucas-Dias, G. A. Licciardi and J. Chanussot, Pansharp- ening Based on Semiblind Deconvolution, IEEE Transactions on Geoscience and Remote Sensing 53 (4) (2015) 1997–2010.

[23] G. Vivone, P. Addesso, R. Restaino, M. Dalla Mura and J. Chanussot, Pansharpening Based on Deconvolution for Multi- band Filter Estimation, IEEE Transactions on Geoscience and Remote Sensing 57 (1) (2019) 540–553.

[24] G. Vivone and J. Chanussot, Fusion of Short-wave Infrared and Visible Near-infrared WorldView-3 Data, Information Fusion 61 (2020) 71–83.

[25] G. Vivone, L. Alparone, A. Garzelli and S. Lolli, Fast Repro- ducible Pansharpening Based on Instrument and Acquisition Modeling: AWLP Revisited, Remote Sensing 11 (19) (2019) 2315:1–2315:23.

[26] J. Saeedi and K. Faez, A New Pan-sharpening Method Using Multiobjective Particle Swarm Optimization and the Shiftable Contourlet Transform, ISPRS Journal of Photogrammetry and Remote Sensing 66 (3) (2011) 365–381.

[27] J. Liu, L. -J. Deng, F. Fang and T. Zeng, A Rudin-Osher-Fatemi Model-Based Pansharpening Approach Using RKHS and AHF Representation, East Asian Journal on Applied Mathematics 9 (2019) 13–27.

[28] L. -J. Deng, M. Feng and X. -C. Tai, The Fusion of Panchro- matic and Multispectral Remote Sensing Images via Tensor- based Sparse Modeling and Hyper-Laplacian Prior, Information Fusion 52 (2019) 76–89.

[29] V. Rosaria, R. Rocco, V. Gemine, D. Mauro and C. Jocelyn, A Pansharpening Method Based on the Sparse Representation of Injected Details, IEEE Geoscience and Remote Sensing Letters 12 (1) (2014) 180–184.

[30] R. Gogineni and A. Chaturvedi, Sparsity Inspired Pan- sharpening Technique Using Multi-scale Learned Dictionary, ISPRS Journal of Photogrammetry and Remote Sensing 146 (2018) 360–372.

[31] B. Coloma, C. Vicent, I. Laura, V. Joan and R. Bernard, A Vari- ational Model for P+XS Image Fusion, International Journal of Computer Vision 69 (1) (2006) 43–58.

[32] Y. Naoto, Y. Takehisa and I. Akira, Coupled Nonnegative Ma- trix Factorization Unmixing for Hyperspectral and Multispec- tral Data Fusion, IEEE Transactions on Geoscience and Remote Sensing 50 (2) (2011) 528–537.

[33] L. -J. Deng, V. Gemine, W. Guo, M. Dalla Mura and J. Chanus- sot, A Variational Pansharpening Approach Based on Repro- ducible Kernel Hilbert Apace and Heaviside Function, IEEE Transactions on Image Processing 27 (2018) 4330–4344.

[34] Z. -C. Wu, T. -Z. Huang, L. -J. Deng, G. Vivone, J. Miao, J.

-F. Hu and X. -L. Zhao, A New Variational Approach Based on Proximal Deep Injection and Gradient Intensity Similarity for Spatio-Spectral Image Fusion, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13 (2020) 6277–6290.

[35] N. Yokoya, T. Yairi and A. Iwasaki, Coupled Nonnegative Ma- trix Factorization Unmixing for Hyperspectral and Multispec- tral Data Fusion, IEEE Transactions on Geoscience and Remote Sensing 50 (2) (2012) 528–537.

[36] R. Dian, S. Li, L. Fang, T. Lu and J. M. Bioucas-Dias, Non- local Sparse Tensor Factorization for Semiblind Hyperspectral and Multispectral Image Fusion, IEEE Transactions on Cyber- netics 50 (10) (2020) 4469–4480.

[37] X. Meng, H. Shen, Q. Yuan, H. Li, L. Zhang and W. Sun, Pan-