Laplacian Pyramid Networks: A New Approach for Multispectral Pansharpening
Cheng Jina, Liang-Jian Dengb,∗, Ting-Zhu Huangb, Gemine Vivonec
aSchool of Optoelectronic Science and Engineering, University of Electronic Science and Engineering of China, Chengdu, 611731, China
bSchool of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, 611731, China
cInstitute of Methodologies for Environmental Analysis, CNR-IMAA, 85050 Tito Scalo, Italy.
Abstract
Pansharpening is about fusing a high spatial resolution panchromatic image with a simultaneously acquired multi- spectral image with lower spatial resolution. In this paper, we propose a Laplacian pyramid pansharpening network architecture for accurately fusing a high spatial resolution panchromatic image and a low spatial resolution multi- spectral image, aiming at getting a higher spatial resolution multispectral image. The proposed architecture considers three aspects. First, we use the Laplacian pyramid method whose blur kernels are designed according to the sen- sors’ modulation transfer functions to separate the images into multiple scales for fully exploiting the crucial spatial information at different spatial scales. Second, we develop a fusion convolutional neural network (FCNN) for each scale, combining them to form the final multi-scale network architecture. Specifically, we use recursive layers for the FCNN to share parameters across and within pyramid levels, thus significantly reducing the network parameters.
Third, a total loss consisting of multiple across-scale loss functions is employed for training, yielding higher accuracy.
Extensive experimental results based on quantitative and qualitative assessments exploiting benchmarking datasets demonstrate that the proposed architecture outperforms state-of-the-art pansharpening methods.
Keywords: Laplacian Pyramid, Modulation Transfer Function, Convolutional Neural Network, Pansharpening, Image Fusion, Machine Learning, Remote Sensing.
1. Introduction
Thanks to a wide range of applications, such as medicine [1], sociology [2], ecology [3], and other fields, pansharpening has drawn loads of attention from the scientific community. This can be corroborated by the organization of the data fusion contest from the IEEE Geoscience and Remote Sensing Society in 2006 [4] and a huge recent literature about this topic [5, 6, 7]. Pansharpening is about merging a high reso- lution panchromatic (PAN) and a low resolution multi- spectral (LRMS) images, which can be easily obtained by several satellite sensors like WorldView-3, Quick- Bird, and GaoFen. As Fig. 1 illustrates, the goal of pansharpening is to yield a high resolution multi- spectral (HRMS) image. Besides, pansharpening algo- rithms have gained much interest from commercial en- terprises. For instance, famous software, such as ENVI
∗Corresponding author: Liang-Jian Deng
Email addresses:[email protected](Cheng Jin),[email protected](Liang-Jian Deng), [email protected](Ting-Zhu Huang),
[email protected](Gemine Vivone)
and ERDAS, utilizes fusion techniques to deal with im- age enhancement problems [8]. Moreover, pansharp- ening has been considered a preliminary step for sev- eral image processing tasks, e.g., change detection [9], which demonstrates the important role of pansharpen- ing in practical applications.
Regarding pansharpening algorithms, we may cat- egorize them into four classes, i.e., component sub- stitution (CS) approaches, multi-resolution analysis (MRA) methods, variational optimization-based (VO) approaches, and deep learning-based (DL) techniques [7]. In this work, we mainly focus on developing a new deep learning network to address the pansharpen- ing problem.
The first two categories are CS and MRA methods which obtain the final pansharpened outcome from the perspective of details injection, see, e.g., [10, 12, 13, 14, 15]. The former approach is based on the substitution of a component of the spectral transformed multispectral (MS) image with the PAN image, see, e.g., the band- dependent spatial-detail with local parameter estima- tion (BDSD) [10], the robust band-dependent spatial-
Figure 1: First row: the schematic diagram of pansharpen- ing, where MS∈Rm×n×b, PAN∈RM×N×1and GT∈RM×N×b, M =4m,N=4n,bis the number of bands of the MS image.
Second row: the pansharpened results by a state-of-the-art tra- ditional method, BDSD [10], a deep learning approach, DMD- Net [11], and the proposed method. Third row: the residual maps between BDSD, DMDNet, the proposed approach, and the ground-truth (GT). It is clear that our method obtains a better residual map, thus showing a better visual quality.
detail (BDSD-PC) [16], and the PRACS approach [12].
The MRA class relies upon the injection of the spatial details of the PAN image into the upsampled LRMS image to obtain a high spatial resolution MS image, see, e.g., the smoothing filter-based intensity modula- tion (SFIM) [17], the generalized Laplacian pyramid (GLP) [18, 19, 20], the GLP with full-scale regres- sion (GLP-Reg) [21], the deconvolution based meth- ods [22, 23], and the bilinear filtering [24]. Hybrid CS/MRA approaches can also be found in the related literature [14, 25]. In general, the two types of meth- ods could get quite fast computation and promising out- comes, but, sometimes, may lead to slight spatial and spectral distortions.
The third category, the so-called VO class, is based on the formulation of models with proper regularizers to address the pansharpening problem, then reconstructing the high-resolution outcomes by designing algorithms for the given models. The whole process can be seen from a mathematical point of view as the reconstruction of incomplete complementary observations of multi- channel data. In general, it mainly contains Bayesian-
, Sparse Reconstruction (SR)-, and Model-based Opti- mization (MBO) techniques, whose generally utilizes the regularization-based technique to address this chal- lenge, see, e.g., [26, 27, 28, 29, 30]. In [31], Ballester et al. proposed the P+XS method, which obtained the spectral information for the fused image under the as- sumption that the PAN image can be approximated as a linear combination of the high resolution multispec- tral bands. In [32], Yokoya et al. employed the coupled nonnegative matrix factorization (CNMF) unmixing to deal with the pansharpening problem, which could pro- duce high quality fusion results, both spectrally and spa- tially. In [33], Deng et al. proposed a reproducible ker- nel Hilbert space and Heaviside based framework to ad- dress the task of pansharpening. Recently, Wu et al. in [34] proposed a new DL-based VO scheme that could benefit from both traditional VO methods and the out- come of DL-based approaches, thus getting both com- petitive results and generalization ability. Especially, some recent hyperspectral and multispectral image fu- sion approaches can be also applied to the task of pan- sharpening, see e.g., [35, 36]. More recently, this field of pansharpening has evolved into more specific appli- cations, i.e., in cloud-contaminated circumstances [37], where X. Meng et al. proposed a variational-based in- tegrated pansharpening model specifying in cloud con- tamination scenarios. Although VO-based techniques can obtain competitive results, they are sometimes lim- ited, e.g., the use of many hyperparameters, the insuf- ficient feature representation and extraction that could result in spatial and spectral distortions.
The fourth category is represented by the DL-based methods, which are mainly based on a feature extrac- tion/representation phase, the training of network pa- rameters, and the testing on real-world data. These methods are often far from a theoretical development and require enormous datasets to train the underlying parameters, but they have been widespread used (in- cluding but not limited to image super-resolution, pat- tern recognition, and image classification), often show- ing state-of-the-art performance. In particular, many satisfactory results have been obtained by deep learn- ing networks for pansharpening [38, 39, 40] [41, 42]. In [43] , Masi et al. employed first a convolutional neural network (CNN) with three layers to deal with the task of multispectral image pansharpening, obtaining excellent pansharpened results comparing with traditional state- of-the-art approaches. In [44], Fu et al. developed a deep CNN architecture with a high-pass filtering tech- nique to fuse the PAN and the LRMS images, which ob- tains state-of-the-art pansharpening outcomes. Besides, He et al. [45] followed physical concepts to design a
CNN based method for pansharpening relied upon the spatial detail injection framework. In [11], Fu et al.
elaborated their high-pass filtering network design with a grouped multiscale network structure. In [46], Deng et al. proposed two network architectures, named CS- Net and MRA-Net, respectively, according to the tra- ditional CS and MRA fusion equations. To limit the drawbacks of the two networks, they further proposed a simple but effective fusion network, called FusionNet, to yield state-of-the-art pansharpening outcomes. More recently, Generative Adversarial Networks (GAN) and unsupervised training have teemed with the pansharp- ening field thanks to their novelty and powerfulness.
Literature like [47, 48] demonstrates the promising fea- ture of these method. For improving the effectiveness of this type of methods, the solutions mainly focus on two broad aspects. One is to make more generalized datasets for network training, e.g. [49]. The other mainly fo- cuses on the amelioration of the network architecture.
Although these methods have achieved excellent per- formance, there is still room for improvement by con- sidering the following points. First of all, the multi- scale property plays an important role in resolution enhancement applications, see, e.g., the image super- resolution issue [50]. It is important to design more effective multi-scale network architectures based on widely used multi-scale structures, see, e.g., the state- of-the-art Laplacian pyramids. Second, for the specific pansharpening problem, the Modulation Transfer Func- tion (MTF) is a reasonable way to describe the physi- cal procedure of image capture, thus its consideration in the datasets simulation is expected to get better perfor- mance.
Motivated by the above-mentioned points, we pro- pose, in this work, a new network architecture that is able to deal with these aspects. Hence, a novel Lapla- cian pyramid pansharpening network (LPPN) architec- ture is considered. The given network accounts for both the image multi-scale information and the sen- sors’ MTFs. To exploit the multi-scale information, we use the Laplacian pyramid to decompose the original PAN image and the upsampled LRMS image in several scales, then designing the corresponding sub-networks for each image scale and incorporating them into a fu- sion convolutional neural network (FCNN). Due to the use of multi-scale sub-networks, the final loss function is represented by a combination of multipleℓ2loss func- tions. Moreover, we also introduce recursive blocks into our sub-networks, aiming to reduce the number of network parameters. Additionally, unlike the classical Laplacian pyramid that uses a fixed Gaussian kernel for all the image channels, our approach employs the spe-
cific sensors’ MTFs, which vary along the spectral di- mension (i.e., different Gaussian kernels for different spectral bands). Extensive experiments on reduced and full resolution datasets demonstrate that the proposed method gets the best quantitative and qualitative per- formance compared with state-of-the-art pansharpening approaches.
The contributions of this paper can be summarized as follows:
• To the best of our knowledge, this is the first work merging both the recursive sub-network and Lapla- cian pyramids to address the pansharpening task.
Unlike the classical Laplacian pyramid using a fixed Gaussian kernel for all the spectral channels, the Laplacian pyramid used in our work takes ad- vantage of the specific sensors’ MTFs for image scale decomposition, which can yield significant improvements.
• The exploitation of Laplacian pyramids allow us to develop multi-scale structured sub-networks im- proving the capability of managing different spatial details at the feature extraction stage. Besides, we also build multiple loss functions to describe the information loss for each scale, which can aid the image details recovery at different scales.
• The recursive block is used within each sub- network to effectively decrease the number of net- work parameters. Furthermore, its use can increase the depth of the sub-networks. Thus, the proposed approach can be seen as a lightweight network for pansharpening. For example, our method only in- volves about 50,000 network parameters to achieve the state-of-the-art performance on WorldView-3 datasets, which is significantly far away from the compared approaches.
The remaining of the paper is as follows. Firstly, the related works and motivations under the development of the proposed method are reported in Sect. 2. Sect. 3 is devoted to the presentation of the proposed method, in- cluding the network architecture, the loss function, the training details, and so forth. Sect. 4 shows the quan- titative and qualitative outcomes. Finally, conclusions are drawn in Sect. 5.
2. Related Works and Motivations
Laplacian pyramid decomposition (LPD) and its multi-scale structure, which are suitable for image res- olution enhancement tasks. Recently, some pansharp- ening algorithms are proposed from the perspective of
constructing multi-scale structures, see, e.g., [51], ob- taining promising outcomes. For instance, Yuan et al.
[52] formulated a novel pansharpening approach by em- ploying the multi-scale convolutional kernels, which could fully widen the network and effectively extract the image features on different scales. In [53], although the authors mentioned the words “Laplacian pyramid”
in their work, they did not really exploit the multi-scale LPD for the task of pansharpening, which limited the final pansharpening performance. Different from the in- troduced multi-scale methods, here we accordingly re- sort to the classical LPD with some special operations in pansharpening, e.g., LPD with MTF, to formulate an effective and lightweight deep CNN with a multi-scale structure, aiming to yield state-of-the-art pansharpen- ing results. In this section, we will introduce first the Gaussian pyramid decomposition (GPD) and the related LPD. Afterwards, we will point out the motivations un- der the development of the proposed method.
2.1. GPD and LPD
Proposed by H. Olkkonen et al. [54], the GPD is about the reconstruction of the image by the Gaussian expand wavelet transform for multi-resolution analysis of images. It aims to use Gaussian kernels to create a series of images at different scales. The general GPD equation is as follows:
G1(I)=I,
Gi(I)=(k⊗Gi−1(I))↓2,i=2,· · ·,S, (1) where Gi stands for the i-th layer of the Gaussian pyramid, Iis the original image,⊗denotes the convo- lution operation,↓2 indicates the downsampling with a scale factor of 2,krepresents the Gaussian kernel, andS is the total number of layers. It is worth to be remarked thatkis fixed for each channel of an MS image.
The LPD is a bandpass image decomposition derived from the GPD. It is originally proposed by Burt and Adelson [55] before multiresolution wavelet analysis was introduced. More in detail, the LPD technique is a multi-resolution image representation obtained through a recursive reduction of the set of data.
The general LPD processing equation is as follows:
LS(I)=GS(I),
Li(I)=Gi(I)−Gi+1(I)↑2,i=S −1,· · ·,1, (2) whereListands for thei-th layer of the LPD,Giis the outcome of thei-th layer of the GPD, and↑2indicates the upsampling with a scale factor of 2. Please, refer to Fig. 2 for a better understanding.
2.2. Motivations
Since conventional deep learning networks do not of- ten exploit multi-scale information, the proposed work is motivated by the introduction of this kind of infor- mation. In particular, the pyramid structure achieves a multi-scale representation in image processing to por- tray global and local information in a better way. This property inspired us to utilize the multi-scale spatial rep- resentation structure of the LPD to design an end-to-end deep neural network structure capable of progressively restoring more image information to get a better perfor- mance.
It is important to note that conventional GPD methods mainly utilize a fixed Gaussian kernel for each chan- nel of an MS image (e.g., an RGB image) to proceed with the convolution operation. However, this approach fails to meet the physics principles of pansharpening since the MS sensors’ point spread functions, which model the spatial responses of each spectral channel (also called band), are not the same along the spectral dimension. In other words, the Gaussian kernels related to the spectral bands of an MS image should be differ- ent from each other. They are about a special function, i.e., the MTF. Thus, we exploit the MTF to generate the specific kernels for every band of the MS image in order to implement the GPD. For more details, the interesting readers can refer to Sect. 3.1.1.
3. The Proposed Method
In this section, we describe the design of the proposed LPPN. The whole procedure is depicted in Fig. 3. The architecture of the network, the adopted loss function, and the training details are introduced in the remaining of this section. For simplicity, we denote the PAN image as Pand the MS image asM.
3.1. Network Architecture
This section is devoted to the detailed description of the proposed network architecture.
3.1.1. GPD with MTF
The Modulation Transfer Function (MTF) is used to model the magnitude response of the optical system at different spatial frequencies. In pansharpening, it has widespread used to design filters for image degrada- tion. The sensors’ MTFs are usually band-dependent, thus generating filters that have different Gaussian ker- nels for each spectral band. In this work, our goal is to address a resolution enhancement issue involving natu- ral images, thus motivating us to implement Laplacian
Figure 2: Schematic diagram of the Gaussian pyramid decomposition (GPD) and the Laplacian pyramid decomposition (LPD) for S=5.
pyramids using the MTFa prioriinformation. For more details, the interesting readers can refer to [6, 56].
3.1.2. LPD with MTF
Multi-scale information is essential for image reso- lution enhancement applications. Thus, we want to uti- lize the LPD, a well-established technique for producing multi-scale images, to recover more image details. In what follows, we will introduce our MTF LPD, which performs first the MTF GPD into several scales of the original PAN image, the upsampled LRMS image, and the ground-truth (GT) multispectral image, thus getting the corresponding LPD components using (2).
To match the spatial resolution, we employ the trans- posed convolution to upsampleM with a factor 4 (de- noted asM) to the PAN,e P, scale. Then, we separately decomposePandMe into theirS-layer pyramids, which is processed by the following procedure:
a) We generate convolution kernels according to the MTF information. P and Me have correspond- ing MTF functions (readers can find the kernels by having a look at the codes MTF-PAN and MTF, which are available in the pansharpening toolbox1). In particular, the MTF kernel for P is denoted as kP. Regarding to theb-th band of M, we separately indicate each band’s kernel ase k1,k2,· · ·,kb, respectively.
b) Compute the GPD with MTF for P andM. Lete Gi(P), Gi(M) denote the MTF-decomposed PANe
1http://openremotesensing.net/knowledgebase/
a-critical-comparison-among-pansharpening-algorithms/
and upsampled LRMS images of thei-th Gaussian pyramid layer (i=1,2,· · ·,S), and setG1(P)=P, G1(M)e =M. Afterwards, we respectively computee the output image at each GPD layer for P
G1(P)=P,
Gi(P)=(kP⊗Gi−1(P))↓2,i=2,· · ·,S, (3) and forMe
G1(M)e =M,e
Gi(Mej)=(kj⊗Gi−1(Mej))↓2,i=2,· · ·,S. (4) where ⊗ denotes the convolution operation, ↓2 stands for the downsampling operation with a scale factor of 2, and j=1,2,· · ·,bis the band index.
c) Compute LPD with MTF forPandMebased on the obtained GPD in (3) and (4), i.e.
LS(P)=GS(P),
Li(P)=Gi(P)−Gi+1(P)↑2,i=S −1,· · ·,1, (5) and
LS(M)e =GS(M)e ,
Li(Mej)=Gi(Mej)−Gi+1(Mej)↑2,i=S −1,· · ·,1, (6) where↑2stands for the upsampling operation with a scale factor of 2, and j =1,2,· · ·,bis the band index.
More details about Laplacian pyramids can be found in [55].
Figure 3: The network architecture of the proposed Laplacian pyramid pansharpening network (LPPN) for a resolution ratio between PAN and MS equal to 4.
In summary, the whole procedure of generating of our LPD with MTF forPandMecan be found in (5) and (6).
Please, refer also to Fig. 4 for the LPD and the GPD for PandM.e
3.1.3. Fusion Convolutional Neural Network (FCNN) After getting the multi-scale LPD components of P andM, we consider them into the designed fusion con-e volutional neural network (FCNN, denoted asNe t(i)(θ) in Fig. 3) for each scale to extract features. Thus, our final architecture is the combination of multiple sub- networks.
More in detail, we concatenate first the Li(P) and Li(M) at thee i-th layer of LPD, after taking them into account at the corresponding sub-networkNe t(i)(θ) and producing the output. Furthermore, the output is op- erated by the ReLU activation function and upsampled
exploiting a factor of 2. Finally, it is transmitted to the output of the next layer.
For each sub-networkNe t(i)(θ) (i.e., each FCNN), we mainly employ the residual learning [57] and recursive blocks [58] for enhancing the performance of the net- work, which separately play the role of improving ac- curacy and reducing the number of parameters. Please, see Fig. 5 for the structure of the FCNN. More details on theNe t(i)(θ) sub-network are pointed out as follows.
a) Feature Extraction: The goal of this step is to ac- quire the feature maps of the corresponding inputs.
For higher layers, we take fewer kernels. In this work, we apply 2S−i+1kernels with a 3×3 size for convolution operations to the concatenated input.
b) Recursive Blocks:To achieve the parameter reduc- tion and improve the accuracy, we employ several
Figure 4: Detailed illustration of the generation of the i-th Laplacian pyramid layer of Me and P (exploiting an exemplary WorldView-3 data withS = 5 and resolution ratio between the products to be fused equal to 4). Gi represents thei-th layer of the Gaussian pyramid. Note that we do not use the classical Gaussian pyramid, rather we use MTF-based filters to generate Laplacian pyramid layers.
recursive blocks. As the gradient vanishing and exploding in training of deep models, the shared- source skip-connection residual learning is incor- porated into our sub-network, whose accuracy is proven in [50]. The goal of our recursive block setting is to share the network parameters across each recursive block and between each convolu- tional layer. Therefore, a single set of parameters is capable of building multi-layer sub-networks.
c) Output Reconstruction:After recursive blocks, the FCNN output for each layer is integrated to recon- struct the multiple outputs, which lead to multiple across-scale loss functions for training. In the test- ing phase, we get multiple outputs and the sole out- put at the first layer, i.e.,Out(1)(θ), represents the outcome of the proposed approach.
Fig. 3 sums up the whole proposed architecture. It mainly includes several sub-networks to progressively fuse and enhance the spatial details. Please, see Fig.
6 to get more insights about how the proposed network architecture works and its effectiveness on an exemplary real-world test case.
3.2. Loss Function
Due to the usage of the LPD, we will have multi-scale output images. For each image, we build one FCNN to train the network parameters. Therefore, the final loss function is the combination of the different loss func- tions. The final loss function is defined as follows:
L(θ(i))= XS
i=1
kOut(i)(θ)−Gi(GT)k2F, (7) whereOut(i)(θ) is obtained byFθ(i)
Li(P),Li(M)e and its subsequent operations (please see Fig. 3 for more de- tails). In particular,iindicates thei-th layer of LPD and θstands for the network parameters. Besides,Gi(GT) is thei-th layer of the GT image obtained via the GPD and k · kF is the Frobenius norm. In this work, we empiri- cally setS equal to 5. It is worth to be pointed out that
Figure 5: The flowchart of the fusion convolution neural network (FCNN). In which, the output channels are set as 2S−i+1, where Srepresents the total number of LPD layers, and theiis the corresponding LPD layer number.
Figure 6: The outputs of all theS layers of our LPPN, i.e., Out(i)(θ),i=1,2,· · ·,S (from right to left) withS =5 on an exemplary real-world test case.
in order to have a robust layer-by-layer injection proce- dure, the use of multiple losses to constrain the distance between the output image of a layer and the correspond- ing target (GT) image is advisable and commonly used in the related literature. If the loss function is only about the layer 1 (i.e., the final layer), the control of each layer of the network could be not easy, making the training a hard task. Moreover, although all the loss functions in (7) have equal weights, the more the training time the higher the impact of the first layer on the final solution.
3.3. Generation of Training Data
For the generation of training samples, we simu- lated three datasets acquired from the WorldView-3, the QuickBird, and the Gaofen-2 sensors. The simu- lation way for the three datasets is the same. We con- sider here the WorldView-3 datasets as an exemplary case. The WorldView-3 satellite datasets can be freely downloaded2. The same way as in [46] is exploited
2https://www.maxar.com/product-samples/
to simulate the training/validation/testing datasets get- ting 12,580 PAN/LRMS/GT patch pairs of size 64×64, 16 × 16 × 8, and 64 × 64 × 8, respectively. Af- ter that, these datasets are divided into the 70/20/10%
for training (8,806 examples)/validation (2,516 exam- ples)/testing (1,258 examples). We simulate the LRMS, the PAN, and the GT images according to Wald’s pro- tocol [59] due to the unavailability of the GT image.
The upsampled LRMS image is obtained via a polyno- mial kernel with 23 coefficients [18], called EXP from hereon. Please, find more details on the implementation of Wald’s protocol in [46].
4. Experimental Results
In this section, we compare the proposed LPPN method with some recent state-of-the-art pansharpening approaches belonging to CS, MRA, VO, and DL cate- gories. The employed sensors, the benchmarking meth- ods, and the adopted quality indexes are described first.
Afterwards, the experimental analysis both at reduced and full resolutions is reported.
4.1. datasets
Several datasets have been acquired by the WorldView-3 sensor, which simultaneously cap- tures a high resolution PAN channel and eight MS bands. Four standard colors (red, green, blue, and near-infrared 1) and four new bands (coastal, yellow, red edge, and near-infrared 2) are acquired. The images are distributed with a pixel size of 0.3m and 1.2m for PAN and MS, respectively. The spatial resolution ratio is equal to 4. The radiometric resolution is 11 bits.
Different from WorldView-3, the images obtained by QuickBird and GaoFen-2 sensors consist of four MS
Table 1: Details of experimental datasets
Satellite Spatial Spectral Radiometric Themiatic Original data
Sensors Dimension Dimension Resolution Scenes Volume
WorldView-3 PAN 0.3m One band
11 bits 6
MS 1.2m Eight bands mixed
QuickBird PAN 0.6m One band
11 bits (urban, vegetation,
MS 2.4m Four bands water scenario) 1
GaoFen-2 PAN 1m One band
10 bits 1
MS 4m Four bands
bands and one PAN channel. The spatial resolution ra- tio is equal to 4, again. In particular, QuickBird and GaoFen-2 data have a spatial resolution of about 0.6m and 1m for the PAN channel, respectively. Moreover, they have a radiometric resolution of 11 bits and 10 bits, respectively. Please refer to Tab. 1 for the summary of the experimental datasets.
4.2. Benchmark
The benchmark consists of one representative CS based method (i.e., BDSD [10]), one representative MRA based methods (i.e., MTF GLP CBD [56], de- noted as GLP CBD from hereon for saving space in the tables), two regularization-based (VO) methods (i.e., CNMF [32] and CVPR19 [60]), and three state-of-the- art DL methods (i.e., PanNet[44], DiCNN [45], and DMDNet [11])3.
For fair comparison, all the compared DL-based methods (i.e., LPPN, DiCNN, PanNet, and DMDNet) are trained on the same training data using Python 3.7.4 with Tensorflow 1.14.0 on a desktop computer equipped with a Linux operating system and a GPU NVIDIA GeForce GTX 2080Ti with 11GB.
4.3. Quality Assessment Indexes
For quantitative evaluation, we adopt the spectral an- gle mapper (SAM) [61] to evaluate the spectral quality, the erreur relative globale adimensionnelle de synth`ese (ERGAS) index [62] as an extension of the root mean square error for multidimensional arrays, the spatial cor- relation coefficient (SCC) [63] in order to assess the
3The source codes of BDSD, GLP CBD and CNMF can be downloaded at the website http://openremotesensing.net/kb/codes/
pansharpening/. Additionally, the source codes of CVPR19 and Pan- Net can be downloaded at the website https://xueyangfu.github.io/.
Instead, the source codes of DiCNN and DMDNet are not available online, thus we re-implemented them by ourselves using the default parameters indicated in the related papers to ensure their best perfor- mance.
spatial quality, and two universal image quality indexes [64], Qavg (an average version of the Q index along the spectral bands) and Q2n (Q4 and Q8 for four and eight bands datasets, respectively) representing the multidi- mensional extension of the Q index [65, 66]. These indexes can be used when a reference (GT) image is available (i.e., at reduced resolution). Instead, when we need to assess the performance at full resolution, quality without reference indexes should be used [67, 68, 69].
In this paper, the quality with no reference (QNR) [67]
index is exploited. It is obtained by the combination of the spatial distortion index,DS, and the spectral distor- tion index, Dλ. The ideal values for SCC, QNR, Qave, and Q4/Q8 are 1. Instead, for ERGAS, SAM,Dλ, and DS are 0. Furthermore, we exhibit the average running time for each fusion method, denoted as A.T. in seconds in the results tables.
4.4. Parameters Tuning
In our LPPN network, we empirically employ the Adam [70] approach with a learning rate equal to 0.003 in order to minimize the loss function in (7). The num- ber of iterations for the training step is 1×105and the batch size is 32. Additionally, we set the kernel size of all filters as 3×3. In particular, for the setting of the other compared methods, we use the default asset pointing out in the related papers or source codes. In the DiCNN case, the batch size is set to 64 and the number of iterations is 3×105. Instead, for PanNet and DMD- Net, the batch size is set to 32 and the number of itera- tions is 255,000. Under the use of the above-mentioned settings, the three methods can achieve their best perfor- mance.
4.5. Reduced Resolution Assessment
In this section, we assess the qualitative and quanti- tative performance of the compared methods on the re- duced resolution datasets. The process for simulating the testing data is the same as that of the training data
Figure 7: The visual comparisons on a reduced resolution WorldView-3 case (depicted bands: 1, 3 and 5). First two rows: The fusion results by means of BDSD, CNMF, GLP CBD, CVPR19, DiCNN, PanNet, DMDNet, and Proposed LPPN. Third and fourth rows: The corresponding residual maps using the GT image as reference. To aid the visual inspection, we display the residual maps obtained by the analysis of the third spectral band.
Table 2: Quality assessment at reduced resolution on 1258 WorldView-3 test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.
Method SAM ERGAS SCC Q8 Qave A.T.
EXP 5.85±1.99 7.04±2.93 0.660±0.106 0.627±0.142 0.640±0.157 0.016 BDSD 6.90±2.73 5.15±2.27 0.878±0.075 0.817±0.118 0.823±0.119 0.019 CNMF 5.53±1.88 4.62±1.93 0.888±0.068 0.822±0.123 0.825±0.125 0.035 GLP CBD 5.29±1.96 4.16±1.77 0.890±0.070 0.854±0.114 0.849±0.123 0.033 CVPR19 5.21±1.87 5.14±2.12 0.867±0.604 0.793±0.123 0.788±0.130 1.731 DiCNN 4.25±1.35 3.05±1.06 0.945±0.047 0.893±0.118 0.908±0.115 0.838 PanNet 4.10±1.30 2.96±1.00 0.949±0.046 0.896±0.116 0.910±0.116 0.863 DMDNet 3.97±1.25 2.86±0.97 0.953±0.045 0.900±0.114 0.912±0.115 0.951 LPPN 3.90±1.29 2.64±0.96 0.955±0.045 0.913±0.111 0.913±0.114 0.977
(see Sect. 3.3 for details). More in detail, the spatial size of the testing PAN/LRMS/GT patch is 64×64, 16×16, and 64×64 for WorldView-3 datasets in Tab. 2. For Figs. 7-9, the testing images for WorldView-3, Quick- Bird and GaoFen-2 cases are with the spatial size of 256×256, 64×64, and 256×256, respectively. We tested the compared approaches on a large dataset consisting of 1258 test cases extracted from WorldView-3 data,
Table 3: Quality assessment at reduced resolution on 25 QuickBird test cases . The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.
Method SAM ERGAS SCC Q4 Qave A.T.
EXP 7.27±2.31 10.88±2.77 0.530±0.023 0.545±0.139 0.540±0.146 0.008 BDSD 6.71±2.08 7.05±1.12 0.840±0.084 0.769±0.178 0.761±0.187 0.205 CNMF 6.34±2.75 7.07±2.77 0.771±0.295 0.683±0.281 0.680±0.280 0.172 GLP CBD 6.52±1.96 6.91±1.06 0.840±0.085 0.779±0.169 0.764±0.185 0.036 CVPR19 6.84±2.14 8.62±1.86 0.815±0.060 0.686±0.171 0.676±0.179 4.478 DiCNN 4.79±1.09 5.06±0.57 0.908±0.067 0.835±0.187 0.830±0.194 0.868 PanNet 4.78±1.07 4.80±0.34 0.915±0.078 0.841±0.184 0.841±0.188 0.887 DMDNet 4.61±0.94 4.46±0.29 0.919±0.087 0.845±0.193 0.845±0.197 1.047 LPPN 4.39±0.85 4.41±0.50 0.937±0.065 0.851±0.185 0.848±0.190 1.122
and Tab. 2 exhibits the average performance and the corresponding standard deviations for all the compared methods. From Tab. 2, it can be readily got that our method obtains high performance for all the quality in- dexes. We depict a typical result in Fig. 7. It can be seen that BDSD and CNMF show greater spectral distortion.
GLP CBD and CVPR19 have various blurring defects, especially visible in the building area at the lower right
Figure 8: The visual comparisons on a reduced resolution QuickBird case (depicted bands: 1, 2 and 3). First two rows: The fusion results by means of BDSD, CNMF, GLP CBD, CVPR19, DiCNN, PanNet, DMDNet, and Proposed LPPN. Third and fourth rows:
The corresponding residual maps using the GT image as reference. To aid the visual inspection, we display the residual maps obtained by the analysis of the fourth spectral band.
Table 4: Quality assessment at reduced resolution on 25 GaoFen-2 test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.
Method SAM ERGAS SCC Q4 Qave A.T.
EXP 2.88±0.47 3.58±0.44 0.690±0.047 0.760±0.030 0.773±0.032 0.017 BDSD 2.90±0.43 2.53±0.46 0.859±0.054 0.873±0.047 0.884±0.042 0.019 CNMF 3.20±0.56 2.74±0.59 0.860±0.055 0.852±0.042 0.877±0.044 0.455 GLP CBD 2.83±0.50 2.49±0.44 0.852±0.054 0.873±0.042 0.877±0.039 0.079 CVPR19 2.57±0.44 2.76±0.37 0.854±0.041 0.861±0.024 0.861±0.024 9.037 DiCNN 1.77±0.31 1.57±0.20 0.943±0.013 0.949±0.015 0.953±0.012 0.879 PanNet 1.65±0.25 1.44±0.12 0.955±0.009 0.951±0.022 0.963±0.011 0.887 DMDNet 1.54±0.24 1.32±0.12 0.961±0.009 0.956±0.022 0.968±0.012 1.049 LPPN 1.49±0.20 1.18±0.12 0.968±0.007 0.968±0.014 0.970±0.015 1.129
corner. Furthermore, the other three DL-based meth- ods, i.e., PanNet, DMDNet and DiCNN, exhibit com- petitive visual performance. However, they still fail to outperform our method. To highlight the differences, we depicted several magnified sub-regions among the compared methods. It is clear that our method obtains the best visual performance, closer to the GT image in spatial aspects, including sharper edges and clearer ob-
jects. This can be observed in the lower right area of the sub-regions.(e.g., see the orange roofs of the close-ups in Fig. 7). Moreover, we compute the residual maps be- tween the comparison candidates and the ground-truth (GT) image. It is clear that the fusion image of the pro- posed approach is closer to GT image, getting a residual map close to zero almost everywhere.
In order to corroborate the results obtained on the WorldView-3 test cases, we assessed the performance of the compared approaches on data acquired by the QuickBird sensor (Indianapolis datasets) and GaoFen- 2 sensor (Beijing and Guangzhou datasets4). For the best performance, we adjusted its output channels in the head of each FCNN to 2S−i+2 for both datasets.
The deep learning-based methods are properly trained on training data acquired by these sensors. Similar to WorldView-3 testing cases, we also compare the ap- proaches on larger datasets of 256×256 spatial size con- sisting of 25 and 25 test cases extracted from Quick- Bird and GaoFen-2 datasets, respectively. Tabs. 3 and
4Datasets from: http://www.rscloudmart.com/dataProduct/sample
Figure 9: The visual comparisons on a GaoFen-2 case (depicted bands: 1, 2 and 3). First two rows: The fusion results by means of BDSD, CNMF, GLP CBD, CVPR19, DiCNN, PanNet, DMDNet, and Proposed LPPN. Third and fourth rows: The corresponding residual maps using the GT image as reference. To aid the visual inspection, we display the residual maps obtained by the analysis of the fourth spectral band.
4 exhibit the average performance and the correspond- ing standard deviations for all the compared methods, and Figs. 8 and 9 show the typical corresponding per- formance of the compared methods. In the scenario of the QuickBird sensor, BDSD and CNMF demonstrate acceptable visual outcomes. However, GLP CBD and CVPR19 still exhibit blurring defects on the white roof- like object, thus decreasing the spatial performance.
For DL-based methods, the outcomes are still compet- itive. With the aid of the residual maps, we can see our method outperforms the other DL-based compared methods for its close degree of proximity to the GT image. In the GaoFen-2 test case, BDSD and CNMF show acceptable visual results, again. GLP CBD and CVPR19 still suffer from spatial distortion. However, in this case, the CNMF method demonstrates an evident spectral distortion. For DL-based methods, we can ob- serve that our method yields better outcomes with the aid of the residual maps. In both the figures, the pro- posed LPPN clearly shows its spatial advantages thanks to lower image residuals (see the close-ups in the re- lated figures), thus very high performance of the pro- posed LPPN method can be easily observed.
Table 5: Quality assessment at full resolution on 200 WorldView-3 test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.
Method QNR Dλ DS A.T.
EXP 0.913±0.031 0.000±0.000 0.086±0.030 0.018 BDSD 0.893±0.032 0.033±0.013 0.077±0.027 0.035 CNMF 0.896±0.072 0.040±0.037 0.067±0.047 0.416 GLP CBD 0.920±0.050 0.028±0.024 0.055±0.032 0.079 CVPR19 0.932±0.023 0.012±0.006 0.057±0.019 8.964 DiCNN 0.953±0.036 0.018±0.020 0.030±0.020 0.861 PanNet 0.961±0.021 0.019±0.009 0.020±0.012 0.886 DMDNet 0.960±0.020 0.019±0.010 0.021±0.012 1.038 LPPN 0.963±0.023 0.018±0.012 1.089±0.013 1.082
4.6. Full Resolution Assessment
We also compare our LPPN approach with recent state-of-the-art pansharpening approaches on full res- olution WorldView-3, QuickBird and GaoFen-2 data, whose PAN/LRMS patch is of spatial size 256×256 and 64×64, respectively. Due to the lack of a reference (GT) image, the QNR index is used instead of the quality in- dexes in Sect. 4.5. Tabs. 5-7 report the outcomes re-
Figure 10: The visual comparisons on a full resolution WorldView-3 case (depicted bands: 1, 3, and 5).
Table 6: Quality assessment at full resolution on 20 QuickBird test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.
Method QNR Dλ DS A.T.
EXP 0.845±0.026 0.000±0.000 0.156±0.026 0.008 BDSD 0.877±0.029 0.034±0.016 0.092±0.036 0.025 CNMF 0.779±0.072 0.076±0.050 0.067±0.047 0.144 GLP CBD 0.832±0.028 0.055±0.013 0.012±0.021 0.043 CVPR19 0.936±0.013 0.008±0.004 0.056±0.012 4.539 DiCNN 0.910±0.027 0.026±0.009 0.065±0.024 0.872 PanNet 0.943±0.021 0.027±0.008 0.031±0.015 0.899 DMDNet 0.943±0.020 0.024±0.008 0.034±0.014 1.051 LPPN 0.947±0.004 0.025±0.009 0.029±0.007 1.132
spectively on 200 real WorldView-3, 20 real QuickBird and 20 GaoFen-2 examples, synthesizing them using the mean and the standard deviation operators. It is worth to be pointed out that the proposed method gets the best results related to the overall quality index at full reso- lution, QNR. The other two indexes, i.e., Dλ andDS, are also close to the approaches that get the best perfor- mance. Moreover, we also show the visual comparison on a full resolution WorldView-3 datasets in Fig. 10, in which the LPPN approach yields more image details and sharper image edges than the compared methods.
4.7. Discussions
In this section, we will deeply discuss about the pro- posed architecture (i.e., the number of Laplacian layers, the effect of recursive blocks, the use of kernels, and loss functions) after comparing the convergence of dif- ferent fusion methods. Note that for the discussion of this part, we take a WorldView-3 datasets (spatial size 256×256) as the test example for the sake of brevity.
Table 7: Quality assessment at full resolution on 20 GaoFen test cases. The mean and the standard deviation indexes are used to sum up the obtained performance. Best results are in boldface.
Method QNR Dλ DS A.T.
EXP 0.860±0.269 0.000±0.000 0.140±0.269 0.008 BDSD 0.882±0.035 0.021±0.014 0.099±0.029 0.201 CNMF 0.779±0.043 0.079±0.030 0.067±0.047 0.172 GLP CBD 0.892±0.037 0.048±0.021 0.064±0.021 0.034 CVPR19 0.832±0.035 0.006±0.005 0.163±0.036 4.479 DiCNN 0.870±0.027 0.033±0.013 0.100±0.022 0.873 PanNet 0.970±0.007 0.012±0.006 0.019±0.008 0.881 DMDNet 0.969±0.007 0.010±0.006 0.021±0.006 1.045 LPPN 0.970±0.004 0.010±0.005 0.018±0.005 1.134
Figure 11: Convergence of the proposed network.
4.7.1. Convergence Analysis
We utilize the mean squared error (MSE) loss on WorldView-3’s 8,806 training data to compare the dif-
Figure 12: Altered ResNet block structure of the FCNN.
Table 8: Performance assessment by varying the layer number of the proposed Laplacian pyramid-based approach.
Layers SAM ERGAS SCC Q8 Qave
1 4.81 3.29 0.9584 0.9467 0.9480
2 4.54 3.18 0.9624 0.9496 0.9515
3 4.41 3.12 0.9627 0.9507 0.9520
4 4.40 3.11 0.9640 0.9509 0.9522
5 4.30 3.10 0.9657 0.9518 0.9536
6 4.52 3.16 0.9621 0.9506 0.9519
Table 9: Performance assessment by varying the FCNN struc- ture.
Methods SAM ERGAS SCC Q8 Qave
Without recursive 4.83 3.54 0.9442 0.942 0.943 Fixed Gaussian kernel 4.66 3.21 0.9616 0.947 0.950
ℓ1loss 4.51 3.18 0.9617 0.948 0.949
Proposed 4.30 3.10 0.9657 0.952 0.954
ferent network convergence among all the compared DL-based method, i.e., DiCNN, PanNet, DMDNet and our LPPN architecture. As shown in Fig. 11, our net- work has demonstrated a lower training error with less iterations.
4.7.2. The Influence of the Laplacian Pyramid Layer Number
A key parameter of Laplacian pyramids is the layer number. Since the maximum spatial size of our training images is 64×64,S can only vary from 1 to 6. Using the datasets depicted in Fig. 7, the results of our method by varying the layer number are reported in Tab. 8. It is easy to show that the best performance can be obtained by the 5 layers configuration, i.e. the one adopted in this paper.
4.7.3. FCNN Structure Discussion
After determining the network levels, we look at the sub-network level, that is the FCNN. We studied several configurations of the FCNN. In particular, we have:
a) With recursive Vs. without recursive blocks: In order to investigate the effects of the recursive blocks, we substituted the recursive blocks with ResNet [57] blocks in each FCNN, as depicted in Fig 12. Having a look at the first and fourth rows in Tab. 9, it is straightforward that the proposed archi- tecture with recursive blocks generates better re- sults than that of the one without recursive blocks.
Moreover, the amount of parameters related to the compared architectures is 174086 (without recur- sive) against 50706 (with recursive), and the aver- age usage of GPU RAM is 1485 MB (without re- cursive) against 1453 MB (with recursive), consid- ering the test case depicted in Fig. 7, thus demon- strating the reduction of the number of parameters under the use of recursive blocks.
b) Fixed Gaussian Vs. MTF kernels: To verify our statement in Sect. 3.1.1, we change the shape of the filters by using classical fixed Gaussian kernel instead of MTF kernels to deal with the genera- tion of the Laplacian pyramid. Having a look at the second and fourth rows Tab. 9, it is clear that the proposed architecture with MTF-based kernels used for convolution obtains the best quantitative performance.
c) ℓ1Vs. ℓ2loss functions: We also consider the per- formance of our network architecture withℓ1 and ℓ2 loss functions. The results shown in the third and fourth rows demonstrate that the proposedℓ2 loss function is advisable.
4.7.4. Generalization Ability Assessment
We also evaluate the generation ability of our network architecture among the aforementioned fusion methods
Table 10: Comparison of the number of parameters (NoPs), total training time in seconds for all the CNN-based methods.
DiCNN PanNet DMDNet LPPN NoPs 1.5×105 2.5×105 3.1×105 0.5×105 Time 3.7×103 4.8×103 1.2×104 7.4×103
Table 11: Generalization ability assessment on the WorldView-2 dataset.
Method SAM ERGAS SCC Q4 Qave
EXP 8.24 8.88 0.4958 0.6468 0.6667
BDSD 8.42 6.30 0.7989 0.8400 0.8479
CNMF 7.41 6.29 0.8453 0.8298 0.8365
GLP CBD 7.77 6.31 0.8046 0.8370 0.8390
CVPR19 7.30 6.89 0.8144 0.7882 0.7976
DiCNN 6.80 5.54 0.8680 0.8493 0.8658
PanNet 6.52 5.40 0.8588 0.8643 0.8785
DMDNet 6.37 5.22 0.8648 0.8707 0.8824
LPPN 6.50 5.38 0.8742 0.8641 0.8725
by utilizing WorldView-2 datasets, whose number of spectral bands are equal to the WorldView-3 ones. The datasets can be freely downloaded from the same link as for the WorldView-3 datasets. More in detail, the test- ing WorldView-2 datasets have the PAN image of spa- tial size of 256×256 with a 4 PAN/MS spatial ratio. The outcomes are reported in Tab. 11. From the table, we can see that albeit the performance of our method does not outstrip the DMDNet, they are very close to the best.
4.7.5. Number of Parameters and Total Training Time Assessment
Lastly, we compare the number of parameters and to- tal training time of the four DL-based algorithms. As a consequence of the use of the recursive structure in each FCNN, our complexity significantly outperforms the one of the other methods. Tab. 10 shows the com- parison of the number of parameters (NoPs) for all the CNN methods on the WorldView-3 datasets. From this table, it is clear that the proposed LPPN method has only 0.5×105 parameters, which is significantly less than the NoPs of DiCNN (1.8×105), PanNet (2.5×105), and DMDNet (3.2×105). To the best of our knowledge, the proposed LPPN is one of the best lightweight net- work for pansharpening showing a small NoPs and also getting state-of-the-art performance. As for total train- ing time, DiCNN method yields the shortest. We be-
lieved the reason for this is the relatively simple network structure, which only contains 3 convolutional blocks and a skip-connection composition.
5. Conclusions
In this paper, we proposed an efficient deep pyra- mid network architecture for pansharpening. The net- work architecture consists of three parts. The first one is the decomposition of the input image set into Lapla- cian pyramids using MTF-based kernels. Afterwards, these pyramids are fused by the fusion convolutional neural network. Finally, we reconstruct the multi-layer outputs comparing them with the reference (GT) data exploiting an ℓ2 loss function in order to train the net- work. A broad experimental analysis demonstrates that the proposed approach outperforms the compared state- of-the-art pansharpening methods. Furthermore, some discussions about the network convergence, the number of Laplacian pyramid layers, the influence of the loss function, the use of recursive blocks, the generalization ability and so forth, are provided to the readers. Finally, an analysis on the number of parameters and total train- ing time of the network pointed out that our LPPN is a lightweight pansharpening network getting state-of-the- art performance.
6. Acknowledgments
The work is supported by National Natural Science Foundation of China grants 61702083 and 61772003, and Key Projects of Applied Basic Research in Sichuan Province (Grant No. 2020YJ0216), and National Key Research and Development Program of China (Grant No. 2020YFA0714001).
References
[1] V. Søren and M. Joav, Meta-analysis of Positive Effects, Side Effects and Adverse Events of Holistic Mind-body Medicine (Clinical Holistic Medicine): Experience from Denmark, Swe- den, United Kingdom and Germany, International Journal of Adolescent Medicine and Health 21 (4) (2009) 441–456.
[2] Z. Yong, Recent Developments in Technology and Language Learning: A Literature Review and Meta-analysis, CALICO Journal (2003) 7–27.
[3] W. Xu, W. Yuan, W. Dong, J. Xia, D. Liu and Y. Chen, A Meta-analysis of the Response of Soil Moisture to Experimental Warming, Environmental Research Letters 8 (4) (2013) 044027.
[4] D. Mauro, P. Saurabh, P. Fabio, G. Paulo, C. Jocelyn and B.
Atli, Challenges and Opportunities of Multimodality and Data Fusion in Remote Sensing, Proceedings of the IEEE 103 (9) (2015) 1585–1601.
[5] T. Claire, R. Thierry, W. Lucien and C. Jocelyn, Synthesis of Multispectral Images to High Spatial Resolution: A Critical Re- view of Fusion Methods Based on Remote Sensing Physics, IEEE Transactions on Geoscience and Remote Sensing 46 (5) (2008) 1301–1312.
[6] G. Vivone, L. Alparone, J. Chanussot, M. Dalla Mura, A.
Garzelli, G. A. Licciardi, R. Restaino and L. Wald, A Critical Comparison Among Pansharpening Algorithms, IEEE Transac- tions on Geoscience and Remote Sensing 53 (5) (2014) 2565–
2586.
[7] G. Vivone, M. Mura, A. Garzelli, R. Restaino, G. Scarpa, M. Ul- farsson, L. Alparone and J. Chanussot, A New Benchmark Based on Recent Advances in Multispectral Pansharpening: Re- visiting Pansharpening with Classical and Emerging Pansharp- ening Methods, IEEE Geoscience and Remote Sensing Maga- zine (2020, DOI: 10.1109/MGRS.2020.3019315).
[8] D. Di and W. Di, Comparisons of ERDAS and ENVI in The- matic Mapping, IEEE International Conference on Communi- cation Software and Networks (ICCSN) (2011) 517–520.
[9] S. Carlos, F. Laurel, S. Moreira and R. Dar, Mapping Forest Degradation in the Eastern Amazon from SPOT 4 through Spec- tral Mixture Models, Remote Sensing of Environment 87 (4) (2003) 494–506.
[10] A. Garzelli, F. Nencini and L. Capobianco, Optimal MMSE Pan Sharpening of Very High Resolution Multispectral Images, IEEE Transactions on Geoscience and Remote Sensing 46 (1) (2008) 228–236.
[11] X. Fu, W. Wang, Y. Huang, X. Ding and J. Paisley, Deep Multi- Scale Detail Networks for Multi-Band Spectral Image Sharpen- ing, IEEE Transactions on Neural Networks and Learning Sys- tems (2020, DOI: 10.1109/TNNLS.2020.2996498).
[12] J. Choi, K. Yu and Y. Kim, A New Adaptive Component- substitution Based Satellite Image Fusion by Using Partial Re- placement, IEEE Transactions on Geoscience and Remote Sens- ing 49 (2011) 295–309.
[13] A. Laben and V. Brower, Process for Enhancing the Spatial Resolution of Multispectral Imagery Using Pan-sharpening, US Patent 6011875 (2000).
[14] X. Otazu, M. Gonz´alez-Audicana, O. Fors and J. N ´u˜nez, Intro- duction of Sensor Spectral Response into Image Fusion Meth- ods. Application to Wavelet-based Methods, IEEE Transactions on Geoscience and Remote Sensing 43 (2005) 2376–2385.
[15] Z. Zhang, T. -Z. Huang and L. -J. Deng, Pansharpening via RoG-based Filtering, IEEE International Geoscience and Re- mote Sensing Symposium (IGARSS) (2019).
[16] G. Vivone, Robust Band-Dependent Spatial-Detail Approaches for Panchromatic Sharpening, IEEE Transactions on Geoscience and Remote Sensing 57 (9) (2019) 6421–6433.
[17] J. Liu, Smoothing Filter-based Intensity Modulation: A Spectral Preserve Image Fusion Technique for Improving Spatial Details, International Journal of Remote Sensing 21 (18) (2000) 3461–
3472.
[18] B. Aiazzi, L. Alparone, S. Baronti and A. Garzelli, Context- driven Fusion of High Spatial and Spectral Resolution Images Based on Oversampled Multiresolution Analysis, IEEE Trans- actions on Geoscience and Remote Sensing 40 (2002) 2300–
2312.
[19] R. Restaino, G. Vivone, P. Addesso and J. Chanussot, A Pan- sharpening Approach Based on Multiple Linear Regression Es- timation of Injection Coefficients, IEEE Geoscience and Remote Sensing Letters 17 (1) (2020) 102–106.
[20] G. Vivone, S. Marano and J. Chanussot, Pansharpening:
Context-Based Generalized Laplacian Pyramids by Robust Re- gression, IEEE Transactions on Geoscience and Remote Sens- ing 58 (9) (2020) 6152–6167.
[21] G. Vivone, R. Restaino and J. Chanussot, Full Scale Regression- based Injection Coefficients for Panchromatic Aharpening, IEEE Transactions on Image Processing 27 (7) (2018) 3418–
3431.
[22] G. Vivone, M. Sim˜oes, M. Dalla Mura, R. Restaino, J.
M. Bioucas-Dias, G. A. Licciardi and J. Chanussot, Pansharp- ening Based on Semiblind Deconvolution, IEEE Transactions on Geoscience and Remote Sensing 53 (4) (2015) 1997–2010.
[23] G. Vivone, P. Addesso, R. Restaino, M. Dalla Mura and J. Chanussot, Pansharpening Based on Deconvolution for Multi- band Filter Estimation, IEEE Transactions on Geoscience and Remote Sensing 57 (1) (2019) 540–553.
[24] G. Vivone and J. Chanussot, Fusion of Short-wave Infrared and Visible Near-infrared WorldView-3 Data, Information Fusion 61 (2020) 71–83.
[25] G. Vivone, L. Alparone, A. Garzelli and S. Lolli, Fast Repro- ducible Pansharpening Based on Instrument and Acquisition Modeling: AWLP Revisited, Remote Sensing 11 (19) (2019) 2315:1–2315:23.
[26] J. Saeedi and K. Faez, A New Pan-sharpening Method Using Multiobjective Particle Swarm Optimization and the Shiftable Contourlet Transform, ISPRS Journal of Photogrammetry and Remote Sensing 66 (3) (2011) 365–381.
[27] J. Liu, L. -J. Deng, F. Fang and T. Zeng, A Rudin-Osher-Fatemi Model-Based Pansharpening Approach Using RKHS and AHF Representation, East Asian Journal on Applied Mathematics 9 (2019) 13–27.
[28] L. -J. Deng, M. Feng and X. -C. Tai, The Fusion of Panchro- matic and Multispectral Remote Sensing Images via Tensor- based Sparse Modeling and Hyper-Laplacian Prior, Information Fusion 52 (2019) 76–89.
[29] V. Rosaria, R. Rocco, V. Gemine, D. Mauro and C. Jocelyn, A Pansharpening Method Based on the Sparse Representation of Injected Details, IEEE Geoscience and Remote Sensing Letters 12 (1) (2014) 180–184.
[30] R. Gogineni and A. Chaturvedi, Sparsity Inspired Pan- sharpening Technique Using Multi-scale Learned Dictionary, ISPRS Journal of Photogrammetry and Remote Sensing 146 (2018) 360–372.
[31] B. Coloma, C. Vicent, I. Laura, V. Joan and R. Bernard, A Vari- ational Model for P+XS Image Fusion, International Journal of Computer Vision 69 (1) (2006) 43–58.
[32] Y. Naoto, Y. Takehisa and I. Akira, Coupled Nonnegative Ma- trix Factorization Unmixing for Hyperspectral and Multispec- tral Data Fusion, IEEE Transactions on Geoscience and Remote Sensing 50 (2) (2011) 528–537.
[33] L. -J. Deng, V. Gemine, W. Guo, M. Dalla Mura and J. Chanus- sot, A Variational Pansharpening Approach Based on Repro- ducible Kernel Hilbert Apace and Heaviside Function, IEEE Transactions on Image Processing 27 (2018) 4330–4344.
[34] Z. -C. Wu, T. -Z. Huang, L. -J. Deng, G. Vivone, J. Miao, J.
-F. Hu and X. -L. Zhao, A New Variational Approach Based on Proximal Deep Injection and Gradient Intensity Similarity for Spatio-Spectral Image Fusion, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13 (2020) 6277–6290.
[35] N. Yokoya, T. Yairi and A. Iwasaki, Coupled Nonnegative Ma- trix Factorization Unmixing for Hyperspectral and Multispec- tral Data Fusion, IEEE Transactions on Geoscience and Remote Sensing 50 (2) (2012) 528–537.
[36] R. Dian, S. Li, L. Fang, T. Lu and J. M. Bioucas-Dias, Non- local Sparse Tensor Factorization for Semiblind Hyperspectral and Multispectral Image Fusion, IEEE Transactions on Cyber- netics 50 (10) (2020) 4469–4480.
[37] X. Meng, H. Shen, Q. Yuan, H. Li, L. Zhang and W. Sun, Pan-