Predicting High or Low Transfer Efficiency of Photovoltaic Systems Using a Novel Hybrid Methodology Combining Rough Set Theory, Data Envelopment Analysis and Genetic Programming


Energies 2012, 5, 545-560; doi:10.3390/en5030545


ISSN 1996-1073 www.mdpi.com/journal/energies Article


Yi-Shian Lee 1,* and Lee-Ing Tong 2

1 Research Center for Psychological and Educational Testing, National Taiwan Normal University, HePing East Rd., Section 1, Taipei 106, Taiwan

2 Department of Industrial Engineering Management, National Chiao Tung University, 1001 Ta-Hsueh Rd., Hsinchu 300, Taiwan; E-Mail: [email protected]

* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +886-2-23683967 (ext. 15).

Received: 24 January 2012; in revised form: 13 February 2012 / Accepted: 20 February 2012 / Published: 27 February 2012

Abstract: Solar energy has become an important energy source in recent years as it generates less pollution than other energy sources. A photovoltaic (PV) system, which typically has many components, converts solar energy into electrical energy. With the development of advanced engineering technologies, the transfer efficiency of a PV system has been increased from low to high. The combination of components in a PV system influences its transfer efficiency. Therefore, when predicting the transfer efficiency of a PV system, one must consider the relationship among system components. This work accurately predicts whether the transfer efficiency of a PV system is high or low using a novel hybrid model that combines rough set theory (RST), data envelopment analysis (DEA), and genetic programming (GP). Finally, a real data-set is utilized to demonstrate the accuracy of the proposed method.

Keywords: photovoltaic systems; rough set theory; data envelopment analysis; genetic programming; hybrid model

1. Introduction

Although traditional energy resources, such as oil and coal, account for the largest proportion of energy worldwide, they also produce more pollution than solar energy. As environmental awareness and the need to reduce pollution have increased, solar energy has become an important energy source in industrialized countries. Photovoltaic (PV) systems convert solar energy into electrical energy. However, PV systems are not yet popular and their transfer efficiency must be improved. Hence, engineers have used various combinations of system components to increase the transfer efficiency of PV systems. Generally, the transfer efficiency of a PV system is only 6–20% [1]. According to the opinions of PV energy experts in Taiwan, a transfer efficiency exceeding 9% is considered high and one of 9% or less is considered low [2]. Engineers or energy managers must judge which category a PV system belongs to; thus, a reliable prediction model is needed to determine whether the transfer efficiency of a PV system is high or low. Using the prediction model, managers or decision-makers in the PV field will then be able to identify the critical components and improve transfer efficiencies. Thus, this work develops a novel and efficient prediction model to determine whether the transfer efficiency of a PV system is high or low.

In applications of discriminating models, most studies utilized different approaches to construct an effective prediction model [3–7]. These models were constructed using conventional statistical methods, such as discriminant analysis and logistic regression, or artificial intelligence (AI) methods, such as artificial neural networks (ANNs) and support vector machines (SVMs). Ong et al. [8] demonstrated that a discriminating model constructed using an ANN-based method is more accurate than a model constructed using traditional statistical methods, especially when data-sets are non-linear. However, ANN-based discriminating models have poor prediction accuracy when applied to small samples or when input variables are irrelevant [9]. Additionally, hidden layers in an ANN are difficult to explain, and the relationship between input variables and output variables in an ANN or SVM cannot be expressed by a mathematical equation. Genetic programming (GP) has recently been applied in many fields to construct classification or prediction models. Since GP does not require any assumptions about the relationships between dependent and independent variables to construct a prediction model [10], GP can be applied to both small and large samples [8]. In some applications, GP has better prediction accuracy than ANN-based methods. For example, Ong et al. [8] utilized GP to construct a more satisfactory credit scoring model than an ANN model; Muttil and Lee [11] utilized GP to predict coastal algal blooms and claimed that GP yields a more effective prediction model than an ANN in their analytical case. In prediction or classification applications, GP can be used to construct a mathematical equation [10–12]. Moreover, a comparison of the performance of classification models indicated that GP outperforms conventional statistical methods and ANNs [13].

Measuring and monitoring energy efficiency have become important issues in many fields [14]. Some studies have utilized data envelopment analysis (DEA) to assess energy efficiency. For instance, Boyd and Pang [15] examined the relationship between productivity and energy intensity utilizing DEA to assess productivity. Hu and Kao [16] developed an energy efficiency index utilizing DEA. This index is used to determine the energy-saving target ratio (ESTR) for seventeen APEC countries.

Based on the importance of energy efficiency and the ability of DEA to determine the ratio between input and output variables, this work adopts DEA to evaluate the input/output efficiency of PV systems using multiple inputs, such as texture type, selection of a PV module, and PV module capacity, and one output (transfer efficiency of PV systems).

Moreover, identifying significant input variables is important when constructing an effective prediction model. Many conventional methods, such as correlation analysis, have been utilized to identify the significant input variables for predicting the output variable. However, such methods are restricted by assumptions such as linear relationships among variables and normality, and they require large data-sets. Thus, a technique that exposes the knowledge system contained in a data-set and provides clear attribute selection under different classes is desirable. Rough set theory (RST) can be utilized as a soft computing tool to deal with data-sets with poor information and to remove irrelevant attributes from a data-set [17]. Notably, RST has been applied in many real-world classification problems [18–20].

To construct an efficient prediction model that determines whether the transfer efficiency of a PV system is high or low, this work uses input/output efficiency of a PV system as the predictive variable and enhances prediction accuracy using a novel hybrid model combining RST with GP; this model is called the RST-GP model. Because of its robust reliability in knowledge systems, RST is utilized during the first stage to identify significant input variables. During the second stage, significant independent variables obtained from RST are utilized as input variables for GP to construct a prediction model that can determine whether the transfer efficiency of a PV system is high or low. The remainder of this paper is organized as follows. Section 2 reviews the PV system literature. Section 3 briefly reviews the DEA model used to evaluate the input/output efficiency of a PV system. Section 4 describes RST and GP. Section 5 elucidates the proposed hybrid model. Section 6 analyzes and compares the outcomes of the proposed and existing hybrid models. Section 7 gives conclusions.

2. An Overview of PV Systems

This section introduces the structure of a PV system and the factors influencing its transfer efficiency.

2.1. PV System

A PV system primarily consists of a solar cell, an electrical conditioner, an inverter, and a system controller. A PV system uses an inverter to transform light energy into electrical energy. Figure 1 shows the PV system process.

2.2. Factors Influencing PV Systems

Gregg et al. [22] noted that numerous complex factors influence the efficiency of PV systems. These factors can be classified as internal or external factors. Internal factors include PV system texture, the azimuthal angle, transformation of the PV inverter, and selection of direct current (DC) voltage and an inverter. Among the internal factors, PV system texture is the most important influence on PV system transfer efficiency. Single-crystal and polycrystalline textures are common in PV systems. The azimuthal angle of a PV system is the angle at which most light is received in the absence of obstacles; thus, the azimuthal angle varies with PV system location. The PV inverter transforms light energy into electrical energy. Selection of DC voltage and the inverter both influence PV system transfer efficiency. However, in the real world, the transformation of light energy into electrical energy is affected by dynamic changes in sunshine. Accordingly, the optimal transfer efficiency of a PV inverter cannot be attained in practice.

The two major external factors are described as follows: first, the amount of solar radiation strongly influences PV system transfer efficiency. Thus, the degree of solar radiation must also be considered when determining PV system transfer efficiency. Second, the temperature of a PV system affects the amount of electrical energy converted from light energy. Thus, determining the optimal temperature in a real environment is a major goal for energy experts.

2.3. Evaluating PV System Transfer Efficiency

Transfer efficiency of a PV system is the percentage of energy converted from light energy. The transfer efficiency formula is:

$$\text{Transfer efficiency (\%)} = \frac{P_{mou}}{P_{in}} \times 100\% \qquad (1)$$

where $P_{mou}$ is the maximum output electrical energy, and $P_{in}$ is the input light energy. As transfer efficiency increases, the amount of energy a PV system generates increases.
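As a minimal sketch of how Equation (1) and the 9% threshold cited in the Introduction could be applied together, the following Python fragment computes a transfer efficiency and labels it high or low. The function names and the sample readings are hypothetical and serve only as an illustration; they are not taken from the data-set analyzed in this work.

```python
def transfer_efficiency(p_out_max: float, p_in: float) -> float:
    """Return transfer efficiency in percent, following Equation (1)."""
    if p_in <= 0:
        raise ValueError("input light energy must be positive")
    return p_out_max / p_in * 100.0


def efficiency_class(efficiency_pct: float, threshold_pct: float = 9.0) -> str:
    """Label efficiency as 'high' if it exceeds the threshold, else 'low'."""
    return "high" if efficiency_pct > threshold_pct else "low"


# Hypothetical readings (input and output in the same energy units).
eff = transfer_efficiency(p_out_max=120.0, p_in=1000.0)
label = efficiency_class(eff)
```

A value exactly equal to the threshold is labeled low, which matches the "exceeding 9%" wording of the classification rule.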

3. Using DEA to Determine Efficiencies

Notably, DEA is a linear programming (LP)-based technique for evaluating decision-making units (DMUs) and deals with many decision-making problems by converting multiple output and input variables into a single comprehensive performance measure [23]. DEA is an extensively utilized non-parametric data analysis technique. For instance, Hu and Kao [16] utilized DEA to construct an energy efficiency index. This index is used to determine the energy-saving target ratio (ESTR) for seventeen Asia-Pacific Economic Cooperation (APEC) countries. Tsai et al. [23] applied DEA with other measures to assess the magnitude of performance differences between leading telecom carriers. Guo and Tanaka [24] utilized a fuzzy DEA model to solve an efficiency evaluation problem with given fuzzy input and output data. Wu et al. [25] used the DEA-neural network approach to evaluate branch efficiency for a large Canadian bank. Additional detailed descriptions of DEA can be found elsewhere [26–28].

DEA, developed by Charnes, Cooper, and Rhodes (CCR) [28], was based on Farrell’s (1957) pioneering study of efficiency measures (relative efficiency or productivity of a specific DMU) [29]. Suppose data for each DMU, $j = 1, 2, \ldots, n$, comprise $q$ positive outputs, $y_{rj}$, $r = 1, 2, \ldots, q$, and $p$ positive inputs, $x_{ij}$, $i = 1, 2, \ldots, p$. Let $h_o$ ($o = 1, 2, \ldots, n$) be the relative efficiency of the DMU under evaluation, which is to be maximized. The DEA model is displayed as an LP as follows:

$$\text{Maximize} \quad h_o = \frac{\sum_{r=1}^{q} u_{ro} y_{ro}}{\sum_{i=1}^{p} v_{io} x_{io}} \qquad (2)$$

$$\text{Subject to} \quad \frac{\sum_{r=1}^{q} u_{ro} y_{rj}}{\sum_{i=1}^{p} v_{io} x_{ij}} \leq 1, \quad j = 1, 2, \ldots, n; \qquad u_{ro}, v_{io} \geq 0; \quad i = 1, 2, \ldots, p; \quad r = 1, 2, \ldots, q$$

where $u_{ro}$ and $v_{io}$ are the variable weights given to the $r$th output and $i$th input of the $o$th DMU, respectively. Furthermore, $u_{ro}$ and $v_{io}$ are the decision variables of the LP model used to determine the relative efficiency of DMU$_o$. Obviously, the maximum value (efficiency score), $h_o$, cannot exceed 1. If $h_o = 1$, DMU$_o$ lies on the constant returns to scale (CRS) frontier [30]. There are two CCR models in practice. One minimizes input variables, and the other maximizes output variables. In this work, in order to obtain maximum energy efficiency, the output-maximizing CCR model is utilized to obtain the optimal value of the objective function, $h_o$.
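To make the efficiency score concrete, consider the special case of one input and one output. There, maximizing $h_o$ in Equation (2) reduces to dividing each DMU's output/input ratio by the best ratio observed among all DMUs. The sketch below illustrates only this special case; the general multi-input model used in this work requires an LP solver. The function name and the data are hypothetical.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR efficiency scores for DMUs with one input and one output.

    With a single input and output, maximizing h_o in Equation (2) reduces
    to dividing each DMU's output/input ratio by the best ratio observed.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("DEA data must be strictly positive")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: module capacity (input) and transfer efficiency (output).
capacity = [2.0, 4.0, 5.0]
efficiency = [10.0, 16.0, 15.0]
scores = ccr_efficiency_single(capacity, efficiency)
```

The first DMU has the best ratio and therefore scores 1, placing it on the CRS frontier; the others score below 1 in proportion to their ratios.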

4. Rough Set Theory and Genetic Programming

This section reviews the basic concepts of RST and GP.

4.1. Basic Concepts of Rough Set Theory

Pawlak [31] developed RST as a data-mining approach in 1982. RST has proved effective for data-sets with poor information or ambiguity and it can be applied in many fields [32–34]. Walczak and Massart [35] provided a detailed description of RST.

An information system can be represented as $S = (U, R, V, f)$, where $U$ is the universe (a finite set of objects, $U = \{x_1, x_2, \ldots, x_n\}$), $R$ is a finite set of attributes (features and variables), $V = \bigcup_{r \in R} V_r$, where $V_r$ is the domain of attribute $r$, and $f: U \times R \rightarrow V$ is an information function such that $f(x, r) \in V_r$ for all $r \in R$ and $x \in U$. [...] extracting decision rules. Let $P \subseteq R$ and $X \subseteq U$; the lower approximation of $X$ in $S$ by $P$ is denoted as $\underline{P}X$, and the upper approximation of $X$ in $S$ by $P$ is denoted as $\overline{P}X$. They are derived as follows:

$$\underline{P}X = \{x \in U \mid Ind(R) \subset X\} \qquad (3)$$

$$\overline{P}X = \{x \in U \mid U/Ind(R) \cap X \neq \varnothing\} \qquad (4)$$

where:

$$U/Ind(R) = \{(x_i, x_j) \in U \times U,\ f(x_i, r) = f(x_j, r),\ \forall r \in R\} \qquad (5)$$

From Equations (3) and (4), the boundary can be represented as follows:

$$PN_P(X) = \overline{P}X - \underline{P}X \qquad (6)$$
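The approximations and the boundary can be illustrated on a toy information table. The sketch below assumes the standard equivalence-class reading of the definitions: objects that agree on every selected attribute form one indiscernibility class, the lower approximation collects classes contained in the target set, and the upper approximation collects classes that overlap it. The attribute names and data are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids that share the same values on the given attributes."""
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        groups[key].add(obj)
    return list(groups.values())


def approximations(table, attributes, target):
    """Return (lower, upper, boundary) approximations of the target set."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:      # block lies entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper, upper - lower


# Hypothetical toy table: two condition attributes per PV system.
table = {
    "pv1": {"texture": "single", "inverters": 1},
    "pv2": {"texture": "single", "inverters": 1},
    "pv3": {"texture": "poly", "inverters": 2},
    "pv4": {"texture": "poly", "inverters": 1},
}
high = {"pv1", "pv3"}  # systems assumed to have high transfer efficiency
lower, upper, boundary = approximations(table, ["texture", "inverters"], high)
```

Here pv1 and pv2 are indiscernible but only pv1 belongs to the target set, so both fall in the boundary region, whereas pv3 is classified with certainty.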

Hence, reducts can be obtained utilizing approximation spaces. Given an information system $S = (U, R)$, the reduct $RED(P)$ is the minimal set of attributes $P \subseteq R$ such that $r_P(U) = r_R(U)$, where:

$$r_P(U) = \frac{\sum_{i=1}^{n} card(\underline{P}(X_i))}{card(U)} = \frac{\sum_{i=1}^{n} |\underline{P}(X_i)|}{|U|} \qquad (7)$$

where $r_P(U)$ is the ratio of all $P$-correctly classified objects to all objects ($U$) in the system. Furthermore, the core is common to all reducts. For instance, $COR(P)$ is the core of $P$ when $COR(P) = \cap RED(P)$. Reduction is a feature subset selection process. The selected feature subset retains its explanation ability and has minimal redundancy [36]. Core analysis results can be represented as a reference of important attributes in a knowledge system. Several RST-based reduction and feature-selection algorithms have been developed. For instance, Wen et al. [37] applied RST and a grey model to analyze the factors influencing gas breakdown. Li et al. [38] developed a grey-based rough set approach to solve a supplier-selection problem. Thangavel and Pethalakshmi [39] reviewed studies using RST-based feature selection.

4.2. Genetic Programming

Koza [40] developed GP as a novel algorithm that evolves computer programs to solve model structure identification problems and to perform symbolic regression [41]. The basic concepts of GP resemble those of genetic algorithms (GAs), and include mutation, crossover, and reproduction [10]. Unlike GAs, GP uses a generic parse-tree representation in place of the binary strings (0 and 1) that encode the genetic state. Additionally, GP can construct an optimal forecasting equation through symbolic regression. The main advantage of symbolic regression is that it is not limited to any functional form or normality assumption for data-sets. For instance, GP is more flexible in its symbolic setting than a conventional regression method or a data-mining approach (e.g., an ANN). Notably, GP is also widely utilized in practical applications such as forecasting [10–12,42] and classification [8,43].

Functions or statements in GP comprise operators ({+, −, ×, ÷, log, exp}) and trigonometric functions ({sin, cos, tan}). Hence, a GP parse tree (Figure 2) can represent a simple example: cos[8x] − sec[5y]. When selecting input variables, GP automatically finds the variables that contribute most to the model [11,42] and, unlike an ANN, does not require a large data-set [8,43].

Figure 2. Example of GP parse tree representation.
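The parse-tree idea can be sketched in a few lines of Python. The nested-tuple encoding below is an illustrative assumption, not the representation used by any particular GP package; each internal node holds an operator and its children, and each leaf holds a variable name or a constant. The tree encodes the example expression cos[8x] − sec[5y].

```python
import math

# Each node is either a variable name, a numeric constant, or a tuple
# (operator, child, ...). This tuple encoding is an illustrative choice.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "cos": math.cos,
    "sec": lambda a: 1.0 / math.cos(a),
}


def evaluate(node, env):
    """Recursively evaluate a parse tree given variable values in env."""
    if isinstance(node, tuple):
        op, *children = node
        return FUNCTIONS[op](*(evaluate(c, env) for c in children))
    if isinstance(node, str):
        return env[node]
    return float(node)


# Parse tree for cos(8x) - sec(5y).
tree = ("-", ("cos", ("*", 8, "x")), ("sec", ("*", 5, "y")))
value = evaluate(tree, {"x": 0.0, "y": 0.0})
```

Crossover and mutation in GP operate on such trees by swapping or replacing subtrees, which is why the evolved model can always be read back as an explicit equation.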

5. The Proposed Hybrid Prediction Model

This work develops a four-step procedure for predicting whether the transfer efficiency of a PV system is high or low. The proposed prediction model is as follows:

Step 1: Collect transfer efficiencies of PV systems with various component combinations. These components are independent variables and transfer efficiency is a binary output variable (i.e., high or low) in the proposed prediction model.

Step 2: RST selects the significant independent variables of a PV system based on its robust reliability in knowledge systems [36–39]. The importance of a feature in RST-based feature selection (i.e., core analysis) can be expressed as follows [44]:

$$\sigma_{(C,D)}(a) = \frac{\gamma_C(D) - \gamma_{C-\{a\}}(D)}{\gamma_C(D)} = 1 - \frac{\gamma_{C-\{a\}}(D)}{\gamma_C(D)} \qquad (8)$$

where $\gamma_C(D)$ denotes the degree of dependence between the conditional features $C$ (the variables of PV systems) and the decision feature $D$ (i.e., high or low PV transfer efficiency); $\gamma_{C-\{a\}}(D)$ denotes the degree of dependence between $C$ with a conditional feature $a$ removed and the decision feature $D$; and $\sigma_{(C,D)}(a)$ denotes the change in the degree of dependence when $a$ is removed from the full set of conditional features $C$. When $\sigma_{(C,D)}(a)$ is large, feature $a$ strongly affects the decision attribute $D$.
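A minimal sketch of Equation (8) follows. It assumes the usual rough-set definition of the degree of dependence, namely the share of objects whose condition-attribute values determine the decision value unambiguously. The feature names and the four-row data-set are hypothetical and chosen so that the outcome is easy to verify by hand.

```python
from collections import defaultdict


def dependency(rows, condition, decision):
    """gamma_C(D): share of objects whose condition values fix the decision."""
    decisions_by_key = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in condition)
        decisions_by_key[key].add(row[decision])
        counts[key] += 1
    consistent = sum(n for k, n in counts.items() if len(decisions_by_key[k]) == 1)
    return consistent / len(rows)


def significance(rows, condition, decision, feature):
    """sigma_(C,D)(a) = 1 - gamma_{C-{a}}(D) / gamma_C(D), as in Equation (8)."""
    full = dependency(rows, condition, decision)
    if full == 0:
        raise ValueError("gamma_C(D) is zero; significance is undefined")
    reduced = dependency(rows, [a for a in condition if a != feature], decision)
    return 1.0 - reduced / full


# Hypothetical toy data; 'eff' is the decision feature (1 = high, 0 = low).
rows = [
    {"texture": "single", "inverters": 1, "eff": 1},
    {"texture": "single", "inverters": 2, "eff": 1},
    {"texture": "poly", "inverters": 1, "eff": 0},
    {"texture": "poly", "inverters": 2, "eff": 0},
]
features = ["texture", "inverters"]
sig = {a: significance(rows, features, "eff", a) for a in features}
```

In this toy case the texture alone determines the decision, so removing it drops the dependence to zero (significance 1), whereas removing the number of inverters changes nothing (significance 0).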

Step 3: The DEA evaluates energy efficiency (i.e., the input/output ratio) of a PV system. The input variables in DEA are obtained in Step 2 and the output variable in DEA is transfer efficiency of a PV system. The DMU values obtained from DEA represent energy efficiencies of PV systems.

Step 4: GP constructs a classification model for predicting whether transfer efficiency of a PV system is high or low. For the GP model, this work utilizes the significant independent variables obtained in Step 2 and the input/output ratio obtained in Step 3 as input variables of GP and binary transfer efficiency (i.e., high or low) of a PV system is the output variable of GP. Table 1 presents parameter settings of the GP model. The parameters of GP are obtained by trial-and-error approach.

Table 1. The settings of GP model.

| Items | Content |
|---|---|
| Population size | 400 |
| Maximum number of generation | 1000 |
| Function set | +, −, ×, ÷, sin, cos, exp, log constant |
| Crossover rate | 0.8 |
| Mutation rate | 0.02 |

In Step 2, RST is utilized to select the significant independent variables of PV systems because adopting significant independent variables can yield good accuracy when constructing a prediction model [36]. Moreover, RST can deal with small data-sets and requires no statistical assumptions (such as a linear relationship between the input variables and the output variable). In Step 3, DEA is utilized to evaluate the energy efficiency of PV systems because this index (energy efficiency of PV systems) provides sufficient information for evaluating the economic value of PV systems. In Step 4, GP is utilized to construct a prediction model because of its high performance in forecasting and classification. Furthermore, GP yields good forecasts using only small data-sets [42]. Hence, RST, DEA, and GP are integrated herein to predict the high or low transfer efficiency of PV systems, and the model thus developed is called the RST-DEA-GP model.

6. Empirical Analysis

A real data-set of transfer efficiency of PV systems collected from a Taiwanese research organization is utilized to demonstrate the effectiveness of the proposed model. The data used in Step 1 concern 38 PV systems. Each PV system is described by 18 variables (e.g., texture type, capacity for PV-transfer, and number of inverters) and a binary transfer efficiency (i.e., low or high). The low and high transfer efficiencies of the PV systems are coded as 0 and 1, respectively. Of the 38 PV systems in the data-set, 15 have low and 23 have high transfer efficiencies.

Table 2. Selected significant variables from RST and DMU variable from DEA.

| Variables | Description | Importance (obtained from RST) |
|---|---|---|
| X1 | Texture type | 0.6424 |
| X2 | The output power of inverter | 0.5715 |
| X3 | The selection of PV module | 0.4817 |
| X4 | The number of inverter | 0.3914 |
| X5 | The weights of PV module | 0.3367 |
| X6 | The selection of inverter | 0.2893 |
| X7 | PV module capacity | 0.2567 |
| X8 | The selection of DC voltage | 0.2638 |
| X9 | The location of PV setting | 0.2476 |

In Step 2 of the proposed hybrid model, RST is utilized to identify significant independent variables of PV systems. The RST algorithm can be constructed using MATLAB software. The RST results indicate that nine independent variables (X1–X9) are significant (Table 2) because their importance values are greater than 0.2. No clear criterion exists for determining the threshold (importance) value. Moreover, the nine independent variables (X1–X9) are highly correlated with the output variable (the low or high transfer efficiencies of PV systems); the correlation coefficients are greater than 0.6. Also, based on the opinion of experts in PV energy in Taiwan, these nine variables strongly influence the transfer efficiency of PV systems.

In Step 3, DEA is utilized to evaluate the DMU value of each PV system. Table 2 shows the DMU value (X10). In applying DEA, the input variables are the nine significant variables obtained in Step 2 and the output variable is PV system transfer efficiency. The DEA algorithm can be executed by LINGO software. Table 3 lists the DMU values of the PV systems. In Step 4, the significant independent variables obtained in Step 2 and the DMU value obtained in Step 3 are utilized as input variables for GP to predict the high or low level of PV system transfer efficiency. To demonstrate the effectiveness of the proposed hybrid model, some basic classification models such as K Nearest Neighbor (KNN), Naive Bayes (NB), SVM, ANN, and GP are utilized as benchmark models. These basic classification models are data-mining techniques and can obtain better prediction performance than traditional linear statistical methods (e.g., linear regression) [8,10].

Table 3. The results of DMU value of each PV system by utilizing DEA.

| No | DMU | No | DMU |
|---|---|---|---|
| PV001 | 1.0000 | PV023 | 0.7735 |
| PV002 | 0.9482 | PV024 | 0.8059 |
| PV003 | 0.9879 | PV025 | 1.0000 |
| PV004 | 0.8392 | PV026 | 1.0000 |
| PV005 | 1.0000 | PV027 | 1.0000 |
| PV006 | 1.0000 | PV028 | 1.0000 |
| PV007 | 1.0000 | PV029 | 1.0000 |
| PV008 | 1.0000 | PV030 | 0.6981 |
| PV009 | 1.0000 | PV031 | 0.6417 |
| PV010 | 0.6902 | PV032 | 0.6608 |
| PV011 | 0.9215 | PV033 | 0.4919 |
| PV012 | 0.5153 | PV034 | 1.0000 |
| PV013 | 0.4955 | PV035 | 0.8274 |
| PV014 | 0.9667 | PV036 | 0.4947 |
| PV015 | 0.7484 | PV037 | 0.8405 |
| PV016 | 1.0000 | PV038 | 0.9944 |
| PV017 | 0.6144 | | |
| PV018 | 0.8630 | | |
| PV019 | 1.0000 | | |
| PV020 | 0.8630 | | |
| PV021 | 1.0000 | | |
| PV022 | 0.8832 | | |

Although a previous study [36] has also adopted a hybrid classification model that combines RST, DEA, and SVM to predict business failures, the RST stage of its methodology did not specify a clear equation for obtaining the important variables. That study [36] only adopted the RSES software tool [45] to select important variables. Furthermore, the SVM model performs well only with large data-sets, and collecting large data-sets for PV systems is difficult. Hence, the use of a classification model suitable for small data-sets is important for constructing a high-precision prediction model. To compare the accuracy of the hybrid prediction model with and without DEA, this work designs experiments for the prediction models. The proposed model, named the RST-DEA-GP model, adopts the significant variables obtained by RST and the DMU variable obtained from DEA as input variables for GP (model I). The RST-GP model adopts only the significant variables, X1–X9, as the input variables for GP (model II). For both models I and II, this work adopts leave-one-out cross validation to test the accuracy of the prediction model.
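Leave-one-out cross validation suits a data-set of only 38 systems because every sample is used once for testing while the remaining samples are used for training. The sketch below illustrates the procedure only. A one-nearest-neighbor classifier stands in for GP purely to keep the example short, and the six samples are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Predict the label of the closest training point (squared Euclidean)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


def leave_one_out_accuracy(xs, ys, predict):
    """Hold out each sample once, fit on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature samples; label 1 = high, 0 = low efficiency.
xs = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
ys = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(xs, ys, nearest_neighbor_predict)
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed in, so the same loop would apply to a GP-based model.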

Tables 4 and 5 show the analytical results for hybrid models I and II, respectively. Model I has an average correct classification rate of 92.10%, and that of model II is 84.21%. Hence, adding DEA provides more information than adopting significant input variables only and enhances prediction model accuracy.

Table 4. RST-DEA-GP model (model I) results with both significant variables and DMU.

| Actual class | Classified class: 1 (High-level) | Classified class: 2 (Low-level) |
|---|---|---|
| 1 (High-level) | 22 (95.65%) | 1 (4.35%) |
| 2 (Low-level) | 2 (13.33%) | 13 (86.67%) |

Average correct classification rate: 92.10%.

Table 5. RST-GP model (model II) results with only significant variables.

| Actual class | Classified class: 1 (High-level) | Classified class: 2 (Low-level) |
|---|---|---|
| 1 (High-level) | 21 (91.30%) | 2 (8.70%) |
| 2 (Low-level) | 4 (26.67%) | 11 (73.33%) |
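The reported rates can be checked directly from the counts in Tables 4 and 5. The short sketch below recomputes the overall correct classification rate and the per-class rates from each confusion matrix; the function name is illustrative. With 38 systems, model I classifies 35 correctly (about 92.1%) and model II classifies 32 correctly (about 84.2%), in agreement with the figures stated above.

```python
def classification_rates(matrix):
    """Return overall accuracy and per-class recall from a confusion matrix.

    matrix[i][j] counts objects of actual class i classified as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_class = [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]
    return correct / total, per_class


# Counts taken from Tables 4 and 5 (rows: actual high, actual low).
model_i = [[22, 1], [2, 13]]
model_ii = [[21, 2], [4, 11]]
acc_i, recall_i = classification_rates(model_i)
acc_ii, recall_ii = classification_rates(model_ii)
```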

The RST-SVM-based models are also utilized to predict whether PV systems have high or low transfer efficiency. The RST-DEA-SVM model uses both the significant variables obtained from RST and the DMU value as input variables of SVM (model III). The RST-SVM model, which utilizes only the significant attributes, is model IV. In constructing the SVM model, this work utilizes STATISTICA software to generate a classification model. Some studies [46,47] utilized the Gaussian kernel function to enhance prediction performance. For the SVM model, the parameter settings are the Gaussian kernel function, C = 3, and r = 0.129, which generate an appropriate prediction model. Tables 6 and 7 present the confusion matrices for models III and IV, respectively. Based on the results of the RST-SVM-based models, adding DEA improves the correct classification rate from 78.94% to 81.57%.

Table 6. RST-DEA-SVM model (model III) results with significant variables and DMU.

2 (Low-Level) 4 (26.67%) 11 (73.33%)

Table 7. RST-SVM model (model IV) results with only significant variables.