Neuro-Fuzzy Cost Estimation Model Enhanced by Fast Messy Genetic Algorithms for Semiconductor Hookup Construction


Fan-Yi Hsiao

Department of Civil Engineering, National Chiao Tung University, Hsin-Chu, Taiwan

Shih-Hsu Wang

Department of Civil Engineering, R.O.C. Military Academy, Kaohsiung, Taiwan

Wei-Chih Wang∗

Department of Civil Engineering, National Chiao Tung University, Hsin-Chu, Taiwan

Chao-Pao Wen

Program of Engineering Technology and Management, College of Engineering, National Chiao Tung University, Hsin-Chu, Taiwan

&

Wen-Der Yu

Department of Construction Management, Chung Hua University, Hsin-Chu, Taiwan

Abstract: Semiconductor hookup construction (i.e., constructing process tool piping systems) is critical to the completion of a semiconductor fabrication plant. During the conceptual project phase, it is difficult to produce an accurate cost estimate because of the large number of uncertain cost items. This study proposes a new model for estimating semiconductor hookup construction project costs. The developed model, called FALCON-COST, integrates the component ratios method, the fuzzy adaptive learning control network (FALCON), the fast messy genetic algorithm (fmGA), and the three-point cost estimation method to systematically deal with a cost-estimating environment involving limited and uncertain data. In addition, the proposed model improves the current FALCON by devising a new algorithm to conduct building block selection and random gene deletion so that fmGA operations can be implemented in FALCON. The results of 54 case studies demonstrate that the proposed model has an estimation accuracy of 83.82%, making it approximately 22.74%, 23.08%, and 21.95% more accurate than the conventional average cost method, the component ratios method, and the modified FALCON-COST method, respectively. Providing project managers with reliable cost estimates is essential for effectively controlling project costs.

∗To whom correspondence should be addressed. E-mail: weichih@mail.nctu.edu.tw.

1 INTRODUCTION

High-technology semiconductor fabrication has been an essential part of Taiwan’s national economic growth in past decades. Numerous semiconductor fabrication plants (or fabs) have supported this development (Liu et al., 2010). Semiconductor hookup construction plays a critical role in fab completion (Hong, 2001; Beer et al., 2008). During the conceptual phase of a semiconductor hookup construction project (i.e., constructing process tool piping systems for a process module; see Section 2 for further illustration), an accurate cost estimate is desirable for effective cost control. In practice, because semiconductor plants must be constructed rapidly, the plant management team may not yet know the semiconductor process and/or the detailed requirements (such as the needed diameters and quantities of pipes) for a process module. Even if the process is confirmed, the originally planned process may be altered after the plant is completed because of rapid advances and changes in semiconductor processing technologies. Consequently, the costs of a semiconductor hookup construction project must be estimated in an environment of uncertain data.

© 2012 Computer-Aided Civil and Infrastructure Engineering. DOI: 10.1111/j.1467-8667.2012.00786.x

The conventional methods used to estimate semiconductor hookup construction project costs either take the average costs from historical data or rely heavily on the intuition of experienced estimators. Both methods perform poorly, often producing a large gap between the estimated cost and the final project cost. Underestimated costs lead to poorly allocated budget execution, whereas overestimated costs crowd out other jobs required to complete a fab (Wu, 2007; Wen, 2010). Therefore, an improved cost estimation method for semiconductor hookup construction projects is needed to effectively control the total fab cost.

Artificial intelligence technologies, such as neural networks (NNs) (Ahmadlou and Adeli, 2010; Sedano et al., 2010), fuzzy logic (FL) (Jiang and Adeli, 2003; Lee and Pinheiro dos Santos, 2011; Ma et al., 2011; Reuter, 2011; Ross et al., 2011), and genetic algorithms (GAs) (Kim and Adeli, 2001; Carro-Calvo et al., 2010; Martínez-Ballesteros et al., 2010; Baraldi et al., 2011), have been widely used in construction engineering and project cost estimation (Creese and Li, 1995; Kim et al., 2005; Yu and Lin, 2006; Cheng et al., 2009). For instance, Creese and Li (1995) developed a back-propagation NN (Hung and Adeli, 1993; Koprinkova-Hriatova, 2010) application for the parametric cost estimation of timber bridges. Boussabaine and Elhag (1997) designed a neuro-fuzzy system (NFS) based on FL for predicting the cost and duration of a construction project. Karim and Adeli (1999a,b) developed the CONSCOM model for construction scheduling, cost optimization, and change-order management using neurocomputing and object technologies (Hung and Adeli, 1994; Adeli and Yu, 1995; Jiang and Adeli, 2004; Zhang et al., 2011). Senouci and Adeli (2001) presented a resource scheduling model using the neural dynamics model of Adeli and Park.

Kim et al. (2005) established a cost approximation model for residential projects using GAs to optimize the parameters and weights of a back-propagation NN. Yu and Lin (2006) combined NN and FL to develop a Variable Attribute Fuzzy Adaptive Learning Control Network (VaFALCON) that was able to handle missing attributes in cost estimation. Cheng et al. (2009) integrated GAs, FL, and NN technologies to establish a cost estimation model with high predictive power for construction costs. Rokni and Fayek (2010) proposed a multicriteria optimization approach for industrial shop scheduling using fuzzy set theory. In general, NNs can learn from past data and generalize solutions for future applications, FL tolerates real-world imprecision and uncertainty, and GAs can be applied to the global optimization of parameters (Cheng et al., 2009).

In a seminal book, Adeli and Hung (1995) advocated the synergistic integration of the three areas of computational intelligence (NN, FL, and GA) and showed how such a multiparadigm approach can help solve complicated pattern recognition problems such as face recognition and engineering design. Since then, many authors have followed this multiparadigm approach, but the great majority have focused on integrating just two paradigms, such as FL and NNs (Gonzalez-Olvera et al., 2010; Li et al., 2010; Scherer, 2010; Theodoridis et al., 2010; Wang et al., 2010; Freitag et al., 2011) or FL and evolutionary computing (Iglesias et al., 2010; Patrinos et al., 2010). This work proposes an innovative model, called FALCON-COST, that uses NN, FL, and GA to estimate semiconductor hookup construction project costs (Wang et al., 2012). To systematically deal with a cost-estimating environment involving limited and uncertain data, the proposed model integrates the component ratios method, the fuzzy adaptive learning control network (FALCON), the fast messy GA (fmGA), and the three-point cost estimation method.

The remainder of this article is organized as follows. Section 2 introduces semiconductor hookup construction and its three cost-estimating characteristics. Section 3 reviews existing cost estimation models. Section 4 explains the techniques adopted by the proposed model. Section 5 gives a general description of the proposed model, and Section 6 presents its details. Section 7 discusses the results of 54 case studies and compares the proposed model with three other methods. Section 8 presents our conclusions and offers recommendations for future research.


Fig. 1. Workflow of main process modules for a semiconductor fab. [Figure labels: material, mask, IC design, and wafer; cleanroom process (semiconductor hookup construction) with the TF, DIFF, CMP, IMP, LITHO, INT, ETCH, and WET modules; inspection, ICs test, and assembly.]

2 SEMICONDUCTOR HOOKUP CONSTRUCTION

In an intensely competitive market environment, completing a fab construction project, from groundbreaking to first wafer production, takes about 12 months (Chasey and Merchant, 2000). Under normal market conditions, project completion takes about 18 months, and hookup construction usually begins in the 11th month and lasts for 3 months (Wu, 2007; Wen, 2010). Figure 1 displays the main production process modules used to manufacture microchips in a semiconductor fab (Hong, 2001; Wu, 2007; Wen, 2010). Eight process modules are related to semiconductor hookup construction: chemical mechanical planarization (CMP), diffusion (DIFF), dry etching (ETCH), integration (INT), lithography (LITHO), implant (IMP), thin film (TF), and wet etching (WET) (Hong, 2001; Wu, 2007; Wen, 2010). The construction of piping systems for a process tool related to a particular process module (or simply a process module tool) is called a semiconductor hookup construction project (or simply a project hereafter). Constructing the piping systems required for hookup is the last job in finishing a semiconductor fab and the first job that requires installing process module tools for production (Hong, 2001). A semiconductor fab construction project therefore involves several hookup construction projects to install numerous process module tools.

A hookup construction project requires constructing some or all of 11 types of piping systems (corresponding to 11 cost items) (Wen, 2010): (1) bulk gas, (2) special gas, (3) chemical, (4) pumping line, (5) process cooling water (PCW), (6) ultra pure water (UPW), (7) drain, (8) power, (9) exhaust, (10) process vacuum (PV), and (11) foundation. For example, the CMP project (i.e., a project related to the CMP process module tool) mainly involves the bulk gas, UPW, exhaust, and foundation piping systems. A DIFF project mainly involves the bulk gas, special gas, pumping line, and exhaust piping systems.

During the early phase of a hookup construction project, cost estimators are required to provide a best estimate to facilitate effective budget allocation (Wu, 2007; Wen, 2010). At this stage, the type of process module tool is often known. However, the required quantities and specifications (such as pipe diameters) of the piping systems are frequently unknown, primarily because tool requirements (such as installation locations and manufacturing brands) are still being planned (Wen, 2010). Thus, the project cost cannot be estimated using the well-known unit cost method, in which the total cost is the sum of the quantities multiplied by the corresponding unit costs (Hendrickson and Au, 2003). The conventional approach instead uses an average cost method based on historical project cost data: the estimated total cost of a new project is the sum of the average costs of the 11 cost items in historical projects. In addition, estimators sometimes rely simply on their experience to generate the total project cost directly. The accuracy of estimates made with both the average cost method and the experience-based method is often unacceptable, leading to poor budget planning (Wen, 2010).
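To make the contrast between the two methods concrete, the following minimal Python sketch computes a unit-cost estimate (quantities times unit costs) and an average-cost estimate (the sum over cost items of each item's mean historical cost). The function names, item names, and all figures are hypothetical illustrations, not data or code from this study; treating an item missing from a past project as zero cost is also an assumption.

```python
from statistics import mean


def unit_cost_estimate(quantities, unit_costs):
    """Unit cost method: total = sum of (quantity x unit cost) over cost items."""
    return sum(q * unit_costs[item] for item, q in quantities.items())


def average_cost_estimate(historical_projects):
    """Average cost method: total = sum over cost items of the mean historical cost.

    Each project is a dict mapping cost item -> cost; an item absent from a
    project is counted as zero cost for that project (an assumption).
    """
    items = {item for project in historical_projects for item in project}
    return sum(mean(p.get(item, 0.0) for p in historical_projects) for item in items)


# Hypothetical figures for illustration only (not data from this study).
history = [
    {"bulk gas": 100.0, "exhaust": 300.0},
    {"bulk gas": 140.0, "exhaust": 260.0},
    {"bulk gas": 120.0, "exhaust": 340.0},
]
print(average_cost_estimate(history))  # 120.0 + 300.0 -> 420.0
# Needs quantities and unit costs, which are usually unknown at the conceptual stage.
print(unit_cost_estimate({"bulk gas": 10, "exhaust": 4},
                         {"bulk gas": 12.0, "exhaust": 80.0}))  # 120.0 + 320.0 -> 440.0
```

The average-cost estimate ignores everything specific to the new project, which is one way to see why its accuracy is often unacceptable.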

Accurately estimating hookup construction project cost during the conceptual phase is difficult because cost estimators must base their calculations on limited and uncertain cost data. To be effective, cost estimation models must deal with three cost-estimating characteristics. The first characteristic is that only limited data is available for estimating new projects; as indicated earlier, typically only a few piping systems dominate the work in each project. The second characteristic is that the relationships (reasoning rules) between the costs of the dominant piping systems and the total project cost are complicated. The third characteristic is that the details (e.g., quantities and specifications) of the piping systems are uncertain when a new project cost is evaluated, so estimator experience is still required.

3 REVIEW OF COST ESTIMATION MODELS

Accurately estimating costs is an essential task in effectively managing construction projects. Several cost estimation models have been proposed to account for the effects of uncertainties. These recent models involve FL (Sarma and Adeli, 2000, 2002), NNs (Adeli and Wu, 1998; Adeli and Karim, 1997, 2001), simulation (Wang, 2002; Wang et al., 2007; Wang et al., 2008; Sadeghi et al., 2010; Chou, 2011), and other systematic approaches (Diekmann and Featherman, 1998; Oberlender and Trost, 2001).


Numerous conceptual cost estimation methods, such as unit cost, cost indices, cost-capacity factors, component ratios, and parametric estimation, have been designed to quickly compute a total project cost in the construction industry (Sarma and Adeli, 1998; Barrie and Paulson, 1992; Hendrickson and Au, 2003; Hong et al., 2011). Cost indices capture cost changes over time, whereas cost-capacity factors account for changes in the size, scope, or capacity of similar projects (Barrie and Paulson, 1992). Cost-capacity factors reflect how cost increases with size, less than proportionally, as a result of economies of scale.
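As a minimal sketch of the two adjustments just described, the code below scales a known cost by capacity using the textbook cost-capacity relationship C2 = C1 × (Q2/Q1)^x and moves a cost through time with a cost-index ratio. The exponent value 0.6 (the commonly quoted "six-tenths rule"), the function names, and the example figures are illustrative assumptions, not values used in this study.

```python
def cost_capacity_estimate(known_cost, known_capacity, new_capacity, exponent=0.6):
    """Cost-capacity factor: C2 = C1 * (Q2 / Q1) ** x.

    An exponent below 1 models economies of scale. The default 0.6 (the
    commonly quoted "six-tenths rule") is an illustrative assumption only.
    """
    return known_cost * (new_capacity / known_capacity) ** exponent


def index_adjust(cost, index_then, index_now):
    """Cost index adjustment: bring a historical cost to current prices."""
    return cost * index_now / index_then


# Hypothetical example: doubling capacity with x = 0.6 raises cost by about 52%.
scaled = cost_capacity_estimate(1_000_000, 100, 200)
print(round(scaled))  # roughly 1.52 million
print(index_adjust(scaled, index_then=200, index_now=250))
```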

The parametric estimation method has been widely applied. It takes a single parameter (such as floor area, cubic volume, electricity generating capacity, or steel production capacity) to describe a cost function in the screening estimate of a new facility (Hendrickson and Au, 2003; PEH, 2008). A parametric cost estimation method includes one or several cost-estimating relationships between the cost (the dependent variable) and the cost-governing parameters (the independent variables) (Hegazy and Ayed, 1998; PEH, 2008). Cost indices can also be incorporated into the parametric estimation method to reflect cost changes over time (Oberlender, 2000; Barrie and Paulson, 1992).
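A single cost-estimating relationship can be illustrated with a minimal Python sketch that fits cost = a + b × parameter by ordinary least squares and optionally applies a cost-index ratio. The linear form, the function names, and the floor-area data are hypothetical assumptions for illustration; real cost-estimating relationships may be nonlinear or use several parameters.

```python
def fit_linear_cer(parameter_values, costs):
    """Fit cost = a + b * parameter by ordinary least squares (one CER)."""
    n = len(parameter_values)
    if n < 2:
        raise ValueError("need at least two historical projects")
    mean_x = sum(parameter_values) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in parameter_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(parameter_values, costs))
    slope = sxy / sxx  # fails if every project has the same parameter value
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_cost(intercept, slope, parameter_value, index_ratio=1.0):
    """Apply the CER, optionally scaled by a cost-index ratio (now / base year)."""
    return (intercept + slope * parameter_value) * index_ratio


# Hypothetical floor areas (m^2) and costs, for illustration only.
areas = [1000, 2000, 3000, 4000]
costs = [350_000, 500_000, 650_000, 800_000]
a, b = fit_linear_cer(areas, costs)
print(a, b)                      # 200000.0 150.0
print(predict_cost(a, b, 2500))  # 575000.0
```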

Parametric cost estimation methods are often used by both contractors and government bodies in the project planning and budgeting stages (Hegazy and Ayed, 1998). Several parametric estimation methods based on regression analysis and NNs have been suggested to improve the accuracy of conceptual cost estimates (Sonmez, 2008; Gunduz et al., 2011). The regression technique allows a relatively simple analysis of the impact of the parameters on the cost of a project (Lowe et al., 2006). NNs offer an alternative approach to estimating the costs of building projects (Kim et al., 2005) and highway projects (Hegazy and Ayed, 1998).

Generally, current conceptual cost-estimating methods focus on the total project cost (i.e., they usually do not examine cost divisions or cost item details) and generate estimates whose accuracy can vary widely. In addition, current methods have been developed for other project types, such as building, highway, and oil refinery projects. No methods have been developed to capture the three cost-estimating characteristics described above that are encountered in semiconductor hookup cost estimation (Wen, 2010).

4 REVIEW OF RELATED TECHNIQUES

This section reviews the techniques related to the proposed model, including the component ratios method, FALCON, and fmGA.

4.1 Component ratios method

In the component ratios method (also called equipment installation cost ratios, plant cost ratios, or the ratio estimating method), a ratio (or factor) is assumed to exist between the total project cost and the cost of a major cost item (Barrie and Paulson, 1992). Hence, when the cost of the major cost item and the ratio (total project cost divided by major item cost, based on historical data) are known, the total project cost can be calculated by multiplying the major item cost by the ratio (which is greater than 1.0). A variation on this method takes the cost of each major item separately, multiplies each by its own ratio, and then sums the factored items (Barrie and Paulson, 1992).
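Both forms of the component ratios method can be sketched in a few lines of Python. Deriving the ratio as the mean of per-project ratios is an assumption (a pooled ratio of summed totals to summed item costs would be another reasonable choice), and the function names and figures are hypothetical.

```python
def historical_ratio(total_costs, major_item_costs):
    """Ratio = total project cost / major item cost, averaged over past projects.

    Averaging per-project ratios is one simple choice (an assumption); a
    pooled ratio sum(totals) / sum(item costs) would be another.
    """
    ratios = [t / c for t, c in zip(total_costs, major_item_costs)]
    return sum(ratios) / len(ratios)


def single_ratio_estimate(major_item_cost, ratio):
    """Total cost = major item cost x ratio (ratio > 1.0)."""
    return major_item_cost * ratio


def multi_ratio_estimate(item_costs, item_ratios):
    """Variation: factor each major item by its own ratio, then sum."""
    return sum(cost * item_ratios[item] for item, cost in item_costs.items())


# Hypothetical history in which the total is always four times the major item.
r = historical_ratio([400.0, 500.0, 600.0], [100.0, 125.0, 150.0])
print(r, single_ratio_estimate(200.0, r))  # 4.0 800.0
print(multi_ratio_estimate({"exhaust": 300.0, "bulk gas": 100.0},
                           {"exhaust": 1.5, "bulk gas": 2.0}))  # 650.0
```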

Following the component ratios concept, Yu (2006) further developed a principal-item ratio estimation method (PIREM) that uses only a selected 20% of the cost items (called principal items) and their associated principal item ratios to calculate the overall cost. This 20% figure follows the Pareto Optimum Criterion (the "80/20 principle"), which holds that 80% of the overall project cost is determined by 20% of the cost items (Koch, 1997). Unlike the present investigation, Yu (2006) studied public civil construction projects and building projects. Nevertheless, his study suggested that focusing on certain principal costs can not only produce acceptably accurate estimates but also save estimation effort and time. Hence, the 80/20 principle is used here to identify major cost items (to support the component ratios method adopted in this study), with the difference that the sum of the principal item costs does not exactly equal 80% of the total cost. See Section 6.2 for further illustration.
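A Pareto-style selection of principal items can be sketched as follows: rank the cost items by cost and keep the largest ones until they cover a target share of the total, subject to a cap on the number of items. The threshold of 0.7, the cap of four items, the function name, and the cost figures are illustrative assumptions; this study identifies three to four principal items per module, covering roughly 69% to 81% of total cost (Table 2), but does not state a selection threshold.

```python
def select_principal_items(item_costs, target_share=0.7, max_items=4):
    """Rank cost items by cost and keep the largest until they cover target_share.

    target_share and max_items are illustrative assumptions in the spirit of
    the 80/20 principle; they are not parameters reported in this study.
    """
    total = sum(item_costs.values())
    ranked = sorted(item_costs.items(), key=lambda kv: kv[1], reverse=True)
    selected, covered = [], 0.0
    for item, cost in ranked:
        if covered >= target_share or len(selected) >= max_items:
            break
        selected.append(item)
        covered += cost / total
    return selected, covered


# Hypothetical average item costs for one module (they sum to 100).
costs = {"exhaust": 34, "foundation": 17, "UPW": 16, "bulk gas": 9,
         "power": 8, "drain": 6, "PCW": 5, "chemical": 5}
items, share = select_principal_items(costs)
print(items, round(share, 2))  # ['exhaust', 'foundation', 'UPW', 'bulk gas'] 0.76
```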

4.2 FALCON

FALCON, one of the NFSs, is a fuzzy system that uses a learning algorithm derived from NN theory to determine its parameters by processing data samples (Lin and Lee, 1991). Lin and Lee (1991) developed FALCON to solve system control problems in electronics and manufacturing engineering. FALCON has since been used to acquire construction knowledge because of features such as its ability to handle uncertainties and its trace-back functions for problem solving (Yu and Skibniewski, 1999). Furthermore, the FALCON network structure graphically shows how it captures complex IF-THEN reasoning rules (Lin and Lee, 1991). Most importantly, FALCON has been modified to support conceptual cost estimation in building projects (Yu and Lin, 2006; Yu, 2007).

FALCON’s learning ability is based on the Kohonen learning rule and a supervised learning algorithm. The traditional FALCON methodology has no mechanism for rule refinement after Kohonen learning. As indicated by Yu and Skibniewski (1999), many computational experiments have shown that the two learning algorithms in the FALCON methodology (competitive learning for the initial rule connections and back-propagation for fine-tuning the membership functions) encounter severe local optimum problems. A local optimum of a combinatorial optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of solutions, in contrast to a global optimum, which is the optimal solution among all possible solutions.

First, because the FALCON FL rules are determined by the fuzzy partitions of the linguistic terms that the decision maker defines for the input attributes, an enormous number of FL rules results, containing redundant precondition links and unnecessary consequence links. This problem arises essentially from using two separate algorithms in the traditional FALCON method. Using the Kohonen learning rule first to roughly determine the membership functions of the fuzzy linguistic terms may impose an erroneous precondition structure on the FL rules. As a result, the consequence links obtained by the competitive learning rule (based on the previously determined precondition structure) in the unsupervised learning phase of the traditional FALCON may be erroneous. Second, back-propagation is adopted for supervised parameter learning of the fuzzy membership functions (for both the input and output layers) on the primitive fuzzy rules determined in the structure learning. Because these primitive fuzzy rules are erroneous, the supervised parameter learning may easily become trapped in a local optimum (Lin and Lee, 1996; Yu and Skibniewski, 1999).
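The competitive step that roughly positions the membership functions can be illustrated with a simplified one-dimensional, winner-take-all (Kohonen-style) sketch: the nearest center wins each sample and moves toward it, and each term's width is then set from the distance to its nearest neighboring center. The learning rate, number of epochs, initial centers, overlap parameter, width heuristic, and function names are all assumptions here; this is not the full FALCON procedure, but it shows how the initial centers, once fixed, shape every later step.

```python
def competitive_centers(samples, initial_centers, learning_rate=0.1, epochs=50):
    """Winner-take-all (Kohonen-style) learning of 1-D membership centers.

    For each sample, the nearest center wins and moves toward the sample:
    m_w <- m_w + learning_rate * (x - m_w). Losing centers do not move.
    """
    centers = list(initial_centers)
    for _ in range(epochs):
        for x in samples:
            w = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            centers[w] += learning_rate * (x - centers[w])
    return sorted(centers)


def nearest_neighbor_widths(centers, overlap=1.5):
    """Width of each term = distance to its nearest other center / overlap.

    Requires at least two centers.
    """
    widths = []
    for i, m in enumerate(centers):
        nearest = min(abs(m - c) for j, c in enumerate(centers) if j != i)
        widths.append(nearest / overlap)
    return widths


# Hypothetical item costs forming a "low" and a "high" cluster.
costs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers = competitive_centers(costs, initial_centers=[0.0, 6.0])
print([round(c, 2) for c in centers])  # close to 1.0 and 5.0
print(nearest_neighbor_widths(centers))
```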

Because back-propagation provides no mechanism to revise the FL rules once they are determined, the local optimum problem cannot be remedied in the traditional FALCON method (Lin and Lee, 1996). Yu (2007) and Cheng et al. (2009) suggested adopting the messy GA (mGA) and fmGA, respectively, for structure revision and parameter learning in an NFS, of which FALCON is one kind. This study therefore applies the fmGA mutation and cut-splice operators to revise the fuzzy membership functions and FL rules of FALCON to improve cost estimation accuracy.

4.3 fmGA

GAs, originally proposed by Holland (1975), are search algorithms that explore a decision space for optimal solutions based on the mechanics of natural selection and genetics. The use of GAs to solve civil engineering problems dates back at least to 1993 (Adeli and Cheng, 1993, 1994a,b; Adeli and Kumar, 1995a,b). GAs have since been applied in fields such as construction engineering (Cheng and Yan, 2009; Al-Bazi and Dawood, 2010), transportation engineering (Lee and Wei, 2010; Putha et al., 2012), highway engineering (Kang et al., 2009), and structural engineering (Marano et al., 2011).

The messy GA (mGA) was developed to explore each individual gene’s contribution to the fitness value during the evolution process (Goldberg et al., 1989). Goldberg et al. (1993) further developed the fast mGA (fmGA), which, unlike simple GAs that use fixed-length strings to represent possible solutions, uses messy chromosomes to form strings of various lengths.

The fmGA chromosome consists of two parts: the allele locus and the allele value. The allele locus is the serial number (position) of an allele, and the allele value is the value assigned to that position. The major difference between the fmGA and a traditional GA is that the fmGA allows chromosome lengths to vary, and the allele loci and allele values may evolve simultaneously. This variable chromosome length is well suited to FALCON structural revision because the optimum precondition and consequence link structure of the fuzzy rule base may be obtained through the fmGA evolution process.

Four distinct features differentiate the fmGA from the traditional GA (Feng and Wu, 2006): (1) variable-length chromosomes can be adopted; (2) simple cut and splice operators replace the crossover mechanism of the traditional GA; (3) the optimization process contains a primordial phase and a juxtapositional phase; and (4) competitive templates (CTs) are adopted to retain the most outstanding gene building blocks (BBs) in each generation.
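The cut and splice operator can be sketched on messy chromosomes represented as lists of (allele locus, allele value) pairs. This minimal version always cuts both parents at uniformly random points and always splices; published fmGA variants apply cut and splice with probabilities (for example, a cut probability that grows with chromosome length), which this sketch omits. The type alias and function name are hypothetical.

```python
import random
from typing import List, Tuple

Gene = Tuple[int, int]  # (allele locus, allele value)


def cut_and_splice(parent_a: List[Gene], parent_b: List[Gene],
                   rng: random.Random) -> Tuple[List[Gene], List[Gene]]:
    """Cut each messy chromosome at a random point and splice the pieces.

    Offspring lengths differ from the parents', so an offspring may list a
    locus twice (over-specified) or omit loci (under-specified).
    """
    cut_a = rng.randint(0, len(parent_a))  # cut points may fall at either end
    cut_b = rng.randint(0, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


a = [(0, 1), (1, 0), (2, 1)]
b = [(3, 0), (1, 1)]
c1, c2 = cut_and_splice(a, b, random.Random(7))
print(c1, c2)
```

Because genes are only rearranged, never created or destroyed, the two children together always contain exactly the parents' genes.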

Applying the cut-splice operator may produce over- or under-specified chromosomes (Feng and Wu, 2006). If a chromosome is over-specified, the fmGA screens out repeated genes from left to right on a first-come-first-served basis. If a chromosome is under-specified, the fmGA uses the chromosome with the best fitness in the previous generation as the CT and fills in the missing genes from it.
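The two repair rules just described can be sketched as a single expression step that turns a messy chromosome into a complete solution. The function name and example strings are hypothetical; the template stands in for the CT, i.e., the best string of the previous generation.

```python
from typing import List, Sequence, Tuple


def express(chromosome: Sequence[Tuple[int, int]], template: Sequence[int]) -> List[int]:
    """Turn a messy chromosome into a full solution using a competitive template.

    Over-specification: scanning left to right, the first occurrence of a
    locus wins and later duplicates are ignored (first come, first served).
    Under-specification: loci the chromosome omits keep the template's values.
    """
    solution = list(template)
    seen = set()
    for locus, value in chromosome:
        if locus not in seen:
            solution[locus] = value
            seen.add(locus)
    return solution


ct = [0, 0, 0, 0]                # hypothetical best string from the previous generation
messy = [(2, 1), (0, 1), (2, 0)]  # locus 2 appears twice; loci 1 and 3 are missing
print(express(messy, ct))         # [1, 0, 1, 0]
```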

5 GENERAL DESCRIPTION OF THE PROPOSED MODEL

This section provides an overview of the proposed cost-estimating model, called FALCON-COST. The modeling steps are displayed in the left part of Figure 2. The component ratios method, FALCON, and fmGA are integrated to generate the original FALCON-COST, which is trained (or learned) from historical projects (Steps 1–3). The three-point estimation method is then applied to support the trained FALCON-COST in predicting the total cost of a new project (Step 4).

Fig. 2. Proposed model to meet three cost-estimating characteristics. [Figure content, left part: Step 1, identifying principal cost items. Step 2, applying FALCON: (1) transmitting cost data into FALCON; (2) performing fuzzification from input data; (3) building preconditions of fuzzy rules; (4) performing fuzzy AND operation at rule nodes; (5) building consequences of fuzzy rules; (6) performing fuzzification and fuzzy OR operations; (7) transmitting the training cost data and performing defuzzification operation. Step 3, applying fmGA: (1) initialization phase, generating the competitive template (CT); (2) primordial phase with building block filtering; (3) juxtaposition phase with cut and splice and mutation; the string with the optimal value replaces the old CT; an inner loop runs over eras until era ≥ era_max and an outer loop over epochs until epoch ≥ epoch_max. Step 4, using the three-point cost estimating method to estimate a new project. Right part: first characteristic, only limited data is available and some piping systems dominate the project; second characteristic, the relationships between the costs of the dominant cost items and total project cost are complex; third characteristic, details of the dominant cost items are uncertain and estimator experience is still required.]

The proposed model aims to systematically guide cost estimators in dealing with the three cost-estimating characteristics described above (see the right part of Figure 2). The component ratios method reflects the first characteristic by focusing exclusively on the cost items of the dominant piping systems. The FALCON and fmGA methods are used to model the complex relationships between those dominant cost items and total project costs (second characteristic). The three-point estimation method deals with the third characteristic by assessing the uncertainties of the dominant cost items in a new project. The major modeling steps of FALCON-COST are described as follows.

Step 1—Unlike many existing conceptual cost estimation models that center on the total cost, the proposed model predicts total costs by analyzing item-level costs. To suit an environment lacking sufficient and certain cost data, the component ratios method is adopted to identify the major (or principal) cost items used to forecast the total cost of a project. In this study, 241 historical projects are used to identify the principal cost items of the projects for the eight types of process modules.

Step 2—FALCON is used to learn the relationships between the principal item costs (inputs) and the corresponding total cost (output) of each historical project for each type of process module. The main FALCON operations (Lin and Lee, 1991) are calculating membership functions from the network input, performing the "fuzzy AND" operation to determine the fired rules, performing the "fuzzy OR" operation to aggregate the memberships of the linguistic fuzzy cost estimation terms, and finally carrying out defuzzification to derive a total project cost estimate (output value).

Table 1
Training and test project amounts in each process module

Module   Training projects   First set of test projects    Second set of test projects
CMP      15                  2                             2
DIFF     41                  4                             4
ETCH     36                  4                             4
INT      29                  3                             3
LITHO    16                  3                             3
IMP      30                  3                             3
TF       27                  3                             3
WET      47                  5                             5
Total    241                 27                            27

Step 3—To overcome the local optimum problem caused by FALCON, FALCON’s fuzzy membership functions and FL rules are optimized through the fmGA mutation and cut-splice operators to enhance the cost estimation accuracy. After completing this step the training process for the original FALCON-COST is finished.

Step 4—To compute the cost of a new project, the three-point cost estimation method is applied to determine the expected cost of each principal cost item. These expected principal item costs are then used as the inputs to the trained FALCON-COST to generate the total cost of the project.
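The Step 4 preprocessing can be sketched as turning an estimator's optimistic, most likely, and pessimistic judgments for each principal item into one expected cost. The PERT weighting (a + 4m + b)/6 used here is an assumption, since the weighting is not specified at this point (a triangular mean (a + m + b)/3 is a common alternative); the function names and all figures are hypothetical. The resulting dictionary is what would be passed to the trained FALCON-COST network, which is not reproduced here.

```python
def three_point_expected(optimistic, most_likely, pessimistic):
    """Expected value with the PERT weighting (a + 4m + b) / 6 (assumed here)."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most likely <= pessimistic")
    return (optimistic + 4 * most_likely + pessimistic) / 6


def expected_principal_inputs(estimates):
    """Map each principal item to its expected cost, ready for the trained model."""
    return {item: three_point_expected(*abc) for item, abc in estimates.items()}


# Hypothetical estimator judgments (NT$ thousands) for a new CMP project.
judgments = {
    "bulk gas": (120, 140, 190),
    "UPW": (220, 250, 310),
    "exhaust": (480, 550, 680),
    "foundation": (230, 270, 340),
}
inputs = expected_principal_inputs(judgments)
print(inputs)  # {'bulk gas': 145.0, 'UPW': 255.0, 'exhaust': 560.0, 'foundation': 275.0}
```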

6 DETAILS OF THE PROPOSED MODEL

This section illustrates the detailed development of FALCON-COST. Semiconductor hookup construction consists of eight process modules. A separate cost estimation model must be established for each module because the piping systems related to each module vary.

6.1 Historical projects

Two hundred forty-one historical projects related to the same semiconductor plant were used to develop and train FALCON-COST. The left and right parts of Table 1 present the numbers of training and test projects, respectively, for the eight types of process modules.

6.2 Step 1: Identifying principal cost items

Based on the component ratios method, three to four principal cost items are identified for the projects of each process module. Table 2 lists the average cost and percentage of each principal cost item identified in each module. Note that the costs are in New Taiwan dollars (US$1 ≈ NT$30). For instance, four cost items from the 15 historical CMP projects are identified as having the highest cost percentages: 8.7%, 15.7%, 33.7%, and 16.5% for the bulk gas, UPW, exhaust, and foundation piping systems, respectively. Because these items together account for a high portion (about 74.9%) of the total cost, they are called the principal cost items. The principal cost items for the projects of the seven other process modules were determined in the same way. The principal item costs (inputs) and the corresponding total cost (output) of each historical project are used as the training data in the proposed model.

6.3 Step 2: Applying FALCON

A FALCON network structure consists of five layers of nodes and two types of links: input linguistic nodes (Layer 1), input term nodes (Layer 2), IF-part condition links (Link 1), rule nodes (Layer 3), THEN-part consequence links (Link 2), output term nodes (Layer 4), and output linguistic nodes (Layer 5) (Lin and Lee, 1991). Layer 2 and Layer 3 are connected by Link 1, whereas Layer 3 and Layer 4 are connected by Link 2. The computation result of each node is passed to the next layer of nodes through the neuron synaptic weights and becomes the input value for that layer. These FALCON layers and links are described below (Lin and Lee, 1991).
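The left–right (usage-stage) computation through the five layers can be sketched in Python as follows. Gaussian membership functions, the minimum for fuzzy AND, a sum capped at 1 for fuzzy OR, and center-of-area style defuzzification are assumptions modeled on common descriptions of FALCON (Lin and Lee, 1991), not settings confirmed by this study. The function names are hypothetical, and in practice the centers, widths, and consequence links would come from training. The full grid of precondition combinations reproduces the 24 (=2×2×2×3) rules of the CMP module.

```python
import math
from itertools import product


def gaussian(x, center, width):
    """Layer 2: degree to which x belongs to a Gaussian-shaped term."""
    return math.exp(-((x - center) / width) ** 2)


def build_rules(term_counts):
    """Link 1: one rule per combination of input terms (full grid)."""
    return list(product(*(range(n) for n in term_counts)))


def falcon_forward(inputs, input_terms, rules, consequences, output_terms):
    """Left-right (usage-stage) pass through a simplified five-layer network.

    inputs:       crisp input values, e.g. principal item costs (Layer 1)
    input_terms:  per input, a list of (center, width) terms (Layer 2)
    rules:        per rule, one term index per input (Link 1, Layer 3)
    consequences: per rule, index of its single output term, or None (Link 2)
    output_terms: (center, width) of each output term (Layers 4 and 5)
    """
    # Layer 2: fuzzify each input against each of its terms.
    member = [[gaussian(x, m, w) for m, w in terms]
              for x, terms in zip(inputs, input_terms)]
    # Layer 3: fuzzy AND (minimum) gives each rule's firing strength.
    strength = [min(member[i][t] for i, t in enumerate(rule)) for rule in rules]
    # Layer 4: fuzzy OR as a sum of linked firing strengths, capped at 1.
    activation = [0.0] * len(output_terms)
    for fire, out in zip(strength, consequences):
        if out is not None:
            activation[out] += fire
    activation = [min(1.0, a) for a in activation]
    # Layer 5: center-of-area style defuzzification into one crisp cost.
    num = sum(m * w * a for (m, w), a in zip(output_terms, activation))
    den = sum(w * a for (_, w), a in zip(output_terms, activation))
    return num / den if den > 0 else float("nan")


print(len(build_rules([2, 2, 2, 3])))  # 24 rules, as for the CMP module
# Hypothetical one-input network: "low" and "high" terms mapped to two outputs.
terms = [[(0.0, 1.0), (10.0, 1.0)]]
print(falcon_forward([0.0], terms, build_rules([2]), [0, 1],
                     [(100.0, 5.0), (200.0, 5.0)]))  # close to 100.0
```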

1. Layer 1 (input linguistic nodes): The nodes in this layer just transmit the input values (i.e., cost data) to the next layer directly. For example, the actual costs of the four principal cost items (i.e., bulk gas, UPW, exhaust, and foundation) for a CMP project are transmitted directly into the network.

2. Layer 2 (input term nodes): The nodes in this layer calculate the membership functions; that is, this layer fuzzifies the input values (i.e., cost data) from Layer 1. Fuzzy partitions are determined based on the clustering relationships between the principal item costs and the total costs. For instance, Figure 3 depicts the clusters identified in each principal item after fuzzy partitioning for the CMP module. Based on the graphical distribution of the cost data, two clusters (high and low) are identified for each of the bulk gas, UPW, and exhaust principal items, whereas three clusters (high, medium, and low) are identified for the foundation principal item. The input parameters are thus partitioned according to the number of identified clusters.

3. Link 1 (IF-part condition links): The connections between Layer 2 and Layer 3 represent the preconditions of the fuzzy IF-THEN rules. Take the CMP module as an example. Based on the Layer 2 operations, 2, 2, 2, and 3 clusters are identified for the bulk gas, UPW, exhaust, and foundation items, respectively. Therefore, 24 (=2×2×2×3) IF-part FL rules are generated.

4. Layer 3 (rule nodes): The rule nodes in this layer perform the "fuzzy AND" operation to derive the firing strengths of the FL rules. For example, four input cost parameters are involved in the CMP module. These input parameters are fuzzified in Layer 2 (input term nodes) with their corresponding fuzzy partitions (i.e., 2, 2, 2, and 3), and a membership value (ranging from 0 to 1) is given to each input term node. For the CMP module, every rule node is connected to five output term nodes in Layer 4.

5. Link 2 (THEN-part consequence links): The links between Layer 3 and Layer 4 represent the consequences of the FL rules. In a single-output network, each rule node should have no more than one consequence. The links are represented as the numeric value 0 (disconnected) or 1 (connected).

6. Layer 4 (output term nodes): The nodes in this layer perform two functions: right–left transmission (performed only in the training stage) and left–right transmission (performed in both the training and usage stages). In right–left transmission, the training data (i.e., actual total project cost) from the output layer (i.e., Layer 5) are transmitted into Layer 4. The fuzzy operation of this layer is then exactly the same as that of Layer 2; that is, the output cost data are mapped through the membership functions of the output fuzzy linguistic terms. In left–right transmission, the output term nodes carry out the “fuzzy OR” operation to sum up the membership functions of the fired rules obtained from Link 2.

7. Layer 5 (output linguistic nodes): The nodes in Layer 5 also perform right–left (training stage only) and left–right (both training and usage stages) transmissions. In right–left transmission, the nodes in Layer 5 act exactly like those in Layer 1, feeding the training data (i.e., actual total project cost) into the network. In left–right transmission, the nodes in Layer 5 perform the defuzzification of the fuzzy set to provide a definite output value (i.e., estimated total project cost).

Table 2

Average costs and percentages of the principal cost items in each process module (each cell gives the average cost and, in parentheses, its share of the module's total project cost)

| Cost item | CMP | DIFF | ETCH | INT | LITHO | IMP | TF | WET |
|---|---|---|---|---|---|---|---|---|
| Bulk gas | $143,459 (8.7%) | $282,234 (16.4%) | $768,333 (19.0%) | $67,950 (14.1%) | $215,554 (12.8%) | $151,807 (15.1%) | $857,853 (24.0%) | $282,712 (13.8%) |
| Specialty gas | | $245,510 (14.3%) | $819,391 (20.2%) | | | | $787,540 (22.0%) | |
| Pumping line | | $412,006 (24.0%) | | | | | $507,120 (14.2%) | |
| PCW | | | | | $186,562 (11.1%) | | | |
| UPW | $258,308 (15.7%) | | | | | | | $430,736 (21.0%) |
| Drain | | | | | | | | $267,435 (13.1%) |
| Power | | | | $62,002 (12.8%) | | $99,932 (9.9%) | | |
| Exhaust | $554,114 (33.7%) | $293,638 (17.1%) | $1,342,478 (33.1%) | | $379,487 (22.6%) | $278,059 (27.6%) | $551,778 (15.4%) | $465,667 (22.7%) |
| Foundation | $271,050 (16.5%) | | | $242,613 (50.2%) | $367,176 (21.9%) | $190,682 (18.9%) | | |
| Chemical | | | | | | | | |
| PV | | | | | | | | |
| Sum of percentages | 74.9% | 71.3% | 68.9% | 80.9% | 69.0% | 76.3% | 70.4% | 68.7% |
| Amount of projects | 15 | 41 | 36 | 29 | 16 | 30 | 27 | 47 |

[Figure 3: four scatter plots of principal item cost (NT$1,000) against total cost (NT$1,000) for the CMP module; panels (a) to (c) show Clusters 1 and 2, and panel (d) shows Clusters 1 to 3.]

Fig. 3. Clusters identified in each principal item for the CMP module. (a) Bulk gas, (b) UPW, (c) exhaust, and (d) foundation.
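To make the layer-by-layer description concrete, the Python sketch below runs one left-to-right (usage-stage) pass through a FALCON-style network. It assumes Gaussian membership functions and the node functions of the original FALCON formulation (Lin and Lee, 1991): the minimum for the “fuzzy AND” in Layer 3, a bounded sum for the “fuzzy OR” in Layer 4, and a spread-weighted centre-of-area defuzzification in Layer 5. The function and argument names are hypothetical, inputs are assumed to be normalized to [0, 1], spreads must be positive, and the study's MATLAB implementation may differ in detail.

```python
import itertools
import math

def gaussian(x, mean, spread):
    """Gaussian membership value of x for a term node (mean, spread)."""
    return math.exp(-((x - mean) / spread) ** 2)

def falcon_forward(inputs, in_terms, out_terms, consequents):
    """One left-to-right pass through a FALCON-style network.

    inputs      -- one normalized value per input linguistic node (Layer 1)
    in_terms    -- for each input, a list of (mean, spread) term nodes (Layer 2)
    out_terms   -- list of (mean, spread) output term nodes (Layer 4)
    consequents -- for each rule, the index of its single output term (Link 2)
    """
    # Layer 2: fuzzify every input against each of its term nodes.
    memberships = [[gaussian(x, m, s) for m, s in terms]
                   for x, terms in zip(inputs, in_terms)]
    # Link 1 + Layer 3: one rule per combination of input terms;
    # the fired strength is the fuzzy AND (minimum) of its preconditions.
    combos = itertools.product(*(range(len(terms)) for terms in in_terms))
    strengths = [min(memberships[i][t] for i, t in enumerate(combo))
                 for combo in combos]
    # Link 2 + Layer 4: fuzzy OR as a bounded sum of the fired rule strengths.
    fired = [0.0] * len(out_terms)
    for rule, strength in enumerate(strengths):
        fired[consequents[rule]] += strength
    fired = [min(1.0, f) for f in fired]
    # Layer 5: spread-weighted centre-of-area defuzzification.
    num = sum(m * s * u for (m, s), u in zip(out_terms, fired))
    den = sum(s * u for (m, s), u in zip(out_terms, fired))
    return num / den

# CMP-shaped network: partitions [2, 2, 2, 3] -> 24 rules, 5 output terms.
# All parameter values below are illustrative, not the trained ones.
cmp_in_terms = [[(0.2, 0.3), (0.8, 0.3)]] * 3 + [[(0.1, 0.3), (0.5, 0.3), (0.9, 0.3)]]
cmp_out_terms = [(0.0, 0.2), (0.25, 0.2), (0.5, 0.2), (0.75, 0.2), (1.0, 0.2)]
cmp_rules = [r % 5 for r in range(24)]
print(falcon_forward([0.4, 0.6, 0.5, 0.3], cmp_in_terms, cmp_out_terms, cmp_rules))
```

Because each rule feeds exactly one output term, the `consequents` list plays the role of the 0/1 links of Link 2; the output is a normalized cost that would still need to be rescaled to dollars.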

6.4 Step 3: Applying fmGA

After the FALCON operations are completed, the fmGA's mutation and cut-splice operators are used to optimize the FALCON parameters, including the fuzzy membership functions and FL rules, to improve the cost estimation accuracy. To do so, the fmGA variable-length chromosome is used to revise the fuzzy partitions (i.e., the number of input term nodes in Layer 2) and the fuzzy decision rules (i.e., the consequence links of Link 2) of FALCON. The global search capability of the fmGA is employed to optimize the parameters (means and spreads) of the membership functions in the FALCON input and output term nodes.

To perform the above-mentioned functions, the fmGA chromosome models a set of FALCON solutions, where the parameters of FALCON (e.g., input membership functions, FL rules, and output membership functions) are represented as the chromosome gene values. For example, Figure 4 presents the composition of a sample chromosome for the CMP module. Every membership function contains a mean and a spread. The numbers of fuzzy partitions for the input and output linguistic nodes are [2, 2, 2, 3] and [5], respectively. Thus, there are 52 alleles in an fmGA chromosome.

Restated, 52 alleles = 18 (= 2×2 + 2×2 + 2×2 + 3×2 input membership parameters) + 24 (= 2×2×2×3 rules) + 10 (= 5×2 output membership parameters).
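The allele count follows mechanically from the partition sizes. The short sketch below (the function name is hypothetical) recomputes it, assuming two alleles (mean and spread) per membership function and one allele (the consequent output term) per rule, as described above.

```python
import math

def chromosome_layout(input_partitions, output_partitions):
    """Allele counts for one chromosome: two alleles (mean, spread) per
    membership function and one allele (consequent term) per fuzzy rule."""
    input_mf = 2 * sum(input_partitions)
    rules = math.prod(input_partitions)      # one rule per term combination
    output_mf = 2 * sum(output_partitions)
    return {"input_mf": input_mf, "rules": rules,
            "output_mf": output_mf, "length": input_mf + rules + output_mf}

print(chromosome_layout([2, 2, 2, 3], [5]))
# {'input_mf': 18, 'rules': 24, 'output_mf': 10, 'length': 52}
```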

As also presented in the left part of Figure 2, the fmGA includes two operation loops: an outer loop and an inner loop. Completing an outer loop is called an epoch, whereas completing an inner loop is called an era. As suggested by Feng and Wu (2006), this study sets the maximum number of eras (era max) to 4. In addition, the maximum number of epochs (epoch max) is defined as a preset criterion for terminating the fmGA evolution process; in this study, epoch max is 10. Furthermore, an inner loop consists of three phases (Goldberg et al., 1993): (1) the initialization phase, in which a population with sufficient strings is created to contain all possible BBs of order k, where BBs refer to partial solutions of a problem; (2) the primordial phase, in which bad genes are filtered out so that only the chromosomes with good fitness (i.e., containing only “good” alleles fitting the BBs) are kept; and (3) the juxtapositional phase, in which the good alleles (BBs) are recombined using cut-splice and mutation operations to form a high-quality generation that tends toward an optimal solution.

As depicted in Figure 2, the fmGA starts with the outer loop and generates a CT. For example, the CT for the CMP module is an fmGA chromosome representing a set of FALCON solutions (refer to Figure 4). After each era is completed, the CT is replaced by a new CT (with new alleles) that has the best fitness (i.e., the highest estimation accuracy) found in that era. The operational details of the three phases are described as follows.

Fig. 4. Chromosome composition in fmGA for the CMP module.
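The epoch/era bookkeeping can be sketched as two nested loops that carry the CT from era to era. In this minimal Python sketch, `run_era` is a hypothetical placeholder for one inner loop (the three phases below); the early stop on fitness convergence is omitted, and the CT is replaced only when the era's best is fitter, which assumes the CT itself competes in each era.

```python
def run_fmga(initial_ct, run_era, fitness, epoch_max=10, era_max=4):
    """Epoch/era bookkeeping of the fmGA, with the CT carried between eras.

    run_era(ct, era) stands in for one inner loop (initialization,
    primordial and juxtapositional phases) and must return the best
    chromosome found in that era.
    """
    ct, eras_run = initial_ct, 0
    for _epoch in range(epoch_max):            # outer loop: one pass = one epoch
        for era in range(1, era_max + 1):      # inner loop: one pass = one era
            best = run_era(ct, era)
            eras_run += 1
            if fitness(best) > fitness(ct):    # keep the fitter of the two as CT
                ct = best
    return ct, eras_run

# Toy run: each era improves the template by 1, so 10 x 4 eras give 40.
ct, eras = run_fmga(0, lambda ct, era: ct + 1, fitness=lambda c: c)
print(ct, eras)   # 40 40
```

With epoch max = 10 and era max = 4, the loop runs 40 eras in total.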

6.4.1 Initialization phase. As suggested by Feng and Wu (2006), the chromosome population size (n) in this study is determined by Equation (1) to ensure a sufficient quantity of chromosomes:

$$n = \frac{\binom{l}{\lambda}}{\binom{l-k}{\lambda-k}} \, 2c(\alpha)\, \beta^{2} (M-1)\, 2^{k} \qquad (1)$$

where

l is the chromosome length. For example, l is 52 (= 18 + 24 + 10) for the CMP module; see Figure 4.

k is the number of fuzzy rules. For example, k would be 24 (= 2×2×2×3) for the CMP module. However, the computation becomes overwhelming if k = 24 is applied to Equation (1) to derive n. Hence, k = 4 is used here, as suggested by Goldberg et al. (1993).

λ is a random value, generally set to l − k, with k < λ ≤ l. For example, λ is 48 (= l − k = 52 − 4) for the CMP module.

c(α) is the square of the ordinate of a one-sided normal distribution at error probability α; it is set to 1.

β is the ratio of a chromosome with the optimum fitness to those with the second-best fitness in the same era; it is set to 1.

M is the BB coefficient; it is set to 2.
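Plugging the CMP values into Equation (1) gives the population size directly. The sketch below (function name hypothetical) evaluates the equation; rounding the result up to a whole number of chromosomes is an assumption, since the text does not state how fractional sizes are handled.

```python
from math import comb, ceil

def fmga_population_size(l, k, lam, c_alpha=1.0, beta=1.0, m=2):
    """Population size n from Equation (1)."""
    ratio = comb(l, lam) / comb(l - k, lam - k)
    return ratio * 2 * c_alpha * beta ** 2 * (m - 1) * 2 ** k

n = fmga_population_size(l=52, k=4, lam=48)
print(round(n, 2), ceil(n))   # 44.52 45
```

For the CMP module the binomial ratio is C(52, 48)/C(48, 44) = 270,725/194,580 ≈ 1.391, and the remaining factors contribute 2 × 1 × 1 × 1 × 2^4 = 32.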

The fitness of a chromosome is evaluated based on the estimation accuracy defined in Equation (2). This estimation accuracy, expressed as a percentage, reflects the difference between the estimated total cost and the actual total project cost.

$$\text{Accuracy}(\%) = \left(1 - \frac{\lvert \text{Estimated cost} - \text{Actual cost} \rvert}{\text{Actual cost}}\right) \times 100\% \qquad (2)$$
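As a quick check of Equation (2), the sketch below (function name hypothetical) reproduces two accuracies reported for the No. 1 test project: the proposed model's estimate from Table 4 and the average cost method's estimate. Note that the value drops below zero whenever the absolute error exceeds the actual cost.

```python
def estimation_accuracy(estimated, actual):
    """Equation (2): accuracy in percent; negative when |error| > actual."""
    return (1 - abs(estimated - actual) / actual) * 100

print(round(estimation_accuracy(2_101_500, 2_022_555), 2))  # 96.1
print(round(estimation_accuracy(1_642_981, 2_022_555), 2))  # 81.23
```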

6.4.2 Primordial phase. This phase performs two operations, namely building-block filtering and threshold selection (Goldberg et al., 1993). Building-block filtering includes BB selection and random allele deletion.

A BB is a set of alleles forming a short, low-order, high-performance subset of a string. The key to building-block filtering is to pump enough copies of the good BBs so that even after random allele deletion eliminates a number of copies, one or more copies remain for subsequent processing (Goldberg, 2002). In threshold selection, a genetic threshold mechanism is applied within tournament selection to restrict competition between BBs that have little in common (Goldberg et al., 1991).

The BBs in this study are built to represent the FALCON parameters, including the means (mij) and spreads (σij) of the input and output term nodes and the fuzzy rule links (Rij) of Link 2. The details of the building-block filtering process are as follows:

1. In BB selection, the chromosome with the best fitness in the previous era is picked as the CT. The FL rule alleles of this CT are selected as the BBs, which replace the FL rule alleles of the reproduced chromosomes in the next era.

2. In random allele deletion, these BBs replace the FL rule genes of the remaining 80% of next-era chromosomes, which have worse fitness. The mean and spread alleles of those chromosomes are randomly deleted at a rate of 5% in each era (about 3 alleles in the CMP module). The deleted alleles are replaced by the alleles stored in the CT. As suggested by Goldberg (2002), the minimum number of alleles in the chromosomes after deletion is kept equal to the number of BBs (i.e., 24 for the CMP module).
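The two filtering steps can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the helper name is hypothetical, the 5% deletion rate is assumed to apply to the full chromosome length (5% of 52 ≈ 3 alleles, matching the CMP figure), and the deleted positions are drawn only from the mean and spread alleles.

```python
import random

def building_block_filtering(population, fitness, rule_slice, mf_positions,
                             replace_share=0.8, delete_rate=0.05, rng=random):
    """One building-block filtering pass, sketched from steps 1 and 2 above.

    population   -- list of equal-length chromosomes (lists of allele values)
    fitness      -- callable returning a chromosome's fitness (higher is better)
    rule_slice   -- slice covering the FL-rule alleles, used as the BBs
    mf_positions -- list of positions holding membership means and spreads
    """
    ranked = sorted(population, key=fitness, reverse=True)
    ct = ranked[0]                                   # Step 1: best chromosome -> CT
    n_replace = round(replace_share * len(ranked))   # worst 80% receive the BBs
    n_delete = max(1, round(delete_rate * len(ct)))  # 5% of 52 alleles -> 3
    offspring = [list(c) for c in ranked]
    for chrom in offspring[len(offspring) - n_replace:]:
        chrom[rule_slice] = ct[rule_slice]           # Step 2: copy BBs from CT
        for pos in rng.sample(mf_positions, n_delete):
            chrom[pos] = ct[pos]                     # deleted allele refilled from CT
    return ct, offspring
```

For the CMP layout, `rule_slice` would be `slice(18, 42)` and `mf_positions` the 28 remaining indices, assuming the input membership, rule, and output membership segments are stored in that order.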

In addition to the manipulation of BBs to enrich chromosome diversity, this study adopts a new algorithm that combines the Kohonen and competitive learning rules with the fmGA operations, enhancing the traditional FALCON learning rules so that the model can escape from the aforementioned local optimum problem.

6.4.3 Juxtapositional phase. In each era, there are two possible outcomes: the fitness value of a specific chromosome is either higher than, or lower than (or equal to), that of the CT in the previous era:

1. If the fitness value is higher than that in the previous era, a better estimation result can be generated through the cut-splice operator, so the cut and splice operation is performed. The one-cut point of the chromosomes is randomly selected. Both the splice rate and the cut rate in this study are set to 1. All chromosomes except the one selected as the CT are evolved.

2. If the fitness value is lower than (or equal to) that in the previous era, the cut-splice operation alone is unlikely to generate a better estimation result, so the mutation operator is employed. The mutation probability (Pm) is set to 5%, as suggested by Goldberg et al. (1993). All parameters of the chromosomes (i.e., the input membership functions, fuzzy rules, and output membership functions) are subject to mutation, and the allele locus to be mutated is selected randomly.
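The two juxtapositional outcomes can be sketched as a single era step. In this hypothetical Python sketch, `sample_allele(i)` is an assumed callback that draws a fresh value for locus i, chromosomes that beat the CT are paired in list order, and both parents are cut at the same random locus so that the fixed chromosome length needed for decoding is preserved (messy GAs in general allow different cut points).

```python
import random

def cut_splice(a, b, rng=random):
    """Cut both parents at one random locus and splice the pieces crosswise.

    Using the same locus for both parents keeps the chromosome length fixed,
    which is a simplifying assumption of this sketch.
    """
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, sample_allele, pm=0.05, rng=random):
    """Resample each locus independently with probability pm."""
    return [sample_allele(i) if rng.random() < pm else g
            for i, g in enumerate(chrom)]

def juxtapositional_phase(population, ct, fitness, sample_allele,
                          pm=0.05, rng=random):
    """Evolve every chromosome except the CT (which must be in `population`)."""
    ct_fit = fitness(ct)
    others = [c for c in population if c is not ct]
    better = [c for c in others if fitness(c) > ct_fit]
    worse = [c for c in others if fitness(c) <= ct_fit]
    offspring = []
    # Outcome 1: chromosomes fitter than the CT are recombined pairwise
    # (cut rate = splice rate = 1, so every pair is cut and spliced).
    for a, b in zip(better[0::2], better[1::2]):
        offspring.extend(cut_splice(a, b, rng))
    if len(better) % 2:
        offspring.append(list(better[-1]))   # an unpaired chromosome is kept
    # Outcome 2: chromosomes not fitter than the CT are mutated instead.
    offspring.extend(mutate(c, sample_allele, pm, rng) for c in worse)
    return [ct] + offspring
```

The CT is passed through unchanged, consistent with the statement that all chromosomes except the CT are evolved.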

After evolution, the chromosome with the highest fitness is fed back to FALCON to calculate the cost estimates for the new input data. The best-fit chromosomes (with optimum fitness) are also retained by the fmGA to provide the population and the CT of the next epoch. Steps 2 and 3 (the FALCON and fmGA operations) are then repeated until the fitness value converges or the preset maximum number of epochs (10 in this study) is reached. Finally, the fmGA operations stop and the final optimal FALCON-COST structure is derived. Notably, three types of data are required to train the original FALCON-COST: (1) the principal item costs and the total cost of each historical project, (2) FALCON's fuzzy partitions, and (3) the fmGA's maximum numbers of evolution eras and epochs.

6.5 Step 4: Using three-point cost estimation method to estimate a new project

Although the required quantities and specifications of the piping systems for a new project are uncertain, the costs of the principal items (for the piping systems) must be provided to run the proposed model, in line with the third cost-estimating characteristic mentioned above. For instance, to estimate the total cost of a CMP project, the costs of the four principal items (i.e., bulk gas, UPW, exhaust, and foundation) must be derived. The estimate for each principal cost item is obtained by asking the manager: based on his knowledge of this principal item in the project, by what percentage will its cost be higher or lower than the average historical cost? According to several cost-estimating managers specializing in fab construction, an experienced manager should be able to make a reasonable guess of the cost of each principal item based on the available cost information.

To increase the objectivity of input evaluations, the widely used three-point estimation approach is adopted to systematically guide a cost-estimating manager in assessing the uncertainties surrounding the costs of the principal items (Moder et al., 1983; Oberlender, 2000; Peurifoy and Oberlender, 2002). By introducing an expected percentage variable, the expected cost (denoted as Ci(j)) of each principal item j (j = 1, 2, 3, and/or 4) for a project related to process module i (i = 1, 2, . . ., 8) is derived as

$$C_{i(j)} = EP_{i(j)} \times C_{i(j)(\text{ave})} = \frac{a_{i(j)} + 4m_{i(j)} + b_{i(j)}}{6} \times C_{i(j)(\text{ave})} \qquad (3)$$

in which EPi(j) is the expected percentage variable of Ci(j); ai(j), mi(j), and bi(j) are the optimistic, most likely, and pessimistic values (expressed as percentages) of EPi(j), respectively; and Ci(j)(ave) is the average historical cost of Ci(j). Notably, in estimating a new project, the calculated expected costs (Ci(j)) of the three (or four) principal items serve as inputs to the FALCON-COST model for predicting the total project cost.
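Equation (3) is simple enough to check by hand, but a short sketch makes the weighting explicit. The helper below is hypothetical; percentages are written as fractions (0.92 for 92%), and the inputs are the No. 1 test project values listed in Table 3. The results agree with the Table 3 expected costs to within rounding.

```python
def expected_cost(a, m, b, average_cost):
    """Equation (3): three-point (PERT-weighted) percentage x historical average.
    a, m, b are the optimistic, most likely and pessimistic percentages,
    written as fractions (0.92 means 92%)."""
    return (a + 4 * m + b) / 6 * average_cost

# Inputs for the No. 1 test project (CMP module), as listed in Table 3.
table3 = {
    "Bulk gas":   (0.92, 1.02, 1.05, 143_459),
    "UPW":        (0.75, 0.95, 1.50, 258_308),
    "Exhaust":    (0.60, 1.05, 1.45, 554_114),
    "Foundation": (0.30, 1.05, 1.40, 271_050),
}
costs = {item: expected_cost(*row) for item, row in table3.items()}
for item, cost in costs.items():
    print(f"{item:<10} {cost:>12,.2f}")
```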

6.6 Computer implementation

The operations of Steps 2 and 3 (FALCON and fmGA) are implemented in MATLAB® version 7.5. The cost data are read in .dat format, and the operations are run on a computer with a Genuine Intel 1.6 GHz CPU, 896 MB SRAM, and the Windows XP operating system. Training FALCON-COST on the 15 historical projects of the CMP module takes approximately 20 minutes. The Step 1 and Step 4 operations of FALCON-COST are performed in Microsoft Excel.

6.7 Training of the FALCON-COST

The actual costs of 241 historical projects are used to train the proposed models (i.e., the algorithms related to the FALCON and fmGA steps) for the eight types of process modules. For example, 15 historical projects are used to train the CMP model. Training reduces the model's error rate (= 1 − estimation accuracy) over several evolutions (40 eras, i.e., 10 epochs of 4 eras each). Figure 5 depicts the training error rate in each epoch of the CMP model after the evolution reached 10 epochs. At that point the estimation accuracy is about 97.85%, so the error rate has been reduced to only around 2.15% (= 1 − 0.9785). The same process is used to train the models for the other seven process modules.

Fig. 5. Model training error rate in each epoch for the CMP module.

7 CASE STUDIES

A two-step cross-validation process is conducted to test the proposed model, as follows. First, a set (the first set) of 27 additional hookup construction projects (project numbers 1–27) for the same semiconductor plant is used to test the proposed model trained on the aforementioned 241 projects. After the first set of 27 case studies is completed, the validation proceeds to the second step: a second set of 27 test projects (project numbers 28–54) is randomly selected from the original 241 training projects. Notably, 241 projects are still involved in the training, because the first set of 27 test projects is added to the training pool. In total, 54 projects are tested. The middle and right columns of Table 1 list the numbers of projects in the first and second test sets, respectively, for each process module.
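The bookkeeping of the two validation steps can be sketched as follows, using hypothetical project identifiers. This simplified version samples the second test set from the whole historical pool; matching the per-module counts in Table 1 would require sampling within each process module instead.

```python
import random

def two_step_validation(training_ids, new_ids, n_test=27, seed=0):
    """Return (train, test) pairs for the two validation steps described above."""
    rng = random.Random(seed)
    # Step 1: train on the historical pool, test on the new projects.
    step1 = (list(training_ids), list(new_ids))
    # Step 2: hold out n_test historical projects for testing and add the
    # step-1 test projects to training, so the pool size stays the same.
    held_out = rng.sample(list(training_ids), n_test)
    held = set(held_out)
    step2_train = [p for p in training_ids if p not in held] + list(new_ids)
    return step1, (step2_train, held_out)

history = [f"H{i}" for i in range(1, 242)]   # 241 historical projects
new = [f"N{i}" for i in range(1, 28)]        # 27 additional projects
(s1_train, s1_test), (s2_train, s2_test) = two_step_validation(history, new)
print(len(s1_train), len(s1_test), len(s2_train), len(s2_test))  # 241 27 241 27
```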

Section 7.1 illustrates how FALCON-COST is applied to estimate the cost of a new project. Section 7.2 compares the results of the 54 case studies with those of the conventional average cost method, the component ratios method, and a modified FALCON-COST method. Section 7.3 discusses the improvements of FALCON-COST over the conventional FALCON.

7.1 Case project application

As indicated earlier, a cost-estimating manager must provide three-point estimates for the cost of each principal item of a project. The expected cost, Ci(j), of each principal item in the project can then be derived. These expected costs are used as inputs to calculate the total project cost using FALCON-COST Steps 2 and 3. Notably, all the training and test projects belong to a single company; in addition, a cost-estimating manager involved in the 54 test projects was asked to provide the inputs realistically and consistently.

Take the No. 1 test project (a CMP project) shown in Table 3 for example. To estimate the expected cost (Ci(j)) of the bulk gas (a principal item) for the new project, the manager inputs the ai(j), mi(j), and bi(j) values. That is, based on his understanding of the requirements of this new project, he provides the optimistic (lowest), most likely, and pessimistic (highest) percentages of the average cost of historical projects. In this example, the ai(j), mi(j), and bi(j) values are 92%, 102%, and 105% of the average cost, respectively. Thus, the expected percentage of the average cost for the bulk gas for this new project, EPi(j), equals 100.8333% (= (92% + 4×102% + 105%)/6). The average historical cost of bulk gas for CMP projects, Ci(j)(ave), is $143,459. Hence, according to Equation (3), the expected cost (Ci(j)) of the bulk gas is $144,654 (= 100.8333% × $143,459).

Similarly, in the No. 1 test project, the expected costs (Ci(j)) of the other three principal piping items (i.e., UPW, exhaust, and foundation) are calculated as $260,461, $577,202, and $266,532, respectively. Table 3 summarizes the calculated expected costs of the four principal items for this test project. These four Ci(j) values are used as inputs for generating the estimated total cost of this new project. The estimated cost of this project using the proposed model is $2,101,500.

The actual cost of this project was $2,022,555. Thus, according to Equation (2), the estimation accuracy of the proposed model is 96.10% (= 1 − ABS(1 − 2,101,500/2,022,555)). The same evaluation steps are applied to the other 53 test projects. Table 4 lists the evaluation results for the first set of 27 test projects.
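The per-module columns of Table 4 can be reproduced from the estimated and actual costs. The sketch below does this for the two CMP test projects; using the sample standard deviation (`statistics.stdev`) is an assumption, but it matches the reported 1.30% for this row.

```python
from statistics import mean, stdev

def accuracy(estimated, actual):
    """Equation (2), in percent."""
    return (1 - abs(estimated - actual) / actual) * 100

# CMP rows of Table 4: (estimated cost, actual cost)
cmp_rows = [(2_101_500, 2_022_555), (2_141_600, 2_186_601)]
acc = [accuracy(est, act) for est, act in cmp_rows]
print([round(a, 2) for a in acc], round(mean(acc), 2), round(stdev(acc), 2))
# [96.1, 97.94] 97.02 1.3
```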

7.2 Comparisons with three methods

The conventional average cost method, the component ratios method, and a modified FALCON-COST are applied to the 54 test projects. The No. 1 test project is again used to illustrate how these three methods work. When the conventional average cost method is used, the estimated cost of the test project equals $1,642,981 (= total costs of the 15 CMP historical projects divided by 15). Because the actual cost of this project was $2,022,555, the estimation accuracy of the average cost method is 81.23% using Equation (2).

Table 3

Calculated expected costs of the principal items for No. 1 test project

| Principal item | Average cost, Ci(j)(ave) | Optimistic %, ai(j) | Most likely %, mi(j) | Pessimistic %, bi(j) | Expected cost, Ci(j) |
|---|---|---|---|---|---|
| Bulk gas | $143,459 | 92% | 102% | 105% | $144,654 |
| UPW | $258,308 | 75% | 95% | 150% | $260,461 |
| Exhaust | $554,114 | 60% | 105% | 145% | $577,202 |
| Foundation | $271,050 | 30% | 105% | 140% | $266,532 |

Table 4

Estimated costs and estimation accuracies using the proposed model for the first set of 27 test projects

| Module | Test project number | Estimated cost ($) | Actual cost ($) | Accuracy (%) | Average accuracy (%) | Standard deviation (%) |
|---|---|---|---|---|---|---|
| CMP | 1 | 2,101,500 | 2,022,555 | 96.10 | 97.02 | 1.30 |
| | 2 | 2,141,600 | 2,186,601 | 97.94 | | |
| DIFF | 3 | 2,194,600 | 2,466,930 | 88.96 | 84.28 | 11.61 |
| | 4 | 2,128,100 | 2,176,044 | 97.80 | | |
| | 5 | 2,009,800 | 2,534,026 | 79.31 | | |
| | 6 | 2,067,600 | 2,910,376 | 71.04 | | |
| ETCH | 7 | 2,849,700 | 2,745,135 | 96.19 | 79.93 | 12.07 |
| | 8 | 3,707,200 | 4,922,227 | 75.32 | | |
| | 9 | 3,929,700 | 5,809,517 | 67.64 | | |
| | 10 | 3,492,600 | 2,924,495 | 80.57 | | |
| INT | 11 | 570,090 | 784,678 | 72.65 | 89.19 | 14.37 |
| | 12 | 570,090 | 578,209 | 98.60 | | |
| | 13 | 570,090 | 549,890 | 96.33 | | |
| LITHO | 14 | 2,518,400 | 2,689,975 | 93.62 | 91.44 | 9.66 |
| | 15 | 2,519,000 | 2,523,427 | 99.82 | | |
| | 16 | 2,477,900 | 2,079,944 | 80.87 | | |
| IMP | 17 | 1,359,500 | 1,225,856 | 89.10 | 83.88 | 7.42 |
| | 18 | 1,787,700 | 1,584,328 | 87.16 | | |
| | 19 | 811,970 | 1,076,994 | 75.39 | | |
| TF | 20 | 2,261,300 | 3,983,913 | 56.76 | 78.06 | 20.87 |
| | 21 | 2,144,600 | 2,716,373 | 78.95 | | |
| | 22 | 2,130,600 | 2,163,491 | 98.48 | | |
| WET | 23 | 2,519,100 | 2,860,966 | 88.05 | 90.05 | 9.05 |
| | 24 | 2,815,400 | 2,830,734 | 99.46 | | |
| | 25 | 3,183,100 | 3,433,748 | 92.70 | | |
| | 26 | 2,512,600 | 2,660,107 | 94.45 | | |
| | 27 | 2,507,400 | 3,316,608 | 75.60 | | |

In the component ratios method (Barrie and Paulson, 1992), the four cost items (bulk gas, UPW, exhaust, and foundation) for the CMP project are identified as the principal items. For the 15 historical CMP projects, the average ratio between the whole project cost and the sum of the costs of these principal items is 1.335 (= 100%/74.9%; see Table 2). Because the sum of the average costs of these principal items is $1,226,931 for the same historical projects, the total cost of a new project is $1,637,732 (≈ 1,226,931 × 1.335). Table 5 summarizes the calculations using the component ratios method for the No. 1 test project. Because the actual cost of this project was $2,022,555, the estimation accuracy of the component ratios method is 80.97%.
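The component ratios calculation can be sketched in a few lines (the helper name is hypothetical). Using the rounded 74.9% share from Table 2, the sketch gives about $1,638,092, which differs slightly from the reported $1,637,732, likely because of rounding in the reported share or ratio.

```python
def component_ratio_estimate(principal_costs, principal_share):
    """Component ratios method: scale the principal-item subtotal by
    1 / (share of total cost historically covered by the principal items)."""
    subtotal = sum(principal_costs)
    ratio = 1 / principal_share
    return subtotal, ratio, subtotal * ratio

subtotal, ratio, estimate = component_ratio_estimate(
    [143_459, 258_308, 554_114, 271_050], 0.749)   # CMP averages, Table 2
print(subtotal, round(ratio, 3), round(estimate))   # 1226931 1.335 1638092
```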

The modified FALCON-COST revises the details of Step 4 in Figure 2: the historical average costs (Ci(j)(ave)) of the principal items, rather than their expected costs (Ci(j)) obtained using three-point estimates, are used to estimate the project cost. For example, in the No. 1 test project, the average costs of the four cost items listed on the left of Table 3 are used directly as inputs to FALCON-COST. When the modified FALCON-COST is applied, the estimated cost of the test project equals $2,080,100. Because the actual cost of this project was $2,022,555, the estimation accuracy using Equation (2) is 97.15%.

Table 5

Estimated cost using the component ratios method for No. 1 test project

| Principal item | Averaged cost ($) | Ratio | Estimated cost ($) |
|---|---|---|---|
| Bulk gas | 143,459 | | |
| UPW | 258,308 | | |
| Exhaust | 554,114 | | |
| Foundation | 271,050 | | |
| Subtotal cost | 1,226,931 | 1.335 | 1,637,732 |

Similar evaluation steps are applied to the other 53 test projects. Table 6 summarizes the estimation accuracies obtained by the four methods. Over the 54 test projects, the proposed model achieved an average estimation accuracy of 83.82%. This represents an improvement of about 22.74 percentage points (= 83.82% − 61.08%) over the average cost method, around 23.08 percentage points (= 83.82% − 60.74%) over the component ratios method, and approximately 21.95 percentage points (= 83.82% − 61.87%) over the modified FALCON-COST method. Moreover, the proposed model has a smaller standard deviation of estimation accuracy (just 13.46%) than the alternative models, meaning that it provides more consistent estimates than the other three methods.

These 54 case studies yield two additional observations. First, they confirm the poor performance of the conventional average cost method. Specifically, the average accuracies of the projects related to the IMP and WET process modules using the average cost method are only 4.52% (with a standard deviation of 118.44%) and 37.47% (with a standard deviation of 78.07%), respectively. Analysis of the historical project data reveals that these large inaccuracies likely resulted from the large cost deviations among historical projects, even within the same type of process module. Consequently, clustering the data (for example, into high and low cost clusters) is crucial for capturing this high cost deviation. In the proposed model, this clustering capability is provided by Layer 2 of FALCON, as illustrated in Figure 3.

Second, compared with the other three methods, only the proposed model reflects the specific features of a new project through the three-point cost estimation method, which potentially contributes significantly to the improved estimation accuracy. The modified FALCON-COST, which uses historical average costs rather than expert opinions (obtained using three-point estimates) as modeling inputs, achieved an average accuracy of just 61.87% (with a standard deviation of 35.45%). Restated, the model outputs are sensitive to the inputs, and thus the proposed model is particularly recommended for experienced cost estimators familiar with semiconductor hookup construction.

7.3 Discussions on the FALCON-COST improvements

As indicated earlier, FALCON-COST applies the fmGA mutation and cut-splice operators to optimize the fuzzy membership functions and FL rules of the conventional FALCON. To verify this improvement, this study uses the CMP module (including 15 historical projects) as an example.

Table 6

Comparisons of results using various estimation methods for 54 test projects (estimation accuracy, %)

| Module | Number of test projects | Average cost method: Average | Average cost method: Std Dev | Component ratios method: Average | Component ratios method: Std Dev | Modified FALCON-COST model: Average | Modified FALCON-COST model: Std Dev | Proposed model: Average | Proposed model: Std Dev |
|---|---|---|---|---|---|---|---|---|---|
| CMP | 4 | 86.13 | 9.56 | 86.11 | 9.82 | 83.94 | 14.16 | 93.87 | 5.57 |
| DIFF | 8 | 65.57 | 7.39 | 65.92 | 7.39 | 72.32 | 7.11 | 79.85 | 10.58 |
| ETCH | 8 | 66.57 | 15.54 | 64.96 | 20.63 | 43.86 | 50.55 | 78.33 | 14.43 |
| INT | 6 | 74.51 | 15.96 | 71.38 | 15.26 | 72.26 | 22.90 | 92.69 | 10.16 |
| LITHO | 6 | 79.70 | 14.10 | 78.85 | 13.82 | 78.00 | 18.03 | 83.88 | 13.98 |
| IMP | 6 | 4.52 | 118.44 | 7.41 | 107.25 | 42.08 | 71.60 | 76.98 | 11.32 |
| TF | 6 | 74.12 | 21.60 | 74.67 | 29.68 | 64.40 | 21.02 | 76.81 | 16.88 |
| WET | 10 | 37.47 | 78.07 | 36.62 | 82.58 | 38.12 | 7.22 | 88.14 | 13.82 |
| Average accuracy (%) | | 61.08 | 55.54 | 60.74 | 54.67 | 61.87 | 35.45 | 83.82 | 13.46 |

Table 7

Comparisons of FALCON parameters for the CMP module

(a) Input membership functions

| Model | m11 | σ11 | m12 | σ12 | m21 | σ21 | m22 | σ22 | m31 | σ31 | m32 | σ32 | m41 | σ41 | m42 | σ42 | m43 | σ43 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALCON-COST | 0.1 | 0.2 | 1.0 | 0.1 | 0.2 | 0.1 | 1.0 | 0.2 | 0.4 | 0.3 | 1.0 | 0.2 | 0.2 | 1.2 | 0.9 | 0.1 | 1.0 | 0.1 |
| Traditional FALCON | 0.1 | 1.0 | 0.2 | 1.0 | 0.4 | 1.0 | 0.2 | 0.9 | 1.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.3 | 0.2 | 0.0 | 0.1 | 0.0 |

(b) Fuzzy logic rules

| Model | R01 | R02 | R03 | R04 | R05 | R06 | R07 | R08 | R09 | R10 | R11 | R12 | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 | R21 | R22 | R23 | R24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALCON-COST | 2 | 4 | 5 | 2 | 5 | 2 | 1 | 4 | 4 | 1 | 4 | 4 | 2 | 5 | 1 | 2 | 4 | 4 | 2 | 5 | 5 | 2 | 5 | 5 |
| Traditional FALCON | 2 | 4 | 5 | 2 | 4 | 4 | 2 | 4 | 4 | 2 | 4 | 4 | 2 | 5 | 5 | 2 | 4 | 4 | 2 | 5 | 5 | 2 | 5 | 5 |

(c) Output membership functions

| Model | m11 | σ11 | m12 | σ12 | m13 | σ13 | m14 | σ14 | m15 | σ15 |
|---|---|---|---|---|---|---|---|---|---|---|
| FALCON-COST | 0.0 | 0.0 | 0.2 | 0.0 | 0.7 | 0.1 | 0.9 | 0.0 | 1.0 | 0.1 |
| Traditional FALCON | 0.0 | 0.2 | 0.6 | 0.9 | 1.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 |

Table 7 compares the FALCON parameters (i.e., mij and σij of the input membership functions; Rij of the FL rules; and mij and σij of the output membership functions) trained by the traditional FALCON and by FALCON-COST for the CMP module. Several parameter values produced by the two models differ greatly. For instance, in the input membership functions (see Table 7a), large differences are found for σ11, m12, σ12, σ21, m22, σ22, m31, σ31, m32, σ41, m42, and m43. In other words, the proposed FALCON-COST model is able to revise the parameters to generate a new FALCON structure that achieves an improved solution. The traditional FALCON model does not provide this capability.

As presented in Section 6.7, the estimation accuracy of the trained FALCON-COST is around 97.85% (error rate = 2.15%) after training on 15 CMP historical projects. When the same historical projects were applied to the conventional FALCON, the error rate increased to about 5.25% (= 1 − 0.9475). This comparison verifies the improvement achieved by FALCON-COST.

8 CONCLUSION

This investigation has contributed to several aspects. First, the proposed model has enhanced the estimation accuracy of the cost of hookup construction projects. In the 54 test projects, estimation accuracy increased by approximately 22.74, 23.08, and 21.95 percentage points over the conventional average cost method, the component ratios method, and a modified FALCON-COST method, respectively. In addition, the proposed model is currently being implemented by the case-study company to facilitate cost estimation for new projects, specifically those related to INT and WET process module tools.

Second, integrating the four techniques (that is, the component ratios method, FALCON, fmGA, and the three-point estimation method) into the proposed model is innovative, and the resulting model can systematically deal with real-world cost estimation problems.

Third, the proposed model improves the current FALCON by applying the fmGA mutation and cut-splice operators. That is, in the primordial phase of the fmGA, a new algorithm is developed to conduct BB selection and random gene deletion, so that fmGA operations can be implemented in FALCON.

Future research may include the following directions. First, computerizing the proposed model will help expedite the evaluation. Second, collecting additional historical projects should support the model training process and enhance the estimation accuracy. Third, applying the proposed model to new projects to conduct a before-the-fact analysis can further verify the practicality of the model. Fourth, other attribute ranking algorithms, such as the Analytical Hierarchy Process (Saaty, 1978), or analysis of the observation frequencies of cost items may help identify major cost items. Fifth, FALCON may be substituted by another type of NFS, such as the adaptive network-based fuzzy inference system (ANFIS) (Jang, 1993). Other optimization methods, such as Tabu search (Fan and Machemehl, 2008), simulated annealing (Paya et al., 2008; Zeferino et al., 2009; Oliveira and Petraglia, 2011), and ant colony optimization (Vitins and Axhausen, 2009; Putha et al., 2012), may also be substituted for the fmGA in the proposed model to find enhanced solutions.

Sixth, although the proposed model is devised specifically for semiconductor hookup construction projects, it can be modified to apply to other decision-making problems with similar cost-estimating characteristics, including conceptual cost estimation problems in building projects (Yu, 2006; Cheng et al., 2009), bid-price determination under limited bid preparation time (Wang et al., 2007), and project success prediction problems (Cheng et al., 2010). For instance, during the conceptual phase of a building project, project management often needs to calculate the project cost given a conceptual design situation involving unavailable and uncertain cost data. In such cases, by treating the major cost categories (or cost divisions) in the building project as the principal cost items discussed in this study, and by using the relevant historical building project data for training, the FALCON-COST model can easily be refined and applied.

ACKNOWLEDGMENTS

The authors would like to thank the editor and the reviewers for their careful evaluation and thoughtful suggestions. The authors also thank the National Science Council of Taiwan (Contract No. NSC97–2221-E-009–134) and the Ministry of Education of Taiwan via the Aim for the Top University (MOU-ATU) program for financially supporting this research.

REFERENCES

Adeli, H. & Cheng, N.-T. (1993), Integrated genetic algorithm for optimization of space structures, Journal of Aerospace Engineering, 6(4), 315–28.

Adeli, H. & Cheng, N.-T. (1994a), Augmented Lagrangian genetic algorithm for structural optimization, Journal of Aerospace Engineering, 7(1), 104–18.

Adeli, H. & Cheng, N.-T. (1994b), Concurrent genetic algorithms for optimization of large structures, Journal of Aerospace Engineering, 7(3), 276–96.

Adeli, H. & Hung, S. L. (1995), Machine Learning: Neural Networks, Genetic Algorithms, and Fuzzy Systems, John Wiley and Sons, New York.

Adeli, H. & Karim, A. (1997), Scheduling/cost optimization and neural dynamics model for construction, Journal of Construction Engineering and Management, 123(4), 450–8.

Adeli, H. & Karim, A. (2001), Construction Scheduling,

Cost Optimization, and Management—A New Model Based on Neurocomputing and Object Technologies, Spon Press,

London.

Adeli, H. & Kumar, S. (1995a), Distributed genetic algorithms for structural optimization, Journal of Aerospace

Engineer-ing, 8(3), 156–63.

Adeli, H. & Kumar, S. (1995b), Concurrent structural opti-mization on a massively parallel supercomputer, Journal of

Structural Engineering, 121(11), 1588–97.

Adeli, H. & Wu, M. (1998), Regularization neural network for construction cost estimation, Journal of Construction

Engi-neering and Management, 124(1), 18–24.

Adeli, H. & Yu, G. (1995), An integrated computing en-vironment for solution of complex engineering problems using the object-oriented programming paradigm and a blackboard architecture, Computers and Structures, 54(2), 255–65.

Ahmadlou, M. & Adeli, H. (2010), Enhanced probabilistic neural network with local decision circles: a robust classifier, Integrated Computer-Aided Engineering, 17(3), 197–210.

Al-Bazi, A. & Dawood, N. (2010), Developing crew allocation system for precast industry using genetic algorithms, Computer-Aided Civil and Infrastructure Engineering, 25(8), 581–95.

Baraldi, P., Canesi, R., Zio, E., Seraoui, R. & Chevalier, R. (2011), Genetic algorithm-based wrapper approach for grouping condition monitoring signals of nuclear power plant components, Integrated Computer-Aided Engineering, 18(3), 221–34.

Barrie, D. S. & Paulson, B. C. (1992), Professional Construction Management, 3rd edn, McGraw-Hill, New York.

Beers, A. J., Williams, D. & Tarr, A. (2008), Tool hookup: a paradigm shift to modularization, Solid State Technology, September, 36–7.

Boussabaine, A. H. & Elhag, T. M. S. (1997), A neuro-fuzzy model for predicting cost and duration of construction projects, RICS Research (9p), The Royal Institution of Chartered Surveyors, London.

Carro-Calvo, L., Salcedo-Sanz, S., Ortiz-García, G. & Portilla-Figueras, A. (2010), An incremental-encoding evolutionary algorithm for color reduction in images, Integrated Computer-Aided Engineering, 17(3), 261–9.

Chasey, A. D. & Merchant, S. (2000), Issues for construction of 300-mm fab, Journal of Construction Engineering and Management, 126(6), 451–7.

Cheng, M. Y., Tsai, H. C. & Hsieh, W. S. (2009), Web-based conceptual cost estimates for construction projects using evolutionary fuzzy neural inference model, Automation in Construction, 18(2), 164–72.

Cheng, M. Y., Wu, Y. W. & Wu, C. F. (2010), Project success prediction using an evolutionary support vector machine inference model, Automation in Construction, 19(3), 619–29.

Cheng, T. M. & Yan, R. Z. (2009), Integrating messy genetic algorithms and simulation to optimize resource utilization, Computer-Aided Civil and Infrastructure Engineering, 24(6), 401–15.

Chou, J. S. (2011), Cost simulation in an item-based project involving construction engineering and management, International Journal of Project Management, 29, 706–17.

Creese, R. C. & Li, L. (1995), Cost estimation of timber bridges using neural networks, Cost Engineering, 37(5), 17–22.

Diekmann, J. E. & Featherman, W. D. (1998), Assessing cost uncertainty: lessons from environmental restoration projects, Journal of Construction Engineering and