Surface Fitting

國土測繪與空間資訊, Vol. 3, No. 2

This study first applies several surface models to the analysis. Surface equations can be classified into several types, including plane, quadratic, cubic, quartic, and quintic surfaces (Lancaster and Salkauskas, 1986; Pottmann and Leopoldseder, 2003). The plane surface equation is given in equation 2.

$$N = a_0 + a_1 x + a_2 y + a_3 xy \tag{2}$$

where a₀ to a₃ are unknown parameters, N is the geoidal undulation, and x and y are the plane coordinates.

The plane surface equation contains four unknown parameters (strictly, the xy term makes equation 2 a bilinear surface rather than a true plane), so a unique solution exists only if at least 4 points are available on the geoidal undulation surface being fitted. The quadratic surface equation (equation 3) has 6 parameters and needs at least 6 points to be solved.

The cubic surface equation (equation 4) has 10 parameters and needs at least 10 points; the quartic surface equation (equation 5) has 15 parameters and needs at least 15 points; and the quintic surface equation (equation 6) has 21 parameters and needs at least 21 points. The number of points within the fitting range must therefore be considered when choosing which surface equation to solve (Awange et al., 2010).
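The parameter counts above follow a simple pattern, shown here as a worked check rather than as something stated in the source: a complete bivariate polynomial of degree d contains every monomial xⁱyʲ with i + j ≤ d, so its number of unknown parameters u(d), and hence the minimum number of points, is

```latex
u(d) = \sum_{k=0}^{d} (k + 1) = \frac{(d + 1)(d + 2)}{2},
\qquad u(2) = 6,\quad u(3) = 10,\quad u(4) = 15,\quad u(5) = 21,
\qquad n_{\text{points}} \ge u(d)
```

Equation 2 is the exception: a complete degree-1 polynomial has u(1) = 3 terms, and the extra xy term brings it to 4.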

$$N = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 y^2 + a_5 xy \tag{3}$$

$$N = a_0 + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 + a_6 x^3 + a_7 y^3 + a_8 x^2 y + a_9 x y^2 \tag{4}$$

$$N = a_0 + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 + a_6 x^3 + a_7 y^3 + a_8 x^2 y + a_9 x y^2 + a_{10} x^4 + a_{11} y^4 + a_{12} x^3 y + a_{13} x^2 y^2 + a_{14} x y^3 \tag{5}$$

$$N = a_0 + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 + a_6 x^3 + a_7 y^3 + a_8 x^2 y + a_9 x y^2 + a_{10} x^4 + a_{11} y^4 + a_{12} x^3 y + a_{13} x^2 y^2 + a_{14} x y^3 + a_{15} x^5 + a_{16} y^5 + a_{17} x^4 y + a_{18} x^3 y^2 + a_{19} x^2 y^3 + a_{20} x y^4 \tag{6}$$

where aₙ are the unknown parameters, N is the geoidal undulation, and x and y are the abscissa and ordinate components, respectively.
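To make the term ordering of equations 2 to 6 concrete, the following Python sketch lists each model's terms as (power of x, power of y) pairs in the coefficient order a₀, a₁, ... used above, and builds one row of the least-squares design matrix from them. The names `MODELS`, `design_row`, and `has_enough_points` are hypothetical illustrations, not part of the source.

```python
# Hypothetical term tables: each entry is (power of x, power of y), listed in
# the same coefficient order as equations 2 to 6.
PLANE = [(0, 0), (1, 0), (0, 1), (1, 1)]                          # equation 2
QUADRATIC = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]      # equation 3
CUBIC = PLANE + [(2, 0), (0, 2), (3, 0), (0, 3), (2, 1), (1, 2)]  # equation 4
QUARTIC = CUBIC + [(4, 0), (0, 4), (3, 1), (2, 2), (1, 3)]        # equation 5
QUINTIC = QUARTIC + [(5, 0), (0, 5), (4, 1), (3, 2), (2, 3), (1, 4)]  # equation 6

MODELS = {
    "plane": PLANE,
    "quadratic": QUADRATIC,
    "cubic": CUBIC,
    "quartic": QUARTIC,
    "quintic": QUINTIC,
}


def design_row(x, y, model):
    """Values that multiply a_0, a_1, ... for one point (x, y)."""
    return [x ** i * y ** j for i, j in MODELS[model]]


def has_enough_points(n_points, model):
    """A unique solution needs at least as many points as unknown parameters."""
    return n_points >= len(MODELS[model])


print(design_row(2.0, 3.0, "plane"))  # [1.0, 2.0, 3.0, 6.0]
```

Stacking `design_row` for every point gives the matrix whose least-squares solution yields the coefficients aₙ.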

For a perfect fit, the precision after fitting would approach 0. We examine the fitting results on simulated data using the plane, cubic, and quintic surface equations. The simulated data are a 9 × 9 grid of points whose z component is a random number between 0 and 1 unit; the values are listed in Table 1. As shown in Figures 2, 3, and 4, fitting the data to a plane surface gives a precision of 0.2723 unit, fitting to a cubic surface gives 0.2703 unit, and fitting to a quintic surface gives 0.2637 unit. These results show that higher-order surface equations fit the data more closely. However, higher-order surface equations carry a risk of overfitting: they follow the fitted points closely but produce relatively high errors when predicting points that were not used in the fit.
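The following sketch fits the Table 1 values with equations 2, 4, and 6 by ordinary least squares and reports the root-mean-square residual. Several details are assumptions rather than the paper's stated method: the column index is taken as x and the row index as y; the grid coordinates are rescaled from 1 to 9 onto [-1, 1], which leaves the fitted surface unchanged for these term sets but improves numerical conditioning; and "precision" is taken to mean sqrt(Σv²/n). The printed values are therefore not claimed to reproduce 0.2723, 0.2703, and 0.2637.

```python
import math

# Table 1: TABLE1[r - 1][c - 1] is the simulated z value at row r, column c.
TABLE1 = [
    [0.706, 0.0344, 0.7094, 0.3404, 0.5472, 0.35, 0.9172, 0.7792, 0.3112],
    [0.0318, 0.4387, 0.7547, 0.5853, 0.1386, 0.1966, 0.2858, 0.934, 0.5285],
    [0.2769, 0.3816, 0.276, 0.2238, 0.1493, 0.2511, 0.7572, 0.1299, 0.1656],
    [0.0462, 0.7655, 0.6797, 0.7513, 0.2575, 0.616, 0.7537, 0.5688, 0.602],
    [0.0971, 0.7952, 0.6551, 0.2551, 0.8407, 0.4733, 0.3804, 0.4694, 0.263],
    [0.8235, 0.1869, 0.1626, 0.506, 0.2543, 0.3517, 0.5678, 0.0119, 0.6541],
    [0.6948, 0.4898, 0.119, 0.6991, 0.8143, 0.8308, 0.0759, 0.3371, 0.6892],
    [0.3171, 0.4456, 0.4984, 0.8909, 0.2435, 0.5853, 0.054, 0.1622, 0.7482],
    [0.9502, 0.6463, 0.9597, 0.9593, 0.9293, 0.5497, 0.5308, 0.7943, 0.4505],
]

# (power of x, power of y) for equations 2, 4 and 6; each list extends the previous one.
PLANE = [(0, 0), (1, 0), (0, 1), (1, 1)]
CUBIC = PLANE + [(2, 0), (0, 2), (3, 0), (0, 3), (2, 1), (1, 2)]
QUINTIC = CUBIC + [(4, 0), (0, 4), (3, 1), (2, 2), (1, 3),
                   (5, 0), (0, 5), (4, 1), (3, 2), (2, 3), (1, 4)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))  # pivot row
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def fit_rms(points, terms):
    """Least-squares fit of z = sum(a_k * x**i * y**j); returns sqrt(sum(v**2) / n)."""
    rows = [[x ** i * y ** j for i, j in terms] for x, y, _ in points]
    z = [p[2] for p in points]
    u = len(terms)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(u)] for i in range(u)]
    atz = [sum(r[i] * zk for r, zk in zip(rows, z)) for i in range(u)]
    coef = solve(ata, atz)  # normal equations (A^T A) a = A^T z
    resid = [zk - sum(a * t for a, t in zip(coef, r)) for r, zk in zip(rows, z)]
    return math.sqrt(sum(v * v for v in resid) / len(points))


# Assumption: column index -> x, row index -> y, rescaled from 1..9 to [-1, 1].
POINTS = [((c - 5) / 4, (r - 5) / 4, TABLE1[r - 1][c - 1])
          for r in range(1, 10) for c in range(1, 10)]

for name, terms in [("plane", PLANE), ("cubic", CUBIC), ("quintic", QUINTIC)]:
    print(name, round(fit_rms(POINTS, terms), 4))
```

Because each model's terms contain those of the previous one, the least-squares residual can only decrease or stay the same as the order rises, which is why a lower fitting residual alone does not show that a higher-order surface predicts better.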

Ning Fang-Shii, Lee Wen-Chieh: The Best Surface Fitting of Regional Geoidal Undulation - A Case Study of Taichung Area

Table 1 Data Points of Surface Fitting Simulation (No Unit)

| Row \ Column | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.706 | 0.0344 | 0.7094 | 0.3404 | 0.5472 | 0.35 | 0.9172 | 0.7792 | 0.3112 |
| 2 | 0.0318 | 0.4387 | 0.7547 | 0.5853 | 0.1386 | 0.1966 | 0.2858 | 0.934 | 0.5285 |
| 3 | 0.2769 | 0.3816 | 0.276 | 0.2238 | 0.1493 | 0.2511 | 0.7572 | 0.1299 | 0.1656 |
| 4 | 0.0462 | 0.7655 | 0.6797 | 0.7513 | 0.2575 | 0.616 | 0.7537 | 0.5688 | 0.602 |
| 5 | 0.0971 | 0.7952 | 0.6551 | 0.2551 | 0.8407 | 0.4733 | 0.3804 | 0.4694 | 0.263 |
| 6 | 0.8235 | 0.1869 | 0.1626 | 0.506 | 0.2543 | 0.3517 | 0.5678 | 0.0119 | 0.6541 |
| 7 | 0.6948 | 0.4898 | 0.119 | 0.6991 | 0.8143 | 0.8308 | 0.0759 | 0.3371 | 0.6892 |
| 8 | 0.3171 | 0.4456 | 0.4984 | 0.8909 | 0.2435 | 0.5853 | 0.054 | 0.1622 | 0.7482 |
| 9 | 0.9502 | 0.6463 | 0.9597 | 0.9593 | 0.9293 | 0.5497 | 0.5308 | 0.7943 | 0.4505 |

Figure 2 Result of Simulated Points from Plane Surface Fitting with Precision of 0.2723 Unit

Figure 3 Result of Simulated Points from Cubic Surface Fitting with Precision of 0.2703 Unit


Figure 4 Result of Simulated Points from Quintic Surface Fitting with Precision of 0.2637 Unit

Figure 5 compares training-sample errors and test-sample errors across model complexities for 100 training sets of 50 samples each (Hastie et al., 2009). The abscissa is model complexity and the ordinate is prediction error; the pale blue curves are training errors, the reddish curves are test errors, and the solid lines are the expected training and test errors. As model complexity increases, the training error decreases steadily, and the test error also decreases at first. Beyond a certain complexity, however, the test error rises again and the gap between test and training errors widens. When the complexity is increased until the training error reaches zero, the model has overfit the training samples.

Figure 5 Training Sample Errors and Test Sample Errors under Different Model Complexities (Hastie et al., 2009)
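The behavior described for Figure 5 can be reproduced qualitatively with a small experiment. This sketch is not the setup of Hastie et al.: it assumes a hypothetical one-dimensional profile sin(πx) with Gaussian noise, 10 evenly spaced training samples, and 500 independent test samples, and it uses polynomial degree as the measure of model complexity. Legendre polynomials are used as the basis because they span the same space as 1, x, ..., x^d while keeping the least-squares normal equations well conditioned.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def legendre_row(x, degree):
    """Legendre polynomials P_0 .. P_degree at x (same span as 1, x, ..., x**degree)."""
    row = [1.0, x][: degree + 1]
    for k in range(1, degree):
        row.append(((2 * k + 1) * x * row[k] - k * row[k - 1]) / (k + 1))
    return row


def fit(xs, ys, degree):
    """Least-squares coefficients of a polynomial of the given degree."""
    rows = [legendre_row(x, degree) for x in xs]
    u = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(u)] for i in range(u)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(u)]
    return solve(ata, aty)


def rms(coef, xs, ys):
    """Root-mean-square error of the fitted polynomial on the samples (xs, ys)."""
    degree = len(coef) - 1
    sq = [(y - sum(c * p for c, p in zip(coef, legendre_row(x, degree)))) ** 2
          for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


def truth(x):
    """Hypothetical noise-free profile."""
    return math.sin(math.pi * x)


rng = random.Random(1)  # fixed seed so the run is repeatable
train_x = [-1 + 2 * k / 9 for k in range(10)]  # 10 evenly spaced training samples
train_y = [truth(x) + rng.gauss(0, 0.3) for x in train_x]
test_x = [rng.uniform(-1, 1) for _ in range(500)]  # independent test samples
test_y = [truth(x) + rng.gauss(0, 0.3) for x in test_x]

results = []
for degree in range(10):  # model complexity = polynomial degree 0..9
    coef = fit(train_x, train_y, degree)
    results.append((degree, rms(coef, train_x, train_y), rms(coef, test_x, test_y)))
    print(f"degree {degree}: train RMS {results[-1][1]:.3f}, test RMS {results[-1][2]:.3f}")
```

The training error can only fall as the degree rises, and at degree 9 the polynomial passes through all 10 training samples, so the training error is essentially zero; the test column, not the training column, is where any overfitting shows up.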

2.2 Cross Validation

Cross validation is probably the simplest and most widely used method for estimating the prediction error of a model (Hastie et al., 2009). It divides the original data into training data and validation data and evaluates the model by cycling through the subsets. In the example of Figure 6, 20 data are available for model building and are divided into 5 subsets of 4 data each. In each round, one subset is held out as validation data and excluded from training, and the model is built from the other 4 subsets. After 5 iterations, every subset has served as validation data, and together they give the prediction error of the model. This scheme of dividing the data into multiple subsets is called K-fold cross validation, where K is the number of subsets (K = 5 in this example).
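The splitting scheme of Figure 6 can be sketched as follows. The sketch assumes, as one reading of the figure, that the 20 data are assigned to the 5 parts in contiguous blocks (data 1 to 4 in Part 1, and so on); `k_fold_splits` is a hypothetical helper that works with 0-based indices internally and prints data numbers 1 to 20.

```python
def k_fold_splits(n, k):
    """Yield (training indices, validation indices) for each of the k folds.

    Assumption: data are assigned to folds in contiguous blocks.
    """
    size, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        stop = start + size + (1 if f < extra else 0)  # spread any remainder over early folds
        folds.append(list(range(start, stop)))
        start = stop
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]  # all other folds
        yield train, folds[f]


for it, (train, val) in enumerate(k_fold_splits(20, 5), start=1):
    print(f"iteration {it}: validate data {[i + 1 for i in val]}, train on {len(train)} data")
# first line: iteration 1: validate data [1, 2, 3, 4], train on 16 data
```

Calling `k_fold_splits(n, n)` puts one datum in each fold, which is the leave-one-out case.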

Figure 6 Example of Cross Validation Flow

[Figure 6 layout: the data (data 1 to 20) are grouped into Parts 1 to 5; in iteration k (k = 1, ..., 5), Part k is used for validation and the other four parts for training.]

Cross validation is a relatively conservative method for estimating the prediction error of a model, and it can take considerable computation time. With today's computing power, however, cross validation is not costly when the amount of data is reasonable and the model is not too complex. Cross validation therefore offers a relatively simple way to determine the prediction error of a model.

In this study, leave-one-out cross validation (LOOCV), one of the cross validation methods, is used to determine the prediction errors of the surface fitting models. LOOCV is the extreme case of K-fold cross validation in which K equals the total number of data.

Each time, one datum is held out as validation data and the model is trained on the remaining data; this is repeated until every datum has been used as validation data once (Kearns and Ron, 1999).
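A minimal sketch of the LOOCV loop follows. The helper `loocv_predictions` and the stand-in model `fit_mean`/`predict_mean`, which predicts every datum by the mean of the training data, are hypothetical; a surface model would plug in the same way through the `fit` and `predict` arguments.

```python
def loocv_predictions(points, fit, predict):
    """Hold out each (x, y) point once, fit on the other N - 1, predict the held-out y."""
    preds = []
    for i, (x, _) in enumerate(points):
        model = fit(points[:i] + points[i + 1:])  # one fit per datum: N fits in total
        preds.append(predict(model, x))
    return preds


# Hypothetical stand-in model: predict every datum by the mean of the training data.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)


def predict_mean(model, x):
    return model  # the mean model ignores the coordinate x


data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 6.0)]
preds = loocv_predictions(data, fit_mean, predict_mean)
print(preds)  # held-out predictions: 11/3, 10/3, 3, 2
```

LOOCV thus runs one fit per datum, N fits in total, compared with K fits for K-fold cross validation, which is the source of the computation time mentioned above.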

The cross validation estimate of the prediction error is given by equation 7 (Hastie et al., 2009):

$$CV(\hat{f}) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{f}^{-i}(x_i) \right)^2 \tag{7}$$

Here the data are divided into N subsets; f̂ denotes the fitted surface equation, and f̂⁻ⁱ denotes the model fitted with the ith subset held out as validation data and the remaining data used for training. yᵢ is the dependent variable, which in geoidal undulation surface fitting is the geoidal undulation, and xᵢ is the independent variable, which is the plane coordinate. Equation 7 follows the K-fold cross validation formulation with K = N subsets of one datum each, which is exactly LOOCV; since LOOCV is used in this study, its prediction error is obtained by setting N to the total number of data.
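Equation 7 can be evaluated for the plane surface of equation 2 with the following sketch, under the LOOCV setting used in this study (N subsets of one point each). The 5 × 5 grid and the undulation values are synthetic, hypothetical data, and `fit`, `predict`, and `loocv` are illustrative helpers rather than the authors' code.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def plane_terms(x, y):
    """Terms of equation 2: N = a0 + a1*x + a2*y + a3*x*y."""
    return [1.0, x, y, x * y]


def fit(points):
    """Least-squares coefficients a0..a3 from (x, y, N) points."""
    rows = [plane_terms(x, y) for x, y, _ in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atn = [sum(r[i] * p[2] for r, p in zip(rows, points)) for i in range(4)]
    return solve(ata, atn)


def predict(coef, x, y):
    return sum(a * t for a, t in zip(coef, plane_terms(x, y)))


def loocv(points):
    """Equation 7 with N one-point subsets: mean squared held-out prediction error."""
    total = 0.0
    for i, (x, y, n) in enumerate(points):
        coef = fit(points[:i] + points[i + 1:])  # f^-i: surface fitted without point i
        total += (n - predict(coef, x, y)) ** 2
    return total / len(points)


rng = random.Random(7)  # fixed seed so the run is repeatable
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
exact = [(x, y, 1.0 + 0.2 * x - 0.1 * y + 0.05 * x * y) for x, y in grid]  # hypothetical
noisy = [(x, y, n + rng.gauss(0, 0.1)) for x, y, n in exact]

print("CV on exact bilinear data:", loocv(exact))
print("CV on noisy data:", loocv(noisy))
```

For data lying exactly on a surface of the form of equation 2, every held-out point is predicted exactly and CV is zero up to rounding. With noise, CV is never smaller than the mean squared residual of the full fit, because each point is then predicted by a surface that was fitted without it.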
