國土測繪與空間資訊, Vol. 3, No. 2
This research first adopts different surface models for the analysis. Surface equations can be classified into many types, including plane, quadratic, cubic, quartic and quintic surfaces (Lancaster and Salkauskas, 1986; Pottmann and Leopoldseder, 2003). The plane surface equation is shown as equation 2.
$$N = a_0 + a_1 x + a_2 y + a_3 x y \tag{2}$$
where $a_0$ to $a_3$ are unknown parameters, N is the geoidal undulation, and x and y are the coordinates.
The plane surface contains four unknown parameters, so a unique solution can be found only if at least 4 points are available for fitting the geoidal undulation surface. The quadratic surface equation (equation 3) has 6 parameters and needs at least 6 points to be solved.
The cubic surface equation (equation 4) has 10 parameters and needs at least 10 points, the quartic surface equation (equation 5) has 15 parameters and needs at least 15 points, and the quintic surface equation (equation 6) has 21 parameters and needs at least 21 points. Thus, the number of points within the fitting range must be considered when selecting which surface equation to solve (Awange et al., 2010).
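$$N = a_0 + a_1 x + a_2 y + a_3 x y + a_4 x^2 + a_5 y^2 \tag{3}$$

$$N = a_0 + a_1 x + a_2 y + a_3 x y + a_4 x^2 + a_5 y^2 + a_6 x^3 + a_7 y^3 + a_8 x^2 y + a_9 x y^2 \tag{4}$$

$$
\begin{aligned}
N = {} & a_0 + a_1 x + a_2 y + a_3 x y + a_4 x^2 + a_5 y^2 + a_6 x^3 + a_7 y^3 + a_8 x^2 y + a_9 x y^2 \\
& + a_{10} x^4 + a_{11} y^4 + a_{12} x^3 y + a_{13} x^2 y^2 + a_{14} x y^3
\end{aligned}
\tag{5}
$$

$$
\begin{aligned}
N = {} & a_0 + a_1 x + a_2 y + a_3 x y + a_4 x^2 + a_5 y^2 + a_6 x^3 + a_7 y^3 + a_8 x^2 y + a_9 x y^2 \\
& + a_{10} x^4 + a_{11} y^4 + a_{12} x^3 y + a_{13} x^2 y^2 + a_{14} x y^3 \\
& + a_{15} x^5 + a_{16} y^5 + a_{17} x^4 y + a_{18} x^3 y^2 + a_{19} x^2 y^3 + a_{20} x y^4
\end{aligned}
\tag{6}
$$

As before, $a_n$ denotes the unknown parameters, N the geoidal undulation, and x and y the components on the abscissa and ordinate, respectively.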
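To make the parameter counts concrete, the following Python sketch (NumPy assumed; all names are illustrative, not from the paper) encodes equations 2 to 6 as lists of exponent pairs and builds the least-squares design matrix, whose column count equals the number of unknown parameters. Apart from the four-term plane, which includes the xy term, each surface is a complete polynomial of total degree d with (d + 1)(d + 2)/2 terms, which gives 6, 10, 15 and 21 for d = 2 to 5.

```python
import numpy as np

# Exponent pairs (p, q) standing for the monomial x**p * y**q, listed in the
# coefficient order a0, a1, ... of equation 6. Each lower-order surface uses a
# prefix of this list, so the parameter counts are 4, 6, 10, 15 and 21.
QUINTIC_TERMS = [
    (0, 0), (1, 0), (0, 1), (1, 1),                  # plane (eq. 2): a0..a3
    (2, 0), (0, 2),                                  # + quadratic terms: a4, a5
    (3, 0), (0, 3), (2, 1), (1, 2),                  # + cubic terms: a6..a9
    (4, 0), (0, 4), (3, 1), (2, 2), (1, 3),          # + quartic terms: a10..a14
    (5, 0), (0, 5), (4, 1), (3, 2), (2, 3), (1, 4),  # + quintic terms: a15..a20
]
N_PARAMS = {"plane": 4, "quadratic": 6, "cubic": 10, "quartic": 15, "quintic": 21}


def surface_terms(name):
    """Return the (p, q) exponent pairs of the named surface equation."""
    return QUINTIC_TERMS[: N_PARAMS[name]]


def design_matrix(x, y, terms):
    """Build the n-by-u matrix whose column j holds x**p * y**q for terms[j]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([x**p * y**q for p, q in terms])


def enough_points(n_points, name):
    """Necessary condition for a unique least-squares solution: at least as many
    points as parameters (the points must also not be degenerate, e.g. collinear)."""
    return n_points >= N_PARAMS[name]


# Usage: the 9 x 9 simulation grid has 81 points, enough even for the quintic.
gx, gy = np.meshgrid(np.arange(1, 10), np.arange(1, 10))
A = design_matrix(gx.ravel(), gy.ravel(), surface_terms("cubic"))
print(A.shape)  # (81, 10)
```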
For a perfect fit, the precision after fitting would approach 0. We examine the fitting results on simulated data using the plane, cubic and quintic surface equations. The simulated data consist of a 9 × 9 grid of points whose z component is a random number between 0 and 1; the values are listed in Table 1. As shown in Figures 2, 3 and 4, fitting the data with a plane surface gives a precision of 0.2723 unit, fitting with a cubic surface gives 0.2703 unit, and fitting with a quintic surface gives 0.2637 unit. These results indicate that higher-order surface equations fit the data more closely. However, higher-order surface equations carry a risk of overfitting: the surface starts to follow the random variation in the fitted points, so the prediction error at other points can be relatively high.
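The simulation can be reproduced in outline with the Python sketch below. It rests on stated assumptions: the random values are generated with NumPy rather than taken from Table 1, and "precision" is assumed to be the root mean square of the fit residuals (the paper may instead divide by the degrees of freedom n − u), so the printed numbers should not be expected to match 0.2723, 0.2703 or 0.2637. What it does show is the trend described above: because each lower-order term set is contained in the next, the least-squares residuals can only stay the same or shrink as the order rises.

```python
import numpy as np

# (p, q) exponents of x**p * y**q in the coefficient order of equation 6;
# the plane uses the first 4 terms, the cubic the first 10, the quintic all 21.
QUINTIC_TERMS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2),
                 (3, 0), (0, 3), (2, 1), (1, 2),
                 (4, 0), (0, 4), (3, 1), (2, 2), (1, 3),
                 (5, 0), (0, 5), (4, 1), (3, 2), (2, 3), (1, 4)]
N_TERMS = {"plane": 4, "cubic": 10, "quintic": 21}


def to_unit_interval(v):
    """Map coordinates linearly onto [-1, 1] to keep the design matrix well conditioned."""
    return 2.0 * (v - v.min()) / (v.max() - v.min()) - 1.0


def fit_surface(x, y, z, n_terms):
    """Least-squares fit of the first n_terms of equation 6; returns (coefficients, residuals).

    Coefficients refer to the rescaled coordinates. Rescaling x and y linearly does
    not change the span of these term sets, so fitted values and residuals are the
    same as with raw coordinates.
    """
    xs, ys = to_unit_interval(x), to_unit_interval(y)
    A = np.column_stack([xs**p * ys**q for p, q in QUINTIC_TERMS[:n_terms]])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef, z - A @ coef


def rms(v):
    """Assumed 'precision': root mean square of the residuals."""
    return float(np.sqrt(np.mean(np.square(v))))


# Hypothetical data in the spirit of the simulation: a 9 x 9 grid with z uniform
# in [0, 1). These are NOT the Table 1 values, so the numbers will differ.
rng = np.random.default_rng(seed=1)
gx, gy = np.meshgrid(np.arange(1, 10), np.arange(1, 10))
x, y = gx.ravel().astype(float), gy.ravel().astype(float)
z = rng.random(x.size)

precision = {name: rms(fit_surface(x, y, z, k)[1]) for name, k in N_TERMS.items()}
for name, value in precision.items():
    print(f"{name:8s} RMS of residuals = {value:.4f}")
```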
Ning Fang-Shii, Lee Wen-Chieh: The Best Surface Fitting of Regional Geoidal Undulation - A Case Study of Taichung Area

Table 1 Data Points of Surface Fitting Simulation (No Unit)

x \ y   1       2       3       4       5       6       7       8       9
1       0.706   0.0344  0.7094  0.3404  0.5472  0.35    0.9172  0.7792  0.3112
2       0.0318  0.4387  0.7547  0.5853  0.1386  0.1966  0.2858  0.934   0.5285
3       0.2769  0.3816  0.276   0.2238  0.1493  0.2511  0.7572  0.1299  0.1656
4       0.0462  0.7655  0.6797  0.7513  0.2575  0.616   0.7537  0.5688  0.602
5       0.0971  0.7952  0.6551  0.2551  0.8407  0.4733  0.3804  0.4694  0.263
6       0.8235  0.1869  0.1626  0.506   0.2543  0.3517  0.5678  0.0119  0.6541
7       0.6948  0.4898  0.119   0.6991  0.8143  0.8308  0.0759  0.3371  0.6892
8       0.3171  0.4456  0.4984  0.8909  0.2435  0.5853  0.054   0.1622  0.7482
9       0.9502  0.6463  0.9597  0.9593  0.9293  0.5497  0.5308  0.7943  0.4505
Figure 2 Result of Simulated Points from Plane Surface Fitting with Precision of 0.2723 Unit
Figure 3 Result of Simulated Points from Cubic Surface Fitting with Precision of 0.2703 Unit
Figure 4 Result of Simulated Points from Quintic Surface Fitting with Precision of 0.2637 Unit

The comparison chart in Figure 5 shows training sample errors and test sample errors for different model complexities, using 100 training sets of 50 samples each (Hastie et al., 2009). The abscissa indicates model complexity and the ordinate the prediction error; the pale blue curves show training errors, the reddish curves show test errors, and the solid curves show the expected training and test errors. As model complexity increases, the training error decreases steadily, and the test error also decreases at first. Beyond a certain complexity, however, the test error rises again and the gap between test and training errors widens. When the complexity increases until the training error reaches zero, the model is overfitting the training samples.
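The behaviour in Figure 5 can be imitated on a small scale with the Python sketch below, which fits polynomials of increasing degree to one hypothetical training set of 50 noisy samples of sin(πx) and evaluates them on a large independent test set. The signal, noise level and degree range are arbitrary choices for illustration, not values from Hastie et al. Training error never increases with degree, while test error typically falls and then rises again once the polynomial begins to fit the noise.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def true_curve(x):
    """Hypothetical smooth signal used only for this illustration."""
    return np.sin(np.pi * x)


def sample(n, noise_sd=0.3):
    """Draw n noisy observations of true_curve at random abscissae in [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n)
    return x, true_curve(x) + rng.normal(0.0, noise_sd, n)


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


x_train, y_train = sample(50)    # one training set of 50 samples, as in Figure 5
x_test, y_test = sample(2000)    # large independent test set

degrees = list(range(0, 11))     # model complexity = polynomial degree
train_err, test_err = [], []
for d in degrees:
    coef = np.polyfit(x_train, y_train, d)      # least-squares polynomial of degree d
    train_err.append(mse(y_train, np.polyval(coef, x_train)))
    test_err.append(mse(y_test, np.polyval(coef, x_test)))

for d, tr, te in zip(degrees, train_err, test_err):
    print(f"degree {d:2d}: training MSE {tr:.4f}   test MSE {te:.4f}")
```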
Figure 5 Training Sample Errors and Test Sample Errors under Different Model Complexities (Hastie et al., 2009)
2.2 Cross Validation
Cross validation is probably the simplest and most widely used method for estimating the prediction error of a model (Hastie et al., 2009). It divides the original data into training data and validation data, and repeats the fit-and-validate calculation in rotation so that every part of the data is used for validation. As shown in Figure 6, suppose a total of 20 data are available for model creation. First, the data are divided into 5 subsets of 4 data each. In each round, one subset is held out as validation data and does not take part in training the model; the model is trained on the other four subsets and evaluated on the held-out one. After 5 rounds, every subset has served as validation data, and the prediction error of the model is evaluated from all of them. This method of dividing the data into multiple subsets is called K-fold cross validation, where K is the number of subsets (K = 5 in this example).
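The K-fold procedure can be sketched in Python as follows. This is a hypothetical helper, not the authors' code; it assumes the data are assigned to folds in their original order (indices 0 to 3 form the first subset, and so on), whereas practical implementations usually shuffle the data first. Setting k equal to n yields leave-one-out splits.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for each of the k rounds.

    Indices are assigned to folds in order; fold sizes differ by at most one
    when n is not divisible by k.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))   # every other datum
        yield train, validation
        start = stop


# The example from the text: 20 data, 5 subsets of 4 data each.
for round_no, (train, val) in enumerate(k_fold_splits(20, 5), start=1):
    print(f"round {round_no}: validate on {val}, train on {len(train)} data")
```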
[Figure 6 diagram: the 20 data are grouped into Part 1 to Part 5; in each of five rounds, one part (Part 1, then Part 2, and so on) is the validation set and the other four parts are the training set.]
Figure 6 Example of Cross Validation Flow
Cross validation is a relatively conservative method for estimating the prediction error of a model, and it can take considerable computation time. With today's computing capability, however, its cost is modest when the amount of data is reasonable and the model is not overly complex. Thus, cross validation can be used to determine the prediction errors of a model in a relatively simple manner.
In this research, leave-one-out cross validation (LOOCV), one of the cross validation methods, is used to determine the prediction errors of the surface fitting models. LOOCV is the extreme form of K-fold cross validation in which K equals the total number of data.
Each time, one datum is extracted as the validation datum while the model is trained on the remaining data, and this is repeated until every datum has been used once as validation data (Kearns and Ron, 1999).
The prediction error of cross validation is evaluated with equation 7 (Hastie et al., 2009):
$$\mathrm{CV}(\hat{f}) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{f}^{-i}(x_i)\right)^2} \tag{7}$$
Here the data are divided into N subsets; $\hat{f}$ denotes the fitted surface equation, and $\hat{f}^{-i}$ is the model trained on the other data with the i-th subset held out as validation data; $y_i$ is the dependent variable, i.e. the geoidal undulation when the surface fits geoidal undulation; and $x_i$ is the independent variable, i.e. the plane coordinates of point i. Equation 7 is the prediction error formula defined on the basis of K-fold cross validation. Since LOOCV is used in this research, its prediction error is calculated simply by setting N to the total number of data.
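A minimal sketch of LOOCV for a least-squares surface fit, in Python with NumPy (the language, function names and data are our assumptions, not the authors'). It computes the held-out residuals y_i − f̂^{−i}(x_i) in two ways: by brute force, refitting N times, and by the standard shortcut for linear least squares, e_i / (1 − h_ii), where e_i is the ordinary residual and h_ii the i-th diagonal element of the hat matrix; both should agree. The data are hypothetical synthetic undulations on a 9 × 9 grid, and the sketch reports both the mean squared held-out residual and its square root so either convention can be used.

```python
import numpy as np

# Cubic surface terms (p, q) of x**p * y**q, in the order of equation 4.
CUBIC_TERMS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2),
               (3, 0), (0, 3), (2, 1), (1, 2)]


def design_matrix(x, y, terms):
    return np.column_stack([x**p * y**q for p, q in terms])


def loocv_residuals(A, z):
    """Brute-force LOOCV: refit N times, each time predicting the held-out point i."""
    n = len(z)
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], z[keep], rcond=None)
        res[i] = z[i] - A[i] @ coef              # y_i - f^{-i}(x_i)
    return res


def loocv_residuals_fast(A, z):
    """Same residuals from one fit (valid for linear least squares): e_i / (1 - h_ii)."""
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    e = z - A @ coef
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))   # diagonal of the hat matrix A A^+
    return e / (1.0 - h)


def cv_error(res):
    """Return (mean squared held-out residual, its square root)."""
    mse = float(np.mean(res**2))
    return mse, float(np.sqrt(mse))


# Hypothetical test data: 81 grid points with coordinates scaled to [-1, 1] and
# smooth synthetic "undulations" (metres) plus noise; not data from the study.
rng = np.random.default_rng(seed=2)
gx, gy = np.meshgrid(np.linspace(-1.0, 1.0, 9), np.linspace(-1.0, 1.0, 9))
x, y = gx.ravel(), gy.ravel()
z = 20.0 + 0.5 * x - 0.3 * y + 0.2 * x * y + rng.normal(0.0, 0.05, x.size)

A = design_matrix(x, y, CUBIC_TERMS)
res_slow = loocv_residuals(A, z)
res_fast = loocv_residuals_fast(A, z)
mse, root = cv_error(res_slow)
print(f"LOOCV mean squared error = {mse:.5f} m^2, root = {root:.4f} m")
```

The shortcut is what keeps LOOCV cheap for polynomial surfaces: one fit replaces N refits. It relies on the model being linear in its parameters (true for equations 2 to 6) and on no point having h_ii = 1, i.e. removing any single point must still leave enough well-distributed points to solve for all parameters.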