[Figure: map of Taiwan, latitude 21°51′ N to 25°21′ N; legend: ◇ experiment data, − Taiwan]
Figure 5.1: The experiment data in experiment–B
Table 5.2: Experimental results of tE in B–I and B–II

Setting 256-u-E1-O2, AvgDev(m) 0.5087
(−9.48899sin(φ)λ+1.92778e5)e3−(1.36694φ2−4.6575λ)e4
φ−2.54796eλ−1.49749e2 −sin(φ)(φ−2.54796eλ−1.49749e4.86773λ3−8.13652e4 2)+ 2

Setting 256-s-E2-O2, AvgDev(m) 0.4013
TN[−2.65186sin(φ)e2−8.68171e3−1.68387e−17TN3]−e8(2.52098φ2−3.56625e4)
φ−4.91899e5−TNe−2[4.00273−4.26238e−2sin(φ)] − 11.25

Setting 512-s-E1-O2, AvgDev(m) 0.4028
λe3(4.87771λ−4.95387sin(φ))+e4(5.28864φ2−7.46274e4) φ+0.0191201λ2−8.69869 − 1.6

Setting 512-s-E2-O2, AvgDev(m) 0.6345
λ[−7.26026e3sin(φ)+5.22277e3λ]
φ+2.02448e−2λ2 + φ[5.68214eφ+2.02448e4φ+5.16929e−2λ22sin(φ)]− φ+2.02448e8.01763e−28 λ2 + 0.2

Setting 1024-u-E2-O2, AvgDev(m) 1.1053
TNe2(−5.24391e−3sin(φ)φ+2.68252)−[8.40946φ+1.51947sin(φ)]e8+5.93986e4φ3
φ2+7.86776e−4TN+6.01565e−3 + 1.2
{+, −, ×, ÷, sin, cos, tan}. For convenience, we use the notation [256|512|1024]-[u|s]-[E1|E2]-[O1|O2] to express the number of data (256, 512, or 1024) extracted from each sampling block, whether the data are sorted (s) or unsorted (u) by λ or φ, the experiment (E1 or E2), and the operator set (O1 or O2). The best transformation equations obtained in this experiment are presented in Table 5.2 and Table 5.3. From the results, we infer that the operators in O2 provide better regression functions than the others. This may be because of the nonlinear relationships between ellipsoidal coordinates and Cartesian coordinates.
Nonlinear operators can describe the transformation functions better than basic, usually linear, operators alone. We also find that the trained equation for TE performs better than the one for TN. This may be caused by the biased distribution of training data selected from IE and IN.
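The setting labels used in the tables and figures can be decoded mechanically. The following is a minimal sketch, assuming only the four fields described above; the names `Setting` and `parse_setting` are hypothetical and are not part of the regression engine.

```python
import re
from typing import NamedTuple


class Setting(NamedTuple):
    samples_per_block: int  # 256, 512, or 1024 data extracted from each sampling block
    sorted_input: bool      # True for "s" (sorted by λ or φ), False for "u"
    experiment: str         # "E1" or "E2"
    operator_set: str       # "O1" or "O2"


_SETTING_RE = re.compile(r"^(256|512|1024)-([us])-(E[12])-(O[12])$")


def parse_setting(label: str) -> Setting:
    """Parse a label such as '512-s-E1-O2' into its four fields."""
    # Tolerate the spaced form with a Unicode minus sign that appears in the tables.
    normalized = label.replace("−", "-").replace(" ", "")
    match = _SETTING_RE.match(normalized)
    if match is None:
        raise ValueError(f"not a valid setting label: {label!r}")
    count, order, experiment, operators = match.groups()
    return Setting(int(count), order == "s", experiment, operators)


print(parse_setting("512-s-E1-O2"))
```

Grouping results by `experiment` or `operator_set` after parsing makes comparisons such as O1 against O2 straightforward.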
5.2.3 Experiment–A–II
We know that TN and TE are cross-referenced, i.e., TN = fN(g⃗, TE) and TE = fE(g⃗, TN). In this experiment, we include TN and TE together with (φ, λ, h) as input data. Operators
Table 5.3: Experimental results of tN in B–I and B–II

Setting 256-u-E1-O2, AvgDev(m) 2.5056
1.10767e5λ−2.3349e3
1+3.17788e−3sin(φ) + 7.25

Setting 256-u-E2-O1, AvgDev(m) 2.4911
(1.10919λ2+1.02362)e5+TEe−7(7.72509TE−8.55344e4λ−1.87762e6)
λ+8.14987e−2 − 1.5

Setting 512-s-E1-O2, AvgDev(m) 2.5177
λ2(1.10868e2λ−5.95677)+5.08182e2
λ2(1.0e−3+3.17088e−7sin(φ))+[6.30707e−2λ−8.2436]e−5 + 8

Setting 512-s-E2-O1, AvgDev(m) 2.5061
TE[+2.27391e−14TE3+(1.10708e2λ−3.52056)e3]−e7(7.3863λ−110.574)
TE−e1(1.42λ+6.81256) +T2 1.71916e13
E−TEe1(1.42λ+6.81256) − 2.5

Setting 1024-u-E1-O1, AvgDev(m) 2.5883
1.75968e4λ−1.03562e3
1−4.63439e−3 + φ2(1−4.63439e4.54473e8λ−3) − 0.35

Setting 1024-u-E2-O1, AvgDev(m) 2.5132
TEλ2(1.10428TE2−1.73612e7)e5−TEλ(1.68998TE+4.06722e2)e12+4.52439e20 TE2[λ(TE−2.11095e2)−1.52949e7]−1.22274e5 − 3.5
[Figure: Avg. Dev. (m) of all blocks and Max. Dev. of all blocks for settings 256-u-E1-O2, 512-u-E1-O2, 1024-u-E2-O2, 256-s-E2-O2, 512-s-E1-O2, and 1024-s-E1-O2]
Figure 5.2: The average and maximum deviations between tE and TE
are the ones in O1 and O2. In order to avoid a biased selection of training data, we repeat this experiment 10 times with the same parameter settings. In this repeated training process, two modes of generating the initial population of trees are used: one is purely random; the other is based on the best tree obtained previously. The best transformation equations obtained in this experiment are also listed in Table 5.2 and Table 5.3. The average and maximum deviations of these two experiments are presented in Figure 5.2 and Figure 5.3. The introduction of the cross-reference between TN and TE improves the accuracy of training tE. The average performance of tN obtained in this experiment is better than that in experiment E1, but the opposite holds for the average performance of tE.
This phenomenon may be explained by the fact that tE does not need to refer to TN, whereas tN does need to refer to TE.
Interestingly, initial populations seeded with previously trained trees can give better results than randomly generated initial populations, in both convergence and accuracy. The order in which data are fed into the regression engine may be a considerable factor, according to the results in Table 5.2 and Table 5.3: sorted data are significantly helpful to the accuracy of TE, but not to that of TN. We further study the distribution of data in the sampling blocks by examining their density, i.e., the number of sampled data vs. the maximum number of samplable points. Figure 5.4 and Figure 5.5 present the accuracy of the regression functions when data are selected with different sampling densities. A better distribution in the selection of training data may be important in this method.
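The sampling density defined above is a simple ratio. The sketch below computes it as a percentage; the function name `sampling_density` and the block size of 10,000 samplable points are illustrative assumptions, not values taken from the experiments.

```python
def sampling_density(n_sampled: int, n_samplable: int) -> float:
    """Return the sampling density of one block as a percentage."""
    if n_samplable <= 0:
        raise ValueError("a block must contain at least one samplable point")
    if not 0 <= n_sampled <= n_samplable:
        raise ValueError("n_sampled must lie between 0 and n_samplable")
    return 100.0 * n_sampled / n_samplable


# Hypothetical block with 10,000 samplable points (not a figure from the experiments).
for n_sampled in (256, 512, 1024):
    density = sampling_density(n_sampled, 10_000)
    print(f"{n_sampled:>5} samples -> {density:.2f}%")
```

For a fixed block, doubling the number of extracted data doubles the density, so the three sample sizes in the setting notation correspond to three density levels per block.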
[Figure: Avg. Dev. (m) of all blocks and Max. Dev. of all blocks for settings 256-u-E2-O2, 512-u-E2-O2, 1024-u-E1-O2, 256-s-E2-O1, 512-s-E2-O1, and 1024-s-E2-O2]
Figure 5.3: The average and maximum deviations between tN and TN
[Figure: Avg. Dev. (m) against Sampling Density (%), from 1.61% to 14.08%, for TE and TN; settings 256-s-E2-O2, 512-s-E1-O2, 1024-s-E1-O2, 256-s-E2-O1, 512-s-E2-O1, and 1024-s-E2-O2]
Figure 5.4: The deviations of the regression functions with sampled data sorted
[Figure: deviation against Sampling Density (%), from 1.61% to 14.08%]
Figure 5.5: The deviations of the regression functions with sampled data unsorted
5.3 Experiment–B
5.3.1 Data collection and experiment design
In experiment–B, we add some standard reference points in the sampling area. The reference points are separated into two parts. The first, sampling area 1, covers HSINCHU, CHANGHUA, CHIAYI, and TAINAN. The second, sampling area 2, covers TAINAN, KAOHSIUNG, and PINGTUNG. They are shown in Figure 5.6 and Figure 5.7.
We also use the generated data in sampling area 3, which lies in Easting (162000-260000) and Northing (2428000-2740000), and the collected data in sampling area 4, which lies in Easting (140000-226000) and Northing (2489000-2576000). They are shown in Figure 5.8 and Figure 5.9. For convenience, we use the notation (DT, VT)::(DS, VS) to represent that, in the training phase of an experiment, DT is the training dataset (user-collected data (UserData) or standard reference points (StdRef)) whose expected values are calculated by means of VT (standard formulas (StdFom) or reference points (StdRef)). Similarly, DS is the dataset used in the testing phase, whose values are calculated by the formulas obtained in the training phase and compared with the values calculated by means of VS. Three experiments are conducted. The experimental results are presented in Table 5.4–Table 5.6 and Figure 5.10.
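To make the (DT, VT)::(DS, VS) notation concrete, the following sketch encodes a design as two (dataset, value source) pairs and renders it back into the notation. The class names `Phase` and `Design` are hypothetical; the three designs are the labels given in the notes of Table 5.4–Table 5.6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    dataset: str      # D: e.g. "70%UserData" or "100%StdRef"
    values_from: str  # V: "StdFom" (standard formulas) or "StdRef" (reference points)


@dataclass(frozen=True)
class Design:
    training: Phase   # (DT, VT)
    testing: Phase    # (DS, VS)

    def label(self) -> str:
        """Render the design in the (DT, VT)::(DS, VS) notation."""
        t, s = self.training, self.testing
        return f"({t.dataset}, {t.values_from})::({s.dataset}, {s.values_from})"


designs = [
    Design(Phase("100%StdRef", "StdRef"), Phase("100%StdRef", "StdFom")),
    Design(Phase("70%UserData", "StdFom"), Phase("30%UserData", "StdFom")),
    Design(Phase("70%UserData+70%StdRef", "StdFom+StdRef"),
           Phase("30%UserData+30%StdRef", "StdFom+StdRef")),
]
for design in designs:
    print(design.label())
```

Keeping the training and testing phases as separate records makes it explicit that the value source used for comparison in testing (VS) may differ from the one used in training (VT).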
Several interesting phenomena are observed and explained as follows.
[Figure: map of Taiwan and surrounding waters (Pacific Ocean, Taiwan Strait, South China Sea), 120°03′ E to 122°03′ E and 21°51′ N to 25°21′ N; legend: ◇ experiment data, − Taiwan]
Figure 5.6: Reference points in sampling area 1
[Figure: map of Taiwan and surrounding waters (Pacific Ocean, Taiwan Strait, South China Sea), 120°03′ E to 122°03′ E and 21°51′ N to 25°21′ N; legend: ◇ experiment data, − Taiwan]
Figure 5.7: Reference points in sampling area 2
[Figure: map of Taiwan and surrounding waters (Pacific Ocean, Taiwan Strait, South China Sea), 120°03′ E to 122°03′ E and 21°51′ N to 25°21′ N; legend: ◇ experiment data, − Taiwan]
Figure 5.8: Generating data in sampling area 3
[Figure: map of Taiwan and surrounding waters (Pacific Ocean, Taiwan Strait, South China Sea), 120°03′ E to 122°03′ E and 21°51′ N to 25°21′ N; legend: ◇ experiment data, − Taiwan]
Figure 5.9: Collecting data in sampling area 4
Table 5.4: Experimental results of B–I
Regions               Target Coord. Sys.  TLs  #Points  StdErr    Dev(m)
HSINCHU,              TM2E (TWD97)        3    91       1.28e−4   0.803
CHANGHUA,             TM2N (TWD97)        3             7.11e−5   1.148
CHIAYI, TAINAN        TM2E (TWD67)        4             1.04e−5   0.954
(30746.27 km²)        TM2N (TWD67)        4             1.37e−5   1.294
TAINAN,               TM2E (TWD97)        3    489      1.44e−5   0.576
KAOHSIUNG,            TM2N (TWD97)        3             6.08e−6   0.437
PINGTUNG              TM2E (TWD67)        4             8.74e−6   0.397
(7464.96 km²)         TM2N (TWD67)        4             8.69e−6   0.707
Note: TLs: the number of coordinate transformations from WGS84(φ84, λ84, h84) using level-wise methods.
Experiment–II: (100%StdRef, StdRef)::(100%StdRef, StdFom)
Table 5.5: Experimental results of B–II
Regions               Target Coord. Sys.  TLs  #Points  StdErr    Dev(m)
E(162000-260000)      TM2E (TWD97)        3    10000    9.34e−5   3.932
N(2428000-2740000)    TM2N (TWD97)        3             9.12e−5   3.577
Generating data       TM2E (TWD67)        4             3.62e−5   0.239
(30109.78 km²)        TM2N (TWD67)        4             5.51e−6   0.158
E(140000-226000)      TM2E (TWD97)        3    28527    1.61e−5   0.406
N(2489000-2576000)    TM2N (TWD97)        3             9.48e−6   0.235
Collecting data       TM2E (TWD67)        4             4.01e−6   0.499
(7386.41 km²)         TM2N (TWD67)        4             5.94e−6   0.843
Note: TLs: the number of coordinate transformations from WGS84(φ84, λ84, h84) using level-wise methods.
Experiment–I: (70%UserData, StdFom)::(30%UserData, StdFom)
Table 5.6: Experimental results of B–III
Regions               Target Coord. Sys.  TLs  #Points  StdErr    Dev(m)
Reference points      TM2E (TWD97)        3    10091    3.02e−5   8.831
+                     TM2N (TWD97)        3             7.89e−6   3.723
Generation data       TM2E (TWD67)        4             1.56e−5   4.037
                      TM2N (TWD67)        4             9.62e−6   2.375
Reference points      TM2E (TWD97)        3    29016    6.83e−5   0.794
+                     TM2N (TWD97)        3             7.72e−6   0.296
Collecting data       TM2E (TWD67)        4             1.41e−6   0.631
                      TM2N (TWD67)        4             5.59e−6   0.312
Note: TLs: the number of coordinate transformations from WGS84(φ84, λ84, h84) using level-wise methods.
Experiment–III: (70%UserData+70%StdRef, StdFom+StdRef)::(30%UserData+30%StdRef, StdFom+StdRef)
[Figure: percentage of points (%) at each deviation]
Figure 5.10: Advanced analysis on the result of Experiment–C using different numbers of sampling points
distributed may be important factors in deriving transformation formulas. Too few sampling points yield less accurate results. Most tests produce satisfactory results, except Experiment–C in the first region. Figure 5.10 plots the percentage of points among the 91 standard reference points in the first region against the associated deviations when testing the regression formulas. Most deviations concentrate at 3 m when the number of sampling points is large enough.
• Deviations differ between areas. This may be due to an insufficient number of sampling data on critical landforms, which play a critical role in geodetic surveys. Another possible source of inaccuracy is an insufficient number of reference points in the testing regions. For example, in experiment–B there are 91 reference points in the first region (30746.27 km²) but 489 points in the second region (7464.96 km²).
• It is intuitive to assume that stronger relationships exist between the source and target coordinates if the coordinates are calculated using existing formulas. Regression on such coordinates may be easier and perform better.
• Regression on datasets consisting of only standard reference points may not produce a good result. This may be because such points were surveyed at different times and recorded with different precisions. Additionally, the use of standard reference points in the regression process sometimes improves the overall accuracy. This phenomenon may be explained by the stronger constraints that the standard reference points give to the regression engine.
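The difference in reference-point coverage between the two regions mentioned above can be quantified directly from the quoted counts and areas. The sketch below is a worked calculation only; the unit of points per 1000 km² and the function name are choices made here for illustration.

```python
# Reference-point counts and areas quoted in the text for experiment-B.
regions = {
    "first region": (91, 30746.27),    # (number of points, area in km²)
    "second region": (489, 7464.96),
}


def density_per_1000_km2(n_points: int, area_km2: float) -> float:
    """Number of reference points per 1000 km²."""
    return 1000.0 * n_points / area_km2


densities = {name: density_per_1000_km2(n, area) for name, (n, area) in regions.items()}
for name, value in densities.items():
    print(f"{name}: {value:.2f} points per 1000 km²")

ratio = densities["second region"] / densities["first region"]
print(f"second/first density ratio: {ratio:.1f}")
```

The first region has roughly 3 reference points per 1000 km² and the second roughly 66, a factor of about 22, which is consistent with the suggestion that sparse reference points contribute to the larger deviations in the first region.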
5.4 Experiment–C
5.4.1 Data collection and experiment design
We use the standard reference points as the experiment data. There are 2,748 such standard reference points distributed over the island of Taiwan, which covers 35,915 km². They are categorized as satellite control points, gravity reference points, base level points, etc., and are classified into three levels, class-1 to class-3, with each level further partitioned into three grades, grade-1 to grade-3, for different purposes.
5.4.2 Experiment–C–I
The first experiment tests whether the proposed regression engine works and gives satisfactory results. All 2,748 standard reference points are used, with 70%, 80%, or 90% of them randomly selected as training data and the rest reserved for testing. The experimental results are presented in Table 5.7. According to these results, the regression engine works satisfactorily: it converges, meets the requirement of minimizing errors, and produces regression functions. The value of Dev in TM2N is about 0.5 m, while that in TM2E is about 1.5 m, so the regression on TM2N is considerably more accurate than that on TM2E. This may be due to the geographic shape of Taiwan, whose length in the north-south direction is about 2.5 times that in the east-west direction. Even when the error biases, wE and wN, are adjusted, the same results remain. This phenomenon may be explained by the fact that the standard reference points are not uniformly distributed over the sampling area.
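The random selection of training and testing data described above can be sketched as follows. This is an illustration under stated assumptions: the helper `random_split`, the fixed seed, and the rule of rounding the training size to the nearest integer are not specified in the experiment and are chosen here only to make the example reproducible.

```python
import random

N_REFERENCE_POINTS = 2748  # number of standard reference points stated in the text


def random_split(n_points: int, train_fraction: float, seed: int):
    """Shuffle point indices and cut them into (training, testing) index lists."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_points)  # rounding rule is an assumption
    return indices[:cut], indices[cut:]


for fraction in (0.70, 0.80, 0.90):
    train, test = random_split(N_REFERENCE_POINTS, fraction, seed=42)
    print(f"{fraction:.0%}::{1 - fraction:.0%} -> "
          f"{len(train)} training points, {len(test)} testing points")
```

With 2,748 points, the 90%::10% split leaves fewer than 300 points for testing, so the Dev values at that ratio rest on a comparatively small test set.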
5.4.3 Experiment–C–II
In order to make the sampling data more uniformly distributed and to reduce Dev in both TM2E and TM2N, the sampling data are partitioned. There are two partitioning strategies. First, the standard reference points are partitioned into 5 regions according to their administrative areas, and the above experiment is repeated. In each region, 80% of the data
Table 5.7: Experimental results using all standard reference points for regression
Data         TM2E                 TM2N
Sampling*    StdErr    Dev(m)     StdErr    Dev(m)
70%::30%     9.34e−5   2.152      1.28e−4   0.517
80%::20%     9.34e−5   0.971      1.28e−4   0.628
90%::10%     9.34e−5   1.509      1.28e−4   0.456
Average      9.34e−5   1.544      1.28e−4   0.534
*: ‖Data for training‖ :: ‖Data for testing‖
are randomly selected for training and 20% for testing. The partition is visualized in Figure 5.11. The experimental results and functions are presented in Table 5.8 and Table 5.9, respectively. The second partitioning strategy groups the points by their height above the reference surface, as given in the WGS84 ⟨φ84, λ84, h84⟩ data. Again, in each height level, 80% of the data are randomly selected for training and 20% for testing. The partition is visualized in Figure 5.12. The experimental results are presented in Table 5.10.
According to the experimental results, regression on partitioned datasets achieves better accuracy than regression on un-partitioned datasets. This is a reasonable result, since each dataset has more concentrated sampling data, so the regression engine can converge faster. Interestingly, both results are better than those in Table 5.7, and the results in Table 5.8 slightly outperform those in Table 5.10. This may be because the data partitioned by height are spread more widely than the data partitioned by location.
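Both partitioning strategies follow the same pattern: group the points by a key (region or height level), then split each group 80%::20%. The sketch below illustrates that pattern on made-up records; the function `partitioned_split`, the record layout, the five placeholder region labels, and the 250 m height bands are all hypothetical and do not reproduce the actual partitions.

```python
import random
from collections import defaultdict


def partitioned_split(points, key, train_fraction=0.8, seed=0):
    """Group `points` by `key(point)` and split each group into (training, testing)."""
    groups = defaultdict(list)
    for point in points:
        groups[key(point)].append(point)
    rng = random.Random(seed)
    result = {}
    for name, members in groups.items():
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        result[name] = (members[:cut], members[cut:])
    return result


# Hypothetical records: (point id, region label, height in metres).
points = [(i, f"region-{i % 5}", 10.0 * i) for i in range(100)]

by_region = partitioned_split(points, key=lambda p: p[1])
by_height = partitioned_split(points, key=lambda p: int(p[2] // 250))  # arbitrary 250 m bands

for name, (train, test) in sorted(by_region.items()):
    print(name, len(train), len(test))
```

A separate regression would then be trained on each group's training part and evaluated on that group's testing part, so each formula only has to fit a more compact subset of the data.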
5.4.4 Experiment–C–III
Finally, the effectiveness of the proposed method is compared with level-wise transformation. For this purpose, 8 sets of synthetic data in WGS84 (⟨φ84, λ84⟩) are extracted with parameters δe = 0.41′, δn = 0.23′, and k = 0 ∼ 511 as follows.
• E1 = {⟨120°38.43′, 21°51.00′ + k · δe⟩}
• E2 = {⟨120°45.43′, 21°51.00′ + k · δe⟩}
• E3 = {⟨120°53.43′, 21°51.00′ + k · δe⟩}