Chapter 6 The Asymptotic Minimax Optimization for Mean Estimation of
6.2 QMLE optimization on MMSE of the Two Endpoints of Range, (QSQ)
In the proposed QMLE mean estimator, the quantiles are determined by the maximum percentage of the original population that they cover, i.e., the coverage. Since the coverage-constrained quantiles obey the properties of symmetric quantiles, the QMLE mean estimator may be efficient and robust, with variance asymptotically approaching the Cramér-Rao lower bound. It is worth noting that, since the QSQ usually covers a significant portion of the population, the double censoring scheme is popular for observations of small sample size, especially in sports contests. Adopting such a strategy avoids the large variation that can occur in the mean estimation. Based on the above discussion, we apply the QMLE+MMSE optimization search only on the QSQ, and call it the Q2MMSE-CLT scheme.
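As an illustration of the double censoring idea mentioned above (not of the QMLE itself), the following minimal sketch drops the `k` smallest and `k` largest observations before averaging, as is commonly done with judges' scores. The function name `double_censored_mean` and the parameter `k` are hypothetical and are not taken from the text.

```python
import numpy as np

def double_censored_mean(scores, k=1):
    """Mean after discarding the k lowest and k highest observations.

    Illustrative stand-in for a double censoring scheme; it is a plain
    symmetric trimmed mean, not the Q2MMSE-CLT estimator.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    if 2 * k >= scores.size:
        raise ValueError("k is too large for the sample size")
    return scores[k:scores.size - k].mean()

# Hypothetical scores: one outlier (2.0) would drag the plain mean down.
scores = [9.0, 9.5, 9.4, 2.0, 9.6]
trimmed = double_censored_mean(scores, k=1)
plain = float(np.mean(scores))
```

With a small sample, a single extreme observation moves the plain mean far more than the censored mean, which is the variance-limiting effect the paragraph refers to.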
We now examine the performance of Q2MMSE-CLT by simulation. Suppose that the combined quantity is composed of four independent random input quantities, $x = z_1 + z_2 + z_3 + z_4$, with two normal distributions, $z_1 \sim N(0.1, 1^2)$ and $z_2 \sim N(2.15, 1.5^2)$, and two rectangular distributions, $z_3 \sim \mathrm{rect}[-2\sqrt{3} - 1.05,\ 2\sqrt{3} - 1.05]$ and $z_4 \sim \mathrm{rect}[-10\sqrt{3} + 1.45,\ 10\sqrt{3} + 1.45]$. We perform 10,000 trials to test Q2MMSE-CLT for each of the four conditions listed in Table 9. The testing sample size ranges from 11 to 40 for each trial. Fig. 21 and Fig. 22 display the experimental results. These two figures show that Q2MMSE-CLT significantly outperforms the sample mean for Conditions A and B, and is slightly better for Conditions C and D. In other words, Q2MMSE-CLT has much lower MSEs when the standard uncertainty is known.
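The simulation setup can be sketched as follows. This is a minimal sketch that only generates the combined quantity and evaluates the sample-mean baseline; it does not implement Q2MMSE-CLT. It assumes that the rectangular bounds are the reconstructed values $\mp 2\sqrt{3} - 1.05$ and $\mp 10\sqrt{3} + 1.45$ (standard uncertainties 2 and 10), and the function names are hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)

# Assumed from the distributions above: mean and variance of x = z1+z2+z3+z4.
TRUE_MEAN = 0.1 + 2.15 - 1.05 + 1.45       # sum of the four input means
UC2 = 1.0**2 + 1.5**2 + 2.0**2 + 10.0**2   # combined variance u_c^2(x)

def draw_combined(rng, n_trials, n):
    """Return an (n_trials, n) array of combined quantities x."""
    shape = (n_trials, n)
    z1 = rng.normal(0.1, 1.0, size=shape)
    z2 = rng.normal(2.15, 1.5, size=shape)
    z3 = rng.uniform(-2 * SQRT3 - 1.05, 2 * SQRT3 - 1.05, size=shape)
    z4 = rng.uniform(-10 * SQRT3 + 1.45, 10 * SQRT3 + 1.45, size=shape)
    return z1 + z2 + z3 + z4

def normalized_mse_of_sample_mean(n, n_trials=10_000, seed=0):
    """MSE of the sample mean over n_trials, in units of u_c^2(x)/n."""
    rng = np.random.default_rng(seed)
    x = draw_combined(rng, n_trials, n)
    mse = np.mean((x.mean(axis=1) - TRUE_MEAN) ** 2)
    return mse / (UC2 / n)

baseline = normalized_mse_of_sample_mean(15)
```

In these units the sample mean should score close to 1, so any competing estimator can be read directly as a fraction of the baseline MSE.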
6.2.1 Test the Robustness of Q2MMSE-CLT for Different Uncertainty Ratio
Here we test Q2MMSE-CLT for two different values of UR. As demonstrated in Fig. 4, the R*N distribution becomes flatter in its central part as UR increases. It is of general interest to study whether Q2MMSE-CLT performs better for larger UR. We perform 10,000 trials for two cases of combined quantities, each composed of four different distributions. The first has $z_1 \sim N(0.1, 1^2)$, $z_2 \sim N(0.2, 1.5^2)$, $z_3 \sim \mathrm{rect}[-2\sqrt{3} + 0.15,\ 2\sqrt{3} + 0.15]$, and $z_4 \sim \mathrm{rect}[-10\sqrt{3} - 0.1,\ 10\sqrt{3} - 0.1]$. Its UR, evaluated according to Eq. (2-25), is 3.7. The second is the same as the first except that $z_4$ is changed to $z_4 \sim \mathrm{rect}[-28\sqrt{3} - 0.1,\ 28\sqrt{3} - 0.1]$, so that the UR changes to 10.4. Fig. 23 displays the histograms of 50,000 outputs of combined quantities for the two cases. It shows that the output of the combined quantities follows a quasi-normal distribution.
To compare the two cases of Q2MMSE-CLT, a robustness function of gain relative to sample mean is defined as
$$G = 1 - \frac{\text{Average MSEs of (Q2MMSE)}}{\text{Average MSEs of (sample mean)}}, \quad \left(\text{unit: } \frac{u_c^2(x)}{n}\right) \qquad \text{(6-6)}$$
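The gain of Eq. (6-6) is a simple ratio and can be sketched as below. The function name `gain` is hypothetical, and the sketch assumes that "average" means the arithmetic mean of the MSEs over the tested sample sizes, which the text does not state explicitly. The input numbers are made up for illustration only.

```python
import numpy as np

def gain(mse_estimator, mse_sample_mean):
    """G = 1 - mean(MSEs of estimator) / mean(MSEs of sample mean).

    Both inputs are sequences of MSEs (one per tested sample size),
    expressed in the same unit, e.g. u_c^2(x)/n.
    """
    mse_estimator = np.asarray(mse_estimator, dtype=float)
    mse_sample_mean = np.asarray(mse_sample_mean, dtype=float)
    return 1.0 - mse_estimator.mean() / mse_sample_mean.mean()

# Hypothetical values: an estimator whose MSE is 90% of the baseline.
g = gain([0.9, 0.9, 0.9], [1.0, 1.0, 1.0])
```

A positive G means the estimator has a lower average MSE than the sample mean; G = 0 means no improvement.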
Fig. 24 displays the experimental results. The figure shows that Q2MMSE-CLT outperforms the sample mean for both UR = 3.7 and UR = 10.4. Moreover, the performance is better for the larger UR.
Fig. 23: Histogram of 50,000 combined quantities for different URs. x-axis is the output of combined quantities and y-axis is the frequency count
Fig. 24: Gain performance for the different URs. The unit is $u_c^2(x)/n$
6.2.2 An Advanced Refinement of the QMLE
Although Q2MMSE-CLT follows the paradigm of the asymptotic minimax principle, it gains only about 2% to 3% over the sample mean for Conditions C and D in the mean estimation for the output of combined quantities. Considering practical applications, we further discuss only Condition D. As noted previously, the testing data of combined quantities are formed in the same manner, and we execute 1,000 trials with 15 observations in each trial. We select 60 candidates of the population mean and arrange them symmetrically about the sample mean $\bar{x}$ within the interval $[\bar{x} - 2\sigma_s/\sqrt{n},\ \bar{x} + 2\sigma_s/\sqrt{n}]$. Then we evaluate the QMLE via the Q2MMSE-CLT scheme. In our procedure, we first plot the convex curves according to three different clusters of the Z score (i.e., the quantile of the signal transformed to the standard normal pdf) of the sample mean: $Z < -2$, $-0.5 \le Z \le 0.5$, and $Z > 2$. We then define the cluster $-0.5 \le Z \le 0.5$ as the good sample mean and the other two clusters, $Z < -2$ and $Z > 2$, as the bad sample means. Fig. 25 shows the convex sets conditioned on the good sample mean. Here, the dotted line is the convex set for the original signal of combined quantities, and the green solid line represents the convex set obtained by enlarging the standard uncertainty (ESU) to 4 times that of the original signal, with the same reference candidates of the population mean. The figure shows that, for the good sample mean case, the QMLE converges near the symmetric location (i.e., the 30-th candidate) for both the original and ESU signals. So, in the good sample mean case, the convergence of the QMLE to the population mean on heavy observations will be guaranteed.
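The candidate grid and the Z-score clustering described above can be sketched as follows. This is a minimal sketch under stated assumptions: $\sigma_s$ is taken to be the sample standard deviation, the Z score is taken to be $(\bar{x} - \mu)/(u_c(x)/\sqrt{n})$, and the function names and the label strings are hypothetical. The QMLE evaluation itself is not shown.

```python
import numpy as np

def candidate_means(sample, n_candidates=60, half_width=2.0):
    """Equally spaced candidates, symmetric about the sample mean,
    spanning [xbar - 2*s/sqrt(n), xbar + 2*s/sqrt(n)]."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    xbar = sample.mean()
    s = sample.std(ddof=1)  # assumption: sigma_s is the sample std
    step = half_width * s / np.sqrt(n)
    return np.linspace(xbar - step, xbar + step, n_candidates)

def z_cluster(xbar, true_mean, u_c, n):
    """Classify a sample mean by its Z score (assumed definition)."""
    z = (xbar - true_mean) / (u_c / np.sqrt(n))
    if abs(z) <= 0.5:
        return "good"
    if z < -2.0:
        return "bad-left"
    if z > 2.0:
        return "bad-right"
    return "unclassified"  # 0.5 < |Z| <= 2 is not used in the text

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=15)   # hypothetical data, 15 observations
cands = candidate_means(sample)
```

With 60 equally spaced candidates the sample mean falls midway between the 30th and 31st, which is consistent with the text locating the symmetric position near the 30-th candidate.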
Fig. 25: sample size is 15, 4 combined quantities
Fig. 26 and Fig. 27 show, respectively, the results for the two bad cases, in which the biased Z score is less than −2 and greater than 2, when applying Q2MMSE-CLT and the enlarged-standard-uncertainty Q2MMSE-CLT (ESQ2MMSE-CLT). The details are plotted with double y-axes: the dashed line represents the original signal evaluated by Q2MMSE-CLT, and the solid line represents the signal evaluated by ESQ2MMSE-CLT with 4 times the combined standard uncertainty.
These two figures reveal an important fact: the original signal is affected by the sample mean if only the Q2MMSE-CLT operations are applied. The resulting MSE curves converge near the symmetric location, which is the sample mean, but we know it is a bad sample mean. The figures also show that, when we apply the ESQ2MMSE-CLT algorithm with 4 times the combined standard uncertainty, the MSE curves converge to locations that deviate from the bad sample mean and move toward the true population mean. Why does this happen? The reason is that ESQ2MMSE-CLT enlarges the combined standard uncertainty to 4 times that of the original signal. Thus the Z score of the general maximum-bias sample mean is reduced to 25% of that of the original signal, which means that the Z score of the bias is constrained to $-0.5 \le Z \le 0.5$. This in turn guarantees the convergence to the good sample mean (also the population mean), as shown in Fig. 25.
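The 25% figure follows from a one-line scaling argument, written out here as a sketch. It assumes that the Z score of the sample mean is defined as $Z = (\bar{x} - \mu)/(u_c(x)/\sqrt{n})$; this definition and the symbol $Z_{\mathrm{ESU}}$ are assumptions introduced for illustration and do not appear in the text.

```latex
\[
Z = \frac{\bar{x} - \mu}{u_c(x)/\sqrt{n}},
\qquad
Z_{\mathrm{ESU}} = \frac{\bar{x} - \mu}{4\,u_c(x)/\sqrt{n}} = \frac{Z}{4},
\qquad
|Z| \le 2 \;\Longrightarrow\; |Z_{\mathrm{ESU}}| \le 0.5 .
\]
```

Under this reading, a sample mean at the boundary $|Z| = 2$ maps exactly onto the edge of the good cluster $|Z| \le 0.5$; a sample mean with $|Z|$ somewhat larger than 2 would land slightly outside it.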
Fig. 26: Originally left-biased bad sample mean tested with the convex sets; double y-axes, normalized by $u_c^2(x)/n$; sample size is 15, 4 combined quantities
Fig. 27: Originally right-biased bad sample mean tested with the convex sets; double y-axes, normalized by $u_c^2(x)/n$; sample size is 15, 4 combined quantities
Fig. 28 displays the refined results of ESQ2MMSE-CLT for sample sizes from 11 to 40. The figure shows that ESQ2MMSE-CLT significantly outperforms the sample mean, with a 40% reduction in MSE. It is therefore a promising mean estimator.
Fig. 28: Refined Q2MMSE-CLT with the enlarging standard uncertainty; y-axis is normalized by $u_c^2(x)/n$; 4 combined quantities