More discussion about the combine step - Discussion of Optimal Grouping Decisions and the Missing-Value Problem in Split-and-Combine Multidimensional Scaling

• dim(X₁) = dim(X₂) = dim(X₁ ∩ X₂) = r

This is the basic situation; the above derivation is under this assumption.

• dim(X₁) = r₁ > dim(X₂) = dim(X₁ ∩ X₂) = r₂

Assume that, even under random grouping, the two groups have different dimensionality, so that the same points (the overlapping part) are expressed in different dimensions. In this case, after QR decomposition Q₁ and Q₂ become r₁ × r₂ and r₂ × r₂ matrices. This means that only r₂ basis vectors are needed to span the overlapping points, and the corresponding R₁ and R₂ reduce to r₂ × N_I matrices. We can then find the affine transformation on this common dimension such that equation (1) holds. Applying U = (Q₁)_{r₁×r₂}(Q₂^T)_{r₂×r₂} to X₂ gives X̃₂ = U_{r₁×r₂}(X₂)_{r₂×n₂} + b_{r₁×n₂}, which lifts X₂ into the r₁-dimensional space. Conversely, if dim(X₁) = r₁ < dim(X₂) = r₂, then in equation (1) U = (Q₁)_{r₁×r₁}(Q₂^T)_{r₁×r₂} applied to X₂ projects the higher-dimensional part into the lower dimension. As a result, the dimensionality of X₂ drops and part of the information in X₂ is lost. Consequently, if the dimensionality of the two groups differs, it is more reasonable to map the lower-dimensional group into the higher-dimensional space by affine mapping.
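The lifting of the lower-dimensional group can be sketched numerically. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the names `qr_positive` and `lift_lower_to_higher` are hypothetical, columns are points, and the convention diag(R) > 0 is an assumption added here so that Q₁ and Q₂ are uniquely determined.

```python
import numpy as np

def qr_positive(M, k):
    """Reduced QR of M, truncated to rank k, with the diagonal of R made positive."""
    Q, R = np.linalg.qr(M)
    Q, R = Q[:, :k], R[:k, :]
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs, signs[:, None] * R

def lift_lower_to_higher(Y1, Y2, X2):
    """Map X2 (r2 x n2) into the r1-dimensional frame of group 1.

    Y1 (r1 x N_I) and Y2 (r2 x N_I) hold the same overlapping points,
    column by column, in the coordinates of group 1 and group 2.
    """
    r2 = Y2.shape[0]
    mean1 = Y1.mean(axis=1, keepdims=True)
    mean2 = Y2.mean(axis=1, keepdims=True)
    Q1, _ = qr_positive(Y1 - mean1, r2)   # r1 x r2
    Q2, _ = qr_positive(Y2 - mean2, r2)   # r2 x r2
    U = Q1 @ Q2.T                         # r1 x r2
    b = mean1 - U @ mean2                 # shift, broadcast over all columns
    return U @ X2 + b, U

# Synthetic check: a 2-D group embedded isometrically in 3-D.
rng = np.random.default_rng(0)
X2 = rng.standard_normal((2, 8))
A, _ = np.linalg.qr(rng.standard_normal((3, 2)))  # 3 x 2, orthonormal columns
shift = rng.standard_normal((3, 1))
truth = A @ X2 + shift                            # group-1 coordinates of the same points
overlap = slice(0, 5)                             # first five points are shared
X2_lifted, U = lift_lower_to_higher(truth[:, overlap], X2[:, overlap], X2)
```

In this synthetic setting `X2_lifted` agrees with `truth` up to rounding, because U has orthonormal columns and therefore preserves all distances within X₂ while placing it in the higher-dimensional space.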

• dim(X₁) = dim(X₂) = r > dim(X₁ ∩ X₂)

If the overlapping points are collinear, dim(X₁ ∩ X₂) will be less than r. In this case there are two possible outcomes. We return to the example above to explain. Assume there are two groups in two-dimensional space with three intersection points. These three points are collinear, so they lie on a line, that is, in a one-dimensional space, as Figure 3 shows. Fix the first group and align the second group to it through affine mapping. Two situations may occur. In one, the two sets are aligned through a rotation and a shift; in the other, they are aligned through a rotation and a shift combined with a reflection with respect to the line. These two alignments give two different results, and only one of them agrees with the original configuration. The overlapping points alone do not carry enough information to tell which one is correct. The same ambiguity arises whenever dim(X₁) ≥ dim(X₂) > dim(X₁ ∩ X₂).

Fig. 3: The three solid points represent the overlapping points of the two groups. These three points are collinear and lie on a line.
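The ambiguity caused by collinear overlapping points can be illustrated with a small numerical example. The coordinates below are made up for illustration only: three overlapping points on the line y = x, and one further point of the second group off that line.

```python
import numpy as np

# Three overlapping points, all on the line y = x (columns are points).
overlap = np.array([[0.0, 1.0, 2.0],
                    [0.0, 1.0, 2.0]])
# A non-overlapping point of the second group, off the line.
off_line = np.array([[2.0],
                     [0.0]])

identity = np.eye(2)                    # alignment without reflection
reflection = np.array([[0.0, 1.0],
                       [1.0, 0.0]])     # reflection across the line y = x

# Both maps reproduce the overlapping points exactly ...
same_on_overlap = np.allclose(identity @ overlap, reflection @ overlap)
# ... but they send the off-line point to different places.
image_plain = identity @ off_line
image_reflected = reflection @ off_line

# The centred overlap has rank 1, lower than the 2-D groups.
centred = overlap - overlap.mean(axis=1, keepdims=True)
overlap_dim = np.linalg.matrix_rank(centred)
```

Both candidate alignments fit the overlapping points perfectly, so no criterion based on the overlap alone can prefer one over the other.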

In the following, we give another simple example to explain the second condition: the overlapping part is spanned by three basis vectors while dim(X₁) = dim(X₂) = 4. Figure 4 shows this situation.

In matrix X, x₁ and x₅ are orthogonal to the set X₁ ∩ X₂, and x₁ is also orthogonal to x₅, so the projection of the space spanned by X₂ onto the space spanned by X₁ should be a three-dimensional subspace. However, in the combination step of SCMDS, x₁ and x₅ will be treated as the same component and aligned through affine mapping. This wrong alignment reduces matrix X to four dimensions and loses the information of one dimension. Choosing a suitable N_I (the number of overlapping points) and N_g (the size of each group) is therefore very important. N_I and N_g should be large enough that, under random grouping, the intersection part is unlikely to be collinear and its dimensionality is unlikely to be less than that of the whole data.
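The loss of one dimension can be reproduced on synthetic data. The sketch below is an illustration under assumptions, since Figure 4 is not reproduced here: the data are random, x₁ is nonzero only for points exclusive to group 1, x₅ only for points exclusive to group 2, and each group is stored in its own four coordinates.

```python
import numpy as np

def centred_rank(P):
    """Rank of the point set after removing its mean (columns are points)."""
    return int(np.linalg.matrix_rank(P - P.mean(axis=1, keepdims=True)))

rng = np.random.default_rng(0)
n_only1, n_overlap, n_only2 = 6, 5, 6
n = n_only1 + n_overlap + n_only2

# True 5-D data; rows are the components x1..x5.
X = np.zeros((5, n))
X[1:4, :] = rng.standard_normal((3, n))            # x2..x4: present everywhere
X[0, :n_only1] = rng.standard_normal(n_only1)      # x1: only in group-1-exclusive points
X[4, -n_only2:] = rng.standard_normal(n_only2)     # x5: only in group-2-exclusive points

# Each group seen in its own 4-D coordinates.
g1 = X[[0, 1, 2, 3], :n_only1 + n_overlap]         # (x1, x2, x3, x4)
g2 = X[[4, 1, 2, 3], n_only1:]                     # (x5, x2, x3, x4)

# The overlapping points have identical coordinates in both groups,
# so an overlap-based alignment leaves g2 where it is.
overlap_agrees = np.allclose(g1[:, n_only1:], g2[:, :n_overlap])

merged = np.hstack([g1, g2[:, n_overlap:]])        # x1 and x5 share one axis
true_dim = centred_rank(X)
merged_dim = centred_rank(merged)
```

Here `true_dim` is 5 while `merged_dim` is 4: the overlap only fixes the three shared directions, so the fourth axis of each group is merged into one.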

In conclusion, the overlapping points should span at least as many dimensions as the lower-dimensional of the two groups. This can be written as

dim(X₁ ∩ X₂) ≥ min{dim(X₁), dim(X₂)}.
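This condition can be checked before combining two groups. The sketch below is one possible check, assuming that dim(·) is measured as the rank of the mean-centred point set and that all three point sets are given in a common coordinate system; `affine_dim` and `overlap_is_sufficient` are hypothetical helper names, and the tolerance is an arbitrary choice.

```python
import numpy as np

def affine_dim(P, tol=1e-8):
    """Dimension of the affine span of the columns of P."""
    centred = P - P.mean(axis=1, keepdims=True)
    return int(np.linalg.matrix_rank(centred, tol=tol))

def overlap_is_sufficient(X1, X2, overlap):
    """True if dim(overlap) >= min(dim(X1), dim(X2))."""
    return affine_dim(overlap) >= min(affine_dim(X1), affine_dim(X2))

# Synthetic 3-D data split into two overlapping groups.
rng = np.random.default_rng(0)
points = rng.standard_normal((3, 12))
X1, X2 = points[:, :8], points[:, 4:]

enough = overlap_is_sufficient(X1, X2, points[:, 4:8])    # four shared points
too_few = overlap_is_sufficient(X1, X2, points[:, 4:7])   # three shared points
```

With four generic shared points the overlap spans three dimensions and the check passes; with three shared points it spans only a plane and the check fails.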

Fig. 4: A dataset split into two overlapping groups. Each group has dimensionality four; the overlapping part has dimensionality only three.

Remarks:

1. In SCMDS, when combining two overlapping groups, fix the group with the larger dimensionality, apply the affine mapping to the group with the smaller dimensionality, and align it with the former. If the two groups have the same dimensionality, choose either one as the central group and align the two groups by applying the affine mapping to the other.

2. In SCMDS, split the data set X into two overlapping groups, X₁ and X₂. The number of intersection points, N_I, should be large enough that dim(X₁ ∩ X₂) = min{dim(X₁), dim(X₂)}.

Theorem: Let X_{p×n} = [x₁, x₂, ···, x_n], x_i ∈ R^p, i = 1, 2, ···, n, and let X₁ and X₂ be two subsets of X with X = X₁ ∪ X₂ and X₁ ∩ X₂ ≠ ∅. Apply MDS to X₁ and X₂ separately, and denote the resulting matrices again by X₁ and X₂. There exist minimal orthogonal sets such that X₁ ⊂ span{v₁, v₂, ···, v_{r₁}} and X₂ ⊂ span{w₁, w₂, ···, w_{r₂}}, where {v_i}_{i=1}^{r₁}, {w_j}_{j=1}^{r₂} ⊂ R^r, r < p, r is the number of principal components we keep, and dim(X₁) = r₁ ≥ r₂ = dim(X₂).

Let Y₁ and Y₂ denote the overlapping part after applying MDS to the two sets, respectively. If rank(X₁ ∩ X₂) = r₂, the affine mapping of the recombination process in SCMDS transforms X₂ into a subset of span{v₁, v₂, ···, v_{r₁}}.

Proof:

Because Y₁ and Y₂ are not centered at the same point, we shift both sets to the same center, say the origin. Then we rotate them to the same configuration.

As in the recombination method introduced previously, we apply QR decomposition to both Y₁ − Ȳ₁1^T and Y₂ − Ȳ₂1^T. We have Y₁ − Ȳ₁1^T = Q₁R₁ and Y₂ − Ȳ₂1^T = Q₂R₂.

∵ Y₁ is the representation of X₁ ∩ X₂ in R^{r₁}, and rank(X₁ ∩ X₂) = r₂

∴ Q₁ has only r₂ orthogonal column vectors

∴ Q₁ ∈ M_{r₁×r₂}(R)

Now extend this simplified case to K groups. Applying the combining method above, we combine the first two groups and obtain a new spatial configuration X̃₍₁₎. We then combine this new configuration with the next group X₃ by the same rule: if dim(X₃) ≥ dim(X̃₍₁₎), align X̃₍₁₎ with X₃, taking X₃ as the base, and so forth. Low-dimensional groups are absorbed into high-dimensional groups. Repeat until all groups are combined in a single common space.
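The sequential rule for K groups can be sketched as a loop that always keeps the higher-dimensional configuration as the base. This is an illustrative NumPy sketch, not the authors' code: `qr_pos`, `lift`, `combine_all` and `in_random_frame` are hypothetical names, each group is assumed to be stored with exactly dim(group) rows, the overlap is assumed to satisfy the rank condition above, and the data are synthetic.

```python
import numpy as np

def qr_pos(M, k):
    """First k columns of Q from a reduced QR, signs fixed so that diag(R) > 0."""
    Q, R = np.linalg.qr(M)
    s = np.sign(np.diag(R[:k, :]))
    s[s == 0] = 1.0
    return Q[:, :k] * s

def lift(Y_base, Y_move, X_move):
    """Map X_move into the base frame using matched overlap columns."""
    k = Y_move.shape[0]
    mb = Y_base.mean(axis=1, keepdims=True)
    mm = Y_move.mean(axis=1, keepdims=True)
    U = qr_pos(Y_base - mb, k) @ qr_pos(Y_move - mm, k).T
    return U @ (X_move - mm) + mb

def combine_all(groups):
    """groups: list of (coords, point_indices); returns merged coords sorted by index."""
    coords, idx = groups[0]
    for nxt, nxt_idx in groups[1:]:
        _, ia, ib = np.intersect1d(idx, nxt_idx, return_indices=True)
        if nxt.shape[0] >= coords.shape[0]:
            # The next group has at least the same dimension: use it as the base.
            coords, idx, nxt, nxt_idx, ia, ib = nxt, nxt_idx, coords, idx, ib, ia
        moved = lift(coords[:, ia], nxt[:, ib], nxt)
        new = ~np.isin(nxt_idx, idx)
        coords = np.hstack([coords, moved[:, new]])
        idx = np.concatenate([idx, nxt_idx[new]])
    order = np.argsort(idx)
    return coords[:, order], idx[order]

def pairwise_dist(P):
    diff = P[:, :, None] - P[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))

rng = np.random.default_rng(1)
truth = rng.standard_normal((3, 18))
truth[2, :12] = 0.0                       # points 0..11 lie in a plane

def in_random_frame(P):
    """Express P in a randomly rotated (or reflected) and shifted frame."""
    d = P.shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q @ P + rng.standard_normal((d, 1))

idx1, idx2, idx3 = np.arange(0, 9), np.arange(5, 15), np.arange(11, 18)
groups = [(in_random_frame(truth[:2, idx1]), idx1),   # 2-D group
          (in_random_frame(truth[:, idx2]), idx2),    # 3-D group
          (in_random_frame(truth[:, idx3]), idx3)]    # 3-D group
merged, merged_idx = combine_all(groups)
max_err = np.abs(pairwise_dist(merged) - pairwise_dist(truth)).max()
```

In this example the planar first group is absorbed into the three-dimensional second group, and `max_err` stays at rounding level, so the merged configuration reproduces all pairwise distances of the synthetic data.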

In the end, the whole spatial configuration is spanned by the basis of one of the groups. In other words, as long as the space spanned by the basis of some group is identical to the space spanned by the basis of the data X, the result of SCMDS is consistent with that of CMDS.

Remark:

If the dimensionality of at least one of the groups equals the dimensionality of the whole data set, the result of SCMDS is the same as the result of CMDS up to a rotation.
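Whether two configurations agree up to a rotation can be tested numerically. The sketch below uses an orthogonal Procrustes fit, a standard technique that is brought in here for illustration and is not part of the source; `rigid_residual` is a hypothetical name, and the fitted orthogonal matrix may include a reflection as well as a rotation.

```python
import numpy as np

def rigid_residual(A, B):
    """Frobenius distance between A and the best orthogonal fit of B, after centring both."""
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    W, _, Vt = np.linalg.svd(Ac @ Bc.T)
    R = W @ Vt                       # orthogonal matrix minimising ||Ac - R Bc||
    return np.linalg.norm(Ac - R @ Bc)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B = Q @ A + rng.standard_normal((3, 1))      # same shape, different frame
C = A * np.array([[2.0], [1.0], [1.0]])      # stretched along one axis

res_same = rigid_residual(A, B)
res_diff = rigid_residual(A, C)
```

A residual near zero, as for `res_same`, indicates that the two configurations differ only by an orthogonal transformation and a shift; a clearly positive residual, as for `res_diff`, indicates a genuine difference in shape.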
