• dim(X1) = dim(X2) = dim(X1 ∩ X2) = r
This is the basic situation; the above derivation is under this assumption.
• dim(X1) = r1 > dim(X2) = dim(X1 ∩ X2) = r2
Assume that, even under random grouping, the two groups have different dimensionality, so that the same points (the overlapping part) are expressed in spaces of different dimension. In this case, after QR decomposition, Q1 and Q2 are r1 × r2 and r2 × r2 matrices, respectively. That is, only r2 basis vectors are needed to span the overlapping points in the r1-dimensional space, and the corresponding R1 and R2 reduce to r2 × NI matrices. We can then find the affine transformation on this common dimension so that equation (1) holds. Applying U = Q1 Q2^T, an r1 × r2 matrix, to the r2 × n2 matrix X2 gives X̃2 = U X2 + b, with b of size r1 × n2, which lifts X2 into the r1-dimensional space. Conversely, if dim(X1) = r1 < dim(X2) = r2, the matrix U = Q1 Q2^T in equation (1) is still r1 × r2, but now Q1 is r1 × r1 and Q2^T is r1 × r2, so applying U to X2 projects the higher-dimensional part onto the lower-dimensional space. As a result, the dimensionality of X2 drops and part of the information in X2 is lost. Consequently, if the two groups have different dimensionality, it is more reasonable to map the lower-dimensional group into the higher-dimensional space by the affine mapping.
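The lifting of a lower-dimensional group into a higher-dimensional one can be sketched numerically. The Python/NumPy code below is only an illustration under stated assumptions: the function names `thin_qr_positive` and `affine_from_overlap` and the synthetic data are hypothetical, and fixing the signs of the QR factors so that the diagonal of R is positive is one possible way to make Q1 and Q2 comparable. It builds U = Q1 Q2^T and the shift b from the overlapping points, with r1 = 3 and r2 = 2.

```python
import numpy as np

def thin_qr_positive(a, rank):
    """Q factor of `a`, truncated to `rank` columns, with signs chosen
    so that the corresponding diagonal entries of R are positive."""
    q, r = np.linalg.qr(a)
    signs = np.sign(np.diag(r)[:rank])
    signs[signs == 0] = 1.0
    return q[:, :rank] * signs

def affine_from_overlap(y1, y2):
    """Return (u, b) such that y1 is approximately u @ y2 + b.

    y1: overlapping points in the higher-dimensional group, shape (r1, n_i)
    y2: the same points in the lower-dimensional group, shape (r2, n_i)
    """
    mean1 = y1.mean(axis=1, keepdims=True)
    mean2 = y2.mean(axis=1, keepdims=True)
    rank = np.linalg.matrix_rank(y2 - mean2)   # assumed equal to r2
    q1 = thin_qr_positive(y1 - mean1, rank)    # r1 x r2
    q2 = thin_qr_positive(y2 - mean2, rank)    # r2 x r2
    u = q1 @ q2.T                              # r1 x r2
    b = mean1 - u @ mean2                      # r1 x 1, broadcast over points
    return u, b

# Synthetic check (assumed data): a 2-D group embedded in 3-D by an
# unknown isometry w and shift t; the first 5 points form the overlap.
rng = np.random.default_rng(0)
x2 = rng.normal(size=(2, 10))
w = np.linalg.qr(rng.normal(size=(3, 3)))[0][:, :2]
t = rng.normal(size=(3, 1))
x2_in_x1 = w @ x2 + t
n_overlap = 5

u, b = affine_from_overlap(x2_in_x1[:, :n_overlap], x2[:, :n_overlap])
x2_lifted = u @ x2 + b
print(u.shape, np.allclose(x2_lifted, x2_in_x1))
```

In this sketch the whole second group, not only the overlap, lands on its true position in the three-dimensional space, because the overlap already has the full dimensionality r2 of the lower-dimensional group.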
• dim(X1) = dim(X2) = r > dim(X1 ∩ X2)
If the points of the overlapping part are collinear, dim(X1 ∩ X2) will be less than r. In this case there are two possible outcomes. Let us return to the example above. Assume there are two groups in two-dimensional space with three intersection points. These three points are collinear, so they lie on a line, which is a one-dimensional space, as Figure 3 shows. Fix the first group and align the second group to it by an affine mapping. Two alignments are possible: in one, the two sets are aligned by a rotation and a shift; in the other, they are aligned by a rotation and a shift combined with a reflection across the line. The two alignments give different results, and only one of them agrees with the original configuration, but the overlapping points do not carry enough information to tell which one is correct. The same ambiguity arises whenever dim(X1) ≥ dim(X2) > dim(X1 ∩ X2).
Fig. 3: The three solid points represent the overlapping points of the two groups. These three points are collinear and lie on a line.
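The ambiguity caused by collinear overlapping points can be made concrete with a small numerical sketch. The coordinates and the helper `reflect_across_line` below are hypothetical and chosen only for illustration. Reflecting a two-dimensional configuration across the line through the overlapping points leaves those points fixed and preserves all distances to them, yet it moves every point off the line.

```python
import numpy as np

def reflect_across_line(points, anchor, direction):
    """Reflect 2-D points (shape (2, n)) across the line anchor + s * direction."""
    d = direction / np.linalg.norm(direction)
    reflection = 2.0 * np.outer(d, d) - np.eye(2)
    return anchor + reflection @ (points - anchor)

# Three collinear overlapping points and one further point of the group.
anchor = np.array([[1.0], [1.0]])
direction = np.array([2.0, 1.0])
overlap = anchor + np.outer(direction, [0.0, 1.0, 2.5])   # shape (2, 3)
other = np.array([[0.0], [3.0]])                          # not on the line

overlap_reflected = reflect_across_line(overlap, anchor, direction)
other_reflected = reflect_across_line(other, anchor, direction)

# Distances from the extra point to the overlap, before and after reflection.
dist_before = np.linalg.norm(overlap - other, axis=0)
dist_after = np.linalg.norm(overlap - other_reflected, axis=0)

print(np.allclose(overlap_reflected, overlap))   # overlap is unchanged
print(np.allclose(dist_before, dist_after))      # distances are preserved
print(np.allclose(other_reflected, other))       # but the point has moved
```

Both the original and the reflected configuration fit the overlapping points exactly, so the overlap alone cannot decide between them.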
Next, we give another simple example to illustrate the second condition: the overlapping part is spanned by three basis vectors while dim(X1) = dim(X2) = 4. Figure 4 shows this situation.
In matrix X, x1 and x5 are orthogonal to the set X1 ∩ X2, and x1 is also orthogonal to x5, so the projection of the space spanned by X2 onto the space spanned by X1 should be a three-dimensional subspace. However, x1 and x5 will be treated as the same component and aligned by the affine mapping in the combination step of SCMDS. This wrong alignment reduces matrix X to a four-dimensional matrix and loses the information of one dimension. Choosing suitable values of NI (the number of overlapping points) and Ng (the size of each group) is therefore very important. NI and Ng should be large enough that, under random grouping, the intersection part is unlikely to be collinear and its dimensionality is unlikely to fall below the dimensionality of the whole data set.
In conclusion, the overlapping points should span a space whose dimensionality is at least the minimum of the dimensionalities of the two groups. This can be written as
dim(X1 ∩ X2) ≥ min{dim(X1), dim(X2)}.
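The condition above can be checked numerically before two groups are combined. The sketch below is an illustration under assumptions: it takes dim(·) to be the rank of the centered coordinate matrix (columns are points), and the names `affine_dim` and `overlap_is_sufficient` as well as the synthetic five-dimensional data are hypothetical. The first case imitates the situation of Figure 4, where each group has dimensionality four but the overlap has only three.

```python
import numpy as np

def affine_dim(points):
    """Rank of the centered point matrix; columns of `points` are points."""
    centered = points - points.mean(axis=1, keepdims=True)
    return int(np.linalg.matrix_rank(centered))

def overlap_is_sufficient(group1, group2, overlap):
    """Check dim(overlap) >= min(dim(group1), dim(group2))."""
    return affine_dim(overlap) >= min(affine_dim(group1), affine_dim(group2))

rng = np.random.default_rng(1)
n = 8

# Points that belong only to group 1 vary in coordinates 1-4,
# points that belong only to group 2 vary in coordinates 2-5.
only1 = np.zeros((5, n))
only1[0:4] = rng.normal(size=(4, n))
only2 = np.zeros((5, n))
only2[1:5] = rng.normal(size=(4, n))

# Case A: the shared points vary only in coordinates 2-4 (three dimensions).
shared_narrow = np.zeros((5, n))
shared_narrow[1:4] = rng.normal(size=(3, n))
group1_a = np.hstack([only1, shared_narrow])
group2_a = np.hstack([shared_narrow, only2])
ok_a = overlap_is_sufficient(group1_a, group2_a, shared_narrow)

# Case B: the shared points vary in coordinates 2-5 (four dimensions).
shared_wide = np.zeros((5, n))
shared_wide[1:5] = rng.normal(size=(4, n))
group1_b = np.hstack([only1, shared_wide])
group2_b = np.hstack([shared_wide, only2])
ok_b = overlap_is_sufficient(group1_b, group2_b, shared_wide)

print(affine_dim(group1_a), affine_dim(group2_a), affine_dim(shared_narrow), ok_a)
print(affine_dim(group1_b), affine_dim(group2_b), affine_dim(shared_wide), ok_b)
```

In case A the check fails, which corresponds to the loss of one dimension described above; in case B the overlap reaches the dimensionality of the smaller group and the check passes.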
Fig. 4: A data set split into two overlapping groups. Each group has dimensionality four, while the overlapping part has dimensionality three.
Remarks:
1. In SCMDS, when combining two overlapping groups, fix the group with the larger dimensionality, apply the affine mapping to the group with the smaller dimensionality, and align the latter with the former. If the two groups have the same dimensionality, choose either one as the central group and align the two groups by applying the affine mapping to the other.
2. In SCMDS, split the data set X into two overlapping groups, X1 and X2. The number of intersection points, NI, should be large enough that dim(X1 ∩ X2) = min{dim(X1), dim(X2)}.
Theorem: Let X_{p×n} = [x1, x2, ..., xn], xi ∈ R^p, i = 1, 2, ..., n, and let X1 and X2 be two subsets of X with X = X1 ∪ X2 and X1 ∩ X2 ≠ ∅. Apply MDS to X1 and X2 separately, resulting in new matrices denoted as X1 and X2. There exist minimal orthogonal sets such that X1 ⊂ span{v1, v2, ..., v_{r1}} and X2 ⊂ span{w1, w2, ..., w_{r2}}, with {vi}, i = 1, ..., r1, and {wj}, j = 1, ..., r2, in R^r, r < p, where r is the number of principal components we keep and dim(X1) = r1 ≥ r2 = dim(X2).
Let Y1 and Y2 denote the overlapping part after applying MDS to the two sets, respectively. If rank(X1 ∩ X2) = r2, the affine mapping in the recombination process of SCMDS transforms X2 into a subset of span{v1, v2, ..., v_{r1}}.
Proof:
Because Y1 and Y2 are not centered at the same point, we first shift both sets to a common center, say the origin. Then we rotate them into the same configuration.
As in the recombination method introduced previously, we apply QR decomposition to both Y1 − Ȳ1 1^T and Y2 − Ȳ2 1^T, which gives Y1 − Ȳ1 1^T = Q1 R1 and Y2 − Ȳ2 1^T = Q2 R2.
∵ Y1 is the representation of X1 ∩ X2 in R^{r1}
∴ Q1 has only r2 orthogonal column vectors
∴ Q1 ∈ M_{r1×r2}(R)
Now we extend the simplified case to K groups. Using the combining method above, we combine the first two groups and obtain a new spatial configuration X̃(1). We then combine this new configuration with the next group, X3, by the same rule: if dim(X3) ≥ dim(X̃(1)), align X̃(1) with X3, taking X3 as the reference, and so forth. Low-dimensional groups are absorbed into high-dimensional groups. Repeat until all groups are combined in one common space.
In the end, the whole spatial configuration is spanned by the basis of one of the groups. In other words, as long as the space spanned by the basis of some group is identical to the space spanned by the basis of the data X, the result of SCMDS should be consistent with that of CMDS.
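The sequential combination of K groups can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names (`thin_q`, `align`, `combine`, `pairwise_dist`) are hypothetical, each group is assumed to be given as a pair of point indices and coordinates of shape (r_k, n_k) with r_k equal to its dimensionality, and the group configurations are simulated by applying a random orthogonal map and shift to known coordinates instead of running MDS. For simplicity all three simulated groups have the same dimensionality.

```python
import numpy as np

def thin_q(a, rank):
    """Q factor truncated to `rank` columns, signs fixed so diag(R) > 0."""
    q, r = np.linalg.qr(a)
    signs = np.sign(np.diag(r)[:rank])
    signs[signs == 0] = 1.0
    return q[:, :rank] * signs

def align(ref_overlap, mov_overlap):
    """Affine map (u, b) with ref_overlap approximately u @ mov_overlap + b."""
    m_ref = ref_overlap.mean(axis=1, keepdims=True)
    m_mov = mov_overlap.mean(axis=1, keepdims=True)
    rank = np.linalg.matrix_rank(mov_overlap - m_mov)
    u = thin_q(ref_overlap - m_ref, rank) @ thin_q(mov_overlap - m_mov, rank).T
    return u, m_ref - u @ m_mov

def combine(groups):
    """Combine groups one by one; the higher-dimensional side is the reference."""
    idx, conf = groups[0]
    for nxt_idx, nxt_conf in groups[1:]:
        _, pos_cur, pos_nxt = np.intersect1d(idx, nxt_idx, return_indices=True)
        if nxt_conf.shape[0] >= conf.shape[0]:
            # The next group is the reference: move the current configuration.
            u, b = align(nxt_conf[:, pos_nxt], conf[:, pos_cur])
            conf = u @ conf + b
        else:
            # The current configuration is the reference: move the next group.
            u, b = align(conf[:, pos_cur], nxt_conf[:, pos_nxt])
            nxt_conf = u @ nxt_conf + b
        is_new = ~np.isin(nxt_idx, idx)
        idx = np.concatenate([idx, nxt_idx[is_new]])
        conf = np.hstack([conf, nxt_conf[:, is_new]])
    return idx, conf

def pairwise_dist(p):
    """Matrix of Euclidean distances between the columns of p."""
    return np.linalg.norm(p[:, :, None] - p[:, None, :], axis=0)

# Simulated data: 40 points in R^3, three overlapping index windows.
rng = np.random.default_rng(2)
x = rng.normal(size=(3, 40))
windows = [np.arange(0, 16), np.arange(10, 26), np.arange(20, 40)]
groups = []
for w in windows:
    rot = np.linalg.qr(rng.normal(size=(3, 3)))[0]   # random orthogonal map
    groups.append((w, rot @ x[:, w] + rng.normal(size=(3, 1))))

idx, conf = combine(groups)
order = np.argsort(idx)
same_distances = np.allclose(pairwise_dist(conf[:, order]), pairwise_dist(x))
print(conf.shape, same_distances)
```

The combined configuration is expressed in the coordinates of the last reference group, so it agrees with the original data only up to an orthogonal map and a shift; this is why the check compares pairwise distances rather than coordinates.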
Remark:
If the dimensionality of at least one of the groups is equal to the dimensionality of the total data set, the result of SCMDS is the same as the result of CMDS apart from a rotation effect.