Introduction to nonmetric MDS - A Discussion of the Optimal Grouping Decision and the Missing-Value Problem in Split-and-Combine Multidimensional Scaling

Nonmetric MDS was proposed by Shepard (1962) and Kruskal (1964). In nonmetric MDS, the proximities do not provide distance values; they provide only ordinal information. For example, when we measure the perceptual space of human subjects, it is hard to say how much two objects differ, but it is easy to say that, of objects B and C, object A is closer to object B. Hence, a rank order of dissimilarities can be defined.

1.3.1 How are the proximities ranked by dissimilarity?

There are many ways to rank the similarities. One method is to prepare a card for each pair of objects and have subjects arrange all the cards in order of similarity. Another method is to have subjects divide the cards into two piles, one of more similar pairs and one of less similar pairs, and then repeat the procedure on each pile until the object pairs within a pile are approximately equally similar. A third method is to write each object on its own card, have subjects put similar objects in the same pile, and use the number of times two objects occur in the same pile as their proximity. This definition of proximity is very intuitive.
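The third (pile-sorting) method can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the function name `cooccurrence_proximities`, the data layout (one list of piles per subject), and the sample data are all hypothetical.

```python
from itertools import combinations


def cooccurrence_proximities(sortings, objects):
    """Count how often each pair of objects is placed in the same pile.

    sortings: one entry per subject; each entry is a list of piles,
              and each pile is a collection of object labels (hypothetical layout).
    Returns a dict mapping each sorted pair (a, b) to its co-occurrence count.
    """
    counts = {pair: 0 for pair in combinations(sorted(objects), 2)}
    for piles in sortings:
        for pile in piles:
            for a, b in combinations(sorted(pile), 2):
                counts[(a, b)] += 1
    return counts


# Hypothetical data: three subjects sort four objects into piles.
sortings = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A"}, {"B", "C", "D"}],
]
prox = cooccurrence_proximities(sortings, ["A", "B", "C", "D"])
# Assumed conversion to dissimilarities: number of subjects minus the count.
dissim = {pair: len(sortings) - c for pair, c in prox.items()}
```

Larger counts mean more similar pairs. If dissimilarities are needed, subtracting each count from the number of subjects, as `dissim` does, is one simple option; it is an assumption here, not something the text prescribes.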

1.3.2 How does nonmetric MDS operate?

In nonmetric MDS, the numerical values of the proximities carry no meaning by themselves. To be more precise, only their relative sizes matter; the difference between two proximity values is not interpretable. Hence, mapping the proximities to points in space needs only to preserve the rank order of the pairs of points. We therefore look for a monotonic function whose outputs preserve the order of the dissimilarities. The values obtained through the optimal monotonic function, denoted $\hat{d}_{ij} = f(p_{ij})$, are called disparities. The nonmetric MDS problem thus becomes how to arrange the point configuration and how to find the optimal transformation so that the distances of the configuration follow the rank order of the proximities as closely as possible. Kruskal (1964) proposed an iterative technique for finding the transformation and suggested minimizing STRESS to assess how well the configuration fits.
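A minimal sketch of how disparities can be computed, assuming the proximities are dissimilarities without ties: sort the pairs by proximity, then fit a nondecreasing sequence to the current distances in that order by least-squares monotone regression. The pool-adjacent-violators algorithm used here is one standard way to perform this fit; the names `pava` and `disparities` and the example data are hypothetical.

```python
def pava(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def disparities(proximities, distances):
    """Disparities d_hat: monotone regression of distances on the proximity order.

    proximities, distances: dicts keyed by pair (i, j). The proximities are
    assumed to be dissimilarities, so the fit must not decrease as they grow.
    """
    order = sorted(proximities, key=proximities.get)  # most similar pair first
    fitted = pava([distances[pair] for pair in order])
    return dict(zip(order, fitted))


# Hypothetical example: the most similar pair is currently farther apart than
# the second pair, so the two distances are pooled to their mean.
prox = {(0, 1): 1, (0, 2): 2, (1, 2): 3}
dist = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0}
d_hat = disparities(prox, dist)
print(d_hat)  # {(0, 1): 1.5, (0, 2): 1.5, (1, 2): 3.0}
```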

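$$
\mathrm{STRESS} = \sqrt{\frac{\sum_{i<j}\left(d_{ij}-\hat{d}_{ij}\right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
$$

where $d_{ij}$ is the distance between points $i$ and $j$ in the configuration and $\hat{d}_{ij}$ is the corresponding disparity. The algorithm proceeds as follows:

1. Choose an initial configuration

2. Find the distances $d_{ij}$ of the configuration

3. Find the optimal transformation and calculate the disparities $\hat{d}_{ij}$

4. Use steepest descent to find a new configuration

5. Compare the stress with that of the previous iteration. If the change is smaller than a preset criterion, stop; otherwise, go back to step 2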
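The iterative procedure can be sketched in pure Python as follows. This is an illustrative sketch under several assumptions, not Kruskal's exact implementation: the dissimilarities have no ties, the initial configuration is random, the STRESS gradient is taken with the disparities held fixed, and, in place of Kruskal's step-size rules, the step is simply halved whenever the stress fails to decrease by more than `tol`. All function names and parameters are hypothetical.

```python
import math
import random


def pava(values):
    """Least-squares nondecreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def evaluate(x, pairs, order):
    """Steps 2-3 plus STRESS = sqrt(sum (d - d_hat)^2 / sum d^2)."""
    d = {(i, j): math.dist(x[i], x[j]) for i, j in pairs}
    dhat = dict(zip(order, pava([d[p] for p in order])))
    s_star = sum((d[p] - dhat[p]) ** 2 for p in pairs)
    t_star = sum(d[p] ** 2 for p in pairs)
    return math.sqrt(s_star / t_star), d, dhat, s_star, t_star


def nonmetric_mds(delta, dim=2, max_iter=500, step=0.2, tol=1e-9, seed=0):
    """Sketch of Kruskal-style nonmetric MDS; delta is a symmetric dissimilarity matrix."""
    n = len(delta)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    order = sorted(pairs, key=lambda p: delta[p[0]][p[1]])  # only the rank order is used
    rng = random.Random(seed)
    x = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]  # step 1 (random start)
    stress, d, dhat, s_star, t_star = evaluate(x, pairs, order)
    history = [stress]
    for _ in range(max_iter):
        if s_star == 0 or step < 1e-8:
            break
        # Step 4: gradient of STRESS with the disparities held fixed.
        grad = [[0.0] * dim for _ in range(n)]
        for i, j in pairs:
            if d[(i, j)] == 0:
                continue
            coef = stress * ((d[(i, j)] - dhat[(i, j)]) / s_star - d[(i, j)] / t_star) / d[(i, j)]
            for k in range(dim):
                g = coef * (x[i][k] - x[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        norm = math.sqrt(sum(g * g for row in grad for g in row))
        if norm == 0:
            break
        new_x = [[x[i][k] - step * grad[i][k] / norm for k in range(dim)] for i in range(n)]
        new = evaluate(new_x, pairs, order)
        # Step 5: accept only a sufficient decrease in stress; otherwise shrink the step.
        if stress - new[0] <= tol:
            step /= 2
            continue
        x, (stress, d, dhat, s_star, t_star) = new_x, new
        history.append(stress)
    return x, history


# Hypothetical dissimilarities: four points at 0, 1, 3 and 7 on a line.
pts = [0, 1, 3, 7]
delta = [[abs(a - b) for b in pts] for a in pts]
config, history = nonmetric_mds(delta)
print(f"stress: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because a configuration is accepted only when it lowers the stress, the recorded stress values decrease monotonically; the backtracking rule is a simplification chosen for readability.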

More details can be found in Cox and Cox, Multidimensional Scaling, Chapter 3.

Although multidimensional scaling offers many types of models for different kinds of data, here we focus on classical MDS. Let us return to classical MDS and examine the challenges it faces.

1. How many components should we keep?

In other words, how many dimensions does the data set need in order to preserve its dissimilarities adequately? There are many suggestions, such as keeping the components whose eigenvalues exceed 1, using the smallest number of dimensions for which the stress is below 0.05, or deleting components whose eigenvalues are small relative to the others. Some of these suggestions come from researchers’ experience over a large number of trials, and none of them is guaranteed to give the best estimate of the number of dimensions. Another well-known method is the scree test, or elbow test: plot the stress against the number of dimensions, observe how the stress changes as the dimension increases, and choose the point after which the stress no longer decreases significantly. This method has a drawback: if the curve decreases only mildly as the dimension grows, it is hard to tell which point is the elbow, and the method becomes ineffective. For more stopping rules, see “Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches” by Donald A. Jackson (1993).
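The three rules of thumb can be compared on hypothetical numbers. In this sketch, `stress_by_dim[k - 1]` is assumed to be the stress of a k-dimensional solution, and the elbow test is reduced to a threshold `min_drop` on the decrease in stress; that threshold, like the data and the function names, is an assumption for illustration. Note that the eigenvalue > 1 rule originates in PCA of a correlation matrix, so the scale of the eigenvalues matters when it is applied to MDS.

```python
def dims_by_eigenvalue(eigenvalues, threshold=1.0):
    """Rule of thumb: keep the components whose eigenvalue exceeds `threshold`."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def dims_by_stress(stress_by_dim, cutoff=0.05):
    """Smallest dimension whose stress is below `cutoff` (None if none is)."""
    for k, s in enumerate(stress_by_dim, start=1):
        if s < cutoff:
            return k
    return None


def elbow(stress_by_dim, min_drop=0.02):
    """Crude elbow rule: the first dimension k after which going to k + 1
    lowers the stress by less than `min_drop` (a hypothetical threshold)."""
    for k in range(1, len(stress_by_dim)):
        if stress_by_dim[k - 1] - stress_by_dim[k] < min_drop:
            return k
    return len(stress_by_dim)


# Hypothetical eigenvalues and stress values for 1 to 5 dimensions.
eigenvalues = [4.1, 2.3, 0.9, 0.4, 0.2]
stress = [0.35, 0.12, 0.04, 0.035, 0.03]
print(dims_by_eigenvalue(eigenvalues), dims_by_stress(stress), elbow(stress))
```

Here the eigenvalue rule suggests 2 dimensions while both stress-based rules suggest 3, which illustrates why none of these rules can be guaranteed to give the best estimate. Changing `min_drop` moves the elbow, mirroring the difficulty described above when the curve decreases only mildly.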

2. The classical MDS algorithm is slow.

The computational complexity is O(N³), since the method requires the eigendecomposition of an N × N matrix, so the computation becomes very time-consuming when the sample size is large. Many MDS methods have been proposed for large data sets, such as Chalmers’ linear iteration algorithm, the anchor point method, relative MDS, landmark MDS, and the diagonal majorization algorithm (DMA).
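A minimal NumPy sketch of classical (Torgerson) MDS shows where the O(N³) cost arises. The double centering can be done in O(N²) with row and column means, so the dominant cost is the full eigendecomposition (`numpy.linalg.eigh`) of the N × N matrix B. The function name `classical_mds` and the test data are illustrative assumptions.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from an n x n distance matrix."""
    D2 = np.asarray(D, dtype=float) ** 2
    # Double centering B = -1/2 J D2 J, computed in O(n^2) from row/column means.
    B = -0.5 * (D2 - D2.mean(axis=0) - D2.mean(axis=1)[:, None] + D2.mean())
    # Full eigendecomposition of the n x n matrix B: the O(n^3) bottleneck.
    evals, evecs = np.linalg.eigh(B)
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending eigenvalues
    coords = evecs[:, :k] * np.sqrt(np.clip(evals[:k], 0, None))
    return coords, evals


# Hypothetical data: six random points in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y, eigenvalues = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

For exactly two-dimensional Euclidean input, the recovered coordinates reproduce the original distances (up to rotation and reflection), and all eigenvalues beyond the second are numerically zero.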

3. Missing entries in the distance matrix are not allowed.

Classical MDS (CMDS) is a PCA-based method: reconstructing the data coordinates requires the complete distance matrix, so the matrix cannot contain any missing values.

There are many methods for filling in the missing entries, such as using shortest-path distances. However, their computational cost is very high.
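The shortest-path idea can be sketched with the Floyd-Warshall algorithm, which treats the observed entries as weighted edges and fills each missing entry with the length of the shortest path through observed entries. Its triple loop over N objects costs O(N³), which is one reason such refilling is expensive. This is a sketch under assumptions: missing entries are marked with `None`, observed entries are kept unchanged, and the function name is hypothetical.

```python
import math


def fill_by_shortest_path(D):
    """Replace missing entries (None) with shortest-path distances via Floyd-Warshall."""
    n = len(D)
    G = [[0.0 if i == j else (math.inf if D[i][j] is None else D[i][j])
          for j in range(n)] for i in range(n)]
    for k in range(n):  # O(n^3): every intermediate object k for every pair (i, j)
        for i in range(n):
            for j in range(n):
                if G[i][k] + G[k][j] < G[i][j]:
                    G[i][j] = G[i][k] + G[k][j]
    # Keep observed entries; use shortest-path lengths only where data are missing.
    return [[D[i][j] if D[i][j] is not None else G[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: four objects at 0, 1, 2, 3 on a line, two distances missing.
D = [
    [0, 1, None, None],
    [1, 0, 1, 2],
    [None, 1, 0, 1],
    [None, 2, 1, 0],
]
filled = fill_by_shortest_path(D)
```

If no path of observed entries connects two objects, the filled value stays `math.inf`. For a true metric, the triangle inequality implies that a filled value can only overestimate the missing distance.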

2 Literature Review

2.1 Chalmers’ Linear Iteration Algorithm

Many kinds of MDS models have been developed. The spring model is one of them; it is a force-directed model in which the proximities are treated as a physical system. Imagine that the objects are connected to each other by springs, and that the proximity of two objects is the rest length of the spring between them. Initially, the positions of the objects are set arbitrarily, so the springs are stretched or compressed. If the springs are allowed to oscillate freely, the system eventually reaches an equilibrium with minimal energy. Stress is a suitable measure of the energy of this system; in other words, the spring model for MDS aims to minimize the stress. This is achieved by an iterative algorithm in which each iteration evaluates the forces between all pairs of objects. However, its computational complexity is O(N³).
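A basic spring-model iteration can be sketched as follows. Each spring exerts a Hooke-like force proportional to the difference between its current length and its rest length (the proximity), and every object moves a small step along its net force. Because every iteration visits all pairs, it costs O(N²) force evaluations. The damping constant `step`, the function names, and the data are assumptions for illustration; with this force, each update is a gradient-descent step on the raw stress used as the energy.

```python
import math
import random


def spring_energy(x, delta):
    """Raw stress: sum over pairs of (current length - rest length)^2."""
    n = len(x)
    return sum((math.dist(x[i], x[j]) - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def spring_iteration(x, delta, step=0.1):
    """One iteration of a basic spring model: O(N^2) pairwise force evaluations."""
    n, dim = len(x), len(x[0])
    force = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(x[i], x[j])
            if d == 0:
                continue
            # A stretched spring (d > rest length) pulls i and j together;
            # a compressed one pushes them apart.
            f = (d - delta[i][j]) / d
            for k in range(dim):
                pull = f * (x[j][k] - x[i][k])
                force[i][k] += pull
                force[j][k] -= pull
    # Move every object a small step along its net force (simultaneous update).
    return [[x[i][k] + step * force[i][k] for k in range(dim)] for i in range(n)]


# Hypothetical data: rest lengths taken from the corners of a 3 x 4 rectangle.
pts = [(0, 0), (3, 0), (0, 4), (3, 4)]
delta = [[math.dist(p, q) for q in pts] for p in pts]
rng = random.Random(1)
x = [[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in pts]
energies = [spring_energy(x, delta)]
for _ in range(100):
    x = spring_iteration(x, delta)
    energies.append(spring_energy(x, delta))
```

The step size acts as damping: if it is too large, the system oscillates instead of settling, which is one reason such models need careful tuning.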

Chalmers (2006) proposed a method based on the spring model whose time cost is lower than that of the standard spring model: the computational complexity is reduced to O(N²).

Chalmers’ linear iteration model reduces the number of force calculations. Two sets are defined for each object i. The first set, V, holds the neighbors of object i.

In each iteration, some objects outside the neighbor set are selected at random, and the proximity between each of them and object i is checked. If this proximity is smaller than that of at least one current neighbor, the selected object is swapped into V in place of the most distant neighbor; otherwise, it is placed in a second set, S. The set S is rebuilt in every iteration. Thus, in each object's iteration, only the forces from the objects in V and S are used rather than all the force information, which reduces the order of the computational complexity.
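The set update for a single object can be sketched as follows. This is an illustrative sketch of the idea described above, not Chalmers' full algorithm: the neighbor-set size, the sample size, the function name `chalmers_update`, and the data are hypothetical, and the force calculation itself is omitted.

```python
import random


def chalmers_update(i, V, delta, sample_size, rng):
    """Sketch of one object's set update in a Chalmers-style linear iteration.

    V: current neighbor set of object i (indices); its size stays fixed.
    Returns (new_V, S), where S is the random sample set rebuilt this iteration.
    """
    n = len(delta)
    V = list(V)
    outside = [j for j in range(n) if j != i and j not in V]
    S = []
    for j in rng.sample(outside, min(sample_size, len(outside))):
        farthest = max(V, key=lambda v: delta[i][v])
        if delta[i][j] < delta[i][farthest]:
            V[V.index(farthest)] = j  # closer than the most distant neighbor: swap in
        else:
            S.append(j)  # otherwise keep it in this iteration's sample set
    return V, S


# Hypothetical data: ten objects at positions 0..9 on a line.
delta = [[abs(a - b) for b in range(10)] for a in range(10)]
rng = random.Random(0)
V, S = chalmers_update(0, [5, 6, 7], delta, sample_size=3, rng=rng)
```

Only the objects in V and S would then exert forces on object i, so with fixed set sizes each object needs a constant number of force evaluations per iteration instead of N − 1.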

In this case, the spring model is well suited to adding new points to the system. However, its solution is not stable in general, so good initial values are needed.
