Owing to the congenital problem of this voxel-wise comparison approach, in this chap-ter we will introduce the proposed method called the multivariate volumetric morphometry (MVM) which can assess anatomical brain differences. The MVM includes a preprocess-ing and an analysis step as VBM does. The image preprocesspreprocess-ing of MVM is the same with one of VBM (shown in Figure 2.4), in which the modulation is required to charac-terize volumetric group differences of brain tissues, but the data smoothing is omitted. In the statistical analysis step, MVM adopts a reformatory LDA-based method as the basis of multivariate analysis, and conjugates the wavelet transform, which is used to rearrange the spatial and frequency information for the multiresolution analysis, to measuring group differences. Because each and every voxel represents one variate in analysis, thus MVM is a multivariate approach. This method overcomes the drawback of voxel-based analysis, and is appropriate for estimating the structural brain discrepancy between different groups.
3.1 Ideas of the Proposed Method
The multivariate volumetric morphometry (MVM) is the proposed method which char-acterizes volumetric anatomical discrepancy between different groups through a multivari-ate analysis of MR images of particular brain tissues. It is an unbiased, objective and whole-brain measurement. The multivariate volumetric morphometry contains several pro-cesses like VBM does, and the chief breakthrough of this thesis is the multivariate analysis.
Thus, in the following, we only focus on the statistical analysis step of MVM.
In the multivariate analysis stage, it employs a high-dimensional classification tech-nique, which considers all voxels of MR images at one time, to identify the most discrim-inative hyper-plane that well separates the populations of groups in the high-dimensional space. This hyper-plane goes along with a unique normal vector, the most discriminant pro-jection vector w, which is the direction shifted from one population to another population.
The appearance of a shift might be resulted from some factors of interest cause the group
3.1 Ideas of the Proposed Method 27
Voxel 1
Voxel 2 w
Figure 3.1: Sketchily showing how a high-dimensional classification technique can be used to measure group differences. Assume the yellow points are the patients’ morphological profiles, and the corresponding yellow elliptic area is the patients’ distribution; either are the blue ones for normal subjects. A classification technique determines the most discrim-inative hyper-plane, which is presented by the dotted line, and the corresponding most discriminant projection vector w. In such the projection vector w, each element denotes the discrimination weight of group discrepancy. Therefore, the vector w can be considered as a spatial map containing the regions representative of group differences.
discrepancy under study. The most discriminant projection vector is also an image which has the same size of all sample images. The way of using a classification technique to find such the most discriminant projection vector does not need to coincide along any voxels (di-mensions), that voxel-based analyses are unable to achieve. Further, each of the parameters from the most discriminant projection vector w denotes the weighting, or the discrimina-tion of characterizing group discrepancy, so the most discriminant projecdiscrimina-tion vector can regarded as the analytic image containing the resulting analysis parameters. In this way, we can quantify differences throughout the whole brain between different groups by such a high-dimensional classification technique. Figure 3.1 illustrates the idea schematically.
3.2 Framework of Multivariate Volumetric Morphometry
Before the multivariate analysis, we used the first six steps of the optimized VBM pro-tocol mentioned in the chapter 2 to obtain individual normalized and modulated gray/white matter images. Notice that the smoothing was disused. The reason why we do not smooth the images in the multivariate volumetric morphometry will be explained in chapter 5, discussion. Then, in the central multivariate method, we used a reformatory LDA-based method, the discriminative common vector method [32], to find the most discriminant pro-jection vector which minimizes the scatter within each individual group and simultaneously maximizes the scatter between groups without the small sample size problem. The result-ing projection vector forms a spatial map, whose image size is the same with all gray/white matter images used for the analysis, containing the regions which are most representative of group differences. The details of the method and its efficient implementation we proposed for implementation will be interpreted in the next section.
Besides the discriminative common vector method, we also used the wavelet transform rearranging the spatial and frequency information of MR images to improve the effect of MVM upon catching significant group differences, in several varied scales. The reason we applied the discriminative common vector method in the wavelet space is that, although the method considers all voxels of images simultaneously when estimating group differ-ences, there are the same forces of relationships between all pairs of two voxels in the method; no matter the two voxels are adjacent to or far away from each other. Thus, to increase spatial correlations between neighboring voxels, the 3-D wavelet transformation is used to restructure voxel data into space-scale features in a hierarchical representation way. After that, we then apply the discriminative common vector method on these wavelet features. Moreover, the wavelet transform also makes the analysis become a multivariate multiresolution analysis.
After getting the most discriminant projection vector of two groups’ features, the weight
3.2 Framework of Multivariate Volumetric Morphometry 29
of each feature from the projection vector represents the degree of importance for charac-terizing the group differences. The number of features, equal to the number of voxels in a MR image, is usually a huge amount. Only the features with larger weights in the most discriminant projection vector are used when performing the inverse 3-D wavelet transfor-mation, to obtain the final discriminating map in the original voxel-based space. Discarding the features with small weights helps to remove trifling differences and to improve accuracy of the multivariate analysis.
Finally, for the purpose of determining and displaying which regions representative of the significant group differences, a smoothing and thresholding of discrimination weights of the parameters in the discriminating map are needed. As mention before, each parameter of the discriminating map denotes the discrimination of characterizing the group discrep-ancy, so in an intuitively thinking, the changes of the weights of neighboring parameters should be slight. However, in practice, it does not often look smooth as we think. It may result from the noise or the variation within groups, or the error produced during the pre-processing like a wrong image registration or tissue segmentation. It happens especially when we abandon the smoothing step in MVM preprocessing. Therefore, to constrain the smoothness of discrimination weights in the discriminating map regionally, we use a smoothing for the discriminating map. In addition, a thresholding is done before displaying the discriminating map to show the detected regions most representative of group discrep-ancy. Only voxels with a parameter value preceding the threshold in the discriminating map are considered to reach the significant level and to shall be showed. The minimum cluster size of the parameters also can be set to reject the too small regions. Although the smoothing and thresholding are not the parts of the multivariate analysis step, they are need for visualizing the discrepancy pattern between the groups. Of course, both of the procedures can be regulated by users. In the end of the MVM analysis, we can also obtain a whole-brain confidence in explaining whether the detected group discrepancy is correct, by caculating the p-value associated with the T-statistic on the two groups of projected
images onto the discriminating map.
Summarily, the multivariate volumetric morphometry (MVM) contains the following steps:
1. Spatially normalization of all images to the same stereotactic space 2. Segmentation of normalized images into GM, WM, and CSF 3. Correction for volume changes of segmented images
4. Multivariate analysis
(a) Forward 3-D wavelet transformation to the multiresolution space
(b) Discriminative common vector method to obtain the most discriminant projec-tion vector
(c) Discarding unimportant wavelet features with small discrimination weights in the most discriminant projection vector
(d) Inverse 3-D wavelet transformation to obtain the discriminating map in the stereotactic normalization space
5. Visualization of the discrepancy pattern (a) Smoothing
(b) Thresholding
Figure 3.2 is the flowchart of the multivariate analysis part of MVM. In implementation, we used the first six steps of the optimized VBM protocol to accomplish the step 1 and 2 of MVM. In the following sections we will introduce the techniques used in the mul-tivariate analysis step, including the discriminative common vector method, the efficient implementation for the discriminative common vector, and the 3-D wavelet transform.
3.2 Framework of Multivariate Volumetric Morphometry 31
Figure 3.2: Flowchart of multivariate analysis and visualization in MVM.
3.3 Multivariate Analysis using a Reformatory LDA-Based Method
3.3.1 Conventional Linear Discriminant Analysis and Its Potential Prob-lem
The linear discriminant analysis (LDA) is one of the most popular linear projection techniques. It was invented by Ronald A. Fisher in 1936 [33], and has been successfully applied in many classification problems such as image recognition, multimedia informa-tion retrieval and so on. Its goal is to find the most discriminant projecinforma-tion vector w, in which direction groups can be separated with the maximum between-class scatter and the minimum within-class scatter.
LetK be the number of classes (groups), where the kth class contains Mksamples, and let xkmbe aN -dimensional column vector which denotes the mth sample of the kth class.
There is a total ofM = PK
k=1Mk samples. The within-class scatter matrix Sw and the between-class scatter matrix Sb are defined as
Sw = projection matrix Plda that maximizes the Fisher’s linear discriminant criterion, that is
Plda = arg maxP F (P) = arg max
P
|PTSbP|
|PTSwP|. (3.3)
According to linear algebra, the ratio is maximized when the column vectors of Plda are the eigenvectors of S−1w Sb. In implementation, each individual morphological profile is
3.3 Multivariate Analysis using a Reformatory LDA-Based Method 33
first reshaped into a sample vector by arranging the 3-D volume in some consistent order before applying the linear discriminant analysis. Moreover, there are only two groups in our case, i.e. K = 2, so we can immediately obtain the most discriminant projection vector w, which is the only one eigenvector composing Plda, by the formula w = S−1w (µ1− µ2).
However, LDA encounters difficulties when the number of samples is much smaller than the dimensionality of the sample space. This situation causes the within-class scatter matrix singular and not invertible, so the LDA cannot be applied directly. It is known as the small sample size (SSS) problem [34]. Therefore, we employ the discriminative common vector method [32], which was proposed by Cevikalp and Wilkes for face recognition, to solve this problem.
3.3.2 Discriminative Common Vector Method
The discriminative common vector method for the small sample size problem is based on a variation of the LDA by maximizing the modified Fisher’s linear discriminant criterion [35]. The general idea of the common vector is to find a vector which can represent a class by extracting common properties of the class, or saying that, by eliminating differences between the samples in the class. After getting each common vector of every class, we can use the principal components analysis (PCA) [36] to find the principal components which actually equate the most discriminant projection vectors of LDA.
Let us use all previous definitions and let the total scatter matrix be defined as
St =
The modified Fisher’s linear discriminant criterion F (P) =ˆ |PTSbP|
|PTStP| = |PTSbP|
|PTSwP + PTSbP| (3.5)
has been proved that it is exactly equivalent to the original Fisher’s criterion by Liu et
The modified criterion will attain a maximum in the special case, proved in [37], where pTSwp = 0 and pTSbp 6= 0, for all projection vectors p ∈ RN \ {0}. Under these conditions for p, a better criterion [38] will be
arg max
|PTSwP|=0|PTSbP| = arg max
|PTSwP|=0|PTStP|.
That is to say, if we transform all samples onto the null space of Sw to restrict the projected within-class scatter matrix to be a zero matrix, and then calculate the principal components that maximize |PTStP| by performing PCA, we will obtain the most discriminant pro-jection vectors without the small sample size problem. It is called the null space method proposed by Chen et al. [37].
The transformation matrix from the original sample space to the null space of Sw is Q ¯¯QT, where the column vectors of ¯Q are the vectors spanning the null space of Sw.
Cevikalp and Wilkes [32] have proved that, projecting every samples xkm (which denotes the mth sample of the kth class) in the kth class onto the null space of Sw will produce exactly one vector xkcom = Q ¯¯QTxkm , which is referred to the common vector; moreover, because of ¯Q ¯QTxkm = xkm− QQTxkm, the common vector xkcom of the kth class can be calculated without ¯Q by using
xkcom = xkm− QQTxkm, (3.7) where Q is the matrix whose column vectors are the orthonormal vectors spanning the range space of Sw. Since the number of columns in Q is about M and the number of columns in ¯Q is aboutN − M, the size of Q is much smaller than the size of ¯Q. It states that the method can greatly reduce the computational burden than the null space method.
After obtaining the common vector for each and every group, the principal components of those common vectors will be the most discriminant projection vectors. It is because
3.3 Multivariate Analysis using a Reformatory LDA-Based Method 35
there is exactly one class over the common vectors now. These principal components of the common vectors are called the discriminative common vectors. Again, in our practice, samples are divided into only two groups, so there is only one discriminative common vec-tor in the event. We obtain the most discriminant projection vecvec-tor w by directly subtracting of the two common vectors, namely, w= x1com − x2com.
So the steps of the discriminative common vector method are as follows:
1. Compute the eigenvectorsα1, α2, . . . , αrcorresponded to the nonzero eigenvalues of Sw, where r is the rank of Sw, and set Q = [α1 α2 · · · αr].
2. Obtain the common vectors for each class by choosing any sample from each class and projecting it onto the null space of Sw, those are
x1com = x1m− QQTx1m, m ∈ {1, . . . , M1}, (3.8) and
x2com = x2m− QQTx2m, m ∈ {1, . . . , M2}. (3.9) 3. Compute the only one discriminative common vector, i.e. the most discriminant
projection vector w by
w= x1com − x2com. (3.10)
3.3.3 Efficient Implementation for Computing Discriminative Com-mon Vectors
Although the discriminative common vector method solves the small sample size prob-lem, there are still come difficulties in implementation. It is because the dimensionality of the sample space is a very huge amount. For example, a 3-D image with the size 157 × 189 × 156 has more than 4.6 × 106 voxels. Therefore, we proposed an efficient implementation for computing the discriminative common vector.
Since the within-class scatter matrix is defined as Sw =PK than directly calculating the large N -by-N matrix Sw, we used a computationally fea-sible method [39, 40] to compute the eigenvectors of AAT by multiplying the matrix A by the matrix ˜Q whose columns are the eigenvectors of ATA. So the matrix representation of the subsequent operations is written as
Q = A ˜Q
x1com = x11− Q(QTx11) x2com = x21− Q(QTx21)
w = x1com − x2com, (3.12)
where we choose the first sample of each class to obtain the common vector.
However, there is still a heavy computational cost if we translate these equations into programming codes without simplifying them. It is known that, a matrix multiplication BC needsn1× n2× n3multiplications when the matrix B isn1-by-n2and the matrix C is n2-by-n3. So, how many multiplications it will take if we do not change the computation way? For this purpose, we developed an efficient implementation to achieve the above objective (3.12). The following is the pseudo-code:
1 forj := 1 to r do
3.3 Multivariate Analysis using a Reformatory LDA-Based Method 37
7 end
8 qj := N ormalizeV ector(qj) 9 dot1 := Dot2V ectors(qj, x11) 10 dot2 := Dot2V ectors(qj, x21) 11 fori := 1 to N do
12 QQtX1(i) := QQtX1(i) + qj(i) ∗ dot1 13 QQtX2(i) := QQtX2(i) + qj(i) ∗ dot2
14 end
15 end
16 fori := 1 to N do
17 w(i) := (x11(i) − QQtX1(i)) + (x21(i) − QQtX2(i)) 18 end
In this code fragment, the scalarsr, N , M are the number of column vectors composed of ˜Q, the dimensionality of the sample space, and the number of all samples, respectively.
The vector qj represents the jth column vector of Q. The N -by-M matrix A and the M -by-r matrix ˜Q represent as the definitions before. And, the vector QQtXk, used to calculate the common vector xkcom, represents the projected sample of xk1 of the range space of Sw, for k = 1, 2. Furthermore, the function N ormalizeV ector() makes the input vector turning out a vector with the norm equal to 1 in the same direction. The function Dot2V ectors() returns the scalar product of the input vectors. Finally, the resulting vector w is the most discriminant projection vector.
3.4 Multiresolution Analysis using Wavelet Transform
In image processing, there are many methods and theories to be developed. Owing to convenience of analysis, it usually transforms domain of signals. In engineering applica-tion, the most popular method is Fourier transform. Fourier transform can transform the signals from spatial domain to frequency domain, but the information of spatial domain is lost after applying Fourier Transform on an image. In many applications, however, it needs to analyze both frequency and spatial information at the same time. To avoid the lack of spatial information, Haar, Goupillaud, Grossman, and Morlet proposed and improved wavelet transform [41].
Wavelet transform is one of multi-resolution analysis. Wavelet transform not only can transform an image from spatial domain to frequency domain, but also has information of both two domains. Similar to the Fourier transform, wavelet transform consists of ous wavelet transform (CWT), and discrete wavelet transform (DWT). However, continu-ous wavelet transform is limited by the redundancy and impracticality in image processing.
Therefore, the discrete wavelet transform is applied in this thesis.