2.2 Methods
2.2.7 Performance evaluation
This section introduces the methods used to evaluate the performance of the proposed brain extraction algorithm, including the data sets, the performance criteria, and the methods used for comparison. The accuracy results are further analyzed with a two-sample t-test to compare the brain extraction methods. Previous evaluation studies can be found in [19, 70, 71].
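As a rough illustration of the two-sample t-test mentioned above, the following sketch computes the test statistic for the per-image accuracy scores of two methods. It assumes Welch's unequal-variance form, since the text does not state which variant was used. The function name `welch_t` and the scores are made up for illustration and are not results from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical per-image JSC scores of two methods (illustrative only).
jsc_method_1 = [0.90, 0.92, 0.94]
jsc_method_2 = [0.80, 0.82, 0.84]
t_stat, dof = welch_t(jsc_method_1, jsc_method_2)
print(t_stat, dof)
```

Turning the statistic into a p-value requires the cumulative distribution of Student's t, which is not part of the Python standard library, so that step is left out here.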
Brain extraction algorithms for performance comparisons
The proposed method was compared with the Brain Surface Extractor (BSE) in BrainSuite2 [35,55,72], Brain Extraction Tool (BET) version 2.1 [24,73], Hybrid Watershed Algorithm (HWA) version stable 3 [25], and Model-based Level Set (MLS) version 0.5 [26].
The programs of the compared methods used in our experiments were downloaded from their webpages. BET, BSE, HWA, and our method were implemented in C++, whereas MLS was programmed in Java. All extraction experiments were performed on an AMD Opteron 240 processor running Linux, except for BSE: its software is available only for Windows, so we evaluated its performance on another machine with an AMD XP 2400+ processor. Furthermore, we adopted nearest neighbor sampling in our implementation not only because of its efficiency but also because it was more accurate than trilinear interpolation. This observation agrees with the findings in [24]. A possible reason is that sampling methods other than nearest neighbor somewhat blur the images, and the resulting weak boundaries may degrade the accuracy of brain extraction.
Figure 2.7: Brain extraction results of a T1-weighted image shown in (a) coronal and (b) sagittal views.
Image data sets with manual segmentation results
Two sets of T1-weighted head MR images as well as manual segmentation results were obtained from the Internet Brain Segmentation Repository (IBSR)1. In the experiments, we applied extraction algorithms to determine the brain volumes of these subjects and employed the manually segmented brain areas, including the ventricles, to evaluate the extraction accuracy.
The first IBSR data set comprises twenty MR volumes, each with around 60 coronal slices, matrix size 256 × 256, FOV 256 × 256 mm², and slice thickness 3.1 mm. Obvious intensity inhomogeneity and other significant artifacts are present in most of the MR images in this data set. Another challenge of this data set is that the neck and even the shoulder areas are included. This may reduce the extraction accuracy of BET and HWA, as in the examples shown in Figs. 2.8a and 2.8c (HWA even failed to process eighteen of the twenty MR images), because the excess non-brain tissue severely biases the estimation of the required parameters. To evaluate extraction performance fairly, several inferior slices of the image volumes containing the neck or shoulder area were manually removed beforehand. With these slices removed, BET and HWA achieved better segmentation results, as shown in Figs. 2.8b and 2.8d.
1 IBSR was developed by the Center for Morphometric Analysis at Massachusetts General Hospital and is available at http://www.cma.mgh.harvard.edu/ibsr.
The second IBSR data set contains eighteen MR images, each with around 128 coronal slices, matrix size 256 × 256, FOV 240 × 240 mm², and slice thickness 1.5 mm. All images were transformed to the radiological convention beforehand, based on the orientation information obtained from IBSR. These images are of higher quality than those in the first data set. According to the IBSR documentation, each image in this data set has been roughly registered to the Talairach space, and the intensity inhomogeneity has been corrected using the software developed by the Center for Morphometric Analysis at the Massachusetts General Hospital.
Criteria for extraction accuracy assessment
Several criteria are used to measure the extraction accuracy, including the Jaccard similarity coefficient (JSC), the sensitivity and specificity coefficients, and the risk evaluation of the segmentation results. The JSC, also known as the Jaccard index, is a widely adopted measure of the similarity between the extracted brain region 𝐵 and the corresponding ground truth 𝐴:
\[ \mathrm{JSC}(A, B) = \frac{|A \cap B|}{|A \cup B|}, \tag{2.14} \]
where ∣ ⋅ ∣ denotes the cardinality value. The value of JSC is within [0, 1] and a larger JSC value means a better overlap with the ground truth.
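As a minimal sketch of Eq. (2.14), the JSC can be computed directly from two voxel sets. The representation of a region as a Python set of voxel index tuples, the function name `jaccard`, and the convention of returning 1.0 when both regions are empty are assumptions made for illustration.

```python
def jaccard(truth, extracted):
    """Jaccard similarity coefficient between two voxel sets (Eq. 2.14)."""
    a, b = set(truth), set(extracted)
    union = a | b
    if not union:
        # Both regions empty: the ratio is undefined. Returning 1.0
        # (perfect agreement) is an assumed convention.
        return 1.0
    return len(a & b) / len(union)


# Toy example: voxels identified by (x, y, z) index tuples.
A = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}  # ground truth
B = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}  # extracted region

# Three shared voxels out of five in the union.
print(jaccard(A, B))
```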
Brain extraction is usually a compromise between a high recognition rate for brain voxels (that is, high sensitivity) and a high rejection rate for non-brain voxels (that is, high specificity). Therefore, the coefficients of sensitivity 𝑆e and specificity 𝑆p can be used to characterize brain extraction algorithms:
\[ S_e = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{2.15} \]
\[ S_p = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}. \tag{2.16} \]
Here TP and FP denote counts rather than rates: the true positives, TP, and false positives, FP, are the numbers of voxels correctly and incorrectly classified as brain tissue, respectively. The true negatives, TN, and false negatives, FN, are the numbers of voxels correctly and incorrectly classified as non-brain tissue, respectively.
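The counts and the two coefficients of Eqs. (2.15) and (2.16) can be sketched as follows. The sketch assumes that the ground truth and the extraction result are given as equally long flat sequences of 0/1 voxel labels (1 for brain). The function names are hypothetical.

```python
def confusion_counts(truth, extracted):
    """Count TP, FP, TN, FN over two equally long 0/1 label sequences."""
    if len(truth) != len(extracted):
        raise ValueError("masks must have the same number of voxels")
    tp = fp = tn = fn = 0
    for t, e in zip(truth, extracted):
        if e and t:
            tp += 1  # brain voxel labeled as brain
        elif e and not t:
            fp += 1  # non-brain voxel labeled as brain
        elif t:
            fn += 1  # brain voxel labeled as non-brain
        else:
            tn += 1  # non-brain voxel labeled as non-brain
    return tp, fp, tn, fn


def sensitivity_specificity(truth, extracted):
    """Return (S_e, S_p) as defined in Eqs. (2.15) and (2.16)."""
    tp, fp, tn, fn = confusion_counts(truth, extracted)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("ground truth must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


# Toy masks with ten voxels: four brain voxels in the ground truth.
truth_mask = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
result_mask = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
se, sp = sensitivity_specificity(truth_mask, result_mask)
print(se, sp)  # S_e = 3/4, S_p = 5/6
```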
In some applications, it is more important to avoid missing brain tissues than to reject all non-brain regions. From this point of view, Ségonne et al. proposed an error function 𝐸 to measure the extraction risk [25]:
\[ E(c) = \frac{p_f + c\,p_m}{1 + c}, \tag{2.17} \]
where 𝑐 is the risk ratio between the probabilities of missed detection for brain tissues, 𝑝m, and false alarm, 𝑝f. These two probabilities are calculated as
\[ p_m = \frac{|A - B|}{|A \cup B|}, \tag{2.18} \]
\[ p_f = \frac{|B - A|}{|A \cup B|}, \tag{2.19} \]
where 𝐵 is the extracted brain region, 𝐴 is the corresponding ground truth, and ∣ ⋅ ∣ denotes the cardinality value.
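Eqs. (2.17) to (2.19) can be sketched in the same way. The example assumes that regions are sets of flat voxel indices. The function name `extraction_risk` and the toy regions are hypothetical.

```python
def extraction_risk(truth, extracted, c):
    """Risk E(c) of Eq. (2.17), built from p_m (2.18) and p_f (2.19)."""
    a, b = set(truth), set(extracted)
    union = a | b
    if not union:
        raise ValueError("both regions are empty; the risk is undefined")
    p_m = len(a - b) / len(union)  # missed brain voxels
    p_f = len(b - a) / len(union)  # false alarms
    return (p_f + c * p_m) / (1 + c)


# Toy regions: two voxels missed, one false alarm, seven in the union.
A = {1, 2, 3, 4, 5, 6}  # ground truth
B = {3, 4, 5, 6, 7}     # extracted region

print(extraction_risk(A, B, c=1))  # equal weights: 3/14
print(extraction_risk(A, B, c=4))  # misses weighted four times: 9/35
```

In this toy case the risk grows with 𝑐 because the extraction misses more voxels than it falsely includes; an extraction with more false alarms than misses would show the opposite trend.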
Parameters of brain extraction algorithms
The parameters of the compared methods were chosen to achieve the best average JSC value on each data set. In other words, each method has two sets of parameter values, one per data set. In the following, each value is given for the first IBSR image set, with the value for the second set in parentheses. The smoothness weighting of MLS was 0.05 (0.1); the fractional intensity threshold of BET was 0.6 (0.7); the parameters of HWA were set to the default values (default values with the surface-shrink option turned on); the parameter 𝑘 of the proposed method was 0.15 (0.15); and the edge constant, number of diffusion iterations, and diffusion constant of BSE were 3 (3), 1 (1), and 0.70 (0.66), respectively.
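The selection rule described above (keep the parameter value with the best average JSC on a data set) can be sketched as an exhaustive search over candidate values. The text does not state how the candidates were explored, so the search strategy is an assumption. The intensity-threshold extractor, the one-dimensional toy images, and all names below are hypothetical stand-ins for the real extraction programs.

```python
from statistics import mean


def jaccard(truth, extracted):
    """Jaccard similarity coefficient between two voxel sets."""
    union = truth | extracted
    # Treating two empty regions as a perfect match is an assumption.
    return len(truth & extracted) / len(union) if union else 1.0


def select_parameter(candidates, images, truths, extract):
    """Return (value, mean JSC) of the candidate with the best mean JSC."""
    best = None
    for value in candidates:
        score = mean(
            jaccard(truth, extract(image, value))
            for image, truth in zip(images, truths)
        )
        if best is None or score > best[1]:
            best = (value, score)
    return best


def threshold_extract(image, threshold):
    """Hypothetical extractor: keep voxels brighter than the threshold."""
    return {i for i, v in enumerate(image) if v > threshold}


# Two tiny one-dimensional "images" with their ground-truth voxel sets.
images = [[0.1, 0.2, 0.6, 0.9, 0.8], [0.3, 0.7, 0.9, 0.4, 0.1]]
truths = [{2, 3, 4}, {1, 2}]

best_value, best_score = select_parameter(
    [0.2, 0.5, 0.8], images, truths, threshold_extract
)
print(best_value, best_score)
```

Running the search once per data set yields one parameter set per data set, which matches the two sets of values reported for each method.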