

Chapter 6: Experiments

6.2 Experimental Results Analysis

We evaluate our classifier in terms of the micro- and macro-averaging F1 evaluation functions (sketched after the list below) from four aspects:

(1) the classification accuracy of our classifier construction algorithm compared with the algorithms shown in [7];

(2) the influence of the training document threshold φ and the discrimination threshold δ on the classification accuracy;

(3) the influence of the number of tuning documents on the classification accuracy;

(4) the time performance of our classifier construction algorithm compared with a batch-based mining approach.
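Micro-averaging pools the per-category true positive, false positive and false negative counts before computing F1, whereas macro-averaging computes F1 per category and then averages the scores. The following is a minimal sketch of the two evaluation functions for multi-label data such as Reuters-21578; the function and variable names are illustrative and not taken from our implementation.

```python
from collections import Counter

def micro_macro_f1(true_labels, predicted_labels, categories):
    """Compute micro- and macro-averaging F1 over a set of categories.

    true_labels, predicted_labels: lists of sets of category names,
    one set per test document (multi-label, as in Reuters-21578).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        for c in categories:
            if c in pred and c in truth:
                tp[c] += 1
            elif c in pred:
                fp[c] += 1
            elif c in truth:
                fn[c] += 1

    def f1(t, p, n):
        precision = t / (t + p) if t + p else 0.0
        recall = t / (t + n) if t + n else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)

    # Micro-averaging: sum the counts over all categories, then compute F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-averaging: compute F1 per category, then take the mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
    return micro, macro
```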

In [7], Debole and Sebastiani applied six supervised term weighting functions, namely chi-square, information gain and gain ratio, each computed globally or locally (i.e., χ2(g), IG(g), GR(g), χ2(l), IG(l), GR(l)), to three classifier construction algorithms, Rocchio, k-NN and SVM, and compared the average classification accuracy on Reuters-21578(10), Reuters-21578(90) and Reuters-21578(115). As shown in Table 6, the classifier with GR(g) achieves the best classification accuracy. For our classifier construction algorithm, with the discrimination threshold δ set at 0.5 for Reuters-21578(10) and at 0.04 for Reuters-21578(90) and Reuters-21578(115), and with the number of tuning documents set at 0, Table 7 shows the classification accuracy of our classifier under different training document thresholds φ. The training document threshold φ determines whether a category in the training set is used: if the number of training documents in a category is less than the specified φ, the category is omitted from the training algorithm. For example, only the 39 categories of Reuters-21578(90) satisfying φ = 25 are used in the training algorithm. Comparing Tables 6 and 7, the classification accuracy of our classifier construction algorithm on Reuters-21578(10) is always better than that in [7], whereas the results on Reuters-21578(90) and Reuters-21578(115) are worse when φ is less than 15. We may conclude that the classification accuracy of the classifier constructed by the domain-space weighting scheme depends largely on the number of training documents and improves once enough training documents are available.
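As a concrete illustration of the role of φ described above, the following sketch keeps only the categories whose number of training documents reaches the threshold before training; the names here are illustrative only.

```python
def filter_categories(training_docs_by_category, phi):
    """Keep only the categories with at least phi training documents.

    training_docs_by_category: dict mapping category name -> list of documents.
    """
    return {category: docs
            for category, docs in training_docs_by_category.items()
            if len(docs) >= phi}

# For example, with phi = 25 only 39 of the 90 categories of
# Reuters-21578(90) remain and are passed to the training algorithm.
```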

Table 6: Micro- and Macro-averaging F1 Shown in [7]

                      χ2(g)   IG(g)   GR(g)   χ2(l)   IG(l)   GR(l)
Micro F1
  Reuters-21578(10)   0.852   0.843   0.857   0.810   0.816   0.816
  Reuters-21578(90)   0.795   0.750   0.803   0.758   0.767   0.767
  Reuters-21578(115)  0.793   0.747   0.800   0.756   0.765   0.765
Macro F1
  Reuters-21578(10)   0.725   0.707   0.739   0.674   0.684   0.684
  Reuters-21578(90)   0.542   0.377   0.589   0.527   0.559   0.559
  Reuters-21578(115)  0.596   0.458   0.629   0.581   0.608   0.608

Table 7: Micro- and Macro-averaging F1 Values at φ=1, φ=15 and φ=25 for Reuters-21578(10), Reuters-21578(90) and Reuters-21578(115)

                      φ=1     φ=15    φ=25
Micro F1
  Reuters-21578(10)   0.903   0.903   0.903
  Reuters-21578(90)   0.751   0.784   0.815
  Reuters-21578(115)  0.737   0.784   0.815
Macro F1
  Reuters-21578(10)   0.824   0.824   0.824
  Reuters-21578(90)   0.490   0.569   0.660
  Reuters-21578(115)  0.616   0.569   0.660

The details of the influence of the training document threshold φ and the discrimination threshold δ on the classification accuracy for Reuters-21578(10), Reuters-21578(90) and Reuters-21578(115) are shown in Tables 8 to 12, respectively.

As mentioned in the discrimination algorithm, the scale of δ is decided according to the number of categories. Thus, the scale range of δ in Table 8 is [1/10, 1], and the scale ranges of δ in Tables 9 and 10 and in Tables 11 and 12 are [1/90, 1] and [1/115, 1], respectively. Since each category in Reuters-21578(10) contains more than 50 training documents, the influence of the training document threshold is ignored in Table 8.

From Tables 9 to 12, we find that the influence of δ is not obvious, even for Reuters-21578(10), whose categories contain the largest numbers of training documents. A possible reason is that the one-normalization in the discrimination algorithm already achieves the purpose of discrimination, so the setting of δ has little influence on the classification accuracy. In contrast, the setting of φ has a decisive influence on the classification accuracy: the larger the number of training documents, the better the classification accuracy.
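To make the interplay between one-normalization and δ concrete, the sketch below assumes that one-normalization rescales each feature's association weights across domains so that the largest weight becomes 1, and that weights falling below δ are diminished (here simply zeroed out). This is only an illustrative approximation of the discrimination algorithm; the function and data-structure names are hypothetical.

```python
def discriminate(weight_table, delta):
    """Illustrative discrimination step (not the exact thesis algorithm).

    weight_table: dict mapping feature -> dict of domain (category) -> weight.
    Assumption: "one-normalization" rescales each feature's weights so that
    its largest weight across domains becomes 1; weights below the
    discrimination threshold delta are then diminished to 0.
    """
    for feature, domain_weights in weight_table.items():
        largest = max(domain_weights.values())
        if largest <= 0:
            continue
        for domain, weight in domain_weights.items():
            normalized = weight / largest          # one-normalization
            domain_weights[domain] = normalized if normalized >= delta else 0.0
    return weight_table
```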

Table 13 shows the numbers of the remaining categories at different φ for Reuters-21578(10), Reuters-21578(90) and Reuters-21578(115). When φ is 15 or more, Reuters-21578(90) and Reuters-21578(115) use the same set of categories in the training algorithm.

Table 8: Micro-and Macro-averaging F1 Values at Different δ for Reuters-21578(10)

δ      Micro-averaging F1   Macro-averaging F1
0.9    0.902511370          0.814721475
0.8    0.901324896          0.813716994
0.7    0.903302353          0.820149529
0.6    0.903302353          0.819969831
0.5    0.902906862          0.823657403
0.4    0.898951948          0.815825122
0.3    0.901324896          0.817534791
0.2    0.895788017          0.804622957
0.1    0.898160965          0.806951786

Table 9: Micro-averaging F1 Values at Different δ and φ for Reuters-21578(90)

δ \ φ    1         5         15        25        35        45
0.1      0.74739   0.75360   0.78372   0.81300   0.82566   0.84547
0.08     0.74827   0.75389   0.78403   0.81269   0.82631   0.84447
0.06     0.75033   0.75478   0.78464   0.81458   0.82695   0.84681
0.04     0.75063   0.75300   0.78433   0.81521   0.82824   0.84681
0.02     0.74974   0.75271   0.78555   0.81553   0.82890   0.84681
0.01     0.74974   0.75330   0.78555   0.81584   0.82890   0.84681

Table 10: Macro-averaging F1 Values at Different δ and φ for Reuters-21578(90)

δ \ φ    1         5         15        25        35        45
0.1      0.46830   0.52281   0.56963   0.66335   0.67258   0.71811
0.08     0.48881   0.54619   0.57344   0.65812   0.67529   0.71390
0.06     0.48748   0.53001   0.57152   0.66360   0.67395   0.71542
0.04     0.48997   0.52214   0.56868   0.65998   0.67747   0.71738
0.02     0.48467   0.51960   0.57205   0.66281   0.67783   0.71738
0.01     0.48922   0.52176   0.57205   0.66312   0.67783   0.71738

Table 11: Micro-averaging F1 Values at Different δ and φ for Reuters-21578(115)

δ \ φ    1         5         15        25        35        45
0.1      0.73593   0.74885   0.78372   0.81300   0.82566   0.71811
0.08     0.73505   0.74915   0.78403   0.81269   0.82631   0.71390
0.06     0.73711   0.75060   0.78464   0.81458   0.82695   0.71542
0.04     0.73681   0.74944   0.78433   0.81521   0.82824   0.71738
0.02     0.73652   0.74855   0.78555   0.81553   0.82890   0.71738
0.01     0.73711   0.74915   0.78555   0.81553   0.82890   0.71738

Table 12: Macro-averaging F1 Values at Different δ and φ for Reuters-21578(115)

δ \ φ    1         5         15        25        35        45
0.1      0.62378   0.53231   0.56963   0.66335   0.67258   0.71811
0.08     0.60384   0.55127   0.57344   0.65812   0.67529   0.71390
0.06     0.60474   0.53598   0.57152   0.66360   0.67395   0.71542
0.04     0.61597   0.53130   0.56868   0.65998   0.67747   0.71738
0.02     0.61526   0.53057   0.55990   0.66312   0.67783   0.71738
0.01     0.61526   0.52903   0.57205   0.66281   0.67783   0.71738

Table 13: Numbers of the Remaining Categories at Different φ

                      φ=1   φ=5   φ=15   φ=25   φ=35   φ=45
Reuters-21578(10)     10    10    10     10     10     10
Reuters-21578(90)     90    69    51     39     34     27
Reuters-21578(115)    115   70    51     39     34     27

Next, the influence of the number of tuning documents on the classification accuracy for Reuters-21578(10), Reuters-21578(90) and Reuters-21578(115) is shown in Figures 8 to 10, respectively, where the tuning documents in our experiments are selected from the testing documents. The original testing documents are therefore divided into two parts, one for tuning and one for testing the constructed classifier.

Based on the experimental results, the tuning parameter ζ is set at 0.000005 to obtain a stably increasing trend. If ζ is too small, the tuning adjustment is so tiny that its effect is insignificant; if ζ is too large, the adjustment becomes unstable and oscillatory, so the tuning effect is unpredictable. From Figures 8 to 10, it can be seen that the classification accuracy of the constructed classifier improves as the number of tuning documents increases and converges once more than 700 tuning documents are used.
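The role of ζ can be thought of as a small learning-rate-like step size: each misclassified tuning document nudges the relevant association weights by an amount proportional to ζ. The update rule below is a hypothetical sketch of this stability/effect trade-off, not the exact tuning algorithm; all names are illustrative.

```python
ZETA = 0.000005  # tuning parameter observed to give a stably increasing trend

def tune(weight_table, doc_features, true_category, predicted_category, zeta=ZETA):
    """Hypothetical tuning step: when a tuning document is misclassified,
    strengthen its features' weights for the true category and weaken them
    for the wrongly predicted category by an amount proportional to zeta.
    """
    if predicted_category == true_category:
        return  # correctly classified tuning documents cause no adjustment
    for feature in doc_features:
        if feature not in weight_table:
            continue
        weights = weight_table[feature]
        weights[true_category] = weights.get(true_category, 0.0) + zeta
        weights[predicted_category] = max(
            0.0, weights.get(predicted_category, 0.0) - zeta)
```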

[Chart "Tuning on Reuters-21578(10)": micro-averaging F1 (0.85 to 0.95) versus number of tuning documents (0 to 1000)]

Figure 8: The Curve of Micro-averaging F1 Values with Different Numbers of Tuning Documents for Reuters-21578(10)

[Chart "Reuters-21578(90)": micro-averaging F1 (0.7 to 0.85) versus number of tuning documents (0 to 1000), with curves for φ=1, φ=15 and φ=25]

Figure 9: The Curve of Micro-averaging F1 Values with Different Numbers of Tuning Documents at φ=1, φ=15 and φ=25 for Reuters-21578(90)

[Chart "Reuters-21578(115)": micro-averaging F1 (0.7 to 0.85) versus number of tuning documents (0 to 1000), with curves for φ=1, φ=15 and φ=25]

Figure 10: The Curve of Micro-averaging F1 Values with Different Numbers of Tuning Documents at φ=1, φ=15 and φ=25 for Reuters-21578(115)

Tables 14 and 15 list the classification accuracy under different training document thresholds on Reuters-21578(90) and Reuters-21578(115), respectively. We can further observe that, with the tuning algorithm, the classifier constructed with φ=15 achieves an effect similar to that of the classifier constructed with φ=25 but without the tuning algorithm.

Table 14: Micro-averaging F1 Results with Different Tuning Documents on Reuters-21578(90) with Different φ

Reuters-21578(90)    Number of Tuning Documents
                     0          200        400        600        800        1000
φ=0                  0.749449   0.753614   0.757308   0.773390   0.788915   0.781840
φ=15                 0.784027   0.789430   0.793457   0.805179   0.818692   0.815258
φ=25                 0.816159   0.820750   0.826453   0.836256   0.845883   0.843235

Table 15: Micro-averaging F1 Results with Different Tuning Documents on Reuters-21578(115) with Different φ

Reuters-21578(115)   Number of Tuning Documents
                     0          200        400        600        800        1000
φ=0                  0.736517   0.740588   0.747276   0.765221   0.781122   0.773716
φ=15                 0.784027   0.789430   0.793457   0.805179   0.818692   0.815258
φ=25                 0.816159   0.820750   0.826453   0.836256   0.845883   0.843235

Finally, we evaluate the efficiency of our classifier construction algorithm compared with a batch-based classifier construction approach. Excluding the tuning algorithm, the computation time of our classifier construction algorithm in the i-th run consists of three major portions: (1) the time of extracting and weighting features from a given category, denoted as ti1; (2) the time of integrating the training results into the feature-domain association weighting table, denoted as ti2; and (3) the time of diminishing the association weights of the features in the feature-domain association weighting table that have lower discriminating powers, denoted as ti3. Since ti1 > ti2 >> ti3, the total computation time of the i-th run can be simplified as O(ti1 + ti2). However, if our classifier construction algorithm mimics a batch-based approach, which must re-process all previously seen categories to reconstruct the classifier in each run, the total computation time of the i-th run becomes O(Σj=1..i (tj1 + tj2)).

Figure 11 shows the computation times spent by our classifier construction algorithm in batch mode and in incremental mode for Reuters-21578(10) as the number of involved categories increases. It is easily seen that the computation time of the batch-based classifier grows with the number of involved categories, whereas the computation time of the incremental-based classifier remains almost constant. Since all previously discovered information is retained in the feature-domain association weighting table, the classification accuracy of the incremental-based classifier is the same as that of the batch-based classifier.
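The efficiency difference can be illustrated with a sketch of the two construction loops: the incremental variant processes only the newly arrived category and merges it into the existing feature-domain association weighting table, while the batch variant re-processes every category seen so far in each run. The extraction/weighting and integration helpers below are simple stand-ins for the ti1 and ti2 steps, not our actual weighting scheme.

```python
from collections import Counter

def extract_and_weight(docs):
    """Placeholder for the feature extraction and weighting step (ti1):
    here simply relative term frequencies over the category's documents."""
    counts = Counter(token for doc in docs for token in doc.split())
    total = sum(counts.values()) or 1
    return {term: freq / total for term, freq in counts.items()}

def integrate(weight_table, category, features):
    """Placeholder for integrating one category's training result into the
    feature-domain association weighting table (ti2)."""
    for term, weight in features.items():
        weight_table.setdefault(term, {})[category] = weight

def construct_incremental(weight_table, category, docs):
    """Incremental run: processes only the newly arrived category,
    so the cost of the i-th run is roughly O(ti1 + ti2)."""
    integrate(weight_table, category, extract_and_weight(docs))
    return weight_table

def construct_batch(all_categories_so_far):
    """Batch-style run: re-processes every category seen so far,
    so the cost of the i-th run is roughly O(sum over j<=i of (tj1 + tj2))."""
    weight_table = {}
    for category, docs in all_categories_so_far.items():
        integrate(weight_table, category, extract_and_weight(docs))
    return weight_table
```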

[Chart "Efficiency on Reuters-21578(10)": computation time in milliseconds (0 to 800000) versus category runs T1 to T10, with curves for the batch-based and incremental-based classifiers]

Figure 11: The Computation Time Spent by the Batch-based Classifier and the Incremental-based Classifier for Reuters-21578(10)
