
From the document 漢語問句偵測之量化研究 (A Quantitative Study of Mandarin Question Detection), pages 87-102

7.1 Discussion

7.1.3 Clause or Sentence Boundary

The syntax of Mandarin is very flexible compared with that of Indo-European languages. This flexibility, however, enlarges the search space for parsing. As stated in Section 1.3, Mandarin question sentences have neither a syntactically decisive, reliable marker nor a distinctive word order. In a real setting, therefore, a question-detection program given a series of clauses must be able to identify where each sentence begins and ends. Otherwise it may be confused by complex sentences or serial clauses, and it will have trouble handling alternative questions that span several clauses.
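As a naive illustration of why boundaries matter, the sketch below groups clauses into sentences using punctuation alone. It assumes that full-width 。？！ (and ASCII ? and !) end a sentence and that ，、； (and ASCII , and ;) separate clauses within one; these mark sets and the function names are assumptions made for the example, not the thesis's method, and unpunctuated or loosely punctuated text is exactly where such a rule breaks down. The point is that an alternative question such as 你要喝茶，還是喝咖啡？ stays a single unit, so its final question mark applies to both clauses.

```python
import re
from typing import List

# Assumed boundary marks (full-width and ASCII); illustrative, not from the thesis.
SENTENCE_FINAL = "。？！?!"
CLAUSE_SEPARATORS = "，、；,;"


def split_sentences(text: str) -> List[List[str]]:
    """Group a run of clauses into sentences; each sentence is a list of clauses."""
    sentences = []
    # Each chunk is the text up to and including one sentence-final mark (if any).
    for chunk in re.findall(f"[^{SENTENCE_FINAL}]+[{SENTENCE_FINAL}]?", text):
        clauses = [c.strip() for c in re.split(f"[{CLAUSE_SEPARATORS}]", chunk) if c.strip()]
        if clauses:
            sentences.append(clauses)
    return sentences


def is_question(sentence: List[str]) -> bool:
    """Label the whole sentence by the mark that closes its last clause."""
    return sentence[-1].endswith(("？", "?"))


for s in split_sentences("你要喝茶，還是喝咖啡？我都可以。"):
    print(s, is_question(s))
```

A clause-by-clause labeler would mark 你要喝茶 as a non-question; grouping first lets the sentence-level label cover the whole alternative question.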

7.2 Summary

In this study we have pointed out the problem of detecting Mandarin question sentences and reviewed the relevant linguistic literature. We then outlined our strategy for approaching this problem: first increase recall, then increase precision. We have presented our statistical approaches and procedure, and discussed our findings. The lack of appropriate machine-readable dictionaries and other electronic resources limited our pursuit of several subtle issues.

To our knowledge, this is a new topic in the NLP community. Our contributions are twofold. In the linguistic field, we re-examine relevant topics from a new quantitative point of view and discover more comprehensive and precise features.

In the NLP field, we demonstrate several techniques that are useful for this problem and achieve good recall and precision in this preliminary study.

APPENDIX A

LIST OF QUESTION-RELATED WORDS

This appendix lists the top 300 results generated by the procedure described in Section 5.1. First, four counters a, b, c, and d are accumulated for every word $w_i$ in the corpus, according to whether a clause contains $w_i$ and whether it ends with a question mark:

Is $w_i$ in the clause?

| Clause | Yes | No |
|---|---|---|
| Ends with '?' | a | b |
| Ends without '?' | c | d |
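The counting step can be sketched as follows, assuming the corpus has already been segmented into clauses, each given as a list of (tagged) word tokens plus a flag saying whether the clause ends with '?'. The function name and input layout are illustrative assumptions. Because the table asks whether $w_i$ occurs in a clause, each word is counted at most once per clause.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_contingency(
    clauses: Iterable[Tuple[List[str], bool]],
) -> Dict[str, Tuple[int, int, int, int]]:
    """Map each word w_i to its counters (a, b, c, d)."""
    in_question = Counter()  # clauses ending with '?' that contain the word
    in_other = Counter()     # clauses ending without '?' that contain the word
    n_question = n_other = 0
    for tokens, ends_with_q in clauses:
        present = set(tokens)  # presence, not frequency: count once per clause
        if ends_with_q:
            n_question += 1
            in_question.update(present)
        else:
            n_other += 1
            in_other.update(present)
    table = {}
    for w in in_question.keys() | in_other.keys():
        a, c = in_question[w], in_other[w]
        table[w] = (a, n_question - a, c, n_other - c)
    return table
```

For every word, a + b equals the number of clauses ending with '?' and c + d the number of other clauses, which is why a + b = 20,228 and c + d = 729,659 on every row of Table 13.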

Next, compute a series of intermediate values:

$$n = a + b + c + d$$

$$m_a = (a + b)(a + c), \quad m_b = (a + b)(b + d), \quad m_c = (a + c)(c + d), \quad m_d = (b + d)(c + d)$$

Finally, calculate the LLR statistic, the χ² statistic, and the frequency, precision, and recall of $w_i$ as follows:

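$$\text{LLR statistic} = 2 \sum_{j \in \{a, b, c, d\}} j \ln \frac{n \times j}{m_j}$$

$$\chi^2 \text{ statistic} = \frac{n (ad - bc)^2}{m_a m_d}, \qquad \text{Frequency} = a + c$$

$$\text{Precision} = \frac{a}{a + c}, \qquad \text{Recall} = \frac{a}{a + b}$$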
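A direct transcription of these formulas into code is sketched below (the function name is illustrative). Since $m_j / n$ is the expected count of cell $j$ under independence, the LLR is the familiar $G^2 = 2 \sum O \ln(O/E)$, and $m_a m_d$ is the product of all four marginal totals. Cells with a zero count contribute nothing to the LLR sum, following the usual convention $x \ln x \to 0$; the sketch assumes every row and column total is non-zero. Plugging in the counts of the first entry of Table 13 reproduces its reported LLR (17,956.52), χ² (88,941.79), recall (12.39%), and precision (98.12%).

```python
import math
from typing import Dict


def qrw_statistics(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """LLR, chi-square, frequency, precision and recall of one word w_i."""
    n = a + b + c + d
    # m_j = (row total) * (column total) of cell j, so m_j / n is its expected count.
    m = {
        "a": (a + b) * (a + c),
        "b": (a + b) * (b + d),
        "c": (a + c) * (c + d),
        "d": (b + d) * (c + d),
    }
    cells = {"a": a, "b": b, "c": c, "d": d}
    # LLR = 2 * sum_j j * ln(n * j / m_j); zero cells contribute 0.
    llr = 2 * sum(j * math.log(n * j / m[k]) for k, j in cells.items() if j > 0)
    chi2 = n * (a * d - b * c) ** 2 / (m["a"] * m["d"])
    return {
        "llr": llr,
        "chi2": chi2,
        "frequency": a + c,
        "precision": a / (a + c),
        "recall": a / (a + b),
    }


stats = qrw_statistics(2507, 17721, 48, 729611)  # first row of Table 13
print(f"LLR={stats['llr']:.2f} chi2={stats['chi2']:.2f}")
```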

Table 13: List of top 300 question-related words (QRWs)

Rank(LLR) Rank(χ²) QRW LLR χ² a b c d Recall(%) Precision(%)

1 1 ý(T) 17,956.52 88,941.79 2,507 17,721 48 729,611 12.39 98.12

2 2 á(T) 17,798.44 72,731.19 3,398 16,830 2,169 727,490 16.80 61.04

3 3 Bó(Nep) 11,223.98 37,515.11 2,708 17,520 3,855 725,804 13.39 41.26

4 4 ÑBó(D) 6,163.42 26,622.68 1,149 19,079 593 729,066 5.68 65.96

5 6 5(Nh) 5,464.47 11,060.08 2,615 17,613 13,946 715,713 12.93 15.79

6 5 5ó(D) 4,776.90 18,747.55 998 19,230 835 728,824 4.93 54.45

7 13 .(D) 3,320.25 4,982.97 3,191 17,037 34,695 694,964 15.78 8.42

8 7 Õ(Nh) 2,685.79 9,161.25 661 19,567 931 728,728 3.27 41.52

9 10 àS(D) 2,548.63 7,369.35 761 19,467 1,735 727,924 3.76 30.49

10 8 ƒ(D) 1,998.58 8,221.60 400 19,828 276 729,383 1.98 59.17

11 14 u´(D) 1,653.17 4,540.90 530 19,698 1,393 728,266 2.62 27.56

12 9 5óŸ(VH) 1,549.34 7,372.23 257 19,971 62 729,597 1.27 80.56

13 12 5óš(VH) 1,454.17 5,464.69 323 19,905 328 729,331 1.60 49.62

14 33 u(SHI) 1,342.47 1,605.50 3,940 16,288 77,339 652,320 19.48 4.85

15 11 Ø−(D) 1,265.77 5,861.16 219 20,009 71 729,588 1.08 75.52

16 15 ¨(Nep) 1,226.55 4,376.40 289 19,939 354 729,305 1.43 44.95

17 16 S(Nes) 1,154.87 4,307.55 259 19,969 271 729,388 1.28 48.87

18 17 ¨³(Ncd) 1,033.09 4,086.19 217 20,011 180 729,479 1.07 54.66

19 18 ³(D) 973.16 3,959.01 198 20,030 145 729,514 0.98 57.73

20 19 ˝§(D) 944.68 3,554.16 210 20,018 213 729,446 1.04 49.65

21 23 5b(Nh) 915.39 2,173.73 362 19,866 1,381 728,278 1.79 20.77

22 28 ß(T) 851.01 1,919.64 365 19,863 1,579 728,080 1.80 18.78

23 59 í(DE) 813.92 772.95 4,999 15,229 248,739 480,920 24.71 1.97

24 20 ÑS(D) 807.60 3,003.16 182 20,046 193 729,466 0.90 48.53

25 40 ø−(VK) 772.87 1,404.77 501 19,727 3,635 726,024 2.48 12.11

26 21 àS(VH) 772.15 2,857.26 175 20,053 189 729,470 0.87 48.08

27 49 }(D) 741.07 1,057.46 990 19,238 12,904 716,755 4.89 7.13

28 22 }.}(D) 711.35 2,774.36 152 20,076 134 729,525 0.75 53.15

29 29 Öý(Neqa) 709.86 1,844.61 247 19,981 754 728,905 1.22 24.68

30 48 ¢(D) 702.45 1,097.02 684 19,544 7,154 722,505 3.38 8.73

31 50 ´(D) 691.14 1,036.92 769 19,459 8,872 720,787 3.80 7.98

32 37 g(Nh) 658.02 1,558.61 262 19,966 1,008 728,651 1.30 20.63

33 24 ´u(Caa) 652.50 2,169.88 167 20,061 258 729,401 0.83 39.29

34 31 š(Nf) 613.09 1,808.57 181 20,047 396 729,263 0.89 31.37

35 38 v(D) 609.59 1,471.49 236 19,992 867 728,792 1.17 21.40

36 47 ú(VH) 581.03 1,107.65 341 19,887 2,214 727,445 1.69 13.35

37 43 ](Nh) 580.09 1,223.44 280 19,948 1,428 728,231 1.38 16.39

38 61 b(D) 564.10 761.19 979 19,249 14,751 714,908 4.84 6.22

39 25 5ó(VH) 517.82 2,169.03 102 20,126 65 729,594 0.50 61.08

40 76 Ê(P) 482.15 402.42 769 19,459 55,146 674,513 3.80 1.38


41 57 ô(T) 468.43 804.87 351 19,877 2,927 726,732 1.74 10.71

42 27 S.(D) 460.89 2,076.27 83 20,145 34 729,625 0.41 70.94

43 65 µ(Nep) 447.72 620.46 695 19,533 9,830 719,829 3.44 6.60

44 26 ó(T) 435.34 2,128.01 69 20,159 11 729,648 0.34 86.25

45 68 (V 2) 435.24 509.48 1,949 18,279 42,564 687,095 9.64 4.38

46 34 ~½(VE) 426.64 1,590.91 96 20,132 101 729,558 0.47 48.73

47 58 Ú(T) 407.77 779.29 239 19,989 1,546 728,113 1.18 13.39

48 35 5š(VH) 404.89 1,589.15 86 20,142 74 729,585 0.43 53.75

49 32 5?(D) 380.65 1,744.45 67 20,161 24 729,635 0.33 73.63

50 69 ?(D) 369.10 495.63 672 19,556 10,323 719,336 3.32 6.11

51 30 ß.ß(VH) 363.41 1,809.61 55 20,173 5 729,654 0.27 91.67

52 52 v(T) 359.12 940.58 124 20,104 372 729,287 0.61 25.00

53 42 5š(D) 355.61 1,257.22 85 20,143 108 729,551 0.42 44.04

54 41 Ýó(Nep) 354.78 1,292.21 82 20,146 94 729,565 0.41 46.59

55 36 ³(T) 351.10 1,567.45 64 20,164 28 729,631 0.32 69.57

56 56 µ³(Ncd) 341.46 848.60 127 20,101 435 729,224 0.63 22.60

57 67 µ(Dk) 325.11 584.29 219 20,009 1,643 728,016 1.08 11.76

58 63 öí(D) 320.43 638.18 173 20,055 1,015 728,644 0.86 14.56

59 39 SÊ(VH) 316.73 1,409.06 58 20,170 26 729,633 0.29 69.05

60 73 ß(VH) 316.41 435.28 515 19,713 7,452 722,207 2.55 6.46

61 45 ¨<(Neqa) 312.59 1,186.21 69 20,159 68 729,591 0.34 50.36

62 112 (Cab) 311.49 187.41 18 20,210 7,949 721,710 0.09 0.23

63 105 J(P) 299.55 210.48 85 20,143 12,914 716,745 0.42 0.65

64 106 £(Caa) 297.46 210.39 90 20,138 13,200 716,459 0.44 0.68

65 44 áBó(VA) 286.22 1,188.63 57 20,171 38 729,621 0.28 60.00

66 113 v(Ng) 282.54 185.67 42 20,186 9,429 720,230 0.21 0.44

67 108 5(DE) 274.44 202.31 125 20,103 14,862 714,797 0.62 0.83

68 55 ?.?(D) 268.65 860.27 72 20,156 125 729,534 0.36 36.55

69 46 ¨o(Ncd) 263.49 1,120.95 51 20,177 30 729,629 0.25 62.96

70 51 b.b(D) 261.41 944.40 61 20,167 72 729,587 0.30 45.86

71 84 d(VC) 245.85 336.88 411 19,817 6,022 723,637 2.03 6.39

72 131 †(D) 238.51 145.97 17 20,211 6,402 723,257 0.08 0.26

73 129 ((Ng) 221.13 146.49 36 20,192 7,631 722,028 0.18 0.47

74 53 ª.ªJ(D) 201.66 914.99 36 20,192 14 729,645 0.18 72.00

75 119 6(D) 200.22 169.42 440 19,788 29,024 700,635 2.18 1.49

76 141 1(Cbb) 198.04 126.80 23 20,205 6,100 723,559 0.11 0.38

77 137 k(P) 196.81 132.61 38 20,190 7,244 722,415 0.19 0.52

78 54 S‚(VG) 191.01 868.56 34 20,194 13 729,646 0.17 72.34

79 87 Ê(VCL) 190.76 312.87 164 20,064 1,538 728,121 0.81 9.64

80 79 ô(I) 189.30 369.00 107 20,121 662 728,997 0.53 13.91


81 142 ˛(D) 185.03 126.10 40 20,188 7,124 722,535 0.20 0.56

82 85 µó(D) 180.25 322.75 123 20,105 933 728,726 0.61 11.65

83 140 ®(Nes) 179.32 127.59 59 20,169 8,294 721,365 0.29 0.71

84 86 ¥ó(D) 178.58 317.47 124 20,104 957 728,702 0.61 11.47

85 124 ø(Neu) 176.64 159.12 1,033 19,195 54,432 675,227 5.11 1.86

86 100 ¥š(VH) 176.59 246.22 274 19,954 3,849 725,810 1.35 6.65

87 60 í.(D) 175.33 763.41 33 20,195 17 729,642 0.16 66.00

88 152 [ý(VE) 173.08 111.24 21 20,207 5,408 724,251 0.10 0.39

89 157 Ĥ(Cbb) 170.85 107.10 16 20,212 4,937 724,722 0.08 0.32

90 77 ö(VH) 168.49 386.36 71 20,157 297 729,362 0.35 19.29

91 92 µó(Dk) 163.81 289.96 115 20,113 897 728,762 0.57 11.36

92 62 5}(D) 159.10 729.59 28 20,200 10 729,649 0.14 73.68

93 117 ¥(Nep) 159.04 178.76 1,279 18,949 31,847 697,812 6.32 3.86

94 158 FJ(Cbb) 153.70 106.97 41 20,187 6,469 723,190 0.20 0.63

95 154 â(P) 153.69 109.61 52 20,176 7,207 722,452 0.26 0.72

96 97 µ<(Neqa) 153.08 262.12 117 20,111 989 728,670 0.58 10.58

97 153 ÄÑ(Cbb) 152.88 110.11 57 20,171 7,511 722,148 0.28 0.75

98 147 (Nf) 151.01 114.55 95 20,133 9,769 719,890 0.47 0.96

99 104 @v(D) 147.23 211.50 201 20,027 2,620 727,039 0.99 7.13

100 110 9(Na) 146.80 200.69 251 19,977 3,712 725,947 1.24 6.33

101 168 1(D) 143.90 90.49 14 20,214 4,206 725,453 0.07 0.33

102 102 ø(VK) 140.29 222.40 134 20,094 1,367 728,292 0.66 8.93

103 64 í.u(D) 139.34 629.96 25 20,203 10 729,649 0.12 71.43

104 70 ST(Nc) 139.16 459.57 36 20,192 57 729,602 0.18 38.71

105 66 S(D) 138.96 608.06 26 20,202 13 729,646 0.13 66.67

106 116  (D) 138.07 179.01 316 19,912 5,365 724,294 1.56 5.56

107 155 2(Ng) 136.60 108.49 142 20,086 11,944 717,715 0.70 1.17

108 163 ©(Nes) 132.31 96.65 57 20,171 6,989 722,670 0.28 0.81

109 160 D(Caa) 130.66 105.40 163 20,065 12,853 716,806 0.81 1.25

110 96 (Nf) 128.82 264.12 66 20,162 362 729,297 0.33 15.42

111 133 B(Nh) 128.06 141.17 1,400 18,828 36,897 692,762 6.92 3.66

112 71 S.(D) 127.86 446.84 31 20,197 41 729,618 0.15 43.06

113 169 ø(P) 127.86 88.64 33 20,195 5,308 724,351 0.16 0.62

114 151 ÿ(D) 127.59 111.82 494 19,734 28,406 701,253 2.44 1.71

115 114  (VCL) 124.83 185.24 150 20,078 1,803 727,856 0.74 7.68

116 75 SJ(D) 122.34 423.30 30 20,198 41 729,618 0.15 42.25

117 103 b(VC) 122.20 216.21 86 20,142 672 728,987 0.43 11.35

118 186 …(Nes) 120.86 74.83 10 20,218 3,378 726,281 0.05 0.30

119 101 Ö(Dfa) 119.19 226.77 71 20,157 466 729,193 0.35 13.22

120 134 í(T) 118.32 140.91 514 19,714 10,962 718,697 2.54 4.48


121 189 v(Nes) 116.85 72.57 10 20,218 3,296 726,363 0.05 0.30

122 109 Ê(VG) 113.23 201.07 79 20,149 612 729,047 0.39 11.43

123 130 ¬(Di) 110.67 146.29 227 20,001 3,667 725,992 1.12 5.83

124 170 O(Cbb) 109.06 87.16 123 20,105 10,055 719,604 0.61 1.21

125 180 Ë(DE) 108.01 80.17 55 20,173 6,229 723,430 0.27 0.88

126 174 ø(D) 107.25 82.68 81 20,147 7,720 721,939 0.40 1.04

127 74 á(VA) 106.49 428.71 22 20,206 17 729,642 0.11 56.41

128 126 ³(D) 104.06 150.14 140 20,088 1,807 727,852 0.69 7.19

129 194 rÖ(Neqa) 103.59 68.11 16 20,212 3,516 726,143 0.08 0.45

130 127 Â(Na) 102.34 149.77 130 20,098 1,619 728,040 0.64 7.43

131 182 s(Neu) 99.45 78.59 99 20,129 8,472 721,187 0.49 1.16

132 72 SK(D) 96.78 444.44 17 20,211 6 729,653 0.08 73.91

133 80 Sþ(D) 96.04 367.79 21 20,207 20 729,639 0.10 51.22

134 183 O(Di) 94.75 77.58 144 20,084 10,651 719,008 0.71 1.33

135 139 Ê(D) 92.60 127.98 151 20,077 2,175 727,484 0.75 6.49

136 190 ú(Neu) 92.56 71.42 71 20,157 6,727 722,932 0.35 1.04

137 192 à‹(Cbb) 92.43 68.42 46 20,182 5,268 724,391 0.23 0.87

138 149 ;(VE) 91.89 114.06 285 19,943 5,444 724,215 1.41 4.97

139 89 Ö˝(Nd) 91.84 300.78 24 20,204 39 729,620 0.12 38.10

140 187 Ñ(VG) 91.65 72.80 97 20,131 8,118 721,541 0.48 1.18

141 200 r(Neqa) 90.40 65.07 34 20,194 4,467 725,192 0.17 0.76

142 181 ·(D) 90.26 78.88 345 19,883 19,938 709,721 1.71 1.70

143 120 (VG) 89.45 166.40 56 20,172 388 729,271 0.28 12.61

144 214 à(P) 87.85 58.33 15 20,213 3,093 726,566 0.07 0.48

145 83 ÑÝó(D) 86.98 349.66 18 20,210 14 729,645 0.09 56.25

146 91 µ³(D) 86.76 291.60 22 20,206 33 729,626 0.11 40.00

147 202 2(Ncd) 86.68 63.68 40 20,188 4,749 724,910 0.20 0.84

148 148 V(VA) 86.37 114.49 175 20,053 2,809 726,850 0.87 5.86

149 215 û˝(Na) 85.87 58.22 18 20,210 3,272 726,387 0.09 0.55

150 78 ø.ø−(VK) 85.51 374.18 16 20,212 8 729,651 0.08 66.67

151 198 w(Nep) 84.82 65.96 71 20,157 6,497 723,162 0.35 1.08

152 191 ,(Ncd) 84.82 70.07 145 20,083 10,343 719,316 0.72 1.38

153 196 Ÿ(Nf) 84.51 66.23 77 20,151 6,817 722,842 0.38 1.12

154 225 °v(Nd) 83.67 53.61 10 20,218 2,606 727,053 0.05 0.38

155 165 Bb(Nh) 82.59 93.03 688 19,540 17,169 712,490 3.40 3.85

156 227 „(D) 82.35 52.85 10 20,218 2,578 727,081 0.05 0.39

157 115 V(P) 81.99 180.53 37 20,191 171 729,488 0.18 17.79

158 82 á(D) 81.26 359.44 15 20,213 7 729,652 0.07 68.18

159 222 E(D) 79.58 54.05 17 20,211 3,056 726,603 0.08 0.55

160 221 á(Nf) 78.75 54.44 20 20,208 3,250 726,409 0.10 0.61


161 93 í(D) 78.60 288.92 18 20,210 20 729,639 0.09 47.37

162 231 .¬(Cbb) 78.23 50.93 11 20,217 2,571 727,088 0.05 0.43

163 94 ª´(D) 77.39 280.62 18 20,210 21 729,638 0.09 46.15

164 223 q(Ncd) 77.25 53.78 21 20,207 3,284 726,375 0.10 0.64

165 162 (P) 77.07 98.92 190 20,038 3,322 726,337 0.94 5.41

166 205 ¤(Nep) 77.01 60.70 75 20,153 6,476 723,183 0.37 1.14

167 228 ‡(Ng) 75.53 51.26 16 20,212 2,891 726,768 0.08 0.55

168 88 ¨(Ncd) 75.29 312.77 15 20,213 10 729,649 0.07 60.00

169 208 6(Na) 74.97 59.81 84 20,144 6,891 722,768 0.42 1.20

170 95 5(D) 74.09 271.92 17 20,211 19 729,640 0.08 47.22

171 136 à¤(Dfa) 73.71 136.03 47 20,181 332 729,327 0.23 12.40

172 206 ,(Ng) 73.34 60.11 114 20,114 8,379 721,280 0.56 1.34

173 81 â(VH) 72.66 361.90 11 20,217 1 729,658 0.05 91.67

174 243 Ý(Dfa) 72.22 48.56 14 20,214 2,669 726,990 0.07 0.52

175 232 Í$(Na) 71.25 50.90 25 20,203 3,406 726,253 0.12 0.73

176 164 ½(VE) 70.86 95.99 128 20,100 1,943 727,716 0.63 6.18

177 235 º(Na) 70.50 50.21 24 20,204 3,322 726,337 0.12 0.72

178 146 <2(Na) 70.32 117.23 58 20,170 524 729,135 0.29 9.97

179 125 )(VK) 70.03 150.39 33 20,195 162 729,497 0.16 16.92

180 135 ¥ó(Dfa) 69.47 136.05 39 20,189 239 729,420 0.19 14.03

181 99 |(Nep) 68.40 246.76 16 20,212 19 729,640 0.08 45.71

182 210 ¸(Caa) 68.24 58.86 216 20,012 13,052 716,607 1.07 1.63

183 209 7(Cbb) 66.89 59.21 325 19,903 17,883 711,776 1.61 1.78

184 226 É(Da) 66.81 53.58 80 20,148 6,413 723,246 0.40 1.23

185 90 í?(D) 66.39 298.54 12 20,216 5 729,654 0.06 70.59

186 229 t−(Nc) 65.63 51.18 57 20,171 5,144 724,515 0.28 1.10

187 253 7/(Cbb) 62.32 43.12 16 20,212 2,585 727,074 0.08 0.62

188 128 Sv(Nd) 62.23 147.20 25 20,203 97 729,562 0.12 20.49

189 144 à(Na) 61.92 124.44 33 20,195 190 729,469 0.16 14.80

190 257 Aº(Na) 61.87 41.94 13 20,215 2,361 727,298 0.06 0.55

191 143 ?´(D) 60.76 124.93 31 20,197 169 729,490 0.15 15.50

192 195 7(T) 60.49 67.46 595 19,633 15,308 714,351 2.94 3.74

193 251 -(Ng) 60.19 44.07 27 20,201 3,252 726,407 0.13 0.82

194 244 º(D) 59.78 48.48 82 20,146 6,279 723,380 0.41 1.29

195 238 F(D) 58.86 49.48 129 20,099 8,571 721,088 0.64 1.48

196 156 ^(VC) 58.84 108.00 38 20,190 272 729,387 0.19 12.26

197 123 ð(VA) 58.82 162.21 19 20,209 50 729,609 0.09 27.54

198 260 Z(D) 58.42 41.40 19 20,209 2,695 726,964 0.09 0.70

199 98 íÝ(D) 57.74 249.05 11 20,217 6 729,653 0.05 64.71

200 255 ø(D) 56.03 43.00 41 20,187 3,965 725,694 0.20 1.02


201 262 Ò(Nc) 55.57 40.35 23 20,205 2,882 726,777 0.11 0.79

202 274 ¨Ž(VK) 55.25 37.24 11 20,217 2,064 727,595 0.05 0.53

203 178 "(VK) 54.95 81.02 68 20,160 832 728,827 0.34 7.56

204 121 {C(Caa) 54.79 163.58 16 20,212 34 729,625 0.08 32.00

205 252 ñ‡(Nd) 54.76 43.31 56 20,172 4,752 724,907 0.28 1.16

206 259 U(VL) 52.60 41.61 54 20,174 4,576 725,083 0.27 1.17

207 171 šä(Na) 52.32 84.71 47 20,181 456 729,203 0.23 9.34

208 176 õ¶(Na) 51.94 81.26 52 20,176 549 729,110 0.26 8.65

209 276 5((Ng) 51.80 36.96 18 20,210 2,466 727,193 0.09 0.72

210 138 Ì(VJ) 51.03 131.88 18 20,210 56 729,603 0.09 24.32

211 270 ×ç(Nc) 50.94 38.29 30 20,198 3,185 726,474 0.15 0.93

212 249 '(Dfa) 50.65 44.54 223 20,005 12,532 717,127 1.10 1.75

213 269 ù(Neu) 50.51 38.34 33 20,195 3,348 726,311 0.16 0.98

214 107 áó(D) 49.61 204.04 10 20,218 7 729,652 0.05 58.82

215 111 S S*(VA) 49.59 187.55 11 20,217 11 729,648 0.05 50.00

216 277 .â(D) 48.70 36.88 31 20,197 3,181 726,478 0.15 0.97

217 172 ¨(T) 48.62 84.39 36 20,192 295 729,364 0.18 10.88

218 184 (D) 47.55 75.72 45 20,183 455 729,204 0.22 9.00

219 217 ÛÊ(Nd) 47.39 57.01 193 20,035 4,025 725,634 0.95 4.58

220 197 <2(Na) 47.38 66.16 74 20,154 1,040 728,619 0.37 6.64

221 292 D(Nd) 47.15 32.58 12 20,216 1,949 727,710 0.06 0.61

222 281 2-(Nc) 47.08 35.30 27 20,201 2,901 726,758 0.13 0.92

223 285 ±(Nf) 46.97 33.22 15 20,213 2,149 727,510 0.07 0.69

224 272 ü(VH) 46.81 37.89 63 20,165 4,857 724,802 0.31 1.28

225 204 v(VC) 46.69 61.57 98 20,130 1,597 728,062 0.48 5.78

226 132 Sj(Ncd) 46.48 143.78 13 20,215 25 729,634 0.06 34.21

227 122 ¨(D) 45.96 162.49 11 20,217 14 729,645 0.05 44.00

228 278 Uà(VC) 45.88 36.60 52 20,176 4,253 725,406 0.26 1.21

229 167 êk(VA) 45.84 91.07 25 20,203 148 729,511 0.12 14.45

230 185 ](VK) 45.64 75.17 39 20,189 363 729,296 0.19 9.70

231 161 ±(Na) 45.23 104.02 19 20,209 79 729,580 0.09 19.39

232 286 Åq(Nc) 45.09 33.12 21 20,207 2,485 727,174 0.10 0.84

233 118 (Neu) 45.08 170.50 10 20,218 10 729,649 0.05 50.00

234 289 Ì(VJ) 45.05 32.94 20 20,208 2,422 727,237 0.10 0.82

235 304 tÙ(Nd) 44.95 31.21 12 20,216 1,897 727,762 0.06 0.63

236 302 xX(Na) 44.36 31.56 15 20,213 2,085 727,574 0.07 0.71

237 216 ´u(D) 44.11 57.43 100 20,128 1,686 727,973 0.49 5.60

238 296 Ú7(Na) 43.32 32.09 22 20,206 2,499 727,160 0.11 0.87

239 287 ªW(VC) 43.31 32.95 29 20,199 2,912 726,747 0.14 0.99

240 282 T(Na) 43.21 34.96 58 20,170 4,476 725,183 0.29 1.28


241 299 ̶(D) 42.97 32.00 23 20,205 2,550 727,109 0.11 0.89

242 173 ˚Z(Na) 42.76 83.76 24 20,204 147 729,512 0.12 14.04

243 301 œ(Dfa) 42.72 31.69 22 20,206 2,483 727,176 0.11 0.88

244 177 µóÖ(Neqa) 42.41 81.16 25 20,203 162 729,497 0.12 13.37

245 305 ’e(Na) 42.35 31.15 20 20,208 2,351 727,308 0.10 0.84

246 218 ³(T) 42.31 57.00 79 20,149 1,218 728,441 0.39 6.09

247 211 ø-(Nd) 42.28 58.67 68 20,160 971 728,688 0.34 6.54

248 145 \(I) 42.22 120.84 13 20,215 31 729,628 0.06 29.55

249 166 õ&(VC) 42.03 92.47 19 20,209 88 729,571 0.09 17.76

250 220 (VC) 41.41 55.43 80 20,148 1,254 728,405 0.40 6.00

251 312 u(Nf) 41.35 30.17 18 20,210 2,201 727,458 0.09 0.81

252 316 wõ(D) 41.00 29.77 17 20,211 2,129 727,530 0.08 0.79

253 240 Ċ(Na) 40.28 48.99 149 20,079 3,015 726,644 0.74 4.71

254 310 TX(VD) 39.94 30.62 29 20,199 2,815 726,844 0.14 1.02

255 245 g)(VK) 39.89 47.14 193 20,035 4,228 725,431 0.95 4.37

256 175 “(VE) 39.77 82.42 20 20,208 107 729,552 0.10 15.75

257 329 €Ž(VC)[+nom] 39.60 27.86 12 20,216 1,769 727,890 0.06 0.67

258 322 ÅÒ(Nc) 39.41 28.72 17 20,211 2,088 727,571 0.08 0.81

259 150 ˛(T) 39.38 113.61 12 20,216 28 729,631 0.06 30.00

260 303 1Å(Nc) 39.07 31.26 46 20,182 3,713 725,946 0.23 1.22

261 320 ’m(Na) 38.96 29.03 21 20,207 2,321 727,338 0.10 0.90

262 337 V(Ng) 38.34 26.55 10 20,218 1,602 728,057 0.05 0.62

263 236 Bk(P) 37.96 50.15 79 20,149 1,282 728,377 0.39 5.80

264 321 êÛ(VE) 37.88 28.88 26 20,202 2,584 727,075 0.13 1.00

265 242 ‰a(Na) 37.84 48.64 93 20,135 1,622 728,037 0.46 5.42

266 295 D(P) 37.82 32.21 101 20,127 6,374 723,285 0.50 1.56

267 331 Ë–(Nc) 37.77 27.80 18 20,210 2,107 727,552 0.09 0.85

268 291 ª(D) 37.55 32.70 142 20,086 8,251 721,408 0.70 1.69

269 334 ð(Nf) 37.55 26.77 13 20,215 1,785 727,874 0.06 0.72

270 201 ªJ(VH) 36.68 63.82 27 20,201 220 729,439 0.13 10.93

271 159 5óz(VH) 36.54 106.45 11 20,217 25 729,634 0.05 30.56

272 347 ý(Neu) 36.20 25.70 12 20,216 1,686 727,973 0.06 0.71

273 207 ßT(Na) 36.05 59.93 30 20,198 273 729,386 0.15 9.90

274 343 Ö(Neqa) 35.54 26.01 16 20,212 1,925 727,734 0.08 0.82

275 203 }(VL) 35.36 62.51 25 20,203 196 729,463 0.12 11.31

276 317 çÞ(Na) 35.31 29.73 80 20,148 5,273 724,386 0.40 1.49

277 323 ²(P) 35.21 28.66 51 20,177 3,840 725,819 0.25 1.31

278 213 <(Dfb) 34.71 58.34 28 20,200 248 729,411 0.14 10.14

279 266 AÐ(Nh) 34.63 38.74 333 19,895 8,516 721,143 1.65 3.76

280 314 y(D) 34.55 29.87 117 20,111 6,971 722,688 0.58 1.65


281 342 €Ž(VC) 33.80 26.04 26 20,202 2,465 727,194 0.13 1.04

282 338 %È(Na) 33.79 26.46 31 20,197 2,742 726,917 0.15 1.12

283 349 ù(Neu) 33.56 25.48 22 20,206 2,230 727,429 0.11 0.98

284 319 \(P) 33.53 29.08 119 20,109 7,015 722,644 0.59 1.67

285 250 AÞ(Na) 33.36 44.48 66 20,162 1,046 728,613 0.33 5.94

286 179 i{(VA) 33.36 80.53 13 20,215 48 729,611 0.06 21.31

287 328 Ù(Nf) 33.34 27.97 72 20,156 4,809 724,850 0.36 1.48

288 193 }}(VE) 33.22 68.21 17 20,211 93 729,566 0.08 15.45

289 368 -(Ncd) 32.94 23.61 12 20,216 1,605 728,054 0.06 0.74

290 361 c(Neqa) 32.92 24.28 16 20,212 1,856 727,803 0.08 0.85

291 354 Çá(VL) 32.85 25.19 24 20,204 2,324 727,335 0.12 1.02

292 219 à¤(Na) 32.35 56.11 24 20,204 197 729,462 0.12 10.86

293 230 H(VF) 32.28 51.16 31 20,197 317 729,342 0.15 8.91

294 372 J(Cbb) 32.11 23.43 14 20,214 1,711 727,948 0.07 0.81

295 358 J£(Caa) 32.10 24.93 27 20,201 2,470 727,189 0.13 1.08

296 359 úk(P) 32.06 24.91 27 20,201 2,469 727,190 0.13 1.08

297 352 æ˜(Na) 31.93 25.32 34 20,194 2,845 726,814 0.17 1.18

298 241 (VE) 31.74 48.70 34 20,194 377 729,282 0.17 8.27

299 224 µo(Ncd) 31.70 53.74 25 20,203 217 729,442 0.12 10.33

300 247 º(VH) 31.68 45.43 44 20,184 578 729,081 0.22 7.07
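Table 13 orders the words by LLR and reports each word's χ² rank alongside. A sketch of producing these two rank columns from the counters follows. It assumes simple ordinal ranks (position in a descending sort) with the χ² ranking computed over all candidate words, consistent with χ² ranks above 300 appearing in the table; the thesis does not state a tie-breaking rule, so ties here fall back to input order. All names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int, int]  # (a, b, c, d) for one word


def llr(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    cells = [(a, (a + b) * (a + c)), (b, (a + b) * (b + d)),
             (c, (a + c) * (c + d)), (d, (b + d) * (c + d))]
    return 2 * sum(j * math.log(n * j / m) for j, m in cells if j > 0)


def chi2(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))


def rank_words(table: Dict[str, Counts], top_k: int = 300) -> List[Tuple[int, int, str]]:
    """Return (LLR rank, chi-square rank, word) for the top_k words by LLR."""
    llr_score = {w: llr(*cnt) for w, cnt in table.items()}
    chi2_score = {w: chi2(*cnt) for w, cnt in table.items()}
    by_llr = sorted(table, key=lambda w: llr_score[w], reverse=True)
    by_chi2 = sorted(table, key=lambda w: chi2_score[w], reverse=True)
    chi2_rank = {w: i + 1 for i, w in enumerate(by_chi2)}  # ranked over all words
    return [(i + 1, chi2_rank[w], w) for i, w in enumerate(by_llr[:top_k])]
```

Fed the counts of the first six entries of Table 13, this reproduces their rank pairs, including the swap at entries 5 and 6 (LLR ranks 5 and 6, χ² ranks 6 and 5), which shows how the two statistics can disagree for frequent but less precise words.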

