Flesch Reading Ease
49 KNN Flesch Reading Ease Flesch-Kincaid KNN Flesch-Kincaid
Abstract
The development of the IT industry affects a country’s global competitiveness, and improving IT professionals’ ability to read technological English can promote their creativity. Currently, however, there is no reliable formula for judging the readability of technological English that would help learners select suitable articles to read, and studies of the readability of technological English articles are extremely rare. This study therefore analyzes the advantages and disadvantages of the Flesch-Kincaid readability formula and proposes an extended formula that uses GEPT vocabulary difficulty levels and technology terminology as its feature values. The data consist of 49 reading comprehension articles from the GEPT beginning, intermediate, and high-intermediate levels. We classify these articles with the KNN algorithm and compare the effectiveness of the Flesch-Kincaid formula with that of our proposed formula. The results indicate that the proposed formula is more effective than the Flesch-Kincaid formula in predicting the readability of a technological English article.
Keywords: KNN, GEPT, readability, technological English, Flesch-Kincaid readability formula
1.
[6]
(information
overload) Google
- CNN BBC Time News Week Popular Science National Geography
Flesch Reading Ease
64%~70% Flesch Reading Ease
5
Flesch Reading Ease
49 KNN Flesch Reading Ease (1) Flesch Reading Ease (2) (3)
Flesch Reading Ease
2.
KNN
2.1
(General English Proficiency Test GEPT)[3]
(Language Training and Testing Center LTTC) 5 6 [9] (GEPT) 6 6 6 7 : [5] : There are three major facts about the Earth’s orbit around the sun that causes these changes to take place. First, the Earth’s spin is wobbly, like a spinning top that starts to wobble when it begins to slow down. This wobble makes the Earth tilt in a circular motion around its center that alters the place where the sun hits the Earth’s spin, the position of the Earth on its path changes relative to the time of year. This phenomenon is called the “precession of equinoxes.” The cycle of equinox precession takes over 22,000 years to complete. In the growth of continental ice sheets, summer temperatures are probably more important than winter ones.
112 ( )
Level    Words  Percentage
level 1  86     76.79%
level 2  7      6.25%
level 3  6      5.36%
level 4  5      4.46%
level 5  2      1.79%
level 6  0      0.00%
level 7  7      6.25%
2.2 Flesch-Kincaid (Flesch-Kincaid Readability analysis) [11] Flesch Kincaid Flesch-Kincaid
Flesch Reading Ease Flesch-Kincaid Grade Level
Flesch Reading Ease
1 100
90 100
( )
60 70
0 30 Flesch Reading Ease (1) :
$$\text{Flesch Reading Ease} = 206.835 - 1.015\left(\frac{\text{totalWords}}{\text{totalSentences}}\right) - 84.6\left(\frac{\text{totalSyllables}}{\text{totalWords}}\right) \tag{1}$$
Flesch-Kincaid Grade Level Flesch Reading Ease Flesch-Kincaid Grade Level
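As a minimal sketch, formula (1) can be evaluated directly from the three counts it uses. The function name and the sample counts below are illustrative assumptions, not values from the study.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from raw counts, following formula (1)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
print(round(score, 3))
```

Longer sentences and more syllables per word both lower the score, which is why a higher value indicates an easier text.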
7.2 7
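Flesch-Kincaid Grade Level 2 [3,15]
$$\text{Flesch-Kincaid Grade Level} = 0.39\left(\frac{\text{totalWords}}{\text{totalSentences}}\right) + 11.8\left(\frac{\text{totalSyllables}}{\text{totalWords}}\right) - 15.59 \tag{2}$$
3 (simple sentence) (compound sentence) (complex sentence) [10]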
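The sketch below evaluates formula (2) on raw text. The sentence splitter and the vowel-group syllable counter are crude assumptions made only for illustration; they are not the counting rules used in the study, and real syllable counts will differ for many words.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Return (words, sentences, syllables) using simple regex rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level from raw counts, following formula (2)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)


words, sentences, syllables = text_counts("The cat sat. The dog ran.")
print(words, sentences, syllables)
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

Because the formula sees only sentence length and syllable counts, a short passage built from rare technical words can still receive a low grade level, which is the limitation the extended formula targets.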
Flesch Reading
: Lying exposed without its blanket of snow, the ice on the river melts quickly under the warm March sun.
Flesch Reading Ease
boom clamp tilt stain via wring
Flesch Reading Ease 64%~70%
Flesch Reading Ease
2.3 KNN [4] (machine learning) [7,8] (features) Fabrizio Sebastiani [12] 3 dj j=1~m ci i=1~n D C D×C -T F
$$(d_j, c_i) \in D \times C, \quad D \times C \rightarrow \{T, F\} \tag{3}$$
TF-IDF K- (K-Nearest Neighbor KNN)[7] (training data) (validation data) (test data) [17] (overfit) (underfit) [2] K (K Nearest Neighbor algorithm KNN)[4] (Nearest Neighbor NN) (Euclidean distance) 4 [15]
$$P = (p_1, p_2, \ldots, p_n), \quad Q = (q_1, q_2, \ldots, q_n), \quad Eu(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{4}$$
K K K K K A A k [13] (1) K K (2) (3) K (4) K K 5 [16]( )
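The Euclidean distance of formula (4) and a plain majority-vote KNN step can be sketched as follows. The feature vectors, labels, and function names are hypothetical and chosen only to illustrate the procedure; the study itself ran KNN in XLMiner.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors, as in formula (4)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(training, query, k):
    """Label the query by majority vote among its k nearest training items.

    training is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature articles labelled with a difficulty level.
training = [
    ((0.90, 0.05), 1),
    ((0.88, 0.06), 1),
    ((0.70, 0.10), 3),
    ((0.72, 0.11), 3),
]
print(knn_predict(training, (0.89, 0.05), k=3))
```

With k=3 the query picks up both level 1 neighbors and one level 3 neighbor, so the vote returns level 1.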
$$y(x, c_i) = \sum_{d_j \in kNN} sim(x, d_j)\, y(d_j, c_i) - b_i \tag{5}$$
$y(d_j, c_i) \in \{0, 1\}$, with y=1 when dj belongs to ci; sim(x, dj) is the similarity between x and dj, and bi is the threshold for ci. KNN K KNN ( K-D tree) [13]
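A minimal sketch of the similarity-weighted score in formula (5) follows. Cosine similarity is used for sim as an assumption (the source does not specify the similarity measure here), and the documents, category names, and threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; an assumed choice for sim(x, d_j)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_category_score(x, category, training_docs, k, threshold=0.0):
    """Sum sim(x, d_j) over the k most similar documents that belong to
    the category, then subtract the category threshold b_i."""
    ranked = sorted(training_docs,
                    key=lambda doc: cosine_similarity(x, doc[0]),
                    reverse=True)
    total = 0.0
    for vector, categories in ranked[:k]:
        membership = 1 if category in categories else 0  # y(d_j, c_i)
        total += cosine_similarity(x, vector) * membership
    return total - threshold


# Hypothetical documents: (feature vector, set of categories).
training_docs = [
    ((1.0, 0.0), {"easy"}),
    ((0.0, 1.0), {"hard"}),
    ((1.0, 1.0), {"easy", "hard"}),
]
x = (1.0, 0.0)
print(knn_category_score(x, "easy", training_docs, k=2, threshold=0.5))
print(knn_category_score(x, "hard", training_docs, k=2, threshold=0.5))
```

A document is assigned to every category whose score is positive, so the threshold controls how much neighbor support a category needs.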
3.
3.1 2005 21 15 13 49 50% JavaFlesch Reading Ease Flesch-Kincaid Grade
Level 1 1 1 3 1 / / 192.61 11.90 139.48 12.26 1.39 232.71 11.44 152.50 13.59 1.45 261.18 8.85 159.23 18.43 1.64 3.2
Flesch Reading Ease 49
4 (
) XLMiner
4
1 Flesch Reading Ease
KNN 2 7 1 6 3 1 2 9 3 1 2 3 2 1 2 ( = 1) ( = 2) KNN 1.528 1.427 (residual) -0.518 0.573 2 K 1 18 13 1.418 1.2604 1 1.528 -0.528 2 18 7.235 1.501 1.06816 2 1.427 0.573 … … … … KNN [14] KNN 3 3 ( %) 1 2 3 4 5 6 40 50 60 70 80 90 60 50 40 30 20 10
4.
Flesch Reading Ease 64%~70%
49 : 21
15 13
Flesch Reading Ease 1
1
Flesch Reading Ease
[Figure axis label: 100 - Flesch Reading Ease]
1 Flesch Reading Ease
2 6 7 4 49 4 1 4 1 2 3 4 5 6
0.91215 0.05171 0.01601 0.01736 0.00029 0.00070 0.00178
0.83819 0.07059 0.02892 0.04265 0.00549 0.00474 0.00942
0.70827 0.10779 0.05689 0.06585 0.01706 0.01284 0.03129
1 4 5 RMS 1 2 3 RMS > 0.5 RMS 0 1 RMS 0.5 0.516 : 50:50 2 4 RMS 0.416 0.342 0.349 : 60:40 90:10 70:30 5 RMS 1 4 4 3 2 1 1 Flesch Reading Ease 2 7
1 Flesch Reading Ease 3 Flesch Reading Ease 7
1 2
(Flesch Reading Ease
) 4 7 ( ) 3 5 1 4 KNN
        k   RMS      k   RMS      k   RMS      k   RMS
40:60   20  0.547    16  0.4191   12  0.399    19  0.369
50:50   20  0.516    20  0.436    20  0.416    20  0.389
60:40   2   0.569    9   0.416    7   0.393    13  0.392
70:30   20  0.609    6   0.420    6   0.396    15  0.349
80:20   20  0.744    16  0.485    14  0.477    18  0.441
90:10   4   0.656    13  0.473    6   0.342    18  0.388
Mean        0.607        0.442        0.404        0.388
4 6 6
Level  1     2     3     4     5      6      7
/      0.85  1.49  1.76  2.05  16.22  7.61   5.78
/      0.78  2.08  3.55  3.79  58.05  18.35  17.59
/      0.91  1.40  2.01  1.85  3.58   2.41   3.04
8 L1 L2 L3 L4 L5 L6 L7
$$f(x) = -0.91\,L_1 + 2.08\,L_2 + 3.55\,L_3 + 3.79\,L_4 + 58.05\,L_5 + 18.35\,L_6 + 17.59\,L_7 \tag{6}$$
2 1 4 RMS ( 5) 2 ( 2 3 4) 1 RMS
5.
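As a rough sketch, the extended formula (6) is a weighted sum over the seven vocabulary levels. The code assumes that L1 to L7 are the proportions of an article's words at each level (values between 0 and 1); the source text does not confirm the scale. The three sample rows are the level proportions reported in Section 4, and the function name is illustrative.

```python
# Coefficients of formula (6), for L1 through L7.
COEFFICIENTS = (-0.91, 2.08, 3.55, 3.79, 58.05, 18.35, 17.59)


def extended_readability(level_proportions):
    """Weighted sum of the seven vocabulary-level proportions, per formula (6).

    Assumption: each entry is the share of words at that level (0 to 1).
    """
    if len(level_proportions) != len(COEFFICIENTS):
        raise ValueError("expected seven level proportions")
    return sum(c * p for c, p in zip(COEFFICIENTS, level_proportions))


# The three rows of level proportions listed in Section 4.
rows = [
    (0.91215, 0.05171, 0.01601, 0.01736, 0.00029, 0.00070, 0.00178),
    (0.83819, 0.07059, 0.02892, 0.04265, 0.00549, 0.00474, 0.00942),
    (0.70827, 0.10779, 0.05689, 0.06585, 0.01706, 0.01284, 0.03129),
]
scores = [extended_readability(row) for row in rows]
print([round(s, 4) for s in scores])
```

Under this assumption the score rises as the share of level 1 words falls and the share of higher-level words grows, so a larger value suggests a harder article.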
[Figure: RMS axis, 0 to 0.8]
2 ( 6) KNN : Flesch Reading Ease : Flesch Reading Ease (1) (2) KNN (3)
[1] 2005 p1~30
[2] 2003
[3] http://www.gept.org.tw/
[4] 2006
[5] 2004
[6] 2004
[7] 2004
[8] 2003
[9] Chiang, H. K. & Kuo, F. L. (2005). Promoting Active Learning: Finding Right Articles for Right Learners. Paper presented at the Fifth International Conference on AsiaCALL, Korea.
[10] Chidambaram, D. (2005). Processing complex sentences for information extraction. Arizona: Arizona State Univ.
[11] Davies, S. (2003). Content Based Instruction in EFL Contexts. The Internet TESL Journal, vol. 2, no. 9 (http://iteslj.org/Articles/Davies-CBI.html).
[12] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), pp. 1-47.
[13] Teknomo, http://people.revoledu.com/kardi/tutorial/KNN/HowTo_KNN.html.
[14] Kleinbaum, D. G., Kupper, L. L., Muller, K. E., and Nizam, A. (1998). Applied Regression Analysis and Other Multivariable Methods. 3rd ed., Duxbury Press, Belmont, California, U.S.A.
[15] Marton, K. (2004). Measure concentration for Euclidean distance in the case of dependent random variables. Ann. Probability, 32, no. 3b, pp. 2526-2544.
[16] Tam, V., Santoso, A. and Setiono, R. (2002). A comparative study of centroid-based, neighborhood-based and statistical approaches for effective document categorization. Proceedings of the 16th International Conference on Pattern Recognition, pp. 235-238.