
A Study on Classifying the Difficulty of Technological English Articles


Academic year: 2021



Abstract

The development of the IT industry affects a country’s global competitiveness, and improving IT professionals’ ability to read technological English can in turn foster their creativity. Currently, however, there is no reliable formula for judging the readability of technological English that would help learners select suitable articles to read. Moreover, studies on the readability of technological English articles are extremely rare. Our study therefore analyzes the advantages and disadvantages of the Flesch-Kincaid readability formula and proposes an extended formula that uses GEPT vocabulary difficulty levels and technology terminology as its feature values. The data for our study are 49 reading comprehension articles from the GEPT beginning, intermediate, and high-intermediate levels. We classify these articles with the KNN algorithm and compare the effectiveness of the Flesch-Kincaid formula with that of our proposed formula. The results indicate that our proposed formula is more effective than the Flesch-Kincaid formula in predicting the readability of a technological English article.

Keywords: KNN, GEPT, readability, technological English, Flesch-Kincaid readability formula

1.

[6]

(information overload) Google

CNN, BBC, Time, Newsweek, Popular Science, National Geographic

Flesch Reading Ease

64%~70% Flesch Reading Ease


5

Flesch Reading Ease

49 KNN Flesch Reading Ease (1) Flesch Reading Ease (2) (3)

Flesch Reading Ease

2.

KNN

2.1

General English Proficiency Test (GEPT) [3], Language Training and Testing Center (LTTC) [9].

Sample passage [5]:

There are three major facts about the Earth’s orbit around the sun that causes these changes to take place. First, the Earth’s spin is wobbly, like a spinning top that starts to wobble when it begins to slow down. This wobble makes the Earth tilt in a circular motion around its center that alters the place where the sun hits the Earth’s spin, the position of the Earth on its path changes relative to the time of year. This phenomenon is called the “precession of equinoxes.” The cycle of equinox precession takes over 22,000 years to complete. In the growth of continental ice sheets, summer temperatures are probably more important than winter ones.

Word levels of the sample passage (112 words):

Level   Words   Share
1       86      76.79%
2       7       6.25%
3       6       5.36%
4       5       4.46%
5       2       1.79%
6       0       0.00%
7       7       6.25%

2.2 Flesch-Kincaid Readability analysis [11]

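Flesch-Kincaid analysis provides two measures: Flesch Reading Ease and Flesch-Kincaid Grade Level.

Flesch Reading Ease scores range from 1 to 100, and a higher score indicates easier text: 90 to 100 is conventionally read as very easy, 60 to 70 as standard, and 0 to 30 as very difficult. Flesch Reading Ease is computed with Equation (1):

\[
\text{Flesch Reading Ease} = 206.835 - 1.015 \left( \frac{\text{totalWords}}{\text{totalSentences}} \right) - 84.6 \left( \frac{\text{totalSyllables}}{\text{totalWords}} \right) \tag{1}
\]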
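As a rough illustration of Equation (1), the following Python sketch computes Flesch Reading Ease from raw text. The helper names (`count_syllables`, `text_counts`, `flesch_reading_ease`) are hypothetical, and the vowel-group syllable counter is a simple heuristic assumed here for illustration; it is not the counter used in this study.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables with a vowel-group heuristic (an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (but not "-le") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (total words, total sentences, total syllables) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(total_words: int, total_sentences: int,
                        total_syllables: int) -> float:
    """Equation (1): higher scores indicate easier text."""
    if total_words == 0 or total_sentences == 0:
        raise ValueError("text must contain at least one word and sentence")
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))


if __name__ == "__main__":
    sample = "The ice melts. It is warm!"
    print(flesch_reading_ease(*text_counts(sample)))
```

Because the score depends only on the three counts, a more accurate syllable counter (for example, one backed by a pronunciation dictionary) can be substituted without touching `flesch_reading_ease`.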

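Flesch-Kincaid Grade Level uses the same counts as Flesch Reading Ease but reports a school grade level; a score of 7.2, for example, corresponds to the 7th grade. Flesch-Kincaid Grade Level is computed with Equation (2) [3,15]:

\[
\text{Flesch-Kincaid Grade Level} = 0.39 \left( \frac{\text{totalWords}}{\text{totalSentences}} \right) + 11.8 \left( \frac{\text{totalSyllables}}{\text{totalWords}} \right) - 15.59 \tag{2}
\]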
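Equation (2) can be sketched in the same way. The function name `flesch_kincaid_grade` and the input counts below are hypothetical; only the three coefficients come from the formula itself.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int,
                         total_syllables: int) -> float:
    """Equation (2): higher scores indicate harder text."""
    if total_words == 0 or total_sentences == 0:
        raise ValueError("text must contain at least one word and sentence")
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)


if __name__ == "__main__":
    # Hypothetical article: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Both formulas combine the same two ratios, words per sentence and syllables per word, so neither can see vocabulary difficulty beyond word length.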


Sentences fall into 3 types: simple sentences, compound sentences, and complex sentences [10].

Flesch Reading

: Lying exposed without its blanket of snow, the ice on the river melts quickly under the warm March sun.

Flesch Reading Ease

boom, clamp, tilt, stain, via, wring

Flesch Reading Ease 64%~70%

Flesch Reading Ease

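2.3 KNN

[4] (machine learning) [7,8] (features)

Fabrizio Sebastiani [12] defines text categorization as in Equation (3): given documents d_j (j = 1~m) in D and categories c_i (i = 1~n) in C, each pair (d_j, c_i) in D × C is assigned a value T or F.

\[
(d_j, c_i) \in D \times C, \qquad D \times C \rightarrow \{T, F\} \tag{3}
\]

TF-IDF; K-Nearest Neighbor (KNN) [7]; training data, validation data, and test data [17]; overfit and underfit [2].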

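The K-Nearest Neighbor algorithm (KNN) [4] builds on the Nearest Neighbor (NN) method and measures closeness with the Euclidean distance of Equation (4) [15]:

\[
P = (p_1, p_2, \ldots, p_n), \quad Q = (q_1, q_2, \ldots, q_n), \quad Eu(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{4}
\]

The KNN procedure has four steps [13]: (1) choose K; (2) compute the distance from the query to every training sample; (3) take the K samples with the smallest distances; (4) assign the category that is most common among those K neighbors.

The category score is given by Equation (5) [16]:

\[
y(x, c_i) = \sum_{d_j \in kNN} sim(x, d_j)\, y(d_j, c_i) - b_i \tag{5}
\]

where y(d_j, c_i) ∈ {0, 1}, with y = 1 meaning that d_j belongs to c_i; sim(x, d_j) is the similarity between x and d_j; and b_i is the threshold for c_i.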
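A minimal sketch of KNN classification with Euclidean distance and a majority vote follows. The training points (words per sentence, syllables per word) and their labels 1 to 3 are hypothetical values chosen for illustration; they are not the 49 GEPT articles, and the study itself ran KNN in XLMiner rather than in code like this.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(query, training, k):
    """Label the query by majority vote among its k nearest samples.

    training is a list of (feature_vector, label) pairs.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the training set size")
    # Sort all training samples by distance and keep the k closest.
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical samples: (words per sentence, syllables per word) -> level.
    training = [
        ((12.0, 1.40), 1), ((12.5, 1.38), 1),
        ((13.6, 1.45), 2), ((14.0, 1.50), 2),
        ((18.4, 1.64), 3), ((18.0, 1.60), 3),
    ]
    print(knn_classify((17.5, 1.62), training, k=3))
```

The features here are left unscaled, so the one with the larger numeric range dominates the distance; normalizing features first is a common design choice and is assumed to matter for real data.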


KNN ( K-D tree) [13]

3.

3.1 2005 21 15 13 49 50% Java

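Flesch Reading Ease, Flesch-Kincaid Grade Level

Table 1

GEPT level          Syllables   Sentences   Words    Words/sentence   Syllables/word
Beginning           192.61      11.90       139.48   12.26            1.39
Intermediate        232.71      11.44       152.50   13.59            1.45
High-intermediate   261.18      8.85        159.23   18.43            1.64

3.2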

Flesch Reading Ease 49

4 (

) XLMiner

4

1 Flesch Reading Ease

KNN 2 7 1 6 3 1 2 9 3 1 2 3 2 1 2 ( = 1) ( = 2) KNN 1.528 1.427 (residual) -0.518 0.573 2 K 1 18 13 1.418 1.2604 1 1.528 -0.528 2 18 7.235 1.501 1.06816 2 1.427 0.573 … … … … KNN [14] KNN 3 3 ( %) 1 2 3 4 5 6 40 50 60 70 80 90 60 50 40 30 20 10

4.

Flesch Reading Ease 64%~70%

49 : 21

15 13

Flesch Reading Ease 1

1

Flesch Reading Ease


Figure 1: Flesch Reading Ease (axis: 100-Flesch Reading Ease)

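Share of words at each level, for the 49 articles:

Level               1         2         3         4         5         6         7
Beginning           0.91215   0.05171   0.01601   0.01736   0.00029   0.00070   0.00178
Intermediate        0.83819   0.07059   0.02892   0.04265   0.00549   0.00474   0.00942
High-intermediate   0.70827   0.10779   0.05689   0.06585   0.01706   0.01284   0.03129

With Flesch Reading Ease alone, every RMS in Table 4 is above 0.5, and the lowest is 0.516, at the 50:50 split. The lowest RMS values in the other three columns of Table 4 are 0.416, 0.342, and 0.349, at the 60:40, 90:10, and 70:30 splits respectively.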

1 Flesch Reading Ease 3 Flesch Reading Ease 7

1 2

(Flesch Reading Ease)

Table 4: KNN results (k and RMS) by split, columns 1 to 4

Split     (1) k   (1) RMS   (2) k   (2) RMS   (3) k   (3) RMS   (4) k   (4) RMS
40:60     20      0.547     16      0.4191    12      0.399     19      0.369
50:50     20      0.516     20      0.436     20      0.416     20      0.389
60:40     2       0.569     9       0.416     7       0.393     13      0.392
70:30     20      0.609     6       0.420     6       0.396     15      0.349
80:20     20      0.744     16      0.485     14      0.477     18      0.441
90:10     4       0.656     13      0.473     6       0.342     18      0.388
Average           0.607             0.442             0.404             0.388

Level   1      2      3      4      5       6       7
        0.85   1.49   1.76   2.05   16.22   7.61    5.78
        0.78   2.08   3.55   3.79   58.05   18.35   17.59
        0.91   1.40   2.01   1.85   3.58    2.41    3.04

With L1 to L7 denoting the 7 levels, the formula is given in Equation (6):

\[
f(x) = -0.91 L_1 + 2.08 L_2 + 3.55 L_3 + 3.79 L_4 + 58.05 L_5 + 18.35 L_6 + 17.59 L_7 \tag{6}
\]

5.
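The weighted sum f(x) = -0.91 L1 + 2.08 L2 + 3.55 L3 + 3.79 L4 + 58.05 L5 + 18.35 L6 + 17.59 L7 of Equation (6) can be evaluated as in the sketch below. It assumes that L1 to L7 are the shares of an article's words falling in each of the 7 levels; the surviving text does not state this explicitly, so treat it as an assumption. The names `WEIGHTS` and `extended_score` are hypothetical, and the input is the word-level distribution of the sample passage from Section 2.1.

```python
# Coefficients of L1..L7 as they appear in Equation (6).
WEIGHTS = (-0.91, 2.08, 3.55, 3.79, 58.05, 18.35, 17.59)


def extended_score(level_shares):
    """Weighted sum over the shares of words in each of the 7 levels.

    Assumption: level_shares[i] is the fraction of words at level i + 1.
    """
    if len(level_shares) != len(WEIGHTS):
        raise ValueError("expected one share per level (7 values)")
    return sum(w * s for w, s in zip(WEIGHTS, level_shares))


if __name__ == "__main__":
    # Shares for the sample passage in Section 2.1 (levels 1 to 7).
    sample_shares = (0.7679, 0.0625, 0.0536, 0.0446, 0.0179, 0.0, 0.0625)
    print(extended_score(sample_shares))
```

Only level 1 carries a negative weight, so the score rises as words shift from the easiest level into the harder ones, with level 5 weighted most heavily.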


Figure 2: RMS, KNN (axis: RMS, 0 to 0.8)

Flesch Reading Ease (1) (2) KNN (3)

[1] 2005, p1~30

[2] 2003

[3] http://www.gept.org.tw/

[4] 2006

[5] 2004

[6] 2004

[7] 2004

[8] 2003

[9] Chiang, H. K. & Kuo, F. L. (2005). Promoting Active Learning: Finding Right Articles for Right Learners. Paper presented at the Fifth International Conference on AsiaCALL, Korea.

[10] Chidambaram, D. (2005). Processing complex sentences for information extraction. Arizona: Arizona State Univ.

[11] Davies, S. (2003). Content Based Instruction in EFL Contexts. The Internet TESL Journal, vol. 2, no. 9 (http://iteslj.org/Articles/Davies-CBI.html).

[12] Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34 (1), pp. 1-47.

[13] Teknomo, http://people.revoledu.com/kardi/tutorial/KNN/HowTo_KNN.html.

[14] Kleinbaum, D.G., Kupper, L.L., Muller, K.E., and Nizam, A. (1998). Applied Regression Analysis and Other Multivariable Methods. 3rd ed., Duxbury Press, Belmont, California, U.S.A.

[15] Marton, K. (2004). Measure concentration for Euclidean distance in the case of dependent random variables. Ann. Probability, 32, no. 3b, pp. 2526–2544.

[16] Tam, V., Santoso, A. and Setiono, R. (2002). A comparative study of centroid-based, neighborhood-based and statistical approaches for effective document categorization. Proceedings of the 16th international conference on pattern recognition, pp. 235-238.
