DISCUSSIONS AND CONCLUSIONS - A Corpus-Based Study of Academic English Vocabulary

The present study explores vocabulary use in research articles (RAs), the Introduction section in particular, in relation to its communicative purposes or moves, using a data-driven, corpus-based approach. In this chapter, we first summarize and discuss the major findings of the study. We then discuss the pedagogical implications and possible applications of the results. Finally, we suggest a few directions for future research.

Summary of the Study

The study takes a genre-based, corpus-informed approach to analyzing the use of vocabulary in RAs in the field of computer science. The corpus consists of 60 RAs selected from three major journals in computer science. All the text samples were analyzed both quantitatively and qualitatively. What distinguishes our study from most genre analysis studies or vocabulary studies is that we attempt to connect the two research fields; in other words, we aim to explore the generic nature of vocabulary. Specifically, we investigate move-signaling words in RAs since they can play an essential role in the pedagogy of academic writing, research paper writing in particular. Moreover, we approach the research questions mainly from a data-driven, probabilistic perspective. The quantitative analysis is based on statistical measures and facilitated by NLP tools.

Data analysis focuses on both the whole RA corpus and the RA Introduction sub-corpus. To explore the nature of vocabulary used in the genre of RAs in computer science, the corpus is analyzed from different perspectives. Analysis of the word frequency list of the corpus shows the coverage of the General Service List (GSL, 28.20%), the Academic Word List (AWL, 12.75%), and technical words, as generally represented by off-list words (59.05%), among the words in the list. This suggests that general-purpose words constitute only a little more than one-fourth of all vocabulary in this genre, while academic and technical vocabulary account for almost three-fourths. In particular, words of a technical nature play an essential role in writing RAs in computer science. The percentages thus reflect the vocabulary register of both the genre and the field.
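The coverage figures above can be illustrated with a minimal sketch. It is not the tool used in the study; it assumes that the GSL and AWL are available as plain sets of word forms (the real lists are organized by word family), and the function names and toy word sets are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (internal hyphens/apostrophes kept)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def coverage(tokens, gsl, awl, unit="types"):
    """Percentage of the corpus falling in each wordlist category.

    unit="types" counts each distinct word form once;
    unit="tokens" counts every running word.
    """
    counts = Counter(tokens)
    totals = Counter()
    for word, freq in counts.items():
        if word in gsl:
            category = "GSL"
        elif word in awl:
            category = "AWL"
        else:
            category = "off-list"  # treated here as technical vocabulary
        totals[category] += 1 if unit == "types" else freq
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {"GSL": 0.0, "AWL": 0.0, "off-list": 0.0}
    return {c: round(100 * totals[c] / grand_total, 2)
            for c in ("GSL", "AWL", "off-list")}


# Toy word sets standing in for the real GSL and AWL (assumption).
toy_gsl = {"the", "to", "sends"}
toy_awl = {"data", "channel"}
tokens = tokenize("The router sends the data to the channel")
print(coverage(tokens, toy_gsl, toy_awl, unit="types"))
print(coverage(tokens, toy_gsl, toy_awl, unit="tokens"))
```

Computing the same percentages by types and by tokens makes the contrast between vocabulary size and running-word coverage explicit.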

A second quantitative analysis examines the top 100, 200, and 300 high-frequency words. The percentages of academic and technical words increase consistently from the top-100 list to the top-200 and top-300 lists. For example, many more content words with field-specific meanings, such as channel, output, and hardware, occur in the top-300 list. However, if we look at the proportions of these categories of vocabulary from a different perspective, namely their coverage of the total running words (tokens), the results are quite different. This is demonstrated by the two word frequency profiles also compiled in the study. They reveal that a very small number of word forms (GSL words, mostly function words) occur at very high rates, constituting nearly one-fourth of the whole corpus in terms of running words. On the other hand, low-frequency words (those occurring fewer than 10 times) account for more than half of the vocabulary (or types) of the corpus. This phenomenon poses an interesting question about vocabulary learning: should learners of academic writing learn high-frequency words or low-frequency words? Although low-frequency words do not recur often, they form the wide vocabulary repertoire that RA writers need to draw on, even if only once or twice. This finding thus has significant pedagogical implications.
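The contrast between token coverage and type coverage described above can be sketched as follows. This is an illustrative sketch only: the function name, the returned keys, and the toy data are assumptions, and the cutoff of 10 simply mirrors the threshold mentioned in the text.

```python
from collections import Counter


def frequency_profile(tokens, top_n=100, low_freq_cutoff=10):
    """Summarize a corpus from two perspectives.

    top_token_share: percentage of running words (tokens) covered by the
        top_n most frequent word forms.
    low_freq_type_share: percentage of distinct word forms (types) that
        occur fewer than low_freq_cutoff times.
    """
    counts = Counter(tokens)
    total_tokens = sum(counts.values())
    total_types = len(counts)
    if total_tokens == 0:
        return {"top_token_share": 0.0, "low_freq_type_share": 0.0}
    top_tokens = sum(freq for _, freq in counts.most_common(top_n))
    low_types = sum(1 for freq in counts.values() if freq < low_freq_cutoff)
    return {
        "top_token_share": 100 * top_tokens / total_tokens,
        "low_freq_type_share": 100 * low_types / total_types,
    }


# Toy data (hypothetical): one very frequent function word, two rare content words.
toy_tokens = ["the"] * 6 + ["of"] * 2 + ["router", "channel"]
print(frequency_profile(toy_tokens, top_n=1, low_freq_cutoff=2))
```

In the toy data a single word form covers 60% of the tokens, while half of the types occur only once, which is the same kind of asymmetry the two frequency profiles reveal.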

To learn how vocabulary use may reflect the field of research, a simple comparison is made of the 50 most frequent content word forms in the CS corpus, a TESOL corpus, and the BNC Written. The result reveals that words frequently used in the CS corpus are rather infrequent in the TESOL corpus or the BNC Written. We may thus conclude that the genre as well as the subject content of a corpus may influence the results of corpus-based vocabulary analysis. In addition, the vocabulary register characterized by field and genre should be taken into account in selecting target words for vocabulary learning. Field-specific words deserve more attention in EAP classrooms since they play an important role in the comprehension and production of academic texts.
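A cross-corpus comparison of this kind might be sketched as below. The sketch assumes each corpus is already tokenized, uses a small hypothetical function-word list to approximate the notion of content words, and normalizes frequencies per 1,000 tokens so that corpora of different sizes can be compared. None of these choices is taken from the study itself.

```python
from collections import Counter

# Hypothetical, deliberately short function-word list (assumption).
FUNCTION_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "are",
                  "for", "that", "this", "we", "on", "with", "as", "by"}


def top_content_words(tokens, n=50, function_words=FUNCTION_WORDS):
    """Return the n most frequent word forms that are not function words."""
    counts = Counter(t for t in tokens if t not in function_words)
    return [word for word, _ in counts.most_common(n)]


def compare_corpora(target, reference, n=50, per=1000):
    """Map each top content word of `target` to its normalized frequency
    (occurrences per `per` tokens) in the target and reference corpora."""
    target_counts, reference_counts = Counter(target), Counter(reference)
    comparison = {}
    for word in top_content_words(target, n):
        target_rate = per * target_counts[word] / len(target)
        reference_rate = (per * reference_counts[word] / len(reference)
                          if reference else 0.0)
        comparison[word] = (target_rate, reference_rate)
    return comparison


# Toy corpora (hypothetical) standing in for the CS and TESOL corpora.
cs_tokens = "the algorithm uses the channel and the algorithm".split()
tesol_tokens = "the learner reads the algorithm of the teacher and class".split()
print(compare_corpora(cs_tokens, tesol_tokens, n=1))
```

A word whose normalized frequency is much higher in the target corpus than in the reference corpus is a candidate field-specific word.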

As indicated earlier in this section, this study investigates move-signaling words in RAs. We therefore narrow the focus to a single section of RAs, the Introduction. Again, statistical analysis reveals that AWL words constitute an even higher percentage of the total vocabulary in the Introduction sub-corpus than in the whole CS corpus. However, the proportion of technical vocabulary (the off-list words) drops, which might result from the nature of the Introduction, in which general words are used more frequently.

To connect individual words with the rhetorical functions of the RA Introduction, that is, to find move-signaling words, move analysis is conducted. A self-developed coding scheme is used to identify all the moves in the text samples. The major and optional moves, as well as the 3-move and 4-move patterns representing the information structures of the RA Introduction, are further identified based on frequency and range. Results indicate that, among the six major moves, the combination of IL with IM, or vice versa, seems to be very common in both the 3-move and 4-move patterns, accounting for 4 of the 7 selected common move patterns. The other three common move patterns are IL-IG-IL, IL-IP-IM, and IB-IL-IG. Although the frequencies of these move patterns are not significantly high because of the small size of the corpus, they are pedagogically helpful since they exemplify how major and optional moves are used in combination in the Introduction, providing learners with useful information for writing this section.
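Identifying move patterns by frequency and range can be sketched as follows. The sketch assumes that each Introduction has already been hand-coded as an ordered list of move labels (the labels IL, IG, IM, IP, and IB come from the text above; the coded sequences themselves are invented for illustration), and it takes "range" to mean the number of texts in which a pattern appears.

```python
from collections import Counter


def move_patterns(coded_texts, length=3):
    """Count contiguous move sequences of the given length.

    Returns two Counters keyed by pattern (a tuple of move labels):
    frequency = total occurrences across all texts,
    text_range = number of texts containing the pattern at least once.
    """
    frequency = Counter()
    text_range = Counter()
    for moves in coded_texts:
        patterns = [tuple(moves[i:i + length])
                    for i in range(len(moves) - length + 1)]
        frequency.update(patterns)
        text_range.update(set(patterns))  # each text counts once for range
    return frequency, text_range


# Invented move codings for three Introductions (hypothetical data).
coded = [
    ["IB", "IL", "IG", "IL", "IM"],
    ["IL", "IG", "IL", "IG", "IL"],
    ["IL", "IP", "IM"],
]
frequency, text_range = move_patterns(coded, length=3)
print(frequency[("IL", "IG", "IL")], text_range[("IL", "IG", "IL")])
```

Keeping frequency and range separate prevents a pattern that repeats many times within one text from being mistaken for a pattern shared across many texts.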

Lexical bundles are fixed expressions that recur in a register or genre. As they occur consistently in a specific text type, they can reveal its important discourse functions. We thus examined lexical bundles in the Introduction subcorpus and the move-signaling words used to realize the rhetorical functions in the subcorpus of each move. In the Introduction subcorpus, we examined the five-word, four-word, and three-word lexical bundles, categorizing them into bundles that reflect the rhetorical functions of RAs and general academic bundles. The majority of the former characterize the rhetorical functions of IP and IO, such as "in this paper, we present" or "paper is organized as follows", while among the latter, bundles reflecting referential stance, such as "on the basis of" or "can be viewed as", are the most frequently employed category. This implies that IP and IO are highly conventionalized moves whose linguistic realization is largely fixed, a finding with significant pedagogical implications for both EAP teaching and learning.
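Lexical bundle extraction is commonly operationalized as counting recurrent word sequences subject to a minimum frequency and a minimum number of texts. A minimal sketch under that assumption follows; the thresholds, function name, and toy sentences are hypothetical and are not the settings used in the study.

```python
from collections import Counter


def lexical_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return n-word sequences that occur at least min_freq times overall
    and in at least min_range different texts, mapped to their frequency.

    `texts` is a list of token lists, one per text.
    """
    frequency = Counter()
    text_range = Counter()
    for tokens in texts:
        grams = [" ".join(tokens[i:i + n])
                 for i in range(len(tokens) - n + 1)]
        frequency.update(grams)
        text_range.update(set(grams))  # each text counts once for range
    return {gram: freq for gram, freq in frequency.items()
            if freq >= min_freq and text_range[gram] >= min_range}


# Toy Introductions (hypothetical), already lowercased and stripped of punctuation.
toy_texts = [
    "in this paper we present a router".split(),
    "in this paper we present a parser".split(),
    "the paper is organized as follows".split(),
]
print(lexical_bundles(toy_texts, n=4))
```

Running the same function with n=3 and n=5 yields the three-word and five-word candidates; sorting the candidates into rhetorical and general academic bundles remains a manual, qualitative step.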

We also investigated the move-signaling words and lexical bundles of some of the major moves to shed light on how they are used to realize the rhetorical functions of those moves. We first presented the move-signaling words observed in the high-frequency wordlists and then selected the high-frequency lexical bundles that characterize the rhetorical functions. The examination of high-frequency wordlists revealed words associated with the rhetorical functions, such as the reporting verbs in IP or the concessive sentence connectors used in IG. Likewise, the most frequently recurring lexical bundles of some moves directly express the rhetorical functions of those moves. We may conclude that examining language use in the subcorpus of each move helps reveal subtle linguistic features that are hard to notice when only the whole corpus is investigated.

Implications of the Study

The quantitative analysis of the study was based mainly on the construction of a corpus. The word frequency lists, the moves and common move patterns, and the lexical bundles were all derived from the analysis of the corpus with NLP tools, which illustrates how corpora can be used in vocabulary studies. Corpus-based results give researchers, teachers, and students access to language use in the real world instead of relying on intuition or made-up examples. Frequency is the most important kind of information that depends heavily on corpus studies. An understanding of how frequently words occur, and of how well they are covered by wordlists developed for different purposes, helps reveal the characteristics of the words to be studied. In addition, comparing word frequency lists across genres or fields can yield information about the composition of each list. This information not only reflects the characteristics of a genre but also helps teachers set appropriate learning goals that fit learners’ needs. On the other hand, many past studies have emphasized the importance of the GSL, indicating that its high coverage of texts is useful for comprehending texts of various types. Since the majority of English teachers lack specialist knowledge of their learners’ technical areas, specialized vocabulary such as technical vocabulary is often neglected. Although language teachers may not have knowledge of learners’ specialized areas, they can provide learning materials specifically designed for students’ fields, for example by constructing a wordlist for specific purposes.

Since most students may already have some command of the GSL, supplementing it with the AWL or specialized vocabulary may enhance their comprehension of specialized texts.

Finally, the learning of vocabulary should not be confined to the meanings of individual words. Rather, knowing how a word relates to its discourse function is important because words are meaningful when used in context. As a result, knowledge of the words that co-occur with a target word and of how the word is used in context, such as its collocations or lexical bundles, is important since it is the essence of language knowledge and distinguishes native speakers from non-native speakers.

Limitations and Future Research

The results of this research show that a corpus-based approach is insightful in exploring the nature of vocabulary, linking the use of vocabulary with its corresponding rhetorical function in RAs. Because of time limitations, some aspects worth investigating were not completed in this study. We thus provide a number of directions for future research. First, some of the results of our study are limited or not significant because of the small size of the corpus. To generalize the research results, a larger corpus should be used in future investigations. Second, since our study focuses only on the Introduction section of RAs, analyses of other sections are likely to be insightful for an understanding of the RA genre as a whole. Third, to identify the distinguishing characteristics of a discipline, future research might compare findings obtained from different research fields. Finally, comparing a native-speaker corpus with a learner corpus is likely to yield valuable information about learners’ needs and difficulties, providing a solid foundation for curriculum design and materials development.

APPENDIXES

Appendix A Sources

IEEE Transactions on Computers

Text 1a. Rexford, Jennifer, Hall, John, & Shin, Kang G., (1998). A router architecture for real-time communication in multicomputer networks. IEEE Transactions on Computers, 47, 10, 1088-1101.

Text 2a. Fiore, Paul D., (1999). Parallel multiplication using fast sorting networks. IEEE Transactions on Computers, 48, 6, 640-645.

Text 3a. Kumar, Vijay, Prabhu, Nitin, Dunham, Margaret H., & Seydim, Ayse Yasemin, (2002). TCOT - a timeout-based mobile transaction commitment protocol. IEEE Transactions on Computers, 51, 10, 1212-1218.

Text 4a. Park, Joonseok, Diniz, Pedro C., & Shayee, K. R. Shesha, (2004). Performance and area modeling of complete FPGA designs in the presence of loop transformations. IEEE Transactions on Computers, 53, 11, 1420-1435.

Text 5a. Ofek, Yoram, Yener, Bulent, & Yung, Moti, (1997). Concurrent asynchronous broadcast on the metanet. IEEE Transactions on Computers, 46, 7, 737-748.

Text 6a. Marcuello, Pedro, Gonzalez, Antonio, & Tubella, Jordi, (2004). Thread partitioning and value prediction for exploiting speculative thread-level parallelism. IEEE Transactions on Computers, 53, 2, 114-125.

Text 7a. Pineiro, Jose-Alejandro, & Bruguera, Javier Diaz, (2002). High-speed double-precision computation of reciprocal, division, square root, and inverse square root. IEEE Transactions on Computers, 51, 12, 1377-1388.

Text 8a. Chisholm, G. H., & Wojcik, A. S., (1999). An application of formal analysis to software in a fault-tolerant environment. IEEE Transactions on Computers, 48, 10, 1053-1064.

Text 9a. Schwiebert, Loren, (2001). Deadlock-free oblivious wormhole routing with cyclic dependencies. IEEE Transactions on Computers, 50, 9, 865-876.

Text 10a. Danysh, Albert, & Tan, Dimitri, (2005). Architecture and implementation of a vector/SIMD multiply-accumulate unit. IEEE Transactions on Computers, 54, 3, 284-293.

Text 11a. Phipatanasuphorn, V., & Ramanathan, P., (2004). Vulnerability of sensor networks to unauthorized traversal and monitoring. IEEE Transactions on Computers, 53, 3, 364-369.

Text 12a. Pedregal-Martin, C., & Ramamritham, K., (2002). Support for recovery in mobile systems. IEEE Transactions on Computers, 51, 10, 1219-1224.

Text 13a. Radhakrishnan, R., Vijaykrishnan, N., John, L. K., Sivasubramaniam, A., Rubio, J., & Sabarinathan, J., (2001). Java runtime systems: characterization and architectural implications. IEEE Transactions on Computers, 50, 2, 131-146.

Text 14a. Abdelzaher, T. F., & Shin, K. G., (2000). Period-based load partitioning and assignment for large real-time applications. IEEE Transactions on Computers, 49, 1, 81-87.

Text 15a. Mishra, P., & Srivastava, M., (1998). Effect of connection rerouting on application performance in mobile networks. IEEE Transactions on Computers, 47, 4, 371-390.

Text 16a. Zuberi, K. M., & Shin, K. G., (2000). Design and implementation of efficient message scheduling for controller area network. IEEE Transactions on Computers, 49, 2, 182-188.

Text 17a. Chanchio, K., & Sun, Xian-He, (2004). Communication state transfer for the mobility concurrent heterogeneous computing. IEEE Transactions on Computers, 53, 10, 1260-1273.

Text 18a. Vijaykrishnan, N., Kandemir, M., Irwin, M. J., Kim, H. S., Ye, W., & Duarte, D., (2003). Evaluating integrated hardware-software optimizations using a unified energy estimation framework. IEEE Transactions on Computers, 52, 1, 59-76.

Text 19a. Park, J., Diniz, P. C., & Shayee, K. R. S., (2004). Performance and area modeling of complete FPGA designs in the presence of loop transformations. IEEE Transactions on Computers, 53, 11, 1420-1435.

Text 20a. Sabbineni, H., & Chakrabarty, K., (2005). Location-aided flooding: an energy-efficient data dissemination protocol for wireless sensor networks. IEEE Transactions on Computers, 54, 1, 36-46.

IEEE Transactions on Pattern Analysis and Machine Intelligence

Text 1b. Chuang, J. H., Tsai, C. H., & Ko, M. C., (2000). Skeletonization of three-dimensional object using generalized potential field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 11, 1241-1251.

Text 2b. Senior, A., (2001). A combination fingerprint classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 10, 1165-1174.

Text 3b. Beiden, S. V., Maloof, M. A., & Wagner, R. F., (2003). A general model for finite-sample effects in training and testing of competing classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 12, 1561-1569.

Text 4b. Cordella, L. P., Foggia, P., Sansone, C., & Vento, M., (2004). A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 10, 1367-1372.

Text 5b. Lam, L., & Suen, C. Y., (1995). An evaluation of parallel thinning algorithms for character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 9, 914-919.

Text 6b. McCowan, I., Gatica-Perez, D., Bengio, S., Lathoud, G., Barnard, M., & Zhang, D., (2005). Automatic analysis of multimodal group actions in meetings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 3, 305-317.

Text 7b. Liu, C. L., Koga, M., & Fujisawa, H., (2002). Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 11, 1425-1437.

Text 8b. Borgefors, G., Ramella, G., & di Baja, G. S., (2001). Hierarchical decomposition of multiscale skeletons. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 11, 1296-1312.

Text 9b. El-Yacoubi, A., Gilloux, M., & Suen, C. Y., (1999). An HMM-based approach for off-line unconstrained handwritten word modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 8, 752-760.

Text 10b. Rocha, J., & Pavlidis, T., (1995). Character recognition without segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 9, 903-909.

Text 11b. Ahmed, M., & Ward, R., (2002). A rotation invariant rule-based thinning algorithm for character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 12, 1672-1678.

Text 12b. Singh, S., (2003). Multiresolution estimates of classification complexity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 12, 1534-1539.

Text 13b. Ho, Tin Kam, & Baird, Henry S., (1997). Large-scale simulation studies in image pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 10, 1067-1079.

Text 14b. Madhvanath, S., Kleinberg, E., & Govindaraju, V., (1999). Holistic verification of handwritten phrases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 12, 1344-1356.

Text 15b. Watanabe, M., & Nayar, S. K., (1997). Telecentric optics for focus analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 12, 1360-1365.

Text 16b. Havaldar, P., & Medioni, G., (1998). Full volumetric descriptions from three intensity images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 5, 540-545.

Text 17b. Starner, Thad, Weaver, J., & Pentland, A., (1998). Real-time American sign language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 12, 1371.

Text 18b. Jiang, Xiaoyi, (2000). An adaptive contour closure algorithm and its experimental evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 11, 1252-1265.

Text 19b. Fredembach, C., Schroder, M., & Susstrunk, S., (2004). Eigenregions for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 12, 1645-1649.

Text 20b. Marinai, S., Gori, M., & Soda, G., (2005). Artificial neural networks for document analysis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1, 23-35.

Computational Linguistics

Text 1c. Venkataraman, A., (2001). A statistical model for word discovery in transcribed speech. Computational Linguistics, 27, 3, 351-372.

Text 2c. Ploux, S., & Ji, H., (2003). A model for matching semantic maps between languages (French/English, English/French). Computational Linguistics, 29, 2, 155-178.

Text 3c. Wiebe, J., Wilson, T., Bruce, R., Bell, M., & Martin, M., (2004). Learning subjective language. Computational Linguistics, 30, 3, 277-308.

Text 4c. Kibble, R., & Power, R., (2004). Optimizing referential coherence in text generation. Computational Linguistics, 30, 4, 401-416.

Text 5c. Teahan, W. J., Wen, Yingying, McNab, R., & Witten, Ian H., (2000). A compression-based algorithm for Chinese word segmentation. Computational Linguistics, 26, 3, 375-393.

Text 6c. Navigli, R., & Velardi, P., (2004). Learning domain ontologies from document warehouses and dedicated web sites. Computational Linguistics, 30, 2, 151-179.

Text 7c. Silber, H. Gregory, & McCoy, Kathleen F., (2002). Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28, 4, 487-496.

Text 8c. Santamaría, C., Gonzalo, J., & Verdejo, F., (2003). Automatic association of web directories with word senses. Computational Linguistics, 29, 3, 485-502.

Text 9c. Keller, F., & Lapata, M., (2003). Using the web to obtain frequencies for unseen bigrams. Computational Linguistics, 29, 3, 459-484.

Text 10c. Fais, Laurel, (2004). Inferable centers, centering transitions, and the notion of coherence. Computational Linguistics, 30, 2, 119-150.

Text 11c. Stamatatos, E., Fakotakis, N., & Kokkinakis, G., (2001). Automatic text categorization in terms of genre and author. Computational Linguistics, 26, 4, 471-495.

Text 12c. Pevzner, L., & Hearst, Marti A., (2002). A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28, 1, 19-36.

Text 13c. Li, Hang, & Li, Cong, (2004). Word translation disambiguation using bilingual bootstrapping. Computational Linguistics, 30, 1, 1-22.

Text 14c. Mason, Zachary J., (2004). CorMet: a computational, corpus-based conventional metaphor extraction system. Computational Linguistics, 30, 1, 23-44.

Text 15c. Branco, António, (2002). Binding Machines. Computational Linguistics, 28, 1, 1-18.

Text 16c. Oflazer, Kemal, (2003). Dependency parsing with an extended finite-state approach. Computational Linguistics, 29, 4, 515-544.

Text 17c. Marchand, Y., & Damper, R., (2000). A multistrategy approach to improving pronunciation by analogy. Computational Linguistics, 26, 2, 195-219.

Text 18c. Ke, J., Ogura, M., & Wang, William S.-Y., (2003). Optimization models of sound systems using genetic algorithms. Computational Linguistics, 29, 1, 1-18.

Text 19c. Kehler, A., Bear, J., & Appelt, D., (2001). The need for accurate alignment in natural language system evaluation. Computational Linguistics, 27, 2, 231-248.

Text 20c. Teufel, S., & Moens, M., (2002). Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28, 4, 409-445.

Appendix B

achieve (achieves, achieved, achievement, achievements)
adapt (adapts, adapted, adaptive, adaptation)
adjacent
algorithm (algorithms)
align (aligns, aligned, alignment, alignments)
allow (allows, allowed)
analyze (analyzes, analyzed, analytical, analytically, analysis, analyses)
annotate (annotates, annotated, annotation, annotations, annotator, annotators)
approach (approaches)
assign (assigns, assigned, assignment, assignments)
associate (associates, associated, association, associations)
assume (assumes, assumed, assumption, assumptions)
author (authors)
bit (bits)
calculate (calculates, calculated, calculation, calculations)
candidate (candidates)
capture (captures, captured)
category (categories)
cell (cells)
channel (channels)
characteristic (characteristics)
chip (chips)
classify (classifies, classified, classification, classifications, classifier, classifiers)
cluster (clusters, clustering)
code (codes)
cohere (coheres, cohesion, cohesive, coherence)
column (columns)
commit (commits, committed, commitment)
communicate (communicates, communicated, communication)
compile (compiles, compiled, compiler, compilers)
complex (complexity)
component (components)
compute (computes, computed, computing, computer, computation, computational)
concept (concepts, conceptual, conceptually)
configure (configures, configured, configuration, configurations)
consist (consists, consisting)
constant
constrain (constrains, constrained, constraint, constraints)
consume (consumes, consumed, consumption)
context
contour (contours)
contrast (contrasts, contrasted, contrastive, contrastively)
contribute (contributes, contributed, contribution, contributions)
convention (conventions, conventional)
core (cores)
corpus (corpora)
correlate (correlates, correlated, correlative, correlation)
correspond (corresponds, corresponded, corresponding, correspondence, correspondences)
define (defines, defined, definition, definitions)
density
depend (depends, depended, dependency, dependencies, dependent, depending)
derive (derives, derived, derivation, derivations)
design (designs, designed)
distribute (distributes, distributed, distribution, distributions)
document (documents)
evaluate (evaluates, evaluated, evaluation)
generate (generates, generated, generation)
genre (genres)
identify (identifies, identified, identifying, identical, identification)
image (images)
impact (impacts)
implement (implements, implemented, implementation)
instruct (instructs, instructed, instruction)
interface (interfaces)
intermediate
internal (internally)
interpret (interprets, interpreted, interpretive, interpretative, interpretation)
inverse (inversely, inversion)
issue (issues)
iterate (iterates, iterated, iteration, iterations)
Java
logic (logical, logically)
loop (loops)
migrate (migrates, migrated, migrating, migration)
minimum
mobile
mode (modes)
modify (modifies, modified, modification, modifications)
module (modules)
normalize (normalizes, normalized, normal, normalization)
observed
obtain (obtains, obtained)
occur (occurs, occurred, occurring, occurrence, occurrences)
OCR
potential
precision (precise, precisely)
predict (predicts, predicted, prediction, predictions, predictor, predictors)
preprocess (preprocesses, preprocessed, preprocessing)
process (processes, processed, processing, processor, processors)
property (properties)
relate (relates, related, relation, relations, relationship, relationships)
relative (relatively)
relevant (relevance)
rely (relies, relied, reliance)
remove (removes, removed, removal)
represent (represents, represented, representation)
require (requires, required, requirement, requirements, requiring)
rerouting
research
schedule (schedules, scheduled, scheduler, scheduling)
scheme (schemes)
simulate (simulates, simulated, simulating, simulation)
skeleton (skeletons, skeletonization)
slot (slots)
speculate (speculates, speculated, speculative, speculatively, speculation)
statistical (statistically)
syntactic (syntactically)
transform (transforms, transformed, transformation, transformations)
transition (transitions)
via
virtual (virtually)
web (webs)
whereas
wireless
word-net (word-nets)

Note:

1. The CS wordlist here contains only 335 word families which cover 80% out of
