2.2 Chatbot Research in Language Learning and Teaching

2.2.2 Studies With Neutral or Positive Results

Jia (2004b) developed a chatbot system called Computer Simulation in Educational Communication (CSIEC), believing that students unable to find English-speaking partners to practice English with would find the system helpful for its almost ubiquitous availability; moreover, some students might feel more confident talking to a chatbot than to a real person. Adaptations were continuously made to refine the CSIEC system (Jia, Hou, & Chen, 2006), such as adding avatars and offering both open-ended and limited chat for beginning English learners.

After the 2006 adaptations, the CSIEC system was evaluated in Jia and Chen (2008). The evaluation involved 45 high school students, and the researchers adopted questionnaires, observations, teacher surveys, and student focus groups to formatively evaluate the use of CSIEC in an English classroom. The results indicated that students' most positive reactions concerned reviewing unit content and maintaining interest in English, and that 60.50% of the users liked the system and would continue using it after the research. However, as the authors pointed out in the conclusion, their evaluation of the conversation logs relied solely on the number of rounds of user input and chatbot output, and they suggested further analyses of the content to examine the quality of the chat. The current study will therefore focus on syntactic complexity and the frequencies of various structures to explore the conversation logs in greater detail.
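
To make this contrast concrete, the following Python sketch illustrates the difference between merely counting chat rounds and profiling the syntax of learner turns. The log format, the marker list, and the complexity proxies are illustrative assumptions for exposition, not the actual instruments of Jia and Chen (2008) or of the current study.

import re

# Hypothetical log: alternating (speaker, utterance) pairs.
log = [
    ("user", "I like the song that we learned today."),
    ("bot", "What do you like about it?"),
    ("user", "Because the melody is beautiful and the words are easy."),
    ("bot", "I see."),
]

# Round counting: the kind of surface measure reported in Jia and Chen (2008).
rounds = sum(1 for speaker, _ in log if speaker == "user")

# Crude syntactic-complexity proxies: mean utterance length in words and
# the count of subordinating markers (a small, illustrative list).
markers = re.compile(r"\b(that|because|which|when|if|although)\b", re.I)
user_turns = [u for s, u in log if s == "user"]
mean_len = sum(len(u.split()) for u in user_turns) / len(user_turns)
subordination = sum(len(markers.findall(u)) for u in user_turns)

print(f"rounds={rounds}, mean_length={mean_len:.1f}, markers={subordination}")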

In a survey study report, Fryer and Carpenter (2006) made several positive observations about chatbot applications in language learning. Most chatbots are now either free or reasonably priced via subscription and can serve as platforms for interactive language practice (Fryer & Carpenter, 2006). Award-winning bots like A.L.I.C.E., Joan, George, Cleverbot, and Suzette are all freely accessible online. Language instructors can also create their own chatbots to support course material via the scripting modules offered for a yearly fee on Jabberwacky (http://www.jabberwacky.com) and Pandorabots (http://www.pandorabots.com), or for free on Chatbot4U (http://www.chatbot4u.com). In addition, the abundance of chatbot programs and resources available online could enable learners to find the ones most suitable for them and to keep up the novelty. With the development of AI technologies, chatbots nowadays might hold the potential to serve as affordable and ubiquitous learning tools for language learners.

In addition to cost and convenience, Fryer and Carpenter (2006) pointed out six useful ways of using chatbots in language learning. To begin with, some students at the fledgling stage of English learning might feel more secure talking to computers. Fryer and Carpenter found that 85% of the 211 learners felt more comfortable conversing with A.L.I.C.E. than with a real person; in the same study, 74% of the students described Jabberwacky as funny and entertaining to talk to when instructed to write their reflections after a twenty-minute chat. Second, chatbots can be structured to repeat the same materials for learners and do not get bored with the repetition. Third, some chatbots offer text and synthesized audio simultaneously, allowing students to practice multiple skills in the target language. Since some bots are now equipped with voice synthesis and voice recognition mechanisms, they can also serve as avenues for practicing writing, reading, and even listening and speaking skills. Fourth, chatting with chatbots is potentially motivating: it conveys to students in EFL settings that English is a tool for communication, not merely another subject to be tested. Fifth, conversing with chatbots gives students opportunities to review and recycle what they have learned in the classroom. Finally, some chatbots can also be programmed to respond to specific grammar or vocabulary usage to a certain degree.

Despite these prospects, previous studies on chatbots in language learning have produced mixed results.

Coniam (2008a) examined five chatbots for their linguistic potential as language learning tools for learners of English as a second language (ESL). Coniam recruited undergraduate ESL trainee teachers rated at least seven on the IELTS to simulate both competent language users and less competent language users, the latter by using misspelt words or ungrammatical structures that beginning learners might produce. On average, each chatbot was chatted with for over six hours. Textual analyses were conducted after all the transcripts were collected. George, the winner of the 2005 Loebner Prize competition, performed best at the word, sentence, and text levels of linguistic resources. In addition, Lucy's ability to occasionally point out grammatical errors raised its educational value. Therefore, Lucy and George were considered more suitable for ESL learners. The results of Coniam's (2008a) study are illustrated in Table 1. Coniam (2008b) further investigated six selected chatbots for their exterior features; the findings are presented in Table 2. Coniam concluded that Lucy had the best user interface.

Table 1: Evaluation Summary of Linguistic Resources (Reproduced from Coniam, 2008a)

Though both papers described the existing chatbots as less than perfect at that stage, Coniam acknowledged the progress made in the technology and was optimistic that some issues would eventually be resolved as further improvements were made.

Table 2: Evaluation Summary of Interface (Reproduced from Coniam, 2008b)

[Interface features compared across Cybelle, Dave, George, Jenny, Lucy, and Ultra Hal; table body not reproduced]
It should be noted that some of the chatbots evaluated in these studies have already been removed and are no longer accessible, or have been upgraded in some respects. For example, George, which was rated poorly for its lack of an avatar, was operating with a three-dimensional avatar as of 2013.

Huang, Lin, Yang, and Wu (2008a) conducted a pilot study on the participant-perceived effectiveness of a customized chatbot called English Dialogue Companion (EDC). Designed for elementary school students, EDC included three activities: a companion selection phase, a conversation phase, and a teaching phase. In the first phase, students were directed by the system to select an image to represent the system and another to represent themselves. In the conversation phase, both EDC and the user could select vocabulary flashcards on the screen and initiate conversations about the vocabulary. In the teaching phase, learners could add new vocabulary and flashcards to the system and teach the system to correctly pronounce the new words. In the pilot study, 34 fifth graders went through the three phases of EDC and filled out a survey, and eight students were randomly selected for an interview. The results were positive. On a six-point Likert scale, the mean values of all sections were above 5.27. Most students regarded EDC as their classmate (52%) or teacher (24%). It was also found that students selected avatars of the same gender, found the system encouraging, and preferred Chinese as the interface display language.
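
As a rough illustration of the three-phase flow described above, the following Python sketch models the phases as simple functions; all names and data structures here are hypothetical and are not drawn from Huang et al.'s (2008a) implementation.

# Phase 1: the learner picks one image to represent the system and one
# to represent themselves.
def companion_selection(avatars):
    return avatars[0], avatars[1]  # stand-in for an interactive choice

# Phase 2: either party selects a vocabulary flashcard and starts a
# conversation about that word.
def conversation(flashcards):
    for word in flashcards:
        print(f"EDC: Let's talk about '{word}'. Can you use it in a sentence?")

# Phase 3: the learner adds a new flashcard and teaches the system the
# word (in the actual EDC, including its pronunciation).
def teaching(flashcards, new_word):
    flashcards.append(new_word)
    return flashcards

cards = ["apple", "river"]
companion_selection(["owl", "cat", "dog"])
conversation(cards)
teaching(cards, "bridge")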

Huang et al. (2008b) further conducted an experimental study of EDC on the oral performance and motivation of 66 elementary school students in Taiwan. The EDC was equipped with a speech recognition engine, Sphinx-4. The study adopted a between-subjects design with 33 students in each condition. The control group received instruction as usual, while the experimental group chatted with the EDC system for twenty minutes per week over a five-week span.

After the oral posttest, fifteen students were randomly invited to an interview to explore their experience with the machine. An analysis of covariance (ANCOVA) was performed using the pretest score as the covariate. The statistical analysis indicated no significant difference in English oral achievement, but the researchers noted that over half of the lower-level students improved on the posttest. However, the participants were not further divided into operationalized English levels, so no further analysis using initial proficiency as a factor was performed. The questionnaire findings indicated that 88% of the participants felt the system kept them attentive to the learning process, 91% felt that English communication is important, and 94% expressed satisfaction with the system and would like to use it to practice conversation. The researchers concluded that the EDC system might be more beneficial to lower-level students.
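
For reference, an ANCOVA of this design can be sketched in Python as an ordinary least squares model with the pretest as the covariate. The statsmodels formula API shown here is real, but the scores below are synthetic placeholders solely so the example runs; they are not data from Huang et al. (2008b).

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic placeholder scores (the actual study had 33 students per group).
data = pd.DataFrame({
    "group": ["control"] * 4 + ["edc"] * 4,
    "pretest": [52, 60, 71, 48, 55, 63, 69, 50],
    "posttest": [58, 66, 74, 55, 64, 70, 75, 60],
})

# ANCOVA as OLS: posttest ~ pretest (covariate) + group (factor).
model = smf.ols("posttest ~ pretest + C(group)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # F-test for the group effect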

Sha (2009) conducted a two-stage study consisting of an evaluation of three chatbots, Jabberwacky, Ultra Hal, and Verbot, and a survey of the participants' reactions to Verbot, an AIML-based chatbot equipped with a two-dimensional avatar, text-to-speech (TTS), and a speech recognition engine (SRE). In the evaluation phase, Sha criticized Jabberwacky for its lack of an avatar, a shortcoming also noted by Coniam (2008b), and for its slow response rate. The slow response rate of George, one of the Jabberwacky bots, was likewise confirmed in Chen (2012), whose participants perceived significant delays in this specific bot's responses. Ultra Hal, on the other hand, uses keyword spotting; in the evaluation, where Sha posed identical benchmark questions to acquire comparable output data, Ultra Hal's responses were mostly irrelevant to the questions, while Verbot responded perfectly and even humorously at times. In the survey phase, fifteen volunteers filled out a reaction survey. When asked about interest, 13% felt "amused and started laughing" and 80% thought it was "interesting" to talk to the bot. None of the participants felt "bored or nervous" chatting with the machine, and 73% agreed that they would use the chatbot on their own computers if possible. As for perceived effectiveness, 33% reported that Verbot responded correctly to "most of my questions", 49% said "only a few of my questions", and 27% said they were sometimes unsure what the chatbot meant. A more open-ended questionnaire item found that typing errors (20%), spelling errors (13%), unnatural synthesized voice (27%), and Verbot's vocabulary (20%) were all issues making the communication difficult. Sha concluded by addressing the English level users need as a prerequisite for communication with the machine to be possible: without a moderate command of English, users tend to have problems in text chat and may produce speech that is unrecognizable to voice recognition technology.
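
Keyword spotting, the mechanism attributed to Ultra Hal above, can be sketched in a few lines of Python: the bot scans the input for known keywords and returns a canned reply for the first hit. The keyword table below is an illustrative assumption, not Ultra Hal's actual rule set.

# Hypothetical keyword-to-reply table.
RESPONSES = {
    "music": "I enjoy music too. Who is your favourite singer?",
    "school": "What subject do you like best at school?",
    "weather": "It looks sunny today, doesn't it?",
}

def keyword_reply(user_input):
    words = user_input.lower().split()
    for keyword, reply in RESPONSES.items():
        if keyword in words:
            return reply
    # With no keyword hit, the bot falls back to a generic line, which is
    # why replies can drift off-topic, as Sha (2009) observed.
    return "That's interesting. Tell me more."

print(keyword_reply("I listened to music after school yesterday"))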

Chen (2012) analyzed the conversation transcripts and user experience of seven high school students in Nantou, Taiwan, who used Bruce Wilcox's Rosette, the 2011 winner of the Loebner Prize competition, and two Jabberwacky bots, George and Cleverbot. Participants chatted with all three chatbots for at least thirty minutes each. Two junior high and three senior high school participants were then further sampled to demonstrate interaction with all three chatbots for ten minutes each. Following each ten-minute chat, participants completed a questionnaire for that specific chatbot and verbally reported their experience to the researcher on site. All the logs were collected for analyses of vocabulary range and coherence. It was found that, in general, Rosette, a ChatScript bot, used more vocabulary belonging to higher bands of the British National Corpus (BNC) than Cleverbot, a Jabberwacky bot. This is possibly because ChatScript bots like Suzette and Rosette are rule-based bots whose answers are deliberately drafted by the botmaster for each theme of conversation, while Jabberwacky bots simply recycle what humans have said to them online as their own language source. In terms of coherence, Rosette likewise outperformed Cleverbot, possibly also owing to the difference in their designs. In the qualitative data, it appeared that learners prefer short and clear responses from chatbots, and the use of online dictionaries was common among both junior high and senior high school students.
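
The vocabulary-band analysis described above can be sketched as follows: classify each word in a transcript by its BNC frequency band and count the hits per band. The tiny band sets here are illustrative stand-ins for the real BNC frequency lists used in Chen (2012).

# Illustrative stand-ins for BNC frequency bands (1 = most frequent).
BANDS = {
    1: {"the", "i", "you", "is", "and", "like", "good"},
    2: {"weather", "music", "travel", "holiday"},
    3: {"melancholy", "intricate", "profound"},
}

def band_profile(text):
    counts = {band: 0 for band in BANDS}
    for word in text.lower().split():
        for band, vocab in BANDS.items():
            if word in vocab:
                counts[band] += 1
                break
    return counts

print(band_profile("i like the intricate and profound music"))
# -> {1: 4, 2: 1, 3: 2}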

Based on the findings of the studies cited above, it can be seen that the studies yielding neutral or positive results were predominantly conducted via surveys. Learners generally expressed motivation to converse with chatbots as language learning companions. Those studies, however, appear to focus on the learners' perceptions of the chatbots' responses rather than on the learners' responses themselves, with Huang et al.'s (2008b) study being the only one that evaluated learners' production in machine-human chat. The current study will thus focus more on the development of learners' responses in chatbot-human interactions.

For survey studies, the duration of exposure for the participants ranged from twenty minutes (Fryer & Carpenter, 2006) and thirty minutes (Chen, 2012) to roughly six hours (Coniam, 2008a). For experimental studies measuring the incremental effects of bots, the only study cited lasted one hundred minutes over a five-week span (Huang et al., 2008b), roughly twenty minutes per week. The current study lengthens Huang et al.'s (2008b) treatment by one week and by ten minutes per week, that is, thirty minutes per week for six weeks, totaling 180 minutes. As language development is an incremental process, it is arguable that a longer duration of treatment might more accurately demonstrate the potential benefits of applying chatbots to language learning.