English as a foreign language (EFL) education in Taiwan has received growing attention from the government, and compulsory English education is now available nationwide from the elementary to the senior high school level. In the hope of promoting beneficial washback on teaching as well as evaluating students’ overall English proficiency, an English listening assessment has recently been added to the Department Required Test for senior high school students from 2013, and an easier version will become a regular test for junior high school students from 2014. This emphasis on more well-rounded English training reflects the Taiwanese government’s ambition to equip students with well-rounded English competence instead of treating the target language as simply a core subject of high-stakes examinations.
Nonetheless, several almost unavoidable problems are likely to render English training in Taiwan less effective. To begin with, Taiwan is an EFL setting in which English is not widely used. Most students, except for those from exceptionally well-off families, have little chance to experience using English as a tool of communication in authentic settings. The Chinese-dominant environment in Taiwan also deprives English learners of opportunities to use English communicatively. Secondly, the test-driven pressure embedded in the educational system might distract English learners from acquiring the ability to communicate. For many students, most exposure to English might come purely from the formal classroom setting, where instruction usually focuses on the grammatical concepts, vocabulary, and sentence patterns to be evaluated. Students might thus have less time to practice their communication skills interactively in the target language.
However, according to interactionism, interaction is one important element of language acquisition.
Interactionism or the interactionist position of language development involves a range of language acquisition theories that emphasize the importance of linguistic environment (Lightbown & Spada, 2004) and verbal interaction (Ellis, 2008) that learners are engaged in.
This view presumes some linguistic knowledge from the learners’ side, and considers meaningful interactions necessary for learners to acquire language structures (ESL Glossary, n.d.). It is through production and reception of the target language that learners keep constructing their knowledge about the language and become gradually aware of the underlying system. The interactionist position is different from the other theoretical frameworks that deem language learning understandable through language analysis of learners’ utterances regardless of the accompanying social interactions or cognitive development (Richards & Schmidt, 2002).
The interactionist position can also be further divided into a spectrum of interactionist models based on their varied views of whether language acquisition occurs based on the linguistic raw material in the form of modified input or in the interactions of learner and interlocutor (Ellis, 2008; Lightbown & Spada, 2004). Brown (2000) also enumerated interaction hypothesis, intake through social interaction, output hypothesis, authenticity, task-based instruction, and High Input Generators (HIGs) in a review of social constructivist models of second language acquisition (SLA).
One way to counter Taiwan’s lack of English interaction is to capitalize on online English learning resources to maximize learning opportunities. Nowadays, the Internet serves as a bountiful source of potential English learning materials. News websites such as Voice of America (VOA) and Cable News Network (CNN) offer simplified news articles, audio, and videos for English learners. In addition, the advancement of technology has brought numerous cutting-edge tools, including speaking trainer websites like MyET and EnglishCentral for English learners who hope to calibrate their pronunciation and improve their speaking skills. These online resources make English language learning more accessible than ever. Though some of them are commercial programs that must be purchased, free resources are also common on the Internet. English instructors in Taiwan can capitalize on the potential of technology by appropriately integrating these resources into their regular instruction or using them as supplementary learning tools.
Chatbots are computer applications that aim to maintain human-like conversations with a human user, and most are available directly via the Internet. They are a resource for students in rural areas because they offer additional linguistic input and might promote productive use of English.
1.3 Purposes of the Study and Research Questions
The purposes of this study are twofold. First, this study aims to explore the potential effects of conversing with virtual conversational agents. It examines whether interacting with three selected chatbots has an impact on students’ written English production. One advantage of chatbots is their variety and abundance: students can turn to a different bot before they become bored with the previous one (Fryer & Carpenter, 2006). The current study therefore builds on that proposed advantage and investigates the overall effects of chatting with a group of three chatbots.
The second purpose of this study is to delineate learners’ perceptions of different kinds of chatbots. The chatbots currently available online are built on several different underlying mechanisms, each claimed by its designers to be more advantageous than the others in simulating human-like conversations. Chatbots with different underlying mechanisms also come equipped with different supporting features. Students’ perceptions of those features will also be compared. This study will address the following five research questions:
1. Does the participants’ syntactic complexity change after chatting with chatbots for six weeks?
2. Do the participants’ grammatical accuracy and message appropriateness change after chatting with chatbots for six weeks?
3. Does the participants’ fluency change after chatting with chatbots for six weeks?
4. Do the participants’ syntactic complexity and fluency change when interacting with different chatbots?
5. Do students perceive the chatbots constructed using varied mechanisms differently?
1.4 Significance of the Study
Some previous studies on the application of chatbots in language learning have been limited to the development and evaluation of new or modified chatbots, including Freudbot (Heller, Procter, Mah, Jewell & Cheung, 2005), Verbot (Sha, 2009), Let’s Chat (Stewart & File, 2007), Aghate (Williams & van Compernolle, 2009), ALICE bot clones (Jia, 2004a; Schumaker & Chen, 2010), Skynet (Horrigan, 2009), Computer Simulator in Educational Communication (CSIEC) (Jia, 2004b, 2008), and Coniam’s two studies (Coniam, 2008a, 2008b), which addressed chatbots’ linguistic resources and interface design, respectively.
Others explored learners’ perceptions of chatbots (Fryer & Carpenter, 2006; Heller et al., 2005; Jia, 2004a; Jia & Chen, 2008). Only one study (Huang, Lin, Yang & Wu, 2008) was found to examine the effects of chatbots on language proficiency, specifically the oral proficiency of 66 elementary school students in Taiwan. Though the study by Huang et al. (2008) did not find any significant effects, it is arguable that more recent chatbots are better equipped technologically to maintain a human-like conversation and might benefit language learning more. Furthermore, Huang et al. (2008) sampled primary school students, and a sample from a public senior high school might yield a different picture. This study also looks at the influence of different chatbots on learners’ use of language. Each chatbot is built using one of the available mechanisms, and some mechanisms might be more suitable than others for sustaining conversations with EFL learners, eliciting more varied structures, or promoting writing complexity.
The results of the current study might contribute to the literature on the application of chatbots in language learning. Moreover, the findings could help shed light on the effectiveness of more recent chatbots and the impacts of different underlying mechanisms on text chat quality.
CHAPTER TWO LITERATURE REVIEW
The following literature review first offers a theoretical foundation for incorporating chatbots in language acquisition. What follows is a review of chatbots, the synonymous terms to be used in this study and the chatbot contests currently available worldwide that evaluate and acknowledge up-to-date chatbots. The underlying mechanisms of chatbots are then explored to pinpoint the rationales of comparing different chatbots.
2.1 Introduction to Chatbots
2.1.1 Definition
Chatbots are computer programs that simulate and sustain, through voice, text, or audiovisual means, a conversation on general or specific topics with one or more human entities using natural language (Abu Shawar & Atwell, 2007b). Several other labels have also been used to refer to chatbots, including conversational agents (Goh & Fung, 2008; Horrigan, 2009; Kerly, Ellis & Bull, 2008; Perry, Hall & Ellis, 2008), dialogue systems (Abu Shawar & Atwell, 2007; Huang et al., 2008a, 2008b; Schumaker, Ginsburg, Chen & Liu, 2006; Stewart & File, 2007; Webb, Benyon, Hansen & Mival, 2010), natural language dialogue systems (Schumaker & Chen, 2010), and chatterbots, chatbots or bots (Abu Shawar & Atwell, 2005, 2007a, 2007b; Kirakowski, O’Connell & Yiu, 2009; Sha, 2009; Vrajitoru, 2006). The website chatbots.org lists 158 synonyms for the technology, including chat bot, chatbot, chatterbot, conversational agent, virtual agent, virtual assistant, virtual human, and more than 150 other alternatives. As the website’s introductory article on chatbots indicates, these terms emphasize different aspects of the technology and are employed by different groups of professionals.
As indicated on chatbots.org, chatbots are available in at least twenty-seven languages, though English chatbots are the most dominant category, with nearly five hundred chatbot profiles registered in 2012. Research on chatbot applications in languages other than English, such as Mandarin Chinese (Goldie, 2011; Hsieh, 2011), French (Williams & van Compernolle, 2009), and Afrikaans (Abu Shawar & Atwell, 2003), an official language of South Africa, has also been conducted.
Chatbots are constructed using various components, though most are equipped with at least a text message bar for text chat. Coniam’s (2008b) study on chatbot interfaces introduced chatbots that are represented by a three-dimensional avatar, a two-dimensional animated agent, a static avatar image, or a faceless text bar. Some chatbots also come equipped with a text-to-speech (TTS) component and a speech recognition engine (SRE), though the quality of the SRE is usually limited (Lai, 2005) due to current technological limitations and the high costs of quality SRE applications. There are also stand-alone chatbots (e.g. Verbot) that can be downloaded for personal use and web-based chatbots (e.g. ALICE bots) available to everyone with the link to the site (www.chatbots.org). Some chatbots, such as Cleverbot, offer a complete transcript of the conversation history, while others, like Jabberwacky, are able to converse in several different languages.
In the current study, the labels “chatbot” and “bot” are used consistently to refer to any computer program that emulates human-like conversations with a human entity, regardless of any other functions it possesses, its underlying architecture, or the original purpose for which it was constructed.
2.1.2 Development of Chatbots
Introduced in 1966 by Joseph Weizenbaum, an artificial intelligence (AI) pioneer interested in determining the appropriate modality of human-computer interactions, ELIZA is the ancestor of chatbots (www.chatbots.org). It employs very simple algorithms to simulate a Rogerian therapist, frequently reformulating the user input into reflective questions so that users ponder their own issues (Last, 1989). ELIZA is a rule-based chatbot that locates keywords in chat texts, assembles sentences from fragments, and reacts to the text keyed in by users, much as an actress follows a predetermined script and performs accordingly (Weizenbaum, 1985; cited in Last, 1989). The illusion of intelligence deteriorates once a human entity notices that ELIZA can only roughly mimic the way a Rogerian psychotherapist interacts with a patient and has no discourse framework or natural language processing (NLP) component of its own (Dodigovic, 2005).
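The keyword-and-reformulation approach described above can be illustrated with a minimal sketch. It assumes a tiny hand-written rule table and a small set of pronoun swaps; the rules, the replies, and the function names `reflect` and `respond` are hypothetical and are not taken from Weizenbaum’s original script.

```python
import re

# Hypothetical rule table: (keyword pattern, response template).
# These rules are illustrative only, not ELIZA's original script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]

# Pronoun swaps so that a reused fragment reads from the bot's point of view.
REFLECTIONS = {"i": "you", "me": "you", "my": "your",
               "am": "are", "you": "I", "your": "my"}

# Fallback used when no keyword is found in the input.
DEFAULT_REPLY = "Please go on."


def reflect(fragment: str) -> str:
    """Swap first- and second-person words in a fragment of user input."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(user_input: str) -> str:
    """Return the reply of the first rule whose keyword pattern matches."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reassemble a sentence from the template and the user's own words.
            return template.format(reflect(match.group(1)))
    return DEFAULT_REPLY


print(respond("I need my coffee."))
print(respond("Hello"))
```

Because the sketch only matches surface keywords and has no model of the discourse, any input outside the rule table falls back to a stock reply, which is one way the illusion of intelligence breaks down.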
According to Richard Wallace, in the past decade, there were three phases of chatbot development (Chatbots, n.d.). The first stage is marked by keyword matching imitations of human-human interactions. The second stage is characterized by the introduction of the Internet, about thirty years after ELIZA was created. The third stage starts when chatbots are combined with other advanced technologies such as voice recognition, synthesis, or syntactic parsers.
Since the introduction of ELIZA in the 1960s, variations of ELIZA have been combined with other databases of information (Beatty, 2003), and chatbot technology has been refined for half a century, to the point where chatbots can replace or at least supplement humans in some tasks.
Abu Shawar and Atwell (2007a) illustrated four categories of useful chatbot applications: entertainment, language learning and practice, information retrieval, and e-commerce. Entertainment was the initial aim of building chatbot systems. ELIZA was the first chatbot built to amuse its interlocutors, and other more recent bots have also been created in part for this purpose. For example, the chatbots designed to pass the Turing Test are in part constructed to entertain users by convincing them that they are conversing with real humans. Secondly, chatbots can also serve as a tool for language learning and practice.
Despite some existing technical limitations, Abu Shawar and Atwell (2007a) found that the users of their ALICE bots considered chatbots to be interesting tools for practicing the target language even when the language was not fully understood by the researchers or the computer. They concluded that chatbots could be used as a tool to learn a foreign language (Abu Shawar & Atwell, 2007b) even with the current keyword-based matching technique, and argued that chatbots retrained using a corpus (Abu Shawar & Atwell, 2005) could offer a useful autonomous alternative to traditional conversational practice. Thirdly, chatbots are also used as information retrieval tools, usually in the form of frequently asked questions (FAQ) bots or supplementary tools in e-learning environments. Although it is popular to use search engines to find information, Abu Shawar and Atwell (2007a) found that users of FAQ bots preferred the bots’ ability to give direct answers instead of just URL links, and that chatbots’ answers were usually shorter, even when some links were involved. The researchers concluded that FAQ bots are at least as viable as regular search engines.
Another variation of information retrieval is the conversational agent in e-learning contexts. Kerly et al. (2008) introduced CALMsystem, a chatbot system designed to describe learners’ understanding of content areas during question drills, and TeachBot, a chatbot created to guide activities step by step using pre-scripted language. Hsieh (2011) designed Confucius, a chatbot system that assists students in practicing class content in a question and answer (Q&A) format. Schumaker et al. (2006) developed three versions of ALICE bots for university students that differed in their conversational knowledge and their content area knowledge of telecommunications. Heller et al. (2005) used AIML to construct Freudbot, which chats with psychology students in the first person about Sigmund Freud’s theories and contributions. E-learning agents serve as a bridge between the teacher and the student, but the chatbots need to be specifically tailored to the online or blended courses that adopt the technology. The fourth application is e-commerce, business, and other domains. It is now common for many kinds of businesses to have their own online chatbot agents for information service. On chatbots.org, the English-speaking chatbots are categorized in terms of consumer themes. In 2012, twenty-six categories of business had been established for the chatbots serving as business representatives online (www.chatbots.org). Figure 1 and Figure 2 offer two examples of such an application.
Figure 1: Online Interactive Q&A Service of Copa Airlines
Figure 2: Online Interactive Q&A Service of IKEA
Although not without limitations, chatbots seem to have become an integral part of various aspects of our lives. They have also made many human tasks more efficient and effective than before.
2.1.3 The Turing Test
Alan Mathison Turing is one of the most prominent pioneers of artificial intelligence. In his paper titled “Computing Machinery and Intelligence” (Turing, 1950), he argued that if, under the same conditions, a human interrogator in “the Imitation Game” mistakes conversations with a machine for conversations with a male or female human at least 30% of the time, the machine must possess intelligence within the qualified definition. Although some strategies might be used merely to imitate human conversations, this concern might be obviated by the fact that, to pass the test, a machine must be able to respond to questions appropriately. The embodiment of such a test, which differentiates conversations with a machine from those with a person, has since been called the Turing Test, and it serves as the standard proposed by Turing himself for a machine to be called “intelligent.”
Turing was optimistic about the development of computers to pass the proposed Imitation Game as he stated below:
I believe that in about fifty years' time it will be possible, to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning. The original question, "Can machines think?" I believe to be too meaningless to deserve discussion (Turing, 1950).
Although not without its critics (Oppy & Dowe, 2011), the Turing Test remains the most widely adopted test for evaluating artificial intelligence technology.
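The threshold in Turing’s prediction can be expressed as a simple check: a machine meets the criterion when interrogators make the right identification in no more than 70 percent of trials, which is the same as being mistaken at least 30 percent of the time. The sketch below is only an illustration of that arithmetic; the function name, its parameters, and the example counts are hypothetical and do not come from any contest protocol.

```python
def meets_turing_threshold(correct_identifications: int,
                           total_trials: int,
                           max_correct_percent: int = 70) -> bool:
    """Check whether interrogators were right in at most max_correct_percent of trials.

    Integer arithmetic is used so that the boundary case (exactly 70 percent)
    is not affected by floating-point rounding.
    """
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    if not 0 <= correct_identifications <= total_trials:
        raise ValueError("correct_identifications must be between 0 and total_trials")
    return correct_identifications * 100 <= max_correct_percent * total_trials


# Hypothetical example: 10 five-minute sessions, 7 right identifications.
print(meets_turing_threshold(7, 10))   # exactly 70 percent, criterion met
print(meets_turing_threshold(8, 10))   # 80 percent, criterion not met
```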
2.1.4 Chatbot Contests
Following Alan M. Turing’s proposal of the Imitation Game in 1950, contests aiming to determine the human-likeness of candidate programs have proliferated. In 2012, the website chatbots.org listed six contests related to AI and chatbot technology, two of which had been running for over ten years.
In 1991, Hugh Loebner initiated the first administration of the Turing Test in history (Loebner, 2009), now called the Loebner Prize competition for artificial intelligence. The contest offers a $100,000 monetary prize and an 18-carat gold medal to any program that passes the Turing Test. Considering the limitations of the technology at the time, two restrictions were initially imposed on Turing’s proposed test to render a pass more attainable: limiting the range of topics and limiting the use of trickery on the judges’ side. Shieber (1993) critiqued the two additional rules and questioned the necessity of the Loebner Prize competition. Changes were later made to allow open-ended interactions between the judges and the entrants.
Until now, however, no chatbot system has successfully passed the test; instead, a smaller amount is given annually to the program with relatively better performance compared with the other competitors without any absolute criteria (Loebner Prize, n.d.).
Ten years after the launch of the Loebner Prize competition, the Chatterbox Challenge (CBC) was launched as another chatbot event in 2001. Shortly afterwards, more such events began to take place annually, including the Gathering of Animated Lifelike Agents (GALA) from 2005, the BCS Machine Intelligence Prize from 2006, and the 2-K Bot Contest from 2008. Although the contests all target the human-likeness of chatbot systems, they differ somewhat in their rules and procedures.
The Loebner Prize competition follows the Loebner Protocol (Loebner, 2010) in its two-stage screening process for selecting the better entrants. The first stage is based on the entrants’ responses to a set of identical questions posed by an automated system. Questions similar to those used in the first stage are announced prior to the competition, but the exact questions are not. A group of faculty members and students from the hosting university then votes on the responses for human-likeness. The four top-ranked chatbot systems then qualify for the second round of the competition, in which four judges, usually professionals from the field of AI, chat with four pairs, each made up of a human and one of the four top-ranked chatbots from the first stage, for ten to twenty-five minutes and evaluate the human-likeness of each entity on a scale.
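The first-stage selection described above amounts to ranking entrants by the votes their responses receive and keeping the top four. The following sketch illustrates only that ranking step; the entrant names, the vote counts, the alphabetical tie-breaking rule, and the function name `select_finalists` are all assumptions made for illustration and are not part of the Loebner Protocol.

```python
from typing import Dict, List


def select_finalists(votes: Dict[str, int], n_finalists: int = 4) -> List[str]:
    """Return the n_finalists entrants with the most human-likeness votes.

    Ties are broken alphabetically by entrant name; this tie-breaking rule
    is an assumption of the sketch, not a documented contest rule.
    """
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n_finalists]]


# Hypothetical vote counts for five entrants.
example_votes = {"BotA": 12, "BotB": 30, "BotC": 7, "BotD": 18, "BotE": 25}
print(select_finalists(example_votes))
```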
For the current study, award-winning chatbots that are accessible online are considered, because they have been evaluated on their human-likeness and are more likely to be of higher quality than other existing chatbots.