National Taiwan University of Science and Technology Institutional Repository (NTUSTR): Item 987654321/82450


CLUSTERING ANALYSIS SYSTEM BASED ON STUDENTS’ MOTIVATION AND LEARNING BEHAVIOR IN MOOC

Jian-Wei Tzeng ¹ Nen-Fu Huang ² Chia-An Lee ² Yi-Hsien Chen ² An-Chi Chuang ²

¹ Center for Teaching and Learning Development, National Tsing Hua University

² Department of Computer Science, National Tsing Hua University

ABSTRACT

With the increase in the number of massive open online courses (MOOCs), the amount of learning educational big data has also increased, and artificial intelligence technology is now applicable to analyzing such data. This study implements a clustering system based on learning educational big data, using MOOC courses offered at a university in northern Taiwan as experimental data. Based on students’ video-watching and practice-exercise completion ratios, we cluster students into different groups by using a K-means clustering module. Then, we use a deep learning prediction module to determine whether the students will change cluster in the following week. The system aims to provide teachers and students with the clustering results for the next week so that suitable learning strategies can be recommended, which can facilitate the most appropriate guidance and more adaptive counseling.

Keywords: cluster analysis, deep learning, K-means, learning educational Big Data, MOOC

Introduction

Massive open online courses (MOOCs) are a new type of online class that provides open access and offers interactive participation through video lectures, tests, and discussion forums (Fauvel et al., 2018; Lee, 2019). With their increasing popularity, the large number of enrollments in MOOCs has generated considerable educational big data in terms of online activities and logs, which might be valuable for academics and practitioners (Kastrati, Imran, & Kurti, 2019). The innovative use of video-based learning has attracted many researchers and practitioners in the education field. MOOCs also feature discussion forums in which learners can interact with teachers and fellow learners by sharing their thoughts or raising questions, and online practice systems enable students to evaluate their progress (Su, Huang, & Ding, 2016). Thus, MOOCs provide a high-quality autonomous learning environment, which is considered a novel learning approach in the 21st century (Tang, 2017).

The autonomous learning environment of MOOCs offers considerable flexibility and allows students to choose from a range of course topics. However, because MOOCs attract many geographically dispersed students (Veletsianos & Shepherdson, 2016), the students have diverse needs, interests, motives, and learning styles. For example, not all students aim to obtain the course certificate or successfully complete the course: some students might be interested only in learning certain concepts in the course, whereas others might already understand a course topic and aim only to apply and consolidate their knowledge through discussions with teachers and other learners.

It is thus essential to identify students’ learning motives (de Barba, Kennedy, & Ainley, 2016).

Because student diversity implies diverse learning motives, the diversity of students enrolled in MOOCs should also be investigated. Thus, we propose a clustering analysis system that categorizes the students’ learning styles according to their learning motivation and behavior.

We design a new method that uses data from previous courses, which record the students’ learning behaviors, to define groups. Some students prefer to learn from videos, whereas others prefer doing practice exercises; thus, we define students who prefer videos as the “Video” group and those who prefer practice exercises as the “Exercise” group.

Subsequently, during the starting period of new courses, we administer a questionnaire to incoming students to determine their learning motivation and analyze their learning style. We combine the questionnaire results with the previously defined groups to determine which group each student belongs to. Although many studies have suggested grouping students in MOOCs (Gasevic, Kovanovic, Joksimovic, & Siemens, 2014; Rodrigues, Ramos, Silva, & Gomes, 2016), these studies have mostly performed post-course analyses, meaning that they applied machine learning to learners’ behavior data after the learners had finished their courses. By contrast, our proposed method classifies students at an early stage of a course, and as the course progresses, we can obtain more precise grouping based on the accumulated information regarding students’ learning behaviors. A prerequisite of this research is that students must have watched videos and answered exercises, so that there is behavior to analyze. In summary, this paper proposes a method that combines questionnaire analysis with a clustering system and provides examples of how the method can be used in MOOC platforms. The clustering system is tested and integrated into a live MOOC platform in Taiwan.

We use the 2016 Introduction to Computer Networks course as our experimental data. We use the clustering module to cluster the students of the course into eight groups and conduct user projection analysis to explore the relation between the students’ learning behaviors and motivation. The defined groups are “Completing,” “Video,” “Exercise,” “Forum,” “GiveUp,” “Testing,” “Disengaging,” and “DropOut” (Huang et al., 2018). Then, we use several courses, namely the 2015-2017 Introduction to Computer Networks courses and eight other courses (“Topics of Investment,” “Introduction to Calculus,” “Introduction to Computer Programming,” “Principles of Economics,” “Introduction to Life Science,” “Introduction to Computer Science,” “General Chemistry,” and “General Physics”; Appendix 1), to predict students’ groups for the following week. We obtained more than 70% precision and an F1 score of more than 80%, which suggests that our model can predict students’ next-week grouping in other courses. Because it is difficult for teachers to observe and analyze students’ learning behaviors in MOOCs, we designed the system to provide each student’s clustering result for the next week, so that suitable learning strategies can be recommended and the most appropriate guidance and more adaptive counseling can be offered.

Literature Review

1. Learning Educational Big Data

The 2017 NMC Horizon Report indicated that artificial intelligence (AI) will strengthen the online teaching model, supporting adaptive learning and the academic research process and making the interaction between teachers and students more intuitive and frequent (Adams Becker et al., 2017). In Taiwan, MOOCs promote a new generation of government digital learning programs, emphasize a multicurricular operation mode, and promote the transformation of learning and teaching to foster a quality education environment that is appropriate and open to all. MOOCs generate data on learner characteristics and on diverse test questions, including learners’ personal data (such as gender, age, educational level, and place of residence) and test question data (such as the number of candidates, number of passers, number of questions, responses to test questions, and comments). Several scholars have suggested that learning analysis is a viable approach for improving the quality and fitness of MOOCs (Y. Li & H. Li, 2017).


With the maturity of data exploration and related algorithms, an increasing number of researchers are using the log data of learning platforms to study learning outcomes (Schwendimann et al., 2017; Alexandron, Ruiperez-Valiente, Chen, Munoz-Merino, & Pritchard, 2017). These studies have developed emerging research methods and areas, such as educational data mining and learning analytics, all of which use learners’ log data on the platform to analyze the learning process (Romero & Ventura, 2016). Driven by AI, new educational patterns have emerged in teaching scenarios, forms (such as multimedia and virtual reality), and other aspects. Big data has been widely used in education, serving not only teachers but also academic institutions and students: learning activity is evaluated, and intervention and guidance methods are determined on that basis to improve learning outcomes.

2. Deep Learning Model

In this study, we use deep learning technology (LeCun, Bengio, & Hinton, 2015; Goodfellow, Bengio, & Courville, 2016; Schmidhuber, 2015) to build our prediction model. This section introduces how the deep learning model works. A deep neural network is a mathematical model that mimics the biological nervous system. The network usually has several layers, each of which has tens to hundreds of neurons. Each neuron multiplies the outputs of the previous layer by weights and adds the products together. The sum is then transformed by the activation function (Leshno, Lin, Pinkus, & Schocken, 1993), and the transformed value is taken as the output of the neuron.

To simulate the neural network of a living being, the activation function is usually a nonlinear transformation. The traditional activation function is the Sigmoid function (Cybenko, 1989), which is commonly used for binary classification. However, in deep neural networks, the learning effect of the Sigmoid function is relatively poor, and the ReLU function (Nair & Hinton, 2010) or the Softmax function (Rafferty, Shellito, Hyman, & Buie, 2006) is often used instead.
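The computation performed by a single neuron can be illustrated with a minimal sketch. The input values, weights, and the bias term below are hypothetical and chosen only for illustration; the bias is a common addition that the description above does not mention explicitly.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive values through and clip negative values to zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the previous layer's outputs, then the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Hypothetical outputs of the previous layer and hypothetical parameters.
previous_layer = [0.5, 0.2, 0.9]
weights = [0.4, -0.6, 0.3]
bias = 0.1

hidden_output = neuron(previous_layer, weights, bias, relu)     # hidden-layer style
binary_output = neuron(previous_layer, weights, bias, sigmoid)  # output-layer style
```

The same weighted sum yields different outputs depending on the activation: ReLU keeps the positive sum unchanged, whereas Sigmoid compresses it into a value that can be read as a probability.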

The architecture of a neural network specifies the number of layers, the number of neurons in each layer, the way neurons are connected, and the type of activation function. These parameters must be set before training. Therefore, the quality of the parameter setting also affects the performance of the neural network. The learning and training process of a neural network involves trying to find the best setting.

3. Cluster Analysis

Cluster analysis (Anderberg, 1973) is a technique for statistical data analysis that is used in various processes and fields, such as machine learning (Witten, Frank, Hall, & Pal, 2016), information retrieval, pattern recognition, image analysis, and bioinformatics. In cluster analysis, data are condensed and samples are grouped into subclusters according to similarity in attributes. Similarity is generally determined based on the distances between objects: a smaller relative distance indicates a higher similarity. After clustering, differences within a cluster are small and differences between clusters are large. Accordingly, objects within the same subcluster exhibit similar attributes, which is commonly manifested as closer proximity to each other in the coordinate system. Data clustering is normally classified as an “unsupervised learning method” (Hastie, Friedman, & Tibshirani, 2009), which means learning without a fixed answer. Thus, a clustering method has no single correct result.

Previous studies on MOOC clustering have used statistical methods to analyze students. The statistical method usually involves setting a threshold for each group and then directly assigning students to a group. Some methods use educational concepts to define the student groups directly and then group the students according to their learning behavior patterns. Kizilcec, Pérez-Sanagustín, & Maldonado (2017) discussed the ideas proposed by Hill (2013) for classifying students. Based on new data obtained from Stanford University professors, the authors described the four student models that appear in Coursera-style MOOCs, created a new “No-Shows” category, and renamed “Lurkers” as “Observers.” They also provided a more rigorous description of each model to help define potential data collection that identifies these groups.

Subsequently, based primarily on Cryer’s scheme of Elton (Khalil & Ebner, 2017), which is an educational philosophy and incorporates a grouping approach to student learning behavior, Khalil and Ebner proposed a clustering method based on learning motivation (Elton, 1996). This method aims to use cluster analysis to explore the importance of learning motivation.

We redefined the group types by combining the groupings of related studies with the features of our MOOC platform. Most related studies performed a cluster analysis after the end of the course. By contrast, our system performs a cluster analysis as the course progresses and predicts the next week’s clustering, so it can give teachers and students timely feedback that helps improve the completion rate.

System Architecture

In this study, we use students’ learning behaviors to design a clustering analysis system and apply it to ShareCourse. The cluster analysis system classifies students into groups. Additionally, we design a questionnaire analysis module that we use to analyze students’ background and learning motives. The questionnaire enables us to investigate whether students’ background and motives affect their learning progress, and its results help to define the student grouping during the starting period of the course, which overcomes the cold-start problem (Schein, Popescul, Ungar, & Pennock, 2002). Finally, we design a prediction module based on deep learning methods, which predicts the group a student may change to in the next week, thereby tracking changes in students’ learning styles and predicting their learning behavior.

Our proposed system comprises four parts: the clustering module, the questionnaire module, the deep learning prediction module and the application programming interface (API) server. The clustering module extracts the behavioral features of students from the database to use as features in a subsequent cluster analysis and the API server saves the results in the database. The deep learning prediction module uses the results of the clustering module and the students’ learning features to predict the results of the students’ groups for the next week. The questionnaire module comprises a user interface through which questionnaire results are collected, and the API server saves the results in the database. The API server provides two main functions: (1) a database for use in the two aforementioned systems and (2) a function to provide teachers with information regarding students.

1. ShareCourse Database

We organize the database into two parts: the learning behavior database and the students’ background database. The learning behavior database, stored in MongoDB, saves users’ learning behavior data, including data from the account, visited page log, forum boards, exercises, videos, and exams (Table 1). The clustering analysis system collects the students’ learning behavior data saved in MongoDB and uses the data to classify students into different clusters.

We use students’ background data to create a student learning log and re-establish a database table. We save students’ background information and motives in the ShareCourse database server (MySQL).

Table 1 User activity logs module

Type      Logged actions
Account   login, logout
Page      leave
Video     play, pause, leave, change rate, seek, video time
Forum     Post, Reply, like
Exercise  exerciseType, studentAns, getScore
Exam      getScore, studentAns


2. Clustering Module

The method proposed in the Introduction applies machine learning to group students into clusters in order to analyze or predict learning behaviors. The module can be divided into four parts: data preprocessing, first-step clustering, second-step clustering, and database saving. In data preprocessing, information regarding students’ learning history is extracted and organized as features for further analysis. The first-step clustering conducts a preliminary clustering of students participating in the course based on their learning behaviors. The second-step clustering conducts a more detailed analysis of the results of the first-step clustering to ensure precise student grouping. Finally, the API server saves the results in the database, which can then provide information to teachers or students.

As Fig. 1 shows, the first-step and second-step clustering together form the student clustering step, so the module comprises three steps: data preprocessing, student clustering, and database saving. We introduce the three steps in sequence.

[Figure: the input data (course ID, scheduled by crontab) enter the clustering system. Data pre-processing (1. data extraction, 2. missing values, 3. dimension reduction) is followed by first-step K-means (Max, Mid, Min) and second-step K-means (Completing, Exercise, Video, Testing, Forum, GiveUp, Disengaging, DropOut). Results are stored in the database; the server of the data analysis module answers user requests with responses.]

Fig. 1 Architecture of clustering module


2.1 Data Pre-processing

In the data pre-processing stage, the clustering module extracts the behavioral characteristics of students from the learning logs in the database and uses them as features in the subsequent cluster analysis. Table 2 lists the student features extracted, such as video, exercise, and forum behavior.

Table 2 Users’ learning behavior

Feature Type  Feature Name           Meaning
Video         Video_Finish_Ratio     Completion rate of students watching the videos
Exercise      Exercise_Finish_Ratio  Completion rate of students’ practice exercises
Forum         Forum_Post             Number of posts in the forum
Forum         Forum_Reply            Number of responses in the forum
Forum         Forum_Like             Number of likes clicked in the forum
Exam          Exam_Page              Number of visits to the exam page, which we treat as a degree of attention to the exam

In applied statistics, standardization can have several meanings. In the simplest sense, standardization allows numerical results to be ranked and their weights adjusted accordingly. In more complex cases, standardization is used to align adjusted values. In educational evaluation applications, the standardization of scores can be used to align the score distribution with the normal distribution. Furthermore, standardization keeps features with large raw values from dominating the results, hence enabling a more precise comparison of features. In this study, as some data were numerical counts and some were ratios, we used the scikit-learn StandardScaler method to standardize features by removing the mean and scaling to unit variance (Eq. 1). After this step, every feature has zero mean and unit variance, so counts and ratios lie on a comparable scale.

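$$z = \frac{X - \mu}{\sigma} \qquad (1)$$

where $\mu$ is the mean of a feature and $\sigma$ is its standard deviation.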
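As a minimal sketch of this standardization step, the snippet below applies scikit-learn’s `StandardScaler` to a small, made-up feature matrix and checks the result against Eq. (1) computed by hand. The rows and the choice of three columns are hypothetical; they only mimic the mix of ratios and counts described above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [Video_Finish_Ratio, Exercise_Finish_Ratio, Forum_Post]
features = np.array([
    [0.90, 0.80, 12.0],
    [0.40, 0.10, 0.0],
    [0.10, 0.00, 1.0],
    [0.75, 0.60, 3.0],
])

# Remove the mean and scale to unit variance, column by column.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

# The same transformation written out from Eq. (1).
manual = (features - features.mean(axis=0)) / features.std(axis=0)
```

Each column of `scaled` has zero mean and unit variance. Standardized values are centered on zero but not confined to a fixed interval; here the forum count of 12 maps to roughly 1.69.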

2.2 Clustering Algorithm

In the student clustering step, we use the k-means method, an unsupervised method, as our algorithm. The k-means algorithm partitions a given set of observations into a pre-defined number k of clusters.

In the first step, the first k-means algorithm is executed. Because we want to divide students into three levels, we take the value of k = 3. This step aims to make preliminary groupings for students. We use video and exercise features to classify students into the following groups, which represent the degree of learning: “Max,” “Mid” and “Min.” We use the two most basic feature values of the open course to achieve a preliminary grouping.

The advantage of this method is that it can analyze grouping results in a step-by-step manner. If we used all features at once, the results might be dominated by a specific feature value. Therefore, in this step, we use the most basic video and exercise feature values to assign students to groups of different learning levels. In the next stage of clustering, we add new features to group students more accurately.

In the second step, we run k-means separately on each of the three level groups and add new features, such as forum behavior and the number of clicks on the exam page, to identify students with different learning behaviors and ensure precise student groupings. For example, we want to find students who are more interested in the exam or who only like to interact in the forum.

In the Max group, we divide all students into two groups (k = 2). Because the features used in the first stage are videos and exercises, students in this group have a certain degree of performance in both of the above. Therefore, after adding new features, we use k-means algorithms to divide students into two groups: students with normal learning behaviors and those with special learning behaviors.

However, the Mid group is rather special. Since most of the open-course students are assigned to this group in the first stage, we set the k value to 3 when executing the k-means algorithm. In addition to students with normal learning behaviors and those with special learning behaviors, this group includes students who quit midway through the semester.

In the Min group, we divide students into two groups (k = 2). One group represents students who dropped out and never studied after enrolling in the course. Students in the other group may show some learning behaviors, and we classify them as students with special learning behaviors. This division aims to identify students with special behaviors and compare them with the students in other groups who also have special learning behaviors.

Finally, we compare the students with special behaviors, namely forum and exam page activity, and use the pre-defined group types to classify these students. We average the four types of features and compare each student with these averages to assign students to the most appropriate groups. This method ensures that the identification of students with special learning behaviors in each course is unaffected by other courses that have different features.
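The two-step procedure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the system’s actual code: the data are synthetic, the rule used to order the first-step clusters into “Min,” “Mid,” and “Max” (sorting by the mean of each centroid) is assumed, and the final mapping of sub-clusters to the named groups is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic, already-standardized features for 300 hypothetical students.
# Columns: video, exercise, forum, exam_page
X = rng.normal(size=(300, 4))

# First step: k = 3 on the video and exercise features only.
first = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X[:, :2])

# Rank the clusters by centroid mean so that 0 = "Min", 1 = "Mid", 2 = "Max".
# This ordering rule is an assumption made for the sketch.
order = np.argsort(first.cluster_centers_.mean(axis=1))
rank_of_cluster = np.empty(3, dtype=int)
rank_of_cluster[order] = np.arange(3)
level = rank_of_cluster[first.labels_]

# Second step: k-means inside each level, now with all four features.
k_per_level = {0: 2, 1: 3, 2: 2}  # Min: k = 2, Mid: k = 3, Max: k = 2
sub_cluster = np.empty(len(X), dtype=int)
for lvl, k in k_per_level.items():
    mask = level == lvl
    second = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[mask])
    sub_cluster[mask] = second.labels_
```

Each student ends up with a `(level, sub_cluster)` pair, giving 2 + 3 + 2 = 7 sub-clusters in total. Running the second step per level keeps the forum and exam features from overriding the basic video and exercise grouping.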


2.3 Pre-defined Group

We use the data of the 2015-2017 Introduction to Computer Networks courses to define our group types. We run our clustering module on the data of these three courses and collect the data of students with special behaviors for analysis by section. As shown in Table 3, we pre-define eight groups of students. For a new course, we use the clustering module to distribute students according to the definitions of these eight groups.

Table 3 Group definition list

Group type   Explanation
Completing   Students with very high completion rates for video viewing and practice exercises
Video        Students with a preference for video learning who do not like to complete exercises
Exercise     Students with a preference for exercise learning who do not like to watch videos
Testing      Students who do not display many learning behaviors (video and exercise) but are concerned about exam information
Forum        Students who do not display many learning behaviors (video and exercise) but have a high degree of participation in forums
GiveUp       Students who break off their learning partway through the semester due to certain factors
Disengaging  Students who only participate in the course for the first stage
DropOut      Students who only enrolled in the course but did not display any learning behavior

Prediction Module

In our proposed clustering analysis system, we use a deep learning method to implement a prediction clustering result module. This system aims to predict the types of groups that students may be classified into for the next week, which we use to judge whether or not a student’s learning style will change.

Fig. 2 shows the architecture of the prediction module, which is divided into two main parts: the judge change model, which judges whether students will change group in the next week, and the predicted clustering result model, which predicts students’ grouping results for the following week. The predicted clustering result model uses the output of the judge change model as its input. The following sections explain these two models in detail.

[Figure: the judge change model takes the input layer and outputs 0 (no change) or 1 (change); this output is fed into the input layer of the predict clustering result model, whose output layer gives the predicted results.]

Fig. 2 Architecture of prediction module

1. Judge Change Model

This model predicts whether students will change their group in the next week. As shown in Fig. 3, we use the multilayer perceptron (Gardner & Dorling, 1998; Kruse et al., 2013) method in deep learning to build our model.

The feature types listed in Fig. 3 are the input features of this model. “Learning_level” is the output of the first-step clustering. Because this output has three ordered results, we assign 0 to the “Min” group, the lowest level, and increase the value sequentially. However, students’ grouping results have no order; for example, we cannot rank the “Forum” group against the “Exam” group. To overcome this problem, we use the one-hot encoding method (Lin & Newton, 1989) (Fig. 4), which turns the eight groups into eight new feature values. Together with the five features above, the input layer therefore contains 13 features, or nodes.
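The construction of the 13-dimensional input can be sketched in a few lines. The order of the groups inside the one-hot vector, the helper names `one_hot` and `build_input`, and the sample values are assumptions made for illustration; the forum feature is treated as a single value, as in Fig. 3.

```python
# Assumed ordering of the eight groups inside the one-hot vector.
GROUPS = ["Completing", "Video", "Exercise", "Testing",
          "Forum", "GiveUp", "Disengaging", "DropOut"]

# Ordinal encoding of the first-step clustering result.
LEVELS = {"Min": 0, "Mid": 1, "Max": 2}


def one_hot(group):
    """Return eight 0/1 values with a single 1 at the group's position."""
    return [1 if g == group else 0 for g in GROUPS]


def build_input(video, exercise, forum, exam_page, level, group):
    """Five behavior features (ids 0-4) followed by eight one-hot values (ids 5-12)."""
    return [video, exercise, forum, exam_page, LEVELS[level]] + one_hot(group)


# Hypothetical student: mostly watches videos, currently in the "Video" group.
row = build_input(0.8, 0.1, 0.0, 3.0, "Mid", "Video")
```

The ordered “Learning_level” keeps a single numeric slot, whereas the unordered group label is spread over eight slots so that no artificial ranking between groups is introduced.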

We use Keras (Chang, Feig, Moser, & Thuraisingham, 2015) to build a neural network model comprising five layers. Because all our feature values are positive, we use the ReLU function as the activation function of the hidden layers. The output, however, is 0 or 1, so we use the Sigmoid function as the activation function in the output layer.

We summarize this model as follows:

1. Input Layer: 13 features; this layer needs to provide 13-dimensional data.

2. Hidden Layer: three hidden layers; because all our feature values are positive, we select ReLU as the activation function.

3. Output layer: 0 means the student will not change the clustering results and 1 means the student will change the clustering results.
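The layer structure summarized above can be illustrated with a small runnable sketch. It swaps in scikit-learn’s `MLPClassifier` in place of Keras so that the example stays short; the hidden-layer width of 32, the synthetic data, and the labeling rule are all assumptions, not values from this study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 students x 13 non-negative input features.
X = rng.random((200, 13))
# Hypothetical labels: 1 = changes group next week, 0 = does not change.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

model = MLPClassifier(
    hidden_layer_sizes=(32, 32, 32),  # three hidden layers; widths are assumed
    activation="relu",                # ReLU in the hidden layers
    max_iter=500,
    random_state=0,
)
model.fit(X, y)

# For a two-class problem the output unit uses a logistic (sigmoid) activation.
change_prob = model.predict_proba(X[:5])[:, 1]
change_flag = model.predict(X[:5])
```

With an input layer, three hidden layers, and an output layer, the fitted network has five layers in total, and `change_flag` holds the 0/1 decision for each student.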

[Figure: a multilayer perceptron with an input layer, hidden layers, and an output layer. Input features by id: 0 Video, 1 Exercise, 2 Forum, 3 Exam_page, 4 Learning_level, 5~12 Clustering_result. Output: 0 = no change, 1 = change.]

Fig. 3 Model used to judge whether or not students will change group

[Figure: each group label (Completing (A), Video (B), ..., Disengaging (X), Dropout (Y)) is converted into columns clustering_A, clustering_B, ..., clustering_X, clustering_Y; the column matching a student’s group is set to 1 and all other columns are set to 0.]

Fig. 4 Example of one-hot encode method

2. Prediction Clustering Results Model

The main purpose of building this model is to predict the type of group in which students will be classified for the next week. As shown in Fig. 5, the output layer no longer has only one node, unlike the judge change model. Because there are eight types of groups, the eight nodes of the output layer represent eight different groups. The input layer needs 14 nodes because we add the output of the judge change model to assist the training. As the output layer of this model is more complicated than that of the judge change model, we increased the number of hidden layers to achieve better training.

[Figure: a multilayer perceptron with an input layer, hidden layers, and an output layer.]

Fig. 5 Model used to predict student grouping results

During training, a model that is evaluated on the same data it was trained on will usually show higher accuracy than on unseen testing data. Therefore, we use the k-fold cross-validation (Kohavi, 2001) method to train our model. This method divides all data into k equal parts. Taking k = 10 as an example, nine parts are used as training data and one part as testing data, and the parts take turns serving as the testing data. This rotation keeps any single split from skewing the measured accuracy of the model. As shown in Table 4, we use 10-fold cross-validation to train the model. The accuracy on our training data does not differ much from that on the testing data. Therefore, we conclude that our trained model is not biased toward the training data.
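The 9:1 rotation of k-fold cross-validation can be sketched with scikit-learn’s `KFold`. The 100 placeholder samples and the shuffling seed are assumptions for illustration; a model would be trained and scored inside the loop.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(100, 1)  # 100 hypothetical samples
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

fold_sizes = []
for train_idx, test_idx in kfold.split(X):
    # Nine parts train the model; the remaining part evaluates it.
    fold_sizes.append((len(train_idx), len(test_idx)))
```

Every sample is used for testing exactly once across the ten folds, which is what makes the per-fold training and testing accuracies comparable.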

Table 4 Accuracy of 10-fold cross-validation

K              1     2     3     4     5     6     7     8     9     10
Train-acc (%)  91.1  91.2  92.0  88.9  93.1  90.4  88.9  92.3  90.8  90.4
Test-acc (%)   92.3  92.4  92.7  91.5  92.9  92.3  91.4  92.9  92.1  92.1


We select the ReLU function as the activation function of the hidden layers because the input features are all positive. In the output layer, however, we choose the softmax function instead of sigmoid: the sigmoid function maps a single output to the interval (0, 1), which suits binary classification, whereas the softmax function maps an n-dimensional vector to n values in (0, 1) that sum to 1, one per class. Subsequently, we choose the group with the largest value as our prediction result.
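The final selection step can be shown with a short sketch. The raw scores of the eight output nodes and the order of the groups are hypothetical; only the softmax-then-largest-value logic reflects the description above.

```python
import math

# Assumed ordering of the eight output nodes.
GROUPS = ["Completing", "Video", "Exercise", "Testing",
          "Forum", "GiveUp", "Disengaging", "DropOut"]


def softmax(scores):
    """Map n raw scores to n probabilities in (0, 1) that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the eight output nodes for one student.
scores = [0.2, 2.5, 0.1, -1.0, 0.0, 0.3, 1.1, -0.5]
probs = softmax(scores)

# The group with the largest probability is the prediction.
predicted = GROUPS[probs.index(max(probs))]
```

Because softmax preserves the ordering of the raw scores, the predicted group is the one whose output node had the highest score; the probabilities additionally indicate how decisive that choice was.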

3. Questionnaire Analysis System

The MOOC platform used in this study did not collect user background information when students registered for accounts. Because the users of the MOOC platform are international and highly diverse, their motivations, educational levels, and majors are difficult to determine. We thus designed a questionnaire user interface to collect information regarding users’ backgrounds and motives when they registered for the course, because these factors typically affect student learning behavior. The questionnaire results helped us improve our data analysis model.

In this module, we used questionnaires to understand students’ motivation and learning styles. We drew on learning motivation research (Magen-Nagar & Cohen, 2017; Kizilcec et al., 2017) to design our questionnaire, with questions for four stages in the following order: registration, early stage of the course, mid-course (approximately the mid-term exam period), and end of the course.

We divided the questionnaire APIs into two categories: motivation questionnaires and learning style questionnaires. During the registration phase, we administered the questionnaire to students to determine their purpose for enrolling in the course and understand their learning motivation. The system asked students follow-up questions at different time points to check whether their learning style was affected by external factors during their learning.

Implementation and Experiment

1. User Projection Analysis

We used the 2016 Introduction to Computer Networks course as our experimental data (Table 5) to explore the relation between students’ learning behavior and course completion rates. The defined groups include “Completing,” “Video,” “Exercise,” “Forum,” “GiveUp,” “Testing,” “Disengaging,” and “DropOut.” As shown in Fig. 6, we project all students onto a two-dimensional plane. The x-axis reflects the ratio of videos watched, and the y-axis reflects the ratio of practice exercises finished. The “Completing” group, in the upper right corner, displays better learning behavior. We also identified some students with special learning behaviors, but we cannot guarantee that good learning behaviors correlate with the course completion rate.

Table 5 Information of the experiment course

Course                      Introduction to Computer Networks
Duration                    19-09-2016 to 05-12-2016
Number of students          1,491
Number of exams             2
Number of passing students  245
Pass rate (%)               16.43

[Figure: scatter plot of all students on a two-dimensional plane with both axes ranging from 0.0 to 1.0. Legend: Completing, Video, Exercise, Exam, Forum, GiveUp, Disengaging, DropOut.]

Fig. 6 Projection of students’ clustering results

Table 6 shows that, except for the “Forum” group, every group contained students who passed the exam. First, we found that students who only like to watch videos had a much lower pass rate than students who also completed exercises. One reason may be that exercises directly help students master the concepts. Another may be that the question types in the exercises are close to those in the exams, so doing the exercises first is a big advantage. Thus, we found that exercises have a great influence on passing the exam.

Table 6 Students’ clustering results

Type of group  Number of students  Number of passing students  Pass rate (%)
Completing     179                 139                         77.65
Video          120                 37                          30.83
Exercise       6                   5                           83.33
Exam           38                  24                          63.16
Forum          6                   0                           0
GiveUp         34                  15                          44.12
Disengaging    185                 6                           3.24
DropOut        923                 19                          2.06

Total number of students: 1,491

We found that students who passed the exam were not necessarily classified in the top group. For example, the “GiveUp” group also had a high pass rate. Although many of the students in this group gave up after the first mid-term exam, they exhibited complete learning behaviors before the exam. Although we were unsure which external factors affected these students, this result highlighted the importance of exploring students’ motivation and learning styles. The results also revealed that some people use alternative accounts to register for courses to obtain exam questions. It is undeniable that learning motivation and professional field had a great impact on student learning behavior.

2. Accuracy of Prediction Result Module

We designed a prediction model implemented with deep learning methods and trained it on data from the 2015-2017 offerings of the Introduction to Computer Networks course (Table 7). We then tested the accuracy of the model on different courses.

First, we tested courses that have the same curriculum model. As shown in Table 8, the Precision value is higher than 70% for every course except “Introduction to Calculus.” Although the curriculum model of this course was similar to that of the training data, most of its exercises were computational questions, which lowered the rate at which students performed practice exercises. Because our model relies heavily on exercise features, this led to unexpected prediction results. Notably, an average F1 score of 82% indicates that our module performed well on courses with the same curriculum model. Therefore, our proposed model can successfully find students whose clustering results will change in the next week.

Next, we applied the model to courses with different curriculum models. As shown in Tables 9 and 10, these courses yielded lower Recall and Precision.

Through this experiment, we found that arranging all exercises after the videos may lower the completion rate of the exercises, whereas placing each video together with its corresponding exercises improves students’ willingness to practice. The exercise completion rate of these four courses was relatively low, and some courses had no exercises at all, which resulted in low F1 scores for our module. The accuracy for the four courses was nevertheless high because most students do not change groups. Our goal, however, was to find the students who change groups and to predict their clustering results correctly. This experiment showed that our module must be used on courses with the same curriculum model.

Table 7　Information of the 2015-2017 Introduction to Computer Networks Course

Course                      Introduction to Computer Networks
Duration                    14-09-2015 to 14-12-2015   19-09-2016 to 05-12-2016   18-09-2017 to 27-11-2017
Number of students          7,145                      1,491                      1,140
Number of exams             3                          2                          2
Number of passing students  793                        245                        198
Pass rate (%)               9.36                       16.43                      17.36
Fee                         Free                       Free                       Free

Table 8　Value of confusion matrix for courses with the same curriculum model

Course                                 TP    TN    FP    FN
Topics of Investment                   45    936   14    1
Introduction to Calculus               22    418   14    1
Introduction to Computer Programming   17    156   4     3
Principles of Economics                15    105   5     1
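The Precision and F1 figures quoted in the text can be derived from the counts in Table 8 with the standard confusion-matrix definitions. The sketch below assumes that the positive class is “the student changes group in the next week”; the names `TABLE_8`, `metrics`, `results`, and `mean_f1` are illustrative and do not come from the authors' code.

```python
# Confusion-matrix counts transcribed from Table 8: course -> (TP, TN, FP, FN).
TABLE_8 = {
    "Topics of Investment": (45, 936, 14, 1),
    "Introduction to Calculus": (22, 418, 14, 1),
    "Introduction to Computer Programming": (17, 156, 4, 3),
    "Principles of Economics": (15, 105, 5, 1),
}


def metrics(tp, tn, fp, fn):
    """Return accuracy, recall, precision, and F1 score, all in percent."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": 100 * accuracy,
        "recall": 100 * recall,
        "precision": 100 * precision,
        "f1": 100 * f1,
    }


results = {course: metrics(*counts) for course, counts in TABLE_8.items()}
# Unweighted (macro) average of the per-course F1 scores.
mean_f1 = sum(m["f1"] for m in results.values()) / len(results)

if __name__ == "__main__":
    for course, m in results.items():
        print(f"{course:<38} P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
    print(f"Mean F1: {mean_f1:.2f}")
```

Averaging F1 per course, as done here, weights each course equally regardless of its size. Pooling the counts of all courses before computing F1 would be an alternative, and the two approaches can give different values.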


Table 9　Value of confusion matrix for courses with different curriculum models

Course                             TP    TN    FP    FN
Introduction to Life Science       2     79    3     1
Introduction to Computer Science   2     177   4     5
General Chemistry                  7     104   8     7
General Physics                    13    62    4     8

Table 10　Analysis of confusion matrix for courses with different curriculum models

Course                             Accuracy (%)   Recall (%)   Precision (%)   F1 score (%)
Introduction to Life Science       95.29          66.67        40              50
Introduction to Computer Science   95.2           28.57        33.33           30.76
General Chemistry                  88.09          50           46.67           48.27
General Physics                    86.2           61.9         76.47           68.4

The aforementioned results show that our module can be applied to a course as long as the course has the same curriculum model as “Introduction to Computer Networks.” In addition, because the number of students in the “Introduction to Life Science” course is small and only a few students were likely to change groups, a single wrong prediction can have a large effect on the metrics reported for that course.
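The observation that accuracy stays high because most students do not change groups can be illustrated with the counts in Table 9. The sketch below again assumes that the positive class is “the student changes group in the next week” and compares the reported accuracy with a trivial baseline that always predicts “no change.” The baseline and all function names are hypothetical and serve only as an illustration; they are not part of the proposed system.

```python
# Confusion-matrix counts transcribed from Table 9: course -> (TP, TN, FP, FN).
TABLE_9 = {
    "Introduction to Life Science": (2, 79, 3, 1),
    "Introduction to Computer Science": (2, 177, 4, 5),
    "General Chemistry": (7, 104, 8, 7),
    "General Physics": (13, 62, 4, 8),
}


def model_accuracy(tp, tn, fp, fn):
    """Accuracy of the reported predictions, in percent."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def no_change_baseline_accuracy(tp, tn, fp, fn):
    """Accuracy, in percent, of always predicting the negative class.

    Every actual negative (TN + FP) is then classified correctly and every
    actual positive (TP + FN) is missed, so recall is zero by construction.
    """
    return 100.0 * (tn + fp) / (tp + tn + fp + fn)


def positive_share(tp, tn, fp, fn):
    """Share of actual positives (assumed: group changes), in percent."""
    return 100.0 * (tp + fn) / (tp + tn + fp + fn)


comparison = {
    course: (
        model_accuracy(*counts),
        no_change_baseline_accuracy(*counts),
        positive_share(*counts),
    )
    for course, counts in TABLE_9.items()
}

if __name__ == "__main__":
    for course, (acc, base, share) in comparison.items():
        print(f"{course:<34} model={acc:.2f} baseline={base:.2f} positives={share:.2f}")
```

For three of the four courses, the baseline reaches an accuracy at least as high as the reported one while finding none of the students who change groups. This is why Recall, Precision, and the F1 score are the more informative measures for this task.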

3. Relation between Motivation and Clustering Results

This section presents the relation between student motivation and clustering results. We obtained 789 questionnaires from students in the tested courses and divided the students’ motivation into “extrinsic factors” and “intrinsic factors.”

We defined extrinsic factors as tangible benefits, such as a certificate or credit, and intrinsic factors as self-improvement. As shown in Tables 11 and 12, because the students who responded to the questionnaire attended the courses, the sum of the Completing and Video groups was greater than that of the DropOut group, and the DropOut ratio was approximately 30%, which is relatively low for a MOOC platform.

There were still some students in the “DropOut” group. The questionnaire results indicate that the dropout ratio associated with the intrinsic factor of self-enhancement was lower than that for the other factors.


Table 11　Relation between extrinsic factors and clustering results

              Extrinsic factors
Cluster       Credit (%)   Certificate (%)
Completing    8.6          11.9
Video         26           27.1
Exercise      0            0
Exam          1            0.8
Forum         1.6          3.3
GiveUp        7.6          5.
Disengaging   18.7         14.8
DropOut       36.3         36.8

Table 12　Relation between intrinsic factors and clustering results

              Intrinsic factors
Cluster       Interested (%)   Peer recommendation (%)   Enhance professional (%)
Completing    11.2             9.8                       18
Video         22.9             25.6                      31.9
Exercise      0                0                         0
Exam          0.1              0                         0
Forum         1.4              0.4                       0
GiveUp        6.3              10.3                      5.6
Disengaging   19.8             18.7                      19.4
DropOut       38               34.9                      25

Conclusion and Future Work

Our study aimed to classify students into different groups at the beginning of a course to solve the cold-start problem. We used data from the previous iteration of the course to define groups with the clustering module, and a questionnaire module to identify student motivation and learning strategies during the course. The questionnaire results allowed us to group students at the start of the course. As the course progressed, we ran the clustering system daily. Because the questionnaire results were only a preliminary indication of the groups to which the students belonged, we continued to refine the clustering to achieve a more accurate grouping of the students. We can also refine the definition of groups or add new groups after testing a larger number of samples. Although numerous studies have grouped students in MOOCs, they have mostly performed post-course analyses. By contrast, our proposed method uses questionnaire results and previously defined groups to classify students at the start of a course, and it enables more precise grouping as the course progresses, based on the accumulated information about student learning behavior.

The proposed system also implements a deep learning prediction model, which combines student learning behavior with the current week’s clustering results to predict students’ clustering results for the next week. The system aims to provide teachers and students with the next week’s clustering results so that suitable learning strategies can be recommended, which can facilitate the most appropriate guidance and more adaptive counseling. In the future, we will use the clustering results to develop a diagnostic system that provides a reference basis for future adaptive videos and exercises. With this system, students can identify their learning weaknesses without teachers’ proactive help. Furthermore, they can learn or review the chapters recommended by the diagnostic system.


References

Adams Becker, S., Cummins, M., Davis, A., Freeman, A., Hall Giesinger, C., & Ananthanarayanan, V. (2017). NMC Horizon Report: 2017 Higher Education Edition. Austin, Texas: The New Media Consortium.

Alexandron, G., Ruiperez-Valiente, J. A., Chen, Z. Z., Munoz-Merino, P. J., & Pritchard, D. E. (2017). Copying@Scale: Using Harvesting Accounts for Collecting Correct Answers in a MOOC. Computers & Education, 108, 96-144. doi: 10.1016/j.compedu.2017.01.015.

Anderberg, M. R. (1973). Cluster Analysis for Applications. New York, NY: Academic Press.

Chang, R., Feig, E., Moser, L., & Thuraisingham, B. (2015). Message from the General Chairs. Proceedings of 2015 IEEE International Conference on Web Services (ICWS), xvi–xvii. doi: 10.1109/ICWS.2015.5

Cybenko, G. (1989). Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2, 303-314. doi: 10.1007/BF02551274.

de Barba, P., Kennedy, G., & Ainley, M. (2016). The Role of Students’ Motivation and Participation in Predicting Performance in a MOOC. Journal of Computer Assisted Learning, 32(3), 218-231. doi: 10.1111/jcal.12130.

Elton, L. (1996). Strategies to Enhance Student Motivation: A Conceptual Analysis. Studies in Higher Education, 21(1), 57-68. doi: 10.1080/03075079612331381457.

Fauvel, S., Yu, H., Miao, C. Y., Cui, L. Z., Song, H. J., Zhang, L., … & Leung, C. (2018). Artificial Intelligence Powered MOOCs: A Brief Survey. Proceedings of 2018 IEEE International Conference on Agents (ICA), pp. 56-61. doi: 10.1109/AGENTS.2018.8460059.

Gardner, M. W., & Dorling, S. R. (1998). Artificial Neural Networks (The Multilayer Perceptron) - A Review of Applications in the Atmospheric Sciences. Atmospheric Environment, 32(14-15), 2627-2636. doi: 10.1016/S1352-2310(97)00447-0

Gasevic, D., Kovanovic, V., Joksimovic, S., & Siemens, G. (2014). Where Is Research on Massive Open Online Courses Headed? A Data Analysis of the MOOC Research Initiative. The International Review of Research in Open and Distributed Learning, 15(5). doi: 10.19173/irrodl.v15i5.1954.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: The MIT Press.

Hastie, T., Friedman, J., & Tibshirani, R. (2009). The Elements of Statistical Learning (pp. 485-585). Berlin, Germany: Springer.

Hew, K. F., & Cheung, W. S. (2014). Students’ and Instructors’ Use of Massive Open Online Courses (MOOCs): Motivations and Challenges. Educational Research Review, 12(1), 45-58. doi: 10.1016/j.edurev.2014.05.001.

Hill, P. (2013, March 10). Emerging Student Patterns in MOOCs: A (Revised) Graphical View. [Web blog message]. Retrieved from https://mfeldstein.com/emerging-student-patterns-in-moocs-a-revised-graphical-view/

Huang, N. F., Hsu, I. H., Lee, C. A., Chen, H. C., Tzeng, J. W., & Fang, T. T. (2018). The Clustering Analysis System Based on Students’ Motivation and Learning Behavior. Proceedings of 2018 Learning With MOOCS (LWMOOCS), pp. 117-119. doi: 10.1109/LWMOOCS.2018.8534611.

Kastrati, Z., Imran, A. S., & Kurti, A. (2019). Integrating Word Embeddings and Document Topics with Deep Learning in a Video Classification Framework. Pattern Recognition Letters, 128, 85-92. doi: 10.1016/j.patrec.2019.08.019.

Khalil, M., & Ebner, M. (2017). Clustering Patterns of Engagement in Massive Open Online Courses (MOOCs): The Use of Learning Analytics to Reveal Student Categories. Journal of Computing in Higher Education, 29(1), 114-132. doi: 10.1007/s12528-016-9126-9.

Kizilcec, R. F., Pérez-Sanagustín, M., & Maldonado, J. J. (2017). Self-regulated Learning Strategies Predict Learner Behavior and Goal Attainment in Massive Open Online Courses. Computers & Education, 104, 18-33. doi: 10.1016/j.compedu.2016.10.001.

Kruse, R., Borgelt, C., Klawonn, F., Moewes, C., Steinbrecher, M., & Held, P. (2013). Multi-Layer Perceptrons. In R. Kruse, C. Borgelt, F. Klawonn, C. Moewes, M. Steinbrecher, & P. Held (Eds.), Computational Intelligence (pp. 47-81). Berlin, Germany: Springer.

Kohavi, R. (2001). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th international joint conference on Artificial intelligence, vol. 2, 1137-1143.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521, 436-444. doi: 10.1038/nature14539.

Lee, Y. (2019). Using Self-Organizing Map and Clustering to Investigate Problem-Solving Patterns in the Massive Open Online Course: An Exploratory Study. Journal of Educational Computing Research, 57(2), 471-490. doi: 10.1177/0735633117753364

Leshno, M., Lin, V. Y., Pinkus, A., & Schocken, S. (1993). Multilayer Feedforward Networks With a Non-polynomial Activation Function Can Approximate Any Function. Neural Networks, 6(6), 861-867. doi: 10.1016/S0893-6080(05)80131-5

Li, Y., & Li, H. (2017). MOOC-FRS: A New Fusion Recommender System for MOOCs. Proceedings of the 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 1481-1488. doi: 10.1109/IAEAC.2017.8054260.

Lin, B., & Newton, A. R. (1989). A Generalized Approach to the Constrained Cubical Embedding Problem. Proceedings of the 1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 400-403. doi: 10.1109/ICCD.1989.63396.

Magen-Nagar, N., & Cohen, L. (2017). Learning Strategies as a Mediator for Motivation and a Sense of Achievement Among Students who Study in MOOCs. Education and Information Technologies, 22(3), 1271-1290. doi: 10.1007/s10639-016-9492-y.

Nair, V., & Hinton, G. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807-814.

Rafferty, J., Shellito, P., Hyman, N. H., & Buie, W. D. (2006). Practice Parameters for Sigmoid Diverticulitis. Diseases of the Colon & Rectum, 49(7), 939-944. doi: 10.1007/s10350-006-0578-2.

Rodrigues, R. L., Ramos, J. L. C., Silva, J. C. S., & Gomes, A. S. (2016). Discovery Engagement Patterns MOOC Through Cluster Analysis. IEEE Latin America Transactions, 14(9), 4129-4135. doi: 10.1109/TLA.2016.7785943

Romero, C., & Ventura, S. (2016). Educational Data Science in Massive Open Online Courses. Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery, 7(1), 1-12. doi: 10.1002/widm.1187.

Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and Metrics for Cold-start Recommendations. Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, 253-260. doi: 10.1145/564376.564421.

Schmidhuber, J. (2015). Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117. doi: 10.1016/j.neunet.2014.09.003.

Schwendimann, B. A., Rodríguez-Triana, M. J., Vozniuk, A., Prieto, L. P., Boroujeni, M. S., Holzer, A., … & Dillenbourg, P. (2017). Perceiving Learning at a Glance: A Systematic Literature Review of Learning Dashboard Research. IEEE Transactions on Learning Technologies, 10(1), 30-41. doi: 10.1109/TLT.2016.2599522.

Su, Y. S., Huang, C. S. J., & Ding, T. J. (2016). Examining the Effects of MOOCs Learners’ Social Searching Results on Learning Behaviors and Learning Outcomes. Eurasia Journal of Mathematics, Science and Technology Education, 12(9), 2517-2529. doi: 10.12973/eurasia.2016.1282a.

Tang, S. (2017). Learning Mechanism and Function Characteristics of MOOC in the Process of Higher Education. Eurasia Journal of Mathematics, Science and Technology Education, 13(12), 8067-8072. doi: 10.12973/ejmste/80769

Veletsianos, G., & Shepherdson, P. (2016). A Systematic Analysis and Synthesis of the Empirical Mooc Literature Published in 2013-2015. The International Review of Research in Open and Distributed Learning, 17(2). doi: 10.19173/irrodl.v17i2.2448.

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, CA: Morgan Kaufmann.

About the Authors

Jian-Wei Tzeng (曾建維) is an Assistant Research Fellow at the Center for Teaching and Learning Development, National Tsing Hua University, Hsinchu, Taiwan.

Nen-Fu Huang (黃能富) is a Professor in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.

Chia-An Lee (李加安) is a Ph.D. Candidate in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. (Corresponding Author)

Yi-Hsien Chen (陳羿先) is a Master’s Student in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.

An-Chi Chuang (莊安琦) is a Master’s Student in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.

Received: May 28, 2020

Revised: August 11, 2020

Accepted: August 18, 2020


Appendix A

Details of the testing courses

Course                                 Number of students   Fee      Description
Topics of Investment                   933                  Free     Curriculum model and training data are similar.
Introduction to Calculus               696                  Charge   Curriculum model and training data are similar, but exercises are more computational.
Introduction to Life Science           129                  Charge   Course videos are all before the exercise.
General Chemistry                      347                  Charge   Has no exercises.
General Physics                        132                  Charge   Has no exercises.
Introduction to Computer Science       281                  Charge   Course videos are all before the exercise.
Introduction to Computer Programming   326                  Charge   Curriculum model and training data are similar.
Principles of Economics                178                  Charge   Curriculum model and training data are similar.


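A MOOC Clustering Analysis System Based on Students’ Learning Motivation and Behavior

Jian-Wei Tzeng ¹　Nen-Fu Huang ²　Chia-An Lee ²　Yi-Hsien Chen ²　An-Chi Chuang ²

1

Center for Teaching and Learning Development, National Tsing Hua University

2

Department of Computer Science, National Tsing Hua University

Abstract

With the spread of massive open online courses (MOOCs), the use of educational big data and artificial intelligence to analyze learning behavior has become more prevalent. This study develops a clustering system that analyzes learning behavior based on big data, using the MOOC “Introduction to Computer Networks” offered in 2016 as the data source. Based on a K-means clustering module, the system analyzes the proportion of videos that students watched and the proportion of practice exercises that they completed, and clusters the students accordingly. In addition, a prediction module based on deep learning predicts whether a student’s learning behavior will change in the following week. Through the prediction module, the system aims to provide teachers and students with clustering results and to recommend learning strategies suited to each individual based on those results, thereby guiding learning and providing adaptive learning.

Keywords: MOOC, educational big data, K-means clustering, deep learning, adaptive learning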