CLUSTERING ANALYSIS SYSTEM BASED ON STUDENTS’ MOTIVATION AND
LEARNING BEHAVIOR IN MOOC
Jian-Wei Tzeng 1 Nen-Fu Huang 2 Chia-An Lee 2 Yi-Hsien Chen 2 An-Chi Chuang 2
1 Center for Teaching and Learning Development, National Tsing Hua University
2 Department of Computer Science, National Tsing Hua University
ABSTRACT
With the increase in the number of massive open online courses (MOOCs), the amount of learning educational big data has also increased, and artificial intelligence technology can now be applied to its analysis. This study implements a clustering system based on learning educational big data, using MOOC courses offered at a university in northern Taiwan as experimental data. Based on each student's video-watching ratio and practice-exercise completion ratio, we cluster students into different groups with a K-means clustering module. We then use a deep learning prediction module to determine whether students will change clusters in the following week. The system aims to provide teachers and students with the predicted clustering results for the next week so that suitable learning strategies can be recommended, facilitating more appropriate guidance and more adaptive counseling.
Keywords: cluster analysis, deep learning, K-means, learning educational Big Data, MOOC
Introduction
Massive open online courses (MOOCs) refer to a new type of online class that provides open
access and offers interactive participation through video lectures, tests, and discussion forums
(Fauvel et al., 2018; Lee, 2019). With their increasing popularity, the large number of enrollments in MOOCs
has generated considerable educational big data in the form of online activities and logs, which
might be valuable for academia and practitioners (Kastrati, Imran, & Kurti, 2019). The innovative use of video-based learning has attracted many researchers and practitioners in the education field. MOOCs also feature discussion forums in which learners can interact with teachers and fellow learners by sharing their thoughts or raising questions, and online practice systems enable students to evaluate their progress (Su, Huang, & Ding, 2016). Thus, MOOCs provide a high-quality autonomous learning environment, which is considered a novel learning approach in the 21st century (Tang, 2017).
The autonomous learning environment of MOOCs offers considerable flexibility and allows students to choose from a range of course topics. However, because MOOCs attract many geographically dispersed students (Veletsianos & Shepherdson, 2016), the students have diverse needs, interests, motives, and learning styles. For example, not all students aim to obtain the course certificate or successfully complete the course: some students might be interested only in learning certain concepts in the course, whereas others might already understand a course topic and aim only to apply and consolidate their knowledge through discussions with teachers and other learners.
It is thus essential to identify students’ learning motives (de Barba, Kennedy, & Ainley, 2016).
Because student diversity implies diverse learning motives, the diversity of students enrolled in MOOCs should also be investigated. Thus, we propose a clustering analysis system that categorizes the students’ learning styles according to their learning motivation and behavior.
We design a new method that uses data from previous courses, which reflect students’
learning behaviors, to define groups. Some students prefer to use videos to learn, whereas others prefer doing practice exercises; thus, we define students who prefer videos as the “Video” group and those who prefer practice exercises as the “Exercise” group.
Subsequently, during the starting period of new courses, we administered a questionnaire to incoming students of the course to determine their learning motivation and analyze their learning style. We combined the questionnaire results with the previously defined groups to determine which group the students belong to. Although many studies have suggested grouping students in MOOCs (Gasevic, Kovanovic, Joksimovic, & Siemens, 2014; Rodrigues, Ramos, Silva, &
Gomes, 2016), these studies have mostly performed post-course analyses, meaning that they used learners’ behavior data after they have finished their courses to conduct machine learning.
By contrast, our proposed method classifies students at an early stage of a course, and
as the course progresses, we obtain more precise grouping from the accumulated
information regarding students’ learning behaviors. A prerequisite of this research is that students
have watched videos and answered exercises, so that there are behaviors to analyze. In summary, this paper
proposes a method that uses questionnaire analysis, complemented by a clustering
system, and provides examples of how the method can be used in MOOC platforms. The clustering system was tested and integrated into a live MOOC platform in Taiwan.
We use the 2016 Introduction to Computer Networks course as our experimental data. We use the clustering module to cluster the students of the course into eight groups and conduct a user projection analysis to explore the relation between students’ learning behaviors and motivation. The defined groups are “Completing,” “Video,” “Exercise,”
“Forum,” “GiveUp,” “Testing,” “Disengaging,” and “DropOut” (Huang et al., 2018). Then, we use several courses, namely the 2015-2017 Introduction to Computer Networks and eight other courses (“Topics of Investment,” “Introduction to Calculus,” “Introduction to Computer Programming,” “Principles of Economics,” “Introduction to Life Science,” “Introduction to Computer Science,” “General Chemistry,” and “General Physics”; Appendix 1), to predict students’ groups for the next week. We obtained more than 70% precision and an F1 score of more than 80%, which suggests that our model can predict students’ next-week grouping in other courses. Because it is difficult for teachers to observe and analyze students’ learning behaviors in MOOCs, we designed the system to provide each student’s predicted clustering result for the next week, so that suitable learning strategies can be recommended and the most appropriate guidance and more adaptive counseling can be offered.
Literature Review
1. Learning Educational Big Data
The 2017 NMC Horizon Report indicated that artificial intelligence (AI) will strengthen the online teaching model, which will help the adaptive learning and academic research process, making the interaction between teachers and students more intuitive and frequent (Adams Becker et al., 2017). MOOCs promote a new generation of digital learning programs for the government in Taiwan, emphasize the multicurricular operation mode, and promote the transformation of learning and teaching to foster a quality education environment with appropriateness and education for all. MOOCs derive learning characteristic data and diversified test questions, including learners’
personal data (such as gender, age, educational level and place of residence) and the diverse data of
test questions (such as the number of candidates, number of passers, number of questions, responses
to test questions, and comments). Several scholars have suggested that learning analysis is a viable
approach for improving the quality and fitness of MOOCs (Y. Li & H. Li, 2017).
With the maturity of data mining and related algorithms, an increasing number of researchers are using the log data of learning platforms to study learning outcomes (Schwendimann et al., 2017; Alexandron, Ruiperez-Valiente, Chen, Munoz-Merino, & Pritchard, 2017). These studies have developed emerging research methods and areas, such as Educational Data Mining and Learning Analytics, all of which use learners’ log data on the platform to analyze the learning process (Romero & Ventura, 2016). Driven by AI, new educational patterns have emerged in teaching scenarios, forms (such as multimedia and virtual reality), and other aspects. Big data has been widely used in educational fields, serving not only teachers and students but also academic institutions. Student activity is evaluated, and on this basis an intervention or guidance method is determined to improve learning outcomes.
2. Deep Learning Model
In this study, we use deep learning technology (LeCun, Bengio, & Hinton, 2015; Goodfellow, Bengio, & Courville, 2016; Schmidhuber, 2015) to build our prediction model. This section introduces how a deep learning model works. A deep neural network is a mathematical model that mimics the biological nervous system. The network usually has several layers, each of which has tens to hundreds of neurons. Each neuron multiplies the outputs of the previous layer by weights and adds them together. The sum is then transformed by the activation function (Leshno, Lin, Pinkus, & Schocken, 1993), and the transformed value is taken as the output of the neuron.
To simulate the neural network of a living being, the activation function is usually a nonlinear transformation. The traditional activation function is the Sigmoid function (Cybenko, 1989), which maps its input to the interval (0, 1) and is commonly used for binary classification. However, in deep neural networks, the learning effect of the Sigmoid function is relatively poor, and the ReLU function (Nair & Hinton, 2010) or Softmax function (Rafferty, Shellito, Hyman, & Buie, 2006) is often used instead.
The architecture of a neural network specifies the number of layers, the number of neurons in each layer, the way neurons are connected, and the type of activation function. These hyperparameters must be set before training, so the quality of their setting also affects the performance of the neural network. The learning and training process of a neural network then searches for the best weights under this setting.
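The computation performed by one layer can be sketched in a few lines of plain Python. This is a minimal illustration rather than the model used in this study: the weights, inputs, and layer sizes are hypothetical, and the bias term is an assumption that the description above does not mention.

```python
import math

def relu(x):
    # ReLU keeps positive values and clips negative ones to zero.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases, activation):
    """Compute one fully connected layer.

    weights[j] holds the weights of neuron j, one per input value.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of the previous layer's outputs, plus a bias (assumed).
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs

# Hypothetical values chosen only for illustration.
previous_outputs = [0.5, 0.2]
hidden = dense_layer(previous_outputs, [[1.0, -2.0], [0.5, 0.5]], [0.0, 0.1], relu)
output = dense_layer(hidden, [[1.0, 1.0]], [-0.5], sigmoid)
```

Stacking several such layers, with ReLU in the hidden layers and Sigmoid or Softmax at the output, gives the kind of network described in this section.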
3. Cluster Analysis
Cluster analysis (Anderberg, 1973) is a technique for statistical data analysis that is used
in various processes and fields, such as machine learning (Witten, Frank, Hall, & Pal, 2016),
information retrieval, pattern recognition, image analysis, and bioinformatics. In cluster analysis,
data are condensed and samples are grouped into subclusters according to similarity in attributes.
Similarity is generally determined based on the distances between objects. A smaller relative distance indicates a higher similarity. After clustering, small differences exist within clusters or groups and large differences exist between clusters or groups. Accordingly, objects within the same subcluster exhibit some similar attributes, which is commonly manifested as closer proximity with each other in the coordinate system. Normally, data clustering is classified as an
“unsupervised learning method” (Hastie, Friedman, & Tibshirani, 2009), which means learning without labeled answers. Thus, a clustering method has no single predetermined correct result.
Previous studies on MOOC clustering have used statistical methods to analyze students.
The statistical method usually involves directly assigning students to a group after setting the threshold of each group. Some methods use educational concepts to define the student groups directly and then group the students according to their learning behavior patterns. Kizilcec, Pérez-Sanagustín, and Maldonado (2017) discussed the ideas proposed by Hill (2013) for classifying students. Based on new data obtained from Stanford University professors, the authors described four student models that appear in Coursera-style MOOCs, created a new “No-Shows” category, and renamed “Lurkers” as “Observers.” They also provided a more rigorous description of each model to help define the data collection that could identify these groups.
Subsequently, based primarily on Cryer’s scheme of Elton (Khalil & Ebner, 2017), which is an educational philosophy and incorporates a grouping approach to student learning behavior, Khalil and Ebner proposed a clustering method based on learning motivation (Elton, 1996). This method aims to use cluster analysis to explore the importance of learning motivation.
We redefined the group types by combining the groupings of related studies with the features of our MOOC platform. Whereas most related studies performed a cluster analysis after the end of the course, our system performs a cluster analysis as the course progresses and predicts the next week’s clustering, providing teachers and students with timely feedback that can help improve the completion rate.
System Architecture
In this study, we use students’ learning behaviors to design a clustering analysis system and
apply it to ShareCourse. The cluster analysis system classifies students into groups. Additionally,
we design a questionnaire analysis module that we use to analyze students’ background and learning
motives. The questionnaire enables us to investigate whether students’ background and motives affect their learning progress, and results help to define the student grouping during the starting period of the course to overcome the cold-start problem (Schein, Popescul, Ungar, & Pennock, 2002). Finally, we designed a prediction module by using deep learning methods to implement this system, which we can use to predict which group a student may change to in the next week, thereby learning about changes in student learning styles and predicting student learning behavior.
Our proposed system comprises four parts: the clustering module, the questionnaire module, the deep learning prediction module and the application programming interface (API) server. The clustering module extracts the behavioral features of students from the database to use as features in a subsequent cluster analysis and the API server saves the results in the database. The deep learning prediction module uses the results of the clustering module and the students’ learning features to predict the results of the students’ groups for the next week. The questionnaire module comprises a user interface through which questionnaire results are collected, and the API server saves the results in the database. The API server provides two main functions: (1) a database for use in the two aforementioned systems and (2) a function to provide teachers with information regarding students.
1. ShareCourse Database
We organize the database into two parts: the learning behavior database and the students’
background database. The learning behavior database, MongoDB, saves users’ learning behavior data, including data from the account, visited page log, forum boards, exercises, videos, and exams (Table 1). The clustering analysis system collects students’ learning behavior data saved in MongoDB and uses the data to classify students into different clusters.
We use students’ background data to create a student learning log and re-establish a database table. We save students’ background information and motives in the ShareCourse database server (MySQL).
Table 1 User activity logs module
Type | Logged actions
Account | login, logout
Page | leave
Video | play, pause, leave, change rate, seek, video time
Forum | Post, Reply, like
Exercise | exerciseType, studentAns, getScore
Exam | getScore, studentAns
2. Clustering Module
The method proposed in the Introduction applies machine learning to group students into clusters in order to analyze or predict learning behaviors. The module has four parts: data preprocessing, first-step clustering, second-step clustering and database saving. In data preprocessing, information regarding students’ learning history is extracted and organized as features for further analysis. The first-step clustering conducts a preliminary clustering of students participating in the course based on their learning behaviors.
The second-step clustering conducts a more detailed analysis of the results of first-step clustering to ensure precise student grouping. Finally, the API server saves the results in the database, which can then provide information to teachers or students.
As Fig. 1 shows, these parts form three stages: data preprocessing, student clustering (the first and second clustering steps) and database saving. We introduce the three stages in sequence.
[Fig. 1 (diagram). Components shown: input data (Course ID); crontab; clustering system with data pre-processing (1. data extraction, 2. missing values, 3. dimension reduction) and clustering algorithm (first-step K-means: Max, Mid, Min; second-step K-means: Completing, Exercise, Video, Testing, Forum, GiveUp, Disengaging, DropOut); database; data analysis module; server exchanging requests and responses with the user.]
Fig. 1 Architecture of clustering module
2.1 Data Pre-processing
In the data pre-processing stage, the clustering module extracts the behavioral characteristics of students from the learning logs in the database and uses them as features in the subsequent cluster analysis. Table 2 lists the extracted student features, such as video, exercise, and forum behavior.
Table 2 Users’ learning behavior
Feature Type | Feature Name | Meaning
Video | Video_Finish_Ratio | Completion rate of the student's video watching
Exercise | Exercise_Finish_Ratio | Completion rate of the student's practice exercises
Forum | Forum_Post | Number of posts in the forum
Forum | Forum_Reply | Number of responses in the forum
Forum | Forum_Like | Number of likes clicked in the forum
Exam | Exam_Page | Number of visits to the exam page, which we treat as a degree of attention to the exam
In applied statistics, standardization can have several meanings. In the simplest sense, standardization allows numerical results to be ranked and their weights adjusted accordingly. In more complex cases, standardization is used to align adjusted values. In educational evaluation applications, the standardization of scores can be used to align the score distribution with a normal distribution. Furthermore, standardization reduces the effect of differing scales on the results, enabling a more precise comparison of features. In this study, because some features were counts and others were ratios, we used the scikit-learn StandardScaler to standardize each feature by removing the mean and scaling to unit variance (Eq. 1), so that every feature has a mean of 0 and a standard deviation of 1 and all features are on a comparable scale.
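$$z = \frac{X - \mu}{\sigma} \qquad (1)$$
where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.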
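The standardization in Eq. (1) can be illustrated with the Python standard library alone. This is a minimal sketch under stated assumptions: the feature values are hypothetical, and the function `standardize` is written here for illustration and is not the scikit-learn implementation, although it uses the same population standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical feature columns: a ratio and a raw count.
video_finish_ratio = [0.0, 0.25, 0.5, 1.0]
forum_post = [0, 1, 2, 13]

z_video = standardize(video_finish_ratio)
z_forum = standardize(forum_post)
```

After this step each column has a mean of 0 and a standard deviation of 1, so a ratio and a raw count contribute on the same scale to the distance computations used in clustering.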
2.2 Clustering Algorithm
In the student clustering step, we use the k-means method, an unsupervised algorithm that partitions a given set of observations into a pre-defined number k of clusters.
In the first step, the first k-means algorithm is executed. Because we want to divide students
into three levels, we take the value of k = 3. This step aims to make preliminary groupings for students. We use video and exercise features to classify students into the following groups, which represent the degree of learning: “Max,” “Mid” and “Min.” We use the two most basic feature values of the open course to achieve a preliminary grouping.
The advantage of this method is that it can analyze grouping results in a step-by-step manner.
If we use all features at once, it may cause the results to be affected by a specific feature value.
Therefore, in this step, we use the most basic video and exercise feature values to assign students to groups of different learning levels. In the next stage of clustering, we add new features to group students more accurately.
In the second step, we use the k-means individually for the three groups, which have different levels, but add new features, such as forum behavior and number of clicks on the exam page, to identify students with different learning behaviors and ensure precise student groupings.
For example, we want to find students who are more interested in the exam or who only like to interact in the forum.
In the Max group, we divide all students into two groups (k = 2). Because the features used in the first stage are videos and exercises, students in this group have a certain degree of performance in both of the above. Therefore, after adding new features, we use k-means algorithms to divide students into two groups: students with normal learning behaviors and those with special learning behaviors.
However, the Mid group is rather special. Since most of the open-course students are assigned to this group in the first stage, we set the k value to 3 when executing the k-means algorithm. In addition to students with normal learning behaviors and those with special learning behaviors, this group includes students who quit midway through the semester.
In the Min group, we divide students into two groups (k = 2). One group represents students who dropped out and never studied after enrolling in the course; students in the other group may display some learning behaviors, and we classify them as students with special learning behaviors. This division aims to identify students with special behaviors and compare them with students in the other levels who also have special learning behaviors.
Finally, we compare the students with special behaviors on their forum and exam-page activity and use the pre-defined group types to classify them. We average the four types of features and compare each student with these averages to assign the student to the most appropriate group.
This method ensures that students who have special learning behaviors for each course will be
unaffected by other courses that have different features.
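The two-step procedure described in this section can be sketched as follows. This is an illustrative sketch, not the system's implementation: the student data are hypothetical and assumed to be already scaled, the `kmeans` helper is a plain Lloyd's algorithm with a deterministic initialization chosen for reproducibility (a library such as scikit-learn would normally be used instead), and the final mapping of subclusters to the named groups is omitted.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on tuples of floats; returns (labels, centroids)."""
    ordered = sorted(points, key=sum)
    # Deterministic start (an assumption made for reproducibility):
    # k points spread evenly over the data sorted by feature sum.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical, already-scaled features: (video, exercise, forum, exam_page).
students = {
    "s1": (0.95, 0.90, 0.0, 0.1), "s2": (0.90, 0.95, 0.1, 0.0),
    "s3": (0.92, 0.88, 0.9, 0.9), "s4": (0.50, 0.50, 0.0, 0.0),
    "s5": (0.55, 0.45, 0.1, 0.1), "s6": (0.50, 0.55, 1.0, 0.0),
    "s7": (0.45, 0.50, 0.0, 1.0), "s8": (0.00, 0.00, 0.0, 0.0),
    "s9": (0.05, 0.00, 0.0, 0.0), "s10": (0.05, 0.05, 0.0, 1.0),
}
ids = list(students)

# First step: k = 3 on the video and exercise features only.
labels, centroids = kmeans([students[s][:2] for s in ids], k=3)
# Name the clusters by centroid magnitude: lowest is "Min", highest is "Max".
order = sorted(range(3), key=lambda c: sum(centroids[c]))
level_name = {order[0]: "Min", order[1]: "Mid", order[2]: "Max"}
level = {s: level_name[label] for s, label in zip(ids, labels)}

# Second step: re-cluster each level with all features; k follows the text.
second_k = {"Max": 2, "Mid": 3, "Min": 2}
subgroup = {}
for name, k in second_k.items():
    members = [s for s in ids if level[s] == name]
    sub_labels, _ = kmeans([students[s] for s in members], k)
    for s, label in zip(members, sub_labels):
        subgroup[s] = (name, label)
```

In this toy data set, the second step separates the students with pronounced forum or exam-page activity (for example `s3` in the Max level) from the students with ordinary behavior at the same level, which is the purpose of adding the new features only after the learning level has been fixed.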
2.3 Pre-defined Group
We use the data of the Introduction to Computer Networks 2015-2017 course to define our group types. We then use our clustering module to execute the data of these three courses and collect data from students with special behaviors for analysis by section. As shown in Table 3, we pre-define eight groups of students. For the new course, we use the clustering module to distribute students according to the definitions of these eight groups.
Table 3 Group definition list
Group type Explanation
Completing Students with very high completion rates for video viewing and practice exercises
Video Students with a preference for video learning who do not like to complete exercises
Exercise Students with a preference for exercise learning who do not like to watch videos
Testing Students who do not display many learning behaviors (video and exercise) but are concerned about exam information
Forum Students who do not display many learning behaviors (video and exercise) but have a high degree of participation in forums
GiveUp Students who stop learning partway through the semester due to certain factors
Disengaging Students who only participate in the course for the first stage
DropOut Students who only enrolled in the course but did not display any learning behavior
Prediction Module
In our proposed clustering analysis system, we use a deep learning method to implement a prediction clustering result module. This system aims to predict the types of groups that students may be classified into for the next week, which we use to judge whether or not a student’s learning style will change.
Fig. 2 shows the architecture of the prediction module, which is divided into two main parts:
the judge change model, which judges whether students will change group in the next week, and
the predicted clustering result model, which predicts students’ grouping results for the following week. The predicted clustering result model uses the output of the judge change model as its input. The following sections explain these two models in detail.
[Fig. 2 (diagram): the judge change model (input layer to output: 0 = no change, 1 = change) feeds its output into the input layer of the predict clustering result model, whose output layer gives the predicted results.]
Fig. 2 Architecture of prediction module
1. Judge Change Model
This model predicts whether students will change their group in the next week. As shown in Fig. 3, we use the multilayer perceptron (Gardner & Dorling, 1998; Kruse et al., 2013) method in deep learning to build our model.
The feature types in Fig. 3 are the input features of this model. “Learning_level” is the output of first-step clustering. Because it has three ordered results, we assign 0 to the “Min”
group, the lowest level, and increase the value sequentially. However, students’ final grouping results have no order; for example, we cannot rank the “Forum” and “Exam” groups. To overcome this problem, we use one-hot encoding (Lin & Newton, 1989) (Fig. 4), which turns the eight groups into eight new binary feature values. The input layer therefore contains 13 features or nodes: four behavior features, the learning level, and the eight one-hot values.
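The construction of the 13-dimensional input can be sketched as follows. The position of each group within the one-hot vector and the helper names `one_hot` and `build_input` are assumptions made for illustration; only the feature ids (0 to 4 for the behavior features and learning level, 5 to 12 for the clustering result) follow Fig. 3.

```python
# Order of groups within the one-hot vector is an assumption (Table 3 order).
GROUPS = ["Completing", "Video", "Exercise", "Testing",
          "Forum", "GiveUp", "Disengaging", "DropOut"]
# Learning levels are ordinal, so a single integer is enough.
LEVELS = {"Min": 0, "Mid": 1, "Max": 2}

def one_hot(group):
    """Return an 8-element vector with a single 1 at the group's position."""
    return [1 if g == group else 0 for g in GROUPS]

def build_input(video, exercise, forum, exam_page, level, group):
    # ids 0-3: behavior features; id 4: ordinal learning level;
    # ids 5-12: one-hot encoded clustering result.
    return [video, exercise, forum, exam_page, LEVELS[level]] + one_hot(group)

# Hypothetical student: watches most videos, rarely does exercises.
x = build_input(0.8, 0.1, 0.0, 0.2, "Mid", "Video")
```

Encoding the group as eight binary values prevents the network from reading an artificial ordering into the group labels, which an integer code from 0 to 7 would imply.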
We use Keras (Chang, Feig, Moser, & Thuraisingham, 2015) to build a neural network
model comprising five layers. Because all our feature values are positive, we use the ReLU
function as the activation function in the hidden layers. However, the output is 0 or 1, so we use the Sigmoid function
as the activation function in the output layer.
We summarize this model as follows:
1. Input layer: 13 features; this layer takes 13-dimensional data.
2. Hidden layers: three hidden layers; because all our feature values are positive, we select ReLU as the activation function.
3. Output layer: 0 means the student will not change the clustering results and 1 means the student will change the clustering results.
[Fig. 3 (diagram): a multilayer perceptron with an input layer, hidden layers and an output layer. Input features by id: 0 Video, 1 Exercise, 2 Forum, 3 Exam_page, 4 Learning_level, 5~12 Clustering_result. Output: 0 = no change, 1 = change.]
Fig. 3 Model used to judge whether or not students will change group
[Fig. 4 (diagram): the group label of each student (Completing (A), Video (B), ..., Disengaging (X), Dropout (Y)) is expanded into the binary columns clustering_A, clustering_B, ..., clustering_X, clustering_Y; the column matching the student's group is 1 and all other columns are 0.]
Fig. 4 Example of one-hot encoding method
2. Prediction Clustering Results Model
The main purpose of building this model is to predict the type of group in which students
will be classified for the next week. As shown in Fig. 5, the output layer no longer has only one
node, unlike the judge change model. Because there are eight types of groups, the eight nodes of
the output layer represent eight different groups. The input layer needs 14 nodes because we add the output of the judge change model to assist the training. As the output layer of this model is more complicated than that of the judge change model, we increased the number of hidden layers to achieve better training.
[Fig. 5 (diagram): a multilayer perceptron with an input layer, hidden layers and an output layer.]
Fig. 5 Model used to predict student grouping results
During training, a model evaluated on the same data it was trained with will usually show higher accuracy on that training data than on unseen testing data. Therefore, we use the k-fold cross-validation (Kohavi, 2001) method to train our model. This method divides the data into k equal parts. Taking k = 10 as an example, nine parts are used as training data and one as testing data, and the roles rotate so that each part serves as testing data once. This rotation prevents a particular choice of test data from distorting the measured accuracy of the model. As shown in Table 4, we use 10-fold cross-validation to train the model. The accuracy on our training data does not differ much from that on the testing data.
Therefore, we conclude that our trained model does not overfit the training data.
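The fold rotation can be sketched with a small index generator. This is an illustrative sketch only: the sample count is hypothetical, the function name `k_fold_indices` is introduced here, and no shuffling is applied, whereas in practice the samples would usually be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical data set of 25 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one test fold, so every fold's test accuracy is measured on data the model did not see during that fold's training.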
Table 4 Accuracy of 10-fold cross-validation
K 1 2 3 4 5 6 7 8 9 10
Train-acc (%) 91.1 91.2 92.0 88.9 93.1 90.4 88.9 92.3 90.8 90.4
Test-acc (%) 92.3 92.4 92.7 91.5 92.9 92.3 91.4 92.9 92.1 92.1
We select the ReLU function as the activation function in the hidden layers because the input features are all positive. However, we choose the softmax function rather than sigmoid as the activation function in the output layer: the sigmoid function maps a single output to (0, 1), which suits two-class problems, whereas the softmax function maps an n-dimensional vector to n values in (0, 1) that sum to 1, one per class.
Subsequently, we choose the class with the largest value as our prediction result.
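The output step can be sketched as follows. The raw scores are hypothetical, and the order of the group names within the output layer is an assumption made for illustration.

```python
import math

# Order of the eight output nodes is an assumption (Table 3 order).
GROUPS = ["Completing", "Video", "Exercise", "Testing",
          "Forum", "GiveUp", "Disengaging", "DropOut"]

def softmax(scores):
    """Map n raw scores to n probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_group(scores):
    """Return the group with the largest softmax probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GROUPS[best], probs

# Hypothetical raw outputs of the eight output nodes.
group, probs = predict_group([0.2, 2.5, 0.1, -1.0, 0.0, 0.3, 1.1, -0.5])
```

Because the eight probabilities sum to 1, the largest value can be read directly as the model's most likely group for the following week.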
3. Questionnaire Analysis System
The MOOC platform used in this study did not collect user background information when students registered for accounts. Because the users of the MOOC platform are international and highly diverse, their motivations, educational levels and majors are difficult to determine. We thus designed a user interface questionnaire to collect information regarding users’ backgrounds and motives when they registered for the course, because these factors typically affect student learning behavior. The questionnaire results helped us improve our data analysis model.
In this module, we used questionnaires to understand students’ motivation and learning styles. We used some learning motivation research (Magen-Nagar & Cohen, 2017; Kizilcec et al., 2017) to design our questionnaire and design four different stages of questions in the following order: registration, early stage of course, mid-course (approximately mid-term exam period), and end of course.
We divided the questionnaire APIs into two categories: motivational and learning style questionnaires.
During the registration phase, we administered the questionnaire to students to determine their purpose for enrolling in the course and understand their learning motivation. The system asked students follow-up questions at different time points to check whether their learning style was affected by external factors during their learning.
Implementation and Experiment
1. User Projection Analysis
We used the 2016 Introduction to Computer Networks course as our experimental data (Table 5) to explore the relation between students’ learning behavior and course completion rates. The defined groups include “Completing,” “Video,” “Exercise,” “Forum,” “GiveUp,”
“Testing,” “Disengaging,” and “DropOut.” As shown in Fig. 6, we project all students onto a two-dimensional plane. The x-axis reflects the ratio of videos that a student watched, and the y-axis
reflects the ratio of practice exercises that the student finished. The “Completing” group, in the upper right corner, displays the most complete learning behavior. We also identified some students with special learning behaviors, but we cannot guarantee that good learning behaviors correlate with the course completion rate.
Table 5 Information of the experiment course
Course Introduction to Computer Networks
Duration 19-09-2016 to 05-12-2016
Number of students 1,491
Number of exams 2
Number of passing students 245
Pass rate (%) 16.43
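[Fig. 6 (scatter plot): both axes range from 0.0 to 1.0; x-axis: video-watching ratio; y-axis: practice-exercise completion ratio. Legend: Completing, Video, Exercise, Exam, Forum, GiveUp, Disengaging, DropOut.]
Fig. 6 Projection of students’ clustering results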
Table 6 shows that every group except the “Forum” group contained students who
passed the exam. First, we found that students who only like to watch videos had a much lower
pass rate than students who studied normally, whereas students who did the exercises passed at a high rate. One reason may be that
exercises directly help students master the concepts. Another may be that the question types in the exercises are close to those in the exams, so doing the exercises first is a big advantage. Thus, we found that exercises have a great influence on passing the exam.
Table 6 Students’ clustering results
Type of Group Number of students Number of passing students pass rate (%)
Completing 179 139 77.09
Video 120 37 30.83
Exercise 6 5 83.33
Exam 38 24 63.15
Forum 6 0 0
GiveUp 34 15 44.15
Disengaging 185 6 3.24
DropOut 923 19 2.05
Total Number of students: 1,491
We found that students who passed the exam were not necessarily classified in the top group.
For example, the “GiveUp” group also had a relatively high pass rate. Although many of the students in this group gave up after the first mid-term exam, they exhibited complete learning behaviors before the exam. Although we were unsure which external factors affected these students, this result highlights the importance of exploring students’ motivation and learning styles. The results also suggest that some people use alternative accounts to register for courses to obtain exam questions. Learning motivation and professional field thus appear to have a great impact on student learning behavior.
2. Accuracy of Prediction Result Module
We designed a prediction model based on deep learning methods and trained it on data from the 2015-2017 iterations of the “Introduction to Computer Networks” course (Table 7). We then tested the accuracy of the model on different courses.
First, we tested courses that have the same curriculum model as the training course. From the values in Table 8, Precision is higher than 70% for every course except “Introduction to Calculus.” Although the curriculum model of this course is similar to that of the training data, most of its exercises were computational questions, which lowered the rate at which students completed practice exercises. Because our model relies heavily on exercise features, this drop led to unexpected prediction results. Notably, the average F1 score of 82% indicates that our module performs well on courses with the same curriculum model. Our proposed model can therefore find students whose clustering results will change in the next week.
Next, we tested courses with different curriculum models. As shown in Tables 9 and 10, Recall and Precision are lower for these courses.
Through this experiment, we found that placing all exercises after the videos may lower the exercise completion rate, whereas pairing each video with its corresponding exercises improves students’ willingness to practice. Because the exercise completion rate of these four courses was relatively low, and some of the courses had no exercises at all, the F1 score of our module was low. The accuracy for the four courses was nevertheless high because most students do not change groups. Our goal, however, is to find the students who change groups and to predict their clustering results correctly. This experiment shows that our module should be applied to courses with the same curriculum model as the training course.
Table 7 Information of the 2015-2017 Introduction to Computer Networks course
Course                      Introduction to Computer Networks
Duration                    14-09-2015 to 14-12-2015   19-09-2016 to 05-12-2016   18-09-2017 to 27-11-2017
Number of students          7145                       1491                       1140
Number of exams             3                          2                          2
Number of passing students  793                        245                        198
Pass rate (%)               9.36                       16.43                      17.36
Fee                         Free                       Free                       Free
Table 8 Value of confusion matrix for courses with the same curriculum model
Course TP TN FP FN
Topics of Investment 45 936 14 1
Introduction to Calculus 22 418 14 1
Introduction to Computer Programming 17 156 4 3
Principles of Economics 15 105 5 1
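Table 8 lists only the raw confusion-matrix counts. As a sketch of how the Precision, Recall, and F1 values discussed in the text can be derived from them, the following Python snippet applies the standard definitions to each course. It assumes that the positive class is “the student’s cluster changes in the next week”; the function and variable names are illustrative.

```python
from statistics import mean

# Counts copied from Table 8: course -> (TP, TN, FP, FN).
TABLE_8 = {
    "Topics of Investment": (45, 936, 14, 1),
    "Introduction to Calculus": (22, 418, 14, 1),
    "Introduction to Computer Programming": (17, 156, 4, 3),
    "Principles of Economics": (15, 105, 5, 1),
}


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics, returned as percentages."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
    }


results = {course: confusion_metrics(*counts) for course, counts in TABLE_8.items()}
mean_f1 = mean(m["f1"] for m in results.values())

for course, m in results.items():
    print(f"{course:<38}P={m['precision']:6.2f}  R={m['recall']:6.2f}  F1={m['f1']:6.2f}")
print(f"Mean F1 = {mean_f1:.2f}")
```

With these counts, Precision falls below 70% only for “Introduction to Calculus,” and the mean F1 score rounds to 82%, in line with the text.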
Table 9 Value of confusion matrix for courses with different curriculum models
Course TP TN FP FN
Introduction to Life Science 2 79 3 1
Introduction to Computer Science 2 177 4 5
General Chemistry 7 104 8 7
General Physics 13 62 4 8
Table 10 Analysis of confusion matrix for courses with different curriculum models
Course                            Accuracy (%)  Recall (%)  Precision (%)  F1 score (%)
Introduction to Life Science      95.29         66.67       40             50
Introduction to Computer Science  95.2          28.57       33.33          30.76
General Chemistry                 88.09         50          46.67          48.27
General Physics                   86.2          61.9        76.47          68.4
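Accuracy alone can look high when few students change groups, because a predictor that never flags a change is already right for every student who stays in the same group. The sketch below uses the counts from Table 9 and assumes that the positive class is “the student changes groups.” It compares the accuracy of the module with that of a hypothetical baseline that always predicts “no change”; all names are illustrative.

```python
# Counts copied from Table 9: course -> (TP, TN, FP, FN).
TABLE_9 = {
    "Introduction to Life Science": (2, 79, 3, 1),
    "Introduction to Computer Science": (2, 177, 4, 5),
    "General Chemistry": (7, 104, 8, 7),
    "General Physics": (13, 62, 4, 8),
}


def model_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy of the evaluated predictions, in percent."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def no_change_baseline_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy of a hypothetical predictor that always answers 'no change'.

    Such a predictor is correct exactly for the students who did not
    change groups, that is, the TN + FP students of the evaluated model.
    """
    return 100.0 * (tn + fp) / (tp + tn + fp + fn)


for course, counts in TABLE_9.items():
    tp, tn, fp, fn = counts
    changers = tp + fn  # students who actually changed groups
    print(f"{course:<34}changers={changers:>3}  "
          f"model={model_accuracy(*counts):6.2f}  "
          f"baseline={no_change_baseline_accuracy(*counts):6.2f}")
```

The baseline never identifies a student who changes groups, so its Recall is 0. This is why Recall, Precision, and F1 are more informative than accuracy for this task.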
The aforementioned results show that our module can be applied to a course as long as the course has the same curriculum model as “Introduction to Computer Networks.” However, because the “Introduction to Life Science” course has few students and only a few of them are likely to change groups, a single wrong prediction can change the measured performance of our module considerably.
3. Relation between Motivation and Clustering Results
This section presents the relation between student motivation and clustering results. We obtained 789 questionnaires from students in the test courses and divided the students’ motivation into “extrinsic factors” and “intrinsic factors.”
We defined extrinsic factors as tangible benefits, such as a certificate or credit, and intrinsic factors as self-improvement. As shown in Tables 11 and 12, because the students who responded to the questionnaire actually attended the courses, the sum of the Completing and Video groups was greater than the DropOut group, and the DropOut ratio was approximately 30%, which is relatively low for a MOOC platform.
Some students nevertheless fell into the “DropOut” group. The questionnaire results indicate that the dropout ratio associated with the intrinsic factor of self-enhancement was lower than that associated with the other factors (Table 12).
Table 11 Relation between extrinsic factors and clustering results
Cluster      Credit (%)  Certificate (%)
Completing   8.6         11.9
Video        26          27.1
Exercise     0           0
Exam         1           0.8
Forum        1.6         3.3
GiveUp       7.6         5.
Disengaging  18.7        14.8
DropOut      36.3        36.8
Table 12 Relation between intrinsic factors and clustering results
Cluster      Interested (%)  Peer recommendation (%)  Enhance professional (%)
Completing   11.2            9.8                      18
Video        22.9            25.6                     31.9
Exercise     0               0                        0
Exam         0.1             0                        0
Forum        1.4             0.4                      0
GiveUp       6.3             10.3                     5.6
Disengaging  19.8            18.7                     19.4
DropOut      38              34.9                     25
Conclusion and Future Work
Our study aimed to classify students into different groups at the beginning of a course to solve the cold-start problem. We used data from the previous iteration of the course to define groups with the clustering module, and we used a questionnaire module to identify student motivation and learning strategies during the course. The questionnaire results allowed us to group students at the start of the course. As the course progressed, we ran the clustering system daily. Because the questionnaire results were only a preliminary indication of the groups to which the students belonged, we continued to refine the clustering to group the students more accurately. The definitions of the groups can also be refined, or new groups added, after testing on a larger number of samples. Although numerous studies have grouped students in MOOCs, most have performed post-course analyses. By contrast, our proposed method uses questionnaire results and previously defined groups to classify students at the start of a course, and the grouping becomes more precise as the course progresses and information on student learning behavior accumulates.
The proposed system also implements a deep learning prediction model that combines student learning behavior with the current week’s clustering results to predict students’ clustering results for the next week. The system aims to provide teachers and students with these predicted clustering results so that suitable learning strategies can be recommended, which can facilitate the most appropriate guidance and more adaptive counseling. In the future, we will use the clustering results to develop a diagnostic system that provides a reference basis for adaptive videos and exercises. With this system, students will be able to identify their learning weaknesses without teachers’ proactive help and then learn or review the chapters that the diagnostic system suggests.
References
Adams Becker, S., Cummins, M., Davis, A., Freeman, A., Hall Giesinger, C., & Ananthanarayanan, V. (2017). NMC Horizon Report: 2017 Higher Education Edition. Austin, Texas : The New Media Consortium.
Alexandron, G., Ruiperez-VaLiente, J. A., Chen, Z. Z., Munoz-Merina, P. J. , & Pritchard, D. E.
(2017) . Copying@Scale: Using Harvesting Accounts for Collecting Correct Answers in a MOOC. Computers & Education, 108, 96-144. doi: 10.1016/j.compedu.2017.01.015.
Anderberg, M. R. (1973). Cluster Analysis for Applications. New York, NY: Academic Press.
Chang, R., Feig, E., Moser, L., & Thuraisingham, B. (2015). Message from the General Chairs.
Proceedings of 2015 IEEE International Conference on Web Services (ICWS), xvi–xvii. doi:
10.1109/ICWS.2015.5
Cybenko, G. (1989). Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2: 303-314. doi: 10.1007/BF02551274.
de Barba, P., Kennedy, G., & Ainley, M. (2016). The Role of Students’ Motivation and Participation in Predicting Performance in a MOOC. Journal of Computer Assisted Learning, 32(3), 218-231. doi: 10.1111/jcal.12130.
Elton, L. (1996). Strategies to Enhance Student Motivation: A Conceptual Analysis. Studies in Higher Education, 21(1), 57-68. doi: 10.1080/03075079612331381457.
Fauvel, S., Yu, H., Miao, C. Y., Cui, L. Z., Song, H. J., Zhang, L.,… & Leung, C. (2018).
Artificial Intelligence Powered MOOCs: A Brief Survey. Proceedings of 2018 IEEE International Conference on Agents (ICA), pp. 56-61. doi: 10.1109/AGENTS.2018.8460059.
Gardner, M. W., & Dorling, S. R. (1998). Artificial Neural Networks (The Multilayer Perceptron) - A Review of Applications in the Atmospheric Sciences. Atmospheric Environment, 32(14- 15), 2627-2636. doi: 10.1109/10.1016/S1352-2310(97)00447-0
Gasevic, D., Kovanovic, V., Joksimovic, S., & Siemens G. (2014). Where Is Research on Massive Open Online Courses Headed? A Data Analysis of the MOOC Research Initiative.
The International Review of Research in Open and Distributed Learning, 15(5). doi:
10.19173/irrodl.v15i5.1954.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: The MIT press.
Hastie, T., Friedman, J., & Tibshirani, R. (2009). The Elements of Statistical Learning (pp. 485- 585). Berlin, German: Springer.
Hew, K. F., & Cheung, W. S. (2014). Students’ and Instructors’ Use of Massive Open Online
Courses (MOOCs): Motivations and Challenges. Educational Research Review, 12(1), 45- 58. doi: 10.1016/j.edurev.2014.05.001.
Hill, P. (2013, March 10). Emerging Student Patterns in Moocs: A (Revised) Graphical View.
[Web blog message]. Retrieved from https://mfeldstein.com/emerging-student-patterns-in- moocs-a-revised-graphical-view/
Huang, N. F., Hsu, I. H., Lee, C. A., Chen, H. C., Tzeng, J. W., & Fang T. T. (2018). The Clustering Analysis System Based on Students’ Motivation and Learning Behavior.
Proceedings of 2018 Learning With MOOCS (LWMOOCS), pp. 117-119. doi: 10.1109/
LWMOOCS.2018.8534611.
Kastrati, Z., Imran, A. S., & Kurti, A. (2019). Integrating Word Embeddings and Document Topics with Deep Learning in a Video Classification Framework. Pattern Recognition Letters. 128, 85-92. doi: 10.1016/j.patrec.2019.08.019.
Khalil, M., & Ebner, M. (2017). Clustering Patterns of Engagement in Massive Open Online Courses (moocs): The Use of Learning Analytics to Reveal Student Categories. Journal of Computing in Higher Education, 29(1), 114-132. doi: 10.1007/s12528-016-9126-9.
Kizilcec, R. F., Pérez-Sanagustín, M., & Maldonado, J. J. (2017). Self-regulated Learning Strategies Predict Learner Behavior and Goal Attainment in Massive Open Online Courses.
Computers & Education, 104, 18-33. doi: 10.1016/j.compedu.2016.10.001.
Kruse, R., Borgelt, C., Klawonn, F., Moewes, C., Steinbrecher, M., & Held, P. (2013). Multi- Layer Perceptrons. In R. Kruse, C. Borgelt, F. Klawonn, C. Moewes, M. Steinbrecher, & P.
Heldb (Eds), Computational Intelligence (pp. 47-81). Berlin, German: Springer.
Kohavi, R. (2001). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th international joint conference on Artificial intelligence, vol. 2, 1137-1143.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521, 436-444. doi: 10.1038/
nature14539.
Lee, Y. (2019). Using Self-Organizing Map and Clustering to Investigate Problem-Solving Patterns in the Massive Open Online Course: An Exploratory Study. Journal of Educational Computing Research, 57(2), 471-490. doi:10.1177/0735633117753364
Leshno, M., Lin, V. Y., Pinkus, A., & Schocken, S. (1993). Multilayer Feedforward Networks With a Non-polynomial Activation Function Can Approximate A ny Function. Neural networks, 6(6), 861-867. doi: 10.1016/S0893-6080(05)80131-5
Li, Y., & Li. H. (2017). MOOC-FRS: A New Fusion Recommender S ystem f or MOOCs.
Proceedings of the the 2017 IEEE 2nd Advanced Information Technology, Electronic and
Automation Control Conference (IAEAC), pp. 1481-1488. doi: 10.1109/IAEAC.2017.8054260.
Lin, B., & Newton, A. R. (1989). A Generalized Approach to the Constrained Cubical Embedding Problem. Proceedings of the 1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors. 400-403. doi: 10.1109/ICCD.1989.63396.
Magen-Nagar, N., & Cohen, L. (2017). Learning Strategies as a Mediator for Motivation and a Sense of Achievement Mmong Students who Study in MOOCs. Education and Information Technologies, 22(3), 1271-1290. doi: 10.1007/s10639-016-9492-y.
Nair, V., & Hinton, G. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807-814.
Rafferty, J., Shellito, P., Hyman, N. H., & Buie, W. D. (2006). Practice Parameters for Sigmoid Diverticulitis. Diseases of the Colon & Rectum, 49(7), 939-944. doi: 10.1007/s10350-006- 0578-2.
Rodrigues, R. L., Ramos, J. L. C., Silva, J. C. S., & Gomes, A. S. (2016). Discovery Engagement Patterns MOOC Through Cluster Analysis. IEEE Latin America Transactions, 14(9), 4129- 4135. doi: 10.1109/TLA.2016.7785943
Romero, C., & Ventura, S. (2016). Educational Data Science in Massive Open Online Courses.
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery, 7(1). 1-12. doi:
10.1002/widm.1187.
Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and Metrics for Cold-start Recommendations. Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, 253-260. doi:
10.1145/564376.564421.
Schmidhuber, J. (2015) Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117. doi: 10.1016/j.neunet.2014.09.003.
Schwendimann, B. A., Rodríguez-Triana, M. J., Vozniuk, A., Prieto, L. P., Boroujeni, M. S., Holzer, A., …, & Dillenbourg, P. (2017). Perceiving Learning at a Glance: A systematic Literature Review of Learning Dashboard Research. IEEE Transactions on Learning Technologies, 10(1), 30-41. doi: 10.1109/TLT.2016.2599522.
Su, Y. S., Huang, C. S. J., Ding, T. J. (2016). Examining the Effects of MOOCs Learners’ Social Searching Results on Learning Behaviors and Learning Outcomes. Eurasia Journal of Mathematics, Science and Technology Education, 12(9), 2517-2529. doi: 10.12973/
eurasia.2016.1282a.
Tang, S. (2017). Learning Mechanism and Function Characteristics of MOOC in the Process of
Higher Education. Eurasia Journal of Mathematics, Science and Technology Education,
13(12), 8067-8072. doi: 10.12973/ejmste/80769
Veletsianos, G., & Shepherdson, P. (2016). A Systematic Analysis and Synthesis of the Empirical Mooc Literature Published in 2013-2015. The International Review of Research in Open and Distributed Learning, 17(2). doi: 10.19173/irrodl.v17i2.2448.
Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. San Francisco, CA: Morgan Kaufmann.
About the Authors
Jian-Wei Tzeng (曾建維) is an Assistant Research Fellow in the Center for Teaching and Learning Development, National Tsing Hua University, Hsinchu, Taiwan.
Nen-Fu Huang (黃能富) is a Professor in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
Chia-An Lee (李加安) is a Ph.D. Candidate in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. (Corresponding Author)
Yi-Hsien Chen (陳羿先) is a Master’s student in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
An-Chi Chuang (莊安琦) is a Master’s student in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
Received: May 28, 2020
Revised: August 11, 2020
Accepted: August 18, 2020
Appendix A
Details of the testing courses
Course                                Number of students  Fee     Description
Topics of Investment                  933                 Free    Curriculum model and training data are similar.
Introduction to Calculus              696                 Charge  Curriculum model and training data are similar, but exercises are more computational.
Introduction to Life Science          129                 Charge  All course videos come before the exercises.
General Chemistry                     347                 Charge  The course has no exercises.
General Physics                       132                 Charge  The course has no exercises.
Introduction to Computer Science      281                 Charge  All course videos come before the exercises.
Introduction to Computer Programming  326                 Charge  Curriculum model and training data are similar.
Principles of Economics               178                 Charge  Curriculum model and training data are similar.
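Clustering Analysis System Based on Students’ Motivation and Learning Behavior in MOOC
Jian-Wei Tzeng 1 Nen-Fu Huang 2 Chia-An Lee 2 Yi-Hsien Chen 2 An-Chi Chuang 2
1
Center for Teaching and Learning Development, National Tsing Hua University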
2