D : denotes the maximum depth of the content tree (CT)
L0~LD-1 : denote the levels of CT descending from the top level to the lowest level
KV : denotes the keyword vector of a content node (CN)
FV : denotes the feature vector of a CN
Input : a CT with keyword vectors
Output : a CT with feature vectors
Step 1: For i = LD-1 to L0
1.1: For each CNj in Li of this CT
1.1.1: If the CNj is a leaf-node, FVCNj = KVCNj
Else, FVCNj = (1-α) * KVCNj + α * avg(FVs of its child-nodes)
Step 2: Return CT with feature vectors
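The feature aggregation steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the `ContentNode` class, its field names, and the function name `aggregate_features` are hypothetical and do not come from the thesis. The sketch uses a post-order recursion instead of the explicit level loop; since the feature vector of a node depends only on the feature vectors of its child nodes, both orders should give the same result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentNode:
    """Hypothetical content node (CN): a keyword vector plus child nodes."""
    kv: List[float]
    children: List["ContentNode"] = field(default_factory=list)
    fv: List[float] = field(default_factory=list)


def aggregate_features(node: ContentNode, alpha: float) -> List[float]:
    """Fill in node.fv for every node of the subtree, children first."""
    if not node.children:
        # Leaf node: the feature vector is the keyword vector itself.
        node.fv = list(node.kv)
        return node.fv
    child_fvs = [aggregate_features(child, alpha) for child in node.children]
    count = len(child_fvs)
    # Component-wise average of the child feature vectors.
    avg = [sum(column) / count for column in zip(*child_fvs)]
    # FV = (1 - alpha) * KV + alpha * avg(FVs of the child nodes)
    node.fv = [(1 - alpha) * k + alpha * a for k, a in zip(node.kv, avg)]
    return node.fv


# Illustrative data only: a root with two leaf children and alpha = 0.5.
root = ContentNode(kv=[1.0, 0.0],
                   children=[ContentNode(kv=[0.0, 1.0]),
                             ContentNode(kv=[0.0, 0.0])])
aggregate_features(root, alpha=0.5)
print(root.fv)  # [0.5, 0.25]
```

A larger α gives the child nodes more weight in the feature vector of their parent; α = 0 leaves every feature vector equal to its keyword vector.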
4.3 Level-wise Content Clustering Module
After structure transformation and representative feature enhancement, we apply a clustering technique to establish the relationships among the content nodes (CNs) of content trees (CTs). In this thesis, we propose a Directed Acyclic Graph (DAG), called the Level-wise Content Clustering Graph (LCCG), to store the related information of each cluster. Based upon the LCCG, the desired learning content, including both general and specific learning objects (LOs), can be retrieved for users.
4.3.1 Level-wise Content Clustering Graph (LCCG)
Figure 4.6: The Representation of Level-wise Content Clustering Graph
As shown in Figure 4.6, the LCCG is a multi-stage graph that carries relationship information among learning objects, i.e., a Directed Acyclic Graph (DAG). Its definition is given in Definition 4.2:
Definition 4.2: Level-wise Content Clustering Graph (LCCG)
Level-wise Content Clustering Graph (LCCG) = (N, E), where
• $N = \{ (CF_0, CNL_0), (CF_1, CNL_1), \ldots, (CF_m, CNL_m) \}$. It stores the related information, Cluster Feature (CF) and Content Node List (CNL), in a cluster, called LCC-Node. The CNL stores the indexes of learning objects included in this LCC-Node.
• $E = \{ n_i n_{i+1} \mid 0 \le i < \text{the depth of LCCG} \}$. It denotes the link edge from node $n_i$ in an upper stage to $n_{i+1}$ in the immediately lower stage.
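As a rough illustration of Definition 4.2, the graph could be held in memory as a list of stages plus a set of edges. The class names `LCCNode` and `LCCG`, the field names, and the `(stage, index)` addressing of nodes are assumptions made for this sketch, not part of the thesis. The Cluster Feature is kept here as an opaque tuple.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

NodeRef = Tuple[int, int]  # hypothetical address of an LCC-Node: (stage, index)


@dataclass
class LCCNode:
    """One cluster: a Cluster Feature (CF) and a Content Node List (CNL)."""
    cf: tuple                                      # kept opaque in this sketch
    cnl: List[str] = field(default_factory=list)   # indexes of learning objects


@dataclass
class LCCG:
    """Multi-stage graph; stage 0 is the top stage."""
    stages: List[List[LCCNode]]
    edges: Set[Tuple[NodeRef, NodeRef]] = field(default_factory=set)

    def add_edge(self, upper: NodeRef, lower: NodeRef) -> None:
        """Link a node to a node in the immediately lower stage."""
        (us, ui), (ls, li) = upper, lower
        if ls != us + 1:
            raise ValueError("an edge must go to the immediately lower stage")
        if not (0 <= us and ls < len(self.stages)):
            raise IndexError("stage out of range")
        if not (0 <= ui < len(self.stages[us]) and 0 <= li < len(self.stages[ls])):
            raise IndexError("no such LCC-Node")
        self.edges.add((upper, lower))
```

Because every edge is forced to point from one stage to the next lower one, no cycle can be formed, which matches the DAG property stated for the LCCG.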
For the purpose of content clustering, the number of stages of the LCCG is equal to the maximum depth (δ) of the CTs, and each stage holds the clustering results of the CNs at the corresponding level of the different CTs. That is, the top stage of the LCCG stores the clustering results of the root nodes of the CTs, and so on. In addition, in the LCCG, the Cluster Feature (CF) stores the related information of a cluster. It is similar to the Cluster Feature proposed in the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) clustering algorithm and is defined as follows.
Definition 4.3: Cluster Feature
The Cluster Feature (CF) = (N, VS, CS), where
• N: it denotes the number of the content nodes (CNs) in a cluster.
• $VS = \sum_{i=1}^{N} FV_i$. It denotes the sum of the feature vectors (FVs) of the CNs.
• $CS = \left| \sum_{i=1}^{N} FV_i / N \right| = \left| VS / N \right|$. It denotes the length of the average feature vector in a cluster, where $|\cdot|$ denotes the Euclidean norm (length) of a vector. The vector $VS / N$ can be seen as the Cluster Center (CC) of a cluster.
Moreover, during the content clustering process, if a content node (CN) in a content tree (CT) with feature vector $FV$ is inserted into the cluster $CF_A = (N_A, VS_A, CS_A)$, the new $CF_A = (N_A + 1,\ VS_A + FV,\ |(VS_A + FV) / (N_A + 1)|)$. An example of Cluster Feature (CF) and Content Node List (CNL) is shown in Example 4.5.
Example 4.5: Cluster Feature (CF) and Content Node List (CNL)
Assume a cluster C0 is stored in the LCC-Node $N_A$ with $(CF_A, CNL_A)$ and contains four CNs, CN01, CN02, CN03, and CN04, whose feature vectors are <3,3,2>, <3,2,2>, <2,3,2>, and <4,4,2>, respectively. Then $VS_A$ = <12,12,8>, $CC_A = VS_A / N_A$ = <3,3,2>, and $CS_A = |CC_A| = (9+9+4)^{1/2} = 4.69$. Thus, $CF_A$ = (4, <12,12,8>, 4.69) and $CNL_A$ = {CN01, CN02, CN03, CN04}.
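The incremental update of Definition 4.3 and the numbers of Example 4.5 can be checked with a short Python sketch. The class name `ClusterFeature` and its method names are hypothetical, chosen only for this illustration; the update rule itself follows the definition (N grows by one, FV is added to VS, and CS is the Euclidean norm of VS/N).

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterFeature:
    """CF = (N, VS, CS) as in Definition 4.3 (names are illustrative)."""
    n: int = 0
    vs: List[float] = field(default_factory=list)
    cs: float = 0.0

    def insert(self, fv: List[float]) -> None:
        """Add one content node with feature vector fv to the cluster."""
        if not self.vs:
            self.vs = [0.0] * len(fv)
        self.n += 1
        self.vs = [s + x for s, x in zip(self.vs, fv)]
        # CS = |VS / N|, the Euclidean norm of the cluster center.
        self.cs = math.sqrt(sum((s / self.n) ** 2 for s in self.vs))

    def center(self) -> List[float]:
        """Cluster Center CC = VS / N."""
        return [s / self.n for s in self.vs]


# The four feature vectors of Example 4.5.
cf_a = ClusterFeature()
cnl_a = []
for name, fv in [("CN01", [3, 3, 2]), ("CN02", [3, 2, 2]),
                 ("CN03", [2, 3, 2]), ("CN04", [4, 4, 2])]:
    cf_a.insert(fv)
    cnl_a.append(name)

print(cf_a.n, cf_a.vs, round(cf_a.cs, 2))  # 4 [12.0, 12.0, 8.0] 4.69
print(cf_a.center())                       # [3.0, 3.0, 2.0]
```

Since only N and VS have to be stored to recompute CS and CC, a cluster can absorb a new content node without revisiting the nodes already in it.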
4.3.2 Incremental Level-wise Content Clustering Algorithm
Based upon the definition of the LCCG, we propose an Incremental Level-wise Content Clustering Algorithm, called ILCC-Alg, to create the LCCG from the CTs transformed from learning objects. The ILCC-Alg includes three processes: 1) Single Level Clustering Process, 2) Content Cluster Refining Process, and 3) Concept Relation Connection Process. Figure 4.7 illustrates the flowchart of ILCC-Alg.
Figure 4.7: The Process of ILCC-Algorithm
(1) Single Level Clustering Process
In this process, the content nodes (CNs) of a CT in each tree level can be clustered with a different similarity threshold. The content clustering process proceeds from the lowest level to the top level of the CT, and all clustering results are stored in the LCCG. In addition, during the content clustering process, the similarity between a CN and an LCC-Node is measured by the cosine function, a measure commonly used in document clustering. That is, given a CN $N_A$ and an LCC-Node $LCCN_A$, the similarity measure is calculated by
$sim(N_A, LCCN_A) = \cos(FV_{N_A}, FV_{LCCN_A}) = \frac{FV_{N_A} \cdot FV_{LCCN_A}}{|FV_{N_A}| \, |FV_{LCCN_A}|}$,
where $FV_{N_A}$ and $FV_{LCCN_A}$ are the feature vectors of $N_A$ and $LCCN_A$, respectively. The larger the value, the more similar the two feature vectors are, and the cosine value equals 1 if the two feature vectors are identical (or, more generally, point in the same direction).
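A minimal Python sketch of the cosine measure follows. The function name `cosine_similarity` and the convention of returning 0 for a zero vector are assumptions of this sketch, not taken from the thesis. Note that the cosine is unaffected by positive scaling, so comparing a CN with the vector sum VS of a cluster gives the same value as comparing it with the cluster center VS/N.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


fv = [3, 2, 2]
# Same value whether the cluster is represented by VS or by CC = VS / N.
print(cosine_similarity(fv, [12, 12, 8]))
print(cosine_similarity(fv, [3, 3, 2]))
```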
The basic concept of the Incremental Single Level Clustering Algorithm (ISLC-Alg) is illustrated in Figure 4.8. In Figure 4.8.1, we have an existing clustering result and two new objects, CN4 and CN5, that need to be clustered. First, we compute the similarity between CN4 and the existing clusters, LCC-Node1 and LCC-Node2. In this example, both similarities are smaller than the similarity threshold, which means that the concept of CN4 is not similar to the concepts of the existing clusters, so we treat CN4 as a new cluster, LCC-Node3. Then we cluster the next new object, CN5. After computing and comparing the similarities between CN5 and the existing clusters, we find that CN5 is similar enough to LCC-Node2, so we put CN5 into LCC-Node2 and update the feature of this cluster. The final result of this example is shown in Figure 4.8.4. The details of ISLC-Alg are given in Algorithm 4.4.
Figure 4.8: An Example of Incremental Single Level Clustering
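The walk-through of Figure 4.8 can be mimicked with the following Python sketch. It is not the thesis's ISLC-Alg itself but a simplified illustration under stated assumptions: each new CN joins the most similar existing cluster when that similarity reaches the threshold and otherwise starts a new cluster. The function names, the dictionary layout of a cluster, the feature vectors, the cluster members CN1 to CN3, and the threshold 0.9 are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 for a zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def incremental_single_level_clustering(
        clusters: List[Dict],
        new_nodes: List[Tuple[str, List[float]]],
        threshold: float) -> List[Dict]:
    """Insert each (name, fv) into the clusters; the list is modified in place."""
    for name, fv in new_nodes:
        best, best_sim = None, -1.0
        for cluster in clusters:
            # Cosine is scale invariant, so VS can stand in for the center.
            sim = cosine(fv, cluster["vs"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: join the cluster and update its feature.
            best["n"] += 1
            best["vs"] = [s + x for s, x in zip(best["vs"], fv)]
            best["cnl"].append(name)
        else:
            # Not similar to any existing cluster: start a new one.
            clusters.append({"n": 1, "vs": list(fv), "cnl": [name]})
    return clusters


# Illustrative data: two existing clusters, then CN4 and CN5 arrive.
clusters = [
    {"n": 2, "vs": [2.0, 0.0, 0.0], "cnl": ["CN1", "CN2"]},  # LCC-Node1
    {"n": 1, "vs": [0.0, 1.0, 0.0], "cnl": ["CN3"]},         # LCC-Node2
]
new_nodes = [("CN4", [0.0, 0.0, 1.0]), ("CN5", [0.0, 1.0, 0.1])]
incremental_single_level_clustering(clusters, new_nodes, threshold=0.9)
for cluster in clusters:
    print(cluster["cnl"])
```

With these made-up vectors, CN4 is orthogonal to both existing clusters and becomes a third cluster, while CN5 falls into the second one, mirroring the outcome described for Figure 4.8.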
Algorithm 4.4: Incremental Single Level Clustering Algorithm (ISLC-Alg)