Hierarchy Organizing Algorithm (AlgHO) - A Study of Ontology-Based E-Learning Content Retrieval

Input:
  F: the input folksonomy
  Th_split: the threshold for splitting a tag
  Th_general: the threshold for generalizing tags
Output:
  O: the output index

Step 1. (Splitting)
  for each tag t in F
    1.1 if t.tm_num > Th_split then
          create a new tag n_tag
          move half of teaching materials from t to n_tag
          re-compute the attributes of t and n_tag

Step 2. (Generalization)
  2.1 calculate the number of tags in F
  2.2 if the number is greater than Th_general then
        create a new tag n_tag
        copy teaching materials from the two tags to n_tag
        assign n_tag as a parent node of the two tags
        re-compute the attributes of the two tags and n_tag

In Step 1.1, t.tm_num means the number of teaching materials annotated by the tag t. Also, a threshold, Th_split, is set to decide whether to split the tag. In Step 2.2, a threshold, Th_general, is set to decide whether to generalize two tags.
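The two steps can be illustrated with a minimal Python sketch. It is not the thesis implementation. The `Tag` class, the function name `organize_hierarchy`, and the naming of new tags are hypothetical. The listing does not say how "the two tags" in Step 2.2 are chosen, so the sketch assumes the two tags with the fewest teaching materials, and it replaces them at the top level with their new parent. The only attribute re-computed here is `tm_num`, which is derived from the material list.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    name: str
    materials: list = field(default_factory=list)  # teaching materials annotated by this tag
    children: list = field(default_factory=list)   # child tags, filled in by generalization

    @property
    def tm_num(self):
        # t.tm_num in Step 1.1: number of teaching materials annotated by the tag
        return len(self.materials)


def organize_hierarchy(folksonomy, th_split, th_general):
    """One splitting pass followed by one generalization check."""
    tags = list(folksonomy)

    # Step 1 (Splitting): an overloaded tag hands half of its materials to a new tag.
    for t in list(tags):
        if t.tm_num > th_split:
            half = t.tm_num // 2
            n_tag = Tag(name=t.name + "_split", materials=t.materials[half:])
            t.materials = t.materials[:half]
            tags.append(n_tag)

    # Step 2 (Generalization): if there are too many tags, put two under a new parent.
    if len(tags) > th_general and len(tags) >= 2:
        # ASSUMPTION: the two tags are those with the fewest teaching materials.
        a, b = sorted(tags, key=lambda t: t.tm_num)[:2]
        merged = list(dict.fromkeys(a.materials + b.materials))  # copy, dropping duplicates
        parent = Tag(name=f"{a.name}+{b.name}", materials=merged, children=[a, b])
        # ASSUMPTION: the parent replaces its two children at the top level of the index.
        tags = [t for t in tags if t is not a and t is not b] + [parent]

    return tags


folksonomy = [
    Tag("fern", ["m1", "m2", "m3", "m4"]),
    Tag("moss", ["m5"]),
    Tag("tree", ["m6", "m7"]),
]
index = organize_hierarchy(folksonomy, th_split=3, th_general=3)
print([t.name for t in index])
```

With Th_split = 3, the tag "fern" is split in Step 1. With Th_general = 3, the four resulting tags exceed the threshold, so two of them are grouped under a new parent in Step 2.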

4.5 Evaluation

To evaluate the retrieval performance of the proposed approach, two experiments were conducted on two synthetic learning object repositories (LORs). The first LOR contains 1,200,000 SCORM-compliant documents [91], which are converted from Web pages related to educational domains. After stop-word elimination, 2,570,623 distinct index terms remain, and the size of this LOR is around 20 GB. The second LOR contains 2,400,000 SCORM-compliant documents, which are converted from technical papers related to computer science domains. After stop-word elimination, 4,730,384 distinct index terms remain, and the size of this LOR is around 40 GB. A set of 20 queries was prepared for the performance evaluation. For example, one of these queries is “teaching materials about fern plants.”

We compare the performance of the proposed method with that of the keyword-based method. The values of α1 and α2 are both 0.5. The value of β is 0.3. As shown in Figure 4.3, the proposed method can significantly improve the performance with respect to precision and recall.

Figure 4.3 Comparison with the keyword-based method

Next, we compare the performance of different similarity functions. As shown in Figure 4.4, the proposed similarity (All) achieves the best performance. The level-wise similarities have a larger impact on performance than the metadata similarity, and the Level 1 similarity has a larger impact than the Level 2 similarity.

Figure 4.4 Comparison of Different Similarity Functions (axis: Similarity Function, with categories Level 1, Level 2, Metadata and All; series: Precision and Recall)

The results show that the proposed approach attains better performance than the traditional keyword-based search. The primary reason is that the proposed method

semantic meanings of a given query. For example, a query for “fern plants” would return relevant results about plants which are present nearby, in a location-aware manner. Therefore, the proposed method can get better precision.

Chapter 5 Learning Content Retrieval on Sensor Networks

5.1 Location-aware Retrieval

As described in Chapter 4, the fast development of wireless communication and sensor technologies has made ubiquitous learning (u-learning) emerge as a promising learning paradigm, which can sense the situation of learners and provide adaptive support to students. Context-awareness is one major characteristic of u-learning, where the situation or environment of a learner can be sensed. The advantages of context-aware learning are twofold. In the passive aspect, it can alleviate environmental limitations. In the active aspect, it can utilize available resources to facilitate learning.

Learning activities in ubiquitous environments are directed by instructional strategies, which are general approaches, instead of specific methods. As shown in Figure 5.1, instructional activities depend on instructional strategies, which are based on pedagogic theories. Designers of learning activities should utilize the advantages of u-learning environments to realize pedagogic goals.

Figure 5.1 Layered relation of instructional activities, strategies and theories

Retrieval of learning content, hereafter named Content Retrieval (CR), is an important activity in u-learning, especially for on-line data searching and cooperative problem solving. Furthermore, both teachers and students need to retrieve learning content for teaching and learning, respectively. Conventional keyword-based content retrieval schemes do not take context information into consideration, so they cannot satisfy the basic requirement of u-learning, which is to provide users with adaptive results. To support context-aware learning, learning content needs to be provided according to learners’ contexts. For example, when a student cannot identify an insect in a u-learning course, s/he can access a learning object repository for more information by submitting a query. As one can imagine, such queries are likely to be ambiguous and to need refinement. If context information can be applied to refine the original query, it will be easier for learners to retrieve relevant content.

As shown in Figure 5.2, we classify the schemes of content retrieval into static and dynamic ones according to the adaptability of the retrieved results. For static CR, the retrieved result only depends on the query, independent of users and contexts.

Dynamic CR can be further divided into personalized, context-aware and other schemes, according to the factors that are considered by the adaptive mechanisms of CR. Personalized CR adapts to subjective factors of learners, such as user profile and preferences. In other words, the same query submitted by different persons could retrieve different results. Context-aware CR adapts to objective factors of learners, such as time, place, device, activity and peers. Hence, the same query issued in different contexts could retrieve different results.

Figure 5.2 Classification of Content Retrieval

Retrieval of learning content is a universal requirement for many learning scenarios, such as Intelligent Tutoring Systems, Zone of Proximal Development, etc.

However, each scenario has its own needs for content retrieval. In particular, one important characteristic of context-aware ubiquitous learning is to provide the right content to learners at the right place and the right time. That is, it is desirable for a retrieval system to find content that is adapted to the learners’ context.

In this chapter, we investigate the context-aware learning content retrieval problem, which is to retrieve relevant learning contents from a repository for a given query and context information, improving precision and recall of retrieval. The difficulties are described as follows. First, context information needs to be taken into consideration, so conventional retrieval schemes have to be enhanced to be context-aware. Second, needs for teaching materials are related to pedagogic factors, such as instructional strategies and goals. It is desirable to design a retrieval scheme which is flexible enough to cooperate with various instructional strategies. Third, characteristics of standardized learning content, such as metadata and structural information, must be considered to improve the accuracy of similarity comparison. In addition, the acquisition of context information requires extensive deployment of sensors. In this paper, we assume that the module of context acquisition is available, and focus on the first three issues.

To overcome the aforementioned difficulties, our idea is a strategy-driven approach enabled by a knowledge-based query expansion method. First, we expand the original query with acquired context information, in order to retrieve content which is adapted to learners’ contextual environments. We adopt the technique of query expansion because most queries in web searching are short and ambiguous, and thus need refinement. Second, we propose a knowledge-based approach to expanding queries based on instructional strategies. According to our observation of ontologies, such as WordNet, basic strategies of query expansion include generalization, specialization, association and their combinations. For example, when the educator aims to encourage the learners to think at a higher level, it may be appropriate to adopt an expansion technique of generalization, which offers more general keywords for content retrieval. In this study, we assume that content about entities near the learner is more relevant than content about entities far from the learner. For example, when walking by a fern plant, we may want to find some content introducing the fern. Also, this work assumes that the instructional strategy and the strategy mapping are defined by experts in advance; their definition is left as future work. Here, we focus on applying retrieval strategies to realize context-aware content retrieval.
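The three basic expansion strategies can be illustrated with a small Python sketch. It is only a sketch under stated assumptions. The `IS_A` and `RELATED_TO` tables are hypothetical toy data, not the ontology used in this work, and the function names are illustrative. Generalization walks one step up the is_a hierarchy, specialization walks one step down, and association follows related_to links.

```python
# Hypothetical toy knowledge: is_a maps a concept to its parent concept,
# related_to maps a concept to associated concepts.
IS_A = {"Fern": "Plant", "Moss": "Plant", "Tree Fern": "Fern"}
RELATED_TO = {"Fern": ["Spore"], "Plant": ["Structure"]}


def generalize(keyword):
    """Return the more general concept directly above the keyword, if any."""
    parent = IS_A.get(keyword)
    return [parent] if parent else []


def specialize(keyword):
    """Return the more specific concepts directly below the keyword."""
    return sorted(child for child, parent in IS_A.items() if parent == keyword)


def associate(keyword):
    """Return the concepts linked to the keyword by related_to."""
    return list(RELATED_TO.get(keyword, []))


STRATEGIES = {
    "generalization": generalize,
    "specialization": specialize,
    "association": associate,
}


def expand_query(keywords, strategy):
    """Append the candidates produced by the chosen strategy to the original keywords."""
    expand = STRATEGIES[strategy]
    expanded = list(keywords)
    for kw in keywords:
        for candidate in expand(kw):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["Fern"], "generalization"))  # ['Fern', 'Plant']
print(expand_query(["Fern"], "specialization"))  # ['Fern', 'Tree Fern']
print(expand_query(["Fern"], "association"))     # ['Fern', 'Spore']
```

Keeping the strategies in a lookup table means that an instructional strategy only has to name the expansion it wants, which is one possible way to keep the mapping between pedagogy and retrieval editable by experts.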

Based on this idea, we designed a system consisting of four components: knowledge transformation, query expansion, content retrieval and user interface. In the knowledge transformation component, algorithms of ontology building and rule generation are proposed for teachers to easily construct an ontology from course outlines. The purposes of the ontology are to generate rules of query expansion and to construct a taxonomic index of the learning object repository. The other three components work as follows. In the user interface, the user submits a query, and the context information is extracted by sensors. Next, in the query expansion component, candidate keywords are inferred for query expansion, and keywords whose entities are nearer to the learner are selected. Finally, in the content retrieval component, results are retrieved according to the expanded query, and are ranked by a similarity function that considers the characteristics of learning content. To speed up the searching process, we use a taxonomic index, which is generated by reorganizing the existing documents based on a bottom-up approach [92].

We believe the proposed context-aware retrieval method can benefit the ubiquitous learning scenario by providing suitable content adapted to learners’ context and instructors’ strategies. Experiments have been conducted to support this claim. First, an experiment involving 30 elementary school students was conducted to show how learning performance is affected by the proposed retrieval method. Next, a survey involving 12 elementary school teachers was performed to understand their degree of satisfaction with this system. The results show that this system can speed up the retrieval process, thus facilitating the learning activity. Also, the comments of teachers indicate that this system can effectively find suitable contents adapted to context and instructional strategies.

First, we propose a strategy-driven approach to solving the context-aware learning content retrieval problem. This new approach integrates pedagogic requirements and technical solutions, thus benefiting both parties. Second, a knowledge-based system is designed to support query expansion, which can increase maintainability. Also, the flexibility of the knowledge-based approach facilitates future integration with educational strategies. In addition, the distance-based keyword selection can achieve context-awareness. Third, knowledge transformation algorithms are designed for automatic derivation of ontology and query expansion rules, thus sparing teachers the difficulty of manually constructing an ontology and coding rules. Finally, we have built a prototype, and experimental results show that this approach can attain accuracy and is helpful to context-aware learning.

5.2 Problem Description

We assume that a context detection module is available, which can extract users’ context information. Next, several definitions will be introduced, including teaching materials, learning object repository, a query, context and a similarity function.

The symbols in Table 5.1 are used throughout this paper.

Table 5.1 The notation used in this chapter

Symbol    Description
CP        Content Package
LOR       Learning Object Repository
wi        Weighting element i of CP vector
V         Set of terms in the vocabulary
Q         Query
vQ        Vector representing the query
LCx       The x-coordinate of location context
LCy       The y-coordinate of location context
sim()     Similarity function

We represent the query as a weight vector. Its formal definition is as follows.

Definition 5.1. Query.

A Query is used by a user to specify the teaching materials (TMs) s/he wants. Users can express their queries in two forms: keyword-based and metadata-based. A keyword-based query is a vector of keyword weights, which represent the concepts of the desired content. A metadata-based query is a list of (Attribute, Value) pairs, which describe the properties of the desired TMs.

■

We will now define the notion of similarity between a query and a content package, which means the relevance of the content package to the query.

Definition 5.2. Similarity.

Let Q be a query with query vector vQ, and TM be a content package. The similarity function is denoted by sim(Q, TM).

■

In order to determine the degree of relevance between a query and a teaching material, the similarity function has to be defined. Conventional similarity functions, such as the cosine function, are not suitable for SCORM-compliant teaching materials, which are characterized by textual content, metadata and structural information. Here, a similarity measure Sim between a query Q and a teaching material TM is proposed by combining a keyword-based similarity and a metadata-based similarity. The keyword similarity SimKeyword adopts a cosine function to measure the text similarity between a query and a TM. The metadata similarity SimMetadata is defined as the number of matched attributes divided by the number of all attributes. Therefore, both similarity terms, SimKeyword and SimMetadata, range over [0, 1]. The similarity measure is defined as

$$\mathrm{Sim}(Q, TM) = \alpha \cdot \mathrm{Sim}_{Keyword}(Q, TM) + (1 - \alpha) \cdot \mathrm{Sim}_{Metadata}(Q, TM)$$

where the factor α, 0 < α < 1, is used to control the relative weighting of keyword similarity and metadata similarity.
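The combined measure can be sketched in Python as follows. This is a minimal illustration rather than the system's implementation. It assumes that the two terms are combined as a convex combination weighted by α, that queries and TMs are given as sparse dictionaries of non-negative term weights, and that "all attributes" means the attributes listed in the metadata-based query. All function and parameter names are hypothetical.

```python
import math


def sim_keyword(query_vec, tm_vec):
    """Cosine similarity between two sparse term-weight vectors given as dicts."""
    dot = sum(w * tm_vec.get(term, 0.0) for term, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_t = math.sqrt(sum(w * w for w in tm_vec.values()))
    if norm_q == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_q * norm_t)


def sim_metadata(query_meta, tm_meta):
    """Number of matched attributes divided by the number of attributes in the query.

    ASSUMPTION: the denominator counts the (Attribute, Value) pairs of the query.
    """
    if not query_meta:
        return 0.0
    matched = sum(1 for attr, value in query_meta.items() if tm_meta.get(attr) == value)
    return matched / len(query_meta)


def sim(query_vec, query_meta, tm_vec, tm_meta, alpha=0.5):
    """Weighted combination of keyword and metadata similarity, with 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # ASSUMPTION: convex combination, so the result also stays in [0, 1].
    return alpha * sim_keyword(query_vec, tm_vec) + (1 - alpha) * sim_metadata(query_meta, tm_meta)


score = sim(
    query_vec={"fern": 1.0},
    query_meta={"Type": "text", "Level": "1"},
    tm_vec={"fern": 2.0},
    tm_meta={"Type": "text", "Level": "2"},
    alpha=0.5,
)
print(score)
```

In this example the keyword vectors point in the same direction, so the cosine term is 1, while one of the two queried attributes matches, so the metadata term is 0.5.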

This work focuses on context information related to location. We assume that context information of location can be acquired by extensively deployed sensors and built-in maps.

Definition 5.3. Location Context.

The Location Context is represented by a two-dimensional coordinate, (LCx, LCy), where LCx is the x-coordinate, and LCy is the y-coordinate. These coordinates correspond to a map of the campus.
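A location context of this form makes distance-based keyword selection straightforward. The following Python sketch keeps the candidate keywords whose entities are closest to the learner. The Euclidean metric, the parameter `k`, the function names and the sample coordinates are all assumptions made for illustration; the text only states that nearer entities are considered more relevant.

```python
import math


def distance(location, entity_xy):
    """Euclidean distance on the campus map (an assumed metric)."""
    return math.hypot(location[0] - entity_xy[0], location[1] - entity_xy[1])


def select_nearest_keywords(location, candidates, k=2):
    """Keep the k candidate keywords whose entities are closest to the learner.

    location   -- the location context (LCx, LCy)
    candidates -- dict mapping a candidate keyword to the (x, y) of its entity
    """
    ranked = sorted(candidates, key=lambda kw: distance(location, candidates[kw]))
    return ranked[:k]


# Hypothetical entity positions on the campus map.
candidates = {"Fern": (3, 4), "Banyan": (1, 1), "Lotus": (10, 0)}
print(select_nearest_keywords((0, 0), candidates, k=2))  # ['Banyan', 'Fern']
```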

■

Based on the definitions above, the Learning Content Retrieval Problem on Sensor Network (LCRP-Sensor) can be defined as follows.

Definition 5.4. Learning Content Retrieval Problem on Sensor Network (LCRP-Sensor).

Given a query and context information, retrieve relevant learning contents from a repository, ranked by a similarity function. The goal is to improve precision and recall of retrieval.

■

5.3 Ontology Building

Ontology building has been considered a craft rather than an engineering activity. Traditionally, the process of ontology building requires the participation of domain experts and knowledge engineers. Although a number of automatic ontology construction technologies have been proposed, it is still not easy for teachers, as domain experts, to build an ontology. Therefore, the ontology building algorithm is proposed for teachers to easily derive an ontology from a course outline. In this algorithm, an “expert” means an educator who is also good at knowledge engineering.

Before describing the process of ontology building, we give a general definition of an ontology.

Definition 5.5. Ontology.

An Ontology is a conceptualization of a domain, which is defined as a quadruple O = (C, A, R, X), where

• C is a set of concepts;

• A is a collection of attribute sets, one for each concept;

• R is a set of relations on C×C;

• X is a set of axioms.

■

Example 5.1. A Campus_Plant_Course Ontology.

A Campus_Plant_Course Ontology O_CPC = (C, A, R, X) is an ontology whose components are as follows.

C = {“Plant”, “Structure”, “Fern”, …}

A = {Keyword, Type, Location, Level}

R = {“is_a”, “related_to”}

X = { IF is_a(“A”, “B”) and is_a(“B”, “C”) THEN is_a(“A”, “C”)}

■

In this study, the ontology is derived from a pre-defined course outline, which is usually organized by the teacher before the class begins. Without loss of generality, a course outline is defined as a two-level structure of chapters and sections.
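Before turning to the course outline, the quadruple of Definition 5.5 and the transitivity axiom of Example 5.1 can be made concrete with a short Python sketch. The class layout, the function name and the concept "Tree Fern" are hypothetical and serve only as illustration. The axiom set X is represented by a function that applies the single is_a transitivity rule until no new pair can be derived.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set                                   # C: set of concepts
    attributes: dict                                # A: concept -> attribute set (as a dict)
    relations: dict = field(default_factory=dict)   # R: relation name -> set of (a, b) pairs


def apply_is_a_transitivity(onto):
    """Apply the axiom: IF is_a(A, B) and is_a(B, C) THEN is_a(A, C), until nothing changes."""
    is_a = set(onto.relations.get("is_a", set()))
    changed = True
    while changed:
        changed = False
        for (a, b) in list(is_a):
            for (c, d) in list(is_a):
                if b == c and (a, d) not in is_a:
                    is_a.add((a, d))
                    changed = True
    onto.relations["is_a"] = is_a
    return is_a


# "Tree Fern" is a hypothetical concept added so that the axiom has something to infer.
onto = Ontology(
    concepts={"Plant", "Fern", "Tree Fern"},
    attributes={"Fern": {"Keyword": "fern", "Type": "plant", "Location": (3, 4), "Level": 2}},
    relations={"is_a": {("Tree Fern", "Fern"), ("Fern", "Plant")}, "related_to": set()},
)
closure = apply_is_a_transitivity(onto)
print(("Tree Fern", "Plant") in closure)  # True
```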

Definition 5.6. Course Outline.

A Course Outline is a two-level tree-like representation of the table of contents for a course. A course outline consists of a limited number of Chapters, each of which in turn consists of a limited number of Sections.

■

Example 5.2. A Campus_Plant Course Outline.

A Campus_Plant Course Outline CO_CP can be represented as follows.

Course Name: Plants in the Campus
Chapter 1. Introduction to Plants
    Section 1.1. What are plants?
    Section 1.2. Classification of Plants
Chapter 2. Structures of Plants
    Section 2.1. Flowers
    Section 2.2. Leaves
    Section 2.3. Fruits
Chapter 3. Growth of Plants
    Section 3.1. Budding
    Section 3.2. The Growing Process
Chapter 4. Identification of Plants in the Campus

■

After the course outline is determined, the teacher can derive an ontology from it by following the steps of the Ontology Building Algorithm. This is a special-purpose algorithm, designed for constructing the ontology of a course about plants in a campus. Teachers who teach this kind of course can follow this algorithm to generate an ontology.
