
Chapter 6 Behavior modeling of content tagging for learning resource management

6.3 The metadata effectiveness evaluation

Even with a consensus domain concept hierarchy, the defects of tags still need to be resolved. Accordingly, a tag effectiveness criterion is proposed for this purpose. Assume that tags T1 and T2 are collected in a folksonomy manner and can be mapped to concepts in the given concept hierarchy with nodes C1, C1-1, C1-2, and C1-3. The tag effectiveness criterion yields a value for each tag, written T1.effectiveness and T2.effectiveness. The Threshold is the minimum required tag effectiveness. Three heuristic rules for tag effectiveness refinement are as follows.

The first situation is shown in Figure 6.7. If the associated concepts C1 and C1-3 of two tags T1 and T2 have the consist_of relation, then the tags are synonyms or hyponyms. Synonyms and hyponyms can be resolved by selecting the tag with the highest effectiveness value.

Figure 6.7 The synonym resolution heuristic rule

The second situation is shown in Figure 6.8. If the tags T1 and T2 are in the same sub-tree of the concept hierarchy and the effectiveness value of tag T1 is lower than the required threshold, then T1 may be a redundant tag and can be eliminated.

Figure 6.8 The redundancy resolution heuristic rule

The third situation is shown in Figure 6.9. If there is a coarse-grained tag whose effectiveness value is lower than the required threshold, then the tags of the test item are incomplete. The incompleteness can be handled by drilling down the concept hierarchy and suggesting fine-grained tags, whose effectiveness is then evaluated. If the candidate fine-grained tags are effective, the incompleteness is resolved. Otherwise, the users should be asked to add more tags to improve the metadata of the test item.

Figure 6.9 The incompleteness resolution heuristic rule

Based on the ideas above, the following tag effectiveness refinement heuristic rules are proposed to resolve the different defects.

R1: Synonym resolution heuristic rule

If (T1 and T2 have the consist_of relation) AND (T2.effectiveness > T1.effectiveness) then select tag T2, else select tag T1.

R2: Redundancy resolution heuristic rule

If (T1.effectiveness < Threshold) then delete tag T1.

R3: Incompleteness resolution heuristic rule

If (the concept of tag T1 has child concepts) then suggest the tags of the child concepts as candidate tags (assume T3 and T4). If (T4.effectiveness ≥ Threshold) then add tag T4.
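The three rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the dictionaries `children`, `concept_of`, `tag_of`, and `effectiveness`, the function names, and all example values are hypothetical. The consist_of relation is assumed to hold between a concept and its direct children, and R3 is read here as accepting a candidate whose effectiveness is at least the Threshold.

```python
# Hypothetical sketch of rules R1-R3. The data layout below is an assumption.

def r1_synonym(t1, t2, concept_of, children, effectiveness):
    """R1: if the two tags' concepts have a consist_of relation, keep the more effective tag."""
    c1, c2 = concept_of[t1], concept_of[t2]
    # Assumption: consist_of holds between a concept and its direct children.
    related = c2 in children.get(c1, []) or c1 in children.get(c2, [])
    if not related:
        return None  # R1 does not apply to this pair
    return t2 if effectiveness[t2] > effectiveness[t1] else t1


def r2_redundant(tag, effectiveness, threshold):
    """R2: a tag whose effectiveness is below the threshold is a candidate for deletion."""
    return effectiveness[tag] < threshold


def r3_suggest(tag, concept_of, children, tag_of, effectiveness, threshold):
    """R3: drill down to the child concepts and return their tags that reach the threshold."""
    child_concepts = children.get(concept_of[tag], [])
    candidates = [tag_of[c] for c in child_concepts if c in tag_of]
    return [t for t in candidates if effectiveness[t] >= threshold]


# Illustrative values only; they are not taken from the experiment.
children = {"C1": ["C1-1", "C1-2", "C1-3"]}
concept_of = {"T1": "C1", "T2": "C1-3", "T3": "C1-1", "T4": "C1-2"}
tag_of = {"C1-1": "T3", "C1-2": "T4", "C1-3": "T2"}
effectiveness = {"T1": 0.4, "T2": 1.8, "T3": 0.2, "T4": 2.1}
THRESHOLD = 1.0

kept = r1_synonym("T1", "T2", concept_of, children, effectiveness)
redundant = r2_redundant("T1", effectiveness, THRESHOLD)
suggested = r3_suggest("T1", concept_of, children, tag_of, effectiveness, THRESHOLD)
```

With these illustrative values, R1 keeps T2 over T1, R2 flags T1 as a deletion candidate, and R3 suggests the child-concept tags that reach the threshold. In the scheme itself the outcomes are suggestions presented to the users, so a real system would report them instead of applying them silently.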

In the proposed tag effectiveness refinement heuristic rules, the effectiveness of tags is the key factor in deciding which tags should be selected or eliminated. In our approach, the tag effectiveness is obtained from the pretesting results of students. For each test item, if students with high achievement on the tagged concepts answer the item correctly, then the tags are effective. Otherwise, the tags have a lower effectiveness value and need to be refined or eliminated. Thus, Item Response Theory (IRT) [34][79] is applied to evaluate the effectiveness of tags based on the students' learning achievements and on the discrimination and difficulty of the test items. The IRT-based Tag Effectiveness Criterion is defined as follows.

Definition 6.4 The IRT-based Tag Effectiveness Criterion

For item j, the value Pj indicates the difficulty degree and the value Dj indicates the discrimination degree. For student i, the learning ability value xi, ranging from -3.5 to 3.5, indicates the weighted learning degree from low to high. Thus Sij, the tag effectiveness of test item j with respect to the score of student i, can be evaluated by the modified logistic model of Item Response Theory,

\[ S_{ij} = \frac{1}{1 + e^{-1.7 D_j (x_i - P_j)}}, \qquad e \approx 2.718 \tag{1} \]

Next, for tag k, let Rk be the sum of the Sij scores of the correctly answered items and Wk be the sum of the Sij scores of the incorrectly answered items. The factor ε is an arbitrarily small positive real number that ensures the natural logarithm is well defined. Thus Ck, the IRT-based Tag Effectiveness Criterion of tag k, is defined as follows.

\[ C_k = \ln\left(\frac{R_k + \varepsilon}{W_k + \varepsilon}\right) \tag{2} \]


In our experience, the criterion value ranges from -4 to 4. A larger value indicates higher effectiveness, and a negative value indicates no effectiveness.
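As a worked illustration of Definition 6.4, the following Python sketch computes the per-response score and the criterion of a single tag. It assumes that Equation (1) takes the standard logistic form 1 / (1 + e^(-1.7 Dj (xi - Pj))). The function names, the tuple layout of `responses`, the default `eps`, and the sample values are hypothetical and are not taken from the experiment.

```python
import math


def item_score(x_i, d_j, p_j):
    """Equation (1), assumed standard logistic form.

    x_i: ability of student i, d_j: discrimination of item j, p_j: difficulty of item j.
    """
    return 1.0 / (1.0 + math.exp(-1.7 * d_j * (x_i - p_j)))


def tag_effectiveness(responses, eps=1e-6):
    """Equation (2): C_k = ln((R_k + eps) / (W_k + eps)).

    responses: iterable of (x_i, d_j, p_j, correct) tuples for the items carrying tag k.
    eps: small positive constant (hypothetical default) that keeps the logarithm defined.
    """
    responses = list(responses)
    r_k = sum(item_score(x, d, p) for x, d, p, ok in responses if ok)
    w_k = sum(item_score(x, d, p) for x, d, p, ok in responses if not ok)
    return math.log((r_k + eps) / (w_k + eps))


# Illustrative data: a stronger student answers correctly, a weaker one does not.
sample = [
    (1.0, 1.0, 0.0, True),    # ability above the item difficulty, correct answer
    (-1.0, 1.0, 0.0, False),  # ability below the item difficulty, incorrect answer
]
c_k = tag_effectiveness(sample)
```

For this sample the ratio Rk / Wk equals e^1.7, so `c_k` is close to 1.7 and the tag would count as effective under any threshold below that value. A tag whose items are answered only incorrectly yields a negative value, matching the reading that negative values indicate no effectiveness.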

With the defined IRT-based tag effectiveness criterion and the heuristic rules proposed above, the algorithm of the IRT-Based Metadata Reengineering Scheme is proposed as follows.

Algorithm 6.1. The IRT-Based Metadata Reengineering Scheme (IRT-MRS)

Input: tags of self-tagged test items, domain concept hierarchy, students' pretesting results on test items.

Output: consensus tags.

//Phase 1. Construct the concept hierarchy of tags:

Step 1. Map the tags of the metadata to the concept hierarchy.

Step 2. For each concept node, if there are multiple tags on the same concept node, then resolve the ambiguity by keeping one unique tag.

//Phase 2. Tag refinement based on IRT-based tag effectiveness:

Step 3. For all tags, execute the Tag effectiveness evaluation algorithm (defined in Section 4.1) to obtain the tag effectiveness criterion using the students' pretesting results on test items.

Step 4. For the tags of each test item, apply the tag effectiveness refinement heuristic rules to detect the tag defects. Suggest the inference results to the users to support tag consensus building. Repeat the tag refinement process of Phase 2 until all tag defects are resolved, and then output the consensus tags.
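The control flow of Algorithm 6.1 can be sketched in Python as follows. This is a simplified illustration, not the actual implementation: the function names, the callbacks `find_defects` and `resolve`, the `max_rounds` guard, and the sample tags are all hypothetical. The algorithm does not state which tag to keep when several tags map to the same concept node, so the sketch keeps the alphabetically first one as a placeholder. In the scheme itself the resolution step involves user feedback, which is reduced to a callback here.

```python
from collections import defaultdict


def phase1_unique_tags(tag_to_concept):
    """Steps 1-2: group tags by their mapped concept node and keep one tag per node."""
    by_concept = defaultdict(list)
    for tag, concept in tag_to_concept.items():
        by_concept[concept].append(tag)
    # Assumption: keep the alphabetically first tag; the source leaves this choice open.
    return {concept: sorted(tags)[0] for concept, tags in by_concept.items()}


def phase2_refine(tags, effectiveness_of, find_defects, resolve, max_rounds=10):
    """Steps 3-4: evaluate tags, detect defects, resolve them, repeat until none remain."""
    tags = set(tags)
    for _ in range(max_rounds):  # hypothetical guard against endless iteration
        scores = {t: effectiveness_of(t) for t in tags}  # Step 3
        defects = find_defects(tags, scores)             # Step 4: apply heuristic rules
        if not defects:
            break
        tags = resolve(tags, defects)                    # stands in for user consensus
    return tags


# Illustrative run with made-up tags and scores.
mapping = {"sin rule": "C1-1", "law of sines": "C1-1", "cosine law": "C1-2", "angle": "C1"}
unique = phase1_unique_tags(mapping)
scores = {"law of sines": 1.9, "cosine law": 1.2, "angle": -0.3}
consensus = phase2_refine(
    unique.values(),
    effectiveness_of=scores.get,
    find_defects=lambda tags, s: {t for t in tags if s[t] < 1.0},  # only R2, for brevity
    resolve=lambda tags, defects: tags - defects,
)
```

Passing the defect detection and resolution steps as callbacks keeps the loop independent of how the three heuristic rules and the user interaction are realized.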

6.4 Evaluation

To evaluate the effectiveness of the tags, an experiment was conducted. The results show the tag refinement outcomes and the users' satisfaction with our approach.

1) Experiment of Social tags versus IRT-MRS

In the experiment, our goal was to show that IRT-MRS can effectively refine social tags collected in a folksonomy manner. The item bank metadata reengineering experiment used Trigonometric Functions in 11th grade Mathematics as the experiment domain. The test items were selected from term examinations to ensure their quality, and teachers were asked to provide tags in a folksonomy manner as the test item metadata. Twenty-one senior high school teachers participated as folks in the social tagging process. Two domain experts, each with ten years of experience in test sheet composition and taxonomy construction for 11th grade Mathematics, evaluated the effectiveness of the tags constructed by our approach.

The conceptual view of the experiment procedure is shown in Figure 6.10. For the same test items, the tags A constructed by the 21 teachers in a folksonomy manner were compared with the tags B constructed by our approach. The pretesting scores of 18 classes of high school students on those test items were input to the metadata reengineering process. Afterward, the effectiveness of tags A and tags B was evaluated by the two experts, who were asked to provide their own tags as the evaluation test cases. If tags B are closer to the experts' tags than tags A, then we can conclude that our approach effectively refines the tags.


Figure 6.10 The experiment of tag evaluation by experts

2) Experiment results

Figure 6.11 shows the social tags collected in the experiment. The tags were mapped to the concept hierarchy, and each was assigned a unique identifier.


Figure 6.11 The tags of the trigonometric function domain

The tags of test items obtained from traditional folksonomy, IRT-MRS, expert 1 and expert 2 are shown in Table 6.1. The test items are sorted by tag size of traditional folksonomy. The tags given by expert 1 and expert 2 are test cases.

Table 6.1 Tags of test items collected from folksonomy, IRT-MRS, and experts


In IRT-MRS, calibration software for the three-parameter logistic IRT model is used to obtain the IRT-based tag effectiveness criterion. The average number of altered tags for test items with different tag sizes is shown in Figure 6.12. Test items with a larger tag size have more diverse tags, and thus IRT-MRS refines more of their tags.

Figure 6.12 The average number of altered tags for test items with different social tag sizes

The tags were evaluated by their average number of differences from the tags given by the two experts; a smaller difference means that the tags are more effective. The comparisons of the social tags and the IRT-MRS tags against the two experts' tags are shown in Figure 6.13 and Figure 6.14. In general, IRT-MRS improves the social tags of test items with tag sizes larger than two.
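One way to compute such a difference measure is sketched below in Python. The text does not define how a "difference" is counted, so the sketch assumes the size of the symmetric difference between a test item's tag set and the expert's tag set. The function names, item identifiers, and tags are hypothetical and do not come from the experiment.

```python
def tag_difference(tags, expert_tags):
    """Number of tags that appear in exactly one of the two sets (assumed measure)."""
    return len(set(tags) ^ set(expert_tags))


def average_difference(items, expert_items):
    """Average per-item difference; both arguments map an item id to its tags."""
    diffs = [tag_difference(items[i], expert_items[i]) for i in items]
    return sum(diffs) / len(diffs)


# Illustrative data only.
social = {"q1": {"A", "B", "C"}, "q2": {"A"}}
refined = {"q1": {"A", "B"}, "q2": {"A"}}
expert = {"q1": {"A", "B"}, "q2": {"A", "D"}}

avg_social = average_difference(social, expert)
avg_refined = average_difference(refined, expert)
```

In this made-up example the refined tags have a smaller average difference from the expert's tags than the social tags do, which is the pattern the comparison looks for.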

Figure 6.13 Comparison of the social tags and IRT-MRS based on expert 1’s tags

Figure 6.14 Comparison of the social tags and IRT-MRS based on expert 2’s tags

Afterward, the correctness of the IRT-MRS tags was evaluated through a questionnaire answered by the original 21 senior high school teachers. A typical five-level Likert scale was used in the satisfaction questionnaire. Regarding the correctness of the proposed approach, 14.3% of the responses were neutral, 66.7% agreed, and 19.0% strongly agreed. The open-ended feedback from the teachers showed that they were impressed by the automatic tag refinement results: IRT-MRS can effectively suggest the tested concepts of the test items from the students' pretesting results. Therefore, we may conclude that IRT-MRS can help teachers build acceptable consensus tags.
