Experiment 1. Different Weight - Detecting News Event Relationships Using Role and Topic Information

Chapter 5 Experiments

5.5 Experiment 1. Different Weight

This section introduces the first experiment, which uses the operations introduced in Section 4.1 to detect relationships between news articles.

The experiment observes and discusses how the strategy for weighting the five partial similarity scores affects relationship detection.

5.5.1. Static Weight

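The similarity score is computed as follows:

Sim(D1, D2) = ω_P · Sim_P(D1, D2) + ω_L · Sim_L(D1, D2) + ω_O · Sim_O(D1, D2) + ω_N · Sim_N(D1, D2) + ω_V · Sim_V(D1, D2)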

This score is a linear combination of five similarity scores: the person role part (P), the location role part (L), the organization role part (O), the noun part (N), and the verb part (V). In this experiment we try several weight combinations to observe which ones perform well in relationship detection.

We set each of the weight parameters (ω_P, ω_L, ω_O, ω_N, ω_V) between 0 and 1 and constrain their sum to be 1.
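The weighted combination and the constraints on the weights can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `combined_similarity`, the dictionary keys, and the sample part scores are hypothetical; only the five parts and the two constraints on the weights come from the text.

```python
import math

# The five parts of the similarity score named in the text.
PARTS = ("person", "location", "organization", "noun", "verb")


def combined_similarity(part_scores, weights):
    """Linear combination of the five part similarities (hypothetical helper)."""
    for part in PARTS:
        if not 0.0 <= weights[part] <= 1.0:
            raise ValueError(f"weight for {part!r} must lie in [0, 1]")
    if not math.isclose(sum(weights[p] for p in PARTS), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(weights[p] * part_scores[p] for p in PARTS)


# Made-up part scores for one document pair (illustrative only).
scores = {"person": 0.8, "location": 0.2, "organization": 0.0,
          "noun": 0.5, "verb": 0.4}

# All weight on the person part: the combined score equals the person score.
person_only = {"person": 1.0, "location": 0.0, "organization": 0.0,
               "noun": 0.0, "verb": 0.0}
print(combined_similarity(scores, person_only))
```

Checking the constraints inside the function keeps an invalid weight setting from silently producing a score that is not comparable with the others.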

Table 2 shows part of the results of this experiment. Apart from “Base”, each entry in the first row is a combination method (M) with an ID (1~6), and each method corresponds to one weight combination. “Base” denotes the baseline result from Chapter 3; it is included here so that it can be compared easily with the results of Experiment 1.

Method     Base  M1    M2    M3    M4    M5
ωP         --    1.0   0.0   0.0   0.0   0.0
ωL         --    0.0   1.0   0.0   0.0   0.0
ωO         --    0.0   0.0   1.0   0.0   0.0
ωN         --    0.0   0.0   0.0   1.0   0.0
ωV         --    0.0   0.0   0.0   0.0   1.0
Recall     0.68  0.57  0.17  0.25  0.68  0.69
Precision  0.36  0.66  0.54  0.66  0.17  0.16

Table 2. Static Weight Result (Single)

Methods M1~M5 in Table 2 each assign the entire weight to a single one of the five parts of the similarity score. Method M6 gives every part the same weight.

We now discuss each combination strategy and its results.

M1:

M1 sets ω_P to 1 and all other weights to 0, which isolates the effect of the person role on relationship detection. This combination yields a recall of 0.57 and a precision of 0.66. Compared with the baseline, M1 has lower recall (0.57 versus 0.68) but much higher precision (0.66 versus 0.36). The recall means that, on the experiment data set, using only the person role information to detect relationships between news events would miss about 43% of the correct dependency relations. The precision, however, shows that giving the person role similarity the entire weight avoids many erroneous judgments in which independent events are labeled as dependent.

M2, M3:

The weight combinations M2 and M3 yield very poor recall (0.17 and 0.25) but better precision than the baseline (0.54 and 0.66 versus 0.36). The poor recall of M2 and M3 is reasonable because locations and organizations play a much less important role than persons in the experiment dataset: most of the news articles center on persons, and only a few use locations or organizations as their most important characters.

M4, M5:

The weight distributions of M4 and M5 use only the topic parts (noun and verb) for relationship detection. The results show that using only the topic parts gives recall similar to the baseline (0.68 and 0.69 versus 0.68) but very poor precision (0.17 and 0.16).

Table 3 shows the results of another set of weight combinations, in which the person role part has weight 0.9 and one other part has weight 0.1.

Method     M6    M7    M8    M9
ωP         0.9   0.9   0.9   0.9
ωL         0.1   0.0   0.0   0.0
ωO         0.0   0.1   0.0   0.0
ωN         0.0   0.0   0.1   0.0
ωV         0.0   0.0   0.0   0.1
Recall     0.54  0.54  0.66  0.66
Precision  0.72  0.75  0.64  0.62

Table 3. Static Weight Result (Combine)

M6, M7:

M6 and M7 in Table 3 combine the person role similarity with one other role similarity (location or organization). Compared with M1, which uses only the person role part, these methods give better precision (0.72 and 0.75 versus 0.66), at the cost of slightly lower recall (0.54 versus 0.57). This means that adding another role part as an auxiliary element helps raise precision.

M8, M9:

M8 and M9 use the person role as the main part and the noun part or the verb part as the auxiliary element with weight 0.1. The results show that both recall and precision exceed 60% with this strategy, which means that combining role similarity with topic similarity gives a good balance between recall and precision.

Table 4 shows the result of a weight combination that gives the person role similarity the most weight (0.6) and divides the remaining weight equally among the other parts (0.1 each).

Method     M10
ωP         0.6
ωL         0.1
ωO         0.1
ωN         0.1
ωV         0.1
Recall     0.72
Precision  0.79

Table 4. Static Weight Result (All)

This combination gives the best result in our experiments so far: both recall and precision are above 70%. This suggests that using all parts of the role and topic information together may provide the information needed to detect relationships between news events.

5.5.2. Dynamic Weight

The results in Section 5.5.1 show that the person role vectors play the most important role in the “洗錢案” (money laundering case) data. The reason is that most of the 162 news articles center on persons, while locations and organizations play the most important role in only a few articles of the experiment dataset.

Our goal in this thesis is to detect relationships between events in all kinds of domains, and no single static weight combination fits every set of news articles. Even within one news set that describes a specific event, individual articles may focus on different kinds of roles.

For example, although most of the 162 news articles in our experiment dataset focus on the behavior of persons, some focus on organizations and some on locations. If the weighting strategy could be chosen automatically for each news article pair, the relationship detection result might improve. We therefore designed a variant weight combination strategy that decides the weight of each role vector part dynamically.

In this experiment, we use the following combination strategy:

For each news article pair:

1. As in the combination strategy of method M10, we select a main part and set its weight to 0.6 and the weight of each other part to 0.1.

2. The main part is chosen from the three role similarity parts (person, location, and organization). We select the part that has the largest number of entity names in the news document pair.
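The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `dynamic_weights` and the `entity_counts` dictionary (entity names per role, counted over both documents of the pair) are hypothetical, and the tie-breaking rule is an assumption because the text does not specify one.

```python
# The three role parts eligible to become the main part, then the topic parts.
ROLE_PARTS = ("person", "location", "organization")
ALL_PARTS = ROLE_PARTS + ("noun", "verb")


def dynamic_weights(entity_counts, main_weight=0.6, other_weight=0.1):
    """Choose per-pair weights from entity-name counts (hypothetical helper).

    entity_counts maps a role part to the number of entity names of that
    role found in the news document pair.
    """
    # Assumption: ties go to the earliest part in ROLE_PARTS, because
    # max() returns the first maximal item it encounters.
    main_part = max(ROLE_PARTS, key=lambda part: entity_counts.get(part, 0))
    return {part: (main_weight if part == main_part else other_weight)
            for part in ALL_PARTS}


# Made-up counts for a pair dominated by organization names.
counts = {"person": 3, "location": 1, "organization": 7}
print(dynamic_weights(counts))
```

With the default values the returned weights always sum to 1 (0.6 plus four times 0.1), so they satisfy the same constraint as the static combinations.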

Table 5 shows the result of this combination strategy.

Method          Recall  Precision
Variant Weight  0.80    0.79

Table 5. Dynamic Weight Result

The dynamic weight method improves relationship detection for the news articles that discuss organizations. Compared with M10, recall rises from 0.72 to 0.80 while precision remains at 0.79.

5.5.3. Comparison

In this section, we compare the experiment results using evolution graphs.

Figure 7 to Figure 9 show the evolution graphs of the baseline, static weight, and dynamic weight.

Figure 7. Evolution Graph of Baseline

Figure 8. Evolution Graph of Static Weight

Figure 9. Evolution Graph of Dynamic Weight

The blue lines in the graphs are correct relationships between events that our process recognized, and the yellow lines are correct relationships that our process missed. The red lines are relationships that are not in the correct answer set but were detected as relationships by our process.
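The three line colors correspond to the counts from which precision and recall are usually computed. The Python sketch below assumes the standard definitions of the two measures and represents each relationship as a pair of event identifiers; the function names and the sample edges are hypothetical and are not taken from the experiment data.

```python
def classify_edges(detected, gold):
    """Split relationships into the three line colors of the evolution graph."""
    detected, gold = set(detected), set(gold)
    return {
        "blue": detected & gold,    # correct relationships that were detected
        "yellow": gold - detected,  # correct relationships that were missed
        "red": detected - gold,     # detected but not in the answer set
    }


def precision_recall(detected, gold):
    """Standard precision and recall computed from the colored edge sets."""
    edges = classify_edges(detected, gold)
    tp, fn, fp = len(edges["blue"]), len(edges["yellow"]), len(edges["red"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: three gold relationships, three detected ones.
gold = {("e1", "e2"), ("e2", "e3"), ("e3", "e4")}
detected = {("e1", "e2"), ("e2", "e3"), ("e1", "e4")}
print(classify_edges(detected, gold))
print(precision_recall(detected, gold))
```

Under these definitions, many red lines lower precision and many yellow lines lower recall.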

The high density of red lines in Figure 7 shows that the baseline detected too much noise. Compared with Figure 7, Figure 8 and Figure 9 have far fewer red lines, which indicates that our method effectively reduces noise when building an evolution graph.

Comparing Figure 8 and Figure 9, the biggest difference between the two results lies in the news events that discuss organizations. The static weight strategy gives the person part the largest weight in our experiments, so it cannot detect relationships between news events that discuss locations or organizations well.
