CHAPTER 3 THE MODEL

3.1 REVIEW MINING

3.1.1 Word Set Expanding

In existing research, the semantics of a single article are usually classified into four categories: positive, negative, objective, and subjective. Because the reviews in this study are not evaluated according to this classification, we do not follow previous work. Instead, we focus on the general attitude of reviewers, who are the nodes we want to discover. Previous work indicates that reviewers who write only at the extremes, for example all-positive (hyper spam) or all-negative (defaming spam) comments, are hard to trust [21]. In addition, objective perspectives are usually plain descriptions of products with no additional reference value; they are not considered important because they lack “emotions,” which are more likely to affect the purchasing decisions of others [14]. Thus, in this study, the subjective factors are considered.

Turney and Littman [51] defined two sets of words which represent positive and negative sentiments, respectively:

$$S_p = \{\text{good, nice, excellent, positive, fortunate, correct, superior}\}$$

$$S_n = \{\text{bad, nasty, poor, negative, unfortunate, wrong, inferior}\}$$

The two sets were chosen for their lack of sensitivity to context: in most situations, these words keep their original meaning regardless of how the surrounding article is structured. Our model is built on these two sets.

We expand $S_{p+n}$ to build a subjective word base. Because we consider subjective reviews, both word sets are included in our model. We define:

$$S_{p+n} = S_p \cup S_n$$

which contains both positive and negative semantics. We believe that the elements of $S_{p+n}$ carry the full meaning of “subjective words” and meet our expectations. Although reviewers can express their emotions fully through subjective comments, which are valuable to other online users, firms would prefer not to see negative comments spreading on the Internet. Nevertheless, a trustworthy reviewer should always express their opinions fairly.

A reviewer who writes only positive reviews may be viewed as an employee of the firm, and his or her reviews will not be trusted by others. To spread information effectively, it is necessary to select trustworthy reviewers. Firms should try to do the right things to please influential reviewers who write negative comments instead of ignoring them. Negative comments hurt the reputation of firms, especially on the Internet. Firms should promptly respond to negative comments, contact those reviewers, and find out what can be done to improve the products or services. This not only removes the source of a bad reputation but also improves the positive reputation of the firm.

We want to check whether the words of $S_{p+n}$ appear in each review in order to decide its subjective level, but the set contains too few words for an accurate check, and the check may end up with no matches at all. The problem can be solved by expanding the original $S_{p+n}$ set. An online semantic lexicon such as WordNet [57] is helpful here. We plan to extract synonyms of $S_{p+n}$ from WordNet to obtain different levels of $S_{p+n}$. Because synonyms can be traced from $S_{p+n}$ recursively, the size of the word set varies with the number of iterations.
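The check described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: the tokenizer, the function names `tokenize` and `find_matches`, and the two sample reviews are all assumptions made for the example. It shows why the 14 seed words alone may produce no match.

```python
import re

# Seed sets from Turney and Littman, as listed in this section.
S_P = {"good", "nice", "excellent", "positive", "fortunate", "correct", "superior"}
S_N = {"bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"}
S_PN = S_P | S_N  # combined subjective word base (14 words)


def tokenize(review: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", review.lower())


def find_matches(review: str, word_set: set[str]) -> list[str]:
    """Return every token of the review that appears in word_set, in order."""
    return [tok for tok in tokenize(review) if tok in word_set]


# Hypothetical sample reviews.
review_a = "The screen is good but the battery life is poor."
review_b = "Great camera, awful speakers, terrible support."

print(find_matches(review_a, S_PN))  # ['good', 'poor']
print(find_matches(review_b, S_PN))  # []
```

The second review is clearly subjective, yet none of its words belong to the seed set, which is the situation that motivates the expansion.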

We denote by $S^{k}_{p+n}$ the word set at the $k$th expansion level. For example, $k = 1$ corresponds to the original 14 items, $S^{1}_{p+n}$, and $k = 2$ corresponds to $S^{2}_{p+n}$, expanded from those 14 items. The sets grow rapidly with $k$, and the number of matches also increases because of the larger word set $S^{k}_{p+n}$. Clearly, different values of $k$ lead to different matching pairs between $S^{k}_{p+n}$ and the test reviews. In our experiment, six levels of word set expansion are executed in order to find a suitable expansion level. As $k$ becomes larger, the system consumes more resources in word matching and becomes less efficient. The six levels of word matching are recorded and quantified with the PMI method described next. The resulting scores are used to calculate the subjective strength of the reviews.
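The level-by-level expansion can be sketched as follows. This is a toy illustration under stated assumptions: `TOY_SYNONYMS` is a hypothetical hand-written table standing in for a WordNet lookup, the seed set is shortened to two words, and each level is assumed to keep the words of the previous levels. The function name `expand` is also an assumption.

```python
# Hypothetical synonym table standing in for a WordNet lookup.
TOY_SYNONYMS = {
    "good": {"fine", "great"},
    "bad": {"awful"},
    "great": {"superb"},
    "awful": {"terrible", "dreadful"},
}

SEED = {"good", "bad"}  # shortened seed set, for illustration only


def expand(seed: set[str], k: int, synonyms: dict[str, set[str]]) -> set[str]:
    """Return the level-k word set, where k = 1 is the seed set itself.

    Each further level adds the synonyms of the words added at the previous
    level (assumption: levels are cumulative).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    words = set(seed)
    frontier = set(seed)  # words whose synonyms have not been looked up yet
    for _ in range(k - 1):
        new_words: set[str] = set()
        for word in frontier:
            new_words |= synonyms.get(word, set())
        frontier = new_words - words
        if not frontier:  # nothing new: further levels would be identical
            break
        words |= frontier
    return words


for k in range(1, 5):
    print(k, sorted(expand(SEED, k, TOY_SYNONYMS)))
```

With this toy table the set grows from 2 to 5 to 8 words and then stops growing, since no further synonyms are listed. A real lexicon would keep producing new words for longer, which is the growth in matching cost that the text warns about.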

3.1.2 PMI Strength Level Approach

In this subsection, we use PMI (Pointwise Mutual Information) to calculate the strength score of each review, which forms the basis of the review analysis. Turney and Littman [51] define PMI by the following equation:

$$\mathrm{PMI}(t, t_i) = \log_2 \left[ \frac{\Pr_c(t, t_i)}{\Pr_r(t)\,\Pr_w(t_i)} \right]$$

This equation measures the semantic association between the matched term $t$ in a review and $t_i$ in the word set $S^{k}_{p+n}$ by calculating their probability of occurrence in the whole article.
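As a small numerical illustration of the standard PMI of Turney and Littman, the sketch below takes the three probabilities as plain numbers. The function name `pmi` and the example values are assumptions chosen so that the results are exact. How the probabilities are estimated from the data is not part of this sketch.

```python
import math


def pmi(pr_joint: float, pr_t: float, pr_ti: float) -> float:
    """Standard PMI: log2( Pr(t, t_i) / (Pr(t) * Pr(t_i)) )."""
    if min(pr_joint, pr_t, pr_ti) <= 0:
        raise ValueError("all probabilities must be positive")
    return math.log2(pr_joint / (pr_t * pr_ti))


# Hypothetical probabilities, chosen as powers of two so the results are exact.
print(pmi(0.25, 0.5, 0.5))   # 0.0: the joint equals the product (independence)
print(pmi(0.5, 0.5, 0.5))    # 1.0: the terms co-occur more often than chance
print(pmi(0.125, 0.5, 0.5))  # -1.0: the terms co-occur less often than chance
```

The sign of the result therefore tells whether two terms appear together more or less often than they would if they were independent.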

The key to the PMI calculation is the values of $\Pr_c(t, t_i)$, $\Pr_r(t)$, and $\Pr_w(t_i)$. We define each of them as follows:

[...] matching mechanism. Thus, the association between these two words is not our real purpose.

The real effect of PMI is that it takes into account the number of matched words in the whole article before reflecting the subjective strength level of the target. In addition, PMI takes into account the probability of word appearance in each review and so reduces the errors caused by unequal numbers of words across reviews. To simplify the calculation and obtain a value suitable for processing, we modify PMI into the following form:

$$\mathrm{PMI}(t, t_i) = \log_2 \left[ \Pr_r(t)\,\Pr_w(t_i) \right]$$

The PMI score of each review is calculated with the adapted PMI equation above, from the viewpoint of each review. Every time the model processes the score of one review, $\mathrm{PMI}(t, t_i)$ considers all matches in it, so every review in our data set has a PMI score equal to the sum over all of its matches. This equation clearly produces a negative score, which is inconvenient for further processing.
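The per-review score can be sketched as below, reading the modified equation as $\log_2[\Pr_r(t)\,\Pr_w(t_i)]$ and summing it over the matches of a review. The function names and the probability values are assumptions for illustration. The probabilities are passed in directly rather than estimated.

```python
import math


def modified_pmi(pr_r: float, pr_w: float) -> float:
    """Modified PMI for one match: log2( Pr_r(t) * Pr_w(t_i) )."""
    if pr_r <= 0 or pr_w <= 0:
        raise ValueError("probabilities must be positive")
    return math.log2(pr_r * pr_w)


def review_score(matches: list[tuple[float, float]]) -> float:
    """Sum the modified PMI over all matches found in one review."""
    return sum(modified_pmi(pr_r, pr_w) for pr_r, pr_w in matches)


# Hypothetical (Pr_r, Pr_w) pairs for the two matches of one review.
example_matches = [(0.25, 0.125), (0.5, 0.5)]
print(review_score(example_matches))  # -7.0
```

Since each probability is at most 1, every term is zero or negative, so the sum cannot be positive. This is the inconvenience that the standardization step addresses.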

Because every PMI value is negative, we standardize the PMI score of every review before combining it with other characteristic values:

[Equation not recoverable from the source; only the term “min” survives.]
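One common way to map negative scores onto a non-negative range is min-max scaling. The sketch below assumes that choice: the exact standardization formula used in the study is not reproduced here, and the function name `min_max_standardize` and the sample scores are hypothetical.

```python
def min_max_standardize(scores: list[float]) -> list[float]:
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1.

    Assumption: min-max scaling stands in for the standardization step.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: avoid division by zero
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw (negative) PMI scores of three reviews.
raw = [-7.0, -5.0, -3.0]
print(min_max_standardize(raw))  # [0.0, 0.5, 1.0]
```

Under this assumption the review with the lowest raw score maps to 0 and the one with the highest maps to 1, so the scores can be combined with other non-negative values.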

The modified and standardized PMI equation gives the strength score of each review. We can then obtain the score of the target node by:

[Equation not recoverable from the source.]

This equation sums and averages the PMI scores of all reviews by each author. This step is necessary to build a common standard of comparison for all authors. After this processing, the text results of every author in our data set can be identified, recorded, and ranked.
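The averaging and ranking step can be sketched as follows. The function names, author names, and scores are hypothetical. The only thing taken from the text is that the score of an author is the average of the standardized PMI scores of that author's reviews.

```python
from statistics import mean


def author_scores(reviews: list[tuple[str, float]]) -> dict[str, float]:
    """Average the standardized review scores of each author."""
    by_author: dict[str, list[float]] = {}
    for author, score in reviews:
        by_author.setdefault(author, []).append(score)
    return {author: mean(scores) for author, scores in by_author.items()}


def rank_authors(scores: dict[str, float]) -> list[str]:
    """Return author names sorted from highest to lowest average score."""
    return sorted(scores, key=lambda a: scores[a], reverse=True)


# Hypothetical (author, standardized score) pairs.
data = [("alice", 0.5), ("bob", 1.0), ("alice", 0.0), ("bob", 0.5)]
scores = author_scores(data)
print(scores)                # {'alice': 0.25, 'bob': 0.75}
print(rank_authors(scores))  # ['bob', 'alice']
```

Averaging rather than summing keeps authors with many reviews from outranking others on volume alone, which matches the stated goal of a comparative standard across authors.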

We have now shown how to calculate the review score of a target node. The review analysis scores of these target nodes are important elements when considering their influence. They will be combined and processed with other scores by the weighting mechanisms described later.
