2. Literature Review

2.11 Algorithms

National Chengchi University

Our study applies several algorithms to sentiment analysis. We describe six of them in the following subsections.

2.11.1 Naïve Bayes

The Naïve Bayes algorithm is based on applying Bayes’ theorem with naïve independence assumptions between features [103]. Given a set of objects, each of which belongs to a known class and has a known vector of variables, the aim is to construct a rule that assigns future objects to a class, given only the vectors of variables describing those objects [104]. Naïve Bayes is very easy to construct and needs no complicated iterative parameter estimation schemes [104], so it can be readily applied to huge datasets. It is also easy to interpret, so users unskilled in classifier technology can understand why it makes the classification it makes [104]. The Naïve Bayes algorithm is appealing because of its simplicity, elegance, and robustness. It is one of the oldest formal classification algorithms, and yet even in its simplest form it is often surprisingly effective. It is widely used in areas such as text classification and spam filtering [104].
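As an illustration only, the rule described above can be sketched for text data as a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The toy documents, the labels, and the function names `train_naive_bayes` and `predict_naive_bayes` are assumptions made for this sketch and are not taken from the cited works.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count documents per class and words per class (docs are token lists)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Naive independence: add one log likelihood per word,
            # with add-one smoothing to avoid zero probabilities.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set for sentiment classification.
docs = [["good", "great", "fun"], ["great", "love"],
        ["bad", "boring"], ["awful", "bad", "slow"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, ["great", "fun"]))
```

Training is a single counting pass over the data, which is consistent with the point that no iterative parameter estimation is needed.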

2.11.2 k-Nearest Neighbor (kNN)

In pattern recognition, the k-nearest neighbor algorithm is a non-parametric method used for classification and regression [105]. kNN classification finds a group of k objects in the training set that are closest to the test object and bases the assignment of a label on the predominance of a particular class in this neighborhood [104]. There are three key elements of this approach: a set of labeled objects (e.g., a set of stored records); a distance or similarity metric to compute the distance between objects; and the value of k, the number of nearest neighbors. To classify an unlabeled object, the distance of this object to the labeled objects is computed, its k nearest neighbors are identified, and the class labels of these nearest neighbors are then used to determine the class label of the object [104]. If k=1, the object is simply assigned to the class of that single nearest neighbor.
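The three steps above (compute distances, pick the k nearest neighbors, combine their labels) can be sketched as follows. This is a minimal sketch that assumes Euclidean distance and a simple majority vote; the data points and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points.

    `train` is a list of (vector, label) pairs.
    """
    # Sort the stored records by Euclidean distance to the query.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Combine the neighbors' labels by simple majority vote.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional feature vectors with sentiment labels.
train = [((1.0, 1.0), "neg"), ((1.2, 0.8), "neg"), ((0.9, 1.1), "neg"),
         ((4.0, 4.2), "pos"), ((4.1, 3.9), "pos"), ((3.8, 4.0), "pos")]

print(knn_classify(train, (1.1, 1.0), k=3))
print(knn_classify(train, (4.0, 4.0), k=1))
```

With two classes, an odd k avoids tied votes; other ways of combining labels, such as distance weighting, would replace the `Counter` step.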

There are several key issues that affect the performance of kNN. One is the choice of k. If k is too small, the result can be sensitive to noise points; if k is too large, the neighborhood may include too many points from other classes. Another issue is the approach to combining the class labels, for example a simple majority vote versus a distance-weighted vote [104]. kNN is a type of lazy learning, in which the function is only approximated locally and computation is deferred until classification. The kNN algorithm is among the simplest of all machine learning algorithms. kNN is particularly well suited for multi-modal classes as well as applications in which an object can have many class labels. For example, for the assignment of functions to genes based on expression profiles, some researchers found that kNN outperformed a much more sophisticated classification scheme [106].

2.11.3 Support Vector Machines (SVM)

In today’s machine learning applications, SVMs are considered a must-try: they offer one of the most robust and accurate methods among all well-known algorithms [107]. The method has a sound theoretical foundation, can be trained on as few as a dozen examples, and is insensitive to the number of dimensions [104]. SVMs analyze data for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier [104].
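To make the phrase "non-probabilistic binary linear classifier" concrete, the sketch below shows only the decision rule of a linear SVM and the hinge loss that its training penalizes. Training itself (finding the maximum-margin weights) is omitted; the weight vector `w` and bias `b` are hypothetical values chosen by hand for illustration, not the output of any real training run.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(w, b, x):
    """Assign +1 or -1 according to the side of the hyperplane (no probabilities)."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def hinge_loss(w, b, x, y):
    """Zero when the example is on the correct side with margin at least 1."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

# Hypothetical parameters of an already-trained linear SVM (assumed values).
w, b = (0.5, 0.5), -2.5

print(svm_predict(w, b, (1.0, 1.0)))     # falls on the negative side
print(svm_predict(w, b, (4.0, 4.2)))     # falls on the positive side
print(hinge_loss(w, b, (1.0, 1.0), -1))  # correctly classified, outside the margin
```

Examples that lie inside the margin or on the wrong side receive a positive hinge loss, and these are the examples that shape the separating hyperplane during training.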

2.11.4 AdaBoost

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm. The AdaBoost algorithm [108], proposed by Yoav Freund and Robert Schapire, is one of the most important ensemble methods because it has a solid theoretical foundation, accurate prediction, great simplicity, and wide and successful applications. It can be used in conjunction with many other types of learning algorithms: the output of the other learning algorithms (the weak learners) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost refers to a particular method of training a boosted classifier. Every learning algorithm tends to suit some problem types better than others, and typically has many different parameters and configurations to adjust before it achieves optimal performance on a dataset.

AdaBoost is sensitive to noisy data and outliers. Problems in machine learning often suffer from the curse of dimensionality: each example may consist of a huge number of potential features, and evaluating every feature can reduce not only the speed of classifier training and execution but also predictive power [104].

Unlike SVMs, the AdaBoost training process selects only those features known to improve the predictive power of the model, reducing dimensionality and potentially improving execution time as irrelevant features need not be computed.
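The weighted-sum idea can be sketched with the discrete form of AdaBoost, using one-dimensional threshold rules (decision stumps) as weak learners. This is a minimal sketch under stated assumptions: the ten-point dataset is a toy example, and the names `stump_predict`, `train_adaboost`, and `adaboost_predict` are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` below the threshold, the opposite above."""
    return polarity if x < threshold else -polarity

def train_adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    pts = sorted(set(xs))
    thresholds = ([pts[0] - 0.5]
                  + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
                  + [pts[-1] + 0.5])
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(x, thr, pol) != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Final output: sign of the weighted sum of weak-learner votes."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold rule can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = train_adaboost(xs, ys, rounds=3)
print([adaboost_predict(ensemble, x) for x in xs] == ys)
```

No single stump classifies this data correctly, but the weighted vote of three stumps does. The sketch uses a single feature; with many features, each round would also choose which feature to threshold, which is how the feature selection described above arises.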

2.11.5 Decision Tree

C4.5, a descendant of the Concept Learning System (CLS) and ID3, generates classifiers expressed as decision trees, but it can also construct classifiers in more comprehensible ruleset form [104]. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. In a decision tree, each internal node represents a “test” on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label [109]. The paths from root to leaf represent classification rules. In decision analysis, a decision tree and the closely related influence diagram are used as visual and analytical decision support tools, in which the expected values of competing alternatives are calculated.
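The structure described above (internal nodes as attribute tests, branches as outcomes, leaves as class labels, root-to-leaf paths as rules) can be sketched with a hand-built tree. The attributes `has_positive_word` and `has_negation` and the tree itself are hypothetical; the sketch does not show how C4.5 would learn such a tree from data.

```python
def classify(node, example):
    """Follow the branch matching each test outcome until a leaf label is reached."""
    while isinstance(node, dict):
        outcome = example[node["attribute"]]
        node = node["branches"][outcome]
    return node

def extract_rules(node, path=()):
    """List every root-to-leaf path as (conditions, class_label)."""
    if not isinstance(node, dict):
        return [(path, node)]  # leaf: the path so far is one complete rule
    rules = []
    for outcome, child in node["branches"].items():
        condition = (node["attribute"], outcome)
        rules.extend(extract_rules(child, path + (condition,)))
    return rules

# Hypothetical tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "has_positive_word",
    "branches": {
        True: {
            "attribute": "has_negation",
            "branches": {True: "negative", False: "positive"},
        },
        False: "negative",
    },
}

print(classify(tree, {"has_positive_word": True, "has_negation": False}))
for conditions, label in extract_rules(tree):
    print(conditions, "->", label)
```

Each printed rule corresponds to one path from the root to a leaf, which is the sense in which a tree can be rewritten in ruleset form.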

2.11.6 Classification and Regression Trees (CART)

CART represents a major milestone in the evolution of artificial intelligence, machine learning, non-parametric statistics, and data mining [110]. CART is important for the comprehensiveness of its study of decision trees, the technical innovations it introduces, its sophisticated discussion of tree-structured data analysis, and its authoritative treatment of large sample theory for trees [104].

The CART decision tree is a binary recursive partitioning procedure capable of processing continuous and nominal attributes both as targets and as predictors. Trees are grown to a maximal size without the use of a stopping rule and then pruned back toward the root via cost-complexity pruning. The next split to be pruned is the one contributing least to the overall performance of the tree on the training data. The procedure produces trees that are invariant under any order-preserving transformation of the predictor attributes. The CART mechanism is intended to produce not one tree, but a sequence of nested pruned trees, each of which is a candidate to be the optimal tree [104].
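A single binary partitioning step on one continuous predictor can be sketched as below, using Gini impurity as the splitting criterion. This is a sketch under stated assumptions: it shows one split only (no recursion and no cost-complexity pruning), places thresholds at midpoints between adjacent values by convention, and uses hypothetical data and function names.

```python
import math

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (weighted impurity, threshold, n_left) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, i)
    return best

# Hypothetical predictor values and class labels.
values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]

print(best_split(values, labels))
# An order-preserving transformation (here, the logarithm) moves the
# threshold but sends the same examples to each side of the split.
print(best_split([math.log(v) for v in values], labels)[2])
```

Because the split depends only on the ordering of the predictor values, the same three examples go left before and after the logarithm is applied, which illustrates the invariance property mentioned above.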
