

5.1.3 Summary of Categorical Performance

In terms of categorical performance, we noticed that SVM generally performed well across the different categories, while Naïve Bayes showed unstable performance: although Naïve Bayes achieved the best AUC overall, it underperformed in most categories in terms of TPR. Decision Trees, on the other hand, performed best in TPR but had the worst performance in terms of FPR.

An important observation is that both SVM and Decision Trees generally performed at baseline levels or better when there is enough training data. For instance, in the “web” category, SVM and Decision Trees perform above the baseline in terms of TPR and below the baseline in terms of FPR.

However, Naïve Bayes underperforms by a wide margin when training data is limited, as can be seen in the “consulting”, “education”, “biotech” and “search” categories.

The important implication is that the prediction model not only works at the aggregate level, but also generally works for individual categories, with the exception of categories with limited data such as “security”, “semiconductor” and “education”. Nonetheless, Decision Trees and SVM perform stably across individual categories, suggesting that the prediction model can work at both the aggregate level and the individual-category level.
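To make the per-category evaluation concrete, the following is a minimal sketch of how TPR and FPR can be computed for each category from a table of test predictions; the column names ("category", "y_true", "y_pred") and the function name are illustrative only and are not taken from our implementation.

import pandas as pd

def per_category_rates(predictions: pd.DataFrame) -> pd.DataFrame:
    # predictions holds one row per test example with columns
    # "category", "y_true" and "y_pred" (1 = investment, 0 = no investment).
    rows = []
    for category, group in predictions.groupby("category"):
        tp = int(((group.y_true == 1) & (group.y_pred == 1)).sum())
        fn = int(((group.y_true == 1) & (group.y_pred == 0)).sum())
        fp = int(((group.y_true == 0) & (group.y_pred == 1)).sum())
        tn = int(((group.y_true == 0) & (group.y_pred == 0)).sum())
        rows.append({
            "category": category,
            "TPR": tp / (tp + fn) if (tp + fn) else float("nan"),
            "FPR": fp / (fp + tn) if (fp + tn) else float("nan"),
            "n": len(group),   # small n flags the unstable, data-poor categories
        })
    return pd.DataFrame(rows)

The per-category sample size n is reported alongside the rates, since the categories with unstable results above are precisely those with few examples.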

5.2 General Performance

The main purpose of carrying out the experiments across multiple learning algorithms is to ensure the soundness of using social features as the main predictors of investment behavior. As shown in Section 5.1, the results are generally encouraging: at the aggregate level, all three learning algorithms produced an AUC above 0.50 and a TPR above 50%, which means the strategy performed above the baseline level.
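As an illustration of this aggregate comparison, the following is a hedged sketch using scikit-learn; the placeholder data and variable names stand in for the actual social-feature vectors and are not part of the original experiments.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, recall_score

# Placeholder data standing in for the social-feature vectors; the real
# experiments use features derived from the Crunchbase network instead.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)
X_test, y_test = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True),        # probability=True enables AUC scoring
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]   # estimated probability of an investment
    preds = model.predict(X_test)
    auc = roc_auc_score(y_test, scores)          # baseline: 0.50
    tpr = recall_score(y_test, preds)            # baseline: 0.50 (50%)
    print(f"{name}: AUC={auc:.3f}, TPR={tpr:.3f}")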

In addition to aggregate performance, the strategy also performed well for individual industry categories: in terms of AUC, all industries across all three learning algorithms performed above the baseline of 0.50. In fact, with the exception of Naïve Bayes, we see AUCs of 0.60 to 0.70. Similarly, in terms of TPR, all individual industries across the three learning algorithms performed above the baseline.


This means that using social features generally provides consistent performance across different learning algorithms, not only in terms of aggregate results but also in terms of industries. This was in line with our expectations; however, we also recognize some instability in performance, especially with Naïve Bayes as the learning algorithm, across AUC, TPR and FPR, and depending on whether the data is split by category or aggregated.

To verify the stability of the features and whether these algorithms are well suited to our problem, we validated the model by running the same experiment on another subset of the Crunchbase dataset. We expect similar results: SVM and Decision Trees outperforming Naïve Bayes in terms of AUC and TPR, while Naïve Bayes outperforms SVM and Decision Trees in terms of FPR.

5.2.1 Visualizing the Decision Process

An important aspect of our research is to understand the decision-making process of investors. Decision Trees play an important role in this respect, since they are not only straightforward to interpret but can also be visualized. The following figure shows a partial decision tree that visualizes the decision-making process:


Fig 5.7: Visualizing the decision process

Notice that the tree splits at the root on Preferential Attachment as a core consideration factor, with Shortest Path playing an important role at the second level. Subsequent splits typically revolve around Preferential Attachment and Shortest Path; this echoes our general rules for investment, in which preferential attachment and shortest paths play an important role in investment behavior.
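The following is a minimal sketch of how such a tree can be rendered with scikit-learn's plot_tree, assuming a fitted decision tree over the two social features named above; the synthetic data, feature names and depth limit are illustrative only and do not reproduce Fig 5.7.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in data: column 0 plays the role of preferential attachment,
# column 1 the role of the shortest path between investor and company.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
plot_tree(clf,
          feature_names=["preferential_attachment", "shortest_path"],
          class_names=["no investment", "investment"],
          filled=True)
plt.show()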


We further validated the model by using a different subset of the Crunchbase dataset, to verify whether our experiment works even when applied to a different dataset, especially one in which the parties involved come from a different culture. This time, we selected RenRen.com as the seed node and repeated the data collection process of collecting all Persons, Financial Organizations and Companies. We chose RenRen.com due to its status as China's Facebook, which gives us a dataset that is distinct in terms of both the seed node and the other entities involved.

The network statistics for the RenRen-based dataset are as follows: we discovered 3,582 companies, 721 financial organizations and 3,386 persons within RenRen's small world.
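The following is a highly simplified sketch of this seed-node expansion, assuming a hypothetical helper fetch_relations(entity) that returns an entity's directly related Persons, Financial Organizations and Companies; this helper is a placeholder for the actual data access and is not the real Crunchbase API.

from collections import deque

def collect_small_world(seed, fetch_relations, max_entities=10000):
    # Breadth-first expansion from the seed node (e.g. "RenRen.com"),
    # visiting each related entity exactly once.
    seen, queue = {seed}, deque([seed])
    while queue and len(seen) < max_entities:
        entity = queue.popleft()
        for neighbor in fetch_relations(entity):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen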

6.2 Data Split for Experiments

To ensure a consistent comparison, we split the data in the same way as the Facebook-based dataset: 40% of the data was used for training and the remainder for testing. The split is based on timestamps, in particular for true investments, where timestamps are available; false examples were split randomly. This split was applied to both the aggregate and the category experiments across all three learning algorithms.
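A minimal sketch of this split is shown below, assuming the examples carry a timestamp and a binary label column; the column names and function name are illustrative, not taken from the original pipeline.

import pandas as pd

def split_by_time(examples: pd.DataFrame, train_frac: float = 0.4):
    # The earliest 40% of true investments (by timestamp) form the training set;
    # false examples are split at random in the same proportion.
    true_ex = examples[examples.label == 1].sort_values("timestamp")
    false_ex = examples[examples.label == 0].sample(frac=1.0, random_state=0)

    n_true = int(len(true_ex) * train_frac)
    n_false = int(len(false_ex) * train_frac)

    train = pd.concat([true_ex.iloc[:n_true], false_ex.iloc[:n_false]])
    test = pd.concat([true_ex.iloc[n_true:], false_ex.iloc[n_false:]])
    return train, test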

6.3 Results for RenRen’s Small World

6.3.1 Aggregate Experiment

The results were encouraging and in line with our expectations. Overall, the area under the curve for all three algorithms is above 0.86, as shown in Figure 6.1; SVM and Decision Trees outperformed Naïve Bayes by 0.02 to 0.03 in AUC: