
Investors Are Social Animals: Modeling Investment Behavior as a Link Prediction Problem

CHAPTER 4: METHODOLOGY

4.4.7 Where’s the Money? Guidelines for Seeking Investments.

Now that we have seen the basic trends that emerge under the different metrics, we can summarize the guidelines for seeking investments:

1. Being “closer” to an investor leads to more frequent investment. Our results show that the bulk of investment activity occurs within shortest path lengths of 7 and that, in general, investment occurs where similarity scores are higher. This was explained in section 4.4.1.

2. Being in a popular industry helps when social relationships are limited. We find that investment can still occur when there are no social relationships, but this happens mostly when the companies in question are in popular industries and have already received at least one round of funding. This was explained in sections 4.4.2 and 4.4.3.

3. Having too many common neighbors can result in less investment activity. Our research also finds that investment occurrences tend to decrease as the number of common neighbors increases. This could signify that having too many similar neighbors means greater competition and hence less room for growth. This was shown in section 4.4.4.

4.4.8 Summary of Intuition

From the above descriptive mining experiments, we can see that being “closer” in terms of shortest path and more similar in terms of Adamic/Adar score generally leads to investment. However, greater similarity in terms of Common Neighbors, Jaccard Coefficient and Preferential Attachment does not, in general, lead to investment. In fact, the greater the dissimilarity in terms of Common Neighbors, Jaccard Coefficient and Preferential Attachment, the greater the chances of funding. Using these ideas, we formulate the problem formally in the next section.

4.5 Investors Are Social Animals: Modeling Investment Behavior as a Link Prediction Problem

We define the problem of predicting investment as a link prediction problem: given an undirected graph GSocial = (V, E), where each vertex in V represents either an Investor i or a Company c, and each edge e = <i, c> ∈ E represents a social relationship between an Investor and a Company that occurs at time T0, predict whether an Investor will invest in a particular Company at time T1.

Here, a social relationship is defined as an instance where a Person has previously worked, or currently works, for a particular Company or Financial Organization. Since there is no way of finding out whether the Person was recruited by the Company or Financial Organization in question or sought to work there, we define social relationships as undirected. Most importantly, we do not include the act of investment as part of social relationships, as investment behavior is what we are attempting to predict.

Also note that Investors consist of People, Companies and Financial Organizations. This is due to the duality of roles played by People, Companies and Financial Organizations in the CrunchBase dataset. For example, Microsoft played the dual role of a Company and a Financial Organization when it invested in Facebook.

The following is a diagrammatic illustration of the concept:

Figure 4.18: Diagrammatic Representation of a Network Containing Investors and Companies

Based on the above diagram: given a network of Investors and Companies at Time0, where blue vertices represent Investors, green vertices represent companies, blue edges represent investment relationships and green edges represent social relationships, can we predict if iX will invest in cX in Time1?
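The formulation above can be illustrated with a minimal Python sketch. It assumes that social relationships are available as (person, organization) employment pairs observed at T0 and that investments observed at T1 are available as (investor, company) pairs. The function names and the toy records are hypothetical and are not taken from the CrunchBase data. Note that investments are used only as labels and are never added to the social graph.

```python
from itertools import product


def build_social_graph(employment):
    """Build an undirected adjacency map from (person, organization) pairs."""
    graph = {}
    for person, org in employment:
        # Undirected edge: store the relationship in both directions.
        graph.setdefault(person, set()).add(org)
        graph.setdefault(org, set()).add(person)
    return graph


def labeled_pairs(investors, companies, investments_t1):
    """Label each (investor, company) pair: 1 if an investment occurs at T1, else 0."""
    invested = set(investments_t1)
    return {
        (i, c): int((i, c) in invested)
        for i, c in product(investors, companies)
        if i != c  # no comparison when both nodes are the same entity
    }


# Hypothetical toy records, for illustration only.
employment_t0 = [
    ("alice", "FundA"),
    ("alice", "StartupX"),
    ("bob", "FundA"),
    ("bob", "StartupY"),
]
investments_t1 = [("FundA", "StartupX")]

graph = build_social_graph(employment_t0)
labels = labeled_pairs(["FundA", "alice"], ["StartupX", "StartupY"], investments_t1)
```

Each labeled pair then becomes one training example once its social features have been computed on the graph.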

4.5.1 Modeling Social Relationship

In order to determine the social similarity between an Investor and a Company, we use features based on node neighborhood, graph distance and common node features between an Investor and a Company. These features are collectively known as social features:

4.5.1.1 Social Features

The social features are adapted from graph theory and social network analysis. The general intuition for these features is that the greater the similarity between an Investor and a Company, the more likely it is that the Investor will invest in the Company. The algorithms used in our analysis assign a score, score(x, y), to each pair of nodes <x, y>, based on the input graph GSocial. Nodes x and y are defined as follows:

Node x represents an Investor, while node y denotes a Company. This is because we want to compare the similarities of Investors and Companies for the purposes of our research. No comparisons are made when node x equals node y. We define the set of neighbors of node x to be Γ(x). The definitions used for our link prediction problem are the same as those covered in section 4.2.7. The social features therefore include Shortest Path, Jaccard Coefficient, Adamic/Adar, Common Neighbors and Preferential Attachment. In addition to social features adapted from graph theory and social network analysis, we also take into account node information, as described in the next section.
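As an illustration, the five social features named above can be computed for a single pair of nodes as in the following sketch. It uses the standard textbook forms of these measures, which are assumed here to match the definitions in section 4.2.7. The graph is a plain adjacency map, and the function names and toy graph are hypothetical.

```python
import math
from collections import deque


def shortest_path_length(graph, x, y):
    """Breadth-first search hop count between x and y; None if unreachable."""
    if x == y:
        return 0
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in graph.get(node, ()):
            if nb == y:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


def social_features(graph, x, y):
    """Compute the five social features for the node pair (x, y)."""
    nbrs_x = graph.get(x, set())
    nbrs_y = graph.get(y, set())
    common = nbrs_x & nbrs_y
    union = nbrs_x | nbrs_y
    return {
        "shortest_path": shortest_path_length(graph, x, y),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic/Adar weights each common neighbor by 1 / log(degree).
        "adamic_adar": sum(
            1.0 / math.log(len(graph[z])) for z in common if len(graph[z]) > 1
        ),
        "preferential_attachment": len(nbrs_x) * len(nbrs_y),
    }


# Hypothetical toy graph: an investor, two companies and three people.
toy_graph = {
    "FundA": {"alice", "bob"},
    "StartupX": {"alice"},
    "StartupY": {"bob", "carol"},
    "alice": {"FundA", "StartupX"},
    "bob": {"FundA", "StartupY"},
    "carol": {"StartupY"},
}
features = social_features(toy_graph, "FundA", "StartupX")
```

In this toy graph the investor and the company share one common neighbor, so the shortest path between them has length 2.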

4.5.2 Learning Algorithms

In this experiment, we chose a Decision Tree based on the CART algorithm (Breiman, Friedman, Olshen, and Stone, 1984) as our primary learning method. This is because we wanted a model that is simple to understand, so that companies seeking funding can better understand investors’ behavior.

More importantly, the model learnt using Decision Trees can be readily visualized; such information can

To make sure that social features can indeed be used as reliable indicators for predicting investments, we also used an SVM with an RBF kernel (Chang and Lin, 2012) and Naïve Bayes with the Bernoulli model (Manning, Raghavan and Schütze, 2008) as alternative learning algorithms to cross-validate our proposition.

We selected SVM and Naïve Bayes because they are widely regarded as classical supervised learning algorithms. We chose RBF as the SVM kernel and the Bernoulli model for Naïve Bayes because the behavior of our data appears to be better suited to these models. In addition, implementations of these algorithms are readily available in various programming languages and libraries, which allows our research to be easily replicated. Implementations of more recent algorithms may not be available.
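To illustrate why a tree-based model is easy to read, the following sketch shows the core step of CART for a binary target: choosing the threshold on one feature that minimizes the weighted Gini impurity of the two resulting groups. This is a single-split sketch rather than a full tree learner and is not the implementation used in our experiments. The function names and the toy values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    Returns (None, gini(labels)) if no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # cannot split between identical feature values
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((low + high) / 2.0, score)
    return best


# Hypothetical toy data: shortest path lengths and whether investment occurred.
path_lengths = [1, 2, 2, 3, 5, 6, 7, 8]
invested = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_split(path_lengths, invested)
```

A learned rule of the form "shortest path length is at most some threshold" can be read directly by a company seeking funding, which is the kind of interpretability that motivated the choice of Decision Trees.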

4.5.3 Significance of Methodology

We believe that our methodology presents several advantages over previous work in terms of the dataset used, the problem formulation and predictive model, and the introduction of new factors for predicting investment behavior.

4.5.3.1 Richness and size of dataset  

We used a dataset from CrunchBase. Our network consists of 11916 companies, 12127 people, and 1122 financial organizations within 4 degrees of separation from Facebook, for a total of 25165 unique nodes. In addition, our dataset consists of very different entities, namely people, companies and financial organizations, which also span various demographic groups. These factors make our dataset richer and larger than those used in previous work. Bakker, Hare Khosravi and Ramadanovic (2010) predominantly used financial data, while Giot et al. (2011) focused on investments in Finland only. In addition, Barne, Cronqvist and Siegel (2010) restricted their data to only 96 Taiwanese adults. Similarly, Tan and Tan (2011) focused exclusively on finance professors.

4.5.3.2 Problem formulation and predictive model  

While previous work has its merits, its approaches lack generalizability. This might be due to how the problem of predicting investment behavior was formulated. In our approach, we chose to model investment behavior as a classic link prediction problem, which allows us to build a model with which investment behavior can be predicted.

4.5.3.3 Social relationships as a factor for predicting investment  

Most previous work focused on factors such as financial data, psychology and experience for predicting investments. We propose the use of social relationships, in terms of both similarity and difference, not merely as one possible factor for predicting investment but as a stable and sound one.