Concepts, Definitions and Examples

CHAPTER 4: METHODOLOGY

4.3 Concepts, Definitions and Examples

industry and its recent IPO. We chose 4 degrees of separation as the cutoff point, rather than 6, because recent advances in technology have reduced the degrees of separation between people, as shown by Backstrom et al. (2011). In addition, there are limits to the “Horizon of Observability” (Friedkin et al., 1981) from the viewpoint of using Facebook as a seed node.
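The 4-hop cutoff can be illustrated with a breadth-first fan-out from a seed node. The following is a minimal sketch, not the code used in this research: the function name `fan_out`, the adjacency-dictionary representation and the toy chain of nodes are all assumptions made for illustration.

```python
from collections import deque


def fan_out(adjacency, seed, max_hops=4):
    """Return {node: hop distance} for every node within max_hops of seed."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand nodes that already sit at the cutoff
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical toy chain: Facebook - A - B - C - D - E
toy = {
    "Facebook": {"A"},
    "A": {"Facebook", "B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
sample = fan_out(toy, "Facebook")
```

In this toy chain, node E lies 5 hops from the seed and is therefore left out of the sample.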

The strategy of selecting Facebook as a seed node stems from the following reasons:

• As discovered by Backstrom et al. (2011), 4 hops can theoretically allow a node to reach any other node within a network.

• Since the information available about nodes decreases as the number of hops increases, there is little value in gathering information about nodes that are more than 4 degrees of separation away; according to Friedkin (1981), awareness of neighbors drops to almost 0% when the degree of separation reaches 4 hops.

• Since our assumption is that CrunchBase is a social network, a dataset of up to 4 degrees of separation from any given seed node is representative of CrunchBase.

• As shown in the verification section of this research, we used the same strategy of selecting a single seed node and fanning out by up to 4 degrees of separation, and repeated the experiment with comparable results.

organizations. Examples from our dataset include Mark Zuckerberg of Facebook and Peter Thiel, formerly of PayPal and an early financial backer of Facebook.

Some examples of Companies include Google, Facebook and Microsoft.

4.3.3 Financial Organization

Financial Organizations are organizations that typically invest in Companies.

Prominent examples in our dataset include Accel Partners (invested in prominent companies such as

played by People, Companies and Financial Organizations in the CrunchBase dataset. For example, companies like Microsoft play the role of a Company yet have also invested in other companies, such as Facebook in its early days. Similarly, Peter Thiel is a Person entity, yet invested in Facebook.

4.3.5 Social Graph

Social Graph, Gsocial: We define Gsocial = (Vsocial, Esocial) as an undirected graph, where each vertex is a Person, a Company or a Financial Organization, and an edge in Esocial is formed when a particular Person has a relationship (such as employment) with a Company or Financial Organization. There are 1152 people, 922 financial organizations and 7745 companies in the social graph. A small example of a social graph can be seen in the next figure:

Figure 4.4: Facebook’s social graph
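As a data structure, the definition of Gsocial can be sketched as an undirected adjacency map. This is only an illustration under stated assumptions: the helper `build_social_graph` is hypothetical, and the three relationship records are toy examples loosely based on the people named in this chapter, not rows taken from the CrunchBase dataset.

```python
from collections import defaultdict


def build_social_graph(relationships):
    """Build an undirected graph as {node: set of neighbors}.

    Each relationship is a (person, organization) pair; the edge is stored
    in both directions because Gsocial is undirected.
    """
    graph = defaultdict(set)
    for person, organization in relationships:
        graph[person].add(organization)
        graph[organization].add(person)
    return dict(graph)


# Illustrative records only (assumed, not extracted from the dataset).
relationships = [
    ("Mark Zuckerberg", "Facebook"),
    ("Peter Thiel", "Facebook"),
    ("Peter Thiel", "PayPal"),
]
g_social = build_social_graph(relationships)
```

Storing neighbors as sets keeps the neighborhood intersections and unions needed by the similarity scores cheap to compute.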

4.3.5 Investment Graph

Investment Graph, Ginvestment: We define Ginvestment = (Vinvestment, Einvestment) as a directed graph, where each vertex is either an Investor or a Company, and an edge in Einvestment is formed when an Investor invests in a particular Company. There are 11756 people, 6634 financial organizations and 756 companies in the investment graph. Note that there is overlap between the entities found in the social graph and those in the investment graph. An example of an investment graph of Facebook is shown below:

Figure 4.5: Facebook’s investment graph
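The directed nature of Ginvestment can be sketched in the same style. The helper names `build_investment_graph` and `invested` are hypothetical, and the two edges are the investments in Facebook mentioned earlier in this chapter, used here purely as toy input.

```python
from collections import defaultdict


def build_investment_graph(investments):
    """Build a directed graph as {investor: set of companies invested in}."""
    graph = defaultdict(set)
    for investor, company in investments:
        graph[investor].add(company)  # one direction only: investor -> company
    return dict(graph)


def invested(graph, investor, company):
    """Return True if there is a directed edge investor -> company."""
    return company in graph.get(investor, set())


# Toy input (assumed for illustration).
investments = [
    ("Peter Thiel", "Facebook"),
    ("Microsoft", "Facebook"),
]
g_investment = build_investment_graph(investments)
```

Because edges are stored in one direction only, `invested(g_investment, "Microsoft", "Facebook")` holds while the reverse query does not.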

4.3.6 More definitions

We want our research to benefit companies that may or may not have a strong background in social network analysis. Hence, we selected the following 5 methods for our analysis because of their simplicity and ease of understanding: shortest paths, Adamic/Adar, Jaccard Coefficient, Common Neighbors and Preferential Attachment. We initially conducted experiments on Gsocial and Ginvestment based on metrics such as the number of edges, nodes and triads in order to compare the graphs on a global scale, but we found that these metrics provided limited information on how social relationships can affect investments. Therefore, we chose to use graph distance (shortest paths) and methods based on node neighborhoods (Jaccard Coefficient, Adamic/Adar, Common Neighbors and Preferential Attachment), as they allow us to make one-to-one comparisons between each Investor and each Company.

We selected these metrics because each of them represents an aspect of social relationships and hence allows us to build a predictive model:

• Shortest paths represent the “closeness” between 2 nodes.

• Common Neighbors, Jaccard Coefficient and Adamic/Adar represent neighborhood similarity: will investors invest in companies based on homophily?

• Preferential Attachment demonstrates the social phenomenon of “rich get richer”: do companies that have previously received more investments go on to receive more investments from investors?

In addition, since we are interested in the small world of investing behaviors, methods based on node neighborhoods should provide us with insights as to how social similarity can affect investment behavior.

All methods used in our analysis assign a score, score(x, y), to each pair of nodes <x, y>, based on the input graph Gsocial.

Nodes x and y are defined as follows: node x represents an Investor, while node y represents a Company.

This is because we want to compare the similarity of Investors and Companies for the purposes of our research. No comparison is made when node x equals node y. We define the set of neighbors of node x in Gsocial to be $\Gamma(x)$.

In general, the greater the similarity based on the scores, the greater the likelihood of investment. The algorithms used for comparing similarities are as follows:

1. Shortest Path. We simply consider the shortest path between Investors and Companies. The general intuition is that the “closer” Investors are to Companies (and vice-versa) the more likely that Investors will invest in such Companies. We define score (x, y) to be the length of the shortest path between an Investor and a Company, where x represents an Investor while y represents a Company.

2. Adamic/Adar. Adamic and Adar (2001) considered the similarity between two personal homepages by computing features of the pages and defining the similarity between two pages to be:

$$\mathrm{score}(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}$$

where we take the shared features z to be the common neighbors of x and y, x represents an Investor and y represents a Company.

3. Jaccard Coefficient. The Jaccard Coefficient measures the probability that both x and y have a feature f, for a randomly selected feature f that either x or y has. Here, we take f to be neighbors in Gsocial, since we are interested in node neighborhoods, leading us to the measure:

$$\mathrm{score}(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$

4. Common Neighbors. Common Neighbors is considered the most direct implementation. According to Newman (2012), the general intuition is that the number of common neighbors of node x and node y has [...] collaboration network. The score(x, y) for common neighbors is defined as follows:

$$\mathrm{score}(x, y) = |\Gamma(x) \cap \Gamma(y)|$$

5. Preferential Attachment. Preferential Attachment suggests that the probability that a new edge has node x as an endpoint is proportional to the current number of neighbors of x (Jeffrey and Milgram, 1969). In our use case, this models the “rich get richer” phenomenon, in which companies that have already received investments should receive even more investments as time progresses.

The score(x, y) for preferential attachment is defined as follows:

$$\mathrm{score}(x, y) = |\Gamma(x)| \cdot |\Gamma(y)|$$

We use these algorithms to compare each pair of Investor and Company nodes in Gsocial. For each pair, we record the score for each algorithm and mark whether or not the Investor invested in the Company.

6. Number of Shortest Paths between Investor and Company. We calculate the shortest path between an Investor and a Company and count the number of paths that share that shortest path length. A node may appear more than once among these paths. The intuition here is that an Investor is more likely to invest in a Company if there are more shortest paths connecting them.

This is because more paths could mean that the Company or Investor is more easily reached via multiple shortest paths.
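The six scores above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this research: it assumes Gsocial is stored as a dictionary mapping each node to the set of its neighbors, it uses the standard neighborhood-based forms of each score, and all function names and the toy graph (one investor I, one company C and three people P1 to P3) are hypothetical.

```python
import math
from collections import deque


def common_neighbors(g, x, y):
    return len(g[x] & g[y])


def jaccard(g, x, y):
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def adamic_adar(g, x, y):
    # For x != y, a common neighbor z is adjacent to both x and y,
    # so it has at least 2 neighbors and log(len(g[z])) is positive.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])


def preferential_attachment(g, x, y):
    return len(g[x]) * len(g[y])


def shortest_paths(g, x, y):
    """Return (length, number of shortest paths) from x to y via BFS.

    Returns (None, 0) if y cannot be reached from x.
    """
    dist, count = {x: 0}, {x: 1}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        for nb in g[node]:
            if nb not in dist:                 # first time nb is reached
                dist[nb] = dist[node] + 1
                count[nb] = count[node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:   # another equally short route
                count[nb] += count[node]
    return (dist[y], count[y]) if y in dist else (None, 0)


def score_pair(g_social, invested_pairs, x, y):
    """Collect all scores for an (Investor, Company) pair plus the label."""
    length, n_paths = shortest_paths(g_social, x, y)
    return {
        "shortest_path": length,
        "num_shortest_paths": n_paths,
        "adamic_adar": adamic_adar(g_social, x, y),
        "jaccard": jaccard(g_social, x, y),
        "common_neighbors": common_neighbors(g_social, x, y),
        "preferential_attachment": preferential_attachment(g_social, x, y),
        "invested": (x, y) in invested_pairs,
    }


# Hypothetical toy graph: investor I and company C share people P1 and P2.
toy = {
    "I": {"P1", "P2"},
    "C": {"P1", "P2", "P3"},
    "P1": {"I", "C"},
    "P2": {"I", "C"},
    "P3": {"C"},
}
row = score_pair(toy, {("I", "C")}, "I", "C")
```

For the toy pair, I and C share 2 of the 3 neighbors in their combined neighborhood and are connected by 2 shortest paths of length 2, one through each shared person.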