
Organization of the Dissertation

The rest of this dissertation is organized as follows. In Chapter 2, we propose the DEDS framework consisting of four different sparsification methods and show that, while using only a small portion of the edges causes considerable performance deterioration, our ensemble classifier with high diversity can counter the drop in prediction accuracy. In Chapter 3, we formulate STGQ for automatic activity planning and propose Algorithms SGSelect and STGSelect with various strategies to find the optimal solution efficiently. In Chapter 4, we further introduce CSGQ and then design new data structures that avoid redundant exploration of the solution space and speed up the processing of consecutive queries. Finally, Chapter 5 concludes this dissertation and presents future directions.

Chapter 2

Ensemble of Diverse Sparsifications for Link Prediction in Large-Scale Networks

2.1 Introduction

Link prediction is the task of forecasting whether a link will form between two vertices, and it is an important research topic in network analysis, since many major problems involving networks can benefit from it. Recently, the sizes of networks have been increasing rapidly, and this growth results in a significant increase in the computational cost of link prediction. Moreover, these networks may become too large to be stored in main memory. Consequently, processing them requires frequent disk access, which may lead to considerable deterioration in performance. As a result, prediction tasks can take days to complete, meaning that dynamic friend suggestions or product recommendations cannot be made to users or customers in a timely manner, and the recommendations may therefore become less useful as time passes.

In link prediction, several measurements, known as proximity measures, are used to indicate how likely it is for a non-neighboring vertex pair to become connected by an edge in the near future. A possible solution for speeding up link prediction is to design algorithms that approximate each of the proximity measures. For example, the authors in [51] and [49] proposed methods to achieve close approximations of proximity measures such as Katz [22] and rooted PageRank [29]. However, building robust and high-accuracy classifiers for link prediction often requires various proximity measures, and it would be very complicated to design a different approximation algorithm for each of them. Therefore, a general and flexible solution for lowering computational costs is required.
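To make the discussion concrete, the following is a minimal Python sketch of two proximity measures of the kind discussed here: common neighbors and a truncated version of the Katz measure [22]. The adjacency-set representation, the damping value beta, and the truncation depth are illustrative choices for this sketch, not taken from the cited works.

```python
import numpy as np

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v (adj maps vertex -> neighbor set)."""
    return len(adj[u] & adj[v])

def truncated_katz(A, beta=0.05, max_len=4):
    """Approximate Katz by summing beta^l * (# paths of length l) for l = 1..max_len.

    The exact measure is the infinite sum, equivalently (I - beta*A)^{-1} - I
    when beta is small enough; truncating the sum gives a cheap approximation.
    """
    n = A.shape[0]
    scores = np.zeros((n, n))
    power = np.eye(n)
    for l in range(1, max_len + 1):
        power = power @ A              # A^l counts paths of length l
        scores += (beta ** l) * power
    return scores

# Toy usage on the path graph 0-1-2-3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(common_neighbors(adj, 0, 2))   # 1: vertex 1 is the only shared neighbor
print(truncated_katz(A)[0, 2])       # small positive score for non-neighbors 0 and 2
```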

Inspired by previous research that simplifies large networks to decrease computational costs, we found that reducing the size of networks provides a more general solution.¹ Once a network has been sparsified, most proximity measures benefit from the size reduction and can be calculated faster. If the network is too large to be stored in main memory, decreasing its size also helps to lower the number of disk accesses. Furthermore, when the sparsification ratio is sufficiently small, the sparsified network may fit into main memory entirely, relieving the burden of disk access. However, with such drastic sparsification, many edges in the network are removed, and the information that can be used in link prediction becomes rather limited; in turn, the prediction accuracy would drop significantly under such severe conditions. Therefore, the primary challenge is to reduce the network size considerably while maintaining high prediction accuracy.
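As a concrete illustration of edge sparsification, the sketch below keeps a uniformly random fraction of the edges while leaving every vertex in place, matching the constraint noted in the footnote. The function name and the edge-list representation are hypothetical, introduced only for this sketch.

```python
import random

def random_sparsify(edges, ratio, seed=None):
    """Keep roughly ratio * |edges| edges chosen uniformly at random.

    Only edges are removed; all vertices stay in the network, since any
    vertex may later be the target of a prediction.
    """
    rng = random.Random(seed)
    k = int(len(edges) * ratio)
    return rng.sample(list(edges), k)

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(random_sparsify(edges, 0.4, seed=7))   # keeps int(5 * 0.4) = 2 edges
```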

In this chapter, we address this issue by proposing a sparsification framework for link prediction called Diverse Ensemble of Drastic Sparsification (DEDS), which consists of sparsifying, training, and ensembling, as shown in Figure 2.1. Specifically, we design four different methods to sparsify the original network, train individual classifiers from the sparsified networks, and ensemble these classifiers appropriately to improve prediction performance. The rightmost sparsified network in Figure 2.1 is obtained from the most straightforward random sparsification. In addition, DEDS incorporates three more sophisticated sparsification methods, which are based on heuristics for preserving the edges required by different proximity measures. DEDS is able to generate sparsified networks with significant structural differences, and this increases the diversity of the correspondingly trained classifiers, which is key to creating an effective ensemble classifier. As shown in the experimental results, the proposed DEDS framework can effectively mitigate the drop in prediction accuracy while considerably reducing running time.

¹Certain previous studies have proposed methods that remove vertices and edges to simplify the network (e.g., [25] and [45]), while other methods only remove edges (e.g., [33] and [47]). In our study, we do not remove vertices, since any vertex may be the target that we want to generate a prediction for.

[Figure 2.1: Flow chart of the DEDS framework. The sparsifying stage produces four sparsified networks, each feeding one individual classifier (degree, random walk, short path, and random); the training stage fits the individual classifiers, and the ensembling stage combines them into the ensemble classifier.]
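The skeleton below renders the flow of Figure 2.1 in code, under stated assumptions: each sparsifier is a placeholder callable standing in for the four methods proposed in Section 2.4, `featurize` stands in for computing proximity-measure features of candidate vertex pairs on a sparsified network, logistic regression is only an illustrative base learner, and plain averaging is only one possible way to ensemble; none of these choices are specified by the text above.

```python
from sklearn.linear_model import LogisticRegression

def deds_train(edges, ratio, sparsifiers, featurize, pairs, labels):
    """Sparsifying + training stages: one classifier per sparsified network.

    sparsifiers: callables (edge list, ratio) -> sparsified edge list
    featurize:   callable (edge list, vertex pairs) -> feature matrix of
                 proximity measures computed on that sparsified network
    """
    classifiers = []
    for sparsify in sparsifiers:
        sub_edges = sparsify(edges, ratio)
        X = featurize(sub_edges, pairs)
        classifiers.append((LogisticRegression().fit(X, labels), sub_edges))
    return classifiers

def deds_predict(classifiers, featurize, pairs):
    """Ensembling stage: here, a plain average of predicted link probabilities."""
    votes = [clf.predict_proba(featurize(sub_edges, pairs))[:, 1]
             for clf, sub_edges in classifiers]
    return sum(votes) / len(votes)
```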

The main contributions of this chapter are summarized as follows.

• We propose a novel network sparsification framework called DEDS to slim down large networks while preserving important proximity measures that are used in link prediction. Specifically, we design four different sparsification methods by categorizing the proximity measures and then devising heuristics to preserve the edges required by these measures. The proposed DEDS framework can generate sparsified networks with significant structural differences and increase the diversity of the ensemble classifier. We also prove that adopting accuracy-based weighting enables DEDS to further reduce the prediction error (a sketch of this weighting idea appears after this list). Experimental results show that when the network is drastically sparsified, DEDS can effectively mitigate the drop in prediction accuracy and considerably raise the AUC value. With a larger sparsification ratio, DEDS can even outperform the classifier trained from the original network.

• Our proposed DEDS framework is able to significantly reduce the running time of link prediction tasks. According to the experimental results, the prediction cost is substantially reduced after the network is sparsified. Moreover, if the network is too large to be stored in main memory, DEDS helps to lower the number of disk accesses by reducing the network size. When the sparsification ratio is sufficiently small, DEDS provides further efficiency by relieving the burden of disk access, since the sparsified network can fit into main memory.

• In the proposed DEDS framework, all the individual classifiers remain unentangled before the final decision is generated, meaning that each individual classifier can be trained and run independently. This enables DEDS to fully utilize all the available CPUs or cores to train and run individual classifiers simultaneously. As a result, DEDS can maximize the ensemble size based on the user's computational resources and considerably increase prediction accuracy.
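The sketch below (referenced in the first bullet) illustrates these two ideas together: weighting each individual classifier by its validation accuracy, and training the classifiers in separate processes, which is possible precisely because they stay unentangled until the final decision. Normalizing the weights to sum to one and using one worker process per classifier are illustrative choices for this sketch; the exact weighting scheme proved in the chapter may differ.

```python
from concurrent.futures import ProcessPoolExecutor

def fit_one(task):
    """Train a single classifier; independent of the others, so it can run
    in its own process."""
    make_clf, X_train, y_train = task   # make_clf: zero-arg classifier factory
    return make_clf().fit(X_train, y_train)

def train_all_parallel(tasks):
    """One training task per classifier; the pool defaults to one worker per core."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(fit_one, tasks))

def accuracy_weights(classifiers, X_val, y_val):
    """Weight each classifier in proportion to its validation accuracy."""
    accs = [(clf.predict(X_val) == y_val).mean() for clf in classifiers]
    total = sum(accs)
    return [a / total for a in accs]

def weighted_score(classifiers, weights, X):
    """Final decision: accuracy-weighted average of predicted link probabilities."""
    return sum(w * clf.predict_proba(X)[:, 1]
               for clf, w in zip(classifiers, weights))
```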

The rest of this chapter is organized as follows. In Section 2.2, we review related work. Section 2.3 provides preliminaries for a supervised link prediction framework and describes the datasets and evaluation metrics used throughout this study. In Section 2.4, we propose four different sparsification methods and show that, while using only a small portion of the edges causes considerable performance deterioration, our ensemble classifier with high diversity can counter the drop in prediction accuracy. In Section 2.5, we analyze two strategies that further raise the performance of the ensemble classifier. More detailed experimental results, such as the efficiency analysis, are provided in Section 2.6. Finally, we summarize this chapter in Section 2.7.